OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs
The paper was presented at USENIX ATC '24.
Repo structure
- OSMOSIS/PsPIN Verilator simulation environment in Docker:
  - ./Dockerfile and ./docker-compose.yml
- OSMOSIS/PsPIN source code, built and run in Docker: ./pspin-osmosis/
  - OSMOSIS source code:
    - hw/verilator_model/src/AXIMaster.hpp - C++ model of hardware DMA AXI fragmentation
    - hw/verilator_model/src/FMQEngine.hpp - C++ model of the WLBVT FMQ scheduler
    - examples/osmosis - OSMOSIS host/kernel C API/runtime
    - examples/mt_apps/handlers - evaluated SmartNIC kernels
    - examples/mt_apps/driver - deployment scenarios (e.g., tenant mixtures)
  - Experiment infrastructure:
    - examples/mt_apps/scripts/*.sh - scripts to batch experiments
    - examples/mt_apps/scripts/*.py - scripts to parse and visualize experiment logs
    - examples/mt_apps/traces - input traffic packet traces with different numbers of tenants and packet sizes
    - examples/mt_apps/scripts/tracegen.py - packet trace generator driven by statistical parameters
- OSMOSIS hardware blocks for use with ASIC/FPGA:
  - These hardware blocks are used for the OSMOSIS ASIC area estimations in the paper and can be synthesized using Synopsys Design Compiler NXT.
  - hardware/bvt_arb_tree.sv - WLBVT hardware scheduler prototype written in SystemVerilog.
  - iDMA - DMA engine written in SystemVerilog with support for a high-performance hardware protocol for AXI stream fragmentation.
A. Verilator environment setup
1. Set up Docker container
Takes approximately 15-20 minutes to build from scratch.
host$ git clone git@spclgitlab.ethz.ch:mkhalilov/pspin-osmosis.git
host$ cd ./pspin-osmosis/ && git submodule update --init --recursive
host$ docker-compose -p osmosis-ae up -d
As a result, the osmosis-ae container is built and started in the background.
2. Build the PsPIN/OSMOSIS simulation core inside the container (30 minutes)
host$ docker ps # get the containerID
host$ docker exec -it <containerID> bash
osmosis-ae # cd /opt/pspin/
osmosis-ae # source ./sourceme.sh
osmosis-ae # cd ./hw/verilator_model/
osmosis-ae # VERILATOR_COMPILER_WORKERS=$(nproc) make release
- This step takes around 30 minutes and needs 128-256 GB of RAM.
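Because the release build is memory-hungry, it can help to check the machine before starting it. The following is a minimal sketch, assuming a Linux system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`. The helper names and the default threshold of 128 GB (the lower bound quoted above) are illustrative and are not part of the repository. Note that these values describe the physical machine and do not reflect any memory limit applied to the container.

```python
import os


def total_ram_gb() -> float:
    """Return the physical RAM of this machine in GB (10**9 bytes).

    Assumption: Linux, where both sysconf names below are available.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * phys_pages / 1e9


def enough_ram(total_gb: float, required_gb: float = 128.0) -> bool:
    """Compare a RAM size against a threshold (hypothetical default: 128 GB)."""
    return total_gb >= required_gb


# Example use before running `make release`:
#   if not enough_ram(total_ram_gb()):
#       print("Warning: less than 128 GB of RAM, the build may run out of memory")
```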
B. Experiments
0. Set up the simulation environment
- If you stopped the container, re-run docker-compose as shown in Step A1.
- (Re-)attach to the running osmosis-ae container by re-running commands 1-4 from Step A2.
- osmosis-ae # cd /opt/pspin/examples/mt_apps/
  Change to the root working directory of the OSMOSIS experiments.
- osmosis-ae # make SPIN_KERNEL_NAME=kernels deploy
  Compile the SmartNIC application kernels.
Experiment pipeline
The experiment pipeline consists of the following steps:
- osmosis-ae # make SPIN_APP_NAME=<experiment_name> osmosis
  Compile the experiment scenario from the ./driver directory. The experiment binary sim_<experiment_name> is generated in the current directory.
- osmosis-ae # nohup bash ./scripts/run_<experiment_name>.sh $(pwd)/traces/ $(pwd)/logs/ &
  Run the batched simulation to perform all measurements (the sim_<experiment_name> binary is configured through environment variables and the input traffic trace). Raw simulation logs are stored in the ./logs/ directory. Pre-generated traces are stored in the ./traces/ directory.
- osmosis-ae # python3 ./scripts/postprocess_<experiment_name>.py $(pwd)/logs
  Post-process the experiment logs into *.csv data in the ./logs/ directory.
- osmosis-ae # python3 ./scripts/plot_<experiment_name>.py
  Run the plotting script on the *.csv data to visualize it in *.pdf format in the ./figures/ directory.
  - The figure is located at <AE-repo-root-dir>/pspin/examples/mt_apps/figures/ in the host file system and can be opened outside the osmosis-ae container, e.g., in a PDF reader.
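The four pipeline steps can also be chained from a small driver script. The sketch below is illustrative only: `pipeline_commands` and `run_pipeline` are hypothetical helpers, not part of the repository, and they assume the uniform `<experiment_name>` naming described above (Experiment 4, for instance, uses differently named build targets and plot scripts, so it would need its own command list). Unlike the `nohup ... &` invocation, this sketch runs the batch script in the foreground and waits for it to finish.

```python
import subprocess
from pathlib import Path
from typing import List


def pipeline_commands(experiment: str, workdir: str) -> List[List[str]]:
    """Build the compile, run, post-process, and plot commands for one experiment.

    Assumption: the scripts follow the run_/postprocess_/plot_<experiment> naming.
    """
    root = Path(workdir)
    traces = f"{root / 'traces'}/"
    logs = f"{root / 'logs'}/"
    return [
        ["make", f"SPIN_APP_NAME={experiment}", "osmosis"],
        ["bash", f"./scripts/run_{experiment}.sh", traces, logs],
        ["python3", f"./scripts/postprocess_{experiment}.py", str(root / "logs")],
        ["python3", f"./scripts/plot_{experiment}.py"],
    ]


def run_pipeline(experiment: str, workdir: str) -> None:
    """Run the steps in order and stop at the first failing command."""
    for cmd in pipeline_commands(experiment, workdir):
        print("+", " ".join(cmd))
        subprocess.run(cmd, cwd=workdir, check=True)


# Example use inside the container:
#   run_pipeline("io_contention", "/opt/pspin/examples/mt_apps")
```

Keeping command construction separate from execution makes it possible to print and inspect the commands before committing to a multi-hour run.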
Experiment 1: Fairness of HPU utilization
- The experiment takes approximately 2 minutes.
- We recommend using this experiment to walk through the basic functionality of the experiment pipeline (e.g., during the AE kick-the-tires stage).
- The experiment demonstrates that the OSMOSIS WLBVT scheduler achieves a fair share of SmartNIC compute engines between two tenants, compared to Round Robin (see Fig. 9 in the paper).
osmosis-ae # nohup bash ./scripts/run_hpu_contention.sh $(pwd)/traces/ $(pwd)/logs/ &
osmosis-ae # python3 ./scripts/postprocess_hpu_contention.py $(pwd)/logs
osmosis-ae # python3 ./scripts/plot_hpu_contention.py
- Produces figures/hpu_occupation.pdf
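To build intuition for why per-packet Round Robin can be unfair in compute time, here is a toy single-engine model. It is not the WLBVT algorithm and is not taken from FMQEngine.hpp: the tenant names, the per-packet costs, and the "least service first" policy (always serve the tenant that has consumed the least compute time so far) are assumptions chosen purely for illustration.

```python
from typing import Dict


def round_robin(costs: Dict[str, int], steps: int) -> Dict[str, int]:
    """Serve one packet per tenant in turn; return compute time used per tenant."""
    used = {tenant: 0 for tenant in costs}
    order = list(costs)
    for i in range(steps):
        tenant = order[i % len(order)]
        used[tenant] += costs[tenant]
    return used


def least_service_first(costs: Dict[str, int], steps: int) -> Dict[str, int]:
    """Always serve the tenant with the least accumulated compute time."""
    used = {tenant: 0 for tenant in costs}
    for _ in range(steps):
        tenant = min(used, key=used.get)  # ties go to the first tenant listed
        used[tenant] += costs[tenant]
    return used


def shares(used: Dict[str, int]) -> Dict[str, float]:
    """Convert accumulated compute time into fractions of the total."""
    total = sum(used.values())
    return {tenant: time / total for tenant, time in used.items()}


# Hypothetical per-packet handler costs in arbitrary time units.
costs = {"light": 1, "heavy": 4}
print(shares(round_robin(costs, 1000)))
print(shares(least_service_first(costs, 1000)))
```

With these costs, Round Robin serves both tenants the same number of packets, so the heavy tenant ends up with 80% of the compute time, while the least-service-first policy converges to an even split.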
Experiment 2: DMA engine contention
- The experiment takes approximately 2-3 hours.
- TODO: describe (see Figure 10 in the paper).
osmosis-ae # make SPIN_APP_NAME=io_contention osmosis
osmosis-ae # nohup bash ./scripts/run_io_contention.sh $(pwd)/traces/ $(pwd)/logs/ &
osmosis-ae # python3 ./scripts/postprocess_io_contention.py $(pwd)/logs
osmosis-ae # python3 ./scripts/plot_io_contention.py
Experiment 3: Application throughput overheads
- The experiment takes approximately 2-3 hours.
- TODO: describe (see Figure 11 in the paper).
osmosis-ae # make SPIN_APP_NAME=raw_tput osmosis
osmosis-ae # nohup bash ./scripts/run_raw_tput.sh $(pwd)/traces/ $(pwd)/logs/ &
osmosis-ae # python3 ./scripts/postprocess_raw_tput.py $(pwd)/logs
osmosis-ae # python3 ./scripts/plot_raw_tput.py
Experiment 4: Application mixes
- The experiment takes approximately 4 hours.
- TODO: describe (see Figure 12a/b in the paper).
osmosis-ae # make SPIN_APP_NAME=compute_mix osmosis
osmosis-ae # make SPIN_APP_NAME=io_mix osmosis
osmosis-ae # nohup bash ./scripts/run_mixes.sh $(pwd)/traces/ $(pwd)/logs/ &
osmosis-ae # python3 ./scripts/postprocess_mixes.py $(pwd)/logs
osmosis-ae # python3 ./scripts/plot_mixes_compute.py
osmosis-ae # python3 ./scripts/plot_mixes_io.py