This is the guide for using the self-contained Docker image to build and run the artifact, and to reproduce the experimental results described in the paper. The image is based on Ubuntu with all prerequisites installed.
Make sure Docker is installed and the Docker daemon is running. The version we used is 20.10.5.
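You can verify both the installation and that the daemon is reachable; the Server section of the output is only shown when the daemon is running:
$ sudo docker version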
You can either download the image by clicking the link, or use the following command:
$ wget https://github.com/presto-osu/ecoop21/releases/download/docker-image/presto_dp_trace_coverage_image.docker
$ sudo docker load -i presto_dp_trace_coverage_image.docker
# 'presto/dp-trace-coverage' should appear in the list
$ sudo docker images
$ sudo docker run -it presto/dp-trace-coverage
After starting the container, you should be logged in as root.
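A trivial check, if you want to confirm the login:
$ whoami
# root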
$ cd /root/artifact
$ tar -xzvf traces.tar.gz
The dataset is in the directory named `traces`. For each app evaluated in the paper, there is a sub-directory containing 1000 low-level traces for 1000 users simulated using monkey. Each low-level trace is a sequence of "Enter" and "eXit" events for methods, where each method is denoted by a unique ID. Our experiments are conducted on these low-level traces. Specifically, the traces referred to by the paper, i.e., call chains and enter/exit traces, are extracted from them. Besides the low-level traces, there are three other files in each sub-directory: `callpairs` contains the calling relationships between methods; `v` contains the number of methods in the app; and `list` contains the names of the traces.
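To get a quick feel for the dataset layout, you can list it after extraction; `drumpads` below is one of the apps suggested later in this guide:
# one sub-directory per app
$ ls traces/
# assuming 'list' has one trace name per line, this should report 1000
$ wc -l traces/drumpads/list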
The randomization is implemented in Java and is organized as a Gradle project. `app` contains the code for both the call-chain and enter/exit-trace analyses. `run.sh` is a start-up script for running the experiments. `plot.py` is a Python script for plotting the results.
Before running the experiments, make sure you build the project first using the following commands:
$ cd code
$ ./gradlew :app:shadowJar
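If the build succeeds, the fat jar produced by shadowJar should appear under the standard Gradle output directory; the exact jar name depends on the build configuration, so this is just a sanity check:
$ ls app/build/libs/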
Instead of running the analyses by invoking the `java` command directly, we provide a wrapper script, `run.sh`, for convenience. The instructions for reproducing the results are based on this script. Here is how to use it:
Flag | Description
---|---
`-a TYPE` | Required. Analysis type: `cca` for call-chain analysis, `eeta` for enter/exit-trace analysis.
`-n APP_NAME` | Required. Name of the app. Must be one of the apps in directory `../traces/`.
`-r VALUE` | Optional. Number of replications per user. E.g., `-r 10` means 10000 users. Default is 1, i.e., 1000 users.
`-e VALUE` | Optional. Value for the privacy parameter. The natural log of this value is used, i.e., ε=ln(VALUE). E.g., `-e 49` sets ε to ln(49); see the worked values after this table. Default is 9.
`-s` | Optional. Use the strict algorithm. The relaxed algorithm is used by default.
`--runs VALUE` | Optional. If specified, the experiment is repeated VALUE times. To get the confidence intervals, the experiments should be repeated at least 30 times. Default is 1.
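A quick way to see the ε values used in this guide (`python3` is available in the image, since the plotting step below uses it); recall that a smaller ε means stronger privacy:
$ python3 -c 'import math; print(math.log(3), math.log(9), math.log(49))'
# ≈ 1.0986, 2.1972, 3.8918 for ε=ln(3), ε=ln(9) (the default), and ε=ln(49)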
The script prints the results, including `error-all`, `error-hot`, `recall`, and `precision`, to stdout. It also saves them to a file in `results/`.
Example:
The following command runs the call-chain analysis on `drumpads` for 1000 users, using the relaxed algorithm and ε=ln(49):
$ bash run.sh -n drumpads -a cca -e 49
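When the run finishes, the saved result file should show up under `results/`; the exact file name depends on the run configuration:
$ ls results/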
Note that to get the complete figures in Section 5 of the paper, for all 15 apps and with confidence intervals (the little caps on the bar charts shown in the paper), you need to run the experiments for 30 runs, which is expected to take more than a month. It is more practical to pick a few fast apps and do a small number of runs, or just one. In that case, the resulting figures will show blanks for some apps and the confidence intervals will be missing.
The following commands will generate experimental data for 2 runs of `speedlogic`. You may change the option `-n speedlogic` to another app in any of the commands to get the data for it. Change the option `--runs 2` to run more rounds if necessary.
We suggest a few fast apps: `speedlogic`, `loctracker`, `parking`, `drumpads`, `equibase`, and `moonphases`.
# call chain analysis, 1000 users (Fig. 2a & 3a & 5a)
$ bash run.sh -n speedlogic --runs 2 -a cca
# enter/exit analysis, 1000 users (Fig. 2b & 3b & 5b)
$ bash run.sh -n speedlogic --runs 2 -a eeta
# call chain analysis, 10000 users (Fig. 2a & 3a & 5a)
$ bash run.sh -n speedlogic --runs 2 -a cca -r 10
# enter/exit analysis, 10000 users (Fig. 2b & 3b & 5b)
$ bash run.sh -n speedlogic --runs 2 -a eeta -r 10
# strict algorithm (Fig. 4)
$ bash run.sh -n speedlogic --runs 2 -a cca -s
$ bash run.sh -n speedlogic --runs 2 -a eeta -s
# compare different ε values (ln 3 and ln 49) versus the default (ln 9) (Fig. 6)
$ bash run.sh -n speedlogic --runs 2 -a cca -e 3
$ bash run.sh -n speedlogic --runs 2 -a cca -e 49
$ bash run.sh -n speedlogic --runs 2 -a eeta -e 3
$ bash run.sh -n speedlogic --runs 2 -a eeta -e 49
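To generate data for all of the suggested fast apps in one go, you can loop over them with the same documented options; this is just a convenience sketch:
# 2 runs of both analyses for each suggested fast app
$ for app in speedlogic loctracker parking drumpads equibase moonphases; do
>   bash run.sh -n "$app" --runs 2 -a cca
>   bash run.sh -n "$app" --runs 2 -a eeta
> done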
Running the experiments above saves the results into the directory `results/`. The plotting script reads and plots the data in this folder. If there is not enough data for 30 runs, the confidence intervals will be missing from the figures.
Execute the following command and the figures will be saved automatically in PDF format:
$ python3 plot.py
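To confirm the figures were written to the current directory (`fig2.pdf` is one confirmed name, per the copy example below; we assume the other files are named similarly after the paper's figure numbers):
$ ls fig*.pdf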
To view the generated figures, you can copy them from the container to the host machine using the following commands. Note that these commands are run on the host machine.
# get the container id
$ sudo docker ps
# for example, copy fig2.pdf to the current directory
$ sudo docker cp CONTAINER_ID:/root/artifact/code/fig2.pdf .
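If you want to copy several figures at once, a host-side loop works; the file names other than `fig2.pdf` are an assumption here, so adjust the list to what `plot.py` actually produced:
$ for f in fig2 fig3 fig4 fig5 fig6; do sudo docker cp CONTAINER_ID:/root/artifact/code/$f.pdf .; done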