Differential Privacy for Coverage Analysis of Software Traces

This is the guide for using the self-contained Docker image to build and run the artifact, and to reproduce the experimental results described in the paper. The image is based on Ubuntu with all prerequisites installed.

Instantiate Container

Install docker

Make sure Docker is installed and the Docker daemon is running. The version we used is 20.10.5.

Download the image

You can download the image either by clicking the link or by using the following command:

$ wget https://github.com/presto-osu/ecoop21/releases/download/docker-image/presto_dp_trace_coverage_image.docker

Load the image

$ sudo docker load -i presto_dp_trace_coverage_image.docker

Check that the image is successfully loaded

# 'presto/dp-trace-coverage' should appear in the list
$ sudo docker images

Run the image in a container

$ sudo docker run -it presto/dp-trace-coverage
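
Optionally, you can bind-mount a host directory when starting the container, so that files generated inside it (e.g., the figures produced at the end of this guide) are directly visible on the host without docker cp. This is a standard Docker flag, not something the artifact requires, and the /root/out mount point below is just a hypothetical choice:

# optional alternative: mount a host directory at /root/out inside the container
$ sudo docker run -it -v "$(pwd)/out":/root/out presto/dp-trace-coverage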

Run

After starting the container, you should be logged in as root.

Go to artifact directory

$ cd /root/artifact

Extract the dataset

$ tar -xzvf traces.tar.gz

The dataset is in the directory named traces. For each app evaluated in the paper, there is a sub-directory containing 1000 low-level traces for 1000 users simulated using monkey. Each low-level trace is a sequence of "Enter" and "eXit" events for methods, where each method is denoted by a unique ID. Our experiments are conducted using these low-level traces. Specifically, the traces referred to in the paper, i.e., call chains and enter/exit traces, are extracted from the low-level traces. Besides the low-level traces, there are three other files in each sub-directory: callpairs contains the calling relationships between methods; v contains the number of methods in the app; and list contains the names of the traces.
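
As a quick sanity check of this layout, you can inspect a sub-directory after extraction. The commands below rely only on the structure described above (speedlogic is one of the apps used later in this guide; that list contains one trace name per line is our assumption):

# one sub-directory per app
$ ls traces/
# number of methods in the app
$ cat traces/speedlogic/v
# number of traces; should be 1000, assuming one name per line
$ wc -l < traces/speedlogic/list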

Code structure

The randomization is implemented in Java, and is organized as a Gradle project.

Build

Before running the experiments, make sure to build the project first using the following commands:

$ cd code
$ ./gradlew :app:shadowJar
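
If the build succeeds, the shadow jar should appear under the Gradle build directory. The path below is the typical shadowJar default output location; the exact jar name in this project may differ:

# sanity check: shadowJar typically places the built jar here
$ ls app/build/libs/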

Usage

Instead of running the analyses by invoking the java command directly, we provide a wrapper script run.sh for convenience. The instructions for reproducing the results are based on this script. The flags it accepts are described below.

Flag          Description
-a TYPE       Required. Analysis type: cca for call-chain analysis, eeta for enter/exit-trace analysis.
-n APP_NAME   Required. Name of the app. Must be one of the apps in the directory ../traces/.
-r VALUE      Optional. Number of replications per user; e.g., -r 10 means 10000 users. Default is 1, i.e., 1000 users.
-e VALUE      Optional. Value for the privacy parameter. The natural log of this value is used, i.e., ε=ln(VALUE); e.g., -e 49 sets ε to ln(49). Default is 9.
-s            Optional. Use the strict algorithm. The relaxed algorithm is used by default.
--runs VALUE  Optional. If specified, the experiment is repeated VALUE times. To get confidence intervals, the experiment should be repeated at least 30 times. Default is 1.

The script prints the result, including error-all, error-hot, recall, and precision, to stdout. It also saves the result to a file in results/.

Example: The following command runs the call-chain analysis on drumpads for 1000 users, using the relaxed algorithm and ε=ln(49):

$ bash run.sh -n drumpads -a cca -e 49
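
The flags compose freely. For example, the following (hypothetical) invocation runs the enter/exit-trace analysis on drumpads with 10000 simulated users, the strict algorithm, and 30 repetitions:

$ bash run.sh -n drumpads -a eeta -r 10 -s --runs 30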

Reproducing The Results

Note that to reproduce the complete figures in Section 5 of the paper, covering all 15 apps with confidence intervals (the little caps on the bars in the charts), you need to repeat each experiment for 30 runs, which is expected to take more than a month. It is more practical to pick a few fast apps and use a small number of runs, or just one. In this case, the resulting figures will show blanks for some apps, and the confidence intervals will be missing.

Run the randomization

The following commands generate experimental data for 2 runs of speedlogic. You can replace -n speedlogic with another app name in any of the commands to generate data for that app. Increase --runs 2 if you want more rounds.

We suggest a few fast apps: speedlogic, loctracker, parking, drumpads, equibase, and moonphases.
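
If you want to generate data for several of these apps in one go, a simple shell loop over run.sh works; the sketch below uses the same flags as the first command in the list that follows, and the loop is plain bash, not part of the artifact:

# optional: call-chain analysis, 2 runs, for all suggested fast apps
$ for app in speedlogic loctracker parking drumpads equibase moonphases; do
>   bash run.sh -n "$app" --runs 2 -a cca
> done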

# call chain analysis, 1000 users (Fig. 2a & 3a & 5a)
$ bash run.sh -n speedlogic --runs 2 -a cca

# enter/exit analysis, 1000 users (Fig. 2b & 3b & 5b)
$ bash run.sh -n speedlogic --runs 2 -a eeta

# call chain analysis, 10000 users (Fig. 2a & 3a & 5a)
$ bash run.sh -n speedlogic --runs 2 -a cca -r 10

# enter/exit analysis, 10000 users (Fig. 2b & 3b & 5b)
$ bash run.sh -n speedlogic --runs 2 -a eeta -r 10

# strict algorithm (Fig. 4)
$ bash run.sh -n speedlogic --runs 2 -a cca -s
$ bash run.sh -n speedlogic --runs 2 -a eeta -s

# compare different ε values (ln(3) and ln(49)) versus the default (ln(9)) (Fig. 6)
$ bash run.sh -n speedlogic --runs 2 -a cca -e 3
$ bash run.sh -n speedlogic --runs 2 -a cca -e 49
$ bash run.sh -n speedlogic --runs 2 -a eeta -e 3
$ bash run.sh -n speedlogic --runs 2 -a eeta -e 49

Plotting the results

Running the experiments above saves the results into the directory results/. The plotting script reads and plots the data in this directory. If there is not enough data for 30 runs, the confidence intervals will be missing from the figures.

Execute the following command and the figures will be saved automatically in PDF format:

$ python3 plot.py
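
You can then list the generated PDFs. Only the fig2.pdf name is confirmed by the copy step below; we assume the remaining figures follow the same figN.pdf naming:

# list the generated figures, assuming a figN.pdf naming scheme
$ ls fig*.pdf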

Copy the figures from the container to the host machine

To view the generated figures, you can copy them from the container to the host machine using the following commands. Note that these commands are run on the host machine.

# get the container id
$ sudo docker ps

# for example, copy fig2.pdf to the current directory
$ sudo docker cp CONTAINER_ID:/root/artifact/code/fig2.pdf .
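
docker cp does not expand wildcards, so to fetch several figures either name each file or copy the whole directory. The sketch below assumes the figN.pdf naming for Figures 2-6 (only fig2.pdf is confirmed above):

# copy Figures 2-6 one by one
$ for f in fig2 fig3 fig4 fig5 fig6; do
>   sudo docker cp CONTAINER_ID:/root/artifact/code/$f.pdf .
> done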