This is the repeatability evaluation package for the tool paper "SceneChecker: Boosting Scenario Verification using Symmetry Abstractions" by Hussein Sibai, Yangge Li, and Sayan Mitra, submitted to CAV'21.

SceneChecker's website: https://publish.illinois.edu/scenechecker/
This artifact VM on Google Drive: https://drive.google.com/file/d/1FBGRxI1v4GGf-ly-B3Tv2WIRIzBo90FA/view
Source code git repository for SceneChecker: https://gitlab.engr.illinois.edu/sibai2/multi-drone_simulator/-/tree/CAV_artifact
The other tools that we compare against in the paper: https://gitlab.engr.illinois.edu/sibai2/multi-drone_simulator/-/tree/CAV_artifact_othertools

All tools, including SceneChecker, are provided in this VM together with their source code. No tool or library needs to be downloaded before usage, and all tools can be used in this VM without an internet connection.

###########################

This file explains how to replicate the results presented in the paper.

There is a typo in the last row of Table 1: for Q-S5, |S| is 188 instead of 280, while the remaining stats in that row remain unchanged. This does not affect the analysis provided in the paper.

The results of SceneChecker+Flow*, CacheReach+Flow*, and Flow* should be the same as those of the paper since their algorithms and implementations are deterministic. The results of the experiments with tools that involve DryVR (i.e., SceneChecker+DryVR, CacheReach+DryVR, and DryVR) are stochastic and change between runs. The reason is that each time DryVR is called, it randomly samples traces of the system from which it computes the requested reachset. We offer reviewers two options to replicate our results with these tools:

1- A fixed random seed approach: we fix the random seed for DryVR so that repeated experiments lead to the same results. The random seed we fix for the repeatability package results in the same or better statistics (fewer refinements, fewer modes and edges of the abstract automaton, fewer reachsets computed, and shorter computation time) for SceneChecker+DryVR than those reported in the paper. SceneChecker+DryVR still outperforms CacheReach+DryVR and DryVR, satisfying the analyses in the paper. The results expected when using this random seed are shown at the end of this file.

2- An averaging approach: instead of fixing the random seed, the commands we provide for this approach run the experiment of each scenario 10 times and then take the average of their statistics. This approach also leads to values that are close to those reported in the paper and satisfies the same analyses presented there. The downside of this approach is that it takes longer. Sample results from this approach are shown at the end of this file.

In general, the computation time results (reach time (Rt) and total time (Tt)) will differ from those of the paper as they are machine specific. The specifications of the machine used for generating the results in the paper are shown at the end of this file.

###########################

Dependencies (all dependencies for the repeatability package are already installed in the Virtual Machine):
Python 3.7 or newer
numpy
scipy
matplotlib
polytope
dill
ray
PyTorch

To replicate our experiments from the paper, please run the commands below on our example files from the root directory of our local repo. Note that SceneChecker currently visualizes the computed reachsets, and the visualization can be time consuming. You can kill the process after the final results appear in the terminal if the figures are not needed.
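All SceneChecker commands below share the same two setup steps; if convenient, they can be run once per terminal session before any of the experiments, for example (the trailing run is simply the C-S1 fixed-seed experiment from Section 1, used as a quick check):

cd ~/SceneChecker
export PYTHONPATH=$(pwd):$PYTHONPATH
python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s1_dr.json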
1. For the experiment results in Table 1:

SceneChecker+DryVR:
Change directory by running:
cd ~/SceneChecker
Add the folder to the Python path by running:
export PYTHONPATH=$(pwd):$PYTHONPATH
Then run the following commands.

Fixed random seed approach:
C-S1: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s1_dr.json
C-S2: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s2_dr.json
C-S3: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s3_dr.json
C-S4.a: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s4a_dr.json
C-S4.b: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s4b_dr.json
Q-S1: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s1_dr.json
Q-S2: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s2_dr.json
Q-S3: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s3_dr.json
Q-S4.a: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s4_dr.json
Q-S5: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s5_dr.json

Averaging approach:
C-S1: python3.7 src/VerificationMultiple.py examples/scenarios_cav/c_s1_dr.json
C-S2: python3.7 src/VerificationMultiple.py examples/scenarios_cav/c_s2_dr.json
C-S3: python3.7 src/VerificationMultiple.py examples/scenarios_cav/c_s3_dr.json
C-S4.a: python3.7 src/VerificationMultiple.py examples/scenarios_cav/c_s4a_dr.json
C-S4.b: python3.7 src/VerificationMultiple.py examples/scenarios_cav/c_s4b_dr.json
Q-S1: python3.7 src/VerificationMultiple.py examples/scenarios_cav/q_s1_dr.json
Q-S2: python3.7 src/VerificationMultiple.py examples/scenarios_cav/q_s2_dr.json
Q-S3: python3.7 src/VerificationMultiple.py examples/scenarios_cav/q_s3_dr.json
Q-S4.a: python3.7 src/VerificationMultiple.py examples/scenarios_cav/q_s4_dr.json
Q-S5: python3.7 src/VerificationMultiple.py examples/scenarios_cav/q_s5_dr.json

SceneChecker+Flow*:
Change directory by running:
cd ~/SceneChecker
Add the folder to the Python path by running:
export PYTHONPATH=$(pwd):$PYTHONPATH
Then run the following commands.

C-S1: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s1_fs.json
C-S2: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s2_fs.json
C-S3: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s3_fs.json
C-S4.a: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s4a_fs.json
C-S4.b: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s4b_fs.json
Q-S1: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s1_fs.json
Q-S2: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s2_fs.json
Q-S3: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s3_fs.json
Q-S4.a: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s4_fs.json
Q-S5: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s5_fs.json
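If it is more convenient to run all of the Table 1 SceneChecker+DryVR fixed-seed scenarios back to back, a shell loop such as the following sketch can be used. The loop and the output file names are our own convenience and are not part of the tool; since SceneChecker opens reachset figures, each figure window may need to be closed (or the run killed after the statistics appear) before the loop continues:

cd ~/SceneChecker
export PYTHONPATH=$(pwd):$PYTHONPATH
for s in c_s1 c_s2 c_s3 c_s4a c_s4b q_s1 q_s2 q_s3 q_s4 q_s5; do
    python3.7 src/VerificationScenario.py examples/scenarios_cav/${s}_dr.json 2>&1 | tee ${s}_dr_output.txt
done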
CacheReach+DryVR:
Change directory by running:
cd ~/other_tools/CacheReach
Add the folder to the Python path by running:
export PYTHONPATH=$(pwd):$PYTHONPATH
Then run the following commands.

Fixed random seed approach:
C-S1: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s1_dr.json
C-S2: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s2_dr.json
C-S3: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s3_dr.json
C-S4.a: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s4a_dr.json
C-S4.b: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s4b_dr.json
Q-S1: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s1_dr.json
Q-S2: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s2_dr.json

Note that Q-S1 and Q-S2 time out (more than 2 hours on the reference machine); hence we did not try the more complicated scenarios Q-S3, Q-S4.a, and Q-S5. Recall that Q-S1 is a scenario with just six segments, while Q-S3, Q-S4.a, and Q-S5 have 458, 520, and 188 segments, respectively. We will update our description of NA to include timing out.

Averaging approach:
C-S1: python3.7 src/VerificationMultiple.py examples/scenarios_cav/c_s1_dr.json
C-S2: python3.7 src/VerificationMultiple.py examples/scenarios_cav/c_s2_dr.json
C-S3: python3.7 src/VerificationMultiple.py examples/scenarios_cav/c_s3_dr.json
C-S4.a: python3.7 src/VerificationMultiple.py examples/scenarios_cav/c_s4a_dr.json
C-S4.b: python3.7 src/VerificationMultiple.py examples/scenarios_cav/c_s4b_dr.json
Q-S1: python3.7 src/VerificationMultiple.py examples/scenarios_cav/q_s1_dr.json
Q-S2: python3.7 src/VerificationMultiple.py examples/scenarios_cav/q_s2_dr.json

DryVR:
Change directory by running:
cd ~/other_tools/DryVR
Then run the following commands in that folder.

Fixed random seed approach:
C-S1: python3.7 main.py input/scenario_cav/c_s1.json
C-S2: python3.7 main.py input/scenario_cav/c_s2.json
C-S3: python3.7 main.py input/scenario_cav/c_s3.json
C-S4.a: python3.7 main.py input/scenario_cav/c_s4a.json
C-S4.b: python3.7 main.py input/scenario_cav/c_s4b.json
Q-S1: python3.7 main.py input/scenario_cav/q_s1.json
Q-S2: python3.7 main.py input/scenario_cav/q_s2.json
Q-S3: python3.7 main.py input/scenario_cav/q_s3.json
Q-S4.a: python3.7 main.py input/scenario_cav/q_s4.json
Q-S5: python3.7 main.py input/scenario_cav/q_s5.json

Averaging approach:
C-S1: python3.7 main_multiple.py input/scenario_cav/c_s1.json
C-S2: python3.7 main_multiple.py input/scenario_cav/c_s2.json
C-S3: python3.7 main_multiple.py input/scenario_cav/c_s3.json
C-S4.a: python3.7 main_multiple.py input/scenario_cav/c_s4a.json
C-S4.b: python3.7 main_multiple.py input/scenario_cav/c_s4b.json
Q-S1: python3.7 main_multiple.py input/scenario_cav/q_s1.json
Q-S2: python3.7 main_multiple.py input/scenario_cav/q_s2.json
Q-S3: python3.7 main_multiple.py input/scenario_cav/q_s3.json
Q-S4.a: python3.7 main_multiple.py input/scenario_cav/q_s4.json
Q-S5: python3.7 main_multiple.py input/scenario_cav/q_s5.json

CacheReach+Flow*:
Change directory by running:
cd ~/other_tools/CacheReach
Add the folder to the Python path by running:
export PYTHONPATH=$(pwd):$PYTHONPATH
Then run the following commands.

C-S1: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s1_fs.json
C-S2: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s2_fs.json
C-S3: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s3_fs.json
C-S4.a: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s4a_fs.json
C-S4.b: python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s4b_fs.json
Q-S1: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s1_fs.json
Q-S2: python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s2_fs.json

Note that Q-S1 and Q-S2 time out (more than 2 hours on the reference machine); hence we did not try the more complicated scenarios Q-S3, Q-S4.a, and Q-S5. Recall that Q-S1 is a scenario with just six segments, while Q-S3, Q-S4.a, and Q-S5 have 458, 520, and 188 segments, respectively. We will update our description of NA to include timing out.
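If you prefer to enforce the 2-hour budget mentioned above automatically instead of watching the run, the commands can be wrapped with the GNU coreutils timeout utility. This wrapper is our own suggestion (assuming timeout is available on the VM) and is not part of the tools, for example:

timeout 2h python3.7 src/VerificationScenario.py examples/scenarios_cav/q_s1_fs.json
echo $?    # prints 124 if the command was killed after exceeding 2 hours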
Flow*:
Change directory by running:
cd ~/other_tools/Flowstar
Then run the following commands in that folder.

C-S1: ./flowstar < example_cav/c_s1.model
C-S2: ./flowstar < example_cav/c_s2.model
C-S3: ./flowstar < example_cav/c_s3.model
C-S4.a: ./flowstar < example_cav/c_s4a.model
C-S4.b: ./flowstar < example_cav/c_s4b.model
Q-S1: ./flowstar < example_cav/q_s1.model
Q-S2: ./flowstar < example_cav/q_s2.model
Q-S3: ./flowstar < example_cav/q_s3.model
Q-S4.a: ./flowstar < example_cav/q_s4.model
Q-S5: ./flowstar < example_cav/q_s5.model

2. For the experiment results before Table 2:

SceneChecker+DryVR:
Change directory by running:
cd ~/SceneChecker
Add the folder to the Python path by running:
export PYTHONPATH=$(pwd):$PYTHONPATH
Then run the following commands.

QNN-S2: python3.7 src/VerificationScenario.py examples/scenarios_cav/qnn_s2.json
QNN-S3: python3.7 src/VerificationScenario.py examples/scenarios_cav/qnn_s3.json
QNN-S4: python3.7 src/VerificationScenario.py examples/scenarios_cav/qnn_s4.json

DryVR:
Change directory by running:
cd ~/other_tools/DryVR
Then run the following commands in that folder.

QNN-S2: python3.7 main.py input/scenario_cav/qnn_s2.json
QNN-S3: python3.7 main.py input/scenario_cav/qnn_s3.json
QNN-S4: python3.7 main.py input/scenario_cav/qnn_s4.json

3. For the experiment results in Table 2:
Change directory by running:
cd ~/SceneChecker
Add the folder to the Python path by running:
export PYTHONPATH=$(pwd):$PYTHONPATH
Then run the following commands.

Fixed random seed approach:
CNN-S2: python3.7 src/VerificationScenario.py examples/scenarios_cav/cnn_s2TR_dr.json
CNN-S4: python3.7 src/VerificationScenario.py examples/scenarios_cav/cnn_s4TR_dr.json
QNN-S2_TR: python3.7 src/VerificationScenario.py examples/scenarios_cav/qnn_s2TR_dr.json
QNN-S3_TR: python3.7 src/VerificationScenario.py examples/scenarios_cav/qnn_s3TR_dr.json
QNN-S4_TR: python3.7 src/VerificationScenario.py examples/scenarios_cav/qnn_s4TR_dr.json
QNN-S2_T: python3.7 src/VerificationScenario.py examples/scenarios_cav/qnn_s2T_dr.json
QNN-S3_T: python3.7 src/VerificationScenario.py examples/scenarios_cav/qnn_s3T_dr.json
QNN-S4_T: python3.7 src/VerificationScenario.py examples/scenarios_cav/qnn_s4T_dr.json

Averaging approach:
CNN-S2: python3.7 src/VerificationMultiple.py examples/scenarios_cav/cnn_s2TR_dr.json
CNN-S4: python3.7 src/VerificationMultiple.py examples/scenarios_cav/cnn_s4TR_dr.json
QNN-S2_TR: python3.7 src/VerificationMultiple.py examples/scenarios_cav/qnn_s2TR_dr.json
QNN-S3_TR: python3.7 src/VerificationMultiple.py examples/scenarios_cav/qnn_s3TR_dr.json
QNN-S4_TR: python3.7 src/VerificationMultiple.py examples/scenarios_cav/qnn_s4TR_dr.json
QNN-S2_T: python3.7 src/VerificationMultiple.py examples/scenarios_cav/qnn_s2T_dr.json
QNN-S3_T: python3.7 src/VerificationMultiple.py examples/scenarios_cav/qnn_s3T_dr.json
QNN-S4_T: python3.7 src/VerificationMultiple.py examples/scenarios_cav/qnn_s4T_dr.json
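The averaging-approach runs above repeat each scenario 10 times and can take several hours. One option (our suggestion, not part of the package) is to run them unattended and inspect the captured output later, keeping in mind that the reachset figures may still require attention when a run finishes; the output file name here is just an example of ours:

cd ~/SceneChecker
export PYTHONPATH=$(pwd):$PYTHONPATH
nohup python3.7 src/VerificationMultiple.py examples/scenarios_cav/qnn_s3TR_dr.json > qnn_s3TR_avg.txt 2>&1 &
tail -f qnn_s3TR_avg.txt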
4. To replicate the figures in Figure 4:
Change directory by running:
cd ~/SceneChecker
Add the folder to the Python path by running:
export PYTHONPATH=$(pwd):$PYTHONPATH
Then run the following commands:
python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s2_dr.json
python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s3_dr.json
python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s4_dr.json

5. To replicate the figures in Figure 5:
Change directory by running:
cd ~/SceneChecker
Add the folder to the Python path by running:
export PYTHONPATH=$(pwd):$PYTHONPATH
Then run the following commands:
python3.7 src/VerificationScenario.py examples/scenarios_cav/qnn_s3TR_dr.json
python3.7 src/VerificationScenario.py examples/scenarios_cav/qnn_s3T_dr.json

The statistics are displayed in the terminal after each command finishes. The important statistics of SceneChecker are:
Nref: Number of mode-splits performed
|S|: Number of concrete modes
|S_v|^i: Number of modes in the initial abstraction
|E_v|^i: Number of edges in the initial abstraction
|S_v|^f: Number of modes in the final (after refinement) abstraction
|E_v|^f: Number of edges in the final (after refinement) abstraction
Rc: The total number of calls to computeReachset
Rt: The total time spent in reachset computation, in minutes
Tt: The total computation time, in minutes
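Since these statistics are printed at the end of each run, it can help to capture the terminal output to a file and re-read its tail afterwards; the output file name below is just an example of ours, not something the tool requires:

python3.7 src/VerificationScenario.py examples/scenarios_cav/c_s2_dr.json 2>&1 | tee c_s2_dr_stats.txt
tail -n 30 c_s2_dr_stats.txt    # the summary statistics appear at the end of the output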
Our results for the fixed random seed and averaging approaches for Tables 1 and 2 are shown below. Entries marked NA are experiments that are also marked NA in the paper, representing cases where the corresponding tool resulted in an error or timed out (more than 2 hours on the reference machine). Entries marked TO are experiments whose total time exceeds 5 hours and that may not be feasible for the reviewers to repeat.

Fixed random seed approach:

Table 1 (Rt and Tt in minutes):
          SceneChecker+DryVR          CacheReach+DryVR   DryVR
sc        Nref  Rc    Rt    Tt        Rc     Tt          Tt
C-S1      1     4     0.14  0.15      46     1.73        1.28
C-S2      0     1     0.04  0.65      524    19.92       10.57
C-S3      0     1     0.04  4.24      502    19.33       71.41
C-S4.a    2     7     0.26  4.37      404    15.84       94.62
C-S4.b    10    39    1.43  8.69      404    16.06       96.02
Q-S1      1     4     0.04  0.05      NA     NA          0.25
Q-S2      0     1     0.04  0.88      NA     NA          4.97
Q-S3      0     1     0.06  5.9       NA     NA          46.34
Q-S4.a    0     1     0.06  3.17      NA     NA          56.19
Q-S5      0     36    0.85  3.04      NA     NA          8.03

Table 2:
sc         Nref   |S|   |S_v|^i  |E_v|^i  |S_v|^f  |E_v|^f  Rc     Rt    Tt
CNN-S2     6      140   1        1        7        17       19     1.51  3.05
CNN-S4     9      520   1        1        10       28       47     3.77  11.25
QNN-S2_TR  3      140   1        1        4        9        9      0.61  3.55
QNN-S3_TR  5      458   1        1        6        16       15     1.51  12.7
QNN-S4_TR  4      520   1        1        5        13       11     1.11  7.43
QNN-S2_T   0      140   7        19       7        19       8      0.53  1.38
QNN-S3_T   4      458   7        30       11       58       29     2.92  16.88
QNN-S4_T   0      520   7        30       7        30       13     1.32  5.34

Averaging approach:

Table 1 (Rt and Tt in minutes):
          SceneChecker+DryVR          CacheReach+DryVR   DryVR
sc        Nref  Rc    Rt    Tt        Rc     Tt          Tt
C-S1      1     4     0.14  0.15      46     1.73        1.24
C-S2      0     1     0.04  0.65      524    20.02       9.41
C-S3      0     1     0.04  4.21      502    19.73       TO
C-S4.a    2     7     0.26  4.4       404    15.96       TO
C-S4.b    10    39    1.42  8.71      404    15.83       TO
Q-S1      1     4     0.05  0.05      NA     NA          0.24
Q-S2      0     1     0.04  0.86      NA     NA          4.74
Q-S3      0     1     0.06  5.87      NA     NA          TO
Q-S4.a    0     1     0.06  3.19      NA     NA          TO
Q-S5      0     36    0.86  3.05      NA     NA          7.47

Table 2:
sc         Nref   |S|   |S_v|^i  |E_v|^i  |S_v|^f  |E_v|^f  Rc     Rt    Tt
CNN-S2     5.7    140   1        1        6.7      16.3     20.2   1.48  3.47
CNN-S4     10.33  520   1        1        11.33    33.33    47.67  3.56  14.97
QNN-S2_TR  4.63   140   1        1        5.63     13.88    11.88  0.85  3.72
QNN-S3_TR  5.89   458   1        1        6.89     19.11    16.56  1.6   12.18
QNN-S4_TR  4      520   1        1        5        13       11     1.26  8.46
QNN-S2_T   0      140   7        19       7        19       8.4    0.73  1.94
QNN-S3_T   3.67   458   7        30       10.67    56.33    28.5   2.83  18.01
QNN-S4_T   0      520   7        30       7        30       13     1.41  5.75

The computer we used for the testing has the following specifications:
32 GB RAM
Ubuntu 16.04
AMD Ryzen 7 5800X CPU @ 3.80GHz

The main file, which parses the input, computes the abstraction, and implements the verification and refinement algorithms, is VerificationScenario.py. The file Agent.py stores the data structures of the abstraction and calls the reachability engine tools (DryVR or Flow*) to compute the reachsets. The examples folder contains example scenario files (.json files) and model files (dynamics + symmetries). The log files of the runs of SceneChecker will be stored in the SceneChecker/log directory.
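To get a quick look at how a scenario is specified and at the logs produced by a run, something like the following can be used (the chosen scenario file is just one of the examples referenced above):

python3.7 -m json.tool ~/SceneChecker/examples/scenarios_cav/c_s1_dr.json | head -n 40    # pretty-print the scenario configuration
ls ~/SceneChecker/log                                                                     # per-run log files are written here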