# (Experimental) Testing with a Large Number of Repositories

This section explains how to run Kani on a large number of crates downloaded from git forges.
You may want to do this if you are testing Kani's ability to handle Rust features found in projects out in the wild.

The first half explains how to use data from crates.io to pick targets.
The second half explains how to use a script to run Kani on a list of selected repositories.

## Picking Repositories

When picking repositories, you may want to select by metrics like popularity or by the presence of certain features.
In this section, we will explain how to select the most-downloaded crates.
We will use the `db-dump` method of getting data from crates.io, since it places no load on their website and gives us SQL access to the data.

To start, have the following programs set up on your computer:
- docker
- docker-compose

1. Start PostgreSQL.

   Paste the following into a file named `docker-compose.yaml`. The `version: '3.3'` line may need to change depending on your docker-compose version.

   ```yaml
   version: '3.3'
   services:
     db:
       image: postgres:latest
       restart: always
       environment:
         - POSTGRES_USER=postgres
         - POSTGRES_PASSWORD=postgres
       volumes:
         - crates-data:/var/lib/postgresql/data
       logging:
         driver: "json-file"
         options:
           max-size: "50m"
   volumes:
     crates-data:
       driver: local
   ```

   Then, run the following to start the database:

   ```bash
   docker-compose up -d
   ```

   Once it is up, run `docker ps` to find the container's name. We will refer to that name as `$CONTAINER_NAME` from now on.

2. Download the actual data from crates.io.

   First, run the following command to get a shell in the container:

   ```bash
   docker exec -it --user postgres $CONTAINER_NAME bash
   ```

   Now, run the following to download the dump and import it into the database. Please note that this may take a while.

   ```bash
   wget https://static.crates.io/db-dump.tar.gz
   tar -xf db-dump.tar.gz
   psql postgres -f */schema.sql
   psql postgres -f */import.sql
   ```

3. Extract the data.

   In the same docker shell, run the following to extract the top 1,000 crates by download count. Other SQL queries may be used if you want different criteria.

   ```sql
   \copy (SELECT name, repository, downloads FROM crates WHERE repository LIKE 'http%' ORDER BY downloads DESC LIMIT 1000) to 'top-1k.csv' csv header;
   ```

4. Clean the data.

   The query above will capture duplicates, as well as paths that point deeper than the repository root. You can clean these out:
   - Extract the URLs from the CSV (dropping the header line): `cat top-1k.csv | awk -F ',' '{ print $2 }' | grep 'http'`
   - Strip paths below the repository root: `sed 's/tree\/master.*$//g'`
   - Once processed, deduplicate with `sort | uniq`

## Running the List of Repositories

In this step we will download and run Kani on the list of repositories using the script [assess-scan-on-repos.sh](../../scripts/exps/assess-scan-on-repos.sh).

Make sure Kani is ready to run. For that, see the [build instructions](cheat-sheets.md#build).

From the repository root, you can run the script with `./scripts/exps/assess-scan-on-repos.sh $URL_LIST_FILE`, where `$URL_LIST_FILE` points to a line-delimited list of URLs you want to run Kani on.

Repositories that produce warnings or errors can be found by grepping the output for "STDERR Warnings" and "Error exit in", respectively.
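
As a rough sketch of that last step, you might capture the script's output to a log file and grep it afterwards. The file names below (`urls.txt`, `scan.log`) are placeholders, and this assumes the script reports its progress on standard output:

```bash
# Run the scan over the URL list, keeping a copy of the output for later inspection.
./scripts/exps/assess-scan-on-repos.sh urls.txt | tee scan.log

# Repositories whose builds emitted warnings on stderr.
grep "STDERR Warnings" scan.log

# Repositories where the scan exited with an error.
grep "Error exit in" scan.log
```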