- Code to extract data using ACH and Target curation from ChEMBL as described in the manuscript: 001_Grab_Assays_ChEMBL34.ipynb
- Example code to run GNN regressors on these data splits: 002A_Run_GNNs_script.py
- Example code to run RF regressors on these data splits: 002B_Run_RFs_script.py
- Script to visualize the results: 003_Analysis.ipynb. Data required to run this notebook is available on the ETH Research Collection (https://doi.org/10.3929/ethz-c-000798140)
Note that the example scripts write their results to generic filenames (such as 'results.pkl'); rename these to something descriptive for each run so that later runs do not overwrite earlier ones.
The aggregated results CSVs (as used in 003_Analysis.ipynb) were created by aggregating the statistics that each run writes to stats.pkl.
To reproduce these CSVs, run the models over all the provided datasets and collect the performance metrics from each run into a single table.
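The aggregation step could look roughly like the sketch below. It is not the script used for the manuscript: it assumes that each run lives in its own subdirectory of a results folder, and that every stats.pkl holds a flat dict mapping metric names to numbers. The function name `aggregate_stats`, the `run` column, and the directory layout are all hypothetical, so adapt them to whatever the example scripts actually write.

```python
import csv
import pickle
from pathlib import Path


def aggregate_stats(results_dir, out_csv):
    """Collect every stats.pkl under results_dir into one CSV (one row per run).

    Assumption: each stats.pkl is a flat dict of metric name -> number.
    """
    rows = []
    for pkl_path in sorted(Path(results_dir).rglob("stats.pkl")):
        with open(pkl_path, "rb") as fh:
            stats = pickle.load(fh)
        # Use the parent directory name as a (hypothetical) run identifier.
        rows.append({"run": pkl_path.parent.name, **stats})

    # Build the column list as the union of all keys, in first-seen order,
    # so runs that report different metrics still fit in one table.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing metrics are written as empty cells
    return rows
```

Calling `aggregate_stats("results", "aggregated_results.csv")` would then write one row per run directory; both paths here are placeholders. Only unpickle files you produced yourself, since loading a pickle can execute arbitrary code.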
The datasets used to train and evaluate the models are available on the ETH Research Collection (https://doi.org/10.3929/ethz-c-000798140) as CSVs and PyTorch graphs.
- src: scripts to support running of the models and visualization of the performance results
mamba env create -f environment.yml