rinikerlab/ChEMBL_MTL

ChEMBL MTL

Code

  • Code to extract data using ACH and Target curation from ChEMBL as described in the manuscript: 001_Grab_Assays_ChEMBL34.ipynb.
  • Example code to run GNN regressors on these data splits: 002A_Run_GNNs_script.py
  • Example code to run RF regressors on these data splits: 002B_Run_RFs_script.py
  • Notebook to visualize the results: 003_Analysis.ipynb. The data required to run this notebook are available on the ETH Research Collection (https://doi.org/10.3929/ethz-c-000798140)

Note that the example scripts write their results to generic filenames (such as 'results.pkl'); rename these to something descriptive for each run so that later runs do not overwrite earlier ones. The aggregated results CSVs used in 003_Analysis.ipynb were created by aggregating the statistics written to stats.pkl. To reproduce these dataframes, run the models over all the provided datasets and aggregate the performance metrics into CSVs.
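
The aggregation step described above could look roughly like the sketch below. It is not the code used to produce the published CSVs. It assumes that each run's stats.pkl has been kept in its own directory and that it unpickles to a flat dictionary mapping metric names to values; the actual structure may differ. The function name `aggregate_stats`, the `run` column, and the directory layout are hypothetical.

```python
import csv
import pickle
from pathlib import Path


def aggregate_stats(results_root, out_csv, stats_name="stats.pkl"):
    """Collect every stats pickle found under results_root into one CSV.

    Assumption: each pickle holds a flat dict of metric name -> value.
    """
    rows = []
    for stats_path in sorted(Path(results_root).rglob(stats_name)):
        with open(stats_path, "rb") as handle:
            stats = pickle.load(handle)
        # Label each row with the name of the directory holding the pickle
        # (hypothetical layout: one directory per run).
        rows.append({"run": stats_path.parent.name, **stats})

    # Build the header as the union of all keys, in first-seen order,
    # so that runs reporting different metrics still fit in one table.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For example, `aggregate_stats("results", "aggregated.csv")` would write one row per stats.pkl found below a `results` directory. Only unpickle files that you produced yourself, since loading a pickle can execute arbitrary code.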

The datasets used to train and evaluate the models are available on the ETH Research Collection (https://doi.org/10.3929/ethz-c-000798140) as CSVs and PyTorch graphs.

Directories

  • src: scripts to support running of the models and visualization of the performance results

Environment

mamba env create -f environment.yml

About

Evaluation of data curation strategies for bioactivity prediction using ChEMBL data
