Drug Discovery Readiness Audit of EGFR Inhibitors: A Reproducible ChEMBL-to-ADMET Pipeline
We present a fully executable pipeline for assessing the translational viability of bioactive chemical matter from public databases. Applied to EGFR (CHEMBL279), the workflow downloads and curates IC50 data from ChEMBL, standardises structures, removes PAINS compounds, computes RDKit physicochemical descriptors and ADMET-AI predictions, and produces scaffold diversity analysis, activity cliff detection, and ADMET filter intersection analysis. Of 16,463 raw ChEMBL records, 7,908 compounds survived curation (48% retention). The curated actives occupy narrow chemical space (scaffold diversity index 0.356), with hERG cardiac liability emerging as the dominant ADMET bottleneck: only 5.3% of actives are predicted safe, collapsing the all-filter pass rate to 1.2% (95/7,908 compounds). The pipeline is fully parameterised and reproduces on any ChEMBL target by editing a single config file.


