Developing drug candidates that combine acceptable biological activity with an appropriate physico-chemical profile is a key challenge, so physico-chemical properties are important parameters for characterizing compounds in the drug discovery process. In the search for new drug candidates, medicinal chemists routinely evaluate data such as biological activities and physico-chemical properties associated with numerous compounds, in order to prioritize the most promising ones for further optimization or study and discard the others. The present workflow helps chemists evaluate whether compounds possess desirable physico-chemical properties such as solubility, pKa, and compliance with the Lipinski criteria.
The goal of this project was to provide essential information to all medicinal chemists in a timely manner. About one hundred scientists, not only medicinal or computational chemists, benefit from these calculated properties at various stages of the discovery process.
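As an illustration of the kind of check the Lipinski criteria imply, the sketch below counts rule-of-five violations from descriptors that are assumed to be already calculated. The `Descriptors` container, the function names, and the tolerance of one violation are illustrative assumptions, not part of the workflow described here.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed descriptors for one compound (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    log_p: float            # octanol-water partition coefficient (log P)
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most `max_violations` violations."""
    return lipinski_violations(d) <= max_violations
```

The descriptor values would in practice come from the property calculation step rather than being typed in by hand.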
The specific requirements of this project included:
To aid chemists in evaluating whether compounds have desirable physico-chemical properties, a KNIME workflow was developed that routinely updates the compounds registered in the proprietary database with the corresponding predicted physico-chemical properties (LogP, LogD, LogS and pKa). A commercial program for property calculation (ACD/Labs Percepta) was coupled with KNIME Analytics Platform and KNIME Server to fully automate this procedure for all new chemical entities registered in the company database. The KNIME workflow, deployed on KNIME Server, is executed automatically at a scheduled time, and the results are stored in the proprietary Chiesi corporate database.
The project started by verifying that ACD/Labs Percepta (batch module) could calculate all the required properties via the command line and that the results were compatible with standard KNIME nodes, specifically SDF Reader and CSV Reader. A set of molecular structures, diverse enough to cover most calculation problems, was then obtained from a public structure database and used to set up the property calculation. Next, the format of the input table coming from the ORACLE™ view was defined (i.e. SDF or SMILES representation of the molecules, identifier, primary keys, other fields), as well as the output format to write to the ORACLE™ tables (table names, field names and types, accessory columns).
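The incremental update described above (calculate properties only for newly registered compounds, then store them) can be sketched outside KNIME as follows. This is a minimal illustration, not the actual workflow: it uses Python's built-in `sqlite3` as a stand-in for the ORACLE™ database, and the table names, column names, and the `predict` callback are all hypothetical.

```python
import sqlite3
from typing import Callable, Dict

# Property columns written per compound (names are assumptions).
PROPERTIES = ("logp", "logd", "logs", "pka")


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the two hypothetical tables used by this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS compounds ("
        "compound_id TEXT PRIMARY KEY, smiles TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predicted_properties ("
        "compound_id TEXT PRIMARY KEY, "
        "logp REAL, logd REAL, logs REAL, pka REAL)"
    )


def find_new_compounds(conn: sqlite3.Connection) -> list:
    """Return (compound_id, smiles) rows that have no stored predictions yet."""
    cursor = conn.execute(
        "SELECT c.compound_id, c.smiles FROM compounds c "
        "LEFT JOIN predicted_properties p ON p.compound_id = c.compound_id "
        "WHERE p.compound_id IS NULL ORDER BY c.compound_id"
    )
    return cursor.fetchall()


def update_new_compounds(
    conn: sqlite3.Connection,
    predict: Callable[[str], Dict[str, float]],
) -> int:
    """Predict and store properties for new compounds; return how many."""
    new_rows = find_new_compounds(conn)
    for compound_id, smiles in new_rows:
        values = predict(smiles)  # stand-in for the external calculation
        conn.execute(
            "INSERT INTO predicted_properties "
            "(compound_id, logp, logd, logs, pka) VALUES (?, ?, ?, ?, ?)",
            (compound_id, *[values[name] for name in PROPERTIES]),
        )
    conn.commit()
    return len(new_rows)
```

Because only compounds without stored predictions are selected, running the update repeatedly on a schedule is harmless: a second run with no new registrations writes nothing.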
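The compatibility check described above amounts to running a command-line tool and confirming that its output file can be parsed. A minimal sketch of that pattern is shown below. The real ACD/Labs Percepta batch command and its arguments are not given in this text, so the command is passed in by the caller, and the function name `run_batch_tool` and the assumption that the tool writes a CSV file with a header row are hypothetical.

```python
import csv
import subprocess
from pathlib import Path
from typing import Dict, List, Sequence


def run_batch_tool(
    command: Sequence[str],
    output_csv: Path,
    timeout_s: float = 3600,
) -> List[Dict[str, str]]:
    """Run an external batch command, then parse the CSV file it produced.

    `command` is the full argument list of the external tool (an assumption
    supplied by the caller); `output_csv` is where the tool is expected to
    write its results.
    """
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(list(command), check=True, timeout=timeout_s)

    if not output_csv.exists():
        raise FileNotFoundError(f"expected output not found: {output_csv}")

    # newline="" lets the csv module handle line endings itself.
    with output_csv.open(newline="") as handle:
        return list(csv.DictReader(handle))
```

Failing loudly on a non-zero exit code or a missing output file matters for an unattended, scheduled run, since otherwise an empty result could silently be written back to the database.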
The construction of the workflow looked like this:
Over more than four years, this project has calculated properties for approximately 50,000 compounds without any problem or manual intervention. The biggest impact of this solution is the time saved by scientists, who no longer need to calculate properties on demand; as a result, customer satisfaction has also increased considerably. The biggest lesson learned: solving a real-life business case using integration and automation increases productivity and improves the user experience.
Before the project began, two key features were required. The first was the ability to interact with the ORACLE™ database (read, write and update permissions). The second was the ability to interact via the command line with third-party software (ACD/Labs Percepta Batch). KNIME Analytics Platform enabled us to do both.
To start with, the free and open source KNIME Analytics Platform played an important role, largely because of the significant cost advantages of open source software and the number of internal advocates who had already been using KNIME. Once acceptance of KNIME Analytics Platform grew, getting a license for KNIME Server was much simpler. Furthermore, adoption of KNIME Server was driven by the possibility of solving other use cases across different departments.