Alessandra Lumini and Loris Nanni* Pages 4007 - 4012 ( 6 )
Background: Anatomical Therapeutic Chemical (ATC) classification of unknown compounds is highly significant for both drug development and basic research. The ATC system is a multi-label classification system proposed by the World Health Organization (WHO), which categorizes drugs into classes according to their therapeutic effects and characteristics. The system comprises five levels, each containing several classes; the first level includes 14 main overlapping classes. Because the ATC classification system simultaneously considers anatomical distribution, therapeutic effects, and chemical characteristics, predicting the ATC classes of an unknown compound is an essential problem: such a prediction could be used to deduce not only a compound’s possible active ingredients but also its therapeutic, pharmacological, and chemical properties. Nevertheless, automatic prediction is very challenging because of the high variability of the samples and the overlap among classes, which requires multiple predictions per compound and makes machine learning extremely difficult.
Methods: In this paper, we propose a multi-label classifier system based on deep learned features to infer the ATC classification. The system is based on a 2D representation of the samples. First, a 1D feature vector is obtained by extracting information about a compound’s chemical-chemical interactions and its structural and fingerprint similarities to other compounds belonging to the different ATC classes. The 1D feature vector is then reshaped into a 2D matrix representation of the compound. Finally, a convolutional neural network (CNN) is trained and used as a feature extractor. Two general-purpose classifiers designed for multi-label classification are trained on the deep learned features, and the resulting scores are fused by the average rule.
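The reshaping and score-fusion steps can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation: the function names `reshape_to_2d` and `fuse_by_average`, the 14 x 3 layout (14 first-level classes times three descriptor types), the zero padding, and the 0.5 decision threshold are all assumptions made for illustration, and the CNN feature extractor and the two multi-label classifiers are omitted.

```python
import numpy as np


def reshape_to_2d(features, n_rows):
    """Reshape a 1D feature vector into a 2D matrix with n_rows rows.

    Assumption: if the length is not a multiple of n_rows, the vector
    is zero-padded at the end. The abstract does not state how the
    reshaping is actually performed.
    """
    features = np.asarray(features, dtype=float)
    n_cols = -(-features.size // n_rows)  # ceiling division
    padded = np.zeros(n_rows * n_cols)
    padded[:features.size] = features
    return padded.reshape(n_rows, n_cols)


def fuse_by_average(scores_a, scores_b, threshold=0.5):
    """Fuse the per-class scores of two classifiers by the average rule.

    Returns the averaged scores and a boolean label vector. The
    threshold is a hypothetical choice, not taken from the paper.
    """
    fused = (np.asarray(scores_a, dtype=float)
             + np.asarray(scores_b, dtype=float)) / 2.0
    return fused, fused >= threshold


# Illustrative input: 42 values arranged as 14 rows by 3 columns.
matrix = reshape_to_2d(np.arange(42), n_rows=14)

# Illustrative scores from two classifiers for three classes.
fused, labels = fuse_by_average([0.9, 0.2, 0.6], [0.7, 0.4, 0.2])
```

Averaging lets a confident score from one classifier be tempered by the other, so a class is assigned only when the combined evidence passes the threshold.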
Results: Experimental evaluation based on rigorous cross-validation demonstrates the superior prediction quality of this method compared to other state-of-the-art approaches developed for this problem.
Conclusion: Extensive experiments demonstrate that the new CNN-based predictor outperforms existing predictors in the literature on almost all of the five metrics used to evaluate multi-label systems, particularly the “absolute true” rate and the “absolute false” rate, the two most significant indexes. MATLAB code will be available at https://github.com/LorisNanni.
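The two indexes named above can be sketched as follows, assuming the definitions commonly used for multi-label predictors: "absolute true" is the fraction of samples whose predicted label set matches the true set exactly, and "absolute false" is the average fraction of labels per sample that are wrong. The abstract does not spell out these formulas, so the definitions and the function names below are assumptions.

```python
import numpy as np


def absolute_true(y_true, y_pred):
    """Fraction of samples whose predicted label set is exactly correct."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))


def absolute_false(y_true, y_pred):
    """Average per-sample fraction of mismatched labels.

    For each sample the mismatch count equals
    |union| - |intersection| of the true and predicted label sets;
    it is divided by the number of classes and averaged over samples.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    mismatches = np.sum(y_true != y_pred, axis=1)
    return float(np.mean(mismatches / y_true.shape[1]))


# Two samples, three classes: the first prediction is exact,
# the second has one wrong label out of three.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 1], [1, 1, 0]]
at = absolute_true(y_true, y_pred)    # 0.5
af = absolute_false(y_true, y_pred)   # (0 + 1/3) / 2
```

Under these definitions a higher "absolute true" rate and a lower "absolute false" rate indicate a better predictor.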
Keywords: Anatomical therapeutic chemical, drug development, convolutional neural network, deep learned features, chemical properties, fingerprint.
DISI, Università di Bologna, Campus di Cesena, Via Macchiavelli, 47521 Cesena; DEI - University of Padova, Via Gradenigo, 6 - 35131 - Padova