Molecule mining
Molecule mining is the application of data mining, the extraction and discovery of patterns, to molecules. Since molecules may be represented by molecular graphs, it is strongly related to graph mining and structured data mining. The main problem is how to represent molecules while discriminating between the data instances. One way to do this is to use chemical similarity metrics, which have a long tradition in the field of cheminformatics.
Typical approaches to calculating chemical similarity use chemical fingerprints, but these lose the underlying information about the molecule's topology. Mining the molecular graphs directly avoids this problem. So does the inverse QSAR problem, which is preferable for vectorial mappings.
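The loss of topological information can be illustrated with a small sketch. It assumes a deliberately crude, hypothetical fingerprint (the set of elements and bonded element pairs in a hydrogen-suppressed graph) and the Tanimoto coefficient as the similarity metric; neither choice comes from the text above, and practical fingerprints are far richer. Under these assumptions, two structural isomers receive identical fingerprints:

```python
def toy_fingerprint(atoms, bonds):
    """Hypothetical fingerprint: elements present plus bonded element pairs."""
    features = set(atoms)
    for a, b in bonds:
        features.add(tuple(sorted((atoms[a], atoms[b]))))
    return features


def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two feature sets."""
    union = fp_a | fp_b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / len(union)


# Hydrogen-suppressed graphs given as (atom labels, bonds as index pairs).
propan_1_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
propan_2_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])
propane = (["C", "C", "C"], [(0, 1), (1, 2)])

# The two isomers differ in topology, yet their toy fingerprints coincide.
isomer_similarity = tanimoto(toy_fingerprint(*propan_1_ol),
                             toy_fingerprint(*propan_2_ol))
alkane_similarity = tanimoto(toy_fingerprint(*propane),
                             toy_fingerprint(*propan_1_ol))
print(isomer_similarity, alkane_similarity)
```

The isomers are indistinguishable here because the fingerprint records which features occur, not how they are connected.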
Coding(Molecule_i, Molecule_{j≠i})
Kernel methods
- Marginalized graph kernel[1]
- Optimal assignment kernel[2][3][4]
- Pharmacophore kernel[5]
- C++ (and R) implementation combining
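As a rough illustration of how a kernel can compare two molecular graphs directly, the sketch below counts label sequences of walks shared by both graphs. This is a simplified fixed-length walk-count kernel chosen for brevity, not the marginalized graph kernel or any other method listed above; the function names and the `max_len` parameter are assumptions made for the example.

```python
from collections import Counter


def walk_label_counts(atoms, bonds, max_len):
    """Count the label sequences of all walks visiting 1..max_len atoms."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    counts = Counter()
    # Each frontier entry is (index of last atom, label sequence so far).
    frontier = [(i, (atoms[i],)) for i in range(len(atoms))]
    for _ in range(max_len):
        for _, labels in frontier:
            counts[labels] += 1
        frontier = [(n, labels + (atoms[n],))
                    for last, labels in frontier
                    for n in neighbors[last]]
    return counts


def walk_kernel(mol_a, mol_b, max_len=3):
    """Dot product of walk-count feature vectors (an explicit feature map)."""
    counts_a = walk_label_counts(*mol_a, max_len)
    counts_b = walk_label_counts(*mol_b, max_len)
    return sum(counts_a[k] * counts_b[k]
               for k in counts_a.keys() & counts_b.keys())


# Hydrogen-suppressed graphs given as (atom labels, bonds as index pairs).
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])

print(walk_kernel(ethanol, dimethyl_ether, max_len=2))
```

Because the value is an inner product of explicit feature vectors, the function is symmetric and positive semidefinite, which is what kernel-based learners require.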
Maximum common graph methods
- MCS-HSCS[9] (Highest Scoring Common Substructure (HSCS) ranking strategy for single MCS)
- Small Molecule Subgraph Detector (SMSD)[10]: a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules, which can be used to measure the similarity or distance between two molecules. MCS is also used to screen drug-like compounds by finding molecules that share a common subgraph (substructure).[11]
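The idea of deriving a similarity from a maximum common subgraph can be sketched with a brute-force search. This is not how SMSD or MCS-HSCS work; it is a minimal backtracking example that assumes atoms match only on element label, looks for a maximum common induced subgraph without requiring connectivity, and uses a Tanimoto-style ratio as the similarity. It is only practical for very small graphs.

```python
def mcs_size(mol_a, mol_b):
    """Number of atoms in a maximum common induced subgraph (brute force)."""
    atoms_a, bonds_a = mol_a
    atoms_b, bonds_b = mol_b
    adj_a = {frozenset(b) for b in bonds_a}
    adj_b = {frozenset(b) for b in bonds_b}
    best = 0

    def extend(i, mapping):
        nonlocal best
        best = max(best, len(mapping))
        # Stop at the end, or when mapping all remaining atoms cannot win.
        if i == len(atoms_a) or len(mapping) + len(atoms_a) - i <= best:
            return
        for j in range(len(atoms_b)):
            if j in mapping.values() or atoms_a[i] != atoms_b[j]:
                continue
            # Bonds to already mapped atoms must agree in both molecules.
            if all((frozenset((i, p)) in adj_a) == (frozenset((j, q)) in adj_b)
                   for p, q in mapping.items()):
                mapping[i] = j
                extend(i + 1, mapping)
                del mapping[i]
        extend(i + 1, mapping)  # also try leaving atom i unmapped

    extend(0, {})
    return best


def mcs_similarity(mol_a, mol_b):
    """Tanimoto-style similarity based on the common subgraph size."""
    common = mcs_size(mol_a, mol_b)
    return common / (len(mol_a[0]) + len(mol_b[0]) - common)


# Hydrogen-suppressed graphs given as (atom labels, bonds as index pairs).
propan_1_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
propan_2_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])

print(mcs_size(propan_1_ol, propan_2_ol), mcs_similarity(propan_1_ol, propan_2_ol))
```

Unlike a presence-only fingerprint, this measure separates the two isomers: they share a three-atom C-C-O substructure but are not identical.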
Coding(Molecule_i)
Molecular query methods
- Warmr[12][13]
- AGM[14][15]
- PolyFARM[16]
- FSG[17][18]
- MolFea[19]
- MoFa/MoSS[20][21][22]
- Gaston[23]
- LAZAR[24]
- ParMol[25] (contains MoFa, FFSM, gSpan, and Gaston)
- optimized gSpan[26][27]
- SMIREP[28]
- DMax[29]
- SAm/AIm/RHC[30]
- AFGen[31]
- gRed[32]
- G-Hash[33]
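Several of the tools above mine substructures that occur frequently across a set of molecules. The following sketch shows the basic support-counting idea under strong simplifications: fragments are restricted to linear atom paths identified only by their element labels, and no candidate pruning is performed. It does not reproduce the algorithm of any listed tool, and the names `path_fragments`, `frequent_fragments`, `min_support`, and `max_atoms` are assumptions made for the example.

```python
from collections import Counter


def path_fragments(atoms, bonds, max_atoms):
    """Label sequences of all simple paths with up to max_atoms atoms."""
    neighbors = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    found = set()

    def grow(path):
        labels = tuple(atoms[i] for i in path)
        # A path and its reverse are the same fragment; keep one canonical form.
        found.add(min(labels, labels[::-1]))
        if len(path) < max_atoms:
            for n in neighbors[path[-1]]:
                if n not in path:
                    grow(path + [n])

    for start in range(len(atoms)):
        grow([start])
    return found


def frequent_fragments(molecules, min_support, max_atoms=3):
    """Fragments contained in at least min_support molecules."""
    support = Counter()
    for atoms, bonds in molecules:
        # Each molecule contributes at most once to a fragment's support.
        support.update(path_fragments(atoms, bonds, max_atoms))
    return {frag: n for frag, n in support.items() if n >= min_support}


# Hydrogen-suppressed graphs given as (atom labels, bonds as index pairs).
dataset = [
    (["C", "C", "O"], [(0, 1), (1, 2)]),               # ethanol
    (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)]),  # propan-1-ol
    (["C", "C", "C"], [(0, 1), (1, 2)]),               # propane
]

print(frequent_fragments(dataset, min_support=2))
```

Raising `min_support` to 3 keeps only the fragments present in every molecule of this small dataset.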
Methods based on special architectures of neural networks
References
[edit | edit source]- ^ a b H. Kashima, K. Tsuda, A. Inokuchi, Marginalized Kernels Between Labeled Graphs, The 20th International Conference on Machine Learning (ICML2003), 2003. PDF
- ^ H. Fröhlich, J. K. Wegner, A. Zell, Optimal Assignment Kernels For Attributed Molecular Graphs, The 22nd International Conference on Machine Learning (ICML 2005), Omnipress, Madison, WI, USA, 2005, 225-232. PDF
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ H. Fröhlich, J. K. Wegner, A. Zell, Assignment Kernels For Chemical Compounds, International Joint Conference on Neural Networks 2005 (IJCNN'05), 2005, 913-918. CiteSeer
- ^ a b Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ L. Dehaspe, H. Toivonen, King, Finding frequent substructures in chemical compounds, 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press., 1998, 30-36.
- ^ A. Inokuchi, T. Washio, T. Okada, H. Motoda, Applying the Apriori-based Graph Mining Method to Mutagenesis Data Analysis, Journal of Computer Aided Chemistry, 2001;, 2, 87-92.
- ^ A. Inokuchi, T. Washio, K. Nishimura, H. Motoda, A Fast Algorithm for Mining Frequent Connected Subgraphs, IBM Research, Tokyo Research Laboratory, 2002.
- ^ A. Clare, R. D. King, Data mining the yeast genome in a lazy functional language, Practical Aspects of Declarative Languages (PADL2003), 2003.
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ T. Meinl, C. Borgelt, M. R. Berthold, Discriminative Closed Fragment Mining and Perfect Extensions in MoFa, Proceedings of the Second Starting AI Researchers Symposium (STAIRS 2004), 2004.
- ^ T. Meinl, C. Borgelt, M. R. Berthold, M. Philippsen, Mining Fragments with Fuzzy Chains in Molecular Databases, Second International Workshop on Mining Graphs, Trees and Sequences (MGTS2004), 2004.
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ S. Nijssen, J. N. Kok. Frequent Graph Mining and its Application to Molecular Databases, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
- ^ C. Helma, Predictive Toxicology, CRC Press, 2005.
- ^ M. Wörlein, Extension and parallelization of a graph-mining-algorithm, Friedrich-Alexander-Universität, 2006. PDF
- ^ K. Jahn, S. Kramer, Optimizing gSpan for Molecular Datasets, Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS-2005), 2005.
- ^ X. Yan, J. Han, gSpan: Graph-Based Substructure Pattern Mining, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), IEEE Computer Society, 2002, 721-724.
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ A. Gago Alonso, J.E. Medina Pagola, J.A. Carrasco-Ochoa and J.F. Martínez-Trinidad Mining Connected Subgraph Mining Reducing the Number of Candidates, Proc. of ECML--PKDD, pp. 365–376, 2008.
- ^ Xiaohong Wang, Jun Huan, Aaron Smalter, Gerald Lushington, Application of Kernel Functions for Accurate Similarity Search in Large Chemical Databases , BMC Bioinformatics Vol. 11 (Suppl 3):S8 2010.
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
- ^ Lua error in Module:Citation/CS1/Configuration at line 2172: attempt to index field '?' (a nil value).
Further reading
- B. Schölkopf, K. Tsuda, J. P. Vert, Kernel Methods in Computational Biology, MIT Press, Cambridge, MA, 2004.
- R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification, John Wiley & Sons, 2001.
- D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
- R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, 2000.
External links
- Small Molecule Subgraph Detector (SMSD): a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules.
- 5th International Workshop on Mining and Learning with Graphs, 2007
- Overview for 2006
- Molecule mining (basic chemical expert systems)
- ParMol and master thesis documentation - Java - Open source - Distributed mining - Benchmark algorithm library
- TU München - Kramer group
- Molecule mining (advanced chemical expert systems)
- DMax Chemistry Assistant - commercial software
- AFGen - Software for generating fragment-based descriptors