A Graph-Based Kernel Method for Scientific Machine Learning

Yu-Hang Tang, LBNL
2/5/2020, 4:10PM-5PM, at https://berkeley.zoom.us/j/186935273

Machine learning and artificial intelligence are key to transforming scientific research at the Department of Energy. However, success stories so far have concentrated on a few select forms of data. In this talk, I will present our recent work on revitalizing the marginalized graph kernel to enable machine learning directly on graph datasets, bypassing any explicit feature-vector representation. The marginalized graph kernel provides a generic framework that can be customized for learning on diverse types of graphs. I will then introduce GraphDot, a Python package that implements the marginalized graph kernel on general-purpose GPUs. GraphDot significantly lowers the barrier for machine learning scientists to adopt the marginalized graph kernel, and it interoperates seamlessly with other graph and machine learning packages in Python, such as NetworkX and scikit-learn. The package also delivers speedups of thousands of times over existing CPU-only packages, thanks to a set of state-of-the-art algorithms that exploit the sparsity and the generalized Kronecker product structure in the linear-algebra form of the graph kernel. Finally, as a demonstration, I will show how GraphDot enables an active learning protocol on a molecular database to train, within a matter of minutes, models that accurately predict the energy of molecules.
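
For readers unfamiliar with the linear-algebra form mentioned above, the following is a minimal NumPy sketch of a marginalized graph kernel written as a linear solve on the Kronecker product of two graphs. It is not GraphDot's API or implementation. The function name `marginalized_kernel`, the uniform starting probability, the constant stopping probability `q`, and the label-matching node kernel with mismatch value `nu` are all assumptions chosen for illustration. Edge labels are omitted, and the product system is built and solved densely, so it shows only where the Kronecker structure comes from, not how to exploit it.

```python
import numpy as np

def marginalized_kernel(A1, labels1, A2, labels2, q=0.1, nu=0.5):
    """Illustrative marginalized graph kernel between two node-labeled graphs.

    A1, A2: symmetric adjacency matrices with no isolated nodes.
    q:  probability that a random walk stops at each step (assumed constant).
    nu: node base-kernel value for mismatched labels (1.0 for a match).
    """
    n1, n2 = len(A1), len(A2)
    # Random-walk transition probabilities, scaled by the chance of continuing.
    T1 = (1.0 - q) * A1 / A1.sum(axis=1, keepdims=True)
    T2 = (1.0 - q) * A2 / A2.sum(axis=1, keepdims=True)
    # Transitions of simultaneous walks on both graphs: a Kronecker product.
    Tx = np.kron(T1, T2)
    # Node base kernel on every node pair (i, i'), flattened as i * n2 + i'.
    v = np.where(labels1[:, None] == labels2[None, :], 1.0, nu).ravel()
    px = np.full(n1 * n2, 1.0 / (n1 * n2))  # uniform start on node pairs
    qx = np.full(n1 * n2, q * q)            # both walks stop together
    # Solve r = qx + Tx @ diag(v) @ r, which sums over walks of all lengths.
    r = np.linalg.solve(np.eye(n1 * n2) - Tx * v[None, :], qx)
    return float((px * v) @ r)

# Two toy graphs: a 3-node path and a triangle (labels are made up).
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
path_labels = np.array(["C", "O", "C"])
tri_labels = np.array(["C", "C", "O"])

k12 = marginalized_kernel(path, path_labels, tri, tri_labels)
k11 = marginalized_kernel(path, path_labels, path, path_labels)
k22 = marginalized_kernel(tri, tri_labels, tri, tri_labels)
similarity = k12 / np.sqrt(k11 * k22)  # normalized similarity
print(similarity)
```

The dense system has one unknown per pair of nodes, so its cost grows quickly with graph size. That growth is why algorithms that exploit the sparsity and the Kronecker structure matter in practice.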