Big data machine learning to explore ocean model output
- Prof. Dr. Arne Biastoch, GEOMAR Helmholtz Centre for Ocean Research Kiel, Ocean Dynamics, email@example.com
- Prof. Dr. Anand Srivastav, University of Kiel, Department of Mathematics, firstname.lastname@example.org
Disciplines: physical oceanography, applied mathematics, computer science
Keywords: physical ocean modelling, machine learning, numerical algorithms
Motivation: High-resolution ocean models produce large amounts of output data, typically tens to hundreds of terabytes. Physical oceanographers analyse this 4D output (three spatial dimensions plus time) with respect to specific hypotheses, leaving the majority of the data unexplored. This PhD project will explore the spatio-temporal output of ocean models with big data techniques. The aim is to identify patterns that point to numerical and computational specifics, and possibly to inconsistencies in the model code. Ultimately, these patterns should lead to new physical understanding that helps domain scientists derive and test new hypotheses.
The challenge is to cope with the limited RAM (main memory) of the computer, where all computations take place. Even today's expensive platforms offer only about 1 TB of RAM, which sets the technical limit and is far less than the model output. Modern methods for computing on big data therefore rely on either external-memory or streaming algorithms. In this project we will develop new machine learning algorithms for both supervised learning (on annotated data) and unsupervised learning, using new algorithmic techniques for big data computations.
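To illustrate the streaming idea, the following minimal sketch computes the mean and variance of a field in a single pass, holding only one chunk in memory at a time. It is an illustration under stated assumptions, not the project's method: the function names `streaming_mean_var` and `fake_time_slices` are hypothetical, and the random arrays stand in for time slices that would in practice be read from model output files.

```python
import numpy as np


def streaming_mean_var(chunks):
    """One-pass mean and population variance over an iterable of arrays.

    Only the current chunk and three running scalars are kept in memory.
    Chunk statistics are merged with the standard pairwise update formula.
    """
    n = 0        # number of values seen so far
    mean = 0.0   # running mean
    m2 = 0.0     # running sum of squared deviations from the mean
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float64).ravel()
        k = chunk.size
        if k == 0:
            continue
        c_mean = chunk.mean()
        c_m2 = ((chunk - c_mean) ** 2).sum()
        delta = c_mean - mean
        total = n + k
        # Merge the chunk statistics into the running statistics.
        mean += delta * k / total
        m2 += c_m2 + delta**2 * n * k / total
        n = total
    variance = m2 / n if n > 0 else float("nan")
    return n, mean, variance


def fake_time_slices(n_slices=50, shape=(20, 30), seed=0):
    """Hypothetical stand-in for reading one 2D time slice at a time."""
    rng = np.random.default_rng(seed)
    for _ in range(n_slices):
        yield rng.normal(loc=10.0, scale=2.0, size=shape)


count, mean, var = streaming_mean_var(fake_time_slices())
print(count, mean, var)
```

Because the generator yields one slice at a time, peak memory depends on the slice size rather than on the length of the time series.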
Aim: In this project we will investigate hypotheses from oceanography. We aim to design tailor-made algorithms in limited-memory models of computation and test them on HPC platforms with the maximum available RAM. Such techniques make sense when the hypothesis leads to a formalizable goal that can be cast as a mathematical search or optimization problem. If the hypothesis is “blind”, for example if we wish to learn unknown patterns, new machine learning methods are required. Here, we wish to develop algorithms for the specific problems based on recent breakthroughs in stochastic gradient descent (Allen-Zhu, 2018).
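As a point of reference for the optimization side, the sketch below shows plain mini-batch stochastic gradient descent on a least-squares problem, with the batches consumed from a stream so that the full data set never resides in memory. This is deliberately the textbook method, not Katyusha X or any algorithm to be developed in the project; the names `sgd_linear_fit` and `synthetic_batches`, the learning rate, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def sgd_linear_fit(batches, n_features, lr=0.05):
    """Fit y ~ X @ w + b by mini-batch SGD on the mean squared error.

    `batches` is any iterable of (X, y) pairs; each batch is used once
    and then discarded, so memory use is bounded by the batch size.
    """
    w = np.zeros(n_features)
    b = 0.0
    for X, y in batches:
        residual = X @ w + b - y
        # Gradient of (1 / 2n) * sum(residual**2) with respect to w and b.
        w -= lr * (X.T @ residual) / len(y)
        b -= lr * residual.mean()
    return w, b


def synthetic_batches(n_batches=2000, batch_size=64, seed=0):
    """Hypothetical data stream with known coefficients, for checking."""
    rng = np.random.default_rng(seed)
    true_w = np.array([2.0, -1.0, 0.5])
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 3))
        y = X @ true_w + 3.0 + 0.01 * rng.normal(size=batch_size)
        yield X, y


w, b = sgd_linear_fit(synthetic_batches(), n_features=3)
print(w, b)
```

The fitted coefficients should approach the values used to generate the synthetic data, which makes the sketch easy to check before replacing the generator with a reader for real data.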
Objectives: (1) Develop algorithms for 4D ocean models and train them on small data samples, (2) improve and optimize memory usage, (3) apply the algorithms to high-resolution ocean model output.
- Allen-Zhu, Z., 2018. Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization. Proceedings of the 35th International Conference on Machine Learning, ICML 2018:179-185
- Biastoch, A., Durgadoo, J.V., Morrison, A.K., van Sebille, E., Weijer, W., Griffies, S.M., 2015. Atlantic Multi-decadal Oscillation covaries with Agulhas leakage. Nat. Commun. 6, 10082. doi:10.1038/ncomms10082
- Wedemeyer, A., Kliemann, L., Srivastav, A., Schielke, C., Reusch, T.B., Rosenstiel, P., 2017. An improved filtering algorithm for big read datasets and its application to single-cell assembly. BMC Bioinformatics 18, 324. doi:10.1186/s12859-017-1724-7