Productionizing Machine Learning Models – Lessons Learned in the Hadoop Ecosystem and the Way Ahead (EN)

13:45 - 15:15 | tech.stage | Deep Dive

Deploying machine learning models can be challenging, especially in distributed systems. Python is the dominant language among data scientists, which creates friction when integrating with JVM-based tools such as Spark and when managing application dependencies on clusters of heterogeneous machines. Many data scientists developing on such systems struggle with the subtleties of these challenges. This presentation shares lessons learned from working on large-scale Hadoop clusters and examines the most promising approaches to alleviating common issues. In particular, we will discuss our experience with using containerization to tackle the dependency management challenge from a data scientist’s point of view.

Steffen Bunzel, Data Scientist, Alexander Thamm GmbH
Simon Weiß, Data Scientist, Alexander Thamm GmbH