- Data Festival 2019
This talk shows how to design a data science project from scratch, based on a real-world dataset. As a showcase project, we analyze rental prices for apartments in Berlin. The talk will guide you through all the steps of a short-term data science project: motivation, extraction of data from the web, cleaning and engineering of features using external APIs, storytelling, and building machine learning models. We will dive into the pitfalls and design patterns of scraping data from the web. The importance of interactive dashboards should not be underestimated, as they help you find useful insights on your own. We will apply human judgment about an apartment's address to engineer new features with the Google API, and use correlated features to impute the feature of interest. Finally, several machine learning models will be used to explore the ideas of bagging and stacked models.
Jekaterina Kokatjuhha, Data Analyst (Business Excellence), Zalando SE