HPCC Systems | PhD Student | Humboldt-Universität zu Berlin
Fabian studied computer science and worked as a consultant for project management, process optimization, and related software solutions. He is currently writing his PhD thesis on set-similarity joins and search over Big Data. Set similarity is useful for entity linkage, record deduplication, and plagiarism detection. He showed that existing distributed approaches on Hadoop/MapReduce do not scale to large amounts of data. In his thesis, he develops a framework that mitigates this scalability problem by distributing the compute load evenly across the cluster while respecting system constraints such as limited main memory. He demonstrates the practical applicability of the framework by implementing it on HPCC Systems, and he optimizes the local join execution by exploiting multicore parallelism and cache-aware memory access patterns.
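To illustrate the core operation behind this research: a set-similarity join finds all pairs of records whose token sets are similar above a threshold. The following is a minimal single-machine sketch using Jaccard similarity with a naive all-pairs comparison; the record data, the 0.5 threshold, and the function names are illustrative assumptions, not taken from the thesis framework (which distributes and optimizes this computation).

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two token sets."""
    return len(a & b) / len(a | b)

def similarity_join(records, threshold):
    """Return all pairs of record ids whose token sets meet the threshold.

    Naive O(n^2) all-pairs comparison; distributed approaches avoid this
    by filtering and partitioning candidate pairs across the cluster.
    """
    ids = list(records)
    pairs = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if jaccard(records[ids[i]], records[ids[j]]) >= threshold:
                pairs.append((ids[i], ids[j]))
    return pairs

# Illustrative records, e.g. tokenized titles for deduplication.
records = {
    "r1": {"apache", "hadoop", "mapreduce"},
    "r2": {"apache", "hadoop", "spark"},
    "r3": {"entity", "linkage"},
}
print(similarity_join(records, 0.5))  # → [('r1', 'r2')]
```

The quadratic cost of the all-pairs loop is exactly what makes the distributed setting hard: the framework's contribution is to spread this load evenly over cluster nodes within their memory limits.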