Open Source versus Commercial Software for Advanced Analytics

Open source software is everywhere in advanced analytics. For data preparation, visualisation and mathematical modelling, languages such as R, Python and Scala as well as graphical workflow tools like WEKA, RapidMiner and KNIME are widespread. Many data scientists rely on languages such as R and Python because they are free, run on a range of platforms and offer very high functionality. Because the initial hurdles are low, it is not surprising that every data scientist brings their own tool.

However, the use of open source software also raises some issues. For a data lab and the company’s IT, the question arises of how a multitude of new software solutions can be integrated into the company’s software landscape and IT architecture.

Another important consideration when processing large data volumes is performance, which can be limited in open source languages. Performance also plays an important role in the operationalisation of advanced analytics solutions: with open source languages it can be difficult to handle a large number of queries within short periods of time, or to analyse data in real time.

Model deployment also raises the question of how the models that have been developed can be integrated directly into operational applications. These applications are often written in languages such as Java, JavaScript, C or Ruby, so R or Python code usually cannot be embedded directly.

Another operationalisation issue is model management. A finished model that is in operation is subject to checks, new data input and versioning. With open source languages, the functionality to manage such solutions must be built for each specific case.

For many of these issues, solutions are available in the open source field, and many commercial suppliers have also created solutions that address them. The strengths of commercial solutions lie predominantly in the areas of visualisation, operationalisation, collaboration and model management.
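
To make the model management point more concrete, the following is a minimal sketch in Python of the kind of functionality that has to be built by hand when no platform provides it. It is an illustration under stated assumptions, not a recommended implementation: the class ModelRegistry, the model name "churn", the coefficient dictionaries standing in for trained models and the metric values are all hypothetical, and only the Python standard library is used.

```python
import hashlib
import json
import pickle
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class ModelRegistry:
    """Hypothetical file-based registry: one directory per model, one file per version."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _index_path(self, name):
        return self.root / name / "index.json"

    def _load_index(self, name):
        path = self._index_path(name)
        return json.loads(path.read_text()) if path.exists() else []

    def register(self, name, model, metrics):
        """Store a new version of the model and return its version number."""
        index = self._load_index(name)
        version = len(index) + 1
        payload = pickle.dumps(model)
        model_dir = self.root / name
        model_dir.mkdir(exist_ok=True)
        (model_dir / f"v{version}.pkl").write_bytes(payload)
        # Record metadata needed for later checks: checksum, timestamp, metrics.
        index.append({
            "version": version,
            "sha256": hashlib.sha256(payload).hexdigest(),
            "registered_at": datetime.now(timezone.utc).isoformat(),
            "metrics": metrics,
        })
        self._index_path(name).write_text(json.dumps(index, indent=2))
        return version

    def load(self, name, version=None):
        """Return (model, metadata) for a given version, or the latest one."""
        index = self._load_index(name)
        if not index:
            raise KeyError(f"no versions registered for {name!r}")
        if version is not None and not 1 <= version <= len(index):
            raise KeyError(f"unknown version {version} for {name!r}")
        entry = index[-1] if version is None else index[version - 1]
        payload = (self.root / name / f"v{entry['version']}.pkl").read_bytes()
        # Refuse to load a file that no longer matches its recorded checksum.
        if hashlib.sha256(payload).hexdigest() != entry["sha256"]:
            raise ValueError("stored model does not match its recorded checksum")
        return pickle.loads(payload), entry


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        registry = ModelRegistry(tmp)
        # Coefficient dictionaries stand in for trained models; metrics are made up.
        registry.register("churn", {"intercept": 0.5, "slope": 2.0}, {"auc": 0.81})
        registry.register("churn", {"intercept": 0.4, "slope": 2.1}, {"auc": 0.84})
        model, entry = registry.load("churn")
        print(entry["version"], model)
```

Even this small sketch leaves out access control, approval workflows, monitoring of incoming data and rollback, which is the kind of functionality the commercial platforms discussed here tend to provide out of the box.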

Since open source software is widespread amongst data scientists and offers wide-ranging functionality for data processing and modelling, there are now also numerous ways to integrate open source solutions into commercial advanced analytics platforms and thereby address the issues mentioned above. On the user side, the questions mostly concern whether all components of the open source advanced analytics ecosystem (engines, libraries and IDEs) can be integrated via commercial platforms, whether development can take place in a single open source platform and what options exist to increase performance.

More information about commercial software for advanced analytics and integration opportunities for open source software can be found in the BARC Score Advanced Analytics Platforms. BARC supports software selection through its MyScore offering, which evaluates relevant providers against a company’s existing solutions and architecture to ensure an objective selection process.