Today, data science is the battlefield of a Big War between Python and R.
As we pointed out in a previous post, “Data Science, Monogamy or Menage à 3”, there are ways to make different languages cohabit in data science.
But cohabitation has its own rules, and even if Dataiku Data Science Studio (DSS) can make it smooth, not everyone is ready for it (yet).
The Long-Awaited Solution to Everything
After one year of intense development, we are proud to open source PolYamoR, the first automated forward and reverse translation system for Python and R. PolYamoR is the first multilingual translation system that offers full transparency, leaves no ambiguity, and handles all of the edge cases of complex programming. PolYamoR can translate plain Python into plain R and vice versa, leading to an unexpected new era of conversations between cultures.
The source code is available on GitHub today: https://github.com/dataiku/PolYamoR
Modern translation systems rely on deep learning in order to achieve their performance. Of course, PolYamoR is no exception.
We trained PolYamoR's recurrent neural network on millions of lines of Python, millions of lines of R, and their respective translations.
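To give an idea of what such paired training data could look like, here is a minimal sketch that turns aligned Python and R lines into token-ID sequences, the usual input format for a sequence-to-sequence network. It is an illustration only, not PolYamoR's actual pipeline: the two example pairs, the tokenizer regex, and the names `PARALLEL_PAIRS`, `tokenize`, `build_vocab`, and `encode` are all assumptions made up for this example.

```python
import re

# Hypothetical parallel corpus: each Python line is paired with an R equivalent.
PARALLEL_PAIRS = [
    ("x = [1, 2, 3]", "x <- c(1, 2, 3)"),
    ("print(len(x))", "print(length(x))"),
]

# Assumed toy tokenizer: R's "<-" operator, then words/numbers, then single symbols.
TOKEN_RE = re.compile(r"<-|\w+|[^\w\s]")


def tokenize(line):
    """Split one line of code into a flat list of string tokens."""
    return TOKEN_RE.findall(line)


def build_vocab(lines):
    """Map every distinct token to an integer ID, after three special tokens."""
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2}
    for line in lines:
        for tok in tokenize(line):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(line, vocab):
    """Convert a line into token IDs wrapped in start and end markers."""
    return [vocab["<sos>"]] + [vocab[t] for t in tokenize(line)] + [vocab["<eos>"]]


# One vocabulary per language, as is common for translation models.
py_vocab = build_vocab(p for p, _ in PARALLEL_PAIRS)
r_vocab = build_vocab(r for _, r in PARALLEL_PAIRS)

# Each training example is a (source IDs, target IDs) pair.
training_pairs = [
    (encode(p, py_vocab), encode(r, r_vocab)) for p, r in PARALLEL_PAIRS
]
```

Swapping the two elements of each pair would give the training set for the reverse direction, R to Python.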
Of course, the very first translation was crude:
AMAZING GPU clusters at work
After thousands of hours of training on a 20-node cluster with dozens of GPUs, PolYamoR managed to produce clean, manageable code (even in R). The code generated by the tool can be very long, though:
Lost in translation?
PolYamoR was originally written in Python, but after a programming error by a team member on a Friday night, the program decided to translate itself into R. After a change of mind, PolYamoR is now half Python, half R, and stable enough for production use.
We are confident that PolYamoR will change the way data science teams collaborate on a day-to-day basis. Maybe some day a lingua franca will emerge across the layers of the system and will unify the work of all data scientists. But in the meantime, get some fresh R if you want, bite the Python if you like, and keep having fun with data science until next year!