With over 100,000 participants, 3,000 technical institutions, and 200 organizations involved, the Smart India Hackathon (SIH) is one of the biggest student software and hardware hackathons in the world. It is a 36-hour-long digital product development competition where students are encouraged to use the latest technology to solve present-day problems, and thus form a culture of innovation and a problem solving mindset. In the 2019 edition, the students also had the opportunity to work on challenges faced within the private sector organizations and create world-class solutions for some of the top companies in the world.
One of the winning teams of the prestigious competition - team “Dropout” of the Mahindra École Centrale, a leading technology school based at Hyderabad, worked on a complex big data problem and came up with an innovative solution for the healthcare and pharmaceutical industry using the Dataiku Community edition. We met with one of the team members, technology student Jayesh Jain (JJ), as well as their professor and project mentor, data scientist Dr. Achal Agrawal (AA), to tell us more about the fascinating initiative.
Q: How does it feel to be the winners of one of the biggest hackathons in the world? Do you feel famous yet? ;-)
JJ: I don’t really feel famous, but it’s a good achievement. We worked on solving a problem that really matters and Dataiku helped us a lot in doing that. We also weren’t the only ones to win the hackathon, there were 400 problem statements, and for each of them there was one winner.
Q: What was your problem statement?
JJ: The problem statement was to analyze the market for a pharmaceutical company and give a strategic recommendation based on the data available as to whether a company should invest in a particular country or not.
Q: What were the biggest challenges or obstacles you encountered in the process? How did you address them?
JJ: The main challenge was getting the data. We weren’t provided with any data, so we had to screen the internet to obtain it. We ran through all the websites like I-NURSE etc. to get all the necessary data to analyze the market of a particular country using different metrics, such as GDP, what is the market of specific types of medicine or what diseases are most present in a given country, how is medicine distributed, etc.
The problem was challenging as we had to scrap and integrate multiple data sources. In addition, there were a lot of missing values, especially for underdeveloped countries which were the main countries of interest for the client. Various data imputation techniques needed to be employed. In the end, we chose 37 features for our analysis based on their pertinence and availability of data.
The next part was applying an AI algorithm to it to predict how the market is going to evolve over the next years, but collecting and ensuring the viability of the data was by far the most difficult part.
Q: How many people were on your team? How did you divide the tasks?
JJ: We were a team of six people along with two mentors. We divided the team in three parts: two people were working on gathering the data, two people were in charge of modeling the data and two people were working on representing the model and the final product. Our mentors were helping us along the way with conceptualizing the data. Thanks to their expertise and specific domain knowledge, which we did not have, they were able to help us figure out whether the different data we collected was actually relevant or not.
Q: Tell us about the solution you came up with and the final product you presented.
JJ: We created a simple UI which gave us a graph showing the history and predicting the future trends in the market for each country. The graph was composed of an “internal index” on the X axis and an “external index” on the Y axis. The internal index represents how strong the pharmaceutical company is in a given country: if it has a high market share, then the internal index for that particular country will be high. The external index shows, based on relevant indicators, how well developed the pharmaceuticals market is in a particular country.
The relevant positioning of the company for a given country on these two axes becomes the basis for strategic recommendation as to whether it should invest in this market or not. If both the internal and external indexes are high, this means the company is well established in that country and should continue doing business there. If, however, the external index is high, but the internal is low, this means the country has a lot of potential and the company should focus on increasing its market share there.
Q: How did you choose to use Dataiku? What competitive advantages did the platform offer and what features were most useful to your project?
AA: I was actually the one who introduced it to my students. I was in France two years ago and I heard about the Dataiku platform from a friend of mine who knew one of the Dataiku founders. Initially I was very skeptical, because at the time there were already multiple other platforms that had been around for longer … So I tried a few different platforms and I found Dataiku to be the most intuitive and user-friendly one. One thing I particularly enjoyed was being able to collaborate with other teams on various projects.
So that’s how I found out about Dataiku. When I realized we’ll have a significant time constraint for the hackathon, I recommended Dataiku to my students, and they were all delighted by it.
JJ: Dataiku allowed us to analyze the correlations between different parameters and decide which ones are the most relevant for our output. So Dataiku helped us a lot in eliminating some of the features. As I mentioned, we originally had 37 features, which is impossible to work with, but we were able to narrow them down to 11 using Principle Component Analysis.
There was another advantage when modeling the data as well. In Dataiku, you have a drag and drop tool which allows you to simply select the model you want to run and instantly apply it to the target you want to use. This lets you experiment with and try a lot of different models without wasting too much time. If we had to code all of this, it would’ve taken much longer. Given our 36-hour time limit, using Dataiku helped us optimize and save time.
Q: You applied AI and data science solution to a healthcare-related problem. Is this something you would like to work on in the future as well?
AA: We’re very interested in applying AI for social good. Technology is currently being applied in a lot of domains, but a lot of them do not respond to social needs. I think in a place like India where we have many social issues, and especially with healthcare, AI could actually have a big positive impact. We’re hoping to work on more socially beneficial AI projects in the future as well.