How to Activate Your Academic Project with Data

Use Cases & Projects Claire Carroll

We talked to a project team from Georgia Tech with members in the Analytics MS and the Computer Science MS. Shelly Kunkle at Michelin, Melanie Laffin at Capgemini, Katrina Green at Vrbo.com, and Taylor Gift at AT&T collaborated on a project to better evaluate quality of life metrics when planning a move. Instead of just looking at square footage or number of bedrooms, they included cost of living variations, census data, and other metrics to better predict how to make the right choice when planning a move. The team got a perfect score on their project, developed using a suite of tools including Azure and Dataiku. Find out more about how their team did it in this interview and see if you qualify for your own free academic license.


Claire: What was the process of finding data to base the project on like for you?

Shelly: It turned out the Zillow data we found came from a Kaggle competition and was kind of minimal, only containing data from a few counties in California and with limited features. Finding official real estate data, Multiple Listing Service or MLS data, ended up being challenging too. Not just anybody can go get that data, and there are multiple databases in the US that you would need to individually gain access to. In our search process we found a site that exposed property data for Austin, TX and a small set of San Francisco, CA properties. It is meant to be a platform that lets developers get their feet wet with real property data, so that was perfect for our project.

We found lots of additional data sources but opted for the free data that we could find, which is primarily government data. In particular, we used census data that we felt could give some flavor of a neighborhood, and IRS tax return data as well for income information.

Taylor: There was plenty of data, like RD, easily accessible.

Shelly: We wanted to use some other data sources but that ended up being cost prohibitive. For example, we wanted to say "How close are you to a Whole Foods?" That would have required using the Google Places API for a fee, but given that this was a school project, we weren't going to spend a lot of money, and it turned out it was going to be pretty expensive to populate our entire data set by calling those APIs. So that's why we ended up going with the free government data sources, which ended up being pretty valuable.
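The cost concern Shelly describes comes down to simple arithmetic: the number of properties, times the number of lookups per property, times the price per call. The sketch below is a rough illustration only. The function name `estimate_api_cost` and every figure in it are hypothetical placeholders, not the team's numbers and not actual Google Places API pricing.

```python
def estimate_api_cost(num_properties: int,
                      calls_per_property: int,
                      price_per_1000_calls: float) -> float:
    """Estimate the total cost, in dollars, of enriching every property
    in a data set with paid API lookups."""
    total_calls = num_properties * calls_per_property
    return total_calls / 1000 * price_per_1000_calls


# Hypothetical inputs for illustration: 50,000 properties, 3 lookups each
# (for example, nearest grocery store, park, and school), at an assumed
# price of $20 per 1,000 calls.
cost = estimate_api_cost(num_properties=50_000,
                         calls_per_property=3,
                         price_per_1000_calls=20.0)
print(f"Estimated cost: ${cost:,.2f}")
```

Running the same estimate with a provider's real price sheet and the actual row count is a quick way to decide, before writing any integration code, whether a paid source fits a student budget.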

Melanie: I know one thing that I particularly wanted to look at was proximity to national parks. I call it a Green Score, but actually, there weren't any national parks by Austin. There weren't even any state parks, so it might not have actually been representative of the dimension that you wanted to show.

Claire: Is there anything new that you guys feel that you learned working on this project or any experience you thought was particularly valuable?

Shelly: The class has been a crazy learning curve, so for someone like me who has never done JavaScript, that was challenging. So we learned a ton in the class. It's all about big data tools and visualization tools, all the different cloud tools. I don't know how many different languages we used in class, but I would say that, for me, the project specifically was more about having a real use case for clustering and seeing the web app being developed.

Melanie: It was like drinking from a fire hose. As far as the project goes, the thing that I took away from it was the data cleansing and preparation and putting everything together.

Katrina: I agree with Melanie. Being able to easily clean and prep the data was my favorite part about Dataiku. We learned a lot of tools in the class, so adding another one was crazy, but Dataiku was so easy to learn.

Claire: Is this a project you would consider expanding on in any way or did you turn it in and are ready to go onto the next thing?

Katrina: I think in a perfect world I would totally go and expand on this, but there are lots of other classes I still have to take.

Melanie: I would want to if we had a budget for the data. That would have been cool because, if I'm moving to a new area, I want to know the proximity to a store like Target. But we didn't have a budget for that, so that's definitely something that I would have wanted to expand on.

Katrina: Being able to use the Google API to pull more information would have been really cool.

Claire: What was the experience like working with Dataiku Academics?

Shelly: I was really appreciative of the team. To get the free licenses was super nice. It made things a lot easier, and given the workloads we had going on with this, other things in the class, personal life, and work, that was just really, really helpful.

Melanie: Dataiku totally exceeded my expectations; it blew them out of the water. I had experience with other online sites like Azure and AWS, so I thought it would just be like a clone of that, and it wasn't. I am sold on this product, especially with the ability to easily integrate into our web app and publish it. I'm quite the fan.


 
