We talked to a project team from Georgia Tech with members in the Analytics MS and Computer Science MS programs. Shelly Kunkle at Michelin, Melanie Laffin at Capgemini, Katrina Green at Vrbo.com, and Taylor Gift at AT&T collaborated on a project to better evaluate quality-of-life metrics when planning a move. Instead of looking only at square footage or number of bedrooms, they included cost-of-living variations, census data, and other metrics to help people make the right choice when planning a move. The team earned a perfect score on the project, which they developed using a suite of tools including Azure and Dataiku. Find out more about how they did it in this interview, and see if you qualify for your own free academic license.
Claire: What was the process of finding data to base the project on like for you?
Shelly: It turned out the Zillow data we found came from a Kaggle competition, and it was kind of minimal, containing data from only a few counties in California and with limited features. Finding official real estate data, meaning Multiple Listing Service (MLS) data, ended up being challenging too. Not just anybody can go get that data, and there are multiple databases in the US that you would need to gain access to individually. In our search we found a site that exposed property data for Austin, TX and a small set of San Francisco, CA properties. It is meant to be a platform that lets developers get their feet wet with real property data, so it was perfect for our project.
We found lots of additional data sources but opted for the free data we could find, which is primarily government data. In particular, we used census data that we felt could give some flavor of a neighborhood, and IRS tax return data for income information.
Taylor: There was plenty of easily accessible data, like RD.
Shelly: We wanted to use some other data sources, but they ended up being cost prohibitive. For example, we wanted to ask, "How close are you to a Whole Foods?" That would have required the Google Places API, which charges a fee. Given that this was a school project, we weren't going to spend a lot of money, and it turned out it would have been pretty expensive to populate our entire data set by calling those APIs. That's why we went with the free government data sources, which ended up being pretty valuable.
Melanie: One thing I particularly wanted to look at was proximity to national parks. I called it a Green Score, but there weren't actually any national parks near Austin. There weren't even any state parks, so the score might not have represented the dimension we wanted to show.
Claire: Is there anything in particular that you feel you learned working on this project, or any experience you found particularly valuable?
Melanie: It was like drinking from a fire hose. As far as the project goes, the thing I took away most was the data cleansing and preparation, and putting everything together.
Katrina: I agree with Melanie. Being able to easily clean and prep the data was my favorite part of Dataiku. We learned a lot of tools in the class, so adding another one was crazy, but Dataiku was so easy to learn.
Claire: Is this a project you would consider expanding on in any way, or have you turned it in and are ready to move on to the next thing?
Katrina: I think in a perfect world I would totally go and expand on this, but there are lots of other classes I still have to take.
Melanie: I would want to if we had a budget for the data. That would have been cool, because if I'm moving to a new area, I want to know how close I'd be to a store like Target. We didn't have a budget for that, so it's definitely something I would want to expand on.
Katrina: Being able to use the Google API to pull more information would have been really cool.
Claire: What was the experience like working with Dataiku Academics?
Shelly: I was really appreciative of the team. Getting the free licenses was super nice. It made things a lot easier given the workloads we had going on with this project, other things in the class, personal life, and work. That was just really, really helpful.
Melanie: Dataiku totally exceeded my expectations; it blew them out of the water. I had experience with other cloud platforms like Azure and AWS, so I thought it would just be a clone of those, and it wasn't. I am kind of sold on this product, especially given the ability to easily integrate it into our web app and publish it. I'm quite the fan.