Exploring San Francisco Open Data with DSS

Data Visualization | Data Science | Technology | Hanna Julienne

Hi, I am Hanna, a girl data scientist (the rarest unicorn) at Dataiku, and I am going to tell you how much fun I had exploring San Francisco open data with Data Science Studio (DSS) version 2.0!

San Francisco has a very progressive and efficient open data policy. The city provides clean and fascinating datasets on various subjects: SFPD crime incidents, business locations and even film locations. This policy is fruitful and has led to numerous fun and insightful uses of the data.

Among these datasets, the SFPD Crime Incidents dataset is especially fascinating. All incidents that happened from 1/1/2003 up to two weeks ago are reported with the type of incident, the date, the hour, and the latitude and longitude. A real treat for a data scientist!

I retrieved the dataset, and uploading it to DSS was as simple as dragging the file into a drop zone. A small preparation script later, I had parsed the date and extracted date components such as day of the week, month and week of the year, and I was ready to explore the dataset graphically. I had hardly worked!
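
For readers who prefer code to clicks, here is a minimal Python sketch of what such a preparation step does. It is not the DSS preparation script itself: the function name `date_components` is hypothetical, and the date and time formats (`MM/DD/YYYY` and `HH:MM`) are assumptions about how the export is laid out.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def date_components(date_str, time_str, date_format="%m/%d/%Y"):
    """Parse an incident date and time, then derive calendar features.

    The default formats are assumptions, not taken from the dataset.
    """
    stamp = datetime.strptime(f"{date_str} {time_str}", f"{date_format} %H:%M")
    return {
        "day_of_week": DAY_NAMES[stamp.weekday()],  # weekday(): Monday is 0
        "month": stamp.month,
        "week_of_year": stamp.isocalendar()[1],     # ISO 8601 week number
        "hour": stamp.hour,
    }

example = date_components("01/01/2003", "18:30")
print(example)
```

Note that the week number here follows the ISO 8601 convention, which may differ from the convention a given tool uses around the new year.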

Graphical exploration with DSS charts

Before going any further, I had to search for patterns and trends in my dataset. Again, it was really easy with the DSS charts. I just dragged in the features I wanted to explore and made several instructive bar charts in no time.

Notably, I noticed that:

  • Crime incidents are clearly not uniformly distributed across the city's districts.

Crime District

  • It seems that each district has its "specialty". This suggests that the spatial distribution of crimes depends on the crime category.

Crime Dist Cat

  • Also, crimes follow a very strong pattern across the day. There are far fewer crimes early in the morning, which makes sense: everybody is sleeping. More surprisingly, the most dangerous hour is 6 p.m. Who would have guessed?

Crime Hours
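
The aggregations behind bar charts like these are simple group-and-count operations. The sketch below shows the idea in plain Python. The rows are made-up samples for illustration only, not values from the SFPD dataset, and the variable names are my own.

```python
from collections import Counter

# Made-up sample rows, for illustration only: (district, category, hour)
incidents = [
    ("SOUTHERN", "LARCENY/THEFT", 18),
    ("SOUTHERN", "LARCENY/THEFT", 12),
    ("TENDERLOIN", "DRUG/NARCOTIC", 18),
    ("MISSION", "ASSAULT", 2),
    ("TENDERLOIN", "DRUG/NARCOTIC", 15),
]

# Incidents per district (first chart)
by_district = Counter(district for district, _, _ in incidents)

# Incidents per (district, category) pair (second chart)
by_district_category = Counter((district, cat) for district, cat, _ in incidents)

# Incidents per hour of the day (third chart)
by_hour = Counter(hour for _, _, hour in incidents)
busiest_hour, busiest_count = by_hour.most_common(1)[0]

print(by_district)
print(busiest_hour, busiest_count)
```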

OK, I now know for sure that crimes in San Francisco follow strong temporal and spatial patterns. I also know that different types of crime don't all follow the same spatial distribution. They may also not happen at the same time.

What could I do to explore these possibilities further? An interactive dashboard, of course!

Making a beautiful interactive dashboard in DSS

I would like to be able to see the spatial and temporal distribution of crimes in San Francisco by category. In other words, I want to display four dimensions in a two-dimensional layout. I will achieve this by programming an interactive dashboard that includes a map, a bar chart of the number of crimes by hour, and a tool to filter by crime category.

I used the neat DSS web app editor:

Web App

The map

First, I drew the map using a JS code snippet provided in our web app editor. I added two sliders to select the degree of spatial aggregation and the year, and here is my map:

Map Small | Map Big

The redder the area, the higher the number of crimes. I can now explore the spatial distribution of crime in San Francisco.
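
To make the "degree of spatial aggregation" concrete, here is one simple way to do it: snap each incident to a square grid cell and count incidents per cell. This is only a sketch under my own assumptions, not the snippet from the web app editor. The function name `grid_counts`, the cell sizes and the sample coordinates are all hypothetical.

```python
import math
from collections import Counter

def grid_counts(points, cell_size):
    """Count points per square grid cell of `cell_size` degrees.

    Each cell is identified by the integer pair
    (floor(lat / cell_size), floor(lon / cell_size)).
    """
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts

# Made-up coordinates roughly in downtown San Francisco
points = [(37.7812, -122.4123), (37.7818, -122.4127), (37.7891, -122.4051)]

fine = grid_counts(points, 0.01)   # small cells: more detail
coarse = grid_counts(points, 0.1)  # large cells: more aggregation

# Scale counts to the range 0..1 so they can drive a color intensity
peak = max(fine.values())
intensity = {cell: n / peak for cell, n in fine.items()}
print(fine, coarse, intensity)
```

A larger cell size merges neighboring incidents into fewer, heavier cells, which is what moving the aggregation slider does conceptually.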

Crime category filter

Let's add the possibility to filter by crime category. I could have added a plain selector to filter by category. But I am a data scientist, so I told myself, "Hey, wouldn't it be cool if the element used to filter by category also carried information?" Instead of a selector, I used a D3.js icicle to display the proportions of categories and subcategories.

Map Icic1

A few tricks later, clicking on the icicle filters the data by category or subcategory:

Map Icic2
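
An icicle needs hierarchical data: categories at the first level, subcategories below them, each with a size. The Python sketch below shows how flat rows could be nested into a name/value/children tree with proportions, and how a click on a node could translate into a filter. It illustrates the idea only and is not the code of my web app. The function names and the sample rows are invented for the example.

```python
from collections import Counter, defaultdict

def build_hierarchy(rows):
    """Nest (category, subcategory) pairs into a tree with counts and shares."""
    total = len(rows)
    per_category = defaultdict(Counter)
    for category, subcategory in rows:
        per_category[category][subcategory] += 1
    children = []
    for category, subs in per_category.items():
        cat_count = sum(subs.values())
        children.append({
            "name": category,
            "value": cat_count,
            "share": cat_count / total,  # proportion of all incidents
            "children": [
                {"name": sub, "value": n, "share": n / total}
                for sub, n in subs.items()
            ],
        })
    return {"name": "all incidents", "value": total, "children": children}

def filter_rows(rows, category=None, subcategory=None):
    """Keep rows matching the clicked node; None means no constraint."""
    return [
        (c, s) for c, s in rows
        if (category is None or c == category)
        and (subcategory is None or s == subcategory)
    ]

# Made-up sample rows, for illustration only
rows = [
    ("LARCENY/THEFT", "THEFT FROM PERSON"),
    ("LARCENY/THEFT", "THEFT FROM AUTO"),
    ("LARCENY/THEFT", "THEFT FROM PERSON"),
    ("VANDALISM", "MALICIOUS MISCHIEF"),
]
tree = build_hierarchy(rows)
selected = filter_rows(rows, category="LARCENY/THEFT",
                       subcategory="THEFT FROM PERSON")
print(tree["children"][0]["share"], len(selected))
```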

The time distribution

The last element I added is a bar chart of hours. When I change the year or filter by a new category, the bar chart is updated accordingly.
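
The update logic boils down to recomputing 24 hourly counts from whichever incidents pass the current filters. Here is a small Python sketch of that step. The function name `hourly_counts`, the field names `year`, `category` and `hour`, and the sample records are assumptions made for the example, not the dataset's actual schema.

```python
def hourly_counts(incidents, year=None, category=None):
    """Return a list of 24 counts, one per hour, for the filtered incidents.

    A filter set to None is ignored.
    """
    counts = [0] * 24
    for incident in incidents:
        if year is not None and incident["year"] != year:
            continue
        if category is not None and incident["category"] != category:
            continue
        counts[incident["hour"]] += 1
    return counts

# Made-up sample records, for illustration only
incidents = [
    {"year": 2014, "category": "VANDALISM", "hour": 21},
    {"year": 2014, "category": "VANDALISM", "hour": 22},
    {"year": 2014, "category": "LARCENY/THEFT", "hour": 15},
    {"year": 2013, "category": "VANDALISM", "hour": 21},
]

vandalism_2014 = hourly_counts(incidents, year=2014, category="VANDALISM")
everything = hourly_counts(incidents)
print(vandalism_2014)
```

Calling the function again each time a slider or the icicle changes is enough to keep the chart in sync with the map.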

Let's filter by thefts from a person. We see that they happen mostly downtown and during the afternoon:

Thefts From Person

Vandalism, by contrast, occurs during the evening and is more scattered across the city.


Fascinating! I am going to spend time exploring the San Francisco crime dataset to search for other patterns. Jealous? Don't be. If you want to try the app, you can download the free DSS community edition and email me to retrieve my project (hanna.julienne@dataiku.com).

You can even learn how to build the San Francisco crime map in our Web App tutorial in the Learn section.

In a following post, we will try to predict when crime will happen in a specified area of San Francisco with DSS. Stay tuned!
