Incorporating Twitter (And Other Social) Data into Your Big Data Strategy

Data Visualization | Data Preparation | Data Analysis | Lynn Heidmann

Data from social networks, particularly Twitter, Facebook, LinkedIn, Instagram, Foursquare, and Meetup, is a trove of valuable insight into the mentality, behaviors, and preferences of consumers. And since much social media data is public, a wealth of information sits at the fingertips of any company ready to tap into it.

Gone are the days when manual or small-scale analysis of social media data was enough, so when it comes to big data and social data, companies often don’t know where to start. In general, businesses today use social media data as part of their big data strategy in two ways:

  • Social listening: This is the most common way companies use social data today. It involves analyzing a specific set of data (usually text) from one or more platforms to derive insight and then acting on that insight.
  • Predictive modeling: Good predictive use cases for social media data are rare, but those that exist are powerful and employ advanced machine learning techniques (like deep learning, especially for image recognition on visual networks like Instagram and Pinterest). Whereas social listening aims to make sense of data that already exists, predictive modeling uses social data to predict future unknowns. To date, a wide range of models has been used successfully for predictive analytics with social data, including regression, neural networks, support vector machines (SVMs), decision trees, ARIMA, dynamic systems, Bayesian networks, and combined models. If you’re interested in deeper detail and some specific use cases, this is an excellent read.
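To make social listening more concrete, here is a minimal Python sketch of its simplest form: counting hashtags across posts and scoring each post against a word list. The sample posts and the tiny POSITIVE/NEGATIVE lexicon are made up for illustration; a real project would use a proper sentiment model or a curated lexicon rather than a handful of hand-picked words.

```python
import re
from collections import Counter

# Hypothetical sample posts; a real project would pull these from a social API export.
posts = [
    "Loving the new #ProductX update, setup took two minutes! #happy",
    "#ProductX crashed again during checkout. Terrible support.",
    "Anyone tried #ProductX with the new phone? #tech",
]

# Toy sentiment lexicon, made up for illustration (not a published resource).
POSITIVE = {"loving", "great", "happy", "easy"}
NEGATIVE = {"crashed", "terrible", "broken", "slow"}

HASHTAG = re.compile(r"#(\w+)")
WORD = re.compile(r"[a-z']+")


def hashtag_counts(texts):
    """Count hashtags across all posts, ignoring case."""
    return Counter(tag.lower() for text in texts for tag in HASHTAG.findall(text))


def lexicon_score(text):
    """Return (number of positive words) minus (number of negative words) for one post."""
    words = WORD.findall(text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


print(hashtag_counts(posts).most_common(1))  # [('productx', 3)]
print([lexicon_score(p) for p in posts])     # [2, -2, 0]
```

Word-list scoring like this is crude (it ignores negation and sarcasm, for example), but it shows the basic shape of social listening: turn free text into counts you can track over time.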

But the first step in incorporating social media data into a big data strategy is to have a firm grasp on the end goal.


Social data is abundant and can be insightful, but how can businesses use it effectively?

Ultimately, social media data can’t exist in a vacuum: analyzing it usually isn’t the goal per se, but a step along the way to a larger business goal. Here are a few examples of business goals for which incorporating social media data analysis can be powerful:



Even with an end goal in mind, proceed with caution. Incorporating social media data into a data project seems like a worthwhile endeavor these days, partly for the reasons mentioned above: it’s abundant and (seemingly) free. But successfully incorporating social media into a data project can be more challenging than it appears, because:

  1. Social media data is unstructured (as opposed to traditional structured data; here’s a good explanation of the difference) and can be more difficult to work with, partly because unstructured data sets tend to be very large. Traditional systems, like relational database management systems, were also not built for unstructured data, so it’s harder to analyze without proper tools that help parse, organize, manage, and make sense of it. Cutting-edge companies working with social data (particularly images) are increasingly turning to deep learning to bring meaning to large-scale unlabeled data. As a result, preprocessing is today widely considered the most challenging component of social data analysis from a computational, big data analytics perspective.
  2. Depending on exactly what type of social media data you want to use, the data may be too noisy to be valuable (e.g., posts under a popular hashtag on Twitter) or too sparse to be valuable (e.g., social media mentions of a brand that doesn’t see much social traffic). Both problems limit how sophisticated a predictive model based on social data can be.
  3. Often, analyzing social media data on its own doesn’t turn out to be very useful, so many companies fail to see value from these projects. For example, say your analysis finds a spike in buzz around your product on social media. That’s really only half the story: does the spike also correlate with a spike in sales? How many of the people talking about the product on social media actually made a purchase? The problem is compounded when you have massive amounts of real-time social data but, for example, sales numbers are only available monthly (or even quarterly); the gap in timing and granularity between the explanatory factors and the outcome often makes any relationship difficult to identify and use.
  4. To get the other half of the story, some businesses combine social data with other data to get more value out of the analysis. This is a great idea (when it works, it’s best practice). But when combining social media data with other sources (like transactional data), it’s often difficult to tie a customer’s online identity to their customer ID, or to whatever method your company’s systems use to identify them. If tying these two identities proves impossible, then you can’t build effective predictive models based on social data, because you can never test whether a behavior observed in social data led to a desired behavior.
  5. When consumers post about businesses or services, there’s generally a bias toward extremes: posts are dominated by the very satisfied and the very unsatisfied. Certain populations (such as one gender, specific age ranges, or even specific geographies) may also be overrepresented. Questioning social data and checking it for biases like these before using it as part of a larger strategy is critical to preventing errors due to poor context.
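The preprocessing burden described in the first challenge can be illustrated with a minimal Python sketch that turns one raw post into structured fields: links, mentions, hashtags, and plain word tokens. The regular expressions are simplifying assumptions, not a standard; they would, for instance, misread the domain part of an email address as a mention, which is one reason dedicated social-text tokenizers are usually preferred.

```python
import re
from dataclasses import dataclass

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
TOKEN = re.compile(r"[a-z0-9']+")


@dataclass
class ParsedPost:
    urls: list
    mentions: list
    hashtags: list
    tokens: list


def parse_post(text: str) -> ParsedPost:
    """Split one raw post into structured fields plus plain-text tokens."""
    urls = URL.findall(text)
    no_urls = URL.sub(" ", text)  # remove links first so they don't produce tokens
    mentions = [m.lower() for m in MENTION.findall(no_urls)]
    hashtags = [h.lower() for h in HASHTAG.findall(no_urls)]
    # Strip mentions and hashtags from the body, then tokenize what remains.
    body = HASHTAG.sub(" ", MENTION.sub(" ", no_urls)).lower()
    return ParsedPost(urls, mentions, hashtags, TOKEN.findall(body))


post = parse_post("@AcmeSupport my order never arrived https://t.co/abc #fail #shipping")
print(post.mentions, post.hashtags, post.tokens)
# ['acmesupport'] ['fail', 'shipping'] ['my', 'order', 'never', 'arrived']
```

Once every post is in this shape, it can be loaded into ordinary tables and queried like structured data, which is the point of the preprocessing step.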
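The granularity mismatch in the third challenge can be sketched in Python under a few assumptions: daily mention counts are rolled up to calendar quarters so they line up with quarterly sales figures, and the two series are then compared with a Pearson correlation. All numbers below are hypothetical, and the helper names are made up for this illustration.

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def quarter_key(d: date) -> str:
    """Map a calendar date to a label such as '2016-Q3'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def mentions_per_quarter(daily_counts: dict) -> dict:
    """Roll daily mention counts up to the coarser quarterly grain of the sales data."""
    totals = defaultdict(int)
    for day, count in daily_counts.items():
        totals[quarter_key(day)] += count
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Hypothetical inputs: a few days of mention counts and quarterly sales (in $M).
daily = {date(2016, 1, 5): 120, date(2016, 2, 10): 80, date(2016, 4, 2): 300,
         date(2016, 8, 19): 150, date(2016, 11, 30): 90}
sales = {"2016-Q1": 1.1, "2016-Q2": 1.6, "2016-Q3": 1.0, "2016-Q4": 0.9}

quarterly = mentions_per_quarter(daily)
quarters = sorted(quarterly.keys() & sales.keys())
r = pearson([quarterly[q] for q in quarters], [sales[q] for q in quarters])
print(quarterly)  # {'2016-Q1': 200, '2016-Q2': 300, '2016-Q3': 150, '2016-Q4': 90}
```

However strong the resulting coefficient looks, four quarterly points are far too few to support a conclusion. Aggregating away the detail of real-time social data is exactly what makes this kind of analysis hard to act on.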
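The identity problem in the fourth challenge comes down to having a join key. The Python sketch below assumes a hypothetical mapping from social handles to internal customer IDs (for example, handles that customers chose to share at sign-up). Posts whose authors are not in the mapping simply cannot be linked to transactional data.

```python
def link_posts_to_customers(posts, handle_to_customer):
    """Attach a customer ID to each post whose author handle is in the mapping.

    `handle_to_customer` maps lowercased social handles to internal customer IDs.
    Returns (linked, unlinked); unlinked posts cannot be tied to purchases.
    """
    linked, unlinked = [], []
    for post in posts:
        customer_id = handle_to_customer.get(post["handle"].lower())
        if customer_id is None:
            unlinked.append(post)
        else:
            linked.append({**post, "customer_id": customer_id})
    return linked, unlinked


# Hypothetical mapping and posts, for illustration only.
handle_to_customer = {"jane_doe": "C-001", "bobsmith": "C-042"}
posts = [
    {"handle": "Jane_Doe", "text": "Just ordered the new model!"},
    {"handle": "randomuser99", "text": "Thinking about buying one..."},
    {"handle": "bobsmith", "text": "Mine arrived broken."},
]

linked, unlinked = link_posts_to_customers(posts, handle_to_customer)
match_rate = len(linked) / len(posts)
print(f"{match_rate:.0%} of posts linked to a customer ID")  # 67% of posts linked to a customer ID
```

The match rate is worth tracking from the start. If only a small fraction of posts can be linked, any model relating social behavior to purchases will rest on a thin and possibly unrepresentative slice of customers.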

Get Started

If you’ve decided to make the leap despite the challenges, a data science platform can reduce the time spent on data prep for social data and make the project accessible to someone without coding or machine learning experience. For examples, check out this post on predicting ISIS association based on Tweet content, or this one looking at sentiment toward public transit.

Or watch the video to see how one of our partners, Hewlett Packard Enterprise, along with Luciad, analyzed millions of Tweets using Dataiku and built a visualization of the extracted, classified, and clustered Tweets:

If you’re feeling ready to start incorporating social media into your big data strategy, connect to your favorite social API and try Dataiku Data Science Studio for an easy (and free!) way to get started.


