When the Ball Stops Bouncing: An NLP Analysis of Naomi Osaka's Press Conferences

By Kyle Berry and Frank Silva

In this post, we explore press conference transcript data as an example use case for natural language processing (NLP) in Dataiku. We seek to understand the sentiment and topics of the questions asked by reporters and the answers given by star tennis players.

On May 31, 2021, star tennis player Naomi Osaka withdrew from the French Open after expressing concerns with her mental health. Athletes often withdraw from tournaments for physical injuries, but Osaka’s reasoning to step away was one of the first of its kind. She is a notoriously private person and her introverted, soft-spoken personality is thrust into center stage during press conferences. Osaka skipped her press conference the day before the tournament started and officials were displeased — attend all press conferences or be increasingly fined, they asserted.

Her withdrawal was widely criticized by many who cover the sport, even after she opened up in an Instagram post and detailed her bouts of depression over the past few years. “Tour officials have long believed that news conferences are an important part of promoting the sport and the athletes themselves,” one New York Times article noted. Drawing attention to the sport (positive or negative) is foremost on the minds of reporters. This week, Osaka left a press conference in tears after a question on her complicated relationship with the media. We want to explore why.

So, What Are We Trying to Learn?

Reporters often control the narrative around a sport or a particular athlete, so what impact do they have on the mental health of athletes? And how do press conferences, which probe the minds of athletes in their most glorious or miserable states of being, play a role in all of this? How do the tone and general sentiment of questions asked during press conferences differ between types of athletes?

Project Overview and Methodology

To find answers to these questions, we gathered transcripts from Osaka’s and Novak Djokovic’s press conferences over the last five years and then cleansed and aggregated them. We then used Dataiku’s NLP functionality to explore the questions asked, the answers given, and how the sentiments of these exchanges vary after wins or losses.

The analysis of the unstructured text data was conducted using the sentiment analysis plugin, which provides a sentiment score from 1 to 5 (1 being most negative, 5 being most positive) for each “document,” or row of data. This plugin can be used by companies to review Twitter data and understand consumer sentiment toward brands, for example, or by hotel chains to analyze large volumes of online reviews. Each sentiment score comes with a prediction confidence, so to focus the analysis on the stronger predictions, we removed all records whose confidence score was more than one standard deviation below the mean.
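To make the confidence filter concrete, here is a minimal sketch in plain Python. It is not the plugin's code: the field names ("sentiment", "confidence"), the sample rows, and the use of the population standard deviation are all assumptions for illustration, and the cutoff is read as "mean minus one standard deviation."

```python
from statistics import mean, pstdev


def filter_by_confidence(records, k=1.0):
    """Keep records whose confidence is at least (mean - k * std).

    Assumption: each record is a dict with a "confidence" key, and the
    population standard deviation is used (the post does not say which).
    """
    confidences = [r["confidence"] for r in records]
    threshold = mean(confidences) - k * pstdev(confidences)
    return [r for r in records if r["confidence"] >= threshold]


# Hypothetical scored rows, not taken from the real transcripts.
records = [
    {"text": "How did you feel out there?", "sentiment": 3, "confidence": 0.90},
    {"text": "Talk us through the second set.", "sentiment": 4, "confidence": 0.85},
    {"text": "What happened in the tiebreak?", "sentiment": 2, "confidence": 0.80},
    {"text": "Thoughts on the crowd?", "sentiment": 3, "confidence": 0.30},
]

kept = filter_by_confidence(records)
print(len(kept), "of", len(records), "records kept")
```

In this toy sample, only the row with a confidence of 0.30 falls below the cutoff.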

reporter sentiment in Dataiku

The other major NLP task we performed was topic modeling, which essentially groups words that frequently appear together. Topic modeling in Dataiku can currently be done using a predefined notebook template that leverages Python packages. We chose to create 20 topics, each consisting of its 10 most commonly grouped words, for the questions asked of both Osaka and Djokovic.
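The post does not say which algorithm or packages the notebook template uses, so the following is only a sketch of one common approach: latent Dirichlet allocation (LDA) fitted with collapsed Gibbs sampling, written in plain Python so it is self-contained. The function name, hyperparameters (alpha, beta, n_iter), and sample questions are hypothetical. In practice, a library implementation such as scikit-learn's LatentDirichletAllocation would normally be preferred, along with proper tokenization and stop-word removal.

```python
import random


def lda_top_words(docs, n_topics=20, n_top_words=10, n_iter=200,
                  alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the top words per topic.

    docs is a list of token lists. All hyperparameters are illustrative.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                           # all tokens in topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    topics = []
    for k in range(n_topics):
        ranked = sorted(range(n_words), key=lambda j: topic_word[k][j], reverse=True)
        topics.append([vocab[j] for j in ranked[:n_top_words] if topic_word[k][j] > 0])
    return topics


# Hypothetical questions, lowercased and split on whitespace for brevity.
questions = [
    "how did you feel in the second set".split(),
    "talk about your feeling after the match".split(),
    "what did you think of the crowd today".split(),
    "how tough was your opponent in the final".split(),
]
topics = lda_top_words(questions, n_topics=2, n_top_words=5, n_iter=50)
for number, words in enumerate(topics):
    print(number, words)
```

With n_topics=20 and n_top_words=10, the same function would return the shape of output described above: 20 lists of up to 10 words each.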

Dataiku flow

Press Conference Sentiments: What Did We Find?

Overall, the sentiment of the questions that Djokovic and Osaka received was similar. However, Osaka's responses to those questions were clearly more negative than Djokovic's. The average sentiment score of the questions Osaka was asked is 3.39. The same score for Djokovic is 3.48, so slightly more positive overall, but not by much.

tennis player and average sentiment score in Dataiku

What is really interesting is the difference in average sentiment score between Osaka's answers and Djokovic's. Overall, the average sentiment score of her answers is 3.22, compared to 3.53 for Djokovic. After a win, the average score of her answers is 3.29, but when she loses, that number drops to 3.03. Djokovic's responses, much like Osaka's, also vary greatly depending on the result of the match: after a win his average score is 3.61, and it drops to 3.14 after a loss.
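The averages above come down to grouping scored rows by player and match result. A minimal sketch of that aggregation in plain Python follows. The column names and sample rows are hypothetical and are not the project's data, and in Dataiku this step would more likely be done with a visual grouping recipe.

```python
from collections import defaultdict


def average_by_group(rows, keys, value="sentiment"):
    """Average the value column for each combination of the key columns."""
    sums = defaultdict(lambda: [0.0, 0])  # group -> [running total, row count]
    for row in rows:
        group = tuple(row[k] for k in keys)
        sums[group][0] += row[value]
        sums[group][1] += 1
    return {group: total / count for group, (total, count) in sums.items()}


# Hypothetical scored answers, one row per answer.
answers = [
    {"player": "Osaka", "result": "win", "sentiment": 4},
    {"player": "Osaka", "result": "win", "sentiment": 3},
    {"player": "Osaka", "result": "loss", "sentiment": 2},
    {"player": "Djokovic", "result": "win", "sentiment": 4},
]

by_player_and_result = average_by_group(answers, ["player", "result"])
by_player = average_by_group(answers, ["player"])
print(by_player_and_result)
print(by_player)
```

Grouping on ["player"] alone gives the overall averages, and adding "result" gives the win and loss breakdown.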

Based on this analysis, the questions they receive are fairly similar in terms of sentiment, but the players are clearly affected differently. Osaka uses words with more neutral and negative sentiment than the reporters and Djokovic do, regardless of the outcome, but especially after losses. It is a clear indication of her animus towards press conferences. 

It’s worth noting that even the most advanced NLP functionality still can’t fully grasp the human emotion behind certain words and phrases, or the kind of emotional response that being asked about them might trigger.

Topic Modeling: What Did We Find?

Regarding the topics of the questions, it becomes abundantly clear that Osaka is asked more introspective questions about her feelings and state of mind than Djokovic is. The word “talk” (for example, “talk about” or “talk us through”) appeared nine times for Osaka and only three times for Djokovic. More strikingly, the words “feel” or “feeling” appeared ten times in the topics for questions asked of Osaka and only four times for Djokovic. In fact, the most frequently occurring topic that came out of the questions asked of Osaka was:

'set', 'match', 'feeling', 'break', 'second', 'point', 'doing', 'feel', 'kind', 'felt' 

The topics for the questions asked of Djokovic contained the words “crowd” and “opponent” twice each, but those words appeared zero times in the topics for Osaka. Once again, this highlights how the questions are less introspective for Djokovic.
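Word tallies like these can be reproduced by counting how often each word appears across a player's topic word lists. The sketch below uses the one Osaka topic quoted above. The second topic is made up purely so the counter has more than one list to work with, and the function name is hypothetical.

```python
from collections import Counter


def word_counts_across_topics(topics):
    """Count how many times each word appears across all topic word lists."""
    return Counter(word for topic in topics for word in topic)


osaka_topics = [
    # The topic quoted in the post.
    ["set", "match", "feeling", "break", "second", "point",
     "doing", "feel", "kind", "felt"],
    # A made-up second topic, for illustration only.
    ["talk", "match", "feel", "today", "year"],
]

counts = word_counts_across_topics(osaka_topics)
feel_total = counts["feel"] + counts["feeling"]
print("feel/feeling:", feel_total)
print("talk:", counts["talk"])
print("opponent:", counts["opponent"])  # a Counter returns 0 for absent words
```

Running the same counter over each player's full set of 20 topics would give the per-player tallies to compare.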

The press is clearly much more curious about Osaka's state of mind than about Djokovic's, and about the conversation that ensues from a probing question. This likely fuels Osaka's anxiety about press conferences and adds to her aversion to sharing personal feelings and emotions.

What Are the Takeaways?

Through sentiment analysis and topic modeling, NLP in Dataiku can provide valuable insights into consumer preferences on social media, hotel reviews online, and even press conference transcript data. Although the overall sentiment of the questions asked to Osaka and Djokovic does not vary greatly, how they respond to those questions certainly does. It is largely evident that the topics and what they represent, not necessarily the sentiment of the individual words, incite an emotional and negative response from Osaka.

Osaka has competed in 304 singles matches in her professional career, compared to Djokovic's 1,169. He has far more experience in dealing with reporters and the types of questions asked. Media personnel quickly learned, however, that the most effective way of drawing attention to Osaka is by asking personal questions. Their inclination to generate page views, clicks, and discussion supersedes their moral duty to make her comfortable in press conferences. It's the nature of the business, albeit not a pleasant one.

In a 2016 interview, Osaka was asked what her goals were for the future. The teenager, tennis’s “Newcomer of the Year,” responded, “Oh, to be the No. 1 and to win a lot of Grand Slams and to play Fed Cup and then to play the Olympics and to be happy.”

No. 1? Check. Grand Slams? Check. Fed Cup? Check. Olympics? Check.

To be happy? That’s much more complicated. Here’s to hoping Osaka can check that one off, too.

If you’re interested in learning more about this project and NLP in Dataiku, please reach out to Kyle Berry and Frank Silva.
