Dear Jennifer Morgan,

We want to thank you for letting us take on the challenge of better understanding the climate change landscape in the news. We believe that this data-driven approach can complement Greenpeace’s existing – more qualitative – methods and provide you with a coherent understanding of how the climate change discourse has evolved within the news. After three months of extensive research, we have reached the point where our findings will bring a tangible improvement to Greenpeace’s core mission.

Attached you will find our final report.

Sincerely, your data scientist team



“Global warming is likely to be the greatest threat of the 21st century.”

Climate change refers to long-term shifts in temperatures and weather patterns. These shifts can be natural, for example through variations in the solar cycle. Since the 1800s, however, human activities have been the main driver of climate change, and of global warming in particular. In recent decades we have observed a general increase in the Earth’s average temperature, which disrupts weather patterns and ecosystems. At the current pace of CO2 emissions, scientists expect the average temperature to rise by between 1.5°C and 5.3°C by 2100. If no action is taken, this will have harmful consequences for humanity and the biosphere.

It is critical that these scientific facts be widely disseminated and understood by the general population. This will empower the public to take the necessary steps to curb climate change and reduce their carbon footprint. This topic is hugely relevant today and for future generations, and it is critical that we understand how the media discusses climate change in order to shape our campaigns and future research efforts.


We want to understand how to create a productive awareness campaign about climate change. Using “Quotebank” [1] and Robertson’s [2] website political scores, we quantify the climate change landscape in the media over the last five years. This article will serve as our foundation when forming our future climate change awareness campaigns.

Research Questions

Throughout this article we will answer the following questions:

  • What were the topics and events that triggered conversation about climate change?
  • Who are the main personalities driving the climate change discussion?
  • Which news sites focus most on climate change?
    • What issues do they focus on?
  • How are the issues politicized?
  • Is climate change getting more polarized?

The data

We built our dataset from Quotebank, an open corpus of 178 million quotations attributed to the speakers who uttered them. These quotes were extracted from 162 million English news articles published between 2015 and 2020. Additionally, we filtered the dataset and took only climate change related quotes.

We wanted to have as much data as possible to have an accurate analysis. The final climate change database is composed of:

   Quotes                            Speakers                            Domains  
   260'924                            178'716                             7'782 

Now let’s turn to analyzing the data!

Is conversation about climate change constant over time?

Let’s start broadly by analysing the evolution of climate change quotes in the last 5 years:

Figure 1: Number of quotes about climate change over time.

We see that the volume of climate change discussion has varied over the years, with several peaks that correspond to key global events. We matched the largest peaks to events that took place in the same months:

  • November 2015: “Pope Francis encourages bishops from around the world to sign an appeal to world leaders, meeting in Paris next month, for crucial climate change talks.”
  • June 2017: On 1 June, “United States withdrawal from the Paris Agreement”
  • June 2019: “The House of Representatives of the Netherlands passes the final bill of the climate agreement. The goal of the accord is to have the level of greenhouse gasses in the atmosphere in 2030 the same as the level of greenhouse gasses in the atmosphere in 1990.”
  • September 2019: “Millions of young people take to the streets and numerous businesses worldwide go on strike days before the UN Climate Summit, demanding that further action be taken to confront climate change.”

Main Topics of Climate Change

Technical aside: Thanks to latent Dirichlet allocation (LDA), we are able to identify different topics within the climate change quotes!

If there is one thing Greenpeace knows it’s that the way climate change is invoked can vary widely. Some people may focus on its business implications, others will be more concerned with the environmental consequences. To better grasp this nuance, we run topic modelling to uncover some of the latent concepts that are invoked with climate change. Below are the thirty topics we found, alongside the top 20 words for each topic.

Figure 2: 30 topics and the top 20 words in each topic.

We observe a diversity of subtopics, such as financial, environmental, societal, and political aspects, among many others. These topics uncover the subtext that is present when climate change is invoked in a quote.

By understanding the issues that are intrinsic to climate change, we will be able to see which issues have been raised at which time.

Evolution of topics over time

Figure 3: Monthly occurrence of topics between 2015 and 2020. The database is missing data for a few months in 2016.

Tracking topic interest over time reveals hot topics and how they evolve. For example, the Paris Agreement was a hot topic in November 2015 and in 2017 (when Trump pulled out), but interest in it declined in other months. By contrast, interest in sustainability and finance persists more consistently. Additionally, the Eurozone and Europe were key issues in the climate change discussion at the beginning of 2015.

In general, we find that pressing issues like gases, impact on vulnerability, power, footprint and sustainability attract lasting interest, whereas topics tied to specific events, such as the Paris Agreement, the strikes in 2019, or even Trump, receive less consistent attention.

Are more people talking about climate change?

We fitted a linear trend to the monthly quote counts and observed a minimal increase in quotes per month that was not statistically significant. The nearly flat slope indicates that discussion of climate change did not grow substantially over the years. The spikes in the graph, in turn, reveal the events that trigger conversation about climate change.
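As a rough illustration of such a trend test, the sketch below fits an ordinary least-squares line to monthly counts and computes the t-statistic of the slope. The counts are hypothetical, and the exact test used in our analysis may differ; the function name `linear_trend` is ours.

```python
import math

def linear_trend(counts):
    """Least-squares fit of counts against the month index 0, 1, 2, ...

    Returns (slope, intercept, t_statistic), where the t-statistic
    tests the null hypothesis that the slope is zero.
    """
    n = len(counts)
    if n < 3:
        raise ValueError("need at least three months to estimate a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, counts))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    # Degenerate case: a perfect fit has zero standard error.
    t_stat = slope / std_err if std_err > 0 else math.inf
    return slope, intercept, t_stat

# Hypothetical monthly quote counts (not the real data).
monthly_counts = [4100, 3900, 4300, 4000, 4200, 3950,
                  4150, 4050, 4250, 3980, 4120, 4060]
slope, intercept, t_stat = linear_trend(monthly_counts)
```

With twelve months there are ten degrees of freedom, so the slope would need an absolute t-statistic of about 2.23 to be significant at the 5% level; a slightly positive slope with a t-statistic far below that threshold matches the pattern described above.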

Who talks about climate change?

Over the last five years, here are the people that were most quoted in relation to climate change:

Do the most quoted people talk mostly about climate change?

Figure 4: Most quoted speakers.

It seems not! Surprisingly, on average a third of speakers’ quotes are about climate change.

Are there trends among the most quoted speakers? To answer this question we delve into the backgrounds of the speakers on climate change. Metadata on the speakers was extracted from Wikidata, a large knowledge base containing volunteer-contributed information about entities. We find that climatologists, and scientists more generally, are more likely to be quoted about climate change, whereas athletes and artists are less likely to speak about it.

Figure 5: Top 5 and bottom 5 occupations among speakers, as a proportion of their quotes that revolve around climate change.
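The proportion behind a ranking like this can be computed in a few lines. The sketch below uses made-up numbers and assumes that quotes are pooled per occupation before dividing; averaging per-speaker proportions instead would be an equally plausible reading.

```python
from collections import defaultdict

# Hypothetical records: (occupation, total quotes, climate-related quotes).
speakers = [
    ("climatologist", 120, 90),
    ("climatologist", 80, 50),
    ("politician", 1000, 150),
    ("athlete", 400, 4),
]

def climate_share_by_occupation(records):
    """Share of each occupation's pooled quotes that are about climate change."""
    totals = defaultdict(lambda: [0, 0])  # occupation -> [all quotes, climate quotes]
    for occupation, n_total, n_climate in records:
        totals[occupation][0] += n_total
        totals[occupation][1] += n_climate
    return {
        occupation: climate / total
        for occupation, (total, climate) in totals.items()
        if total > 0
    }

shares = climate_share_by_occupation(speakers)
# Occupations ordered from highest to lowest climate share.
ranked = sorted(shares, key=shares.get, reverse=True)
```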

Who said "embedding"?

Connecting speakers and sites in an al-gore-ithmic embedding.

Understanding climate change related quotes requires uncovering both the speakers and where the quotes appear. We develop a high-fidelity embedding of news sources where similarity is measured by commonality in who each site quotes. Two sites are close if they share many speakers, and far apart if they share few. Simply put, this embedding is created through a latent semantic analysis (PCA on the tf-idf matrix), where the documents are sites and the words are speakers.
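A minimal sketch of this construction is given below, assuming NumPy, a toy site-by-speaker count matrix, a smoothed idf, and a plain truncated SVD. Centering the columns before the SVD would turn it into PCA proper. The site names and counts are hypothetical.

```python
import numpy as np

# Hypothetical quote counts: rows are sites, columns are speakers.
sites = ["site_a", "site_b", "site_c", "site_d"]
counts = np.array([
    [5, 3, 0, 0],
    [4, 2, 0, 1],
    [0, 0, 6, 2],
    [0, 1, 5, 3],
], dtype=float)

def tfidf(counts):
    """tf-idf weighting with sites as documents and speakers as terms."""
    n_docs = counts.shape[0]
    doc_freq = np.count_nonzero(counts, axis=0)
    # Assumption: smoothed idf; other variants are equally valid.
    idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1.0
    weighted = counts * idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    # L2-normalise each site, leaving all-zero rows untouched.
    return weighted / np.where(norms == 0, 1.0, norms)

def embed(matrix, n_dims):
    """Project rows onto the top singular directions (truncated SVD)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :n_dims] * s[:n_dims]

site_vectors = embed(tfidf(counts), n_dims=2)
```

In this toy example `site_a` and `site_b` quote the same speakers, so their vectors end up pointing in nearly the same direction, while `site_c` lands elsewhere in the space.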

The embedding effectively clusters news sites into coherent groups. The clusters reflect both topical similarity (fashion, sports, climate, news, and finance) and geographic proximity, which we capture by running KMeans clustering on the space. Here we include a short video of the embedding; the full embedding can also be explored interactively.


Visualization: Watch this video to get a feel for the space and the embedding.
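For readers curious about the clustering step, here is a compact sketch of KMeans (Lloyd's algorithm) written with NumPy on toy 2-D points. It is an illustration only: the function `kmeans` and the points are hypothetical, and in practice a library implementation would normally be used.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical embedded sites forming two well-separated groups.
toy_vectors = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])
labels, centroids = kmeans(toy_vectors, k=2)
```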

Now Jennifer, at this point you may be wondering, “What the embedding is going on?!” It is better to show than to tell why we used this representation. Thanks to the embedding (without which they would not have been possible), we were able to conduct numerous analyses.

Developing climate topic vectors

After embedding each of the news sites, we turn to the embedding of concepts. A concept embedding is an attempt to vectorize linguistic concepts in the speaker space. The concepts we aimed to embed were climate change itself and the various climate change subtopics found with LDA. Each concept vector is calculated as the weighted average of the vectors of the communities that share quotes related to that concept. Below we include the ten sites that projected highest onto the climate change vector.

domain    Projection score
          0.890519
          0.884039
          0.830271
          0.830028
          0.829195
          0.807636
          0.804560
          0.802666
          0.800073
          0.791573
Table 1: Top ten sites by projection on the climate vector.
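The concept vector and the ranking behind a table like this can be sketched as follows. We assume here that the weights are the number of concept-related quotes per site and that sites are ranked by their projection onto the unit-length concept vector; both choices, the function names, and the numbers are illustrative.

```python
import numpy as np

def concept_vector(site_vectors, weights):
    """Weighted average of site vectors (weights: concept-related quote counts)."""
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * site_vectors).sum(axis=0) / weights.sum()

def rank_by_projection(site_vectors, concept):
    """Site indices ordered by projection onto the unit concept vector."""
    unit_concept = concept / np.linalg.norm(concept)
    scores = site_vectors @ unit_concept
    order = np.argsort(scores)[::-1]  # highest projection first
    return order.tolist(), scores

# Hypothetical unit-length site vectors and climate quote counts.
toy_sites = np.array([
    [1.0, 0.0],
    [0.8, 0.6],
    [0.0, 1.0],
])
climate_quotes = [3, 1, 0]

climate = concept_vector(toy_sites, climate_quotes)
order, scores = rank_by_projection(toy_sites, climate)
```

Sites that contribute many climate quotes pull the concept vector toward themselves, so they tend to appear at the top of the ranking.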

Specialized or generalized! How widely discussed are the topics?

One question we’re often tasked with at Greenpeace is understanding which issues pertaining to climate change need our attention. Do we find a very specialized issue that will win the hearts of a specific audience? Or should we try to be more general to rally larger support? Before we can even do this, however, we are required to understand which issues are generally discussed, and which issues are highly specialized. To do this, we find the average cosine similarity between the center of mass for each topic in our embedding, and the sites which talk about that topic. This idea was taken from the work of Waller and Anderson [4]. A topic is widely shared if it is invoked in a wide range of communities, and it is locally shared if it occurs in a very specific area of the embedding.

Most generalized    Most specialized
society             eurozone
concern             models
reduction           tax/trade
power/footprint     trump
biodiversity        science
Table 2: Most generalist and specialist climate topics
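A sketch of the score behind this table is shown below. It assumes the centre of mass and the average are both weighted by each site's quote count on the topic, which is one plausible reading of the approach; the data and the function name are hypothetical. Under this reading, a lower score means the topic is spread across dissimilar sites (generalized) and a score near 1 means it is concentrated in one area of the embedding (specialized).

```python
import numpy as np

def mean_similarity_to_center(site_vectors, quote_counts):
    """Average cosine similarity between a topic's centre of mass and its sites."""
    site_vectors = np.asarray(site_vectors, dtype=float)
    counts = np.asarray(quote_counts, dtype=float)
    active = counts > 0  # only sites that mention the topic
    unit = site_vectors[active] / np.linalg.norm(
        site_vectors[active], axis=1, keepdims=True
    )
    weights = counts[active] / counts[active].sum()
    center = weights @ unit  # weighted centre of mass
    center = center / np.linalg.norm(center)
    return float(weights @ (unit @ center))

# Hypothetical embedding with three sites.
toy_sites = np.array([
    [1.0, 0.0],
    [0.99, 0.14],
    [0.0, 1.0],
])
specialized = mean_similarity_to_center(toy_sites, [5, 5, 0])  # two similar sites
generalized = mean_similarity_to_center(toy_sites, [5, 0, 5])  # two dissimilar sites
```

The sketch does not guard against a centre of mass of zero length, which can happen if the sites point in exactly opposite directions.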

Projecting concept vectors

Now that we know how generalized and specialized the topics are, we assess which sites care about which issues. When publishing or sharing our developments with specific news sites, we need to develop a persona for the audience of each site and know which climate-related concepts matter to them. By doing this, we can focus our efforts on disseminating the information the reader will find most beneficial. We want to speak the language of the reader, not gibberish that is likely to go over their head.

Since each concept vector is projected in the same space as the actual site, we can calculate the cosine similarity between the two vectors to measure alignment between the news site and the topic. Our findings are in line with what one would intuitively expect. When climate change is invoked, business sites are more likely to speak about trade and taxes, whereas international organizations like the UN focus on development and rising sea levels.
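The alignment computation can be sketched as a site-by-topic cosine similarity matrix. The vectors, site names, and topic labels below are hypothetical stand-ins chosen to mirror the example in the text.

```python
import numpy as np

def cosine_matrix(site_vectors, topic_vectors):
    """Cosine similarity between every site (rows) and every topic (columns)."""
    s = site_vectors / np.linalg.norm(site_vectors, axis=1, keepdims=True)
    t = topic_vectors / np.linalg.norm(topic_vectors, axis=1, keepdims=True)
    return s @ t.T

# Hypothetical sites and topics living in the same 2-D space.
site_names = ["business_site", "intl_org_site"]
topic_names = ["tax/trade", "sea level"]
toy_sites = np.array([[0.9, 0.1], [0.2, 0.8]])
toy_topics = np.array([[1.0, 0.0], [0.1, 1.0]])

alignment = cosine_matrix(toy_sites, toy_topics)
# Best-aligned topic for each site.
top_topic = {
    site: topic_names[i]
    for site, i in zip(site_names, alignment.argmax(axis=1))
}
```

Reading along a row of `alignment` shows which topics a given site leans toward; reading down a column shows which sites are most aligned with a given topic.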

However, one finding that surprised us is the role that Trump played in driving climate change discussion across many news sites. While himself a known climate change denialist, Trump’s outbursts dramatically increased the number of discussions that took place surrounding climate change. We find that HuffPost and Fox News are far more likely to discuss climate change alongside Trump than alongside other pressing climate concerns like the global south and environmental impacts. Further analysis is needed to understand whether this new form of discussion is beneficial, but if the maxim “all publicity is good publicity” holds, then increased exposure to climate change discussion – regardless of origin – will help our cause. Below we feature a visual showing the alignment of seven sites on several key issues.