Data


Dataset

Below is the data we used to create the visualizations on the Factors tab. The spreadsheet also has two additional tabs: one with the data sorted by happiness ladder score, and one used to create a scatter plot visualization for our project. The data comes directly from the World Happiness Report website and is freely available for anyone to download and analyze. Feel free to explore the data directly!

Data Critique

What information is included in the dataset?

The World Happiness Report 2020 dataset ranks 153 countries by the overall happiness their citizens reported in surveys conducted between 2017 and 2019. Each country is listed by its name and region, along with its average happiness, or “ladder,” score. The ladder scores were determined by asking people to rate their quality of life on a scale from 0 (worst) to 10 (best); each country’s score estimates the true population average and is reported with a 95% confidence interval. The dataset also contains six factors that can be used to explain a country’s overall happiness, including log GDP per capita and average healthy life expectancy. The other four factors are derived from citizens’ average binary responses (0 = “no” and 1 = “yes”) to survey questions concerning social support, freedom to make life choices, perceptions of corruption, and generosity.
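To illustrate how a 95% confidence interval relates to an average ladder score, the Python sketch below uses the common normal approximation (mean plus or minus 1.96 standard errors). The function name and the sample ratings are hypothetical, and the report’s actual estimation procedure, including any survey weighting, is not reproduced here.

```python
import math
import statistics


def ladder_confidence_interval(responses, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation 95% CI.

    `responses` are individual 0-10 ladder ratings. Survey weights,
    which a real national estimate would use, are ignored in this sketch.
    """
    if len(responses) < 2:
        raise ValueError("need at least two responses")
    mean = statistics.fmean(responses)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    se = statistics.stdev(responses) / math.sqrt(len(responses))
    return mean, mean - z * se, mean + z * se


# Hypothetical ratings from a tiny sample of respondents (not real data)
sample = [7, 8, 6, 9, 7, 5, 8, 7]
mean, low, high = ladder_confidence_interval(sample)
print(f"mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

With real survey samples of roughly a thousand people per country per year, the interval would be much narrower than in this eight-person example.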

None of the six factors was used to calculate the ladder score itself. Instead, the dataset estimates how much of each country’s ladder score can be explained by each factor when the factors are used in a linear regression model to predict happiness. The dataset also references an imaginary country called “Dystopia,” which has the worst measured value for each of the six factors, making it the theoretical unhappiest possible country. The last column, “Dystopia + residual,” adds Dystopia’s ladder score (about 1.97) to each country’s residual from the regression model. Adding this value to the portions of the ladder score explained by the six factors reproduces the country’s actual ladder score, while adding Dystopia’s score alone (without the residual) gives the score the model predicts, so the two can be compared.
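The arithmetic behind the “Dystopia + residual” column can be sketched in a few lines of Python. The factor values below are hypothetical, not taken from the dataset, and the function names are our own. The sketch assumes that the six “Explained by” values plus “Dystopia + residual” add up to a country’s ladder score, and that Dystopia’s score plus the six values (with no residual) is the model’s prediction.

```python
# Hypothetical "Explained by" values for one country (invented, not real data)
explained_by = {
    "Log GDP per capita": 1.20,
    "Social support": 1.40,
    "Healthy life expectancy": 0.90,
    "Freedom to make life choices": 0.60,
    "Generosity": 0.15,
    "Perceptions of corruption": 0.20,
}
dystopia_plus_residual = 2.05  # hypothetical value for the same country
DYSTOPIA_SCORE = 1.97  # Dystopia's ladder score, rounded


def reconstruct_ladder_score(explained, dystopia_residual):
    """Sum the six factor contributions and the Dystopia + residual value."""
    return sum(explained.values()) + dystopia_residual


def predicted_ladder_score(explained, dystopia_score=DYSTOPIA_SCORE):
    """Model prediction: Dystopia's score plus the six contributions, no residual."""
    return sum(explained.values()) + dystopia_score


actual = reconstruct_ladder_score(explained_by, dystopia_plus_residual)
predicted = predicted_ladder_score(explained_by)
print(f"actual={actual:.2f}, predicted={predicted:.2f}, residual={actual - predicted:.2f}")
```

Because the residual is the only difference between the two sums, a positive residual means the country is happier than the six factors alone would predict.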

What information, events, or phenomena can this data illuminate?

Because the data is organized by country, it allows direct comparisons between countries. We can see how much each factor contributes to each country’s happiness score and, from that, draw conclusions about which factors contribute most to happiness ratings overall in this study’s data. Each country is also assigned to a region, so we can uncover regional trends as well.
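As a sketch of the comparisons described above, the Python below averages ladder scores by region and ranks factors by their mean “Explained by” contribution. The rows, country and region names, and dictionary layout are invented for illustration, and only three of the six factors are included; with the real data, one would load the downloaded spreadsheet and map its columns onto this shape. Ranking by mean contribution is just one simple choice.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical rows shaped loosely like the dataset (values invented)
rows = [
    {"country": "A", "region": "Region 1", "ladder": 7.0,
     "explained": {"Log GDP per capita": 1.3, "Social support": 1.4, "Generosity": 0.2}},
    {"country": "B", "region": "Region 1", "ladder": 6.0,
     "explained": {"Log GDP per capita": 1.1, "Social support": 1.2, "Generosity": 0.3}},
    {"country": "C", "region": "Region 2", "ladder": 4.0,
     "explained": {"Log GDP per capita": 0.5, "Social support": 0.8, "Generosity": 0.1}},
]


def regional_means(rows):
    """Average ladder score for each region."""
    by_region = defaultdict(list)
    for row in rows:
        by_region[row["region"]].append(row["ladder"])
    return {region: fmean(scores) for region, scores in by_region.items()}


def rank_factors(rows):
    """Factor names sorted by mean explained contribution, largest first."""
    values = defaultdict(list)
    for row in rows:
        for factor, value in row["explained"].items():
            values[factor].append(value)
    means = {factor: fmean(vals) for factor, vals in values.items()}
    return sorted(means, key=lambda factor: means[factor], reverse=True)


print(regional_means(rows))
print(rank_factors(rows))
```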

How was the data generated and what were the original sources it drew from?

According to the World Happiness Report 2020, the most important original source is the Gallup World Poll, which is unique in the range and comparability of its global series of annual surveys. The poll surveys citizens in the 153 countries covered by the report, and its life-evaluation questions are used to determine each country’s level of happiness every year. The survey questions also touch on each country’s environment, including its social, urban, and natural determinants of happiness. Other original sources the researchers used to analyze happiness include the Gallup US Poll, natural environment data from OECD countries, just-in-time reports collected from a sample of Londoners, Sustainable Development Goals (SDGs) data, and London Air Quality Network air pollution data.

To create the World Happiness Report 2020, researchers at the Sustainable Development Solutions Network and the Center for Sustainable Development at Columbia University used data from the Gallup World Poll to generate statistics about global happiness levels. They then analyzed the data with summary statistics and regression analysis and presented the results in a coherent form. Each country is represented by the average of its respondents’ answers to the Gallup survey. To handle missing values, rather than estimating them, the researchers filled them in with data from outside sources or from previous World Happiness Reports that were at most three years old.
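The missing-value rule described above can be sketched in Python. This is only an illustration under stated assumptions: the function `fill_missing`, its dictionary layout, and the sample values are hypothetical, and only the rule of borrowing from previous reports at most three years old is modeled, not the use of outside sources.

```python
def fill_missing(current, history, year, max_age=3):
    """Fill None values in `current` from the most recent eligible earlier report.

    current: dict mapping factor name -> value (or None if missing) for `year`
    history: dict mapping an earlier report year -> dict of factor -> value
    Only reports between 1 and `max_age` years older than `year` are used.
    """
    filled = dict(current)
    for factor, value in current.items():
        if value is not None:
            continue
        # Check the most recent earlier reports first
        for past_year in sorted(history, reverse=True):
            age = year - past_year
            if age < 1 or age > max_age:
                continue
            past_value = history[past_year].get(factor)
            if past_value is not None:
                filled[factor] = past_value
                break
    return filled


# Hypothetical values for one country (not real data)
current = {"Social support": 0.9, "Generosity": None, "Perceptions of corruption": None}
history = {
    2019: {"Generosity": 0.05},
    2018: {"Generosity": 0.04},
    2015: {"Perceptions of corruption": 0.7},  # too old to be used in 2020
}
print(fill_missing(current, history, 2020))
```

In this example, Generosity is taken from the 2019 report, while Perceptions of corruption stays missing because the only available value is five years old.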

What can this dataset not reveal and what information is left out?

This dataset cannot reveal why each country is happy or unhappy. While the “Explained by” scores for Log GDP per capita, Social support, Healthy life expectancy, Freedom to make life choices, Generosity, and Perceptions of corruption show which factors are most influential in a country’s happiness, they do not reveal what events or policies led to each ranking. Moreover, because responses from three years (2017 to 2019) were combined into one dataset, it is difficult to observe trends over time without consulting reports from other years. As a result, we cannot use this dataset to observe how a major event, such as a pandemic, might disrupt trends in a country’s happiness.

While this data contains a lot of information, it also has many exclusions. The dataset does not cover all 195 countries; 42 of them are excluded. “Happiness” is also determined only by citizens’ responses to a survey, so the perspectives of people who did not take the survey are excluded. People may not have taken it for a number of reasons, whether personal choice or lack of access. The dataset also excludes other definitions and quantifications of “happiness” and related measures of well-being, such as Gallup’s Positive Experience Index, the Where-to-be-born Index, and the Legatum Prosperity Index.

What are the effects of the dataset’s ontology on its presentation of information?

The purpose of this dataset is to measure social well-being and economic development from a macro perspective, and its numbers tend to focus on developed urban areas. Because the data uses the country as its unit, it may overlook outliers and individuals’ feelings about their own environments. As a result, the happiness this dataset quantifies may not accurately represent specific cities or states experiencing long-term hardships, such as recovering from natural disasters or influenza outbreaks. The data may also leave out suburban populations and local development in the suburbs. Furthermore, the data is divided into “Explained by” categories, which indicate how much a particular factor (freedom, social support, etc.) contributes to a country’s predicted ladder score. For most countries, each factor explains a meaningful portion of the ladder score, but for some countries certain factors explain none of it. Some factors are also correlated with one another; for example, “Freedom to make life choices” and “Generosity” are positively correlated, which the creators do not mention.
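To check a relationship like the one between “Freedom to make life choices” and “Generosity,” one could compute a Pearson correlation coefficient. The Python sketch below does so from scratch; the two short lists are hypothetical values, not figures from the dataset, so the result says nothing about the real correlation.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, without the common 1/(n-1) factor, which cancels
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical factor values for five countries (invented for illustration)
freedom = [0.9, 0.8, 0.7, 0.6, 0.5]
generosity = [0.30, 0.25, 0.20, 0.10, 0.05]
print(f"r = {pearson_correlation(freedom, generosity):.3f}")
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little linear relationship. Correlation alone does not show that one factor causes the other.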
