Top Related Projects
Data on COVID-19 (coronavirus) cases, deaths, hospitalizations, tests • All countries • Updated daily by Our World in Data
An index of all our open-source data, analysis, libraries, tools, and guides.
Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
An index of all open-source data
Quick Overview
The fivethirtyeight/data repository is a collection of data sets and code used in articles published by FiveThirtyEight, a data journalism website. It provides raw data, data dictionaries, and some analysis scripts for various topics covered in their articles, making it a valuable resource for data enthusiasts, researchers, and journalists.
Pros
- Diverse range of topics covered, including politics, sports, economics, and social issues
- Well-documented data sets with accompanying README files and data dictionaries
- New data sets added as articles are published, although sports predictions and forecasts stopped updating on June 13, 2023
- Open-source and freely available for public use and analysis
Cons
- Some data sets may be incomplete or require additional context from the associated articles
- Not all data sets are consistently formatted or follow the same structure
- Limited analysis scripts provided; users often need to perform their own analysis
- Some older data sets may not be actively maintained or updated
Getting Started
To use the data sets from this repository:
- Clone the repository:
  git clone https://github.com/fivethirtyeight/data.git
- Navigate to the desired data set folder:
  cd data/dataset-name
- Read the README.md file for information about the data set and its structure.
- Use your preferred data analysis tools (e.g., Python, R, Excel) to load and analyze the data.
Note: This repository does not contain a code library, so there are no specific code examples or installation instructions. Users are expected to work with the raw data files directly using their preferred tools and methods.
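Even so, the last step above can be sketched in a few lines. This is a minimal illustration, assuming Python and only its standard library; `list_csv_files` and `preview_csv` are hypothetical helper names, not something the repository provides, and UTF-8 encoding is an assumption that may not hold for every file.

```python
import csv
from pathlib import Path


def list_csv_files(dataset_dir):
    """Return every CSV file under a data set folder, sorted by path."""
    return sorted(Path(dataset_dir).rglob("*.csv"))


def preview_csv(csv_path, max_rows=5):
    """Return (column names, first max_rows rows as dicts) for one CSV file."""
    # UTF-8 is assumed here; adjust the encoding if a file fails to decode.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = []
        for row in reader:
            if len(rows) >= max_rows:
                break
            rows.append(row)
        # fieldnames is None for an empty file, so fall back to an empty list.
        return reader.fieldnames or [], rows
```

For example, calling `list_csv_files("data/dataset-name")` on the folder chosen earlier and passing one of the returned paths to `preview_csv` shows the column names before you commit to a particular analysis tool.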
Competitor Comparisons
Data on COVID-19 (coronavirus) cases, deaths, hospitalizations, tests • All countries • Updated daily by Our World in Data
Pros of covid-19-data
- More focused and specialized dataset, specifically for COVID-19 data
- More frequently updated, with daily refreshes according to its description
- Includes a wider range of global data sources and countries
Cons of covid-19-data
- Limited to a single topic, whereas fivethirtyeight/data covers various subjects
- May require more domain-specific knowledge to interpret and use effectively
- Potentially more complex data structure due to the nature of COVID-19 reporting
Code comparison
covid-19-data:
import pandas as pd
# Load COVID-19 data
df = pd.read_csv('https://covid.ourworldindata.org/data/owid-covid-data.csv')
df['date'] = pd.to_datetime(df['date'])
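fivethirtyeight/data:
import pandas as pd
# Load FiveThirtyEight data
df = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/poll-quiz-guns/guns-polls.csv')
# This file has no 'date' column; its poll dates are in 'Start' and 'End'
df['Start'] = pd.to_datetime(df['Start'])
df['End'] = pd.to_datetime(df['End'])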
Both repositories provide CSV files that can be loaded directly with pandas. The main difference lies in the specific datasets and their structures: covid-19-data focuses on COVID-19 statistics, while fivethirtyeight/data covers a variety of topics, so column names and layouts change from one dataset to the next.
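Because column names differ between datasets, it can help to inspect a file's header before converting anything to dates. The sketch below uses only the Python standard library and returns whichever candidate date columns are actually present. The helper name `find_date_columns`, the default candidate list, and the sample row are all assumptions made for illustration; each dataset's README is the authority on its real column names.

```python
import csv
import io

# Assumed candidate names; extend this tuple for the dataset at hand.
DEFAULT_CANDIDATES = ("date", "start", "end")


def find_date_columns(csv_text, candidates=DEFAULT_CANDIDATES):
    """Return header names that match a candidate, ignoring case."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    wanted = {name.lower() for name in candidates}
    return [name for name in header if name.strip().lower() in wanted]


# Made-up sample text, not copied from any repository file.
sample = "Question,Start,End,Support\nexample-question,2/20/18,2/23/18,72\n"
print(find_date_columns(sample))  # ['Start', 'End']
```

The returned names can then be passed to whatever date parser you prefer, which avoids a failed lookup on a column that does not exist.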
An index of all our open-source data, analysis, libraries, tools, and guides.
Pros of everything
- Broader scope, covering various topics beyond just data
- More frequent updates and active community engagement
- Includes code, analysis, and methodologies alongside datasets
Cons of everything
- Less structured organization compared to fivethirtyeight/data
- May contain more opinion-based or editorial content
- Potentially overwhelming for users seeking specific datasets
Code comparison
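fivethirtyeight/data:
import pandas as pd
# Placeholder file name; substitute a CSV from the repository
df = pd.read_csv('fivethirtyeight_dataset.csv')
df.head()
everything:
import pandas as pd
import matplotlib.pyplot as plt
# Placeholder file and column names; substitute real ones
df = pd.read_csv('buzzfeed_dataset.csv')
df.plot(x='date', y='value')
plt.show()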
Key differences
- fivethirtyeight/data focuses primarily on datasets and statistical analysis
- everything includes a wider range of content, including investigative reporting
- fivethirtyeight/data provides more consistent formatting and documentation for datasets
- everything offers more diverse types of information and resources
- fivethirtyeight/data is more suitable for academic or research purposes
- everything caters to a broader audience, including journalists and the general public
Both repositories serve as valuable resources for data-driven journalism and analysis, with each having its own strengths and target audience. The choice between them depends on the specific needs and preferences of the user.
Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
Pros of COVID-19
- Focused specifically on COVID-19 data, providing comprehensive and detailed information
- Frequently updated, often on a daily basis, ensuring up-to-date statistics
- Global coverage with data from countries and regions worldwide
Cons of COVID-19
- Limited to COVID-19 data only, lacking diversity in topics
- Raw data format may require more processing for analysis
- Potential inconsistencies in reporting methods across different regions
Code Comparison
COVID-19 data format (CSV, illustrative row):
Province/State,Country/Region,Lat,Long,Date,Confirmed,Deaths,Recovered
New York,US,40.7128,-74.0060,2021-03-01,1000000,50000,900000
FiveThirtyEight data format (CSV, illustrative row):
date,state,positive,negative,pending,hospitalized,death
2021-03-01,New York,1000000,5000000,1000,10000,50000
The COVID-19 repository focuses on global data with geographical coordinates, while the FiveThirtyEight sample includes more detailed categories for US states. FiveThirtyEight's data is more diverse, covering various topics beyond COVID-19, which makes it suitable for a wider range of analyses. The COVID-19 repository, however, provides more frequent updates and global coverage specific to the pandemic.
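To compare files with different layouts, one option is to map each onto a small shared record. The sketch below does this for the two sample rows above using Python's standard library. The `normalize` helper, the layout keys, and the field mapping are assumptions made for illustration; neither repository provides them.

```python
import csv
import io

# The two sample layouts shown above, reproduced as in-memory CSV text.
JHU_STYLE = (
    "Province/State,Country/Region,Lat,Long,Date,Confirmed,Deaths,Recovered\n"
    "New York,US,40.7128,-74.0060,2021-03-01,1000000,50000,900000\n"
)
STATE_STYLE = (
    "date,state,positive,negative,pending,hospitalized,death\n"
    "2021-03-01,New York,1000000,5000000,1000,10000,50000\n"
)

# Assumed mapping from each layout's column names to shared field names.
FIELD_MAPS = {
    "jhu": {"Date": "date", "Province/State": "region",
            "Confirmed": "cases", "Deaths": "deaths"},
    "state": {"date": "date", "state": "region",
              "positive": "cases", "death": "deaths"},
}


def normalize(csv_text, layout):
    """Convert CSV text in a known layout to a list of shared records."""
    mapping = FIELD_MAPS[layout]
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {target: row[source] for source, target in mapping.items()}
        # Counts arrive as strings; convert them so they compare numerically.
        record["cases"] = int(record["cases"])
        record["deaths"] = int(record["deaths"])
        records.append(record)
    return records


# Both sample rows reduce to the same shared record.
print(normalize(JHU_STYLE, "jhu") == normalize(STATE_STYLE, "state"))  # True
```

Keeping the mapping in a plain dictionary means a new layout only needs one more entry rather than a new parsing function.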
An index of all open-source data
Pros of GoogleTrends/data
- More focused on search trends and user interest data
- Regularly updated with current trending topics
- Provides data in multiple formats (CSV, JSON)
Cons of GoogleTrends/data
- Limited scope compared to the diverse datasets in fivethirtyeight/data
- Less comprehensive documentation and context for datasets
- Fewer historical datasets available
Code comparison
GoogleTrends/data:
import pandas as pd
df = pd.read_csv('multiTimeline.csv', skiprows=1)
# The date column's name can vary by export, so select it by position
date_col = df.columns[0]
df[date_col] = pd.to_datetime(df[date_col])
print(df.head())
fivethirtyeight/data:
import pandas as pd
df = pd.read_csv('nba-elo/nbaallelo.csv')
# The game date is stored in the 'date_game' column
df['date_game'] = pd.to_datetime(df['date_game'])
print(df.head())
Both repositories use similar data loading techniques, but the specific datasets and their structures differ. GoogleTrends/data focuses on search trend data, while fivethirtyeight/data covers a wider range of topics and may require more complex data manipulation depending on the dataset.
README
See the index for a list of the data and code we've published and their accompanying stories.
As of June 13, 2023, sports predictions and forecasts are no longer being updated.
Unless otherwise noted, our data sets are available under the Creative Commons Attribution 4.0 International License, and the code is available under the MIT License. If you find this information useful, please let us know.