fivethirtyeight/data

Data and code behind the articles and graphics at FiveThirtyEight

Top Related Projects

Data on COVID-19 (coronavirus) cases, deaths, hospitalizations, tests • All countries • Updated daily by Our World in Data

An index of all our open-source data, analysis, libraries, tools, and guides.

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE

An index of all open-source data

Quick Overview

The fivethirtyeight/data repository is a collection of data sets and code used in articles published by FiveThirtyEight, a data journalism website. It provides raw data, data dictionaries, and some analysis scripts for various topics covered in their articles, making it a valuable resource for data enthusiasts, researchers, and journalists.

Pros

  • Diverse range of topics covered, including politics, sports, economics, and social issues
  • Well-documented data sets with accompanying README files and data dictionaries
  • Regular updates with new data sets as new articles are published
  • Open-source and freely available for public use and analysis

Cons

  • Some data sets may be incomplete or require additional context from the associated articles
  • Not all data sets are consistently formatted or follow the same structure
  • Limited analysis scripts provided; users often need to perform their own analysis
  • Some older data sets may not be actively maintained or updated

Getting Started

To use the data sets from this repository:

  1. Clone the repository:

    git clone https://github.com/fivethirtyeight/data.git
    
  2. Navigate to the desired data set folder:

    cd data/dataset-name
    
  3. Read the README.md file for information about the data set and its structure.

  4. Use your preferred data analysis tools (e.g., Python, R, Excel) to load and analyze the data.

Note: This repository does not contain a code library, so there are no specific code examples or installation instructions. Users are expected to work with the raw data files directly using their preferred tools and methods.
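
Since the repository ships no library, step 4 is left to the reader. As one possible starting point, the sketch below uses only the Python standard library to list the CSV files in a data set folder together with their column names. The function name `summarize_csv_files` is hypothetical, and `data/dataset-name` is the placeholder path from step 2, not a real folder.

```python
import csv
from pathlib import Path


def summarize_csv_files(dataset_dir):
    """Return {file name: list of column names} for each CSV in a folder."""
    summary = {}
    for csv_path in sorted(Path(dataset_dir).glob("*.csv")):
        # utf-8-sig drops a byte-order mark if the file starts with one
        with open(csv_path, newline="", encoding="utf-8-sig") as handle:
            # The first row is assumed to be the header; an empty file yields []
            header = next(csv.reader(handle), [])
        summary[csv_path.name] = header
    return summary


if __name__ == "__main__":
    # Placeholder path from step 2; replace with a real data set folder
    for name, columns in summarize_csv_files("data/dataset-name").items():
        print(name, columns)
```

Checking the column names this way before loading a file helps because, as noted under Cons, the data sets do not all follow the same structure.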

Competitor Comparisons

Data on COVID-19 (coronavirus) cases, deaths, hospitalizations, tests • All countries • Updated daily by Our World in Data

Pros of covid-19-data

  • More focused and specialized dataset, specifically for COVID-19 data
  • More frequently updated, providing near real-time information
  • Includes a wider range of global data sources and countries

Cons of covid-19-data

  • Limited to a single topic, whereas fivethirtyeight/data covers various subjects
  • May require more domain-specific knowledge to interpret and use effectively
  • Potentially more complex data structure due to the nature of COVID-19 reporting

Code comparison

covid-19-data:

import pandas as pd

# Load COVID-19 data
df = pd.read_csv('https://covid.ourworldindata.org/data/owid-covid-data.csv')
df['date'] = pd.to_datetime(df['date'])

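fivethirtyeight/data:

import pandas as pd

# Load FiveThirtyEight data
df = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/poll-quiz-guns/guns-polls.csv')
# This file records each poll's field dates in Start and End columns
# rather than a single date column (see the data set's README)
for col in ('Start', 'End'):
    df[col] = pd.to_datetime(df[col])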

Both repositories provide CSV files that can be loaded with pandas. The main difference lies in the specific datasets and their structures. covid-19-data focuses on COVID-19 statistics, while fivethirtyeight/data covers a variety of topics, so users have to find the right dataset for their needs and check its column names before parsing.
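
Because the column layout varies from one dataset to the next, it can help to find out which columns hold dates before converting anything. The following is a minimal sketch using only the standard library. The helper name `iso_date_columns` and the sample rows are made up for illustration, and the sketch assumes dates are written as YYYY-MM-DD, which will not hold for every file.

```python
import csv
import io
from datetime import datetime


def iso_date_columns(csv_text):
    """Return names of columns whose first data row parses as %Y-%m-%d."""
    reader = csv.DictReader(io.StringIO(csv_text))
    first_row = next(reader, None)
    if first_row is None:
        return []  # header only, or empty input
    found = []
    for name, value in first_row.items():
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except (TypeError, ValueError):
            continue  # missing value or not a date in this format
        found.append(name)
    return found


# Made-up sample rows, for illustration only
sample = "Start,End,Support\n2018-02-20,2018-02-21,72\n"
print(iso_date_columns(sample))
```

Only the first data row is inspected, so a column with a blank first value would be missed; a more careful version might sample several rows.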

An index of all our open-source data, analysis, libraries, tools, and guides.

Pros of everything

  • Broader scope, covering various topics beyond just data
  • More frequent updates and active community engagement
  • Includes code, analysis, and methodologies alongside datasets

Cons of everything

  • Less structured organization compared to fivethirtyeight/data
  • May contain more opinion-based or editorial content
  • Potentially overwhelming for users seeking specific datasets

Code comparison

fivethirtyeight/data:

import pandas as pd

df = pd.read_csv('fivethirtyeight_dataset.csv')
df.head()

everything:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('buzzfeed_dataset.csv')
df.plot(x='date', y='value')
plt.show()

Key differences

  • fivethirtyeight/data focuses primarily on datasets and statistical analysis
  • everything includes a wider range of content, including investigative reporting
  • fivethirtyeight/data provides more consistent formatting and documentation for datasets
  • everything offers more diverse types of information and resources
  • fivethirtyeight/data is more suitable for academic or research purposes
  • everything caters to a broader audience, including journalists and the general public

Both repositories serve as valuable resources for data-driven journalism and analysis, with each having its own strengths and target audience. The choice between them depends on the specific needs and preferences of the user.

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE

Pros of COVID-19

  • Focused specifically on COVID-19 data, providing comprehensive and detailed information
  • Frequently updated, often on a daily basis, ensuring up-to-date statistics
  • Global coverage with data from countries and regions worldwide

Cons of COVID-19

  • Limited to COVID-19 data only, lacking diversity in topics
  • Raw data format may require more processing for analysis
  • Potential inconsistencies in reporting methods across different regions

Code comparison

COVID-19 data format (CSV):

Province/State,Country/Region,Lat,Long,Date,Confirmed,Deaths,Recovered
New York,US,40.7128,-74.0060,2021-03-01,1000000,50000,900000

FiveThirtyEight data format (CSV):

date,state,positive,negative,pending,hospitalized,death
2021-03-01,New York,1000000,5000000,1000,10000,50000

The COVID-19 repository focuses on global data with geographical coordinates, while FiveThirtyEight includes more detailed categories for US states. FiveThirtyEight's data is more diverse, covering various topics beyond COVID-19, making it suitable for a wider range of analyses. However, COVID-19 provides more frequent updates and global coverage specific to the pandemic.
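
To compare records across the two layouts, their columns first have to be mapped onto shared names. The sketch below does this for the two sample rows shown above, using only the standard library. The layout keys (`jhu`, `state`), the shared field names, and the `normalize` helper are hypothetical choices made for this illustration, not part of either repository.

```python
import csv
import io

# The two sample layouts shown above
JHU_SAMPLE = """Province/State,Country/Region,Lat,Long,Date,Confirmed,Deaths,Recovered
New York,US,40.7128,-74.0060,2021-03-01,1000000,50000,900000
"""

STATE_SAMPLE = """date,state,positive,negative,pending,hospitalized,death
2021-03-01,New York,1000000,5000000,1000,10000,50000
"""

# Hypothetical mapping from each layout's column names to shared field names
COLUMN_MAPS = {
    "jhu": {"Date": "date", "Province/State": "region",
            "Confirmed": "cases", "Deaths": "deaths"},
    "state": {"date": "date", "state": "region",
              "positive": "cases", "death": "deaths"},
}


def normalize(csv_text, layout):
    """Read CSV text in the given layout and return rows with shared field names."""
    mapping = COLUMN_MAPS[layout]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {target: raw[source] for source, target in mapping.items()}
        # Counts arrive as strings; convert them so they can be compared
        row["cases"] = int(row["cases"])
        row["deaths"] = int(row["deaths"])
        rows.append(row)
    return rows


print(normalize(JHU_SAMPLE, "jhu"))
print(normalize(STATE_SAMPLE, "state"))
```

Columns with no counterpart in the other layout (coordinates, pending tests, and so on) are dropped here; a real analysis would decide case by case which of them to keep.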

An index of all open-source data

Pros of GoogleTrends/data

  • More focused on search trends and user interest data
  • Regularly updated with current trending topics
  • Provides data in multiple formats (CSV, JSON)

Cons of GoogleTrends/data

  • Limited scope compared to the diverse datasets in fivethirtyeight/data
  • Less comprehensive documentation and context for datasets
  • Fewer historical datasets available

Code comparison

GoogleTrends/data:

import pandas as pd

df = pd.read_csv('multiTimeline.csv', skiprows=1)
df['date'] = pd.to_datetime(df['date'])
print(df.head())

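fivethirtyeight/data:

import pandas as pd

# Path is relative to the root of a local clone of the repository
df = pd.read_csv('nba-elo/nbaallelo.csv')
# The game date is stored in the date_game column
df['date_game'] = pd.to_datetime(df['date_game'])
print(df.head())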

Both repositories use similar data loading techniques, but the specific datasets and their structures differ. GoogleTrends/data focuses on search trend data, while fivethirtyeight/data covers a wider range of topics and may require more complex data manipulation depending on the dataset.
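
The relative path in the fivethirtyeight/data example assumes a local clone. Files can also be addressed by URL, following the raw.githubusercontent.com pattern used in the pandas example earlier on this page. The helper below is a small sketch of that pattern. The name `raw_url` is hypothetical, and the `master` branch name is taken from that earlier URL and is an assumption about the repository's current default branch.

```python
from urllib.parse import quote

# Base taken from the raw URL used earlier on this page; the branch name
# "master" is an assumption and may need adjusting.
RAW_BASE = "https://raw.githubusercontent.com/fivethirtyeight/data/master/"


def raw_url(repo_path):
    """Build a raw-file URL from a path relative to the repository root."""
    # quote() percent-encodes spaces and similar characters but keeps "/"
    return RAW_BASE + quote(repo_path.lstrip("/"))


print(raw_url("nba-elo/nbaallelo.csv"))
```

The resulting string can be passed to `pd.read_csv` in place of a local path, at the cost of downloading the file on every run.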

README

See the index for a list of the data and code we've published and their accompanying stories.

As of June 13, 2023, sports predictions and forecasts are no longer being updated.

Unless otherwise noted, our data sets are available under the Creative Commons Attribution 4.0 International License, and the code is available under the MIT License. If you find this information useful, please let us know.