Top Related Projects
📊 Path to a free self-taught education in Data Science!
10 Weeks, 20 Lessons, Data Science for All!
:memo: An awesome Data Science repository to learn and apply for real world problems.
A curated list of awesome Machine Learning frameworks, libraries and software.
:books: Freely available programming books
A complete computer science study plan to become a software engineer.
Quick Overview
The "datasciencemasters/go" repository, the Open Source Data Science Masters (OSDSM), is a curated list of free resources for learning data science. Despite the repository name, it is not about the Go programming language; its programming material is Python-based. It serves as a self-guided curriculum for those preparing for an entry-level Data Scientist role, providing links to courses, books, tutorials, and tools.
Pros
- Offers a structured learning path for data science
- Provides free, high-quality resources from reputable sources
- Regularly updated with new content and community contributions
- Covers a wide range of data science topics and tools
Cons
- May lack depth in some advanced topics compared to paid courses
- Requires self-discipline and motivation to follow through
- Some linked resources may become outdated or unavailable over time
- Limited focus on practical projects or hands-on exercises
Getting Started
To get started with the datasciencemasters/go curriculum:
- Visit the repository at https://github.com/datasciencemasters/go
- Browse through the README.md file to get an overview of the curriculum
- Start with "The Core" section; if you're new to Python, begin with "Learn Python"
- Progress through the sections in order, or focus on specific areas of interest
- Click on the provided links to access the learning materials
- Consider forking the repository to track your progress and add personal notes
Remember to check for updates regularly, as the curriculum is continuously evolving with new resources and community contributions.
Competitor Comparisons
📊 Path to a free self-taught education in Data Science!
Pros of data-science
- More comprehensive curriculum covering a wider range of data science topics
- Better organized structure with clear learning paths and prerequisites
- Active community and regular updates to course materials
Cons of data-science
- May be overwhelming for beginners due to its extensive content
- Requires a significant time commitment to complete the entire curriculum
Code comparison
While both repositories focus on data science education, they don't contain actual code samples. Instead, they provide links to courses and resources. Here's a comparison of how they structure their content:
data-science:
# Core Math
## Linear Algebra
- [Linear Algebra - Foundations to Frontiers](https://www.edx.org/course/linear-algebra-foundations-to-frontiers)
- [Applications of Linear Algebra Part 1](https://www.edx.org/course/applications-linear-algebra-part-1-davidsonx-d003x-1)
go:
### Linear Algebra
* Linear Algebra [Coding the Matrix: Linear Algebra through Computer Science Applications](http://codingthematrix.com/)
* Linear Algebra [MIT OCW Taught by Gilbert Strang](http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/)
Both repositories provide curated lists of resources for learning data science, but data-science offers a more structured and comprehensive approach, while go provides a more concise list of resources.
10 Weeks, 20 Lessons, Data Science for All!
Pros of Data-Science-For-Beginners
- More structured and comprehensive curriculum for beginners
- Includes hands-on projects and quizzes for practical learning
- Regularly updated with contributions from Microsoft and the community
Cons of Data-Science-For-Beginners
- Focused primarily on beginners, may lack advanced topics
- Less emphasis on specific programming languages or tools
- Larger repository size, which may be overwhelming for some users
Code Comparison
Data-Science-For-Beginners:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('data.csv')
plt.scatter(data['x'], data['y'])
plt.show()
go:
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
reg = LinearRegression().fit(X, y)
The Data-Science-For-Beginners repository provides a more structured approach to learning data science, making it ideal for beginners. It offers a comprehensive curriculum with hands-on projects and quizzes. However, it may lack advanced topics and specific tool focus compared to go.
The go repository, on the other hand, offers a curated list of resources for self-taught data scientists. It provides links to various learning materials and tools, allowing for a more flexible and personalized learning path. However, it may require more self-discipline and direction from the learner.
:memo: An awesome Data Science repository to learn and apply for real world problems.
Pros of awesome-datascience
- More comprehensive and regularly updated resource list
- Better organized with clear categories and subcategories
- Includes a wider range of topics, from beginner to advanced
Cons of awesome-datascience
- Can be overwhelming due to the sheer volume of resources
- Less focused on a specific learning path or curriculum
- May lack depth in some areas due to its broad scope
Code comparison
Not applicable for these repositories, as they are primarily curated lists of resources rather than code-based projects.
Summary
awesome-datascience is a more extensive and diverse collection of data science resources, covering a broader range of topics and skill levels. It offers better organization and regular updates but may be overwhelming for beginners.
go is more focused and structured as a curriculum, making it easier for newcomers to follow a specific learning path. However, it may not cover as wide a range of topics or be as frequently updated as awesome-datascience.
Both repositories serve as valuable resources for data science learners, with awesome-datascience being more suitable for those seeking a comprehensive overview of the field, while go is better for those preferring a more structured learning approach.
A curated list of awesome Machine Learning frameworks, libraries and software.
Pros of awesome-machine-learning
- More comprehensive coverage of machine learning topics and resources
- Regularly updated with new content and contributions
- Includes resources for multiple programming languages
Cons of awesome-machine-learning
- Less structured learning path for beginners
- May be overwhelming due to the sheer volume of resources
- Lacks specific course recommendations or curriculum structure
Code comparison
While both repositories primarily focus on curating resources rather than providing code examples, awesome-machine-learning occasionally includes code snippets for specific libraries or tools. For example:
# awesome-machine-learning: Example using scikit-learn
from sklearn import svm
X = [[0, 0], [1, 1]]
y = [0, 1]
clf = svm.SVC()
clf.fit(X, y)
go doesn't typically include code snippets, as it's more focused on providing a curriculum and resource list for data science education.
Summary
awesome-machine-learning offers a vast collection of resources across various machine learning topics and programming languages, making it suitable for both beginners and experienced practitioners. However, its breadth can be overwhelming for newcomers.
go provides a more structured approach to learning data science, with a curated curriculum and specific course recommendations. While it may not be as comprehensive as awesome-machine-learning, it offers a clearer learning path for those starting their data science journey.
Both repositories serve as valuable resources for the data science and machine learning community, catering to different learning styles and needs.
:books: Freely available programming books
Pros of free-programming-books
- Broader scope, covering various programming languages and topics
- Larger community contribution, resulting in more frequent updates
- Includes resources in multiple languages, making it accessible to a global audience
Cons of free-programming-books
- Less focused on data science specifically
- May be overwhelming for beginners due to the vast amount of resources
- Lacks a structured learning path compared to go
Code comparison
Not applicable for these repositories, as they primarily contain curated lists of resources rather than code samples.
Summary
free-programming-books offers a comprehensive collection of programming resources across various languages and topics, making it suitable for a wide range of learners. However, it may lack the specific focus on data science that go provides.
go, on the other hand, offers a more structured approach to learning data science, which can be beneficial for those specifically interested in this field. However, it may not be as frequently updated or have as large a community contribution as free-programming-books.
Both repositories serve as valuable resources for learners, with free-programming-books catering to a broader audience and go focusing specifically on data science enthusiasts.
A complete computer science study plan to become a software engineer.
Pros of coding-interview-university
- Comprehensive coverage of computer science fundamentals and algorithms
- Well-structured learning path with clear progression
- Includes study tips and advice for interview preparation
Cons of coding-interview-university
- Primarily focused on interview preparation rather than practical application
- May be overwhelming for beginners due to its extensive content
- Less emphasis on specific programming languages or frameworks
Code comparison
Although neither repository focuses primarily on code examples, coding-interview-university does cover algorithms such as binary search, written here in Python:
# coding-interview-university example (binary search)
def binary_search(items, target):
    """Return the index of target in the sorted list items, or None."""
    low = 0
    high = len(items) - 1
    while low <= high:
        mid = (low + high) // 2  # integer division keeps mid a valid index
        guess = items[mid]
        if guess == target:
            return mid
        if guess > target:
            high = mid - 1
        else:
            low = mid + 1
    return None
go doesn't typically include code snippets, as it's more of a curated list of resources.
Summary
coding-interview-university is a comprehensive guide for computer science fundamentals and interview preparation, while go focuses on data science resources and learning paths. The former is more structured and in-depth, while the latter offers a broader range of topics specific to data science. Choose based on your learning goals and career aspirations.
README
Note from the Editor: Take Two
In the old days of 2013, the OSDSM was born. Then, there were "little to no Data Scientists with 5 years experience, because the job simply did not exist." (David Hardtke, Nov 2012) Since then, history has witnessed many things, including:
- Data Scientists working across industries and the world
- social media manipulation disrupts many elections
- BLM and #metoo and Extinction Rebellion and many other social movements
- machine learning begins falling under engineering domain
- a pandemic
- climate change disasters becoming very frequent while climate warms faster than predicted
- remote work becoming common
- multiple global recession shocks
In that decade, Data Science has seen growth of jobs, shortfall of goals, success in many industries, abject failure in others, and nefarious use cases. In particular, adverse consequences and complications of learning from data appear in too many examples: elections undermined by psychographics, dismal gender (Men=74%) and BIPOC diversity in the AI field, a revived eugenics, an explainability crisis, facial recognition used to identify people and systematically detain them, "aggression" detection microphones in schools, and many others. It has never been more clear that we need to talk about the real world impacts of our work, and consider how our creations are used. As you consider this, read a prescient novel that grapples with the consequences of birthing, of creation, of technology.
Like any tool, data-driven technologies are indifferent to the morality of their ends. Perhaps the greatest risk of all is leaving this tool in the hands of the few expensively-educated people who cannot possibly represent all of us. To balance this, open source movements seek to lower the barriers to education for everyone. Data science and data literacy must be widespread, accessible, and leveraged for building our collective future. More than ever, we need that future to be built by members of society who are diverse and focused on generative, sustainable, resilient, emergent solutions. After all, the things we build are mirrors of ourselves (seriously, read Shelley's Frankenstein).
Computers reflect the biases and belief systems of the people programming them -@alicegoldfuss
The OSDSM is built with the belief that open source education makes a diverse, collective, generative future-building possible. I hope that you are one of the next people -- whether you call yourself a Data Scientist or not -- to help make better decisions with the scientific process, critical thinking, and everything else your unique perspective brings to the table. This rewritten curriculum focuses on what is needed to be successful in the entry-level role, but that is just a generic outline; truly, I hope where you take it extends far beyond that.
Start here
The Open Source Data Science Masters
The open-source curriculum for learning to be a Data Scientist. Curriculum resources from both universities and working Data Scientists focus on foundational theory and applied skills. The OSDSM is collectively maintained and open to PRs.
The goal of this curriculum is to prepare the student for an entry-level Data Scientist role, using open source materials, at no cost but with the same caliber of materials found in the most reputable paid programs. Books not offered for free are often available through a public library; they are listed here with their current list price. The Masters is self-guided and self-accredited. To better support credibility, the structure now includes a Capstone project intended to demonstrate the student's problem-solving approach, skills in execution, and communication. Upon completion, students can award themselves a Credential on LinkedIn from the Open Source Data Science Masters. As with all things, the OSDSM is best played as a team sport (try finding people on r/learndatascience).
This is called a "Masters" because it is primarily concerned with "upper-level" college course material in mathematics, programming, economics, or related disciplines. Come as you are!
- The Core - This is a critical foundation for what is to come; don't skip the foundational lessons.
- Specialty - Choose what is most interesting to you, or most relevant to the work you plan to do.
- Doing Data Science - Learn about how doing science with others and for businesses can work.
- Capstone Project - Choose a meaningful project or dataset to demonstrate what you've learned.
The Core
This is a critical foundation for what is to come; don't skip!
What is Data Science?
One could argue that "Data Science" is a recent term for an already existing information analysis discipline. Humans instinctually search for patterns, a purpose we also see in this more digitized discipline. Read different sources (and search beyond this list) about the uses of data science.
- The Signal and The Noise / Nate Silver Book
$18
-- Narrated cases of Data Science at play in the real world.
- Dataclysm: Who We Are (When We Think No One's Looking) / Christian Rudder Book
$17
-- From the inside of OKCupid, real examples of how data science can illustrate human behavior.
- Informatics of the Oppressed / Rodrigo Ochigame Logic Magazine -- Algorithms of oppression have been around for a long time. So have radical projects to dismantle them and build emancipatory alternatives.
- A showcase of Jupyter Python Data Analysis Notebooks across disciplines.
Foundations of Data Science
Problem Solving
When there are no answers in the back of the book, how do you proceed? Breaking down problems is a skill, one that can and should be learned. Follow Pólya's process, and for extra credit, seek out resources on computer science decomposition.
- Problem-Solving Heuristics "How To Solve It" George Pólya Berkeley / Summary Book
$18
The Scientific Process & Experimentation
It is crucial as a Data Scientist that you show integrity in and transparency of scientific process. Even if you've been here before, review and draw out the process diagram for the scientific method.
Querying Data
Get familiar and comfortable with manipulating data in a database with a common relational querying language. There are diverse query languages, but SQL is a widely used foundation.
- SQL School Mode Analytics / Tutorials
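As a first taste of relational querying, here is a minimal sketch that runs a grouped SQL query from Python's built-in `sqlite3` module. The `orders` table and its rows are hypothetical, invented only for illustration; any relational database with a SQL interface would work in a similar way.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 14.5), ("cy", 5.0)],
)

# Total spend per customer, largest total first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

print(totals)  # [('ben', 35.5), ('ana', 34.5), ('cy', 5.0)]
```

The same SELECT / GROUP BY / ORDER BY pattern carries over to most SQL dialects, which is why SQL is a useful foundation before exploring other query languages.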
Math & Statistics
Calculus
- Single Variable Calculus MIT OpenCourseWare
- Multivariable Calculus MIT OpenCourseWare
Linear Algebra
The foundational mathematics for working with large samples of data. Spend time in exercises until you feel highly confident in the key topics of Linear Algebra. It will serve you well.
- An Intuitive Guide to Linear Algebra Better Explained / Article
- A Programmer's Intuition for Matrix Multiplication Better Explained / Article
- Vector Calculus: Understanding the Cross Product Better Explained / Article
- Vector Calculus: Understanding the Dot Product Better Explained / Article
- Linear Algebra Khan Academy / Videos
- Linear Algebra MIT
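To make matrix multiplication concrete, the sketch below implements it with plain Python lists rather than a numerical library. The helper name `matmul` and the example matrices are assumptions chosen for illustration; in practice a library such as NumPy would usually do this work.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    # Entry (i, j) is the dot product of row i of a with column j of b.
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]

rotate_90 = [[0, -1], [1, 0]]  # quarter turn counter-clockwise
scale_2 = [[2, 0], [0, 2]]     # doubles every coordinate

# Composing transformations is matrix multiplication.
combined = matmul(scale_2, rotate_90)
print(combined)                     # [[0, -2], [2, 0]]
print(matmul(combined, [[1], [0]])) # [[0], [2]]
```

Reading a product as "apply the right-hand matrix first, then the left-hand one" is the intuition the articles above build toward.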
Statistics
How can we answer questions with data? Everywhere you look, you'll see methods from statistics. Spend a lot of time here!
- Stats in a Nutshell Book
$46
- Think Stats: Probability and Statistics for Programmers Digital & Book
$34
- Think Bayes Digital & Book
$25
- Probabilistic Programming and Bayesian Methods for Hackers Github / Tutorials
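As a small worked example of answering a question with data, the sketch below applies Bayes' theorem to a screening-test scenario. The function name `posterior` and all three rates are hypothetical numbers picked for illustration, not figures from any of the books above.

```python
from fractions import Fraction

def posterior(prior, likelihood, false_positive_rate):
    """P(hypothesis | positive evidence) via Bayes' theorem."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical rates: 1% base rate, 90% sensitivity, 5% false positives.
p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(5, 100))
print(p, float(p))
```

With these assumed rates the posterior is 2/13, roughly 15%: even after a positive result the hypothesis remains unlikely, because the low base rate dominates.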
Working in Python
Learn Python
If you're starting from scratch with Python, start with this series.
Environment & Libraries
Set up your computer to use tools locally.
- Installing Basic Packages: Python, virtualenv, NumPy, SciPy, matplotlib and IPython
- For scientific uses: Using Python Scientifically & Command Line Install Script for Scientific Python Packages
Data Analysis
Get familiar with using tools to do data analysis. Pro tip: Write out what you're going to do before you do it! When you hit a snag, return to your plan and rechart as necessary.
- pandas tutorials
- Pandas Cookbook Examples
- Data Analysis in Python Tutorial
- Big Data Analysis with Twitter UC Berkeley / Lectures
- Intro to Data Science / Course $0
Python Programming + Algorithms
How does a computer know what to do? Algorithms are instructions with a fancy name. Learn how instructions are encoded, how to think about structuring those instructions, and patterns for making it work in code.
- Algorithms Design & Analysis I Stanford / Coursera
- numpy Tutorial / Stanford CS231N
Survey Courses
Courses with many of the topics above included. Be sure you fill in any gaps!
- Intro to Data Science / University of Washington Lectures
- (Short Survey) Doing Data Science: Straight Talk from the Frontline O'Reilly / Book
$50
Specialty: Choose 2
Choose what is most interesting to you, or most relevant to the work you plan to do.
Causation
A branch of statistics that uses graphical models and specialized statistics to describe and model cause and effect.
Natural Language Processing
The imperfect and immensely useful art (science?) of transforming human language into data.
- From Languages to Information / Stanford CS147 Materials
- NLP with Python (NLTK library) Digital, Book
$55
- How to Write a Spelling Corrector / Norvig Tutorial
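A minimal sketch of "transforming human language into data": split text into lowercase tokens and count them. The `tokenize` helper, its regular expression, and the sample sentence are assumptions for illustration; libraries such as NLTK offer far more careful tokenizers.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

sample = "The cat sat. The cat ran; the dog sat!"
counts = Counter(tokenize(sample))

print(counts.most_common(1))  # [('the', 3)]
```

Word counts like these are the raw material for many language models, including the kind of spelling corrector described in the Norvig tutorial.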
Graph Analysis
Human relationships can be modeled as a network or graph. Many other things suit this model, too. Working with graphs
- Social and Economic Networks: Models and Analysis / Stanford / Coursera
- Social Network Analysis for Startups Chapter 1 & Book
$25
networkx
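To show what modeling relationships as a graph looks like, here is a sketch that stores a small friendship network as an adjacency dictionary and finds a shortest chain of introductions with breadth-first search. The people and connections are hypothetical, and the code uses only the standard library; networkx provides equivalent and much richer functionality.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical friendship network.
friends = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana", "dee"],
    "dee": ["ben", "cy", "eli"],
    "eli": ["dee"],
}

print(shortest_path(friends, "ana", "eli"))  # ['ana', 'ben', 'dee', 'eli']
```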
Machine Learning
This is a huge space with infinite things to learn. For advanced statistical foundation, see The Elements of Statistical Learning.
- Intro to scikit-learn, SciPy2013 youtube tutorials
- Machine Learning for Hackers ipynb / digital book
- Machine Learning Ng Stanford / Coursera & Stanford CS 229
- Programming Collective Intelligence Book
$46
Visualization
The most persuasive data stories are ones you can see with your own eyes. Make it visual!
Courses
- Data Visualization University of Washington / Slides & Resources
- Rice University's Data Viz class Rice University / Slides
Books
- Envisioning Information Tufte / Book
$36
- Interactive Data Visualization for the Web / Scott Murray Online Book & Book
$50
Linear Programming + Convex Optimization
If you have interest in operations management, manufacturing, supply chains, or other real world queuing problems, dig in here.
- Linear Programming (Math 407) University of Washington / Course
- Convex Optimization / Boyd Stanford / Lectures / Book
Deep Learning / Neural Networks
- Neural Networks Andrej Karpathy / Python Walkthrough
- Neural Networks for Machine Learning Geoffrey Hinton / U Toronto
- Deep Learning for Natural Language Processing CS224d Stanford
Doing Data Science
Learn about how doing science with others and for businesses can work.
What is the job?
In ideal terms, a Data Scientist advises strategic decision-making using data-backed analysis and tested hypotheses. YMMV as this depends on the company needs and the team being supported.
- What Professional Data Scientists Actually Do Video
- The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists Book
$25
- Required reading: Why might machine learning not always be the best approach? Machine Learning: The High-Interest Credit Card of Technical Debt
Communication and Teamwork
For a Data Scientist's work to be impactful, they must be effective at communicating their work and findings. In any setting, clear logic and effective business writing are crucial to reaching your audience. And of course, doing Data Science with a team over zoom is different from being in person in an office. There is much more written communication and asynchronous consumption of content in the remote office environment. More than ever, writing and communication skills are crucial to being an effective Data Scientist for yourself and your team.
- LEADERSHIP LAB: The Craft of Writing Effectively UChicago / Video. Recommend watching this twice and taking notes.
The Data Scientist works in a Team
In the modern organization, it is very rare that a Data Scientist works in isolation. Communicating the value of the work being done is crucial to getting buy-in from partners whose decisions and operations depend on your work. Those partners might be:
- Product Management
- Engineering
- Design (User Experience, Research, Product)
- Operations (Project Management, Customer Service Agents, Data Management)
- Marketing
- Finance Operations
- etc.
Typically, the more clearly you are able to communicate the "why", the value of what you are doing, the more these teams will be able to support you and your work in conversations you may not be a part of. Even if others don't understand "how" you do your work (which is very important to you and your manager!), they will be able to understand and repeat a well-communicated "why". This is why we write Specs, to get buy-in and allow for questions or input, before the work starts.
The Spec
A document conveying the motives, direction, investment, and expected value of the work.
- Goal / "Why" -- What is the point of this work? What decision is the organization trying to make?
- Impact -- What decisions might be made differently as a result of this work? What is the expected value?
- Data -- What evidence will this draw on?
- Assumptions -- What evidence does not exist? What assumptions are necessary or agreed upon?
- Methods / "How" -- Overview methods expected to be used. Analysis, with what tools? Experimentation, with what methodology?
- Results -- (to be filled in as completed)
Results Presentation
A slide deck or document with the goal of conveying the results of the work and how the findings support an important decision(s).
Best appended to the Spec, and summarized in a slide deck for easy consumption. Depending on the culture of the group, slides or a short document may be easier to look through to understand the results of the work. In the remote work era, think about how your work will be passed around and make sure your "above the fold" is easy to understand and clearly conveys the "why" and results in particular.
Example: A particularly polished presentation of map quality study results showing higher data quality in US maps on OSM than commercially available alternatives. This work a) increased the company's confidence in service reliability and b) enabled the company to decide against buying a commercially available license costing millions of dollars annually.
Capstone Project
Choose a meaningful project or dataset to demonstrate what you've learned.
Pick a dataset that you care about
- The very detailed Data Is Plural
- Collection of datasets
Formulate a Hypothesis & Write a Spec
Review the earlier reading on The Scientific Process. Formulate a clear, concise hypothesis. This is the headline of your Spec; flesh it out from there.
Show your work + Explain why you chose this project
Show the process you used to disprove your hypothesis, preferably in a jupyter notebook. See examples to get a taste of how you can showcase your work.
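One way to put a hypothesis to the test in a notebook is an exact permutation test, sketched below under stated assumptions. The function name `permutation_p_value` and the two small samples are hypothetical; the approach enumerates every relabeling of the data, so it only suits small samples, and other tests may fit your project better.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact one-sided p-value for mean(group_a) - mean(group_b)."""
    pooled = group_a + group_b
    observed = mean(group_a) - mean(group_b)
    extreme = total = 0
    # Try every way of relabeling len(group_a) observations as "group a".
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mean(a) - mean(b) >= observed:
            extreme += 1
    return extreme / total

# Hypothetical measurements for illustration only.
treated = [12, 14, 15]
control = [8, 9, 10]
print(permutation_p_value(treated, control))  # 0.05
```

Here only 1 of the 20 possible relabelings produces a difference as large as the observed one, so chance alone would rarely explain the gap between these two made-up groups.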
Graduate!
- Create a document or github repo showcasing the list of courses and materials you completed. Include your project materials. Also recommended: include a personal statement about why you chose this course of study and what you seek to do with it.
- Award yourself a Credential on LinkedIn from The Open Source Data Science Masters, with a link to the documentation you created.
- Congratulations!
So Extra "Extracurriculars"
- The Elements of Statistical Learning / Stanford Digital & Book
$90
& Study Group
- Python Data Science Handbook: Essential Tools for Working with Data Book
$60
- The Manga Guide to Linear Algebra Book
$19
- Mining The Social Web Book
$46
- The Truthful Art: Data, Charts, and Maps for Communication Cairo / Book
$50
- Exploratory Data Analysis Tukey / Book
$81
$113
- Mining Massive Data Sets / Stanford Course & Digital Textbook & Book
$58
- Introduction to Information Retrieval / Stanford Digital & Book
$70
- Probabilistic Graphical Models Stanford / Coursera
- Differential Equations in Data Science Python Tutorial
- Algorithm Design, Kleinberg & Tardos Book
$125
- Tidy Data in Python
- Designing, Visualizing and Understanding Deep Neural Networks Berkeley CS294-129
- Python for Data Analysis Book
$55
- Think Python Digital & Book
$45
- The Visual Display of Quantitative Information Tufte / Book
$27
- Information Dashboard Design: Displaying Data for At-a-Glance Monitoring Stephen Few / Book
$29
- D3 Library / Scott Murray Blog / Tutorials
- SQL Tutorials SQLZOO / Tutorials
- Machine Learning Caltech / Edx
- A Course in Machine Learning UMD / Digital Book
- Designing Data Intensive Applications Book
$56
Take Two
Change Log
- Restructured à la the 2022 Plan.
- Pruned broken links. It's been a while, and some of these resources have moved -- or worse -- been taken down.
- Pared down links to a more opinionated list.
- Proceeds. Bookshop.org links for all books, which supports independent bookshops with commissions. Since the first commits in 2014, I have donated any related commissions to Planned Parenthood, which was one of the few healthcare providers in my community growing up and is the largest single provider of reproductive health services in the US. Though donations should flow to independent bookshops from now on, my personal commitment to PP remains.
Please Contribute; this is Open Source!
Fearless Maintainer: @clarecorthell
RIP v1.0 commit