datawhalechina/pumpkin-book

Detailed explanations of the formulas in "Machine Learning" (the "Watermelon Book")

Top Related Projects

ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

handson-ml2: A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

stanford-cs-229-machine-learning: VIP cheatsheets for Stanford's CS 229 Machine Learning

python-machine-learning-book-3rd-edition: The "Python Machine Learning (3rd edition)" book code repository

pml-book: "Probabilistic Machine Learning" - a book series by Kevin Murphy

d2l-en: Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

Quick Overview

The pumpkin-book repository is a community-driven project that provides detailed derivations and explanations for the formulas in the "Machine Learning" textbook by Zhou Zhihua. It aims to help readers better understand the theoretical foundations of machine learning algorithms and techniques.
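Here is an example of the kind of intermediate step the project spells out. It was written for this overview and is not quoted from pumpkin-book. It shows how the negative log-likelihood of logistic regression, which the NumPy snippets in the comparisons below implement, reduces to a simpler form. The notation is an assumption of this example: z = w^T x + b and σ(z) = 1/(1 + e^{-z}).

```latex
\begin{aligned}
\ln \sigma(z) &= \ln \frac{e^{z}}{1+e^{z}} = z - \ln\left(1+e^{z}\right),\\
\ln\left(1-\sigma(z)\right) &= \ln \frac{1}{1+e^{z}} = -\ln\left(1+e^{z}\right),\\
-\left[y\ln\sigma(z) + (1-y)\ln\left(1-\sigma(z)\right)\right]
  &= -yz + y\ln\left(1+e^{z}\right) + (1-y)\ln\left(1+e^{z}\right)\\
  &= -yz + \ln\left(1+e^{z}\right).
\end{aligned}
```

Summed over the training samples, the last line matches the form in which the Watermelon Book writes the logistic-regression objective. The book folds the bias into the weights with β = (w; b) and x̂ = (x; 1), so that z = β^T x̂.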

Pros

  • Offers in-depth explanations of complex machine learning concepts
  • Collaborative effort with contributions from multiple community members
  • Regularly updated with new content and improvements
  • Free and open-source resource for machine learning enthusiasts and students

Cons

  • Content is primarily in Chinese, which may limit accessibility for non-Chinese speakers
  • Focuses on a specific textbook, potentially limiting its applicability to other learning resources
  • May require a strong mathematical background to fully understand some derivations
  • Lacks interactive elements or code implementations of the discussed algorithms

Note: pumpkin-book is a collection of explanations and derivations rather than a code library, so this overview has no code example or getting-started section.

Competitor Comparisons

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

Pros of ML-For-Beginners

  • Comprehensive curriculum covering various ML topics
  • Hands-on approach with practical exercises and projects
  • Available in multiple languages, making it accessible to a wider audience

Cons of ML-For-Beginners

  • May be too basic for advanced learners
  • Focuses more on breadth than depth in some topics
  • Limited coverage of deep learning concepts

Code Comparison

ML-For-Beginners:

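from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X, y = load_iris(return_X_y=True)  # example dataset (assumption); any feature matrix X and labels y work
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)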

pumpkin-book:

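import numpy as np
def sigmoid(x):
    # Logistic function: maps any real number into (0, 1)
    return 1 / (1 + np.exp(-x))
def logistic_loss(w, X, y):
    # Negative log-likelihood of logistic regression, summed over samples
    p = sigmoid(X.dot(w))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))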

The ML-For-Beginners repository takes a practical, hands-on approach, with code examples built on popular libraries such as scikit-learn. In contrast, pumpkin-book contains no code. It derives the underlying formulas step by step. The NumPy snippet above is an illustrative from-scratch translation of one such formula, the logistic-regression loss, and shows the mathematics that library calls hide.
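Written as above, the summed loss hits log(0) when a logit is large. A standard numerical remedy uses the identity -[y ln σ(z) + (1-y) ln(1-σ(z))] = ln(1+e^z) - yz. This is a common trick, not something the comparison page or pumpkin-book prescribes. The sketch below uses hypothetical function names, logistic_loss_naive and logistic_loss_stable. It checks that both forms agree on ordinary inputs and shows where they diverge:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logistic_loss_naive(w, X, y):
    # Direct transcription of the summed negative log-likelihood
    p = sigmoid(X.dot(w))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def logistic_loss_stable(w, X, y):
    # Equivalent form: sum over samples of ln(1 + e^z) - y*z, with z = Xw.
    # np.logaddexp(0, z) evaluates ln(e^0 + e^z) without overflow or log(0).
    z = X.dot(w)
    return np.sum(np.logaddexp(0, z) - y * z)

# Ordinary inputs: both forms agree up to floating-point error
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
w = rng.normal(size=3)
print(np.isclose(logistic_loss_naive(w, X, y), logistic_loss_stable(w, X, y)))  # True

# A large logit (z = 100) with label 0: sigmoid rounds to exactly 1.0,
# so the naive form evaluates log(0) and returns inf
X_big, y_big, w_big = np.array([[1.0]]), np.array([0.0]), np.array([100.0])
with np.errstate(divide="ignore"):
    naive_big = logistic_loss_naive(w_big, X_big, y_big)
stable_big = logistic_loss_stable(w_big, X_big, y_big)
print(naive_big, stable_big)  # inf 100.0
```

The stable form also matches the simplified objective ln(1+e^z) - yz, so it is both cheaper and safer to evaluate.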

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Pros of handson-ml2

  • More comprehensive coverage of machine learning topics, including deep learning
  • Includes Jupyter notebooks with interactive code examples
  • Regularly updated with new content and improvements

Cons of handson-ml2

  • Primarily in English, which may be a barrier for non-English speakers
  • Focuses on practical implementation rather than theoretical foundations

Code Comparison

handson-ml2:

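from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # example dataset (assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(X_train, y_train)
y_pred = rf_clf.predict(X_test)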

pumpkin-book:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logistic_loss(w, X, y):
    # Mean (rather than summed) negative log-likelihood over the m samples
    p = sigmoid(X.dot(w))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

The handson-ml2 repository provides practical examples built on popular libraries such as scikit-learn. pumpkin-book instead concentrates on the theory behind such models. The NumPy loss above is an illustrative from-scratch rendering of a formula of the kind it derives, not code from the repository.
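Working from scratch also means deriving the gradient by hand. Differentiating the mean loss above with respect to w gives X^T(σ(Xw) − y)/m. The sketch below compares that analytic gradient with central finite differences. The helper names logistic_grad and numerical_grad are hypothetical, and the random data exists only for the check:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logistic_loss(w, X, y):
    # Mean negative log-likelihood, as in the snippet above
    p = sigmoid(X.dot(w))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def logistic_grad(w, X, y):
    # Analytic gradient of the mean loss: X^T (sigmoid(Xw) - y) / m
    m = X.shape[0]
    return X.T.dot(sigmoid(X.dot(w)) - y) / m

def numerical_grad(f, w, eps=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = eps
        g[j] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = (rng.random(100) < 0.5).astype(float)
w = rng.normal(size=4)

analytic = logistic_grad(w, X, y)
numeric = numerical_grad(lambda v: logistic_loss(v, X, y), w)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

The per-sample derivative of the loss with respect to the logit is simply σ(z) − y. This is why the gradient takes such a compact form.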

VIP cheatsheets for Stanford's CS 229 Machine Learning

Pros of stanford-cs-229-machine-learning

  • Offers concise cheatsheets for quick reference
  • Covers a wide range of machine learning topics
  • Available in multiple languages, making it accessible to a global audience

Cons of stanford-cs-229-machine-learning

  • Lacks detailed explanations and proofs
  • May not be suitable for beginners without prior machine learning knowledge
  • Limited code examples and practical implementations

Code Comparison

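stanford-cs-229-machine-learning does not provide extensive code examples. Instead, it condenses concepts and formulas into cheatsheets. pumpkin-book likewise centers on formulas rather than code. The snippet below is therefore an illustrative NumPy rendering of logistic regression trained by gradient descent, not code taken from the repository: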

pumpkin-book example (Python):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logistic_regression(X, y, learning_rate, num_iterations):
    # Fit logistic-regression weights by batch gradient descent
    m, n = X.shape
    theta = np.zeros(n)

    for _ in range(num_iterations):
        h = sigmoid(np.dot(X, theta))          # predicted probabilities
        gradient = np.dot(X.T, (h - y)) / m    # gradient of the mean log loss
        theta -= learning_rate * gradient      # step against the gradient

    return theta

This snippet implements logistic regression with batch gradient descent. The stanford-cs-229-machine-learning repository covers the model only as condensed formulas on its cheatsheets, without a runnable implementation.
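As a usage sketch, the function can be run on synthetic, linearly separable data. The data, the appended intercept column, and the hyperparameters (learning_rate=0.5, num_iterations=1000) are assumptions chosen for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logistic_regression(X, y, learning_rate, num_iterations):
    # Batch gradient descent, as in the snippet above
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iterations):
        h = sigmoid(np.dot(X, theta))
        gradient = np.dot(X.T, (h - y)) / m
        theta -= learning_rate * gradient
    return theta

# Synthetic, linearly separable data (assumption for illustration)
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 2))
y = (X_raw @ np.array([2.0, -3.0]) + 0.5 > 0).astype(float)

# Append a column of ones so the last entry of theta acts as the intercept
X = np.hstack([X_raw, np.ones((200, 1))])

theta = logistic_regression(X, y, learning_rate=0.5, num_iterations=1000)
pred = (sigmoid(X @ theta) >= 0.5).astype(float)  # threshold probabilities at 0.5
accuracy = np.mean(pred == y)
print(accuracy)
```

The intercept is handled by the constant feature rather than a separate bias term, so the function itself needs no changes.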

The "Python Machine Learning (3rd edition)" book code repository

Pros of python-machine-learning-book-3rd-edition

  • Comprehensive coverage of machine learning concepts with practical Python implementations
  • Regularly updated with new content and code examples
  • Extensive documentation and explanations for each code snippet

Cons of python-machine-learning-book-3rd-edition

  • Primarily focused on Python, which may not be suitable for users of other programming languages
  • More complex and advanced topics might be challenging for beginners

Code Comparison

python-machine-learning-book-3rd-edition:

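from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)  # example dataset (assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
sc = StandardScaler().fit(X_train)  # standardize features using training statistics only
model = LogisticRegression().fit(sc.transform(X_train), y_train)
print(accuracy_score(y_test, model.predict(sc.transform(X_test))))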

pumpkin-book:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logistic_loss(X, y, w, b):
    # Mean negative log-likelihood with weights w and a separate bias b
    m = X.shape[0]
    z = np.dot(X, w) + b
    p = sigmoid(z)
    loss = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) / m
    return loss

The snippets show the difference in approach between the two repositories. python-machine-learning-book-3rd-edition uses scikit-learn for machine learning tasks. pumpkin-book works through the underlying mathematics, which the from-scratch NumPy function above mirrors.
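The explicit bias b can also be folded into the weight vector by appending a constant feature of 1. This is the augmented notation, β = (w; b) and x̂ = (x; 1), that the Watermelon Book uses in its chapter on linear models. The sketch below uses a hypothetical logistic_loss_augmented helper to check that both parameterizations give the same loss:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logistic_loss(X, y, w, b):
    # Explicit-bias version, as in the snippet above
    m = X.shape[0]
    z = np.dot(X, w) + b
    return -np.sum(y * np.log(sigmoid(z)) + (1 - y) * np.log(1 - sigmoid(z))) / m

def logistic_loss_augmented(X_hat, y, beta):
    # Bias folded into the weights: X_hat = [X, 1], beta = [w; b]
    z = X_hat @ beta
    return -np.mean(y * np.log(sigmoid(z)) + (1 - y) * np.log(1 - sigmoid(z)))

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = (rng.random(30) < 0.5).astype(float)
w, b = rng.normal(size=3), 0.7

X_hat = np.hstack([X, np.ones((X.shape[0], 1))])  # append a column of ones
beta = np.append(w, b)                            # stack w and b into one vector

loss_explicit = logistic_loss(X, y, w, b)
loss_augmented = logistic_loss_augmented(X_hat, y, beta)
print(np.isclose(loss_explicit, loss_augmented))  # True
```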

"Probabilistic Machine Learning" - a book series by Kevin Murphy

Pros of pml-book

  • More comprehensive coverage of probabilistic machine learning topics
  • Includes Jupyter notebooks with code examples and interactive visualizations
  • Regularly updated with new content and improvements

Cons of pml-book

  • Larger repository size, which may take longer to clone and navigate
  • More complex structure, potentially making it harder for beginners to follow
  • Code is primarily in Python, which may not suit users of other programming languages

Code Comparison

pml-book:

import numpy as np
import matplotlib.pyplot as plt

def plot_gaussian(mu, sigma):
    # Evaluate the normal density over mu ± 3 sigma and plot it
    x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
    y = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    plt.plot(x, y)

pumpkin-book:

import numpy as np

def gaussian_pdf(x, mu, sigma):
    # Density of the normal distribution N(mu, sigma^2) evaluated at x
    return 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))

The pml-book example includes visualization, while the pumpkin-book snippet evaluates only the core mathematical function. pml-book tends to provide more complete, application-ready code, whereas pumpkin-book concentrates on fundamental formulas such as this density.
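A quick sanity check on gaussian_pdf is that it integrates to 1 and peaks at 1/(σ√(2π)) when x = μ. The sketch below approximates the integral with a simple Riemann sum over μ ± 10σ, a range chosen to be ample. The values of mu and sigma are arbitrary example parameters:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))

mu, sigma = 1.5, 0.8  # example parameters (assumption)
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
dx = x[1] - x[0]

# Riemann-sum approximation of the integral over the real line
total = np.sum(gaussian_pdf(x, mu, sigma)) * dx
print(round(total, 6))  # 1.0

# The density peaks at x = mu with height 1 / (sigma * sqrt(2 * pi))
peak = gaussian_pdf(mu, mu, sigma)
print(np.isclose(peak, 1 / (sigma * np.sqrt(2 * np.pi))))  # True
```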

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

Pros of d2l-en

  • Comprehensive coverage of deep learning topics with interactive code examples
  • Multi-framework support (PyTorch, TensorFlow, and MXNet)
  • Available in multiple languages and formats (web, PDF, print book)

Cons of d2l-en

  • Steeper learning curve for beginners due to its depth and breadth
  • Larger repository size, which may impact download and setup time

Code Comparison

d2l-en example (PyTorch):

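import torch
from torch import nn

# Two-layer MLP: 4 inputs -> 8 hidden units (ReLU) -> 1 output
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))  # batch of 2 random samples
net(X)  # output tensor of shape (2, 1)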

pumpkin-book example (NumPy):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# A single sigmoid unit: 4 inputs -> 1 output for a batch of 2 samples
X = np.random.randn(2, 4)
W = np.random.randn(4, 1)
b = np.random.randn(1)
y = sigmoid(np.dot(X, W) + b)  # shape (2, 1)

The d2l-en example uses PyTorch's high-level neural network modules, while the pumpkin-book example computes a single sigmoid unit by hand with NumPy. This reflects the two repositories' different focus. d2l-en takes a practical, framework-oriented approach, and pumpkin-book offers a theoretical, foundational perspective.
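The same 4 → 8 → 1 architecture with a ReLU hidden layer can also be written directly in NumPy, which mirrors the d2l-en network more closely. This is a sketch with the following assumptions:

- The parameter dictionary and the function names are hypothetical.
- The small Gaussian initialization is chosen for illustration; PyTorch's nn.Linear uses a different default initialization.
- Weights are stored as (in, out), so each layer computes X @ W + b. By contrast, nn.Linear stores its weight as (out, in) and computes x @ W.T + b.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def mlp_forward(X, params):
    # Hidden layer: affine map 4 -> 8 followed by ReLU
    H = relu(X @ params["W1"] + params["b1"])
    # Output layer: affine map 8 -> 1 with no activation, like the final nn.Linear
    return H @ params["W2"] + params["b2"]

rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(scale=0.1, size=(4, 8)),
    "b1": np.zeros(8),
    "W2": rng.normal(scale=0.1, size=(8, 1)),
    "b2": np.zeros(1),
}
X = rng.random((2, 4))  # analogous to torch.rand(size=(2, 4))
out = mlp_forward(X, params)
print(out.shape)  # (2, 1)
```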

README

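“Professor Zhou Zhihua's 'Machine Learning' (the 'Watermelon Book') is one of the classic introductory textbooks in the field of machine learning. To give as many readers as possible an understanding of machine learning through the Watermelon Book, Professor Zhou left out the derivation details of some formulas. This may be 'less friendly' to readers who want to dig into those details. This book aims to explain the harder-to-understand formulas in the Watermelon Book and to supply detailed derivations for some of them.”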

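Readers may wonder why the paragraph above is in quotation marks. That paragraph was only our initial guess. We later learned the real reason Professor Zhou omitted these derivation details: he believes that “a second-semester sophomore in science or engineering with a reasonably solid math foundation should have no trouble with the derivations in the Watermelon Book; the key points are all in the book, and the omitted details should be easy to fill in mentally or through exercises.” So... this Pumpkin Book is really just the notes we math strugglers took while teaching ourselves. We hope it helps everyone become a qualified “second-semester sophomore in science or engineering with a reasonably solid math foundation.”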

How to Use This Book

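  • All content in the Pumpkin Book assumes the Watermelon Book as prerequisite knowledge. The best way to use the Pumpkin Book is therefore to follow the Watermelon Book as your main thread and consult the Pumpkin Book only when you meet a formula you cannot derive or understand on your own;
  • If you are new to machine learning, we **strongly advise against digging deep** into the formulas in Chapters 1 and 2 of the Watermelon Book. Just skim them; there will be plenty of time to come back and tackle them once you have learned enough to feel a little overconfident;
  • We strive to explain and derive every formula from the perspective of undergraduate-level mathematics. Mathematics beyond that level is usually given in appendices and references, and interested readers can follow those materials to study further;
  • If the Pumpkin Book does not cover a formula you want to look up, or you find an error in it, please don't hesitate to give feedback in our GitHub Issues (URL: https://github.com/datawhalechina/pumpkin-book/issues). Submit the number of the formula you would like added, or the erratum, in the corresponding section. We usually reply within 24 hours; if you have not received a reply after 24 hours, you can contact us on WeChat (WeChat ID: at-Sm1les).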

Companion Resources

Video tutorial: https://www.bilibili.com/video/BV1Mh411e7VU

Study group: https://www.datawhale.cn/learn/summary/2

Read online: https://www.datawhale.cn/learn/summary/2

PDF download: https://github.com/datawhalechina/pumpkin-book/releases

Print Edition

Where to buy: JD.com | Dangdang | Tmall

Errata: https://datawhalechina.github.io/pumpkin-book/#/errata

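Differences Between the Print and Open-Source Editions

The open-source edition is the first draft of the full book that we sent to the publisher. The editors at Posts & Telecom Press (人民邮电出版社) revised the draft repeatedly, and the result became the print book. We sincerely thank the editors at Posts & Telecom Press for their careful and rigorous work! (Attached: proofreading samples)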

Corresponding Edition of the Watermelon Book

Edition: 1st edition, January 2016

Errata: http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/MLbook2016.htm

Editorial Board

Role | Members
Editors-in-chief | @Sm1les @archwalker @jbb0523
Editorial board members | @juxiao @Majingmin @MrBigFan @shanry @Ye980226

Cover Design

Concept | Artwork
@Sm1les | 林王茂盛

Acknowledgments

Special thanks to @awyd234, @feijuan, @Ggmatch, @Heitao5200, @xhqing, @LongJH, @LilRachel, @LeoLRH, @Nono17, @spareribs, @sunchaothu, and @StevenLzq for their contributions to the Pumpkin Book in its earliest days.

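Follow Us

Scan the QR code below to follow the WeChat official account Datawhale. Then send the message “南瓜书” (“Pumpkin Book”) to receive instructions for joining the Pumpkin Book readers' discussion group.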

LICENSE

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.