
apachecn / ailearning

AiLearning: Data Analysis + Machine Learning in Action + Linear Algebra + PyTorch + NLTK + TF2


Top Related Projects

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

The "Python Machine Learning (1st edition)" book code repository and info resource

TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

📺 Discover the latest machine learning / AI courses on YouTube.

Quick Overview

The apachecn/ailearning GitHub repository is a comprehensive collection of machine learning and artificial intelligence resources, including tutorials, code examples, and reference materials. It serves as a valuable resource for both beginners and experienced practitioners in the field of AI and machine learning.

Pros

  • Comprehensive Content: The repository covers a wide range of topics, from fundamental machine learning concepts to advanced techniques and applications.
  • Multilingual Support: The materials are available in multiple languages, including English, Chinese, and others, making it accessible to a global audience.
  • Active Community: The project has a vibrant community of contributors, ensuring regular updates and improvements to the content.
  • Practical Examples: The repository includes numerous code examples and hands-on tutorials, allowing learners to apply the concepts they've learned.

Cons

  • Uneven Quality: As the content is contributed by a large community, the quality and depth of the materials may vary across different sections.
  • Lack of Structured Curriculum: The repository is organized as a collection of resources, rather than a structured curriculum, which may make it challenging for beginners to navigate.
  • Potential Outdated Content: Given the rapid pace of advancements in AI and machine learning, some of the content may become outdated over time.
  • Language Barriers: While the materials are available in multiple languages, learners who are not proficient in the available languages may face difficulties.

Code Examples

The apachecn/ailearning repository contains a wide range of code examples and tutorials covering various machine learning and AI topics. Here are a few examples:

  1. Linear Regression:
import numpy as np
from sklearn.linear_model import LinearRegression

# Generate sample data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([5, 8, 9, 11])

# Create and train the linear regression model
model = LinearRegression()
model.fit(X, y)

# Make a prediction
print(model.predict([[3, 5]]))

This code demonstrates the use of the LinearRegression model from the scikit-learn library to perform linear regression on a simple dataset.

  2. K-Means Clustering:
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Generate sample data
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

# Create and train the K-Means model
model = KMeans(n_clusters=2)
model.fit(X)

# Visualize the clustering results
plt.scatter(X[:, 0], X[:, 1], c=model.labels_, cmap='viridis')
plt.scatter(model.cluster_centers_[:, 0], model.cluster_centers_[:, 1], color='red')
plt.show()

This code demonstrates the use of the KMeans model from the scikit-learn library to perform K-Means clustering on a simple 2D dataset and visualize the results.

  3. Convolutional Neural Network (CNN) for Image Classification:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Load and preprocess the dataset
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 28, 28, 1) / 255.0
X_test = X_test.reshape(-1, 28, 28, 1) / 255.0

# Create the CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile and train the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))

This code demonstrates how to build, compile, and train a simple convolutional neural network with Keras to classify handwritten digits from the MNIST dataset.

Competitor Comparisons

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

Pros of ML-For-Beginners

  • More structured curriculum with clear learning paths
  • Extensive documentation and explanations for each concept
  • Multi-language support for code examples

Cons of ML-For-Beginners

  • Less focus on advanced topics and cutting-edge techniques
  • Fewer practical projects and real-world applications

Code Comparison

ML-For-Beginners:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

ailearning:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

Both repositories use similar code for splitting datasets, with minor differences in parameters.

Summary

ML-For-Beginners offers a more structured approach to learning machine learning, with clear documentation and multi-language support. However, it may lack depth in advanced topics. ailearning provides a broader range of topics and practical applications but may be less organized for beginners. Both repositories use similar code structures for common machine learning tasks.

⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.

Pros of handson-ml

  • More comprehensive coverage of machine learning topics
  • Better organized with clear chapter structure
  • Includes Jupyter notebooks for interactive learning

Cons of handson-ml

  • Less focus on deep learning and neural networks
  • Fewer practical examples for real-world applications

Code Comparison

handson-ml:

from sklearn.ensemble import RandomForestClassifier

forest_clf = RandomForestClassifier(n_estimators=100, random_state=42)
forest_clf.fit(X_train, y_train)
y_pred = forest_clf.predict(X_test)

ailearning:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

Summary

handson-ml provides a more structured approach to learning machine learning concepts, with a focus on scikit-learn and traditional ML algorithms. ailearning offers a broader range of topics, including deep learning and neural networks, using frameworks like PyTorch. While handson-ml excels in organization and clarity, ailearning provides more diverse and advanced examples for those interested in cutting-edge AI techniques.

The "Python Machine Learning (1st edition)" book code repository and info resource

Pros of python-machine-learning-book

  • More focused on machine learning concepts and implementations
  • Provides comprehensive code examples and explanations
  • Regularly updated with new content and improvements

Cons of python-machine-learning-book

  • Limited coverage of deep learning and neural networks
  • Less diverse range of AI topics compared to ailearning
  • Primarily in English, which may limit accessibility for non-English speakers

Code Comparison

python-machine-learning-book:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

ailearning:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Both repositories provide code examples for machine learning tasks, but python-machine-learning-book tends to offer more detailed explanations and context for each code snippet. The ailearning repository covers a broader range of AI topics and includes content in multiple languages, making it more accessible to a diverse audience. However, python-machine-learning-book is more focused on machine learning specifically and provides a more structured learning path for this subject.

TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)

Pros of TensorFlow-Examples

  • More focused on TensorFlow-specific examples and tutorials
  • Cleaner, more organized repository structure
  • Regularly updated with newer TensorFlow versions and features

Cons of TensorFlow-Examples

  • Limited to TensorFlow framework only
  • Less comprehensive coverage of general AI/ML concepts
  • Fewer explanations and theoretical background

Code Comparison

TensorFlow-Examples:

import tensorflow as tf

# Create a constant tensor
hello = tf.constant('Hello, TensorFlow!')

# Start a TensorFlow session
sess = tf.Session()

# Run the op
print(sess.run(hello))

AILearning:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load and prepare data
data = pd.read_csv('data.csv')
X = data[['feature1', 'feature2']]
y = data['target']

# Split data and train model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression().fit(X_train, y_train)

The code comparison shows that TensorFlow-Examples focuses on TensorFlow-specific code, while AILearning covers a broader range of libraries and techniques in machine learning.

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

Pros of ML-From-Scratch

  • Focuses on implementing machine learning algorithms from scratch, providing a deeper understanding of the underlying mechanics
  • Clear and concise Python implementations with minimal dependencies
  • Includes a wide range of algorithms, from basic to advanced

Cons of ML-From-Scratch

  • Less comprehensive in terms of overall AI/ML topics compared to AILearning
  • Lacks extensive documentation and explanations for each algorithm
  • May not cover the latest cutting-edge techniques in the field

Code Comparison

ML-From-Scratch (Linear Regression implementation):

class LinearRegression(Regression):
    def fit(self, X, y):
        X = np.insert(X, 0, 1, axis=1)
        self.w = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

AILearning (Linear Regression implementation):

def fit_normal(X, y):
    X = np.insert(X, 0, 1, axis=1)
    w = np.linalg.inv(X.T @ X) @ X.T @ y
    return w

Both repositories provide implementations of machine learning algorithms, but ML-From-Scratch focuses more on building algorithms from the ground up, while AILearning offers a broader range of AI and machine learning topics with more extensive documentation and resources.

📺 Discover the latest machine learning / AI courses on YouTube.

Pros of ML-YouTube-Courses

  • Curated list of high-quality, free ML courses from YouTube
  • Organized by topics and skill levels for easy navigation
  • Regularly updated with new content and community contributions

Cons of ML-YouTube-Courses

  • Limited to video content only, lacking hands-on exercises or projects
  • May not cover all AI/ML topics as comprehensively as ailearning
  • Dependent on external YouTube links, which may become unavailable

Code Comparison

ML-YouTube-Courses doesn't contain code samples, while ailearning includes practical examples. Here's a snippet from ailearning:

# Example from ailearning
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

ML-YouTube-Courses focuses on organizing and presenting course information:

## Machine Learning

### Beginner
- [Machine Learning — Andrew Ng, Stanford University](https://www.youtube.com/playlist?list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN)

Both repositories serve different purposes: ML-YouTube-Courses as a curated list of video resources, and ailearning as a comprehensive AI/ML learning platform with code examples and explanations.


README


AI learning

License: CC BY-NC-SA 4.0

"Once a new technology rolls over you, if you're not part of the steamroller, you're part of the road." — Stewart Brand

Roadmap

Supplementary Materials

1. Machine Learning - Basics

Supported Versions

| Version | Supported |
| --- | --- |
| 3.6.x | ❌ |
| 2.7.x | ✅ |

Notes:

  • Machine Learning in Action: for learning purposes only, please use Python 2.7.x (the 3.6.x version contains only partial modifications)

Basic Introduction

Learning Documents

| Module | Chapter | Type | Maintainer (GitHub) | QQ |
| --- | --- | --- | --- | --- |
| Machine Learning in Action | Chapter 1: Machine Learning Basics | Introduction | @毛红动 | 1306014226 |
| Machine Learning in Action | Chapter 2: KNN (k-Nearest Neighbors) | Classification | @尤永江 | 279393323 |
| Machine Learning in Action | Chapter 3: Decision Trees | Classification | @景涛 | 844300439 |
| Machine Learning in Action | Chapter 4: Naive Bayes | Classification | @wnma3mz, @分析 | 1003324213, 244970749 |
| Machine Learning in Action | Chapter 5: Logistic Regression | Classification | @微光同尘 | 529925688 |
| Machine Learning in Action | Chapter 6: SVM (Support Vector Machines) | Classification | @王德红 | 934969547 |
| Combined online content | Chapter 7: Ensemble Methods (Random Forests and AdaBoost) | Classification | @片刻 | 529815144 |
| Machine Learning in Action | Chapter 8: Regression | Regression | @微光同尘 | 529925688 |
| Machine Learning in Action | Chapter 9: Tree Regression | Regression | @微光同尘 | 529925688 |
| Machine Learning in Action | Chapter 10: K-Means Clustering | Clustering | @徐昭清 | 827106588 |
| Machine Learning in Action | Chapter 11: Association Analysis with the Apriori Algorithm | Frequent itemsets | @刘海飞 | 1049498972 |
| Machine Learning in Action | Chapter 12: Efficiently Finding Frequent Itemsets with FP-growth | Frequent itemsets | @程威 | 842725815 |
| Machine Learning in Action | Chapter 13: Simplifying Data with PCA | Tools | @廖立娟 | 835670618 |
| Machine Learning in Action | Chapter 14: Simplifying Data with SVD | Tools | @张俊皓 | 714974242 |
| Machine Learning in Action | Chapter 15: Big Data and MapReduce | Tools | @wnma3mz | 1003324213 |
| ML Projects in Action | Chapter 16: Recommender Systems (migrated) | Project | Recommender Systems (post-migration location) | |
| Phase 1 Summary | 2017-04-08: Phase 1 summary | Summary | Summary | 529815144 |

Online Videos

Zhihu Q&A - 爆炸啦 - How should I get started with machine learning?

Of course I know the very first sentence will get roasted, because people with formal academic training will spit in disdain, call me an idiot, and point to Andrew Ng's videos.

I also know there are people who simply cannot follow Andrew Ng's videos: the mysterious math derivations, the English-language lectures delivered with that enigmatic smile. Haven't I been down the same road? My heart probably aches more than yours, because I have bookmarked more than ten "Machine Learning" video series online, plus home-grown Chinese tutorials (7月, 小象, and so on), and I still struggled to follow them, until one day a senior algorithm analyst at Baidu recommended: "Machine Learning in Action is pretty good, easy to understand, why don't you give it a try?"

I gave it a try. Luckily my Python fundamentals and debugging skills were decent; I stepped through basically all of the code, and a lot of the lofty "theory + derivations" turned into a handful of "additions, subtractions, multiplications, divisions, and loops" before my eyes. Isn't that exactly the kind of beginner tutorial a programmer like me wants?

Many programmers say machine learning is damn hard to learn. Yes, it really is damn hard. I think the hardest part is this: no other author is willing, like the author of "Machine Learning in Action", to explain it from a programmer's coding perspective!!

Over the last few days the GitHub repo gained 300 stars and 200 people joined the group, and the numbers keep climbing. I suspect we all feel the same way!

Many aspiring beginners are talked into bookmarking, bookmarking, and bookmarking again, yet end up learning nothing, becoming "resource hoarders". Maybe what newcomers really need is a MachineLearning learning roadmap. Well, I can give you one, because we also recorded our learning process on video. Our level is of course limited, but for getting started it is absolutely fine. If you still can't learn from it, then I lose!!

How should I watch the videos?

  1. Formally trained in theory: go study Andrew Ng's videos (Ng's videos are absolutely authoritative, no question about it)
  2. Strong coding skills: watch our "Machine Learning in Action - Teaching Edition"
  3. Weaker coding skills: watch our "Machine Learning in Action - Discussion Edition"; for the theory, though, watch the theory parts of the Teaching Edition. The Discussion Edition rambles a lot, but it explains the code line by line, so combine the two freely according to your own needs.

[Free] Math Teaching Videos - Khan Academy, Introductory

| Probability | Statistics | Linear Algebra |
| --- | --- | --- |
| Khan Academy (Probability) | Khan Academy (Statistics) | Khan Academy (Linear Algebra) |

Machine Learning Videos - ApacheCN Teaching Edition

| AcFun | Bilibili |
| --- | --- |
| Youku | NetEase Cloud Classroom |

[Free] Machine / Deep Learning Videos - Andrew Ng

| Machine Learning | Deep Learning |
| --- | --- |
| Andrew Ng's Machine Learning | Neural Networks and Deep Learning |

2. Deep Learning

Supported Versions

| Version | Supported |
| --- | --- |
| 3.6.x | ✅ |
| 2.7.x | ❌ |

Getting Started

  1. Backpropagation: https://www.cnblogs.com/charlotte77/p/5629865.html
  2. CNN fundamentals: http://www.cnblogs.com/charlotte77/p/7759802.html
  3. RNN fundamentals: https://blog.csdn.net/qq_39422642/article/details/78676567
  4. LSTM fundamentals: https://blog.csdn.net/weixin_42111770/article/details/80900575

PyTorch - Tutorials

-- To be updated

TensorFlow 2.0 - Tutorials

-- To be updated

Directory structure:

Segmentation (tokenization)

Part-of-speech tagging

Named entity recognition

Syntactic parsing

WordNet, which can be viewed as a thesaurus (a dictionary of synonyms)

Stemming and lemmatization
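
These preprocessing topics map onto standard NLTK workflows. As a minimal sketch for English text (not the repository's own code, and assuming the relevant NLTK data packages have already been downloaded):

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import wordnet

# One-time downloads (package names vary slightly across NLTK versions):
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger'); nltk.download('wordnet')

text = "The cats are running quickly"

tokens = nltk.word_tokenize(text)    # segmentation / tokenization
print(nltk.pos_tag(tokens))          # part-of-speech tagging

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print(stemmer.stem("running"))                    # stemming -> "run"
print(lemmatizer.lemmatize("running", pos="v"))   # lemmatization -> "run"

# WordNet as a thesaurus: collect synonyms of a word
print({lemma.name() for syn in wordnet.synsets("quick") for lemma in syn.lemmas()})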

TensorFlow 2.0 learning resources

3. Natural Language Processing

Supported Versions

| Version | Supported |
| --- | --- |
| 3.6.x | ✅ |
| 2.7.x | ❌ |

Mixed feelings along the way!!!

Since I started learning NLP, I have noticed how different things are in China compared with abroad:
1. Completely opposite attitudes toward resources:
  1) Domestic: conferences seem to be held for prestige and show, with no real substance, just token PPT overviews that are not aimed at the practitioners in the room.
  2) Abroad: it feels as if people genuinely want to push NLP forward, sharing all kinds of substantial material and concrete implementations (especially: Python natural language processing).
2. Paper implementations:
  1) Plenty of fancy papers, yet I still have not seen a decent GitHub project implementing them! (Maybe my search skills are just poor and I never found one.)
  2) I will not give foreign examples, because I cannot read them!
3. Open-source frameworks:
  1) Foreign open-source frameworks: tensorflow/pytorch with documentation + tutorials + videos (officially provided).
  2) Domestic open-source frameworks: well... I honestly cannot name one! Yet the bragging is no less loud than abroad! (Although many Chinese developers contribute to MXNet, it cannot really be counted as a domestic open-source framework. The Chinese tutorial "Dive into Deep Learning" based on MXNet (http://zh.d2l.ai & https://discuss.gluon.ai/t/topic/753) has been recorded and published by 沐神 (Mu Li) and Aston Zhang: documentation + season 1 tutorials + videos.)
Every time I want to dig deeper I have to get past the firewall and search on Google; every time I hear how great Harbin Institute of Technology, iFLYTEK, USTC, Baidu, or Alibaba are, the material still has to be found abroad!
Sometimes it really stings! I honestly look down a little on the domestic technical environment!

Of course, many thanks to the many excellent domestic bloggers, especially for the introductory demos and basic concepts. [My own level is limited for the deeper material, and I could not follow it.]

1. Use Cases (Baidu Open Course)

Part 1: Introduction

Part 2: Machine Translation

Part 3: Discourse Analysis

Part 4: UNIT - Language Understanding and Interaction Technology

Application Areas

Chinese word segmentation:

  • Build a DAG (directed acyclic graph) of candidate segmentations
  • Use dynamic programming to find the maximum-probability path through the DAG, combining the forward and backward directions (forward weighting, backward output)
  • An HMM + Viterbi model trained on an SBME-tagged corpus handles out-of-vocabulary words
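
As a rough illustration of the HMM + Viterbi step (a sketch, not the repository's implementation), the snippet below tags each character with a hidden state B/M/E/S and decodes the most likely tag sequence; the probability tables here are made-up placeholders, whereas a real model estimates them from an SBME-labeled corpus:

import math

# Hidden states for character-level segmentation: Begin, Middle, End, Single
STATES = "BMES"

# Toy log-probability tables (placeholders; real values come from corpus counts)
start_p = {"B": math.log(0.6), "M": -1e9, "E": -1e9, "S": math.log(0.4)}
trans_p = {
    "B": {"M": math.log(0.3), "E": math.log(0.7)},
    "M": {"M": math.log(0.3), "E": math.log(0.7)},
    "E": {"B": math.log(0.5), "S": math.log(0.5)},
    "S": {"B": math.log(0.5), "S": math.log(0.5)},
}

def emit_p(state, char):
    # Placeholder emission probability; a trained model looks this up per state and character
    return math.log(0.25)

def viterbi(sentence):
    # V[t][s]: best log-probability of any tag path ending in state s at position t
    V = [{s: start_p[s] + emit_p(s, sentence[0]) for s in STATES}]
    path = {s: [s] for s in STATES}
    for ch in sentence[1:]:
        V.append({})
        new_path = {}
        for s in STATES:
            prev, score = max(
                ((p, V[-2][p] + trans_p[p].get(s, -1e9) + emit_p(s, ch)) for p in STATES),
                key=lambda x: x[1])
            V[-1][s] = score
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(STATES, key=lambda s: V[-1][s])
    return path[best]

def segment(sentence):
    # Cut the sentence after every E (word end) or S (single-character word) tag
    words, word = [], ""
    for ch, tag in zip(sentence, viterbi(sentence)):
        word += ch
        if tag in "ES":
            words.append(word)
            word = ""
    return words + ([word] if word else [])

print(segment("我爱自然语言处理"))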

1. Text Classification

Text classification is the task of labeling sentences or documents, for example email spam classification and sentiment analysis.

Below are some good beginner text classification datasets.

  1. Reuters Newswire topic classification (Reuters-21578). A collection of news documents that appeared on Reuters in 1987, indexed by category. See also RCV1, RCV2, and TRC2.
  2. IMDB movie review sentiment classification (Stanford). A collection of movie reviews from imdb.com with their positive or negative sentiment.
  3. Newsgroup movie review sentiment classification (Cornell). A collection of movie reviews with their positive or negative sentiment.

For more information, see the post: Datasets for single-label text classification.

Sentiment Analysis

Competition: https://www.kaggle.com/c/word2vec-nlp-tutorial

  • Approach 1 (0.86): WordCount + Naive Bayes
  • Approach 2 (0.94): LDA + classifier (knn / decision tree / logistic regression / svm / xgboost / random forest)
    • a) Decision trees did not work well here; they are not a good fit for these continuous features
    • b) With tuning, 200 topics preserved the information reasonably well (topic modeling)
  • Approach 3 (0.72): word2vec + CNN
    • Honestly: without a good machine you cannot tune a good result (: run away

Model performance is evaluated with AUC.
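
As a rough sketch of approach 1 (word counts + Naive Bayes, scored with AUC) rather than the repository's exact code, and assuming the competition's labeledTrainData.tsv file with review and sentiment columns:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Assumed layout: a tab-separated file with "review" text and a 0/1 "sentiment" label
data = pd.read_csv("labeledTrainData.tsv", sep="\t")

# "WordCount" features: a bag-of-words matrix over the most frequent terms
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(data["review"])
y = data["sentiment"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Naive Bayes classifier over the word counts
clf = MultinomialNB()
clf.fit(X_train, y_train)

# Evaluate with AUC, as described above
scores = clf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, scores))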

2. Language Modeling

Language modeling involves developing a statistical model for predicting the next word in a sentence or the next character in a word. It is a precursor task for speech recognition, machine translation, and similar applications.

Below are some good beginner language modeling datasets.

  1. Project Gutenberg, a collection of free books available as plain text in a variety of languages.
  2. There are also more formal, well-studied corpora, for example: the Brown University Standard Corpus of Present-Day American English, a large sample of English words; and the Google 1 Billion Word Corpus.
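
As a minimal, self-contained sketch of the idea (predicting the next word from counts), here is a toy bigram model; it is not taken from the repository and uses a made-up corpus:

from collections import Counter

# Tiny stand-in corpus; in practice this would be e.g. Project Gutenberg text
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def next_word_probs(word):
    # Maximum-likelihood estimate of P(next | word) from bigram counts (no smoothing)
    return {w2: c / unigram_counts[word] for (w1, w2), c in bigram_counts.items() if w1 == word}

print(next_word_probs("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}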

New Word Discovery

Sentence Similarity

Text Error Correction

  • bi-gram + Levenshtein
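
The bi-gram + Levenshtein recipe pairs a language model with edit distance. As a hedged sketch of just the edit-distance half (not the repository's implementation), using a toy candidate vocabulary:

def levenshtein(a, b):
    # Edit distance between strings a and b; insert/delete/substitute each cost 1
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute ca -> cb
        prev = curr
    return prev[-1]

# Correct a typo by picking the closest word from a small candidate vocabulary
vocabulary = ["machine", "learning", "language", "model"]
print(min(vocabulary, key=lambda w: levenshtein("machin", w)))  # -> "machine"

In a fuller pipeline, bi-gram language-model scores over the surrounding words would then rank candidates that are equally close in edit distance.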

3. Image Captioning

Image captioning is the task of generating a textual description for a given image.

Below are some good beginner image captioning datasets.

  1. Common Objects in Context (COCO). A collection of more than 120 thousand images with descriptions.
  2. Flickr 8K. A collection of 8 thousand described images taken from flickr.com.
  3. Flickr 30K. A collection of 30 thousand described images taken from flickr.com. For more, see the post:

Exploring Image Captioning Datasets, 2016

4. Machine Translation

Machine translation is the task of translating text from one language to another.

Below are some good beginner machine translation datasets.

  1. Aligned Hansards of the 36th Parliament of Canada. Pairs of English and French sentences.
  2. European Parliament Proceedings Parallel Corpus 1996-2011. Sentence pairs across a set of European languages. There are many standard datasets used for the annual machine translation challenges; see:

Statistical Machine Translation

Machine Translation

5. Question Answering

Question answering is a task in which a sentence or text sample is provided, questions are posed about it, and those questions must be answered.

Below are some good beginner question answering datasets.

  1. Stanford Question Answering Dataset (SQuAD). Answering questions about Wikipedia articles.
  2. DeepMind Question Answering Corpus. Answering questions about news articles from the Daily Mail.
  3. Amazon Question and Answer Data. Answering questions about Amazon products. For more information, see the post:

Datasets: How can I get a corpus of question-and-answer websites such as Quora, Yahoo Answers, or Stack Overflow for analyzing answer quality?

6. Speech Recognition

Speech recognition is the task of transforming audio of spoken language into human-readable text.

Below are some good beginner speech recognition datasets.

  1. TIMIT Acoustic-Phonetic Continuous Speech Corpus. Not free, but listed because of its wide use. Spoken American English with transcriptions.
  2. VoxForge. A project to build an open-source database for speech recognition.
  3. LibriSpeech ASR Corpus. A large collection of English audiobooks taken from LibriVox.

7. Document Summarization

Document summarization is the task of creating a short, meaningful description of a larger document.

Below are some good beginner document summarization datasets.

  1. Legal Case Reports Dataset. A collection of 4,000 legal cases and their summaries.
  2. TIPSTER Text Summarization Evaluation Conference Corpus. A collection of nearly 200 documents and their summaries.
  3. AQUAINT Corpus of English News Text. Not free, but widely used. A corpus of news articles. For more information:

Document Understanding Conference (DUC) tasks. Where can I find good datasets for text summarization?

Named Entity Recognition

Text Summarization

Graph Computing [updated gradually]

Knowledge Graphs

  • For knowledge graphs, I only trust SimmerChan's column 【知识图谱-给AI装个大脑】 ("Knowledge Graphs: Giving AI a Brain")
  • Honestly, I learned from this blogger's posts; they are genuinely clear and approachable. I like them a lot, so I am sharing them here and hope you will like them too.

Further Reading

If you want to go deeper, this section lists additional dataset collections.

  1. Text datasets used in Wikipedia research
  2. Datasets: What are the main text corpora used by computational linguists and natural language processing researchers?
  3. Stanford Statistical Natural Language Processing corpora
  4. Alphabetical list of NLP datasets
  5. NLTK corpora
  6. Open data for deep learning on DL4J
  7. NLP datasets
  8. Chinese open datasets: https://bosonnlp.com/dev/resource

References

Acknowledgments

Recently I happened to receive a link forwarded by a group member and found that the project has been highly recognized and enthusiastically promoted by some well-known people. Many thanks:

Sponsor Us

WeChat & Alipay