
bojone/bert4keras

Keras implementation of transformers for humans

Quick Overview

bert4keras is a lightweight reimplementation of BERT and related transformer models in Keras. It aims to provide a flexible, easy-to-use interface for fine-tuning pre-trained models on NLP tasks. Its README lists compatibility with TensorFlow 1.14+ and TensorFlow 2.x, through either standalone Keras or tf.keras.

Pros

  • Easy integration with existing Keras and TensorFlow projects
  • Supports multiple BERT variants (e.g., BERT, RoBERTa, ALBERT)
  • Flexible architecture allowing for custom model modifications
  • Rich set of examples (the online documentation is still under construction)

Cons

  • Primarily focused on Chinese NLP tasks, which may limit its applicability for other languages
  • Requires some familiarity with Keras and TensorFlow
  • May have a steeper learning curve compared to some other BERT implementations

Code Examples

  1. Loading a pre-trained BERT model:
from bert4keras.models import build_transformer_model

config_path = 'bert_config.json'
checkpoint_path = 'bert_model.ckpt'
model = build_transformer_model(config_path, checkpoint_path)
  2. Tokenizing text for BERT input:
from bert4keras.tokenizers import Tokenizer

dict_path = 'vocab.txt'
tokenizer = Tokenizer(dict_path)
tokens = tokenizer.tokenize('Hello, BERT!')
print(tokens)
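  3. Fine-tuning BERT for text classification:
from bert4keras.backend import keras
from bert4keras.models import build_transformer_model
from bert4keras.optimizers import Adam

num_classes = 2  # set to the number of labels in your task

bert = build_transformer_model(config_path, checkpoint_path)
# Classify from the [CLS] vector (position 0) rather than from every token
cls_vector = keras.layers.Lambda(lambda x: x[:, 0])(bert.output)
output = keras.layers.Dense(num_classes, activation='softmax')(cls_vector)
model = keras.models.Model(bert.input, output)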

model.compile(
    loss='categorical_crossentropy',
    optimizer=Adam(2e-5),
    metrics=['accuracy']
)

model.fit(train_generator, steps_per_epoch=1000, epochs=5)
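
The `train_generator` passed to `model.fit` above is not defined in the snippet. As a rough sketch, assuming the data is a list of `(token_ids, label)` pairs, a generator that loops forever and yields fixed-size batches could look like the following. `batch_forever` and the toy data are hypothetical names for illustration and are not part of bert4keras; a real generator would also pad each batch and convert it to arrays.

```python
import random


def batch_forever(samples, batch_size, shuffle=True, seed=0):
    """Yield (inputs, labels) batches indefinitely, reshuffling every epoch."""
    rng = random.Random(seed)
    samples = list(samples)
    while True:
        if shuffle:
            rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            inputs = [token_ids for token_ids, _ in batch]
            labels = [label for _, label in batch]
            yield inputs, labels


# Made-up token ids and labels, purely for demonstration
toy_data = [([101, 7592, 102], 0), ([101, 2088, 102], 1), ([101, 102], 0)]
gen = batch_forever(toy_data, batch_size=2, shuffle=False)
first_inputs, first_labels = next(gen)
```

Because the generator never stops on its own, `steps_per_epoch` is what tells training where one epoch ends.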

Getting Started

To get started with bert4keras, follow these steps:

  1. Install the library:
pip install bert4keras
  2. Download pre-trained BERT weights and configuration files.

  3. Import the necessary modules:

from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer
from bert4keras.snippets import sequence_padding, DataGenerator
  4. Load a pre-trained model and tokenizer:
model = build_transformer_model(config_path, checkpoint_path)
tokenizer = Tokenizer(dict_path)
  5. Prepare your data and fine-tune the model for your specific NLP task.
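
Preparing data usually means bringing every sequence in a batch to the same length. The imports above pull in `sequence_padding` from `bert4keras.snippets` for this purpose; the pure-Python stand-in below, `pad_batch`, is a hypothetical sketch of the idea and not the library's implementation.

```python
def pad_batch(sequences, length=None, value=0):
    """Right-pad (or truncate) each sequence to a common length."""
    if length is None:
        # Default to the longest sequence in the batch
        length = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        seq = list(seq)[:length]
        padded.append(seq + [value] * (length - len(seq)))
    return padded


# Made-up token ids, purely for demonstration
batch = pad_batch([[101, 7592, 102], [101, 102]])
```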

Competitor Comparisons

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of Transformers

  • Extensive model support: Covers a wide range of transformer-based models
  • Active community and frequent updates
  • Comprehensive documentation and tutorials

Cons of Transformers

  • Steeper learning curve for beginners
  • Larger library size and potentially slower import times

Code Comparison

Transformers

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

bert4keras

from bert4keras.tokenizers import Tokenizer
from bert4keras.models import build_transformer_model

tokenizer = Tokenizer(dict_path)
model = build_transformer_model(config_path, checkpoint_path)

Summary

Transformers offers a more comprehensive solution with broader model support and active community involvement. However, it may be more complex for beginners. bert4keras provides a simpler, more focused approach specifically for BERT-like models in Keras, which can be advantageous for users primarily working with these architectures.

TensorFlow code and pre-trained models for BERT

Pros of BERT

  • Official implementation by Google Research, ensuring high reliability and adherence to the original paper
  • Extensive documentation and examples for various BERT applications
  • Large community support and frequent updates

Cons of BERT

  • Limited to TensorFlow 1.x, which may be outdated for some users
  • Less flexibility in terms of customization and integration with other frameworks
  • Steeper learning curve for users not familiar with TensorFlow

Code Comparison

BERT (TensorFlow):

import tensorflow as tf
from bert import modeling

bert_config = modeling.BertConfig.from_json_file("bert_config.json")
model = modeling.BertModel(config=bert_config, is_training=True, input_ids=input_ids)

bert4keras (Keras):

from bert4keras.models import build_transformer_model

model = build_transformer_model(
    config_path='bert_config.json',
    checkpoint_path='bert_model.ckpt',
    model='bert',
)

Key Differences

  • bert4keras offers a more user-friendly API with Keras integration
  • BERT provides a lower-level implementation, allowing for more control but requiring more setup
  • bert4keras works with both standalone Keras and tf.keras on top of TensorFlow, while BERT is written directly against TensorFlow

Models and examples built with TensorFlow

Pros of models

  • Comprehensive collection of official TensorFlow models and examples
  • Extensive documentation and community support
  • Regular updates and maintenance by the TensorFlow team

Cons of models

  • Large repository size, potentially overwhelming for beginners
  • May include unnecessary components for specific BERT implementations
  • Steeper learning curve due to broader scope

Code comparison

models:

import tensorflow as tf
from official.nlp import modeling
bert_config = modeling.BertConfig(vocab_size=30522, hidden_size=768, num_hidden_layers=12, num_attention_heads=12)
bert_model = modeling.BertModel(config=bert_config)

bert4keras:

from bert4keras.models import build_transformer_model
config_path = 'bert_config.json'
checkpoint_path = 'bert_model.ckpt'
model = build_transformer_model(config_path, checkpoint_path)

Summary

models offers a comprehensive suite of TensorFlow models and examples, including BERT implementations. It provides extensive documentation and regular updates but may be overwhelming for users focused solely on BERT. bert4keras, on the other hand, is a lightweight alternative specifically designed for BERT implementations in Keras, offering a simpler API and easier integration for BERT-specific tasks.

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • Larger community and more extensive ecosystem
  • Supports a wider range of deep learning models and applications
  • More comprehensive documentation and tutorials

Cons of PyTorch

  • Steeper learning curve for beginners
  • Larger library size and potentially slower initial setup

Code Comparison

bert4keras:

from bert4keras.models import build_transformer_model
model = build_transformer_model(
    config_path='bert_config.json',
    checkpoint_path='bert_model.ckpt',
    model='bert'
)

PyTorch (via the Hugging Face transformers library):

from transformers import BertModel, BertConfig
config = BertConfig.from_json_file('bert_config.json')
model = BertModel.from_pretrained('bert-base-uncased', config=config)

Key Differences

  • bert4keras is specifically designed for BERT and related models, while PyTorch is a general-purpose deep learning framework
  • bert4keras offers a simpler API for working with BERT models, making it easier for beginners to get started
  • PyTorch provides more flexibility and customization options for advanced users and researchers

Use Cases

  • bert4keras: Ideal for projects focused on BERT and its variants, especially for Chinese NLP tasks
  • PyTorch: Suitable for a wide range of deep learning projects, including computer vision, natural language processing, and reinforcement learning

Deep Learning for humans

Pros of Keras

  • Broader scope and functionality, supporting a wide range of deep learning models
  • Larger community and more extensive documentation
  • Official support from TensorFlow and integration with other TensorFlow tools

Cons of Keras

  • Less specialized for BERT and transformer models
  • May require more setup and configuration for BERT-specific tasks
  • Potentially steeper learning curve for BERT implementations

Code Comparison

bert4keras:

from bert4keras.models import build_transformer_model
model = build_transformer_model(
    config_path='bert_config.json',
    checkpoint_path='bert_model.ckpt',
    model='bert'
)

Keras:

import tensorflow as tf
from tensorflow import keras
bert_model = keras.models.load_model('bert_model.h5')

bert4keras provides a more streamlined approach for building BERT models, while Keras requires additional setup but offers more flexibility for various model architectures.

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • Offers advanced distributed training and optimization techniques for large-scale models
  • Integrates with PyTorch training loops and the wider PyTorch ecosystem
  • Provides extensive documentation and tutorials for various use cases

Cons of DeepSpeed

  • Steeper learning curve due to its more complex architecture and features
  • May be overkill for smaller projects or simpler model training tasks

Code Comparison

DeepSpeed:

import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(args=args,
                                                     model=model,
                                                     model_parameters=params)

bert4keras:

from bert4keras.models import build_transformer_model
model = build_transformer_model(
    config_path=config_path,
    checkpoint_path=checkpoint_path,
    model='bert',
)

Summary

DeepSpeed is a more comprehensive and powerful library for large-scale model training on PyTorch, offering advanced distributed training and optimization techniques. However, it may be more complex to set up and use than bert4keras, which focuses on BERT-related models in Keras and is easier to adopt for simpler projects.

README

bert4keras

  • Our lightweight reimplementation of BERT for Keras
  • A cleaner, more lightweight Keras version of BERT
  • Personal blog: https://kexue.fm/
  • Online documentation: http://bert4keras.spaces.ac.cn/ (still under construction)

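Description

This is the author's reimplementation of a Keras transformer model library, which aims to combine transformers and Keras in code that is as clean as possible.

The project was started to make modification and customization easy, so it may be updated frequently.

Stars are therefore welcome, but forking is not recommended, because the version you fork may quickly become outdated.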

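Features

Implemented so far:

  • Loading pre-trained BERT/RoBERTa/ALBERT weights for fine-tuning;
  • The attention masks needed for language models and seq2seq;
  • A rich set of examples;
  • Code for pre-training from scratch (supports TPU and multi-GPU; see pretraining);
  • Compatibility with both keras and tf.keras.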
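
To make the attention-mask item above concrete, here is a minimal pure-Python sketch of the two mask shapes involved. It assumes the common UniLM-style convention that segment id 0 marks the source and 1 marks the target; the function names are hypothetical, and the library itself builds its masks with backend tensor operations rather than Python lists.

```python
from itertools import accumulate


def causal_mask(n):
    """Language-model mask: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def seq2seq_mask(segment_ids):
    """Seq2seq mask: source tokens see the whole source; target tokens see
    the source plus the target tokens generated so far."""
    # Running sum of segment ids: 0 across the source, then 1, 2, 3, ...
    # over the target, so "idxs[j] <= idxs[i]" encodes who may see whom.
    idxs = list(accumulate(segment_ids))
    n = len(idxs)
    return [[1 if idxs[j] <= idxs[i] else 0 for j in range(n)] for i in range(n)]


lm = causal_mask(3)
s2s = seq2seq_mask([0, 0, 0, 1, 1])  # three source tokens, two target tokens
```

Here `mask[i][j] == 1` means position `i` may attend to position `j`.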

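Usage

Install the stable version:

pip install bert4keras

Install the latest version:

pip install git+https://www.github.com/bojone/bert4keras.git

For usage examples, see the examples directory.

Examples previously written for keras-bert still apply to this project; you only need to replace the way bert_model is loaded with this project's loader.

In theory the library is compatible with Python 2 and Python 3, and with TensorFlow 1.14+ and TensorFlow 2.x. The experimental environment is Python 2.7, TensorFlow 1.14+, and Keras 2.3.1 (tested under Keras 2.2.4, 2.3.0, 2.3.1, and tf.keras).

For the best experience, the combination TensorFlow 1.14 + Keras 2.3.1 is recommended.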

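About environment combinations
  • Both tf+keras and tf+tf.keras are supported; the latter requires setting the environment variable TF_KERAS=1 beforehand.

  • When using tf+keras, 2.2.4 <= keras <= 2.3.1 and 1.14 <= tf <= 2.2 are recommended; tf 2.3+ cannot be used.

  • keras 2.4+ works, but keras 2.4.x is in practice almost entirely equivalent to tf.keras, so if you want keras 2.4+ you might as well use tf.keras directly.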
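
As a small sketch of the first point, the variable can be set from Python itself, provided this happens before the library is imported. That the value is read at import time is an assumption based on the wording above ("beforehand"); exporting `TF_KERAS=1` in the shell before starting Python should work equally well.

```python
import os

# Set the switch before the first `import bert4keras`
# (assumption: the library reads it when it is imported).
os.environ["TF_KERAS"] = "1"

# The library import would follow here; it is left commented out so the
# sketch runs even where bert4keras is not installed.
# from bert4keras.models import build_transformer_model

use_tf_keras = os.environ.get("TF_KERAS", "0") == "1"
```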

Of course, if you find a bug and are willing to contribute, you are welcome to report it, fix it, or even open a Pull Request.

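Weights

Weights that can currently be loaded:

Notes

  • Note 1: The brightmart version of ALBERT was open-sourced earlier than Google's, so the early brightmart ALBERT weights are not fully consistent with Google's; in other words, the two cannot be substituted for each other directly. To reduce code redundancy, bert4keras 0.2.4 and later only support loading the Google version and those brightmart weights whose names contain "Google". To load the earlier weights, use version 0.2.3, or consider albert_zh as converted by the author.
  • Note 2: If the downloaded ELECTRA weights come without a json configuration file, write one yourself by following this reference (the type_vocab_size field needs to be added).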
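
For the second note, a minimal sketch of patching a configuration file so that it contains `type_vocab_size` might look like this. The helper name `ensure_type_vocab_size`, the default value of 2 (two segment types, as in standard BERT configurations), and the demo file contents are all assumptions for illustration; check them against the weights you actually downloaded.

```python
import json
import os
import tempfile


def ensure_type_vocab_size(config_path, default=2):
    """Add `type_vocab_size` to a JSON config if it is missing."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # Leave an existing value untouched; only fill in a missing one
    config.setdefault("type_vocab_size", default)
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config


# Demo on a throwaway file with made-up values
demo_path = os.path.join(tempfile.mkdtemp(), "electra_config.json")
with open(demo_path, "w", encoding="utf-8") as f:
    json.dump({"hidden_size": 256, "vocab_size": 21128}, f)

patched = ensure_type_vocab_size(demo_path)
```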

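Updates

  • 2023.03.06: Changed infinity to np.inf; optimized GPU memory usage. Using np.inf for infinity makes computation more accurate and less error-prone in low-precision arithmetic; several mask operators were also merged, reducing GPU memory usage. In tests training base- and large-scale models on an A100, speed improved noticeably and memory usage dropped.
  • 2022.03.20: Added RoFormerV2.
  • 2022.02.28: Added GatedAttentionUnit.
  • 2021.04.23: Added GlobalPointer.
  • 2021.03.23: Added RoFormer.
  • 2021.01.30: Released version 0.9.9; improved multi-GPU support and added a multi-GPU example: task_seq2seq_autotitle_multigpu.py.
  • 2020.12.29: Added the residual_attention_scores parameter to implement RealFormer; enable it by passing residual_attention_scores=True to build_transformer_model.
  • 2020.12.04: PositionEmbedding now supports hierarchical decomposition, which lets BERT process very long texts directly; enable it by passing hierarchical_position=True to build_transformer_model.
  • 2020.11.19: Added support for GPT2 models; see the CPM_LM_bert4keras project.
  • 2020.11.14: Added parameter-wise learning rates via extend_with_parameter_wise_lr, which can be used to set a different learning rate for each layer.
  • 2020.10.27: Added support for T5.1.1 and Multilingual T5.
  • 2020.08.28: Added support for GPT_OpenAI.
  • 2020.08.22: Added the WebServing class, which allows a model to be turned into a web API easily; see the class's documentation for details.
  • 2020.07.14: Added the prefix parameter to the Transformer class; introduced the to_array function in snippets.py; fixed a hidden bug in AutoRegressiveDecoder when rtype='logits'.
  • 2020.06.06: Perfectionism at work: renamed Tokenizer's max_length parameter to maxlen while keeping backward compatibility; the new parameter name is recommended.
  • 2020.04.29: Added recomputation (see keras_recompute), which trades time for memory; enable it by setting the environment variable RECOMPUTE=1.
  • 2020.04.25: Improved behavior under tf2.
  • 2020.04.16: Adapted all examples to TensorFlow 2.0.
  • 2020.04.06: Added UniLM pre-training mode (in testing).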
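  • 2020.04.06: Improved the rematch method.
  • 2020.04.01: Added the rematch method to Tokenizer, which gives the mapping between the tokenization result and the original sequence.
  • 2020.03.30: Unified the coding style of the .py files as far as possible.
  • 2020.03.25: Added support for ELECTRA.
  • 2020.03.24: Further strengthened DataGenerator, allowing local shuffling when an iterator is passed in.
  • 2020.03.23: Added an option to adjust the key_size of Attention.
  • 2020.03.17: Enhanced DataGenerator; improved how models are written.
  • 2020.03.15: Added support for GPT2_ML.
  • 2020.03.10: Added support for Google's T5 model.
  • 2020.03.05: Renamed tokenizer.py to tokenizers.py.
  • 2020.03.05: Renamed application='seq2seq' to application='unilm'.
  • 2020.03.05: Renamed build_bert_model to build_transformer_model.
  • 2020.03.05: Rewrote the structure of models.py.
  • 2020.03.04: Renamed bert.py to models.py.
  • 2020.03.02: Refactored the mask mechanism (back to Keras's built-in masking) to make more complex applications easier to write.
  • 2020.02.22: Added the AutoRegressiveDecoder class to handle Seq2Seq decoding in a unified way.
  • 2020.02.19: Changed the transformer block prefix to Transformer (previously Encoder) so that its meaning is less restrictive.
  • 2020.02.13: Optimized the load_vocab function; renamed the keep_words parameter of build_bert_model to keep_tokens. This change may affect some scripts.
  • 2020.01.18: Adjusted text processing and removed the use of codecs.
  • 2020.01.17: With the APIs becoming stable, the library was packaged on PyPI for convenience; the first packaged version is 0.4.6.
  • 2020.01.10: Rewrote the model mask scheme, which makes the code somewhat more concise and clear; backend optimizations.
  • 2019.12.27: Refactored the pre-training code to reduce redundancy; RoBERTa and GPT pre-training modes are currently supported; see pretraining.
  • 2019.12.17: Adapted Huawei's nezha weights; just add model='nezha' to the build_bert_model function. In addition, the previous ALBERT loading option albert=True was changed to model='albert'.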
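  • 2019.12.16: Brought layer-within-layer functionality to older versions using an approach similar to keras 2.3+, restoring support for keras versions below 2.3.0.
  • 2019.12.14: Added Conditional Layer Normalization and a related demo.
  • 2019.12.09: Standardized the data_generator of each example; fixed an error when application='lm'.
  • 2019.12.05: Optimized the tokenizer's do_lower_case and fine-tuned each example.
  • 2019.11.23: Renamed train.py to optimizers.py and updated many optimizer implementations, with full compatibility with keras and tf.keras.
  • 2019.11.19: Renamed utils.py to tokenizer.py.
  • 2019.11.19: After much deliberation, decided to put snippets under bert4keras.snippets after all.
  • 2019.11.18: Optimized the pre-trained weight loading logic; added a method for saving model weights in BERT's checkpoint format.
  • 2019.11.17: Split some commonly used code snippets not directly related to BERT into python_snippets for sharing with other projects.
  • 2019.11.11: Added the NSP part.
  • 2019.11.05: Adapted Google's ALBERT; the non-Google albert_zh is no longer supported.
  • 2019.11.05: Finished the pre-training code using RoBERTa as the example, with support for TPU/multi-GPU training; see roberta. You are welcome to build more pre-training code on top of it.
  • 2019.11.01: Gradually adding pre-training code; see pretraining.
  • 2019.10.28: Added support for sentencepiece-based tokenizers.
  • 2019.10.25: Introduced a native tokenizer.
  • 2019.10.22: Introduced a gradient accumulation optimizer.
  • 2019.10.21: To simplify the code structure, dropped support for keras versions before 2.3.0; only keras 2.3.0+ and tf.keras are currently supported.
  • 2019.10.20: By user request, the model structure can now be saved directly with model.save and the whole model loaded with load_model (just run from bert4keras.layers import * before load_model; no extra custom_objects are needed).
  • 2019.10.09: Now compatible with tf.keras, tested with tf.keras under tf 1.13 and tf 2.0; switch to tf.keras by setting the environment variable TF_KERAS=1.
  • 2019.10.09: Now compatible with Keras 2.3.x, but only as a temporary solution; support for versions before 2.3 may be removed later.
  • 2019.10.02: Adapted ALBERT; albert_zh weights load successfully by adding albert=True to the load_pretrained_model function.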
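
The 2020.12.04 entry above mentions a hierarchical decomposition of position embeddings for very long texts. The following is a minimal sketch of one decomposition with that effect, written in plain Python and not taken from the library's source: position `k = i * n + j` is represented as `alpha * u[i] + (1 - alpha) * u[j]`, with `u` chosen so that the first `n` positions reproduce the trained vectors exactly. The function name, the `alpha=0.4` default, and the toy vectors are assumptions for illustration.

```python
def hierarchical_positions(base, total, alpha=0.4):
    """Extend n trained position vectors to `total` <= n * n positions."""
    n = len(base)
    if total > n * n:
        raise ValueError("at most n * n positions can be represented")
    first = base[0]
    # Rescale so that alpha * u[0] + (1 - alpha) * u[j] == base[j]
    u = [[(x - alpha * x0) / (1 - alpha) for x, x0 in zip(vec, first)]
         for vec in base]
    extended = []
    for k in range(total):
        i, j = divmod(k, n)  # coarse index i, fine index j
        extended.append([alpha * a + (1 - alpha) * b
                         for a, b in zip(u[i], u[j])])
    return extended


# Three made-up 2-dimensional "trained" vectors, extended to nine positions
base = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
extended = hierarchical_positions(base, total=9)
```

With this construction a model trained on `n` positions keeps its original embeddings for the first `n` tokens and gets distinct vectors for up to `n * n` positions.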

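Background

Previously I had been using CyberZHG's keras-bert. If the goal is purely to call and fine-tune BERT under Keras, keras-bert is already quite satisfactory.

However, if you want to modify BERT's internal structure while still loading the official pre-trained weights, keras-bert has trouble meeting that need. For the sake of code reuse, keras-bert wraps almost every small module as a separate library: keras-bert depends on keras-transformer, keras-transformer depends on keras-multi-head, and keras-multi-head depends on keras-self-attention. With dependencies layered like this, making changes becomes quite a headache.

So I decided to rewrite a Keras version of BERT, aiming to implement it completely within a few files, reduce these dependencies, and keep the ability to load the official pre-trained weights.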

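Acknowledgements

Thanks to CyberZHG for implementing keras-bert. This implementation draws on the keras-bert source code in many places, and I sincerely thank him for his selfless contribution.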

Related

bert4torch: a PyTorch-based transformer library whose style closely resembles bert4keras; readers who use PyTorch may want to try it.

Citation

@misc{bert4keras,
  title={bert4keras},
  author={Jianlin Su},
  year={2020},
  howpublished={\url{https://bert4keras.spaces.ac.cn}},
}