Top Related Projects
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
CPM-Bee: A Chinese-English bilingual base model with ten billion parameters
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Quick Overview
TaskMatrix is an open-source project that aims to create an AI agent capable of solving complex tasks by breaking them down into subtasks and leveraging various AI models and tools. It utilizes a task planning system and a diverse set of AI capabilities to tackle a wide range of problems efficiently.
Pros
- Modular architecture allows for easy integration of new AI models and tools
- Capable of handling complex, multi-step tasks through intelligent task decomposition
- Leverages multiple AI models to provide a more comprehensive problem-solving approach
- Open-source nature encourages community contributions and improvements
Cons
- May require significant computational resources for optimal performance
- Complexity of the system could lead to potential debugging and maintenance challenges
- Dependency on multiple external AI models and tools may introduce compatibility issues
- Still in early development stages, which may result in instability or incomplete features
Code Examples
# Initialize the TaskMatrix agent
from taskmatrix import TaskMatrixAgent
agent = TaskMatrixAgent()
# Solve a complex task
result = agent.solve_task("Create a comprehensive business plan for a sustainable energy startup")
print(result)
# Add a custom tool to the TaskMatrix agent
from taskmatrix import Tool
custom_tool = Tool("custom_analysis", "Perform custom market analysis", custom_analysis_function)
agent.add_tool(custom_tool)
# Use the custom tool in a task
result = agent.solve_task("Analyze the renewable energy market in Europe", tools=[custom_tool])
# Configure the agent with specific models
from taskmatrix import ModelConfig
config = ModelConfig(
language_model="gpt-4",
image_model="stable-diffusion-v2",
audio_model="whisper-large-v2"
)
agent = TaskMatrixAgent(config=config)
# Solve a multi-modal task
result = agent.solve_task("Create a promotional video for a new electric car", input_image="car_design.jpg")
Getting Started
To get started with TaskMatrix, follow these steps:
1. Install the TaskMatrix library:
   pip install taskmatrix
2. Import and initialize the TaskMatrixAgent:
   from taskmatrix import TaskMatrixAgent
   agent = TaskMatrixAgent()
3. Solve a task:
   result = agent.solve_task("Your complex task description here")
   print(result)
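Note: Make sure the API keys and dependencies required by the AI models and tools are set up first. The README's Quick Start further down installs from source (git clone, then pip install -r requirements.txt) and reads the OpenAI key from the OPENAI_API_KEY environment variable.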
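One way to act on the note above is to check for required credentials before starting anything expensive. The sketch below assumes the key is read from the OPENAI_API_KEY environment variable, as in the README's Quick Start; the helper name missing_env_vars is hypothetical and is not part of TaskMatrix.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(names: Iterable[str],
                     environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or blank in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    # OPENAI_API_KEY is the variable exported in the README's Quick Start.
    missing = missing_env_vars(["OPENAI_API_KEY"])
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as the second argument makes the check easy to exercise without touching the real environment.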
Competitor Comparisons
Pros of TaskMatrix
- More comprehensive documentation and examples
- Active development with recent updates
- Larger community and more contributors
Cons of TaskMatrix
- Higher complexity, potentially steeper learning curve
- May have more dependencies and overhead
Code Comparison
TaskMatrix:
from taskmatrix import TaskMatrix
tm = TaskMatrix()
task = tm.create_task("Example task")
result = tm.execute(task)
print(result)
TaskMatrix>:
from taskmatrix_plus import TaskMatrixPlus
tmp = TaskMatrixPlus()
task = tmp.new_task("Example task")
output = tmp.run(task)
print(output)
Key Differences
- TaskMatrix uses `create_task` and `execute`, while TaskMatrix> uses `new_task` and `run`
- TaskMatrix> may have a slightly different API structure
- TaskMatrix likely offers more features and flexibility, but TaskMatrix> might be more straightforward for simpler use cases
Community and Support
TaskMatrix:
- Larger user base and more active community
- More comprehensive documentation and examples
- Regular updates and maintenance
TaskMatrix>:
- Smaller community, potentially less third-party support
- May have more focused, specific use cases
- Could be easier to get started with for beginners
Performance
Without specific benchmarks, it's difficult to compare performance. TaskMatrix might be more optimized due to its larger user base and active development, but TaskMatrix> could potentially be more efficient for certain specialized tasks.
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
Pros of JARVIS
- More comprehensive and feature-rich, offering a wider range of AI-powered functionalities
- Better documentation and community support, making it easier for developers to contribute and use
- Regularly updated and maintained by Microsoft, ensuring long-term stability and improvements
Cons of JARVIS
- More complex architecture, which may be overwhelming for beginners or smaller projects
- Potentially higher resource requirements due to its extensive features and capabilities
- Steeper learning curve for developers unfamiliar with Microsoft's ecosystem
Code Comparison
TaskMatrix:
def execute_command(self, command):
    # Simple command execution
    return self.llm.generate(command)
JARVIS:
async def execute_command(self, command, context):
    # Advanced command execution with context
    result = await self.agent.process(command, context)
    return self.response_generator.format(result)
The code comparison shows that JARVIS has a more sophisticated command execution process, incorporating context and asynchronous operations, while TaskMatrix uses a simpler approach. This reflects the overall difference in complexity and feature set between the two projects.
ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Pros of ChatGLM-6B
- Larger model size (6 billion parameters) potentially offering more advanced language understanding and generation capabilities
- Bilingual (Chinese and English) and tuned for dialogue, which may give better results in Chinese-language applications
- Includes quantization options for efficient deployment on various hardware configurations
Cons of ChatGLM-6B
- Focused on Chinese and English dialogue, so it may be a poor fit for other languages or for non-chat tasks
- Requires more computational resources due to its larger size, potentially limiting deployment options on resource-constrained devices
Code Comparison
TaskMatrix:
from taskmatrix import TaskMatrix
tm = TaskMatrix()
result = tm.run_task("Summarize this text: ...")
ChatGLM-6B:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])
The code comparison shows that TaskMatrix offers a simpler API for task execution, while ChatGLM-6B requires more setup and is specifically designed for chat-based interactions.
CPM-Bee: A Chinese-English bilingual base model with ten billion parameters
Pros of CPM-Bee
- More comprehensive documentation and examples
- Larger community and more active development
- Better support for Chinese language processing
Cons of CPM-Bee
- Potentially more complex setup and configuration
- Less focus on multi-modal tasks compared to TaskMatrix
Code Comparison
TaskMatrix:
from taskmatrix import TaskMatrix
tm = TaskMatrix()
result = tm.execute_task("Describe this image", image_path="example.jpg")
print(result)
CPM-Bee:
from cpm_bee import CPMBee
model = CPMBee.from_pretrained("cpm-bee-10b")
# Prompt (Chinese): "Please describe this image"
response = model.generate("请描述一下这张图片", max_length=100)
print(response)
The code comparison shows that TaskMatrix is designed for multi-modal tasks, allowing direct input of image paths, while CPM-Bee focuses on text generation and processing. CPM-Bee's code demonstrates its emphasis on Chinese language support.
Both projects aim to provide powerful language processing capabilities, but they differ in their specific focus areas and implementation approaches. TaskMatrix appears more tailored for diverse task types, including image analysis, while CPM-Bee excels in Chinese language processing and offers a more extensive pre-trained model ecosystem.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Pros of gpt-neox
- Focused on large-scale language model training and deployment
- Extensive documentation and community support
- Optimized for distributed training on multiple GPUs/nodes
Cons of gpt-neox
- More complex setup and configuration required
- Limited to language model tasks, less versatile for general AI applications
- Steeper learning curve for beginners
Code Comparison
TaskMatrix:
from taskmatrix import TaskMatrix
tm = TaskMatrix()
result = tm.run_task("Summarize this text: ...")
print(result)
gpt-neox:
from gpt_neox import GPTNeoX
model = GPTNeoX.from_pretrained("EleutherAI/gpt-neox-20b")
input_text = "Summarize this text: ..."
output = model.generate(input_text, max_length=100)
print(output)
TaskMatrix provides a simpler interface for running various AI tasks, while gpt-neox focuses on language model inference and fine-tuning. TaskMatrix is more suitable for users who need a versatile AI toolkit, whereas gpt-neox is better for those working specifically with large language models and requiring advanced training capabilities.
README
TaskMatrix
TaskMatrix connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.
See our paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Updates:
- Now TaskMatrix supports GroundingDINO and segment-anything! Thanks @jordddan for his efforts. For the image editing case, `GroundingDINO` is first used to locate bounding boxes guided by the given text, then `segment-anything` is used to generate the related mask, and finally stable diffusion inpainting is used to edit the image based on the mask.
  - First, run `python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0"`
  - Then, say `find xxx in the image` or `segment xxx in the image`, where `xxx` is an object. TaskMatrix will return the detection or segmentation result!
- Now TaskMatrix can support Chinese! Thanks to @Wang-Xiaodong1899 for his efforts.
- We propose the template idea in TaskMatrix!
  - A template is a pre-defined execution flow that assists ChatGPT in assembling complex tasks involving multiple foundation models.
  - A template contains the experiential solution to complex tasks as determined by humans.
  - A template can invoke multiple foundation models or even establish a new ChatGPT session.
  - To define a template, simply add a class with the attribute `template_model = True`.
- Thanks to @ShengmingYin and @thebestannie for providing a template example in the `InfinityOutPainting` class (see the following gif).
  - First, run `python visual_chatgpt.py --load "Inpainting_cuda:0,ImageCaptioning_cuda:0,VisualQuestionAnswering_cuda:0"`
  - Second, say `extend the image to 2048x1024` to TaskMatrix!
  - By simply creating an `InfinityOutPainting` template, TaskMatrix can seamlessly extend images to any size through collaboration with the existing `ImageCaptioning`, `Inpainting`, and `VisualQuestionAnswering` foundation models, without the need for additional training.
- TaskMatrix needs the effort of the community! We crave your contribution to add new and interesting features!
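The template idea in the list above can be sketched in a few lines. This is a minimal illustration that assumes only what the list states: a template is a class that sets template_model = True and may call several foundation models in a fixed order. The class names, the stub models, and the find_templates helper are hypothetical and do not reproduce the code in visual_chatgpt.py.

```python
from typing import Dict, List


class Captioner:
    """Stand-in for one foundation model (hypothetical stub)."""

    def run(self, image: str) -> str:
        return f"caption of {image}"


class Inpainter:
    """Stand-in for a second foundation model (hypothetical stub)."""

    def run(self, image: str, prompt: str) -> str:
        return f"{image} inpainted with '{prompt}'"


class CaptionThenInpaint:
    """A template: a fixed, human-designed flow over several models."""

    template_model = True  # marker attribute described in the README

    def __init__(self, captioner: Captioner, inpainter: Inpainter) -> None:
        self.captioner = captioner
        self.inpainter = inpainter

    def run(self, image: str) -> str:
        # Step 1: describe the image; step 2: reuse the caption as the prompt.
        caption = self.captioner.run(image)
        return self.inpainter.run(image, caption)


def find_templates(namespace: Dict[str, object]) -> List[str]:
    """Return the names of classes in `namespace` marked as templates."""
    return sorted(
        name
        for name, obj in namespace.items()
        if isinstance(obj, type) and getattr(obj, "template_model", False)
    )


template = CaptionThenInpaint(Captioner(), Inpainter())
print(template.run("room.png"))
# room.png inpainted with 'caption of room.png'
```

Using a class attribute as the marker means a loader can tell templates from ordinary model wrappers without instantiating either.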
Insight & Goal:
On the one hand, ChatGPT (or LLMs) serves as a general interface that provides a broad and diverse understanding of a wide range of topics. On the other hand, Foundation Models serve as domain experts by providing deep knowledge in specific domains. By leveraging both general and deep knowledge, we aim at building an AI that is capable of handling various tasks.
Demo
System Architecture
Quick Start
# clone the repo
git clone https://github.com/microsoft/TaskMatrix.git
# go to the cloned directory
cd TaskMatrix
# create a new environment
conda create -n visgpt python=3.8
# activate the new environment
conda activate visgpt
# prepare the basic environments
pip install -r requirements.txt
pip install git+https://github.com/IDEA-Research/GroundingDINO.git
pip install git+https://github.com/facebookresearch/segment-anything.git
# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}
# prepare your private OpenAI key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}
# Start TaskMatrix !
# You can specify the GPU/CPU assignment with "--load". The parameter indicates which
# Visual Foundation Models to use and which device each one is loaded on.
# Model and device are separated by an underscore '_'; different models are separated by a comma ','
# The available Visual Foundation Models can be found in the following table
# For example, if you want to load ImageCaptioning to cpu and Text2Image to cuda:0
# You can use: "ImageCaptioning_cpu,Text2Image_cuda:0"
# Advice for CPU Users
python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu
# Advice for 1 Tesla T4 15GB (Google Colab)
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0"
# Advice for 4 Tesla V100 32GB
python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,
Inpainting_cuda:0,ImageCaptioning_cuda:0,
Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,
Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:2,
InstructPix2Pix_cuda:2,Image2Scribble_cpu,ScribbleText2Image_cuda:2,
SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2,
Image2Hed_cpu,HedText2Image_cuda:3,Image2Normal_cpu,
NormalText2Image_cuda:3,Image2Line_cpu,LineText2Image_cuda:3"
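The --load format described in the comments above (model and device joined by an underscore, entries separated by commas) can be parsed as follows. This is a sketch of the stated format, not the parser used by visual_chatgpt.py; parse_load is a hypothetical helper name.

```python
from typing import Dict


def parse_load(spec: str) -> Dict[str, str]:
    """Map each model name to its device, e.g. 'Text2Image_cuda:0'."""
    assignment: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()  # tolerate spaces and line breaks between entries
        if not entry:
            continue
        # Split on the first underscore: model name before, device after.
        model, sep, device = entry.partition("_")
        if not sep or not model or not device:
            raise ValueError(f"expected Model_device, got {entry!r}")
        assignment[model] = device
    return assignment


print(parse_load("ImageCaptioning_cpu,Text2Image_cuda:0"))
# {'ImageCaptioning': 'cpu', 'Text2Image': 'cuda:0'}
```

Stripping each entry lets the same function accept the multi-line form shown in the four-GPU example.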
GPU memory usage
The GPU memory usage of each visual foundation model is listed below, so you can choose which ones to load:
Foundation Model | GPU Memory (MB) |
---|---|
ImageEditing | 3981 |
InstructPix2Pix | 2827 |
Text2Image | 3385 |
ImageCaptioning | 1209 |
Image2Canny | 0 |
CannyText2Image | 3531 |
Image2Line | 0 |
LineText2Image | 3529 |
Image2Hed | 0 |
HedText2Image | 3529 |
Image2Scribble | 0 |
ScribbleText2Image | 3531 |
Image2Pose | 0 |
PoseText2Image | 3529 |
Image2Seg | 919 |
SegText2Image | 3529 |
Image2Depth | 0 |
DepthText2Image | 3531 |
Image2Normal | 0 |
NormalText2Image | 3529 |
VisualQuestionAnswering | 1495 |
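As a worked example of using the table, the sketch below adds up the listed memory per GPU for a given --load string. The figures are copied from the table; models the table does not list (for example Text2Box) are reported separately rather than guessed, and models assigned to the CPU are not counted. The function name memory_per_device is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# GPU memory in MB, copied from the table above.
GPU_MEMORY_MB = {
    "ImageEditing": 3981, "InstructPix2Pix": 2827, "Text2Image": 3385,
    "ImageCaptioning": 1209, "Image2Canny": 0, "CannyText2Image": 3531,
    "Image2Line": 0, "LineText2Image": 3529, "Image2Hed": 0,
    "HedText2Image": 3529, "Image2Scribble": 0, "ScribbleText2Image": 3531,
    "Image2Pose": 0, "PoseText2Image": 3529, "Image2Seg": 919,
    "SegText2Image": 3529, "Image2Depth": 0, "DepthText2Image": 3531,
    "Image2Normal": 0, "NormalText2Image": 3529,
    "VisualQuestionAnswering": 1495,
}


def memory_per_device(spec: str) -> Tuple[Dict[str, int], List[str]]:
    """Sum table memory per CUDA device; also return models not in the table."""
    totals: Dict[str, int] = defaultdict(int)
    unknown: List[str] = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        model, _, device = entry.partition("_")
        if model not in GPU_MEMORY_MB:
            unknown.append(model)
        elif device.startswith("cuda"):
            totals[device] += GPU_MEMORY_MB[model]
    return dict(totals), unknown


print(memory_per_device("ImageCaptioning_cuda:0,Text2Image_cuda:0"))
# ({'cuda:0': 4594}, [])
```

For the single-GPU example in the Quick Start, the two models together come to 1209 + 3385 = 4594 MB according to the table.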
Acknowledgement
We appreciate the open source of the following projects:
Hugging Face, LangChain, Stable Diffusion, ControlNet, InstructPix2Pix, CLIPSeg, BLIP
Contact Information
For help or issues using TaskMatrix, please submit a GitHub issue.
For other communications, please contact Chenfei WU (chewu@microsoft.com) or Nan DUAN (nanduan@microsoft.com).
Trademark Notice
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third-party's policies.
Disclaimer
The recommended models in this Repo are just examples, used for scientific research exploring the concept of task automation and benchmarking with the paper published at Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models. Users can replace the models in this Repo according to their research needs. When using the recommended models in this Repo, you need to comply with the licenses of these models respectively. Microsoft shall not be held liable for any infringement of third-party rights resulting from your usage of this repo. Users agree to defend, indemnify and hold Microsoft harmless from and against all damages, costs, and attorneys' fees in connection with any claims arising from this Repo. If anyone believes that this Repo infringes on their rights, please notify the project owner by email.