ai2thor

An open-source platform for Visual AI.

Top Related Projects

ml-agents

18,478

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.

habitat-lab

2,500

A modular high-level library to train embodied AI agents across a variety of tasks and environments.

gym

36,310

A toolkit for developing and comparing reinforcement learning algorithms.

lab

7,245

A customisable 3D platform for agent-based AI research

AirSim

17,368

Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

Quick Overview

AI2-THOR is an open-source interactive 3D environment for AI research, developed by the Allen Institute for AI. It provides a platform for training and testing embodied AI agents in realistic, physics-enabled indoor scenes, focusing on household tasks and navigation.

Pros

Highly realistic and interactive 3D environments
Supports a wide range of embodied AI tasks, including navigation, manipulation, and visual question answering
Offers both Unity-based and headless simulation options for flexibility in research setups
Regularly updated with new features and improvements

Cons

Steep learning curve for researchers new to 3D environments or Unity
Resource-intensive, requiring significant computational power for complex simulations
Limited to indoor household environments, which may not suit all research needs
Some users report occasional stability issues or bugs

Code Examples

Initializing the environment and moving the agent:

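import ai2thor.controller

# Constructing the Controller launches the simulator, so recent releases
# need no separate start() call (compare the README's minimal example)
controller = ai2thor.controller.Controller()

# Move the agent forward
event = controller.step(action="MoveAhead")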

Interacting with objects in the scene:

# Pick up an object
event = controller.step(
    action="PickupObject",
    objectId="Apple|+03.08|+00.90|-00.84"
)

# Open a drawer
event = controller.step(
    action="OpenObject",
    objectId="Drawer|+01.16|+00.57|-01.34"
)
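Object IDs such as `Apple|+03.08|+00.90|-00.84` belong to one particular scene, so a hard-coded ID may not exist elsewhere. The sketch below shows one way to look an ID up at run time instead. It assumes `event.metadata["objects"]` is a list of dictionaries carrying `objectId`, `objectType`, and `visible` keys; the helper name `find_visible_object` and the sample data are hypothetical and stand in for a real event so the snippet runs without the simulator.

```python
from typing import Optional


def find_visible_object(metadata: dict, object_type: str) -> Optional[str]:
    """Return the objectId of the first visible object of the given type, or None."""
    for obj in metadata.get("objects", []):
        if obj.get("objectType") == object_type and obj.get("visible", False):
            return obj["objectId"]
    return None


# Hand-written stand-in for event.metadata (hypothetical values).
sample_metadata = {
    "objects": [
        {"objectId": "Apple|+03.08|+00.90|-00.84", "objectType": "Apple", "visible": True},
        {"objectId": "Drawer|+01.16|+00.57|-01.34", "objectType": "Drawer", "visible": False},
    ]
}

apple_id = find_visible_object(sample_metadata, "Apple")    # found: the apple is visible
drawer_id = find_visible_object(sample_metadata, "Drawer")  # None: the drawer is not visible
```

With a live controller, the returned ID could be passed as `objectId` to `PickupObject` or `OpenObject`, after checking that it is not `None`.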

Capturing observations from the environment:

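# Get the current RGB frame
frame = event.frame

# Get depth map (None unless the Controller was created with renderDepthImage=True)
depth_map = event.depth_frame

# Get instance segmentation (None unless created with renderInstanceSegmentation=True)
instance_seg = event.instance_segmentation_frame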
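A depth map like the one above is often back-projected into 3D points, which requires the camera's focal length in pixels. The following is a minimal sketch under stated assumptions: an ideal pinhole camera, a square image, a field of view given in degrees, and depth measured in meters along the optical axis. The function names and the values 300 and 90 are illustrative, not taken from the AI2-THOR API.

```python
import math


def focal_length_pixels(image_size: int, fov_degrees: float) -> float:
    """Pinhole focal length in pixels for a square image with the given field of view."""
    return (image_size / 2.0) / math.tan(math.radians(fov_degrees) / 2.0)


def backproject(u: float, v: float, depth: float, image_size: int, fov_degrees: float):
    """Map pixel (u, v) with a depth value to camera-frame coordinates (x, y, z)."""
    f = focal_length_pixels(image_size, fov_degrees)
    c = image_size / 2.0  # principal point assumed to sit at the image center
    x = (u - c) * depth / f
    y = (c - v) * depth / f  # image rows grow downward, so flip to make y point up
    return (x, y, depth)


f = focal_length_pixels(300, 90.0)
point = backproject(225, 150, 2.0, 300, 90.0)
```

For a 300-pixel image with a 90-degree field of view the focal length is about 150 pixels, so a pixel 75 columns right of center at depth 2.0 lands roughly 1.0 unit to the right of the camera.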

Getting Started

To get started with AI2-THOR, follow these steps:

Install the library:

pip install ai2thor

Create a simple script to initialize and interact with the environment:

import ai2thor.controller

# Constructing the Controller launches the simulator
controller = ai2thor.controller.Controller()

# Initialize in a kitchen scene
event = controller.reset(scene="FloorPlan1")

# Move the agent and print its position
event = controller.step(action="MoveAhead")
print(event.metadata["agent"]["position"])

# Close the environment
controller.stop()

This basic script initializes the environment, moves the agent, and prints its position. From here, you can explore more complex interactions and tasks within the AI2-THOR environment.
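The printed position is a dictionary with `x`, `y`, and `z` keys, where `y` is height. One way to confirm that a move actually changed the agent's location is to compare positions before and after the step. The sketch below assumes that dictionary layout; the helper name `planar_distance` and the sample coordinates are hypothetical.

```python
import math


def planar_distance(a: dict, b: dict) -> float:
    """Distance between two positions on the floor plane (x and z; y is height)."""
    return math.hypot(b["x"] - a["x"], b["z"] - a["z"])


# Hypothetical positions, as they might be read from event.metadata["agent"]["position"].
before = {"x": -1.0, "y": 0.9, "z": 1.0}
after = {"x": -1.0, "y": 0.9, "z": 1.25}

moved = planar_distance(before, after)
step_moved_agent = moved > 1e-3  # tolerance guards against floating-point noise
```

The event metadata also reports `lastActionSuccess`, which is usually the more direct check; the distance is useful when the amount of movement matters.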

Competitor Comparisons

ml-agents

18,478

Pros of ml-agents

More general-purpose, supporting a wider range of AI/ML applications in Unity
Larger community and more extensive documentation
Integrates seamlessly with Unity's built-in features and asset pipeline

Cons of ml-agents

Steeper learning curve for non-Unity developers
Less focused on embodied AI and robotics simulations
Requires more setup and configuration for specific use cases

Code Comparison

ml-agents:

using Unity.MLAgents;

public class RollerAgent : Agent
{
    public override void OnEpisodeBegin()
    {
        // Reset agent position and state
    }
}

ai2thor:

from ai2thor.controller import Controller

controller = Controller()
event = controller.step(action="MoveAhead")

Summary

ml-agents is a more versatile toolkit for developing AI in Unity, with broader applications and a larger community. However, it may require more effort to set up for specific scenarios. ai2thor, on the other hand, is more focused on embodied AI and robotics simulations, offering a simpler API for these use cases but with less flexibility for general Unity development.

habitat-lab

2,500

A modular high-level library to train embodied AI agents across a variety of tasks and environments.

Pros of Habitat-lab

Supports a wider range of 3D datasets and environments
More flexible and customizable for various embodied AI tasks
Better performance and scalability for large-scale simulations

Cons of Habitat-lab

Steeper learning curve and more complex setup process
Less photorealistic environments compared to AI2-THOR
Fewer built-in interactive objects and physics simulations

Code Comparison

AI2-THOR example:

from ai2thor.controller import Controller

controller = Controller()
event = controller.step(action="MoveAhead")

Habitat-lab example:

import habitat

# config is assumed to be a habitat-lab config object loaded beforehand
env = habitat.Env(config=config)
observations = env.reset()
action = {"action": "MOVE_FORWARD"}
observations = env.step(action)

Both frameworks provide similar functionality for agent movement and interaction, but Habitat-lab's API is more flexible and allows for more customization in defining actions and observations.

gym

36,310

A toolkit for developing and comparing reinforcement learning algorithms.

Pros of Gym

Broader scope, supporting a wide range of environments beyond just robotics and vision
Larger community and ecosystem, with many third-party environments available
Simpler setup and installation process

Cons of Gym

Less realistic visual environments compared to AI2-THOR
Limited built-in support for complex, interactive 3D environments
Fewer options for physics-based interactions and object manipulations

Code Comparison

AI2-THOR example:

from ai2thor.controller import Controller

controller = Controller()
event = controller.step(action="MoveAhead")
rgb = event.frame

Gym example:

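import gym
env = gym.make("CartPole-v1")
env.reset()  # an environment must be reset before the first step
observation, reward, done, info = env.step(env.action_space.sample())  # 4-tuple in gym < 0.26; newer releases return 5 values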

Both libraries use a similar step-based approach for agent interactions, but AI2-THOR provides more detailed visual and physics-based information in its events, while Gym focuses on simpler numerical observations and rewards.
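Because both libraries are step-based, an AI2-THOR-style controller can be wrapped so that each step yields a Gym-style `(observation, reward, done, info)` tuple. The following is a minimal sketch, not an official wrapper: `StepAdapter`, `FakeController`, and `FakeEvent` are hypothetical names, and the fake classes exist only so the example runs without the simulator. It assumes events expose `frame` and `metadata`, and that the metadata includes `lastActionSuccess`.

```python
from typing import Any, Callable, Dict, Tuple


class StepAdapter:
    """Wraps a controller-like object so step() returns (observation, reward, done, info)."""

    def __init__(self, controller: Any, reward_fn: Callable[[Dict], float], max_steps: int = 100):
        self.controller = controller
        self.reward_fn = reward_fn
        self.max_steps = max_steps
        self.steps_taken = 0

    def step(self, action: str) -> Tuple[Any, float, bool, Dict]:
        event = self.controller.step(action=action)
        self.steps_taken += 1
        reward = self.reward_fn(event.metadata)
        done = self.steps_taken >= self.max_steps  # episode ends on a step budget only
        return event.frame, reward, done, event.metadata


class FakeEvent:
    """Stand-in for a simulator event, used only to exercise the adapter."""

    def __init__(self, success: bool):
        self.frame = [[0, 0, 0]]
        self.metadata = {"lastActionSuccess": success}


class FakeController:
    """Stand-in controller: only MoveAhead is treated as a successful action."""

    def step(self, action: str) -> FakeEvent:
        return FakeEvent(success=(action == "MoveAhead"))


env = StepAdapter(
    FakeController(),
    reward_fn=lambda m: 1.0 if m["lastActionSuccess"] else -1.0,
    max_steps=2,
)
obs1, r1, d1, info1 = env.step("MoveAhead")
obs2, r2, d2, info2 = env.step("Teleport")
```

Keeping the reward function as a parameter leaves the task definition outside the adapter, which mirrors how AI2-THOR itself leaves rewards to the user.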

lab

7,245

A customisable 3D platform for agent-based AI research

Pros of DeepMind Lab

More extensive documentation and tutorials
Supports a wider range of research tasks and environments
Better integration with machine learning frameworks like TensorFlow

Cons of DeepMind Lab

Steeper learning curve for beginners
Less realistic visuals compared to AI2-THOR
Limited to first-person perspective interactions

Code Comparison

AI2-THOR example:

from ai2thor.controller import Controller

controller = Controller()
event = controller.step(action="MoveAhead")

DeepMind Lab example:

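import deepmind_lab
import numpy as np

env = deepmind_lab.Lab("seekavoid_arena_01", ["RGB_INTERLEAVED"])
env.reset()
reward = env.step(np.array([0, 0, 0, 1, 0, 0, 0], dtype=np.intc))  # step() returns the reward
obs = env.observations()["RGB_INTERLEAVED"]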

AI2-THOR focuses on realistic home environments with object interactions, while DeepMind Lab provides more abstract, game-like environments for reinforcement learning research. AI2-THOR offers a higher level of visual fidelity and physics simulation, making it suitable for tasks involving object manipulation and navigation in realistic settings. DeepMind Lab, on the other hand, excels in providing a diverse range of customizable environments for testing and developing AI agents in various scenarios.

AirSim

17,368

Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

Pros of AirSim

More realistic physics simulation, especially for drones and vehicles
Supports multiple environments (indoor, outdoor, urban)
Offers APIs for various programming languages (C++, Python, C#, Java)

Cons of AirSim

Steeper learning curve and more complex setup
Built primarily on Unreal Engine, which can be resource-intensive (the Unity version is experimental)
Less focused on household environments and object interactions

Code Comparison

AirSim (Python):

import airsim

client = airsim.MultirotorClient()
client.takeoffAsync().join()
client.moveToPositionAsync(-10, 10, -10, 5).join()

AI2-THOR (Python):

from ai2thor.controller import Controller

controller = Controller()
event = controller.step(action="MoveAhead")
event = controller.step(action="RotateRight")

Summary

AirSim excels in realistic physics simulations for drones and vehicles, offering diverse environments and multi-language support. However, it has a steeper learning curve and requires more computational resources. AI2-THOR focuses on household environments and object interactions, with a simpler setup process but less realistic physics. The code examples demonstrate the different approaches: AirSim uses more specific movement commands, while AI2-THOR employs higher-level actions for navigation and interaction.
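To make the contrast concrete, a target pose that AirSim would receive as coordinates has to be reached in AI2-THOR through a sequence of discrete actions. The sketch below predicts where such a sequence ends up. It assumes a step of 0.25 m per `MoveAhead` and 90-degree turns (both configurable in the simulator), a heading of 0 degrees facing the positive z axis, and no collisions; `dead_reckon` is a hypothetical helper, not part of either library.

```python
import math


def dead_reckon(actions, step_size=0.25, turn_degrees=90.0):
    """Predict (x, z, yaw) after a sequence of discrete actions, ignoring collisions."""
    x, z, yaw = 0.0, 0.0, 0.0
    for action in actions:
        if action == "MoveAhead":
            # Heading 0 faces +z; turning right rotates the heading toward +x.
            x += step_size * math.sin(math.radians(yaw))
            z += step_size * math.cos(math.radians(yaw))
        elif action == "RotateRight":
            yaw = (yaw + turn_degrees) % 360.0
        elif action == "RotateLeft":
            yaw = (yaw - turn_degrees) % 360.0
        else:
            raise ValueError(f"unsupported action: {action}")
    return x, z, yaw


x, z, yaw = dead_reckon(["MoveAhead", "MoveAhead", "RotateRight", "MoveAhead"])
```

Two steps forward, a right turn, and one more step put the predicted pose at roughly x = 0.25, z = 0.5 with a heading of 90 degrees. In the simulator the reported agent position remains the authority, since blocked moves leave the agent in place.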

README

A Near Photo-Realistic Interactable Framework for Embodied AI Agents

Environments


Environment	Description
iTHOR	A high-level interaction framework that facilitates research in embodied common sense reasoning.
ManipulaTHOR	A mid-level interaction framework that facilitates visual manipulation of objects using a robotic arm.
RoboTHOR	A framework that facilitates Sim2Real research with a collection of simulated scene counterparts in the physical world.

Features

Scenes. 200+ custom-built high-quality scenes. The scenes can be explored on our demo page. We are working on rapidly expanding the number of available scenes and domain randomization within each scene.

Objects. 2600+ custom-designed household objects across 100+ object types. Each object is heavily annotated, which allows for near-realistic physics interaction.

Agent Types. Multi-agent support, a custom-built LoCoBot agent, a Kinova 3 inspired robotic manipulation agent, and a drone agent.

Actions. 200+ actions that facilitate research in a wide range of interaction and navigation based embodied AI tasks.

Images. First-class support for many image modalities and camera adjustments. Some modalities include ego-centric RGB images, instance segmentation, semantic segmentation, depth frames, normals frames, top-down frames, orthographic projections, and third-person camera frames. Users can also easily change camera properties, such as the size of the images and field of view.

Metadata. After each step in the environment, there is a large amount of sensory data available about the state of the environment. This information can be used to build highly complex custom reward functions.
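As an illustration of building a reward from metadata, the sketch below scores a pickup task. It assumes the metadata exposes `lastActionSuccess` and an `objects` list whose entries carry `objectType` and `isPickedUp`; the function name `pickup_reward`, the reward magnitudes, and the sample dictionaries are hypothetical choices made for this example.

```python
def pickup_reward(prev_metadata: dict, metadata: dict, target_type: str) -> float:
    """Step penalty, failure penalty, and a bonus when the target is newly picked up."""
    reward = -0.01  # small time penalty per step (arbitrary value)
    if not metadata.get("lastActionSuccess", False):
        reward -= 0.1  # discourage actions the simulator rejected

    def held(md: dict) -> bool:
        return any(
            o.get("objectType") == target_type and o.get("isPickedUp", False)
            for o in md.get("objects", [])
        )

    if held(metadata) and not held(prev_metadata):
        reward += 1.0  # bonus only on the step where the pickup happens
    return reward


# Hand-written stand-ins for two consecutive event.metadata dictionaries.
before = {"lastActionSuccess": True,
          "objects": [{"objectType": "Apple", "isPickedUp": False}]}
picked = {"lastActionSuccess": True,
          "objects": [{"objectType": "Apple", "isPickedUp": True}]}
failed = {"lastActionSuccess": False,
          "objects": [{"objectType": "Apple", "isPickedUp": False}]}

success_reward = pickup_reward(before, picked, "Apple")
failure_reward = pickup_reward(before, failed, "Apple")
```

Comparing against the previous step's metadata keeps the bonus from being paid again on every later step while the object stays in hand.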

Latest Announcements

Date	Announcement
5/2021	`RandomizeMaterials` is now supported! It enables a massive amount of realistic-looking domain randomization within each scene. Try it out on the demo!
4/2021	We are excited to release ManipulaTHOR, an environment within the AI2-THOR framework that facilitates visual manipulation of objects using a robotic arm. Please see the full 3.0.0 release notes here.
4/2021	`RandomizeLighting` is now supported! It includes many tunable parameters to allow for vast control over its effects. Try it out on the demo!
2/2021	We are excited to host the AI2-THOR Rearrangement Challenge, RoboTHOR ObjectNav Challenge, and ALFRED Challenge, held in conjunction with the Embodied AI Workshop at CVPR 2021.
2/2021	AI2-THOR v2.7.0 announces several massive speedups to AI2-THOR! Read more about it here.
6/2020	We've released AI2-THOR Docker, a mini-framework to simplify running AI2-THOR in Docker.
4/2020	Version 2.4.0 update of the framework is here. All sim objects that aren't explicitly part of the environmental structure are now moveable with physics interactions. New object types have been added, and many new actions have been added. Please see the full 2.4.0 release notes here.
2/2020	AI2-THOR now includes two frameworks: iTHOR and RoboTHOR. iTHOR includes interactive objects and scenes and RoboTHOR consists of simulated scenes and their corresponding real world counterparts.
9/2019	Version 2.1.0 update of the framework has been added. New object types have been added. New Initialization actions have been added. Segmentation image generation has been improved in all scenes.
6/2019	Version 2.0 update of the AI2-THOR framework is now live! We have over quadrupled our action and object states, adding new actions that allow visually distinct state changes such as broken screens on electronics, shattered windows, breakable dishware, liquid fillable containers, cleanable dishware, messy and made beds and more! Along with these new state changes, objects have more physical properties like Temperature, Mass, and Salient Materials that are all reported back in object metadata. To combine all of these new properties and actions, new context sensitive interactions can now automatically change object states. This includes interactions like placing a dirty bowl under running sink water to clean it, placing a mug in a coffee machine to automatically fill it with coffee, putting out a lit candle by placing it in water, or placing an object over an active stove burner or in the fridge to change its temperature. Please see the full 2.0 release notes here to view details on all the changes and new features.

Installation

With Google Colab

AI2-THOR Colab can be used to run AI2-THOR freely in the cloud with Google Colab. Running AI2-THOR in Google Colab makes it extremely easy to explore functionality without having to set AI2-THOR up locally.

With pip

pip install ai2thor

With conda

conda install -c conda-forge ai2thor

With Docker

AI2-THOR Docker can be used, which adds the configuration for running an X server to be used by Unity 3D to render scenes.

Minimal Example

Once you've installed AI2-THOR, you can verify that everything is working correctly by running the following minimal example:

from ai2thor.controller import Controller
controller = Controller(scene="FloorPlan10")
event = controller.step(action="RotateRight")
metadata = event.metadata
print(event, event.metadata.keys())

Requirements

Component	Requirement
OS	Mac OS X 10.9+, Ubuntu 14.04+
Graphics Card	DX9 (shader model 3.0) or DX11 with feature level 9.3 capabilities.
CPU	SSE2 instruction set support.
Python	Versions 3.5+
Linux	X server with GLX module enabled

Support

Questions. If you have any questions on AI2-THOR, please ask them on our GitHub Discussions Page.

Issues. If you encounter any issues while using AI2-THOR, please open an Issue on GitHub.

Learn more

Section	Description
Demo	Interact and play with AI2-THOR live in the browser.
iTHOR Documentation	Documentation for the iTHOR environment.
ManipulaTHOR Documentation	Documentation for the ManipulaTHOR environment.
RoboTHOR Documentation	Documentation for the RoboTHOR environment.
AI2-THOR Colab	A way to run AI2-THOR freely on the cloud using Google Colab.
AllenAct	An Embodied AI Framework built at AI2 that provides first-class support for AI2-THOR.
AI2-THOR Unity Development	A (sparse) collection of notes that may be useful if editing on the AI2-THOR backend.
AI2-THOR WebGL Development	Documentation on packaging AI2-THOR for the web, which might be useful for annotation based tasks.

Citation

If you use AI2-THOR or iTHOR scenes, please cite the original AI2-THOR paper:

@article{ai2thor,
  author={Eric Kolve and Roozbeh Mottaghi and Winson Han and
          Eli VanderBilt and Luca Weihs and Alvaro Herrasti and
          Daniel Gordon and Yuke Zhu and Abhinav Gupta and
          Ali Farhadi},
  title={{AI2-THOR: An Interactive 3D Environment for Visual AI}},
  journal={arXiv},
  year={2017}
}

If you use ProcTHOR or procedurally generated scenes, please cite the following paper:

@inproceedings{procthor,
  author={Matt Deitke and Eli VanderBilt and Alvaro Herrasti and
          Luca Weihs and Jordi Salvador and Kiana Ehsani and
          Winson Han and Eric Kolve and Ali Farhadi and
          Aniruddha Kembhavi and Roozbeh Mottaghi},
  title={{ProcTHOR: Large-Scale Embodied AI Using Procedural Generation}},
  booktitle={NeurIPS},
  year={2022},
  note={Outstanding Paper Award}
}

If you use the ManipulaTHOR agent, please cite the following paper:

@inproceedings{manipulathor,
  title={{ManipulaTHOR: A Framework for Visual Object Manipulation}},
  author={Kiana Ehsani and Winson Han and Alvaro Herrasti and
          Eli VanderBilt and Luca Weihs and Eric Kolve and
          Aniruddha Kembhavi and Roozbeh Mottaghi},
  booktitle={CVPR},
  year={2021}
}

If you use RoboTHOR scenes, please cite the following paper:

@inproceedings{robothor,
  author={Matt Deitke and Winson Han and Alvaro Herrasti and
          Aniruddha Kembhavi and Eric Kolve and Roozbeh Mottaghi and
          Jordi Salvador and Dustin Schwenk and Eli VanderBilt and
          Matthew Wallingford and Luca Weihs and Mark Yatskar and
          Ali Farhadi},
  title={{RoboTHOR: An Open Simulation-to-Real Embodied AI Platform}},
  booktitle={CVPR},
  year={2020}
}

Our Team

AI2-THOR is an open-source project built by the PRIOR team at the Allen Institute for AI (AI2). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.
