geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.


Top Related Projects

rasterio

2,373

Rasterio reads and writes geospatial raster datasets

gdal

5,431

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.

shapely

4,160

Manipulation and analysis of geometric objects

QGIS

11,615

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)

Quick Overview

GeoTrellis is an open-source, distributed geographic data processing engine for high-performance applications. It is designed to work with large-scale geospatial data, providing fast and efficient processing capabilities for raster and vector datasets. GeoTrellis is built on top of Apache Spark and is written in Scala.

Pros

High-performance processing of large-scale geospatial data
Seamless integration with Apache Spark for distributed computing
Supports both raster and vector data operations
Extensive set of geospatial operations and algorithms

Cons

Steep learning curve, especially for those unfamiliar with Scala
Limited documentation and examples for some advanced features
Requires significant computational resources for large datasets
Smaller community compared to some other geospatial libraries

Code Examples

Reading a GeoTIFF file:

import geotrellis.raster.io.geotiff.reader.GeoTiffReader

val tiff = GeoTiffReader.readSingleband("path/to/file.tif")

Performing a raster operation:

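import geotrellis.raster._
import geotrellis.vector.Extent

// Build a small 2x2 raster in place of loading one from disk
val raster: Raster[Tile] = Raster(IntArrayTile(Array(1, 2, 3, 4), 2, 2), Extent(0, 0, 2, 2))
// Double every cell value
val result = raster.mapTile(tile => tile.map(cell => cell * 2))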

Creating a vector feature:

import geotrellis.vector._

val point = Point(0, 0)
val feature = Feature(point, Map("name" -> "Example Point"))

Reprojecting a raster:

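import geotrellis.proj4._
import geotrellis.raster._
import geotrellis.raster.io.geotiff.reader.GeoTiffReader

// Load a single-band GeoTIFF and take its tile plus extent as a Raster
val raster: Raster[Tile] = GeoTiffReader.readSingleband("path/to/file.tif").raster
val sourceCRS = CRS.fromEpsgCode(4326)
val targetCRS = CRS.fromEpsgCode(3857)
val reprojected = raster.reproject(sourceCRS, targetCRS)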
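
To make concrete what moving a coordinate from EPSG:4326 to EPSG:3857 involves, here is a minimal sketch in plain Scala of the standard spherical Web Mercator forward formula. It is not GeoTrellis code and does not show how GeoTrellis performs reprojection internally; the names `WebMercatorSketch` and `toWebMercator` and the sample coordinate are made up for illustration.

```scala
object WebMercatorSketch {
  // WGS84 semi-major axis in meters, used as the sphere radius by EPSG:3857.
  val EarthRadius = 6378137.0

  /** Converts longitude/latitude in degrees (EPSG:4326) to x/y in meters (EPSG:3857). */
  def toWebMercator(lonDeg: Double, latDeg: Double): (Double, Double) = {
    val x = EarthRadius * math.toRadians(lonDeg)
    val y = EarthRadius * math.log(math.tan(math.Pi / 4 + math.toRadians(latDeg) / 2))
    (x, y)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample point: longitude -75.16, latitude 39.95.
    val (x, y) = toWebMercator(-75.16, 39.95)
    println(s"x = $x m, y = $y m")
  }
}
```

Reprojecting a raster additionally requires resampling cell values onto the new grid, which this single-point sketch does not cover.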

Getting Started

To get started with GeoTrellis, add the following dependencies to your build.sbt file:

libraryDependencies ++= Seq(
  "org.locationtech.geotrellis" %% "geotrellis-raster" % "3.6.3",
  "org.locationtech.geotrellis" %% "geotrellis-vector" % "3.6.3",
  "org.locationtech.geotrellis" %% "geotrellis-spark" % "3.6.3"
)

Then, import the necessary modules in your Scala code:

import geotrellis.raster._
import geotrellis.vector._
import geotrellis.spark._

You can now start using GeoTrellis functions and classes in your project.
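
As a first program once the dependencies resolve, the following sketch builds a tiny in-memory tile and applies two cell-wise operations. It assumes only the geotrellis-raster module is on the classpath; the object name `HelloTile` and the sample values are illustrative and not taken from the GeoTrellis documentation.

```scala
import geotrellis.raster._

object HelloTile {
  // A tile with 3 columns and 2 rows, stored row by row.
  val tile: Tile = IntArrayTile(Array(1, 2, 3, 4, 5, 6), 3, 2)

  // Local operation: transform every cell independently.
  val doubled: Tile = tile.map(cell => cell * 2)

  // Cell-wise sum of two tiles with identical dimensions.
  val summed: Tile = tile.localAdd(doubled)

  def main(args: Array[String]): Unit = {
    // Cell access is (col, row), zero-based.
    println(s"doubled(2, 1) = ${doubled.get(2, 1)}")
    println(s"summed(0, 0) = ${summed.get(0, 0)}")
  }
}
```

Note that `map` is applied to every cell, so a tile containing NODATA cells may need those handled separately.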

Competitor Comparisons

rasterio

2,373

Rasterio reads and writes geospatial raster datasets

Pros of rasterio

Written in Python, making it more accessible to data scientists and GIS professionals
Simpler API and easier to get started with for basic raster operations
Better integration with other Python libraries like NumPy and scikit-image

Cons of rasterio

Less performant for large-scale distributed processing compared to GeoTrellis
Limited support for vector data operations and advanced geospatial analytics
Fewer built-in functionalities for complex geospatial workflows

Code Comparison

rasterio example:

import rasterio

with rasterio.open('example.tif') as src:
    data = src.read()
    profile = src.profile

GeoTrellis example:

import geotrellis.raster.io.geotiff.reader.GeoTiffReader

val tiff = GeoTiffReader.readSingleband("example.tif")
val tile = tiff.tile
val extent = tiff.extent

Both libraries provide methods for reading raster data, but rasterio's Python syntax is generally more concise and familiar to many data scientists. GeoTrellis, being Scala-based, offers strong typing and functional programming paradigms, which can be advantageous for complex, distributed processing tasks.

geopandas

4,839

Python tools for geographic data

Pros of GeoPandas

Easier to learn and use, especially for those familiar with pandas
Better integration with the Python data science ecosystem
More extensive documentation and community support

Cons of GeoPandas

Less performant for large-scale geospatial data processing
Limited support for distributed computing
Fewer advanced geospatial analysis capabilities

Code Comparison

GeoPandas:

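import geopandas as gpd

# Read two layers to join (file names are placeholders)
gdf1 = gpd.read_file("data.shp")
gdf2 = gpd.read_file("other.shp")

# Perform a spatial join; `predicate` replaced the deprecated `op` keyword
result = gpd.sjoin(gdf1, gdf2, how="inner", predicate="intersects")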

GeoTrellis:

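import geotrellis.shapefile.ShapeFileReader
import geotrellis.vector._

// Read two shapefiles (requires the geotrellis-shapefile module; file names are placeholders)
val features = ShapeFileReader.readMultiPolygonFeatures("data.shp")
val otherFeatures = ShapeFileReader.readMultiPolygonFeatures("other.shp")

// Naive spatial join: keep every pair of features whose geometries intersect
val joined = for (a <- features; b <- otherFeatures if a.geom.intersects(b.geom)) yield (a, b)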

GeoTrellis offers more advanced geospatial processing capabilities and better performance for large datasets, while GeoPandas provides a more user-friendly interface and better integration with the Python ecosystem. The choice between the two depends on the specific requirements of the project, such as data size, processing complexity, and the preferred programming language.

gdal

5,431

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.

Pros of GDAL

Broader support for geospatial data formats and operations
More mature and widely adopted in the geospatial community
Extensive command-line utilities for data processing

Cons of GDAL

Steeper learning curve, especially for non-GIS specialists
Less integrated with big data processing frameworks
Primarily C/C++ based, which may be less accessible for some developers

Code Comparison

GDAL (Python bindings):

from osgeo import gdal
dataset = gdal.Open("example.tif")
band = dataset.GetRasterBand(1)
data = band.ReadAsArray()

GeoTrellis (Scala):

import geotrellis.raster.io.geotiff.reader.GeoTiffReader
val tiff = GeoTiffReader.readSingleband("example.tif")
val tile = tiff.tile

GeoTrellis focuses on distributed processing of geospatial data using Scala and Apache Spark, making it well-suited for big data applications. It offers a more functional programming approach and integrates seamlessly with other Spark-based workflows.

GDAL, on the other hand, provides a comprehensive toolkit for working with geospatial data across various formats and coordinate systems. It's widely used in the GIS industry and offers bindings for multiple programming languages, making it versatile for different development environments.

shapely

4,160

Manipulation and analysis of geometric objects

Pros of Shapely

Simpler API and easier to learn for beginners
Lightweight and focused on geometric operations
Better documentation and more examples available

Cons of Shapely

Geometric operations are planar: z coordinates can be stored but are ignored in analysis
Lacks advanced geospatial analysis capabilities
No built-in support for distributed processing

Code Comparison

Shapely:

from shapely.geometry import Point, Polygon

point = Point(0, 0)
polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
is_within = point.within(polygon)

GeoTrellis:

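import geotrellis.vector._

val point = Point(0, 0)
// The exterior ring must be closed: the first and last points coincide
val polygon = Polygon(LineString(Point(0, 0), Point(1, 0), Point(1, 1), Point(0, 1), Point(0, 0)))
// false here: (0, 0) lies on the boundary, which contains does not count
val isWithin = polygon.contains(point)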

Both libraries provide similar functionality for basic geometric operations, but GeoTrellis offers more advanced features for large-scale geospatial data processing. Shapely is more accessible for Python developers and simpler projects, while GeoTrellis is better suited for complex, distributed geospatial applications in Scala.

QGIS

11,615

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)

Pros of QGIS

Comprehensive GUI for geospatial data visualization and analysis
Extensive plugin ecosystem for additional functionality
Supports a wide range of geospatial data formats and operations

Cons of QGIS

Steeper learning curve for non-GIS professionals
Performance can be slower for large datasets compared to GeoTrellis
Less suitable for distributed processing of big geospatial data

Code Comparison

QGIS (Python):

from qgis.core import QgsProject, QgsVectorLayer

layer = QgsVectorLayer("path/to/shapefile.shp", "layer_name", "ogr")
if not layer.isValid():
    print("Layer failed to load!")
else:
    QgsProject.instance().addMapLayer(layer)

GeoTrellis (Scala):

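// Assumes an implicit SparkContext in scope; bucket and key prefix are passed separately
val rdd: RDD[(ProjectedExtent, Tile)] = S3GeoTiffRDD.spatial("bucket", "key")
// Collect layer metadata, then cut the source tiles to the resulting layout
val (zoom, metadata) = rdd.collectMetadata[SpatialKey](FloatingLayoutScheme(512))
val tiled = rdd.tileToLayout(metadata.cellType, metadata.layout)
tiled.cache()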

GeoTrellis focuses on distributed processing of geospatial data using Scala and Apache Spark, making it more suitable for big data applications. QGIS, on the other hand, provides a user-friendly interface for various GIS tasks and is more accessible to users without programming experience. The choice between the two depends on the specific use case and technical requirements of the project.


README

GeoTrellis

GeoTrellis is a Scala library and framework that provides APIs for reading, writing and operating on geospatial raster and vector data. GeoTrellis also provides helpers for these same operations in Spark and for performing MapAlgebra operations on rasters. It is released under the Apache 2 License.

Please visit the project site for more information as well as some interactive demos.

You're also welcome to ask questions and talk to developers (let us know what you're working on!) via Gitter.

Getting Started

GeoTrellis is currently available for Scala 2.12 and 2.13, using Spark 3.3.x.

To get started with SBT, simply add the following to your build.sbt file:

libraryDependencies += "org.locationtech.geotrellis" %% "geotrellis-raster" % "<latest version>"

To grab the latest SNAPSHOT, RC or milestone build, add these resolvers:

// maven central snapshots
resolvers ++= Seq(
  "central-snapshots" at "https://central.sonatype.com/repository/maven-snapshots/"
)

// or eclipse snapshots
resolvers ++= Seq(
  "eclipse-releases" at "https://repo.eclipse.org/content/groups/releases",
  "eclipse-snapshots" at "https://repo.eclipse.org/content/groups/snapshots"
)

If you are just getting started with GeoTrellis, we recommend familiarizing yourself with the geotrellis-raster package, but it is just one of the many available. The complete list of published GeoTrellis packages includes:

geotrellis-accumulo: Accumulo store integration for GeoTrellis
geotrellis-accumulo-spark: Accumulo store integration for GeoTrellis + Spark
geotrellis-cassandra: Cassandra store integration for GeoTrellis
geotrellis-cassandra-spark: Cassandra store integration for GeoTrellis + Spark
geotrellis-gdal: GDAL bindings for GeoTrellis
geotrellis-geotools: Conversions to and from GeoTools Vector and Raster data
geotrellis-hbase: HBase store integration for GeoTrellis
geotrellis-hbase-spark: HBase store integration for GeoTrellis + Spark
geotrellis-layer: Datatypes to describe sets of rasters
geotrellis-macros: Performance optimizations for GeoTrellis operations
geotrellis-proj4: Coordinate Reference systems and reproject (Scala wrapper around Proj4j)
geotrellis-raster: Raster data types and operations, including MapAlgebra
geotrellis-raster-testkit: Testkit for testing geotrellis-raster types
geotrellis-s3: Amazon S3 store integration for GeoTrellis
geotrellis-s3-spark: Amazon S3 store integration for GeoTrellis + Spark
geotrellis-shapefile: Read ESRI Shapefiles into GeoTrellis data types via GeoTools
geotrellis-spark: Geospatially enables Spark and provides primitives for external data stores
geotrellis-spark-pipeline: DSL for geospatial ingest jobs using GeoTrellis + Spark
geotrellis-spark-testkit: Testkit for testing geotrellis-spark code
geotrellis-store: Abstract interfaces for storage services, with concrete implementations for local and Hadoop filesystems
geotrellis-util: Miscellaneous GeoTrellis helpers
geotrellis-vector: Vector data types and operations extending JTS
geotrellis-vector-testkit: Testkit for testing geotrellis-vector types
geotrellis-vectortile: Experimental vector tile support, including reading and writing

A more complete feature list can be found on the Module Hierarchy page of the GeoTrellis documentation. If you're looking for a specific feature or operation, we suggest searching there or reaching out on Gitter.

For older releases, check the complete list of packages and versions available at locationtech-releases.

Hello Raster

scala> import geotrellis.raster._
import geotrellis.raster._

scala> import geotrellis.raster.render.ascii._
import geotrellis.raster.render.ascii._

scala> import geotrellis.raster.mapalgebra.focal._
import geotrellis.raster.mapalgebra.focal._

scala> val nd = NODATA
nd: Int = -2147483648

scala> val input = Array[Int](
     nd, 7, 1, 1,  3, 5, 9, 8, 2,
      9, 1, 1, 2,  2, 2, 4, 3, 5,
      3, 8, 1, 3,  3, 3, 1, 2, 2,
      2, 4, 7, 1, nd, 1, 8, 4, 3)
input: Array[Int] = Array(-2147483648, 7, 1, 1, 3, 5, 9, 8, 2, 9, 1, 1, 2,
2, 2, 4, 3, 5, 3, 8, 1, 3, 3, 3, 1, 2, 2, 2, 4, 7, 1, -2147483648, 1, 8, 4, 3)

scala> val iat = IntArrayTile(input, 9, 4)  // 9 and 4 here specify columns and rows
iat: geotrellis.raster.IntArrayTile = IntArrayTile([I@278434d0,9,4)

// The renderAscii method is mostly useful when you're working with small tiles
// which can be taken in at a glance.
scala> iat.renderAscii(AsciiArtEncoder.Palette.STIPLED)
res0: String =
∘█  ▚▜██▖
█  ▖▖▖▜▚▜
▚█ ▚▚▚ ▖▖
▖▜█ ∘ █▜▚

scala> val focalNeighborhood = Square(1)  // a 3x3 square neighborhood
focalNeighborhood: geotrellis.raster.op.focal.Square =
 O  O  O
 O  O  O
 O  O  O

scala> val meanTile = iat.focalMean(focalNeighborhood)
meanTile: geotrellis.raster.Tile = DoubleArrayTile([D@7e31c125,9,4)

scala> meanTile.getDouble(0, 0)  // Should equal (1 + 7 + 9) / 3
res1: Double = 5.666666666666667
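
The arithmetic behind that last result can be reproduced in plain Scala without GeoTrellis. The sketch below assumes the focal mean averages the cells of the 3x3 window that fall inside the tile and are not NODATA, which is consistent with the comment in the session above; `FocalMeanSketch` and its members are hypothetical names, not GeoTrellis API.

```scala
object FocalMeanSketch {
  val NoData: Int = Int.MinValue // same value the REPL prints for NODATA
  val cols = 9
  val rows = 4

  // The tile from the session above, stored row by row.
  val input: Array[Int] = Array(
    NoData, 7, 1, 1, 3, 5, 9, 8, 2,
    9, 1, 1, 2, 2, 2, 4, 3, 5,
    3, 8, 1, 3, 3, 3, 1, 2, 2,
    2, 4, 7, 1, NoData, 1, 8, 4, 3)

  /** Mean of the 3x3 window around (col, row), skipping out-of-bounds and NoData cells. */
  def focalMean(col: Int, row: Int): Double = {
    val values = for {
      r <- (row - 1) to (row + 1)
      if r >= 0 && r < rows
      c <- (col - 1) to (col + 1)
      if c >= 0 && c < cols
      v = input(r * cols + c)
      if v != NoData
    } yield v
    if (values.isEmpty) Double.NaN else values.sum.toDouble / values.size
  }

  def main(args: Array[String]): Unit =
    println(focalMean(0, 0)) // 5.666666666666667, i.e. (7 + 9 + 1) / 3
}
```

At the corner (0, 0) only four cells of the window lie inside the tile, and one of them is NODATA, which leaves the three values 7, 9 and 1.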

Documentation

Documentation is available at geotrellis.io/documentation.

Scaladocs for the master branch are available here.

Further examples and documentation of GeoTrellis use-cases can be found in the docs/ folder.

Contributing

Feedback and contributions to the project, no matter what kind, are always very welcome. A CLA is required for contribution; see Contributing for more information. Please refer to the Scala style guide for formatting patches to the codebase.

Where is our commit history and contributor list prior to Nov 2016?

The entire old history is available in the _old/master branch.

Why?

In November 2016, GeoTrellis moved its repository from the GeoTrellis GitHub organization to its current home in the LocationTech GitHub organization. In the process of moving our repository, we went through an IP review process. Because the Eclipse Foundation only reviews a snapshot of the repository, and not all of its history, we had to start from a clean master branch.

Unfortunately, we lost our commit and contributor count in the move. These are significant statistics for a repository, and our current counts make us look younger than we are. GeoTrellis has been an open source project since 2011. This is what our contributor and commit count looked like before the move to LocationTech:

Commit and contributor count before LocationTech move

Along with counts, we want to make sure that all the awesome people who contributed to GeoTrellis before the LocationTech move can still be credited on a contributors page. For posterity, I will leave the following contributors page as it was before the move:

https://github.com/lossyrob/geotrellis-before-locationtech/graphs/contributors

Tie Local History to Old History

You can also tie your local clone's master history to the old history by running

> git fetch origin refs/replace/*:refs/replace/*

if origin points to https://github.com/locationtech/geotrellis. This will allow you to see the old history for commands like git log.
