geotrellis
GeoTrellis is a geographic data processing engine for high performance applications.
Top Related Projects
Rasterio reads and writes geospatial raster datasets
Python tools for geographic data
GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
Manipulation and analysis of geometric objects
QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
Quick Overview
GeoTrellis is an open-source, distributed geographic data processing engine for high-performance applications. It is designed to work with large-scale geospatial data, providing fast and efficient processing capabilities for raster and vector datasets. GeoTrellis is built on top of Apache Spark and is written in Scala.
Pros
- High-performance processing of large-scale geospatial data
- Seamless integration with Apache Spark for distributed computing
- Supports both raster and vector data operations
- Extensive set of geospatial operations and algorithms
Cons
- Steep learning curve, especially for those unfamiliar with Scala
- Limited documentation and examples for some advanced features
- Requires significant computational resources for large datasets
- Smaller community compared to some other geospatial libraries
Code Examples
- Reading a GeoTIFF file:
import geotrellis.raster.io.geotiff.reader.GeoTiffReader
val tiff = GeoTiffReader.readSingleband("path/to/file.tif")
- Performing a raster operation:
import geotrellis.raster._
val raster: Raster[Tile] = ??? // ... load raster
// mapIfSet applies the function only to data cells, leaving NODATA cells untouched
val result = raster.mapTile(tile => tile.mapIfSet(cell => cell * 2))
- Creating a vector feature:
import geotrellis.vector._
val point = Point(0, 0)
val feature = Feature(point, Map("name" -> "Example Point"))
- Reprojecting a raster:
import geotrellis.proj4._
import geotrellis.raster._
val raster: Raster[Tile] = ??? // ... load raster
val sourceCRS = CRS.fromEpsgCode(4326) // WGS84 geographic coordinates
val targetCRS = CRS.fromEpsgCode(3857) // Web Mercator
val reprojected = raster.reproject(sourceCRS, targetCRS)
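Reprojection ultimately applies a coordinate transform to point and cell locations. As a minimal sketch of what the EPSG:4326 to EPSG:3857 step in the last example computes, here is the spherical Web Mercator forward formula in plain Scala with no GeoTrellis dependency. The object and method names are hypothetical; in GeoTrellis the transform is handled by the geotrellis-proj4 module (a wrapper around Proj4J), and raster reprojection additionally resamples cell values.

```scala
// Hypothetical plain-Scala sketch of the EPSG:4326 -> EPSG:3857 forward
// projection for a single point. Not GeoTrellis code.
object WebMercatorSketch {
  // WGS84 semi-major axis in metres, used as the sphere radius by EPSG:3857
  val R: Double = 6378137.0

  // Usual Web Mercator latitude limit; beyond it y grows without bound
  val MaxLat: Double = 85.05112878

  /** Converts (lon, lat) in degrees to (x, y) in metres. */
  def forward(lonDeg: Double, latDeg: Double): (Double, Double) = {
    val lat = math.max(-MaxLat, math.min(MaxLat, latDeg))
    val x = R * math.toRadians(lonDeg)
    val y = R * math.log(math.tan(math.Pi / 4 + math.toRadians(lat) / 2))
    (x, y)
  }
}
```

With this sketch, `forward(180.0, 0.0)` yields x of about 20037508.34 m, the half-width of the square Web Mercator world.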
Getting Started
To get started with GeoTrellis, add the following dependencies to your build.sbt file:
libraryDependencies ++= Seq(
"org.locationtech.geotrellis" %% "geotrellis-raster" % "3.6.3",
"org.locationtech.geotrellis" %% "geotrellis-vector" % "3.6.3",
"org.locationtech.geotrellis" %% "geotrellis-spark" % "3.6.3"
)
Then, import the necessary modules in your Scala code:
import geotrellis.raster._
import geotrellis.vector._
import geotrellis.spark._
You can now start using GeoTrellis functions and classes in your project.
Competitor Comparisons
Rasterio reads and writes geospatial raster datasets
Pros of rasterio
- Written in Python, making it more accessible to data scientists and GIS professionals
- Simpler API and easier to get started with for basic raster operations
- Better integration with other Python libraries like NumPy and scikit-image
Cons of rasterio
- Less performant for large-scale distributed processing compared to GeoTrellis
- Limited support for vector data operations and advanced geospatial analytics
- Fewer built-in functionalities for complex geospatial workflows
Code Comparison
rasterio example:
import rasterio
with rasterio.open('example.tif') as src:
    data = src.read()
    profile = src.profile
GeoTrellis example:
import geotrellis.raster.io.geotiff.reader.GeoTiffReader
val tiff = GeoTiffReader.readSingleband("example.tif")
val tile = tiff.tile
val extent = tiff.extent
Both libraries provide methods for reading raster data, but rasterio's Python syntax is generally more concise and familiar to many data scientists. GeoTrellis, being Scala-based, offers strong typing and functional programming paradigms, which can be advantageous for complex, distributed processing tasks.
Python tools for geographic data
Pros of GeoPandas
- Easier to learn and use, especially for those familiar with pandas
- Better integration with the Python data science ecosystem
- More extensive documentation and community support
Cons of GeoPandas
- Less performant for large-scale geospatial data processing
- Limited support for distributed computing
- Fewer advanced geospatial analysis capabilities
Code Comparison
GeoPandas:
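import geopandas as gpd
# Read two shapefiles ("other.shp" is a placeholder for a second layer)
gdf1 = gpd.read_file("data.shp")
gdf2 = gpd.read_file("other.shp")
# Perform a spatial join (GeoPandas >= 0.10 uses predicate= in place of the deprecated op=)
result = gpd.sjoin(gdf1, gdf2, how="inner", predicate="intersects")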
GeoTrellis:
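import geotrellis.vector._
import geotrellis.shapefile.ShapeFileReader // from the geotrellis-shapefile module
// Read two shapefiles ("other.shp" is a placeholder for a second layer)
val features = ShapeFileReader.readMultiPolygonFeatures("data.shp")
val otherFeatures = ShapeFileReader.readMultiPolygonFeatures("other.shp")
// Naive spatial join: keep every pair of features whose geometries intersect
val joined = for {
  a <- features
  b <- otherFeatures
  if a.geom.intersects(b.geom)
} yield (a, b)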
GeoTrellis offers more advanced geospatial processing capabilities and better performance for large datasets, while GeoPandas provides a more user-friendly interface and better integration with the Python ecosystem. The choice between the two depends on the specific requirements of the project, such as data size, processing complexity, and the preferred programming language.
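Spatial joins are commonly split into a cheap filter step on bounding boxes and an exact refine step on the geometries. The sketch below shows only the filter step in plain Scala, using a nested loop for clarity. The `BBox` type, the `join` function and the string ids are all hypothetical; a production join would add a spatial index (for example an R-tree) and an exact predicate such as `intersects`.

```scala
// Hypothetical sketch of the bounding-box filter step of a spatial join.
object BBoxJoinSketch {
  final case class BBox(xmin: Double, ymin: Double, xmax: Double, ymax: Double) {
    // Boxes that merely touch along an edge count as intersecting
    def intersects(o: BBox): Boolean =
      xmin <= o.xmax && o.xmin <= xmax && ymin <= o.ymax && o.ymin <= ymax
  }

  /** Returns (leftId, rightId) for every pair of overlapping boxes, in O(n * m). */
  def join(left: Seq[(String, BBox)], right: Seq[(String, BBox)]): Seq[(String, String)] =
    for {
      (lid, lbox) <- left
      (rid, rbox) <- right
      if lbox.intersects(rbox)
    } yield (lid, rid)
}
```

Candidate pairs from this step may still fail the exact geometry test, which is why the refine step is needed.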
GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
Pros of GDAL
- Broader support for geospatial data formats and operations
- More mature and widely adopted in the geospatial community
- Extensive command-line utilities for data processing
Cons of GDAL
- Steeper learning curve, especially for non-GIS specialists
- Less integrated with big data processing frameworks
- Primarily C/C++ based, which may be less accessible for some developers
Code Comparison
GDAL (Python bindings):
from osgeo import gdal
dataset = gdal.Open("example.tif")
band = dataset.GetRasterBand(1)
data = band.ReadAsArray()
GeoTrellis (Scala):
import geotrellis.raster.io.geotiff.reader.GeoTiffReader
val tiff = GeoTiffReader.readSingleband("example.tif")
val tile = tiff.tile
GeoTrellis focuses on distributed processing of geospatial data using Scala and Apache Spark, making it well-suited for big data applications. It offers a more functional programming approach and integrates seamlessly with other Spark-based workflows.
GDAL, on the other hand, provides a comprehensive toolkit for working with geospatial data across various formats and coordinate systems. It's widely used in the GIS industry and offers bindings for multiple programming languages, making it versatile for different development environments.
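The distributed model GeoTrellis inherits from Spark relies on computing a partial result per tile and merging the partials with an associative operation, so tiles can be processed on different executors. Here is a plain-Scala sketch of that pattern (no Spark) for a NODATA-aware global mean. Tiles are modelled as `Array[Int]` using GeoTrellis's Int NODATA value (`Int.MinValue`); the `Stats` type and function names are hypothetical.

```scala
// Hypothetical sketch of the per-tile partial-aggregate pattern.
object TileStatsSketch {
  val NoData: Int = Int.MinValue

  final case class Stats(sum: Long, count: Long) {
    // Associative and commutative, so partial results can be merged in any order
    def merge(o: Stats): Stats = Stats(sum + o.sum, count + o.count)
    def mean: Option[Double] = if (count == 0) None else Some(sum.toDouble / count)
  }

  val empty: Stats = Stats(0L, 0L)

  /** Summarises one tile, skipping NODATA cells. */
  def summarise(tile: Array[Int]): Stats =
    tile.foldLeft(empty) { (acc, v) =>
      if (v == NoData) acc else Stats(acc.sum + v, acc.count + 1)
    }

  /** Map each tile to Stats, then reduce; Spark's RDD.aggregate has the same shape. */
  def globalMean(tiles: Seq[Array[Int]]): Option[Double] =
    tiles.map(summarise).foldLeft(empty)(_ merge _).mean
}
```

Averaging per-tile means directly would weight tiles with few data cells too heavily; carrying (sum, count) pairs avoids that.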
Manipulation and analysis of geometric objects
Pros of Shapely
- Simpler API and easier to learn for beginners
- Lightweight and focused on geometric operations
- Better documentation and more examples available
Cons of Shapely
- Operations are planar (2D); Z coordinates are stored but ignored in geometric analysis
- Lacks advanced geospatial analysis capabilities
- No built-in support for distributed processing
Code Comparison
Shapely:
from shapely.geometry import Point, Polygon
point = Point(0, 0)
polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
is_within = point.within(polygon)
GeoTrellis:
import geotrellis.vector._
val point = Point(0, 0)
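// The exterior ring must be closed: the first point is repeated at the end
val polygon = Polygon(Point(0, 0), Point(1, 0), Point(1, 1), Point(0, 1), Point(0, 0))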
val isWithin = polygon.contains(point)
Both libraries provide similar functionality for basic geometric operations, but GeoTrellis offers more advanced features for large-scale geospatial data processing. Shapely is more accessible for Python developers and simpler projects, while GeoTrellis is better suited for complex, distributed geospatial applications in Scala.
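For intuition about what a containment test does, here is a simplified even-odd (ray-casting) point-in-polygon check in plain Scala. This is not how JTS (behind geotrellis.vector) or GEOS (behind Shapely) implement `contains`/`within`: those use robust predicates and handle boundary points explicitly, whereas in this sketch a point exactly on an edge may go either way. The object name is hypothetical.

```scala
// Hypothetical even-odd ray-casting sketch; boundary points are not handled robustly.
object PointInPolygonSketch {
  /** ring: polygon vertices in order; the closing vertex may be omitted. */
  def contains(ring: Seq[(Double, Double)], px: Double, py: Double): Boolean = {
    val n = ring.length
    var inside = false
    var j = n - 1
    for (i <- 0 until n) {
      val (xi, yi) = ring(i)
      val (xj, yj) = ring(j)
      // Does the horizontal ray from (px, py) towards +x cross edge (j -> i)?
      // The first test short-circuits, so horizontal edges never divide by zero.
      val crosses = (yi > py) != (yj > py) &&
        px < (xj - xi) * (py - yi) / (yj - yi) + xi
      if (crosses) inside = !inside
      j = i
    }
    inside
  }
}
```

Each crossing toggles the inside/outside state, so an odd number of crossings means the point is inside.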
QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
Pros of QGIS
- Comprehensive GUI for geospatial data visualization and analysis
- Extensive plugin ecosystem for additional functionality
- Supports a wide range of geospatial data formats and operations
Cons of QGIS
- Steeper learning curve for non-GIS professionals
- Performance can be slower for large datasets compared to GeoTrellis
- Less suitable for distributed processing of big geospatial data
Code Comparison
QGIS (Python):
from qgis.core import QgsProject, QgsVectorLayer
layer = QgsVectorLayer("path/to/shapefile.shp", "layer_name", "ogr")
if not layer.isValid():
    print("Layer failed to load!")
QgsProject.instance().addMapLayer(layer)
GeoTrellis (Scala):
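// Assumes an implicit SparkContext plus the geotrellis raster, vector, layer, spark and spark.store.s3 imports
val rdd: RDD[(ProjectedExtent, Tile)] = S3GeoTiffRDD.spatial("bucket", "key")
// collectMetadata returns the zoom level and the layer metadata, not the tiled layer itself
val (zoom, metadata) = rdd.collectMetadata[SpatialKey](FloatingLayoutScheme(512))
val tiled = rdd.tileToLayout[SpatialKey](metadata)
tiled.cache()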
GeoTrellis focuses on distributed processing of geospatial data using Scala and Apache Spark, making it more suitable for big data applications. QGIS, on the other hand, provides a user-friendly interface for various GIS tasks and is more accessible to users without programming experience. The choice between the two depends on the specific use case and technical requirements of the project.
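Distributed raster processing starts by cutting imagery into fixed-size tiles, each identified by a grid key, which is the idea behind layout schemes such as the 512-pixel `FloatingLayoutScheme` in the GeoTrellis snippet above. The plain-Scala sketch below computes such a grid; `TileKey` and `Layout` are hypothetical stand-ins, not GeoTrellis's `SpatialKey` or `LayoutDefinition`.

```scala
// Hypothetical sketch of a fixed-size tile layout over a raster grid.
object TilingSketch {
  final case class TileKey(col: Int, row: Int)

  final case class Layout(cols: Int, rows: Int, tileSize: Int) {
    // Ceiling division: a partial tile at the right or bottom edge still gets a key
    val layoutCols: Int = (cols + tileSize - 1) / tileSize
    val layoutRows: Int = (rows + tileSize - 1) / tileSize

    /** Key of the tile containing pixel (col, row). */
    def keyFor(col: Int, row: Int): TileKey = TileKey(col / tileSize, row / tileSize)

    /** Pixel offset of (col, row) inside its tile. */
    def offsetIn(col: Int, row: Int): (Int, Int) = (col % tileSize, row % tileSize)

    /** All tile keys, row by row. */
    def keys: Seq[TileKey] =
      for { r <- 0 until layoutRows; c <- 0 until layoutCols } yield TileKey(c, r)
  }
}
```

For a 1000 x 700 pixel raster with 512-pixel tiles, this gives a 2 x 2 grid of keys, with the right and bottom tiles only partially filled.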
README
GeoTrellis
GeoTrellis is a Scala library and framework that provides APIs for reading, writing and operating on geospatial raster and vector data. GeoTrellis also provides helpers for these same operations in Spark and for performing MapAlgebra operations on rasters. It is released under the Apache 2 License.
Please visit the project site for more information as well as some interactive demos.
You're also welcome to ask questions and talk to developers (let us know what you're working on!) via Gitter.
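To make the "MapAlgebra operations on rasters" mentioned above concrete, here is a plain-Scala sketch of a local operation: two equally sized tiles combined cell by cell, with NODATA propagated. Tiles are modelled as `Array[Int]` using GeoTrellis's Int NODATA value (`Int.MinValue`); the object and function names are hypothetical and this illustrates the concept, not GeoTrellis's implementation.

```scala
// Hypothetical sketch of NODATA-aware local map algebra on Int cells.
object LocalOpSketch {
  val NoData: Int = Int.MinValue

  /** Applies f to each pair of cells; any NODATA input yields NODATA. */
  def combine(a: Array[Int], b: Array[Int])(f: (Int, Int) => Int): Array[Int] = {
    require(a.length == b.length, "tiles must have the same dimensions")
    a.indices.map { i =>
      if (a(i) == NoData || b(i) == NoData) NoData else f(a(i), b(i))
    }.toArray
  }

  /** Cell-wise map that leaves NODATA untouched (Int.MinValue * 2 would overflow to 0). */
  def mapSkippingNoData(a: Array[Int])(f: Int => Int): Array[Int] =
    a.map(v => if (v == NoData) NoData else f(v))
}
```

Skipping NODATA explicitly matters because arithmetic on the sentinel value can silently produce ordinary-looking numbers.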
Getting Started
GeoTrellis is currently available for Scala 2.12 and 2.13, using Spark 3.3.x.
To get started with SBT, simply add the following to your build.sbt file:
libraryDependencies += "org.locationtech.geotrellis" %% "geotrellis-raster" % "<latest version>"
To grab the latest SNAPSHOT, RC or milestone build, add these resolvers:
// maven central snapshots
resolvers ++= Seq(
"central-snapshots" at "https://central.sonatype.com/repository/maven-snapshots/"
)
// or eclipse snapshots
resolvers ++= Seq(
"eclipse-releases" at "https://repo.eclipse.org/content/groups/releases",
"eclipse-snapshots" at "https://repo.eclipse.org/content/groups/snapshots"
)
If you are just getting started with GeoTrellis, we recommend familiarizing yourself with the geotrellis-raster package, but it is just one of the many available. The complete list of published GeoTrellis packages includes:
- geotrellis-accumulo: Accumulo store integration for GeoTrellis
- geotrellis-accumulo-spark: Accumulo store integration for GeoTrellis + Spark
- geotrellis-cassandra: Cassandra store integration for GeoTrellis
- geotrellis-cassandra-spark: Cassandra store integration for GeoTrellis + Spark
- geotrellis-gdal: GDAL bindings for GeoTrellis
- geotrellis-geotools: Conversions to and from GeoTools Vector and Raster data
- geotrellis-hbase: HBase store integration for GeoTrellis
- geotrellis-hbase-spark: HBase store integration for GeoTrellis + Spark
- geotrellis-layer: Datatypes to describe sets of rasters
- geotrellis-macros: Performance optimizations for GeoTrellis operations
- geotrellis-proj4: Coordinate reference systems and reprojection (Scala wrapper around Proj4j)
- geotrellis-raster: Raster data types and operations, including MapAlgebra
- geotrellis-raster-testkit: Testkit for testing geotrellis-raster types
- geotrellis-s3: Amazon S3 store integration for GeoTrellis
- geotrellis-s3-spark: Amazon S3 store integration for GeoTrellis + Spark
- geotrellis-shapefile: Read ESRI Shapefiles into GeoTrellis data types via GeoTools
- geotrellis-spark: Geospatially enables Spark and provides primitives for external data stores
- geotrellis-spark-pipeline: DSL for geospatial ingest jobs using GeoTrellis + Spark
- geotrellis-spark-testkit: Testkit for testing geotrellis-spark code
- geotrellis-store: Abstract interfaces for storage services, with concrete implementations for local and Hadoop filesystems
- geotrellis-util: Miscellaneous GeoTrellis helpers
- geotrellis-vector: Vector data types and operations extending JTS
- geotrellis-vector-testkit: Testkit for testing geotrellis-vector types
- geotrellis-vectortile: Experimental vector tile support, including reading and writing
A more complete feature list can be found on the Module Hierarchy page of the GeoTrellis documentation. If you're looking for a specific feature or operation, we suggest searching there or reaching out on Gitter.
For older releases, check the complete list of packages and versions available at locationtech-releases.
Hello Raster
scala> import geotrellis.raster._
import geotrellis.raster._
scala> import geotrellis.raster.render.ascii._
import geotrellis.raster.render.ascii._
scala> import geotrellis.raster.mapalgebra.focal._
import geotrellis.raster.mapalgebra.focal._
scala> val nd = NODATA
nd: Int = -2147483648
scala> val input = Array[Int](
nd, 7, 1, 1, 3, 5, 9, 8, 2,
9, 1, 1, 2, 2, 2, 4, 3, 5,
3, 8, 1, 3, 3, 3, 1, 2, 2,
2, 4, 7, 1, nd, 1, 8, 4, 3)
input: Array[Int] = Array(-2147483648, 7, 1, 1, 3, 5, 9, 8, 2, 9, 1, 1, 2,
2, 2, 4, 3, 5, 3, 8, 1, 3, 3, 3, 1, 2, 2, 2, 4, 7, 1, -2147483648, 1, 8, 4, 3)
scala> val iat = IntArrayTile(input, 9, 4) // 9 and 4 here specify columns and rows
iat: geotrellis.raster.IntArrayTile = IntArrayTile([I@278434d0,9,4)
// The renderAscii method is mostly useful when you're working with small tiles
// which can be taken in at a glance.
scala> iat.renderAscii(AsciiArtEncoder.Palette.STIPLED)
res0: String =
[stippled ASCII-art rendering of the 9 x 4 tile]
scala> val focalNeighborhood = Square(1) // a 3x3 square neighborhood
focalNeighborhood: geotrellis.raster.mapalgebra.focal.Square =
O O O
O O O
O O O
scala> val meanTile = iat.focalMean(focalNeighborhood)
meanTile: geotrellis.raster.Tile = DoubleArrayTile([D@7e31c125,9,4)
scala> meanTile.getDouble(0, 0) // Should equal (1 + 7 + 9) / 3
res1: Double = 5.666666666666667
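The value checked above comes from averaging the non-NODATA cells of the 3x3 neighbourhood that fit inside the tile. Here is a plain-Scala sketch of that computation, assuming a row-major layout as in `IntArrayTile(input, 9, 4)`, with out-of-grid cells ignored and NODATA skipped. It mirrors the idea of `iat.focalMean(Square(1))` and is not GeoTrellis's implementation; the object name is hypothetical.

```scala
// Hypothetical sketch of a NODATA-aware focal mean over a square neighbourhood.
object FocalMeanSketch {
  val NoData: Int = Int.MinValue

  def focalMean(cells: Array[Int], cols: Int, rows: Int, radius: Int = 1): Array[Double] = {
    val out = Array.fill(cols * rows)(Double.NaN)
    for (row <- 0 until rows; col <- 0 until cols) {
      var sum = 0.0
      var count = 0
      // Clamp the neighbourhood to the grid edges
      for {
        r <- math.max(0, row - radius) to math.min(rows - 1, row + radius)
        c <- math.max(0, col - radius) to math.min(cols - 1, col + radius)
      } {
        val v = cells(r * cols + c)
        if (v != NoData) { sum += v; count += 1 }
      }
      // A neighbourhood without data cells stays NaN (GeoTrellis's Double NODATA)
      if (count > 0) out(row * cols + col) = sum / count
    }
    out
  }
}
```

Applied to the `input` array from the session, cell (0, 0) sees the values 7, 9 and 1 (its own cell is NODATA), so the sketch yields 17 / 3, matching `res1`.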
Documentation
Documentation is available at geotrellis.io/documentation.
Scaladocs for the master branch are available here.
Further examples and documentation of GeoTrellis use-cases can be found in the docs/ folder.
Contributing
Feedback and contributions to the project, no matter what kind, are always very welcome. A CLA is required for contribution, see Contributing for more information. Please refer to the Scala style guide for formatting patches to the codebase.
Where is our commit history and contributor list prior to Nov 2016?
The entire old history is available in the _old/master branch.
Why?
In November 2016, GeoTrellis moved its repository from the
GeoTrellis GitHub Organization to its current
home in the LocationTech GitHub organization.
In the process of moving our repository, we went through an IP review process.
Because the Eclipse foundation only reviews a snapshot of the repository, and
not all of history, we had to start from a clean master branch.
Unfortunately, we lost our commit and contributor counts in the move. These are significant statistics for a repository, and our current counts make us look younger than we are. GeoTrellis has been an open source project since 2011. This is what our contributor and commit counts looked like before the move to LocationTech:
Along with counts, we want to make sure that all the awesome people who contributed to GeoTrellis before the LocationTech move can still be credited on a contributors page. For posterity, the following contributors page is preserved as it was before the move:
https://github.com/lossyrob/geotrellis-before-locationtech/graphs/contributors
Tie Local History to Old History
You can also tie your local clone's master history to the old history by running
> git fetch origin refs/replace/*:refs/replace/*
if origin points to https://github.com/locationtech/geotrellis.
This will allow you to see the old history for commands like git log.