Producing GeoPackages with massivegeopackage: Difference between revisions

From Maria GDK Wiki
Line 2: Line 2:
'''Massivegeopackage''' is a Python package for building a GeoPackage from very large raster or elevation datasets.  
'''Massivegeopackage''' is a Python package for building a GeoPackage from very large raster or elevation datasets.  


The core GDAL tools <code>gdalwarp</code>, <code>gdal_translate</code> and <code>gdaladdo</code> seem to struggle with mosaicing very large raster datasets. When the size of the source dataset is larger than 15-20 GB, processing time seems to increase drastically.  
The core GDAL tools '''''gdalwarp''''', '''''gdal_translate''''' and '''''gdaladdo''''' seem to struggle with mosaicing very large raster datasets. When the size of the source dataset is larger than 15-20 GB, processing time seems to increase drastically.  


This Python package will convert each source file to individual GeoPackages with a common tile matrix. This is done in a user configurable number of parallel processes. Completed files will immediately be queued to be merged into a base GeoPackage. The merging is very efficient because it consists of SQL <code>INSERT</code> statements only - no geoprocessing.
This Python package will convert each source file to individual GeoPackages with a common tile matrix. This is done in a user configurable number of parallel processes. Completed files will immediately be queued to be merged into a base GeoPackage. The merging is very efficient because it consists of SQL <code>INSERT</code> statements only - no geoprocessing.
Line 14: Line 14:


== Installation ==
== Installation ==
The package is distributed as a Python wheel-file. Install it using pip:
The package is distributed as a Python wheel-file. Install using pip:


<pre>python -m pip install c:\path\massivegeopackage-1.3.0-py3-none-any.whl</pre>
<pre>python -m pip install c:\path\massivegeopackage-1.3.0-py3-none-any.whl</pre>
== General usage ==
== Usage ==
The package consists of two separate modules: <code>raster</code> and <code>elevation</code>. After installation, the modules can be run with the commands:
From version 1.4.0, input arguments can be supplied on the command line. Run the module with `--help` to list all arguments. The first argument should be <pre>raster</pre> or <pre>elevation</pre> depending on your source data.


<pre>python -m massivegeopackage.raster
Examples:
 
python -m massivegeopackage.elevation</pre>
 
For version information, use either of these commands:


<pre>
<pre>
python -m pip show massivegeopackage
python -m massivegeopackage --help
 
python -m massivegeopackage
</pre>
From version 1.4.0, parameters can be supplied on the command line. Run either of the modules with <code>--help</code> to list all parameters. If no parameters are given at the command line, the program will use a series of input prompts instead.


Examples:
python -m massivegeopackage raster --srcfolder c:\data\geotiff --targetfolder c:\data\gpkg_output
<pre>
python -m massivegeopackage.raster -srcfolder c:\data\geotiff -targetfolder c:\data\gpkg_output -co tile_format=png8 -recursive


python -m massivegeopackage.elevation -srcfolder c:\data\50_dtm -areapath c:\norway\counties -targetfolder c:\data\gpkg_output -targetdatatype 32
python -m massivegeopackage elevation --srcfolder c:\data\50_dtm --areapath c:\norway\counties --targetfolder c:\data\gpkg_output --targetdatatype 32
</pre>
</pre>


If the package cannot be installed for whatever reason, it is also possible to unzip the .whl file, and then run the script files directly:
For version information, use this command:


<pre>
<pre>python -m pip show massivegeopackage</pre>
python c:\massivegeopackage-1.3.0-py3-none-any\massivegeopackage\raster.py
 
python c:\massivegeopackage-1.3.0-py3-none-any\massivegeopackage\elevation.py
</pre>


== massivegeopackage.raster ==
== raster ==
Input files should be homogenous (same projection, dimensions, bands, pixel size).
Input files should be homogenous (same projection, dimensions, bands, pixel size).


Line 93: Line 79:
|}
|}


== massivegeopackage.elevation ==
== elevation ==
Input files should be homogenous (same projection, dimensions, bands, pixel size).
Input files should be homogenous (same projection, dimensions, bands, pixel size).



Revision as of 21:01, 15 March 2022

Massivegeopackage is a Python package for building a GeoPackage from very large raster or elevation datasets.

The core GDAL tools gdalwarp, gdal_translate and gdaladdo seem to struggle when mosaicking very large raster datasets. Once the source dataset exceeds roughly 15-20 GB, processing time seems to increase drastically.

This Python package converts each source file to an individual GeoPackage, all sharing a common tile matrix. The conversion runs in a user-configurable number of parallel processes. Each completed file is immediately queued to be merged into a base GeoPackage. The merging is very efficient because it consists of SQL INSERT statements only, with no geoprocessing.
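
The merge step described above can be sketched with Python's built-in sqlite3 module, since a GeoPackage is an SQLite database. This is a minimal illustration, not the package's actual code: the table name "tiles", the function names, and the assumption that no two source files produce the same tile are all hypothetical. The column layout follows the tile pyramid table defined by the GeoPackage standard.

```python
import sqlite3

TILE_TABLE = "tiles"  # hypothetical table name, chosen for illustration


def create_tile_table(conn, table=TILE_TABLE):
    """Create a tile pyramid table with the standard GeoPackage columns."""
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" ('
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "zoom_level INTEGER NOT NULL, "
        "tile_column INTEGER NOT NULL, "
        "tile_row INTEGER NOT NULL, "
        "tile_data BLOB NOT NULL, "
        "UNIQUE (zoom_level, tile_column, tile_row))"
    )


def merge_tiles(base_path, part_path, table=TILE_TABLE):
    """Copy every tile from part_path into base_path using INSERT only."""
    conn = sqlite3.connect(base_path)
    try:
        conn.execute("ATTACH DATABASE ? AS part", (part_path,))
        with conn:  # commits the INSERT on success, rolls back on error
            cur = conn.execute(
                f'INSERT INTO main."{table}" '
                "(zoom_level, tile_column, tile_row, tile_data) "
                "SELECT zoom_level, tile_column, tile_row, tile_data "
                f'FROM part."{table}"'
            )
        copied = cur.rowcount
        conn.execute("DETACH DATABASE part")
        return copied
    finally:
        conn.close()
```

Because all files share one tile matrix, a tile keeps its zoom_level, tile_column and tile_row when copied, so no resampling or reprojection is needed during the merge.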

Dependencies

  • numpy
  • Pillow
  • GDAL >= 3.1.4

The easiest way to get a Python environment that meets these dependencies is to install QGIS 3.16 or newer. It includes the OSGeo4W shell, in which you can install and use the package.
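
To confirm that an environment meets the GDAL requirement before installing, a small check like the following may help. It is only a sketch: the helper names are made up for this example, and it assumes the GDAL Python bindings are importable as osgeo, which is the case in the OSGeo4W shell.

```python
import re

MIN_GDAL = (3, 1, 4)  # minimum version listed under Dependencies


def parse_version(text):
    """Turn a release string such as '3.4.1' into a tuple of ints."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def gdal_is_new_enough(release_name, minimum=MIN_GDAL):
    """Return True if the given GDAL release string meets the minimum."""
    return parse_version(release_name) >= minimum


if __name__ == "__main__":
    try:
        from osgeo import gdal
    except ImportError:
        print("GDAL Python bindings not found")
    else:
        release = gdal.VersionInfo("RELEASE_NAME")
        print(release, "OK" if gdal_is_new_enough(release) else "too old")
```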

Installation

The package is distributed as a Python wheel file. Install it using pip:

python -m pip install c:\path\massivegeopackage-1.3.0-py3-none-any.whl

Usage

From version 1.4.0, input arguments can be supplied on the command line. Run the module with `--help` to list all arguments. The first argument should be `raster` or `elevation`, depending on your source data.

Examples:

python -m massivegeopackage --help

python -m massivegeopackage raster --srcfolder c:\data\geotiff --targetfolder c:\data\gpkg_output

python -m massivegeopackage elevation --srcfolder c:\data\50_dtm --areapath c:\norway\counties --targetfolder c:\data\gpkg_output --targetdatatype 32
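
When the tool is driven from another Python script, the same command lines can be assembled as an argument list. The helper below is hypothetical and not part of the package. It only uses the flags shown in the examples above and does not cover switches that take no value.

```python
import sys


def build_command(mode, srcfolder, targetfolder, **options):
    """Build an argument list for 'python -m massivegeopackage'."""
    if mode not in ("raster", "elevation"):
        raise ValueError("mode must be 'raster' or 'elevation'")
    cmd = [
        sys.executable, "-m", "massivegeopackage", mode,
        "--srcfolder", srcfolder,
        "--targetfolder", targetfolder,
    ]
    # Extra keyword arguments become '--name value' pairs.
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "elevation",
        r"c:\data\50_dtm",
        r"c:\data\gpkg_output",
        areapath=r"c:\norway\counties",
        targetdatatype=32,
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to start the conversion.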

For version information, use this command:

python -m pip show massivegeopackage
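
The installed version can also be read from within Python through the standard importlib.metadata module (Python 3.8 or newer). The function name here is only illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="massivegeopackage"):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```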

raster

Input files should be homogeneous (same projection, dimensions, bands, pixel size).

Supported input:

Format                               Band configuration
Any raster format supported by GDAL  1 band color index (8 bit)
                                     1 band grayscale (8 bit)
                                     3 band RGB (24 bit)
                                     4 band RGBA (32 bit)

Parameters:

Parameter        Required?  Description
srcfolder        Yes        Full path to a folder containing source raster files.
areapath         No         Full path to a vector dataset. Each layer in the dataset should contain a single polygon feature. One clipped GeoPackage will be created for each layer.
targetfolder     Yes        Empty folder for temporary files, logs and the completed GeoPackage.
input_file_type  No         File extension of the input files. Default is tif.
nodata           No         Pixel value to make transparent in the completed GeoPackage. Example: 0 0 0
co               No         Creation options for the GDAL GeoPackage driver. Example: tile_format=jpeg,quality=50
num_processes    No         Number of parallel processes to use. Most common storage devices will become a bottleneck with 20 or more parallel processes.
recursive        No         Search for source files in subfolders.
debug            No         Log debug messages in addition to info messages.
nocleanup        No         Do not clean up temporary GeoPackage files.
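
The co value is a comma-separated list of key=value pairs. How the package forwards it to GDAL is not documented here, so the following is only an assumed illustration of splitting such a string into the list form that GDAL's Python API accepts for creation options. The function name is hypothetical.

```python
def parse_creation_options(co):
    """Split 'key=value,key=value' into a list of 'key=value' strings."""
    options = []
    for item in co.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing commas
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        options.append(f"{key.strip()}={value.strip()}")
    return options


if __name__ == "__main__":
    print(parse_creation_options("tile_format=jpeg,quality=50"))
```

A list of this shape can be passed as the creationOptions argument of gdal.TranslateOptions.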

elevation

Input files should be homogeneous (same projection, dimensions, bands, pixel size).

If the input files are Float32 and the target data type is set to 16 bit, a scale and an offset are computed for each tile. They are applied to each pixel value, and the result is rounded to the nearest integer. This stretches the tile's value range across the full range used for 16 bit unsigned integers (0-65534). An application reading the output file reverses the scale and offset. This gives an effective precision of around 0.01 - 0.001 meters (less varied source data results in higher precision).
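
The per-tile scale and offset described above can be illustrated with numpy. This is a sketch of the general technique under stated assumptions, not the package's implementation: the function names are invented, the offset is taken to be the tile minimum, and a reader is assumed to recover elevations as stored * scale + offset.

```python
import numpy as np

UINT16_MAX_USED = 65534  # upper end of the stored range given above


def quantize_tile(tile):
    """Map a float tile onto 0..65534 and return (stored, scale, offset)."""
    tile = np.asarray(tile, dtype=np.float64)
    lo = float(tile.min())
    hi = float(tile.max())
    offset = lo
    # A flat tile has no range to stretch; use a scale of 1 in that case.
    scale = (hi - lo) / UINT16_MAX_USED if hi > lo else 1.0
    stored = np.rint((tile - offset) / scale).astype(np.uint16)
    return stored, scale, offset


def dequantize_tile(stored, scale, offset):
    """Reverse the scale and offset, as a reading application would."""
    return stored.astype(np.float64) * scale + offset


if __name__ == "__main__":
    tile = np.array([[100.0, 250.5], [431.27, 755.34]], dtype=np.float32)
    stored, scale, offset = quantize_tile(tile)
    restored = dequantize_tile(stored, scale, offset)
    print("step size (m):", scale)
    print("max error (m):", float(np.abs(restored - tile).max()))
```

The step size equals the tile's value range divided by 65534, so a tile spanning about 655 meters is stored in steps of about 0.01 meters, and a tile spanning about 65 meters in steps of about 0.001 meters. The rounding error is at most half a step.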

Supported input and corresponding output:

Format                               Input configuration  Output
Any raster format supported by GDAL  1 band (Int16)       1 band (UInt16)
                                     1 band (UInt16)      1 band (UInt16)
                                     1 band (Float32)     1 band (UInt16 or Float32)

Parameters:

Parameter        Required?  Description
srcfolder        Yes        Full path to a folder containing source raster files.
areapath         No         Full path to a vector dataset. Each layer in the dataset should contain a single polygon feature. One clipped GeoPackage will be created for each layer.
targetfolder     Yes        Empty folder for temporary files, logs and the completed GeoPackage.
input_file_type  No         File extension of the input files. Default is tif.
targetdatatype   No         Data type for the output GeoPackage: 32 or 16 bits.
num_processes    No         Number of parallel processes to use. Most common storage devices will become a bottleneck with 20 or more parallel processes.
recursive        No         Search for source files in subfolders.
debug            No         Log debug messages in addition to info messages.
nocleanup        No         Do not clean up temporary GeoPackage files.