Producing GeoPackages with massivegeopackage: Difference between revisions

From Maria GDK Wiki
Jump to navigation Jump to search
()
()
Line 34: Line 34:
<pre>python -m pip show massivegeopackage</pre>
<pre>python -m pip show massivegeopackage</pre>


=== Arguments ===
=== Common arguments ===
The following arguments are common for both raster and elevation modes.
The following arguments are common for both raster and elevation modes.
{| class="wikitable"
{| class="wikitable"
Line 75: Line 75:
|}
|}


==== Raster mode ====
=== Raster mode ===
Input files should be homogenous (same projection, dimensions, bands, pixel size).
Input files should be homogenous (same projection, dimensions, bands, pixel size).


Line 108: Line 108:
|}
|}


=== elevation ===
=== Elevation mode ===
Input files should be homogenous (same projection, dimensions, bands, pixel size).
Input files should be homogenous (same projection, dimensions, bands, pixel size).



Revision as of 20:40, 3 October 2022

Massivegeopackage is a Python package for building a GeoPackage from very large raster or elevation datasets.

The core GDAL tools gdalwarp, gdal_translate and gdaladdo seem to struggle with mosaicing very large raster datasets. When the size of the source dataset is larger than 15-20 GB, processing time seems to increase drastically.

This Python package will convert each source file to individual GeoPackages with a common tile matrix. This is done in a user configurable number of parallel processes. Completed files will immediately be queued to be merged into a base GeoPackage. The merging is very efficient because it consists of data copy - no geoprocessing.

Dependencies

  • numpy
  • Pillow
  • GDAL >= 3.1.4

The easiest way to get a Python environment where these dependencies are met, is to install QGIS 3.16 or newer. Included is the OSGeo4W shell, where you can install and use the package.

Installation

The package is distributed as a Python wheel-file. Install using pip:

python -m pip install c:\path\massivegeopackage-1.3.0-py3-none-any.whl

Usage

From version 1.4.0, input arguments can be supplied on the command line. Run the module with --help to list all arguments. The first argument should be raster or elevation depending on your source data.

Examples:

python -m massivegeopackage --help

python -m massivegeopackage raster --srcfolder c:\data\geotiff --targetfolder c:\data\gpkg_output

python -m massivegeopackage elevation --srcfolder c:\data\50_dtm --areapath c:\norway\counties --targetfolder c:\data\gpkg_output --targetdatatype 32

For version information, use this command:

python -m pip show massivegeopackage

Common arguments

The following arguments are common for both raster and elevation modes.

Argument Description
{raster,elevation} Mode for output data. Use raster for 8 bit output. Use elevation for 16 or 32 bit output.
-s, --srcfolder Full path to a folder containing source raster files.
-a, --areapath Full path to a vector dataset. Each layer in the dataset should contain a single polygon feature. One clipped GeoPackge will be created for each layer.
-al --arealayer Name of a single layer from the vector area dataset.
-s_epsg, --source_epsg EPSG code of source if this information is not embedded in the input dataset.
-t, --targetfolder Empty folder for temporary files, logs and the completed GeoPackage.
-ft, --input_file_type File extension on input files. Default is tif
-resamp, --resampling Resampling method to use when creating overviews
-p, --num_processes Number of parallel processes to use. Most common storage devices will become a bottleneck with 20 or more parallel processes.
-r, --recursive Search for source files in subfolders
--name Name of the output GeoPackage file.
--safe Use SQLite WAL mode when merging files. When processing data on file systems which are prone to interruptions (e.g. network file systems), the script will roll back and retry a failed merge. This will increase the total processing time.
--debug Log debug messages in addition to info
--nocleanup Do not clean up temporary GeoPackage files

Raster mode

Input files should be homogenous (same projection, dimensions, bands, pixel size).

Supported input:

Format Band configuration
Any raster format supported by GDAL 1 band color index (8 bit)
1 band grayscale (8 bit)
3 band RGB (24 bit)
4 band RGBA (32 bit)

Raster specific arguments:

Argument Description
-nd, --nodata Pixel value to make transparent in the completed GeoPackage. Example: 0 0 0
-co, --creation_options Comma-separated list of creation options for the GDAL GeoPackage driver. Example: tile_format=jpeg,quality=50
-scl, --scalefactor Factor to use when it is neccessary to scale down to 8 bit. The range of values to use from each source band is calculated as (mean ± stddev × scalefactor).

Elevation mode

Input files should be homogenous (same projection, dimensions, bands, pixel size).

If the input files are Float32 and target is set to 16 bit, a scale and offset will be computed for each tile. These are applied to each pixel and then rounded to the nearest integer. This stretches the tile's value range to utilize the full range of a 16 bit unsigned integer (0-65534). The scale and offset is reversed when an application reads from the output file. This way, an effective precision of around 0.01 - 0.001 meters is achieved (less varied source data results in higher precision).

Supported input and corresponding output:

Format Input configuration Output
Any raster format supported by GDAL 1 band (Int16) 1 band (UInt16)
1 band (UInt16) 1 band (UInt16)
1 band (Float32) 1 band (UInt16 or Float32)

Elevation specific arguments:

Parameter Description
-dt,--targetdatatype {16,32} Datatype for output GeoPackage - 32 or 16 bits. If source is Float32 and target is set to 16 bit, the value range in each tile will be streched to utilize the entire space that a 16 bit unsigned integer provides (0-65535). A scale factor and offset will be computed, which will be applied when reading from the file output file. This way, an effective precision of around 0.01 - 0.001 meters is achieved (less varied source data results in higher precision).

Known issues

Network file systems

Using a network folder as output can sometimes result in errors, especially "Disk I/O Error". Massivegeopackage needs to do a lot of reading and writing to SQLite files, and will consequently be vulnerable to these errors. From the web page of the SQLite project:

SQLite depends on the underlying filesystem to do locking as the documentation says it will. But some filesystems contain bugs in their locking logic such that the locks do not always behave as advertised. This is especially true of network filesystems and NFS in particular. If SQLite is used on a filesystem where the locking primitives contain bugs, and if two or more threads or processes try to access the same database at the same time, then database corruption might result.

This can be mitigated by using the --safe argument, but it is no guarantee against errors. Processing data on a (preferably fast) local disk is recommended.

Script freezes

When running the script in a command window and execution appears to stop for no apparent reason, this can be caused by the command window "Quick Edit" feature. If text in the command window is selected with the left mouse button, the execution will pause. Right-clicking inside the command window will resume the script.