Geolocation: Difference between revisions

Latest revision as of 12:23, 10 August 2023

The Maria GDK Geolocation service allows fast, faceted free-text searches for partial or full placenames. General placename searches are supported, as well as street address searches.

Source data must first be converted to a specialized SQLite database in a separate step. Converters exist for GNS and Geonames. For simple formats (i.e. CSV-based ones), writing a new converter is relatively straightforward.

Converting placename data

When converting a file containing placename information to a Maria GDK location database, you need to use the LocationServiceSqliteLoader conversion tool with a reader for that specific file format. See the section #Creating readers for location data for details.

A special use case arises when you have already created an SQLite file with the proper data tables in another application (for instance FME). In this case only the RTree and FTS tables are missing. The tool will create them for you if you give the sqlite file as input and use the input format geoloc.

Readers are executed from the command line. Use the options -?, --? or -help to display the available parameters. Usage: <input file> <input format> <sqlite database file> [/clear] [/nogrid]

Order	Parameter	Description
1	`<input file>`	Path to the file containing placename information. Must be in a format supported by the LocationServiceSqliteLoader.
2	`<input format>`	Input format string. Currently supported: `ssr`, `geonames`, `gns`, `gns_us`, `matrikkel` and `geoloc`.
3	`<sqlite database file>`	Path to the output sqlite file.
4	`[/clear]`	Optional argument. If the output file already exists, placename data is added to it. Use `/clear` to force a fresh database.
5	`[/nogrid]`	Optional argument. If set, no density grid file is created. By default, a density grid file is built from the output data to help with performance.

Example:

TPG.GeoFramework.LocationServiceSqliteLoader.exe c:\LocationData\GNS\no.txt gns c:\ServiceTest\LocationData\gns_no.location.sqlite /clear

When adding to an existing database, tables might end up with duplicate entries.
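The command line above can also be assembled from a script, for instance when several source files are converted in a batch. The following is a minimal Python sketch; the helper name `build_loader_command` is hypothetical, and only the executable name, argument order and flags documented above are used.

```python
# Hypothetical helper: builds the argument list for one loader run.
# Executable name, argument order and flags are taken from the usage text above.
LOADER_EXE = "TPG.GeoFramework.LocationServiceSqliteLoader.exe"


def build_loader_command(input_file, input_format, database_file,
                         clear=False, nogrid=False):
    """Return the loader command as a list suitable for subprocess.run()."""
    command = [LOADER_EXE, input_file, input_format, database_file]
    if clear:
        command.append("/clear")   # force a fresh database
    if nogrid:
        command.append("/nogrid")  # skip creation of the density grid file
    return command


command = build_loader_command(
    r"c:\LocationData\GNS\no.txt",
    "gns",
    r"c:\ServiceTest\LocationData\gns_no.location.sqlite",
    clear=True,
)
print(" ".join(command))
```

On a machine where the tool is installed, the list can be passed to `subprocess.run(command, check=True)`. Passing `clear=True` avoids the duplicate entries mentioned above when a source file is converted repeatedly.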

Support data

The different readers use a small number of csv/txt files to map information in the sqlite databases (for example, mapping country codes (NO) to country names (Norway)). If these files are missing when a database is converted, an error is reported, but the conversion will still work in some cases. The files should be placed in the same folder as the source data files.

Default data files can be found in the Maria GDK source repo at \Src\Layers\Location\TPG.GeoFramework.GeolocSqliteLoader\SupportData

Feature classes and codes

Dsg-files should map feature codes and feature classes to descriptive texts.

The file should contain comma-separated strings in the format code,name,text,fea_class, e.g. 'E,"mosque","a building for public Islamic worship","S - Spot Features",'. The first three entries are the feature code, the feature name and the feature code description. The last entry (fea_class) combines the feature class code and the feature class text, separated by '-'.

Example:

CODE,NAME,TEXT,FEA_CLASS,
"MND","mound(s)","a low, isolated, rounded hill","T - Hypsographic",
"MNDU","undersea mound","a low, isolated, rounded hill","U - Undersea",
"MNFE","iron mine(s)","a mine where iron ore is extracted","S - Spot Features",
"MNMT","monument","a commemorative structure or statue","S - Spot Features",
"MNQ","abandoned mine","abandoned mine","S - Spot Features",

Used by:

Reader	Filename
gns	dsg.csv
geonames	dsg.csv
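To illustrate the dsg format described above, the following Python sketch parses a few rows and splits the fea_class entry into class code and class text. It is not the loader's own parser; the function name `read_dsg` and the dictionary layout are assumptions made for this example.

```python
import csv
import io

# Two rows copied from the dsg example above.
SAMPLE_DSG = '''CODE,NAME,TEXT,FEA_CLASS,
"MND","mound(s)","a low, isolated, rounded hill","T - Hypsographic",
"MNDU","undersea mound","a low, isolated, rounded hill","U - Undersea",
'''


def read_dsg(text):
    """Map feature code -> name, description, class code and class text."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    codes = {}
    for row in reader:
        if len(row) < 4:
            continue  # ignore blank or incomplete lines
        # The trailing comma yields an empty fifth field, which is dropped here.
        code, name, description, fea_class = row[:4]
        class_code, _, class_text = fea_class.partition("-")
        codes[code] = {
            "name": name,
            "description": description,
            "class_code": class_code.strip(),
            "class_text": class_text.strip(),
        }
    return codes


dsg = read_dsg(SAMPLE_DSG)
print(dsg["MNDU"])
```

Using a CSV parser rather than a plain split on ',' matters here, because descriptions such as "a low, isolated, rounded hill" contain commas inside the quoted field.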

Country codes

cc-files should map country codes to country names.
The file should contain comma-separated strings in the format code,name, e.g. 'ZI,"Zimbabwe",'.

Example:

CODE,NAME,
UV,"Burkina Faso",
CM,"Cameroon",
CJ,"Cayman Islands",
IP,"Clipperton Island",
CG,"Congo, Democratic Republic of the",
CY,"Cyprus",
DR,"Dominican Republic",

Used by:

Reader	Filename
gns	cc.csv
gns_us	cc.csv
geonames	cc.csv
ssr	cc.csv

Administrative data

Adm-files should map administrative codes (numbers and/or letters) to descriptive text.
Use Adm1 for "Administrative division level 1, US state, Norwegian fylke", Adm2 for "Administrative division level 2, US county, Norwegian kommune" and Adm3 for "Administrative division level 3, US ?, Norwegian poststed".
Files should contain comma-separated strings in the format code,name, e.g. '1662,"Klæbu kommune",'.

Example:

CODE,NAME,
1662,"Klæbu kommune",
0604,"Kongsberg kommune",
0402,"Kongsvinger kommune",
0815,"Kragerø kommune",
1001,"Kristiansand kommune",

Used by:

Reader	Filename
gns	See Adm data for gns reader
matrikkel	matrikkel_adm.csv

Adm data for gns reader

The file should contain comma-separated strings in the format adm1_cd,country_cd,adm1_name, e.g. '"04","NO","Buskerud",'.

Example:

ADM1,COUNTRY_CD,ADM1_NAME,
"04","QA","Al Khawr wa adh Dhakhīrah",
"34","WA","Okavango",
"35","AF","Laghmān",
"35","AG","Aïn Defla",
"35","BF","San Salvador",

Used by:

Reader	Filename
gns	gns_adm1.csv
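As the example rows show, the same adm1 code (35) occurs for several countries, so a lookup needs both the country code and the adm1 code as key. A minimal Python sketch of such a lookup follows; the function name `read_gns_adm1` is hypothetical and the snippet is not the gns reader's actual implementation.

```python
import csv
import io

# Rows copied from the gns_adm1 example above.
SAMPLE_ADM1 = '''ADM1,COUNTRY_CD,ADM1_NAME,
"34","WA","Okavango",
"35","AG","Aïn Defla",
"35","BF","San Salvador",
'''


def read_gns_adm1(text):
    """Map (country code, adm1 code) -> adm1 name."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    names = {}
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or incomplete lines
        adm1_cd, country_cd, adm1_name = row[:3]
        names[(country_cd, adm1_cd)] = adm1_name
    return names


adm1_names = read_gns_adm1(SAMPLE_ADM1)
print(adm1_names[("AG", "35")])
```

When reading a real file instead of the inline sample, the file has to be opened with its actual text encoding; a wrong encoding shows up as garbled diacritics in names such as "Aïn Defla".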

Navnetyper

Navnetyper.txt should map feature classes and feature codes to descriptive texts.

The file should contain comma-separated strings in the format classcode,classtext,code,codetext, followed by a description of the code as in the examples, e.g. 'FC1,"Terrengformer",N1,"Berg","Mindre fjell"'.

Example:

FC6,"Samferdsel",N165,"Landingsstripe","Landingsplass for privat flygning"
FC6,"Samferdsel",N232,"Plass/torg","I tettsted eller by"
FC7,"Administrative områder",N180,"Nasjon","Selvstendig stat / land (offisielt navn)"
FC7,"Administrative områder",N181,"Fylke","Offisielt navn"
FC7,"Administrative områder",N182,"Kommune","Offisielt navn"

Used by:

Reader	Filename
ssr	navnetyper.txt

Creating readers for location data

A Maria2012 GeoLoc placename data reader must be able to convert files with placename information to a sqlite database readable by Maria2012. Teleplan Globe provides readers for, among others, GNS, GeoNames and SSR.

Readers should implement the interface IPlaceNameDataInterfacer.

Each reader must implement the functions CreateTables and LoadData. CreateTables is responsible for creating the tables used by the GeoLoc service when running placename searches. Mandatory tables are the RTree table for spatial searches, the FTS table for free-text search and the main table holding the available placename information. The GeoLoc service also uses the following tables when they are available: feature class, feature code, country code and metadata. LoadData reads data from the source files into the database tables.

IPlaceNameDB provides helper functions for database creation.

Mandatory tables

Main table

The main table placenames_main should contain all placename information extracted from a datasource. Use the placename_alt column for alternative versions of the main placename data if available. Country code, feature code/class and administrative information columns are used for facets and metadata searches.

Column	Type	Description
lat	real	Latitude, WGS84 decimal degrees
long	real	Longitude, WGS84 decimal degrees
feature_class	text	Feature class based on raw data, e.g. 'H' for hydrography for GNS or 'Samferdsel' for SSR
feature_code	text	Feature code, unique code
cc1	text	Primary country code
cc2	text	Secondary country codes, comma separated
placename	text	Placename (reading order with diacritics)
placename_alt	text	Alternate spellings of placename, comma separated
adm1	text	Administrative division level 1, US state, Norwegian fylke
adm2	text	Administrative division level 2, US county, Norwegian kommune
adm3	text	Administrative division level 3, US ?, Norwegian poststed
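The column list above translates directly into a table definition. The following Python sketch creates the main table in an in-memory SQLite database and inserts one row. The exact DDL used by the Maria GDK readers is not shown in this document, so constraints and indexes are left out, and the inserted row is made up for illustration.

```python
import sqlite3

# Table and column names are taken from the table above; everything else
# (no constraints, no indexes) is an assumption for this sketch.
MAIN_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS placenames_main (
    lat           REAL,
    long          REAL,
    feature_class TEXT,
    feature_code  TEXT,
    cc1           TEXT,
    cc2           TEXT,
    placename     TEXT,
    placename_alt TEXT,
    adm1          TEXT,
    adm2          TEXT,
    adm3          TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(MAIN_TABLE_SQL)

# Illustrative row only; coordinates and codes are not taken from a real dataset.
conn.execute(
    "INSERT INTO placenames_main (lat, long, cc1, placename, adm2) "
    "VALUES (?, ?, ?, ?, ?)",
    (59.67, 9.65, "NO", "Kongsberg", "0604"),
)

columns = [row[1] for row in conn.execute("PRAGMA table_info(placenames_main)")]
print(columns)
```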

FTS table

The Maria GDK location service uses the FTS4 extension in sqlite to create a table (placenames_fts) with a built-in full-text index. This index allows us to efficiently query the database for all rows that contain one or more words/tokens.
All strings meant to be searchable must be added to the FTS table. Using tagged metadata gives more exact searches. For example, searching for <placename> cc:NO will return placenames whose country code metadata is "NO", while <placename> NO will match all metadata containing "NO".

Columns	Description	Type
names	searchable placenames	text
meta_tagged	searchable tagged metadata	text
meta_untagged	searchable untagged metadata	text
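A minimal Python sketch of an FTS4 table with these three columns is given below. It assumes that the SQLite library used by Python was built with FTS4 support, which is common but not guaranteed. The rows are made up, and meta_tagged is left empty because the encoding of tagged values is not described in this document. The sketch only shows standard FTS4 behaviour: a term can be matched against all columns or against a single column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE placenames_fts "
    "USING fts4(names, meta_tagged, meta_untagged)"
)

# Made-up rows: (names, meta_tagged, meta_untagged).
conn.executemany(
    "INSERT INTO placenames_fts (names, meta_tagged, meta_untagged) VALUES (?, ?, ?)",
    [
        ("Nore", "", "NO Norway"),
        ("Sunde", "", "NO Norway"),
        ("Norberg", "", "SW Sweden"),
    ],
)

# Matching against the table name searches every column, so the term "no"
# is found through the untagged metadata of both Norwegian rows.
any_column = [r[0] for r in conn.execute(
    "SELECT names FROM placenames_fts WHERE placenames_fts MATCH 'no' ORDER BY names")]

# Matching against one column restricts the search to that column.
names_only = [r[0] for r in conn.execute(
    "SELECT names FROM placenames_fts WHERE names MATCH 'nor*' ORDER BY names")]

print(any_column, names_only)
```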

RTree table

The RTree table (spatial_index) is a mandatory table used for spatial searches. The Maria GDK location database rtree table should have a primary key and two column pairs representing the minimum and maximum values (bounding box) for a 2-dimensional object.

Columns	Description	Type
id	primary key	integer
minlat	minimum latitude	float
maxlat	maximum latitude	float
minlong	minimum longitude	float
maxlong	maximum longitude	float
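The following Python sketch creates an R*Tree table with the columns listed above and runs a bounding-box query. It assumes an SQLite build with the R*Tree module enabled. The coordinates are made up, and each placename is stored as a degenerate box (minimum equal to maximum), which is an assumption about how point data is indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE spatial_index "
    "USING rtree(id, minlat, maxlat, minlong, maxlong)"
)

# Made-up points: (id, lat, long). A point is stored as a zero-size box.
points = [(1, 59.67, 9.65), (2, 63.43, 10.40), (3, 58.15, 8.00)]
conn.executemany(
    "INSERT INTO spatial_index VALUES (?, ?, ?, ?, ?)",
    [(pid, lat, lat, lon, lon) for pid, lat, lon in points],
)


def ids_in_box(conn, south, north, west, east):
    """Return ids of entries that lie completely inside the given box."""
    rows = conn.execute(
        "SELECT id FROM spatial_index "
        "WHERE minlat >= ? AND maxlat <= ? AND minlong >= ? AND maxlong <= ? "
        "ORDER BY id",
        (south, north, west, east),
    )
    return [row[0] for row in rows]


print(ids_in_box(conn, 58.0, 60.0, 7.0, 11.0))
```

The id column is the natural link back to the row in the main table, so a spatial hit can be joined with its placename information.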

Optional tables

Feature class table

Optional table (fclass) containing feature class/theme information extracted from datasource.
Example feature classes can be 'H' for hydrography features in GNS or 'Samferdsel' for Norwegian SSR data.

Columns	Description	Type
code	feature class	text
name	feature name	text
desc	description of feature class entry	text

Feature code table

Optional table (fcode) containing feature code/object type information extracted from datasource.
Example features can be 'lake' in GNS.

Columns	Description	Type
code	feature code	text
name	feature name	text
desc	description of feature code entry	text

Country code table

Optional table (cc) listing all country codes found in the datasource, e.g. "NO"/"Norway".

Columns	Description	Type
code	country code (primary key)	char(2)
name	country name	text

Administration code tables

Optional tables (admin1, admin2, admin3) used for collecting administrative information, e.g. state, county or fylke.

Columns	Description	Type
code	administration code	text
name	administration name	text

Metadata table

Optional table (metadata) used for additional data collected from the datasource (e.g. data source producer or source dataset).

Columns	Description	Type
name	metadata name	text
value	metadata value	text
desc	description of metadata entry	text

Service

The geolocation service is a WCF-service that provides functionality for placename searches via a sqlite/FTS database.

The following service interfaces are provided:

Name	Description
ILocationService	Placename search

WCF basic http binding is used by default.

Locating databases

The Geolocation service searches for available placename databases in the default folder provided at installation. Additional datasources can be set up in the settings file.

Supported readers

Under construction.

The following formats are supported by readers for creating Geolocation databases: ssr, geonames, gns, gns_us, matrikkel, ssr_sosi and geoloc.

Geolocation reader (geoloc)

The Geolocation reader (under development) takes an incomplete geolocation database and adds the support tables necessary for it to become a complete Maria GDK Geolocation database.

Input should be a sqlite database containing at least the main table described above (placenames_main). The Admin1, Admin2 and Admin3 support tables will be constructed if the input database contains adm1, adm2 and/or adm3 information. Other optional support tables (feature code, feature class, country codes and metadata) are not included in the first version of the Geolocation reader.

Performing placename searches

Searches can include multiple terms separated by commas or spaces. All terms must be found for an entry to match. The wildcard character '*' is permitted, but only at the end of a string/substring. The service will automatically try to add '*' to the end of a submitted search string, unless the string already contains a wildcard character.

Example: 
Say the database contains the placenames Bred Sund, Sunde, Sundnes and Brandal på Sunnmøre. 

Searching for "sun" will be expanded by the service to "sun*" and return 4 matches. 
Searching for "sund" will be expanded to "sund*" and return 3 matches (Bred Sund, Sunde and Sundnes). 
Searching for "sun* br*" will return 2 matches (Bred Sund and Brandal på Sunnmøre). 
Searching for "sun* br" will NOT be expanded by the service since it already contains a wildcard, and return 0 matches.
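The four searches above can be reproduced with a plain FTS4 table. The Python sketch below is not the service's implementation: `expand_wildcard` is a hypothetical helper that appends '*' to the end of the whole search string when it contains no wildcard and is at least two characters long, and the table only has the names column. FTS4 support in the SQLite build is assumed.

```python
import sqlite3


def expand_wildcard(search):
    """Append '*' unless the string is too short or already has a wildcard."""
    if "*" in search or len(search) < 2:
        return search
    return search + "*"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE placenames_fts USING fts4(names)")
conn.executemany(
    "INSERT INTO placenames_fts (names) VALUES (?)",
    [("Bred Sund",), ("Sunde",), ("Sundnes",), ("Brandal på Sunnmøre",)],
)


def search(conn, text):
    """Run a placename search; all terms must match (implicit AND in FTS4)."""
    rows = conn.execute(
        "SELECT names FROM placenames_fts WHERE names MATCH ? ORDER BY names",
        (expand_wildcard(text),),
    )
    return [row[0] for row in rows]


for text in ("sun", "sund", "sun* br*", "sun* br"):
    print(text, "->", search(conn, text))
```

The last search returns nothing because "br" is then an exact term, and no placename contains the word "br" on its own.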

The location service looks for matches in the FTS table contained in the sqlite database. This table contains placenames as well as tagged and untagged metadata (e.g. country codes). Using tags allows more specific searches. For example, searching for all Norwegian schools with placenames starting with "sund" can be done like this:

sund cc:NO fclass:school

Returned placename matches are ranked by distance from the center position of the map. When searching without wildcards, all exact matches are returned first (ranked by distance), followed by all non-exact matches (also ranked by distance). When searching with wildcards, results are ranked by distance only.

Example: 
The database contains the placenames Sandelva, Sande Skole, Sande, Sanden and Sande Stadion (listed by distance from the center, i.e. Sandelva is closest to the center and Sande Stadion furthest away).

Searching for "Sande" will return Sande Skole, Sande, Sande Stadion, Sandelva, Sanden.
Searching for "Sande*" will return Sandelva, Sande Skole, Sande, Sanden and Sande Stadion.
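The ranking rule can be sketched in a few lines of Python. The sketch assumes that a hit counts as an exact match when one of its words equals the search term, which is inferred from the example (Sande Skole is ranked as an exact match for "Sande"). The distances are made up and simply follow the order in which the five placenames are listed.

```python
def rank_hits(hits, search):
    """Order hits as described above.

    hits is a list of (placename, distance) pairs. Without a wildcard, hits
    containing the search term as a whole word come first; within each group
    the hits are ordered by distance.
    """
    if "*" in search:
        ordered = sorted(hits, key=lambda hit: hit[1])
    else:
        term = search.lower()
        ordered = sorted(
            hits,
            key=lambda hit: (term not in hit[0].lower().split(), hit[1]),
        )
    return [name for name, _ in ordered]


# Made-up distances that follow the listing order of the example.
hits = [
    ("Sandelva", 1.0),
    ("Sande Skole", 2.0),
    ("Sande", 3.0),
    ("Sanden", 4.0),
    ("Sande Stadion", 5.0),
]

print(rank_hits(hits, "Sande"))
print(rank_hits(hits, "Sande*"))
```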

Setting up placename queries

The PlaceNameQuery data contract specifies the query parameters needed when performing a placename search.

Example query:

var query = new PlaceNameQuery
{
  AutoAddLocationWildcard = true,
  CenterPos = new GeoPos(60.0, 10.0),
  ExtractFacets = true,
  MaxInternalHits = 10000,
  MaxReturnHits = 10,
  NameSearchString = "sand"
};

Searching for "Sand" with AutoAddLocationWildcard set to true will, for example, return both "Sand" and "Sandvika", while AutoAddLocationWildcard set to false requires an exact match and only returns "Sand". The string must be at least 2 characters long and must not already contain wildcards for AutoAddLocationWildcard to take effect. Wildcard characters inside a search substring are not allowed.

CenterPos influences how the search results are ranked. Results are sorted by increasing distance from CenterPos, with exact matches listed first when searching without wildcards.

If ExtractFacets is true, facets are extracted from the search results. Use the Facets parameter to control which facets should be extracted.

The NameSearchString is the primary placename text search string. Terms separated by spaces and commas must all be found. The wildcard '*' is allowed. Use metadata to narrow down the search results by adding tagged values of the form tag:value or tag=value (:norway or cc:no or cc=no). Use "-" to exclude terms from the search.
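How such a search string breaks down into plain terms, tagged values and excluded terms can be sketched as follows. This Python parser is purely illustrative: the function name `parse_search`, the result layout and the handling of edge cases are assumptions, and the service's own parsing rules may differ.

```python
import re


def parse_search(text):
    """Split a search string into plain terms, tagged values and exclusions."""
    terms, tagged, excluded = [], [], []
    # Terms are separated by spaces and/or commas.
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])  # "-term" excludes a term
            continue
        match = re.match(r"^([^:=]*)[:=](.+)$", token)
        if match:
            # tag:value and tag=value are treated the same way
            tagged.append((match.group(1).lower(), match.group(2)))
        else:
            terms.append(token)
    return {"terms": terms, "tagged": tagged, "excluded": excluded}


print(parse_search("sund cc:NO fclass=school -bru"))
```

The excluded term "bru" in the usage line is a made-up example value.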

Use MaxReturnHits to limit the number of results returned from a query, and ResultOffset to page through the results.

For example, with MaxReturnHits set to 10 a query returns at most 10 results. If the total number of hits found on the server is 25 and ResultOffset is set to 10, the query returns entries 11-20. If ResultOffset is set to 20, the query returns only the last five entries (21-25).
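The paging arithmetic corresponds to a simple slice over the ranked hit list. The Python sketch below is only meant to make the numbers in the example concrete; the function name `page` is hypothetical and the hits are represented by their 1-based position.

```python
def page(hits, max_return_hits, result_offset=0):
    """Return at most max_return_hits entries, starting after result_offset."""
    return hits[result_offset:result_offset + max_return_hits]


# 25 hits found on the server, numbered 1..25.
hits = list(range(1, 26))

print(page(hits, 10))      # first page
print(page(hits, 10, 10))  # second page
print(page(hits, 10, 20))  # last, shorter page
```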

There is an upper limit of 20000 on the number of hits a search can return. If a search exceeds this limit, no matches are returned.

Note: If MaxReturnHits is undefined, no matches are returned.

Using facets

A search query will return a list of facets related to the search query results. Facets are used to narrow down the search results and perform more specific searches, e.g. only display results related to Administrative Regions in Norway.

The reader for the input data decides how data is mapped to facets. Facet groups with single entries are ignored.

There are six categories of facets available:

Category	Description
Feature Class	Feature class based on raw data, ex 'H' for hydrography for GNS or 'Samferdsel' for SSR.
Feature Code	Feature code, unique code.
CC/Country	Primary country code.
Adm1	Administrative division level 1, US state, Norwegian fylke.
Adm2	Administrative division level 2, US county, Norwegian kommune.
Adm3	Administrative division level 3, US ?, Norwegian poststed
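Facet extraction can be pictured as counting the distinct values of each category over the search results. The Python sketch below is an illustration only: it uses the main table column names as facet fields, and it interprets "facet groups with single entries are ignored" as dropping a category when all hits share one value, since such a facet cannot narrow the result. Both choices are assumptions, and the hits are made up.

```python
from collections import Counter

# One field per facet category; main table column names are used here.
FACET_FIELDS = ("feature_class", "feature_code", "cc1", "adm1", "adm2", "adm3")


def extract_facets(hits):
    """Count values per facet field, skipping fields with a single value."""
    facets = {}
    for field in FACET_FIELDS:
        counts = Counter(hit[field] for hit in hits if hit.get(field))
        if len(counts) > 1:
            facets[field] = dict(counts)
    return facets


# Made-up search results.
hits = [
    {"placename": "Sande", "cc1": "NO", "feature_class": "P"},
    {"placename": "Sande Skole", "cc1": "NO", "feature_class": "S"},
    {"placename": "Sande Stadion", "cc1": "NO", "feature_class": "S"},
]

print(extract_facets(hits))
```

In this example only the feature class facet is returned. All hits share the country code "NO", so the country facet would not help to narrow the search.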

