Geolocation

From Maria GDK Wiki

The Maria GDK Geolocation service allows fast, faceted free-text searches for partial or full placenames. General placename searches are supported, as well as street address searches.

A separate conversion step is required to convert source data into a specialized SQLite database. Converters exist for GNS and Geonames. For simple formats (i.e. CSV-based ones), writing a new converter is relatively easy.

Converting placename data

When converting a file containing placename information to a Maria GDK location database, you need to use the LocationServiceSqliteLoader conversion tool with a reader for that specific file format. See the section "Creating readers for location data" for details.

A special use case is when you have already created an SQLite file with the proper data tables in another application (for instance FME). In this case only the RTree and FTS tables are missing. The tool creates them for you if you use the SQLite file as input with the input format geoloc.

Readers are executed from the command line. Use the options -?, --? or -help to display the available parameters.

Usage: <input file> <input format> <sqlite database file> [/clear] [/nogrid]

Order  Parameter          Description
1      <input file>       Path to the file containing placename information. Must be in a format supported by the LocationServiceSqliteLoader.
2      <input format>     Input format string. Currently supported: ssr, geonames, gns, gns_us, matrikkel and geoloc.
3      <sqlite database>  Path to the output SQLite file.
4      [/clear]           Optional argument. If the output file already exists, placename data is added to it. Use /clear to force a fresh database.
5      [/nogrid]          Optional argument. If set, no density grid file is created. Otherwise a density grid file is created from the output data to help with performance.

Example:

TPG.GeoFramework.LocationServiceSqliteLoader.exe c:\LocationData\GNS\no.txt gns c:\ServiceTest\LocationData\gns_no.location.sqlite /clear

Note: When adding to an existing database, tables might end up with duplicate entries.
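
When several source files go into one database, the loader can be scripted. The following is a minimal Python sketch, not part of Maria GDK: it only assembles the documented positional arguments and the /clear and /nogrid switches. The helper names build_loader_command and convert_all are hypothetical, and the sketch assumes the loader executable is on the PATH and that the source files end in .txt.

```python
import subprocess
from pathlib import Path

# Executable name taken from the example above; assumed to be on the PATH.
LOADER_EXE = "TPG.GeoFramework.LocationServiceSqliteLoader.exe"


def build_loader_command(input_file, input_format, database, clear=False, nogrid=False):
    """Return the argument list in the documented positional order."""
    args = [LOADER_EXE, str(input_file), input_format, str(database)]
    if clear:
        args.append("/clear")
    if nogrid:
        args.append("/nogrid")
    return args


def convert_all(source_dir, input_format, database):
    """Hypothetical batch helper: load every .txt file into one database."""
    for index, source in enumerate(sorted(Path(source_dir).glob("*.txt"))):
        # Only the first run passes /clear; later runs add to the same file.
        command = build_loader_command(source, input_format, database, clear=(index == 0))
        subprocess.run(command, check=True)


print(build_loader_command("no.txt", "gns", "gns_no.location.sqlite", clear=True))
```

Passing /clear only on the first run starts from a fresh database while still letting later files be added to it.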

Support data

The different readers use a small number of csv/txt files to map information in the SQLite databases, for example mapping country codes (NO) to country names (Norway). If these files are not present when converting a database, an error is reported, but the conversion will still work in some cases. The files should be placed in the same folder as the source data files.

Default data files can be found in the Maria GDK source repo at \Src\Layers\Location\TPG.GeoFramework.GeolocSqliteLoader\SupportData

Feature classes and codes

Dsg-files should map feature codes and feature classes to descriptive texts.

Each line should be a comma-separated record in the format code,name,text,fea_class. For example: 'E,"mosque","a building for public Islamic worship","S - Spot Features",'. The first three entries are the feature code, the feature name and the feature code description. The last entry (fea_class) is a combination of the feature class code and the feature class text, separated by '-'.

Example:

CODE,NAME,TEXT,FEA_CLASS,
"MND","mound(s)","a low, isolated, rounded hill","T - Hypsographic",
"MNDU","undersea mound","a low, isolated, rounded hill","U - Undersea",
"MNFE","iron mine(s)","a mine where iron ore is extracted","S - Spot Features",
"MNMT","monument","a commemorative structure or statue","S - Spot Features",
"MNQ","abandoned mine","abandoned mine","S - Spot Features",

Used by:

Reader Filename
gns dsg.csv
geonames dsg.csv
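
The dsg format can be read with any CSV parser that honours quoted fields, since the description text may itself contain commas. The following Python sketch is an illustration only, not the loader's own code. It assumes the file has the header row shown in the example, and the function name read_dsg is hypothetical.

```python
import csv
import io

# A few rows copied from the example above.
SAMPLE_DSG = '''CODE,NAME,TEXT,FEA_CLASS,
"MND","mound(s)","a low, isolated, rounded hill","T - Hypsographic",
"MNDU","undersea mound","a low, isolated, rounded hill","U - Undersea",
"MNQ","abandoned mine","abandoned mine","S - Spot Features",
'''


def read_dsg(handle):
    """Map each feature code to its name, text, class code and class text."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    features = {}
    for row in reader:
        if len(row) < 4:
            continue  # ignore blank or incomplete lines
        # The trailing comma produces an empty fifth field, so keep four.
        code, name, text, fea_class = row[:4]
        # fea_class combines class code and class text, separated by '-'.
        class_code, _, class_text = fea_class.partition("-")
        features[code] = {
            "name": name,
            "text": text,
            "class_code": class_code.strip(),
            "class_text": class_text.strip(),
        }
    return features


features = read_dsg(io.StringIO(SAMPLE_DSG))
print(features["MNDU"]["class_code"], features["MNDU"]["class_text"])
```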

Country codes

cc-files should map country codes to country names.
Each line should be a comma-separated record in the format code,name, e.g. 'ZI,"Zimbabwe",'.

Example:

CODE,NAME,
UV,"Burkina Faso",
CM,"Cameroon",
CJ,"Cayman Islands",
IP,"Clipperton Island",
CG,"Congo, Democratic Republic of the",
CY,"Cyprus",
DR,"Dominican Republic",

Used by:

Reader Filename
gns cc.csv
gns_us cc.csv
geonames cc.csv
ssr cc.csv

Administrative data

Adm-files should map administrative codes (numbers and/or letters) to descriptive text.
Use Adm1 for "Administrative division level 1, US state, Norwegian fylke", Adm2 for "Administrative division level 2, US county, Norwegian kommune" and Adm3 for "Administrative division level 3, US ?, Norwegian poststed".
Each line should be a comma-separated record in the format code,name. For example: '1662,"Klæbu kommune",'.

Example:

CODE,NAME,
1662,"Klæbu kommune",
0604,"Kongsberg kommune",
0402,"Kongsvinger kommune",
0815,"Kragerø kommune",
1001,"Kristiansand kommune",

Used by:

Reader     Filename
gns        See "Adm data for gns reader" below
matrikkel  matrikkel_adm.csv

Adm data for gns reader

Each line should be a comma-separated record in the format adm1_cd,country_cd,adm1_name. For example: '"04","NO","Buskerud",'.

Example:

ADM1,COUNTRY_CD,ADM1_NAME,
"04","QA","Al Khawr wa adh Dhakhīrah",
"34","WA","Okavango",
"35","AF","Laghmān",
"35","AG","Aïn Defla",
"35","BF","San Salvador",

Used by:

Reader Filename
gns gns_adm1.csv
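
As the example rows show, the same ADM1 code (here "35") occurs in several countries, so a lookup has to be keyed on the country code and the ADM1 code together. The following Python sketch illustrates this. It is an assumption about how such a file could be consumed, not the gns reader's actual code, and read_gns_adm1 is a hypothetical name. Codes are kept as strings so that leading zeros such as "04" survive.

```python
import csv
import io

# Rows copied from the example above.
SAMPLE_ADM1 = '''ADM1,COUNTRY_CD,ADM1_NAME,
"34","WA","Okavango",
"35","AF","Laghmān",
"35","AG","Aïn Defla",
'''


def read_gns_adm1(handle):
    """Map (country code, adm1 code) to the ADM1 name."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    names = {}
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or incomplete lines
        adm1_code, country_code, adm1_name = row[:3]
        names[(country_code, adm1_code)] = adm1_name
    return names


adm1_names = read_gns_adm1(io.StringIO(SAMPLE_ADM1))
print(adm1_names[("AF", "35")])
print(adm1_names[("AG", "35")])
```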

Navnetyper

Navnetyper.txt should map feature classes and feature codes to descriptive texts.

Each line should be a comma-separated record in the format classcode,classtext,code,codetext, followed by a description of the code as in the rows below. For example: 'FC1,"Terrengformer",N1,"Berg","Mindre fjell"'.

Example:

FC6,"Samferdsel",N165,"Landingsstripe","Landingsplass for privat flygning"
FC6,"Samferdsel",N232,"Plass/torg","I tettsted eller by"
FC7,"Administrative områder",N180,"Nasjon","Selvstendig stat / land (offisielt navn)"
FC7,"Administrative områder",N181,"Fylke","Offisielt navn"
FC7,"Administrative områder",N182,"Kommune","Offisielt navn"

Used by:

Reader Filename
ssr navnetyper.txt

Creating readers for location data

A Maria2012 GeoLoc placename data reader must be able to convert files with placename information into an SQLite database readable by Maria2012. Teleplan Globe provides readers for e.g. GNS, GeoNames and SSR.

Readers should implement interface IPlaceNameDataInterfacer.

Each reader must implement the functions CreateTables and LoadData. CreateTables creates the tables used by the GeoLoc service when running placename searches. The mandatory tables are an RTree table for spatial searches, an FTS table for free-text search, and a main table holding the available placename information. The GeoLoc service also uses the feature class, feature code, country code and metadata tables when they are available. LoadData reads data from the source files into the database tables.

IPlaceNameDB is available for database creation helper functions.

Mandatory tables

Main table

The main table placenames_main should contain all placename information extracted from a datasource. Use the placename_alt column for alternative versions of the main placename data if available. Country code, feature code/class and administrative information columns are used for facets and metadata searches.

Column         Type  Description
lat            real  Latitude, WGS84 decimal degrees
long           real  Longitude, WGS84 decimal degrees
feature_class  text  Feature class based on raw data, e.g. 'H' for hydrography for GNS or 'Samferdsel' for SSR
feature_code   text  Feature code, unique code
cc1            text  Primary country code
cc2            text  Secondary country codes, comma separated
placename      text  Placename (reading order with diacritics)
placename_alt  text  Alternate spellings of placename, comma separated
adm1           text  Administrative division level 1, US state, Norwegian fylke
adm2           text  Administrative division level 2, US county, Norwegian kommune
adm3           text  Administrative division level 3, US ?, Norwegian poststed
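
The column list above translates directly into a table definition. The following Python sketch uses the standard sqlite3 module and is only an illustration of the layout: the actual DDL issued by the Maria GDK readers is not shown on this page, so constraints and indexes are left out, and the inserted row uses made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names and types follow the table above; nothing else is assumed.
conn.execute("""
    CREATE TABLE placenames_main (
        lat           REAL,
        long          REAL,
        feature_class TEXT,
        feature_code  TEXT,
        cc1           TEXT,
        cc2           TEXT,
        placename     TEXT,
        placename_alt TEXT,
        adm1          TEXT,
        adm2          TEXT,
        adm3          TEXT
    )
""")

# Illustrative row only; the values are not taken from a real dataset.
conn.execute(
    "INSERT INTO placenames_main (lat, long, feature_class, cc1, placename)"
    " VALUES (?, ?, ?, ?, ?)",
    (60.0, 10.0, "Samferdsel", "NO", "Sande"),
)

# Country code columns can then be used for metadata-style filtering.
names = [
    name
    for (name,) in conn.execute(
        "SELECT placename FROM placenames_main WHERE cc1 = ?", ("NO",)
    )
]
print(names)
```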

FTS table

The Maria GDK location service uses the FTS4 extension in SQLite to create a table (placenames_fts) with a built-in full-text index. This index makes it possible to efficiently query the database for all rows that contain one or more words/tokens.
All strings meant to be searchable must be added to the FTS table. Using tagged metadata gives more exact searches. For example, searching for <placename> cc:NO returns placenames whose country code metadata is "NO", while <placename> NO matches any metadata containing "NO".

Column         Description                   Type
names          searchable placenames         text
meta_tagged    searchable tagged metadata    text
meta_untagged  searchable untagged metadata  text
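
To show how a table with this layout behaves, the following Python sketch creates an FTS4 table with the three columns above and runs a few prefix queries with MATCH. It assumes the SQLite build behind Python's sqlite3 module has FTS4 enabled and uses the default tokenizer. The tokenizer and options used by the Maria GDK readers are not stated here, and the metadata columns are left empty because their exact content format is not specified. The placenames are the ones used in the search example further down this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# FTS4 virtual table with the three documented columns.
conn.execute(
    "CREATE VIRTUAL TABLE placenames_fts USING fts4(names, meta_tagged, meta_untagged)"
)
conn.executemany(
    "INSERT INTO placenames_fts (names) VALUES (?)",
    [("Bred Sund",), ("Sunde",), ("Sundnes",), ("Brandal på Sunnmøre",)],
)


def search(match_expression):
    """Return the names of all rows matching an FTS query, in insert order."""
    cursor = conn.execute(
        "SELECT names FROM placenames_fts WHERE placenames_fts MATCH ? ORDER BY docid",
        (match_expression,),
    )
    return [name for (name,) in cursor]


# A trailing '*' makes a term a prefix query; every term must be present.
for expression in ("sun*", "sund*", "sun* br*", "sun* br"):
    print(expression, "->", search(expression))
```

The last query returns nothing because "br" without a wildcard must match a whole token.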

RTree table

The RTree table (spatial_index) is a mandatory table used for spatial searches. The Maria GDK location database rtree table should have a primary key and two column pairs representing the minimum and maximum values (bounding box) for a 2-dimensional object.

Column   Description        Type
id       primary key        integer
minlat   minimum latitude   float
maxlat   maximum latitude   float
minlong  minimum longitude  float
maxlong  maximum longitude  float
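
The following Python sketch shows an RTree table with these columns and a bounding-box query against it. It assumes the SQLite build behind Python's sqlite3 module includes the R*Tree module. The coordinates are made-up sample points, a point is stored as a degenerate box (minimum equals maximum), and the function name ids_in_box is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column order follows the table above: id, then the lat pair, then the long pair.
conn.execute(
    "CREATE VIRTUAL TABLE spatial_index USING rtree(id, minlat, maxlat, minlong, maxlong)"
)

# Illustrative points (id, lat, long); each is stored as a zero-size box.
points = [(1, 59.9, 10.7), (2, 60.4, 5.3), (3, 63.4, 10.4)]
for row_id, lat, lon in points:
    conn.execute(
        "INSERT INTO spatial_index VALUES (?, ?, ?, ?, ?)",
        (row_id, lat, lat, lon, lon),
    )


def ids_in_box(min_lat, max_lat, min_long, max_long):
    """Return ids of entries fully contained in the given bounding box."""
    cursor = conn.execute(
        "SELECT id FROM spatial_index"
        " WHERE minlat >= ? AND maxlat <= ? AND minlong >= ? AND maxlong <= ?"
        " ORDER BY id",
        (min_lat, max_lat, min_long, max_long),
    )
    return [row_id for (row_id,) in cursor]


print(ids_in_box(59.0, 61.0, 4.0, 11.0))
```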

Optional tables

Feature class table

Optional table (fclass) containing feature class/theme information extracted from datasource.
Example feature classes can be 'H' for hydrography features in GNS or 'Samferdsel' for Norwegian SSR data.

Column  Description                         Type
code    feature class                       text
name    feature name                        text
desc    description of feature class entry  text

Feature code table

Optional table (fcode) containing feature code/object type information extracted from datasource.
An example feature is "lake" in GNS.

Column  Description                        Type
code    feature code                       text
name    feature name                       text
desc    description of feature code entry  text

Country code table

Optional table (cc) listing all country codes found in the datasource, e.g. "NO"/"Norway".

Column  Description                 Type
code    country code (primary key)  char(2)
name    country name                text

Administration code tables

Optional tables (admin1, admin2, admin3) used for collecting administrative information, e.g. state, county, fylke etc.

Column  Description          Type
code    administration code  text
name    administration name  text

Metadata table

Optional table (metadata) used for additional data collected from the datasource (e.g. data source producer, source dataset etc.).

Column  Description                    Type
name    metadata name                  text
value   metadata value                 text
desc    description of metadata entry  text

Service

The geolocation service is a WCF-service that provides functionality for placename searches via a sqlite/FTS database.

The following service interfaces are provided:

Name Description
ILocationService Placename search

WCF basic http binding is used by default.

Locating databases

The Geolocation service searches for available placename databases in the default folder provided at installation. Additional datasources can be set up in the settings file.

Supported readers

Under construction.

The following formats are supported with readers for creating Geolocation databases: ssr, geonames, gns, gns_us, matrikkel, ssr_sosi and geoloc.


Geolocation reader (geoloc)

The Geolocation reader (under development) takes an incomplete geolocation database and adds the support tables necessary for it to become a complete Maria GDK Geolocation database.

Input should be an SQLite database with at least the main table described above (placenames_main). The Admin1, Admin2 and Admin3 support tables are constructed if the input database contains adm1, adm2 and/or adm3 information. Other optional support tables (feature code, feature class, country codes and metadata) are not included in the first version of the Geolocation reader.

Performing placename searches

Searches can include multiple terms separated by commas or spaces. All terms must be found for a database entry to match. The wildcard character '*' is permitted, but only at the end of a string/substring. The service automatically tries to add '*' to the end of submitted search strings, but not if the string already contains a wildcard character.

Example: 
Say the database contains the placenames Bred Sund, Sunde, Sundnes and Brandal på Sunnmøre. 

Searching for "sun" will be expanded by the service to "sun*" and return 4 matches. 
Searching for "sund" will be expanded to "sund*" and return 3 matches (Bred Sund, Sunde and Sundnes). 
Searching for "sun* br*" will return 2 matches (Bred Sund and Brandal på Sunnmøre). 
Searching for "sun* br" will NOT be expanded by the service since it already contains a wildcard, and return 0 matches.

The location service looks for matches in the FTS table contained in the SQLite database. This table contains placenames as well as tagged and untagged metadata (e.g. country codes). Using tags allows more specific searches. For example, searching for all Norwegian schools with placenames starting with "sund" can be done like this:

sund cc:NO fclass:school

Returned placename matches are ranked according to distance from the center position in the map. If searching without wildcards, all exact matches are returned first (ranked by distance), then all non-exact matches (also ranked by distance). If searching with wildcards, results are ranked by distance only.

Example: 
The database contains the placenames Sandelva, Sande Skole, Sande, Sanden and Sande Stadion (listed by distance from center, that is, Sandelva is closest to center and Sande Stadion furthest away).

Searching for "Sande" will return Sande Skole, Sande, Sande Stadion, Sandelva, Sanden.
Searching for "Sande*" will return Sandelva, Sande Skole, Sande, Sanden and Sande Stadion.
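
The ranking rule can be sketched in a few lines of Python. This is an illustration, not the service's implementation. It assumes that an "exact" match means one whole word of the placename equals the search term (which is what the example results imply), that the candidates have already been matched, and that the distances are made-up numbers increasing in the listed order. The function name rank_matches is hypothetical.

```python
def rank_matches(candidates, search_term):
    """Order (placename, distance) pairs the way the example above describes."""
    term = search_term.lower()
    if "*" in term:
        # With wildcards, results are ranked by distance only.
        ordered = sorted(candidates, key=lambda item: item[1])
    else:
        def is_exact(name):
            # Assumption: exact means a whole word equals the search term.
            return term in name.lower().split()

        # Exact matches first (False sorts before True), then by distance.
        ordered = sorted(candidates, key=lambda item: (not is_exact(item[0]), item[1]))
    return [name for name, _ in ordered]


# Illustrative distances, increasing in the order the example lists the names.
candidates = [
    ("Sandelva", 1.0),
    ("Sande Skole", 2.0),
    ("Sande", 3.0),
    ("Sanden", 4.0),
    ("Sande Stadion", 5.0),
]

print(rank_matches(candidates, "Sande"))
print(rank_matches(candidates, "Sande*"))
```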

Setting up placename queries

The PlaceNameQuery data contract specifies the query parameters needed when performing a placename search.

Example query:

var query = new PlaceNameQuery
{
  AutoAddLocationWildcard = true,
  CenterPos = new GeoPos(60.0, 10.0),
  ExtractFacets = true,
  MaxInternalHits = 10000,
  MaxReturnHits = 10,
  NameSearchString = "sand"
};

Searching for "Sand" with AutoAddLocationWildcard set to true will, for example, return both "Sand" and "Sandvika", while AutoAddLocationWildcard set to false requires an exact match and returns only "Sand". For AutoAddLocationWildcard to take effect, the string must be at least 2 characters long and must not already contain wildcards. Wildcard characters inside a search substring are not allowed.
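
The conditions for automatic wildcard expansion can be expressed as a small function. The following Python sketch follows the rules literally as stated above (at least 2 characters, no wildcard already present) and appends '*' to the end of the string. How the service expands strings with several terms is not specified here, so that case is not modelled, and the function name add_location_wildcard is hypothetical.

```python
def add_location_wildcard(search_string, auto_add=True):
    """Append '*' to the search string when the documented conditions hold."""
    text = search_string.strip()
    if not auto_add:
        return text  # the caller asked for exact matching
    if "*" in text:
        return text  # already contains a wildcard, leave untouched
    if len(text) < 2:
        return text  # too short for automatic expansion
    return text + "*"


print(add_location_wildcard("sun"))
print(add_location_wildcard("sun* br"))
print(add_location_wildcard("Sand", auto_add=False))
```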

CenterPos influences how the search results are ranked. If exact matches are found, results are sorted by shortest distance from CenterPos.

If ExtractFacets is true, facets are extracted from the search results. Use the Facets parameter to control which facets are extracted.

The NameSearchString is the primary placename text search string. Terms separated by spaces and commas must all be found. The wildcard '*' is allowed. Use metadata to narrow down the search results by adding tagged values of the form tag:value or tag=value (:norway or cc:no or cc=no). Use "-" to exclude terms from the search.
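
To make the search string syntax concrete, the following Python sketch splits a string into plain terms, tagged values and excluded terms. It is a guess at one reasonable reading of the rules above, not the service's parser: in particular it assumes that "-" is written as a prefix directly in front of the term to exclude. The function name parse_search_string is hypothetical.

```python
import re


def parse_search_string(search_string):
    """Split a search string into plain terms, tagged values and exclusions."""
    terms, tagged, excluded = [], [], []
    # Terms are separated by spaces and/or commas.
    for token in re.split(r"[,\s]+", search_string.strip()):
        if not token:
            continue
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])  # assumption: '-' prefixes the term
            continue
        # tag:value or tag=value; the tag may be empty, as in ':norway'.
        match = re.match(r"^([^:=]*)[:=](.+)$", token)
        if match:
            tagged.append((match.group(1), match.group(2)))
        else:
            terms.append(token)
    return {"terms": terms, "tagged": tagged, "excluded": excluded}


print(parse_search_string("sund cc:NO fclass:school"))
print(parse_search_string("sand, :norway -skole"))
```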

Use MaxReturnHits to limit the number of results returned from a query, and ResultOffset to enable paging of results.

For example, with MaxReturnHits set to 10 a query returns at most 10 results. If the total number of hits found on the server is 25 and ResultOffset is set to 10, the query returns entries 11-20. If ResultOffset is set to 20, the query returns only the last five entries (21-25).
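
The paging arithmetic above corresponds to a plain slice over the full hit list. The following Python sketch reproduces the numbers from the example. It models only the offset and limit, not the service itself, and the function name page_of_results is hypothetical.

```python
def page_of_results(hits, max_return_hits, result_offset=0):
    """Return at most max_return_hits entries, starting after result_offset."""
    return hits[result_offset:result_offset + max_return_hits]


# 25 hits in total, numbered 1..25 as in the example above.
hits = list(range(1, 26))

print(page_of_results(hits, 10, 0))   # entries 1-10
print(page_of_results(hits, 10, 10))  # entries 11-20
print(page_of_results(hits, 10, 20))  # entries 21-25
```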

There is an upper limit of 20000 on the number of matches for a search. If a search exceeds this limit, no matches are returned.

Note: If MaxReturnHits is undefined, no matches are returned.

Using facets

A search query returns a list of facets related to the search results. Facets are used to narrow down the search results and perform more specific searches, e.g. to display only results related to Administrative Regions in Norway.

The reader for the input data decides how data is mapped to facets. Facet groups with single entries are ignored.

There are six categories of facets available:

Category       Description
Feature Class  Feature class based on raw data, e.g. 'H' for hydrography for GNS or 'Samferdsel' for SSR.
Feature Code   Feature code, unique code.
CC/Country     Primary country code.
Adm1           Administrative division level 1, US state, Norwegian fylke.
Adm2           Administrative division level 2, US county, Norwegian kommune.
Adm3           Administrative division level 3, US ?, Norwegian poststed.
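
Facet extraction amounts to counting the distinct values per category in the result set. The following Python sketch illustrates the idea under stated assumptions: results are represented as dictionaries keyed by the main table column names, and "facet groups with single entries are ignored" is read as dropping any category that has only one distinct value among the results. The sample rows are made up, and extract_facets is a hypothetical name, not the service API.

```python
from collections import Counter

# The six facet categories, named after the main table columns.
FACET_CATEGORIES = ("feature_class", "feature_code", "cc1", "adm1", "adm2", "adm3")


def extract_facets(results, categories=FACET_CATEGORIES):
    """Count values per category, skipping categories with a single value."""
    facets = {}
    for category in categories:
        counts = Counter(row[category] for row in results if row.get(category))
        if len(counts) > 1:  # assumption: one distinct value cannot narrow a search
            facets[category] = dict(counts)
    return facets


# Illustrative search results, not taken from a real dataset.
results = [
    {"placename": "Sande", "cc1": "NO", "feature_class": "Samferdsel"},
    {"placename": "Sanden", "cc1": "NO", "feature_class": "Terrengformer"},
    {"placename": "Sande Skole", "cc1": "NO", "feature_class": "Samferdsel"},
]

facets = extract_facets(results)
print(facets)
```

In this sample only the feature class facet remains, because every row shares the same country code.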