Geolocation
Latest revision as of 12:23, 10 August 2023
The Maria GDK Geolocation service allows fast, faceted freetext searches for partial or full placenames. General placename searches are supported, as well as street address search.
A separate conversion step is required for converting from source data to a specialized SQLite database. Converters exist for GNS and Geonames. For simple formats (e.g. csv-based ones), writing a new converter is straightforward.
Converting placename data
When converting a file containing placename information to a Maria GDK location database, you need to use the LocationServiceSqliteLoader conversion tool with a reader for that specific fileformat. See section on #Creating readers for location data for details.
A special use case is when you have already created an SQLite file with the proper data tables in another application (for instance FME). In this case only the RTree and FTS tables are missing. The tool creates them for you if you pass the sqlite file as input and use the input format geoloc.
Readers are executed from the command line. Use options -?, --? or -help to display which parameters are available. Usage: <input file> <input format> <sqlite database file> [/clear] [/nogrid]
Order | Parameter | Description |
---|---|---|
1 | <input file> | Path to file containing placename information. Must be in a format supported by the LocationServiceSqliteLoader. |
2 | <input format> | Input format string. Currently supported: ssr, geonames, gns, gns_us, matrikkel and geoloc. |
3 | <sqlite database> | Path to output sqlite file. |
4 | [/clear] | Optional argument. If the output file exists, placename data will be added to the file. Use /clear to force a fresh database. |
5 | [/nogrid] | Optional argument. If set, no density grid file is created. Otherwise a density grid file is created from the output data to help with performance. |
Example:
TPG.GeoFramework.LocationServiceSqliteLoader.exe c:\LocationData\GNS\no.txt gns c:\ServiceTest\LocationData\gns_no.location.sqlite /clear
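When the loader is run from a script, the argument order in the table above can be assembled programmatically. The following is a minimal Python sketch, not part of Maria GDK: the helper name build_loader_command is hypothetical, and the list of formats is copied from the parameter table. It only builds the argument list and does not run the tool.

```python
# Formats listed in the parameter table above.
SUPPORTED_FORMATS = {"ssr", "geonames", "gns", "gns_us", "matrikkel", "geoloc"}
LOADER_EXE = "TPG.GeoFramework.LocationServiceSqliteLoader.exe"


def build_loader_command(input_file, input_format, sqlite_file,
                         clear=False, nogrid=False):
    """Return the loader command line as a list of arguments (hypothetical helper)."""
    if input_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported input format: {input_format}")
    args = [LOADER_EXE, input_file, input_format, sqlite_file]
    if clear:
        args.append("/clear")   # force a fresh database
    if nogrid:
        args.append("/nogrid")  # skip the density grid file
    return args


cmd = build_loader_command(
    r"c:\LocationData\GNS\no.txt",
    "gns",
    r"c:\ServiceTest\LocationData\gns_no.location.sqlite",
    clear=True,
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where the loader is installed.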
Support data
The different readers use a small number of csv/txt-files to map information in the sqlite databases (f.ex. mapping from country codes (NO) to country names (Norway)). If these files are not present when converting a database, an error is reported, but the conversion will still work in some cases. The files should be placed in the same folder as the source data files.
Default data files can be found in the Maria GDK source repo at
\Src\Layers\Location\TPG.GeoFramework.GeolocSqliteLoader\SupportData
Feature classes and codes
Dsg-files should map feature codes and feature classes to descriptive texts.
File should be comma separated strings with format code,name,text,fea_class. F.ex: 'E,"mosque","a building for public Islamic worship","S - Spot Features",'. The first three entries are feature code, feature name and feature code description. The last entry (fea_class) is a combination of feature class code and feature class text, separated with '-'.
Example:
CODE,NAME,TEXT,FEA_CLASS,
"MND","mound(s)","a low, isolated, rounded hill","T - Hypsographic",
"MNDU","undersea mound","a low, isolated, rounded hill","U - Undersea",
"MNFE","iron mine(s)","a mine where iron ore is extracted","S - Spot Features",
"MNMT","monument","a commemorative structure or statue","S - Spot Features",
"MNQ","abandoned mine","abandoned mine","S - Spot Features",
Used by:
Reader | Filename |
---|---|
gns | dsg.csv |
geonames | dsg.csv |
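To illustrate the dsg format, here is a small Python sketch that reads such a file and splits fea_class into its class code and class text. It is not the reader used by the conversion tool; the function name parse_dsg and the dictionary layout are assumptions made for the example.

```python
import csv
import io

SAMPLE_DSG = '''CODE,NAME,TEXT,FEA_CLASS,
"MND","mound(s)","a low, isolated, rounded hill","T - Hypsographic",
"MNMT","monument","a commemorative structure or statue","S - Spot Features",
'''


def parse_dsg(text):
    """Map feature code -> name, description, class code and class text."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    features = {}
    for row in reader:
        if len(row) < 4:
            continue  # ignore blank or incomplete lines
        # The trailing comma yields an empty fifth field, which is dropped here.
        code, name, description, fea_class = row[:4]
        class_code, _, class_text = fea_class.partition("-")
        features[code] = {
            "name": name,
            "description": description,
            "class_code": class_code.strip(),
            "class_text": class_text.strip(),
        }
    return features


features = parse_dsg(SAMPLE_DSG)
print(features["MND"])
```

Using a csv parser rather than splitting on commas matters here, since descriptions such as "a low, isolated, rounded hill" contain commas inside the quotes.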
Country codes
cc-files should map country codes to country names.
File should be comma separated strings with format code,name, f.ex. 'ZI,"Zimbabwe",'.
Example:
CODE,NAME,
UV,"Burkina Faso",
CM,"Cameroon",
CJ,"Cayman Islands",
IP,"Clipperton Island",
CG,"Congo, Democratic Republic of the",
CY,"Cyprus",
DR,"Dominican Republic",
Used by:
Reader | Filename |
---|---|
gns | cc.csv |
gns_us | cc.csv |
geonames | cc.csv |
ssr | cc.csv |
Administrative data
Adm-files should map administrative codes (numbers and/or letters) to descriptive text.
Use Adm1 for "Administrative division level 1, US state, Norwegian fylke", Adm2 for "Administrative division level 2, US county, Norwegian kommune" and Adm3 for "Administrative division level 3, US ?, Norwegian poststed".
Files should be comma separated strings with format code,name. F.ex: '1662,"Klæbu kommune",'.
Example:
CODE,NAME,
1662,"Klæbu kommune",
0604,"Kongsberg kommune",
0402,"Kongsvinger kommune",
0815,"Kragerø kommune",
1001,"Kristiansand kommune",
Used by:
Reader | Filename |
---|---|
gns | See Adm data for gns reader |
matrikkel | matrikkel_adm.csv |
Adm data for gns reader
File should be comma separated strings with format adm1_cd,country_cd,adm1_name. F.ex: '"04","NO","Buskerud",'.
Example:
ADM1,COUNTRY_CD,ADM1_NAME,
"04","QA","Al Khawr wa adh Dhakhīrah",
"34","WA","Okavango",
"35","AF","Laghmān",
"35","AG","Aïn Defla",
"35","BF","San Salvador",
Used by:
Reader | Filename |
---|---|
gns | gns_adm1.csv |
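Note that the same adm1 code occurs for several countries in the example ("35" appears for AF, AG and BF), so a lookup has to use country code and adm1 code together. A minimal Python sketch of such a lookup, with the hypothetical function name load_gns_adm1:

```python
import csv
import io

SAMPLE_ADM1 = '''ADM1,COUNTRY_CD,ADM1_NAME,
"34","WA","Okavango",
"35","AG","Aïn Defla",
"35","BF","San Salvador",
'''


def load_gns_adm1(text):
    """Map (country code, adm1 code) -> adm1 name."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    lookup = {}
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or incomplete lines
        adm1, country, name = row[:3]
        lookup[(country, adm1)] = name
    return lookup


adm1_names = load_gns_adm1(SAMPLE_ADM1)
print(adm1_names[("AG", "35")])
print(adm1_names[("BF", "35")])
```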
Navnetyper.txt should map feature classes and feature codes to descriptive texts.
File should be comma separated strings with format classcode,classtext,code,codetext, followed by a description of the code. F.ex: 'FC1,"Terrengformer",N1,"Berg","Mindre fjell"'.
Example:
FC6,"Samferdsel",N165,"Landingsstripe","Landingsplass for privat flygning"
FC6,"Samferdsel",N232,"Plass/torg","I tettsted eller by"
FC7,"Administrative områder",N180,"Nasjon","Selvstendig stat / land (offisielt navn)"
FC7,"Administrative områder",N181,"Fylke","Offisielt navn"
FC7,"Administrative områder",N182,"Kommune","Offisielt navn"
Used by:
Reader | Filename |
---|---|
ssr | navnetyper.txt |
Creating readers for location data
A Maria2012 GeoLoc placename data reader must be able to convert files with placename information to a Maria2012 readable sqlite database. Teleplan Globe provides readers for f.ex. GNS, GeoNames and SSR.
Each reader must implement functions CreateTables and LoadData. CreateTables is responsible for creating tables used by the GeoLoc service when running placename searches. Mandatory tables are Rtree for spatial searches, FTS for free text search and main table for available placename information. Tables also utilised by the GeoLoc service when available are: feature class, feature code, country code and metadata. LoadData reads data from sourcefiles into the database tables.
IPlaceNameDB is available for database creation helper functions.
Mandatory tables
Main table
The main table placenames_main should contain all placename information extracted from a datasource. Use the placename_alt column for alternative versions of the main placename data if available. Country code, feature code/class and administrative information columns are used for facets and metadata searches.
Columns | Type | Description |
---|---|---|
lat | real | latitude wgs84 decimal degrees |
long | real | longitude wgs84 decimal degrees |
feature_class | text | feature class based on raw data, ex 'H' for hydrography for GNS or 'Samferdsel' for SSR |
feature_code | text | feature code, unique code |
cc1 | text | Primary country code |
cc2 | text | Secondary country codes, comma separated |
placename | text | Placename (reading order with diacritics) |
placename_alt | text | Alternate spellings of placename, comma separated |
adm1 | text | Administrative division level 1, US state, Norwegian fylke |
adm2 | text | Administrative division level 2, US county, Norwegian kommune |
adm3 | text | Administrative division level 3, US ?, Norwegian poststed |
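As an illustration of the column list, the table could be created along these lines. This is a Python/sqlite3 sketch under assumptions: the integer id column is not in the table above and is added only so that rows can be referenced, and the inserted values are illustrative. A real reader would normally go through the IPlaceNameDB helper functions instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE placenames_main (
        id            INTEGER PRIMARY KEY,  -- assumption: not listed in the column table
        lat           REAL,
        long          REAL,
        feature_class TEXT,
        feature_code  TEXT,
        cc1           TEXT,
        cc2           TEXT,
        placename     TEXT,
        placename_alt TEXT,
        adm1          TEXT,
        adm2          TEXT,
        adm3          TEXT
    )
    """
)

# Illustrative row; coordinates and codes are sample values only.
conn.execute(
    "INSERT INTO placenames_main (lat, long, feature_class, feature_code, cc1, placename)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    (59.67, 9.65, "P", "PPL", "NO", "Kongsberg"),
)

row = conn.execute(
    "SELECT placename, lat, long FROM placenames_main WHERE cc1 = ?", ("NO",)
).fetchone()
print(row)
```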
FTS table
The Maria GDK location service uses the FTS4 extension in sqlite to create a table (placenames_fts) with a built-in full-text index. This index allows us to efficiently query the database for all rows that contain one or more words/tokens.
All strings meant to be searchable must be added to the FTS table. Using tagged metadata will ensure more exact searches. F.ex. searching for <placename> cc:NO will return placenames with related country code metadata "NO", while <placename> NO will match all metadata containing "NO".
Columns | Description | Type |
---|---|---|
names | searchable placenames | text |
meta_tagged | searchable tagged metadata | text |
meta_untagged | searchable untagged metadata | text |
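The sketch below shows how an FTS4 table with these three columns behaves for exact and prefix queries. It uses Python's sqlite3 module and assumes the underlying SQLite library was built with FTS4 enabled. How the service encodes tags inside meta_tagged is not described here, so that column is left empty and only names and meta_untagged are queried; the sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE placenames_fts USING fts4(names, meta_tagged, meta_untagged)"
)
conn.executemany(
    "INSERT INTO placenames_fts (names, meta_tagged, meta_untagged) VALUES (?, ?, ?)",
    [
        ("Sand", "", "NO Norway"),
        ("Sandvika", "", "NO Norway"),
        ("Sandviken", "", "SE Sweden"),
    ],
)


def search(expression):
    """Run an FTS4 MATCH query and return the matching names in insert order."""
    sql = "SELECT names FROM placenames_fts WHERE placenames_fts MATCH ? ORDER BY rowid"
    return [r[0] for r in conn.execute(sql, (expression,))]


print(search("names:sand"))                    # whole-token match
print(search("names:sand*"))                   # prefix match
print(search("names:sand* meta_untagged:NO"))  # both terms must match
```

The column prefix (names:, meta_untagged:) is plain FTS4 query syntax and restricts a term to one column. It should not be confused with the cc:NO style tags that the location service accepts in search strings.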
RTree table
The RTree table (spatial_index) is a mandatory table used for spatial searches. The Maria GDK location database rtree table should have a primary key and two column pairs representing the minimum and maximum values (bounding box) for a 2-dimensional object.
Columns | Description | Type |
---|---|---|
id | primary key | integer |
minlat | minimum latitude | float |
maxlat | maximum latitude | float |
minlong | minimum longitude | float |
maxlong | maximum longitude | float |
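A point placename can be stored as a degenerate bounding box where minimum and maximum are equal. The following Python sketch shows the table and a bounding-box query. It assumes SQLite was built with the R*Tree module, and the ids and coordinates are illustrative sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE spatial_index USING rtree(id, minlat, maxlat, minlong, maxlong)"
)

# id -> (lat, long); each point becomes a box with min == max.
places = {1: (59.67, 9.65), 2: (60.39, 5.32), 3: (69.65, 18.96)}
for place_id, (lat, lon) in places.items():
    conn.execute(
        "INSERT INTO spatial_index VALUES (?, ?, ?, ?, ?)",
        (place_id, lat, lat, lon, lon),
    )


def ids_in_box(min_lat, max_lat, min_long, max_long):
    """Return ids of all entries that lie completely inside the given box."""
    sql = (
        "SELECT id FROM spatial_index"
        " WHERE minlat >= ? AND maxlat <= ?"
        " AND minlong >= ? AND maxlong <= ?"
        " ORDER BY id"
    )
    return [r[0] for r in conn.execute(sql, (min_lat, max_lat, min_long, max_long))]


print(ids_in_box(58.0, 61.0, 4.0, 11.0))
```

The R*Tree module stores coordinates as 32-bit floats, so exact coordinates should be read from the main table, with the spatial index used only for filtering.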
Optional tables
Feature class table
Optional table (fclass) containing feature class/theme information extracted from datasource.
Example feature classes can be 'H' for hydrography features in GNS or 'Samferdsel' for Norwegian SSR data.
Columns | Description | Type |
---|---|---|
code | feature class | text |
name | feature name | text |
desc | description of feature class entry | text |
Feature code table
Optional table (fcode) containing feature code/object type information extracted from datasource.
An example feature is "lake" in GNS.
Columns | Description | Type |
---|---|---|
code | feature code | text |
name | feature name | text |
desc | description of feature code entry | text |
Country code table
Optional table (cc) listing all represented country codes found in the datasource, f.ex. "NO"/"Norway".
Columns | Description | Type |
---|---|---|
code | country code (primary key) | char(2) |
name | country name | text |
Administration code tables
Optional tables (admin1, admin2, admin3) used for collecting administrative information f.ex. state, county, fylke etc.
Columns | Description | Type |
---|---|---|
code | administration code | text |
name | administration name | text |
Metadata table
Optional table (metadata) used for additional data (f.ex. data source producer, source dataset etc.) collected from datasource.
Columns | Description | Type |
---|---|---|
name | metadata name | text |
value | metadata value | text |
desc | description of metadata entry | text |
Service
The geolocation service is a WCF-service that provides functionality for placename searches via a sqlite/FTS database.
The following service interfaces are provided:
Name | Description |
---|---|
ILocationService | Placename search |
WCF basic http binding is used by default.
Locating databases
The Geolocation service searches for available placename databases in the default folder provided at installation. Additional datasources can be set up in the settings file.
Supported readers
Under construction.
The following formats are supported with readers for creating Geolocation databases: ssr, geonames, gns, gns_us, matrikkel, ssr_sosi and geoloc.
Geolocation reader (geoloc)
The Geolocation reader (under development) takes an incomplete geolocation database and adds the support tables necessary for it to become a complete Maria GDK Geolocation database.
Input should be a sqlite database with at least one table as described above (placename_main). Admin1, Admin2 and Admin3 support tables will be constructed if the input database contains adm1, adm2 and/or adm3 info. Other optional support tables (feature code, feature class, country codes and metadata) are not included in the first version of the Geolocation reader.
Performing placename searches
Searches can include multiple terms separated with comma or space. All terms must be found to make a match in the database. The wildcard character '*' is permitted, but only at the end of a string/substring. The service will automatically try to add '*' to the end of submitted search strings, but not if the string already contains a wildcard character.
Example:
Say the database contains the placenames Bred Sund, Sunde, Sundnes and Brandal på Sunnmøre.
Searching for "sun" will be expanded by the service to "sun*" and return 4 matches.
Searching for "sund" will be expanded to "sund*" and return 3 matches (Bred Sund, Sunde and Sundnes).
Searching for "sun* br*" will return 2 matches (Bred Sund and Brandal på Sunnmøre).
Searching for "sun* br" will NOT be expanded by the service since it already contains a wildcard, and return 0 matches.
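The matching rules in this example can be reproduced with a few lines of Python. This is only a sketch of the described behaviour, not the service implementation: it works on an in-memory list instead of the FTS table, and it assumes that the automatic wildcard is appended once at the end of the whole search string and only when the string is at least 2 characters long.

```python
import re

PLACENAMES = ["Bred Sund", "Sunde", "Sundnes", "Brandal på Sunnmøre"]


def expand(search):
    """Append '*' unless the string is too short or already holds a wildcard."""
    search = search.strip()
    if len(search) >= 2 and "*" not in search:
        return search + "*"
    return search


def term_matches(term, tokens):
    """A term ending in '*' is a prefix match, any other term is a whole-word match."""
    if term.endswith("*"):
        return any(token.startswith(term[:-1]) for token in tokens)
    return term in tokens


def find(search, names=PLACENAMES):
    """Return all names in which every search term is found."""
    terms = [t.lower() for t in re.split(r"[,\s]+", expand(search)) if t]
    hits = []
    for name in names:
        tokens = name.lower().split()
        if all(term_matches(term, tokens) for term in terms):
            hits.append(name)
    return hits


for query in ["sun", "sund", "sun* br*", "sun* br"]:
    print(query, "->", find(query))
```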
The location service will look for matches in the FTS table contained in the sqlite database. This table contains placenames as well as tagged and untagged metadata (f.ex. country codes). Using tags will allow more specific searches, f.ex. searching for all Norwegian schools with placenames starting with "sund" can be done like this:
sund cc:NO fclass:school
Returned placename matches are ranked according to distance from the center position in the map. If searching without wildcards, all exact matches are returned first (ranked by distance), then all non-exact matches (also ranked by distance). If searching with wildcards, results are ranked by distance.
Example:
The database contains placenames Sandelva, Sande Skole, Sande, Sanden and Sande Stadion (listed by distance from center, that is, Sandelva is closest to the center and Sande Stadion is furthest away).
Searching for "Sande" will return Sande Skole, Sande, Sande Stadion, Sandelva, Sanden.
Searching for "Sande*" will return Sandelva, Sande Skole, Sande, Sanden and Sande Stadion.
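The ranking rule can be sketched as a two-part sort key. The Python example below is an illustration under assumptions, not the service code: distances are made-up values that follow the listed order, only single-term searches are handled, and a match counts as exact when the term equals a whole word of the placename (which is what the example results imply, since Sande Skole is treated as an exact match).

```python
# Hypothetical distances in km, increasing in the order given in the example.
DISTANCE_KM = {
    "Sandelva": 1.0,
    "Sande Skole": 2.0,
    "Sande": 3.0,
    "Sanden": 4.0,
    "Sande Stadion": 5.0,
}


def rank(search, distance=DISTANCE_KM):
    """Return matching placenames in the order described above."""
    has_wildcard = "*" in search
    term = search.rstrip("*").lower()
    # Prefix match on any word of the name finds exact and non-exact hits.
    hits = [
        name for name in distance
        if any(word.startswith(term) for word in name.lower().split())
    ]
    if has_wildcard:
        return sorted(hits, key=lambda name: distance[name])

    def is_exact(name):
        return term in name.lower().split()

    # False sorts before True, so exact matches come first, each group by distance.
    return sorted(hits, key=lambda name: (not is_exact(name), distance[name]))


print(rank("Sande"))
print(rank("Sande*"))
```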
Setting up placename queries
PlacenameQuery data contract specifies the query parameters needed when performing a placename search.
Example query:
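var query = new PlaceNameQuery
{
    AutoAddLocationWildcard = true,     // let the service append '*' to the search string
    CenterPos = new GeoPos(60.0, 10.0), // reference position used when ranking results
    ExtractFacets = true,               // return facets together with the results
    MaxInternalHits = 10000,
    MaxReturnHits = 10,                 // maximum number of results returned
    NameSearchString = "sand"
};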
Searching for "Sand" with AutoAddLocationWildcard set to true will f.ex. return both "Sand" and "Sandvika", while AutoAddLocationWildcard set to false will require an exact match and only return "Sand". The string must be at least 2 characters long and not already contain wildcards for AutoAddLocationWildcard to work. Wildcard characters inside a search substring are not allowed.
CenterPos influences how the search results are ranked. If exact matches are found, results will be sorted in order of shortest distance from CenterPos.
If ExtractFacets is true, facets are extracted from the search results. Use the Facets parameter to control which facets should be extracted.
The NameSearchString is the primary placename text search string. Terms separated by spaces and commas must all be found. Wildcard '*' is allowed. Use metadata to narrow down the search results by adding tagged values on the form tag:value or tag=value (:norway or cc:no or cc=no). Use "-" to exclude terms from the search.
Use MaxReturnHits to limit the number of results returned from a query, and ResultOffset to enable paging of results.
F.x. MaxReturnHits set to 10 will return maximum 10 results from a query. If total hits found on server is 25, and ResultOffset is set to 10, the query will return entries 11-20. If ResultOffset is set to 20, query will return only the last five entries (21-25).
There is a max limit on returned searches set to 20000. If a search exceeds this limit, no matches are returned.
Note: If MaxReturnHits is undefined, no matches are returned.
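The paging behaviour described above corresponds to a simple slice of the full hit list. A minimal Python sketch, where the function name page is hypothetical and the 20000 limit is not modelled:

```python
def page(hits, max_return_hits=None, result_offset=0):
    """Return one page of hits; an undefined MaxReturnHits gives no matches."""
    if max_return_hits is None:
        return []
    return hits[result_offset:result_offset + max_return_hits]


all_hits = list(range(1, 26))  # 25 hits, numbered 1..25

print(page(all_hits, max_return_hits=10, result_offset=10))  # entries 11-20
print(page(all_hits, max_return_hits=10, result_offset=20))  # entries 21-25
print(page(all_hits))                                        # MaxReturnHits undefined
```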
Using facets
A search query will return a list of facets related to the search query results. Facets are used to narrow down the search results and perform more specific searches, f.ex. only display results related to Administrative Regions in Norway.
The reader for the input data decides how to map data to facets. Facet groups with single entries are ignored.
There are six categories of facets available:
Category | Description |
---|---|
Feature Class | Feature class based on raw data, ex 'H' for hydrography for GNS or 'Samferdsel' for SSR. |
Feature Code | Feature code, unique code. |
CC/Country | Primary country code. |
Adm1 | Administrative division level 1, US state, Norwegian fylke. |
Adm2 | Administrative division level 2, US county, Norwegian kommune. |
Adm3 | Administrative division level 3, US ?, Norwegian poststed |
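As a rough illustration of facet extraction, the Python sketch below counts the values of each facet category in a result list. It is not the service implementation. The field names are taken from the main table columns, and "facet groups with single entries are ignored" is interpreted here as dropping a category when all results share the same value, which is an assumption.

```python
from collections import Counter

# Main table columns corresponding to the six facet categories.
FACET_FIELDS = ["feature_class", "feature_code", "cc1", "adm1", "adm2", "adm3"]


def extract_facets(results):
    """Count values per facet category, skipping categories with a single value."""
    facets = {}
    for field in FACET_FIELDS:
        counts = Counter(r[field] for r in results if r.get(field))
        if len(counts) > 1:
            facets[field] = dict(counts)
    return facets


results = [
    {"placename": "Sande Skole", "feature_class": "S", "cc1": "NO"},
    {"placename": "Sande Stadion", "feature_class": "S", "cc1": "NO"},
    {"placename": "Sandelva", "feature_class": "H", "cc1": "NO"},
]
print(extract_facets(results))
```

In this sample all hits are in Norway, so the country facet would not help narrow the search and is left out, while the feature class facet separates two spot features from one hydrography feature.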