geograpy package
Submodules
geograpy.extraction module
- class geograpy.extraction.Extractor(text=None, url=None, debug=False)
Bases: object
Extract geo context for text or from url
- find_entities(labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'])
Find entities with the given labels, set self.places and return it.
Parameters: labels (list) – the labels to filter
Returns: List of places
Return type: list
geograpy.locator module
The locator module retrieves detailed city information, including the region and country of a city, from a location string.
Examples of location strings are:
Amsterdam, Netherlands
Vienna, Austria
Vienna, IL
Paris - Texas
Paris TX
The locator looks up the cities and tries to disambiguate the result based on the country or region information found.
The results in string representation are:
Amsterdam (NH(North Holland) - NL(Netherlands))
Vienna (9(Vienna) - AT(Austria))
Vienna (IL(Illinois) - US(United States))
Paris (TX(Texas) - US(United States))
Paris (TX(Texas) - US(United States))
Each city returned has a city.region and city.country attribute with the details of the city.
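The string representation can be illustrated with a minimal sketch. The dataclasses and their field names (`iso`, `name`) are assumptions made for this example and are not taken from the geograpy source; only the output format follows the results listed in this module description.

```python
from dataclasses import dataclass


@dataclass
class Region:
    iso: str   # hypothetical field, e.g. "NH"
    name: str  # hypothetical field, e.g. "North Holland"


@dataclass
class Country:
    iso: str   # hypothetical field, e.g. "NL"
    name: str  # hypothetical field, e.g. "Netherlands"


@dataclass
class City:
    name: str
    region: Region
    country: Country

    def __str__(self) -> str:
        # Format: City (REGION(Region name) - COUNTRY(Country name))
        return (f"{self.name} ({self.region.iso}({self.region.name}) - "
                f"{self.country.iso}({self.country.name}))")


amsterdam = City("Amsterdam", Region("NH", "North Holland"), Country("NL", "Netherlands"))
print(amsterdam)                 # Amsterdam (NH(North Holland) - NL(Netherlands))
print(amsterdam.country.name)    # Netherlands
```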
Created on 2020-09-18
@author: wf
- class geograpy.locator.Locator(db_file=None, correctMisspelling=False, debug=False)
Bases: object
location handling
- cities_for_name(cityName)
find cities with the given cityName
Parameters: cityName (string) – the potential name of a city
Returns: a list of city records
- correct_country_misspelling(name)
correct potential misspellings
Parameters: name (string) – the name of the country, potentially misspelled
Returns: the corrected name, or the name unchanged
Return type: string
- db_has_data()
check whether the database has data / is populated
Returns: True if the cities table exists and has more than one record
Return type: boolean
- db_recordCount(tableList, tableName)
count the number of records for the given tableName
Parameters:
- tableList (list) – the list of tables to check
- tableName (str) – the name of the table to check
Returns: the number of records found for the table
Return type: int
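A record count of this kind can be sketched with the standard library `sqlite3` module. The function name `record_count`, the treatment of `table_list` as a plain list of table names, and the choice to return 0 for an unlisted table are assumptions for illustration and do not describe how `db_recordCount` is implemented.

```python
import sqlite3


def record_count(conn, table_list, table_name):
    """Return the number of rows in table_name, or 0 if it is not listed."""
    # Only query tables known to exist; this also restricts the
    # interpolated table name to a trusted list.
    if table_name not in table_list:
        return 0
    row = conn.execute(f'SELECT COUNT(*) FROM "{table_name}"').fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT)")
conn.executemany("INSERT INTO cities VALUES (?)", [("Vienna",), ("Paris",)])

# Collect the names of the tables present in the database.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]

print(record_count(conn, tables, "cities"))   # 2
print(record_count(conn, tables, "regions"))  # 0
```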
- disambiguate(country, regions, cities, byPopulation=True)
try determining country, regions and city from the potential choices
Parameters:
- country (Country) – a matching country found
- regions (list) – a list of matching Regions found
- cities (list) – a list of matching cities found
Returns: the found city or None
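One way such a disambiguation could work is sketched below: narrow the candidates by country, then by region, then fall back to the largest population. This is an illustration only, not the library's algorithm. The dict keys (`country`, `region`, `population`) and the population figures are placeholders invented for the example.

```python
def disambiguate_sketch(country, regions, cities, by_population=True):
    """Pick one candidate city dict, or None if nothing fits."""
    candidates = list(cities)
    if country is not None:
        candidates = [c for c in candidates if c["country"] == country]
    if regions:
        in_region = [c for c in candidates if c["region"] in regions]
        # Narrow by region only if that leaves at least one candidate.
        if in_region:
            candidates = in_region
    if not candidates:
        return None
    if by_population:
        # Prefer the most populous candidate; missing values count as 0.
        return max(candidates, key=lambda c: c.get("population") or 0)
    return candidates[0]


# Placeholder records; the population numbers are illustrative, not real data.
viennas = [
    {"name": "Vienna", "region": "IL", "country": "US", "population": 1_000},
    {"name": "Vienna", "region": "9", "country": "AT", "population": 1_000_000},
]

print(disambiguate_sketch(None, [], viennas)["country"])   # AT
print(disambiguate_sketch("US", [], viennas)["region"])    # IL
print(disambiguate_sketch("FR", [], viennas))              # None
```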
- getCountry(name)
get the country for the given name
Parameters: name (string) – the name of the country to look up
Returns: the country if one was found, or None if not
Return type: country
- getGeolite2Cities()
get the Geolite2 City-Locations as a list of Dicts
Returns: a list of Geolite2 City-Locator dicts
Return type: list
- static getInstance(correctMisspelling=False, debug=False)
get the singleton instance of the Locator. If the parameters are changed on later calls, the initial parameters remain in effect, since the original instance is returned.
Parameters:
- correctMisspelling (bool) – if True, correct typical misspellings
- debug (bool) – if True, show debug information
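The singleton behaviour described for getInstance can be shown with a small stand-in class. `LocatorSketch` is a hypothetical name used only here; it mirrors the documented signature and the class-level `locator` attribute but contains none of the real Locator logic.

```python
class LocatorSketch:
    locator = None  # class-level slot holding the shared instance

    def __init__(self, correctMisspelling=False, debug=False):
        self.correctMisspelling = correctMisspelling
        self.debug = debug

    @staticmethod
    def getInstance(correctMisspelling=False, debug=False):
        # Create the instance on first use only; arguments passed on
        # later calls are ignored because the first instance is reused.
        if LocatorSketch.locator is None:
            LocatorSketch.locator = LocatorSketch(
                correctMisspelling=correctMisspelling, debug=debug)
        return LocatorSketch.locator


first = LocatorSketch.getInstance(correctMisspelling=True)
second = LocatorSketch.getInstance(correctMisspelling=False)

print(first is second)             # True
print(second.correctMisspelling)   # True, the initial setting is kept
```

A caller that needs different settings would have to construct its own instance rather than rely on getInstance.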
- getView()
get the view to be used
Returns: the SQL view to be used for CityLookups, e.g. GeoLite2CityLookup
Return type: str
- getWikidataCityPopulation(sqlDB, endpoint=None)
Parameters:
- sqlDB (SQLDB) – target SQL database
- endpoint (str) – url of the wikidata endpoint or None if default should be used
- isAmbiguousPrefix(name)
check if the given name is an ambiguous prefix
Parameters: name (string) – the city name to check
Returns: True if this is a known prefix that is ambiguous, that is, there is also a city with such a name
Return type: bool
- isISO(s)
check if the given string is an ISO code
Returns: True if the string is an ISO code
Return type: bool
- isPrefix(name, level)
check if the given name is a city prefix at the given level
Parameters:
- name (string) – the city name to check
- level (int) – the level on which to check (number of words)
Returns: True if this is a known prefix of multiple cities e.g. “San”, “New”, “Los”
Return type: bool
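The notion of a prefix "at a level" (number of words) can be sketched as follows. The helper names `build_prefix_counts` and `is_prefix` and the in-memory `Counter` are assumptions for illustration; the library itself works against its database, which this sketch does not attempt to reproduce.

```python
from collections import Counter


def build_prefix_counts(city_names):
    """Count how many city names share each leading word sequence."""
    counts = Counter()
    for name in city_names:
        words = name.split()
        # Every proper leading word sequence is a prefix; the full name is not.
        for level in range(1, len(words)):
            counts[(" ".join(words[:level]), level)] += 1
    return counts


def is_prefix(counts, name, level):
    # A known prefix of multiple cities: shared by more than one name.
    return counts[(name, level)] > 1


names = ["San Francisco", "San Diego", "San Juan Capistrano",
         "San Juan Bautista", "New York", "Vienna"]
counts = build_prefix_counts(names)

print(is_prefix(counts, "San", 1))       # True
print(is_prefix(counts, "San Juan", 2))  # True
print(is_prefix(counts, "New", 1))       # False, only one city starts with it here
print(is_prefix(counts, "Vienna", 1))    # False
```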
- is_a_country(name)
check if the given string name is a country
Parameters: name (string) – the string to check
Returns: True if pycountry thinks the string is a country
Return type: bool
- locate(places)
locate a city, region, country combination based on the places information
Parameters: places (list) – a list of place tokens e.g. “Vienna, Austria”
Returns: a city with country and region details
Return type: City
- locator = None
- places_by_name(placeName, columnName)
get places by name and column
Parameters:
- placeName (string) – the name of the place
- columnName (string) – the column to look at
- populate_Cities(sqlDB)
populate the given sqlDB with the Geolite2 Cities
Parameters: sqlDB (SQLDB) – the SQL database to use
- populate_Cities_FromWikidata(sqlDB)
populate the given sqlDB with the Wikidata Cities
Parameters: sqlDB (SQLDB) – target SQL database
- populate_PrefixAmbiguities(sqlDB, view)
create a table with ambiguous prefixes
Parameters:
- sqlDB (SQLDB) – the SQL database to use
- view (str) – the view to use
- populate_PrefixTree(sqlDB, view)
calculate the PrefixTree info
Parameters:
- sqlDB (SQLDB) – the SQL database to use
- view (string) – the view to use
Returns: the prefix tree
- populate_db(force=False)
populate the cities SQL database which caches the information from the GeoLite2-City-Locations.csv file
Parameters: force (bool) – if True force a recreation of the database
geograpy.places module
- class geograpy.places.PlaceContext(place_names, setAll=True)
Bases: geograpy.locator.Locator
Adds context information to a place name
geograpy.prefixtree module
Created on 2020-09-20
@author: wf
- class geograpy.prefixtree.PrefixTree
Bases: object
prefix analysis and search
see http://p-nand-q.com/python/data-types/general/tries.html for the general data structure; this class is more specific and creates
- add2Table(prefix, prefixStr, table, level)
recursively add prefix tree entries to a table
Parameters:
- prefix (dict) – the dictionary to start with
- prefixStr (string) – the prefix string up to this level
- table (list) – a “flat” list of dicts as a table
- level (int) – the level (length of word sequence) on which to add
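Flattening a nested prefix dictionary into a table can be sketched recursively. The node layout used here (`count` and `children` keys), the row keys and the function name `add_to_table` are assumptions for illustration, since the actual internal structure of PrefixTree is not described in this reference.

```python
def add_to_table(node, prefix_str, table, level):
    """Append one row per prefix that is exactly `level` words long."""
    for word, child in node["children"].items():
        current = f"{prefix_str} {word}".strip()
        depth = len(current.split())
        if depth == level:
            table.append({"prefix": current, "count": child["count"]})
        elif depth < level:
            # Not deep enough yet: descend one more word.
            add_to_table(child, current, table, level)


# Hypothetical nested tree: each node records how many names pass through it.
tree = {"count": 0, "children": {
    "San": {"count": 3, "children": {
        "Juan": {"count": 2, "children": {}},
        "Diego": {"count": 1, "children": {}},
    }},
    "New": {"count": 1, "children": {
        "York": {"count": 1, "children": {}},
    }},
}}

rows = []
add_to_table(tree, "", rows, 2)
for row in rows:
    print(row["prefix"], row["count"])
```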
- countStartsWith(namePrefix)
count how many entries start with the given namePrefix
Parameters: namePrefix (string) – the prefix to check
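Counting entries by prefix is a standard trie operation. The sketch below uses a word-level trie in which every node keeps a pass-through counter. `WordTrie` and its method names are hypothetical and are not the PrefixTree API.

```python
class WordTrie:
    """Word-level trie; each node counts the names that pass through it."""

    def __init__(self):
        self.root = {"count": 0, "children": {}}

    def add(self, name):
        node = self.root
        node["count"] += 1
        for word in name.split():
            node = node["children"].setdefault(
                word, {"count": 0, "children": {}})
            node["count"] += 1  # one more name passes through this node

    def count_starts_with(self, name_prefix):
        node = self.root
        for word in name_prefix.split():
            node = node["children"].get(word)
            if node is None:
                return 0  # no entry starts with this word sequence
        return node["count"]


trie = WordTrie()
for city in ["San Francisco", "San Diego", "San Juan Capistrano", "New York"]:
    trie.add(city)

print(trie.count_starts_with("San"))       # 3
print(trie.count_starts_with("San Juan"))  # 1
print(trie.count_starts_with("Los"))       # 0
```

Storing the counter on each node makes the lookup proportional to the number of words in the prefix rather than the number of stored names.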
geograpy.utils module
- geograpy.utils.fuzzy_match(s1, s2, max_dist=0.8)
Fuzzy match the given two strings with the given maximum distance
Parameters:
- s1 (string) – first string
- s2 (string) – second string
- max_dist (float) – the distance; default: 0.8
Returns: jellyfish jaro_winkler_similarity based on https://en.wikipedia.org/wiki/Jaro-Winkler_distance
Return type: float
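To make the similarity measure concrete, here is a dependency-free sketch of Jaro-Winkler similarity. It is not the jellyfish implementation and not the body of fuzzy_match. The function names, the prefix scaling factor of 0.1 with a prefix cap of 4 characters (the values commonly used for this measure), and the final threshold comparison are assumptions for illustration.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unused equal character within the match window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1):
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * scaling * (1 - sim)


score = jaro_winkler("MARTHA", "MARHTA")
print(round(score, 4))   # 0.9611
print(score > 0.8)       # True: one way a caller might apply a 0.8 threshold
```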
geograpy.wikidata module
Created on 2020-09-23
@author: wf
Module contents
- geograpy.get_geoPlace_context(url=None, text=None, debug=False)
Get a place context for a given text with information about country, region, city and other, based on NLTK Named Entities having the Geographic (GPE) label.
Parameters:
- url (String) – the url to read text from (if any)
- text (String) – the text to analyze
- debug (boolean) – if True show debug information
Returns: the place context
Return type: PlaceContext
- geograpy.get_place_context(url=None, text=None, labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'], debug=False)
Get a place context for a given text with information about country, region, city and other, based on NLTK Named Entities in the label set Geographic (GPE), Person (PERSON) and Organization (ORGANIZATION).
Parameters:
- url (String) – the url to read text from (if any)
- text (String) – the text to analyze
- debug (boolean) – if True show debug information
Returns: the place context
Return type: PlaceContext