geograpy package¶
Submodules¶
geograpy.extraction module¶
-
class
geograpy.extraction.
Extractor
(text=None, url=None, debug=False)[source]¶ Bases:
object
Extract geo context for text or from url
-
find_entities
(labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'])[source]¶ Find entities with the given labels, set self.places to the result, and return it.
Parameters: labels (list) – the labels to filter Returns: list of places Return type: list
-
geograpy.locator module¶
The locator module returns detailed city information, including the region and country of a city, for a given location string.
Examples of location strings are:
Amsterdam, Netherlands
Vienna, Austria
Vienna, IL
Paris - Texas
Paris TX
The locator looks up the cities and tries to disambiguate the result based on the country or region information found.
The results in string representation are:
Amsterdam (NH(North Holland) - NL(Netherlands))
Vienna (9(Vienna) - AT(Austria))
Vienna (IL(Illinois) - US(United States))
Paris (TX(Texas) - US(United States))
Paris (TX(Texas) - US(United States))
Each city returned has a city.region and city.country attribute with the details of the city.
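The disambiguation idea described above can be illustrated with a small self-contained sketch. It does not use geograpy: the CITIES records, tokenize, locate and describe are all hypothetical names invented for this example, and the scoring rule (count how many remaining tokens match a region or country code or name) is an assumption, not necessarily how the Locator works internally.

```python
import re

# Hypothetical in-memory records standing in for the real city database.
CITIES = [
    {"name": "Amsterdam", "region": ("NH", "North Holland"), "country": ("NL", "Netherlands")},
    {"name": "Vienna", "region": ("9", "Vienna"), "country": ("AT", "Austria")},
    {"name": "Vienna", "region": ("IL", "Illinois"), "country": ("US", "United States")},
    {"name": "Paris", "region": ("IDF", "Ile-de-France"), "country": ("FR", "France")},
    {"name": "Paris", "region": ("TX", "Texas"), "country": ("US", "United States")},
]


def tokenize(location):
    """Split a location string on commas, dashes and whitespace."""
    return [token for token in re.split(r"[,\s-]+", location) if token]


def locate(location):
    """Return the candidate city whose region/country best matches the other tokens."""
    hints = {token.lower() for token in tokenize(location)}
    candidates = [city for city in CITIES if city["name"].lower() in hints]
    if not candidates:
        return None

    def score(city):
        # Tokens other than the city name are treated as region/country hints.
        others = hints - {city["name"].lower()}
        labels = (*city["region"], *city["country"])
        return sum(1 for label in labels if label.lower() in others)

    # max() keeps the first candidate when several share the best score.
    return max(candidates, key=score)


def describe(city):
    """Format a city in the 'Name (REGION(Region) - CC(Country))' style."""
    region_code, region_name = city["region"]
    country_code, country_name = city["country"]
    return f"{city['name']} ({region_code}({region_name}) - {country_code}({country_name}))"


for text in ["Amsterdam, Netherlands", "Vienna, Austria", "Vienna, IL", "Paris - Texas", "Paris TX"]:
    print(describe(locate(text)))
```

This toy tokenizer only handles single-word names and hints; multi-word names such as "United States" would need phrase-level matching.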
Created on 2020-09-18
@author: wf
-
class
geograpy.locator.
Locator
(db_file=None, correctMisspelling=False, debug=False)[source]¶ Bases:
object
location handling
-
cities_for_name
(city_name)[source]¶ find cities with the given city_name
Parameters: city_name (string) – the potential name of a city Returns: a list of city records
-
correct_country_misspelling
(name)[source]¶ correct potential misspellings
Parameters: name (string) – the name of the country, potentially misspelled Returns: the corrected name, or the name unchanged Return type: string
-
db_has_data
()[source]¶ check whether the database has data / is populated
Returns: True if the cities table exists and has more than one record Return type: boolean
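A check of this kind could look roughly like the following sketch. It assumes an SQLite database and a table literally named cities; the function below is a hypothetical stand-alone helper written for illustration, not the method's actual implementation.

```python
import sqlite3


def db_has_data(connection, table="cities"):
    """Return True if `table` exists and holds more than one record."""
    found = connection.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone()
    if found is None:
        return False
    # Table names cannot be bound as parameters; the lookup above confirmed it exists.
    (count,) = connection.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return count > 1


conn = sqlite3.connect(":memory:")
empty = db_has_data(conn)  # no cities table yet
conn.execute("CREATE TABLE cities (name TEXT)")
conn.executemany("INSERT INTO cities VALUES (?)", [("Vienna",), ("Paris",)])
populated = db_has_data(conn)  # table exists with two records
print(empty, populated)
```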
-
disambiguate
(country, regions, cities)[source]¶ try determining country, regions and city from the potential choices
Parameters: - country (Country) – a matching country found
- regions (list) – a list of matching Regions found
- cities (list) – a list of matching cities found
Returns: the found city or None
Return type: City
-
getCountry
(name)[source]¶ get the country for the given name
Parameters: name (string) – the name of the country to look up Returns: the country if one was found or None if not Return type: country
-
getGeolite2Cities
()[source]¶ get the Geolite2 City-Locations as a list of Dicts
Returns: a list of Geolite2 City-Locations dicts Return type: list
-
static
getInstance
(correctMisspelling=False, debug=False)[source]¶ get the singleton instance of the Locator. If parameters are changed on further calls the initial parameters will still be in effect since the original instance will be returned!
Parameters: - correctMisspelling (bool) – if True correct typical misspellings
- debug (bool) – if True show debug information
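The caveat about later parameters being ignored follows directly from the singleton pattern. The sketch below uses a hypothetical SketchLocator class (not the geograpy class) with a class-level locator slot to show the effect.

```python
class SketchLocator:
    """Hypothetical stand-in used only to illustrate the singleton behaviour."""

    locator = None  # class-level slot holding the shared instance

    def __init__(self, correctMisspelling=False, debug=False):
        self.correctMisspelling = correctMisspelling
        self.debug = debug

    @staticmethod
    def getInstance(correctMisspelling=False, debug=False):
        # Create the instance on the first call only; later arguments are ignored.
        if SketchLocator.locator is None:
            SketchLocator.locator = SketchLocator(
                correctMisspelling=correctMisspelling, debug=debug
            )
        return SketchLocator.locator


first = SketchLocator.getInstance(correctMisspelling=True)
second = SketchLocator.getInstance(correctMisspelling=False)
print(first is second, second.correctMisspelling)
```

Both names refer to the same object, and the setting from the first call stays in effect.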
-
isAmbiguousPrefix
(name)[source]¶ check if the given name is an ambiguous prefix
Parameters: name (string) – the city name to check Returns: True if this is a known prefix that is ambiguous, that is, there is also a city with such a name Return type: bool
-
isISO
(s)[source]¶ check if the given string is an ISO code
Returns: True if the string is an ISO Code Return type: bool
-
isPrefix
(name, level)[source]¶ check if the given name is a city prefix at the given level
Parameters: - name (string) – the city name to check
- level (int) – the level on which to check (number of words)
Returns: True if this is a known prefix of multiple cities e.g. “San”, “New”, “Los”
Return type: bool
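To make the notions of a prefix at a given level and an ambiguous prefix concrete, here is a minimal sketch. The city list, the counting scheme, and the function names are assumptions made for this example; the library keeps this information in database tables instead.

```python
from collections import Counter

# Hypothetical sample of city names.
CITY_NAMES = [
    "San Francisco", "San Diego",
    "New York", "New York Mills",
    "Los Angeles", "Los Alamos",
    "Paris",
]


def build_prefix_counts(names):
    """Count, per (level, word sequence), how many names start with that sequence."""
    counts = Counter()
    for name in names:
        words = name.split()
        for level in range(1, len(words) + 1):
            counts[(level, " ".join(words[:level]))] += 1
    return counts


PREFIX_COUNTS = build_prefix_counts(CITY_NAMES)


def is_prefix(name, level):
    """True if `name` (a sequence of `level` words) starts more than one city name."""
    return PREFIX_COUNTS[(level, name)] > 1


def is_ambiguous_prefix(name):
    """True if `name` is a prefix of several cities and also a city name itself."""
    return is_prefix(name, len(name.split())) and name in CITY_NAMES


print(is_prefix("San", 1), is_prefix("Paris", 1), is_ambiguous_prefix("New York"))
```

In this sample "San" is a plain prefix, while "New York" is ambiguous because it is both a city and the start of "New York Mills".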
-
is_a_country
(name)[source]¶ check if the given string name is a country
Parameters: name (string) – the string to check Returns: True if pycountry thinks the string is a country Return type: bool
-
locate
(places)[source]¶ locate a city, region, country combination based on the places information
Parameters: places (list) – a list of place tokens e.g. “Vienna, Austria” Returns: a city with country and region details Return type: City
-
locator
= None¶
-
places_by_name
(place_name, column_name)[source]¶ get places by name and column
Parameters: - place_name (string) – the name of the place
- column_name (string) – the column to look at
-
populate_Cities
(sqlDB)[source]¶ populate the given sqlDB with the Geolite2 Cities
Parameters: sqlDB (SQLDB) – the SQL database to use
-
populate_PrefixAmbiguities
(sqlDB)[source]¶ create a table with ambiguous prefixes
Parameters: sqlDB (SQLDB) – the SQL database to use
-
populate_PrefixTree
(sqlDB)[source]¶ calculate the PrefixTree info
Parameters: sqlDB (SQLDB) – the SQL database to use Returns: the prefix tree Return type: PrefixTree
-
geograpy.places module¶
-
class
geograpy.places.
PlaceContext
(place_names, setAll=True)[source]¶ Bases:
geograpy.locator.Locator
Adds context information to a place name
geograpy.prefixtree module¶
Created on 2020-09-20
@author: wf
-
class
geograpy.prefixtree.
PrefixTree
[source]¶ Bases:
object
prefix analysis and search
see http://p-nand-q.com/python/data-types/general/tries.html
-
add2Table
(prefix, prefixStr, table, level)[source]¶ recursively add prefix tree entries to a table
Parameters: - prefix (dict) – the dictionary to start with
- prefixStr (string) – the prefix string up to this level
- table (list) – a “flat” list of dicts as a table
- level (int) – the level (length of word sequence) on which to add
-
countStartsWith
(namePrefix)[source]¶ count how many entries start with the given namePrefix
Parameters: namePrefix (string) – the prefix to check
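A word-level trie built from nested dicts is one way to support both operations. The sketch below is an assumption about the general technique, not the PrefixTree source: the END marker, the sample names, and the snake_case function names are all hypothetical.

```python
END = "$"  # hypothetical marker key: a complete name ends at this node


def build_trie(names):
    """Build a nested-dict trie keyed by the words of each name."""
    root = {}
    for name in names:
        node = root
        for word in name.split():
            node = node.setdefault(word, {})
        node[END] = True
    return root


def count_entries(node):
    """Count the complete names stored at or below this node."""
    return sum(1 if key == END else count_entries(child) for key, child in node.items())


def count_starts_with(root, name_prefix):
    """Count how many stored names start with the given word sequence."""
    node = root
    for word in name_prefix.split():
        if word not in node:
            return 0
        node = node[word]
    return count_entries(node)


def add_to_table(node, prefix_str, table, level):
    """Recursively flatten the trie into a list of dicts, one row per prefix."""
    for key, child in node.items():
        if key == END:
            continue
        path = f"{prefix_str} {key}".strip()
        table.append({"prefix": path, "level": level, "count": count_entries(child)})
        add_to_table(child, path, table, level + 1)


trie = build_trie(["San Francisco", "San Diego", "New York", "New York Mills", "Paris"])
rows = []
add_to_table(trie, "", rows, 1)
print(count_starts_with(trie, "San"), len(rows))
```

The level of a row is the number of words in its prefix, which mirrors the level parameter described for add2Table.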
-
geograpy.utils module¶
-
geograpy.utils.
fuzzy_match
(s1, s2, max_dist=0.8)[source]¶ Fuzzy match the given two strings with the given maximum distance
Parameters: - s1 (string) – first string
- s2 (string) – second string
- max_dist (float) – the distance, default: 0.8
Returns: jellyfish jaro_winkler_similarity based on https://en.wikipedia.org/wiki/Jaro-Winkler_distance
Return type: float
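For readers without jellyfish installed, the following pure-Python sketch shows how a Jaro-Winkler similarity can be computed and compared against a threshold. It is an illustration under assumptions: the function names are hypothetical, the threshold comparison is only one plausible way to use max_dist, and unlike some implementations this version always applies the common-prefix bonus regardless of the Jaro score.

```python
def jaro_similarity(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, char in enumerate(s1):
        # Look for an unmatched equal character within the sliding window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == char:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1, s2, scaling=0.1):
    """Jaro similarity boosted by the length of the common prefix (up to 4 chars)."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * scaling * (1 - jaro)


def is_fuzzy_match(s1, s2, threshold=0.8):
    """Assumed usage: treat two strings as matching if similarity >= threshold."""
    return jaro_winkler_similarity(s1, s2) >= threshold


print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))
print(is_fuzzy_match("Vienna", "Viena"), is_fuzzy_match("Vienna", "Paris"))
```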
Module contents¶
-
geograpy.
get_geoPlace_context
(url=None, text=None, debug=False)[source]¶ Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities having the Geographic(GPE) label.
Parameters: - url (String) – the url to read text from (if any)
- text (String) – the text to analyze
- debug (boolean) – if True show debug information
Returns: the place context
Return type: PlaceContext
-
geograpy.
get_place_context
(url=None, text=None, labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'], debug=False)[source]¶ Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities in the label set Geographic(GPE), Person(PERSON) and Organization(ORGANIZATION).
Parameters: - url (String) – the url to read text from (if any)
- text (String) – the text to analyze
- labels (list) – the labels to filter
- debug (boolean) – if True show debug information
Returns: the place context
Return type: PlaceContext