geograpy package

Submodules

geograpy.extraction module

class geograpy.extraction.Extractor(text=None, url=None, debug=False)[source]

Bases: object

Extract geo context for text or from url

find_entities(labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'])[source]

Find entities with the given labels set self.places and returns it :param labels: Labels: The labels to filter

Returns:List of places
Return type:list
find_geoEntities()[source]

Find geographic entities

Returns:List of places
Return type:list
set_text()[source]

Setter for text

split(delimiter=', ')[source]

simpler regular expression splitter with not entity check

hat tip: https://stackoverflow.com/a/1059601/1497139

geograpy.labels module

Created on 2020-09-10

@author: wf

class geograpy.labels.Labels[source]

Bases: object

NLTK labels

default = ['GPE', 'GSP', 'PERSON', 'ORGANIZATION']
geo = ['GPE', 'GSP']

geograpy.locator module

The locator module allows to get detailed city information including the region and country of a city from a location string.

Examples for location strings are:

Amsterdam, Netherlands Vienna, Austria Vienna, IL Paris - Texas Paris TX

the locator will lookup the cities and try to disambiguate the result based on the country or region information found.

The results in string representationa are:

Amsterdam (NH(North Holland) - NL(Netherlands)) Vienna (9(Vienna) - AT(Austria)) Vienna (IL(Illinois) - US(United States)) Paris (TX(Texas) - US(United States)) Paris (TX(Texas) - US(United States))

Each city returned has a city.region and city.country attribute with the details of the city.

Created on 2020-09-18

@author: wf

class geograpy.locator.City[source]

Bases: object

a single city as an object

static fromGeoLite2(record)[source]
setValue(name, record)[source]

set a field value with the given name to the given record dicts corresponding entry or none

Parameters:
  • name (string) – the name of the field
  • record (dict) – the dict to get the value from
class geograpy.locator.Country[source]

Bases: object

a country

static fromGeoLite2(record)[source]

create a country from a geolite2 record

static fromPyCountry(pcountry)[source]
Parameters:pcountry (PyCountry) – a country as gotten from pycountry
Returns:the country
Return type:Country
class geograpy.locator.Locator(db_file=None, correctMisspelling=False, debug=False)[source]

Bases: object

location handling

cities_for_name(cityName)[source]

find cities with the given cityName

Parameters:cityName (string) – the potential name of a city
Returns:a list of city records
correct_country_misspelling(name)[source]

correct potential misspellings :param name: the name of the country potentially misspelled :type name: string

Returns:correct name of unchanged
Return type:string
createViews(sqlDB)[source]
db_has_data()[source]

check whether the database has data / is populated

Returns:True if the cities table exists and has more than one record
Return type:boolean
db_recordCount(tableList, tableName)[source]

count the number of records for the given tableName

Parameters:
  • tableList (list) – the list of table to check
  • tableName (str) – the name of the table to check
Returns
int: the number of records found for the table
disambiguate(country, regions, cities, byPopulation=True)[source]

try determining country, regions and city from the potential choices

Parameters:
  • country (Country) – a matching country found
  • regions (list) – a list of matching Regions found
  • cities (list) – a list of matching cities found
Returns:

the found city or None

Return type:

City

getAliases()[source]

get the aliases hashTable

getCountry(name)[source]

get the country for the given name :param name: the name of the country to lookup :type name: string

Returns:the country if one was found or None if not
Return type:country
getGeolite2Cities()[source]

get the Geolite2 City-Locations as a list of Dicts

Returns:a list of Geolite2 City-Locator dicts
Return type:list
static getInstance(correctMisspelling=False, debug=False)[source]

get the singleton instance of the Locator. If parameters are changed on further calls the initial parameters will still be in effect since the original instance will be returned!

Parameters:
  • correctMispelling (bool) – if True correct typical misspellings
  • debug (bool) – if True show debug information
getView()[source]

get the view to be used

Returns:the SQL view to be used for CityLookups e.g. GeoLite2CityLookup
Return type:str
getWikidataCityPopulation(sqlDB, endpoint=None)[source]
Parameters:
  • sqlDB (SQLDB) – target SQL database
  • endpoint (str) – url of the wikidata endpoint or None if default should be used
isISO(s)[source]

check if the given string is an ISO code

Returns:True if the string is an ISO Code
Return type:bool
is_a_country(name)[source]

check if the given string name is a country

Parameters:name (string) – the string to check
Returns:if pycountry thinks the string is a country
Return type:True
locateCity(places)[source]

locate a city, region country combination based on the given wordtoken information

Parameters:
  • places (list) – a list of places derived by splitting a locality e.g. “San Francisco, CA”
  • to "San Francisco", "CA" (leads) –
Returns:

a city with country and region details

Return type:

City

locator = None
places_by_name(placeName, columnName)[source]

get places by name and column :param placeName: the name of the place :type placeName: string :param columnName: the column to look at :type columnName: string

populateFromWikidata(sqlDB)[source]

populate countries and regions from Wikidata

Parameters:sqlDB (SQLDB) – target SQL database
populate_Cities(sqlDB)[source]

populate the given sqlDB with the Geolite2 Cities

Parameters:sqlDB (SQLDB) – the SQL database to use
populate_Cities_FromWikidata(sqlDB)[source]

populate the given sqlDB with the Wikidata Cities

Parameters:sqlDB (SQLDB) – target SQL database
populate_Countries(sqlDB)[source]

populate database with countries from wikiData

Parameters:sqlDB (SQLDB) – target SQL database
populate_Regions(sqlDB)[source]

populate database with regions from wikiData

Parameters:sqlDB (SQLDB) – target SQL database
populate_Version(sqlDB)[source]

populate the version table

Parameters:sqlDB (SQLDB) – target SQL database
populate_db(force=False)[source]

populate the cities SQL database which caches the information from the GeoLite2-City-Locations.csv file

Parameters:force (bool) – if True force a recreation of the database
readCSV(fileName)[source]
recreateDatabase()[source]

recreate my lookup database

regions_for_name(region_name)[source]

get the regions for the given region_name (which might be an ISO code)

Parameters:region_name (string) – region name
Returns:the list of cities for this region
Return type:list
static resetInstance()[source]
class geograpy.locator.Region[source]

Bases: object

a Region (Subdivision)

static fromGeoLite2(record)[source]

create a region from a Geolite2 record

Parameters:record (dict) – the records as returned from a Query
Returns:the corresponding region information
Return type:Region
static fromWikidata(record)[source]

create a region from a Wikidata record

Parameters:record (dict) – the records as returned from a Query
Returns:the corresponding region information
Return type:Region
geograpy.locator.main(argv=None)[source]

main program.

geograpy.places module

class geograpy.places.PlaceContext(place_names, setAll=True)[source]

Bases: geograpy.locator.Locator

Adds context information to a place name

get_region_names(country_name)[source]
setAll()[source]

Set all context information

set_cities()[source]

set the cities information

set_countries()[source]

get the country information from my places

set_other()[source]
set_regions()[source]

geograpy.prefixtree module

geograpy.utils module

geograpy.utils.fuzzy_match(s1, s2, max_dist=0.8)[source]

Fuzzy match the given two strings with the given maximum distance :param s1: string: First string :param s2: string: Second string :param max_dist: float: The distance - default: 0.8

Returns:jellyfish jaro_winkler_similarity based on https://en.wikipedia.org/wiki/Jaro-Winkler_distance
Return type:float
geograpy.utils.remove_non_ascii(s)[source]

Remove non ascii chars from the given string :param s: string: The string to remove chars from

Returns:The result string with non-ascii chars removed
Return type:string

Hat tip: http://stackoverflow.com/a/1342373/2367526

geograpy.wikidata module

Created on 2020-09-23

@author: wf

class geograpy.wikidata.Wikidata(endpoint='https://query.wikidata.org/sparql')[source]

Bases: object

Wikidata access

getCities(region=None, country=None)[source]

get the cities from Wikidata

getCityPopulations(profile=True)[source]

get the city populations from Wikidata

Parameters:profile (bool) – if True show profiling information
getCountries()[source]

get a list of countries

try query

getRegions()[source]

get Regions from Wikidata

try query

Module contents

main geograpy 3 module

geograpy.get_geoPlace_context(url=None, text=None, debug=False)[source]

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities having the Geographic(GPE) label.

Parameters:
  • url (String) – the url to read text from (if any)
  • text (String) – the text to analyze
  • debug (boolean) – if True show debug information
Returns:

PlaceContext: the place context

Return type:

places

geograpy.get_place_context(url=None, text=None, labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'], debug=False)[source]

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities in the label set Geographic(GPE), Person(PERSON) and Organization(ORGANIZATION).

Parameters:
  • url (String) – the url to read text from (if any)
  • text (String) – the text to analyze
  • debug (boolean) – if True show debug information
Returns:

PlaceContext: the place context

Return type:

pc

geograpy.locateCity(location, correctMisspelling=False, debug=False)[source]

locate the given location string :param location: the description of the location :type location: string

Returns:the location
Return type:Locator