geograpy package


geograpy.extraction module

class geograpy.extraction.Extractor(text=None, url=None, debug=False)[source]

Bases: object

Extract geo context for text or from url

find_entities(labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'])[source]

Find entities with the given labels, set self.places and return it

Parameters:labels (Labels) – The labels to filter

Returns:List of places
Return type:list

Find geographic entities

Returns:List of places
Return type:list

Setter for text

split(delimiter=', ')[source]

simpler regular expression splitter with no entity check

hat tip:

geograpy.labels module

Created on 2020-09-10

@author: wf

class geograpy.labels.Labels[source]

Bases: object

NLTK labels

default = ['GPE', 'GSP', 'PERSON', 'ORGANIZATION']
geo = ['GPE', 'GSP']
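To illustrate how the two label lists above could be used to filter named entities, here is a minimal sketch. The `filter_entities` helper and the hand-written `(text, label)` pairs are hypothetical and are not part of geograpy; only the label values are taken from the Labels class.

```python
# Label sets copied from the Labels class documented above.
DEFAULT_LABELS = ['GPE', 'GSP', 'PERSON', 'ORGANIZATION']
GEO_LABELS = ['GPE', 'GSP']


def filter_entities(entities, labels):
    """Keep the text of every (text, label) pair whose label is in labels.

    `entities` is a hypothetical stand-in for tagged named-entity chunks.
    """
    wanted = set(labels)
    return [text for text, label in entities if label in wanted]


# Hand-written sample entities, for illustration only.
sample = [
    ("Berlin", "GPE"),
    ("Angela Merkel", "PERSON"),
    ("Siemens", "ORGANIZATION"),
]
geo_only = filter_entities(sample, GEO_LABELS)
all_default = filter_entities(sample, DEFAULT_LABELS)
print(geo_only)
print(all_default)
```

With the geo list only geographic entities survive; with the default list persons and organizations are kept as well.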

geograpy.locator module

The locator module provides detailed city information, including the region and country of a city, from a location string.

Examples for location strings are:

  • Amsterdam, Netherlands
  • Vienna, Austria
  • Vienna, IL
  • Paris - Texas
  • Paris TX

The locator looks up the cities and tries to disambiguate the result based on the country or region information found.

The results in string representation are:

  • Amsterdam (NH(North Holland) - NL(Netherlands))
  • Vienna (9(Vienna) - AT(Austria))
  • Vienna (IL(Illinois) - US(United States))
  • Paris (TX(Texas) - US(United States))
  • Paris (TX(Texas) - US(United States))

Each city returned has a city.region and a city.country attribute with the details of the city.
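The string representation shown above can be reproduced with a small sketch. The classes `RegionSketch`, `CountrySketch` and `CitySketch` and their `iso`/`name` fields are assumptions made for illustration; they are not the geograpy classes, and only the output format is taken from the examples above.

```python
from dataclasses import dataclass


@dataclass
class RegionSketch:
    iso: str   # short code, e.g. "IL"
    name: str  # full name, e.g. "Illinois"


@dataclass
class CountrySketch:
    iso: str   # short code, e.g. "US"
    name: str  # full name, e.g. "United States"


@dataclass
class CitySketch:
    name: str
    region: RegionSketch
    country: CountrySketch

    def __str__(self):
        # Format: City (REGION_CODE(Region) - COUNTRY_CODE(Country))
        return (f"{self.name} ({self.region.iso}({self.region.name}) - "
                f"{self.country.iso}({self.country.name}))")


vienna_il = CitySketch(
    "Vienna",
    RegionSketch("IL", "Illinois"),
    CountrySketch("US", "United States"),
)
print(vienna_il)
```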

Created on 2020-09-18

@author: wf

class geograpy.locator.City(**kwargs)[source]

Bases: geograpy.locator.Location

a single city as an object

static fromGeoLite2(record)[source]
classmethod getSamples()[source]
setValue(name, record)[source]

set the field with the given name to the corresponding entry of the given record dict, or to None if there is no such entry

  • name (string) – the name of the field
  • record (dict) – the dict to get the value from
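A minimal sketch of the behaviour described for setValue, assuming that a missing key results in None. The `FieldHolder` class is hypothetical and only mirrors the documented method signature.

```python
class FieldHolder:
    """Hypothetical stand-in for an object with a setValue-style method."""

    def setValue(self, name, record):
        # dict.get returns None when the key is missing from the record.
        setattr(self, name, record.get(name))


holder = FieldHolder()
record = {"population": 1897000}  # made-up record for illustration
holder.setValue("population", record)
holder.setValue("elevation", record)  # key is absent, so the field becomes None
print(holder.population, holder.elevation)
```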
class geograpy.locator.CityManager(name: str = 'CityManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]

Bases: geograpy.locator.LocationManager

a list of cities

classmethod fromJSONBackup(config: lodstorage.storageconfig.StorageConfig = None)[source]

get city list from json backup (json backup is based on wikidata query results)

Parameters:jsonStr (str) – JSON string the CityList should be loaded from. If None json backup is loaded. Default is None
Returns:CityList based on the json backup
classmethod fromWikidata(fromBackup: bool = True, countryIDs: list = None, regionIDs: list = None, config: lodstorage.storageconfig.StorageConfig = None)[source]

get city list from Wikidata

  • fromBackup (bool) – If True instead of querying wikidata a backup of the wikidata results is used to create the city list. Otherwise wikidata is queried for the city data. Default is True
  • countryIDs (list) – List of countryWikiDataIDs. Limits the returned cities to the given countries
  • regionIDs (list) – List of regionWikiDataIDs. Limits the returned cities to the given regions

Returns:CityList based on Wikidata query results

classmethod getLocationLodFromJsonBackup()[source]
updateCity(wikidataid: str, cityRecord: dict)[source]

Updates the city corresponding to the given wikidataid with the given data. If the city does not exist, a new city object is created and added to this CityList.

  • wikidataid (str) – wikidata id of the city that should be updated/added
  • cityRecord (dict) – data of the given city that should be updated/added

class geograpy.locator.Country(lookupSource='sqlDB', **kwargs)[source]

Bases: geograpy.locator.Location

a country

static fromGeoLite2(record)[source]

create a country from a geolite2 record

static fromPyCountry(pcountry)[source]
Parameters:pcountry (PyCountry) – a country as gotten from pycountry
Returns:the country
Return type:Country
classmethod getSamples()[source]
class geograpy.locator.CountryManager(name: str = 'CountryManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]

Bases: geograpy.locator.LocationManager

a list of countries

classmethod fromErdem()[source]

get country list provided by Erdem Ozkol

classmethod fromJSONBackup(config: lodstorage.storageconfig.StorageConfig = None)[source]

get country list from json backup (json backup is based on wikidata query results)

Returns:CountryList based on the json backup
classmethod fromWikidata()[source]

get country list from Wikidata

classmethod from_sqlDb(sqlDB)[source]
classmethod getLocationLodFromJsonBackup()[source]
class geograpy.locator.Earth[source]

Bases: object

radius = 6371.0
class geograpy.locator.Location(**kwargs)[source]

Bases: lodstorage.jsonable.JSONAble

Represents a Location

balltreeQueryResultToLocationManager(distances, indices, lookupListOfLocations)[source]

convert the given ballTree Query Result to a LocationManager

  • distances (list) – array of distances
  • indices (list) – array of indices
  • lookupListOfLocations (list) – a list of valid locations to use for lookup

Returns:a list of result Location/distance tuples
Return type:list


distance(other) → float[source]

calculate the distance to another Location

Parameters:other (Location) – the other location
Returns:the haversine distance in km
getLocationsWithinRadius(lookupLocationManager, radiusKm: float)[source]

Gives the locations from the given lookupLocationManager that lie within the given radius of this location

  • lookupLocationManager (LocationManager) – a LocationManager object to use for lookup
  • radiusKm (float) – the radius in which to check (in km)

Returns:a list of result Location/distance tuples
Return type:list


getNClosestLocations(lookupLocationManager, n: int)[source]

Gives a list of up to n locations from the given lookupLocationManager that have the shortest distance to this location

  • lookupLocationManager (LocationManager) – a LocationManager object to use for lookup
  • n (int) – the maximum number of closest locations to return

Returns:a list of result Location/distance tuples
Return type:list
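The two lookups above (within a radius, and the n closest) can be sketched with a brute-force scan. This is not the library's approach, which is documented as being based on a ball tree; the sketch swaps in a plain linear search for readability. The `(name, lon, lat)` tuple layout, the function names and the coordinates are all assumptions.

```python
import heapq
from math import asin, cos, radians, sin, sqrt


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * asin(sqrt(a))


def with_distances(origin, candidates):
    """Pair every candidate with its distance to origin."""
    _, lon, lat = origin
    return [(c, haversine_km(lon, lat, c[1], c[2])) for c in candidates]


def n_closest(origin, candidates, n):
    """Return up to n (candidate, distance) tuples, nearest first."""
    return heapq.nsmallest(n, with_distances(origin, candidates),
                           key=lambda pair: pair[1])


def within_radius(origin, candidates, radius_km):
    """Return all (candidate, distance) tuples no farther than radius_km."""
    return [pair for pair in with_distances(origin, candidates)
            if pair[1] <= radius_km]


# Made-up points on the equator: 1, 2 and 10 degrees east of the origin.
origin = ("O", 0.0, 0.0)
points = [("C", 10.0, 0.0), ("A", 1.0, 0.0), ("B", 2.0, 0.0)]
closest_two = [c[0] for c, _ in n_closest(origin, points, 2)]
nearby = sorted(c[0] for c, _ in within_radius(origin, points, 300.0))
print(closest_two, nearby)
```

A linear scan costs one distance computation per candidate for every query, which is why a spatial index is the usual choice for large location lists.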


classmethod getSamples()[source]
static haversine(lon1, lat1, lon2, lat2)[source]

Calculate the great circle distance between two points on the earth (specified in decimal degrees)
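For reference, the standard haversine formula can be sketched as follows. The argument order follows the signature above (longitude first), and the radius is the Earth.radius value listed earlier; the function name `haversine_km` is an assumption, and this is not necessarily how the library implements it.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # same value as Earth.radius above


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    # Convert decimal degrees to radians.
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula.
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of longitude along the equator.
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
print(round(one_degree, 3))
```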

isKnownAs(name) → bool[source]

Checks if this location is known under the given name

Parameters:name (str) – name the location should be checked against
Returns:True if the given name is either the name of the location or present in the labels of the location
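The check described above can be sketched in a few lines. The `NamedPlace` class and its `labels` list are hypothetical stand-ins, assuming that labels are stored as a plain list of alternative names.

```python
class NamedPlace:
    """Hypothetical location with a primary name and alternative labels."""

    def __init__(self, name, labels=None):
        self.name = name
        self.labels = labels or []

    def isKnownAs(self, name):
        # True if the name is the primary name or one of the labels.
        return name == self.name or name in self.labels


munich = NamedPlace("Munich", labels=["München", "Monaco di Baviera"])
print(munich.isKnownAs("München"), munich.isKnownAs("Vienna"))
```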
class geograpy.locator.LocationContext(countryManager: geograpy.locator.CountryManager, regionManager: geograpy.locator.RegionManager, cityManager: geograpy.locator.CityManager)[source]

Bases: object

Holds LocationManagers of all hierarchy levels and provides methods to traverse through the levels

classmethod fromCache(config: lodstorage.storageconfig.StorageConfig = None, forceUpdate: bool = False)[source]

Initializes a LocationContext from the cache if it exists, otherwise initializes the cache

classmethod fromJSONBackup(config: lodstorage.storageconfig.StorageConfig = None)[source]

Initializes a LocationContext from the JSON backup

getCities(name: str)[source]

Returns all cities that are known under the given name

getCountries(name: str)[source]

Returns all countries that are known under the given name

static getDefaultConfig() → lodstorage.storageconfig.StorageConfig[source]

Returns default StorageConfig

getRegions(name: str)[source]

Returns all regions that are known under the given name


Interlinks locations by adding the hierarchy references to the locations


Get possible locations for the given location names. The current prioritization of the results is city (ordered by population) → region → country.

ToDo: Extend the ranking of the results, e.g. matching of multiple location parts should increase the ranking.

Parameters:*locations – the location names to look up
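The prioritization described above (cities first, ordered by population, then regions, then countries) can be sketched as a sort key. The dict layout, the `kind` field and the population figures are made up for illustration and do not reflect the library's data model.

```python
# Lower number means higher priority in the result list.
PRIORITY = {"city": 0, "region": 1, "country": 2}


def rank(candidates):
    """Sort candidates city -> region -> country; cities by descending population."""
    def key(candidate):
        kind = candidate["kind"]
        # Only cities are ordered by population; other kinds keep input order,
        # because sorted() is stable.
        population = candidate.get("population", 0) if kind == "city" else 0
        return (PRIORITY[kind], -population)

    return sorted(candidates, key=key)


# Placeholder candidates with made-up population values.
candidates = [
    {"kind": "country", "name": "X-land"},
    {"kind": "city", "name": "Small X", "population": 1000},
    {"kind": "region", "name": "X Province"},
    {"kind": "city", "name": "Big X", "population": 500000},
]
ranked_names = [c["name"] for c in rank(candidates)]
print(ranked_names)
```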


class geograpy.locator.LocationManager(name: str, entityName: str, entityPluralName: str, listName: str = None, clazz=None, primaryKey: str = None, config: lodstorage.storageconfig.StorageConfig = None, handleInvalidListTypes=True, filterInvalidListTypes=False, debug=False)[source]

Bases: lodstorage.entity.EntityManager

a list of locations

static downloadBackupFile(url: str, fileName: str, force: bool = False)[source]

Downloads the gzip file from the given url and extracts the file corresponding to the given fileName.

  • url – url linking to a downloadable gzip file
  • fileName – Name of the file that should be extracted from gzip file
  • force (bool) – True if the download should be forced

Returns:Name of the extracted file with path to the backup directory

static getBackupDirectory()[source]
getBallTuple(cache: bool = True)[source]

get the BallTuple=BallTree,validList of this location list

Parameters:cache (bool) – if True calculate and use a cached version, otherwise recalculate on every call of this function
Returns:a sklearn.neighbors.BallTree for the given list of locations, and the valid list of locations
Return type:BallTree, list


getByName(name: str)[source]

Get locations matching the given name

Parameters:name – Name of the location

Returns:Returns locations that match the given name
static getFileContent(path: str)[source]
getLocationByID(wikidataID: str)[source]

Returns the location object that corresponds to the given wikidataID

Parameters:wikidataID – wikidataid of the location that should be returned
Returns:Location object
static getURLContent(url: str)[source]
class geograpy.locator.Locator(db_file=None, correctMisspelling=False, debug=False)[source]

Bases: object

location handling


find cities with the given cityName

Parameters:cityName (string) – the potential name of a city
Returns:a list of city records

correct potential misspellings

Parameters:name (string) – the name of the country potentially misspelled
Returns:the corrected name, or the name unchanged
Return type:string

check whether the database has data / is populated

Returns:True if the cities table exists and has more than one record
Return type:boolean
db_recordCount(tableList, tableName)[source]

count the number of records for the given tableName

  • tableList (list) – the list of tables to check
  • tableName (str) – the name of the table to check
Returns:the number of records found for the table
Return type:int
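Counting records for a table that may not exist can be sketched with the standard sqlite3 module. This is an illustration under assumptions: it checks `sqlite_master` directly instead of taking a tableList, and the `record_count` helper and the in-memory sample table are hypothetical.

```python
import sqlite3


def record_count(conn, table_name):
    """Return the number of rows in table_name, or 0 if the table does not exist."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,),
    ).fetchone()
    if exists is None:
        return 0
    # Table names cannot be bound as parameters; the name was verified above.
    return conn.execute(f'SELECT COUNT(*) FROM "{table_name}"').fetchone()[0]


# In-memory database with a small made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT)")
conn.executemany("INSERT INTO cities VALUES (?)", [("Vienna",), ("Paris",)])
print(record_count(conn, "cities"), record_count(conn, "regions"))
```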
disambiguate(country, regions, cities, byPopulation=True)[source]

try determining country, regions and city from the potential choices

  • country (Country) – a matching country found
  • regions (list) – a list of matching Regions found
  • cities (list) – a list of matching cities found

Returns:the found city or None
Return type:City



get the aliases hashTable


get the country for the given name

Parameters:name (string) – the name of the country to lookup

Returns:the country if one was found or None if not
Return type:country

get the Geolite2 City-Locations as a list of Dicts

Returns:a list of Geolite2 City-Locator dicts
Return type:list
static getInstance(correctMisspelling=False, debug=False)[source]

get the singleton instance of the Locator. If parameters are changed on further calls the initial parameters will still be in effect since the original instance will be returned!

  • correctMisspelling (bool) – if True correct typical misspellings
  • debug (bool) – if True show debug information
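The singleton behaviour described above, where parameters of later calls are ignored, can be sketched as follows. `LocatorSketch` is a hypothetical class that only mirrors the documented getInstance and resetInstance names; it is not the library's implementation.

```python
class LocatorSketch:
    """Hypothetical class illustrating a getInstance-style singleton."""

    _instance = None

    def __init__(self, correctMisspelling=False, debug=False):
        self.correctMisspelling = correctMisspelling
        self.debug = debug

    @staticmethod
    def getInstance(correctMisspelling=False, debug=False):
        # Only the first call creates the instance; later arguments are ignored.
        if LocatorSketch._instance is None:
            LocatorSketch._instance = LocatorSketch(correctMisspelling, debug)
        return LocatorSketch._instance

    @staticmethod
    def resetInstance():
        # Drop the cached instance so the next getInstance call creates a new one.
        LocatorSketch._instance = None


first = LocatorSketch.getInstance(debug=True)
second = LocatorSketch.getInstance(debug=False)
print(first is second, second.debug)
```

Resetting the instance is the only way in this sketch to make different parameters take effect.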

get the view to be used

Returns:the SQL view to be used for CityLookups e.g. GeoLite2CityLookup
Return type:str
getWikidataCityPopulation(sqlDB, endpoint=None)[source]
  • sqlDB (SQLDB) – target SQL database
  • endpoint (str) – url of the wikidata endpoint or None if default should be used
static isISO(s)[source]

check if the given string is an ISO code

Returns:True if the string is an ISO Code
Return type:bool

check if the given string name is a country

Parameters:name (string) – the string to check
Returns:True if pycountry thinks the string is a country
Return type:bool

locate a city, region, country combination based on the given word token information

Parameters:places (list) – a list of places derived by splitting a locality, e.g. “San Francisco, CA” leads to "San Francisco", "CA"
Returns:a city with country and region details
Return type:City
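Splitting a locality into the list of places expected above can be sketched with a regular expression. The `split_locality` helper is hypothetical, and it assumes a comma as the only delimiter.

```python
import re


def split_locality(locality):
    """Split a locality string on commas and drop surrounding whitespace."""
    parts = re.split(r"\s*,\s*", locality.strip())
    # Discard empty fragments, e.g. from a trailing comma.
    return [part for part in parts if part]


places = split_locality("San Francisco, CA")
print(places)
```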


locator = None
places_by_name(placeName, columnName)[source]

get places by name and column

  • placeName (string) – the name of the place
  • columnName (string) – the column to look at


populate countries and regions from Wikidata

Parameters:sqlDB (SQLDB) – target SQL database

populate the given sqlDB with the Geolite2 Cities

Parameters:sqlDB (SQLDB) – the SQL database to use

populate the given sqlDB with the Wikidata Cities

Parameters:sqlDB (SQLDB) – target SQL database

populate database with countries from wikiData

Parameters:sqlDB (SQLDB) – target SQL database

populate database with regions from wikiData

Parameters:sqlDB (SQLDB) – target SQL database

populate the version table

Parameters:sqlDB (SQLDB) – target SQL database

populate the cities SQL database which caches the information from the GeoLite2-City-Locations.csv file

Parameters:force (bool) – if True force a recreation of the database

recreate my lookup database


get the regions for the given region_name (which might be an ISO code)

Parameters:region_name (string) – region name
Returns:the list of cities for this region
Return type:list
static resetInstance()[source]
class geograpy.locator.Region(**kwargs)[source]

Bases: geograpy.locator.Location

a Region (Subdivision)

static fromGeoLite2(record)[source]

create a region from a Geolite2 record

Parameters:record (dict) – the records as returned from a Query
Returns:the corresponding region information
Return type:Region
static fromWikidata(record)[source]

create a region from a Wikidata record

Parameters:record (dict) – the records as returned from a Query
Returns:the corresponding region information
Return type:Region
classmethod getSamples()[source]
class geograpy.locator.RegionManager(name: str = 'RegionManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]

Bases: geograpy.locator.LocationManager

a list of regions

classmethod fromJSONBackup(config: lodstorage.storageconfig.StorageConfig = None)[source]

get region list from json backup (json backup is based on wikidata query results)

Returns:RegionList based on the json backup
classmethod fromWikidata(config: lodstorage.storageconfig.StorageConfig = None)[source]

get region list from Wikidata

classmethod from_sqlDb(sqlDB, config: lodstorage.storageconfig.StorageConfig = None)[source]
classmethod getLocationLodFromJsonBackup()[source]

main program.

geograpy.places module

class geograpy.places.PlaceContext(place_names, setAll=True)[source]

Bases: geograpy.locator.Locator

Adds context information to a place name


Set all context information


set the cities information


get the country information from my places


geograpy.prefixtree module

geograpy.utils module

geograpy.utils.fuzzy_match(s1, s2, max_dist=0.8)[source]

Fuzzy match the given two strings with the given maximum distance

  • s1 (string) – First string
  • s2 (string) – Second string
  • max_dist (float) – The distance - default: 0.8

Returns:jellyfish jaro_winkler_similarity based on
Return type:float
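Threshold-based fuzzy matching can be sketched with the standard library alone. Note the substitution: this sketch uses `difflib.SequenceMatcher.ratio()` as the similarity measure, not the Jaro-Winkler similarity named above, so the scores will differ from those of the library. Both function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(s1, s2):
    """Similarity in [0, 1] based on difflib, not on Jaro-Winkler."""
    return SequenceMatcher(None, s1, s2).ratio()


def fuzzy_match_sketch(s1, s2, threshold=0.8):
    """True if the two strings are at least `threshold` similar."""
    return similarity(s1, s2) >= threshold


print(fuzzy_match_sketch("Netherlands", "Netherland"))
print(fuzzy_match_sketch("Paris", "Tokyo"))
```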

Remove non ascii chars from the given string

Parameters:s (string) – The string to remove chars from

Returns:The result string with non-ascii chars removed
Return type:string
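One simple way to drop non-ASCII characters is to keep only code points below 128. This is a sketch under that assumption; the function name is hypothetical and the library may use a different approach.

```python
def remove_non_ascii_sketch(s):
    """Return s with every character outside the ASCII range removed."""
    return "".join(ch for ch in s if ord(ch) < 128)


cleaned = remove_non_ascii_sketch("Zürich")
print(cleaned)
```

Characters are removed rather than transliterated, so "ü" disappears instead of becoming "u".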

Hat tip:

geograpy.wikidata module

Created on 2020-09-23

@author: wf

class geograpy.wikidata.Wikidata(endpoint='')[source]

Bases: object

Wikidata access

getCities(region=None, country=None)[source]

get the cities from Wikidata

  • region – List of regionWikiDataIDs. Limits the returned cities to the given regions
  • country – List of countryWikiDataIDs. Limits the returned cities to the given countries
getCitiesOfRegion(regionWikidataId: str, limit: int)[source]

Queries the cities of the given region. If the region is a city state the region is returned as city. The cities are ordered by population and can be limited by the given limit attribute.

  • regionWikidataId – wikidata id of the region the cities should be queried for
  • limit – Limits the amount of returned cities

Returns:list of cities of the given region ordered by population


get the city populations from Wikidata

Parameters:profile (bool) – if True show profiling information
static getCoordinateComponents(coordinate: str) -> (<class 'float'>, <class 'float'>)[source]

Converts the wikidata coordinate representation into its subcomponents longitude and latitude. Example: ‘Point(-118.25 35.05694444)’ results in (‘-118.25’, ‘35.05694444’)

Parameters:coordinate – coordinate value in the format as returned by wikidata queries
Returns:Returns the longitude and latitude of the given coordinate as separate values
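Parsing the `Point(lon lat)` representation can be sketched with a regular expression. The helper name, the decision to return floats in (longitude, latitude) order, and the None result for unparsable input are assumptions of this sketch.

```python
import re

# Matches e.g. "Point(-118.25 35.05694444)": two signed decimals separated by spaces.
POINT_PATTERN = re.compile(
    r"Point\(\s*([-+]?\d+(?:\.\d+)?)\s+([-+]?\d+(?:\.\d+)?)\s*\)"
)


def coordinate_components(coordinate):
    """Return (longitude, latitude) as floats, or None if the input does not match."""
    match = POINT_PATTERN.search(coordinate or "")
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


lon_lat = coordinate_components("Point(-118.25 35.05694444)")
print(lon_lat)
```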

get a list of countries

try query


get Regions from Wikidata

try query

static getValuesClause(varName: str, values, wikidataEntities: bool = True)[source]

generates the SPARQL value clause for the given variable name containing the given values

  • varName – variable name for the ValuesClause
  • values – values for the clause
  • wikidataEntities (bool) – if true the wikidata prefix is added to the values, otherwise it is expected that the given values are proper IRIs
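A SPARQL VALUES clause of the kind described above can be sketched as follows. The exact output format of the library is not shown in this documentation, so the spacing, the `wd:` prefix and the angle-bracket wrapping of IRIs are assumptions; `values_clause` is a hypothetical name.

```python
def values_clause(var_name, values, wikidata_entities=True):
    """Build a SPARQL VALUES clause for one variable."""
    if isinstance(values, str):
        values = [values]  # allow a single value
    if wikidata_entities:
        # Prefix plain Wikidata ids such as "Q183" with the wd: namespace.
        terms = [f"wd:{value}" for value in values]
    else:
        # Assume the values are full IRIs and wrap them in angle brackets.
        terms = [f"<{value}>" for value in values]
    joined = " ".join(terms)
    return f"VALUES ?{var_name} {{ {joined} }}"


clause = values_clause("country", ["Q183", "Q40"])
print(clause)
```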

static getWikidataId(wikidataURL: str)[source]

Extracts the wikidata id from the given wikidata URL

Parameters:wikidataURL – wikidata URL the id should be extracted from
Returns:The wikidata id if present in the given wikidata URL otherwise None
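Extracting the id from an entity URL can be sketched with a regular expression. This sketch assumes that the id is a `Q` followed by digits at the end of the URL path; the function name is hypothetical.

```python
import re


def wikidata_id(url):
    """Return the trailing Q-id of a Wikidata URL, or None if there is none."""
    match = re.search(r"/(Q\d+)$", url or "")
    return match.group(1) if match else None


print(wikidata_id("http://www.wikidata.org/entity/Q183"))
print(wikidata_id("http://example.org/nothing"))
```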

Module contents

main geograpy 3 module

geograpy.get_geoPlace_context(url=None, text=None, debug=False)[source]

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities having the Geographic(GPE) label.

  • url (String) – the url to read text from (if any)
  • text (String) – the text to analyze
  • debug (boolean) – if True show debug information

Returns:the place context
Return type:PlaceContext


geograpy.get_place_context(url=None, text=None, labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'], debug=False)[source]

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities in the label set Geographic(GPE), Person(PERSON) and Organization(ORGANIZATION).

  • url (String) – the url to read text from (if any)
  • text (String) – the text to analyze
  • debug (boolean) – if True show debug information

Returns:the place context
Return type:PlaceContext


geograpy.locateCity(location, correctMisspelling=False, debug=False)[source]

locate the given location string

Parameters:location (string) – the description of the location

Returns:the location
Return type:Locator