geograpy package


geograpy.extraction module

class geograpy.extraction.Extractor(text=None, url=None, debug=False)[source]

Bases: object

Extract geo context for text or from url

find_entities(labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'])[source]

Find entities with the given labels, set self.places and return it.

Parameters:labels (Labels) – the labels to filter

Returns:List of places
Return type:list

Find geographic entities

Returns:List of places
Return type:list

Setter for text

split(delimiter=', ')[source]

simpler regular expression splitter with no entity check

hat tip:

geograpy.labels module

Created on 2020-09-10

@author: wf

class geograpy.labels.Labels[source]

Bases: object

NLTK labels

default = ['GPE', 'GSP', 'PERSON', 'ORGANIZATION']
geo = ['GPE', 'GSP']
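
The two label lists can serve as filters over named entities. The following is a minimal sketch, assuming entities arrive as (text, label) pairs; the filter_entities helper and the sample data are hypothetical and not part of geograpy.

```python
# Label lists copied from geograpy.labels.Labels as documented above.
DEFAULT_LABELS = ['GPE', 'GSP', 'PERSON', 'ORGANIZATION']
GEO_LABELS = ['GPE', 'GSP']


def filter_entities(entities, labels):
    """Keep the text of every (text, label) pair whose label is in labels.

    Hypothetical helper for illustration only.
    """
    return [text for text, label in entities if label in labels]


# Hypothetical named entities as (text, label) pairs.
sample = [("Vienna", "GPE"), ("Mozart", "PERSON"), ("quickly", "ADV")]
geo_only = filter_entities(sample, GEO_LABELS)
all_default = filter_entities(sample, DEFAULT_LABELS)
```

With the geo list only the geographic entity survives, while the default list also keeps persons and organizations.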

geograpy.locator module

The locator module retrieves detailed city information, including the region and country of a city, from a location string.

Examples for location strings are:

Amsterdam, Netherlands
Vienna, Austria
Vienna, IL
Paris - Texas
Paris TX

The locator will look up the cities and try to disambiguate the result based on the country or region information found.

The results in string representation are:

Amsterdam (NH(North Holland) - NL(Netherlands))
Vienna (9(Vienna) - AT(Austria))
Vienna (IL(Illinois) - US(United States))
Paris (TX(Texas) - US(United States))
Paris (TX(Texas) - US(United States))

Each city returned has a city.region and a city.country attribute with the details of the city.

Created on 2020-09-18

@author: wf

class geograpy.locator.City(**kwargs)[source]

Bases: geograpy.locator.Location

a single city as an object

static fromCityLookup(cityLookupRecord: dict)[source]

create a city from a cityLookupRecord, setting City, Region and Country while at it

Parameters:cityLookupRecord (dict) – a map derived from the CityLookup view

classmethod getSamples()[source]
setValue(name, record)[source]

set the field with the given name to the corresponding entry of the given record dict, or None if there is no such entry

Parameters:
  • name (string) – the name of the field
  • record (dict) – the dict to get the value from
class geograpy.locator.CityManager(name: str = 'CityManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]

Bases: geograpy.locator.LocationManager

a list of cities

classmethod getJsonFiles(config: lodstorage.storageconfig.StorageConfig) → list[source]

get the list of the json files that have my data

Returns:a list of json file names
Return type:list
class geograpy.locator.Country(lookupSource='sqlDB', **kwargs)[source]

Bases: geograpy.locator.Location

a country

static fromCountryLookup(countryLookupRecord: dict)[source]

create a country from a countryLookupRecord

Parameters:countryLookupRecord (dict) – a map derived from the CityLookup view

classmethod getSamples()[source]
class geograpy.locator.CountryManager(name: str = 'CountryManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]

Bases: geograpy.locator.LocationManager

a list of countries

classmethod fromErdem()[source]

get country list provided by Erdem Ozkol

class geograpy.locator.Earth[source]

Bases: object

radius = 6371.0
class geograpy.locator.Location(**kwargs)[source]

Bases: lodstorage.jsonable.JSONAble

Represents a Location

balltreeQueryResultToLocationManager(distances, indices, lookupListOfLocations)[source]

convert the given ballTree Query Result to a LocationManager

Parameters:
  • distances (list) – array of distances
  • indices (list) – array of indices
  • lookupListOfLocations (list) – a list of valid locations to use for lookup

Returns:a list of result Location/distance tuples
Return type:list


distance(other) → float[source]

calculate the distance to another Location

Parameters:other (Location) – the other location
Returns:the haversine distance in km
classmethod fromRecord(regionRecord: dict)[source]

create a location from a dict record

Parameters:regionRecord (dict) – the records as returned from a Query
Returns:the corresponding region information
Return type:Region
getLocationsWithinRadius(lookupLocationManager, radiusKm: float)[source]

Gives the locations within the given radius of me from the given lookupLocationManager

Parameters:
  • lookupLocationManager (LocationManager) – a LocationManager object to use for lookup
  • radiusKm (float) – the radius in which to check (in km)

Returns:a list of result Location/distance tuples
Return type:list


getNClosestLocations(lookupLocationManager, n: int)[source]

Gives a list of up to n locations which have the shortest distance to me as calculated from the given lookupLocationManager

Parameters:
  • lookupLocationManager (LocationManager) – a LocationManager object to use for lookup
  • n (int) – the maximum number of closest locations to return

Returns:a list of result Location/distance tuples
Return type:list


classmethod getSamples()[source]
static haversine(lon1, lat1, lon2, lat2)[source]

Calculate the great circle distance between two points on the earth (specified in decimal degrees)
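
The great-circle distance can be computed with the haversine formula. The sketch below is a minimal, self-contained illustration and not geograpy's implementation: it reuses the Earth radius of 6371.0 km documented for geograpy.locator.Earth, while the helper names (haversine_km, n_closest, within_radius) and the sample coordinates are hypothetical. It also shows how the n-closest and within-radius lookups described above could be answered with a brute-force scan, swapped in here for the BallTree the library refers to.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # same value as geograpy.locator.Earth.radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def n_closest(origin, candidates, n):
    """Return up to n (name, distance_km) tuples, nearest first (brute force)."""
    scored = [(name, haversine_km(origin[0], origin[1], lon, lat))
              for name, (lon, lat) in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])[:n]


def within_radius(origin, candidates, radius_km):
    """Return all (name, distance_km) tuples within radius_km, nearest first."""
    return [pair for pair in n_closest(origin, candidates, len(candidates))
            if pair[1] <= radius_km]


# Hypothetical sample data: approximate (lon, lat) pairs.
cities = {
    "Vienna": (16.37, 48.21),
    "Bratislava": (17.11, 48.15),
    "Amsterdam": (4.90, 52.37),
}
origin = (16.37, 48.21)  # Vienna
closest_two = n_closest(origin, cities, 2)
nearby = within_radius(origin, cities, 100.0)
```

A brute-force scan is linear in the number of candidates per query, which is fine for small lists; a spatial index such as a BallTree pays off when many queries run against a large, fixed set of locations.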

isKnownAs(name) → bool[source]

Checks if this location is known under the given name

Parameters:name (str) – name the location should be checked against
Returns:True if the given name is either the name of the location or present in the labels of the location
static mappedDict(record, keyMapList: list)[source]
static partialDict(record, clazz, keys=None)[source]
class geograpy.locator.LocationContext(countryManager: geograpy.locator.CountryManager, regionManager: geograpy.locator.RegionManager, cityManager: geograpy.locator.CityManager, config: lodstorage.storageconfig.StorageConfig)[source]

Bases: object

Holds LocationManagers of all hierarchy levels and provides methods to traverse through the levels

db_filename = 'locations.db'
classmethod fromCache(config: lodstorage.storageconfig.StorageConfig = None, forceUpdate: bool = False)[source]

Initializes a LocationContext from the cache if it exists, otherwise initializes the cache

Parameters:
  • config (StorageConfig) – configuration of the cache; if None the default config is used
  • forceUpdate (bool) – if True an existing cache will be overwritten
static getDefaultConfig() → lodstorage.storageconfig.StorageConfig[source]

Returns default StorageConfig

interlinkLocations(warnOnDuplicates: bool = True, profile=True)[source]

Interlinks locations by adding the hierarchy references to the locations

Parameters:warnOnDuplicates (bool) – if there are duplicates warn
load(forceUpdate: bool = False, warnOnDuplicates: bool = False)[source]

load my data

locateLocation(*locations, verbose: bool = False)[source]

Get possible locations for the given location names. The current prioritization of the results is city (ordered by population) → region → country.

ToDo: Extend the ranking of the results, e.g. matching of multiple location parts should increase the ranking.

Parameters:
  • *locations – the location names to look up
  • verbose (bool) – if True, combinations of location names are used to improve the search results (increases lookup time)


class geograpy.locator.LocationManager(name: str, entityName: str, entityPluralName: str, listName: str = None, tableName: str = None, clazz=None, primaryKey: str = None, config: lodstorage.storageconfig.StorageConfig = None, handleInvalidListTypes=True, filterInvalidListTypes=False, debug=False)[source]

Bases: lodstorage.entity.EntityManager

a list of locations


add the given location to me

Parameters:location (object) – the location to be added and put in my hash map
classmethod downloadBackupFileFromGitHub(fileName: str, targetDirectory: str = None, force: bool = False)[source]

download the given fileName from the github data directory

Parameters:
  • fileName (str) – the filename to download
  • targetDirectory (str) – download the file to this directory
  • force (bool) – force the overwriting of the existing file
Returns:the local file


fromCache(force=False, getListOfDicts=None, sampleRecordCount=-1)[source]

get me from the cache

static getBackupDirectory()[source]
getBallTuple(cache: bool = True)[source]

get the BallTuple=BallTree,validList of this location list

Parameters:cache (bool) – if True calculate and use a cached version, otherwise recalculate on every call of this function
Returns:a sklearn.neighbors.BallTree for the given list of locations, and the list of valid locations



Get locations matching the given name

Parameters:name – name of the location

Returns:Returns locations that match the given name
getLocationByID(wikidataID: str)[source]

Returns the location object that corresponds to the given wikidataID

Parameters:wikidataID – wikidataid of the location that should be returned
Returns:Location object
getLocationByIsoCode(isoCode: str)[source]

Get possible locations matching the given isoCode

Parameters:isoCode – isoCode of possible Locations

Returns:List of wikidata ids of locations matching the given isoCode

Returns Location objects for the given wikidataids

Parameters:*wikidataId (str) – wikidataIds of the locations that should be returned

Returns:Location objects matching the given wikidataids
class geograpy.locator.Locator(db_file=None, correctMisspelling=False, storageConfig: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]

Bases: object

location handling


find cities with the given cityName

Parameters:cityName (string) – the potential name of a city
Returns:a list of city records

correct potential misspellings

Parameters:name (string) – the name of the country potentially misspelled
Returns:the corrected name, or the name unchanged
Return type:string

check whether the database has data / is populated

Returns:True if the cities table exists and has more than one record
Return type:boolean
db_recordCount(tableList, tableName)[source]

count the number of records for the given tableName

Parameters:
  • tableList (list) – the list of tables to check
  • tableName (str) – the name of the table to check
Returns:the number of records found for the table
Return type:int
disambiguate(country, regions, cities, byPopulation=True)[source]

try determining country, regions and city from the potential choices

Parameters:
  • country (Country) – a matching country found
  • regions (list) – a list of matching Regions found
  • cities (list) – a list of matching cities found
Returns:the found city or None


downloadDB(forceUpdate: bool = False)[source]

download my database

Parameters:forceUpdate (bool) – force the overwriting of the existent file

get the aliases hashTable


get the country for the given name

Parameters:name (string) – the name of the country to look up

Returns:the country if one was found or None if not
Return type:country
static getInstance(correctMisspelling=False, debug=False)[source]

get the singleton instance of the Locator. If parameters are changed on further calls the initial parameters will still be in effect since the original instance will be returned!

Parameters:
  • correctMisspelling (bool) – if True correct typical misspellings
  • debug (bool) – if True show debug information
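
The singleton behavior described for getInstance can be illustrated with a small stand-in class. This is a sketch under assumptions, not the Locator code: SingletonLocator is hypothetical, and the behavior given to resetInstance (discarding the shared instance) is an assumption, since the reference does not describe it.

```python
class SingletonLocator:
    """Hypothetical stand-in illustrating the getInstance behavior."""

    _instance = None  # shared instance, created on the first getInstance call

    def __init__(self, correctMisspelling=False, debug=False):
        self.correctMisspelling = correctMisspelling
        self.debug = debug

    @staticmethod
    def getInstance(correctMisspelling=False, debug=False):
        # Only the first call constructs the object; later arguments are ignored.
        if SingletonLocator._instance is None:
            SingletonLocator._instance = SingletonLocator(correctMisspelling, debug)
        return SingletonLocator._instance

    @staticmethod
    def resetInstance():
        # Assumed behavior: drop the shared instance so the next call builds a new one.
        SingletonLocator._instance = None


first = SingletonLocator.getInstance(correctMisspelling=True)
second = SingletonLocator.getInstance(correctMisspelling=False)
```

Here second is the same object as first and still has correctMisspelling set to True, which mirrors the warning that the initial parameters stay in effect.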

get the view to be used

Returns:the SQL view to be used for CityLookups e.g. CityLookup
Return type:str
static isISO(s)[source]

check if the given string is an ISO code (ISO 3166-2 code) see

Returns:True if the string might be an ISO Code as per a regexp check
Return type:bool
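
The regular expression used by isISO is not shown in this reference. As a rough sketch, assuming the full ISO 3166-2 shape (two-letter country code, a hyphen, then one to three alphanumeric characters), such a check could look as follows; the pattern and the function name are assumptions for illustration.

```python
import re

# Assumed pattern for a full ISO 3166-2 code, e.g. "US-TX" or "AT-9".
ISO_3166_2 = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")


def looks_like_iso_3166_2(s):
    """Return True if s has the shape of an ISO 3166-2 code."""
    return ISO_3166_2.fullmatch(s) is not None
```

A shape check like this only says that a string might be an ISO code; it does not confirm that the code is actually assigned.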

check if the given string name is a country

Parameters:name (string) – the string to check
Returns:True if pycountry thinks the string is a country
Return type:bool

loads the database from cache and sets it as sqlDB property

locateCity(places: list)[source]

locate a city, region, country combination based on the given word token information

Parameters:places (list) – a list of places derived by splitting a locality, e.g. “San Francisco, CA” leads to "San Francisco", "CA"
Returns:a city with country and region details


locator = None
normalizePlaces(places: list)[source]

normalize places

Parameters:places (list) –
Returns:stripped and aliased list of places
Return type:list
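
Splitting a locality and normalizing the parts can be sketched as follows. This is a minimal illustration, assuming that normalization means stripping whitespace and replacing known aliases; the alias table and the normalize_places helper are hypothetical.

```python
# Hypothetical alias table; the real aliases come from the Locator.
ALIASES = {"NYC": "New York City", "Deutschland": "Germany"}


def normalize_places(places, aliases=ALIASES):
    """Strip whitespace from each place and replace known aliases."""
    normalized = []
    for place in places:
        place = place.strip()
        normalized.append(aliases.get(place, place))
    return normalized


# Split a locality string into its parts, then normalize them.
parts = "San Francisco, CA".split(",")
places = normalize_places(parts)
```

The resulting list has the shape that locateCity expects for its places argument.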
places_by_name(placeName, columnName)[source]

get places by name and column

Parameters:
  • placeName (string) – the name of the place
  • columnName (string) – the column to look at


populate the given sqlDB with the Wikidata Cities

Parameters:sqlDB (SQLDB) – target SQL database

populate database with countries from wikiData

Parameters:sqlDB (SQLDB) – target SQL database

populate database with regions from wikiData

Parameters:sqlDB (SQLDB) – target SQL database

populate the version table

Parameters:sqlDB (SQLDB) – target SQL database

populate the cities SQL database which caches the information from the GeoLite2-City-Locations.csv file

Parameters:force (bool) – if True force a recreation of the database
readCSV(fileName: str)[source]

read the given CSV file

Parameters:fileName (str) – the filename to read

recreate my lookup database


get the regions for the given region_name (which might be an ISO code)

Parameters:region_name (string) – region name
Returns:the list of cities for this region
Return type:list
static resetInstance()[source]
class geograpy.locator.Region(**kwargs)[source]

Bases: geograpy.locator.Location

a Region (Subdivision)

static fromRegionLookup(regionLookupRecord: dict)[source]

create a region from a regionLookupRecord, setting Region and Country while at it

Parameters:regionLookupRecord (dict) – a map derived from the CityLookup view

classmethod getSamples()[source]
class geograpy.locator.RegionManager(name: str = 'RegionManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]

Bases: geograpy.locator.LocationManager

a list of regions


main program.

geograpy.places module

class geograpy.places.PlaceContext(place_names: list, setAll: bool = True, correctMisspelling: bool = False)[source]

Bases: geograpy.locator.Locator

Adds context information to a place name

getRegions(countryName: str) → list[source]

get a list of regions for the given countryName

Parameters:countryName (str) – the countryName to check

get_region_names(countryName: str) → list[source]

get region names for the given country

Parameters:countryName (str) – the name of the country

Set all context information


set the cities information


get the country information from my places


get the region information from my places (limited to the already identified countries)

geograpy.prefixtree module

geograpy.utils module

class geograpy.utils.Download[source]

Bases: object

Utility functions for downloading data

static downloadBackupFile(url: str, fileName: str, targetDirectory: str, force: bool = False)[source]

Downloads the gzip file from the given url and extracts the file corresponding to the given fileName.

Parameters:
  • url – url linking to a downloadable gzip file
  • fileName – name of the file that should be extracted from the gzip file
  • targetDirectory (str) – download the file to this directory
  • force (bool) – True if the download should be forced
Returns:name of the extracted file with path to the backup directory

static getFileContent(path: str)[source]
static getURLContent(url: str)[source]
static needsDownload(filePath: str, force: bool = False) → bool[source]

check if a download of the given filePath is necessary, that is, the file does not exist, has a size of zero, or the download should be forced

Parameters:
  • filePath (str) – the path of the file to be checked
  • force (bool) – True if the result should be forced to True
Returns:True if a download for this file is needed
Return type:bool
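
The three conditions named above (file missing, file empty, download forced) translate into a short check. A minimal sketch using only the standard library; the function name needs_download is chosen for illustration and the code is not taken from geograpy.

```python
import os


def needs_download(file_path, force=False):
    """Return True if file_path is missing, empty, or force is set."""
    if force or not os.path.isfile(file_path):
        return True
    return os.path.getsize(file_path) == 0
```

Checking for a size of zero guards against leftovers from an interrupted download that created the file but wrote nothing to it.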


class geograpy.utils.Profiler(msg, profile=True)[source]

Bases: object

simple profiler


time the action and print if profile is active
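
A profiler of this kind can be sketched in a few lines. The class below is a hypothetical stand-in, assuming that timing starts at construction and that the timing method prints only when profile is True; the method name and the message format are assumptions.

```python
from time import perf_counter


class SimpleProfiler:
    """Hypothetical stand-in for a message-based profiler."""

    def __init__(self, msg, profile=True):
        self.msg = msg
        self.profile = profile
        self.start = perf_counter()  # start timing at construction

    def time(self, extra_msg=""):
        """Return the elapsed seconds and print them if profiling is active."""
        elapsed = perf_counter() - self.start
        if self.profile:
            print(f"{self.msg}{extra_msg} took {elapsed:5.1f} s")
        return elapsed


profiler = SimpleProfiler("sorting", profile=False)
sorted(range(1000))
elapsed = profiler.time()
```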

geograpy.utils.fuzzy_match(s1, s2, max_dist=0.8)[source]

Fuzzy match the two given strings, based on the jellyfish jaro_winkler_similarity

Parameters:
  • s1 (string) – first string
  • s2 (string) – second string
  • max_dist (float) – the threshold the match must reach (default: 0.8)
Returns:True if the match is greater than or equal to max_dist, otherwise False
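
The threshold logic can be illustrated without the jellyfish dependency. The sketch below deliberately swaps in the standard library's difflib.SequenceMatcher ratio for the Jaro-Winkler similarity, so the scores differ from what geograpy computes; the function name and the min_similarity parameter are chosen for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_match_sketch(s1, s2, min_similarity=0.8):
    """Return True if the similarity ratio of s1 and s2 reaches min_similarity.

    Uses difflib's ratio (0.0 to 1.0), not Jaro-Winkler.
    """
    return SequenceMatcher(None, s1, s2).ratio() >= min_similarity
```

Both measures return 1.0 for identical strings and lower values for less similar ones, so the comparison against the threshold works the same way.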

Remove non-ASCII chars from the given string

Parameters:s (string) – the string to remove chars from

Returns:The result string with non-ascii chars removed
Return type:string
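
One simple way to drop non-ASCII characters is to keep only code points below 128. This is a sketch of the idea, not necessarily how geograpy does it; the function name is chosen for illustration.

```python
def remove_non_ascii_sketch(s):
    """Return s with every character outside the ASCII range dropped."""
    return "".join(ch for ch in s if ord(ch) < 128)


# "\u00fc" is u-umlaut and "\u00d6" is O-umlaut.
cleaned = remove_non_ascii_sketch("Z\u00fcrich, \u00d6sterreich")
```

Note that this removes accented letters entirely rather than transliterating them, so the umlauts in the sample simply disappear.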

Hat tip:

geograpy.wikidata module

Created on 2020-09-23

@author: wf

class geograpy.wikidata.Wikidata(endpoint='', profile: bool = True)[source]

Bases: object

Wikidata access


get all human settlements as list of dict with duplicates for label, region, country …

getCitiesForRegion(regionId, msg)[source]

get the cities for the given Region


get city states from Wikidata

try query

static getCoordinateComponents(coordinate: str) → (float, float)[source]

Converts the wikidata coordinate representation into its subcomponents longitude and latitude. Example: ‘Point(-118.25 35.05694444)’ results in (‘-118.25’, ‘35.05694444’)

Parameters:coordinate – coordinate value in the format as returned by wikidata queries
Returns:Returns the longitude and latitude of the given coordinate as separate values
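
Parsing such a point literal can be sketched with a regular expression. The version below is an illustration under assumptions: it returns floats, as the signature suggests, and returns (None, None) for input it cannot parse, which is a choice made here and not documented behavior. The names POINT_PATTERN and coordinate_components are hypothetical.

```python
import re

# Matches a point literal such as "Point(-118.25 35.05694444)".
POINT_PATTERN = re.compile(
    r"Point\((?P<lon>-?\d+(?:\.\d+)?)\s+(?P<lat>-?\d+(?:\.\d+)?)\)"
)


def coordinate_components(coordinate):
    """Return (longitude, latitude) as floats, or (None, None) if unparsable."""
    match = POINT_PATTERN.fullmatch(coordinate.strip())
    if match is None:
        return None, None
    return float(match.group("lon")), float(match.group("lat"))
```

The longitude comes first in the literal, so the order of the returned pair follows the input and not the more common latitude/longitude convention.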

get a list of countries

try query


get Regions from Wikidata

try query

static getValuesClause(varName: str, values, wikidataEntities: bool = True)[source]

generates the SPARQL value clause for the given variable name containing the given values

Parameters:
  • varName – variable name for the ValuesClause
  • values – values for the clause
  • wikidataEntities (bool) – if true the wikidata prefix is added to the values, otherwise it is expected that the given values are proper IRIs
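
A SPARQL VALUES clause binds a variable to a fixed set of terms. The sketch below shows one way to build such a clause; it assumes the wd: prefix for Wikidata entities and assumes that IRIs are passed without angle brackets, neither of which is confirmed by this reference. The function name and the exact output layout are illustrative.

```python
def values_clause(var_name, values, wikidata_entities=True):
    """Build a SPARQL VALUES clause binding var_name to the given values."""
    if isinstance(values, str):
        values = [values]  # accept a single value as well as a list
    if wikidata_entities:
        terms = [f"wd:{value}" for value in values]  # prefixed entity ids
    else:
        terms = [f"<{value}>" for value in values]  # assumes bare IRI strings
    return f"VALUES ?{var_name} {{ {' '.join(terms)} }}"


# Sample entity ids used only for illustration.
clause = values_clause("city", ["Q1741", "Q727"])
```

In this sketch clause holds the string VALUES ?city { wd:Q1741 wd:Q727 }, which can be placed inside the WHERE block of a query.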

static getWikidataId(wikidataURL: str)[source]

Extracts the wikidata id from the given wikidata URL

Parameters:wikidataURL – wikidata URL the id should be extracted from
Returns:The wikidata id if present in the given wikidata URL otherwise None
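
Extracting the id can be sketched with a regular expression. This illustration assumes entity URLs that end in a Q-identifier, such as http://www.wikidata.org/entity/Q1741; the function name and the pattern are assumptions and may be stricter or looser than the library's check.

```python
import re


def wikidata_id(wikidata_url):
    """Return the trailing Q-identifier of a Wikidata entity URL, or None."""
    match = re.search(r"/(Q\d+)$", wikidata_url.strip())
    return match.group(1) if match else None
```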
query(msg, queryString: str, limit=None) → list[source]

get the query result

Parameters:
  • msg (str) – the profile message to display
  • queryString (str) – the query to execute
Returns:the list of dicts with the result
Return type:list


store2DB(lod, tableName: str, primaryKey: str = None, sqlDB=None)[source]

store the given list of dicts to the database

Parameters:
  • lod (list) – the list of dicts
  • tableName (str) – the table name to use
  • primaryKey (str) – primary key (if any)
  • sqlDB (SQLDB) – target SQL database

Module contents

main geograpy 3 module

geograpy.get_geoPlace_context(url=None, text=None, debug=False)[source]

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities having the Geographic(GPE) label.

Parameters:
  • url (String) – the url to read text from (if any)
  • text (String) – the text to analyze
  • debug (boolean) – if True show debug information
Returns:the place context
Return type:PlaceContext


geograpy.get_place_context(url=None, text=None, labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'], debug=False)[source]

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities in the label set Geographic(GPE), Person(PERSON) and Organization(ORGANIZATION).

Parameters:
  • url (String) – the url to read text from (if any)
  • text (String) – the text to analyze
  • debug (boolean) – if True show debug information
Returns:the place context
Return type:PlaceContext


geograpy.locateCity(location, correctMisspelling=False, debug=False)[source]

locate the given location string

Parameters:location (string) – the description of the location

Returns:the location
Return type:Locator