geograpy package

Submodules

geograpy.extraction module

class geograpy.extraction.Extractor(text=None, url=None, debug=False)

Bases: object

Extract geo context from text or from a URL

find_entities(labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'])

Find the entities with the given labels, set self.places and return it

Parameters: labels – the labels to filter by (see Labels)
Returns: List of places
Return type: list
find_geoEntities()

Find geographic entities

Returns: List of places
Return type: list
set_text()

Setter for text

split(delimiter=', ')

simpler regular expression splitter without an entity check

hat tip: https://stackoverflow.com/a/1059601/1497139
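A minimal sketch of a regular-expression splitter in the spirit of split(), assuming the delimiter is used directly as a regex pattern; split_places is a hypothetical helper, not geograpy's code.

```python
import re
from typing import List


def split_places(text: str, delimiter: str = r", ") -> List[str]:
    """Hypothetical helper: split a locality string on a regex delimiter.

    No named-entity check is done; every fragment between delimiters
    is treated as a place name.
    """
    parts = re.split(delimiter, text)
    # Trim whitespace and drop empty fragments (e.g. from a trailing delimiter)
    return [part.strip() for part in parts if part.strip()]
```

With the default delimiter, "San Francisco, CA" splits into ['San Francisco', 'CA'], while a string without a delimiter such as "Paris TX" stays whole.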

geograpy.labels module

Created on 2020-09-10

@author: wf

class geograpy.labels.Labels

Bases: object

NLTK labels

default = ['GPE', 'GSP', 'PERSON', 'ORGANIZATION']
geo = ['GPE', 'GSP']

geograpy.locator module

The locator module retrieves detailed city information, including the region and country of a city, from a location string.

Examples of location strings are:

  • Amsterdam, Netherlands
  • Vienna, Austria
  • Vienna, IL
  • Paris - Texas
  • Paris TX

The locator will look up the cities and try to disambiguate the result based on the country or region information found.

The results in string representation are:

  • Amsterdam (NH(North Holland) - NL(Netherlands))
  • Vienna (9(Vienna) - AT(Austria))
  • Vienna (IL(Illinois) - US(United States))
  • Paris (TX(Texas) - US(United States))
  • Paris (TX(Texas) - US(United States))

Each city returned has a city.region and city.country attribute with the details of the city.

Created on 2020-09-18

@author: wf

class geograpy.locator.City(**kwargs)

Bases: geograpy.locator.Location

a single city as an object

country
static fromGeoLite2(record)
classmethod getSamples()
region
setValue(name, record)

set the field with the given name to the corresponding entry of the given record dict, or None if there is no such entry

Parameters:
  • name (string) – the name of the field
  • record (dict) – the dict to get the value from
class geograpy.locator.CityManager(name: str = 'CityManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)

Bases: geograpy.locator.LocationManager

a list of cities

classmethod fromJSONBackup(config: lodstorage.storageconfig.StorageConfig = None)

get the city list from the JSON backup (the JSON backup is based on Wikidata query results)

Parameters: jsonStr (str) – JSON string the CityList should be loaded from. If None, the JSON backup is loaded. Default is None
Returns: CityList based on the JSON backup
classmethod fromWikidata(fromBackup: bool = True, countryIDs: list = None, regionIDs: list = None, config: lodstorage.storageconfig.StorageConfig = None)

get the city list from Wikidata

Parameters:
  • fromBackup (bool) – if True, a backup of the Wikidata results is used to create the city list instead of querying Wikidata; otherwise Wikidata is queried for the city data. Default is True
  • countryIDs (list) – List of countryWikiDataIDs. Limits the returned cities to the given countries
  • regionIDs (list) – List of regionWikiDataIDs. Limits the returned cities to the given regions
Returns: CityList based on Wikidata query results

classmethod getLocationLodFromJsonBackup()
updateCity(wikidataid: str, cityRecord: dict)

Updates the city corresponding to the given wikidataid with the given data. If the city does not exist, a new city object is created and added to this CityList.

Parameters:
  • wikidataid (str) – Wikidata id of the city that should be updated/added
  • cityRecord (dict) – data of the given city that should be updated/added
Returns: Nothing
class geograpy.locator.Country(lookupSource='sqlDB', **kwargs)

Bases: geograpy.locator.Location

a country

static fromGeoLite2(record)

create a country from a GeoLite2 record

static fromPyCountry(pcountry)
Parameters: pcountry (PyCountry) – a country as obtained from pycountry
Returns: the country
Return type: Country
classmethod getSamples()
class geograpy.locator.CountryManager(name: str = 'CountryManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)

Bases: geograpy.locator.LocationManager

a list of countries

classmethod fromErdem()

get the country list provided by Erdem Ozkol https://github.com/erdem

classmethod fromJSONBackup(config: lodstorage.storageconfig.StorageConfig = None)

get the country list from the JSON backup (the JSON backup is based on Wikidata query results)

Returns: CountryList based on the JSON backup
classmethod fromWikidata()

get the country list from Wikidata

classmethod from_sqlDb(sqlDB)
classmethod getLocationLodFromJsonBackup()
class geograpy.locator.Earth

Bases: object

radius = 6371.0
class geograpy.locator.Location(**kwargs)

Bases: lodstorage.jsonable.JSONAble

Represents a Location

balltreeQueryResultToLocationManager(distances, indices, lookupListOfLocations)

convert the given BallTree query result to a LocationManager

Parameters:
  • distances (list) – array of distances
  • indices (list) – array of indices
  • lookupListOfLocations (list) – a list of valid locations to use for lookup
Returns: a list of result Location/distance tuples
Return type: list

distance(other) → float

calculate the distance to another Location

Parameters: other (Location) – the other location
Returns: the haversine distance in km
getLocationsWithinRadius(lookupLocationManager, radiusKm: float)

Gives the locations from the given lookupLocationManager that lie within radiusKm of me

Parameters:
  • lookupLocationManager (LocationManager) – a LocationManager object to use for lookup
  • radiusKm (float) – the radius in which to check (in km)
Returns: a list of result Location/distance tuples
Return type: list

getNClosestLocations(lookupLocationManager, n: int)

Gives a list of up to n locations from the given lookupLocationManager which have the shortest distance to me

Parameters:
  • lookupLocationManager (LocationManager) – a LocationManager object to use for lookup
  • n (int) – the maximum number of closest locations to return
Returns: a list of result Location/distance tuples
Return type: list

classmethod getSamples()
static haversine(lon1, lat1, lon2, lat2)

Calculate the great circle distance between two points on the earth (specified in decimal degrees)
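The haversine formula, together with the distance-based lookups getLocationsWithinRadius and getNClosestLocations described above, can be sketched with the standard library alone. Place, n_closest and within_radius are hypothetical names, and the brute-force scan is a swapped-in technique: geograpy itself builds a sklearn BallTree (see getBallTuple).

```python
import heapq
import math
from typing import List, NamedTuple, Tuple

EARTH_RADIUS_KM = 6371.0  # same value as Earth.radius


class Place(NamedTuple):
    # Hypothetical minimal location record (not geograpy's Location class)
    name: str
    lat: float
    lon: float


def haversine(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance(p: Place, q: Place) -> float:
    return haversine(p.lon, p.lat, q.lon, q.lat)


def n_closest(me: Place, others: List[Place], n: int) -> List[Tuple[Place, float]]:
    # Brute force: measure every candidate (except me) and keep the n nearest
    pairs = [(other, distance(me, other)) for other in others if other != me]
    return heapq.nsmallest(n, pairs, key=lambda pair: pair[1])


def within_radius(me: Place, others: List[Place], radius_km: float) -> List[Tuple[Place, float]]:
    # Keep every candidate (except me) no farther than radius_km, nearest first
    pairs = [(other, distance(me, other)) for other in others if other != me]
    return sorted((pair for pair in pairs if pair[1] <= radius_km), key=lambda pair: pair[1])
```

The brute-force scan measures every candidate on each query; a BallTree avoids that cost for large location lists.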

isKnownAs(name) → bool

Checks if this location is known under the given name

Parameters: name (str) – name the location should be checked against
Returns: True if the given name is either the name of the location or present in the labels of the location
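A minimal sketch of the name check behind isKnownAs, assuming exact, case-sensitive comparison against a primary name and a list of labels; NamedPlace and get_by_name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NamedPlace:
    # Hypothetical record: a primary name plus alternative labels
    name: str
    labels: List[str] = field(default_factory=list)

    def is_known_as(self, name: str) -> bool:
        # Exact match against the primary name or any of the labels
        return name == self.name or name in self.labels


def get_by_name(places: List[NamedPlace], name: str) -> List[NamedPlace]:
    # Filter a list of places down to those known under the given name
    return [place for place in places if place.is_known_as(name)]
```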
class geograpy.locator.LocationContext(countryManager: geograpy.locator.CountryManager, regionManager: geograpy.locator.RegionManager, cityManager: geograpy.locator.CityManager)

Bases: object

Holds LocationManagers of all hierarchy levels and provides methods to traverse through the levels

cities
countries
classmethod fromCache(config: lodstorage.storageconfig.StorageConfig = None, forceUpdate: bool = False)

Initializes a LocationContext from the cache if it exists, otherwise initializes the cache

classmethod fromJSONBackup(config: lodstorage.storageconfig.StorageConfig = None)

Initializes a LocationContext from the JSON backup

getCities(name: str)

Returns all cities that are known under the given name

getCountries(name: str)

Returns all countries that are known under the given name

static getDefaultConfig() → lodstorage.storageconfig.StorageConfig

Returns the default StorageConfig

getRegions(name: str)

Returns all regions that are known under the given name

interlinkLocations()

Interlinks locations by adding the hierarchy references to the locations

locateLocation(*locations)

Get possible locations for the given location names. The results are currently prioritized as city (ordered by population) → region → country.

ToDo: extend the ranking of the results, e.g. matching multiple location parts should increase the ranking.

Parameters: *locations – the location names to look up
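The prioritization described for locateLocation (cities ordered by population, then regions, then countries) amounts to a two-part sort key. A sketch with a hypothetical Hit record; the ranking extension mentioned in the ToDo is not modelled.

```python
from dataclasses import dataclass
from typing import List

# Lower value = listed first: cities, then regions, then countries
KIND_PRIORITY = {"city": 0, "region": 1, "country": 2}


@dataclass
class Hit:
    # Hypothetical search result; kind is "city", "region" or "country"
    name: str
    kind: str
    population: int = 0


def rank_hits(hits: List[Hit]) -> List[Hit]:
    # Sort by kind first; within the same kind, larger population first
    return sorted(hits, key=lambda hit: (KIND_PRIORITY[hit.kind], -hit.population))
```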

regions
class geograpy.locator.LocationManager(name: str, entityName: str, entityPluralName: str, listName: str = None, clazz=None, primaryKey: str = None, config: lodstorage.storageconfig.StorageConfig = None, handleInvalidListTypes=True, filterInvalidListTypes=False, debug=False)

Bases: lodstorage.entity.EntityManager

a list of locations

static downloadBackupFile(url: str, fileName: str, force: bool = False)

Downloads the gzip file from the given url and extracts the file corresponding to the given fileName.

Parameters:
  • url – url linking to a downloadable gzip file
  • fileName – name of the file that should be extracted from the gzip file
  • force (bool) – True if the download should be forced
Returns: name of the extracted file, including its path in the backup directory
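A sketch of the extraction half of downloadBackupFile, assuming a gzip-compressed file as the parameter descriptions suggest; extract_gzip is hypothetical and the network download is left out.

```python
import gzip
import os
import shutil


def extract_gzip(gz_path: str, target_dir: str, file_name: str, force: bool = False) -> str:
    """Hypothetical helper: decompress gz_path into target_dir/file_name.

    Returns the path of the extracted file.
    """
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, file_name)
    # Reuse an already extracted file unless force is set
    if os.path.isfile(target) and not force:
        return target
    # Stream-decompress so large backups are not loaded into memory at once
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```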

static getBackupDirectory()
getBallTuple(cache: bool = True)

get the BallTuple = (BallTree, validList) of this location list

Parameters: cache (bool) – if True, calculate and use a cached version; otherwise recalculate on every call of this function
Returns: a sklearn.neighbors.BallTree for the given list of locations, and the list of valid locations
Return type: BallTree, list

getByName(name: str)

Get the locations matching the given name

Parameters: name – name of the location
Returns: the locations that match the given name
static getFileContent(path: str)
getLocationByID(wikidataID: str)

Returns the location object that corresponds to the given wikidataID

Parameters: wikidataID – Wikidata id of the location that should be returned
Returns: Location object
static getURLContent(url: str)
class geograpy.locator.Locator(db_file=None, correctMisspelling=False, debug=False)

Bases: object

location handling

cities_for_name(cityName)

find cities with the given cityName

Parameters: cityName (string) – the potential name of a city
Returns: a list of city records
correct_country_misspelling(name)

correct potential misspellings

Parameters: name (string) – the name of the country, potentially misspelled
Returns: the corrected name, or the name unchanged
Return type: string
createViews(sqlDB)
db_has_data()

check whether the database has data / is populated

Returns: True if the cities table exists and has more than one record
Return type: boolean
db_recordCount(tableList, tableName)

count the number of records for the given tableName

Parameters:
  • tableList (list) – the list of tables to check
  • tableName (str) – the name of the table to check
Returns: the number of records found for the table
Return type: int
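db_has_data and db_recordCount can be sketched with sqlite3. This version assumes an SQLite database and looks the table up in sqlite_master instead of taking a tableList argument; record_count and has_data are hypothetical names.

```python
import sqlite3


def record_count(conn: sqlite3.Connection, table_name: str) -> int:
    """Return the number of rows in table_name, or 0 if the table does not exist."""
    # Check sqlite_master first so a missing table is not an error
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?", (table_name,)
    ).fetchone()
    if row is None:
        return 0
    # Table names cannot be bound as parameters; the name was confirmed to exist above
    return conn.execute(f'SELECT COUNT(*) FROM "{table_name}"').fetchone()[0]


def has_data(conn: sqlite3.Connection) -> bool:
    # Mirrors the documented rule: the cities table exists and has more than one record
    return record_count(conn, "cities") > 1
```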
disambiguate(country, regions, cities, byPopulation=True)

try determining country, regions and city from the potential choices

Parameters:
  • country (Country) – a matching country found
  • regions (list) – a list of matching Regions found
  • cities (list) – a list of matching cities found
Returns: the found city or None
Return type: City
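A sketch of the kind of filtering disambiguate performs, assuming candidates carry ISO codes for region and country as in the string representations shown in the module description (e.g. Vienna (9(Vienna) - AT(Austria))). CityCandidate and pick_city are hypothetical, and this is not geograpy's actual algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CityCandidate:
    # Hypothetical candidate with the ISO codes of its region and country
    name: str
    region: str
    country: str
    population: int = 0


def pick_city(cities: List[CityCandidate], country: Optional[str] = None,
              regions: Optional[List[str]] = None,
              by_population: bool = True) -> Optional[CityCandidate]:
    # Keep only candidates consistent with the country and region hints
    matches = [
        city for city in cities
        if (country is None or city.country == country)
        and (not regions or city.region in regions)
    ]
    if not matches:
        return None
    # Prefer the most populous match, or simply the first one
    return max(matches, key=lambda city: city.population) if by_population else matches[0]
```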

getAliases()

get the aliases hashTable

getCountry(name)

get the country for the given name

Parameters: name (string) – the name of the country to look up
Returns: the country if one was found or None if not
Return type: Country
getGeolite2Cities()

get the GeoLite2 City-Locations as a list of dicts

Returns: a list of GeoLite2 City-Locator dicts
Return type: list
static getInstance(correctMisspelling=False, debug=False)

get the singleton instance of the Locator. If parameters are changed on later calls, the initial parameters will still be in effect, since the original instance will be returned!

Parameters:
  • correctMisspelling (bool) – if True correct typical misspellings
  • debug (bool) – if True show debug information
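getInstance and resetInstance follow the singleton pattern, with the shared object kept in the class attribute locator. A minimal sketch under that assumption, using a hypothetical SingletonLocator class that only stores the two flags:

```python
class SingletonLocator:
    """Hypothetical stand-in illustrating getInstance()/resetInstance()."""

    locator = None  # class-level slot holding the shared instance

    def __init__(self, correctMisspelling=False, debug=False):
        self.correctMisspelling = correctMisspelling
        self.debug = debug

    @classmethod
    def getInstance(cls, correctMisspelling=False, debug=False):
        # Create the instance on the first call only; later arguments are ignored
        if cls.locator is None:
            cls.locator = cls(correctMisspelling=correctMisspelling, debug=debug)
        return cls.locator

    @classmethod
    def resetInstance(cls):
        # Forget the shared instance so the next getInstance() creates a new one
        cls.locator = None
```

Because the first call wins, getInstance(correctMisspelling=True) followed by getInstance(correctMisspelling=False) returns the same object with correctMisspelling still True, until resetInstance() is called.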
getView()

get the view to be used

Returns: the SQL view to be used for CityLookups e.g. GeoLite2CityLookup
Return type: str
getWikidataCityPopulation(sqlDB, endpoint=None)
Parameters:
  • sqlDB (SQLDB) – target SQL database
  • endpoint (str) – url of the wikidata endpoint or None if default should be used
static isISO(s)

check if the given string is an ISO code

Returns: True if the string is an ISO code
Return type: bool
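The docs do not spell out which strings count as ISO codes. The sketch below assumes ISO 3166-style country and subdivision codes such as "AT", "US-IL" or the bare "9" used for Vienna in the module description; ISO_CODE and is_iso are hypothetical and the exact pattern geograpy uses may differ.

```python
import re

# Assumed format: optional two-letter country prefix plus hyphen,
# then 1-3 upper-case letters or digits ("AT", "US-IL", "NL-NH", "9")
ISO_CODE = re.compile(r"([A-Z]{2}-)?[0-9A-Z]{1,3}")


def is_iso(s: str) -> bool:
    # fullmatch: the whole string must have the code shape, not just a prefix
    return ISO_CODE.fullmatch(s) is not None
```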
is_a_country(name)

check if the given string name is a country

Parameters: name (string) – the string to check
Returns: True if pycountry thinks the string is a country
Return type: bool
locateCity(places)

locate a city, region and country combination based on the given word token information

Parameters: places (list) – a list of places derived by splitting a locality, e.g. “San Francisco, CA” leads to "San Francisco", "CA"
Returns: a city with country and region details
Return type: City

locator = None
places_by_name(placeName, columnName)

get places by name and column

Parameters:
  • placeName (string) – the name of the place
  • columnName (string) – the column to look at

populateFromWikidata(sqlDB)

populate countries and regions from Wikidata

Parameters: sqlDB (SQLDB) – target SQL database
populate_Cities(sqlDB)

populate the given sqlDB with the GeoLite2 Cities

Parameters: sqlDB (SQLDB) – the SQL database to use
populate_Cities_FromWikidata(sqlDB)

populate the given sqlDB with the Wikidata Cities

Parameters: sqlDB (SQLDB) – target SQL database
populate_Countries(sqlDB)

populate the database with countries from Wikidata

Parameters: sqlDB (SQLDB) – target SQL database
populate_Regions(sqlDB)

populate the database with regions from Wikidata

Parameters: sqlDB (SQLDB) – target SQL database
populate_Version(sqlDB)

populate the version table

Parameters: sqlDB (SQLDB) – target SQL database
populate_db(force=False)

populate the cities SQL database which caches the information from the GeoLite2-City-Locations.csv file

Parameters: force (bool) – if True force a recreation of the database
readCSV(fileName)
recreateDatabase()

recreate my lookup database

regions_for_name(region_name)

get the regions for the given region_name (which might be an ISO code)

Parameters: region_name (string) – region name
Returns: the list of cities for this region
Return type: list
static resetInstance()
class geograpy.locator.Region(**kwargs)

Bases: geograpy.locator.Location

a Region (Subdivision)

country
static fromGeoLite2(record)

create a region from a GeoLite2 record

Parameters: record (dict) – the record as returned from a query
Returns: the corresponding region information
Return type: Region
static fromWikidata(record)

create a region from a Wikidata record

Parameters: record (dict) – the record as returned from a query
Returns: the corresponding region information
Return type: Region
classmethod getSamples()
class geograpy.locator.RegionManager(name: str = 'RegionManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)

Bases: geograpy.locator.LocationManager

a list of regions

classmethod fromJSONBackup(config: lodstorage.storageconfig.StorageConfig = None)

get the region list from the JSON backup (the JSON backup is based on Wikidata query results)

Returns: RegionList based on the JSON backup
classmethod fromWikidata(config: lodstorage.storageconfig.StorageConfig = None)

get the region list from Wikidata

classmethod from_sqlDb(sqlDB, config: lodstorage.storageconfig.StorageConfig = None)
classmethod getLocationLodFromJsonBackup()
geograpy.locator.main(argv=None)

main program.

geograpy.places module

class geograpy.places.PlaceContext(place_names, setAll=True)

Bases: geograpy.locator.Locator

Adds context information to a place name

get_region_names(country_name)
setAll()

Set all context information

set_cities()

set the cities information

set_countries()

get the country information from my places

set_other()
set_regions()

geograpy.prefixtree module

geograpy.utils module

geograpy.utils.fuzzy_match(s1, s2, max_dist=0.8)

Fuzzy match the given two strings with the given maximum distance

Parameters:
  • s1 (string) – first string
  • s2 (string) – second string
  • max_dist (float) – the distance, default: 0.8
Returns: jellyfish jaro_winkler_similarity based on https://en.wikipedia.org/wiki/Jaro-Winkler_distance
Return type: float
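fuzzy_match relies on jellyfish's Jaro-Winkler similarity. To stay within the standard library, the sketch below swaps in difflib.SequenceMatcher.ratio(), which is a different similarity measure, so its scores will not match jellyfish's; similarity and is_fuzzy_match are hypothetical helpers.

```python
from difflib import SequenceMatcher


def similarity(s1: str, s2: str) -> float:
    # difflib ratio in [0, 1], used here instead of Jaro-Winkler
    return SequenceMatcher(None, s1, s2).ratio()


def is_fuzzy_match(s1: str, s2: str, threshold: float = 0.8) -> bool:
    # Treat the strings as a match when they are at least `threshold` similar
    return similarity(s1, s2) >= threshold
```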
geograpy.utils.remove_non_ascii(s)

Remove non-ASCII chars from the given string

Parameters: s (string) – the string to remove chars from
Returns: the resulting string with non-ASCII chars removed
Return type: string

Hat tip: http://stackoverflow.com/a/1342373/2367526
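A one-line sketch of the same idea, assuming "non-ASCII" means any code point above 127; this is an illustration, not necessarily geograpy's exact code.

```python
def remove_non_ascii(s: str) -> str:
    # Keep only characters with code points 0-127; others are dropped, not transliterated
    return "".join(ch for ch in s if ord(ch) < 128)
```

Accented letters are dropped rather than mapped to their ASCII base letters, so "Zürich" becomes "Zrich".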

geograpy.wikidata module

Created on 2020-09-23

@author: wf

class geograpy.wikidata.Wikidata(endpoint='https://query.wikidata.org/sparql')

Bases: object

Wikidata access

getCities(region=None, country=None)

get the cities from Wikidata

Parameters:
  • region – List of regionWikiDataIDs. Limits the returned cities to the given regions
  • country – List of countryWikiDataIDs. Limits the returned cities to the given countries
getCitiesOfRegion(regionWikidataId: str, limit: int)

Queries the cities of the given region. If the region is a city state, the region is returned as a city. The cities are ordered by population and can be limited by the given limit attribute.

Parameters:
  • regionWikidataId – Wikidata id of the region the cities should be queried for
  • limit – limits the number of returned cities
Returns: list of cities of the given region, ordered by population

getCityPopulations(profile=True)

get the city populations from Wikidata

Parameters: profile (bool) – if True show profiling information
static getCoordinateComponents(coordinate: str) → (float, float)

Converts the Wikidata coordinate representation into its subcomponents longitude and latitude. Example: ‘Point(-118.25 35.05694444)’ results in (‘-118.25’, ‘35.05694444’)

Parameters: coordinate – coordinate value in the format returned by Wikidata queries
Returns: the longitude and latitude of the given coordinate as separate values
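A sketch of getCoordinateComponents, assuming the 'Point(longitude latitude)' literal shown in the example; coordinate_components is hypothetical and returns floats, matching the (float, float) annotation.

```python
import re
from typing import Tuple

# "Point(<lon> <lat>)" as returned by Wikidata coordinate queries
POINT_PATTERN = re.compile(r"Point\((\S+)\s+(\S+)\)")


def coordinate_components(coordinate: str) -> Tuple[float, float]:
    """Split 'Point(lon lat)' into (longitude, latitude) floats."""
    match = POINT_PATTERN.fullmatch(coordinate.strip())
    if match is None:
        raise ValueError(f"not a Point literal: {coordinate!r}")
    # float() also rejects components that are not numbers
    return float(match.group(1)), float(match.group(2))
```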
getCountries()

get a list of countries

getRegions()

get Regions from Wikidata

static getValuesClause(varName: str, values, wikidataEntities: bool = True)

generates the SPARQL values clause for the given variable name containing the given values

Parameters:
  • varName – variable name for the values clause
  • values – values for the clause
  • wikidataEntities (bool) – if True, the Wikidata prefix is added to the values; otherwise the given values are expected to be proper IRIs
Returns: the SPARQL values clause
Return type: str
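A sketch of getValuesClause, assuming the wd: prefix for Wikidata entities and angle brackets around full IRIs otherwise; values_clause is hypothetical and the exact formatting of geograpy's output may differ.

```python
from typing import Iterable


def values_clause(var_name: str, values: Iterable[str], wikidata_entities: bool = True) -> str:
    """Build a SPARQL VALUES clause such as: VALUES ?country { wd:Q40 wd:Q55 }"""
    if wikidata_entities:
        # Prefix bare ids like "Q40" with the wd: namespace
        terms = [f"wd:{value}" for value in values]
    else:
        # Values are taken to be full IRIs; wrap them in angle brackets
        terms = [f"<{value}>" for value in values]
    joined = " ".join(terms)
    return f"VALUES ?{var_name} {{ {joined} }}"
```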
static getWikidataId(wikidataURL: str)

Extracts the Wikidata id from the given Wikidata URL

Parameters: wikidataURL – Wikidata URL the id should be extracted from
Returns: the Wikidata id if present in the given Wikidata URL, otherwise None
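A sketch of getWikidataId, assuming the id is the last path segment of the URL (as in http://www.wikidata.org/entity/Q64); wikidata_id is hypothetical and only recognizes item (Q) and property (P) ids.

```python
import re
from typing import Optional

# An item or property id such as Q64 or P31 at the very end of the URL
WIKIDATA_ID_PATTERN = re.compile(r"/([QP]\d+)$")


def wikidata_id(url: str) -> Optional[str]:
    # Ignore a trailing slash, then look for the id in the last path segment
    match = WIKIDATA_ID_PATTERN.search(url.rstrip("/"))
    return match.group(1) if match else None
```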

Module contents

main geograpy 3 module

geograpy.get_geoPlace_context(url=None, text=None, debug=False)

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities having the Geographic (GPE) label.

Parameters:
  • url (String) – the url to read text from (if any)
  • text (String) – the text to analyze
  • debug (boolean) – if True show debug information
Returns: the place context
Return type: PlaceContext

geograpy.get_place_context(url=None, text=None, labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'], debug=False)

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities in the label set Geographic (GPE), Person (PERSON) and Organization (ORGANIZATION).

Parameters:
  • url (String) – the url to read text from (if any)
  • text (String) – the text to analyze
  • debug (boolean) – if True show debug information
Returns: the place context
Return type: PlaceContext

geograpy.locateCity(location, correctMisspelling=False, debug=False)

locate the given location string

Parameters: location (string) – the description of the location
Returns: the location
Return type: Locator