geograpy package

Submodules

geograpy.extraction module

class geograpy.extraction.Extractor(text=None, url=None, debug=False)[source]

Bases: object

Extract geo context from a text or from a URL

find_entities(labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'])[source]

Find entities with the given labels, set self.places and return it

Parameters:labels (list) – the labels to filter by (see Labels)

Returns:List of places
Return type:list
find_geoEntities()[source]

Find geographic entities

Returns:List of places
Return type:list
set_text()[source]

Setter for text

split(delimiter=', ')[source]

simple regular expression splitter without entity check

hat tip: https://stackoverflow.com/a/1059601/1497139
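A minimal sketch of what a delimiter-based splitter of this kind might look like. It assumes the text is split on the literal delimiter and that surrounding whitespace and empty fragments are dropped. The function name `split_places` is hypothetical, and this is not the Extractor's actual code.

```python
import re


def split_places(text: str, delimiter: str = ", ") -> list:
    """Split text on a literal delimiter without any named-entity check."""
    # re.escape makes characters such as "." or "|" in the delimiter match literally
    parts = re.split(re.escape(delimiter), text)
    # drop surrounding whitespace and empty fragments
    return [part.strip() for part in parts if part.strip()]


print(split_places("San Francisco, CA, US"))  # ['San Francisco', 'CA', 'US']
print(split_places("Paris - Texas", " - "))   # ['Paris', 'Texas']
```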

geograpy.labels module

Created on 2020-09-10

@author: wf

class geograpy.labels.Labels[source]

Bases: object

NLTK labels

default = ['GPE', 'GSP', 'PERSON', 'ORGANIZATION']
geo = ['GPE', 'GSP']
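To illustrate how the two label sets differ, here is a minimal sketch that filters already-tagged entities. It assumes each entity is available as a hypothetical `(text, label)` pair. This is not how the Extractor walks NLTK parse trees internally.

```python
# label sets as listed for geograpy.labels.Labels
DEFAULT_LABELS = ("GPE", "GSP", "PERSON", "ORGANIZATION")
GEO_LABELS = ("GPE", "GSP")


def filter_entities(entities, labels=DEFAULT_LABELS):
    """Return the text of every (text, label) pair whose label is in labels."""
    wanted = set(labels)
    return [text for text, label in entities if label in wanted]


tagged = [("Vienna", "GPE"), ("Mozart", "PERSON"), ("UNESCO", "ORGANIZATION")]
print(filter_entities(tagged))              # ['Vienna', 'Mozart', 'UNESCO']
print(filter_entities(tagged, GEO_LABELS))  # ['Vienna']
```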

geograpy.locator module

The locator module makes it possible to get detailed city information, including the region and country of a city, from a location string.

Examples for location strings are:

  • Amsterdam, Netherlands
  • Vienna, Austria
  • Vienna, IL
  • Paris - Texas
  • Paris TX

The locator will look up the cities and try to disambiguate the result based on the country or region information found.

The results in string representation are:

  • Amsterdam (NH(North Holland) - NL(Netherlands))
  • Vienna (9(Vienna) - AT(Austria))
  • Vienna (IL(Illinois) - US(United States))
  • Paris (TX(Texas) - US(United States))
  • Paris (TX(Texas) - US(United States))

Each city returned has a city.region and city.country attribute with the details of the city.
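The string format above can be reproduced with a few minimal stand-in classes. This is a sketch, not geograpy's own City, Region and Country classes, and the attribute names `iso` and `name` are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Country:
    iso: str
    name: str


@dataclass
class Region:
    iso: str
    name: str
    country: Country


@dataclass
class City:
    name: str
    region: Region
    country: Country

    def __str__(self) -> str:
        # "<city> (<region iso>(<region name>) - <country iso>(<country name>))"
        return (f"{self.name} ({self.region.iso}({self.region.name}) - "
                f"{self.country.iso}({self.country.name}))")


us = Country("US", "United States")
vienna_il = City("Vienna", Region("IL", "Illinois", us), us)
print(vienna_il)  # Vienna (IL(Illinois) - US(United States))
```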

Created on 2020-09-18

@author: wf

class geograpy.locator.City(**kwargs)[source]

Bases: geograpy.locator.Location

a single city as an object

country
static fromCityLookup(cityLookupRecord: dict)[source]

create a city from a cityLookupRecord, setting City, Region and Country while at it

Parameters:cityLookupRecord (dict) – a map derived from the CityLookup view

classmethod getSamples()[source]
region
setValue(name, record)[source]

set the field with the given name to the corresponding entry of the given record dict, or to None if there is no such entry

Parameters:
  • name (string) – the name of the field
  • record (dict) – the dict to get the value from
class geograpy.locator.CityManager(name: str = 'CityManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]

Bases: geograpy.locator.LocationManager

a list of cities

classmethod getJsonFiles(config: lodstorage.storageconfig.StorageConfig) → list[source]

get the list of the json files that have my data

Returns:a list of json file names
Return type:list
class geograpy.locator.Country(lookupSource='sqlDB', **kwargs)[source]

Bases: geograpy.locator.Location

a country

static fromCountryLookup(countryLookupRecord: dict)[source]

create a country from a countryLookupRecord

Parameters:countryLookupRecord (dict) – the lookup record to create the country from

classmethod getSamples()[source]
class geograpy.locator.CountryManager(name: str = 'CountryManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]

Bases: geograpy.locator.LocationManager

a list of countries

classmethod fromErdem()[source]

get country list provided by Erdem Ozkol https://github.com/erdem

class geograpy.locator.Earth[source]

Bases: object

radius = 6371.0
class geograpy.locator.Location(**kwargs)[source]

Bases: lodstorage.jsonable.JSONAble

Represents a Location

balltreeQueryResultToLocationManager(distances, indices, lookupListOfLocations)[source]

convert the given ballTree Query Result to a LocationManager

Parameters:
  • distances (list) – array of distances
  • indices (list) – array of indices
  • lookupListOfLocations (list) – a list of valid locations to use for lookup
Returns:

a list of result Location/distance tuples

Return type:

list

distance(other) → float[source]

calculate the distance to another Location

Parameters:other (Location) – the other location
Returns:the haversine distance in km
classmethod fromRecord(regionRecord: dict)[source]

create a location from a dict record

Parameters:regionRecord (dict) – the record as returned from a query
Returns:the corresponding region information
Return type:Region
getLocationsWithinRadius(lookupLocationManager, radiusKm: float)[source]

Gives the locations within the given radius of me, looked up from the given lookupLocationManager

Parameters:
  • lookupLocationManager (LocationManager) – a LocationManager object to use for lookup
  • radiusKm (float) – the radius in which to check (in km)
Returns:

a list of result Location/distance tuples

Return type:

list

getNClosestLocations(lookupLocationManager, n: int)[source]

Gives a list of up to n locations which have the shortest distance to me, as calculated from the given lookupLocationManager

Parameters:
  • lookupLocationManager (LocationManager) – a LocationManager object to use for lookup
  • n (int) – the maximum number of closest locations to return
Returns:

a list of result Location/distance tuples

Return type:

list
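getNClosestLocations and getLocationsWithinRadius appear to be backed by the BallTree from getBallTuple. As a simpler stand-in that shows the same contract (Location/distance tuples, nearest first), here is a brute-force sketch that does a linear scan instead of a BallTree query. The dict records with `lon` and `lat` keys are hypothetical and the coordinates are approximate. Note that the origin itself is returned at distance 0 if it is among the candidates.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance(a: dict, b: dict) -> float:
    return haversine(a["lon"], a["lat"], b["lon"], b["lat"])


def n_closest(origin: dict, candidates: list, n: int) -> list:
    """Up to n (location, distance_km) tuples, nearest first."""
    pairs = ((loc, distance(origin, loc)) for loc in candidates)
    return heapq.nsmallest(n, pairs, key=lambda pair: pair[1])


def within_radius(origin: dict, candidates: list, radius_km: float) -> list:
    """All (location, distance_km) tuples within radius_km, nearest first."""
    pairs = [(loc, distance(origin, loc)) for loc in candidates]
    return sorted((p for p in pairs if p[1] <= radius_km), key=lambda pair: pair[1])


cities = [
    {"name": "Berlin", "lon": 13.40, "lat": 52.52},
    {"name": "Potsdam", "lon": 13.06, "lat": 52.39},
    {"name": "Munich", "lon": 11.58, "lat": 48.14},
]
berlin = cities[0]
print([loc["name"] for loc, _ in n_closest(berlin, cities, 2)])        # ['Berlin', 'Potsdam']
print([loc["name"] for loc, _ in within_radius(berlin, cities, 100)])  # ['Berlin', 'Potsdam']
```

A BallTree avoids computing every distance on each query, which matters for large location lists; the linear scan is only meant to make the semantics easy to check.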

classmethod getSamples()[source]
static haversine(lon1, lat1, lon2, lat2)[source]

Calculate the great circle distance between two points on the earth (specified in decimal degrees)
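For reference, the haversine formula gives the great-circle distance between two points with latitudes φ1, φ2 and longitudes λ1, λ2 as d = 2R·asin(√(sin²(Δφ/2) + cos φ1·cos φ2·sin²(Δλ/2))), here with R = 6371.0 km as in Earth.radius. A minimal sketch that keeps the documented argument order (longitude before latitude):

```python
import math

EARTH_RADIUS_KM = 6371.0  # same value as geograpy.locator.Earth.radius


def haversine(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    # the trigonometric functions expect radians
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# equator to north pole along the prime meridian: a quarter of a great circle
print(haversine(0.0, 0.0, 0.0, 90.0))
```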

isKnownAs(name) → bool[source]

Checks if this location is known under the given name

Parameters:name (str) – name the location should be checked against
Returns:True if the given name is either the name of the location or present in the labels of the location
static mappedDict(record, keyMapList: list)[source]
static partialDict(record, clazz, keys=None)[source]
class geograpy.locator.LocationContext(countryManager: geograpy.locator.CountryManager, regionManager: geograpy.locator.RegionManager, cityManager: geograpy.locator.CityManager, config: lodstorage.storageconfig.StorageConfig)[source]

Bases: object

Holds LocationManagers of all hierarchy levels and provides methods to traverse through the levels

cities
countries
db_filename = 'locations.db'
classmethod fromCache(config: lodstorage.storageconfig.StorageConfig = None)[source]

Initializes a LocationContext from the cache if it exists, otherwise initializes the cache

static getDefaultConfig() → lodstorage.storageconfig.StorageConfig[source]

Returns the default StorageConfig

interlinkLocations(warnOnDuplicates: bool = True, profile=True)[source]

Interlinks locations by adding the hierarchy references to the locations

Parameters:warnOnDuplicates (bool) – if there are duplicates warn
load(forceUpdate: bool = False, warnOnDuplicates: bool = False)[source]

load my data

locateLocation(*locations, verbose: bool = False)[source]

Get possible locations for the given location names. The current prioritization of the results is city (ordered by population) → region → country.

ToDo: extend the ranking of the results, e.g. a match on multiple location parts should increase the ranking.

Parameters:
  • *locations – the location names to look up
  • verbose (bool) – if True, combinations of the location names are used to improve the search results (increases lookup time)
Returns:the possible locations, in the prioritized order described above
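The prioritization city (ordered by population) → region → country can be expressed as a single sort key. This sketch uses a hypothetical `Candidate` record with a `level` field; the actual result objects are Location subclasses. The population figures are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    level: str           # "city", "region" or "country" (hypothetical field)
    population: int = 0  # only meaningful for cities here


LEVEL_RANK = {"city": 0, "region": 1, "country": 2}


def rank_candidates(candidates: list) -> list:
    """Cities first (largest population first), then regions, then countries."""
    # regions and countries all carry population 0, so sorted() keeps their input order
    return sorted(candidates, key=lambda c: (LEVEL_RANK[c.level], -c.population))


ranked = rank_candidates([
    Candidate("France", "country"),
    Candidate("Texas", "region"),
    Candidate("Paris", "city", 25_000),
    Candidate("Paris", "city", 2_000_000),
])
print([(c.name, c.level, c.population) for c in ranked])
```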

regions
class geograpy.locator.LocationManager(name: str, entityName: str, entityPluralName: str, listName: str = None, tableName: str = None, clazz=None, primaryKey: str = None, config: lodstorage.storageconfig.StorageConfig = None, handleInvalidListTypes=True, filterInvalidListTypes=False, debug=False)[source]

Bases: lodstorage.entity.EntityManager

a list of locations

add(location)[source]

add the given location to me

Parameters:location (object) – the location to be added and put in my hash map
static downloadBackupFile(url: str, fileName: str, targetDirectory: str = None, force: bool = False)[source]

Downloads the gzip file from the given url and extracts the file corresponding to the given fileName.

Parameters:
  • url – url linking to a downloadable gzip file
  • fileName – name of the file that should be extracted from the gzip file
  • targetDirectory (str) – the directory to download the file to
  • force (bool) – True if the download should be forced
Returns:

Name of the extracted file with path to the backup directory
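The extraction step can be sketched with the standard library alone. In a real run the archive would first be fetched, for example with `urllib.request.urlretrieve(url, zipped_path)`. That step is left out here so the sketch runs offline, and the file names are hypothetical.

```python
import gzip
import os
import shutil
import tempfile


def extract_gzip(zipped_path: str, target_directory: str, file_name: str) -> str:
    """Decompress zipped_path to target_directory/file_name and return that path."""
    os.makedirs(target_directory, exist_ok=True)
    extract_to = os.path.join(target_directory, file_name)
    # stream the decompressed bytes instead of loading the whole file into memory
    with gzip.open(zipped_path, "rb") as source, open(extract_to, "wb") as target:
        shutil.copyfileobj(source, target)
    return extract_to


# offline demo: create a small gzip file instead of downloading one
with tempfile.TemporaryDirectory() as tmp_dir:
    zipped = os.path.join(tmp_dir, "locations.json.gz")
    with gzip.open(zipped, "wb") as f:
        f.write(b'{"cities": []}')
    extracted = extract_gzip(zipped, os.path.join(tmp_dir, "backup"), "locations.json")
    with open(extracted, "rb") as f:
        restored = f.read()
print(restored)  # b'{"cities": []}'
```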

classmethod downloadBackupFileFromGitHub(fileName: str, targetDirectory: str = None)[source]

download the given fileName from the github data directory

Parameters:
  • fileName (str) – the filename to download
  • targetDirectory (str) – the directory to download the file to
Returns:

the local file

Return type:

str

fromCache(force=False, getListOfDicts=None, sampleRecordCount=-1)[source]

get me from the cache

static getBackupDirectory()[source]
getBallTuple(cache: bool = True)[source]

get the BallTuple (BallTree, validList) of this location list

Parameters:
  • cache (bool) – if True calculate and use a cached version, otherwise recalculate on every call of this function
Returns:

a sklearn.neighbors.BallTree for the given list of locations, and the list of valid locations

Return type:

BallTree,list

getByName(*names)[source]

Get locations matching the given names

Parameters:*names – the names of the locations to look up
Returns:the locations that match the given names
getLocationByID(wikidataID: str)[source]

Returns the location object that corresponds to the given Wikidata ID

Parameters:wikidataID – Wikidata ID of the location that should be returned
Returns:Location object
getLocationByIsoCode(isoCode: str)[source]

Get possible locations matching the given isoCode

Parameters:isoCode – ISO code of the possible locations

Returns:List of wikidata ids of locations matching the given isoCode
getLocationsByWikidataId(*wikidataId)[source]

Returns Location objects for the given Wikidata IDs

Parameters:*wikidataId (str) – Wikidata IDs of the locations that should be returned

Returns:Location objects matching the given wikidataids
class geograpy.locator.Locator(db_file=None, correctMisspelling=False, storageConfig: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]

Bases: object

location handling

cities_for_name(cityName)[source]

find cities with the given cityName

Parameters:cityName (string) – the potential name of a city
Returns:a list of city records
correct_country_misspelling(name)[source]

correct potential misspellings

Parameters:name (string) – the name of the country, potentially misspelled
Returns:the corrected name, or the name unchanged
Return type:string
createViews(sqlDB)[source]
db_has_data()[source]

check whether the database has data / is populated

Returns:True if the cities table exists and has more than one record
Return type:boolean
db_recordCount(tableList, tableName)[source]

count the number of records for the given tableName

Parameters:
  • tableList (list) – the list of tables to check
  • tableName (str) – the name of the table to check
Returns:the number of records found for the table
Return type:int
disambiguate(country, regions, cities, byPopulation=True)[source]

try to determine country, regions and city from the potential choices

Parameters:
  • country (Country) – a matching country found
  • regions (list) – a list of matching Regions found
  • cities (list) – a list of matching cities found
Returns:

the found city or None

Return type:

City
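A minimal sketch of this kind of disambiguation. It assumes hypothetical dict records with `country`, `region` and optional `population` keys, and ISO codes instead of Country and Region objects. It keeps the cities that agree with the country and region information and, with byPopulation semantics, prefers the most populous one. The real method may weigh the evidence differently.

```python
def disambiguate(country_iso, region_isos, cities, by_population=True):
    """Return the best-fitting city record, or None if nothing fits."""
    candidates = [
        city for city in cities
        if (country_iso is None or city["country"] == country_iso)
        and (not region_isos or city["region"] in region_isos)
    ]
    if not candidates:
        return None
    if by_population:
        # prefer the most populous remaining candidate; a missing population counts as 0
        return max(candidates, key=lambda city: city.get("population") or 0)
    return candidates[0]


cities = [
    {"name": "Vienna", "region": "9", "country": "AT"},
    {"name": "Vienna", "region": "IL", "country": "US"},
]
print(disambiguate("US", ["IL"], cities))  # {'name': 'Vienna', 'region': 'IL', 'country': 'US'}
```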

downloadDB()[source]

download my database

getAliases()[source]

get the aliases hashTable

getCountry(name)[source]

get the country for the given name

Parameters:name (string) – the name of the country to look up
Returns:the country if one was found or None if not
Return type:Country
static getInstance(correctMisspelling=False, debug=False)[source]

get the singleton instance of the Locator. If parameters are changed on further calls the initial parameters will still be in effect since the original instance will be returned!

Parameters:
  • correctMisspelling (bool) – if True correct typical misspellings
  • debug (bool) – if True show debug information
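The "first call wins" behavior follows from keeping the single instance in a class attribute (compare the locator attribute listed for Locator). Here is a minimal sketch of the pattern; the class name `SingletonSketch` is hypothetical. The resetInstance behavior shown (clearing the stored instance) is an assumption, since it is undocumented above.

```python
class SingletonSketch:
    locator = None  # holds the one shared instance

    def __init__(self, correctMisspelling=False, debug=False):
        self.correctMisspelling = correctMisspelling
        self.debug = debug

    @staticmethod
    def getInstance(correctMisspelling=False, debug=False):
        # only the first call creates an instance; later arguments are ignored
        if SingletonSketch.locator is None:
            SingletonSketch.locator = SingletonSketch(correctMisspelling, debug)
        return SingletonSketch.locator

    @staticmethod
    def resetInstance():
        # forget the shared instance so that the next getInstance() creates a fresh one
        SingletonSketch.locator = None


first = SingletonSketch.getInstance(correctMisspelling=True)
second = SingletonSketch.getInstance(correctMisspelling=False)
print(first is second, second.correctMisspelling)  # True True
SingletonSketch.resetInstance()
third = SingletonSketch.getInstance()
print(third is first, third.correctMisspelling)  # False False
```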
getView()[source]

get the view to be used

Returns:the SQL view to be used for CityLookups e.g. CityLookup
Return type:str
static isISO(s)[source]

check if the given string is an ISO code (ISO 3166-2 code) see https://www.wikidata.org/wiki/Property:P300

Returns:True if the string might be an ISO Code as per a regexp check
Return type:bool
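A regexp check of this kind could look like the following sketch. The pattern is an assumption, not necessarily the one the Locator uses: an optional two-letter country prefix followed by a subdivision part of one to three uppercase letters or digits, so "US-IL", "AT-9" and a bare "IL" all pass. As the description says, such a check can only tell that a string *might* be an ISO code; "USA", for example, also matches.

```python
import re

# assumed pattern: optional "CC-" country prefix plus a 1-3 character subdivision code
ISO_PATTERN = re.compile(r"^([A-Z]{2}-)?[A-Z0-9]{1,3}$")


def is_iso(s: str) -> bool:
    """True if s looks like an ISO 3166-2 style code."""
    return ISO_PATTERN.match(s) is not None


print([is_iso(s) for s in ("US-IL", "AT-9", "IL", "Illinois", "us-il")])
# [True, True, True, False, False]
```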
is_a_country(name)[source]

check if the given string name is a country

Parameters:name (string) – the string to check
Returns:True if pycountry thinks the string is a country
Return type:bool
locateCity(places: list)[source]

locate a city, region and country combination based on the given word token information

Parameters:
  • places (list) – a list of places derived by splitting a locality, e.g. “San Francisco, CA” leads to “San Francisco”, “CA”
Returns:

a city with country and region details

Return type:

City

locator = None
normalizePlaces(places: list)[source]

normalize places

Parameters:places (list) – the places to normalize
Returns:stripped and aliased list of places
Return type:list
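"Stripped and aliased" can be sketched as two steps per place: strip surrounding whitespace, then replace the name by its alias target if the alias table knows it. The alias table here is a hypothetical stand-in for the hashTable returned by getAliases, and its entries are illustrative only.

```python
def normalize_places(places: list, aliases: dict) -> list:
    """Strip whitespace from each place and replace it by its alias target, if any."""
    normalized = []
    for place in places:
        name = place.strip()
        normalized.append(aliases.get(name, name))
    return normalized


ALIASES = {"UK": "United Kingdom", "USA": "United States"}  # illustrative entries only
print(normalize_places([" USA", "San Francisco "], ALIASES))  # ['United States', 'San Francisco']
```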
places_by_name(placeName, columnName)[source]

get places by name and column

Parameters:
  • placeName (string) – the name of the place
  • columnName (string) – the column to look at

populate_Cities(sqlDB)[source]

populate the given sqlDB with the Wikidata Cities

Parameters:sqlDB (SQLDB) – target SQL database
populate_Countries(sqlDB)[source]

populate the database with countries from Wikidata

Parameters:sqlDB (SQLDB) – target SQL database
populate_Regions(sqlDB)[source]

populate the database with regions from Wikidata

Parameters:sqlDB (SQLDB) – target SQL database
populate_Version(sqlDB)[source]

populate the version table

Parameters:sqlDB (SQLDB) – target SQL database
populate_db(force=False)[source]

populate the cities SQL database which caches the information from the GeoLite2-City-Locations.csv file

Parameters:force (bool) – if True force a recreation of the database
readCSV(fileName: str)[source]

read the given CSV file

Parameters:fileName (str) – the filename to read
recreateDatabase()[source]

recreate my lookup database

regions_for_name(region_name)[source]

get the regions for the given region_name (which might be an ISO code)

Parameters:region_name (string) – region name
Returns:the list of regions matching the given region_name
Return type:list
static resetInstance()[source]
class geograpy.locator.Region(**kwargs)[source]

Bases: geograpy.locator.Location

a Region (Subdivision)

country
static fromRegionLookup(regionLookupRecord: dict)[source]

create a region from a regionLookupRecord, setting Region and Country while at it

Parameters:regionLookupRecord (dict) – a map derived from the CityLookup view

classmethod getSamples()[source]
class geograpy.locator.RegionManager(name: str = 'RegionManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]

Bases: geograpy.locator.LocationManager

a list of regions

geograpy.locator.main(argv=None)[source]

main program.

geograpy.places module

class geograpy.places.PlaceContext(place_names: list, setAll: bool = True, correctMisspelling: bool = False)[source]

Bases: geograpy.locator.Locator

Adds context information to a place name

getRegions(countryName: str) → list[source]

get a list of regions for the given countryName

Parameters:countryName (str) – the name of the country to check

get_region_names(countryName: str) → list[source]

get region names for the given country

Parameters:countryName (str) – the name of the country
setAll()[source]

Set all context information

set_cities()[source]

set the cities information

set_countries()[source]

get the country information from my places

set_other()[source]
set_regions()[source]

get the region information from my places (limited to the already identified countries)

geograpy.prefixtree module

geograpy.utils module

class geograpy.utils.Download[source]

Bases: object

Utility functions for downloading data

static getFileContent(path: str)[source]
static getURLContent(url: str)[source]
static needsDownload(filePath: str, force: bool = False) → bool[source]

check if a download of the given filePath is necessary, that is, if the file does not exist, has a size of zero, or the download should be forced

Parameters:
  • filePath (str) – the path of the file to be checked
  • force (bool) – True if the result should be forced to True
Returns:

True if a download for this file is needed

Return type:

bool
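The three conditions translate directly into a standard-library check. This is a sketch of the described rule, not the library's source. The demo uses temporary files so it has no side effects.

```python
import os
import tempfile


def needs_download(file_path: str, force: bool = False) -> bool:
    """True if the file is missing, empty, or a download is forced."""
    if force:
        return True
    return not os.path.isfile(file_path) or os.path.getsize(file_path) == 0


with tempfile.TemporaryDirectory() as tmp_dir:
    empty = os.path.join(tmp_dir, "empty.db")
    full = os.path.join(tmp_dir, "full.db")
    open(empty, "wb").close()
    with open(full, "wb") as f:
        f.write(b"data")
    checks = {
        "missing": needs_download(os.path.join(tmp_dir, "missing.db")),
        "empty": needs_download(empty),
        "full": needs_download(full),
        "forced": needs_download(full, force=True),
    }
print(checks)  # {'missing': True, 'empty': True, 'full': False, 'forced': True}
```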

class geograpy.utils.Profiler(msg, profile=True)[source]

Bases: object

simple profiler

time(extraMsg='')[source]

time the action and print if profile is active
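A minimal sketch of such a profiler. It measures wall-clock time from construction to each time() call and prints only when profile is True. The class name `ProfilerSketch` and the printed message format are assumptions, not the library's output.

```python
import time


class ProfilerSketch:
    """Minimal wall-clock profiler modelled on Profiler(msg, profile=True)."""

    def __init__(self, msg: str, profile: bool = True):
        self.msg = msg
        self.profile = profile
        self.start = time.perf_counter()  # reference point for time()
        if profile:
            print(f"Starting {msg} ...")

    def time(self, extraMsg: str = "") -> float:
        """Return the seconds elapsed since construction and print them if profiling."""
        elapsed = time.perf_counter() - self.start
        if self.profile:
            print(f"{self.msg}{extraMsg} took {elapsed:.3f} s")
        return elapsed


profiler = ProfilerSketch("summing numbers")
total = sum(range(100_000))
elapsed = profiler.time(" for 100k numbers")
```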

geograpy.utils.fuzzy_match(s1, s2, max_dist=0.8)[source]

Fuzzy match the given two strings with the given maximum distance, using jellyfish's jaro_winkler_similarity; based on https://en.wikipedia.org/wiki/Jaro-Winkler_distance

Parameters:
  • s1 (string) – first string
  • s2 (string) – second string
  • max_dist (float) – the similarity threshold (default: 0.8)
Returns:True if the Jaro-Winkler similarity is greater than or equal to max_dist, otherwise False
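To show what the threshold is compared against, here is a self-contained sketch of Jaro-Winkler similarity with the usual prefix scale 0.1 and a common prefix capped at four characters. It is a textbook implementation, not jellyfish's code, and may differ from it in edge cases.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # characters only count as matching within this window
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # transpositions: half the number of matched characters that appear in a different order
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to 4 chars)."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


def fuzzy_match(s1: str, s2: str, max_dist: float = 0.8) -> bool:
    # despite its name, max_dist acts as a minimum similarity threshold
    return jaro_winkler_similarity(s1, s2) >= max_dist


print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))  # 0.9611
print(fuzzy_match("Amsterdam", "Amsterdm"), fuzzy_match("Paris", "Berlin"))  # True False
```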
geograpy.utils.remove_non_ascii(s)[source]

Remove non-ASCII chars from the given string

Parameters:s (string) – the string to remove chars from

Returns:The result string with non-ascii chars removed
Return type:string

Hat tip: http://stackoverflow.com/a/1342373/2367526
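The filtering itself amounts to a one-liner that keeps only code points below 128. A minimal sketch:

```python
def remove_non_ascii(s: str) -> str:
    # keep only characters in the 7-bit ASCII range (code points 0-127)
    return "".join(ch for ch in s if ord(ch) < 128)


print(remove_non_ascii("Zürich"))     # Zrich
print(remove_non_ascii("São Paulo"))  # So Paulo
```

Note that the characters are dropped, not transliterated, so "Zürich" becomes "Zrich" rather than "Zurich".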

geograpy.wikidata module

Created on 2020-09-23

@author: wf

class geograpy.wikidata.Wikidata(endpoint='https://query.wikidata.org/sparql', profile: bool = True)[source]

Bases: object

Wikidata access

getCities(limit=1000000)[source]

get all human settlements as list of dict with duplicates for label, region, country …

getCitiesForRegion(regionId, msg)[source]

get the cities for the given Region

getCityStates(limit=None)[source]

get city states from Wikidata

static getCoordinateComponents(coordinate: str) → (float, float)[source]

Converts the Wikidata coordinate representation into its subcomponents longitude and latitude. Example: ‘Point(-118.25 35.05694444)’ results in (-118.25, 35.05694444)

Parameters:coordinate – coordinate value in the format as returned by wikidata queries
Returns:Returns the longitude and latitude of the given coordinate as separate values
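Wikidata's SPARQL endpoint returns coordinates as WKT literals, with longitude before latitude. A minimal parsing sketch that returns floats, as the return annotation suggests. The `(None, None)` result for unparsable input is an assumption.

```python
import re

# WKT point literal: "Point(<longitude> <latitude>)"
POINT_PATTERN = re.compile(r"Point\(\s*(\S+)\s+(\S+)\s*\)")


def get_coordinate_components(coordinate: str):
    """Split a 'Point(lon lat)' literal into (longitude, latitude) floats."""
    match = POINT_PATTERN.search(coordinate)
    if match is None:
        return None, None
    return float(match.group(1)), float(match.group(2))


print(get_coordinate_components("Point(-118.25 35.05694444)"))  # (-118.25, 35.05694444)
```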
getCountries(limit=None)[source]

get a list of countries

getRegions(limit=None)[source]

get Regions from Wikidata

static getValuesClause(varName: str, values, wikidataEntities: bool = True)[source]

generates the SPARQL VALUES clause for the given variable name containing the given values

Parameters:
  • varName – variable name for the VALUES clause
  • values – values for the clause
  • wikidataEntities (bool) – if True the wikidata prefix is added to the values, otherwise the given values are expected to be proper IRIs
Returns:the generated VALUES clause
Return type:str
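A SPARQL 1.1 VALUES clause binds a variable to an inline list of terms, e.g. `VALUES ?city { wd:Q64 wd:Q1741 }`. Here is a minimal sketch of the described behavior. It assumes the `wd:` prefix is declared elsewhere in the query and that IRIs passed with wikidataEntities=False already carry their angle brackets. The snake_case name is hypothetical.

```python
def get_values_clause(var_name: str, values, wikidata_entities: bool = True) -> str:
    """Build 'VALUES ?var { ... }' from the given values."""
    if wikidata_entities:
        # prefix plain ids such as "Q64" with the wd: namespace
        terms = [f"wd:{value}" for value in values]
    else:
        # values are assumed to be complete IRIs such as "<http://...>"
        terms = [str(value) for value in values]
    joined = " ".join(terms)
    return f"VALUES ?{var_name} {{ {joined} }}"


print(get_values_clause("city", ["Q64", "Q1741"]))  # VALUES ?city { wd:Q64 wd:Q1741 }
```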
static getWikidataId(wikidataURL: str)[source]

Extracts the wikidata id from the given wikidata URL

Parameters:wikidataURL – wikidata URL the id should be extracted from
Returns:The wikidata id if present in the given wikidata URL otherwise None
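Wikidata entity URLs end in the item id, e.g. http://www.wikidata.org/entity/Q64 for Berlin. A minimal regex-based sketch; the accepted URL shapes (`/entity/` and `/wiki/` paths with Q, P or L ids) are an assumption.

```python
import re

WIKIDATA_ID_PATTERN = re.compile(r"wikidata\.org/(?:entity|wiki)/([QPL]\d+)")


def get_wikidata_id(wikidata_url: str):
    """Return the id (e.g. 'Q64') from a Wikidata URL, or None if there is none."""
    match = WIKIDATA_ID_PATTERN.search(wikidata_url or "")
    return match.group(1) if match else None


print(get_wikidata_id("http://www.wikidata.org/entity/Q64"))  # Q64
```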
query(msg, queryString: str, limit=None) → list[source]

get the query result

Parameters:
  • msg (str) – the profile message to display
  • queryString (str) – the query to execute
Returns:

the list of dicts with the result

Return type:

list

store2DB(lod, tableName: str, primaryKey: str = None, sqlDB=None)[source]

store the given list of dicts to the database

Parameters:
  • lod (list) – the list of dicts
  • tableName (str) – the table name to use
  • primaryKey (str) – primary key (if any)
  • sqlDB (SQLDB) – target SQL database
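Storing a list of dicts amounts to deriving a table from the dict keys and inserting one row per dict. This sketch swaps the SQLDB wrapper for the standard-library sqlite3 module. It takes the column names from the first record and assumes table and column names come from trusted code, since identifiers cannot be bound as parameters.

```python
import sqlite3


def store_lod(lod, table_name, primary_key=None, connection=None):
    """Create table_name from the keys of the first dict and insert every dict."""
    if connection is None:
        connection = sqlite3.connect(":memory:")
    columns = list(lod[0].keys())
    # quote identifiers; SQLite allows untyped columns
    column_defs = ", ".join(
        f'"{col}" PRIMARY KEY' if col == primary_key else f'"{col}"' for col in columns
    )
    connection.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({column_defs})')
    placeholders = ", ".join("?" for _ in columns)
    column_list = ", ".join(f'"{col}"' for col in columns)
    rows = [tuple(record.get(col) for col in columns) for record in lod]
    connection.executemany(
        f'INSERT INTO "{table_name}" ({column_list}) VALUES ({placeholders})', rows
    )
    connection.commit()
    return connection


lod = [
    {"wikidataid": "Q64", "name": "Berlin"},
    {"wikidataid": "Q1741", "name": "Vienna"},
]
db = store_lod(lod, "cities", primary_key="wikidataid")
row_count = db.execute('SELECT COUNT(*) FROM "cities"').fetchone()[0]
name = db.execute('SELECT name FROM "cities" WHERE wikidataid = ?', ("Q1741",)).fetchone()[0]
print(row_count, name)  # 2 Vienna
```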

Module contents

main geograpy 3 module

geograpy.get_geoPlace_context(url=None, text=None, debug=False)[source]

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities having the Geographic(GPE) label.

Parameters:
  • url (String) – the url to read text from (if any)
  • text (String) – the text to analyze
  • debug (boolean) – if True show debug information
Returns:

the place context

Return type:

PlaceContext

geograpy.get_place_context(url=None, text=None, labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'], debug=False)[source]

Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities in the label set Geographic(GPE), Person(PERSON) and Organization(ORGANIZATION).

Parameters:
  • url (String) – the url to read text from (if any)
  • text (String) – the text to analyze
  • labels (list) – the NLTK labels to filter the named entities by
  • debug (boolean) – if True show debug information
Returns:

the place context

Return type:

PlaceContext

geograpy.locateCity(location, correctMisspelling=False, debug=False)[source]

locate the given location string

Parameters:location (string) – the description of the location
Returns:the location
Return type:City