geograpy package¶
Submodules¶
geograpy.extraction module¶
-
class
geograpy.extraction.
Extractor
(text=None, url=None, debug=False)[source]¶ Bases:
object
Extract geo context for text or from url
-
find_entities
(labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'])[source]¶ Find entities with the given labels, set self.places to the result, and return it.
Parameters: labels (list) – the labels to filter by
Returns: List of places Return type: list
-
geograpy.locator module¶
The locator module retrieves detailed city information, including the region and country of a city, from a location string.
Examples of location strings are:
Amsterdam, Netherlands
Vienna, Austria
Vienna, IL
Paris - Texas
Paris TX
The locator looks up the cities and tries to disambiguate the result based on the country or region information found.
The results in string representation are:
Amsterdam (NH(North Holland) - NL(Netherlands))
Vienna (9(Vienna) - AT(Austria))
Vienna (IL(Illinois) - US(United States))
Paris (TX(Texas) - US(United States))
Paris (TX(Texas) - US(United States))
Each city returned has a city.region and city.country attribute with the details of the city.
Created on 2020-09-18
@author: wf
-
class
geograpy.locator.
City
(**kwargs)[source]¶ Bases:
geograpy.locator.Location
a single city as an object
-
country
¶
-
static
fromCityLookup
(cityLookupRecord: dict)[source]¶ Create a city from a cityLookupRecord, setting City, Region and Country along the way.
Parameters: cityLookupRecord (dict) – a map derived from the CityLookup view
-
region
¶
-
-
class
geograpy.locator.
CityManager
(name: str = 'CityManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]¶ Bases:
geograpy.locator.LocationManager
a list of cities
-
class
geograpy.locator.
Country
(lookupSource='sqlDB', **kwargs)[source]¶ Bases:
geograpy.locator.Location
a country
-
class
geograpy.locator.
CountryManager
(name: str = 'CountryManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]¶ Bases:
geograpy.locator.LocationManager
a list of countries
-
classmethod
fromErdem
()[source]¶ get country list provided by Erdem Ozkol https://github.com/erdem
-
class
geograpy.locator.
Location
(**kwargs)[source]¶ Bases:
lodstorage.jsonable.JSONAble
Represents a Location
-
balltreeQueryResultToLocationManager
(distances, indices, lookupListOfLocations)[source]¶ convert the given ballTree Query Result to a LocationManager
Parameters: - distances (list) – array of distances
- indices (list) – array of indices
- lookupListOfLocations (list) – a list of valid locations to use for lookup
Returns: a list of result Location/distance tuples
Return type: list
-
distance
(other) → float[source]¶ calculate the distance to another Location
Parameters: other (Location) – the other location Returns: the haversine distance in km
-
classmethod
fromRecord
(regionRecord: dict)[source]¶ create a location from a dict record
Parameters: regionRecord (dict) – the records as returned from a Query Returns: the corresponding region information Return type: Region
-
getLocationsWithinRadius
(lookupLocationManager, radiusKm: float)[source]¶ Gives the locations that lie within the given radius of me, taken from the given lookupLocationManager
Parameters: - lookupLocationManager (LocationManager) – a LocationManager object to use for lookup
- radiusKm (float) – the radius in which to check (in km)
Returns: a list of result Location/distance tuples
Return type: list
-
getNClosestLocations
(lookupLocationManager, n: int)[source]¶ Gives a list of up to n locations which have the shortest distance to me, taken from the given lookupLocationManager
Parameters: - lookupLocationManager (LocationManager) – a LocationManager object to use for lookup
- n (int) – the maximum number of closest locations to return
Returns: a list of result Location/distance tuples
Return type: list
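The two lookups above can be illustrated with a brute-force sketch. This is not the library's code: geograpy builds a BallTree for these queries, whereas the sketch below simply scores every candidate with the haversine formula. The function names, the tuple layout (name, lon, lat) and the sample coordinates are assumptions made for illustration.

```python
import heapq
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def score(origin, candidates):
    """Pair every candidate name with its distance to origin (lon, lat)."""
    return [(name, haversine_km(origin[0], origin[1], lon, lat))
            for name, lon, lat in candidates]


def n_closest(origin, candidates, n):
    """Return up to n (name, distance_km) tuples, nearest first."""
    return heapq.nsmallest(n, score(origin, candidates), key=lambda pair: pair[1])


def within_radius(origin, candidates, radius_km):
    """Return all (name, distance_km) tuples no farther than radius_km, nearest first."""
    hits = [pair for pair in score(origin, candidates) if pair[1] <= radius_km]
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical sample data: (name, lon, lat)
cities = [
    ("Paris", 2.3522, 48.8566),
    ("Brussels", 4.3517, 50.8503),
    ("Vienna", 16.3738, 48.2082),
]
amsterdam = (4.9041, 52.3676)  # (lon, lat)

closest_two = n_closest(amsterdam, cities, 2)
nearby = within_radius(amsterdam, cities, 200)
```

A tree-based index such as BallTree avoids scanning every candidate, which matters once the lookup list holds many thousands of locations.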
-
static
haversine
(lon1, lat1, lon2, lat2)[source]¶ Calculate the great circle distance between two points on the earth (specified in decimal degrees)
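A minimal sketch of the haversine formula, using the same argument order (lon1, lat1, lon2, lat2) as the signature above. The Earth radius of 6371 km and the function name are assumptions; the library's own constant may differ.

```python
from math import asin, cos, pi, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_sketch(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# A quarter of the equator: from (lon 0, lat 0) to (lon 90, lat 0)
quarter = haversine_sketch(0, 0, 90, 0)
```

Here quarter should equal a quarter of the great-circle circumference, that is pi / 2 times the assumed radius.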
-
-
class
geograpy.locator.
LocationContext
(countryManager: geograpy.locator.CountryManager, regionManager: geograpy.locator.RegionManager, cityManager: geograpy.locator.CityManager, config: lodstorage.storageconfig.StorageConfig)[source]¶ Bases:
object
Holds LocationManagers of all hierarchy levels and provides methods to traverse through the levels
-
cities
¶
-
countries
¶
-
db_filename
= 'locations.db'¶
-
classmethod
fromCache
(config: lodstorage.storageconfig.StorageConfig = None, forceUpdate: bool = False)[source]¶ Initializes a LocationContext from the cache if it exists; otherwise initializes the cache
Parameters: - config (StorageConfig) – configuration of the cache; if None, the default config is used
- forceUpdate (bool) – if True, an existing cache will be overwritten
-
static
getDefaultConfig
() → lodstorage.storageconfig.StorageConfig[source]¶ Returns default StorageConfig
-
interlinkLocations
(warnOnDuplicates: bool = True, profile=True)[source]¶ Interlinks locations by adding the hierarchy references to the locations
Parameters: warnOnDuplicates (bool) – if True, warn when duplicates are found
-
locateLocation
(*locations, verbose: bool = False)[source]¶ Get possible locations for the given location names. The current prioritization of the results is city (ordered by population) → region → country. ToDo: extend the ranking of the results, e.g. matching multiple location parts should increase the ranking.
Parameters: - *locations – the location names to look up
- verbose (bool) – if True, combinations of location names are used to improve the search results (increases lookup time)
Returns:
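The prioritization described for locateLocation (cities first, ordered by population, then regions, then countries) can be sketched as a sort key. The record layout, the "level" field and the population figures below are hypothetical and only illustrate the ordering; they are not the library's data model.

```python
# Lower number = higher priority, following city -> region -> country
LEVEL_PRIORITY = {"city": 0, "region": 1, "country": 2}


def rank_candidates(candidates):
    """Sort by hierarchy level first, then by descending population."""
    return sorted(
        candidates,
        key=lambda c: (LEVEL_PRIORITY[c["level"]], -c.get("population", 0)),
    )


# Hypothetical candidates with illustrative population figures
candidates = [
    {"name": "Paris", "level": "city", "population": 25_000},
    {"name": "Texas", "level": "region"},
    {"name": "Paris", "level": "city", "population": 2_100_000},
    {"name": "France", "level": "country"},
]

ranked = rank_candidates(candidates)
```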
-
regions
¶
-
-
class
geograpy.locator.
LocationManager
(name: str, entityName: str, entityPluralName: str, listName: str = None, tableName: str = None, clazz=None, primaryKey: str = None, config: lodstorage.storageconfig.StorageConfig = None, handleInvalidListTypes=True, filterInvalidListTypes=False, debug=False)[source]¶ Bases:
lodstorage.entity.EntityManager
a list of locations
-
add
(location)[source]¶ add the given location to me
Parameters: location (object) – the location to be added and put in my hash map
-
classmethod
downloadBackupFileFromGitHub
(fileName: str, targetDirectory: str = None, force: bool = False)[source]¶ download the given fileName from the github data directory
Parameters: - fileName (str) – the filename to download
- targetDirectory (str) – download the file to this directory
- force (bool) – force the overwriting of the existing file
Returns: the local file
Return type: str
-
getBallTuple
(cache: bool = True)[source]¶ get the BallTuple=BallTree,validList of this location list
Parameters: cache (bool) – if True, calculate and use a cached version; otherwise recalculate on every call of this function
Returns: a sklearn.neighbors.BallTree for the given list of locations, and the valid list of locations
Return type: BallTree,list
-
getByName
(*names)[source]¶ Get locations matching the given names
Parameters: *names – names of the locations
Returns: the locations that match the given names
-
getLocationByID
(wikidataID: str)[source]¶ Returns the location object that corresponds to the given Wikidata ID
Parameters: wikidataID – wikidataid of the location that should be returned Returns: Location object
-
-
class
geograpy.locator.
Locator
(db_file=None, correctMisspelling=False, storageConfig: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]¶ Bases:
object
location handling
-
cities_for_name
(cityName)[source]¶ find cities with the given cityName
Parameters: cityName (string) – the potential name of a city Returns: a list of city records
-
correct_country_misspelling
(name)[source]¶ correct potential misspellings
Parameters: name (string) – the name of the country, potentially misspelled
Returns: the corrected name, or the unchanged name Return type: string
-
db_has_data
()[source]¶ check whether the database has data / is populated
Returns: True if the cities table exists and has more than one record Return type: boolean
-
db_recordCount
(tableList, tableName)[source]¶ count the number of records for the given tableName
Parameters: - tableList (list) – the list of tables to check
- tableName (str) – the name of the table to check
Returns: the number of records found for the table
Return type: int
-
disambiguate
(country, regions, cities, byPopulation=True)[source]¶ try determining country, regions and city from the potential choices
Parameters: - country (Country) – a matching country found
- regions (list) – a list of matching Regions found
- cities (list) – a list of matching cities found
Returns: the found city or None
Return type:
-
downloadDB
(forceUpdate: bool = False)[source]¶ download my database
Parameters: forceUpdate (bool) – force the overwriting of the existent file
-
getCountry
(name)[source]¶ get the country for the given name
Parameters: name (string) – the name of the country to look up
Returns: the country if one was found or None if not Return type: country
-
static
getInstance
(correctMisspelling=False, debug=False)[source]¶ get the singleton instance of the Locator. If parameters are changed on further calls the initial parameters will still be in effect since the original instance will be returned!
Parameters: - correctMisspelling (bool) – if True correct typical misspellings
- debug (bool) – if True show debug information
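The singleton behaviour described above, where later calls return the original instance with its initial parameters, can be sketched as follows. LocatorSketch is a hypothetical stand-in class, not the library's Locator.

```python
class LocatorSketch:
    """Hypothetical stand-in showing the singleton access pattern."""

    _instance = None  # the shared instance, created on first use

    def __init__(self, correctMisspelling=False, debug=False):
        self.correctMisspelling = correctMisspelling
        self.debug = debug

    @staticmethod
    def getInstance(correctMisspelling=False, debug=False):
        # Parameters only take effect on the very first call
        if LocatorSketch._instance is None:
            LocatorSketch._instance = LocatorSketch(correctMisspelling, debug)
        return LocatorSketch._instance


first = LocatorSketch.getInstance(correctMisspelling=True)
second = LocatorSketch.getInstance(correctMisspelling=False)
```

In this sketch second is the same object as first, so second.correctMisspelling is still True even though False was passed on the second call.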
-
getView
()[source]¶ get the view to be used
Returns: the SQL view to be used for CityLookups e.g. CityLookup Return type: str
-
static
isISO
(s)[source]¶ check if the given string is an ISO code (ISO 3166-2 code) see https://www.wikidata.org/wiki/Property:P300
Returns: True if the string might be an ISO Code as per a regexp check Return type: bool
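A regexp check of this kind might look like the sketch below. The pattern is an assumption based on the general shape of ISO 3166-2 codes (a two-letter country code, a hyphen, then one to three alphanumeric characters); the pattern the library actually uses may be more or less permissive.

```python
import re

# Assumed pattern: two uppercase letters, a hyphen, 1-3 uppercase letters or digits
ISO_3166_2_PATTERN = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")


def looks_like_iso_3166_2(s):
    """Return True if s has the general shape of an ISO 3166-2 code."""
    return ISO_3166_2_PATTERN.fullmatch(s) is not None
```

Such a check only tests the shape of the string; it does not confirm that the code is actually assigned.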
-
is_a_country
(name)[source]¶ check if the given string name is a country
Parameters: name (string) – the string to check Returns: True if pycountry thinks the string is a country Return type: bool
-
locateCity
(places: list)[source]¶ locate a city, region, country combination based on the given word token information
Parameters: places (list) – a list of places derived by splitting a locality, e.g. “San Francisco, CA” leads to "San Francisco", "CA"
Returns: a city with country and region details
Return type:
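The splitting of a locality into a places list could be done as in the sketch below. The helper name and the choice of delimiters (commas and a hyphen surrounded by spaces) are assumptions; how the library itself tokenizes a location string is not shown in this reference.

```python
import re


def split_locality(locality):
    """Split a locality string on commas or ' - ' and strip whitespace."""
    parts = re.split(r"\s*,\s*|\s+-\s+", locality)
    return [part.strip() for part in parts if part.strip()]


san_francisco = split_locality("San Francisco, CA")
paris_texas = split_locality("Paris - Texas")
paris_tx = split_locality("Paris TX")
```

A string without a delimiter, such as "Paris TX", stays in one piece in this sketch, so a real implementation needs additional logic for that case.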
-
locator
= None¶
-
normalizePlaces
(places: list)[source]¶ normalize places
Parameters: places (list) – Returns: stripped and aliased list of places Return type: list
-
places_by_name
(placeName, columnName)[source]¶ get places by name and column
Parameters: - placeName (string) – the name of the place
- columnName (string) – the column to look at
-
populate_Cities
(sqlDB)[source]¶ populate the given sqlDB with the Wikidata Cities
Parameters: sqlDB (SQLDB) – target SQL database
-
populate_Countries
(sqlDB)[source]¶ populate database with countries from wikiData
Parameters: sqlDB (SQLDB) – target SQL database
-
populate_Regions
(sqlDB)[source]¶ populate database with regions from wikiData
Parameters: sqlDB (SQLDB) – target SQL database
-
populate_Version
(sqlDB)[source]¶ populate the version table
Parameters: sqlDB (SQLDB) – target SQL database
-
populate_db
(force=False)[source]¶ populate the cities SQL database which caches the information from the GeoLite2-City-Locations.csv file
Parameters: force (bool) – if True force a recreation of the database
-
readCSV
(fileName: str)[source]¶ read the given CSV file
Parameters: fileName (str) – the filename to read
-
-
class
geograpy.locator.
Region
(**kwargs)[source]¶ Bases:
geograpy.locator.Location
a Region (Subdivision)
-
country
¶
-
-
class
geograpy.locator.
RegionManager
(name: str = 'RegionManager', config: lodstorage.storageconfig.StorageConfig = None, debug=False)[source]¶ Bases:
geograpy.locator.LocationManager
a list of regions
geograpy.places module¶
-
class
geograpy.places.
PlaceContext
(place_names: list, setAll: bool = True, correctMisspelling: bool = False)[source]¶ Bases:
geograpy.locator.Locator
Adds context information to a place name
-
getRegions
(countryName: str) → list[source]¶ get a list of regions for the given countryName
Parameters: countryName (str) – the countryName to check
-
geograpy.prefixtree module¶
geograpy.utils module¶
-
class
geograpy.utils.
Download
[source]¶ Bases:
object
Utility functions for downloading data
-
static
downloadBackupFile
(url: str, fileName: str, targetDirectory: str, force: bool = False)[source]¶ Downloads the gzip file from the given url and extracts the file corresponding to the given fileName.
Parameters: - url – url linking to a downloadable gzip file
- fileName – Name of the file that should be extracted from gzip file
- targetDirectory (str) – download the file to this directory
- force (bool) – True if the download should be forced
Returns: Name of the extracted file with path to the backup directory
-
static
needsDownload
(filePath: str, force: bool = False) → bool[source]¶ check if a download of the given filePath is necessary, that is, the file does not exist, has a size of zero, or the download should be forced
Parameters: - filePath (str) – the path of the file to be checked
- force (bool) – True if the result should be forced to True
Returns: True if a download for this file is needed
Return type: bool
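The three conditions named above (file missing, file empty, download forced) can be sketched with the standard library. The function name below is hypothetical and the code is an illustration of the described rule, not the library's implementation.

```python
import os


def needs_download_sketch(file_path, force=False):
    """Return True if the file is missing, empty, or the download is forced."""
    if force:
        return True
    if not os.path.isfile(file_path):
        return True
    return os.path.getsize(file_path) == 0
```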
-
geograpy.utils.
fuzzy_match
(s1, s2, max_dist=0.8)[source]¶ Fuzzy match the given two strings with the given maximum distance, using jellyfish jaro_winkler_similarity; based on https://en.wikipedia.org/wiki/Jaro-Winkler_distance
Parameters: - s1 (string) – first string
- s2 (string) – second string
- max_dist (float) – the distance, default: 0.8
Returns: True if the match is greater than or equal to max_dist, otherwise False
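To show what the threshold comparison means, here is a self-contained sketch of the textbook Jaro-Winkler similarity. It is written from the formula on the linked Wikipedia page and is not the jellyfish implementation, which may differ in details; the function names and the scaling factor of 0.1 with a prefix of at most four characters are the usual textbook choices, stated here as assumptions.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        # look for an unmatched equal character inside the match window
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # count matched characters that appear in a different order
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1):
    """Jaro similarity boosted for a common prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


def fuzzy_match_sketch(s1, s2, max_dist=0.8):
    """True if the similarity reaches the threshold (named max_dist as in the signature above)."""
    return jaro_winkler(s1, s2) >= max_dist
```

Note that despite its name, max_dist acts as a minimum similarity: higher values make the match stricter.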
geograpy.wikidata module¶
Created on 2020-09-23
@author: wf
-
class
geograpy.wikidata.
Wikidata
(endpoint='https://query.wikidata.org/sparql', profile: bool = True)[source]¶ Bases:
object
Wikidata access
-
getCities
(limit=1000000)[source]¶ get all human settlements as list of dict with duplicates for label, region, country …
-
static
getCoordinateComponents
(coordinate: str) -> (<class 'float'>, <class 'float'>)[source]¶ Converts the wikidata coordinate representation into its subcomponents longitude and latitude. Example: ‘Point(-118.25 35.05694444)’ results in (‘-118.25’, ‘35.05694444’)
Parameters: coordinate – coordinate value in the format as returned by wikidata queries Returns: Returns the longitude and latitude of the given coordinate as separate values
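The conversion can be sketched with a regular expression. The function name is hypothetical, and returning floats follows the type annotation in the signature above; raising ValueError on malformed input is an assumption about error handling, which this reference does not describe.

```python
import re

# 'Point(<longitude> <latitude>)' as returned by Wikidata queries
POINT_PATTERN = re.compile(r"Point\((-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)")


def coordinate_components_sketch(coordinate):
    """Split a 'Point(lon lat)' string into (longitude, latitude) floats."""
    match = POINT_PATTERN.fullmatch(coordinate.strip())
    if match is None:
        raise ValueError(f"not a Point literal: {coordinate!r}")
    return float(match.group(1)), float(match.group(2))


lon, lat = coordinate_components_sketch("Point(-118.25 35.05694444)")
```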
-
static
getValuesClause
(varName: str, values, wikidataEntities: bool = True)[source]¶ generates the SPARQL value clause for the given variable name containing the given values
Parameters: - varName – variable name for the ValuesClause
- values – values for the clause
- wikidataEntities (bool) – if True, the wikidata prefix is added to the values; otherwise it is expected that the given values are proper IRIs
Returns: str
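A rough sketch of how such a clause could be assembled is shown below. The function name, the exact whitespace of the output, and the use of the wd: prefix are assumptions; values passed with the prefix disabled are inserted unchanged, on the assumption that the caller already supplies them in a form valid in SPARQL.

```python
def values_clause_sketch(var_name, values, wikidata_entities=True):
    """Build a SPARQL VALUES clause for one variable."""
    if isinstance(values, str):
        values = [values]  # accept a single value as well as a list
    if wikidata_entities:
        terms = [f"wd:{value}" for value in values]
    else:
        terms = list(values)  # assumed to be usable as-is
    joined = " ".join(terms)
    return f"VALUES ?{var_name} {{ {joined} }}"


clause = values_clause_sketch("city", ["Q64", "Q1055"])
```

With the inputs above, clause is the string `VALUES ?city { wd:Q64 wd:Q1055 }`.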
-
static
getWikidataId
(wikidataURL: str)[source]¶ Extracts the wikidata id from the given wikidata URL
Parameters: wikidataURL – wikidata URL the id should be extracted from Returns: The wikidata id if present in the given wikidata URL otherwise None
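Extracting the id can be sketched as a regular expression match on the last path segment. The function name and the sample URLs are illustrative, and the pattern only covers item ids of the form Q followed by digits, which is an assumption about the inputs this method handles.

```python
import re


def wikidata_id_sketch(wikidata_url):
    """Return the trailing Q-id of a Wikidata URL, or None if there is none."""
    match = re.search(r"/(Q[0-9]+)$", wikidata_url)
    return match.group(1) if match else None


berlin = wikidata_id_sketch("http://www.wikidata.org/entity/Q64")
missing = wikidata_id_sketch("https://example.org/")
```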
-
Module contents¶
main geograpy 3 module
-
geograpy.
get_geoPlace_context
(url=None, text=None, debug=False)[source]¶ Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities having the Geographic(GPE) label.
Parameters: - url (String) – the url to read text from (if any)
- text (String) – the text to analyze
- debug (boolean) – if True show debug information
Returns: the place context
Return type: PlaceContext
-
geograpy.
get_place_context
(url=None, text=None, labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'], debug=False)[source]¶ Get a place context for a given text with information about country, region, city and other based on NLTK Named Entities in the label set Geographic(GPE), Person(PERSON) and Organization(ORGANIZATION).
Parameters: - url (String) – the url to read text from (if any)
- text (String) – the text to analyze
- debug (boolean) – if True show debug information
Returns: the place context
Return type: PlaceContext