geograpy package
Submodules
geograpy.extraction module
-
class
geograpy.extraction.
Extractor
(text=None, url=None, debug=False) Bases:
object
Extract geographic context from the given text or URL
-
find_entities
(labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION']) Find entities with the given labels, set self.places to the result and return it
Parameters: labels (list) – the labels to filter by Returns: List of places Return type: list
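The label filtering described for find_entities can be sketched without NLTK. The entity tuples, the helper name filter_entities and the de-duplication step below are assumptions made for illustration; the real Extractor derives its entities from NLTK and may behave differently.

```python
# Hypothetical (text, label) pairs standing in for the output of a named
# entity chunker; the real Extractor obtains them from NLTK.
SAMPLE_ENTITIES = [
    ("Vienna", "GPE"),
    ("Mozart", "PERSON"),
    ("United Nations", "ORGANIZATION"),
    ("Danube", "LOCATION"),
    ("Vienna", "GPE"),
]

DEFAULT_LABELS = ("GPE", "GSP", "PERSON", "ORGANIZATION")


def filter_entities(entities, labels=DEFAULT_LABELS):
    """Return the entity texts whose label is in labels, without duplicates."""
    wanted = set(labels)
    places = []
    for text, label in entities:
        # Keep the first occurrence only (an assumption of this sketch)
        if label in wanted and text not in places:
            places.append(text)
    return places


all_places = filter_entities(SAMPLE_ENTITIES)
geo_only = filter_entities(SAMPLE_ENTITIES, labels=["GPE"])
```

Restricting the labels to GPE drops person and organization names, which is the difference between get_geoPlace_context and get_place_context as documented in the module contents.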
-
geograpy.locator module
The locator module returns detailed city information, including the region and country of a city, for a given location string.
Examples of location strings are:
Amsterdam, Netherlands
Vienna, Austria
Vienna, IL
Paris - Texas
Paris TX
The locator looks up the cities and tries to disambiguate the result based on the country or region information found.
The results in string representation are:
Amsterdam (NH(North Holland) - NL(Netherlands))
Vienna (9(Vienna) - AT(Austria))
Vienna (IL(Illinois) - US(United States))
Paris (TX(Texas) - US(United States))
Paris (TX(Texas) - US(United States))
Each city returned has a city.region and a city.country attribute with the details of the city.
Created on 2020-09-18
@author: wf
-
class
geograpy.locator.
City
(**kwargs) Bases:
geograpy.locator.Location
a single city as an object
-
country
-
region
-
-
class
geograpy.locator.
CityList
Bases:
geograpy.locator.LocationList
a list of cities
-
classmethod
fromJSONBackup
(jsonStr: str = None) get the city list from a JSON backup (the JSON backup is based on Wikidata query results)
Parameters: jsonStr (str) – JSON string the CityList should be loaded from. If None, the JSON backup is loaded. Default is None Returns: CityList based on the JSON backup
-
classmethod
fromWikidata
(fromBackup: bool = True, countryIDs: list = None, regionIDs: list = None) get the city list from Wikidata
Parameters: - fromBackup (bool) – If True, a backup of the Wikidata results is used to create the city list instead of querying Wikidata. Otherwise Wikidata is queried for the city data. Default is True
- countryIDs (list) – List of countryWikiDataIDs. Limits the returned cities to the given countries
- regionIDs (list) – List of regionWikiDataIDs. Limits the returned cities to the given regions
Returns: CityList based on the Wikidata query results
-
updateCity
(wikidataid: str, cityRecord: dict) Updates the city corresponding to the given wikidataid with the given data. If the city does not exist, a new city object is created and added to this CityList
Parameters: - wikidataid (str) – Wikidata id of the city that should be updated/added
- cityRecord (dict) – data of the given city that should be updated/added
Returns: Nothing
-
class
geograpy.locator.
Country
(lookupSource='sqlDB', **kwargs) Bases:
geograpy.locator.Location
a country
-
class
geograpy.locator.
CountryList
Bases:
geograpy.locator.LocationList
a list of countries
-
classmethod
fromErdem
() get the country list provided by Erdem Ozkol https://github.com/erdem
-
class
geograpy.locator.
Location
(**kwargs) Bases:
lodstorage.jsonable.JSONAble
Represents a Location
-
balltreeQueryResultToLocationList
(distances, indices, lookupListOfLocations) convert the given BallTree query result to a LocationList
Parameters: - distances (list) – array of distances
- indices (list) – array of indices
- lookupListOfLocations (list) – a list of valid locations to use for lookup
Returns: a list of result Location/distance tuples
Return type: list
-
distance
(other) → float calculate the distance to another Location
Parameters: other (Location) – the other location Returns: the haversine distance in km
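The haversine distance named above can be computed with the standard library alone. This is a minimal sketch, not the library's implementation: the function name haversine_km, the Earth radius constant and the approximate coordinates are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates, used only for illustration
vienna = (48.2082, 16.3738)
amsterdam = (52.3676, 4.9041)
distance = haversine_km(*vienna, *amsterdam)
```

With these inputs the result lies somewhat above 900 km, which matches the straight-line distance between the two cities.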
-
getLocationsWithinRadius
(lookupLocationList, radiusKm: float) Gives the locations within the given radius of me from the given lookupLocationList
Parameters: - lookupLocationList (LocationList) – a LocationList object to use for lookup
- radiusKm (float) – the radius in which to check (in km)
Returns: a list of result Location/distance tuples
Return type: list
-
getNClosestLocations
(lookupLocationList, n: int) Gives a list of up to n locations which have the shortest distance to me as calculated from the given lookupLocationList
Parameters: - lookupLocationList (LocationList) – a LocationList object to use for lookup
- n (int) – the maximum number of closest locations to return
Returns: a list of result Location/distance tuples
Return type: list
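The two neighbour queries above can be illustrated with a brute-force search. The library is documented as using a BallTree for this; the sketch below deliberately swaps in a plain sort over haversine distances so that it runs without scikit-learn. The lookup dictionary, the coordinates and the function names are hypothetical.

```python
import math


def haversine_km(a, b):
    """Haversine distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Hypothetical lookup list: name -> approximate (lat, lon)
LOOKUP = {
    "Vienna": (48.2082, 16.3738),
    "Bratislava": (48.1486, 17.1077),
    "Budapest": (47.4979, 19.0402),
    "Amsterdam": (52.3676, 4.9041),
}


def n_closest(origin, lookup, n):
    """Up to n (name, distance) tuples, nearest first."""
    ranked = sorted((haversine_km(origin, pos), name) for name, pos in lookup.items())
    return [(name, dist) for dist, name in ranked[:n]]


def within_radius(origin, lookup, radius_km):
    """All (name, distance) tuples no farther away than radius_km, nearest first."""
    everything = n_closest(origin, lookup, len(lookup))
    return [(name, dist) for name, dist in everything if dist <= radius_km]


brno = (49.1951, 16.6068)  # approximate origin for the queries
closest_two = n_closest(brno, LOOKUP, 2)
nearby = within_radius(brno, LOOKUP, 300.0)
```

A brute-force scan is linear in the size of the lookup list for every query, which is why a spatial index such as a BallTree pays off for large city lists.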
-
-
class
geograpy.locator.
LocationContext
(countryList: geograpy.locator.CountryList, regionList: geograpy.locator.RegionList, cityList: geograpy.locator.CityList) Bases:
object
Holds LocationLists of all hierarchy levels and provides methods to traverse through the levels
-
cities
-
cityList
-
countries
-
countryList
-
regionList
-
regions
-
-
class
geograpy.locator.
LocationList
(listName: str = None, clazz=None, tableName: str = None) Bases:
lodstorage.jsonable.JSONAbleList
a list of locations
-
static
downloadBackupFile
(url: str, fileName: str) Downloads the gzip file from the given url and extracts the file corresponding to the given fileName.
Parameters: - url – url linking to a downloadable gzip file
- fileName – name of the file that should be extracted from the gzip file
Returns: Name of the extracted file with path to the backup directory
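The extraction half of this step can be sketched with the standard gzip module. The sketch skips the download and creates a small gzip file locally instead; the helper name extract_gzip, the directory layout and the file names are assumptions, not the library's actual backup location.

```python
import gzip
import os
import shutil
import tempfile


def extract_gzip(gz_path, target_dir, file_name):
    """Decompress gz_path into target_dir/file_name and return that path."""
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, file_name)
    with gzip.open(gz_path, "rb") as compressed, open(target, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return target


# Self-contained demonstration: create a small gzip file, then extract it.
work_dir = tempfile.mkdtemp()
gz_file = os.path.join(work_dir, "cities.json.gz")  # hypothetical file name
with gzip.open(gz_file, "wb") as handle:
    handle.write(b'{"cities": []}')

extracted = extract_gzip(gz_file, os.path.join(work_dir, "backup"), "cities.json")
```

In a real download the local gzip file would come from the given url, for example via urllib.request.urlretrieve, before the same extraction step runs.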
-
getBallTuple
(cache: bool = True) get the BallTuple=BallTree,validList of this location list
Parameters: cache (bool) – if True calculate and use a cached version, otherwise recalculate on every call of this function
Returns: a sklearn.neighbors.BallTree for the given list of locations and the valid list of locations
Return type: BallTree,list
-
class
geograpy.locator.
Locator
(db_file=None, correctMisspelling=False, debug=False) Bases:
object
location handling
-
cities_for_name
(cityName) find cities with the given cityName
Parameters: cityName (string) – the potential name of a city Returns: a list of city records
-
correct_country_misspelling
(name) correct potential misspellings
Parameters: name (string) – the name of the country potentially misspelled Returns: the corrected name, or the name unchanged Return type: string
-
db_has_data
() check whether the database has data / is populated
Returns: True if the cities table exists and has more than one record Return type: boolean
-
db_recordCount
(tableList, tableName) count the number of records for the given tableName
Parameters: - tableList (list) – the list of tables to check
- tableName (str) – the name of the table to check
Returns: the number of records found for the table
Return type: int
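The two database checks above can be illustrated with the standard sqlite3 module. This sketch looks tables up in sqlite_master rather than in a tableList argument, whose structure is not documented here; the function names, the in-memory database and the sample rows are assumptions.

```python
import sqlite3


def record_count(connection, table_name):
    """Return the number of rows in table_name, or 0 if the table is missing."""
    found = connection.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,),
    ).fetchone()
    if found is None:
        return 0
    # Only reached for names that exist as tables in sqlite_master
    query = f'SELECT COUNT(*) FROM "{table_name}"'
    return connection.execute(query).fetchone()[0]


def has_data(connection, table_name="cities"):
    """True if the table exists and holds more than one record."""
    return record_count(connection, table_name) > 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?)",
    [("Vienna", "AT"), ("Paris", "FR")],
)
```

Checking for the table before counting avoids an OperationalError on a freshly created, still empty database file.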
-
disambiguate
(country, regions, cities, byPopulation=True) try determining country, regions and city from the potential choices
Parameters: - country (Country) – a matching country found
- regions (list) – a list of matching Regions found
- cities (list) – a list of matching cities found
Returns: the found city or None
Return type: City
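One plausible way to disambiguate candidates is to narrow them by country, then by region, and finally prefer the most populous match. The sketch below is an assumption about the general idea and not the library's algorithm; the record fields, the function name pick_city and the rough population figures are illustrative only.

```python
# Hypothetical candidate records; field names and the rough population
# figures are illustrative only.
CANDIDATES = [
    {"name": "Vienna", "region": "9", "country": "AT", "population": 1900000},
    {"name": "Vienna", "region": "IL", "country": "US", "population": 1400},
    {"name": "Vienna", "region": "VA", "country": "US", "population": 16000},
]


def pick_city(country, regions, cities, by_population=True):
    """Narrow the candidates by country and regions, then pick one or None."""
    candidates = list(cities)
    if country is not None:
        candidates = [c for c in candidates if c["country"] == country]
    if regions:
        candidates = [c for c in candidates if c["region"] in regions]
    if not candidates:
        return None
    if by_population:
        # Prefer the largest city among the remaining candidates
        return max(candidates, key=lambda c: c["population"])
    return candidates[0]


austrian = pick_city("AT", [], CANDIDATES)
illinois = pick_city("US", ["IL"], CANDIDATES)
```

Without any country or region hint the population rule alone decides, so "Vienna" resolves to the Austrian capital in this sample.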
-
getCountry
(name) get the country for the given name
Parameters: name (string) – the name of the country to look up Returns: the country if one was found or None if not Return type: Country
-
getGeolite2Cities
() get the Geolite2 City-Locations as a list of dicts
Returns: a list of Geolite2 City-Locations dicts Return type: list
-
static
getInstance
(correctMisspelling=False, debug=False) get the singleton instance of the Locator. If parameters are changed on further calls, the initial parameters will still be in effect, since the original instance will be returned!
Parameters: - correctMisspelling (bool) – if True correct typical misspellings
- debug (bool) – if True show debug information
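The singleton behaviour described for getInstance can be demonstrated with a small stand-in class. The class below is not the real Locator; it only mirrors the documented names (getInstance, the class attribute locator and the two parameters) to show why later arguments have no effect.

```python
class Locator:
    """Minimal stand-in used only to illustrate the singleton behaviour."""

    locator = None  # class-level slot for the shared instance

    def __init__(self, correctMisspelling=False, debug=False):
        self.correctMisspelling = correctMisspelling
        self.debug = debug

    @staticmethod
    def getInstance(correctMisspelling=False, debug=False):
        # Create the instance on the first call only; later arguments are ignored
        if Locator.locator is None:
            Locator.locator = Locator(correctMisspelling=correctMisspelling, debug=debug)
        return Locator.locator


first = Locator.getInstance(debug=True)
second = Locator.getInstance(debug=False)
```

Here second is the same object as first and still has debug set to True, because the second call returns the instance created by the first one.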
-
getView
() get the view to be used
Returns: the SQL view to be used for CityLookups e.g. GeoLite2CityLookup Return type: str
-
getWikidataCityPopulation
(sqlDB, endpoint=None)
Parameters: - sqlDB (SQLDB) – target SQL database
- endpoint (str) – url of the wikidata endpoint or None if default should be used
-
isISO
(s) check if the given string is an ISO code
Returns: True if the string is an ISO Code Return type: bool
-
is_a_country
(name) check if the given string name is a country
Parameters: name (string) – the string to check Returns: True if pycountry thinks the string is a country Return type: bool
-
locateCity
(places) locate a city, region and country combination based on the given word token information
Parameters: places (list) – a list of places derived by splitting a locality, e.g. “San Francisco, CA” leads to "San Francisco", "CA"
Returns: a city with country and region details
Return type: City
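The places argument is a list of tokens obtained by splitting a locality string. A minimal sketch of such a split, assuming commas and spaced hyphens as the only separators; the helper name split_location is hypothetical, and how the library tokenizes strings such as "Paris TX" is not covered here.

```python
import re


def split_location(location):
    """Split a location string on commas and spaced hyphens into place tokens."""
    parts = re.split(r"\s*,\s*|\s+-\s+", location)
    return [part.strip() for part in parts if part.strip()]


tokens = split_location("San Francisco, CA")
```

Requiring whitespace around the hyphen keeps names such as "Winston-Salem" in one piece, while "Paris - Texas" is split into two tokens.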
-
locator
= None
-
places_by_name
(placeName, columnName) get places by name and column
Parameters: - placeName (string) – the name of the place
- columnName (string) – the column to look at
-
populateFromWikidata
(sqlDB) populate countries and regions from Wikidata
Parameters: sqlDB (SQLDB) – target SQL database
-
populate_Cities
(sqlDB) populate the given sqlDB with the Geolite2 Cities
Parameters: sqlDB (SQLDB) – the SQL database to use
-
populate_Cities_FromWikidata
(sqlDB) populate the given sqlDB with the Wikidata Cities
Parameters: sqlDB (SQLDB) – target SQL database
-
populate_Countries
(sqlDB) populate the database with countries from Wikidata
Parameters: sqlDB (SQLDB) – target SQL database
-
populate_Regions
(sqlDB) populate the database with regions from Wikidata
Parameters: sqlDB (SQLDB) – target SQL database
-
populate_Version
(sqlDB) populate the version table
Parameters: sqlDB (SQLDB) – target SQL database
-
populate_db
(force=False) populate the cities SQL database which caches the information from the GeoLite2-City-Locations.csv file
Parameters: force (bool) – if True force a recreation of the database
-
-
class
geograpy.locator.
Region
(**kwargs) Bases:
geograpy.locator.Location
a Region (Subdivision)
-
country
-
static
fromGeoLite2
(record) create a region from a Geolite2 record
Parameters: record (dict) – the record as returned from a query Returns: the corresponding region information Return type: Region
-
-
class
geograpy.locator.
RegionList
Bases:
geograpy.locator.LocationList
a list of regions
geograpy.places module
-
class
geograpy.places.
PlaceContext
(place_names, setAll=True) Bases:
geograpy.locator.Locator
Adds context information to a place name
geograpy.prefixtree module
geograpy.utils module
-
geograpy.utils.
fuzzy_match
(s1, s2, max_dist=0.8) Fuzzy match the two given strings with the given maximum distance
Parameters: - s1 (string) – first string
- s2 (string) – second string
- max_dist (float) – the distance, default: 0.8
Returns: jellyfish jaro_winkler_similarity based on https://en.wikipedia.org/wiki/Jaro-Winkler_distance Return type: float
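To make the similarity measure concrete, here is a pure-Python sketch of the Jaro-Winkler similarity. The library itself relies on jellyfish; this stand-in only follows the textbook definition and may differ from jellyfish in edge cases. The helper is_fuzzy_match and the way it applies the 0.8 threshold are assumptions.

```python
def jaro_similarity(s1, s2):
    """Jaro similarity of two strings, a value between 0.0 and 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, char in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == char:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order
    out_of_order, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1, s2, scaling=0.1):
    """Jaro similarity boosted for a common prefix of up to four characters."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * scaling * (1 - jaro)


def is_fuzzy_match(s1, s2, threshold=0.8):
    """Hypothetical helper: treat the strings as matching at or above the threshold."""
    return jaro_winkler_similarity(s1, s2) >= threshold


score = jaro_winkler_similarity("MARTHA", "MARHTA")
```

The classic pair MARTHA and MARHTA scores about 0.961, since all six characters match, two of them are transposed and the first three characters form a common prefix.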
geograpy.wikidata module
Created on 2020-09-23
@author: wf
-
class
geograpy.wikidata.
Wikidata
(endpoint='https://query.wikidata.org/sparql') Bases:
object
Wikidata access
-
getCities
(region=None, country=None) get the cities from Wikidata
Parameters: - region – List of regionWikiDataIDs. Limits the returned cities to the given regions
- country – List of countryWikiDataIDs. Limits the returned cities to the given countries
-
getCitiesOfRegion
(regionWikidataId: str, limit: int) Queries the cities of the given region. If the region is a city state, the region is returned as a city. The cities are ordered by population and their number can be limited with the limit parameter.
Parameters: - regionWikidataId – wikidata id of the region the cities should be queried for
- limit – Limits the amount of returned cities
Returns: Returns list of cities of the given region ordered by population
-
getCityPopulations
(profile=True) get the city populations from Wikidata
Parameters: profile (bool) – if True show profiling information
-
static
getCoordinateComponents
(coordinate: str) -> (float, float) Converts the Wikidata coordinate representation into its subcomponents longitude and latitude. Example: ‘Point(-118.25 35.05694444)’ results in (‘-118.25’, ‘35.05694444’)
Parameters: coordinate – coordinate value in the format returned by Wikidata queries Returns: the longitude and latitude of the given coordinate as separate values
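Parsing the Point(longitude latitude) representation can be sketched with a regular expression. The function name coordinate_components is hypothetical, the sketch returns floats as the signature suggests, and it assumes plain decimal notation without exponents.

```python
import re

# Matches e.g. "Point(-118.25 35.05694444)": longitude first, then latitude
POINT_PATTERN = re.compile(
    r"^Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$"
)


def coordinate_components(coordinate):
    """Return (longitude, latitude) as floats for a 'Point(lon lat)' string."""
    match = POINT_PATTERN.match(coordinate.strip())
    if match is None:
        raise ValueError(f"unexpected coordinate format: {coordinate!r}")
    return float(match.group(1)), float(match.group(2))


lon, lat = coordinate_components("Point(-118.25 35.05694444)")
```

Note the order: the first number in the Point(...) text is the longitude, so swapping the two values is an easy mistake when passing them on to distance calculations.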
-
static
getValuesClause
(varName: str, values, wikidataEntities: bool = True) generates the SPARQL VALUES clause for the given variable name containing the given values
Parameters: - varName – variable name for the VALUES clause
- values – values for the clause
- wikidataEntities (bool) – if True the Wikidata prefix is added to the values, otherwise the given values are expected to be proper IRIs
Returns: the VALUES clause Return type: str
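A SPARQL VALUES clause restricts a variable to a fixed set of terms. The sketch below shows one way such a clause could be assembled; the function name values_clause, the wd: prefix and the angle brackets around full IRIs are assumptions about the output format, not the library's exact result.

```python
def values_clause(var_name, values, wikidata_entities=True):
    """Build a SPARQL VALUES clause for var_name from one value or a list of values."""
    if isinstance(values, str):
        values = [values]
    if wikidata_entities:
        # Assumption: entity ids such as Q40 are prefixed with wd:
        terms = [f"wd:{value}" for value in values]
    else:
        # Assumption: full IRIs are wrapped in angle brackets
        terms = [f"<{value}>" for value in values]
    return f"VALUES ?{var_name} {{ {' '.join(terms)} }}"


clause = values_clause("country", ["Q40", "Q183"])
```

For the two ids above the clause reads VALUES ?country { wd:Q40 wd:Q183 } and can be placed inside the WHERE block of a query to limit the results to those countries.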
-
Module contents
main geograpy 3 module
-
geograpy.
get_geoPlace_context
(url=None, text=None, debug=False) Get a place context for a given text with information about country, region, city and other, based on NLTK named entities having the geographic (GPE) label.
Parameters: - url (String) – the url to read text from (if any)
- text (String) – the text to analyze
- debug (boolean) – if True show debug information
Returns: the place context
Return type: PlaceContext
-
geograpy.
get_place_context
(url=None, text=None, labels=['GPE', 'GSP', 'PERSON', 'ORGANIZATION'], debug=False) Get a place context for a given text with information about country, region, city and other, based on NLTK named entities with one of the given labels, by default GPE, GSP, PERSON and ORGANIZATION.
Parameters: - url (String) – the url to read text from (if any)
- text (String) – the text to analyze
- labels (list) – the entity labels to filter by
- debug (boolean) – if True show debug information
Returns: the place context
Return type: PlaceContext