niimpy.preprocessing.location module
- niimpy.preprocessing.location.cluster_locations(lats, lons, min_samples=5, eps=200)[source]
Performs clustering on the locations
- Parameters:
- lats : pd.DataFrame
Latitudes
- lons : pd.DataFrame
Longitudes
- min_samples : int
Minimum number of samples to form a cluster. Default is 5.
- eps : float
Epsilon parameter in DBSCAN: the maximum distance between two neighbouring samples. Default is 200.
- Returns:
- clusters : array
Array of cluster labels. -1 indicates an outlier.
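To make the roles of eps and min_samples concrete, here is a minimal, simplified DBSCAN written against a precomputed distance matrix. It is only a sketch of the general technique, not niimpy's implementation; the name dbscan_sketch and the 1-D test positions are hypothetical, and distances are assumed to be in the same unit as eps.

```python
import numpy as np

def dbscan_sketch(dists, eps=200.0, min_samples=5):
    """Simplified DBSCAN on a precomputed distance matrix; -1 marks outliers."""
    n = dists.shape[0]
    labels = np.full(n, -1)
    # Each neighbourhood includes the point itself, as in the usual definition.
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbours])
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from this unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels

# Hypothetical data: five points spaced 10 units apart plus one far-away point.
positions = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 5000.0])
dists = np.abs(positions[:, None] - positions[None, :])
labels = dbscan_sketch(dists, eps=200.0, min_samples=5)
```

With these inputs the five nearby points form one cluster and the distant point keeps the label -1.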
- niimpy.preprocessing.location.compute_nbin_maxdist_home(lats, lons, latlon_home, home_radius=50)[source]
Computes the number of bins spent at home and the maximum distance from home.
- Parameters:
- lats : pd.DataFrame
Latitudes
- lons : pd.DataFrame
Longitudes
- latlon_home : array
A tuple (lat, lon) giving the coordinates of home
- Returns:
- (n_home, max_dist_home) : tuple
n_home: number of bins in which the person was near home.
max_dist_home: maximum distance that the person has been from home.
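The two return values can be illustrated with a short sketch. It assumes that distances are great-circle distances in metres and that "near home" means within home_radius of latlon_home; neither assumption is confirmed by the text above, and the helper names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an assumption of this sketch

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def nbin_maxdist_home_sketch(lats, lons, latlon_home, home_radius=50):
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    # Distance from every bin to the home coordinate.
    d = haversine_m(lats, lons, latlon_home[0], latlon_home[1])
    n_home = int(np.sum(d <= home_radius))  # bins within the home radius
    return n_home, float(d.max())

# Hypothetical coordinates: two bins at home, one roughly 11 km to the north.
n_home, max_dist_home = nbin_maxdist_home_sketch(
    [60.0, 60.0, 60.1], [24.0, 24.0001, 24.0], (60.0, 24.0)
)
```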
- niimpy.preprocessing.location.distance_matrix(lats, lons)[source]
Compute distance matrix using great-circle distance formula
https://en.wikipedia.org/wiki/Great-circle_distance#Formulae
- Parameters:
- lats : array
Latitudes
- lons : array
Longitudes
- Returns:
- dists : matrix
Entry (i, j) is the great-circle distance between points i and j, i.e. the distance between (lats[i], lons[i]) and (lats[j], lons[j]).
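As an illustration of the great-circle computation, the following sketch builds the full pairwise matrix with the haversine form of the formula and NumPy broadcasting. The function name, the Earth radius constant, and the choice of metres as the unit are assumptions of this example, not details taken from niimpy.

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an assumption of this sketch

def distance_matrix_sketch(lats, lons):
    """Pairwise great-circle distances (metres) via the haversine formula."""
    lat = np.radians(np.asarray(lats, dtype=float))
    lon = np.radians(np.asarray(lons, dtype=float))
    # Broadcasting a column against a row yields every pairwise difference.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Three points a quarter of a great circle apart from each other.
dists = distance_matrix_sketch([0.0, 0.0, 90.0], [0.0, 90.0, 0.0])
```

The result is symmetric with a zero diagonal, and each off-diagonal entry here equals a quarter of the Earth's circumference under the assumed radius.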
- niimpy.preprocessing.location.extract_features_location(df, features=None)[source]
Calculates location features
- Parameters:
- df : pd.DataFrame
DataFrame of location data. It must contain these columns: double_latitude, double_longitude, user, group. double_speed is optional; if it is not provided, it is computed from the location data.
- speed_threshold : float
Bins whose speed is lower than speed_threshold are considered static; the rest are considered moving.
- features : dict
A map (dictionary) of functions that compute features. It is a map of maps: the keys of the outer map are the names of the functions that compute features, and each nested map contains the keyword arguments for that function. If a function takes no arguments, use an empty map. Default is None, in which case all available functions are used; these are listed in the dict location.ALL_FEATURES. You can implement your own function and use it instead, or add it to that map.
- Returns:
- features : pd.DataFrame
DataFrame of computed features, where the index is the users and the columns are the features.
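The map-of-maps pattern for the features argument can be sketched without niimpy itself. In the example below the outer keys are the function objects (one reading of the description above), and every name, including ALL_FEATURES_SKETCH and the two toy feature functions, is hypothetical.

```python
import numpy as np

def latitude_range(lats, lons):
    """Toy feature: spread of the latitudes."""
    return float(np.max(lats) - np.min(lats))

def fraction_north_of(lats, lons, threshold=0.0):
    """Toy feature with a keyword argument that the nested map can override."""
    return float(np.mean(np.asarray(lats) > threshold))

# Stand-in for a registry of default features; an empty dict means "no kwargs".
ALL_FEATURES_SKETCH = {latitude_range: {}, fraction_north_of: {}}

def compute_features_sketch(lats, lons, features=None):
    if features is None:
        features = ALL_FEATURES_SKETCH  # fall back to every registered function
    # Call each function with its own keyword arguments.
    return {func.__name__: func(lats, lons, **kwargs)
            for func, kwargs in features.items()}

lats, lons = [60.0, 60.2, 61.0], [24.0, 24.1, 24.2]
defaults = compute_features_sketch(lats, lons)
custom = compute_features_sketch(lats, lons, {fraction_north_of: {"threshold": 60.5}})
```

Passing a custom map restricts the computation to the listed functions and forwards the nested keyword arguments to each of them.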
- niimpy.preprocessing.location.filter_location(location, remove_disabled=True, remove_zeros=True, remove_network=False, latitude_column='double_latitude', longitude_column='double_longitude', label_column='label', provider_column='provider')[source]
Remove low-quality or weird location samples
- Parameters:
- location : pd.DataFrame
DataFrame of locations
- remove_disabled : bool
Remove locations whose label is disabled
- remove_zeros : bool
Remove locations whose latitude and longitude are close to 0
- remove_network : bool
Keep only locations whose provider is gps
- Returns:
- location : pd.DataFrame
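The three filters can be approximated with plain pandas boolean masks. This is a sketch under stated assumptions rather than the library's code: the zero_tol threshold for "close to 0" is hypothetical, and the label value "disabled" and provider value "gps" are taken literally from the parameter descriptions.

```python
import pandas as pd

def filter_location_sketch(location, remove_disabled=True, remove_zeros=True,
                           remove_network=False,
                           latitude_column="double_latitude",
                           longitude_column="double_longitude",
                           label_column="label", provider_column="provider",
                           zero_tol=0.1):  # zero_tol is a hypothetical threshold
    out = location
    if remove_disabled and label_column in out.columns:
        out = out[out[label_column] != "disabled"]
    if remove_zeros:
        # Drop samples whose latitude and longitude are both near 0.
        near_zero = ((out[latitude_column].abs() < zero_tol)
                     & (out[longitude_column].abs() < zero_tol))
        out = out[~near_zero]
    if remove_network and provider_column in out.columns:
        out = out[out[provider_column] == "gps"]
    return out

# Hypothetical samples: one valid, one at (0, 0), one disabled, one from the network.
df = pd.DataFrame({
    "double_latitude": [60.1, 0.0, 60.2, 60.3],
    "double_longitude": [24.9, 0.0, 24.8, 24.7],
    "label": ["enabled", "enabled", "disabled", "enabled"],
    "provider": ["gps", "gps", "gps", "network"],
})
kept = filter_location_sketch(df)
gps_only = filter_location_sketch(df, remove_network=True)
```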
- niimpy.preprocessing.location.find_home(lats, lons, times)[source]
Find coordinates of the home of a person
Home is defined as the place most visited between 12am and 6am. Locations within this time period are first clustered, and the center of the largest cluster is taken as home.
- Parameters:
- lats : array-like
Latitudes
- lons : array-like
Longitudes
- times : array-like
Times of the recorded coordinates
- Returns:
- (lat_home, lon_home) : tuple of floats
Coordinates of the home
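The night-time rule described above can be sketched as follows. To keep the example self-contained, a coarse grid rounding (grid_cluster, hypothetical) stands in for the real clustering step, the cluster center is taken as the mean coordinate, and times are assumed to be in local time. None of this is confirmed to match niimpy's internals.

```python
import numpy as np
import pandas as pd

def grid_cluster(lats, lons, cell=0.001):
    """Stand-in clustering: points in the same grid cell share a label."""
    cells = np.stack([np.round(lats / cell), np.round(lons / cell)], axis=1)
    _, labels = np.unique(cells, axis=0, return_inverse=True)
    return labels.ravel()

def find_home_sketch(lats, lons, times, cluster_fn=grid_cluster):
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    hours = np.asarray(pd.DatetimeIndex(times).hour)
    night = hours < 6  # keep samples recorded between 12am and 6am
    if not night.any():
        raise ValueError("no samples between 12am and 6am")
    labels = cluster_fn(lats[night], lons[night])
    # Largest cluster among the night-time samples, ignoring outliers (-1).
    biggest = np.bincount(labels[labels >= 0]).argmax()
    in_home = labels == biggest
    return float(lats[night][in_home].mean()), float(lons[night][in_home].mean())

# Hypothetical samples: three night-time points close together, one elsewhere,
# and two daytime points that the night filter discards.
times = ["2024-01-01 01:00", "2024-01-01 02:00", "2024-01-02 03:00",
         "2024-01-02 04:00", "2024-01-01 12:00", "2024-01-01 13:00"]
lats = [60.0001, 60.0002, 60.0003, 60.1, 61.0, 61.0]
lons = [24.0001, 24.0002, 24.0001, 24.1, 25.0, 25.0]
lat_home, lon_home = find_home_sketch(lats, lons, times)
```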
- niimpy.preprocessing.location.get_speeds_totaldist(lats, lons, times)[source]
Computes the speed of each bin by dividing the distance travelled by the time difference.
- Parameters:
- lats : array-like
Array of latitudes
- lons : array-like
Array of longitudes
- times : array-like
Array of times associated with bins
- Returns:
- (speeds, total_distances) : tuple
Speeds (array) and total distance travelled (float)
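The distance-over-time idea can be written out in a few lines. The sketch below assumes times are given in seconds and strictly increasing, uses great-circle distances in metres, and returns one speed per consecutive pair of bins; the actual function may handle units, array lengths, and edge cases differently.

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an assumption of this sketch

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def speeds_totaldist_sketch(lats, lons, times):
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    times = np.asarray(times, dtype=float)  # seconds, strictly increasing
    # Distance covered between each pair of consecutive bins.
    steps = haversine_m(lats[:-1], lons[:-1], lats[1:], lons[1:])
    speeds = steps / np.diff(times)  # metres per second
    return speeds, float(steps.sum())

# Hypothetical track: 0.001 degrees of longitude along the equator every 10 s.
speeds, total_distance = speeds_totaldist_sketch(
    [0.0, 0.0, 0.0], [0.0, 0.001, 0.002], [0.0, 10.0, 20.0]
)
```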
- niimpy.preprocessing.location.group_data(df)[source]
Group the dataframe by a standard set of columns listed in group_by_columns.
- niimpy.preprocessing.location.location_distance_features(df, config={})[source]
Calculates features related to distance and speed.
- Parameters:
- df: dataframe with date index
- config: A dictionary of optional arguments
- Optional arguments in config:
- longitude_column: The name of the column with longitude data in a floating point format. Defaults to 'double_longitude'.
- latitude_column: The name of the column with latitude data in a floating point format. Defaults to 'double_latitude'.
- speed_column: The name of the column with speed data in a floating point format. Defaults to 'double_speed'.
- resample_args: A dictionary of arguments for the pandas resample function. For example, to resample by hour, you would pass {"rule": "1h"}.
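To show what a config dictionary of this shape might look like and how resample_args maps onto pandas, the sketch below unpacks it into DataFrame.resample directly. The sample data is hypothetical, and the mean-speed aggregation is only an illustration, not a statement about which features the function computes.

```python
import pandas as pd

# A config using the documented keys; the values here match the stated defaults.
config = {
    "latitude_column": "double_latitude",
    "longitude_column": "double_longitude",
    "speed_column": "double_speed",
    "resample_args": {"rule": "1h"},
}

# Hypothetical speed samples on a date index.
index = pd.to_datetime(["2024-01-01 00:05", "2024-01-01 00:35", "2024-01-01 01:10"])
df = pd.DataFrame({"double_speed": [1.0, 3.0, 5.0]}, index=index)

# The dictionary is unpacked into pandas' resample, giving one bin per hour.
hourly_mean_speed = df[config["speed_column"]].resample(**config["resample_args"]).mean()
```

Here the two samples in the first hour are averaged into one bin and the third sample falls into the next.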
- niimpy.preprocessing.location.location_number_of_significant_places(df, config={})[source]
Computes number of significant places
- niimpy.preprocessing.location.location_significant_place_features(df, config={})[source]
Calculates features related to Significant Places.
- Parameters:
- df: dataframe with date index
- config: A dictionary of optional arguments
- Optional arguments in config:
- longitude_column: The name of the column with longitude data in a floating point format. Defaults to 'double_longitude'.
- latitude_column: The name of the column with latitude data in a floating point format. Defaults to 'double_latitude'.
- speed_column: The name of the column with speed data in a floating point format. Defaults to 'double_speed'.
- resample_args: A dictionary of arguments for the pandas resample function. For example, to resample by hour, you would pass {"rule": "1h"}.
- niimpy.preprocessing.location.number_of_significant_places(lats, lons, times)[source]
Computes number of significant places.
The number of significant places is computed by first clustering the locations in each month and then taking the median of the number of clusters across months.
It is assumed that lats and lons are the coordinates of static points.
- Parameters:
- lats : pd.DataFrame
Latitudes
- lons : pd.DataFrame
Longitudes
- times : array
Array of times
- Returns:
The number of significant places discovered
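The median-over-months step can be sketched separately from the clustering. The example assumes cluster labels have already been computed within each month, with -1 marking outliers; the function name and the sample labels are hypothetical.

```python
import numpy as np

def median_monthly_clusters(labels, months):
    """Median, across months, of the number of distinct clusters per month."""
    labels = np.asarray(labels)
    months = np.asarray(months)
    counts = []
    for month in np.unique(months):
        month_labels = labels[months == month]
        # Count distinct clusters in this month, ignoring outliers (-1).
        counts.append(len(np.unique(month_labels[month_labels >= 0])))
    return float(np.median(counts))

# Hypothetical per-month cluster labels: 2, 3 and 1 clusters in months 1 to 3.
labels = [0, 0, 1, -1, 0, 1, 2, 2, 0, -1]
months = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
n_places = median_monthly_clusters(labels, months)
```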