niimpy.preprocessing.location module
- niimpy.preprocessing.location.cluster_locations(lats, lons, min_samples=5, eps=200)
Performs DBSCAN clustering on the locations.
- Parameters:
- lats : pd.DataFrame
Latitudes
- lons : pd.DataFrame
Longitudes
- min_samples : int
Minimum number of samples required to form a cluster. Default is 5.
- eps : float
Epsilon parameter of DBSCAN: the maximum distance between two samples for them to be considered neighbours. Default is 200.
- Returns:
- clusters : array
Array of cluster labels, one per location. A label of -1 marks an outlier.
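The clustering step can be illustrated with a minimal from-scratch DBSCAN over great-circle distances. This is a sketch, not niimpy's implementation: it assumes distances are measured in metres (so eps=200 means 200 m), and the names haversine_m and dbscan_locations are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: mean Earth radius, distances in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def dbscan_locations(lats, lons, min_samples=5, eps=200.0):
    """Hypothetical DBSCAN sketch: one label per point, -1 marks noise/outliers."""
    n = len(lats)
    # Neighbourhood of each point, including the point itself.
    neighbours = [
        [j for j in range(n) if haversine_m(lats[i], lons[i], lats[j], lons[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is also a core point: keep expanding
    return labels


lats = [60.1700, 60.1701, 60.1702, 60.1700, 60.1701, 60.2000]
lons = [24.9400, 24.9401, 24.9400, 24.9402, 24.9400, 25.0000]
labels = dbscan_locations(lats, lons, min_samples=3, eps=200)
```

Here the five points a few metres apart form cluster 0, and the point several kilometres away is labelled -1.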
- niimpy.preprocessing.location.compute_nbin_maxdist_home(lats, lons, latlon_home, home_radius=50)
Computes the number of bins spent at home and the maximum distance from home.
- Parameters:
- lats : pd.DataFrame
Latitudes
- lons : pd.DataFrame
Longitudes
- latlon_home : array
A (lat, lon) tuple giving the coordinates of home.
- Returns:
- (n_home, max_dist_home) : tuple
n_home: number of bins in which the person has been near home.
max_dist_home: maximum distance the person has been from home.
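The two returned quantities can be sketched as follows. This assumes distances are great-circle distances in metres and that a bin counts as "near home" when it lies within home_radius of latlon_home; both are assumptions, and nbin_maxdist_home is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: distances in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def nbin_maxdist_home(lats, lons, latlon_home, home_radius=50.0):
    """Hypothetical sketch: count bins within home_radius of home and the largest distance from home."""
    home_lat, home_lon = latlon_home
    dists = [haversine_m(lat, lon, home_lat, home_lon) for lat, lon in zip(lats, lons)]
    n_home = sum(1 for d in dists if d <= home_radius)  # assumed "near home" rule
    max_dist_home = max(dists) if dists else 0.0
    return n_home, max_dist_home


n_home, max_dist_home = nbin_maxdist_home(
    [60.17, 60.1701, 60.18], [24.94, 24.94, 24.94], (60.17, 24.94)
)
```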
- niimpy.preprocessing.location.distance_matrix(lats, lons)
Computes the distance matrix using the great-circle distance formula.
https://en.wikipedia.org/wiki/Great-circle_distance#Formulae
- Parameters:
- lats : array
Latitudes
- lons : array
Longitudes
- Returns:
- dists : matrix
Entry (i, j) shows the great-circle distance between point i and j, i.e. distance between (lats[i], lons[i]) and (lats[j], lons[j]).
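A pairwise great-circle matrix can be sketched with the haversine formula, one of the formulae on the linked Wikipedia page. This is a pure-Python illustration that assumes an Earth radius of 6,371 km and output in metres; niimpy's own units and vectorisation may differ, and great_circle_matrix is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: distances in metres


def great_circle_matrix(lats, lons):
    """Hypothetical sketch: n x n matrix of haversine distances between (lats[i], lons[i]) pairs."""
    phis = [math.radians(x) for x in lats]
    lams = [math.radians(x) for x in lons]
    n = len(phis)
    dists = [[0.0] * n for _ in range(n)]  # diagonal stays 0
    for i in range(n):
        for j in range(i + 1, n):
            a = (
                math.sin((phis[j] - phis[i]) / 2) ** 2
                + math.cos(phis[i]) * math.cos(phis[j]) * math.sin((lams[j] - lams[i]) / 2) ** 2
            )
            d = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))
            dists[i][j] = dists[j][i] = d  # the matrix is symmetric
    return dists


dists = great_circle_matrix([0.0, 0.0, 1.0], [0.0, 1.0, 0.0])
```

Filling only the upper triangle and mirroring it halves the number of distance evaluations.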
- niimpy.preprocessing.location.extract_features_location(df, features=None)
Calculates location features.
- Parameters:
- df : pd.DataFrame
DataFrame of location data. It must contain the columns latitude, longitude, user, and group. The speed column is optional; if it is not provided, it is computed from the location data.
- speed_threshold : float
Bins whose speed is lower than speed_threshold are considered static; the rest are considered moving.
- features : map (dictionary) of functions that compute features
A map of maps: the keys of the outer map identify the functions that compute features, and each nested map contains the keyword arguments for that function. Use an empty map if a function takes no arguments. Default is None, in which case all available functions are used; these are listed in the dict location.ALL_FEATURES. You can implement your own function and use it instead, or add it to that map.
- Returns:
- features : pd.DataFrame
DataFrame of computed features, indexed by user, with one column per feature.
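The map-of-maps dispatch described for the features argument can be sketched in plain Python. This assumes the outer keys are the feature functions themselves and that each is called once per user with its nested keyword arguments; n_points, max_speed, ALL_FEATURES_SKETCH and extract_features are all hypothetical stand-ins, not niimpy's API.

```python
def n_points(records):
    """Hypothetical feature: number of location samples."""
    return len(records)


def max_speed(records, default=0.0):
    """Hypothetical feature: highest recorded speed, or `default` if there are no samples."""
    return max((r["speed"] for r in records), default=default)


# Outer map: feature function -> keyword arguments for that function.
ALL_FEATURES_SKETCH = {n_points: {}, max_speed: {"default": 0.0}}


def extract_features(records_by_user, features=None):
    """Apply every feature function to every user's records; rows are users, columns features."""
    if features is None:
        features = ALL_FEATURES_SKETCH  # use all available functions by default
    result = {}
    for user, records in records_by_user.items():
        result[user] = {func.__name__: func(records, **kwargs) for func, kwargs in features.items()}
    return result


data = {"u1": [{"speed": 0.1}, {"speed": 2.5}], "u2": [{"speed": 1.0}]}
all_features = extract_features(data)
only_counts = extract_features(data, {n_points: {}})  # a custom selection
```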
- niimpy.preprocessing.location.filter_location(location, remove_disabled=True, remove_zeros=True, remove_network=False, latitude_column='latitude', longitude_column='longitude', label_column='label', provider_column='provider')
Removes low-quality or anomalous location samples.
- Parameters:
- location : pd.DataFrame
DataFrame of locations
- remove_disabled : bool
Remove locations whose label is disabled.
- remove_zeros : bool
Remove locations whose latitude and longitude are close to 0.
- remove_network : bool
If True, keep only locations whose provider is gps.
- Returns:
- location : pd.DataFrame
Filtered DataFrame of locations.
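The three filters can be sketched over plain records. This assumes the disabled label is the string "disabled", that "close to 0" means both coordinates are within a small tolerance (zero_tol, a hypothetical parameter), and that filter_location_records is a hypothetical name; niimpy itself works on a pandas DataFrame.

```python
def filter_location_records(records, remove_disabled=True, remove_zeros=True,
                            remove_network=False, zero_tol=1e-6):
    """Hypothetical sketch of the filtering rules applied to a list of dicts."""
    kept = []
    for r in records:
        if remove_disabled and r.get("label") == "disabled":
            continue  # label marks the sample as disabled
        if remove_zeros and abs(r["latitude"]) < zero_tol and abs(r["longitude"]) < zero_tol:
            continue  # (0, 0) is almost always a failed fix, not a real position
        if remove_network and r.get("provider") != "gps":
            continue  # keep only GPS-provided samples
        kept.append(r)
    return kept


records = [
    {"latitude": 60.17, "longitude": 24.94, "label": None, "provider": "gps"},
    {"latitude": 0.0, "longitude": 0.0, "label": None, "provider": "gps"},
    {"latitude": 60.18, "longitude": 24.95, "label": "disabled", "provider": "gps"},
    {"latitude": 60.19, "longitude": 24.96, "label": None, "provider": "network"},
]
default_kept = filter_location_records(records)
gps_only = filter_location_records(records, remove_network=True)
```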
- niimpy.preprocessing.location.find_home(lats, lons, times)
Finds the coordinates of a person's home.
Home is defined as the most visited place between 12 am and 6 am. Locations recorded in this period are first clustered, and the center of the largest cluster is taken as the home.
- Parameters:
- lats : array-like
Latitudes
- lons : array-like
Longitudes
- times : array-like
Times of the recorded coordinates
- Returns:
- (lat_home, lon_home) : tuple of floats
Coordinates of the home
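The home heuristic can be sketched as follows. Instead of full clustering, this sketch swaps in a simpler "densest neighbourhood" rule: among night-time points (hours 0 to 5), pick the point with the most neighbours within eps metres and average that neighbourhood. The eps value, the metre units and the name find_home_sketch are assumptions.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # assumption: distances in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def find_home_sketch(lats, lons, times, eps=200.0):
    """Hypothetical sketch: centre of the densest night-time neighbourhood, or None."""
    night = [(la, lo) for la, lo, t in zip(lats, lons, times) if 0 <= t.hour < 6]
    if not night:
        return None
    # For each night point, collect the night points within eps metres of it.
    neighbourhoods = [
        [q for q in night if haversine_m(p[0], p[1], q[0], q[1]) <= eps] for p in night
    ]
    densest = max(neighbourhoods, key=len)
    lat_home = sum(p[0] for p in densest) / len(densest)
    lon_home = sum(p[1] for p in densest) / len(densest)
    return lat_home, lon_home


times = [datetime(2024, 1, 1, h) for h in (1, 2, 3, 4, 14)]
home = find_home_sketch(
    [60.17, 60.1702, 60.17, 60.5, 61.0], [24.94, 24.94, 24.9402, 24.5, 25.5], times
)
```

The isolated night-time point and the afternoon point do not affect the estimate.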
- niimpy.preprocessing.location.get_speeds_totaldist(lats, lons, times)
Computes the speed of each bin by dividing the distance traveled by the time difference, and the total distance traveled.
- Parameters:
- lats : array-like
Array of latitudes
- lons : array-like
Array of longitudes
- times : array-like
Array of times associated with the bins
- Returns:
- (speeds, total_distances) : tuple
Speeds (array) and total distance traveled (float)
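The speed computation can be sketched as distance between consecutive points divided by their time difference. This assumes times are numeric seconds and distances are in metres (so speeds are in m/s), returns one speed per consecutive pair, and uses the hypothetical name speeds_totaldist.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: distances in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def speeds_totaldist(lats, lons, times):
    """Hypothetical sketch: per-step speeds (m/s) and the total distance travelled (m)."""
    speeds = []
    total = 0.0
    for i in range(1, len(lats)):
        d = haversine_m(lats[i - 1], lons[i - 1], lats[i], lons[i])
        dt = times[i] - times[i - 1]
        total += d
        speeds.append(d / dt if dt > 0 else float("nan"))  # undefined speed for zero time gap
    return speeds, total


speeds, total = speeds_totaldist([0.0, 0.0, 0.0], [0.0, 0.01, 0.02], [0, 100, 300])
```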
- niimpy.preprocessing.location.location_distance_features(df, latitude_column='latitude', longitude_column='latitude', speed_column='speed', resample_args={'rule': '1ME'}, **kwargs)
Calculates features related to distance and speed.
- Parameters:
- df : pd.DataFrame
DataFrame with a date index.
- longitude_column : str
Name of the column with longitude data in floating-point format. Defaults to 'longitude'.
- latitude_column : str
Name of the column with latitude data in floating-point format. Defaults to 'latitude'.
- speed_column : str
Name of the column with speed data in floating-point format. Defaults to 'speed'.
- resample_args : dict
Dictionary of arguments for the pandas resample function. Defaults to {'rule': '1ME'}. For example, to resample by hour, pass {"rule": "1h"}.
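The effect of resample_args can be illustrated by summing travelled distance per time window with pandas. This sketch assumes each step's distance is attributed to the later timestamp and that distances are in metres; distance_per_window is a hypothetical name, and its hourly default is only for the sketch.

```python
import math

import pandas as pd

EARTH_RADIUS_M = 6_371_000  # assumption: distances in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def distance_per_window(df, latitude_column="latitude", longitude_column="longitude",
                        resample_args=None):
    """Hypothetical sketch: total distance travelled in each resampling window."""
    if resample_args is None:
        resample_args = {"rule": "1h"}  # sketch default; niimpy's default is {'rule': '1ME'}
    lat = df[latitude_column].tolist()
    lon = df[longitude_column].tolist()
    # Distance of each step, assigned to the timestamp at which the step ends.
    steps = [0.0] + [haversine_m(lat[i - 1], lon[i - 1], lat[i], lon[i]) for i in range(1, len(df))]
    return pd.Series(steps, index=df.index).resample(**resample_args).sum()


idx = pd.to_datetime(["2024-01-01 08:00", "2024-01-01 08:30", "2024-01-01 09:10", "2024-01-01 09:40"])
df = pd.DataFrame({"latitude": [0.0, 0.0, 0.0, 0.0], "longitude": [0.0, 0.01, 0.02, 0.02]}, index=idx)
hourly = distance_per_window(df, resample_args={"rule": "1h"})
```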
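- niimpy.preprocessing.location.location_local_time(df, longitude_column='longitude', latitude_column='latitude', resample_args={'rule': '1ME'})
Calculates the local time of the user based on the longitude.
- Parameters:
- df : pd.DataFrame
DataFrame with a date index.
- longitude_column : str
Name of the column with longitude data in floating-point format. Defaults to 'longitude'.
- latitude_column : str
Name of the column with latitude data in floating-point format. Defaults to 'latitude'.
- resample_args : dict
Dictionary of arguments for the pandas resample function. Defaults to {'rule': '1ME'}.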
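One common way to derive local time from longitude alone is local mean solar time: the Earth turns 15 degrees per hour, so each degree east of Greenwich adds 4 minutes to UTC. This is an assumption about the approach, not a description of niimpy's code; mean_solar_time is a hypothetical name, and input times are assumed to be naive datetimes in UTC.

```python
from datetime import datetime, timedelta


def mean_solar_time(utc_time, longitude):
    """Hypothetical sketch: shift a naive UTC datetime by longitude / 15 hours."""
    # 360 degrees / 24 hours = 15 degrees per hour; east is positive, west negative.
    return utc_time + timedelta(hours=longitude / 15.0)


helsinki_solar = mean_solar_time(datetime(2024, 1, 1, 12, 0), 24.94)
```

Mean solar time ignores political time zones and daylight saving, so it can differ from the wall-clock time by an hour or more.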
- niimpy.preprocessing.location.location_significant_place_features(df, latitude_column='latitude', longitude_column='latitude', speed_column='speed', speed_threshold=0.277, resample_args={'rule': '1ME'}, **kwargs)
Calculates features related to significant places.
- Parameters:
- df : pd.DataFrame
DataFrame with a date index.
- longitude_column : str
Name of the column with longitude data in floating-point format. Defaults to 'longitude'.
- latitude_column : str
Name of the column with latitude data in floating-point format. Defaults to 'latitude'.
- speed_column : str
Name of the column with speed data in floating-point format. Defaults to 'speed'.
- speed_threshold : float
Bins whose speed is lower than speed_threshold are considered static; the rest are considered moving. Defaults to 0.277.
- resample_args : dict
Dictionary of arguments for the pandas resample function. Defaults to {'rule': '1ME'}. For example, to resample by hour, pass {"rule": "1h"}.
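The static/moving split controlled by speed_threshold can be sketched as below. If speeds are in m/s (an assumption), the default 0.277 corresponds to roughly 1 km/h, since 1000 / 3600 ≈ 0.278. split_static_moving is a hypothetical name.

```python
def split_static_moving(speeds, speed_threshold=0.277):
    """Hypothetical sketch: indices of static bins (speed < threshold) and moving bins."""
    static = [i for i, s in enumerate(speeds) if s < speed_threshold]
    # NaN speeds fail both comparisons, so they land in neither list.
    moving = [i for i, s in enumerate(speeds) if s >= speed_threshold]
    return static, moving


static, moving = split_static_moving([0.0, 0.2, 0.3, 5.0])
```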
- niimpy.preprocessing.location.number_of_significant_places(df, latitude_column='latitude', longitude_column='longitude', resample_args={'rule': '1ME'}, **kwargs)
Computes number of significant places.
This feature is also computed by location_significant_place_features as n_sps; this standalone function is not included in the default location features.
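If significant places are taken to be the clusters found by a DBSCAN-style clustering such as cluster_locations (an assumption about how n_sps is defined), the count reduces to the number of distinct labels, excluding the outlier label -1. count_significant_places is a hypothetical name.

```python
def count_significant_places(labels):
    """Hypothetical sketch: number of distinct clusters, ignoring the -1 outlier label."""
    return len({label for label in labels if label != -1})


n_sps = count_significant_places([0, 0, 1, -1, 2, 1])
```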