niimpy.preprocessing.location module

niimpy.preprocessing.location.cluster_locations(lats, lons, min_samples=5, eps=200)[source]

Performs clustering on the locations

Parameters:

lats (pd.DataFrame): Latitudes
lons (pd.DataFrame): Longitudes
min_samples (int): Minimum number of samples to form a cluster. Default is 5.
eps (float): Epsilon parameter in DBSCAN. The maximum distance between two neighbouring samples. Default is 200.

Returns:

clusters (array): Array of cluster labels. -1 indicates an outlier.
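As a rough illustration of what DBSCAN does with min_samples and eps, here is a minimal brute-force sketch in pure Python. It is not niimpy's implementation. The helper names haversine_m and dbscan_labels are hypothetical, and treating eps as metres of great-circle distance is an assumption of this sketch.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumption of this sketch)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dbscan_labels(lats, lons, min_samples=5, eps=200):
    """Hypothetical brute-force DBSCAN; returns one label per point, -1 for outliers."""
    n = len(lats)
    # neighbours[i] holds every index within eps metres of point i (including i itself)
    neighbours = [
        [j for j in range(n) if haversine_m(lats[i], lons[i], lats[j], lons[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisionally an outlier; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former outlier reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point, keep expanding
    return labels


lats = [60.0, 60.0001, 60.0002, 60.0001, 60.0, 61.0]
lons = [24.0, 24.0, 24.0001, 24.0002, 24.0001, 25.0]
labels = dbscan_labels(lats, lons, min_samples=3, eps=200)
print(labels)
```

The five nearby points end up in one cluster and the distant point is labelled -1, matching the return convention described above.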

niimpy.preprocessing.location.compute_nbin_maxdist_home(lats, lons, latlon_home, home_radius=50)[source]

Computes the number of bins spent at home and the maximum distance from home

Parameters:

lats (pd.DataFrame): Latitudes
lons (pd.DataFrame): Longitudes
latlon_home (array): A tuple (lat, lon) giving the coordinates of home

Returns:

(n_home, max_dist_home) (tuple): n_home is the number of bins in which the person was near home; max_dist_home is the maximum distance the person has been from home.
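The two returned values can be sketched as follows. This is an illustration only: the function name nbin_maxdist_home_sketch is hypothetical, and both the unit of home_radius (metres) and the use of "less than or equal" for the radius test are assumptions, since the source does not state them.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumption of this sketch)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nbin_maxdist_home_sketch(lats, lons, latlon_home, home_radius=50):
    """Hypothetical helper: count bins within home_radius and find the farthest bin."""
    lat_home, lon_home = latlon_home
    dists = [haversine_m(lat, lon, lat_home, lon_home) for lat, lon in zip(lats, lons)]
    n_home = sum(1 for d in dists if d <= home_radius)  # bins near home
    max_dist_home = max(dists)                           # farthest bin from home
    return n_home, max_dist_home


n_home, max_dist_home = nbin_maxdist_home_sketch(
    lats=[60.0, 60.0001, 60.01],
    lons=[24.0, 24.0, 24.0],
    latlon_home=(60.0, 24.0),
)
print(n_home, round(max_dist_home))
```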

niimpy.preprocessing.location.distance_matrix(lats, lons)[source]

Compute distance matrix using great-circle distance formula

https://en.wikipedia.org/wiki/Great-circle_distance#Formulae

Parameters:

lats (array): Latitudes
lons (array): Longitudes

Returns:

dists (matrix): Entry (i, j) is the great-circle distance between points i and j, i.e. the distance between (lats[i], lons[i]) and (lats[j], lons[j]).
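A minimal pure-Python sketch of such a matrix, using the haversine form of the great-circle formula from the linked page. The function name distance_matrix_sketch, the nested-list output, and the Earth radius constant (which makes the result metres) are assumptions of this sketch, not details confirmed by the source.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumption of this sketch)


def distance_matrix_sketch(lats, lons):
    """Hypothetical helper: return an n x n nested list of great-circle distances."""
    phis = [math.radians(lat) for lat in lats]
    lams = [math.radians(lon) for lon in lons]
    n = len(phis)
    dists = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = (
                math.sin((phis[j] - phis[i]) / 2) ** 2
                + math.cos(phis[i]) * math.cos(phis[j]) * math.sin((lams[j] - lams[i]) / 2) ** 2
            )
            d = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
            dists[i][j] = dists[j][i] = d  # the matrix is symmetric
    return dists


# Two points on the equator one degree of longitude apart, plus the north pole.
dists = distance_matrix_sketch([0.0, 0.0, 90.0], [0.0, 1.0, 0.0])
print(round(dists[0][1]))
```

The diagonal is zero and the matrix is symmetric, so only the upper triangle needs to be computed.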

niimpy.preprocessing.location.extract_features_location(df, features=None)[source]

Calculates location features

Parameters:

df (pd.DataFrame): DataFrame of location data. It must contain these columns: latitude, longitude, user, group. speed is optional. If not provided, it will be computed manually.
speed_threshold (float): Bins whose speed is lower than speed_threshold are considered static and the rest are moving.
features (dict): A map of functions that compute features. It is a map of maps: the keys of the outer map are the names of the functions that compute features, and each nested map holds the keyword arguments for that function. If a function takes no arguments, use an empty map. Default is None, in which case all the available functions are used. Those functions are in the dict location.ALL_FEATURES. You can implement your own function and use it instead, or add it to that map.

Returns:

features (pd.DataFrame): DataFrame of computed features, where the index is users and the columns are the features.
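The "map of maps" convention for features can be illustrated with a small stand-alone sketch. Everything here is hypothetical: the feature functions mean_speed and count_rows, the stand-in ALL_FEATURES dict, and the extract_sketch dispatcher are invented for illustration and operate on plain lists of dicts rather than DataFrames. It assumes the keys of the outer map are the function objects themselves, referred to by name.

```python
def mean_speed(rows, speed_key="speed"):
    """Hypothetical feature function: average of one numeric field."""
    values = [row[speed_key] for row in rows]
    return {"mean_speed": sum(values) / len(values)}


def count_rows(rows):
    """Hypothetical feature function: number of samples."""
    return {"n_rows": len(rows)}


# Stand-in for a registry like location.ALL_FEATURES: function -> keyword arguments.
ALL_FEATURES = {mean_speed: {}, count_rows: {}}


def extract_sketch(rows, features=None):
    """Call every feature function with its keyword arguments and merge the results."""
    if features is None:
        features = ALL_FEATURES  # default: use every registered function
    result = {}
    for func, kwargs in features.items():
        result.update(func(rows, **kwargs))
    return result


rows = [{"speed": 1.0, "v": 10.0}, {"speed": 3.0, "v": 30.0}]
all_features = extract_sketch(rows)
only_count = extract_sketch(rows, features={count_rows: {}})
custom_key = extract_sketch(rows, features={mean_speed: {"speed_key": "v"}})
print(all_features, only_count, custom_key)
```

Passing an empty nested map runs a function with its defaults, while a non-empty one overrides its keyword arguments.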

niimpy.preprocessing.location.filter_location(location, remove_disabled=True, remove_zeros=True, remove_network=False, latitude_column='latitude', longitude_column='longitude', label_column='label', provider_column='provider')[source]

Remove low-quality or weird location samples

Parameters:

location (pd.DataFrame): DataFrame of locations
remove_disabled (bool): Remove locations whose label is disabled
remove_zeros (bool): Remove locations whose latitude and longitude are close to 0
remove_network (bool): Keep only locations whose provider is gps

Returns:

location (pd.DataFrame)

niimpy.preprocessing.location.find_home(lats, lons, times)[source]

Find coordinates of the home of a person

Home is defined as the place most visited between 12am and 6am. Locations within this time period are first clustered, and the center of the largest cluster is taken as the home.

Parameters:

lats (array-like): Latitudes
lons (array-like): Longitudes
times (array-like): Times of the recorded coordinates

Returns:

(lat_home, lon_home) (tuple of floats): Coordinates of the home
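The night-time filtering and "largest cluster" idea can be sketched as below. Note the swapped-in technique: instead of the clustering the library uses, this sketch groups points into a coarse latitude/longitude grid, which is a much cruder stand-in. The function name find_home_sketch and the cell_deg parameter are hypothetical, and times are assumed to be datetime objects already in local time.

```python
from collections import defaultdict
from datetime import datetime


def find_home_sketch(lats, lons, times, cell_deg=0.001):
    """Hypothetical helper: centre of the most populated grid cell between 00:00 and 06:00."""
    cells = defaultdict(list)
    for lat, lon, t in zip(lats, lons, times):
        if 0 <= t.hour < 6:  # keep only night-time samples
            key = (round(lat / cell_deg), round(lon / cell_deg))
            cells[key].append((lat, lon))
    if not cells:
        return None  # no night-time samples at all
    largest = max(cells.values(), key=len)
    lat_home = sum(p[0] for p in largest) / len(largest)
    lon_home = sum(p[1] for p in largest) / len(largest)
    return lat_home, lon_home


night = [datetime(2024, 1, 1, h, 0) for h in (1, 2, 3, 4)]
day = [datetime(2024, 1, 1, 12, m) for m in (0, 10, 20, 30)]
lats = [60.1001, 60.1002, 60.1003, 60.2001] + [60.3001] * 4
lons = [24.9001, 24.9002, 24.9003, 24.8001] + [25.0001] * 4
home = find_home_sketch(lats, lons, night + day)
print(home)
```

The daytime location has the most samples overall, but it is ignored because only the 12am to 6am window counts.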

niimpy.preprocessing.location.get_speeds_totaldist(lats, lons, times)[source]

Computes the speed of each bin by dividing distance by the time difference

Parameters:

lats (array-like): Array of latitudes
lons (array-like): Array of longitudes
times (array-like): Array of times associated with the bins

Returns:

(speeds, total_distances) (tuple): Speeds (array) and total distance travelled (float)
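A minimal sketch of the distance-over-time computation. It assumes times are given in seconds, distances are great-circle distances in metres between consecutive samples, and the result has one speed per consecutive pair; none of these details is confirmed by the source, and the name speeds_totaldist_sketch is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumption of this sketch)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_totaldist_sketch(lats, lons, times):
    """Hypothetical helper: speeds in m/s between consecutive samples, plus total metres."""
    speeds = []
    total_distance = 0.0
    for i in range(1, len(lats)):
        dist = haversine_m(lats[i - 1], lons[i - 1], lats[i], lons[i])
        dt = times[i] - times[i - 1]
        # Avoid dividing by zero when two samples share a timestamp.
        speeds.append(dist / dt if dt > 0 else float("nan"))
        total_distance += dist
    return speeds, total_distance


# Three samples along the equator, 0.001 degrees of longitude apart.
speeds, total_distance = speeds_totaldist_sketch(
    lats=[0.0, 0.0, 0.0],
    lons=[0.0, 0.001, 0.002],
    times=[0, 10, 30],
)
print(speeds, total_distance)
```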

niimpy.preprocessing.location.location_distance_features(df, latitude_column='latitude', longitude_column='latitude', speed_column='speed', resample_args={'rule': '1ME'}, **kwargs)[source]

Calculates features related to distance and speed.

Parameters:

df: dataframe with date index
config: A dictionary of optional arguments

Optional arguments in config:

longitude_column: The name of the column with longitude data in a floating point format. Defaults to 'longitude'.
latitude_column: The name of the column with latitude data in a floating point format. Defaults to 'latitude'.
speed_column: The name of the column with speed data in a floating point format. Defaults to 'speed'.
resample_args: A dictionary of arguments for the Pandas resample function. For example, to resample by hour, you would pass {"rule": "1h"}.

niimpy.preprocessing.location.location_local_time(df, longitude_column='longitude', latitude_column='latitude', resample_args={'rule': '1ME'})[source]

Calculates the local time of the user based on the longitude.

Parameters:

df: dataframe with date index
config: A dictionary of optional arguments
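The source does not say how local time is derived from longitude. One common approximation, shown here purely as an assumption, is mean solar time: every 15 degrees of longitude east of Greenwich adds one hour to UTC. The function name approx_local_time is hypothetical and the input is assumed to be a naive datetime in UTC.

```python
from datetime import datetime, timedelta


def approx_local_time(utc_time, longitude):
    """Hypothetical helper: shift a UTC timestamp by longitude / 15 hours."""
    # 360 degrees correspond to 24 hours, so 15 degrees correspond to one hour.
    return utc_time + timedelta(hours=longitude / 15.0)


noon_utc = datetime(2024, 1, 1, 12, 0)
east = approx_local_time(noon_utc, 30.0)    # 30 degrees east -> two hours ahead
west = approx_local_time(noon_utc, -75.0)   # 75 degrees west -> five hours behind
print(east, west)
```

This ignores political time zones and daylight saving time, so it can differ from wall-clock time by an hour or more.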

niimpy.preprocessing.location.location_significant_place_features(df, latitude_column='latitude', longitude_column='latitude', speed_column='speed', speed_threshold=0.277, resample_args={'rule': '1ME'}, **kwargs)[source]

Calculates features related to Significant Places.

Parameters:

df: dataframe with date index
config: A dictionary of optional arguments

Optional arguments in config:

longitude_column: The name of the column with longitude data in a floating point format. Defaults to 'longitude'.
latitude_column: The name of the column with latitude data in a floating point format. Defaults to 'latitude'.
speed_column: The name of the column with speed data in a floating point format. Defaults to 'speed'.
resample_args: A dictionary of arguments for the Pandas resample function. For example, to resample by hour, you would pass {"rule": "1h"}.

niimpy.preprocessing.location.number_of_significant_places(df, latitude_column='latitude', longitude_column='longitude', resample_args={'rule': '1ME'}, **kwargs)[source]

Computes number of significant places.

This feature is included in location_significant_place_features as n_sps, so this standalone function is not included in the default location features.