niimpy.preprocessing.communication module

niimpy.preprocessing.communication.call_count(df, config={})

Returns the number of calls received, missed, or initiated within the specified time window. If no time window is specified, the function uses a default window of 30 minutes. The counts are aggregated by user and by time window.

Parameters:
df: pandas.DataFrame

Input data frame

config: dict

Dictionary of optional arguments for computing the feature. Values can be column names, other dictionaries, etc. The function needs the name of the column where the data is stored; if none is given, the default name used by the Aware Framework is assumed. To configure the resampling window, pass the selected pandas.DataFrame.resample parameters in a dictionary under the key resample_args.

Returns:
result: dataframe

Resulting dataframe
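The aggregation described for call_count can be illustrated with plain pandas. This is a minimal sketch, not niimpy's implementation: the column names user and call_type, the sample data, and the way rule is read from resample_args are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical call log indexed by timestamp; the column names are
# illustrative assumptions, not niimpy's required schema.
calls = pd.DataFrame(
    {
        "user": ["u1", "u2", "u1", "u1"],
        "call_type": ["incoming", "outgoing", "incoming", "missed"],
    },
    index=pd.to_datetime(
        [
            "2024-01-01 09:05",
            "2024-01-01 09:10",
            "2024-01-01 09:20",
            "2024-01-01 09:40",
        ]
    ),
)

# Config-style dictionary: resample_args carries DataFrame.resample parameters.
config = {"resample_args": {"rule": "30min"}}
rule = config.get("resample_args", {}).get("rule", "30min")

# Count events per user, per call type, per time window.
call_counts = (
    calls.groupby(["user", "call_type", pd.Grouper(freq=rule)])
    .size()
    .unstack("call_type", fill_value=0)
)
print(call_counts)
```

Each row of call_counts is one (user, window start) pair and each column is one call type, so a window with no calls of a given type shows 0 in that column.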

niimpy.preprocessing.communication.call_distribution(df, config={})

Calculates the distribution of calls sent and received over a time interval. The function first aggregates the number of calls over a shorter time interval, the bins, and then calculates the distribution of the call count over a longer interval, the time window.

Parameters:
df: pandas.DataFrame

Input data frame

config: dict

Dictionary of optional arguments for computing the feature. Values can be column names, other dictionaries, etc.

This function accepts col_name (default “call_type”), a time interval (default 1d) and a bin interval (default 1h).

Returns:
result: dataframe

Resulting dataframe
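The two-stage aggregation (bins first, then the time window) might look like the following in plain pandas. This sketch assumes that "distribution" means each bin's share of the calls in its window; the page does not state the exact form, and the column names and data are hypothetical.

```python
import pandas as pd

# Hypothetical call log for one user across a single day.
calls = pd.DataFrame(
    {
        "user": ["u1", "u1", "u1", "u1"],
        "call_type": ["incoming", "outgoing", "incoming", "outgoing"],
    },
    index=pd.to_datetime(
        [
            "2024-01-01 09:05",
            "2024-01-01 09:30",
            "2024-01-01 14:10",
            "2024-01-01 20:45",
        ]
    ),
)

# Step 1: count calls in the short bins (1 hour).
binned = calls.groupby(["user", pd.Grouper(freq="1h")]).size().rename("n_calls")

# Step 2: total calls per user in the longer time window (1 day),
# broadcast back onto every bin of that window.
bin_start = binned.index.get_level_values(1)
window_total = binned.groupby(
    [binned.index.get_level_values("user"), bin_start.floor("1D")]
).transform("sum")

# Assumed definition: share of the window's calls that fall in each bin.
distribution = binned / window_total
print(distribution)
```

Under this assumed definition the shares within one user's time window sum to 1.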

niimpy.preprocessing.communication.call_duration_mean(df, config={})

Returns the average duration of each call type within the specified time window. The call types are incoming, outgoing, and missed. If no time window is specified, the function uses a default window of 30 minutes. The averages are aggregated by user and by time window.

Parameters:
df: pandas.DataFrame

Input data frame

config: dict

Dictionary of optional arguments for computing the feature. Values can be column names, other dictionaries, etc. The function needs the name of the column where the data is stored; if none is given, the default name used by the Aware Framework is assumed. To configure the resampling window, pass the selected pandas.DataFrame.resample parameters in a dictionary under the key resample_args.

Returns:
result: dataframe

Resulting dataframe
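As a rough illustration of averaging by call type, user, and time window, the sketch below uses plain pandas. The column name call_duration and the sample values are assumptions for the example and may differ from the defaults niimpy expects.

```python
import pandas as pd

# Hypothetical call log; durations are in seconds.
calls = pd.DataFrame(
    {
        "user": ["u1", "u1", "u1"],
        "call_type": ["incoming", "incoming", "outgoing"],
        "call_duration": [60.0, 120.0, 30.0],
    },
    index=pd.to_datetime(
        ["2024-01-01 09:05", "2024-01-01 09:20", "2024-01-01 09:25"]
    ),
)

# Mean duration per user and 30-minute window, one column per call type.
mean_duration = (
    calls.groupby(["user", "call_type", pd.Grouper(freq="30min")])["call_duration"]
    .mean()
    .unstack("call_type")
)
print(mean_duration)
```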

niimpy.preprocessing.communication.call_duration_median(df, config={})

Returns the median duration of each call type within the specified time window. The call types are incoming, outgoing, and missed. If no time window is specified, the function uses a default window of 30 minutes. The medians are aggregated by user and by time window.

Parameters:
df: pandas.DataFrame

Input data frame

config: dict

Dictionary of optional arguments for computing the feature. Values can be column names, other dictionaries, etc. The function needs the name of the column where the data is stored; if none is given, the default name used by the Aware Framework is assumed. To configure the resampling window, pass the selected pandas.DataFrame.resample parameters in a dictionary under the key resample_args.

Returns:
result: dataframe

Resulting dataframe

niimpy.preprocessing.communication.call_duration_std(df, config={})

Returns the standard deviation of the duration of each call type within the specified time window. The call types are incoming, outgoing, and missed. If no time window is specified, the function uses a default window of 30 minutes. The values are aggregated by user and by time window.

Parameters:
df: pandas.DataFrame

Input data frame

config: dict

Dictionary of optional arguments for computing the feature. Values can be column names, other dictionaries, etc. The function needs the name of the column where the data is stored; if none is given, the default name used by the Aware Framework is assumed. To configure the resampling window, pass the selected pandas.DataFrame.resample parameters in a dictionary under the key resample_args.

Returns:
result: dataframe

Resulting dataframe

niimpy.preprocessing.communication.call_duration_total(df, config={})

Returns the total duration of each call type within the specified time window. The call types are incoming, outgoing, and missed. If no time window is specified, the function uses a default window of 30 minutes. The totals are aggregated by user and by time window.

Parameters:
df: pandas.DataFrame

Input data frame

config: dict

Dictionary of optional arguments for computing the feature. Values can be column names, other dictionaries, etc. The function needs the name of the column where the data is stored; if none is given, the default name used by the Aware Framework is assumed. To configure the resampling window, pass the selected pandas.DataFrame.resample parameters in a dictionary under the key resample_args.

Returns:
result: dataframe

Resulting dataframe
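The duration features (total, mean, median, standard deviation) differ only in the statistic applied to each group. The plain-pandas sketch below computes all four side by side for comparison; the column names and data are hypothetical, and the use of the sample standard deviation (pandas' default) is an assumption.

```python
import pandas as pd

# Hypothetical call log; durations are in seconds.
calls = pd.DataFrame(
    {
        "user": ["u1", "u1", "u1", "u1"],
        "call_type": ["incoming", "incoming", "outgoing", "incoming"],
        "call_duration": [60.0, 120.0, 45.0, 180.0],
    },
    index=pd.to_datetime(
        [
            "2024-01-01 09:02",
            "2024-01-01 09:10",
            "2024-01-01 09:15",
            "2024-01-01 09:25",
        ]
    ),
)

# One row per (user, call type, window); one column per statistic.
duration_stats = calls.groupby(
    ["user", "call_type", pd.Grouper(freq="30min")]
)["call_duration"].agg(["sum", "mean", "median", "std"])
print(duration_stats)
```

A group with a single call, such as the outgoing call here, has an undefined sample standard deviation, which pandas reports as NaN.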

niimpy.preprocessing.communication.call_outgoing_incoming_ratio(df, config={})

Returns the ratio of outgoing calls to incoming calls within the specified time window. If no time window is specified, the function uses a default window of 30 minutes. The ratios are aggregated by user and by time window.

Parameters:
df: pandas.DataFrame

Input data frame

config: dict

Dictionary of optional arguments for computing the feature. Values can be column names, other dictionaries, etc. The function needs the name of the column where the data is stored; if none is given, the default name used by the Aware Framework is assumed. To configure the resampling window, pass the selected pandas.DataFrame.resample parameters in a dictionary under the key resample_args.

Returns:
result: dataframe

Resulting dataframe
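The ratio can be sketched in plain pandas by counting each call type per window and dividing the two columns. The column names and data below are hypothetical, and the sketch does not claim to match how niimpy treats windows without incoming calls.

```python
import pandas as pd

# Hypothetical call log for one user.
calls = pd.DataFrame(
    {
        "user": ["u1"] * 5,
        "call_type": ["outgoing", "incoming", "outgoing", "incoming", "outgoing"],
    },
    index=pd.to_datetime(
        [
            "2024-01-01 09:01",
            "2024-01-01 09:05",
            "2024-01-01 09:12",
            "2024-01-01 09:40",
            "2024-01-01 09:45",
        ]
    ),
)

# Count each call type per user and 30-minute window.
type_counts = (
    calls.groupby(["user", "call_type", pd.Grouper(freq="30min")])
    .size()
    .unstack("call_type", fill_value=0)
)

# Outgoing calls divided by incoming calls in the same window.
ratio = type_counts["outgoing"] / type_counts["incoming"]
print(ratio)
```

In this sketch a window with outgoing calls but no incoming calls would yield inf under pandas division, so a real pipeline may want an explicit rule for that case.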

niimpy.preprocessing.communication.extract_features_comms(df, features=None)

Computes and organizes the selected features for call and SMS events. The features are aggregated by user and by time window. If no time window is specified, the features are aggregated in non-overlapping 30-minute windows.

The complete list of features that can be calculated is: call_duration_total, call_duration_mean, call_duration_median, call_duration_std, call_count, call_outgoing_incoming_ratio, sms_count

Parameters:
df: pandas.DataFrame

Input data frame

features: dict, optional

Dictionary whose keys name the features to compute. If none is given, all features are computed.

Returns:
result: dataframe

Resulting dataframe
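The general pattern of selecting features through a dictionary and combining the results can be sketched as follows. Everything here is hypothetical: the feature functions event_count and duration_total, the REGISTRY mapping, and extract_features_sketch are stand-ins for illustration and are not niimpy's API.

```python
import pandas as pd


def event_count(df, config):
    """Hypothetical feature: number of events per user and window."""
    rule = config.get("resample_args", {}).get("rule", "30min")
    return df.groupby(["user", pd.Grouper(freq=rule)]).size().rename("event_count")


def duration_total(df, config):
    """Hypothetical feature: summed duration per user and window."""
    rule = config.get("resample_args", {}).get("rule", "30min")
    grouped = df.groupby(["user", pd.Grouper(freq=rule)])["call_duration"]
    return grouped.sum().rename("duration_total")


# Hypothetical registry mapping feature names to functions.
REGISTRY = {"event_count": event_count, "duration_total": duration_total}


def extract_features_sketch(df, features=None):
    """Compute the selected features and join them column-wise."""
    if features is None:
        # No selection given: compute every registered feature with defaults.
        features = {name: {} for name in REGISTRY}
    computed = [REGISTRY[name](df, cfg) for name, cfg in features.items()]
    return pd.concat(computed, axis=1)


calls = pd.DataFrame(
    {"user": ["u1", "u2", "u1"], "call_duration": [60.0, 10.0, 30.0]},
    index=pd.to_datetime(
        ["2024-01-01 09:05", "2024-01-01 09:10", "2024-01-01 09:20"]
    ),
)

all_features = extract_features_sketch(calls)
only_counts = extract_features_sketch(
    calls, {"event_count": {"resample_args": {"rule": "1h"}}}
)
print(all_features)
```

Keeping each feature's configuration next to its name lets one call use different resampling windows for different features.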

niimpy.preprocessing.communication.group_data(df)

Group the dataframe by a standard set of columns listed in group_by_columns.
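Grouping by a standard set of columns could look like the sketch below. The contents of group_by_columns and the choice to skip columns that are absent from the data frame are assumptions for illustration; the page does not list the actual columns.

```python
import pandas as pd

# Assumed contents; the real list in niimpy may differ.
group_by_columns = ["user", "device"]


def group_data_sketch(df):
    """Group by whichever standard columns are present in df."""
    present = [col for col in group_by_columns if col in df.columns]
    return df.groupby(present)


events = pd.DataFrame(
    {"user": ["u1", "u1", "u2"], "call_duration": [60.0, 30.0, 10.0]}
)
grouped = group_data_sketch(events)
print(grouped.size())
```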

niimpy.preprocessing.communication.message_count(df, config={})

Returns the number of SMS messages sent or received within the specified time window. If no time window is specified, the function uses a default window of 30 minutes. The counts are aggregated by user and by time window.

Parameters:
df: pandas.DataFrame

Input data frame

config: dict

Dictionary of optional arguments for computing the feature. Values can be column names, other dictionaries, etc. The function needs the name of the column where the data is stored; if none is given, the default name used by the Aware Framework is assumed. To configure the resampling window, pass the selected pandas.DataFrame.resample parameters in a dictionary under the key resample_args.

Returns:
result: dataframe

Resulting dataframe
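Because resample_args holds pandas.DataFrame.resample parameters, one plausible way to use it is to unpack the dictionary directly into a resample call. The sketch below does this with plain pandas; the column name message_type and the data are hypothetical, and this is not necessarily how niimpy applies the arguments internally.

```python
import pandas as pd

# Hypothetical SMS log for one user.
messages = pd.DataFrame(
    {
        "user": ["u1", "u1", "u1"],
        "message_type": ["incoming", "outgoing", "incoming"],
    },
    index=pd.to_datetime(
        ["2024-01-01 09:05", "2024-01-01 09:50", "2024-01-01 11:10"]
    ),
)

# Any resample keyword can be placed in this nested dictionary.
config = {"resample_args": {"rule": "1h"}}
resample_args = config.get("resample_args", {"rule": "30min"})

# Count messages per user and window; empty windows between the first
# and last event of a user appear with a count of 0.
message_counts = (
    messages.groupby("user")["message_type"].resample(**resample_args).count()
)
print(message_counts)
```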

niimpy.preprocessing.communication.message_distribution(df, config={})

Calculates the distribution of messages sent and received over a time interval. The function first aggregates the number of messages over a shorter time interval, the bins, and then calculates the distribution of the message count over a longer interval, the time window.

Parameters:
df: pandas.DataFrame

Input data frame

config: dict

Dictionary of optional arguments for computing the feature. Values can be column names, other dictionaries, etc.

This function accepts col_name, a time interval (default 1d) and a bin interval (default 1h).

If col_name is given, the data is first filtered to remove NaN values in that column.

Returns:
result: dataframe

Resulting dataframe
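The NaN filtering followed by binning can be sketched for a single user in plain pandas. The column name message_type and the data are hypothetical, and reading "distribution" as each bin's share of the day's messages is an assumption.

```python
import pandas as pd

# Hypothetical SMS log; one row has no value in the selected column.
messages = pd.DataFrame(
    {"message_type": ["outgoing", None, "incoming", "outgoing"]},
    index=pd.to_datetime(
        [
            "2024-01-01 08:10",
            "2024-01-01 08:40",
            "2024-01-01 10:15",
            "2024-01-01 10:45",
        ]
    ),
)

col_name = "message_type"

# Drop rows where the selected column is missing.
filtered = messages.dropna(subset=[col_name])

# Count messages per 1-hour bin, then divide by the total of the
# 1-day window that each bin belongs to.
hourly = filtered.resample("1h").size()
daily_total = hourly.groupby(hourly.index.floor("1D")).transform("sum")
shares = hourly / daily_total
print(shares)
```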

niimpy.preprocessing.communication.message_outgoing_incoming_ratio(df, config={})

Returns the ratio of outgoing messages to incoming messages within the specified time window. If no time window is specified, the function uses a default window of 30 minutes. The ratios are aggregated by user and by time window.

Parameters:
df: pandas.DataFrame

Input data frame

config: dict

Dictionary of optional arguments for computing the feature. Values can be column names, other dictionaries, etc. The function needs the name of the column where the data is stored; if none is given, the default name used by the Aware Framework is assumed. To configure the resampling window, pass the selected pandas.DataFrame.resample parameters in a dictionary under the key resample_args.

Returns:
result: dataframe

Resulting dataframe

niimpy.preprocessing.communication.reset_groups(df)

Reset the grouping applied by group_data, moving the standard set of columns listed in group_by_columns from the index back to regular columns.