niimpy.preprocessing.util module
- niimpy.preprocessing.util.aggregate(df, freq, method_numerical='mean', method_categorical='first', groups=['user'], **resample_kwargs)
Group and resample the data. Numerical and categorical columns are resampled separately, each with its own method.
- Parameters:
- df : pandas.DataFrame
Dataframe to resample.
- freq : str
Resampling frequency. Requires the dataframe to have a datetime-like index.
- method_numerical : str
Resampling method for numerical columns. Possible values: ‘sum’, ‘mean’, ‘median’. Default is ‘mean’.
- method_categorical : str
Resampling method for categorical columns. Possible values: ‘first’, ‘mode’, ‘last’. Default is ‘first’.
- groups : list
Columns used for the groupby operation. Default is [‘user’].
- resample_kwargs : dict
Keyword arguments passed to the pandas resampling function.
- Returns:
- An aggregated and resampled multi-index dataframe.
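The grouping and separate resampling described above can be sketched with plain pandas. This is a minimal illustration, not niimpy's implementation: the name aggregate_sketch is hypothetical, only methods that pandas accepts as aggregation strings (such as ‘mean’ or ‘first’) are handled, and empty time bins are omitted rather than filled.

```python
import pandas as pd


def aggregate_sketch(df, freq, method_numerical="mean",
                     method_categorical="first", groups=("user",)):
    """Hypothetical sketch: group by `groups`, then aggregate per time bin."""
    groups = list(groups)
    values = df.drop(columns=groups)
    # Split the value columns by type so each kind gets its own method.
    numerical = list(values.select_dtypes(include="number").columns)
    categorical = [c for c in values.columns if c not in numerical]
    # Group by the chosen columns plus time bins taken from the datetime index.
    keys = groups + [pd.Grouper(freq=freq)]
    parts = []
    if numerical:
        parts.append(df.groupby(keys)[numerical].agg(method_numerical))
    if categorical:
        parts.append(df.groupby(keys)[categorical].agg(method_categorical))
    # Both parts share the same (group, time) multi-index.
    return pd.concat(parts, axis=1)


example = pd.DataFrame(
    {
        "user": ["a", "b", "a", "a"],
        "steps": [10, 5, 20, 30],
        "activity": ["walk", "sit", "run", "walk"],
    },
    index=pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 08:00",
        "2024-01-01 12:00", "2024-01-02 06:00",
    ]),
)
daily = aggregate_sketch(example, "1D")
```

Here daily has one row per user and day, with the mean of steps and the first activity seen on that day.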
- niimpy.preprocessing.util.date_range(df, start, end)
Extract a certain date range from a DataFrame. The index must contain the dates, and the index must be sorted.
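The requirement for a sorted date index matches how label-based slicing works in pandas. The following is a minimal sketch of that idea, not niimpy's code: date_range_sketch is a hypothetical name, and treating both endpoints as inclusive is an assumption.

```python
import pandas as pd


def date_range_sketch(df, start, end):
    """Hypothetical sketch: slice a sorted DatetimeIndex by label."""
    # Label slicing on a sorted index includes both endpoints.
    return df.loc[pd.Timestamp(start):pd.Timestamp(end)]


days = pd.DataFrame(
    {"value": range(10)},
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
)
subset = date_range_sketch(days, "2024-01-03", "2024-01-05")
```

With the daily index above, subset holds the three rows for 3, 4 and 5 January.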
- niimpy.preprocessing.util.df_normalize(df, tz=None, old_tz=None)
Normalize a dataframe (from SQL) before presenting it to the user.
This sets the dataframe index to the time values and converts the times to pandas.Timestamp objects. Modifies the dataframe in place.
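As a rough illustration of this kind of normalization, the sketch below converts a time column to timezone-aware timestamps and uses it as the index, modifying the dataframe in place. Everything specific here is an assumption and not taken from niimpy: the function name normalize_sketch, a column called "time", Unix seconds as the stored unit, and the default timezone.

```python
import pandas as pd


def normalize_sketch(df, tz="Europe/Helsinki"):
    """Hypothetical sketch: index a dataframe by its time values, in place."""
    # Assumption: "time" holds Unix timestamps in seconds (UTC).
    times = pd.to_datetime(df["time"], unit="s", utc=True).dt.tz_convert(tz)
    # Assigning to df.index changes the caller's dataframe; nothing is returned.
    df.index = pd.DatetimeIndex(times)


raw = pd.DataFrame({"time": [0, 3600], "value": [1, 2]})
normalize_sketch(raw)
```

After the call, raw is indexed by timezone-aware timestamps instead of integer positions.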
- niimpy.preprocessing.util.install_extensions()
Automatically install sqlite extension functions.
Only works on Linux for now, improvements welcome.
- niimpy.preprocessing.util.occurrence(series, bins=5, interval='1h')
Resamples by interval and counts, for each interval, the number of bins that contain data.
With default options, this reproduces the logic of the “occurrence” database function, without needing the database.
- Parameters:
- series : pandas.Series
A pandas series of pandas.Timestamps.
- bins : int
The number of bins each time interval is divided into.
- interval : str
Length of the time interval. Default is “1h”.
- Returns:
- pandas.DataFrame
Dataframe with timestamp index and ‘occurrence’ column.
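The counting logic can be sketched in plain pandas: split each interval into equal bins, note which bins contain at least one timestamp, and count those bins per interval. This is an illustration under assumptions, not niimpy's implementation: occurrence_sketch is a hypothetical name, and intervals without any data are simply left out of the result.

```python
import pandas as pd


def occurrence_sketch(series, bins=5, interval="1h"):
    """Hypothetical sketch: count occupied bins within each interval."""
    times = pd.to_datetime(pd.Series(series)).reset_index(drop=True)
    interval_td = pd.Timedelta(interval)
    bin_width = interval_td / bins  # e.g. 12 minutes for 5 bins per hour
    # Each timestamp maps to the start of its bin; keep each bin only once.
    bin_starts = times.dt.floor(bin_width).drop_duplicates()
    # Count how many distinct bins fall into each interval.
    counts = bin_starts.groupby(bin_starts.dt.floor(interval_td)).size()
    return counts.rename_axis("time").to_frame("occurrence")


stamps = pd.Series(pd.to_datetime([
    "2024-01-01 10:01", "2024-01-01 10:05",
    "2024-01-01 10:30", "2024-01-01 11:59",
]))
occupied = occurrence_sketch(stamps)
```

The first two timestamps share one 12-minute bin, so the 10:00 interval has two occupied bins and the 11:00 interval has one.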
- niimpy.preprocessing.util.set_encoding(df, to_encoding='utf-8', from_encoding='iso-8859-1')
Recode the dataframe to a different encoding. This is useful when the encoding of a data file was set incorrectly and UTF-8 characters appear garbled.
- Parameters:
- df : pandas.DataFrame
Dataframe to recode.
- to_encoding : str
Encoding to convert to. Default is ‘utf-8’.
- from_encoding : str
Encoding to convert from. Default is ‘iso-8859-1’.
- Returns:
- pandas.DataFrame
Recoded dataframe.
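Garbling of this kind typically arises when UTF-8 bytes are read as ISO-8859-1. It can be undone by encoding each string back to bytes with the wrong encoding and decoding with the right one. The sketch below shows that round trip on string values; recode_sketch is a hypothetical name and this is not necessarily how niimpy implements it.

```python
import pandas as pd


def recode_sketch(df, to_encoding="utf-8", from_encoding="iso-8859-1"):
    """Hypothetical sketch: undo a wrong decoding on every string value."""
    def fix(value):
        # Only strings are recoded; numbers and missing values pass through.
        if isinstance(value, str):
            return value.encode(from_encoding).decode(to_encoding)
        return value

    out = df.copy()
    for column in out.columns:
        out[column] = out[column].map(fix)
    return out


# Simulate a file whose UTF-8 bytes were read as ISO-8859-1.
garbled = "päivä".encode("utf-8").decode("iso-8859-1")
fixed = recode_sketch(pd.DataFrame({"text": [garbled], "n": [1]}))
```

A string that contains characters outside from_encoding raises an error in this sketch, so it only suits data that was garbled in exactly this way.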
- niimpy.preprocessing.util.tmp_timezone(new_tz)
Temporarily override the global timezone for a block.
This is used as a context manager:
with tmp_timezone('Europe/Berlin'): ....
Note: this overrides the global timezone. In the future, there will be a way to handle timezones as non-global variables, which should be preferred.
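The general pattern behind such a context manager is to save the global value, replace it, and restore it when the block exits, even on error. The sketch below shows the pattern with a stand-in setting. The names _settings, current_tz and tmp_timezone_sketch are hypothetical and do not reflect how niimpy stores its timezone.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a module-level (global) timezone setting.
_settings = {"tz": "Europe/Helsinki"}


def current_tz():
    """Return the timezone currently in effect."""
    return _settings["tz"]


@contextmanager
def tmp_timezone_sketch(new_tz):
    """Hypothetical sketch: override the global timezone inside a with-block."""
    old_tz = _settings["tz"]
    _settings["tz"] = new_tz
    try:
        yield
    finally:
        # Restore the previous value even if the block raised an exception.
        _settings["tz"] = old_tz


with tmp_timezone_sketch("Europe/Berlin"):
    inside = current_tz()
outside = current_tz()
```

Because the state is global, code running concurrently would also see the temporary value, which is the limitation the note above refers to.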