niimpy.preprocessing.survey module
- niimpy.preprocessing.survey.clean_survey_column_names(df)
This function takes a pandas DataFrame as input and cleans the column names by removing or replacing specified characters. It helps to ensure standardized and clean column names for further analysis or processing.
- Parameters:
- df: pandas.DataFrame
The input DataFrame with column names to be cleaned.
- Returns:
- df: pandas.DataFrame
The DataFrame with cleaned column names.
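The description above does not list which characters are removed or replaced, so the following is only a minimal sketch of the general technique, not niimpy's implementation. The replacement table `REPLACEMENTS`, the lowercasing step, and the helper name `clean_column_names_sketch` are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical replacement table; the characters niimpy actually handles may differ.
REPLACEMENTS = {" ": "_", "?": "", ",": "", "-": "_"}


def clean_column_names_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the characters in REPLACEMENTS replaced in column names."""

    def clean(name: str) -> str:
        for old, new in REPLACEMENTS.items():
            name = name.replace(old, new)
        # Lowercasing is an extra assumption, added to make names uniform.
        return name.lower()

    # rename() accepts a callable that is applied to every column label.
    return df.rename(columns=clean)


raw = pd.DataFrame({"Little interest?": [1], "Feeling down, depressed": [2]})
cleaned = clean_column_names_sketch(raw)
print(list(cleaned.columns))
```

Because `rename` returns a new DataFrame, the input frame keeps its original column names in this sketch.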
- niimpy.preprocessing.survey.convert_survey_to_numerical_answer(df, id_map, use_prefix=False)
Convert text answers into numerical values (assuming a long dataframe). Uses answer mapping dictionaries provided by the user to convert the answers. Can convert multiple questions having the same prefix (e.g., PSS10_1, PSS10_2, …, PSS10_9) if a prefix mapping is provided. Answers that have not been specified for conversion are returned with their original values.
- Parameters:
- df: pandas.DataFrame
Dataframe containing the questions.
- answer_col: str
Name of the column containing the answers.
- question_id: str
Name of the column containing the question id.
- id_map: dictionary
Dictionary containing answer mappings (value) for each question_id (key), or a dictionary containing a map for each question id prefix if the use_prefix option is used.
- use_prefix: boolean
If False, uses the given map (id_map) to convert questions. If True, uses the question id prefix map, so that multiple question ids sharing the same prefix can be converted at the same time. The default is False.
- Returns:
- result: pandas.Series
Series containing the converted values, and the original values for answers that are not supposed to be converted.
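The prefix-based lookup described above can be sketched in plain pandas. This is an illustrative reimplementation under stated assumptions, not niimpy's code: the helper name `convert_answers_sketch`, the column names `id` and `answer`, and the rule that the prefix is everything before the last underscore are all assumptions.

```python
import pandas as pd


def convert_answers_sketch(df, id_map, answer_col="answer",
                           question_id="id", use_prefix=False):
    """Map text answers to numbers; unmapped answers keep their original value."""

    def convert(row):
        qid = row[question_id]
        # Assumption: the prefix is the part before the last underscore.
        key = qid.rsplit("_", 1)[0] if use_prefix else qid
        mapping = id_map.get(key, {})
        return mapping.get(row[answer_col], row[answer_col])

    return df.apply(convert, axis=1)


long_df = pd.DataFrame({
    "id": ["PSS10_1", "PSS10_2", "OTHER_1"],
    "answer": ["never", "often", "free text"],
})
# One mapping shared by every question whose id starts with the PSS10 prefix.
pss_map = {"PSS10": {"never": 0, "sometimes": 2, "often": 3}}

converted = convert_answers_sketch(long_df, pss_map, use_prefix=True)
print(converted.tolist())
```

The `OTHER_1` row has no entry in the map, so its answer passes through unchanged, matching the documented fallback behaviour.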
- niimpy.preprocessing.survey.extract_features_survey(df, features=None)
Calculates survey features
- Parameters:
- df: pd.DataFrame
Dataframe of survey data. Must follow the Niimpy format. In addition, each survey question must be in a single column, and the column name must be formatted as survey-id_question-number (for example PHQ9_3).
- features: map (dictionary) of functions that compute features
A map of maps, where the keys of the outer map are the names of the functions that compute features and each nested map contains the keyword arguments to that function. If a function takes no arguments, use an empty map. The default is None, in which case all the available functions are used. Those functions are in the dict survey.ALL_FEATURES. You can implement your own function and use it instead, or add it to that dict.
- Returns:
- features: pd.DataFrame
Dataframe of computed features, where the index is users and the columns are the features.
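The "map of maps" passed as features can be illustrated with a small dispatch loop. This sketch only shows the pattern and does not call niimpy. The feature functions `answer_mean` and `answer_max`, the `user` column, the dict `ALL_FEATURES_SKETCH`, and the use of function objects as keys are assumptions for illustration.

```python
import pandas as pd


def answer_mean(df, prefix="PHQ9"):
    """Hypothetical feature: per-user mean over all questions with the prefix."""
    cols = [c for c in df.columns if c.startswith(prefix + "_")]
    return df.groupby("user")[cols].mean().mean(axis=1).rename("answer_mean")


def answer_max(df, prefix="PHQ9"):
    """Hypothetical feature: per-user maximum answer over the same questions."""
    cols = [c for c in df.columns if c.startswith(prefix + "_")]
    return df.groupby("user")[cols].max().max(axis=1).rename("answer_max")


# Stand-in for a registry of default feature functions.
ALL_FEATURES_SKETCH = {answer_mean: {}, answer_max: {}}


def extract_features_sketch(df, features=None):
    """Call each feature function with its keyword arguments and join the results."""
    if features is None:
        features = ALL_FEATURES_SKETCH
    computed = [func(df, **kwargs) for func, kwargs in features.items()]
    return pd.concat(computed, axis=1)


surveys = pd.DataFrame({
    "user": ["a", "a", "b"],
    "PHQ9_1": [1, 3, 0],
    "PHQ9_2": [2, 2, 1],
})
# Outer keys select the functions; each nested dict holds keyword arguments.
result = extract_features_sketch(surveys, {answer_mean: {"prefix": "PHQ9"}, answer_max: {}})
print(result)
```

An empty nested dict means the function runs with its defaults, which mirrors the "use an empty map" instruction above.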
- niimpy.preprocessing.survey.group_data(df)
Group the dataframe by a standard set of columns listed in group_by_columns.
- niimpy.preprocessing.survey.reset_groups(df)
Group the dataframe by a standard set of columns listed in group_by_columns.
- niimpy.preprocessing.survey.sum_survey_scores(df, survey_prefix=None)
Sum all columns (like PHQ9_*) to get a survey score.
- Parameters:
- df: pandas DataFrame
The DataFrame should have a DateTime index, an answer_column with numeric scores, and an id_column with question IDs like “PHQ9_1”, “PHQ9_2”, etc. The given survey_prefix is the “PHQ9” (no underscore) part, which selects the right questions (rows not matching this prefix won’t be included).
- survey_prefix: string
The survey prefix in the ‘id’ column, e.g. ‘PHQ9’. An ‘_’ is appended.
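As a rough illustration of the prefix selection described above, the sketch below filters rows by `survey_prefix + "_"` and sums the numeric answers. It is not niimpy's implementation: the helper name `sum_scores_sketch`, the column names `id` and `answer`, and the choice to sum per timestamp are assumptions.

```python
import pandas as pd


def sum_scores_sketch(df, survey_prefix, id_column="id", answer_column="answer"):
    """Sum the answers of all questions whose id starts with survey_prefix + '_'."""
    matches = df[id_column].str.startswith(survey_prefix + "_")
    # Assumption: one survey score per timestamp in the DateTime index.
    return df.loc[matches].groupby(level=0)[answer_column].sum()


index = pd.to_datetime(
    ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-08", "2024-01-08"]
)
answers = pd.DataFrame(
    {
        "id": ["PHQ9_1", "PHQ9_2", "GAD7_1", "PHQ9_1", "PHQ9_2"],
        "answer": [1, 2, 3, 0, 1],
    },
    index=index,
)

scores = sum_scores_sketch(answers, "PHQ9")
print(scores)
```

The `GAD7_1` row does not match the prefix, so it is left out of both sums.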
- niimpy.preprocessing.survey.survey_statistic(df, config)
Return statistics for a single survey question or a list of questions. Assuming that each of the columns contains numerical values representing answers, this function returns the mean, maximum, minimum and standard deviation for each question in separate columns.
- Parameters:
- df: pandas.DataFrame
Input data frame
- config: dict
Dictionary containing optional arguments for the computation of survey statistics
- configuration options include:
- columns: string or list(string), optional
A list of columns to process. If empty, the prefix will be used to identify columns
- prefix: string or list(string)
Required unless columns is given. The function processes the columns whose names start with the prefix (QID_0, QID_1, …)
- Returns:
- dict: pandas.DataFrame
A dataframe containing summaries of each questionnaire.
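The column selection and the four summary statistics can be sketched as follows. This is a simplified illustration that summarises the whole frame at once and is not niimpy's implementation. The helper name `survey_statistic_sketch` and the output column names such as `QID_0_mean` are assumptions.

```python
import pandas as pd


def survey_statistic_sketch(df, config):
    """Compute mean, min, max and standard deviation for the selected questions."""
    columns = config.get("columns")
    if not columns:
        # Fall back to the prefix when no explicit columns are given.
        prefixes = config["prefix"]
        if isinstance(prefixes, str):
            prefixes = [prefixes]
        columns = [c for c in df.columns if c.startswith(tuple(prefixes))]
    elif isinstance(columns, str):
        columns = [columns]

    stats = {}
    for col in columns:
        # Hypothetical naming scheme: <question>_<statistic>.
        stats[f"{col}_mean"] = df[col].mean()
        stats[f"{col}_min"] = df[col].min()
        stats[f"{col}_max"] = df[col].max()
        stats[f"{col}_std"] = df[col].std()
    return pd.DataFrame([stats])


answers_wide = pd.DataFrame({
    "QID_0": [1, 2, 3],
    "QID_1": [4, 4, 4],
    "other": [9, 9, 9],
})
summary = survey_statistic_sketch(answers_wide, {"prefix": "QID"})
print(summary.T)
```

Passing `{"columns": "QID_0"}` instead restricts the summary to that single question, since an explicit column list takes precedence over the prefix in this sketch.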