{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Survey Data\n",
"\n",
"A single survey row can contain answers to multiple questions. The survey dataframe should contain a `user` column with the user ID. In addition, multiple columns with answers to survey questions should be provided (see the example below for clarification). Each column title represents the question and the value on a given row represents the answer. As usual, the DataFrame index is the timestamp of the answer.\n",
"\n",
"Question titles should be converted into a string with a questionnaire prefix and a question number. For example, the first question in \"PHQ2\" would be \"PHQ2_1\". We provide utilities for converting some common questionnaires to this format, as shown below. Similarly, answers should be converted into numerical values."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Artificial example survey data\n",
"import pandas as pd\n",
"import niimpy\n",
"from niimpy import config\n",
"import niimpy.preprocessing.survey as survey\n",
"from niimpy.preprocessing.survey import *\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user age gender Little interest or pleasure in doing things. \\\n",
"0 1 20 Male several-days \n",
"1 2 32 Male more-than-half-the-days \n",
"2 3 15 Male more-than-half-the-days \n",
"3 4 35 Female not-at-all \n",
"4 5 23 Male more-than-half-the-days \n",
"\n",
" Feeling down; depressed or hopeless. Feeling nervous; anxious or on edge. \\\n",
"0 more-than-half-the-days not-at-all \n",
"1 more-than-half-the-days not-at-all \n",
"2 not-at-all several-days \n",
"3 nearly-every-day not-at-all \n",
"4 not-at-all more-than-half-the-days \n",
"\n",
" Not being able to stop or control worrying. \\\n",
"0 nearly-every-day \n",
"1 several-days \n",
"2 not-at-all \n",
"3 several-days \n",
"4 several-days \n",
"\n",
" In the last month; how often have you felt that you were unable to control the important things in your life? \\\n",
"0 almost-never \n",
"1 never \n",
"2 never \n",
"3 very-often \n",
"4 almost-never \n",
"\n",
" In the last month; how often have you felt confident about your ability to handle your personal problems? \\\n",
"0 sometimes \n",
"1 never \n",
"2 very-often \n",
"3 fairly-often \n",
"4 very-often \n",
"\n",
" In the last month; how often have you felt that things were going your way? \\\n",
"0 fairly-often \n",
"1 very-often \n",
"2 very-often \n",
"3 very-often \n",
"4 almost-never \n",
"\n",
" In the last month; how often have you been able to control irritations in your life? \\\n",
"0 never \n",
"1 sometimes \n",
"2 fairly-often \n",
"3 never \n",
"4 sometimes \n",
"\n",
" In the last month; how often have you felt that you were on top of things? \\\n",
"0 sometimes \n",
"1 never \n",
"2 never \n",
"3 sometimes \n",
"4 sometimes \n",
"\n",
" In the last month; how often have you been angered because of things that were outside of your control? \\\n",
"0 very-often \n",
"1 fairly-often \n",
"2 never \n",
"3 never \n",
"4 very-often \n",
"\n",
" In the last month; how often have you felt difficulties were piling up so high that you could not overcome them? \n",
"0 fairly-often \n",
"1 never \n",
"2 almost-never \n",
"3 fairly-often \n",
"4 never "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = niimpy.read_csv(config.SURVEY_PATH, tz='Europe/Helsinki')\n",
"df.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preprocessing\n",
"\n",
"Currently the dataframe columns are the raw questions and answers from the survey. We will use `Niimpy` to convert them to a numerical format, but first the dataframe should follow the general `Niimpy` schema: the rows should be indexed by a datetime index, rather than a number.\n",
"\n",
"Since the data does not contain a timestamp, we must assume that each user has only completed the survey once. If the surveys were completed on January 1st 2020, for example, we would replace the index with this date."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Assign the same time index to all survey responses\n",
"df.index = [pd.Timestamp(\"2020-01-01\", tz='Europe/Helsinki')] * df.shape[0]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we will convert the questions to a standard identifier format `Niimpy` will understand. The questions are from the PHQ2, GAD2 and PSS10 standard surveys, and `Niimpy` provides mappings from raw question text to question IDs for these surveys. Each identifier is constructed from a prefix (the questionnaire category: GAD, PHQ, PSQI etc.), followed by the question number (1, 2, 3, ...). You can define your own identifiers or use the ones provided by `Niimpy`.\n",
"\n",
"Before applying the mapping, the column names should be cleaned using the `clean_survey_column_names` function. This removes punctuation in the question text."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Little interest or pleasure in doing things': 'PHQ2_1',\n",
" 'Feeling down depressed or hopeless': 'PHQ2_2'}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# For example, the mapping dictionary for PHQ2 is\n",
"PHQ2_MAP"
]
},
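{
 "attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "If your questionnaire is not covered by the built-in mappings, you can define your own: any `{question_text: question_id}` dictionary works, as long as the IDs follow the `<prefix>_<number>` convention. The questions and the \"EX\" prefix below are hypothetical, purely for illustration."
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "# A hypothetical custom mapping; keys are cleaned question texts,\n",
  "# values follow the \"<prefix>_<number>\" identifier convention\n",
  "CUSTOM_MAP = {\n",
  "    \"How often do you exercise\": \"EX_1\",\n",
  "    \"How many hours do you sleep per night\": \"EX_2\",\n",
  "}"
 ]
},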
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user age gender PHQ2_1 \\\n",
"2020-01-01 00:00:00+02:00 1 20 Male several-days \n",
"2020-01-01 00:00:00+02:00 2 32 Male more-than-half-the-days \n",
"2020-01-01 00:00:00+02:00 3 15 Male more-than-half-the-days \n",
"2020-01-01 00:00:00+02:00 4 35 Female not-at-all \n",
"2020-01-01 00:00:00+02:00 5 23 Male more-than-half-the-days \n",
"\n",
" PHQ2_2 GAD2_1 \\\n",
"2020-01-01 00:00:00+02:00 more-than-half-the-days not-at-all \n",
"2020-01-01 00:00:00+02:00 more-than-half-the-days not-at-all \n",
"2020-01-01 00:00:00+02:00 not-at-all several-days \n",
"2020-01-01 00:00:00+02:00 nearly-every-day not-at-all \n",
"2020-01-01 00:00:00+02:00 not-at-all more-than-half-the-days \n",
"\n",
" GAD2_2 PSS10_2 PSS10_4 \\\n",
"2020-01-01 00:00:00+02:00 nearly-every-day almost-never sometimes \n",
"2020-01-01 00:00:00+02:00 several-days never never \n",
"2020-01-01 00:00:00+02:00 not-at-all never very-often \n",
"2020-01-01 00:00:00+02:00 several-days very-often fairly-often \n",
"2020-01-01 00:00:00+02:00 several-days almost-never very-often \n",
"\n",
" PSS10_5 PSS10_6 PSS10_7 \\\n",
"2020-01-01 00:00:00+02:00 fairly-often never sometimes \n",
"2020-01-01 00:00:00+02:00 very-often sometimes never \n",
"2020-01-01 00:00:00+02:00 very-often fairly-often never \n",
"2020-01-01 00:00:00+02:00 very-often never sometimes \n",
"2020-01-01 00:00:00+02:00 almost-never sometimes sometimes \n",
"\n",
" PSS10_8 PSS10_9 \n",
"2020-01-01 00:00:00+02:00 very-often fairly-often \n",
"2020-01-01 00:00:00+02:00 fairly-often never \n",
"2020-01-01 00:00:00+02:00 never almost-never \n",
"2020-01-01 00:00:00+02:00 never fairly-often \n",
"2020-01-01 00:00:00+02:00 very-often never "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Convert column name to id, based on provided mappers from niimpy\n",
"column_map = {**PHQ2_MAP, **PSS10_MAP, **GAD2_MAP}\n",
"df = survey.clean_survey_column_names(df)\n",
"df = df.rename(column_map, axis = 1)\n",
"df.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the dataframe follows the `Niimpy` standard schema. Next we will use `niimpy` to convert the raw answers to numerical values for further analysis. For this, we need a mapping `{raw_answer: numerical_answer}`, which `niimpy` provides within the `survey` module. You can also use your own mapping.\n",
"\n",
"Based on the question's ID, `niimpy` maps the raw answers to their numerical representation."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'PSS': {'never': 0,\n",
" 'almost never': 1,\n",
" 'sometimes': 2,\n",
" 'fairly often': 3,\n",
" 'very often': 4},\n",
" 'PHQ2': {'not at all': 0,\n",
" 'several days': 1,\n",
" 'more than half the days': 2,\n",
" 'nearly every day': 3},\n",
" 'GAD2': {'not at all': 0,\n",
" 'several days': 1,\n",
" 'more than half the days': 2,\n",
" 'nearly every day': 3}}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The mapping dictionary included in Niimpy is\n",
"ID_MAP_PREFIX"
]
},
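{
 "attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "A custom answer mapping follows the same shape: a dictionary keyed by questionnaire prefix, whose values map each raw answer to a number. The \"MYQ\" prefix and answer scale below are hypothetical, purely for illustration."
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "# A hypothetical custom answer mapping for a \"MYQ\" questionnaire prefix\n",
  "CUSTOM_ID_MAP = {\n",
  "    \"MYQ\": {\"never\": 0, \"sometimes\": 1, \"often\": 2},\n",
  "}"
 ]
},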
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user age gender PHQ2_1 PHQ2_2 GAD2_1 GAD2_2 \\\n",
"2020-01-01 00:00:00+02:00 1 20 Male 1 2 0 3 \n",
"2020-01-01 00:00:00+02:00 2 32 Male 2 2 0 1 \n",
"2020-01-01 00:00:00+02:00 3 15 Male 2 0 1 0 \n",
"2020-01-01 00:00:00+02:00 4 35 Female 0 3 0 1 \n",
"2020-01-01 00:00:00+02:00 5 23 Male 2 0 2 1 \n",
"\n",
" PSS10_2 PSS10_4 PSS10_5 PSS10_6 PSS10_7 \\\n",
"2020-01-01 00:00:00+02:00 1 2 3 0 2 \n",
"2020-01-01 00:00:00+02:00 0 0 4 2 0 \n",
"2020-01-01 00:00:00+02:00 0 4 4 3 0 \n",
"2020-01-01 00:00:00+02:00 4 3 4 0 2 \n",
"2020-01-01 00:00:00+02:00 1 4 1 2 2 \n",
"\n",
" PSS10_8 PSS10_9 \n",
"2020-01-01 00:00:00+02:00 4 3 \n",
"2020-01-01 00:00:00+02:00 3 0 \n",
"2020-01-01 00:00:00+02:00 0 1 \n",
"2020-01-01 00:00:00+02:00 0 3 \n",
"2020-01-01 00:00:00+02:00 4 0 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Transform raw answers to numerical values\n",
"transformed_df = survey.convert_survey_to_numerical_answer(\n",
" df, id_map=ID_MAP_PREFIX, use_prefix=True\n",
")\n",
"transformed_df.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Survey score sums\n",
"\n",
"Next we can calculate the sum of each survey using the survey ID in the column name."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user PHQ2 PSS10 GAD2\n",
"2020-01-01 00:00:00+02:00 1 3 15 3\n",
"2020-01-01 00:00:00+02:00 2 4 9 1\n",
"2020-01-01 00:00:00+02:00 3 2 12 1\n",
"2020-01-01 00:00:00+02:00 4 3 16 1\n",
"2020-01-01 00:00:00+02:00 5 2 14 3"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum_df = sum_survey_scores(transformed_df, [\"PHQ2\", \"PSS10\", \"GAD2\"])\n",
"sum_df.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Survey statistics\n",
"\n",
"Another common preprocessing step is to resample the results to reduce noise or simplify the data. The `survey.survey_statistic` function splits the results by time interval and returns relevant statistics of each survey sum or question column over that interval.\n",
"\n",
"Note that since the example data contains a single time point for each participant, the standard deviation is `NaN` and the mean, minimum and maximum all equal the single recorded value."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user PHQ2_mean PHQ2_min PHQ2_max PHQ2_std \\\n",
"2020-01-01 00:00:00+02:00 1 3.0 3.0 3.0 NaN \n",
"2020-01-01 00:00:00+02:00 2 4.0 4.0 4.0 NaN \n",
"2020-01-01 00:00:00+02:00 3 2.0 2.0 2.0 NaN \n",
"2020-01-01 00:00:00+02:00 4 3.0 3.0 3.0 NaN \n",
"2020-01-01 00:00:00+02:00 5 2.0 2.0 2.0 NaN \n",
"... ... ... ... ... ... \n",
"2020-01-01 00:00:00+02:00 996 3.0 3.0 3.0 NaN \n",
"2020-01-01 00:00:00+02:00 997 0.0 0.0 0.0 NaN \n",
"2020-01-01 00:00:00+02:00 998 2.0 2.0 2.0 NaN \n",
"2020-01-01 00:00:00+02:00 999 4.0 4.0 4.0 NaN \n",
"2020-01-01 00:00:00+02:00 1000 4.0 4.0 4.0 NaN \n",
"\n",
" PSS10_mean PSS10_min PSS10_max PSS10_std \\\n",
"2020-01-01 00:00:00+02:00 15.0 15.0 15.0 NaN \n",
"2020-01-01 00:00:00+02:00 9.0 9.0 9.0 NaN \n",
"2020-01-01 00:00:00+02:00 12.0 12.0 12.0 NaN \n",
"2020-01-01 00:00:00+02:00 16.0 16.0 16.0 NaN \n",
"2020-01-01 00:00:00+02:00 14.0 14.0 14.0 NaN \n",
"... ... ... ... ... \n",
"2020-01-01 00:00:00+02:00 17.0 17.0 17.0 NaN \n",
"2020-01-01 00:00:00+02:00 13.0 13.0 13.0 NaN \n",
"2020-01-01 00:00:00+02:00 13.0 13.0 13.0 NaN \n",
"2020-01-01 00:00:00+02:00 21.0 21.0 21.0 NaN \n",
"2020-01-01 00:00:00+02:00 14.0 14.0 14.0 NaN \n",
"\n",
" GAD2_mean GAD2_min GAD2_max GAD2_std \n",
"2020-01-01 00:00:00+02:00 3.0 3.0 3.0 NaN \n",
"2020-01-01 00:00:00+02:00 1.0 1.0 1.0 NaN \n",
"2020-01-01 00:00:00+02:00 1.0 1.0 1.0 NaN \n",
"2020-01-01 00:00:00+02:00 1.0 1.0 1.0 NaN \n",
"2020-01-01 00:00:00+02:00 3.0 3.0 3.0 NaN \n",
"... ... ... ... ... \n",
"2020-01-01 00:00:00+02:00 2.0 2.0 2.0 NaN \n",
"2020-01-01 00:00:00+02:00 1.0 1.0 1.0 NaN \n",
"2020-01-01 00:00:00+02:00 2.0 2.0 2.0 NaN \n",
"2020-01-01 00:00:00+02:00 5.0 5.0 5.0 NaN \n",
"2020-01-01 00:00:00+02:00 2.0 2.0 2.0 NaN \n",
"\n",
"[1000 rows x 13 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"survey.survey_statistic(sum_df, columns=[\"PHQ2\", \"PSS10\", \"GAD2\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`survey_statistic` also works for individual questions. You can select the questionnaire you want statistics for by passing its prefix as the `prefix` parameter, or pass a list of question columns as the `columns` parameter."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user PHQ2_1_mean PHQ2_1_min PHQ2_1_max \\\n",
"2020-01-01 00:00:00+02:00 1 1.0 1.0 1.0 \n",
"2020-01-01 00:00:00+02:00 2 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 3 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 4 0.0 0.0 0.0 \n",
"2020-01-01 00:00:00+02:00 5 2.0 2.0 2.0 \n",
"... ... ... ... ... \n",
"2020-01-01 00:00:00+02:00 996 0.0 0.0 0.0 \n",
"2020-01-01 00:00:00+02:00 997 0.0 0.0 0.0 \n",
"2020-01-01 00:00:00+02:00 998 1.0 1.0 1.0 \n",
"2020-01-01 00:00:00+02:00 999 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 1000 2.0 2.0 2.0 \n",
"\n",
" PHQ2_1_std PHQ2_2_mean PHQ2_2_min PHQ2_2_max \\\n",
"2020-01-01 00:00:00+02:00 NaN 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 NaN 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 NaN 0.0 0.0 0.0 \n",
"2020-01-01 00:00:00+02:00 NaN 3.0 3.0 3.0 \n",
"2020-01-01 00:00:00+02:00 NaN 0.0 0.0 0.0 \n",
"... ... ... ... ... \n",
"2020-01-01 00:00:00+02:00 NaN 3.0 3.0 3.0 \n",
"2020-01-01 00:00:00+02:00 NaN 0.0 0.0 0.0 \n",
"2020-01-01 00:00:00+02:00 NaN 1.0 1.0 1.0 \n",
"2020-01-01 00:00:00+02:00 NaN 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 NaN 2.0 2.0 2.0 \n",
"\n",
" PHQ2_2_std \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"... ... \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"\n",
"[1000 rows x 9 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d = survey.survey_statistic(transformed_df, prefix ='PHQ')\n",
"pd.DataFrame(d)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "niimpy",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}