{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Survey Data\n",
"\n",
"A single survey row can contain answers to multiple questions. The survey dataframe should contain a `user` column with the user ID. In addition, multiple columns with answers to survey questions should be provided (see the example below for clarification). Each column title represents the question and the value on a given row represents the answer. As usual, the DataFrame index is the timestamp of the answer.\n",
"\n",
"Question titles should be converted into a string with a questionnaire prefix and a question number. For example, the first question in \"PHQ2\" would be \"PHQ2_1\". We provide utilities for converting some common questionnaires to this format, as shown below. Similarly, answers should be converted into numerical values."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Artificial example survey data\n",
"import pandas as pd\n",
"import niimpy\n",
"from niimpy import config\n",
"import niimpy.preprocessing.survey as survey\n",
"from niimpy.preprocessing.survey import *\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user age gender Little interest or pleasure in doing things. \\\n",
"0 1 20 Male several-days \n",
"1 2 32 Male more-than-half-the-days \n",
"2 3 15 Male more-than-half-the-days \n",
"3 4 35 Female not-at-all \n",
"4 5 23 Male more-than-half-the-days \n",
"\n",
" Feeling down; depressed or hopeless. Feeling nervous; anxious or on edge. \\\n",
"0 more-than-half-the-days not-at-all \n",
"1 more-than-half-the-days not-at-all \n",
"2 not-at-all several-days \n",
"3 nearly-every-day not-at-all \n",
"4 not-at-all more-than-half-the-days \n",
"\n",
" Not being able to stop or control worrying. \\\n",
"0 nearly-every-day \n",
"1 several-days \n",
"2 not-at-all \n",
"3 several-days \n",
"4 several-days \n",
"\n",
" In the last month; how often have you felt that you were unable to control the important things in your life? \\\n",
"0 almost-never \n",
"1 never \n",
"2 never \n",
"3 very-often \n",
"4 almost-never \n",
"\n",
" In the last month; how often have you felt confident about your ability to handle your personal problems? \\\n",
"0 sometimes \n",
"1 never \n",
"2 very-often \n",
"3 fairly-often \n",
"4 very-often \n",
"\n",
" In the last month; how often have you felt that things were going your way? \\\n",
"0 fairly-often \n",
"1 very-often \n",
"2 very-often \n",
"3 very-often \n",
"4 almost-never \n",
"\n",
" In the last month; how often have you been able to control irritations in your life? \\\n",
"0 never \n",
"1 sometimes \n",
"2 fairly-often \n",
"3 never \n",
"4 sometimes \n",
"\n",
" In the last month; how often have you felt that you were on top of things? \\\n",
"0 sometimes \n",
"1 never \n",
"2 never \n",
"3 sometimes \n",
"4 sometimes \n",
"\n",
" In the last month; how often have you been angered because of things that were outside of your control? \\\n",
"0 very-often \n",
"1 fairly-often \n",
"2 never \n",
"3 never \n",
"4 very-often \n",
"\n",
" In the last month; how often have you felt difficulties were piling up so high that you could not overcome them? \n",
"0 fairly-often \n",
"1 never \n",
"2 almost-never \n",
"3 fairly-often \n",
"4 never "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = niimpy.read_csv(config.SURVEY_PATH, tz='Europe/Helsinki')\n",
"df.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preprocessing\n",
"\n",
"Currently the dataframe columns are the raw questions and answers from the survey. We will use `Niimpy` to convert them to a numerical format, but first the dataframe should follow the general `Niimpy` schema: the rows should be indexed by a datetime index, rather than a number.\n",
"\n",
"Since the data does not contain a timestamp, we must assume that each user has only completed the survey once. If the surveys were completed on January 1st 2020, for example, we would replace the index with this date."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Assign the same time index to all survey responses\n",
"df.index = [pd.Timestamp(\"2020-01-01\", tz='Europe/Helsinki')] * df.shape[0]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we will convert the questions to a standard identifier format `Niimpy` will understand. The questions are from the PHQ2, GAD2 and PSS10 standard surveys, and `Niimpy` provides mappings from raw question text to question IDs for these surveys. Each identifier is constructed from a prefix (the questionnaire category: GAD, PHQ, PSQI etc.), followed by the question number (1, 2, 3, ...). You can define your own identifiers or use the ones provided by `Niimpy`.\n",
"\n",
"Before applying the mapping, the column names should be cleaned using the `clean_survey_column_names` function. This removes punctuation in the question text."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Little interest or pleasure in doing things': 'PHQ2_1',\n",
" 'Feeling down depressed or hopeless': 'PHQ2_2'}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# For example, the mapping dictionary for PHQ2 is\n",
"PHQ2_MAP"
]
},
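{
 "attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "If your questionnaire is not covered by the built-in mappings, you can define your own: any `{question_text: question_id}` dictionary works, as long as the IDs follow the `<prefix>_<number>` convention. The questions and the \"EX\" prefix below are hypothetical, purely for illustration."
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "# A hypothetical custom mapping; keys are cleaned question texts,\n",
  "# values follow the \"<prefix>_<number>\" identifier convention\n",
  "CUSTOM_MAP = {\n",
  "    \"How often do you exercise\": \"EX_1\",\n",
  "    \"How many hours do you sleep per night\": \"EX_2\",\n",
  "}"
 ]
},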
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user age gender PHQ2_1 \\\n",
"2020-01-01 00:00:00+02:00 1 20 Male several-days \n",
"2020-01-01 00:00:00+02:00 2 32 Male more-than-half-the-days \n",
"2020-01-01 00:00:00+02:00 3 15 Male more-than-half-the-days \n",
"2020-01-01 00:00:00+02:00 4 35 Female not-at-all \n",
"2020-01-01 00:00:00+02:00 5 23 Male more-than-half-the-days \n",
"\n",
" PHQ2_2 GAD2_1 \\\n",
"2020-01-01 00:00:00+02:00 more-than-half-the-days not-at-all \n",
"2020-01-01 00:00:00+02:00 more-than-half-the-days not-at-all \n",
"2020-01-01 00:00:00+02:00 not-at-all several-days \n",
"2020-01-01 00:00:00+02:00 nearly-every-day not-at-all \n",
"2020-01-01 00:00:00+02:00 not-at-all more-than-half-the-days \n",
"\n",
" GAD2_2 PSS10_2 PSS10_4 \\\n",
"2020-01-01 00:00:00+02:00 nearly-every-day almost-never sometimes \n",
"2020-01-01 00:00:00+02:00 several-days never never \n",
"2020-01-01 00:00:00+02:00 not-at-all never very-often \n",
"2020-01-01 00:00:00+02:00 several-days very-often fairly-often \n",
"2020-01-01 00:00:00+02:00 several-days almost-never very-often \n",
"\n",
" PSS10_5 PSS10_6 PSS10_7 \\\n",
"2020-01-01 00:00:00+02:00 fairly-often never sometimes \n",
"2020-01-01 00:00:00+02:00 very-often sometimes never \n",
"2020-01-01 00:00:00+02:00 very-often fairly-often never \n",
"2020-01-01 00:00:00+02:00 very-often never sometimes \n",
"2020-01-01 00:00:00+02:00 almost-never sometimes sometimes \n",
"\n",
" PSS10_8 PSS10_9 \n",
"2020-01-01 00:00:00+02:00 very-often fairly-often \n",
"2020-01-01 00:00:00+02:00 fairly-often never \n",
"2020-01-01 00:00:00+02:00 never almost-never \n",
"2020-01-01 00:00:00+02:00 never fairly-often \n",
"2020-01-01 00:00:00+02:00 very-often never "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Convert column name to id, based on provided mappers from niimpy\n",
"column_map = {**PHQ2_MAP, **PSS10_MAP, **GAD2_MAP}\n",
"df = survey.clean_survey_column_names(df)\n",
"df = df.rename(column_map, axis = 1)\n",
"df.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the dataframe follows the `Niimpy` standard schema. Next we will use `niimpy` to convert the raw answers to numerical values for further analysis. For this, we need a mapping `{raw_answer: numerical_answer}`, which `niimpy` provides within the `survey` module. You can also use your own mapping.\n",
"\n",
"Based on the question's ID, `niimpy` maps the raw answers to their numerical representation."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'PSS': {'never': 0,\n",
" 'almost never': 1,\n",
" 'sometimes': 2,\n",
" 'fairly often': 3,\n",
" 'very often': 4},\n",
" 'PHQ2': {'not at all': 0,\n",
" 'several days': 1,\n",
" 'more than half the days': 2,\n",
" 'nearly every day': 3},\n",
" 'GAD2': {'not at all': 0,\n",
" 'several days': 1,\n",
" 'more than half the days': 2,\n",
" 'nearly every day': 3}}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The mapping dictionary included in Niimpy is\n",
"ID_MAP_PREFIX"
]
},
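{
 "attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "A custom answer mapping follows the same shape: a dictionary keyed by questionnaire prefix, whose values map each raw answer to a number. The \"MYQ\" prefix and answer scale below are hypothetical, purely for illustration."
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "# A hypothetical custom answer mapping for a \"MYQ\" questionnaire prefix\n",
  "CUSTOM_ID_MAP = {\n",
  "    \"MYQ\": {\"never\": 0, \"sometimes\": 1, \"often\": 2},\n",
  "}"
 ]
},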
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user age gender PHQ2_1 PHQ2_2 GAD2_1 GAD2_2 \\\n",
"2020-01-01 00:00:00+02:00 1 20 Male 1 2 0 3 \n",
"2020-01-01 00:00:00+02:00 2 32 Male 2 2 0 1 \n",
"2020-01-01 00:00:00+02:00 3 15 Male 2 0 1 0 \n",
"2020-01-01 00:00:00+02:00 4 35 Female 0 3 0 1 \n",
"2020-01-01 00:00:00+02:00 5 23 Male 2 0 2 1 \n",
"\n",
" PSS10_2 PSS10_4 PSS10_5 PSS10_6 PSS10_7 \\\n",
"2020-01-01 00:00:00+02:00 1 2 3 0 2 \n",
"2020-01-01 00:00:00+02:00 0 0 4 2 0 \n",
"2020-01-01 00:00:00+02:00 0 4 4 3 0 \n",
"2020-01-01 00:00:00+02:00 4 3 4 0 2 \n",
"2020-01-01 00:00:00+02:00 1 4 1 2 2 \n",
"\n",
" PSS10_8 PSS10_9 \n",
"2020-01-01 00:00:00+02:00 4 3 \n",
"2020-01-01 00:00:00+02:00 3 0 \n",
"2020-01-01 00:00:00+02:00 0 1 \n",
"2020-01-01 00:00:00+02:00 0 3 \n",
"2020-01-01 00:00:00+02:00 4 0 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Transform raw answers to numerical values\n",
"transformed_df = survey.convert_survey_to_numerical_answer(\n",
" df, id_map=ID_MAP_PREFIX, use_prefix=True\n",
")\n",
"transformed_df.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Survey score sums\n",
"\n",
"Next we can calculate the sum of each survey using the survey ID in the column name."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user PHQ2 PSS10 GAD2\n",
"2020-01-01 00:00:00+02:00 1 3 15 3\n",
"2020-01-01 00:00:00+02:00 2 4 9 1\n",
"2020-01-01 00:00:00+02:00 3 2 12 1\n",
"2020-01-01 00:00:00+02:00 4 3 16 1\n",
"2020-01-01 00:00:00+02:00 5 2 14 3"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum_df = sum_survey_scores(transformed_df, [\"PHQ2\", \"PSS10\", \"GAD2\"])\n",
"sum_df.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Survey statistics\n",
"\n",
"Another common preprocessing step is to resample the results to reduce noise or simplify the data. The `survey.survey_statistic` function splits the results by time interval and returns relevant statistics of each survey sum or question column over that interval.\n",
"\n",
"Note that since the example data contains a single time point for each participant, the standard deviation is `NaN` and the mean, minimum and maximum all equal the single recorded value."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user PHQ2_mean PHQ2_min PHQ2_max PHQ2_std \\\n",
"2020-01-01 00:00:00+02:00 1 3.0 3.0 3.0 NaN \n",
"2020-01-01 00:00:00+02:00 2 4.0 4.0 4.0 NaN \n",
"2020-01-01 00:00:00+02:00 3 2.0 2.0 2.0 NaN \n",
"2020-01-01 00:00:00+02:00 4 3.0 3.0 3.0 NaN \n",
"2020-01-01 00:00:00+02:00 5 2.0 2.0 2.0 NaN \n",
"... ... ... ... ... ... \n",
"2020-01-01 00:00:00+02:00 996 3.0 3.0 3.0 NaN \n",
"2020-01-01 00:00:00+02:00 997 0.0 0.0 0.0 NaN \n",
"2020-01-01 00:00:00+02:00 998 2.0 2.0 2.0 NaN \n",
"2020-01-01 00:00:00+02:00 999 4.0 4.0 4.0 NaN \n",
"2020-01-01 00:00:00+02:00 1000 4.0 4.0 4.0 NaN \n",
"\n",
" PSS10_mean PSS10_min PSS10_max PSS10_std \\\n",
"2020-01-01 00:00:00+02:00 15.0 15.0 15.0 NaN \n",
"2020-01-01 00:00:00+02:00 9.0 9.0 9.0 NaN \n",
"2020-01-01 00:00:00+02:00 12.0 12.0 12.0 NaN \n",
"2020-01-01 00:00:00+02:00 16.0 16.0 16.0 NaN \n",
"2020-01-01 00:00:00+02:00 14.0 14.0 14.0 NaN \n",
"... ... ... ... ... \n",
"2020-01-01 00:00:00+02:00 17.0 17.0 17.0 NaN \n",
"2020-01-01 00:00:00+02:00 13.0 13.0 13.0 NaN \n",
"2020-01-01 00:00:00+02:00 13.0 13.0 13.0 NaN \n",
"2020-01-01 00:00:00+02:00 21.0 21.0 21.0 NaN \n",
"2020-01-01 00:00:00+02:00 14.0 14.0 14.0 NaN \n",
"\n",
" GAD2_mean GAD2_min GAD2_max GAD2_std \n",
"2020-01-01 00:00:00+02:00 3.0 3.0 3.0 NaN \n",
"2020-01-01 00:00:00+02:00 1.0 1.0 1.0 NaN \n",
"2020-01-01 00:00:00+02:00 1.0 1.0 1.0 NaN \n",
"2020-01-01 00:00:00+02:00 1.0 1.0 1.0 NaN \n",
"2020-01-01 00:00:00+02:00 3.0 3.0 3.0 NaN \n",
"... ... ... ... ... \n",
"2020-01-01 00:00:00+02:00 2.0 2.0 2.0 NaN \n",
"2020-01-01 00:00:00+02:00 1.0 1.0 1.0 NaN \n",
"2020-01-01 00:00:00+02:00 2.0 2.0 2.0 NaN \n",
"2020-01-01 00:00:00+02:00 5.0 5.0 5.0 NaN \n",
"2020-01-01 00:00:00+02:00 2.0 2.0 2.0 NaN \n",
"\n",
"[1000 rows x 13 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"survey.survey_statistic(sum_df, columns=[\"PHQ2\", \"PSS10\", \"GAD2\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`survey_statistic` also works for individual questions. You can select the questionnaire you want statistics for by passing its prefix as the `prefix` parameter, or pass a list of question columns as the `columns` parameter."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user PHQ2_1_mean PHQ2_1_min PHQ2_1_max \\\n",
"2020-01-01 00:00:00+02:00 1 1.0 1.0 1.0 \n",
"2020-01-01 00:00:00+02:00 2 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 3 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 4 0.0 0.0 0.0 \n",
"2020-01-01 00:00:00+02:00 5 2.0 2.0 2.0 \n",
"... ... ... ... ... \n",
"2020-01-01 00:00:00+02:00 996 0.0 0.0 0.0 \n",
"2020-01-01 00:00:00+02:00 997 0.0 0.0 0.0 \n",
"2020-01-01 00:00:00+02:00 998 1.0 1.0 1.0 \n",
"2020-01-01 00:00:00+02:00 999 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 1000 2.0 2.0 2.0 \n",
"\n",
" PHQ2_1_std PHQ2_2_mean PHQ2_2_min PHQ2_2_max \\\n",
"2020-01-01 00:00:00+02:00 NaN 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 NaN 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 NaN 0.0 0.0 0.0 \n",
"2020-01-01 00:00:00+02:00 NaN 3.0 3.0 3.0 \n",
"2020-01-01 00:00:00+02:00 NaN 0.0 0.0 0.0 \n",
"... ... ... ... ... \n",
"2020-01-01 00:00:00+02:00 NaN 3.0 3.0 3.0 \n",
"2020-01-01 00:00:00+02:00 NaN 0.0 0.0 0.0 \n",
"2020-01-01 00:00:00+02:00 NaN 1.0 1.0 1.0 \n",
"2020-01-01 00:00:00+02:00 NaN 2.0 2.0 2.0 \n",
"2020-01-01 00:00:00+02:00 NaN 2.0 2.0 2.0 \n",
"\n",
" PHQ2_2_std \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"... ... \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"2020-01-01 00:00:00+02:00 NaN \n",
"\n",
"[1000 rows x 9 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d = survey.survey_statistic(transformed_df, prefix ='PHQ')\n",
"pd.DataFrame(d)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "niimpy",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}