{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Basic transformations\n",
    "\n",
    "This page shows some basic transformations you can do once you have read data.  Really, it is simply a pandas crash course, since pandas provides all the operations you may need and there is no need for us to re-invent things.  Pandas provides a solid but flexible base for us to build advanced operations on top of.\n",
    "\n",
    "You can read more at the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/index.html)."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extracting single rows and columns\n",
    "Let's first import mobile phone battery status data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "TZ = 'Europe/Helsinki'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/u/24/rantahj1/unix/miniconda3/envs/niimpy/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "import niimpy\n",
    "from niimpy import config\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the data\n",
    "df = niimpy.read_csv(config.MULTIUSER_AWARE_BATTERY_PATH, tz='Europe/Helsinki')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then check first rows of the dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user</th>\n",
       "      <th>device</th>\n",
       "      <th>time</th>\n",
       "      <th>battery_level</th>\n",
       "      <th>battery_status</th>\n",
       "      <th>battery_health</th>\n",
       "      <th>battery_adaptor</th>\n",
       "      <th>datetime</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:20:02.924999952+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>74</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:20:02.924999952+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:21:30.405999899+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>73</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:21:30.405999899+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:24:12.805999994+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>72</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:24:12.805999994+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:35:38.561000109+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578530e+09</td>\n",
       "      <td>72</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:35:38.561000109+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:35:38.953000069+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578530e+09</td>\n",
       "      <td>72</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2020-01-09 02:35:38.953000069+02:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             user        device          time  \\\n",
       "2020-01-09 02:20:02.924999952+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578529e+09   \n",
       "2020-01-09 02:21:30.405999899+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578529e+09   \n",
       "2020-01-09 02:24:12.805999994+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578529e+09   \n",
       "2020-01-09 02:35:38.561000109+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578530e+09   \n",
       "2020-01-09 02:35:38.953000069+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578530e+09   \n",
       "\n",
       "                                     battery_level  battery_status  \\\n",
       "2020-01-09 02:20:02.924999952+02:00             74               3   \n",
       "2020-01-09 02:21:30.405999899+02:00             73               3   \n",
       "2020-01-09 02:24:12.805999994+02:00             72               3   \n",
       "2020-01-09 02:35:38.561000109+02:00             72               2   \n",
       "2020-01-09 02:35:38.953000069+02:00             72               2   \n",
       "\n",
       "                                     battery_health  battery_adaptor  \\\n",
       "2020-01-09 02:20:02.924999952+02:00               2                0   \n",
       "2020-01-09 02:21:30.405999899+02:00               2                0   \n",
       "2020-01-09 02:24:12.805999994+02:00               2                0   \n",
       "2020-01-09 02:35:38.561000109+02:00               2                0   \n",
       "2020-01-09 02:35:38.953000069+02:00               2                2   \n",
       "\n",
       "                                                               datetime  \n",
       "2020-01-09 02:20:02.924999952+02:00 2020-01-09 02:20:02.924999952+02:00  \n",
       "2020-01-09 02:21:30.405999899+02:00 2020-01-09 02:21:30.405999899+02:00  \n",
       "2020-01-09 02:24:12.805999994+02:00 2020-01-09 02:24:12.805999994+02:00  \n",
       "2020-01-09 02:35:38.561000109+02:00 2020-01-09 02:35:38.561000109+02:00  \n",
       "2020-01-09 02:35:38.953000069+02:00 2020-01-09 02:35:38.953000069+02:00  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get a single column, in this case all **users**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2020-01-09 02:20:02.924999952+02:00    jd9INuQ5BBlW\n",
       "2020-01-09 02:21:30.405999899+02:00    jd9INuQ5BBlW\n",
       "2020-01-09 02:24:12.805999994+02:00    jd9INuQ5BBlW\n",
       "2020-01-09 02:35:38.561000109+02:00    jd9INuQ5BBlW\n",
       "2020-01-09 02:35:38.953000069+02:00    jd9INuQ5BBlW\n",
       "                                           ...     \n",
       "2019-08-09 00:30:48.073999882+03:00    dvWdLQesv21a\n",
       "2019-08-09 00:32:40.717999935+03:00    dvWdLQesv21a\n",
       "2019-08-09 00:34:23.114000082+03:00    dvWdLQesv21a\n",
       "2019-08-09 00:36:05.505000114+03:00    dvWdLQesv21a\n",
       "2019-08-07 06:53:35.671000004+03:00    dvWdLQesv21a\n",
       "Name: user, Length: 634, dtype: object"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['user']"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get a single row, in this case the **5th** (the first row is zero):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user                                      jd9INuQ5BBlW\n",
       "device                                    3p83yASkOb_B\n",
       "time                                    1578530138.953\n",
       "battery_level                                       72\n",
       "battery_status                                       2\n",
       "battery_health                                       2\n",
       "battery_adaptor                                      2\n",
       "datetime           2020-01-09 02:35:38.953000069+02:00\n",
       "Name: 2020-01-09 02:35:38.953000069+02:00, dtype: object"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[4]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Listing unique users\n",
    "We can list unique users by using `pandas.unique()` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['jd9INuQ5BBlW', 'iGyXetHE3S8u', 'dvWdLQesv21a'], dtype=object)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['user'].unique()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## List unique values\n",
    "Same applies to other data features/columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 3,  2,  5, -1, -3])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['battery_status'].unique()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extract data of only one subject\n",
    "We can extract data of only one subject by following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user</th>\n",
       "      <th>device</th>\n",
       "      <th>time</th>\n",
       "      <th>battery_level</th>\n",
       "      <th>battery_status</th>\n",
       "      <th>battery_health</th>\n",
       "      <th>battery_adaptor</th>\n",
       "      <th>datetime</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:20:02.924999952+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>74</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:20:02.924999952+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:21:30.405999899+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>73</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:21:30.405999899+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:24:12.805999994+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>72</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:24:12.805999994+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:35:38.561000109+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578530e+09</td>\n",
       "      <td>72</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:35:38.561000109+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:35:38.953000069+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578530e+09</td>\n",
       "      <td>72</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2020-01-09 02:35:38.953000069+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 23:02:13.938999891+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>OWd1Uau8POix</td>\n",
       "      <td>1.578604e+09</td>\n",
       "      <td>73</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 23:02:13.938999891+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 23:10:37.262000084+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>OWd1Uau8POix</td>\n",
       "      <td>1.578604e+09</td>\n",
       "      <td>73</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 23:10:37.262000084+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 23:22:13.966000080+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>OWd1Uau8POix</td>\n",
       "      <td>1.578605e+09</td>\n",
       "      <td>72</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 23:22:13.966000080+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 23:32:13.959000111+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>OWd1Uau8POix</td>\n",
       "      <td>1.578606e+09</td>\n",
       "      <td>71</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 23:32:13.959000111+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 23:39:06.799999952+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>OWd1Uau8POix</td>\n",
       "      <td>1.578606e+09</td>\n",
       "      <td>71</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 23:39:06.799999952+02:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>373 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             user        device          time  \\\n",
       "2020-01-09 02:20:02.924999952+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578529e+09   \n",
       "2020-01-09 02:21:30.405999899+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578529e+09   \n",
       "2020-01-09 02:24:12.805999994+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578529e+09   \n",
       "2020-01-09 02:35:38.561000109+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578530e+09   \n",
       "2020-01-09 02:35:38.953000069+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578530e+09   \n",
       "...                                           ...           ...           ...   \n",
       "2020-01-09 23:02:13.938999891+02:00  jd9INuQ5BBlW  OWd1Uau8POix  1.578604e+09   \n",
       "2020-01-09 23:10:37.262000084+02:00  jd9INuQ5BBlW  OWd1Uau8POix  1.578604e+09   \n",
       "2020-01-09 23:22:13.966000080+02:00  jd9INuQ5BBlW  OWd1Uau8POix  1.578605e+09   \n",
       "2020-01-09 23:32:13.959000111+02:00  jd9INuQ5BBlW  OWd1Uau8POix  1.578606e+09   \n",
       "2020-01-09 23:39:06.799999952+02:00  jd9INuQ5BBlW  OWd1Uau8POix  1.578606e+09   \n",
       "\n",
       "                                     battery_level  battery_status  \\\n",
       "2020-01-09 02:20:02.924999952+02:00             74               3   \n",
       "2020-01-09 02:21:30.405999899+02:00             73               3   \n",
       "2020-01-09 02:24:12.805999994+02:00             72               3   \n",
       "2020-01-09 02:35:38.561000109+02:00             72               2   \n",
       "2020-01-09 02:35:38.953000069+02:00             72               2   \n",
       "...                                            ...             ...   \n",
       "2020-01-09 23:02:13.938999891+02:00             73               3   \n",
       "2020-01-09 23:10:37.262000084+02:00             73               3   \n",
       "2020-01-09 23:22:13.966000080+02:00             72               3   \n",
       "2020-01-09 23:32:13.959000111+02:00             71               3   \n",
       "2020-01-09 23:39:06.799999952+02:00             71               3   \n",
       "\n",
       "                                     battery_health  battery_adaptor  \\\n",
       "2020-01-09 02:20:02.924999952+02:00               2                0   \n",
       "2020-01-09 02:21:30.405999899+02:00               2                0   \n",
       "2020-01-09 02:24:12.805999994+02:00               2                0   \n",
       "2020-01-09 02:35:38.561000109+02:00               2                0   \n",
       "2020-01-09 02:35:38.953000069+02:00               2                2   \n",
       "...                                             ...              ...   \n",
       "2020-01-09 23:02:13.938999891+02:00               2                0   \n",
       "2020-01-09 23:10:37.262000084+02:00               2                0   \n",
       "2020-01-09 23:22:13.966000080+02:00               2                0   \n",
       "2020-01-09 23:32:13.959000111+02:00               2                0   \n",
       "2020-01-09 23:39:06.799999952+02:00               2                0   \n",
       "\n",
       "                                                               datetime  \n",
       "2020-01-09 02:20:02.924999952+02:00 2020-01-09 02:20:02.924999952+02:00  \n",
       "2020-01-09 02:21:30.405999899+02:00 2020-01-09 02:21:30.405999899+02:00  \n",
       "2020-01-09 02:24:12.805999994+02:00 2020-01-09 02:24:12.805999994+02:00  \n",
       "2020-01-09 02:35:38.561000109+02:00 2020-01-09 02:35:38.561000109+02:00  \n",
       "2020-01-09 02:35:38.953000069+02:00 2020-01-09 02:35:38.953000069+02:00  \n",
       "...                                                                 ...  \n",
       "2020-01-09 23:02:13.938999891+02:00 2020-01-09 23:02:13.938999891+02:00  \n",
       "2020-01-09 23:10:37.262000084+02:00 2020-01-09 23:10:37.262000084+02:00  \n",
       "2020-01-09 23:22:13.966000080+02:00 2020-01-09 23:22:13.966000080+02:00  \n",
       "2020-01-09 23:32:13.959000111+02:00 2020-01-09 23:32:13.959000111+02:00  \n",
       "2020-01-09 23:39:06.799999952+02:00 2020-01-09 23:39:06.799999952+02:00  \n",
       "\n",
       "[373 rows x 8 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['user'] == 'jd9INuQ5BBlW']"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Renaming a column or columns\n",
    "Dataframe column can be renamed using `pandas.DataFrame.rename()` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user</th>\n",
       "      <th>device</th>\n",
       "      <th>timestamp</th>\n",
       "      <th>battery_level</th>\n",
       "      <th>battery_status</th>\n",
       "      <th>battery_health</th>\n",
       "      <th>battery_adaptor</th>\n",
       "      <th>datetime</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:20:02.924999952+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>74</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:20:02.924999952+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:21:30.405999899+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>73</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:21:30.405999899+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:24:12.805999994+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>72</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:24:12.805999994+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:35:38.561000109+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578530e+09</td>\n",
       "      <td>72</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:35:38.561000109+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:35:38.953000069+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>3p83yASkOb_B</td>\n",
       "      <td>1.578530e+09</td>\n",
       "      <td>72</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2020-01-09 02:35:38.953000069+02:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             user        device     timestamp  \\\n",
       "2020-01-09 02:20:02.924999952+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578529e+09   \n",
       "2020-01-09 02:21:30.405999899+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578529e+09   \n",
       "2020-01-09 02:24:12.805999994+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578529e+09   \n",
       "2020-01-09 02:35:38.561000109+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578530e+09   \n",
       "2020-01-09 02:35:38.953000069+02:00  jd9INuQ5BBlW  3p83yASkOb_B  1.578530e+09   \n",
       "\n",
       "                                     battery_level  battery_status  \\\n",
       "2020-01-09 02:20:02.924999952+02:00             74               3   \n",
       "2020-01-09 02:21:30.405999899+02:00             73               3   \n",
       "2020-01-09 02:24:12.805999994+02:00             72               3   \n",
       "2020-01-09 02:35:38.561000109+02:00             72               2   \n",
       "2020-01-09 02:35:38.953000069+02:00             72               2   \n",
       "\n",
       "                                     battery_health  battery_adaptor  \\\n",
       "2020-01-09 02:20:02.924999952+02:00               2                0   \n",
       "2020-01-09 02:21:30.405999899+02:00               2                0   \n",
       "2020-01-09 02:24:12.805999994+02:00               2                0   \n",
       "2020-01-09 02:35:38.561000109+02:00               2                0   \n",
       "2020-01-09 02:35:38.953000069+02:00               2                2   \n",
       "\n",
       "                                                               datetime  \n",
       "2020-01-09 02:20:02.924999952+02:00 2020-01-09 02:20:02.924999952+02:00  \n",
       "2020-01-09 02:21:30.405999899+02:00 2020-01-09 02:21:30.405999899+02:00  \n",
       "2020-01-09 02:24:12.805999994+02:00 2020-01-09 02:24:12.805999994+02:00  \n",
       "2020-01-09 02:35:38.561000109+02:00 2020-01-09 02:35:38.561000109+02:00  \n",
       "2020-01-09 02:35:38.953000069+02:00 2020-01-09 02:35:38.953000069+02:00  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.rename(columns={'time': 'timestamp'}, inplace=True)\n",
    "df.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Change datatypes\n",
    "Let's then check the dataframe datatypes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user                                        object\n",
       "device                                      object\n",
       "timestamp                                  float64\n",
       "battery_level                                int64\n",
       "battery_status                               int64\n",
       "battery_health                               int64\n",
       "battery_adaptor                              int64\n",
       "datetime           datetime64[ns, Europe/Helsinki]\n",
       "dtype: object"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can change the datatypes with `pandas.astype()` function. Here we change **battery_health** datatype to categorical:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "user                                        object\n",
       "device                                      object\n",
       "timestamp                                  float64\n",
       "battery_level                                int64\n",
       "battery_status                               int64\n",
       "battery_health                            category\n",
       "battery_adaptor                              int64\n",
       "datetime           datetime64[ns, Europe/Helsinki]\n",
       "dtype: object"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.astype({'battery_health': 'category'}).dtypes"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Transforming a column to a new value\n",
    "Dataframe values can be transformed (decoded etc.) into new values by using `pandas.transform()`function.\n",
    "\n",
    "Here we add one to the column values.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2020-01-09 02:20:02.924999952+02:00    1\n",
       "2020-01-09 02:21:30.405999899+02:00    1\n",
       "2020-01-09 02:24:12.805999994+02:00    1\n",
       "2020-01-09 02:35:38.561000109+02:00    1\n",
       "2020-01-09 02:35:38.953000069+02:00    3\n",
       "                                      ..\n",
       "2019-08-09 00:30:48.073999882+03:00    2\n",
       "2019-08-09 00:32:40.717999935+03:00    2\n",
       "2019-08-09 00:34:23.114000082+03:00    2\n",
       "2019-08-09 00:36:05.505000114+03:00    2\n",
       "2019-08-07 06:53:35.671000004+03:00    2\n",
       "Name: battery_adaptor, Length: 634, dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['battery_adaptor'].transform(lambda x: x + 1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resample\n",
    "Dataframe down/upsampling can be done with `pandas.resample()` function.\n",
    "\n",
    "Here we downsample the data by hour and aggregate the mean:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2019-08-05 14:00:00+03:00    45.500000\n",
       "2019-08-05 15:00:00+03:00    44.000000\n",
       "2019-08-05 16:00:00+03:00    43.000000\n",
       "2019-08-05 17:00:00+03:00    42.000000\n",
       "2019-08-05 18:00:00+03:00    41.000000\n",
       "                               ...    \n",
       "2020-01-09 19:00:00+02:00    86.166667\n",
       "2020-01-09 20:00:00+02:00    82.000000\n",
       "2020-01-09 21:00:00+02:00    78.428571\n",
       "2020-01-09 22:00:00+02:00    75.000000\n",
       "2020-01-09 23:00:00+02:00    72.000000\n",
       "Freq: h, Name: battery_level, Length: 3779, dtype: float64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['battery_level'].resample('h').agg(\"mean\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Groupby\n",
    "For groupwise data inspection, we can use `pandas.DataFrame.groupby()` function.\n",
    "\n",
    "Let's first load dataframe having several subjects belonging to different groups."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user</th>\n",
       "      <th>activity</th>\n",
       "      <th>group</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>timestamp</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2013-03-27 06:00:00-05:00</th>\n",
       "      <td>u00</td>\n",
       "      <td>2</td>\n",
       "      <td>none</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-03-27 07:00:00-05:00</th>\n",
       "      <td>u00</td>\n",
       "      <td>1</td>\n",
       "      <td>none</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-03-27 08:00:00-05:00</th>\n",
       "      <td>u00</td>\n",
       "      <td>2</td>\n",
       "      <td>none</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-03-27 09:00:00-05:00</th>\n",
       "      <td>u00</td>\n",
       "      <td>3</td>\n",
       "      <td>none</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-03-27 10:00:00-05:00</th>\n",
       "      <td>u00</td>\n",
       "      <td>4</td>\n",
       "      <td>none</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-05-31 18:00:00-05:00</th>\n",
       "      <td>u59</td>\n",
       "      <td>5</td>\n",
       "      <td>mild</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-05-31 19:00:00-05:00</th>\n",
       "      <td>u59</td>\n",
       "      <td>5</td>\n",
       "      <td>mild</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-05-31 20:00:00-05:00</th>\n",
       "      <td>u59</td>\n",
       "      <td>4</td>\n",
       "      <td>mild</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-05-31 21:00:00-05:00</th>\n",
       "      <td>u59</td>\n",
       "      <td>5</td>\n",
       "      <td>mild</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-05-31 22:00:00-05:00</th>\n",
       "      <td>u59</td>\n",
       "      <td>1</td>\n",
       "      <td>mild</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>55907 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                          user  activity group\n",
       "timestamp                                     \n",
       "2013-03-27 06:00:00-05:00  u00         2  none\n",
       "2013-03-27 07:00:00-05:00  u00         1  none\n",
       "2013-03-27 08:00:00-05:00  u00         2  none\n",
       "2013-03-27 09:00:00-05:00  u00         3  none\n",
       "2013-03-27 10:00:00-05:00  u00         4  none\n",
       "...                        ...       ...   ...\n",
       "2013-05-31 18:00:00-05:00  u59         5  mild\n",
       "2013-05-31 19:00:00-05:00  u59         5  mild\n",
       "2013-05-31 20:00:00-05:00  u59         4  mild\n",
       "2013-05-31 21:00:00-05:00  u59         5  mild\n",
       "2013-05-31 22:00:00-05:00  u59         1  mild\n",
       "\n",
       "[55907 rows x 3 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = niimpy.read_csv(config.SL_ACTIVITY_PATH, tz='Europe/Helsinki')\n",
    "df.set_index('timestamp', inplace=True)\n",
    "df"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can summarize the data by grouping the observations by **group** and **user**, and then aggregating the mean:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>activity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>group</th>\n",
       "      <th>user</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"13\" valign=\"top\">mild</th>\n",
       "      <th>u02</th>\n",
       "      <td>0.922348</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u04</th>\n",
       "      <td>1.466960</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u07</th>\n",
       "      <td>0.914457</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u16</th>\n",
       "      <td>0.702918</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u20</th>\n",
       "      <td>0.277946</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u24</th>\n",
       "      <td>0.938028</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u27</th>\n",
       "      <td>0.653724</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u31</th>\n",
       "      <td>0.929495</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u35</th>\n",
       "      <td>0.519455</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u43</th>\n",
       "      <td>0.809045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u49</th>\n",
       "      <td>1.159767</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u58</th>\n",
       "      <td>0.620621</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u59</th>\n",
       "      <td>1.626263</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">moderate</th>\n",
       "      <th>u18</th>\n",
       "      <td>0.445323</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u52</th>\n",
       "      <td>1.051735</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">moderately severe</th>\n",
       "      <th>u17</th>\n",
       "      <td>0.489510</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u23</th>\n",
       "      <td>0.412884</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"15\" valign=\"top\">none</th>\n",
       "      <th>u00</th>\n",
       "      <td>1.182973</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u03</th>\n",
       "      <td>0.176737</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u05</th>\n",
       "      <td>0.606742</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u09</th>\n",
       "      <td>1.095908</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u10</th>\n",
       "      <td>0.662612</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u14</th>\n",
       "      <td>1.005859</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u15</th>\n",
       "      <td>0.295990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u30</th>\n",
       "      <td>0.933036</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u32</th>\n",
       "      <td>1.113593</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u36</th>\n",
       "      <td>0.936281</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u42</th>\n",
       "      <td>0.378851</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u44</th>\n",
       "      <td>0.292580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u47</th>\n",
       "      <td>0.396026</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u51</th>\n",
       "      <td>0.828662</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u56</th>\n",
       "      <td>0.840967</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"6\" valign=\"top\">severe</th>\n",
       "      <th>u01</th>\n",
       "      <td>1.063660</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u19</th>\n",
       "      <td>0.571792</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u33</th>\n",
       "      <td>0.733115</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u34</th>\n",
       "      <td>0.454789</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u45</th>\n",
       "      <td>0.441134</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u53</th>\n",
       "      <td>0.389404</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        activity\n",
       "group             user          \n",
       "mild              u02   0.922348\n",
       "                  u04   1.466960\n",
       "                  u07   0.914457\n",
       "                  u16   0.702918\n",
       "                  u20   0.277946\n",
       "                  u24   0.938028\n",
       "                  u27   0.653724\n",
       "                  u31   0.929495\n",
       "                  u35   0.519455\n",
       "                  u43   0.809045\n",
       "                  u49   1.159767\n",
       "                  u58   0.620621\n",
       "                  u59   1.626263\n",
       "moderate          u18   0.445323\n",
       "                  u52   1.051735\n",
       "moderately severe u17   0.489510\n",
       "                  u23   0.412884\n",
       "none              u00   1.182973\n",
       "                  u03   0.176737\n",
       "                  u05   0.606742\n",
       "                  u09   1.095908\n",
       "                  u10   0.662612\n",
       "                  u14   1.005859\n",
       "                  u15   0.295990\n",
       "                  u30   0.933036\n",
       "                  u32   1.113593\n",
       "                  u36   0.936281\n",
       "                  u42   0.378851\n",
       "                  u44   0.292580\n",
       "                  u47   0.396026\n",
       "                  u51   0.828662\n",
       "                  u56   0.840967\n",
       "severe            u01   1.063660\n",
       "                  u19   0.571792\n",
       "                  u33   0.733115\n",
       "                  u34   0.454789\n",
       "                  u45   0.441134\n",
       "                  u53   0.389404"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(['group','user']).agg(\"mean\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary statistics\n",
    "\n",
    "There are many ways you may want to get an overview of your data.\n",
    "\n",
    "Let's first load mobile phone screen activity data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user</th>\n",
       "      <th>device</th>\n",
       "      <th>time</th>\n",
       "      <th>screen_status</th>\n",
       "      <th>datetime</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:06:41.573999882+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>OWd1Uau8POix</td>\n",
       "      <td>1.578528e+09</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:06:41.573999882+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:09:29.151999950+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>OWd1Uau8POix</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>1</td>\n",
       "      <td>2020-01-09 02:09:29.151999950+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:09:32.790999889+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>OWd1Uau8POix</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>3</td>\n",
       "      <td>2020-01-09 02:09:32.790999889+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:11:41.996000051+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>OWd1Uau8POix</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>0</td>\n",
       "      <td>2020-01-09 02:11:41.996000051+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-01-09 02:16:19.010999918+02:00</th>\n",
       "      <td>jd9INuQ5BBlW</td>\n",
       "      <td>OWd1Uau8POix</td>\n",
       "      <td>1.578529e+09</td>\n",
       "      <td>1</td>\n",
       "      <td>2020-01-09 02:16:19.010999918+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-08-07 17:42:41.009999990+03:00</th>\n",
       "      <td>dvWdLQesv21a</td>\n",
       "      <td>i8jmoIuoe12Mo</td>\n",
       "      <td>1.565189e+09</td>\n",
       "      <td>2</td>\n",
       "      <td>2019-08-07 17:42:41.009999990+03:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-08-07 18:32:41.009999990+03:00</th>\n",
       "      <td>dvWdLQesv21a</td>\n",
       "      <td>i8jmoIuoe12Mo</td>\n",
       "      <td>1.565192e+09</td>\n",
       "      <td>1</td>\n",
       "      <td>2019-08-07 18:32:41.009999990+03:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-08-07 19:22:41.009999990+03:00</th>\n",
       "      <td>dvWdLQesv21a</td>\n",
       "      <td>i8jmoIuoe12Mo</td>\n",
       "      <td>1.565195e+09</td>\n",
       "      <td>0</td>\n",
       "      <td>2019-08-07 19:22:41.009999990+03:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-08-07 20:12:41.009999990+03:00</th>\n",
       "      <td>dvWdLQesv21a</td>\n",
       "      <td>i8jmoIuoe12Mo</td>\n",
       "      <td>1.565198e+09</td>\n",
       "      <td>1</td>\n",
       "      <td>2019-08-07 20:12:41.009999990+03:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-08-07 21:02:41.009999990+03:00</th>\n",
       "      <td>dvWdLQesv21a</td>\n",
       "      <td>i8jmoIuoe12Mo</td>\n",
       "      <td>1.565201e+09</td>\n",
       "      <td>2</td>\n",
       "      <td>2019-08-07 21:02:41.009999990+03:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>343 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             user         device  \\\n",
       "2020-01-09 02:06:41.573999882+02:00  jd9INuQ5BBlW   OWd1Uau8POix   \n",
       "2020-01-09 02:09:29.151999950+02:00  jd9INuQ5BBlW   OWd1Uau8POix   \n",
       "2020-01-09 02:09:32.790999889+02:00  jd9INuQ5BBlW   OWd1Uau8POix   \n",
       "2020-01-09 02:11:41.996000051+02:00  jd9INuQ5BBlW   OWd1Uau8POix   \n",
       "2020-01-09 02:16:19.010999918+02:00  jd9INuQ5BBlW   OWd1Uau8POix   \n",
       "...                                           ...            ...   \n",
       "2019-08-07 17:42:41.009999990+03:00  dvWdLQesv21a  i8jmoIuoe12Mo   \n",
       "2019-08-07 18:32:41.009999990+03:00  dvWdLQesv21a  i8jmoIuoe12Mo   \n",
       "2019-08-07 19:22:41.009999990+03:00  dvWdLQesv21a  i8jmoIuoe12Mo   \n",
       "2019-08-07 20:12:41.009999990+03:00  dvWdLQesv21a  i8jmoIuoe12Mo   \n",
       "2019-08-07 21:02:41.009999990+03:00  dvWdLQesv21a  i8jmoIuoe12Mo   \n",
       "\n",
       "                                             time  screen_status  \\\n",
       "2020-01-09 02:06:41.573999882+02:00  1.578528e+09              0   \n",
       "2020-01-09 02:09:29.151999950+02:00  1.578529e+09              1   \n",
       "2020-01-09 02:09:32.790999889+02:00  1.578529e+09              3   \n",
       "2020-01-09 02:11:41.996000051+02:00  1.578529e+09              0   \n",
       "2020-01-09 02:16:19.010999918+02:00  1.578529e+09              1   \n",
       "...                                           ...            ...   \n",
       "2019-08-07 17:42:41.009999990+03:00  1.565189e+09              2   \n",
       "2019-08-07 18:32:41.009999990+03:00  1.565192e+09              1   \n",
       "2019-08-07 19:22:41.009999990+03:00  1.565195e+09              0   \n",
       "2019-08-07 20:12:41.009999990+03:00  1.565198e+09              1   \n",
       "2019-08-07 21:02:41.009999990+03:00  1.565201e+09              2   \n",
       "\n",
       "                                                               datetime  \n",
       "2020-01-09 02:06:41.573999882+02:00 2020-01-09 02:06:41.573999882+02:00  \n",
       "2020-01-09 02:09:29.151999950+02:00 2020-01-09 02:09:29.151999950+02:00  \n",
       "2020-01-09 02:09:32.790999889+02:00 2020-01-09 02:09:32.790999889+02:00  \n",
       "2020-01-09 02:11:41.996000051+02:00 2020-01-09 02:11:41.996000051+02:00  \n",
       "2020-01-09 02:16:19.010999918+02:00 2020-01-09 02:16:19.010999918+02:00  \n",
       "...                                                                 ...  \n",
       "2019-08-07 17:42:41.009999990+03:00 2019-08-07 17:42:41.009999990+03:00  \n",
       "2019-08-07 18:32:41.009999990+03:00 2019-08-07 18:32:41.009999990+03:00  \n",
       "2019-08-07 19:22:41.009999990+03:00 2019-08-07 19:22:41.009999990+03:00  \n",
       "2019-08-07 20:12:41.009999990+03:00 2019-08-07 20:12:41.009999990+03:00  \n",
       "2019-08-07 21:02:41.009999990+03:00 2019-08-07 21:02:41.009999990+03:00  \n",
       "\n",
       "[343 rows x 5 columns]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = niimpy.read_csv(config.MULTIUSER_AWARE_SCREEN_PATH, tz='Europe/Helsinki')\n",
    "df"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hourly data\n",
    "It is easy to get the amount of data (observations) in each hour"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2019-08-05  14    20\n",
       "            15     1\n",
       "            16     1\n",
       "            17     1\n",
       "            18     1\n",
       "                  ..\n",
       "2020-01-09  18     4\n",
       "            20     4\n",
       "            21    19\n",
       "            22     3\n",
       "            23    12\n",
       "Length: 94, dtype: int64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hourly = df.groupby([df.index.date, df.index.hour]).size()\n",
    "hourly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(datetime.date(2019, 8, 5), 14) had 20 data points\n"
     ]
    }
   ],
   "source": [
    "# The index is the (day, hour) pairs and the\n",
    "# value is the number at that time\n",
    "print('%s had %d data points'%(hourly.index[0], hourly.iloc[0]))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Occurrence\n",
    "\n",
    "In niimpy, **occurrence** is a way to see the completeness of data.\n",
    "\n",
    "Occurence is calculated as:\n",
    "* Divide the data into time intervals (1 hour by default).\n",
    "* Divide each interval into bins (5 bins by default).\n",
    "* Count the number of bins that have data.  This is $occurrence$.\n",
    "* For each interval, report $occurrence$.\n",
    "\n",
    "This isn't the perfect measure, but is reasonably effective and simple to calculate.  For data which isn't continuous (like screen data we are actually using), it shows how much the sensor has been used.\n",
    "\n",
    "Note that the \"uniformly present data\" is not true for all data sources."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>occurrence</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>time</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2019-08-05 14:00:00+03:00</th>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-08-05 15:00:00+03:00</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-08-05 16:00:00+03:00</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-08-05 17:00:00+03:00</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-08-05 18:00:00+03:00</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           occurrence\n",
       "time                                 \n",
       "2019-08-05 14:00:00+03:00           4\n",
       "2019-08-05 15:00:00+03:00           1\n",
       "2019-08-05 16:00:00+03:00           1\n",
       "2019-08-05 17:00:00+03:00           1\n",
       "2019-08-05 18:00:00+03:00           1"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "occurrences = niimpy.util.occurrence(df.index)\n",
    "occurrences.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can create a simplified presentation (pivot table) for the data by using `pandas.pivot()`function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"5\" halign=\"left\">occurrence</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>hour</th>\n",
       "      <th>14</th>\n",
       "      <th>15</th>\n",
       "      <th>16</th>\n",
       "      <th>17</th>\n",
       "      <th>18</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>day</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2019-08-05</th>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           occurrence                    \n",
       "hour               14   15   16   17   18\n",
       "day                                      \n",
       "2019-08-05        4.0  1.0  1.0  1.0  1.0"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "occurrences[\"day\"] = occurrences.index.date\n",
    "occurrences[\"hour\"] = occurrences.index.hour\n",
    "occurrences.head(5).pivot_table(index='day',columns='hour')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}