{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "d4d5d7e4",
"metadata": {},
"source": [
"# Tracker Data"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5d4a7163",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"Fitness trackers are a rich source of longitudinal data captured at high frequency. The data can include step counts, heart rate, calorie expenditure, or sleep time. This notebook explains how to use `niimpy` to extract basic statistics and features from step count data.\n",
"\n",
"A dataframe with fitness data should contain the following columns (column names can differ, but in that case they must be passed as parameters):\n",
"- `user`: Subject ID\n",
"- `device`: Device ID\n",
"- `steps`: Number of steps measured during the time interval\n",
"\n",
"As usual, the index should be the timestamp of each measurement. Each step count covers the interval between the previous timestamp and that timestamp."
]
},
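{
"attachments": {},
"cell_type": "markdown",
"id": "e1a0b2c3",
"metadata": {},
"source": [
"As a rough illustration of the layout described above, the sketch below builds a small synthetic dataframe with plain pandas. The user ID, device ID, and step values are made up for illustration; only the column names `user`, `device`, and `steps` and the datetime index come from the description above.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical hourly timestamps; each row covers the interval ending at its timestamp.\n",
"index = pd.date_range('2021-07-01 00:00:00', periods=4, freq=pd.Timedelta(hours=1))\n",
"\n",
"example = pd.DataFrame(\n",
"    {\n",
"        'user': ['user_1'] * 4,      # made-up subject ID\n",
"        'device': ['device_1'] * 4,  # made-up device ID\n",
"        'steps': [0, 120, 340, 15],  # made-up step counts per interval\n",
"    },\n",
"    index=index,\n",
")\n",
"print(example)\n",
"```\n",
"\n",
"If your own data uses other column names, the names would need to be passed to the `niimpy` functions as parameters, as noted above."
]
},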
{
"attachments": {},
"cell_type": "markdown",
"id": "d802fffa",
"metadata": {},
"source": [
"## Read data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b170d597",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import niimpy.preprocessing.tracker as tracker\n",
"from niimpy import config\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7a13bbd4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(73, 4)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = pd.read_csv(config.STEP_SUMMARY_PATH, index_col=0)\n",
"# Convert the index to datetime\n",
"data.index = pd.to_datetime(data.index)\n",
"data.shape"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "245e4af7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user date time steps\n",
"2021-07-01 00:00:00 wiam9xme 2021-07-01 00:00:00.000 0\n",
"2021-07-01 01:00:00 wiam9xme 2021-07-01 01:00:00.000 0\n",
"2021-07-01 02:00:00 wiam9xme 2021-07-01 02:00:00.000 0\n",
"2021-07-01 03:00:00 wiam9xme 2021-07-01 03:00:00.000 0\n",
"2021-07-01 04:00:00 wiam9xme 2021-07-01 04:00:00.000 0"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d3141ebc",
"metadata": {},
"source": [
"## Getting basic statistics\n",
"\n",
"Using `niimpy`, we can extract a user's step count statistics within a time window. The statistics include:\n",
"\n",
"- `mean`: average number of steps taken within the time range\n",
"- `standard deviation`: standard deviation of steps\n",
"- `max`: max steps taken within a day during the time range\n",
"- `min`: min steps taken within a day during the time range\n",
"- `median`: median of steps within the time range\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c86865e6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" min_sum_step max_sum_step std_sum_step avg_sum_step median_sum_step \\\n",
"0 5616 13025 3352.347745 8437.383562 6480.0 \n",
"\n",
" user \n",
"0 wiam9xme "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tracker.step_summary(data, value_col='steps')"
]
},
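{
"attachments": {},
"cell_type": "markdown",
"id": "f2b1c3d4",
"metadata": {},
"source": [
"To make the idea of summarising daily step totals concrete, here is a minimal pandas-only sketch on made-up data. It is one plausible way to compute comparable statistics and is not `niimpy`'s implementation, so its numbers may differ from what `step_summary` returns. The names `example`, `daily`, and `summary` are hypothetical.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up hourly step counts for one user over two days.\n",
"index = pd.date_range('2021-07-01', periods=48, freq=pd.Timedelta(hours=1))\n",
"steps = [100] * 24 + [200] * 24\n",
"example = pd.DataFrame({'user': 'user_1', 'steps': steps}, index=index)\n",
"\n",
"# Label each row with its calendar day, then sum the steps per user and day.\n",
"example['day'] = example.index.normalize()\n",
"daily = example.groupby(['user', 'day'])['steps'].sum()\n",
"\n",
"# Summarise the daily totals for each user.\n",
"summary = daily.groupby(level='user').agg(['min', 'max', 'std', 'mean', 'median'])\n",
"print(summary)\n",
"```\n",
"\n",
"Aggregating to daily totals first keeps the statistics comparable between users whose trackers report at different frequencies."
]
},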
{
"attachments": {},
"cell_type": "markdown",
"id": "7c6805a0",
"metadata": {},
"source": [
"## Feature extraction\n",
"\n",
"Assuming the step count arrives at hourly resolution, we can compute how each day's steps are distributed across the hours. This daily distribution is useful if, for example, we want to see at what hours a user is most active."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6cf2639c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{: {}} {}\n"
]
},
{
"data": {
"text/plain": [
" user step_distribution step_sum\n",
"2021-07-01 00:00:00 wiam9xme 0.000000 5616.0\n",
"2021-07-01 01:00:00 wiam9xme 0.000000 5616.0\n",
"2021-07-01 02:00:00 wiam9xme 0.000000 5616.0\n",
"2021-07-01 03:00:00 wiam9xme 0.000000 5616.0\n",
"2021-07-01 04:00:00 wiam9xme 0.000000 5616.0\n",
"... ... ... ...\n",
"2021-07-03 19:00:00 wiam9xme 0.025162 12002.0\n",
"2021-07-03 20:00:00 wiam9xme 0.001000 12002.0\n",
"2021-07-03 21:00:00 wiam9xme 0.029495 12002.0\n",
"2021-07-03 22:00:00 wiam9xme 0.000000 12002.0\n",
"2021-07-03 23:00:00 wiam9xme 0.000000 12002.0\n",
"\n",
"[72 rows x 3 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f = tracker.tracker_step_distribution\n",
"step_distribution = tracker.extract_features_tracker(data, features={f: {}})\n",
"step_distribution"
]
}
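,
{
"attachments": {},
"cell_type": "markdown",
"id": "a3c2d4e5",
"metadata": {},
"source": [
"The hourly distribution can also be sketched by hand with pandas. The example below assumes that `step_sum` is the total number of steps for a user on a given day and that `step_distribution` is each hour's share of that total. This reading is inferred from the column names and is not taken from the `niimpy` source. The data and the names `example` and `hourly_share` are made up for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up data: day one has 1000 steps at 08:00 and 3000 at 17:00, day two has 500 at 12:00.\n",
"index = pd.date_range('2021-07-01', periods=48, freq=pd.Timedelta(hours=1))\n",
"steps = [0] * 48\n",
"steps[8], steps[17], steps[36] = 1000, 3000, 500\n",
"example = pd.DataFrame({'user': 'user_1', 'steps': steps}, index=index)\n",
"\n",
"# Total steps per user and calendar day, repeated on every hourly row of that day.\n",
"day = example.index.normalize()\n",
"example['step_sum'] = example.groupby(['user', day])['steps'].transform('sum')\n",
"\n",
"# Share of the day's steps taken in each hour (NaN for days without any steps).\n",
"example['step_distribution'] = example['steps'] / example['step_sum']\n",
"\n",
"# Average share per hour of day, and the hour with the largest average share.\n",
"hourly_share = example.groupby(['user', example.index.hour])['step_distribution'].mean()\n",
"print(hourly_share.idxmax())\n",
"```\n",
"\n",
"Averaging the shares rather than the raw counts gives each day equal weight, so a single very active day does not dominate the picture of when a user is usually active."
]
}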
],
"metadata": {
"kernelspec": {
"display_name": "niimpy",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}