Tracker Data
Introduction
Fitness trackers are a rich source of longitudinal data captured at high frequency. The data can include step counts, heart rate, calorie expenditure, and sleep time. This notebook explains how to use niimpy to extract some basic statistics and features from step count data.
Read data
[1]:
import niimpy
import pandas as pd
import niimpy.preprocessing.tracker as tracker
from niimpy import config
import warnings
warnings.filterwarnings("ignore")
[2]:
data = pd.read_csv(config.STEP_SUMMARY_PATH, index_col=0)
# Convert the index to datetime
data.index = pd.to_datetime(data.index)
data.shape
[2]:
(73, 4)
[3]:
data.head()
[3]:
| | user | date | time | steps |
|---|---|---|---|---|
| 2021-07-01 00:00:00 | wiam9xme | 2021-07-01 | 00:00:00.000 | 0 |
| 2021-07-01 01:00:00 | wiam9xme | 2021-07-01 | 01:00:00.000 | 0 |
| 2021-07-01 02:00:00 | wiam9xme | 2021-07-01 | 02:00:00.000 | 0 |
| 2021-07-01 03:00:00 | wiam9xme | 2021-07-01 | 03:00:00.000 | 0 |
| 2021-07-01 04:00:00 | wiam9xme | 2021-07-01 | 04:00:00.000 | 0 |
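To experiment without the sample file, a DataFrame with the same layout can be built by hand. The following is a minimal sketch using only pandas and NumPy; the user ID, dates, and step values are made up for illustration and are not part of the niimpy sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two days of hourly records for one made-up user.
hours = pd.to_timedelta(np.arange(48), unit="h")
index = pd.Timestamp("2021-07-01") + hours

# 300 steps per hour between 08:00 and 20:00, zero otherwise.
steps = np.where((index.hour >= 8) & (index.hour <= 20), 300, 0)

example = pd.DataFrame(
    {
        "user": "example_user",
        "date": index.strftime("%Y-%m-%d"),
        "time": index.strftime("%H:%M:%S.000"),
        "steps": steps,
    },
    index=index,
)

print(example.head())
```

The frame has a `DatetimeIndex` and the same four columns as the table above, which is the layout the rest of this notebook works with.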
Getting basic statistics
Using niimpy, we can extract a user's step count statistics within a time window. The statistics include:
- mean: average number of steps taken within the time range
- median: median number of steps taken within the time range
- standard deviation: standard deviation of steps
- max: maximum number of steps taken within a day during the time range
- min: minimum number of steps taken within a day during the time range
[4]:
tracker.step_summary(data, {'value_col': 'steps'})
[4]:
| | user | median_sum_step | avg_sum_step | std_sum_step | min_sum_step | max_sum_step |
|---|---|---|---|---|---|---|
| 0 | wiam9xme | 6480.0 | 8437.383562 | 3352.347745 | 5616 | 13025 |
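To make the idea of daily step statistics concrete, here is a pandas-only sketch that sums hourly steps per day and then summarizes the daily totals. It is not niimpy's implementation, and niimpy's exact aggregation may differ, so the numbers are not expected to match the output above. The data and the `example_user` ID are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three days of hourly step counts for one made-up user.
hours = pd.to_timedelta(np.arange(72), unit="h")
index = pd.Timestamp("2021-07-01") + hours

# Active hours are 08:00-20:00; the hourly count grows with the day of month.
steps = np.where((index.hour >= 8) & (index.hour <= 20), 100 * index.day, 0)
df = pd.DataFrame({"user": "example_user", "steps": steps}, index=index)

# Total steps per user and calendar day.
daily = df.groupby(["user", df.index.date])["steps"].sum()

# Summary statistics over the daily totals, one row per user.
summary = daily.groupby(level="user").agg(["median", "mean", "std", "min", "max"])
print(summary)
```

Grouping by user first keeps the sketch usable when the frame holds several users.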
Feature extraction
Assuming that the step count comes in at hourly resolution, we can compute how each day's steps are distributed across the hours. This distribution is helpful, for example, when we want to see at which hours a user is most active.
[5]:
f = tracker.tracker_step_distribution
step_distribution = tracker.extract_features_tracker(data, features={f: {}})
step_distribution
[5]:
| | user | step_distribution | step_sum |
|---|---|---|---|
| 2021-07-01 00:00:00 | wiam9xme | 0.000000 | 5616.0 |
| 2021-07-01 01:00:00 | wiam9xme | 0.000000 | 5616.0 |
| 2021-07-01 02:00:00 | wiam9xme | 0.000000 | 5616.0 |
| 2021-07-01 03:00:00 | wiam9xme | 0.000000 | 5616.0 |
| 2021-07-01 04:00:00 | wiam9xme | 0.000000 | 5616.0 |
| ... | ... | ... | ... |
| 2021-07-03 19:00:00 | wiam9xme | 0.025162 | 12002.0 |
| 2021-07-03 20:00:00 | wiam9xme | 0.001000 | 12002.0 |
| 2021-07-03 21:00:00 | wiam9xme | 0.029495 | 12002.0 |
| 2021-07-03 22:00:00 | wiam9xme | 0.000000 | 12002.0 |
| 2021-07-03 23:00:00 | wiam9xme | 0.000000 | 12002.0 |
72 rows × 3 columns
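In the output above, `step_sum` looks like the day's total and `step_distribution` like the hour's share of that total. The following pandas-only sketch reproduces that reading on made-up data and then finds the most active hour. It is an illustration under that assumption, not niimpy's implementation; the data, the `example_user` ID, and the `peak_hour` variable are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two days of hourly step counts for one made-up user.
hours = pd.to_timedelta(np.arange(48), unit="h")
index = pd.Timestamp("2021-07-01") + hours

# Steps are recorded at 09:00, 12:00 and 18:00 only; every day totals 1000.
steps_by_hour = {9: 200, 12: 200, 18: 600}
steps = [steps_by_hour.get(int(h), 0) for h in index.hour]
df = pd.DataFrame({"user": "example_user", "steps": steps}, index=index)

# Daily total, repeated on every hourly row of the same user and day.
df["step_sum"] = df.groupby(["user", df.index.date])["steps"].transform("sum")

# Share of the day's steps taken in each hour (NaN if a day has no steps).
df["step_distribution"] = df["steps"] / df["step_sum"]

# Hour of day with the largest average share across all days.
peak_hour = df.groupby(df.index.hour)["step_distribution"].mean().idxmax()
print(peak_hour)
```

Within each day the shares sum to 1, so days with very different totals can be compared on the same scale.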