Tracker Data

Introduction

Fitness trackers are a rich source of longitudinal data captured at high frequency. The data can include step counts, heart rate, calorie expenditure, or sleep time. This notebook explains how to use niimpy to extract some basic statistics and features from step count data.

A dataframe with fitness tracker data should contain the following columns (column names can be different, but in that case they must be provided as parameters):

user: Subject ID
device: Device ID
steps: Number of steps measured on the time interval

As usual, the index should be the time of the measurements. Each step count covers the interval between the previous timestamp and that time.
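As a minimal sketch of that layout, the snippet below builds a toy dataframe in plain pandas and checks it against the description. The user and device IDs and the step values are made up for illustration, and the check is not part of niimpy.

```python
import pandas as pd

# Hypothetical toy data: the IDs and step counts below are invented.
toy = pd.DataFrame(
    {
        "user": ["u1", "u1", "u1"],
        "device": ["d1", "d1", "d1"],
        "steps": [0, 120, 340],
    },
    index=pd.to_datetime(
        ["2021-07-01 00:00:00", "2021-07-01 01:00:00", "2021-07-01 02:00:00"]
    ),
)

# Check the layout described above: three named columns and a time index.
required = {"user", "device", "steps"}
missing = required - set(toy.columns)
has_time_index = isinstance(toy.index, pd.DatetimeIndex)
print(missing, has_time_index)
```

With this layout, the value 120 at 01:00 would be read as the steps taken between 00:00 and 01:00.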

Read data

[1]:

import pandas as pd
import niimpy.preprocessing.tracker as tracker
from niimpy import config
import warnings
warnings.filterwarnings("ignore")

[2]:

data = pd.read_csv(config.STEP_SUMMARY_PATH, index_col=0)
# Convert the index to datetime
data.index = pd.to_datetime(data.index)
data.shape

[2]:

(73, 4)

[3]:

data.head()

[3]:

	user	date	time
2021-07-01 00:00:00	wiam9xme	2021-07-01	00:00:00.000
2021-07-01 01:00:00	wiam9xme	2021-07-01	01:00:00.000
2021-07-01 02:00:00	wiam9xme	2021-07-01	02:00:00.000
2021-07-01 03:00:00	wiam9xme	2021-07-01	03:00:00.000
2021-07-01 04:00:00	wiam9xme	2021-07-01	04:00:00.000

Getting basic statistics

Using niimpy, we can extract a user’s step count statistics within a time window. The statistics include:

mean: average number of steps taken within the time range
standard deviation: standard deviation of steps
max: maximum steps taken within a day during the time range
min: minimum steps taken within a day during the time range
median: median of steps (reported as median_sum_step in the output)

[4]:

tracker.step_summary(data, value_col='steps')

[4]:

	min_sum_step	max_sum_step	std_sum_step	avg_sum_step	median_sum_step	user
0	5616	13025	3352.347745	8437.383562	6480.0	wiam9xme
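To make the idea of per-day statistics concrete, here is a rough sketch in plain pandas: it sums the steps for each user and calendar day and then summarises those daily totals. This is an illustration under assumed toy data, not niimpy's implementation, and step_summary may compute or name its values differently.

```python
import pandas as pd

# Hypothetical hourly data for one user over two days (values are made up).
index = pd.date_range("2021-07-01", periods=48, freq=pd.Timedelta(hours=1))
steps = [100] * 24 + [200] * 24
toy = pd.DataFrame({"user": "u1", "steps": steps}, index=index)

# Sum the steps per user and calendar day.
daily = toy.groupby(["user", pd.Grouper(freq="D")])["steps"].sum()

# Summarise the daily totals for each user.
summary = daily.groupby(level="user").agg(["min", "max", "std", "mean", "median"])
print(summary)
```

Here the two daily totals are 2400 and 4800, so the summary is taken over two values per user. Note that pandas' `std` uses the sample standard deviation (ddof=1) by default.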

Feature extraction

Assuming that the step count comes in at hourly resolution, we can compute how the daily step count is distributed over the hours of the day. This distribution is helpful if, for example, we want to see at what hours a user is most active.

[5]:

f = tracker.tracker_step_distribution
step_distribution = tracker.extract_features_tracker(data, features={f: {}})
step_distribution

{<function tracker_step_distribution at 0x7a2e3a9e6fc0>: {}} {}

[5]:

	user	step_distribution	step_sum
2021-07-01 00:00:00	wiam9xme	0.000000	5616.0
2021-07-01 01:00:00	wiam9xme	0.000000	5616.0
2021-07-01 02:00:00	wiam9xme	0.000000	5616.0
2021-07-01 03:00:00	wiam9xme	0.000000	5616.0
2021-07-01 04:00:00	wiam9xme	0.000000	5616.0
...	...	...	...
2021-07-03 19:00:00	wiam9xme	0.025162	12002.0
2021-07-03 20:00:00	wiam9xme	0.001000	12002.0
2021-07-03 21:00:00	wiam9xme	0.029495	12002.0
2021-07-03 22:00:00	wiam9xme	0.000000	12002.0
2021-07-03 23:00:00	wiam9xme	0.000000	12002.0

72 rows × 3 columns
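The output above suggests that step_distribution is each hour's share of that day's total (step_sum). The following sketch reproduces that idea in plain pandas on made-up data. It is an assumption about what the feature represents, not niimpy's own code; the column names simply mirror the output.

```python
import pandas as pd

# Hypothetical hourly data for one user on one day (values are made up).
index = pd.date_range("2021-07-01", periods=24, freq=pd.Timedelta(hours=1))
steps = [0] * 8 + [500] * 4 + [250] * 8 + [0] * 4
toy = pd.DataFrame({"user": "u1", "steps": steps}, index=index)

# Total steps per user and calendar day, broadcast back to every hourly row.
day = toy.index.normalize()
toy["step_sum"] = toy.groupby(["user", day])["steps"].transform("sum")

# Share of the day's steps that falls in each hour.
toy["step_distribution"] = toy["steps"] / toy["step_sum"]

# Hour with the largest share, i.e. the most active hour of the day.
peak = toy["step_distribution"].idxmax()
print(peak, toy.loc[peak, "step_distribution"])
```

In this sketch, a day with zero total steps would give NaN shares because of the division by zero, so such days may need separate handling.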