Analysis

Social Signature

We can calculate a social signature from email activity. First, we load the example emails from the sample data.

[1]:

import os
import niimpy
from niimpy import config

path = os.path.join(config.GOOGLE_TAKEOUT_DIR, "Takeout", "Mail", "All mail Including Spam and Trash.mbox")
data = niimpy.reading.google_takeout.email_activity(path, sentiment=False)

/u/24/rantahj1/unix/src/niimpy/niimpy/reading/google_takeout.py:491: UserWarning: Could not parse message timestamp: 2023-12-15 12:19:43+00:00
  warnings.warn(f"Could not parse message timestamp: {received}")
/u/24/rantahj1/unix/src/niimpy/niimpy/reading/google_takeout.py:505: UserWarning: Failed to format received time: Sat, 15 DeNot a timec 2023 12:19:43 0000
  warnings.warn(f"Failed to format received time: {received}")

The email data contains “to” and “from” columns. The data is pseudonymized: email addresses are replaced by integer IDs, and the user is represented as ID 0.

[2]:

data[["from", "to"]]

[2]:

	from	to
timestamp
2023-12-15 12:19:43+00:00	0	[6]
2023-12-15 12:29:43+00:00	0	[2, 6]
2023-12-15 12:29:43+00:00	0	[2, 6]
2023-12-15 12:39:43+00:00	2	[0]
2023-12-15 12:39:43+00:00	2	[0]

[3]:

import niimpy.analysis.social_signature

niimpy.analysis.social_signature.social_signature(data)

[3]:

to
2    0.4
6    0.6
dtype: float64
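
The result above is consistent with taking, for each contact, the share of the user's outgoing messages addressed to that contact: user 0 sent three messages, which reached contact 6 three times and contact 2 twice, giving 0.6 and 0.4. Below is a minimal pandas sketch under that assumption. The `emails` frame is a hand-built copy of the table shown earlier, and the code is only an illustration, not niimpy's implementation.

```python
import pandas as pd

# Hand-built copy of the "from"/"to" table shown above (illustrative only).
emails = pd.DataFrame(
    {
        "from": [0, 0, 0, 2, 2],
        "to": [[6], [2, 6], [2, 6], [0], [0]],
    }
)

# Keep only messages sent by the user (ID 0), with one row per recipient.
outgoing = emails[emails["from"] == 0].explode("to")

# Share of the user's outgoing messages addressed to each contact.
signature = outgoing["to"].astype(int).value_counts(normalize=True).sort_index()
print(signature)
```

Messages received by the user (the two rows sent by contact 2) do not contribute in this sketch, which matches the values displayed above.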

Rhythms

The rhythm function in niimpy.analysis.rhythms calculates general rhythms from different types of input data. It:

Resamples the data into bins. For example, if the bin is 1 day, each row will represent a day.
Accumulates the binned data over a given period. For example, if the period is a week and the bin is a day, the first row will be the sum of the data for the first day of each week, the second row the sum for the second day of each week, and so on.
Calculates the fraction each bin represents within a third time span, the frequency. In the above example, if the frequency is 2 days, the data of the first two rows is scaled to sum to 1.
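
The three steps above can be sketched with plain pandas. This is a rough illustration only, assuming a 6-hour bin, a 1-day period and a 12-hour frequency. The toy series and all variable names are hypothetical, and the code does not reproduce niimpy's actual implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one value per hour over two days.
index = pd.date_range("2019-08-08", periods=48, freq="h")
toy = pd.Series(np.arange(48, dtype=float), index=index)

# Step 1: resample into 6-hour bins (4 bins per day).
binned = toy.resample("6h").sum()

# Step 2: accumulate over a 1-day period by summing bins that share
# the same starting hour of the day (0, 6, 12, 18).
accumulated = binned.groupby(binned.index.hour).sum()

# Step 3: scale within each 12-hour window so that the bins belonging
# to the same window sum to 1.
groups = np.asarray(accumulated.index) // 12
rhythm_sketch = accumulated / accumulated.groupby(groups).transform("sum")
print(rhythm_sketch)
```

With these settings the result has four rows: the first two (hours 0 and 6) sum to 1, and so do the last two (hours 12 and 18).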

[4]:

data = niimpy.read_csv(config.MULTIUSER_AWARE_CALLS_PATH, tz='Europe/Helsinki')
data = data[data["user"] == 'iGyXetHE3S8u']
data.head()

[4]:

	user	device	time	call_type	call_duration	datetime
2019-08-08 22:32:25.256999969+03:00	iGyXetHE3S8u	Cq9vueHh3zVs	1.565293e+09	incoming	1217	2019-08-08 22:32:25.256999969+03:00
2019-08-08 22:53:35.107000113+03:00	iGyXetHE3S8u	Cq9vueHh3zVs	1.565294e+09	incoming	383	2019-08-08 22:53:35.107000113+03:00
2019-08-08 22:31:34.539999962+03:00	iGyXetHE3S8u	Cq9vueHh3zVs	1.565293e+09	incoming	1142	2019-08-08 22:31:34.539999962+03:00
2019-08-08 22:43:45.834000111+03:00	iGyXetHE3S8u	Cq9vueHh3zVs	1.565293e+09	incoming	1170	2019-08-08 22:43:45.834000111+03:00
2019-08-08 22:55:33.053999901+03:00	iGyXetHE3S8u	Cq9vueHh3zVs	1.565294e+09	incoming	497	2019-08-08 22:55:33.053999901+03:00

[5]:

from niimpy.analysis.rhythms import rhythm

duration_rhythm = rhythm(data, cols=["call_duration"], period="1D", freq="12h", bin="1h")
duration_rhythm.plot()

[5]:

<Axes: >

../../_images/user_guide_Analysis_8_1.png