
Loading the dataset
First, download the dataset by following the steps in the Technical requirements section. Gmail (https://takeout.google.com/settings/takeout) provides data in mbox format. For this chapter, I loaded my own personal email from Google Mail. For privacy reasons, I cannot share the dataset. However, I will show you different EDA operations that you can perform to analyze several aspects of your email behavior:
1. Let's load the required libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
2. When you have loaded the libraries, load the dataset:
import mailbox
mboxfile = "PATH TO DOWNLOADED MBOX FILE"
mbox = mailbox.mbox(mboxfile)
mbox
Note that you must replace the placeholder with the path to your own mbox file.
The output of the preceding code is as follows:
<mailbox.mbox at 0x7f124763f5c0>
The output indicates that the mailbox object has been successfully created.
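By default, mailbox.mbox creates an empty file when the given path does not exist, so the object's representation alone does not confirm that your emails were found. The following is a minimal sketch of a stricter check, not part of the original steps. The helper name open_mbox and the throwaway sample file are assumptions made only so that the example runs on its own; in practice you would pass your own path:

```python
import mailbox
import os
import tempfile


def open_mbox(path):
    """Hypothetical helper: open an existing mbox file without creating one."""
    # create=False makes mailbox raise NoSuchMailboxError instead of
    # silently creating an empty file when the path is wrong.
    return mailbox.mbox(path, create=False)


# Build a tiny sample mbox (made-up data) so the sketch runs without real email.
tmpdir = tempfile.mkdtemp()
sample_path = os.path.join(tmpdir, "sample.mbox")
sample = mailbox.mbox(sample_path)
sample.add("Subject: Hello\nFrom: alice@example.com\n\nTest body\n")
sample.flush()
sample.close()

# Open the existing file and count the messages it contains.
mbox_checked = open_mbox(sample_path)
message_count = len(mbox_checked)
mbox_checked.close()
print(f"Loaded {message_count} message(s)")

# A wrong path now fails loudly instead of yielding an empty mailbox.
try:
    open_mbox(os.path.join(tmpdir, "missing.mbox"))
    missing_detected = False
except mailbox.NoSuchMailboxError:
    missing_detected = True
print("Missing file detected:", missing_detected)
```

Checking len() on the mailbox is a quick way to confirm that the file you pointed to actually contains messages before moving on.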
3. Next, let's see the list of available keys:
for key in mbox[0].keys():
    print(key)
The output of the preceding code is as follows:
X-GM-THRID
X-Gmail-Labels
Delivered-To
Received
X-Google-Smtp-Source
X-Received
ARC-Seal
ARC-Message-Signature
ARC-Authentication-Results
Return-Path
Received
Received-SPF
Authentication-Results
DKIM-Signature
DKIM-Signature
Subject
From
To
Reply-To
Date
MIME-Version
Content-Type
X-Mailer
X-Complaints-To
X-Feedback-ID
List-Unsubscribe
Message-ID
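The preceding output shows the header keys that are present in the first message of the mailbox. Some keys, such as Received and DKIM-Signature, appear more than once because an email can carry the same header several times.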
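The keys listed above are header names, and each one maps to a value that you can read from the message. The following is a minimal sketch of doing so, using a made-up sample message rather than real data (the addresses, dates, and file name are assumptions for illustration). Indexing a message by key returns the first matching header, while get_all returns every occurrence of a repeated header:

```python
import mailbox
import os
import tempfile
from collections import Counter

# Made-up sample message (an assumption for illustration, not real data).
raw = (
    "Received: by mail.example.com; Mon, 1 Jan 2024 10:00:00 +0000\n"
    "Received: by relay.example.com; Mon, 1 Jan 2024 09:59:58 +0000\n"
    "Subject: Hello\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Date: Mon, 1 Jan 2024 09:59:50 +0000\n"
    "\n"
    "Test body\n"
)

# Write the sample into a temporary mbox file and read it back.
path = os.path.join(tempfile.mkdtemp(), "sample.mbox")
box = mailbox.mbox(path)
box.add(raw)
box.flush()

first = box[0]

# Indexing by key returns the first header with that name.
subject = first["Subject"]

# get_all returns a list with every occurrence of a repeated header.
received_all = first.get_all("Received")

# Count how often each key appears to spot repeated headers.
key_counts = Counter(first.keys())
box.close()

print("Subject:", subject)
print("Number of Received headers:", len(received_all))
print("Repeated keys:", [k for k, n in key_counts.items() if n > 1])
```

With your own mailbox, you would replace box[0] with mbox[0] and inspect whichever keys are relevant to your analysis, such as Subject, From, To, and Date.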