Hands-On Exploratory Data Analysis with Python


Making sense of data

It is crucial to identify the type of data under analysis. In this section, we are going to learn about the different types of data that you can encounter during analysis. Different disciplines store different kinds of data for different purposes. For example, medical researchers store patients' data, universities store students' and teachers' data, and real estate industries store house and building datasets. A dataset contains many observations about a particular object. For instance, a dataset about patients in a hospital can contain many observations. A patient can be described by a patient identifier (ID), name, address, weight, date of birth, email, and gender. Each of these features that describes a patient is a variable. Each observation can have a specific value for each of these variables. For example, a patient can have the following:

PATIENT_ID = 1001
Name = Yoshmi Mukhiya
Address = Mannsverk 61, 5094, Bergen, Norway
Date of birth = 10th July 2018
Email = yoshmimukhiya@gmail.com
Weight = 10 
Gender = Female
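
As a minimal sketch of the same idea in Python, a single observation can be held in a dictionary whose keys are the variables and whose values are the values recorded for this patient. The dictionary name `patient` and the choice to keep the date of birth as plain text are assumptions made for illustration only.

```python
# One observation stored as a dictionary: each key is a variable and each
# value is what was recorded for this particular patient.
patient = {
    "PATIENT_ID": 1001,
    "Name": "Yoshmi Mukhiya",
    "Address": "Mannsverk 61, 5094, Bergen, Norway",
    "Date of birth": "10th July 2018",  # kept as text here, not parsed into a date
    "Email": "yoshmimukhiya@gmail.com",
    "Weight": 10,
    "Gender": "Female",
}

# The variables are simply the keys of the dictionary.
variables = list(patient.keys())
print("Number of variables:", len(variables))

# Show each variable together with its value and the Python type of that value.
for name, value in patient.items():
    print(f"{name}: {value!r} ({type(value).__name__})")
```

Printing the Python type next to each value gives a first hint that some variables hold numbers while others hold text.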

These datasets are stored in hospitals and are presented for analysis. Most of this data is stored as tables in some sort of database management system. An example of a table for storing patient information is shown here:

            
PATIENT_ID  NAME                  ADDRESS                     DOB         EMAIL                    GENDER  WEIGHT
001         Suresh Kumar Mukhiya  Mannsverk, 61               30.12.1989  skmu@hvl.no              Male    68
002         Yoshmi Mukhiya        Mannsverk 61, 5094, Bergen  10.07.2018  yoshmimukhiya@gmail.com  Female  1
003         Anju Mukhiya          Mannsverk 61, 5094, Bergen  10.12.1997  anjumukhiya@gmail.com    Female  24
004         Asha Gaire            Butwal, Nepal               30.11.1990  aasha.gaire@gmail.com    Female  23
005         Ola Nordmann          Danmark, Sweden             12.12.1789  ola@gmail.com            Male    75

To summarize the preceding table, there are five observations (001, 002, 003, 004, 005). Each observation is described by seven variables (patient ID, name, address, date of birth, email, gender, and weight). Most datasets broadly fall into two groups: numerical data and categorical data.
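
Counting the observations and variables of the patient table can be sketched in a few lines of Python. This is an illustrative example using only the standard library `csv` module; the semicolon-delimited text and the names `RAW`, `rows`, `n_observations`, and `variables` are assumptions made here, not part of the original dataset. In practice, a data analysis library such as pandas would typically be used for this step.

```python
import csv
import io

# The patient table written as delimited text. A semicolon is used as the
# delimiter (a choice made for this sketch) because the addresses contain commas.
RAW = """PATIENT_ID;NAME;ADDRESS;DOB;EMAIL;GENDER;WEIGHT
001;Suresh Kumar Mukhiya;Mannsverk, 61;30.12.1989;skmu@hvl.no;Male;68
002;Yoshmi Mukhiya;Mannsverk 61, 5094, Bergen;10.07.2018;yoshmimukhiya@gmail.com;Female;1
003;Anju Mukhiya;Mannsverk 61, 5094, Bergen;10.12.1997;anjumukhiya@gmail.com;Female;24
004;Asha Gaire;Butwal, Nepal;30.11.1990;aasha.gaire@gmail.com;Female;23
005;Ola Nordmann;Danmark, Sweden;12.12.1789;ola@gmail.com;Male;75
"""

# Each row becomes one dictionary: one observation, keyed by variable name.
rows = list(csv.DictReader(io.StringIO(RAW), delimiter=";"))

n_observations = len(rows)          # number of rows in the table
variables = list(rows[0].keys())    # column names taken from the header line

print("Observations:", n_observations)
print("Variables:", len(variables), variables)
```

Note that `csv.DictReader` reads every value as text, so a column such as `WEIGHT` would still need to be converted to a number before any numerical analysis.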