Hands-On Exploratory Data Analysis with Python
Getting started with EDA

As mentioned earlier, we are going to use Python as the main tool for data analysis. Yay! Why Python? It has been consistently ranked among the top 10 programming languages and is widely adopted by data science experts for data analysis and data mining. In this book, we assume you have a working knowledge of Python; if you are not familiar with it, it's probably too early to get started with data analysis. We also assume you are familiar with the following Python tools and packages:


        

Python programming: fundamental concepts of variables, strings, and data types; conditionals and functions; sequences, collections, and iterations; working with files; object-oriented programming

NumPy: creating, copying, and dividing arrays; performing different operations on NumPy arrays; array selections, advanced indexing, and expanding; working with multi-dimensional arrays; linear algebraic functions and built-in NumPy functions

pandas: understanding and creating DataFrame objects; subsetting and indexing data; arithmetic functions and mapping with pandas; managing the index; building style for visual analysis

Matplotlib: loading linear datasets; adjusting axes, grids, labels, titles, and legends; saving plots

SciPy: importing the package; using statistical packages from SciPy; performing descriptive statistics; inference and data analysis

Before diving into details about analysis, we need to make sure we are on the same page. Let's go through the checklist and verify that you meet all of the prerequisites to get the best out of this book:

Next, let's look at the basic operations of EDA using the NumPy library.
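Before moving on, it may help to confirm that the packages named in the prerequisites are actually installed. The following is only a minimal sketch, not a checklist from the book: the helper name check_prerequisites and the package list are assumptions made for illustration, and the script relies on the standard library's importlib.metadata module (Python 3.8 or later).

```python
from importlib import metadata

# Package names as published on PyPI. This list mirrors the libraries named
# in the prerequisites and is an assumption of this sketch.
REQUIRED_PACKAGES = ["numpy", "pandas", "matplotlib", "scipy"]


def check_prerequisites(packages=REQUIRED_PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_prerequisites().items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name:<12} {status}")
```

Any package reported as NOT INSTALLED can usually be added with pip install followed by the package name. The script only checks that the packages are present; it says nothing about familiarity with the topics listed above.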