
Getting started with EDA
As mentioned earlier, we are going to use Python as the main tool for data analysis. Yay! Why Python? It has been consistently ranked among the top 10 programming languages and is widely adopted for data analysis and data mining by data science experts. In this book, we assume you have a working knowledge of Python. If you are not familiar with Python, it's probably too early to get started with data analysis. Specifically, we assume you are familiar with the following Python tools and packages:
Python programming | Fundamental concepts of variables, strings, and data types; conditionals and functions; sequences, collections, and iterations; working with files; object-oriented programming |
NumPy | Create arrays with NumPy, copy arrays, and divide arrays; perform different operations on NumPy arrays; understand array selections, advanced indexing, and expanding; working with multi-dimensional arrays; linear algebraic functions and built-in NumPy functions |
pandas | Understand and create DataFrame objects; subsetting data and indexing data; arithmetic functions and mapping with pandas; managing index; building style for visual analysis |
Matplotlib | Loading linear datasets; adjusting axes, grids, labels, titles, and legends; saving plots |
SciPy | Importing the package; using statistical packages from SciPy; performing descriptive statistics; inference and data analysis |
Before diving into details about analysis, we need to make sure we are on the same page. Let's go through the checklist and verify that you meet all of the prerequisites to get the best out of this book:
Next, let's look at the basic operations of EDA using the NumPy library.
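Before moving on, it can help to confirm that the packages named in the prerequisites are actually available in your environment. The following is a minimal sketch, not part of the book's own checklist: the function name check_packages and the REQUIRED list are hypothetical, and the package names are assumed to be the usual PyPI names for NumPy, pandas, Matplotlib, and SciPy. It uses only the standard library, so it runs even when none of those packages are installed.

```python
import importlib.util
from importlib import metadata

# Assumed PyPI names for the tools listed in the prerequisites
# (an assumption of this sketch, not a list taken from the book).
REQUIRED = ["numpy", "pandas", "matplotlib", "scipy"]


def check_packages(names):
    """Map each package name to its installed version, or None if it is absent."""
    report = {}
    for name in names:
        # find_spec returns None when the top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name:<12}{status}")
```

Any package reported as NOT INSTALLED would need to be installed (for example, with pip) before following along with the examples.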