Data Analysis with Python
In this article, we will discuss how to do data analysis with Python. We will cover several kinds of analysis: analyzing numerical data with NumPy, tabular data with Pandas, data visualization with Matplotlib, and exploratory data analysis.
Data Analysis With Python
Data analysis is the technique of collecting, transforming, and organizing data to make predictions and informed, data-driven decisions. It also helps to find possible solutions to a business problem. There are six steps in data analysis:
- Ask or Specify Data Requirements
- Prepare or Collect Data
- Clean and Process
- Analyze
- Share
- Act or Report
Note: To learn more about these steps, refer to our Six Steps of Data Analysis Process tutorial.
Analyzing Numerical Data with NumPy
NumPy is an array processing package in Python and provides a high-performance multidimensional array object and tools for working with these arrays. It is the fundamental package for scientific computing with Python.
Arrays in NumPy
A NumPy array is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, the number of dimensions of the array is called the rank of the array (exposed as the ndim attribute). A tuple of integers giving the size of the array along each dimension is known as the shape of the array.
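A minimal illustration of these terms, using a small array chosen only for this example: NumPy reports the rank as the ndim attribute, the shape as the shape attribute, and the single element type shared by all entries as dtype.

```python
import numpy as np

# A 2-D array: 2 rows and 3 columns, all elements share one dtype
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.ndim)   # rank (number of dimensions): 2
print(arr.shape)  # size along each dimension: (2, 3)
print(arr.dtype)  # common element type, e.g. int64 on most 64-bit platforms

# One non-negative integer index per dimension: row 1, column 2
print(arr[1, 2])  # 6
```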
Creating NumPy Array
NumPy arrays can be created in multiple ways and with various ranks. They can also be created from different Python sequence types such as lists and tuples. The data type of the resulting array is deduced from the types of the elements in the sequences. NumPy also offers several functions to create arrays with initial placeholder content. These minimize the need to grow arrays, which is an expensive operation.
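A minimal sketch of these creation methods, with values chosen only for illustration: arrays built from a list, a tuple and a nested list, followed by some of NumPy's placeholder functions (zeros, ones, full, arange, linspace).

```python
import numpy as np

# From a Python list: all-integer elements give an integer dtype
from_list = np.array([1, 2, 4])

# From a tuple: one float element makes the whole array float
from_tuple = np.array((1, 3, 2.5))

# A rank-2 array from a nested list
from_nested = np.array([[1, 2, 3], [4, 5, 6]])

# Functions that create arrays with placeholder content
zeros = np.zeros((3, 4))           # 3x4 array of 0.0
ones = np.ones((2, 2), dtype=int)  # 2x2 array of integer 1s
full = np.full((2, 3), 7)          # 2x3 array filled with 7
seq = np.arange(0, 30, 5)          # 0, 5, ..., 25 (stop is exclusive)
grid = np.linspace(0, 1, 5)        # 5 evenly spaced values from 0 to 1

print(from_list, from_tuple, sep="\n")
print(from_nested.shape, zeros.shape, seq)
```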
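Empty Matrix using NumPy
NumPy's empty() function allocates an array without initializing its entries, so its contents are arbitrary; the zeros in the output below are common but not guaranteed.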
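A sketch of calls consistent with the printed matrices: the shapes and dtypes are read off the output, while the exact original code is an assumption. Only the shape and dtype of each result are guaranteed.

```python
import numpy as np

# empty() allocates without initializing: only shape and dtype are fixed
b = np.empty(2, dtype=int)        # 1-D, two integers
a = np.empty([2, 2], dtype=int)   # 2x2 integers
c = np.empty([3, 3])              # 3x3, default float64

print("Matrix b : \n", b)
print("\nMatrix a : \n", a)
print("\nMatrix c : \n", c)

# When zeros are actually required, request them explicitly
c_zeros = np.zeros([3, 3])
```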
Output:
Matrix b :
[0 0]
Matrix a :
[[0 0]
[0 0]]
Matrix c :
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
Operations on Numpy Arrays
Arithmetic Operations
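A sketch of the addition example. The arrays a, b and c are inferred from the printed results, since they are the only values consistent with the sums, products and quotients in this section; the original code may differ. Note the last call: the third positional argument of np.add is the out array, not a third operand, so np.add(a, b, c) stores a + b in c and returns it. That would explain why the last printed line equals a + b rather than the three-way sum.

```python
import numpy as np

# Values inferred from the printed output (assumption)
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])
c = np.array([1, 2, 3, 4])

# Element-wise addition: the operator and the ufunc agree
print(a + b)
print(np.add(a, b))

# Adding more than two arrays: chain the operator
print(a + b + c)

# Pitfall: the third positional argument is `out`, so this writes
# a + b into c (overwriting it) instead of adding c
print(np.add(a, b, c))
```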
Output:
[ 7 77 23 130]
[ 7 77 23 130]
[ 8 79 26 134]
[ 7 77 23 130]
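Subtraction works the same way. A minimal sketch, reusing the values of a and b inferred from the surrounding outputs (an assumption):

```python
import numpy as np

# Values inferred from the other printed outputs (assumption)
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Element-wise subtraction with the operator and with the ufunc
diff = a - b
print(diff)               # [ 3 67  3 70]
print(np.subtract(a, b))  # same values
```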
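Multiplication with * or np.multiply is element-wise, not a matrix product. A minimal sketch with the same inferred a and b (an assumption), which yields the products shown in the output:

```python
import numpy as np

# Values inferred from the printed outputs (assumption)
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Element-wise product; use a @ b or np.dot(a, b) for a dot product
prod = a * b
print(prod)
print(np.multiply(a, b))
```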
Output:
[ 10 360 130 3000]
[ 10 360 130 3000]
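Division follows the same pattern. In this sketch (same inferred a and b, an assumption), / performs true division and returns floats even for integer inputs; the floor-division line with // is an addition for comparison.

```python
import numpy as np

# Values inferred from the printed outputs (assumption)
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# True division: the result is always a float array
quot = a / b
print(quot)
print(np.divide(a, b))

# Floor division for comparison: each quotient rounded down
floor_quot = a // b
print(floor_quot)  # [ 2 14  1  3]
```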
Output:
[ 2.5 14.4 1.3 3.33333333]
[ 2.5 14.4 1.3 3.33333333]
NumPy Array Indexing
Indexing can be done in NumPy by using an array as an index. Slicing returns a view of the array, which shares memory with the original, whereas indexing with an index array returns a copy of the original data. NumPy arrays can be indexed with other arrays or with any other sequence, except tuples: a tuple is interpreted as one index per dimension rather than as a list of positions. The last element is indexed by -1, the second last by -2, and so on.
Python NumPy Array Indexing
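A sketch reconstructed from the printed output: np.arange(10, 1, -2) yields [10 8 6 4 2], and the index array [3, 1, 2] is the one that picks out [4 8 6]; the original code may differ. The last lines illustrate negative indices and the copy semantics of index arrays.

```python
import numpy as np

# From 10 down to (but excluding) 1 in steps of -2
a = np.arange(10, 1, -2)
print("A sequential array with a negative step:", a)

# Index array: picks the elements at positions 3, 1 and 2
newarr = a[np.array([3, 1, 2])]
print("Elements at these indices are:", newarr)

# Negative indices count from the end
print(a[-1], a[-2])  # 2 4

# An index array returns a copy: changing it leaves `a` untouched
newarr[0] = 99
print(a)
```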
Output:
A sequential array with a negative step:
[10 8 6 4 2]
Elements at these indices are:
[4 8 6]
NumPy Array Slicing
Consider the syntax x[obj], where x is the array and obj is the index. In basic slicing, obj is a slice object or is built from slice objects and integers. Basic slicing occurs when obj is:
- a slice object of the form start:stop:step
- an integer
- or a tuple of slice objects and integers
Arrays generated by basic slicing are always views of the original array: they share its data, so modifying a slice modifies the original.
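A sketch consistent with the printed output, assuming the array was built with np.arange(20). The last lines show that a basic slice is a view, so writing through it changes the original array.

```python
import numpy as np

a = np.arange(20)
print("Array is:", a)

# start=-8 resolves to index 12 (20 - 8); stop=17 is exclusive
print("a[-8:17:1] =", a[-8:17:1])  # [12 13 14 15 16]

# An omitted stop means "to the end": elements 10 through 19
print("a[10:] =", a[10:])

# Basic slices are views: writing through one changes the original
view = a[10:]
view[0] = -1
print(a[10])  # -1
```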
Output:
Array is:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
a[-8:17:1] = [12 13 14 15 16]
a[10:] = [10 11 12 13 14 15 16 17 18 19]
Ellipsis can also be used along with basic slicing. The ellipsis (...) expands to the number of : objects needed to make a selection tuple with the same length as the number of dimensions of the array.
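A minimal sketch with a hypothetical 3-D array of shape (2, 2, 3): b[..., 1] is shorthand for b[:, :, 1], and b[1, ...] for b[1, :, :].

```python
import numpy as np

# A 3-D array with shape (2, 2, 3), values chosen for illustration
b = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])

# `...` fills in the missing `:` objects: same as b[:, :, 1]
print(b[..., 1])   # [[ 2  5]
                   #  [ 8 11]]

# The ellipsis can also come last: same as b[1, :, :]
print(b[1, ...])
```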
NumPy Array Broadcasting
The term broadcasting refers to how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is broadcast across the larger array so that they have compatible shapes: comparing the shapes from the trailing dimension backwards, each pair of dimensions must either be equal or have one of them equal to 1 (a dimension missing from the shorter shape counts as 1).
Let’s assume that we have a large data set in which each datum is a list of parameters. In NumPy, this is a 2-D array where each row is a datum and the number of rows is the size of the data set. Suppose we want to apply some sort of scaling to all of this data, where every parameter gets its own scaling factor; in other words, every parameter is multiplied by some factor.
To make this concrete, let’s count calories in foods using a macro-nutrient breakdown. Roughly put, the caloric parts of food are fats (9 calories per gram), protein (4 calories per gram), and carbs (4 calories per gram). So if we list some foods (our data) and, for each food, list its macro-nutrient breakdown (parameters), we can multiply each nutrient by its caloric value (apply scaling) to compute the caloric breakdown of every food item.
With this transformation, we can compute all kinds of useful information: for example, the total number of calories in a food or, given a breakdown of my dinner, how many calories I got from protein.
Let’s see a naive way of producing this computation with NumPy:
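The following is a sketch under stated assumptions: columns are ordered fat, protein, carbs as in the paragraph above, the factors 9, 4 and 4 calories per gram come from that paragraph, and the gram values in macros are hypothetical. It shows the naive row-by-row loop first, then the single broadcast expression that replaces it.

```python
import numpy as np

# Hypothetical grams of macro-nutrients per food item (illustrative values).
# Rows are foods; columns are fat, protein, carbs.
macros = np.array([
    [10.0, 5.0, 20.0],
    [0.5, 1.0, 25.0],
    [3.0, 25.0, 0.0],
    [0.0, 0.0, 12.0],
])

# Calories per gram for fat, protein, carbs (from the text above)
cal_per_macro = np.array([9, 4, 4])

# Naive approach: scale each row with an explicit Python loop
result = np.zeros_like(macros)
for i in range(macros.shape[0]):
    result[i, :] = macros[i, :] * cal_per_macro

# Broadcasting: (4, 3) * (3,) applies the factors to every row at once
broadcast_result = macros * cal_per_macro
print(np.allclose(result, broadcast_result))  # True

# Total calories per food item, and calories from protein per item
totals = broadcast_result.sum(axis=1)
protein_cals = broadcast_result[:, 1]
print(totals)
print(protein_cals)
```

Because cal_per_macro has shape (3,) and macros has shape (4, 3), the trailing dimensions match, so NumPy stretches the factors across all four rows without copying them into a 4x3 array.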
Python Exploratory Data Analysis Tutorial
Learn the basics of Exploratory Data Analysis (EDA) in Python with Pandas, Matplotlib and NumPy, such as sampling, feature engineering, correlation, etc.
As you will know by now, the Python library Pandas is used for data manipulation. For those who are just starting out, this might imply that the package is only handy when preprocessing data, but that is far from true: Pandas is also great for exploring your data and for storing it once you are done preprocessing.
Additionally, for those who have been following DataCamp’s Python tutorials or who have already been introduced to the basics of SciPy, NumPy, Matplotlib and Pandas, it might be a good idea to recap some of the knowledge that you have built up.
Today’s tutorial will actually introduce you to some ways to explore your data efficiently with all the above packages so that you can start modeling your data: