Read excel file using python

Python Read Excel: Different ways to read an Excel file using Python


An Excel file is a spreadsheet file that holds data in cells arranged in rows and columns (a tabular view) and helps with arranging, calculating, sorting, and managing data. The data in a spreadsheet may be numbers, text, formulas, hyperlinks, functions, etc. An .xls file is the older Excel format and stores data as binary streams, while an .xlsx file is the newer format, which stores data as zipped XML. Both can be created with MS Excel, the most popular spreadsheet program, or with other spreadsheet programs.

The following image depicts an Excel file created with MS Excel:

Excel File By MS Excel

How to read Excel files using Python

To read Excel files using Python, we need to use some popular Python modules and their methods. Let’s go through them.

Using Python xlrd module

xlrd is a Python library for reading information from Excel files. Be aware of its format support: since version 2.0.0, xlrd reads only the older .xls format, and only earlier releases (up to 1.2.0) can open .xlsx files.

Let’s have a quick look at how to install the xlrd module.

C:\Users\pc> pip install xlrd

Since you are using Python, you should already have the pip package installer. You can also use another Python package manager of your choice.


Installing Xlrd Module

In this method, we use the xlwings module, a separate package from xlrd (installed with pip install xlwings), along with its xlwings.Book() method.

This method opens our .xlsx file in its original program (i.e. MS Excel, which must be installed), where we can operate on and manage our data.

# importing the xlwings module as xw
import xlwings as xw

# the Excel file is opened in MS Excel when this line runs
ws = xw.Book("C:\\Users\\pc\\Desktop\\students.xlsx").sheets['Sheet1']

After running the above code snippet, the Excel file opens automatically on our desktop, where we can access it through the ws object.

Using Python pandas module

Pandas is an open-source Python library that provides high-performance data structures and data analysis tools. It is commonly used for data analysis together with two other core Python libraries: Matplotlib for data visualization and NumPy for mathematical operations.

We install this module the same way as the previous one, using the pip installer as follows.

C:\Users\pc> pip install pandas

The above command installs the pandas module, as shown below.

Installing Pandas Module

To read excel files, let’s run the following snippet of code.

# importing pandas module as pd
import pandas as pd

# using read_excel() method to read our excel file and storing the same in the variable named "df"
df = pd.read_excel("C:\\Users\\pc\\Desktop\\students.xlsx")

# printing our spreadsheet using print() method
print(df)

In the above snippet, we use the read_excel() method to read our .xlsx file. Called as pd.read_excel() (pd being the alias given to pandas on import), it reads the Excel file data into a DataFrame object (here, df).

The above code snippet will print our spreadsheet as follows.

Read Excel Method
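As a minimal sketch of how read_excel() can be narrowed down, the example below reads a single named sheet and only two of its columns. The sample data, the column names (Name, Roll, Marks) and the temporary file are assumptions made for illustration, standing in for students.xlsx. sheet_name and usecols are standard read_excel() parameters, and pandas needs openpyxl installed to handle .xlsx files.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for students.xlsx
sample = pd.DataFrame(
    {"Name": ["Asha", "Ben"], "Roll": [1, 2], "Marks": [88, 92]}
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "students.xlsx")
    # Write the sample workbook first (pandas uses openpyxl for .xlsx)
    sample.to_excel(path, sheet_name="Sheet1", index=False)

    # Read one named sheet and keep only two of its columns
    df = pd.read_excel(path, sheet_name="Sheet1", usecols=["Name", "Marks"])

print(df)
```

Limiting the columns at read time keeps the DataFrame small when a sheet has many columns that are not needed.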

Using Python openpyxl module

Openpyxl is a Python library used to read from or write to an Excel file. It must be installed before we can use methods such as load_workbook(); otherwise, importing it throws an error. Let’s install this module using our command prompt.

C:\Users\pc> pip install openpyxl

The above command installs the openpyxl module, as shown below.

Installing Openpyxl Module

In this method, we use the openpyxl module along with its load_workbook() method, as in the following code snippet.

# importing openpyxl module
import openpyxl

# using load_workbook() method to read our excel file and storing it in the workbook object table1
table1 = openpyxl.load_workbook("C:\\Users\\pc\\Desktop\\students.xlsx")

# To access the data we need the active worksheet, stored in another object (here it is table2)
table2 = table1.active

# each column is indexed from 0, so starting at 1 skips the first row of the sheet
for row in range(1, table2.max_row):
    for col in table2.iter_cols(1, table2.max_column):
        print(col[row].value, end = " ")
    print("\n")

In the above code snippet, we use the load_workbook() method of the openpyxl module to read our Excel file; the method is not available until the module has been imported. The file location (here “C:\Users\pc\Desktop\students.xlsx”) is passed to load_workbook() as its parameter.

After reading our Excel file and assigning it to table1, we still need to select its active worksheet. Otherwise, if we print table1 directly, we only get the following output.

Printing Table1

We access table2 with a for loop, as in the above code snippet, and get our results as follows.

Output By Method 2
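The same traversal can also be written with iter_rows(), which walks the sheet row by row. This is a sketch under assumptions: the workbook is built in memory with made-up data (Name, Marks) rather than loaded from students.xlsx, so the names and values are hypothetical.

```python
from io import BytesIO

import openpyxl

# Build a small hypothetical workbook in memory instead of reading students.xlsx
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Name", "Marks"])
ws.append(["Asha", 88])
ws.append(["Ben", 92])

buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# load_workbook() accepts a file path or a file-like object
table1 = openpyxl.load_workbook(buffer)
table2 = table1.active

# min_row=2 skips the first row; values_only=True yields plain values, not Cell objects
rows = list(table2.iter_rows(min_row=2, values_only=True))
for row in rows:
    print(*row)
```

With values_only=True each row arrives as a tuple of values, so no index arithmetic on columns is needed.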

Conclusion

In this article, we covered different methods to read an Excel file using Python. We discussed some popular modules along with the methods they provide to get the output we need. We hope you have practiced and enjoyed our code snippets. Do visit again for more exciting topics.


How to read excel (xlsx) file in python

The .xlsx extension belongs to Excel documents, which can store a large amount of data in tabular form and make many types of arithmetic and logical calculation easy. Sometimes a Python script needs to read the data from an Excel document. Many Python modules can read Excel documents; some useful ones are xlrd, openpyxl, and pandas. This tutorial shows how to use these modules to read an Excel file in Python.

Pre-requisite:

A dummy Excel file with the .xlsx extension is required to try the examples of this tutorial. You can use any existing Excel file or create a new one. Here, a new Excel file named sales.xlsx has been created with the following data. This file is read with different Python modules in the next part of this tutorial.

Sales Date    Sales Person      Amount
12/05/18      Sila Ahmed        60000
06/12/19      Mir Hossain       50000
09/08/20      Sarmin Jahan      45000
07/04/21      Mahmudul Hasan    30000

Example-1: Read excel file using xlrd

The xlrd module is not installed with Python by default, so you have to install it before using it. The latest version of this module does not support Excel files with the .xlsx extension, so you have to install version 1.2.0 to read the xlsx file. Run the following command from the terminal to install the required version of xlrd.

$ pip install xlrd==1.2.0

After completing the installation, create a Python file with the following script to read the sales.xlsx file using the xlrd module. The open_workbook() function is used in the script to open the xlsx file for reading. This Excel file contains only one sheet, so the workbook.sheet_by_index() function is called with the argument value 0. Next, a nested ‘for’ loop reads the cell values of the worksheet using the row and column indexes. Two range() calls define the row and column counts based on the sheet data (5 rows, including the header, and 3 columns). The cell_value() function reads a particular cell value of the sheet in each iteration of the loop. Each field in the output is separated by one tab.

# Import the xlrd module
import xlrd

# Open the Workbook
workbook = xlrd.open_workbook("sales.xlsx")

# Open the worksheet
worksheet = workbook.sheet_by_index(0)

# Iterate the rows and columns
for i in range(0, 5):
    for j in range(0, 3):
        # Print the cell values with tab space
        print(worksheet.cell_value(i, j), end='\t')
    print('')

Output:

The following output will appear after executing the above script.
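The hard-coded range(0, 5) and range(0, 3) in the script above match this particular sheet only. A possible variation, sketched below, takes the sizes from the sheet's nrows and ncols attributes instead. The helper name sheet_rows is hypothetical, and the file is read only if sales.xlsx exists in the current directory, on the assumption that xlrd 1.2.0 is installed.

```python
import os


def sheet_rows(worksheet):
    """Return every row of an xlrd-style sheet as a list of cell values."""
    return [
        [worksheet.cell_value(i, j) for j in range(worksheet.ncols)]
        for i in range(worksheet.nrows)
    ]


if os.path.exists("sales.xlsx"):
    # Imported here so the helper above can be reused without xlrd installed
    import xlrd  # assumes xlrd 1.2.0, which can still read .xlsx

    worksheet = xlrd.open_workbook("sales.xlsx").sheet_by_index(0)
    for row in sheet_rows(worksheet):
        # One tab between fields, as in the original script
        print(*row, sep="\t")
```

Reading the dimensions from the sheet means the script keeps working when rows or columns are added to the file.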

Example-2: Read excel file using openpyxl

openpyxl is another Python module for reading xlsx files, and it is also not installed with Python by default. Run the following command from the terminal to install this module before using it.

$ pip install openpyxl

After completing the installation, create a Python file with the following script to read the sales.xlsx file. Where xlrd has open_workbook(), the openpyxl module has the load_workbook() function to open the xlsx file for reading, and sales.xlsx is passed as its argument. The active worksheet is taken from the workbook's active property, and its max_row and max_column properties are used in the nested for loops to read the content of the sales.xlsx file. The range() function iterates over the rows of the sheet, and the iter_cols() function iterates over the columns. Each field in the output is separated by two tabs.

# Import openpyxl module
import openpyxl

# Define variable to load the workbook
workbook = openpyxl.load_workbook("sales.xlsx")

# Define variable to read the active sheet:
worksheet = workbook.active

# Iterate the loop to read the cell values
for i in range(0, worksheet.max_row):
    for col in worksheet.iter_cols(1, worksheet.max_column):
        print(col[i].value, end="\t\t")
    print('')

Output:

The following output will appear after executing the above script.
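When the first row holds column titles, each data row can also be turned into a dictionary keyed by those titles. The sketch below is one way to do that with iter_rows(); it rebuilds the first rows of the sales data in memory instead of opening sales.xlsx, and the names source, buffer and records are assumptions made for illustration.

```python
from io import BytesIO

import openpyxl

# Rebuild part of the tutorial's sales data in memory (stands in for sales.xlsx)
source = openpyxl.Workbook()
sheet = source.active
sheet.append(["Sales Date", "Sales Person", "Amount"])
sheet.append(["12/05/18", "Sila Ahmed", 60000])
sheet.append(["06/12/19", "Mir Hossain", 50000])

buffer = BytesIO()
source.save(buffer)
buffer.seek(0)

workbook = openpyxl.load_workbook(buffer)
worksheet = workbook.active

# The first row supplies the keys; every later row becomes one record
rows = worksheet.iter_rows(values_only=True)
header = next(rows)
records = [dict(zip(header, row)) for row in rows]

for record in records:
    print(record["Sales Person"], record["Amount"])
```

Looking values up by column title avoids relying on column positions, which can shift when the sheet layout changes.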

Example-3: Read excel file using pandas

Like the previous modules, pandas is not installed with Python by default. So, if you have not installed it before, you have to install it now. Run the following command from the terminal to install pandas.

$ pip install pandas

After completing the installation, create a Python file with the following script to read the sales.xlsx file. The read_excel() function of pandas reads the xlsx file and already returns its content as a data frame. The DataFrame() function is then used to select the listed columns from that data frame and store them in the variable named data. The value of data is printed at the end.

# Import pandas
import pandas as pd

# Load the xlsx file
excel_data = pd.read_excel('sales.xlsx')
# Read the values of the file in the dataframe
data = pd.DataFrame(excel_data, columns=[
    'Sales Date', 'Sales Person', 'Amount'])
# Print the content
print("The content of the file is:\n", data)

Output:

The following output will appear after executing the above script. The output of this script is different from the previous two examples. The row numbers are printed in the first column, where the row value is counted from 0. The date values are aligned centrally. The names of the salespersons are aligned right. The amount is aligned left.
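Because read_excel() returns a DataFrame, the usual DataFrame operations are available straight away. The sketch below writes the tutorial's sales data to a temporary file (an assumption, so the example does not depend on an existing sales.xlsx), reads it back, and sums the Amount column. The variable names sales and total are illustrative only.

```python
import os
import tempfile

import pandas as pd

# The tutorial's sample data, written to a temporary sales.xlsx
sales = pd.DataFrame(
    {
        "Sales Date": ["12/05/18", "06/12/19", "09/08/20", "07/04/21"],
        "Sales Person": ["Sila Ahmed", "Mir Hossain", "Sarmin Jahan", "Mahmudul Hasan"],
        "Amount": [60000, 50000, 45000, 30000],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sales.xlsx")
    sales.to_excel(path, index=False)  # pandas uses openpyxl for .xlsx
    data = pd.read_excel(path)

# Select two columns and aggregate one of them
print(data[["Sales Person", "Amount"]])
total = data["Amount"].sum()
print("Total amount:", total)
```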

Conclusion:

Python users need to work with xlsx files for different programming purposes. This tutorial has shown three different ways to read an xlsx file using three Python modules, each with its own functions and properties, and should help Python users read xlsx files easily from a Python script.

About the author

Fahmida Yesmin

I am a trainer of web programming courses. I like to write articles and tutorials on various IT topics. I have a YouTube channel, Tutorials4u Help, where many types of tutorials based on Ubuntu, Windows, Word, Excel, WordPress, Magento, Laravel etc. are published.
