Python read excel as list

Reading/parsing Excel (xls) files with Python [closed]


What is the best way to read Excel (XLS) files with Python (not CSV files)? Is there a built-in package that Python supports by default for this task?

I think there is a package for this: import openpyxl. As far as I know it is a third-party package that has to be installed with pip, not something built into Python.

13 Answers

I highly recommend xlrd for reading .xls files, but it has some limitations (see the xlrd GitHub page):

Warning

This library will no longer read anything other than .xls files. For alternatives that read newer file formats, please see http://www.python-excel.org/.

The following are also not supported but will safely and reliably be ignored:

- Charts, Macros, Pictures, any other embedded object, including embedded worksheets.
- VBA modules
- Formulas, but results of formula calculations are extracted.
- Comments
- Hyperlinks
- Autofilters, advanced filters, pivot tables, conditional formatting, data validation

Password-protected files are not supported and cannot be read by this library.

voyager mentioned the use of COM automation. Having done this myself a few years ago, be warned that doing this is a real PITA. The number of caveats is huge and the documentation is lacking and annoying. I ran into many weird bugs and gotchas, some of which took many hours to figure out.

UPDATE:

For newer .xlsx files, the recommended library for reading and writing appears to be openpyxl (thanks, Ikar Pohorský).
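Since xlrd now reads only .xls files and openpyxl is the suggestion for .xlsx, one way to keep the two apart is to pick the library from the file extension. This is a minimal sketch, not part of either library: the helper name `engine_for` and the `ENGINES` table are hypothetical, and the returned string is assumed to be passed on as the `engine` argument of pandas' `read_excel`.

```python
from pathlib import Path

# Assumed mapping, taken from the answers in this thread:
# xlrd for legacy .xls files, openpyxl for newer .xlsx files.
ENGINES = {".xls": "xlrd", ".xlsx": "openpyxl"}


def engine_for(filename):
    """Return the reader library name for an Excel file (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    try:
        return ENGINES[suffix]
    except KeyError:
        raise ValueError("not a recognised Excel extension: %r" % suffix)
```

With pandas installed, this could be used as pd.read_excel(path, engine=engine_for(path)).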

A little late to the party, but do you have any suggestions for libraries to overwrite an .xls file and preserve macros/pictures? I created a solution using xlrd/xlwt/xlutils and didn’t realize until the end that the macros/pictures were getting removed. I’ve used Openpyxl/XlsxWriter (for xlsx) in the past, but obviously none of these libraries are fitting the use case that I have. Does Pandas also do this since it uses the xlrd engine?

You can use pandas to do this. First install the required libraries (openpyxl for .xlsx files, xlrd for .xls files):

$ pip install pandas openpyxl xlrd

import pandas as pd

xls = pd.ExcelFile(r"yourfilename.xls")  # use r before an absolute file path
sheetX = xls.parse(2)  # 2 is the zero-based sheet index (the third sheet); use 0 if the file has only one sheet
var1 = sheetX['ColumnName']
print(var1[1])  # 1 is the row label (the second data row with the default index)

As of 2022 it appears pandas uses openpyxl for .xlsx files, so you'll need to pip install it. Otherwise you get: ImportError: Missing optional dependency 'openpyxl'. Use pip or conda to install openpyxl.

You can choose any of the libraries listed at http://www.python-excel.org/
I would recommend the Python xlrd library.

import xlrd

workbook = xlrd.open_workbook('your_file_name.xls')  # xlrd 2.0 and later reads only .xls files
worksheet = workbook.sheet_by_name('Name of the Sheet')  # select a sheet by name
worksheet = workbook.sheet_by_index(0)  # or by zero-based index

The «read cell value» part does not work. It raises a TypeError: 'Sheet' object is not callable. All of the rest worked great.

I think Pandas is the best way to go. There is already one answer here with Pandas using ExcelFile function, but it did not work properly for me. From here I found the read_excel function which works just fine:

import pandas as pd

dfs = pd.read_excel("your_file_name.xlsx", sheet_name="your_sheet_name")
print(dfs.head(10))

P.S. You need to have xlrd installed for the read_excel function to work.

Update 21-03-2020: As you may see here, there are issues with the xlrd engine and it is going to be deprecated. openpyxl is the best replacement. So, as described here, the canonical syntax should be:

dfs = pd.read_excel("your_file_name.xlsx", sheet_name="your_sheet_name", engine="openpyxl") 

Update 03-03-2023: There are now several other options available. For example the Polars library that is written in Rust:

import polars as pl

dfs = pl.read_excel("your_file_name.xlsx", sheet_name="your_sheet_name")

Feel free to also check the PyArrow and pyodbc libraries.

@Zircoz How should that make a difference here? Are any of the solutions I provided not applicable to the .xls file format?

def xlsx(fname):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    for e, el in iterparse(z.open('xl/worksheets/sheet1.xml')):
        if el.tag.endswith('}v'):  # Example: <v>84</v>
            value = el.text
        if el.tag.endswith('}c'):  # Example: <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            letter = el.attrib['r']  # Example: AZ22
            while letter[-1].isdigit():
                letter = letter[:-1]
            row[letter] = value
            value = ''
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return rows

Improvements added are fetching content by sheet name, using re to get the column and checking if sharedstrings are used.

def xlsx(fname, sheet):
    import zipfile
    from xml.etree.ElementTree import iterparse
    import re
    z = zipfile.ZipFile(fname)
    strings = []  # stays empty when the workbook has no shared strings
    if 'xl/sharedStrings.xml' in z.namelist():
        # Get shared strings
        strings = [element.text for event, element
                   in iterparse(z.open('xl/sharedStrings.xml'))
                   if element.tag.endswith('}t')]
    # Map each sheet name to its sheet id
    sheets = {element.attrib['name']: element.attrib['sheetId']
              for event, element in iterparse(z.open('xl/workbook.xml'))
              if element.tag.endswith('}sheet')}
    rows = []
    row = {}
    value = ''
    if sheet in sheets:
        sheetfile = 'xl/worksheets/sheet' + sheets[sheet] + '.xml'
        for event, element in iterparse(z.open(sheetfile)):
            # get value or index to shared strings
            if element.tag.endswith('}v') or element.tag.endswith('}t'):
                value = element.text
            # If value is a shared string, use value as an index
            if element.tag.endswith('}c'):
                if element.attrib.get('t') == 's':
                    value = strings[int(value)]
                # strip the row number so that only the column letter(s) remain
                letter = re.sub(r'\d', '', element.attrib['r'])
                row[letter] = value
                value = ''
            if element.tag.endswith('}row'):
                rows.append(row)
                row = {}
    return rows
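Both versions reduce a cell reference such as AZ22 to its column letters, one with a while loop and one with re.sub. As a small illustration of that step on its own, here is a sketch that keeps the row number as well. The helper name `split_cell_ref` is hypothetical and it assumes plain references made of uppercase letters followed by digits.

```python
import re

# Column letters followed by a row number, e.g. "AZ22".
CELL_REF = re.compile(r"([A-Z]+)(\d+)")


def split_cell_ref(ref):
    """Split a reference such as 'AZ22' into ('AZ', 22) (hypothetical helper)."""
    match = CELL_REF.fullmatch(ref)
    if match is None:
        raise ValueError("unexpected cell reference: %r" % ref)
    letters, digits = match.groups()
    return letters, int(digits)
```

Keeping the row number makes it possible to notice skipped rows, which the dictionaries built above do not record.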


Import Excel file into Python as a list

I want to import one column with 10 rows into Python as a list. So in Excel I have, for example: One, Two, Three, Four, ..., Ten. Everything is written in column A, rows 1-10. Now I want to import these cells into Python, so that my result is:

list = ['One', 'Two', 'Three', 'Four', ..., 'Ten']

Since I am a total noob in programming, I have no clue how to do it, so please tell me the easiest way. None of the tutorials I have found gave me the result I want. Thank you. I am using Python 2.7.

The easiest way is to use pandas. Highlight the column entries and copy them. Then use l = pd.to_clipboard().values. Note that you will need a title for your column to make this work easily. Also, don't name a variable list: this will prevent you from using the built-in function list(stuff).

Thanks for your help! Unfortunately I get an AttributeError back: 'module' object has no attribute 'to_clipboard'. I also tried it with df = pd.read_excel('Test.xlsx', sheetname='Tabelle1'), but then I get a DataFrame and not a list.

3 Answers

Even though pandas is a great library, for your simple task you can just use xlrd:

import xlrd

wb = xlrd.open_workbook(path_to_my_workbook)
ws = wb.sheet_by_index(0)
mylist = ws.col_values(0)

Note that list is not a good name for a variable in Python, because that is the name of a built-in function.
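To make the naming point concrete, here is a small sketch of what goes wrong once the name list is rebound. The function names are made up for the illustration, and the rebinding is kept inside a function so that it does not affect other code.

```python
def shadowing_demo():
    """Return the error message raised after the name `list` is rebound."""
    list = ['One', 'Two', 'Three']  # hides the built-in inside this function
    try:
        list('abc')  # this now calls a list object, not the built-in
    except TypeError as exc:
        return str(exc)


def safe_demo():
    """Use a different variable name so the built-in stays available."""
    names = ['One', 'Two', 'Three']
    return list(names[0])  # the built-in still works here
```

The first function ends in a TypeError saying that a 'list' object is not callable; the second returns the characters of 'One' as a list.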

I am unsure if your data is in xlsx form or CSV form. If XLSX, use this Python Excel tutorial. If CSV, it is much easier, and you can follow the code snippet below. If you don’t want to use pandas, you can use the numpy library. Use the example code snippet below for taking the top row of a CSV file:

import numpy as np

csv_file = np.genfromtxt('filepath/relative/to/your/script.csv', delimiter=',', dtype=str)
top_row = csv_file[:].tolist()

This will work for a file that has only one column of text. If you have more columns, use the following snippet to just get the first column. The ‘0’ indicates the first column.
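The snippet referred to here is not preserved. As a stand-in, this is a minimal sketch of the same idea using the standard csv module rather than numpy, so it is not the answer author's code. The function name `first_column` is hypothetical; index 0 again selects the first column.

```python
import csv
import io


def first_column(lines):
    """Return the first field of every non-empty CSV row as a list of strings."""
    return [row[0] for row in csv.reader(lines) if row]


# In-memory stand-in for a CSV file with two columns.
sample = io.StringIO("One,1\nTwo,2\nThree,3\n")
print(first_column(sample))
```

For a real file, the same function can be given a handle opened with open(path, newline='').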

I recommend installing pandas.

import pandas

df = pandas.read_excel('path/to/data.xlsx')  # the options of this method are quite neat; returns a pandas.DataFrame
print(df.head())  # show a preview of the loaded data
idx_of_column = 5 - 1  # in case the column of interest is the 5th in Excel
print(list(df.iloc[:, idx_of_column]))  # access via index
print(list(df.loc[['my_row_1', 'my_row_2'], ['my_column_1', 'my_column_2']]))  # access certain elements via row and column names
print(list(df['my_column_1']))  # straightforward access via column name

from xlrd import open_workbook

wb = open_workbook('simple.xls')
for s in wb.sheets():
    print('Sheet: {}'.format(s.name))
    for row in range(s.nrows):
        values = []
        for col in range(s.ncols):
            # convert to str so that numeric cells can be joined as well
            values.append(str(s.cell(row, col).value))
        print(','.join(values))


How to read an Excel file and convert the content to a list of lists in Python?

I have this data in an excel file (each line in a cell):

#module 0 size: 9 bs: 2.27735e-08
1 35 62 93 116 167 173 176 182
#module 1 size: 5 bs: 0.00393944
2 11 29 128 130
#module 2 size: 13 bs: 1.00282e-07
8 19 20 25 26 58 67 132 150 153 185 187 188

I want to read the data from the excel file and make a list of lists out of the even lines.
desired output:

[[1,35,62,93,116,167,173,176,182], [2,11,29,128,130], [8,19,20,25,26,58,67,132,150,153,185,187,188]] 

Please supply the expected minimal, complete, verifiable example (stackoverflow.com/help/minimal-reproducible-example). Since each step you've listed is covered by existing tutorials and other documentation, we need your specific coding problem to understand where you're stuck.

4 Answers

Look into OpenPyXL, I use it often to work with complex workbooks at my job. Once imported, rows in the workbook can be appended to lists like so:

for row in worksheet.rows:
    # each row is a tuple of cell objects; keep only their values
    rowValuesList.append([cell.value for cell in row])

Each row is stored as a list of its cell values, so appending every row to rowValuesList gives you a list of lists.

The xlrd library works well for reading Excel (.xls) files.

import xlrd

def main():
    # Path to excel file
    file_path = 'PATH_TO_FILE'
    # Import complete excel workbook
    excel_workbook = xlrd.open_workbook(file_path)
    # Import specific sheet by index
    excel_sheet = excel_workbook.sheet_by_index(0)
    # Create array for each row
    relevantData = []
    # Loop through each row of excel sheet
    for row in range(excel_sheet.nrows):  # nrows returns number of rows
        # Odd zero-based index, i.e. an even line of the sheet
        if row % 2 != 0:
            # Convert row to array and append to relevantData array
            relevantData.append(rowToArray(excel_sheet, row))
    print(relevantData)

def rowToArray(excel_sheet, row):
    """
    excel_sheet.cell_value(row, 0) -> get the data in the given row
    .split() -> returns list of strings, split at the white space
    map(int, ...) -> map all values in list to integers
    list(map(...)) -> converts the result back into a list
    """
    return list(map(int, excel_sheet.cell_value(row, 0).split()))

main()
[[1, 35, 62, 93, 116, 167, 173, 176, 182], [2, 11, 29, 128, 130], [8, 19, 20, 25, 26, 58, 67, 132, 150, 153, 185, 187, 188]] 
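Whichever library reads the cells, the conversion itself is plain string handling. The sketch below works on a list of cell strings and skips the header lines by their leading '#' rather than by row position, which is a different test from the row % 2 check above. The function name `module_members` is hypothetical.

```python
def module_members(lines):
    """Collect the integer lists from lines that are not '#module' headers."""
    result = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # skip blank cells and header lines
        result.append([int(token) for token in text.split()])
    return result


# The first four cells from the question.
cells = [
    "#module 0 size: 9 bs: 2.27735e-08",
    "1 35 62 93 116 167 173 176 182",
    "#module 1 size: 5 bs: 0.00393944",
    "2 11 29 128 130",
]
print(module_members(cells))
```

Filtering on the '#' prefix keeps working if a header line is missing or a blank row is inserted, whereas the parity check relies on strict alternation.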
