5 Simple Ways to Read Excel in Python
When working with data analysis or processing tasks in Python, one of the most common operations is dealing with Excel spreadsheets. Excel files, known for their widespread use in various industries, provide an effective way to store tabular data. Python, with its extensive ecosystem of libraries, offers multiple ways to interact with these files. In this blog post, we'll explore five simple ways to read Excel files in Python, ensuring you have the tools to handle your data efficiently.
1. Using openpyxl
The openpyxl library is one of the most straightforward methods to read Excel files (.xlsx specifically). Here’s how you can do it:
from openpyxl import load_workbook # Load an existing workbook wb = load_workbook(filename = 'example.xlsx') # Choose the active sheet sheet = wb.active # Iterate through rows and columns for row in sheet.iter_rows(min_row=1, max_col=3, max_row=2): for cell in row: print(cell.value)
Openpyxl is particularly useful if you need to read and write Excel 2010 xlsx/xlsm/xltx/xltm files without installing Microsoft Office.
💡 Note: openpyxl doesn't require additional software like Microsoft Excel to be installed. It reads the xlsx file directly, which makes it very portable and convenient for deployment.
2. Using pandas
While pandas is known for its data manipulation capabilities, it's also highly effective for reading Excel files:
import pandas as pd # Read Excel file into a DataFrame df = pd.read_excel('example.xlsx', sheet_name='Sheet1') # Display the DataFrame print(df)
With pandas, you can manipulate the data frame with ease, perform operations like filtering, sorting, and even save back into Excel or other formats.
3. Using xlrd
The xlrd library provides an option to read Excel files (.xls and .xlsx) and offers functionalities similar to openpyxl:
import xlrd # Open the workbook and select the first sheet wb = xlrd.open_workbook('example.xls') sh = wb.sheet_by_index(0) # Iterate through rows and columns for rownum in range(sh.nrows): for colnum in range(sh.ncols): print(sh.cell_value(rownum, colnum))
4. Python-Excelerator for Legacy Files
Python-Excelerator is mainly used for reading older Excel files (.xls):
from pyExcelerator import * # Open the Excel file book = open_workbook('example.xls') # Get the first sheet sheet = book.sheet_by_index(0) # Read row by row for row in range(sheet.nrows): row_values = [cell.value for cell in sheet.row(row)] print(row_values)
📌 Note: Python-Excelerator is suitable for older .xls files, but it's less maintained compared to openpyxl or pandas, and might not support all features of newer Excel formats.
5. Using xlsxwriter for Reading and Writing
While xlsxwriter is primarily designed for writing Excel files, you can use it in conjunction with other libraries to handle reading operations:
import pandas as pd from xlsxwriter import Workbook # Read Excel file df = pd.read_excel('example.xlsx') # Create a new Excel file and write data with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer: df.to_excel(writer, sheet_name='Sheet1') # Print dataframe content print(df)
💼 Note: xlsxwriter’s primary purpose is to write to Excel files, but reading can be managed through pandas, which can then use xlsxwriter to write back or modify the data.
To recap, we've explored five different libraries for reading Excel files in Python:
- openpyxl for handling .xlsx files with easy manipulation.
- pandas for powerful data analysis and manipulation capabilities.
- xlrd for reading both .xls and .xlsx files.
- Python-Excelerator for legacy Excel support.
- xlsxwriter for a combination of reading with pandas and writing capabilities.
Each library serves a slightly different purpose, but together they provide a robust set of tools for dealing with Excel files in Python. Depending on your specific needs, be it simplicity, compatibility, or advanced data manipulation, one of these methods should fit your requirements perfectly.
What are the main differences between openpyxl, pandas, and xlrd?
+
openpyxl focuses on reading and manipulating xlsx files directly, pandas provides high-level data manipulation tools, and xlrd can handle both .xls and .xlsx files, but with limited manipulation capabilities compared to pandas or openpyxl.
Can I read very large Excel files with these libraries?
+
Yes, pandas and openpyxl can handle large Excel files, but performance might vary based on your machine’s specifications. For extremely large files, consider using chunks or optimizing your code.
Which library is best for writing data back to Excel?
+
For writing data, xlsxwriter is very efficient and allows for formatting, but pandas with ‘xlsxwriter’ as the engine provides a good compromise between reading and writing capabilities.