Paperwork

3 Ways to Compare Excel Sheets with Python Code

3 Ways to Compare Excel Sheets with Python Code
How To Compare 2 Excel Sheets Using Python

Comparing Excel sheets is an essential task for anyone working with data analysis, financial auditing, or data management. Python, with its powerful libraries like pandas and openpyxl, provides various tools to automate this process, making it efficient and less error-prone. This blog post will guide you through three different methods to compare Excel sheets using Python, tailored to different needs and data complexities.

Method 1: Using Pandas for Quick Comparisons

How To Compare Data In Two Excel Sheets

Pandas is one of the most popular Python libraries for data manipulation and analysis. Here’s how you can use it to compare two Excel files:

  1. Load the Files: Use pandas to read both Excel files into DataFrame objects.
  2. import pandas as pd
    
    df1 = pd.read_excel('file1.xlsx')
    df2 = pd.read_excel('file2.xlsx')
  3. Compare DataFrames: Directly compare the two DataFrames to find differences. Here’s a simple way to do that:
  4. # Find rows that differ
    diff_df = pd.concat([df1, df2]).drop_duplicates(keep=False)
    
    print("Differences:")
    print(diff_df)

    This will list out all the rows that are unique to either file.

  5. Additional Options: You can further refine the comparison by checking for differences in specific columns or by applying conditions.

⚠️ Note: Remember that the structure (column names and order) of the sheets should be identical for straightforward comparisons.

Method 2: Detailed Comparison with Openpyxl

Compare Excel Worksheets

For a more detailed analysis where you need to compare not just data but also formatting or cell properties, openpyxl comes into play:

  1. Load Workbooks: First, load both Excel files using openpyxl.
  2. from openpyxl import load_workbook
    
    wb1 = load_workbook('file1.xlsx')
    wb2 = load_workbook('file2.xlsx')
  3. Iterate Through Sheets: Loop through the sheets of both workbooks to compare cell by cell.
  4. sheet1 = wb1.active
    sheet2 = wb2.active
    
    for row in range(1, max(sheet1.max_row, sheet2.max_row) + 1):
        for col in range(1, max(sheet1.max_column, sheet2.max_column) + 1):
            cell1 = sheet1.cell(row=row, column=col)
            cell2 = sheet2.cell(row=row, column=col)
            if cell1.value != cell2.value:
                print(f"Difference at row {row}, column {col}: {cell1.value} vs {cell2.value}")
  5. Visualize Differences: You can even visualize differences in cell formats, styles, etc., by comparing cell properties.

🎯 Note: Openpyxl requires more computational power and time for comparisons due to its detailed nature.

Method 3: Automated Testing with xlrd and xlwt

Python In Excel Excel Python Integration Explained Dataconomy

For those who are involved in automated testing or need to integrate comparisons into CI/CD pipelines, using xlrd to read and xlwt to write Excel files can be quite effective:

  1. Install Dependencies: Ensure you have xlrd and xlwt installed.
  2. Read and Compare: Open the Excel files and compare them programmatically.
  3. import xlrd
    from xlutils.copy import copy
    
    book1 = xlrd.open_workbook('file1.xls')
    book2 = xlrd.open_workbook('file2.xls')
    
    for sheet in book1.sheet_names():
        sheet1 = book1.sheet_by_name(sheet)
        sheet2 = book2.sheet_by_name(sheet)
        
        for row_idx in range(sheet1.nrows):
            for col_idx in range(sheet1.ncols):
                cell1 = sheet1.cell(row_idx, col_idx)
                cell2 = sheet2.cell(row_idx, col_idx)
                if cell1.value != cell2.value:
                    print(f"Difference at {sheet}, row {row_idx + 1}, col {col_idx + 1}")
  4. Generate Reports: Use xlwt to create a new Excel file highlighting differences.

💡 Note: This method is particularly useful for creating automated tests that need to produce reports or logs of differences.

In conclusion, Python offers several effective methods for comparing Excel sheets, catering to different levels of detail and requirements. Whether you're looking for a quick check with pandas, a detailed comparison with openpyxl, or integration into automated tests, Python's libraries make the process seamless and efficient.

Can I compare sheets from different Excel files?

How To Compare Excel Worksheets For Differences Masjapanese
+

Yes, all the methods discussed above allow for comparison across different Excel files, as long as the structure or part of the structure remains consistent.

What if my sheets have different headers or structures?

Writing And Reading Excel Data From Python Postnetwork Academy
+

You would need to either manually or programmatically align the headers or use more advanced comparison techniques like matching columns by data type or content.

Is there a way to automate comparing multiple sheets in Python?

Python Vs Excel What Should You Learn
+

Yes, by looping through each sheet in the workbooks, you can automate the comparison process. Openpyxl and xlrd/xlwt are particularly useful for this due to their ability to iterate through sheets.

Related Articles

Back to top button