Introduction

I LOVE Excel. For many years, in many jobs, I've used it to do all kinds of things. I've always said that it is because I am inherently lazy that I spend a lot of time using the power of Excel to do all my work for me. If my boss wanted me to take a load of data and produce a report, then I would just employ Excel to consume the data and have another worksheet working hard to turn it in to something presentable, while I pretend that I'm working really hard.

But there are some occasions where it's too hard to hack around limitations of Excel (and VBA) and you need to do a bit more heavy lifting. That's where Python can help, and in particular, a Python package called xlrd

There are other ways to read Excel files using Python, e.g. with Pandas, but I may well cover those in separate articles.

Objectives

In this article I'm going to show you how to open and read an Excel Workbook (with a .xlsx extension) and read from cells, and worksheets.

Install and import xlrd

xlrd is not in Pythons Standard Library, so it needs to be installed. The full documentation can be found here.

To install xlrd we need to go to the Command Line and type:

pip install xlrd

And at the start of our Python program it can be imported by including the line

import xlrd

Opening a Workbook

When we talk about Excel files, what that means is Workbooks. Excel has a hierarchy of objects, and a file with a .xlsx extension is a Workbook. Each Workbook can contain multiple Worksheets, and each Worksheet contains cells that can be referenced by rows (indicated by a number) and columns (indicated by a letter).

With xlrd to open a Workbook, you use the open_workbook command and assign it to a variable:

workbookData = xlrd.open_workbook("myWorkbook.xlsx")

Now, the variable workbookData contains everything about that Excel workbook.

Worksheets

We probably want to find out about the Worksheets that are in this Workbook. There is a sheet_names() object which is a list of all the worksheets.

workbookSheets = workbookData.sheet_names()
print(workbookSheets)

If you know you want a worksheet named "Budget" then you can get that by typing:

budgetSheet = workbookData.sheet_by_name('Budget')

Supposing you didn't know what the worksheet was called, but you knew it was the first one in the Workbook, then you could also just reference it by its index. Worksheets are zero-indexed in xlrd, so to get the first worksheet, we would type:

budgetSheet = workbookData.sheet_by_index(0)

Once you had that sheet, you could find out information about it, like the name of the sheet:

budgetSheetName = budgetSheet.name
print(f"Sheet Name: {budgetSheetName}")

Rows and Columns

This is what our budget worksheet looks like.

Budget Worksheet

Like worksheets, rows and columns are also zero-indexed, so the third row can be referenced by typing:

thirdRow = budgetSheet.row(0)

And the first column can be referenced by:

firstCol = budgetSheet.col(0)

Both these commands return a list of all the cells, with the type (e.g. text, number, etc) and value of the cell.

Let's print them out and have a look:

print(thirdRow)
print(firstCol)

[text:'Food', number:500.0]
[text:'Budget 2020', text:'Rent', text:'Food', text:'Insurance']

If you just want the values of the cells, you can simply say:

thirdRow = budgetSheet.row_values(2)
print(thirdRow)

and this would return:

['Food', 500.0]

If you want to know how many rows and columns are present in the worksheet, nrows and ncols will help.

numRows = budgetSheet.nrows
numCols = budgetSheet.ncols

print(f"There are {numRows} rows and {numCols} columns in this sheet")

And running this, will produce this line of text:

print(f"There are {numRows} rows and {numCols} columns in this sheet")

Cells

Finally, if we know what cell we are looking for, we can simplty return the value of that one cell.

For instance, cell A1 in the sheet has the value 'Budget 2020'. Rather than reading the first row or column, and getting the first value of the list, we can simply give the row and column reference to the cell_value method.

cellA1Value = budgetSheet.cell_value(0,0)

Conclusion

This has been a whistle stop tour of xlrd. There is a lot more it can do for you, like information about dates, formatting and ranges. But to simply start you off, this short post has demonstrated how to open a workbook, find out how many worksheets it contains, and enable you to find the value and type of each cell that has a value in it.

I hope you find this useful. Comping up, I'll be writing another post on how to write data to Excel files from Python and this will help to use Python as a place to inject some seriously powerful possibilities to your data workflow.