How to import all fields from xls as strings into a Pandas dataframe?

Question

I am trying to import a file from xlsx into a Python Pandas dataframe. I would like to prevent fields/columns from being interpreted as integers and thus losing leading zeros or other desired heterogeneous formatting.

So for an Excel sheet with 100 columns, I would do the following using a dict comprehension with range(100).

import pandas as pd
filename = r'C:\DemoFile.xlsx'

fields = {col: str for col in range(100)}

df = pd.read_excel(filename, sheet_name=0, converters=fields)

The number of columns in these import files varies from file to file, and I would like to handle that without changing the range by hand every time.

Does somebody have any further suggestions or alternatives for reading Excel files into a dataframe and treating all fields as strings by default?

Many thanks!

Answer 1

Use dtype=str when calling .read_excel()

import pandas as pd
filename = r'C:\DemoFile.xlsx'

df = pd.read_excel(filename, dtype=str)
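To see the effect without the original file, here is a minimal sketch that writes a small sample sheet to a temporary file and reads it back with dtype=str. The helper name roundtrip_as_strings, the file name demo.xlsx, and the sample columns are made up for illustration, and the sketch assumes an Excel engine such as openpyxl is installed alongside pandas.

```python
import tempfile
from pathlib import Path

import pandas as pd  # reading/writing .xlsx also needs an engine such as openpyxl


def roundtrip_as_strings(frame: pd.DataFrame) -> pd.DataFrame:
    """Write `frame` to a temporary .xlsx file and read it back with dtype=str."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo.xlsx"  # hypothetical file name
        frame.to_excel(path, index=False)
        # dtype=str applies to every column, however many the sheet has
        return pd.read_excel(path, dtype=str)


# Hypothetical sample data: one text column with leading zeros, one numeric column
sample = pd.DataFrame({"zip_code": ["00501", "02134"], "units": [7, 12]})
result = roundtrip_as_strings(sample)
print(result)
```

Both columns should come back holding Python strings, so the leading zeros in zip_code survive and no column count has to be known in advance.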

Answer 2

Try this:

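import pandas as pd

xl = pd.ExcelFile(r'C:\DemoFile.xlsx')
# xl.book is the underlying xlrd workbook; ask it for the first sheet's column count
ncols = xl.book.sheet_by_index(0).ncols
df = xl.parse(0, converters={i: str for i in range(ncols)})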

UPDATE:

In [261]: type(xl)
Out[261]: pandas.io.excel.ExcelFile

In [262]: type(xl.book)
Out[262]: xlrd.book.Book

Answer 3

The usual solution is:

1. Read in one row of data just to get the column names and the number of columns.
2. Create the dictionary automatically, with each column mapped to the string type.
3. Re-read the full data using the dictionary created in step 2.
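The three steps above could be sketched roughly as follows. The function names string_converters and read_excel_as_strings are hypothetical, and the sketch assumes the sheet has a single header row and that pandas can find an Excel engine for the file.

```python
import pandas as pd  # reading .xlsx also needs an engine such as openpyxl


def string_converters(columns):
    """Step 2: map every column label to the built-in str converter."""
    return {label: str for label in columns}


def read_excel_as_strings(path, sheet_name=0):
    """Read one sheet so that every column is passed through str."""
    # Step 1: read a single data row, only to learn the column labels.
    preview = pd.read_excel(path, sheet_name=sheet_name, nrows=1)
    # Step 3: re-read the whole sheet with one converter per column.
    return pd.read_excel(
        path,
        sheet_name=sheet_name,
        converters=string_converters(preview.columns),
    )
```

A call such as df = read_excel_as_strings(r'C:\DemoFile.xlsx') would then work regardless of how many columns the file has, at the cost of opening the file twice.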

3 answers

solution1
1 2022-02-28 09:32:14

solution2
0 ACCEPTED 2017-01-25 23:03:00

solution3
-1 2017-01-25 22:31:32
