Read multiple excel files in order from folder with Python

Question

I have a folder containing several Excel files in .xls and .xlsx format, and I am trying to read them and concatenate them into a single DataFrame. The problem is that Python does not read the files in the folder in the correct order.

My folder contains the following files: 190.xls, 195.xls, 198.xls, 202.xlsx, 220.xlsx, and so on.

This is my code:

import pandas as pd
from pathlib import Path

my_path = 'my_Dataset/'

xls_files = pd.concat([pd.read_excel(f2) for f2 in Path(my_path).rglob('*.xls')], sort = False)

xlsx_files = pd.concat([pd.read_excel(f1) for f1 in Path(my_path).rglob('*.xlsx')],sort = False)

all_files = pd.concat([xls_files, xlsx_files], sort=False).reset_index(drop=True)

I get the data I want, but the files are not concatenated in the order they appear in the folder: in the all_files DataFrame, the data from 202.xlsx comes first, followed by the data from 190.xls.

How can I solve this problem? Thank you in advance!

Answer 1

Try collecting both extensions into one list and sorting it by the numeric file name before reading:

import pandas as pd
from pathlib import Path

my_path = Path('my_Dataset/')

# Gather .xls and .xlsx files together, then sort by the integer file name (190 < 202).
files = sorted(list(my_path.rglob('*.xls')) + list(my_path.rglob('*.xlsx')), key=lambda x: int(x.stem))
all_files = pd.concat([pd.read_excel(f) for f in files], sort=False).reset_index(drop=True)
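The ordering step of this answer can be checked on its own, without opening any workbook. The sketch below is only an illustration: the helper name excel_files_in_numeric_order is hypothetical, and it assumes every file stem is a plain integer such as 190 or 202.

```python
from pathlib import Path


def excel_files_in_numeric_order(folder):
    """Return the .xls and .xlsx paths under `folder`, sorted by numeric file name.

    Hypothetical helper for illustration; not part of pandas or pathlib.
    """
    folder = Path(folder)
    # Collect both extensions; the pattern '*.xls' does not match '.xlsx' names.
    paths = [p for pattern in ("*.xls", "*.xlsx") for p in folder.rglob(pattern)]
    # Sort on the integer value of the stem rather than on the string,
    # so that '1000.xls' lands after '202.xlsx'.
    # Assumption: every stem is an integer; int() raises ValueError otherwise.
    return sorted(paths, key=lambda p: int(p.stem))
```

The returned list can then be passed to pd.read_excel and pd.concat as in the answer. Sorting on int(p.stem) rather than on the path string matters once the file names have different numbers of digits.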

Answer 2

Change this line

all_files = pd.concat([xls_files, xlsx_files], sort=False).reset_index(drop=True)

to this

all_files = pd.concat([xlsx_files, xls_files], sort=False).reset_index(drop=True)

solution1
2 ACCEPTED 2020-02-24 09:55:57

solution2
0 2020-02-24 09:53:55
