How to compile multiple excel files in numeric order (file1.xls, file2.xls, etc) into one python file?

I am trying to compile several .xls files together. I found some code that works, but it reads the files out of order. The files are named therm_sensor1.xls, therm_sensor2.xls, etc. I need the output to be in numeric order, but my current code seems to scramble them. I am very new to coding, so an explanation would be helpful :) Also, my current output has all the data except for the top 6 lines, and I have no idea why.

import pandas as pd
import glob

glob.glob('therm_sensor*.xls')

all_data = pd.DataFrame()
for f in glob.glob('therm_sensor*.xls'):
    df = pd.read_excel(f)
    all_data = all_data.append(df, ignore_index=True)

print(all_data.to_string())

Output:

6    1.739592e-05  0.30           NaN
7    2.024840e-05  0.35           NaN
8    2.309999e-05  0.40           NaN
...
502  2.949562e-10  0.95           NaN
503  3.113220e-10  1.00           NaN

Is the problem with the order in which the files are read, or with sorting after appending? For the former, a simple sort on the list of files would do the trick; for the latter, a simple solution would be to add an incremental index column.

The problem here is (probably) due to the difference in the way humans and computers tend to sort things. Take a list like this:

files = ['file10.xls', 'file2.xls', 'file1.xls']

The computer sorts this list in a way that looks unintuitive to humans (because it goes 1, 10, 2):

>>> sorted(files)
['file1.xls', 'file10.xls', 'file2.xls']

But if you change the sort criteria you can get a more intuitive result. Here, that means isolating the part of the filename that contains the number and turning it into an integer so the computer can sort it correctly:

>>> sorted(files, key=lambda s: int(s[4:-4]))
['file1.xls', 'file2.xls', 'file10.xls']

In your use case, this should do the trick:

sorted(glob.glob('therm_sensor*.xls'), key=lambda s: int(s[12:-4]))
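The slice s[12:-4] relies on the exact length of the therm_sensor prefix and on a four-character extension, so it would break if the glob pattern included a directory or the files ended in .xlsx. Here is a minimal sketch of a more tolerant sort key, assuming every file name has its number directly before the extension. The helper name numeric_key and the data/ directory are made up for illustration:

```python
import os
import re


def numeric_key(path):
    """Return the integer that sits just before the file extension."""
    name = os.path.basename(path)  # drop any leading directory
    match = re.search(r"(\d+)\.[^.]+$", name)
    if match is None:
        raise ValueError(f"no trailing number in {name!r}")
    return int(match.group(1))


files = [
    "data/therm_sensor10.xls",
    "data/therm_sensor2.xls",
    "data/therm_sensor1.xls",
]
ordered = sorted(files, key=numeric_key)
print(ordered)
```

Raising an error for names without a number is a design choice; returning a fallback value instead would silently place such files somewhere in the order.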

I had a similar issue and eventually figured out a way, so I am going to give you the solution that worked for me. One key thing I did was define the column names myself instead of reading them from the files. See if this helps.

import glob
import pandas as pd

def concatenate():
    # Collect every CSV file in the current directory.
    fileList = glob.glob("*.csv")
    dfList = []
    # Column names to assign after concatenation (0 through 34).
    colnames = list(range(35))
    for filename in fileList:
        print(filename)
        # header=None treats the first row of each file as data, not as a header.
        df = pd.read_csv(filename, header=None)
        dfList.append(df)
    concatDf = pd.concat(dfList, axis=0)
    concatDf.columns = colnames
    # concatDf.to_csv(outfile, index=None)  # You don't need this.
    return concatDf

concatenate()
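Combining the ideas from the answers above, the sketch below reads the files in numeric order and stacks them with a single pd.concat call. That also avoids DataFrame.append, which was deprecated in pandas 1.4 and removed in 2.0. The function name combine_in_order, the reader parameter, and the source_file column are assumptions added for illustration, and the key assumes the last run of digits in each path is the file number:

```python
import glob
import re

import pandas as pd


def combine_in_order(paths, reader=pd.read_excel):
    """Read each path in numeric order and stack the results into one DataFrame."""

    def number_in(path):
        # Last run of digits in the path, e.g. 'therm_sensor10.xls' -> 10
        return int(re.findall(r"\d+", path)[-1])

    frames = []
    for path in sorted(paths, key=number_in):
        df = reader(path)
        df["source_file"] = path  # remember which file each row came from
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    paths = glob.glob("therm_sensor*.xls")
    if paths:  # pd.concat raises ValueError on an empty list
        print(combine_in_order(paths).to_string())
```

Passing the reader as a parameter makes it easy to switch to pd.read_csv for CSV files, or to try the function on small in-memory frames before pointing it at real data.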
