I am trying to combine several .xls files into one. I found some code that works, but it puts the files in out of order. The files are named therm_sensor1.xls, therm_sensor2.xls, etc. I need the output to be in numeric order, but my current code seems to scramble them. I am very new to coding, so an explanation would be helpful :) Also, my current output has all the data except for the top 6 lines. I have no idea why it is doing this.
import pandas as pd
import glob
glob.glob('therm_sensor*.xls')
all_data = pd.DataFrame()
for f in glob.glob('therm_sensor*.xls'):
    df = pd.read_excel(f)
    all_data = all_data.append(df, ignore_index=True)
print(all_data.to_string())
Output:
6 1.739592e-05 0.30 NaN
7 2.024840e-05 0.35 NaN
8 2.309999e-05 0.40 NaN
...
502 2.949562e-10 0.95 NaN
503 3.113220e-10 1.00 NaN
Is the problem with the order in which the files are read, or with sorting after appending? For the former, simply sorting the list of files would do the trick; for the latter, a simple solution would be to add an incremental index column.
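The second idea, tagging the rows so they can be put in order after appending, might look roughly like the sketch below. It uses small in-memory DataFrames in place of the real files, and the `sensor` column name and the `combine_in_order` helper are made up for illustration.

```python
import pandas as pd


def combine_in_order(frames_by_number):
    """Concatenate DataFrames keyed by sensor number, ordered by that number.

    frames_by_number is a hypothetical dict: {sensor number: DataFrame}.
    """
    tagged = []
    for number, df in frames_by_number.items():
        # Record which sensor each row came from in an extra column.
        tagged.append(df.assign(sensor=number))
    combined = pd.concat(tagged, ignore_index=True)
    # A stable sort keeps the original row order within each sensor.
    return combined.sort_values("sensor", kind="stable").reset_index(drop=True)


# Stand-ins for the data read from therm_sensor10.xls, therm_sensor2.xls, ...
frames = {
    10: pd.DataFrame({"value": [10.0, 10.1]}),
    2: pd.DataFrame({"value": [2.0, 2.1]}),
    1: pd.DataFrame({"value": [1.0, 1.1]}),
}
combined = combine_in_order(frames)
print(combined)
```

With real files, the number would have to be extracted from each filename before the frame is tagged.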
The problem here is (probably) due to the difference in the way humans and computers tend to sort things. Take a list like this:
files = ['file10.xls', 'file2.xls', 'file1.xls']
The computer sorts this list in a way that looks unintuitive to humans (because it goes 1, 10, 2):
>>> sorted(files)
['file1.xls', 'file10.xls', 'file2.xls']
But if you change the sort criteria you can get a more intuitive result. Here, that means isolating the part of the filename that contains the number and turning it into an integer so the computer can sort it correctly:
>>> sorted(files, key=lambda s: int(s[4:-4]))
['file1.xls', 'file2.xls', 'file10.xls']
In your use case, this should do the trick:
sorted(glob.glob('therm_sensor*.xls'), key=lambda s: int(s[12:-4]))
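Note that the slice `s[12:-4]` relies on the path starting with exactly `therm_sensor`, so it would break if the glob pattern included a directory. A slightly more forgiving variant, sketched here with a hypothetical helper named `sensor_number` and made-up example paths, pulls the digits out with a regular expression instead:

```python
import os
import re


def sensor_number(path):
    """Return the integer just before '.xls', e.g. 12 for 'data/therm_sensor12.xls'."""
    name = os.path.basename(path)  # drop any leading directory
    match = re.search(r"(\d+)\.xls$", name)
    if match is None:
        raise ValueError(f"no sensor number found in {path!r}")
    return int(match.group(1))


# Example paths only; in practice this list would come from glob.glob(...).
paths = ["data/therm_sensor10.xls", "data/therm_sensor2.xls", "data/therm_sensor1.xls"]
ordered = sorted(paths, key=sensor_number)
print(ordered)
```

The sorted list can then be used in place of the bare `glob.glob(...)` call in the `for` loop.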
I had a similar issue and eventually figured out a way around it, so here is the solution that worked for me. One key thing I did was define the column names myself instead of letting pandas take them from the files. See if this helps.
import glob
import pandas as pd

# Wrapped in a function so the concatenate() call at the end has something to run.
def concatenate():
    fileList = glob.glob("*.csv")
    dfList = []
    colnames = list(range(35))  # column labels 0 through 34
    for filename in fileList:
        print(filename)
        # header=None: treat the first row as data, not as column names
        df = pd.read_csv(filename, header=None)
        dfList.append(df)
    concatDf = pd.concat(dfList, axis=0)
    concatDf.columns = colnames
    # concatDf.to_csv(outfile, index=None)  # You don't need this.
    return concatDf

concatenate()