I am currently accessing xls files within a path that I have defined by:
import os
import pandas as pd

path = 'C:\\Users\\BKS\\Desktop\\python\\pk list'
os.chdir(path)  # os.chdir() returns None, so keep the path in its own variable
files = os.listdir(path)
files_xls = [f for f in files if f.endswith('.xls')]
df = [pd.read_excel(f, 'Sheet1')[['Exp. m/z', 'Intensity']] for f in files_xls]
Then I wondered: what if the xls files are organized in different folders? Is there a way to run the files_xls loop over every file in each folder, that is, enter a folder, loop through its files, then move to the next folder and do the same?
I want to get the name of the folder that each xls file is in and merge it into a df that looks like:
Tag1  Tag2  Tag
1     1     A01.xls
2     1     A02.xls
3     2     A03.xls
4     2     A04.xls
5     3     A05.xls
These xls files will be in a folder:
'C:\\Users\\BKS\\Desktop\\python\\pk list\\20170620 Sample 1-48'
and some other xls files will be in another folder:
'C:\\Users\\BKS\\Desktop\\python\\pk list\\20170620 Sample 49-96'
These folders may contain xls files with the same names but different data. So my objective is to loop through all the files under the pk list folder in order and add the folder name, such as 20170620 Sample 1-48, to the df. For the table above, assuming both folders contain A01.xls through A05.xls, the result would be:
Tag1  Tag2  Folder Name            Tag
1     1     20170620 Sample 1-48   A01.xls
2     1     20170620 Sample 1-48   A02.xls
3     2     20170620 Sample 1-48   A03.xls
4     2     20170620 Sample 1-48   A04.xls
5     3     20170620 Sample 1-48   A05.xls
1     1     20170620 Sample 49-96  A01.xls
2     1     20170620 Sample 49-96  A02.xls
3     2     20170620 Sample 49-96  A03.xls
4     2     20170620 Sample 49-96  A04.xls
5     3     20170620 Sample 49-96  A05.xls
You could use glob for this.
(This strictly assumes that the pk list folder contains only subfolders and no files, and that every file present in the first subfolder is also present in the other subfolders.)
import glob
import os
import pandas as pd

# Take the file names from the first subfolder
os.chdir("C:\\Users\\BKS\\Desktop\\python\\pk list\\20170620 Sample 1-48")
filenames = glob.glob("*.xls")
# Everything directly under pk list is assumed to be a subfolder
os.chdir("C:\\Users\\BKS\\Desktop\\python\\pk list")
foldernames = glob.glob("*")
for filename in filenames:
    df = []
    for foldername in foldernames:
        # merge according to your requirement
        path = os.path.join(foldername, filename)
        df.append(pd.read_excel(path, 'Sheet1')[['Exp. m/z', 'Intensity']])
    # Use merged 'df' here
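If the subfolders do not all hold the same file names, one alternative is to walk the directory tree with os.walk and tag each table with the folder and file it came from. The following is only a sketch under a few assumptions: the helper names find_xls and load_all are made up for illustration, the "Folder Name" and "Tag" column names are borrowed from the table in the question, and every workbook is assumed to have a 'Sheet1' with 'Exp. m/z' and 'Intensity' columns.

```python
import os


def find_xls(root):
    """Return sorted (folder_name, file_name, full_path) tuples for every .xls file under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".xls"):
                # The immediate parent folder's name becomes the tag for this file
                found.append((os.path.basename(dirpath), name, os.path.join(dirpath, name)))
    # Plain lexicographic sort: by folder name first, then by file name
    return sorted(found)


def load_all(root, sheet="Sheet1", columns=("Exp. m/z", "Intensity")):
    """Read every workbook found by find_xls into one DataFrame tagged with its origin."""
    import pandas as pd  # imported here so find_xls needs only the standard library

    frames = []
    for folder, name, path in find_xls(root):
        frame = pd.read_excel(path, sheet_name=sheet)[list(columns)].copy()
        frame["Folder Name"] = folder  # hypothetical column name, taken from the question's table
        frame["Tag"] = name
        frames.append(frame)
    # pd.concat raises ValueError when no .xls file was found
    return pd.concat(frames, ignore_index=True)
```

Calling load_all('C:\\Users\\BKS\\Desktop\\python\\pk list') would then return a single DataFrame in which files with the same name in different folders stay distinguishable. Note that the ordering is alphabetical, and that reading .xls files with pandas requires an engine such as xlrd to be installed.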