Python: how to access all the files that are in different folders

I am currently accessing xls files within a path that I have defined by:

import os
import pandas as pd

# os.chdir() returns None, so keep the path in its own variable
path = 'C:\\Users\\BKS\\Desktop\\python\\pk list'
os.chdir(path)
files = os.listdir(path)
files_xls = [f for f in files if f.endswith('.xls')]

df = [pd.read_excel(f, 'Sheet1')[['Exp. m/z','Intensity']] for f in files_xls]

Then I wondered: what if the xls files are organized in different folders? Is there a way to extend the files_xls loop to every file within each folder? That is, enter a folder, loop through each of its files, then move on to the next folder and do the same?

I want to obtain the name of the folder that each xls file is in and merge it into a df that looks like:

Tag1  Tag2   Tag
1     1      A01.xls
2     1      A02.xls
3     2      A03.xls
4     2      A04.xls
5     3      A05.xls

These xls files will be in a folder:

'C:\\Users\\BKS\\Desktop\\python\\pk list\\20170620 Sample 1-48'

and some other xls files will be in another folder:

'C:\\Users\\BKS\\Desktop\\python\\pk list\\20170620 Sample 49-96'

These folders may contain xls files with the same names but different data. So my objective is to loop through all the files within the pk list folder in order and add the folder name, such as 20170620 Sample 1-48, to the df. Suppose that, for the table above, both folders contain A01.xls to A05.xls:

Tag1  Tag2  Folder Name             Tag
1     1     20170620 Sample 1-48    A01.xls
2     1     20170620 Sample 1-48    A02.xls
3     2     20170620 Sample 1-48    A03.xls
4     2     20170620 Sample 1-48    A04.xls
5     3     20170620 Sample 1-48    A05.xls
1     1     20170620 Sample 49-96   A01.xls
2     1     20170620 Sample 49-96   A02.xls
3     2     20170620 Sample 49-96   A03.xls
4     2     20170620 Sample 49-96   A04.xls
5     3     20170620 Sample 49-96   A05.xls

You could use the glob module:
(This strictly assumes that the pk list folder contains only subfolders and no files, and that every file present in the first subfolder is also present in the other subfolders.)

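import glob
import os
import pandas as pd

# collect the file names from the first subfolder
os.chdir("C:\\Users\\BKS\\Desktop\\python\\pk list\\20170620 Sample 1-48")
filenames = glob.glob("*.xls")
# collect the subfolder names from the parent folder
os.chdir("C:\\Users\\BKS\\Desktop\\python\\pk list")
foldernames = glob.glob("*")

for filename in filenames:
    df = []
    for foldername in foldernames:
        # merge according to your requirement
        # the file is read relative to the current directory (pk list)
        filepath = os.path.join(foldername, filename)
        df.append(pd.read_excel(filepath, 'Sheet1')[['Exp. m/z','Intensity']])
    # Use merged 'df' here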
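
If the subfolders do not all contain the same file names, one alternative is to walk each subfolder separately and record the folder name next to every file found. The following is only a minimal sketch using the standard library's pathlib; the function name find_xls_files and the tuple layout are made up for illustration and are not part of the answer above. It assumes the xls files sit exactly one level below the root folder.

```python
from pathlib import Path


def find_xls_files(root):
    """Return (folder name, file name, full path) for every .xls file
    found one level below `root`, ordered by folder and then by file.

    Hypothetical helper: files directly inside `root` and non-xls
    files are skipped.
    """
    records = []
    # visit the subfolders in sorted order
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # visit the xls files of this subfolder in sorted order
        for xls in sorted(folder.glob("*.xls")):
            records.append((folder.name, xls.name, xls))
    return records
```

Each returned path could then be passed to pd.read_excel, with the folder name and file name added as 'Folder Name' and 'Tag' columns before the per-file frames are combined with pd.concat.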
