Loop to create dataframes from two lists

I am trying to read multiple files in a directory into individual dataframes, but I need the name of each dataframe to be a substring of the original filename.

import glob
import os

import pandas as pd

# master list of substrings to look for in filename
sub_list = ['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR']

# set path
path = 'C:/Users/my_user/Desktop/my_folder'

# get list of files with full path
files = glob.glob(os.path.join(path, '*.xlsx'))

# empty list for extracted substrings
df_names = []

Below is how I am extracting the substrings from the filenames:

for filename in files:
    if any(sub in filename for sub in sub_list):
        name = [sub_str for sub_str in sub_list if(sub_str in filename)]
        helper = '' # empty string to join with list element to convert to string
        name = helper.join(name) # convert list element to a string
        df_names.append(name)
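
The extraction step can also be written as a small helper. This is only a sketch: the function name `match_name` and the example file names are hypothetical, and unlike the loop above it looks only at the base name of the file, returns the first match instead of joining all matches, and returns None when nothing matches.

```python
import os

def match_name(filename, sub_list):
    """Return the first entry of sub_list found in the file's base name, or None."""
    base = os.path.basename(filename)  # ignore the directory part of the path
    return next((sub for sub in sub_list if sub in base), None)

sub_list = ['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR']

# hypothetical file names, for illustration only
example_files = [
    'C:/Users/my_user/Desktop/my_folder/report_ABC_2020.xlsx',
    'C:/Users/my_user/Desktop/my_folder/report_GHI_2020.xlsx',
    'C:/Users/my_user/Desktop/my_folder/notes.xlsx',
]

# one entry per file, so the names stay aligned with the file list
df_names = [match_name(f, sub_list) for f in example_files]
```

Keeping a None placeholder for unmatched files means the list of names always has the same length and order as the list of files.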

I then iterate over the df_names list to create empty dataframes:

for name in (df_names):
    exec('{} = pd.DataFrame()'.format(name))

However, I'm not sure how to add the actual data to these dataframes. I assume there is another way to do this, but I haven't been able to figure out how. Maybe using dictionaries?

I've tried the following, but it overwrites every previous result and leaves me with a single dataframe named name.

for name, file in zip(df_names, files):
    name = pd.read_excel(file)
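
The overwriting happens because assigning to the loop variable only rebinds that variable; it never creates a variable whose name is the string the variable held. A minimal illustration, using plain integers as stand-ins for the dataframes (all names here are hypothetical):

```python
names = ['ABC', 'DEF']   # stand-ins for df_names
values = [1, 2]          # stand-ins for the dataframes read from disk

for name, value in zip(names, values):
    name = value         # rebinds the loop variable; no variable ABC or DEF is created

last = name              # holds only the value from the final iteration

# none of the strings in names has become a variable name
created = [n for n in names if n in globals()]
```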

Have you considered storing your dataframes in a dictionary rather than in individually named variables?

Instead of:

for name, file in zip(df_names, files):
    name = pd.read_excel(file)

You could use:

dfs = {}

for name, file in zip(df_names, files):
    dfs[name] = pd.read_excel(file)

You could then get the dataframe for the file matching 'ABC' (assuming 'ABC' is one of the names in df_names) like this:

dfs['ABC']
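
One caveat: zip pairs df_names and files by position, so if any file matches none of the substrings, the two lists differ in length and the names drift out of step with the files. A possible way around this is to build the dictionary directly from the file list. The sketch below is an assumption-laden illustration, not the original poster's code: `load_frames` and its `reader` parameter are hypothetical, and the reader defaults to `pd.read_excel` only when none is supplied.

```python
import os

def load_frames(files, sub_list, reader=None):
    """Map each matched substring to the object returned by reader(file)."""
    if reader is None:
        import pandas as pd  # imported lazily so the sketch can be tried without pandas
        reader = pd.read_excel
    frames = {}
    for file in files:
        base = os.path.basename(file)
        name = next((sub for sub in sub_list if sub in base), None)
        if name is None:
            continue  # skip files that match none of the substrings
        frames[name] = reader(file)
    return frames

# demonstration with a stand-in reader and hypothetical file names
sub_list = ['ABC', 'DEF', 'GHI']
files = ['my_folder/data_ABC.xlsx', 'my_folder/other.xlsx', 'my_folder/data_GHI.xlsx']
dfs = load_frames(files, sub_list, reader=lambda f: 'contents of ' + f)
```

With real data this would presumably be called as load_frames(glob.glob(os.path.join(path, '*.xlsx')), sub_list), after which dfs['ABC'] works as shown above. Note that if two files match the same substring, the later one replaces the earlier one in the dictionary.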
