Looping over a list of dataframes and appending values to different dataframes

I have a list of files and a list of DataFrames, and I want to use a single "for" loop to open the first file in the list, extract some data and write it into the first DataFrame, then open the second file, do the same thing and write it into the second DataFrame, and so on. So I wrote this:

import pandas as pd
filename1 = 'file1.txt'
filename2 = 'file2.txt'

filenames = [filename1, filename2]

columns = ['a', 'b', 'c']

df1 = pd.DataFrame(columns = columns)
df2 = pd.DataFrame(columns = columns)

dfs = [df1, df2]

for name, df in zip(filenames, dfs):
    info = open(name, 'r')
    # go through the file, find some values
    df = df.append({'''dictionary with found values'''})

However, when I run the code, my data is not written into df1 and df2, which I created at the beginning. Those dataframes stay empty, and a new dataframe called df appears in the list of variables. My data is stored there, and it seems to be overwritten on every iteration of the loop. How do I solve this in the simplest way? The main goal is to end up with several different dataframes, each corresponding to a different file, once the loop over the list of files has finished. So I don't really care when and how the dataframes are created; I only want a new dataframe to be filled with values when a new file is opened.

Each time you loop through dfs, df refers to the DataFrame object stored in the list, not to a copy of it. However, df.append() does not modify that object; it returns a new DataFrame. When you assign that result back to df, you only rebind the loop variable to the new object, so the DataFrames in dfs (also known as df1 and df2) are never changed. Re-write your code like this:

dfs = []

for name in filenames:
    with open(name, 'r') as info:
        dfs.append(pd.read_csv(info))

The reason for this is that Python doesn't know you're trying to re-assign the variable names "df1" and "df2". The list you declare as "dfs" is simply a list of two empty dataframes. You never alter that list after creation, so it remains a list of two empty dataframes, which happen to be individually referenced as "df1" and "df2".
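To make the distinction concrete, here is a minimal sketch that uses plain lists as stand-ins for DataFrames (the binding rules are the same). The names first, second and items are made up for illustration. Assigning to the loop variable only rebinds that name, while an in-place method call changes the object that the list also holds:

```python
# Plain lists stand in for DataFrames; the name-binding rules are identical.
first, second = [], []
items = [first, second]

for item in items:
    # Rebinding: 'item' now names a brand-new list.
    # The objects held in 'items' (and named 'first'/'second') are untouched.
    item = item + ["new value"]

rebinding_result = [list(x) for x in items]

for item in items:
    # In-place mutation: the object shared with 'items' is changed.
    item.append("new value")

mutation_result = [list(x) for x in items]

print(rebinding_result)  # [[], []]
print(mutation_result)   # [['new value'], ['new value']]
```

DataFrame.append() behaved like the first loop: it returned a new object instead of changing the one it was called on.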

I don't know how you're constructing a DF from the file, so I'm just going to assume you have a function somewhere called make_df_from_file(filename) that handles the open() and the parsing of a CSV, dict, whatever.
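For illustration only, one possible shape for such a helper is sketched below. The file layout is an assumption (three whitespace-separated values per line, matching columns a, b and c), and COLUMNS is a made-up name. The sketch collects rows in a list and builds the DataFrame once, which also avoids DataFrame.append(), a method that was removed in pandas 2.0:

```python
import os
import tempfile

import pandas as pd

COLUMNS = ["a", "b", "c"]  # assumed column names, taken from the question


def make_df_from_file(filename):
    """Hypothetical helper: parse one file into one DataFrame.

    Assumes each data line holds three whitespace-separated values.
    """
    rows = []
    with open(filename, "r") as info:
        for line in info:
            parts = line.split()
            if len(parts) != len(COLUMNS):
                continue  # skip lines that do not match the assumed layout
            rows.append(dict(zip(COLUMNS, parts)))
    # Build the DataFrame once, from all collected rows
    return pd.DataFrame(rows, columns=COLUMNS)


# Small demonstration with a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file1.txt")
    with open(path, "w") as f:
        f.write("1 2 3\nnot a data line\n4 5 6\n")
    demo_df = make_df_from_file(path)

print(demo_df)
```

The values stay as strings here; converting them to numbers would depend on what the real files contain.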

If you want to have a list of dataframes, it's easiest to just declare a list and add them one at a time, rather than trying to give each DF a separate name:

df_list = []

for name in filenames:
    df_list.append(make_df_from_file(name))

If you want to get a bit slicker (and faster) about it, you can use a list comprehension, which combines the previous script into a single line:

df_list = [make_df_from_file(name) for name in filenames]

To reference individual dataframes in that list, you can just pull them out by index as you would with any other list:

df1 = df_list[0]
df2 = df_list[1]
...

but that's often more trouble than it's worth.

If you then want to combine all the DFs into a single one, pandas.concat() is your friend:

from pandas import concat
dfs = concat(df_list)

or, if you don't care about df_list other than as an intermediate step:

from pandas import concat
dfs = concat([make_df_from_file(name) for name in filenames])
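As a rough sketch of two common variations on the concat() call: ignore_index=True renumbers the rows of the combined frame, and keys= labels each block so you can still tell which file a row came from. The small DataFrames below are made-up stand-ins for the results of make_df_from_file:

```python
import pandas as pd

# Stand-in data; in practice these would come from the files
filenames = ["file1.txt", "file2.txt"]
df_list = [
    pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}),
    pd.DataFrame({"a": [7], "b": [8], "c": [9]}),
]

# One flat frame with a fresh 0..n-1 index
flat = pd.concat(df_list, ignore_index=True)

# One frame whose outer index level records the source file
labelled = pd.concat(df_list, keys=filenames)

# Rows that came from the second file only
second_file_rows = labelled.loc["file2.txt"]

print(flat)
print(second_file_rows)
```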

And if you absolutely need to give separate names to all the dataframes, you can get ultra-hacky with it. (Seriously, you shouldn't normally do this, but it's fun and awful. See this link for more bad ideas along these lines.)

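# Iterate over the list of dataframes (df_list), not the concatenated result.
# Writing to locals() only takes effect at module level, not inside a function.
for n, d in enumerate(df_list):
    locals()[f'df{n+1}'] = d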
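If the goal is simply to look each dataframe up by name, a dictionary keyed by filename is a less hacky option. This is a minimal sketch: the in-memory CSV strings in sources are made-up stand-ins for real files, and dfs_by_name is a hypothetical name.

```python
import io

import pandas as pd

# Stand-in inputs: in-memory CSV text instead of files on disk (assumption)
sources = {
    "file1.txt": "a,b,c\n1,2,3\n",
    "file2.txt": "a,b,c\n4,5,6\n7,8,9\n",
}

# One DataFrame per source, addressable by its name
dfs_by_name = {
    name: pd.read_csv(io.StringIO(text)) for name, text in sources.items()
}

print(dfs_by_name["file2.txt"])
```

With real files, the dictionary comprehension would read each filename directly instead of wrapping text in io.StringIO.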

If the text files are dictionaries, or can be converted after reading into dictionaries with the keys a, b and c (the same as the dataframe columns you created), then the values can be assigned this way:

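import json
import pandas as pd
filename1 = 'file1.txt'
filename2 = 'file2.txt'

filenames = [filename1, filename2]

columns = ['a', 'b', 'c']

df1 = pd.DataFrame(columns = columns)
df2 = pd.DataFrame(columns = columns)

dfs = [df1, df2]

for name, df in zip(filenames, dfs):
    with open(name, 'r') as info:
        # Assumption: each file holds a JSON object mapping 'a', 'b', 'c' to equal-length lists
        data = json.load(info)
    for key in data.keys():
        # Column assignment modifies df in place, so df1 and df2 are filled
        df[key] = data[key]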
