
How do I create multiple data frames using a for loop in Python

I'm trying to make multiple DataFrames that are subsets of existing DataFrames.

I have df_list, which is a list of DataFrames:

df_list = [df1B, df2B, df3B, df4B, df5B, df6B, df7B, df8B, df9B, df10B, df11B, df12B, df13B, df14B, df15B, df16B, df17B, df18B, df19B, df20B, df21B, df22B, df23B, df24B, df25B, df26B, df27B, df28B, df30B, df31B, df32B, df33B, df34B, df35B]

If I want to make a subset of a single dataset, I do this, and it works:

df2B = df2B.groupby(['Location']).get_group(36)

It keeps all rows whose Location is 36, but when I try to do the same for all the datasets in a for loop, it doesn't work:

for df in df_list:
    df = df.groupby(['Location']).get_group(36)

But this doesn't create a subset for each dataset. It doesn't show any error message, but it doesn't do anything else either :(

Should I just write the same line 35 times? I hope there's a better option.

If I understand correctly, you can use a list comprehension for this:

subset_df_list = [df.groupby('Location').get_group(36) for df in df_list]
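To see what the comprehension produces, here it is run on two small made-up frames. The data is hypothetical and only stands in for df1B, df2B and so on; each input frame yields one subset, in the same order as df_list.

```python
import pandas as pd

# Hypothetical stand-ins for df1B, df2B, ...: each has a Location column
df_list = [
    pd.DataFrame({'Location': [36, 12, 36], 'value': [1, 2, 3]}),
    pd.DataFrame({'Location': [5, 36], 'value': [4, 5]}),
]

# One subset per input frame, in the same order as df_list
subset_df_list = [df.groupby('Location').get_group(36) for df in df_list]

for subset in subset_df_list:
    print(subset)
```

Each subset keeps the original row labels (0 and 2 for the first frame, 1 for the second).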

As an aside, your for loop doesn't work because you just keep assigning back to df: that rebinds the loop variable to the new subset but never changes the DataFrame stored in df_list. You probably want this, which is equivalent to the comprehension above:

subset_df_list = []

for df in df_list:
    subset_df = df.groupby('Location').get_group(36)
    subset_df_list.append(subset_df)
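If you would rather overwrite the entries of df_list itself (the way df2B = ... did for a single frame), one option is to assign through the list index instead of to the loop variable. A minimal sketch with hypothetical data that contrasts the two:

```python
import pandas as pd

# Hypothetical data standing in for the question's frames
df_list = [
    pd.DataFrame({'Location': [36, 1], 'value': [10, 20]}),
    pd.DataFrame({'Location': [36, 36, 2], 'value': [30, 40, 50]}),
]

# Rebinding the loop variable only changes the local name; the list is untouched
for df in df_list:
    df = df.groupby('Location').get_group(36)
print([len(d) for d in df_list])  # [2, 3]: nothing changed

# Assigning through the index replaces the list element itself
for i, df in enumerate(df_list):
    df_list[i] = df.groupby('Location').get_group(36)
print([len(d) for d in df_list])  # [1, 2]
```

Only the list entries change: separate names such as df1B and df2B still refer to the unfiltered frames.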
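Alternatively, using map:

import numpy as np
import pandas as pd
df = [pd.DataFrame({'Location': np.random.randint(0, 5, size=100)}) for i in range(10)]  # 10 sample frames
df = list(map(lambda x: x.groupby('Location').get_group(1), df))  # keep Location == 1 in each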
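One caveat with all of these: groupby(...).get_group(key) raises a KeyError when a frame has no rows with that key, which would stop the loop partway through the list. If some datasets might not contain Location 36, a plain boolean mask is a possible alternative, since it returns an empty DataFrame instead. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical frames; the second has no Location 36
frames = [
    pd.DataFrame({'Location': [36, 7], 'value': [1, 2]}),
    pd.DataFrame({'Location': [7, 8], 'value': [3, 4]}),
]

# get_group fails on the second frame because the key is missing
raised = False
try:
    [df.groupby('Location').get_group(36) for df in frames]
except KeyError:
    raised = True
print('get_group raised KeyError:', raised)

# Boolean indexing yields a possibly empty subset for every frame
subsets = [df[df['Location'] == 36] for df in frames]
print([len(s) for s in subsets])  # [1, 0]
```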

You're assigning to your loop variable, which is then thrown away on the next iteration. DataFrame.append isn't in-place and doesn't have an inplace parameter: it returns a new DataFrame. (It was also removed entirely in pandas 2.0; pd.concat is the replacement.) Instead:

import pandas as pd

df1 = pd.DataFrame({'gr': [1, 1, 2, 2], 'v': [1, 2, 3, 2]})
df2 = pd.DataFrame({'gr': [1, 1, 2, 2], 'v': [6, 5, 4, 3]})
pieces = []
for df in [df1, df2]:
    pieces.append(df.groupby('gr').get_group(1))  # collect each subset
# DataFrame.append was removed in pandas 2.0; concatenate the pieces once instead
df_combined = pd.concat(pieces)
df_combined
#    gr  v
# 0   1  1
# 1   1  2
# 0   1  6
# 1   1  5
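The combined frame keeps each source frame's row labels, so the index repeats (0, 1, 0, 1). pd.concat has two parameters that may help here: ignore_index=True renumbers the rows, and keys= adds an outer index level recording where each row came from. The labels 'df1' and 'df2' below are just illustrative choices:

```python
import pandas as pd

df1 = pd.DataFrame({'gr': [1, 1, 2, 2], 'v': [1, 2, 3, 2]})
df2 = pd.DataFrame({'gr': [1, 1, 2, 2], 'v': [6, 5, 4, 3]})

pieces = [df.groupby('gr').get_group(1) for df in [df1, df2]]

# Fresh 0..n-1 index instead of the repeated 0, 1, 0, 1
flat = pd.concat(pieces, ignore_index=True)
print(flat)

# Alternatively, label each piece by its source with an outer index level
labelled = pd.concat(pieces, keys=['df1', 'df2'])
print(labelled.loc['df2'])
```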

Unless you want a list of DataFrames, which it now seems you do. (I was thrown by df.append(): for a list, append adds to the end in place; for a DataFrame, it does not.) In the list case, you want:

# setup as before (df1, df2)
combined_dfs = []
for df in [df1, df2]:
    combined_dfs.append(df.groupby('gr').get_group(1))  # list.append works in place; don't reassign
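If the real goal is many individually named subsets rather than numbered variables, a dict keyed by name is a common alternative to creating variables in a loop. A minimal sketch; the names and data are hypothetical:

```python
import pandas as pd

# Hypothetical inputs; in the question these would be df1B, df2B, ...
datasets = {
    'df1B': pd.DataFrame({'Location': [36, 3], 'value': [1, 2]}),
    'df2B': pd.DataFrame({'Location': [36, 36], 'value': [3, 4]}),
}

# One filtered frame per name, reachable as subsets['df2B'] etc.
subsets = {name: df.groupby('Location').get_group(36) for name, df in datasets.items()}

for name, subset in subsets.items():
    print(name, len(subset))
```

This keeps the lookup-by-name convenience of separate variables while letting a single loop or comprehension handle all 35 datasets.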

It's a funny way to use DataFrames, but there ya go! :D
