
How to append .pkl files in for loop to pandas dataframe created in for loop?

I have a seemingly simple piece of code, but somehow it is not working. The goal is to find all pickle files in a folder and load them in a for loop. The first file should be loaded as a pandas dataframe under a variable that did not exist before; once that variable exists, the remaining pickle files should be loaded and appended to the dataframe created in the first iteration. Appending two dataframes directly:

import pandas as pd
import os

# Creating the first Dataframe using dictionary 
df1  = pd.DataFrame({"a":[1, 2, 3, 4], 
                         "b":[5, 6, 7, 8]}) 
  
# Creating the Second Dataframe using dictionary 
df2 = pd.DataFrame({"a":[1, 2, 3], 
                    "b":[5, 6, 7]}) 


df1.append(df2) 

works fine, printing:

    a   b
0   1   5
1   2   6
2   3   7
3   4   8
0   1   5
1   2   6
2   3   7

However, when I try to append the dataframes from my stored pickle files in a for loop, no error is raised, but only the first dataframe ends up in the result:

df1.to_pickle("DF1.pkl")
df2.to_pickle("DF2.pkl")

files = [f for f in os.listdir('.') if os.path.isfile(f)]
#The line above should produce the line below
files=["DF1.pkl", "DF2.pkl"]

for i in files:
    if ".pkl" in i:
        if "ALL_DATA" not in globals():
            ALL_DATA=pd.read_pickle(i)
        else:
            ALL_DATA.append(pd.read_pickle(i))

which only prints:

    a   b
0   1   5
1   2   6
2   3   7
3   4   8

Can anyone help me understand what is going wrong?

DataFrame.append returns a new object, so although you call ALL_DATA.append(pd.read_pickle(i)), the result is never written back to ALL_DATA and the changes are discarded. You need to assign the result back:

ALL_DATA = ALL_DATA.append(pd.read_pickle(i))
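Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on recent versions the line above raises an AttributeError. The same "returns a new object" behaviour applies to pd.concat, which is the replacement. The following is a minimal sketch of that point using the two example dataframes from the question; the names all_data, rows_without_assignment and rows_with_assignment are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
df2 = pd.DataFrame({"a": [1, 2, 3], "b": [5, 6, 7]})

all_data = df1

# The concatenated frame is built and then thrown away,
# so all_data still refers to the original 4-row frame.
pd.concat([all_data, df2])
rows_without_assignment = len(all_data)

# Assigning the result back makes all_data point at the new 7-row frame.
all_data = pd.concat([all_data, df2])
rows_with_assignment = len(all_data)

print(rows_without_assignment, rows_with_assignment)
```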

However, appending in a loop is inefficient because it copies the data on every iteration, so you should avoid it. Instead, append to a list, which is fast, and then concat once after the loop.

l = [] # Holds everything you may possibly append
for i in files:
    if ".pkl" in i:
        if "ALL_DATA" not in globals():
            ALL_DATA=pd.read_pickle(i)
        else:
            l.append(pd.read_pickle(i)) # List append which modifies `l`

# Create df from ALL_DATA and everything that you append
ALL_DATA = pd.concat([ALL_DATA, *l])
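If the globals() check is not actually needed, the same idea can probably be written more simply by putting every dataframe into the list and concatenating once. This is only a sketch under some assumptions: load_all_pickles is a hypothetical helper name, it matches files by the .pkl suffix rather than by ".pkl" appearing anywhere in the name, it reads files in sorted name order, and it returns an empty dataframe when no pickle files are found (pd.concat raises a ValueError on an empty list).

```python
from pathlib import Path

import pandas as pd


def load_all_pickles(folder="."):
    """Load every *.pkl file in `folder` and stack them into one DataFrame.

    Hypothetical helper for illustration, not part of pandas.
    """
    # Sort so the row order of the result is reproducible.
    paths = sorted(Path(folder).glob("*.pkl"))

    # Plain list of DataFrames; building it does not copy any data.
    frames = [pd.read_pickle(path) for path in paths]

    if not frames:
        # pd.concat cannot handle an empty list, so return an empty frame.
        return pd.DataFrame()

    # Single concatenation after the loop; original indexes are kept.
    return pd.concat(frames)
```

Calling ALL_DATA = load_all_pickles(".") in the folder containing DF1.pkl and DF2.pkl should then give the seven-row result shown in the question. As with any pickle loading, only read files from a source you trust.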
