How to append a DataFrame to an existing DataFrame inside a loop

I made a simple DataFrame named middle_dataframe in Python that has only one row of data. I want to append a new DataFrame, generated on each pass through a loop, to this existing DataFrame. This is my program:

    k = 2
    for k in range(2, 32021):
        header = whole_seq_data[k]
        if header.startswith('>'):
            id_name = get_ucsc_ids(header)
            (chromosome, start_p, end_p) = get_chr_coordinates_from_string(header)
        if whole_seq_data[k + 1].startswith('[ATGC]'):
            seq = whole_seq_data[k + 1]
        df_temp = pd.DataFrame(
            {
                "ucsc_id":[id_name],
                "chromosome":[chromosome],
                "start_position":[start_p],
                "end_position":[end_p],
                "whole_sequence":[seq]
            }
        )
        middle_dataframe.append(df_temp)
        k = k + 2

The iterations in my for loop seem to be fine, and I checked that the variables hold the correct values after the regular-expression parsing. But middle_dataframe doesn't change, and I can't figure out why.

The DataFrame.append method returns the result of the append as a new DataFrame rather than appending in place (see the official pandas documentation on append). Because the loop discards that return value, middle_dataframe never changes. The fix should be to replace this line:

        middle_dataframe.append(df_temp)

with this:

    middle_dataframe = middle_dataframe.append(df_temp)

Depending on how that works with your data, you might also need to pass the parameter ignore_index=True, which renumbers the rows of the result from 0 instead of keeping each piece's original index labels.
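One caveat for newer installs: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat is the replacement. Here is a minimal sketch of the same reassignment fix written with pd.concat. The sample values and the loop over range(1, 4) are made up for illustration and only stand in for the parsing in the question.

```python
import pandas as pd

# Hypothetical stand-in for the one-row middle_dataframe from the question.
middle_dataframe = pd.DataFrame(
    {
        "ucsc_id": ["id0"],
        "chromosome": ["chr1"],
        "start_position": [100],
        "end_position": [200],
        "whole_sequence": ["ATGC"],
    }
)

for i in range(1, 4):
    # Made-up values; in the question these come from parsing whole_seq_data.
    df_temp = pd.DataFrame(
        {
            "ucsc_id": [f"id{i}"],
            "chromosome": ["chr1"],
            "start_position": [100 + i],
            "end_position": [200 + i],
            "whole_sequence": ["ATGC"],
        }
    )
    # pd.concat returns a new DataFrame, so the name must be rebound,
    # exactly as with the old append method.
    middle_dataframe = pd.concat([middle_dataframe, df_temp], ignore_index=True)

print(len(middle_dataframe))
```

Without ignore_index=True, every appended row would keep the index label 0 from its own one-row frame.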

The docs warn that appending one row at a time to a DataFrame can be more computationally intensive than building a Python list and converting it into a DataFrame all at once. That's something to look into if your current approach ends up too slow for your purposes.
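The list-based approach might look roughly like the sketch below: collect one dict per record in a plain Python list, build a DataFrame from the list once, and join it to the existing frame in a single step. The parsed tuples are hypothetical placeholders for the values the question extracts from whole_seq_data.

```python
import pandas as pd

# Hypothetical stand-in for the one-row middle_dataframe from the question.
middle_dataframe = pd.DataFrame(
    {
        "ucsc_id": ["id0"],
        "chromosome": ["chr1"],
        "start_position": [100],
        "end_position": [200],
        "whole_sequence": ["ATGC"],
    }
)

# Made-up parsed records: (id, chromosome, start, end, sequence).
parsed = [
    ("id1", "chr1", 300, 400, "GGCC"),
    ("id2", "chr2", 500, 600, "TTAA"),
]

rows = []
for ucsc_id, chromosome, start_p, end_p, seq in parsed:
    # Appending to a Python list is cheap; no DataFrame is copied here.
    rows.append(
        {
            "ucsc_id": ucsc_id,
            "chromosome": chromosome,
            "start_position": start_p,
            "end_position": end_p,
            "whole_sequence": seq,
        }
    )

# Build the new rows once, then combine with the existing frame once.
new_rows = pd.DataFrame(rows)
result = pd.concat([middle_dataframe, new_rows], ignore_index=True)

print(result.shape)
```

This way the existing data is copied once at the end rather than once per iteration, which matters for a loop of roughly 32,000 steps like the one in the question.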
