Python Pandas: appending a dataframe

Question

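I have a situation where I am adding a UUID column to .csv files. At the same time, I check the source files and compare them with the already processed files; if a source file contains additional rows, I plan to append those new rows to the destination file. I want to append rather than overwrite the file because the UUIDs of previously processed rows must stay the same.

So, for the append case, I check whether the source and destination files have the same number of rows. If they do not, I create a new dataframe from the source file, starting at the row number equal to the number of rows in the destination file.

At this point I try to append the newly created dataframe to the destination dataframe, but it keeps failing. I receive the following error: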

 > RuntimeWarning: '<' not supported between instances of 'int' and 'str', sort order is undefined for incomparable objects
 >   result = result.union(other)

The code I am using is as follows:

import os, uuid
import pandas as pd


def process_files():
    source_dir = "C:\\Projects\\test\\raw"
    destination_dir = "C:\\Projects\\test\\processed"

    for file_name in os.listdir(source_dir):
        if file_name.endswith((".csv", ".new")):
            df_source = pd.read_csv(source_dir + "/" + file_name, sep=";")

            if os.path.isfile(destination_dir + "/" + file_name):
                df_destination = pd.read_csv(destination_dir + "/" + file_name, sep=",", header=None)

                if df_source.shape[0] != (df_destination.shape[0]):
                    df_newlines = pd.read_csv(source_dir + "/" + file_name, sep=";", skiprows=df_destination.shape[0], header=None)
                    df_newlines.insert(0, "uu_id", pd.Series([uuid.uuid4() for i in range(len(df_newlines))]))
                    df_destination.append(df_newlines, ignore_index=True)
                    df_destination.to_csv(destination_dir + "/" + file_name, sep=",", header=False, mode="w", index=False)
                else:
                    continue
            else:
                df_source.insert(0,"uu_id", pd.Series([uuid.uuid4() for i in range(len(df_source))]))
                df_source.to_csv(destination_dir + "/" + file_name, sep=",", header=False, mode="w", index=False)
        else:
            continue


process_files()

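I have checked the dtypes of both dataframes and they match column by column. I have also forced the columns to be renamed to the same strings, but that did not solve the problem. Any idea what I am doing wrong with the append operation (commenting out the append line lets the script run without issues)?

Thank you and best regards, Bostjan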

Answer 1

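Disclaimer: I am not allowed to comment due to lack of reputation.

Usually, append is not in-place: it returns a new dataframe and leaves the original unchanged. Hence I'd suggest writing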

df_destination = df_destination.append(df_newlines, ignore_index=True)
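To illustrate why the reassignment matters, here is a minimal sketch with made-up data (the column names and values are assumptions, not taken from the question). It uses pd.concat rather than DataFrame.append, because append was deprecated in pandas 1.4 and removed in pandas 2.0; both return a new dataframe instead of modifying the original.

```python
import pandas as pd

# Hypothetical stand-ins for the processed rows and the newly found rows.
df_destination = pd.DataFrame({"uu_id": ["id-1", "id-2"], "value": [1, 2]})
df_newlines = pd.DataFrame({"uu_id": ["id-3"], "value": [3]})

# The result is discarded here, so df_destination is left unchanged.
pd.concat([df_destination, df_newlines], ignore_index=True)
rows_without_assignment = len(df_destination)

# Assigning the result back is what actually "appends" the rows.
df_destination = pd.concat([df_destination, df_newlines], ignore_index=True)
rows_with_assignment = len(df_destination)

print(rows_without_assignment, rows_with_assignment)
```

On pandas versions that still ship DataFrame.append, the line above with the assignment behaves the same way.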

Hope that's it.
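If the RuntimeWarning from the question still shows up, one possible cause (an assumption on my part, based only on the code posted) is the column labels rather than the dtypes: reading with header=None gives integer labels, and inserting "uu_id" adds a string label, so the two frames end up with different, mixed-type column indexes. The sketch below uses small in-memory stand-ins for the two files and relabels the new rows positionally before combining them.

```python
import io
import uuid

import pandas as pd

# Hypothetical stand-ins: a processed file (UUID already in the first column)
# and one new row from the source file.
df_destination = pd.read_csv(io.StringIO("id-1,a,1\nid-2,b,2\n"), sep=",", header=None)
df_newlines = pd.read_csv(io.StringIO("c;3\n"), sep=";", header=None)
df_newlines.insert(0, "uu_id", [str(uuid.uuid4()) for _ in range(len(df_newlines))])

print(list(df_destination.columns))  # integer labels only
print(list(df_newlines.columns))     # a string label followed by integers

# Relabel by position so both frames share the same column index.
df_newlines.columns = df_destination.columns
combined = pd.concat([df_destination, df_newlines], ignore_index=True)
print(combined.shape)
```

Without the relabeling step the values would be matched by label, so the new data would not land under the intended columns.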

Apart from that, I'd suggest using os.walk and fnmatch for going through the files.
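A rough sketch of that suggestion follows. The helper name find_files and the temporary directory layout are made up for the demonstration; the patterns mirror the ".csv" and ".new" suffixes from the question. Note that, unlike os.listdir, os.walk also descends into subdirectories.

```python
import fnmatch
import os
import tempfile


def find_files(root, patterns=("*.csv", "*.new")):
    """Yield paths under root whose file name matches any of the patterns."""
    for dir_path, _dir_names, file_names in os.walk(root):
        for file_name in file_names:
            if any(fnmatch.fnmatch(file_name, pattern) for pattern in patterns):
                yield os.path.join(dir_path, file_name)


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for relative_path in ("a.csv", "b.txt", os.path.join("sub", "c.new")):
        open(os.path.join(root, relative_path), "w").close()
    found = sorted(os.path.relpath(path, root) for path in find_files(root))

print(found)
```

In the question's script, the loop over os.listdir(source_dir) could be replaced by a loop over find_files(source_dir), assuming nested folders should be processed too.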

Answer 1: 1, accepted, 2017-12-14 15:50:55