pd.DataFrame with scalar values

Question

I want to remove some rows from a CSV file by saving a new CSV after a validation step. I wrote the code below, but it raises an error.

with open(path_to_read_csv_file, "r") as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for line in csv_reader:
        # if validation(line[specific_column]):
            try:
                df = pd.DataFrame(line)
                df.to_csv(path_to_save_csv_file)

            except Exception as e:
                print('Something Happend!')
                print(e)
                continue

Error:

Something Happend!
If using all scalar values, you must pass an index

I also tried adding an index with df = pd.DataFrame(line, index=[0]), but then it only stores the first row and adds an empty column at the beginning. How can I solve this?

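Another version, which iterates over the raw lines of the file, works, but there I cannot get the value of a specific key in each row: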

inFile = open(path_to_read_csv_file, 'r')
outFile = open(path_to_save_csv_file, 'w')

for line in inFile:
    try:
        print('Analysing:', line)

        # HERE, how can I get the specific column value? I used to use line[specific_column] in the last version
        if validation(line[specific_column]):
            outFile.write(line)
        else:
            continue

    except Exception as e:
        print('Something Happend!')
        print(e)
        continue

outFile.close()
inFile.close()

Answer 1

This should help you. Basically, you cannot create a DataFrame from scalar values alone. They have to be wrapped in something, for example a list.
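As a minimal sketch of that wrapping: the dict named line below is an assumed stand-in for one row produced by csv.DictReader, not data from the original post. Passing it directly reproduces the error, while wrapping it in a list gives a one-row DataFrame.

```python
import pandas as pd

# Hypothetical row, shaped like what csv.DictReader yields for one CSV line.
line = {"Header1": "1", "Header2": "2", "Header3": "3"}

# Passing the dict directly fails because every value is a scalar.
try:
    pd.DataFrame(line)
    raised = False
except ValueError:
    raised = True

# Wrapping the dict in a list turns it into a single row,
# with the dict keys used as column names.
df = pd.DataFrame([line])
```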

Answer 2

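The pd.DataFrame constructor expects you to tell it how the data you provide should be indexed. This is described in the pandas documentation for the constructor.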

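By default, csv.DictReader takes its field names from the header row. Quoting the csv documentation:

the values in the first row of file f will be used as the fieldnames.

See the csv documentation for more information.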

So each line parsed by csv_reader is a dictionary whose keys are the CSV headers and whose values are the values of that particular row.

For example, if my CSV is:

Header1,Header2,Header3
1,2,3
11,11,33

Then, on the first iteration, the line object will be:

{'Header1': '1', 'Header2': '2', 'Header3': '3'}
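This can be checked with a short sketch. It assumes the sample CSV is written without spaces after the commas and feeds it to csv.DictReader through io.StringIO instead of a real file; the variable names are illustrative only.

```python
import csv
import io

# The sample CSV from above, held in memory instead of a file.
csv_text = "Header1,Header2,Header3\n1,2,3\n11,11,33\n"

reader = csv.DictReader(io.StringIO(csv_text), delimiter=",")

# Each parsed row maps header names to that row's values (all strings).
rows = [dict(row) for row in reader]
first = rows[0]
```

Note that every value comes back as a string, so any numeric validation has to convert it first.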

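Now, when you give this to pd.DataFrame, you need to specify what the data is and what the headers/index are. In this case, the data is ['1', '2', '3'] and the headers/index are ['Header1', 'Header2', 'Header3']. These can be extracted by calling line.values() and line.keys() respectively.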

Here is the change I made.

with open(path_to_read_csv_file, "r") as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for line in csv_reader:
        try:
            # validation ...
            df = pd.DataFrame(line.values(), line.keys())
            df.to_csv(path_to_save_csv_file)

        except Exception as e:
            print('Something Happend!')
            print(e)
            continue
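One thing to watch for: calling to_csv inside the loop rewrites the output file on every iteration, so only the last row written survives. A possible alternative, sketched here under assumptions, is to collect the rows that pass validation and write them once after the loop. The validation rule below is a hypothetical placeholder (the original post does not show it), and filter_csv is a name introduced only for this illustration.

```python
import csv
import pandas as pd


def validation(value):
    # Hypothetical rule for illustration only: keep purely numeric values.
    return value.isdigit()


def filter_csv(path_to_read_csv_file, path_to_save_csv_file, specific_column):
    kept_rows = []
    with open(path_to_read_csv_file, "r", newline="") as csv_file:
        csv_reader = csv.DictReader(csv_file, delimiter=",")
        # Remember the header order so the output keeps the same columns.
        fieldnames = csv_reader.fieldnames
        for line in csv_reader:
            if validation(line[specific_column]):
                kept_rows.append(line)

    # A list of dicts becomes one DataFrame row per dict.
    df = pd.DataFrame(kept_rows, columns=fieldnames)
    # index=False leaves out the row index, which otherwise shows up
    # as an extra unnamed first column in the saved file.
    df.to_csv(path_to_save_csv_file, index=False)
    return df
```

Writing once after the loop keeps every validated row, and index=False avoids the empty leading column mentioned in the question.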

Answer 1: 0, 2020-10-17 10:38:11

Answer 2: 0, 2020-10-17 10:39:09
