Reading a sparse CSV file into pandas

Question

I have a space-separated CSV file in the following format:

2012-11-01 1 2012-12-01 4 2013-02-01 6
2012-12-01 2 2013-01-01 nan
2012-11-01 3 2012-12-01 5 2013-01-01 5 2013-04-01 7

Basically, each line is a series of dates, each followed by a value, but the dates are sparse. Some of the values are nan, and some could be missing entirely. I would like to read this into pandas and line up the values by their corresponding dates.

Running pandas:

import pandas as pd
pd.read_csv('sparse.csv', sep=" ", parse_dates=True)

fails with:

ValueError: Expecting 6 columns, got 8 in row 1

What would be a way to read this file and align the dates and values?

(Is there some "pre-processing" I could do, maybe?)

Thanks

Answer 1

A CSV file should have the same number of fields in every row. If your file is just a series of date/number pairs with no relationship between the pairs, it isn't really CSV, only a file of pairs. So it should be parsed as a file of pairs:

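with open("sparse.csv") as f:
    tokens = f.read().split()  # split on newlines and spaces

it = iter(tokens)  # one shared iterator, so each next() consumes a token
for date in it:
    # the token after each date is its value (it may be the string "nan");
    # a trailing date with no value falls back to "nan"
    value = next(it, "nan")
    # process the (date, value) pair here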
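
If you also want the values lined up by date in pandas, here is a rough sketch of one way to do it. It is not the only approach, and it rests on a few assumptions: `parse_sparse_rows` and `to_frame` are names I made up for illustration, every value is assumed to parse with `float()` (which accepts "nan"), and a line with an odd number of fields is treated as an error because you cannot tell which value is missing.

```python
import io


def parse_sparse_rows(lines):
    """Turn each 'date value date value ...' line into a {date: value} dict."""
    rows = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if len(tokens) % 2:
            # assumption: an unpaired field is an error rather than a guess
            raise ValueError(f"odd number of fields: {line!r}")
        # even positions are dates, odd positions are values
        rows.append({d: float(v) for d, v in zip(tokens[0::2], tokens[1::2])})
    return rows


def to_frame(rows):
    """Align rows on date: one column per input line, one index entry per date."""
    import pandas as pd  # local import so the parser above works without pandas

    frame = pd.DataFrame(
        {i: pd.Series(row, dtype=float) for i, row in enumerate(rows)}
    )
    frame.index = pd.to_datetime(frame.index)
    return frame.sort_index()


sample = """\
2012-11-01 1 2012-12-01 4 2013-02-01 6
2012-12-01 2 2013-01-01 nan
2012-11-01 3 2012-12-01 5 2013-01-01 5 2013-04-01 7
"""
rows = parse_sparse_rows(io.StringIO(sample))
```

Calling `to_frame(rows)` should then give a frame indexed by the union of all dates, with NaN wherever a line has no entry for that date. For the real file, pass the open file object instead of the `io.StringIO` sample.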

2 | Accepted | 2012-11-08 15:00:01