pandas - Read an unstructured CSV and store it in a DataFrame

Question

I have an unstructured CSV in which the rows do not all have the same number of columns.

The input CSV looks like this:

Row1,Col11,Col12,Col13
Row2,Col21,Col22,Col23,Col24,Col25
Row3,Col31,Col32
Row4,,,,Col44

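Note that this is a comma-separated file. Some rows may contain only commas to indicate empty values (e.g. row 4), while others simply have fewer values, and the missing trailing values must be treated as null (e.g. the other rows).

I want to read this raw file into a pandas DataFrame as-is, with NaN wherever a value is null.

Something like this: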

     0        1        2        3        4        5
0    Row1     Col11    Col12    Col13    NaN      NaN
1    Row2     Col21    Col22    Col23    Col24    Col25
2    Row3     Col31    Col32    NaN      NaN      NaN
3    Row4     NaN      NaN      NaN      Col44    NaN

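I am using the pandas.read_csv function to read this, but it appears to use the first row to determine the number of columns, and because the rows are inconsistent it raises an error.

Code:

df = pd.read_csv(path, engine='python', header=None)

Error:

Expected 4 fields in line 2, saw 6

How can I solve this?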

Answer 1

Something like this should help:

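import numpy as np
import pandas as pd
with open('out.txt') as f:
    # Empty fields become NaN; rows shorter than the widest one are padded with NaN too
    df = pd.DataFrame([line.strip().split(',') for line in f]
                     ).replace('', np.nan).fillna(np.nan)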

Output:

      0      1      2      3      4      5
0  Row1  Col11  Col12  Col13    NaN    NaN
1  Row2  Col21  Col22  Col23  Col24  Col25
2  Row3  Col31  Col32    NaN    NaN    NaN
3  Row4    NaN    NaN    NaN  Col44    NaN
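
If you would rather keep read_csv doing the parsing (so quoted fields are handled), one possible alternative, not part of the answer above, is to scan the file once for the widest row and then pass that many column names explicitly. This is only a sketch: read_ragged_csv is a hypothetical helper name, it assumes the file fits in memory and is not empty, and the inline sample stands in for the real file.

```python
import csv
import io

import pandas as pd


def read_ragged_csv(source):
    """Read CSV text whose rows have differing field counts (hypothetical helper).

    `source` is any open text file object, e.g. the result of open(path, newline='').
    """
    text = source.read()
    # First pass: count the fields in every row to find the widest one.
    max_cols = max(len(row) for row in csv.reader(io.StringIO(text)))
    # Second pass: naming max_cols columns up front means read_csv no longer
    # infers the width from the first row; missing fields come back as NaN.
    return pd.read_csv(io.StringIO(text), header=None,
                       names=list(range(max_cols)))


# Sample data copied from the question.
sample = (
    "Row1,Col11,Col12,Col13\n"
    "Row2,Col21,Col22,Col23,Col24,Col25\n"
    "Row3,Col31,Col32\n"
    "Row4,,,,Col44\n"
)
df = read_ragged_csv(io.StringIO(sample))
print(df)
```

For a real file you would call it as read_ragged_csv(open(path, newline='')). The trade-off is that the data is scanned twice, in exchange for not hard-coding the number of columns.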

Solution 1
1 Accepted 2020-12-15 17:06:47
