pandas - Read an unstructured CSV and store it in a DataFrame

Question

I have an unstructured CSV in which the rows do not all have the same number of columns.

The input CSV looks like this:

Row1,Col11,Col12,Col13
Row2,Col21,Col22,Col23,Col24,Col25
Row3,Col31,Col32
Row4,,,,Col44

Note that this is a comma-separated file. A few rows use bare commas to represent nulls (e.g. row 4), while others simply have fewer values, in which case the missing trailing values should be treated as nulls (e.g. the remaining rows).

I want to read this raw file into a pandas DataFrame as is, with NaN wherever a value is null.

Something like this:

     0        1        2        3        4        5
0    Row1     Col11    Col12    Col13    NaN      NaN
1    Row2     Col21    Col22    Col23    Col24    Col25
2    Row3     Col31    Col32    NaN      NaN      NaN
3    Row4     NaN      NaN      NaN      Col44    NaN

I am using the pandas.read_csv function to read the file, but it appears to use the first row to determine the number of columns, and since the row lengths are inconsistent it raises an error.

Code:

df = pd.read_csv(path, engine='python', header=None)

Error:

Expected 4 fields in line 2, saw 6
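
For reference, a minimal sketch that reproduces the problem without a file on disk. It assumes the sample rows above are held in an in-memory string (the `raw` variable is a stand-in for the real file, not part of the original code) and simply catches the parser error so the message can be inspected.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample CSV from the question.
raw = (
    "Row1,Col11,Col12,Col13\n"
    "Row2,Col21,Col22,Col23,Col24,Col25\n"
    "Row3,Col31,Col32\n"
    "Row4,,,,Col44\n"
)

message = None
try:
    # Same call as in the question, reading from the string instead of a path.
    pd.read_csv(io.StringIO(raw), engine="python", header=None)
except pd.errors.ParserError as exc:
    # The first row has 4 fields, so the 6-field second row is rejected.
    message = str(exc)

print(message)
```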

How can I fix this?

Answer 1

Something like this would help:

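import numpy as np
import pandas as pd

with open('out.txt') as f:
    rows = [line.strip().split(',') for line in f]
# Short rows are padded with None; empty strings and None both become NaN.
# Use np.nan here: replace('', None) forward-fills in older pandas versions.
df = pd.DataFrame(rows).replace('', np.nan).fillna(np.nan)

Output:

      0      1      2      3      4      5
0  Row1  Col11  Col12  Col13    NaN    NaN
1  Row2  Col21  Col22  Col23  Col24  Col25
2  Row3  Col31  Col32    NaN    NaN    NaN
3  Row4    NaN    NaN    NaN  Col44    NaN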
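
As an alternative that stays with pandas.read_csv, one possible sketch is to scan the file once for the widest row and then pass explicit column names, so that shorter rows are padded rather than rejected. The `raw` string and the `max_cols` name below are illustrative assumptions, not part of the original answer; with a real file you would open the path twice instead of using `io.StringIO`.

```python
import csv
import io

import pandas as pd

# Hypothetical in-memory copy of the sample CSV from the question.
raw = (
    "Row1,Col11,Col12,Col13\n"
    "Row2,Col21,Col22,Col23,Col24,Col25\n"
    "Row3,Col31,Col32\n"
    "Row4,,,,Col44\n"
)

# First pass: find the widest row so every column can be declared up front.
max_cols = max(len(row) for row in csv.reader(io.StringIO(raw)))

# Second pass: with explicit names, short rows and empty fields become NaN.
df = pd.read_csv(io.StringIO(raw), header=None, names=range(max_cols))
print(df)
```

Using the csv module for the first pass also means quoted fields that contain commas are counted correctly, which a plain split(',') would not handle.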

Answer 1 (accepted, 2020-12-15 17:06:47)
