pandas - read unstructured CSV and save it in a DataFrame
I have an unstructured CSV in which the rows do not all have the same number of columns.
The input CSV looks like this:
Row1,Col11,Col12,Col13
Row2,Col21,Col22,Col23,Col24,Col25
Row3,Col31,Col32
Row4,,,,Col44
Note that this is a comma-separated file. Some rows use consecutive commas to represent nulls (e.g. row 4), while others simply have fewer values; the missing values in those rows must also be treated as nulls (e.g. the remaining rows).
I want to read this raw file into a pandas DataFrame as is, putting NaN wherever a value is null.
Something like this:
      0      1      2      3      4      5
0  Row1  Col11  Col12  Col13    NaN    NaN
1  Row2  Col21  Col22  Col23  Col24  Col25
2  Row3  Col31  Col32    NaN    NaN    NaN
3  Row4    NaN    NaN    NaN  Col44    NaN
I am using the pandas.read_csv function to read it, but it seems to use the first row to determine the number of columns, and since the column count is inconsistent it raises an error.
Code:
df = pd.read_csv(path, engine='python', header=None)
Error:
Expected 4 fields in line 2, saw 6
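A minimal reproduction of this error, assuming the sample rows above are saved verbatim. An in-memory io.StringIO buffer (RAW and error_message are names introduced here) stands in for the file at path. It illustrates the behaviour described above: with header=None and no names, the parser takes the width of the first line as the width of every row.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the file at `path` (sample rows from the question).
RAW = """Row1,Col11,Col12,Col13
Row2,Col21,Col22,Col23,Col24,Col25
Row3,Col31,Col32
Row4,,,,Col44
"""

try:
    # No `names` given, so the field count of the first line (4) is used for all rows;
    # line 2 has 6 fields and is rejected as a bad line.
    pd.read_csv(io.StringIO(RAW), engine="python", header=None)
    error_message = None
except pd.errors.ParserError as exc:
    error_message = str(exc)

print(error_message)
```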
How can I fix this?
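Something like this would help:
import numpy as np
import pandas as pd
with open('out.txt') as f:
    # Split on commas; rows shorter than the widest one are padded with None
    df = pd.DataFrame([line.strip().split(',') for line in f])
df = df.replace('', np.nan).fillna(np.nan)  # '' from empty fields and None padding both become NaN
Replace the empty strings with np.nan rather than None: on older pandas versions, replace('', None) forward-fills from the row above, which would copy Col31 and Col32 into the empty cells of row 4.
Output:
      0      1      2      3      4      5
0  Row1  Col11  Col12  Col13    NaN    NaN
1  Row2  Col21  Col22  Col23  Col24  Col25
2  Row3  Col31  Col32    NaN    NaN    NaN
3  Row4    NaN    NaN    NaN  Col44    NaN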
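An alternative sketch that keeps pandas.read_csv for the actual parsing: first find the widest row with the standard-library csv module, then pass that many column names, so read_csv pads short rows with NaN instead of raising. The in-memory RAW buffer stands in for the real file, and width is a name introduced here for illustration.

```python
import csv
import io

import pandas as pd

# Hypothetical in-memory stand-in for the real file (sample rows from the question).
RAW = """Row1,Col11,Col12,Col13
Row2,Col21,Col22,Col23,Col24,Col25
Row3,Col31,Col32
Row4,,,,Col44
"""

# First pass: the widest row determines how many columns the DataFrame needs.
width = max(len(row) for row in csv.reader(io.StringIO(RAW)))

# Second pass: explicit names of that width let read_csv accept rows of any
# length up to `width`; missing trailing fields and empty fields become NaN.
df = pd.read_csv(io.StringIO(RAW), header=None, names=range(width))
print(df)
```

With a file on disk, the same two passes would simply open the file twice. Unlike line.split(','), both csv.reader and read_csv respect quoted fields, so a value such as "a,b" is not split into two columns.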