Pandas: how to read csv with multiple lines on the same cell?
I have a csv that I am not able to read using read_csv. Opening the csv with Sublime Text shows something like:
col1,col2,col3
text,2,3
more text,3,4
HELLO
THIS IS FUN
,3,4
As you can see, the text HELLO THIS IS FUN takes three lines, and pd.read_csv is confused as it thinks these are three new observations. How can I parse that correctly in Pandas?
Thanks!
It looks like you'll have to preprocess the data manually:
with open('data.csv', 'r') as f:
    lines = f.read().splitlines()

processed = []
buffer = ''
for line in lines:
    buffer += line  # Append the current line to a buffer
    c = buffer.count(',')  # Commas seen so far in the current record
    if c == 2:
        processed.append(buffer)  # Two commas means a complete 3-column row
        buffer = ''
    elif c > 2:
        raise ValueError('Too many fields: ' + buffer)  # This should never happen
This assumes that your data only contains unwanted newlines, e.g. if you had data with, say, 3 elements in one row and 2 elements in the next, then the row after that should either be blank or contain only 1 element. If it has 2 or more, i.e. it's missing a necessary newline, then an error is thrown. You can accommodate this case if necessary with a minor modification.
Actually, it might be more efficient to remove newlines instead, but it shouldn't matter unless you have a lot of data.
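For reference, the buffering idea above can be wrapped in a small reusable function. This is only a sketch under stated assumptions, not a definitive implementation: the name merge_broken_rows and its parameters n_cols and joiner are hypothetical, it assumes that no field contains a comma of its own, and it joins the pieces of a broken cell with a space (the loop above concatenates them directly).

```python
def merge_broken_rows(lines, n_cols, joiner=' '):
    """Rejoin physical lines into logical CSV rows of n_cols fields each.

    Assumption: fields never contain commas, so a row is complete
    as soon as it holds n_cols - 1 commas.
    """
    rows = []
    parts = []  # physical lines collected for the current logical row
    for line in lines:
        if not line.strip():
            continue  # a blank line inside a broken cell carries no data
        parts.append(line.strip())
        n_commas = sum(p.count(',') for p in parts)
        if n_commas == n_cols - 1:
            # Row is complete: join the pieces, then trim each field so the
            # joiner does not leave stray spaces next to a comma.
            fields = [f.strip() for f in joiner.join(parts).split(',')]
            rows.append(','.join(fields))
            parts = []
        elif n_commas > n_cols - 1:
            raise ValueError('Too many fields: ' + joiner.join(parts))
    if parts:
        raise ValueError('Incomplete trailing row: ' + joiner.join(parts))
    return rows


# The sample from the question, as a list of physical lines.
sample = [
    'col1,col2,col3',
    'text,2,3',
    'more text,3,4',
    'HELLO',
    'THIS IS FUN',
    ',3,4',
]
rows = merge_broken_rows(sample, n_cols=3)
print(rows)
```

The repaired rows can then be handed to pandas without writing a new file, e.g. pd.read_csv(io.StringIO('\n'.join(rows))) after importing io and pandas as pd.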