Pandas: how to read a CSV with multi-line text in a single cell?

Question

I have a CSV file that I am not able to read using read_csv. Opening the file in Sublime Text shows something like:

col1,col2,col3
text,2,3
more text,3,4
HELLO

THIS IS FUN
,3,4

As you can see, the text HELLO THIS IS FUN spans three lines, and pd.read_csv gets confused because it treats them as three new observations. How can I parse this correctly in Pandas?

Thanks!

Answer 1

It looks like you'll have to preprocess the data manually:

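with open('data.csv', 'r') as f:
    lines = f.read().splitlines()
processed = []  # Complete rows, one string per logical row
buffer = ''
for line in lines:
    buffer += line  # Append the current line to a buffer (no separator added)
    c = buffer.count(',')  # Commas accumulated so far for this row
    if c == 2:
        # A complete 3-column row: keep the whole buffer, not just the last line
        processed.append(buffer)
        buffer = ''
    elif c > 2:
        # This should never happen if the only defect is extra newlines
        raise ValueError('too many commas in row: ' + buffer)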

This assumes that the only defect in your data is unwanted newlines, i.e. that every logical row contains exactly two commas once its fragments are joined. Lines are accumulated until the buffer holds two commas, so a fragment may be blank or carry only part of a row. If joining a line pushes the count past two, which means a necessary newline is missing, an error is raised. You can accommodate that case with a minor modification if necessary.

Actually, it might be more efficient to remove the unwanted newlines instead, but it shouldn't matter unless you have a lot of data.
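As a rough generalization of the buffering idea in this answer, the sketch below takes the expected number of separators from the header line instead of hard-coding 2, and joins fragments with a space so that the sample cell comes out as HELLO THIS IS FUN. The function name `merge_broken_rows` and its `sep` and `joiner` parameters are made up for illustration. It assumes the header line is intact, that no field legitimately contains the separator, and that the broken cell is not the last column of its row.

```python
def merge_broken_rows(lines, sep=",", joiner=" "):
    """Glue physical lines back into logical CSV rows (hypothetical helper).

    Assumes lines[0] is an intact header and that unwanted newlines are
    the only defect in the data.
    """
    lines = list(lines)
    if not lines:
        return []
    expected = lines[0].count(sep)  # separators in one complete row
    rows = []
    parts = []  # non-blank fragments of the row being assembled
    for line in lines:
        if line.strip():
            parts.append(line.strip())
        count = sum(p.count(sep) for p in parts)
        if parts and count == expected:
            # Join the fragments, then trim whitespace around each field
            fields = [f.strip() for f in joiner.join(parts).split(sep)]
            rows.append(sep.join(fields))
            parts = []
        elif count > expected:
            raise ValueError("too many separators in row: %r" % parts)
    if parts:
        raise ValueError("incomplete trailing row: %r" % parts)
    return rows


sample = "col1,col2,col3\ntext,2,3\nmore text,3,4\nHELLO\n\nTHIS IS FUN\n,3,4\n"
rows = merge_broken_rows(sample.splitlines())
print("\n".join(rows))
```

The repaired rows can then be handed to pandas through an in-memory buffer, for example `pd.read_csv(io.StringIO("\n".join(rows)))`.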

Solution 1
1 2017-05-04 09:05:13
