I have a csv
that I am not able to read using read_csv
Opening the csv
with sublime text shows something like:
col1,col2,col3
text,2,3
more text,3,4
HELLO
THIS IS FUN
,3,4
As you can see, the text HELLO THIS IS FUN
takes three lines, and pd.read_csv
is confused as it thinks these are three new observations. How can I parse that correctly in Pandas?
Thanks!
It looks like you'll have to preprocess the data manually:
with open('data.csv','r') as f:
lines = f.read().splitlines()
processed = []
cum_c = 0
buffer = ''
for line in lines:
buffer += line # Append the current line to a buffer
c = buffer.count(',')
if cum_c == 2:
processed.append(line)
buffer = ''
elif cum_c > 2:
raise # This should never happen
This assumes that your data only contains unwanted newlines, eg if you had data with say, 3 elements in one row, 2 elements in the next, then the next row should either be blank or contain only 1 element. If it has 2 or more, ie it's missing a necessary newline, then an error is thrown. You can accommodate this case if necessary with a minor modification.
Actually, it might be more efficient to remove newlines instead, but it shouldn't matter unless you have a lot of data.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.