I have already tried Googling and searching Stack Overflow, but most similar answers involve some other character acting as the separator (such as a comma or |). In the CSV I have, a line of data looks like this:
"2017-02-27 ""2017-02-25"" ""15438"" ""2017-02-27",19,"671"" ""1"" ""14"" ""John Smith"" ""614""
Each value is meant to be its own column (so the line above has 8 columns). Another problem is that the value 2017-02-27",19,"671 belongs in a single column, even though it contains lone double-quote characters and commas.
So it seems like the delimiter is this: "" ""
How can I read this in properly?
Also, as a side question: the headers are listed in the first row of the CSV file, but they are separated by plain spaces (the headers themselves use underscores, such as: name_1 name_2 name_3). Is there a way to read them in with read_csv, or is it easier to just copy that row and paste it into the names parameter as a list?
Thanks!
Edit: I already tried sep='"" ""' which returns everything as one column. Here is everything I tried (taken from other stackoverflow threads):
sep='"" ""'
sep=',\s+',quoting=csv.QUOTE_ALL
sep=" ", quotechar="~"
sep='["]* ["]*', engine='python'
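If I take your data exactly as posted, place it in a CSV file, and run this:
import pandas as pd
# split on each whitespace character, then strip the leftover double quotes
df = pd.read_csv('test.csv', header=None, sep=r'\s', engine='python').replace('"', '', regex=True)
df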
I get
            0           1      2                  3  4   5     6      7    8
0  2017-02-27  2017-02-25  15438  2017-02-27,19,671  1  14  John  Smith  614
Then split the column in question:
df[['n1', 'n2', 'n3']] = df.loc[:, 3].str.split(',', expand=True)
            0           1      2                  3  4   5     6      7    8          n1  n2   n3
0  2017-02-27  2017-02-25  15438  2017-02-27,19,671  1  14  John  Smith  614  2017-02-27  19  671
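One caveat: splitting on every whitespace character also breaks John Smith across columns 6 and 7, so the row ends up with 9 columns instead of the 8 asked for. A possible alternative, offered only as a sketch, is to split each line yourself on the doubled-quote delimiter. The pattern below assumes every field boundary looks like either `"" ""` or ` ""` (as in the sample line) and that the whole line is wrapped in quotes. `split_record` and `DELIMITER` are names made up for this illustration, not part of pandas.

```python
import re

# Assumption: a field boundary is whitespace followed by a doubled quote,
# optionally preceded by another doubled quote, i.e. `"" ""` or ` ""`.
DELIMITER = re.compile(r'(?:"")?\s+""')

def split_record(line):
    """Split one raw line into fields, keeping inner spaces, commas and lone quotes."""
    body = line.strip().strip('"')  # drop the quotes that wrap the whole line
    return DELIMITER.split(body)

sample = '"2017-02-27 ""2017-02-25"" ""15438"" ""2017-02-27",19,"671"" ""1"" ""14"" ""John Smith"" ""614""'
fields = split_record(sample)

print(len(fields))  # 8
print(fields[3])    # 2017-02-27",19,"671
print(fields[6])    # John Smith
```

Applying `split_record` to every data line gives a list of rows that can be handed to `pd.DataFrame(rows)`. The lone quotes inside the fourth field are kept here, so remove them afterwards if you do not want them.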
If this isn't the result you're looking for, please comment.
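For the side question about the header row, one option is to read the first line yourself and split it on whitespace, rather than pasting the names in by hand. This is a minimal sketch: `read_header` is a hypothetical helper, and the `io.StringIO` object stands in for the real file, with a placeholder data line.

```python
import io

def read_header(handle):
    """Consume the first line of an open file and return its whitespace-separated names."""
    return handle.readline().split()

# Stand-in for the real file: the header names follow the pattern in the question,
# and the data line is a hypothetical placeholder.
sample = io.StringIO('name_1 name_2 name_3\n"a ""b"" ""c""\n')

names = read_header(sample)
remaining = sample.read()  # the handle is now positioned at the first data row

print(names)  # ['name_1', 'name_2', 'name_3']
```

The resulting list can then be passed to read_csv through `names=names` together with `skiprows=1`. The number of names should match the number of columns the chosen separator actually produces, which is nine rather than eight for the sample line when splitting on every whitespace character.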