I have a csv database that looks like this:
Date,String
2010-12-31,'This, is, an example string'
2011-12-31,"This is an, example string"
2012-12-31,This is an example, string
I am trying to use pandas, because I believe it is one of the most widely used libraries for this kind of task. Is there a way to create a DataFrame that splits only on the first comma using the read_csv function (regardless of whether the string after it is wrapped in "", '', or nothing)?
If not, what's the most efficient alternative to do so?
Thanks so much in advance for any help.
You can cheat by passing a regex as the sep argument of read_csv. The regex I used is ^([^,]+), which matches everything from the start of the line up to and including the first comma, capturing the text before it. I also set engine='python' to avoid a pandas warning (the default C engine does not support a regex sep) and passed usecols to keep only the columns we want: without it we also get an extra "Unnamed" column, apparently because the pattern matches at the very start of each line, so the split leaves an empty field in front of the captured date.
You can get more information about each argument in the read_csv docs.
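A minimal sketch of where that extra column seems to come from, assuming the python engine splits each line with the sep pattern much like re.split does (that is an assumption about pandas internals, not something stated above). Only the standard library is used here:

```python
import re

# Same pattern as the sep argument: capture everything before the first comma.
pattern = re.compile(r'^([^,]+),')

line = '2012-12-31,This is an example, string'

# re.split keeps captured groups in the result, and because the match starts
# at position 0 the text "before" the match is an empty string.
fields = pattern.split(line)
print(fields)  # ['', '2012-12-31', 'This is an example, string']
```

The leading empty string is the field that would end up as the unnamed column, which is why usecols is needed to drop it.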
test.csv
Date,String
2010-12-31,'This, is, an example string'
2011-12-31,"This is an, example string"
2012-12-31,This is an example, string
Then
import pandas as pd
print(pd.read_csv('test.csv', sep=r'^([^,]+),', engine='python', usecols=['Date', 'String']))
Outputs
         Date                         String
0  2010-12-31  'This, is, an example string'
1  2011-12-31   "This is an, example string"
2  2012-12-31     This is an example, string
This will not work if the CSV file has more than two "actual" columns.
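If the file does have more leading columns, one possible alternative is to skip the regex and split each line yourself with a maximum split count. This is only a sketch: split_leading_columns and its n_columns parameter are hypothetical names, not part of pandas, and it assumes only the last column may contain extra commas and that quotes should be left untouched.

```python
import io

def split_leading_columns(lines, n_columns=2):
    """Split each line on its first n_columns - 1 commas only.

    Hypothetical helper (not a pandas function). Blank lines are skipped
    and quote characters are kept as part of the text.
    """
    rows = [line.rstrip('\r\n').split(',', n_columns - 1)
            for line in lines if line.strip()]
    return rows[0], rows[1:]  # header row, data rows

# In-memory stand-in for test.csv so the example is self-contained.
sample = io.StringIO(
    "Date,String\n"
    "2010-12-31,'This, is, an example string'\n"
    "2011-12-31,\"This is an, example string\"\n"
    "2012-12-31,This is an example, string\n"
)

header, rows = split_leading_columns(sample)
print(header)   # ['Date', 'String']
print(rows[2])  # ['2012-12-31', 'This is an example, string']
```

With a real file you would pass the object returned by open('test.csv') instead of sample, and pd.DataFrame(rows, columns=header) should then give the same two-column frame as above.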