Pandas read_csv only first comma

I have a csv database that looks like this:

Date,String
2010-12-31,'This, is, an example string'
2011-12-31,"This is an, example string"
2012-12-31,This is an example, string

I am trying to use pandas because I believe it is one of the most widely used libraries for working with this kind of data. Is there a way to create a DataFrame with the read_csv function that splits each line only at the first comma (regardless of whether the string after it is wrapped in "", in '', or in nothing at all)?

If not, what's the most efficient alternative to do so?

Thanks so much in advance for any help,

You can cheat by passing a regex as the sep argument of read_csv. The regex I used is ^([^,]+), which matches everything up to and including the first comma on a line, so each line is split only once. I also set engine='python' to avoid a pandas warning (the default C engine does not support a regex sep) and passed usecols to make sure we only get the columns we want. Without usecols we also get an "Unnamed" column; this is most likely because the pattern contains a capture group, so the split yields an empty field before the match, then the captured date, then the rest of the line.

You can get more information about each argument in the read_csv docs.
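To see what the separator regex does to a single line, here is a minimal sketch using Python's re module directly. It assumes the python engine splits each line in roughly this way, which is consistent with the extra "Unnamed" column but is not something the pandas docs spell out. The variable names are only for illustration.

```python
import re

# The same pattern that is passed as sep to read_csv.
pattern = re.compile(r'^([^,]+),')

line = "2010-12-31,'This, is, an example string'"

# The pattern is anchored with ^, so it matches at most once per line.
# Because it contains a capture group, re.split keeps the captured text:
# empty prefix, captured date, then the remainder of the line.
fields = pattern.split(line)
print(fields)
# ['', '2010-12-31', "'This, is, an example string'"]
```

The empty first element is what shows up as the extra column when usecols is left out.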

test.csv

Date,String
2010-12-31,'This, is, an example string'
2011-12-31,"This is an, example string"
2012-12-31,This is an example, string

Then

import pandas as pd
print(pd.read_csv('test.csv', sep='^([^,]+),', engine='python', usecols=['Date', 'String']))

Outputs

         Date                         String
0  2010-12-31  'This, is, an example string'
1  2011-12-31   "This is an, example string"
2  2012-12-31     This is an example, string

Note that this will not work if the CSV file has more than two "actual" columns.
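If the regex trick feels too fragile, one possible alternative is to skip read_csv and split each line yourself. This is only a sketch, not a benchmarked "most efficient" answer: it assumes the file is small enough to read line by line in Python, and it uses an inline copy of test.csv (the raw variable) so it can run on its own.

```python
import io

import pandas as pd

# Inline copy of test.csv so the sketch is self-contained;
# in practice this would be open('test.csv').
raw = io.StringIO(
    "Date,String\n"
    "2010-12-31,'This, is, an example string'\n"
    "2011-12-31,\"This is an, example string\"\n"
    "2012-12-31,This is an example, string\n"
)

# Split every non-empty line at the first comma only (maxsplit=1).
rows = [line.rstrip("\n").split(",", 1) for line in raw if line.strip()]

# The first row is the header, the rest is data.
header, data = rows[0], rows[1:]
df = pd.DataFrame(data, columns=header)
print(df)
```

Raising the maxsplit value would keep more leading columns intact, as long as only the last column can contain extra commas.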
