
Split values in a column by delimiter and assign value to multiple columns in Pandas dataframe

My data frame has the following columns:

  • location_name
  • city
  • state
  • country

I would like to split the values in the location_name column and save them into the individual city, state and country columns.

The values in the location_name column look like this:

location_name 
111 Washington Ave, Ellenville, NY 12428, United States
Tamil Nadu, India
Lynchburg, VA, United States
Peachtree Street, Atlanta, GA, United States
Nigeria

As you can see, not all of them are complete addresses containing a street address, city, state and country. The last value is always a country name and is always present. Everything else (state, city and street address) is optional, so the number of elements varies.

df[['city','state', 'country']] = df['location_name'].str.split(',', expand=True)

But the above method does not account for missing state, city and street address values, so the values end up in the wrong columns. My final output dataframe should look like this:

[image: expected output dataframe]

How would I do it?

You can't do this from the strings alone, because nothing distinguishes a state from a city string-wise. There is no algorithmically discernible difference between the city 'New York' and the state 'Tamil Nadu': both have two words, and both words start with an uppercase character.

There are also no standard string characteristics within the desired columns. 'VA' is all uppercase, which could be characteristic of a state, but then again we have 'Tamil Nadu', which violates this uppercase assumption.

The only way I can see to do this is with a dictionary of all cities and/or states, so that you can look up each value and decide which column it belongs in.
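A minimal sketch of that lookup idea, assuming you can supply the lookup tables yourself. The names KNOWN_STATES, KNOWN_CITIES and parse_location are hypothetical, and the tiny sets below cover only the sample rows from the question. The country is taken as the last comma-separated element, as the question guarantees; anything not found in either table (such as a street address) is ignored.

```python
import re

import pandas as pd

# Hypothetical lookup tables: real ones would need every state and city in the data.
KNOWN_STATES = {"NY", "VA", "GA", "Tamil Nadu"}
KNOWN_CITIES = {"Ellenville", "Lynchburg", "Atlanta"}


def parse_location(location):
    """Return (city, state, country) for one location_name string."""
    parts = [p.strip() for p in location.split(",")]
    country = parts[-1]  # the question says the last element is always the country
    city = state = None
    for part in parts[:-1]:
        # Drop a trailing postal code, e.g. "NY 12428" -> "NY" (assumed format).
        candidate = re.sub(r"\s+\d[\d-]*$", "", part)
        if candidate in KNOWN_STATES:
            state = candidate
        elif candidate in KNOWN_CITIES:
            city = candidate
        # Anything else (e.g. a street address) is left out.
    return city, state, country


df = pd.DataFrame({
    "location_name": [
        "111 Washington Ave, Ellenville, NY 12428, United States",
        "Tamil Nadu, India",
        "Lynchburg, VA, United States",
        "Peachtree Street, Atlanta, GA, United States",
        "Nigeria",
    ]
})

parsed = [parse_location(s) for s in df["location_name"]]
df["city"], df["state"], df["country"] = zip(*parsed)
print(df[["city", "state", "country"]])
```

Rows with no matching city or state keep a missing value in that column. A name that appears in both tables (a city and a state sharing a name) would be treated as a state here, so real data may need a tie-breaking rule.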

