Split values in a column by delimiter and assign value to multiple columns in Pandas dataframe

My data frame has the following columns:

  • location_name
  • city
  • state
  • country

I would like to split the values in the location_name column and save them into the individual city, state, and country columns.

The values in the location_name column look like this:

location_name 
111 Washington Ave, Ellenville, NY 12428, United States
Tamil Nadu, India
Lynchburg, VA, United States
Peachtree Street, Atlanta, GA, United States
Nigeria

As you can see, not all of them are complete addresses containing street address, city, state and country. The last value will always be a country name and will always be available. Everything else (state, city and street address) is optional, so the number of elements can change.

df[['city','state', 'country']] = df['location_name'].str.split(',', expand=True)

But the above method does not account for missing state, city and street address values, so it does not put the right values in each column. My final output dataframe should look like this:

[image: expected output dataframe]

How would I do it?

You can't do this, because there is nothing that distinguishes a state from a city string-wise. There is no algorithmically discernible difference between the city 'New York' and the state 'Tamil Nadu'. Both have two words, and both words start with an uppercase character.
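Position does not help either. As a minimal sketch using the sample values from the question (the variable names `parts`, `n_parts` and `first_col` are only illustrative), splitting on commas gives a different number of parts per row, so the same column position holds a street, a city, a state or a country depending on the row:

```python
import pandas as pd

# Sample values copied from the question.
df = pd.DataFrame({"location_name": [
    "111 Washington Ave, Ellenville, NY 12428, United States",
    "Tamil Nadu, India",
    "Lynchburg, VA, United States",
    "Peachtree Street, Atlanta, GA, United States",
    "Nigeria",
]})

# expand=True pads shorter rows with missing values on the right,
# so the frame is as wide as the longest address (4 parts here).
parts = df["location_name"].str.split(",", expand=True)

# Number of real (non-missing) parts in each row.
n_parts = parts.notna().sum(axis=1)

# The first column mixes street, state, city and country values.
first_col = parts[0].str.strip()
print(n_parts.tolist())
print(first_col.tolist())
```

Because the split produces four columns for this sample, it also cannot be assigned directly to the three columns city, state and country.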

There are also no standard string characteristics within the desired columns. 'VA' is all uppercase, which could be characteristic of a state, but then again we have 'Tamil Nadu', which violates this uppercase assumption.
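To make the uppercase point concrete, here is a small sketch of such a heuristic. The pattern `STATE_LIKE` and the helper `state_like_tokens` are hypothetical names introduced only for this illustration; the rule assumed is "two uppercase letters, optionally followed by a number such as a ZIP code":

```python
import re

# Sample values copied from the question.
locations = [
    "111 Washington Ave, Ellenville, NY 12428, United States",
    "Tamil Nadu, India",
    "Lynchburg, VA, United States",
    "Peachtree Street, Atlanta, GA, United States",
    "Nigeria",
]

# Assumed heuristic: two uppercase letters, optionally followed by digits.
STATE_LIKE = re.compile(r"^[A-Z]{2}(?:\s+\d+)?$")

def state_like_tokens(location_name):
    """Return the tokens (excluding the trailing country) that look like a state code."""
    tokens = [t.strip() for t in location_name.split(",")]
    return [t for t in tokens[:-1] if STATE_LIKE.match(t)]

guesses = {loc: state_like_tokens(loc) for loc in locations}
for loc, found in guesses.items():
    print(loc, "->", found)
```

The rule finds the US state codes but returns nothing for 'Tamil Nadu, India', so the state in that row is missed.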

The only way I can see you doing this is if you have a dictionary with all cities and/or states. Then you can look up each value.
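A rough sketch of that lookup idea follows. The sets `KNOWN_STATES` and `KNOWN_CITIES` and the function `parse_location` are hypothetical and cover only the sample rows; a real solution would need complete reference lists. The only rule taken from the question is that the last element is always the country:

```python
import pandas as pd

# Hypothetical, deliberately tiny lookup tables (assumption: real data
# would need complete lists of states and cities).
KNOWN_STATES = {"NY", "VA", "GA", "Tamil Nadu"}
KNOWN_CITIES = {"Ellenville", "Lynchburg", "Atlanta"}

def parse_location(location_name):
    tokens = [t.strip() for t in location_name.split(",")]
    # The last element is always the country, per the question.
    result = {"city": None, "state": None, "country": tokens[-1]}
    for tok in tokens[:-1]:
        # Drop a trailing number so that "NY 12428" can match "NY".
        head, _, tail = tok.rpartition(" ")
        candidate = head if head and tail.isdigit() else tok
        if candidate in KNOWN_STATES:
            result["state"] = candidate
        elif candidate in KNOWN_CITIES:
            result["city"] = candidate
        # Anything else (for example a street address) is ignored.
    return result

df = pd.DataFrame({"location_name": [
    "111 Washington Ave, Ellenville, NY 12428, United States",
    "Tamil Nadu, India",
    "Lynchburg, VA, United States",
    "Peachtree Street, Atlanta, GA, United States",
    "Nigeria",
]})

# Build the three columns from the parsed dictionaries and attach them.
parsed = pd.DataFrame([parse_location(s) for s in df["location_name"]], index=df.index)
df = df.join(parsed)
print(df[["city", "state", "country"]])
```

Values that appear in neither table are left empty rather than guessed, and a name that is both a city and a state (such as 'New York') would still need an explicit rule.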
