How to split the 'location' column of a DataFrame into separate columns?
I am working on a dataset with a column named 'location'. Its values look like this:
import pandas as pd
df = pd.DataFrame(data={"location": ["düsseldorf, nordrhein-westfalen, germany",
                                     "durbanville , cape town, cape town , south africa"]})
I want to split this column into ['city', 'state', 'country']. Note that the second row contains a duplicate: "cape town" appears twice.
I have tried the following, but it does not handle the duplicate:
location = df.location.str.split(', ', n=2, expand=True)
location.columns = ['city', 'state', 'country']
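To see why this attempt fails, the per-row split can be reproduced with Python's built-in str.split. This is a minimal sketch that assumes Series.str.split(', ', n=2) behaves like str.split(', ', 2) on each string, which should hold for a plain separator such as ', '.

```python
locations = ["düsseldorf, nordrhein-westfalen, germany",
             "durbanville , cape town, cape town , south africa"]

# Split each string at most twice on ", ", mirroring n=2 in the attempt
# (assumption: pandas applies the same per-string split for this separator).
rows = [loc.split(", ", 2) for loc in locations]
for row in rows:
    print(row)
```

The second row comes out as ['durbanville ', 'cape town', 'cape town , south africa']: the repeated "cape town" fills the state slot, everything after the second separator (including the real country) is lumped into the last field, and the space before the first comma survives as trailing whitespace.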
You can use the unique_everseen recipe from the itertools docs, which is also available in third-party libraries such as toolz.unique.
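For reference, here is a sketch of the unique_everseen recipe, adapted from the itertools documentation. It yields each element the first time it is seen and preserves the original order; the sample string is the second row of the question's data.

```python
from itertools import filterfalse


def unique_everseen(iterable, key=None):
    """Yield unique elements in order of first appearance.

    Adapted from the recipe in the itertools documentation.
    """
    seen = set()
    if key is None:
        # filterfalse lazily skips anything already recorded in `seen`
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        # Compare on key(element) but yield the original element
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element


location = "durbanville , cape town, cape town , south africa"
parts = [p.strip() for p in location.split(",")]
print(list(unique_everseen(parts)))
```

With this defined locally, unique_everseen can stand in for toolz.unique if you prefer not to install toolz.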
The logic can be incorporated into a list comprehension that iterates over df['location']. This is likely to be more efficient than Pandas string-based methods, which do not offer vectorised functionality.
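import pandas as pd
from toolz import unique

# Split each location on commas, strip spaces, and keep only the first
# occurrence of each part before building the DataFrame
res = pd.DataFrame([list(unique(map(str.strip, i.split(',')))) for i in df['location']])
res.columns = ['city', 'state', 'country']
print(res)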
city state country
0 düsseldorf nordrhein-westfalen germany
1 durbanville cape town south africa
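If you would rather avoid the toolz dependency, dict.fromkeys can play the same role: since Python 3.7, dicts preserve insertion order, so the keys are the unique items in first-seen order. A minimal sketch of the same comprehension with that substitution, assuming every row ends up with exactly three distinct parts:

```python
import pandas as pd

df = pd.DataFrame(data={"location": ["düsseldorf, nordrhein-westfalen, germany",
                                     "durbanville , cape town, cape town , south africa"]})

# Split on commas, strip surrounding spaces, then drop repeats while keeping
# order: dict keys are unique and (Python 3.7+) keep insertion order.
rows = [list(dict.fromkeys(map(str.strip, loc.split(","))))
        for loc in df["location"]]
res = pd.DataFrame(rows, columns=["city", "state", "country"], index=df.index)
print(res)
```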
You can also solve this problem using only pandas, without third-party libraries:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
# Flat list of the location parts, with the duplicate removed by hand
data_all = ['düsseldorf', 'nordrhein-westfalen', 'germany', 'durbanville', 'cape town', 'south africa']
dfe = [[], [], []]
i = 0
j = 1
k = 2
while i < len(data_all):
dfe[0].append(data_all[i])
i += 3
while j < len(data_all):
dfe[1].append(data_all[j])
j += 3
while k < len(data_all):
dfe[2].append(data_all[k])
k += 3
d = {'city': dfe[0], 'state': dfe[1], 'country': dfe[2]}
df = pd.DataFrame(data=d)
print(df)
Result:
city state country
0 düsseldorf nordrhein-westfalen germany
1 durbanville cape town south africa
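The three while loops each collect every third element of data_all, starting at offsets 0, 1 and 2. Python's extended slicing does the same in one step. A minimal sketch, using the same hand-deduplicated data_all as in the answer:

```python
data_all = ['düsseldorf', 'nordrhein-westfalen', 'germany',
            'durbanville', 'cape town', 'south africa']

# data_all[start::3] takes every third item beginning at index `start`,
# which is what each while loop builds up one append at a time.
d = {'city': data_all[0::3],
     'state': data_all[1::3],
     'country': data_all[2::3]}
print(d)
```

Passing d to pd.DataFrame(data=d) should then give the same frame as the loop version.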
That said, I don't understand why you would want to keep the duplicates if you only have three columns: city, state and country.