Remove string from one column if present in string of another column pandas
I think I'm close, but I'm looking for something like this, where the new column contains the company name without the city in it:
                           company  postal_code  name  state         city  \
2000-01-01          abc gresham co        97080  john     mi      gresham
2000-01-01             startup llc        97080  jeff     hi     portland
2001-01-01  beaverton business biz        99999   sam     ca    beaverton
2002-01-01                 andy co        92222  joey     or  los angeles

                 new_col
2000-01-01        abc co
2000-01-01   startup llc
2001-01-01  business biz
2002-01-01       andy co
This is what I have so far, but it throws a TypeError: unhashable type: 'Series':
for idx in df1.index:
    if df1["city"].loc[idx] in df1['company'].loc[idx]:
        print("figure out how to print to new column the company name without the city included")
    else:
        print(df1['company'].loc[idx])
Thanks!
Here is one solution:
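df = (
    df.reset_index()  # move the (duplicated) dates into an 'index' column
    .assign(new_col=df.reset_index()
        # split each company name into a list of words
        .pipe(lambda x: x.assign(x=x['company'].str.split(' ')))
        .explode('x')  # one row per word
        .loc[lambda x: x['x'] != x['city'], 'x']  # drop words equal to the city
        .groupby(level=0)
        .agg(list)
        .str.join(' ')  # glue the remaining words back together
    )
    .set_index('index')
)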
Output:
>>> df
                           company  postal_code  name  state         city       new_col
index
2000-01-01          abc gresham co        97080  john     mi      gresham        abc co
2000-01-01             startup llc        97080  jeff     hi     portland   startup llc
2001-01-01  beaverton business biz        99999   sam     ca    beaverton  business biz
2002-01-01                 andy co        92222  joey     or  los angeles       andy co
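As for the TypeError in the question's loop, it most likely comes from the repeated index label 2000-01-01: when a label occurs more than once, .loc[idx] returns a Series instead of a single string, and the `in` test then tries to hash that Series. This is also why the solution above calls reset_index() first. A small sketch on a cut-down copy of the sample data (the name `demo` and the string index are only for illustration):

```python
import pandas as pd

# Cut-down copy of the sample data; note the repeated "2000-01-01" label.
demo = pd.DataFrame(
    {
        "company": ["abc gresham co", "startup llc", "beaverton business biz"],
        "city": ["gresham", "portland", "beaverton"],
    },
    index=["2000-01-01", "2000-01-01", "2001-01-01"],
)

dup = demo["city"].loc["2000-01-01"]     # label occurs twice -> Series
single = demo["city"].loc["2001-01-01"]  # label occurs once  -> plain str
print(type(dup).__name__, type(single).__name__)

try:
    # Same test as in the question's loop, for the duplicated label.
    dup in demo["company"].loc["2000-01-01"]
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

Iterating over positions (for example after reset_index()) or over the column values directly avoids the ambiguous label lookup.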
One-liner:
df = df.reset_index().assign(new_col=df.reset_index().pipe(lambda x: x.assign(x=x['company'].str.split(' '))).explode('x').loc[lambda x: x['x'] != x['city'], 'x'].groupby(level=0).agg(list).str.join(' ')).set_index('index')
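Note that the word-by-word comparison above only removes single-word cities; a city such as "los angeles" would never match one split word. A possible alternative, not the method used in the answer, is to walk the two columns in parallel with zip and remove the city as a whole phrase. The sketch below rebuilds only the two relevant columns of the sample data, and `strip_city` is a hypothetical helper name:

```python
import re

import pandas as pd

# Only the two columns needed for the example, with the same duplicated dates.
df = pd.DataFrame(
    {
        "company": ["abc gresham co", "startup llc", "beaverton business biz", "andy co"],
        "city": ["gresham", "portland", "beaverton", "los angeles"],
    },
    index=pd.to_datetime(["2000-01-01", "2000-01-01", "2001-01-01", "2002-01-01"]),
)


def strip_city(company: str, city: str) -> str:
    """Remove `city` from `company` where it appears as a whole word or phrase."""
    pattern = r"\b" + re.escape(city) + r"\b"
    without_city = re.sub(pattern, " ", company)
    # Collapse the whitespace left behind by the removal.
    return " ".join(without_city.split())


# zip pairs the values by position, so the duplicated index labels do not matter.
df["new_col"] = [strip_city(c, t) for c, t in zip(df["company"], df["city"])]
print(df)
```

Because the values are paired by position, no reset_index()/set_index() round trip is needed, and the original index is left untouched.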