Removing characters from a dataframe column
Below is the code I use to extract matching values from a "category list" and the dataset.
matches= token.apply(lambda x: pd.Series(x).str.extractall("|".join(["({})".format(cat) for cat in Categories.HealthCare])))
match_list= [[m for m in match.values.ravel() if isinstance(m, str)] for match in matches]
match_df = pd.DataFrame({"Hc1":match_list})
def match_health(row):
    categories = []
    for bigram in row.bigram:
        joined = ' '.join(bigram)
        if joined in HealthCare:
            categories.append(joined)
    for trigram in row.trigram:
        joined = ' '.join(trigram)
        if joined in HealthCare:
            categories.append(joined)
    return categories
match_df['Hc2'] = df.apply(match_health, axis=1)
match_df['HealthCare'] = match_df[match_df.columns[[0,1]]].apply(lambda x: ','.join(x.dropna().astype(str)),axis=1)
The result is as follows:
Hc1 Hc2 HealthCare
0 [] [] [],[]
1 [Sauna, Jacuzzi] [Health Club, Steam Room] ['Sauna', 'Jacuzzi'],['Health Club', 'Steam Ro...
2 [Sauna, Jacuzzi] [Health Club, Steam Room] ['Sauna', 'Jacuzzi'],['Health Club', 'Steam Ro...
3 [Sauna, Jacuzzi] [Health Club, Steam Room] ['Sauna', 'Jacuzzi'],['Health Club', 'Steam Ro...
type(match_df)
pandas.core.frame.DataFrame
But my output should not contain the '[]' square brackets or the single quotes around the strings, like this:
Hc1 Hc2 HealthCare
0
1 Sauna, Jacuzzi Health Club, Steam Room Sauna,Jacuzzi,Health Club,Steam Ro...
2 Sauna, Jacuzzi Health Club, Steam Room Sauna,Jacuzzi,Health Club,Steam Ro...
3 Sauna, Jacuzzi Health Club, Steam Room Sauna,Jacuzzi,Health Club,Steam Ro...
Any help is appreciated.
You can call .str.replace:
match_df['HealthCare'] = match_df['HealthCare']\
    .astype(str).str.replace(r"[\[\]']", '', regex=True)
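To see what the regex approach does in isolation, here is a minimal sketch. The Series `s` is a hypothetical stand-in for the `HealthCare` column, filled with strings shaped like the ones in the question's output. `regex=True` is passed explicitly because pandas 2.0 changed the default of `str.replace` to literal matching.

```python
import pandas as pd

# Hypothetical stand-in for match_df['HealthCare'] (values modelled on the question)
s = pd.Series([
    "[],[]",
    "['Sauna', 'Jacuzzi'],['Health Club', 'Steam Room']",
])

# Character class matching '[', ']' and the single quote; each match is removed
cleaned = s.str.replace(r"[\[\]']", "", regex=True)

print(cleaned.tolist())
```

Note that the row built from two empty lists is left with a lone comma, and the space after each comma inside the original lists is kept, so the result may still need a little tidying.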
match_df['HealthCare'] = match_df['HealthCare'].map(lambda x: x.replace('[','').replace(']','').replace("'",''))
This replaces all square brackets and single quotes.
Output:
HealthCare
0
1 Sauna,Jacuzzi,Health Club,Steam Ro...
2 Sauna,Jacuzzi,Health Club,Steam Ro...
3 Sauna,Jacuzzi,Health Club,Steam Ro...
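A different route, not taken from either answer, is to avoid creating the brackets and quotes in the first place by joining the list elements directly instead of stringifying the lists. The sketch below assumes `Hc1` and `Hc2` hold plain Python lists of strings, as the question's output suggests; the small frame is a hypothetical stand-in for `match_df`.

```python
import pandas as pd

# Hypothetical stand-in for match_df, with list-valued columns as in the question
match_df = pd.DataFrame({
    "Hc1": [[], ["Sauna", "Jacuzzi"]],
    "Hc2": [[], ["Health Club", "Steam Room"]],
})

# Concatenate the two lists per row, then join the items into one string
match_df["HealthCare"] = [
    ",".join(a + b) for a, b in zip(match_df["Hc1"], match_df["Hc2"])
]

# Turn the list columns themselves into plain comma-separated strings
for col in ["Hc1", "Hc2"]:
    match_df[col] = match_df[col].map(", ".join)

print(match_df)
```

Because the join works on the list items rather than on their string representation, empty lists become empty strings and no clean-up pass with `replace` is needed.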