Replace string with substring in DataFrame Column
I'm trying to match a column in a DataFrame to one of a list of substrings.
For example, take the column (strings) with the following values:
text1C1
text2A
text2
text4
text4B
text4A3
And create a new column that matches them to the following substrings:
vals = ['text1', 'text2', 'text3', 'text4', 'text4B']
The code I have at the moment works, but it seems like a really inefficient way of solving the problem.
import pandas as pd

df = pd.DataFrame({'strings': ['text1C1', 'text2A', 'text2', 'text4', 'text4B', 'text4A3']})
for v in vals:
    df.loc[df[df['strings'].str.contains(v)].index, 'matched strings'] = v
This returns the following DataFrame, which is what I need.
   strings matched strings
0  text1C1           text1
1   text2A           text2
2    text2           text2
3    text4           text4
4   text4B          text4B
5  text4A3           text4
Is there a more efficient way of doing this, especially for larger DataFrames (10k+ rows)?
I can't think of how to deal with one of the items of vals also being a substring of another (text4 is a substring of text4B).
Use a generator with next to return the first matching value. The list is reversed so that text4B is tested before text4:
s = vals[::-1]
df['matched strings1'] = df['strings'].apply(lambda x: next(y for y in s if y in x))
print(df)
   strings matched strings matched strings1
0  text1C1           text1            text1
1   text2A           text2            text2
2    text2           text2            text2
3    text4           text4            text4
4   text4B          text4B           text4B
5  text4A3           text4            text4
A more general solution, for when some values may have no match, uses iter and the default parameter of next:
f = lambda x: next(iter(y for y in s if y in x), 'no match')
df['matched strings1'] = df['strings'].apply(f)
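As a possible alternative to apply, here is a minimal sketch using a single regular expression with Series.str.extract. This is a different technique from the generator approach above, not part of the original answer. It assumes that longer candidates should win over their own prefixes, so the candidates are sorted by length; the column name 'matched regex' is made up for illustration. Note that a regex returns the leftmost match in each string, which gives the same result as the loop for this data but may differ when several candidates occur at different positions.

```python
import re
import pandas as pd

vals = ['text1', 'text2', 'text3', 'text4', 'text4B']
df = pd.DataFrame({'strings': ['text1C1', 'text2A', 'text2', 'text4', 'text4B', 'text4A3']})

# Longest candidates first, so 'text4B' is tried before its prefix 'text4'.
ordered = sorted(vals, key=len, reverse=True)

# re.escape keeps any special characters in the candidates literal.
pattern = '(' + '|'.join(re.escape(v) for v in ordered) + ')'

# 'matched regex' is a hypothetical column name; rows with no match get a placeholder.
df['matched regex'] = (
    df['strings'].str.extract(pattern, expand=False).fillna('no match')
)
print(df)
```

Whether this is actually faster than apply on 10k+ rows depends on the data and the number of candidates, so it is worth timing both on a realistic sample.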
Your solution can also be improved by passing the boolean mask directly to loc and turning off regex matching:
for v in vals:
    df.loc[df['strings'].str.contains(v, regex=False), 'matched strings'] = v
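One likely reason for regex=False, beyond skipping regex compilation, is that str.contains treats its pattern as a regular expression by default, so characters such as '.' act as wildcards. The small sketch below uses made-up sample strings to show the difference.

```python
import pandas as pd

# Hypothetical sample data: only the first value contains the literal text 'a.c'.
s = pd.Series(['a.c', 'abc'])

# Default: the pattern is a regex, so '.' matches any character.
regex_hits = s.str.contains('a.c').tolist()

# regex=False: the pattern is matched as a plain substring.
literal_hits = s.str.contains('a.c', regex=False).tolist()

print(regex_hits)    # [True, True]
print(literal_hits)  # [True, False]
```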