Replace string with substring in DataFrame Column

I'm trying to match each value in a DataFrame column to one of a list of substrings.

For example, take the column (strings) with the following values:

text1C1
text2A
text2
text4
text4B
text4A3

I want to create a new column that maps each value to one of the following substrings:

vals = ['text1', 'text2', 'text3', 'text4', 'text4B']

The code I have at the moment works, but it seems like a really inefficient way of solving the problem.

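import pandas as pd

df = pd.DataFrame({'strings': ['text1C1', 'text2A', 'text2', 'text4', 'text4B', 'text4A3']})

# Later entries in vals overwrite earlier matches, so 'text4B' wins over 'text4'
for v in vals:
    df.loc[df[df['strings'].str.contains(v)].index, 'matched strings'] = v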

This returns the following DataFrame, which is what I need.

   strings    matched strings
0  text1C1              text1
1   text2A              text2
2    text2              text2
3    text4              text4
4   text4B             text4B
5  text4A3              text4

Is there a more efficient way of doing this, especially for larger DataFrames (10k+ rows)?

I can't think of how to deal with one of the items of vals also being a substring of another (text4 is a substring of text4B).

Use a generator expression with next to return the first matching value. The list is reversed so that text4B is tested before text4:

s = vals[::-1]
df['matched strings1'] = df['strings'].apply(lambda x: next(y for y in s if y in x))
print(df)
   strings matched strings matched strings1
0  text1C1           text1            text1
1   text2A           text2            text2
2    text2           text2            text2
3    text4           text4            text4
4   text4B          text4B           text4B
5  text4A3           text4            text4
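
The reversal above works because text4B happens to come after text4 in vals. A minimal sketch, assuming the longest matching substring should always win regardless of list order, sorts the candidates by length first. The names by_length and first_match are hypothetical, introduced only for this illustration:

```python
vals = ['text1', 'text2', 'text3', 'text4', 'text4B']

# Assumption: the longest matching candidate should win.
# Sorting longest-first removes the dependence on the order of vals.
by_length = sorted(vals, key=len, reverse=True)

def first_match(text, candidates=by_length):
    """Return the first candidate contained in text (raises StopIteration if none)."""
    return next(c for c in candidates if c in text)

strings = ['text1C1', 'text2A', 'text2', 'text4', 'text4B', 'text4A3']
matched = [first_match(s) for s in strings]
print(matched)
```

With the DataFrame from the question, the same helper could be applied as df['strings'].apply(first_match).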

A more general solution, for when some values may have no match, uses iter and the default parameter of next:

f = lambda x: next(iter(y for y in s if y in x), 'no match')
df['matched strings1'] = df['strings'].apply(f)
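
To see the default in action, here is a small sketch with one extra value that matches nothing. The string 'other7' is made up for illustration. Since a generator expression is already an iterator, this version passes it to next directly without iter:

```python
import pandas as pd

vals = ['text1', 'text2', 'text3', 'text4', 'text4B']
s = vals[::-1]

# 'other7' is a made-up value that contains none of the substrings in vals
df = pd.DataFrame({'strings': ['text4B', 'text4A3', 'other7']})

# next() returns its second argument when the generator yields nothing
f = lambda x: next((y for y in s if y in x), 'no match')
df['matched strings1'] = df['strings'].apply(f)
print(df)
```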

Your own solution can also be improved by indexing with the boolean mask directly and passing regex=False for plain substring matching:

for v in vals:
    df.loc[df['strings'].str.contains(v, regex=False), 'matched strings'] = v
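
As a possible alternative to looping over vals, a single vectorized call with Series.str.extract may be worth trying. This is only a sketch, not part of the original answer: it assumes the longest substring should win, so the alternatives are sorted longest-first, and the column name 'matched strings2' is chosen just for this example. Note that a regex returns the leftmost match in each string, which gives the same result as the loop for this data but may differ when several candidates appear at different positions.

```python
import re
import pandas as pd

vals = ['text1', 'text2', 'text3', 'text4', 'text4B']
df = pd.DataFrame({'strings': ['text1C1', 'text2A', 'text2', 'text4', 'text4B', 'text4A3']})

# Longest alternatives first so 'text4B' is tried before 'text4';
# re.escape guards against regex metacharacters in the substrings.
ordered = sorted(vals, key=len, reverse=True)
pattern = '(' + '|'.join(re.escape(v) for v in ordered) + ')'

# expand=False returns a Series; rows without a match become NaN
df['matched strings2'] = df['strings'].str.extract(pattern, expand=False)
print(df)
```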
