How to strip words from a column value if the value contains specific substrings?

I have row values as such:

         ID     MyColumn      
0        A      "Best Position 3 5"
1        B      "Healthy (unexpired)
2        C      "At-Large"
3        D      "Run 2 Position 1"
4        E      "Hello"
4        E      "None"
4        E      "Tomorrow"

I want to scan this table for any rows that contain the substring "Position", and then for those rows keep only the first instance of an int. I have the lambda / regex for taking the first instance of an int in a value:

...str.replace(r'\D+', '').str.split()

but I'm not sure how to apply it conditionally, only to the rows where the substring appears.

Resulting set:

         ID     MyColumn      
0        A      "3"
1        B      "Healthy (unexpired)
2        C      "At-Large"
3        D      "2"
4        E      "Hello"
4        E      "None"
4        E      "Tomorrow"

We might be able to use str.replace here with a smart regex:

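# Raw strings keep \d and the \1\2 backreferences intact.
regex = r'.*?(\d+).*(?:Position|unexpired).*|.*?(?:Position|unexpired).*?(\d+).*'
# Select the column with df['MyColumn'] (not df.loc); regex=True is required in recent pandas.
df['new'] = df['MyColumn'].str.replace(regex, r'\1\2', case=False, regex=True)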
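
To see how the two alternatives in that pattern behave, here is a minimal sketch using only the standard-library re module on the sample values from the question. The helper name first_int_if_keyword and the sample list are my own, added for illustration, and the surrounding quote characters are left out of the values.

```python
import re

# Same pattern as in the answer above, compiled case-insensitively.
# Alternative 1: the integer comes before the keyword (captured in group 1).
# Alternative 2: the integer comes after the keyword (captured in group 2).
PATTERN = re.compile(
    r'.*?(\d+).*(?:Position|unexpired).*|.*?(?:Position|unexpired).*?(\d+).*',
    flags=re.IGNORECASE,
)

def first_int_if_keyword(value: str) -> str:
    """Hypothetical helper: collapse the value to its first integer when a
    keyword and an integer are both present, otherwise return it unchanged."""
    # Only one of the two groups participates in a match; the other expands
    # to an empty string, so r'\1\2' yields just the captured integer.
    return PATTERN.sub(r'\1\2', value)

samples = ["Best Position 3 5", "Healthy (unexpired)", "At-Large",
           "Run 2 Position 1", "Hello"]
results = [first_int_if_keyword(s) for s in samples]
print(results)  # ['3', 'Healthy (unexpired)', 'At-Large', '2', 'Hello']
```

Values with a keyword but no digit, such as "Healthy (unexpired)", do not match either alternative and pass through untouched.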

Use Series.str.contains to build a mask of the matching rows, Series.str.extract to get the first integer, Series.mask to substitute it into the matching rows, and finally Series.fillna to restore the original value wherever no integer was found:

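# Rows whose value mentions either keyword (case-insensitive).
mask = df['MyColumn'].str.contains('Position|unexpired', case=False)
# Swap in the first integer for those rows; keep the original where none was found.
df['MyColumn'] = (df['MyColumn'].mask(mask, df['MyColumn'].str.extract(r'(\d+)', expand=False))
                                .fillna(df['MyColumn']))
print(df)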
  ID              MyColumn
0  A                     3
1  B  "Healthy (unexpired)
2  C            "At-Large"
3  D                     2
4  E               "Hello"
4  E                "None"
4  E            "Tomorrow"
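
For anyone who wants to run this end to end, here is a self-contained sketch of the same mask / extract / fillna idea. The DataFrame is rebuilt by hand from the question, so treat it as an assumption: the quote characters are dropped and the index is the default 0..6 rather than the repeated 4 shown above. It also filters on "Position" only, as the question asks.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed; quotes omitted).
df = pd.DataFrame({
    "ID": ["A", "B", "C", "D", "E", "E", "E"],
    "MyColumn": ["Best Position 3 5", "Healthy (unexpired)", "At-Large",
                 "Run 2 Position 1", "Hello", "None", "Tomorrow"],
})

# True for rows that mention the keyword, ignoring case.
has_keyword = df["MyColumn"].str.contains("Position", case=False)

# First run of digits in every row; NaN where the value has no digits.
first_int = df["MyColumn"].str.extract(r"(\d+)", expand=False)

# Replace only the keyword rows, then fall back to the original text
# for any keyword row that had no integer to extract.
df["MyColumn"] = df["MyColumn"].mask(has_keyword, first_int).fillna(df["MyColumn"])
print(df)
```

With this data, rows A and D become "3" and "2" and every other row is left as it was.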

