How to strip words from a column value if the value contains specific substrings?

Question

I have row values as such:

         ID     MyColumn      
0        A      "Best Position 3 5"
1        B      "Healthy (unexpired)
2        C      "At-Large"
3        D      "Run 2 Position 1"
4        E      "Hello"
4        E      "None"
4        E      "Tomorrow"

I want to scan this table for any rows that contain the substring "Position", and then for those rows keep only the first instance of an int. I have the lambda / regex for taking the first instance of an int in a value:

...str.replace(r'\D+', '').str.split()

but I'm not sure how to apply it on the condition of substring appearances.

Resulting set:

         ID     MyColumn      
0        A      "3"
1        B      "Healthy (unexpired)
2        C      "At-Large"
3        D      "2"
4        E      "Hello"
4        E      "None"
4        E      "Tomorrow"

Answer 1

We might be able to use str.replace here with a smart regex:

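# Raw strings keep \d and the backreferences \1\2 intact; regex=True is required in pandas >= 2.0
regex = r'.*?(\d+).*(?:Position|unexpired).*|.*?(?:Position|unexpired).*?(\d+).*'
df['new'] = df['MyColumn'].str.replace(regex, r'\1\2', case=False, regex=True)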
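
For anyone who wants to try this end to end, here is a minimal sketch of the regex-replace approach. The DataFrame is reconstructed from the sample in the question (the literal quote characters are left out, which is an assumption about the real data), and the pattern is written as a raw string with regex=True passed explicitly.

```python
import pandas as pd

# Sample data reconstructed from the question; quote characters omitted (assumption).
df = pd.DataFrame({
    "ID": ["A", "B", "C", "D", "E", "E", "E"],
    "MyColumn": ["Best Position 3 5", "Healthy (unexpired)", "At-Large",
                 "Run 2 Position 1", "Hello", "None", "Tomorrow"],
})

# Two alternatives: the first integer sits before the keyword, or only after it.
# Only one of the two groups participates in a match; the other expands to "".
regex = r".*?(\d+).*(?:Position|unexpired).*|.*?(?:Position|unexpired).*?(\d+).*"

# Rows that match neither alternative (no keyword, or no digit) are left as they are.
df["new"] = df["MyColumn"].str.replace(regex, r"\1\2", case=False, regex=True)
print(df)
```

Note that a row such as "Healthy (unexpired)" contains a keyword but no digit, so the pattern does not match and the value is kept unchanged.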

Answer 2

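Use Series.str.contains to build a mask of the matching rows, Series.str.extract to get the first integer, and Series.mask to put that integer into the matching rows. Finally, Series.fillna restores the original value in rows that matched but contain no integer: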

mask = df['MyColumn'].str.contains('Position|unexpired', case=False)
df['MyColumn'] = (df['MyColumn'].mask(mask, df['MyColumn'].str.extract(r'(\d+)', expand=False))
                                .fillna(df['MyColumn']))
print(df)
  ID              MyColumn
0  A                     3
1  B  "Healthy (unexpired)
2  C            "At-Large"
3  D                     2
4  E               "Hello"
4  E                "None"
4  E            "Tomorrow"
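
The same steps can be sketched as a self-contained script, with each step on its own line. The sample frame is rebuilt from the question without the literal quote characters (an assumption), and the intermediate name first_int is only there for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question; quote characters omitted (assumption).
df = pd.DataFrame({
    "ID": ["A", "B", "C", "D", "E", "E", "E"],
    "MyColumn": ["Best Position 3 5", "Healthy (unexpired)", "At-Large",
                 "Run 2 Position 1", "Hello", "None", "Tomorrow"],
})

# Step 1: flag the rows that contain one of the keywords (case-insensitive).
mask = df["MyColumn"].str.contains("Position|unexpired", case=False)

# Step 2: first run of digits per row; NaN where the row has no digits.
first_int = df["MyColumn"].str.extract(r"(\d+)", expand=False)

# Step 3: swap in the integer on flagged rows, then restore the original
# text wherever that left a NaN (keyword present but no digit).
df["MyColumn"] = df["MyColumn"].mask(mask, first_int).fillna(df["MyColumn"])
print(df)
```

Without the final fillna, the "Healthy (unexpired)" row would end up as NaN, because it is flagged by the mask but has no integer to extract.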

2 answers

solution1
2 2020-12-17 06:07:38

solution2
1 ACCEPTED 2020-12-17 06:04:37
