I have row values as such:
ID MyColumn
0 A "Best Position 3 5"
1 B "Healthy (unexpired)"
2 C "At-Large"
3 D "Run 2 Position 1"
4 E "Hello"
4 E "None"
4 E "Tomorrow"
I want to scan this table for any rows that contain the substring "Position", and for those rows keep only the first instance of an int. I have the lambda/regex for taking the first instance of an int in a value:
...str.replace(r'\D+', '').str.split()
but I'm not sure how to apply it only to the rows where the substring appears.
Desired result:
ID MyColumn
0 A "3"
1 B "Healthy (unexpired)"
2 C "At-Large"
3 D "2"
4 E "Hello"
4 E "None"
4 E "Tomorrow"
We might be able to use str.replace here with a smart regex:
regex = r'.*?(\d+).*(?:Position|unexpired).*|.*?(?:Position|unexpired).*?(\d+).*'
df['new'] = df['MyColumn'].str.replace(regex, r'\1\2', case=False, regex=True)
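A self-contained sketch of this replace approach, for checking it against the sample rows. The DataFrame construction is an assumption reconstructed from the question (surrounding quotes dropped, default integer index). The column is selected with df['MyColumn'], both the pattern and the replacement are raw strings so the backslashes reach the regex engine, and regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption: quotes dropped).
df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D', 'E', 'E', 'E'],
    'MyColumn': ['Best Position 3 5', 'Healthy (unexpired)', 'At-Large',
                 'Run 2 Position 1', 'Hello', 'None', 'Tomorrow'],
})

# Two alternatives: digits before the keyword, or keyword before the digits.
# The capture group that did not take part in the match expands to ''.
regex = r'.*?(\d+).*(?:Position|unexpired).*|.*?(?:Position|unexpired).*?(\d+).*'
df['new'] = df['MyColumn'].str.replace(regex, r'\1\2', case=False, regex=True)

print(df['new'].tolist())
# ['3', 'Healthy (unexpired)', 'At-Large', '2', 'Hello', 'None', 'Tomorrow']
```

Rows that contain a keyword but no digits, such as "Healthy (unexpired)", do not match either alternative and are left unchanged.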
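Use Series.str.contains to build a boolean mask of the matching rows, Series.str.extract to get the first integer, Series.mask to substitute it on the matching rows, and finally Series.fillna to restore the original value wherever nothing was extracted: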
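# rows that mention either keyword
mask = df['MyColumn'].str.contains('Position|unexpired', case=False)
# first run of digits per row (NaN where the row has no digits)
first_int = df['MyColumn'].str.extract(r'(\d+)', expand=False)
# substitute on matching rows, then restore originals where extraction gave NaN
df['MyColumn'] = df['MyColumn'].mask(mask, first_int).fillna(df['MyColumn'])
print(df)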
ID MyColumn
0 A 3
1 B "Healthy (unexpired)"
2 C "At-Large"
3 D 2
4 E "Hello"
4 E "None"
4 E "Tomorrow"
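To see why the fillna step is needed, the mask approach can be broken into its intermediate values. This is only a sketch: the Series below is an assumption reconstructed from the question's sample rows (quotes dropped), and the names s, first_int, masked and result are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption: quotes dropped).
s = pd.Series(['Best Position 3 5', 'Healthy (unexpired)', 'At-Large',
               'Run 2 Position 1', 'Hello', 'None', 'Tomorrow'])

# Step 1: rows that mention either keyword.
mask = s.str.contains('Position|unexpired', case=False)

# Step 2: first run of digits per row; NaN where the row has no digits.
first_int = s.str.extract(r'(\d+)', expand=False)

# Step 3: swap in the extracted value on masked rows only.
masked = s.mask(mask, first_int)

# Step 4: masked rows without digits are NaN at this point,
# so fall back to the original text for them.
result = masked.fillna(s)

print(mask.tolist())
# [True, True, False, True, False, False, False]
print(result.tolist())
# ['3', 'Healthy (unexpired)', 'At-Large', '2', 'Hello', 'None', 'Tomorrow']
```

The row "Healthy (unexpired)" matches the mask but has no digits, so it is NaN after step 3 and only gets its text back in step 4. The extracted values stay strings ('3', '2'); converting them to integers would be a separate step.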