I am very new to regex, so I am struggling with my code.
I have a DataFrame, df, structured like this:
NAME PERCENT
0 APPLE COMPANY A 57 638 232 stocks 0.12322
1 BANANA 1 COMPANY B 12 946 201 stocks 0.02768
2 ORANGE COMPANY C 8 354 229 stocks 0.01786
import pandas as pd

df = pd.DataFrame({
    'NAME': ['APPLE COMPANY A 57 638 232 stocks', 'BANANA 1 COMPANY B 12 946 201 stocks', 'ORANGE COMPANY C 8 354 229 stocks'],
    'PERCENT': [0.12322, 0.02768, 0.01786]
})
I want to extract the integers from NAME, but not all of them. Note that in row 1 we have BANANA 1 COMPANY B, where I want to ignore the integer 1 before COMPANY. I want to extract only the integers that come before stocks.
I want the output to look like this:
NAME PERCENT STOCKS
0 APPLE COMPANY A 0.12322 57638232
1 BANANA 1 COMPANY B 0.02768 12946201
2 ORANGE COMPANY C 0.01786 8354229
So far I only have this, which doesn't produce what I want:
df['NAME'].str.findall(r'\b\d+\b')
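To see why this attempt falls short, here is a small check of what that pattern returns for each name. It calls Python's re.findall directly, since Series.str.findall applies re.findall to every element; the list of names just repeats the sample data.

```python
import re

# The sample NAME values from the DataFrame above.
names = [
    'APPLE COMPANY A 57 638 232 stocks',
    'BANANA 1 COMPANY B 12 946 201 stocks',
    'ORANGE COMPANY C 8 354 229 stocks',
]

# Same pattern as df['NAME'].str.findall(r'\b\d+\b'), applied per string.
found = [re.findall(r'\b\d+\b', name) for name in names]
print(found)
# [['57', '638', '232'], ['1', '12', '946', '201'], ['8', '354', '229']]
```

Every integer comes back as a separate string, including the 1 in BANANA 1 COMPANY B, so the digit groups of the stock count are split apart and the unwanted integer is kept.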
Edit: Note that the number of stocks can range from thousands to millions, so the number of space-separated digit groups is not fixed.
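A minimal sketch of one way to meet both requirements (take only the number directly before stocks, and allow a varying number of digit groups). It assumes the count is written as one to three leading digits followed by space-separated groups of exactly three digits; STOCKS_PATTERN is a name chosen for this example. Rows with no match would yield NaN and make the integer conversion fail, so real data may need extra handling.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['APPLE COMPANY A 57 638 232 stocks',
             'BANANA 1 COMPANY B 12 946 201 stocks',
             'ORANGE COMPANY C 8 354 229 stocks'],
    'PERCENT': [0.12322, 0.02768, 0.01786],
})

# Assumption: the count is 1-3 leading digits plus zero or more
# space-separated groups of exactly three digits, followed by "stocks".
STOCKS_PATTERN = r'\b(\d{1,3}(?:\s\d{3})*)\s+stocks\b'

# Capture the grouped number, drop the separators, convert to integers.
df['STOCKS'] = (
    df['NAME']
    .str.extract(STOCKS_PATTERN, expand=False)
    .str.replace(r'\s', '', regex=True)
    .astype('int64')
)

# Remove "<number> stocks" from NAME, leaving e.g. "BANANA 1 COMPANY B".
df['NAME'] = df['NAME'].str.replace(STOCKS_PATTERN, '', regex=True).str.strip()

print(df)
```

Requiring the match to end in stocks is what skips the 1 in BANANA 1 COMPANY B, and the repeated three-digit group lets the same pattern cover a count like 4 500 as well as 57 638 232.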
This regex will extract what you are looking for:
\d+\s\d+\s\d+
Matches:
57 638 232
12 946 201
8 354 229
From:
'NAME': ['APPLE COMPANY A 57 638 232 stocks', 'BANANA 1 COMPANY B 12 946 201 stocks', 'ORANGE COMPANY C 8 354 229 stocks']
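As a quick check of the pattern \d+\s\d+\s\d+ against the Edit in the question, the sketch below runs it with re.search on the three sample names plus one hypothetical name whose count is in the thousands (PEAR COMPANY D 4 500 stocks is invented for illustration).

```python
import re

# Pattern from this answer: exactly three whitespace-separated digit runs.
ANSWER_PATTERN = r'\d+\s\d+\s\d+'

names = [
    'APPLE COMPANY A 57 638 232 stocks',
    'BANANA 1 COMPANY B 12 946 201 stocks',
    'ORANGE COMPANY C 8 354 229 stocks',
    'PEAR COMPANY D 4 500 stocks',  # hypothetical row with a count in the thousands
]

def extract_with_answer_pattern(name):
    """Return the first match of ANSWER_PATTERN, or None if there is none."""
    match = re.search(ANSWER_PATTERN, name)
    return match.group(0) if match else None

results = [extract_with_answer_pattern(name) for name in names]
print(results)
# ['57 638 232', '12 946 201', '8 354 229', None]
```

Because the pattern requires exactly three digit groups, the hypothetical two-group name returns None. Counts below one million would need a variable-length pattern, for example one anchored with a lookahead such as \d+(?:\s\d+)*(?=\s+stocks).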