
Extract integers with white space from a string

I am very new to regex, so I am struggling with my code.

I have a dataframe, df, structured like this:

                                    NAME  PERCENT
0     APPLE COMPANY A  57 638 232 stocks  0.12322
1  BANANA 1 COMPANY B  12 946 201 stocks  0.02768
2     ORANGE COMPANY C  8 354 229 stocks  0.01786

df = pd.DataFrame({
    'NAME': ['APPLE COMPANY A  57 638 232 stocks', 'BANANA 1 COMPANY B  12 946 201 stocks', 'ORANGE COMPANY C  8 354 229 stocks'],
    'PERCENT': [0.12322, 0.02768 , 0.01786]
    })

I want to extract the integers from NAME, but not all of them: in row 1 we have BANANA 1 COMPANY B, and I want to ignore the integer 1 before COMPANY. I want to extract only the integers that come right before stocks.

I want the output to look like this:

                 NAME  PERCENT    STOCKS
0     APPLE COMPANY A  0.12322  57638232
1  BANANA 1 COMPANY B  0.02768  12946201
2    ORANGE COMPANY C  0.01786   8354229

So far I only have this, which doesn't produce what I want:

df['NAME'].str.findall(r'\b\d+\b')

Edit: Note that the number of stocks may range from thousands to millions, so the number of digit groups is not fixed.

This regex extracts what you are looking for in the sample data (three groups of digits separated by single whitespace characters):

\d+\s\d+\s\d+

Matches:

57 638 232

12 946 201

8 354 229

From:

'NAME': ['APPLE COMPANY A  57 638 232 stocks', 'BANANA 1 COMPANY B  12 946 201 stocks', 'ORANGE COMPANY C  8 354 229 stocks']
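Since the question's edit says the count can range from thousands to millions, a pattern with exactly three digit groups may miss some rows. Below is a minimal sketch, not the answerer's method, that anchors on the word stocks and allows any number of digit groups before it. The names STOCKS_PATTERN and TAIL_PATTERN are made up for illustration, and the sketch assumes every row ends in "<digit groups> stocks" and that a company name never ends in a digit separated from the count by only a single space.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['APPLE COMPANY A  57 638 232 stocks',
             'BANANA 1 COMPANY B  12 946 201 stocks',
             'ORANGE COMPANY C  8 354 229 stocks'],
    'PERCENT': [0.12322, 0.02768, 0.01786],
})

# Hypothetical helper patterns (not from the original answer).
# One or more digit groups separated by single whitespace, followed by "stocks".
STOCKS_PATTERN = r'(\d+(?:\s\d+)*)\s+stocks'
# The same tail, including the whitespace before it, anchored at the end of the string.
TAIL_PATTERN = r'\s+\d+(?:\s\d+)*\s+stocks$'

# Pull out the digit groups, drop the inner whitespace, and convert to int.
# Assumption: every row matches; a non-matching row would yield NaN and
# make astype(int) fail.
df['STOCKS'] = (
    df['NAME']
    .str.extract(STOCKS_PATTERN, expand=False)
    .str.replace(r'\s+', '', regex=True)
    .astype(int)
)

# Strip the count and the word "stocks" from NAME, keeping "BANANA 1 COMPANY B" intact.
df['NAME'] = df['NAME'].str.replace(TAIL_PATTERN, '', regex=True)

print(df)
```

The standalone 1 in BANANA 1 COMPANY B is skipped because it is not followed by stocks, while a shorter count such as "5 120 stocks" would still match, since the number of digit groups is not fixed.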
