Extract integers with white space from a string
I am very new to regex, so I am struggling with my code.
I have a dataframe, df, structured like this:
                                   NAME  PERCENT
0     APPLE COMPANY A 57 638 232 stocks  0.12322
1  BANANA 1 COMPANY B 12 946 201 stocks  0.02768
2     ORANGE COMPANY C 8 354 229 stocks  0.01786
import pandas as pd

df = pd.DataFrame({
    'NAME': ['APPLE COMPANY A 57 638 232 stocks',
             'BANANA 1 COMPANY B 12 946 201 stocks',
             'ORANGE COMPANY C 8 354 229 stocks'],
    'PERCENT': [0.12322, 0.02768, 0.01786]
})
I want to extract the integers from NAME, but not all of them (note that row 1 contains BANANA 1 COMPANY B, where I want to ignore the integer 1 before COMPANY). I want to extract only the integers that come directly before stocks.
I want the output to look like this:
                 NAME  PERCENT    STOCKS
0     APPLE COMPANY A  0.12322  57638232
1  BANANA 1 COMPANY B  0.02768  12946201
2    ORANGE COMPANY C  0.01786   8354229
So far I only have this, which doesn't produce what I want:
df['NAME'].str.findall(r'\b\d+\b')
Edit: Note that the number of stocks may range from thousands to millions, so the number of digit groups is not fixed.
This regex will extract what you are looking for:
\d+\s\d+\s\d+
Matches:
57 638 232
12 946 201
8 354 229
From:
'NAME': ['APPLE COMPANY A 57 638 232 stocks', 'BANANA 1 COMPANY B 12 946 201 stocks', 'ORANGE COMPANY C 8 354 229 stocks']
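The pattern above expects exactly three digit groups, while the question's edit says the count can vary. Below is a minimal sketch of one possible generalization, not the answerer's own method: it looks for one or more whitespace-separated digit groups sitting directly before the word stocks, so the 1 in BANANA 1 COMPANY B is skipped. The names STOCKS_RE and split_name_and_stocks and the PEAR COMPANY D row are hypothetical, introduced only for illustration. The sketch assumes the company name itself never ends in a standalone number.

```python
import re

# Assumption: the stock count is a run of digit groups separated by whitespace
# and immediately followed by the word "stocks".
STOCKS_RE = re.compile(r'\b((?:\d+\s+)+)stocks\b')


def split_name_and_stocks(text):
    """Return (name, stock_count), or (text, None) if no count precedes 'stocks'."""
    match = STOCKS_RE.search(text)
    if match is None:
        return text, None
    # Everything before the digit run is treated as the company name.
    name = text[:match.start()].rstrip()
    # Drop the whitespace between the digit groups and convert to an integer.
    count = int(re.sub(r'\s+', '', match.group(1)))
    return name, count


rows = [
    'APPLE COMPANY A 57 638 232 stocks',
    'BANANA 1 COMPANY B 12 946 201 stocks',
    'ORANGE COMPANY C 8 354 229 stocks',
    'PEAR COMPANY D 4 120 stocks',  # hypothetical row with only two digit groups
]

for row in rows:
    print(split_name_and_stocks(row))
# ('APPLE COMPANY A', 57638232)
# ('BANANA 1 COMPANY B', 12946201)
# ('ORANGE COMPANY C', 8354229)
# ('PEAR COMPANY D', 4120)
```

With the question's dataframe, something like df['STOCKS'] = df['NAME'].map(lambda s: split_name_and_stocks(s)[1]) should fill the new column, and the first element of each tuple can replace NAME in the same way.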