Remove decimals and non-numeric characters from a string column using regex

Question

I have a dataframe column with strings like this:

df.column1:
0 R$ 27.467.522,00 (Vinte e sete milhões, quatro...
1 NaN
2 R$ 35.314.312,12 (Trinta e cinco milhões, trezentos...
3 R$ 1.231,34 (Mil duzentos e trinta e um reais e...

I want to keep only the digits of the integer part and discard the decimals, so that it looks like this:

df.column1:
0 27467522
1 NaN
2 35314312
3 1231

I'm trying to do that with regex:

df['column1']=df['column1'].str.extract('[REGEX CODE]')

However, I'm not familiar with regex. I tried solutions like:

df['column1']=df['column1'].str.extract('(.*?,)').str.extract('(\d+)')
df['column1']=df['column1'].str.extract('(\s*,.*)').str.extract('(\d+)')

But I haven't been able to get it right. Can someone help?

Answer 1

Use str.replace, then str.extract:

df.column1.str.replace('.', '', regex=False).str.extract(r'(\d+)')

          0
0  27467522
1       NaN
2  35314312
3      1231

Decimals are indicated by commas here and periods are thousands separators. Once the periods are removed, str.extract with (\d+) returns the first run of digits, which stops at the comma, so the integer part is matched and the decimal part is left out.
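
The same two steps can be sketched in plain Python with the standard re module, which makes it easier to see what each step does to a single value. This is only an illustration: the helper name integer_part and the samples list are made up for this example, and the sample strings are the truncated ones from the question.

```python
import re

def integer_part(text):
    """Return the integer part of the first amount in text as a digit string.

    Hypothetical helper for illustration; returns None for missing values.
    """
    if not isinstance(text, str):
        return None  # e.g. a NaN float coming from a missing cell
    # Step 1: drop the periods, which act as thousands separators here.
    without_periods = text.replace(".", "")
    # Step 2: take the first run of digits; it stops at the decimal comma.
    match = re.search(r"\d+", without_periods)
    return match.group() if match else None

# Sample values copied from the question (text after the amount is truncated).
samples = [
    "R$ 27.467.522,00 (Vinte e sete milhões, quatro...",
    float("nan"),
    "R$ 35.314.312,12 (Trinta e cinco milhões, trezentos...",
    "R$ 1.231,34 (Mil duzentos e trinta e um reais e...",
]
results = [integer_part(s) for s in samples]
print(results)  # ['27467522', None, '35314312', '1231']
```

Assuming a pandas column like the one in the question, the helper could be applied with df['column1'].map(integer_part). Either way the results are still strings, so a numeric conversion would be a separate step if numbers are needed.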
