
Deleting decimals and non digits from string column using regex

I have a dataframe column with strings like this:

df.column1:
0 R$ 27.467.522,00 (Vinte e sete milhões, quatro...
1 NaN
2 R$ 35.314.312,12 (Trinta e cinco milhões, trezentos...
3 R$ 1.231,34 (Mil duzentos e trinta e um reais e...

I only want the digits, ignoring the decimal part, so that it looks like this:

df.column1:
0 27467522
1 NaN
2 35314312
3 1231

I'm trying to do that with regex:

df['column1']=df['column1'].str.extract('[REGEX CODE]')

However, I'm not used to regex. I tried solutions like:

df['column1']=df['column1'].str.extract('(.*?,)').str.extract('(\d+)')
df['column1']=df['column1'].str.extract('(\s*,.*)').str.extract('(\d+)')

But I haven't been able to get it right. Can someone help?

Use str.replace, then str.extract:

df.column1.str.replace('.', '', regex=False).str.extract(r'(\d+)')

          0
0  27467522
1       NaN
2  35314312
3      1231

Decimals are indicated by commas here, and the periods are only thousands separators. Once the periods are removed, the first run of digits that str.extract finds is the whole integer part, and the decimals after the comma are ignored.
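For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same approach. It makes a few assumptions: the sample strings are copied from the truncated display in the question (including the trailing "..."), the names `s`, `digits` and `result` are only illustrative, and the final conversion to a nullable integer dtype is an optional extra that the question did not ask for. Passing `regex=False` makes the period a literal character, so the result does not depend on which default your pandas version uses for `str.replace`.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's (truncated) display; illustrative only.
s = pd.Series([
    "R$ 27.467.522,00 (Vinte e sete milhões, quatro...",
    np.nan,
    "R$ 35.314.312,12 (Trinta e cinco milhões, trezentos...",
    "R$ 1.231,34 (Mil duzentos e trinta e um reais e...",
])

# 1) Drop the thousands separators; regex=False treats "." as a literal period.
# 2) Take the first run of digits, which ends at the decimal comma.
#    expand=False returns a Series instead of a one-column DataFrame.
digits = s.str.replace(".", "", regex=False).str.extract(r"(\d+)", expand=False)

# Optional: convert the digit strings to a nullable integer dtype,
# so the missing row stays missing instead of forcing floats.
result = pd.to_numeric(digits).astype("Int64")

print(result)
```

If you only need the strings, stop at `digits`; the `Int64` step is there in case you want to do arithmetic on the column afterwards.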
