Deleting decimals and non-digits from a string column using regex
I have a DataFrame column with strings like this:
df.column1:
0 R$ 27.467.522,00 (Vinte e sete milhões, quatro...
1 NaN
2 R$ 35.314.312,12 (Trinta e cinco milhões, trezentos...
3 R$ 1.231,34 (Mil duzentos e trinta e um reais e...
I only want to get the numbers, ignoring the decimals, so that it looks like this:
df.column1:
0 27467522
1 NaN
2 35314312
3 1231
I'm trying to do that with regex:
df['column1']=df['column1'].str.extract('[REGEX CODE]')
However, I'm not used to regex. I tried solutions like:
df['column1']=df['column1'].str.extract('(.*?,)').str.extract('(\d+)')
df['column1']=df['column1'].str.extract('(\s*,.*)').str.extract('(\d+)')
But I haven't been able to get it right. Can someone help?
Use str.replace, then str.extract:
df.column1.str.replace('.', '', regex=False).str.extract(r'(\d+)')  # regex=False: treat '.' as a literal period
0
0 27467522
1 NaN
2 35314312
3 1231
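For reference, here is a self-contained version of that one-liner. The sample strings are copies of the question's (still truncated). Two details are additions of this sketch rather than part of the original answer: `expand=False`, so that `str.extract` returns a Series that can be assigned back to `column1`, and a final conversion with `pd.to_numeric` followed by pandas' nullable `Int64` dtype, which keeps the missing row as `<NA>`.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the spelled-out text stays truncated.
df = pd.DataFrame({
    "column1": [
        "R$ 27.467.522,00 (Vinte e sete milhões, quatro...",
        np.nan,
        "R$ 35.314.312,12 (Trinta e cinco milhões, trezentos...",
        "R$ 1.231,34 (Mil duzentos e trinta e um reais e...",
    ]
})

# 1. Remove the thousands separators literally (regex=False, so "." is not "any character").
# 2. Keep the first run of digits, i.e. everything before the decimal comma.
#    expand=False returns a Series instead of a one-column DataFrame.
digits = (
    df["column1"]
    .str.replace(".", "", regex=False)
    .str.extract(r"(\d+)", expand=False)
)

# 3. Convert the digit strings to numbers; the NaN row becomes <NA> in Int64.
df["column1"] = pd.to_numeric(digits).astype("Int64")
print(df)
```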
In this format, commas mark the decimal part and periods are thousands separators. Removing the periods joins the integer part into a single run of digits, and str.extract returns only the first match, so the digits after the comma are ignored.
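Taking the first run of digits anywhere in the string works for this sample data. A slightly stricter variant, assuming every non-missing value starts with `R$` followed by the amount (periods as thousands separators, a comma before the cents), anchors the pattern on that prefix and captures only digits and periods, so the match stops at the decimal comma; the periods are stripped afterwards.

```python
import numpy as np
import pandas as pd

s = pd.Series([
    "R$ 27.467.522,00 (Vinte e sete milhões, quatro...",
    np.nan,
    "R$ 35.314.312,12 (Trinta e cinco milhões, trezentos...",
    "R$ 1.231,34 (Mil duzentos e trinta e um reais e...",
])

# ^R\$\s*  : the value must start with "R$" plus optional whitespace (assumed format)
# ([\d.]+) : capture digits and periods; the capture ends at the decimal comma
integer_part = s.str.extract(r"^R\$\s*([\d.]+)", expand=False)

# Drop the thousands separators and convert; missing or unmatched rows stay <NA>.
result = pd.to_numeric(integer_part.str.replace(".", "", regex=False)).astype("Int64")
print(result)
```

With the prefix anchor, a row that does not start with `R$` yields a missing value instead of whatever digits happen to appear first elsewhere in the text.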