Extracting numbers with a regular expression

Question

I want to extract numbers using a regular expression.

df['price'][0]

has the value

'[<em class="letter" id="infoJiga">3,402,000</em>]'

and I want to extract 3402000.

How can I do this in a pandas DataFrame?

Answer 1

Since the value is a string, try the code below.

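# Your code: df['price'][0] returns this string; call it x.
x = '[<em class="letter" id="infoJiga">3,402,000</em>]'

# Take the text after the first '>' and keep only its digit characters.
y = ''.join(c for c in x.split('>')[1] if c.isdigit())
print(y)  # y is a string; use int(y) if you need a number

output: 3402000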

Hope it works.

Answer 2

The simplest regex, assuming nothing about the environment, may be ([\\d,]*). Then you can use pandas' to_numeric function.
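As a rough sketch of that idea (not necessarily what the answerer had in mind), the pattern can be applied to the whole column with Series.str.extract and the result passed to pd.to_numeric. The sample rows and the column name price_num are assumptions for illustration. Two adjustments are made here: + is used instead of * so the pattern cannot match an empty string, and the commas are removed before conversion because to_numeric does not parse thousands separators.

```python
import pandas as pd

# Sample data shaped like the value in the question (assumed for illustration).
df = pd.DataFrame({'price': ['[<em class="letter" id="infoJiga">3,402,000</em>]',
                             '[<em class="letter" id="infoJiga">2,000</em>]']})

# Capture the run of digits and commas that sits between '>' and '<'.
digits = df['price'].str.extract(r'>([\d,]+)<', expand=False)

# Remove the thousands separators, then convert the strings to numbers.
df['price_num'] = pd.to_numeric(digits.str.replace(',', '', regex=False))
print(df['price_num'].tolist())
```

In a raw string (r'...') a single backslash is enough; the doubled backslash is only needed in an ordinary string literal.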

Answer 3

Are all your values formatted the same way? If so, you can use a simple regular expression to extract the numeric values and then convert them to int.

import pandas as pd
import re

test_data = ['[<em class="letter" id="infoJiga">3,402,000</em>]','[<em class="letter" id="infoJiga">3,401,000</em>]','[<em class="letter" id="infoJiga">3,400,000</em>]','[<em class="letter" id="infoJiga">2,000</em>]']
df = pd.DataFrame(test_data)
>>> df[0]
0    [<em class="letter" id="infoJiga">3,402,000</em>]
1    [<em class="letter" id="infoJiga">3,401,000</em>]
2    [<em class="letter" id="infoJiga">3,400,000</em>]
3        [<em class="letter" id="infoJiga">2,000</em>]
Name: 0, dtype: object

Define a function that extracts the number and returns it as an integer:

def get_numeric(data):
    match = re.search('>(.+)<', data)
    if match:
        return int(match.group(1).replace(',',''))    
    return None

Apply it to the DataFrame:

df[1] = df[0].apply(get_numeric)
>>> df[1]
0    3402000
1    3401000
2    3400000
3       2000
Name: 1, dtype: int64
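If some rows might not contain a number, a vectorized variant of the same steps may be worth considering. The sketch below is an alternative, not the code from this answer: it uses Series.str.extract with a stricter pattern and the nullable Int64 dtype so that missing values stay integers. The third sample row is a made-up value added only to show the non-matching case.

```python
import pandas as pd

# Sample data; the last row is a hypothetical value with no number in it.
df = pd.DataFrame(['[<em class="letter" id="infoJiga">3,402,000</em>]',
                   '[<em class="letter" id="infoJiga">2,000</em>]',
                   '[<em class="letter" id="infoJiga">n/a</em>]'])

# Capture digits and commas between '>' and '<'; non-matching rows become NaN.
raw = df[0].str.extract(r'>([\d,]+)<', expand=False)

# Drop the thousands separators, parse, and switch to the nullable integer
# dtype so the missing value does not leave the column as float.
df[1] = pd.to_numeric(raw.str.replace(',', '', regex=False)).astype('Int64')
print(df[1])
```

The first two rows come out as 3402000 and 2000, and the row without a number is left as a missing value instead of raising an error.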

3 answers

Answer 1
0 accepted 2018-08-18 11:52:59

Answer 2
0 2018-08-18 11:58:06

Answer 3
0 2018-08-18 12:24:41
