Create a regex pattern to extract floats and integers

Question

I am having an issue creating a pattern recognition function to extract all the numbers from a DataFrame column and print them.

After looking at the DataCamp tutorial and other questions on Stack Overflow, I tried to create a regex pattern, but I have not been able to write one that extracts all the numbers and prints them. Essentially, the EA patterns I created, and the HR patterns with floats such as 1.12, are not returning results.

import re
import pandas as pd
data = ['1EA @ 3217.45;', 'ST - .63HR@165;', 'ST - .5HR@123;', 'ST - 1.08HR@165;', '1EA @ 3217.45;', 'ST - .85HR@165;', 'ST - .85HR@165;', '1EA @ 3217.45;', 'ST - .12HR@165;', 'OT - 1.12HR @ 165;', 'ST - .55HR@123;OT - 0.82HR @ 123;', 'ST - .5HR@165;', 'OT - 0.45HR @ 123;', 'ST - .6HR@123;', 'ST - 1.42HR@123;', '1EA @ 1500;', 'ST - .3HR@123;', 'ST - 1HR@111;OT - 0.25HR @ 111;']
Travel = pd.DataFrame(data, columns=['Rate Breakup Description'])

for a in Travel['Rate Breakup Description']:
    print(re.search(r'.(\d+)HR | (\d+)EA | (\d+)HR | (\d+)EA', a, re.I|re.M))

My objective is to have a pattern recognition function that extracts all the numbers, regardless of the different string patterns, and prints them in the order they appear.
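A minimal diagnostic sketch using Python's re module on three rows copied from the data above suggests why the original pattern misses most rows: the spaces around each | are literal, so every alternative needs a space next to the number or unit; (\d+) cannot span a decimal point; and re.search stops at the first match.

```python
import re

# Sample rows copied from the question's data.
rows = ['1EA @ 3217.45;', 'ST - .63HR@165;', 'OT - 1.12HR @ 165;']

# The question's pattern, written as a raw string. The spaces around each
# '|' are literal characters, so every alternative demands a space next to
# the number or unit, and (\d+) cannot cross a decimal point.
original = r'.(\d+)HR | (\d+)EA | (\d+)HR | (\d+)EA'

for s in rows:
    m = re.search(original, s, re.I | re.M)
    # re.search also stops at the first match, so later numbers are ignored.
    print(repr(s), '->', repr(m.group(0)) if m else None)

# Prints:
# '1EA @ 3217.45;' -> None
# 'ST - .63HR@165;' -> None
# 'OT - 1.12HR @ 165;' -> '.12HR '
```

In the last row the match only captures 12, the digits after the decimal point, not 1.12.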

Answer 1

You may use

Travel['Result'] = Travel['Rate Breakup Description'].str.findall(r'\d*\.?\d+(?=HR|EA)').apply(', '.join)

The pattern matches:

\d* - zero or more digits
\.? - an optional . (decimal point)
\d+ - one or more digits
(?=HR|EA) - a positive lookahead requiring HR or EA to follow; the unit itself is not part of the match
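The pieces listed above can be checked directly with Python's re module. This sketch uses sample strings taken from the question's data and compares the answer's pattern with and without the lookahead; dropping (?=HR|EA) is shown only as a variation, assuming a reader also wants the rates after @.

```python
import re

# Answer's pattern: a number (optionally with a leading or inner decimal
# point) that must be followed by HR or EA. The lookahead is not consumed.
quantity = r'\d*\.?\d+(?=HR|EA)'

# Variation (an assumption, not from the answer): same core, no lookahead,
# so every number in the string is returned, including the rates.
any_number = r'\d*\.?\d+'

s1 = 'ST - .55HR@123;OT - 0.82HR @ 123;'
s2 = '1EA @ 3217.45;'

print(re.findall(quantity, s1))    # ['.55', '0.82']
print(re.findall(any_number, s1))  # ['.55', '123', '0.82', '123']
print(re.findall(quantity, s2))    # ['1']
print(re.findall(any_number, s2))  # ['1', '3217.45']
```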

The .str.findall method returns all the matches it finds in each input string, and .apply(', '.join) joins each row's list of matches into one string separated by a comma and a space.
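Putting the findall approach together, here is a self-contained sketch on a subset of the question's rows. The final explode plus pd.to_numeric step is an added assumption for cases where numeric values, rather than a joined string, are needed.

```python
import pandas as pd

# A subset of the question's data.
data = ['1EA @ 3217.45;', 'ST - .63HR@165;',
        'ST - .55HR@123;OT - 0.82HR @ 123;', 'ST - 1HR@111;OT - 0.25HR @ 111;']
Travel = pd.DataFrame(data, columns=['Rate Breakup Description'])

# One list of matches per row, in the order they appear in the string.
matches = Travel['Rate Breakup Description'].str.findall(r'\d*\.?\d+(?=HR|EA)')

# Joined string per row, as in the answer.
Travel['Result'] = matches.apply(', '.join)

# Optional extra: one match per row, converted to floats.
numbers = pd.to_numeric(matches.explode())

for desc, res in zip(Travel['Rate Breakup Description'], Travel['Result']):
    print(desc, '->', res)
# 1EA @ 3217.45; -> 1
# ST - .63HR@165; -> .63
# ST - .55HR@123;OT - 0.82HR @ 123; -> .55, 0.82
# ST - 1HR@111;OT - 0.25HR @ 111; -> 1, 0.25
```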

If only a single match is expected in each input, you could use an alternative solution:

Travel['Result'] = Travel['Rate Breakup Description'].str.extract(r'(\d*\.?\d+)(?:HR|EA)', expand=False)

Here, (\d*\.?\d+) is a capturing group (because of the parentheses), so this part is what .str.extract returns, while (?:HR|EA) is a non-capturing group (so it is not returned) that matches either HR or EA.
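To illustrate the single-match caveat, this sketch runs the .str.extract version on three rows from the question; on the row with two quantities, only the first one is returned. The astype(float) conversion is an extra step, not part of the original answer.

```python
import pandas as pd

Travel = pd.DataFrame(
    ['1EA @ 3217.45;', 'OT - 1.12HR @ 165;', 'ST - .55HR@123;OT - 0.82HR @ 123;'],
    columns=['Rate Breakup Description'],
)

# One capturing group + expand=False -> a Series with the first match per row.
first = Travel['Rate Breakup Description'].str.extract(
    r'(\d*\.?\d+)(?:HR|EA)', expand=False)
print(first.tolist())  # ['1', '1.12', '.55']  (0.82 in the last row is dropped)

# Convert the extracted strings to floats.
print(first.astype(float).tolist())  # [1.0, 1.12, 0.55]
```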

Answer 1: score 0, accepted, 2019-01-28 16:13:23
