
Using RegEx in Python to extract contents

Good evening,

I am very new to Python and RegEx. I have the following sentence:

-75.76 Card INSURANCEGrabPay ASIA DIRECT to Paid AM 1:16 +100.00 3257 UpAmex Top PM 9:55 +300.00 3257 UpAmex Top PM 9:55 -400.00 Card LTDGrabPay PTE AXS to Paid PM 9:57 (SGD) Amount Details Time here. appear will transactions cashless your All 2022 Feb 15 on made transactions GrabPay points 52 earned points Rewards 475.76 SGD spent Amount 0.24 SGD balance Wallet 2022 Feb 15 Summary statement daily your here

I would like to search for just '-' and the amount after it.

After that, I would like to skip 2 words and extract all the words just before 'Paid', in a single group if need be. (I will read more about groups, but for now I need a single group, which I can later split to get the individual words.)

For instance, I would get

-75.76 ASIA Direct to
-400 PTE AXS to

What would be the regex for this? Also, is there a good regex tutorial I can read?

Rather than give you the actual regex, I'll gently nudge you in the right direction. It's more satisfying that way.

"Words" here are separated by spaces. So what you're searching for is a group of characters (captured), a space, characters again, a space, characters, a space, then capture everything and end with "Paid". Try to create a regex to do that.

If you'd like to brush up on regex, check out Regex101. It's a web tool for testing regexes, with a debugger and a cheat sheet.
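The same step-by-step approach can also be tried locally. The sketch below only captures the amount and leaves the rest as the exercise; the variable names are illustrative, and the text is a shortened copy of the sentence from the question.

```python
import re

# Shortened copy of the statement text from the question.
text = (
    "-75.76 Card INSURANCEGrabPay ASIA DIRECT to Paid AM 1:16 "
    "+100.00 3257 UpAmex Top PM 9:55 "
    "-400.00 Card LTDGrabPay PTE AXS to Paid PM 9:57"
)

# Step 1: capture only a minus sign followed by an amount with decimals.
amount_only = re.findall(r"(-\d+\.\d+)", text)
print(amount_only)  # ['-75.76', '-400.00']

# Next steps (the exercise): extend the pattern with the two words to skip,
# then a second capturing group, then the literal "Paid", re-running
# re.findall after each change to see what is matched.
```

Growing the pattern one piece at a time makes it easy to see which addition breaks the match.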

For now I have created one match with 2 groups: group 1 for the amount and group 2 for all the words (which also includes the "to " string).

Regex:

(-\d+\.?\d+) \w+ \w+ ([\w ]+)?Paid

You can check the details here: https://regex101.com/r/eUMgdW/1
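To make the pieces of this pattern easier to read, here is the same regex written with re.VERBOSE and one comment per part. This is only a sketch: the sample text is a shortened copy of the sentence from the question, and the variable names are illustrative. In verbose mode unescaped spaces are ignored outside character classes, so each literal space is written as [ ].

```python
import re

pattern = re.compile(
    r"""
    (-\d+\.?\d+)   # group 1: minus sign and the amount, e.g. -75.76
    [ ]\w+         # first skipped word
    [ ]\w+         # second skipped word
    [ ]            # space before the words to capture
    ([\w ]+)?      # group 2: words and spaces before "Paid" (optional)
    Paid           # literal text that ends the match
    """,
    re.VERBOSE,
)

# Shortened copy of the statement text from the question.
text = (
    "-75.76 Card INSURANCEGrabPay ASIA DIRECT to Paid AM 1:16 "
    "+100.00 3257 UpAmex Top PM 9:55 "
    "-400.00 Card LTDGrabPay PTE AXS to Paid PM 9:57"
)

matches = pattern.findall(text)
print(matches)
# [('-75.76', 'ASIA DIRECT to '), ('-400.00', 'PTE AXS to ')]
```

Note that \d+\.?\d+ needs at least two digits in total, so a single-digit amount such as -5 would not match; -\d+(?:\.\d+)? is one possible alternative if such amounts can occur.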

Python code:

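import re

# your_input_string is assumed to hold the statement text from the question.
# A raw string (r"...") keeps backslashes such as \d from being treated as
# string escapes.
pattern = r"(-\d+\.?\d+) \w+ \w+ ([\w ]+)?Paid"
output = re.findall(pattern, your_input_string)

for found in output:
    print(found)

# ('-75.76', 'ASIA DIRECT to ')
# ('-400.00', 'PTE AXS to ')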
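Since the question mentions splitting the captured group into words later, here is one possible follow-up step. It is a sketch using the same pattern, with a shortened copy of the sentence from the question and illustrative variable names. str.split() with no arguments also drops the trailing space that group 2 carries.

```python
import re

# Shortened copy of the statement text from the question.
text = (
    "-75.76 Card INSURANCEGrabPay ASIA DIRECT to Paid AM 1:16 "
    "+100.00 3257 UpAmex Top PM 9:55 "
    "-400.00 Card LTDGrabPay PTE AXS to Paid PM 9:57"
)

pattern = re.compile(r"(-\d+\.?\d+) \w+ \w+ ([\w ]+)?Paid")

results = []
for match in pattern.finditer(text):
    amount = float(match.group(1))            # e.g. -75.76
    # Group 2 is optional, so fall back to "" when it did not take part.
    words = (match.group(2) or "").split()    # e.g. ['ASIA', 'DIRECT', 'to']
    results.append((amount, words))

for amount, words in results:
    print(amount, words)
```

Converting the amount with float is a convenience for this example; for real monetary values decimal.Decimal may be a better fit.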
