
How to extract data from a paragraph using regex

Customer Reference N139211508474572 Entry Date 05/19/2021 Extra Information NEFT IN UTR FROM S S DISTRIBUTOR N139211508474 572TXN REF NO 23621001323

How can I extract the vendor company name, such as S S DISTRIBUTOR? The keyword FROM is present in all of the records I have. I wrote a regex to extract the customer reference number, (?<=Customer Reference ).+(?= Entry Date), and it works. Could you give me code to extract the vendor company name?

The customer reference number is not fixed: it may contain a mix of letters and digits, or only digits.
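As a minimal sketch, assuming Python (the language the answers below also assume), the lookbehind pattern from the question can be applied with re.search. Because it matches whatever sits between "Customer Reference " and " Entry Date", it should not matter whether the reference is alphanumeric or digits only. The helper name customer_reference and the digits-only sample are hypothetical, for illustration only.

```python
import re

# The asker's pattern: everything between "Customer Reference " and " Entry Date".
# Python lookbehinds must be fixed-width; "Customer Reference " is, so this compiles.
CUSTOMER_REF = re.compile(r"(?<=Customer Reference ).+(?= Entry Date)")


def customer_reference(text):
    """Return the customer reference number, or None if the pattern is absent."""
    match = CUSTOMER_REF.search(text)
    return match.group(0) if match else None


sample = ("Customer Reference N139211508474572 Entry Date 05/19/2021 "
          "Extra Information NEFT IN UTR FROM S S DISTRIBUTOR "
          "N139211508474 572TXN REF NO 23621001323")

print(customer_reference(sample))  # N139211508474572
# Hypothetical digits-only reference: handled the same way.
print(customer_reference("Customer Reference 123456789 Entry Date 01/01/2021"))  # 123456789
```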

Assuming the vendor company name is located between the keyword FROM and the customer reference number, would you please try:

Customer Reference (.*).* FROM (.*) \1

Group 2 captures the vendor company name, S S DISTRIBUTOR.
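A sketch of the same pattern used from Python's re module, assuming Python. Note how the backreference behaves on this sample: the reference reappears split by a space ("N139211508474 572TXN"), so after backtracking group 1 ends up holding only the prefix N139211508474, and group 2 is whatever precedes that prefix after FROM. This relies on the reference (or a prefix of it) reappearing right after the vendor name, as it does in the sample.

```python
import re

# Group 1 captures the customer reference (after backtracking, possibly only
# a prefix of it); \1 requires that same text to reappear after the vendor name.
PATTERN = re.compile(r"Customer Reference (.*).* FROM (.*) \1")

sample = ("Customer Reference N139211508474572 Entry Date 05/19/2021 "
          "Extra Information NEFT IN UTR FROM S S DISTRIBUTOR "
          "N139211508474 572TXN REF NO 23621001323")

match = PATTERN.search(sample)
if match:
    print(match.group(1))  # N139211508474
    print(match.group(2))  # S S DISTRIBUTOR
```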


You didn't specify which language you are actually using, and, as @some-programmer-dude mentioned, regex is not necessarily the best tool for searching in a string.

You tagged python, so assuming that is what you are using, you can consider using split and index instead: take everything after FROM, then keep only the words that come before the two words immediately preceding REF:

s = "Customer Reference N139211508474572 Entry Date 05/19/2021 Extra Information NEFT IN UTR FROM S S DISTRIBUTOR N139211508474 572TXN REF NO 23621001323"
# Convert everything to upper case so the keywords match regardless of case
s = s.upper()
# Collapse any repeated whitespace, just in case
s = " ".join(s.split())
# Keep only the text after the first FROM
s = s.split("FROM", 1)[1]
# Find REF, then keep only the words before the two words that precede it
words = s.split()
index_of_REF = words.index("REF")
s = " ".join(words[:index_of_REF - 2])
print(s)

This gives:

S S DISTRIBUTOR
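The same steps can be wrapped in a reusable function for many records. This is a minimal sketch: the name vendor_name and the choice to return None when FROM or REF is missing are assumptions, not part of the answer. Without those checks, split(...)[1] raises IndexError and list.index raises ValueError, and if REF were one of the first two words after FROM, index_of_REF - 2 would be negative and the slice would silently return the wrong words, which the max(..., 0) guard prevents.

```python
def vendor_name(text):
    """Return the words after FROM, minus REF and the two words before it; None if not found."""
    # Normalise case and whitespace, as in the answer above.
    words = " ".join(text.upper().split())
    if "FROM" not in words:
        return None
    # Tokens after the first FROM.
    after_from = words.split("FROM", 1)[1].split()
    if "REF" not in after_from:
        return None
    index_of_ref = after_from.index("REF")
    # Drop the two tokens right before REF (in the sample, the split customer
    # reference); clamp at 0 so a short record never yields a negative slice.
    name_tokens = after_from[:max(index_of_ref - 2, 0)]
    return " ".join(name_tokens) or None


sample = ("Customer Reference N139211508474572 Entry Date 05/19/2021 "
          "Extra Information NEFT IN UTR FROM S S DISTRIBUTOR "
          "N139211508474 572TXN REF NO 23621001323")

print(vendor_name(sample))              # S S DISTRIBUTOR
print(vendor_name("NO KEYWORDS HERE"))  # None
```

Since the input is upper-cased first, the returned name is always upper case.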
