How to extract data from a paragraph using regular expressions

Question

Customer Reference N139211508474572 Entry Date 05/19/2021 Extra Information NEFT IN UTR FROM S S DISTRIBUTOR N139211508474 572TXN REF NO 23621001323

How can I extract the vendor company name, such as SS DISTRIBUTOR? The keyword FROM is constant across all the records I have. I wrote a regex to extract the customer reference number, (?<=Customer Reference ).+(?= Entry Date), and it works. Could you give me a pattern to extract the vendor company name?

The customer reference number is not fixed: it may be a mix of letters and digits, or digits only.
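A minimal Python sketch of the lookaround approach from the question, extended to the vendor name. It assumes two things visible in the sample but not guaranteed for every record: the customer reference contains no spaces (so \S+ covers both alphanumeric and all-digit references), and exactly two tokens (here the split reference "N139211508474 572TXN") sit between the vendor name and REF. Python's re only allows fixed-width lookbehind, which "FROM " satisfies.

```python
import re

record = ("Customer Reference N139211508474572 Entry Date 05/19/2021 "
          "Extra Information NEFT IN UTR FROM S S DISTRIBUTOR "
          "N139211508474 572TXN REF NO 23621001323")

# \S+ instead of .+ : accepts letters and digits, stops at the first space
# (assumption: references never contain spaces)
ref_match = re.search(r"(?<=Customer Reference )\S+(?= Entry Date)", record)

# Fixed-width lookbehind on "FROM "; the lazy .+? stops as soon as the
# lookahead sees two more words followed by REF (assumed record layout)
vendor_match = re.search(r"(?<=FROM ).+?(?= \S+ \S+ REF\b)", record)

customer_ref = ref_match.group() if ref_match else None
vendor = vendor_match.group() if vendor_match else None
print(customer_ref)  # N139211508474572
print(vendor)        # S S DISTRIBUTOR
```

If the number of tokens before REF varies between records, the lookahead would need adjusting; treat the pattern as a starting point rather than a general rule.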

Answer 1

Assuming the vendor company name is located between the keyword FROM and the customer reference number, would you please try:

Customer Reference (.*).* FROM (.*) \1

Group 2 captures the vendor company name, S S DISTRIBUTOR.
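A short Python sketch running this pattern with the standard re module. Group 1 greedily takes the longest prefix of the customer reference that reappears after the vendor name (here "N139211508474"), and the backreference \1 forces group 2 to end right before that repeated text. This relies on the reference (or a prefix of it) being repeated after the name, as in the sample; if it is not, group 1 can shrink to an empty string and group 2 would then run up to the last space in the line.

```python
import re

record = ("Customer Reference N139211508474572 Entry Date 05/19/2021 "
          "Extra Information NEFT IN UTR FROM S S DISTRIBUTOR "
          "N139211508474 572TXN REF NO 23621001323")

# Group 1: start of the reference number; group 2: text between FROM and
# the place where group 1 occurs again (the vendor name)
pattern = re.compile(r"Customer Reference (.*).* FROM (.*) \1")
match = pattern.search(record)
if match:
    print(match.group(1))  # N139211508474
    print(match.group(2))  # S S DISTRIBUTOR
```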


Answer 2

You didn't specify which language you are using, and as @some-programmer-dude mentioned, regex is not necessarily the best tool for searching in a string.

You tagged python, so assuming that is what you are using, you can use split and index to take everything after FROM and drop the 2 words immediately before REF:

s = "Customer Reference N139211508474572 Entry Date 05/19/2021 Extra Information NEFT IN UTR FROM S S DISTRIBUTOR N139211508474 572TXN REF NO 23621001323"
# Convert everything to upper case first
s = s.upper()
# Collapse repeated whitespace, just in case
s = " ".join(s.split())
# Keep only the text after the first FROM
s = s.split("FROM", 1)[1]
# Keep the words before REF, dropping the 2 words right before it
index_of_REF = s.split().index("REF")
s = " ".join(s.split()[:index_of_REF-2])
print(s)

This gives:

S S DISTRIBUTOR
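Since the question mentions many records, here is a hedged sketch that wraps the same split/index rule in a function and returns None instead of raising when FROM or REF is missing. The function name extract_vendor and the second record are hypothetical, made up for illustration. It works on whole words, so a vendor name that merely contains the letters "FROM" is not split by mistake, and it searches for REF only after FROM.

```python
from typing import List, Optional


def extract_vendor(record: str) -> Optional[str]:
    """Hypothetical helper: words after FROM, minus the 2 words before REF."""
    words = record.upper().split()  # upper-case and normalise whitespace
    try:
        start = words.index("FROM") + 1
        end = words.index("REF", start) - 2  # look for REF only after FROM
    except ValueError:  # FROM or REF is missing
        return None
    if end <= start:  # nothing left between FROM and the dropped words
        return None
    return " ".join(words[start:end])


records: List[str] = [
    "Customer Reference N139211508474572 Entry Date 05/19/2021 Extra Information "
    "NEFT IN UTR FROM S S DISTRIBUTOR N139211508474 572TXN REF NO 23621001323",
    # hypothetical record without FROM/REF, made up for illustration
    "Customer Reference X1 Entry Date 05/19/2021 Extra Information CASH DEPOSIT",
]
for r in records:
    print(extract_vendor(r))  # S S DISTRIBUTOR, then None
```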

Question

2 answers

Answer 1
Score 1, posted 2021-06-04 06:18:05

Answer 2
Score 0, posted 2021-06-04 04:28:15
