
How to extract data from a paragraph using regex

Customer Reference N139211508474572 Entry Date 05/19/2021 Extra Information NEFT IN UTR FROM S S DISTRIBUTOR N139211508474 572TXN REF NO 23621001323

How can I extract the vendor company name (S S DISTRIBUTOR in this example)? The keyword FROM is constant across all the records I have. I already wrote a regex that extracts the customer reference number, (?<=Customer Reference ).+(?= Entry Date), and it works. I need a pattern that extracts the vendor company name.

The customer reference number is not constant: it can be a mix of digits and letters, or digits only.

Assuming the vendor company name is located between the keyword FROM and the repeated customer reference number, would you please try:

Customer Reference (.*).* FROM (.*) \1

Group 2 captures the vendor company name, S S DISTRIBUTOR.
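As a rough sketch, here is how that pattern could be applied in Python with the standard re module. The variable names (text, pattern, match, vendor) are only illustrative, and the sample string is the one from the question.

```python
import re

text = ("Customer Reference N139211508474572 Entry Date 05/19/2021 "
        "Extra Information NEFT IN UTR FROM S S DISTRIBUTOR "
        "N139211508474 572TXN REF NO 23621001323")

# Group 1 takes the leading part of the customer reference.
# The backreference \1 requires that same text to reappear after the vendor name.
pattern = re.compile(r"Customer Reference (.*).* FROM (.*) \1")

match = pattern.search(text)
# Group 2 is whatever sits between "FROM " and the repeated reference.
vendor = match.group(2) if match else None
print(vendor)
```

Note that group 1 is allowed to shrink while the engine backtracks, so on a record where the reference does not reappear after the vendor name it may end up very short or empty, and group 2 would then capture too much. Checking that match.group(1) looks like a plausible reference is a reasonable safeguard.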


You didn't specify which language you are using, and as @some-programmer-dude mentioned, regex is not necessarily the best tool for searching in a string.

You tagged the question python, so assuming that is what you are using, you could use split and index to take everything after FROM up to two words before REF:

s = "Customer Reference N139211508474572 Entry Date 05/19/2021 Extra Information NEFT IN UTR FROM S S DISTRIBUTOR N139211508474 572TXN REF NO 23621001323"
# Convert all to UPPER case first
s = s.upper()
# Clean unnecessary whitespaces first just in case
s = " ".join(s.split())
# Keep only the text after the first FROM
s = s.split("FROM", 1)[1]
# Keep the words up to two positions before REF
# (this drops the two pieces of the repeated reference number)
index_of_REF = s.split().index("REF")
s = " ".join(s.split()[:index_of_REF-2])
print(s)

This gives:

S S DISTRIBUTOR
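If you have many records, the same idea could be wrapped in a small function. This is only a sketch: extract_vendor and its parameters (start, end, skip) are hypothetical names, and it assumes every record follows the layout of the sample, with the reference number taking up exactly two words before REF. It matches whole words, so REFERENCE is not mistaken for REF, and it returns None when a marker is missing.

```python
def extract_vendor(record, start="FROM", end="REF", skip=2):
    """Return the words after `start`, stopping `skip` words before `end`.

    Returns None if either marker is missing or nothing is left between them.
    """
    words = record.upper().split()
    try:
        start_idx = words.index(start)
        # Look for the end marker only after the start marker.
        end_idx = words.index(end, start_idx + 1)
    except ValueError:
        return None
    # Never let the stop position move in front of the first name word.
    stop = max(end_idx - skip, start_idx + 1)
    return " ".join(words[start_idx + 1:stop]) or None


sample = ("Customer Reference N139211508474572 Entry Date 05/19/2021 "
          "Extra Information NEFT IN UTR FROM S S DISTRIBUTOR "
          "N139211508474 572TXN REF NO 23621001323")
print(extract_vendor(sample))
```

The skip value is the fragile part: if the reference number is not split into two words in some records, it would need adjusting for them.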


 