
Python: extract data from a text file using regex

I want to extract the firm name (Samsung India Electronics Pvt. Ltd.) from my text file. It appears on the line after "Firm Name". My code already extracts some of the data, but I am not able to extract the firm name because I am new to Python and Python regex.

import re

# Path to the exported invoice text file
path = r'C:\Users\sachin.s\Downloads\wordFile_Billing_PrintDocument_7528cc93-3644-4e38-a7b3-10f721fa2049.txt'

# The with-block closes the file automatically
with open(path) as hand:
    for line in hand:
        line = line.rstrip()
        # Print every line that carries one of the labelled fields
        if re.search(r'Order Number\S*: [0-9.]+', line):
            print(line)
        if re.search(r'Invoice No\S*: [0-9.]+', line):
            print(line)
        if re.search(r'Invoice Date\S*: [0-9.]+', line):
            print(line)
        if re.search(r'PO No\S*: [0-9.]+', line):
            print(line)

Firm Name: Address:
Samsung India Electronics Pvt. Ltd.
Regd Office: 6th Floor, DLF Centre, Sansad Marg, New Delhi-110001

SAMSUNG INDIA ELECTRONICS PVT LTD, MEDCHAL MANDAL HYDERABAD

RANGA REDDY DISTRICT HYDERABAD TELANGANA 501401 Phone: 1234567 Fax No: Branch: S5S2 - [SIEL]HYDERABAD
Order Number: 1403543436
Currency: INR
Invoice No: 36S2I0030874
Invoice Date: 15.12.2018
PI No: 5929947652

Use a regex that captures the text between "Address:" and "Regd":

import re

data = """
Firm Name: Address:
Samsung India Electronics Pvt. Ltd.
Regd Office: 6th Floor, DLF Centre, Sansad Marg, New Delhi-110001

SAMSUNG INDIA ELECTRONICS PVT LTD, MEDCHAL MANDAL HYDERABAD

RANGA REDDY DISTRICT HYDERABAD TELANGANA 501401 Phone: 1234567 Fax No: Branch: S5S2 - [SIEL]HYDERABAD
Order Number: 1403543436
Currency: INR
Invoice No: 36S2I0030874
Invoice Date: 15.12.2018
PI No: 5929947652
"""

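# Non-greedy capture of the text between "Address:" and "Regd".
# re.DOTALL lets "." match newlines; strip() drops the surrounding line breaks.
result = re.findall(r'Address:(.*?)Regd', data, re.DOTALL)[0].strip()
print(result)

Output:

Samsung India Electronics Pvt. Ltd.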
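
If you would rather keep the line-by-line loop from the question, here is a rough sketch of that approach. It assumes the firm name is always the first non-empty line after the line starting with "Firm Name", and that the other fields each sit on their own line as "Label: value". The helper names extract_firm_name and extract_fields are made up for this example, and the sample text is a shortened copy of the data above.

```python
import re

def extract_firm_name(lines):
    """Return the first non-empty line after a line starting with 'Firm Name'."""
    it = iter(lines)
    for line in it:
        if line.strip().startswith("Firm Name"):
            # Keep consuming the same iterator to reach the following lines
            for following in it:
                if following.strip():
                    return following.strip()
    return None  # label not found, or nothing after it

def extract_fields(lines, labels=("Order Number", "Invoice No", "Invoice Date", "PI No")):
    """Collect 'Label: value' pairs for lines that start with one of the labels."""
    alternatives = "|".join(re.escape(label) for label in labels)
    pattern = re.compile(r"^(" + alternatives + r")\s*:\s*(\S+)")
    fields = {}
    for line in lines:
        match = pattern.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields

# Shortened copy of the sample data from the question
sample = """Firm Name: Address:
Samsung India Electronics Pvt. Ltd.
Regd Office: 6th Floor, DLF Centre, Sansad Marg, New Delhi-110001
Order Number: 1403543436
Currency: INR
Invoice No: 36S2I0030874
Invoice Date: 15.12.2018
PI No: 5929947652
"""

lines = sample.splitlines()
print(extract_firm_name(lines))
print(extract_fields(lines))
```

With a real file you can pass the open file object instead of the list, since both helpers only iterate over lines. This version does not depend on the word "Regd" following the firm name, which may be safer if the layout varies between invoices.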
