
How to extract data from a specified position to some lines with Python regex

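Thanks in advance. I have some text files as input in which the data follows a pattern, and I want to extract the data from these files. My code works for many files but fails in one case, so I would like some help.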

First format, where I extract the data between Total and the "Amount in word:" line:

Total
254285.00
45771.30
300056 30
Amount in word:

Second format, where my code fails. Here I want to extract the 3 values that come before the Total line:

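Original/DuplicatesTriplicate TAX INVOICE (Under Rule 46 of the Central Goods & Service Tax Rules, 2017) Page 1 of 1 KS LINGAPPA AND SON Industrial Area, Plot No 14. KSSIDC TBDam Road. Hosapete-
583201 State: Karnataka
State Code: 29
GSTIN: 29AAEFK8072G122 Phone: PAN: AAEFK8072G CIN
Invoice No: OS/20-21/5
Invoice Date: 29/08/2020
Bill To: Recepient Code: GSCPL Recepient Name: GREAT SANDS CONSULTING PRIVATE
LIMITED Address: 70, TUMKUR ROAD,YESHWANTHPUR,BANGALURU(Bangalore) Urban
Karnataka, 560022 GSTIN: 29AAECG5355M1Z3
State: Karnataka PAN: AAECG5355M
State Code: 29 Place of Supply: Karnataka
me Urbane Karnataka, 560022
Reverse Charge Applicable - N

SAC & Description
Total Tax Total Amount
Taxable Val SGST/
UTGST
Rate
SGSTI CGST UTGST Rate Amount
CGST Amount
IGST Rate
IGST Amount
11
998599 & Market Devlopment
47076.00
55549.68
8473.68
9.00
0.00
4236.84
9.00
4236.84
0.00
8473.68
55549.58
47076.00
Total:
Amount in words: Rupees Fifty Five Thousand Five Hundred Fourty Nine & Paisa
Sixty Eight Only
FO-K.S. LINGAS DINGAPA ASOSA
Ranil Jos
Partner
Authorised Signature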

if line.strip() == "Total":
    copy = True
    continue
if line.strip() == "Total:":
    copy = True
    continue
elif line.strip() == "Amount":
    copy = False
    continue

elif copy:
    cnt = cnt + 1
    if cnt == 1:
        Taxable_Value.append(line)
    if cnt == 2:
        Total_Tax.append(line)
    if cnt == 3:
        Total_Amount.append(line)
        break

Please try the regex below to match the numbers, then take the values from the matched groups.

((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount
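To make the pieces of this pattern easier to follow, here is a minimal sketch that writes the same regex in verbose mode with one comment per part. The name TOTAL_PATTERN and the two short sample strings are only illustrative and are not taken from the question's files.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part can carry a comment.
# Whitespace inside a character class such as [\d. ] is still significant in verbose mode.
TOTAL_PATTERN = re.compile(
    r"""
    ((?:[\d. ]+\s){3})?   # group 1 (optional): three numeric lines directly before "Total"
    Total:?\s             # the literal "Total" or "Total:", then one whitespace character
    ((?:[\d. ]+\s)*)      # group 2: zero or more numeric lines directly after "Total"
    Amount                # the next label has to start with "Amount"
    """,
    re.VERBOSE,
)

# Hypothetical, shortened versions of the two formats from the question.
sample_after = "Total\n254285.00\n45771.30\n300056 30\nAmount in word:\n"
sample_before = "8473.68\n55549.58\n47076.00\nTotal:\nAmount in words: Rupees\n"

match_after = TOTAL_PATTERN.search(sample_after)
match_before = TOTAL_PATTERN.search(sample_before)

print(repr(match_after.group(2)))   # values that follow "Total"
print(repr(match_before.group(1)))  # values that precede "Total:"
```

In the first format the values end up in group 2 and group 1 does not take part in the match. In the second format it is the other way round, which is why the code picks whichever group is non-empty.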

Code

import re
case1="""
thanks in advance I have some text files as input in which data is in some pattern I want to extract the data from these text files. My code is working for many files but it fails at one point so I would like some help

first format where i am extracting data between Total and Words

Total
254285.00
45771.30
300056 30
Amount in word:

Second format where my code fails i want to extract 3 values before Total words

Original/DuplicatesTriplicate TAX INVOICE (Under Rule 46 of the Central Goods & Service Tax Rules, 2017) Page 1 of 1 KS LINGAPPA AND SON Industrial Area, Plot No 14. KSSIDC TBDam Road. Hosapete-
583201 State: Karnataka
State Code: 29
GSTIN: 29AAEFK8072G122 Phone: PAN: AAEFK8072G CIN
Invoice No: OS/20-21/5
Invoice Date: 29/08/2020
Bill To: Recepient Code: GSCPL Recepient Name: GREAT SANDS CONSULTING PRIVATE
LIMITED Address: 70, TUMKUR ROAD,YESHWANTHPUR,BANGALURU(Bangalore) Urban
Karnataka, 560022 GSTIN: 29AAECG5355M1Z3
State: Karnataka PAN: AAECG5355M
State Code: 29 Place of Supply: Karnataka
me Urbane Karnataka, 560022
Reverse Charge Applicable - N

SAC & Description
Total Tax Total Amount
Taxable Val SGST/
UTGST
Rate
SGSTI CGST UTGST Rate Amount
CGST Amount
IGST Rate
IGST Amount
11
998599 & Market Devlopment
47076.00
55549.68
8473.68
9.00
0.00
4236.84
9.00
4236.84
0.00
8473.68
55549.58
47076.00
Total:
Amount in words: Rupees Fifty Five Thousand Five Hundred Fourty Nine & Paisa
Sixty Eight Only
FO-K.S. LINGAS DINGAPA ASOSA
Ranil Jos
Partner
Authorised Signature
"""

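# Raw string so the backslashes in \d and \s are passed to the regex engine unchanged
output = re.findall(r"((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount", case1)
for o in output:
    # Group 1 holds values before "Total", group 2 values after it; print whichever matched
    print(o[0] or o[1])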

Output

254285.00
45771.30
300056 30


8473.68
55549.58
47076.00
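One thing to be aware of: the table header "Total Tax Total Amount" in the invoice also satisfies the pattern, with both groups empty, and that match accounts for one of the blank lines in the output. If you want clean lists of values instead of printed blocks, something along these lines may help. It is only a sketch; the function name extract_totals and the shortened sample text are assumptions made for illustration.

```python
import re

PATTERN = re.compile(r"((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount")

def extract_totals(text):
    """Return one list of value strings per match that actually captured numbers."""
    results = []
    for before, after in PATTERN.findall(text):
        block = before or after  # findall gives '' for a group that did not take part
        values = [line.strip() for line in block.splitlines() if line.strip()]
        if values:  # skip matches such as "Total Amount" where both groups are empty
            results.append(values)
    return results

# Hypothetical, shortened input that contains the header line and both formats.
sample = (
    "Total Tax Total Amount\n"
    "Total\n254285.00\n45771.30\n300056 30\nAmount in word:\n"
    "0.00\n8473.68\n55549.58\n47076.00\nTotal:\nAmount in words: Rupees\n"
)

totals = extract_totals(sample)
print(totals)
```

Each inner list keeps the values in the order they appear in the file. Note that the two formats do not list taxable value, tax and total amount in the same order, so any mapping to named fields would have to be decided per format.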

Regex demo

If you are reading the content from a file, use the code below.

import re

# Read the whole file at once so the pattern can match across line breaks
with open("test.txt", "r") as f:
    case1 = f.read()

output = re.findall(r"((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount", case1)
for o in output:
    print(o[0] or o[1])
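If you would rather stay with a line-by-line loop like the one in the question, a rough alternative is to remember the last three numeric lines while scanning, and fall back to them when nothing numeric follows "Total". This is a sketch under assumptions, not a drop-in replacement: the helper names is_number_line and values_around_total are made up here, and a "numeric line" is assumed to contain only digits, dots and spaces.

```python
from collections import deque

def is_number_line(line):
    """True if the stripped line has at least one digit and only digits, dots or spaces."""
    text = line.strip()
    return (
        any(ch.isdigit() for ch in text)
        and all(ch.isdigit() or ch in ". " for ch in text)
    )

def values_around_total(lines):
    """Return the numeric lines after 'Total', or else the three numeric lines before it."""
    recent = deque(maxlen=3)  # the last three consecutive numeric lines seen so far
    for index, line in enumerate(lines):
        if line.strip().rstrip(":") == "Total":
            after = []
            for following in lines[index + 1:]:
                if not is_number_line(following):
                    break
                after.append(following.strip())
            return after or list(recent)
        if is_number_line(line):
            recent.append(line.strip())
        else:
            recent.clear()  # the values must sit directly before "Total"
    return []

# Hypothetical, shortened versions of the two formats.
first_format = ["Total", "254285.00", "45771.30", "300056 30", "Amount in word:"]
second_format = ["0.00", "8473.68", "55549.58", "47076.00", "Total:", "Amount in words: Rupees"]

print(values_around_total(first_format))
print(values_around_total(second_format))
```

The deque with maxlen=3 drops older values automatically, so only the three lines immediately above "Total:" survive. For a real file you could pass f.read().splitlines() as the list of lines.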
