How to extract data from a specified position across several lines with Python regex
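Thanks in advance. I have some text files as input in which the data follows a pattern, and I want to extract the data from these files. My code works for many files, but it fails in one case, so I would like some help.
First format, where I am extracting the data between Total and Words:
Total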
254285.00
45771.30
300056 30
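Amount in word:
Second format, where my code fails. Here I want to extract the 3 values before the word Total:
Original/DuplicatesTriplicate TAX INVOICE (Under Rule 46 of the Central Goods & Service Tax Rules, 2017) Page 1 of 1 KS LINGAPPA AND SON Industrial Area, Plot No 14. KSSIDC TBDam Road. Hosapete-
583201 State: Karnataka
State Code: 29
GSTIN: 29AAEFK8072G122 Phone: PAN: AAEFK8072G CIN
Invoice No: OS/20-21/5
Invoice Date: 29/08/2020
Bill To: Recepient Code: GSCPL Recepient Name: GREAT SANDS CONSULTING PRIVATE
LIMITED Address: 70, TUMKUR ROAD,YESHWANTHPUR,BANGALURU(Bangalore) Urban
Karnataka, 560022 GSTIN: 29AAECG5355M1Z3
State: Karnataka PAN: AAECG5355M
State Code: 29 Place of Supply: Karnataka
me Urbane Karnataka, 560022
Reverse Charge Applicable - N
SAC & Description
Total Tax Total Amount
Taxable Val SGST/
UTGST
Rate
SGSTI CGST UTGST Rate Amount
CGST Amount
IGST Rate
IGST Amount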
11
998599 & Market Devlopment
47076.00
55549.68
8473.68
9.00
0.00
4236.84
9.00
4236.84
0.00
8473.68
55549.58
47076.00
Total:
Amount in words: Rupees Fifty Five Thousand Five Hundred Fourty Nine & Paisa
Sixty Eight Only
FO-K.S. LINGAS DINGAPA ASOSA
Ranil Jos
Partner
Authorised Signature
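# My code. The loop header and the initial values are assumed here;
# the post showed only the loop body.
Taxable_Value, Total_Tax, Total_Amount = [], [], []
copy = False
cnt = 0
for line in lines:  # lines: the lines of one input file (assumed name)
    if line.strip() == "Total":
        copy = True
        continue
    if line.strip() == "Total:":
        copy = True
        continue
    elif line.strip() == "Amount":
        copy = False
        continue
    elif copy:
        # Take the next three lines after the "Total" marker.
        cnt = cnt + 1
        if cnt == 1:
            Taxable_Value.append(line)
        if cnt == 2:
            Total_Tax.append(line)
        if cnt == 3:
            Total_Amount.append(line)
            break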
Try the regular expression below to match the numbers, then take the matched groups.
((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount
Code
import re
case1="""
thanks in advance I have some text files as input in which data is in some pattern I want to extract the data from these text files. My code is working for many files but it fails at one point so I would like some help
first format where i am extracting data between Total and Words
Total
254285.00
45771.30
300056 30
Amount in word:
Second format where my code fails i want to extract 3 values before Total words
Original/DuplicatesTriplicate TAX INVOICE (Under Rule 46 of the Central Goods & Service Tax Rules, 2017) Page 1 of 1 KS LINGAPPA AND SON Industrial Area, Plot No 14. KSSIDC TBDam Road. Hosapete-
583201 State: Karnataka
State Code: 29
GSTIN: 29AAEFK8072G122 Phone: PAN: AAEFK8072G CIN
Invoice No: OS/20-21/5
Invoice Date: 29/08/2020
Bill To: Recepient Code: GSCPL Recepient Name: GREAT SANDS CONSULTING PRIVATE
LIMITED Address: 70, TUMKUR ROAD,YESHWANTHPUR,BANGALURU(Bangalore) Urban
Karnataka, 560022 GSTIN: 29AAECG5355M1Z3
State: Karnataka PAN: AAECG5355M
State Code: 29 Place of Supply: Karnataka
me Urbane Karnataka, 560022
Reverse Charge Applicable - N
SAC & Description
Total Tax Total Amount
Taxable Val SGST/
UTGST
Rate
SGSTI CGST UTGST Rate Amount
CGST Amount
IGST Rate
IGST Amount
11
998599 & Market Devlopment
47076.00
55549.68
8473.68
9.00
0.00
4236.84
9.00
4236.84
0.00
8473.68
55549.58
47076.00
Total:
Amount in words: Rupees Fifty Five Thousand Five Hundred Fourty Nine & Paisa
Sixty Eight Only
FO-K.S. LINGAS DINGAPA ASOSA
Ranil Jos
Partner
Authorised Signature
"""
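# Raw string so that \d and \s reach the regex engine unchanged.
output = re.findall(r"((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount", case1)
for o in output:
    # o[0]: the three values before "Total:"; o[1]: the values after "Total".
    print(o[0] or o[1])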
Output
254285.00
45771.30
300056 30
8473.68
55549.58
47076.00
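To make the pattern easier to read, here is a sketch of the same regex written with `re.VERBOSE`, so that each part carries a comment. The named groups `before` and `after`, the name `PATTERN`, and the two shortened sample strings are illustrative additions and not part of the original answer.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
# The group names (before, after) are illustrative, not from the answer.
PATTERN = re.compile(
    r"""
    (?P<before>(?:[\d. ]+\s){3})?   # optional: three numeric lines right before Total
    Total:?\s                       # the word Total, an optional colon, one whitespace char
    (?P<after>(?:[\d. ]+\s)*)       # zero or more numeric lines right after Total
    Amount                          # the match has to end at the word Amount
    """,
    re.VERBOSE,
)

# Shortened versions of the two formats from the question.
first_format = "Total\n254285.00\n45771.30\n300056 30\nAmount in word:\n"
second_format = "0.00\n8473.68\n55549.58\n47076.00\nTotal:\nAmount in words: Rupees\n"

m1 = PATTERN.search(first_format)
m2 = PATTERN.search(second_format)

# First format: the values follow Total, so they land in the "after" group.
after_total = m1.group("after").splitlines()
# Second format: the values precede Total:, so they land in the "before" group.
before_total = m2.group("before").splitlines()

print(after_total)
print(before_total)
```

Inside a character class such as `[\d. ]`, the space is kept even in verbose mode, which is why the pattern behaves the same as the one-line version.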
If you are reading the content from a file, use the code below.
import re

with open("test.txt", "r") as f:
    case1 = f.read()

# Raw string so that \d and \s reach the regex engine unchanged.
output = re.findall(r"((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount", case1)
for o in output:
    print(o[0] or o[1])
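As a possible follow-up, the printing loop can be wrapped in a small helper that returns the values as lists instead of printing them. This is only a sketch: `extract_totals` and `TOTAL_RE` are hypothetical names, not from the answer. It assumes the same pattern, and it drops matches where both groups are empty, which happens for a header line such as `Total Tax Total Amount`, since the pattern also matches the bare text `Total Amount`.

```python
import re

# Same pattern as in the answer, compiled once (TOTAL_RE is an assumed name).
TOTAL_RE = re.compile(r"((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount")


def extract_totals(text):
    """Return one list of value strings per non-empty match in text."""
    results = []
    for before, after in TOTAL_RE.findall(text):
        # Prefer the values before "Total:", fall back to the ones after "Total".
        block = before or after
        values = [line.strip() for line in block.splitlines() if line.strip()]
        if values:  # skip matches that captured no numbers at all
            results.append(values)
    return results


# Shortened samples of the two formats; the second one includes the header
# line that produces an empty match.
first_sample = "Total\n254285.00\n45771.30\n300056 30\nAmount in word:\n"
second_sample = (
    "Total Tax Total Amount\n11\n0.00\n8473.68\n55549.58\n47076.00\n"
    "Total:\nAmount in words: Rupees\n"
)

print(extract_totals(first_sample))
print(extract_totals(second_sample))
```

Returning lists keeps the three values separate, so they can be assigned to fields such as the question's `Taxable_Value`, `Total_Tax`, and `Total_Amount` once it is known which position holds which value in each format.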