
How to extract data from a specified position to some lines with a Python regex

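Thanks in advance. I have some text files as input in which the data follows a pattern, and I want to extract the data from these files. My code works for many files but fails on one format, so I would like some help.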

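First format, where I extract the data between Total and Words:

Total
254285.00
45771.30
300056 30
Amount in word:

Second format, where my code fails. Here I want to extract the 3 values before Total: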

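Original/DuplicatesTriplicate TAX INVOICE (Under Rule 46 of the Central Goods & Service Tax Rules, 2017) Page 1 of 1 KS LINGAPPA AND SON Industrial Area, Plot No 14. KSSIDC TBDam Road. Hosapete-
583201 State: Karnataka
State Code: 29
GSTIN: 29AAEFK8072G122 Phone: PAN: AAEFK8072G CIN
Invoice No: OS/20-21/5
Invoice Date: 29/08/2020
Bill To: Recepient Code: GSCPL Recepient Name: GREAT SANDS CONSULTING PRIVATE
LIMITED Address: 70, TUMKUR ROAD,YESHWANTHPUR,BANGALURU(Bangalore) Urban
Karnataka, 560022 GSTIN: 29AAECG5355M1Z3
State: Karnataka PAN: AAECG5355M
State Code: 29 Place of Supply: Karnataka
me Urbane Karnataka, 560022
Reverse Charge Applicable - N

SAC & Description
Total Tax Total Amount
Taxable Val SGST/
UTGST
Rate
SGSTI CGST UTGST Rate Amount
CGST Amount
IGST Rate
IGST Amount
11
998599 & Market Devlopment
47076.00
55549.68
8473.68
9.00
0.00
4236.84
9.00
4236.84
0.00
8473.68
55549.58
47076.00
Total:
Amount in words: Rupees Fifty Five Thousand Five Hundred Fourty Nine & Paisa
Sixty Eight Only
FO-K.S. LINGAS DINGAPA ASOSA
Ranil Jos
Partner
Authorised Signature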

# Loop body (fragment): runs once per line of the input file.
if line.strip() == "Total":
    copy = True
    continue
elif line.strip() == "Total:":
    copy = True
    continue
elif line.strip() == "Amount":
    copy = False
    continue
elif copy:
    # Store the first three lines that follow the "Total" marker.
    cnt = cnt + 1
    if cnt == 1:
        Taxable_Value.append(line)
    if cnt == 2:
        Total_Tax.append(line)
    if cnt == 3:
        Total_Amount.append(line)
        break

Please try the regex below to match the numbers, then take the values from the matched groups.

((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount
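To make the pattern above easier to read, here is the same expression written with `re.VERBOSE` and one comment per part. This is only an annotated sketch: the name `TOTALS` and the short `sample` string are made up for illustration. In verbose mode whitespace in the pattern is ignored except inside a character class, so the literal space in `[\d. ]` still matches.

```python
import re

# Same pattern as above, split over several lines so each part can be commented.
TOTALS = re.compile(r"""
    ((?:[\d. ]+\s){3})?   # group 1 (optional): three number lines just before "Total"
    Total:?\s             # the "Total" or "Total:" label and the whitespace after it
    ((?:[\d. ]+\s)*)      # group 2: any number lines between "Total" and "Amount"
    Amount                # the "Amount in word(s)" line that ends the block
""", re.VERBOSE)

# Abbreviated first format: the values follow "Total", so they land in group 2.
sample = "Total\n254285.00\n45771.30\n300056 30\nAmount in word:\n"
match = TOTALS.search(sample)
print(match.group(1))               # None, nothing captured before "Total"
print(match.group(2).splitlines())  # the three value lines
```

Only one of the two groups is filled for a given block, which is why the code picks whichever one is non-empty.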

Code

import re
case1="""
thanks in advance I have some text files as input in which data is in some pattern I want to extract the data from these text files. My code is working for many files but it fails at one point so I would like some help

first format where i am extracting data between Total and Words

Total
254285.00
45771.30
300056 30
Amount in word:

Second format where my code fails i want to extract 3 values before Total words

Original/DuplicatesTriplicate TAX INVOICE (Under Rule 46 of the Central Goods & Service Tax Rules, 2017) Page 1 of 1 KS LINGAPPA AND SON Industrial Area, Plot No 14. KSSIDC TBDam Road. Hosapete-
583201 State: Karnataka
State Code: 29
GSTIN: 29AAEFK8072G122 Phone: PAN: AAEFK8072G CIN
Invoice No: OS/20-21/5
Invoice Date: 29/08/2020
Bill To: Recepient Code: GSCPL Recepient Name: GREAT SANDS CONSULTING PRIVATE
LIMITED Address: 70, TUMKUR ROAD,YESHWANTHPUR,BANGALURU(Bangalore) Urban
Karnataka, 560022 GSTIN: 29AAECG5355M1Z3
State: Karnataka PAN: AAECG5355M
State Code: 29 Place of Supply: Karnataka
me Urbane Karnataka, 560022
Reverse Charge Applicable - N

SAC & Description
Total Tax Total Amount
Taxable Val SGST/
UTGST
Rate
SGSTI CGST UTGST Rate Amount
CGST Amount
IGST Rate
IGST Amount
11
998599 & Market Devlopment
47076.00
55549.68
8473.68
9.00
0.00
4236.84
9.00
4236.84
0.00
8473.68
55549.58
47076.00
Total:
Amount in words: Rupees Fifty Five Thousand Five Hundred Fourty Nine & Paisa
Sixty Eight Only
FO-K.S. LINGAS DINGAPA ASOSA
Ranil Jos
Partner
Authorised Signature
"""

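# Raw string so \d and \s reach the regex engine without invalid-escape warnings.
output = re.findall(r"((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount", case1)
for o in output:
    # Group 1 holds the values before "Total", group 2 the values after it.
    print(o[0] or o[1])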

Output

254285.00
45771.30
300056 30


8473.68
55549.58
47076.00
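The blank lines in this output have two sources. Each captured block ends with a newline, to which `print` adds its own. The pattern also matches the table header `Total Tax Total Amount`, where `Total` is followed directly by `Amount`, so that match has both groups empty. A minimal sketch of one way to drop the empty match and return the values as lists; `extract_totals` is a hypothetical helper name, and `sample` is an abbreviated stand-in for the two formats:

```python
import re

PATTERN = re.compile(r"((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount")

def extract_totals(text):
    """Return one list of value strings per non-empty match (hypothetical helper)."""
    results = []
    for before, after in PATTERN.findall(text):
        block = before or after
        # Split the captured block into lines and drop empty ones.
        values = [v.strip() for v in block.splitlines() if v.strip()]
        if values:  # skips matches such as the "Total Amount" table header
            results.append(values)
    return results

# Abbreviated stand-in for the two formats, including the header line.
sample = (
    "Total\n254285.00\n45771.30\n300056 30\nAmount in word:\n"
    "Total Tax Total Amount\n"
    "9.00\n8473.68\n55549.58\n47076.00\nTotal:\nAmount in words: Rupees\n"
)
for values in extract_totals(sample):
    print(values)
```

Returning lists instead of printing makes it easier to append the three values to separate lists such as `Taxable_Value`, `Total_Tax` and `Total_Amount`.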


If you are reading the content from a file, use the code below.

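import re

# Read the whole file into one string so the pattern can match across lines.
with open("test.txt", "r") as f:
    case1 = f.read()

output = re.findall(r"((?:[\d. ]+\s){3})?Total:?\s((?:[\d. ]+\s)*)Amount", case1)
for o in output:
    # Group 1 holds the values before "Total", group 2 the values after it.
    print(o[0] or o[1])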
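Note that the three values are not in the same order in the two sample formats. In the first they read as taxable value, tax, total. In the second they read as tax, total, taxable value. The sketch below shows one possible way to label them regardless of order. It rests on two assumptions that the question does not confirm: that the tax is always the smallest figure and the total the largest, and that a space inside a number (as in `300056 30`) is a lost decimal point. `label_values` is a hypothetical helper name.

```python
def label_values(values):
    """Label three amount strings by size (hypothetical helper).

    Assumes tax < taxable value < total, which holds for both samples
    but is not guaranteed for every invoice.
    """
    # Guess: a space inside a number ("300056 30") stands for a decimal point.
    amounts = sorted(float(v.strip().replace(" ", ".")) for v in values)
    # Unpacking raises ValueError unless exactly three values are given.
    total_tax, taxable_value, total_amount = amounts
    return {
        "Taxable_Value": taxable_value,
        "Total_Tax": total_tax,
        "Total_Amount": total_amount,
    }

print(label_values(["254285.00", "45771.30", "300056 30"]))
print(label_values(["8473.68", "55549.58", "47076.00"]))
```

If the files always list the values in a known order, indexing the list directly is simpler and avoids the size assumption.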
