Extract the rows from the text + python regex
I am trying to extract whole rows from a text file, but it is not working as expected.
Sample text file content:
data = """Add TTFF LEVERERGE 30 mp -5%
Some Text, Some Text
5882950 Abc Lahd
Pos Sequence Batch datax datay dataz dataa datab
1 00061680 904834 20.35 REV 177,650 5329,50
Bundled 2-rev 42al/xyz
Neon Classic Unit 1300 abc \ 1638\48
2 00012815 55244 815 FWD 164,720 18448,64
UnBundled 2-pag
Mathrine Classic straight Tilt 2 xyz / 23,2x23gb
150st/xyz 20 abc/xyz
3 90072815 65944 212 KRT 164,720 18448,64
UnBundled 2-pag
Mathrine Classic straight Tilt 2 xyz / 23,2x23gb
150st/bunt 20 bunt/bal
Some Valid Text
Some More Valid Text Some More Valid Text"""
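I want to extract all three rows, in list format, so that I can pull specific values from them.
The logic is:
(I have not treated #3 as a regex in this re.findall step, because the first two steps did not work.)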
re.findall(r'(^\d{1,2}\s.*?\n^\d)', data, re.DOTALL|re.M)
['1 00061680 904834 20.35 REV 177,650 5329,50\nBundled 2-rev 42al/xyz\nNeon Classic Unit 1300 abc \\ 1638\x048\n2',
'3 90072815 65944 212 KRT 164,720 18448,64\nUnBundled 2-pag\nMathrine Classic straight Tilt 2 xyz / 23,2x23gb\n1']
The expected result is:
['1 00061680 904834 20.35 REV 177,650 5329,50\nBundled 2-rev 42al/xyz\nNeon Classic Unit 1300 abc \\ 1638\x048\n',
'2 00012815 55244 815 FWD 164,720 18448,64\n UnBundled 2-pag\n Mathrine Classic straight Tilt 2 xyz / 23,2x23gb\n 150st/xyz 20 abc/xyz',
'3 90072815 65944 212 KRT 164,720 18448,64\nUnBundled 2-pag\nMathrine Classic straight Tilt 2 xyz / 23,2x23gb\n150st/bunt 20 bunt/bal']
Any guidance or help on extracting the rows from the text?
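If your regex has to "count" as part of the pattern, I would not use a regex but a parser. Regexes are for regular patterns, not for counting (although some people here create regexes I thought were not possible).
This is a simple, straightforward non-regex approach. Because you did not provide a meaningful "STOP HERE" marker, the last item has to be cleaned up. I highly doubt that ' Some Valid Text Some More Valid Text Some More Valid Text' will be part of your real text, so it cannot serve as a "stop" condition.
The output also does not contain the terminating '\n' characters, because I used them to split the text into, well, lines. You can add a '\n' when join()ing the parts if you really need them: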
data = """Add TTFF LEVERERGE 30 mp -5%
Some Text, Some Text
5882950 Abc Lahd
Pos Sequence Batch datax datay dataz dataa datab
1 00061680 904834 20.35 REV 177,650 5329,50
Bundled 2-rev 42al/xyz
Neon Classic Unit 1300 abc \ 1638\48
2 00012815 55244 815 FWD 164,720 18448,64
UnBundled 2-pag
Mathrine Classic straight Tilt 2 xyz / 23,2x23gb
150st/xyz 20 abc/xyz
3 90072815 65944 212 KRT 164,720 18448,64
UnBundled 2-pag
Mathrine Classic straight Tilt 2 xyz / 23,2x23gb
150st/bunt 20 bunt/bal
Some Valid Text
Some More Valid Text Some More Valid Text"""
rdata = data.split('\n')
skipprows = rdata.index('Pos Sequence Batch datax datay dataz dataa datab')
lines = rdata[skipprows + 1:]  # everything after the header line
i = 1        # looking for this + space at string start to see when one row is done
part = []    # collects the lines that belong to one row
result = []  # holds the joined rows built from part
for li in lines:
    if li.startswith(f'{i} '):             # look for row number + space
        if part:                           # do not add empty parts
            result.append(' '.join(part))  # add joined row if something is in it
        part = [li]                        # start the next row with the current line
        i += 1                             # increase so we look for the next number
    else:
        part.append(li)                    # continuation line of the current row
if part:                                   # add the last row if not empty
    result.append(' '.join(part))
print(result)                              # print all rows
Output:
['1 00061680 904834 20.35 REV 177,650 5329,50 Bundled 2-rev 42al/xyz Neon Classic Unit 1300 abc \\ 1638\x048',
'2 00012815 55244 815 FWD 164,720 18448,64 UnBundled 2-pag Mathrine Classic straight Tilt 2 xyz / 23,2x23gb 150st/xyz 20 abc/xyz',
'3 90072815 65944 212 KRT 164,720 18448,64 UnBundled 2-pag Mathrine Classic straight Tilt 2 xyz / 23,2x23gb 150st/bunt 20 bunt/bal Some Valid Text Some More Valid Text Some More Valid Text']
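As noted above, the last item picks up the trailing text and the '\n' separators are lost. A minimal sketch of one way to address both, assuming the trailing text can be recognised by a known prefix: the function name `group_rows` and the `stop_prefix` and `sep` parameters are hypothetical names introduced for illustration, and the shortened `sample` is made up to keep the example small.

```python
def group_rows(text, header, stop_prefix=None, sep="\n"):
    """Group physical lines into logical rows numbered 1, 2, 3, ..."""
    lines = text.split("\n")
    lines = lines[lines.index(header) + 1:]  # skip everything up to the header
    expected = 1   # row number we are waiting for
    part = []      # physical lines of the current row
    result = []    # finished rows
    for line in lines:
        # Assumption: trailing text starts with a known prefix.
        if stop_prefix is not None and line.startswith(stop_prefix):
            break
        if line.startswith(f"{expected} "):
            if part:
                result.append(sep.join(part))
            part = [line]
            expected += 1
        else:
            part.append(line)
    if part:
        result.append(sep.join(part))
    return result


sample = """Intro text
Pos Sequence Batch
1 00061680 904834
Bundled 2-rev
2 00012815 55244
UnBundled 2-pag
Some Valid Text
Some More Valid Text"""

rows = group_rows(sample, "Pos Sequence Batch", stop_prefix="Some Valid Text")
print(rows)
# ['1 00061680 904834\nBundled 2-rev', '2 00012815 55244\nUnBundled 2-pag']
```

Without `stop_prefix`, the function behaves like the loop above and the last row swallows the trailing lines.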
Caveat: if your lines happen to look like this:
1 Some thing to eat
and some more data of it, containing
2 packs each
2 Some other thing to eat to get more muscles
and even more text containing
3 things that make you BIGGGER
3 Last text ....
parsing becomes difficult and you will not get the correct data.
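To make the caveat concrete, here is a small sketch that runs the same counter-based grouping on those ambiguous lines. The helper name `group_by_counter` is hypothetical, and it simply repeats the loop from the answer so that the example is self-contained.

```python
def group_by_counter(lines):
    """Start a new row whenever a line begins with the next expected number."""
    expected, part, result = 1, [], []
    for line in lines:
        if line.startswith(f"{expected} "):
            if part:
                result.append(" ".join(part))
            part = [line]
            expected += 1
        else:
            part.append(line)
    if part:
        result.append(" ".join(part))
    return result


ambiguous = [
    "1 Some thing to eat",
    "and some more data of it, containing",
    "2 packs each",                      # continuation line that looks like row 2
    "2 Some other thing to eat to get more muscles",
    "and even more text containing",
    "3 things that make you BIGGGER",    # continuation line that looks like row 3
    "3 Last text ....",
]

rows = group_by_counter(ambiguous)
for row in rows:
    print(repr(row))
# '1 Some thing to eat and some more data of it, containing'
# '2 packs each 2 Some other thing to eat to get more muscles and even more text containing'
# '3 things that make you BIGGGER 3 Last text ....'
```

Three rows come back, but the boundaries are wrong: "2 packs each" and "3 things that make you BIGGGER" are taken as row starts, and the real row starts are glued onto them.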
Use the re.findall() function with a specific regex pattern:
import re

rows = re.findall(r'(^\d{1,2} .+?)(?=\n(?:\d+ |Some Valid Tex))', data, re.DOTALL | re.M)
print(rows)
Output:
['1 00061680 904834 20.35 REV 177,650 5329,50\nBundled 2-rev 42al/xyz\nNeon Classic Unit 1300 abc \\ 1638\x048', '2 00012815 55244 815 FWD 164,720 18448,64\nUnBundled 2-pag\nMathrine Classic straight Tilt 2 xyz / 23,2x23gb\n150st/xyz 20 abc/xyz', '3 90072815 65944 212 KRT 164,720 18448,64\nUnBundled 2-pag\nMathrine Classic straight Tilt 2 xyz / 23,2x23gb\n150st/bunt 20 bunt/bal']
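A short sketch of why the lookahead matters, using a made-up `sample` with a hypothetical `END` stop marker. The pattern from the question includes the next row's leading digit in the match, so that digit is consumed and the next row can no longer start a match. The lookahead `(?=...)` only checks what follows without consuming it.

```python
import re

sample = """1 first row
detail a
2 second row
detail b
3 third row
detail c
END"""

flags = re.DOTALL | re.M

# Question-style pattern: the trailing ^\d is part of the match, so the
# digit that starts row 2 is consumed and row 2 is skipped. Row 3 never
# matches because no digit-led line follows it.
consuming = re.findall(r'(^\d{1,2}\s.*?\n^\d)', sample, flags)
print(consuming)
# ['1 first row\ndetail a\n2']

# Lookahead pattern: the next row's number (or the assumed END marker)
# is only peeked at, so every row is found.
lookahead = re.findall(r'(^\d{1,2} .+?)(?=\n(?:\d+ |END))', sample, flags)
print(lookahead)
# ['1 first row\ndetail a', '2 second row\ndetail b', '3 third row\ndetail c']
```

Note that this approach shares the caveat above: a continuation line that itself starts with digits followed by a space would end the row early.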