Using regex to extract two elements from txt file and rename (python)
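I am trying to rename a bunch of payslip txt files in Python using regex. The elements I want to use for this are the personnummer (social security number) and the datum (date). The personnummer has the format `\d\d\d\d\d\d-\d\d\d\d`, and on its own it works fine with the code below.
However, when I try to add the datum alongside the personnummer, which has the format `GFROM:\d\d\d\d\d\d\d\d` (I only want the digits, not the GFROM part), I get a syntax error.
Do you have any suggestions? I have looked through previous posts but did not really find anything.
Thanks in advance.
/Andrew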
```python
import os
import re

mydir = 'C:/Users/atutt-wi/Desktop/USB/Matrikelkort/matrikelkort prov'
personnummer = "(\d\d\d\d\d\d\-\d\d\d\d)"
datum = "(GFROM:(\d\d\d\d\d\d\d\d))"

for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(personnummer, txt)
    t = re.search(datum, txt)
    name = '19' + s.group() + ' ' + '20' + t.group() + ' Matrikelkort'+ '.txt'
    newpath = os.path.join(mydir, name)
    os.rename(archpath, newpath)
```
**The input files look like this:**
DATUM: 010122 KUND:20290
XXX KOMMUN SIDA: 23 70677
PERSONS NAME UTB-KOD ANS.DAT: 010206-3008
BOK/ G T ARBETS- ARB ARB L L P B BRUT L FAST
GÄLLER GÄLLER AVG LÖP AV CAK/ BEFATTNINGS R Y ANST TIDS TID TID P G L L AVDR K BLPP BELOPP LÖNE UPP DEL
FR O M T O M KOD FÖR DB NR TAL BSK -BENÄMNING P P FORM VILLKOR % HEL L R G G FROM L FROM FIP*A lÖN TIML OMF PEN
----------------------------------------------------------------------------------------------------------------------------------------
760701 790630 110 83 20 5070LOK HEMSAMARIT 5 1 4 10004000 Ö 7607 000000 800 000000
790701 800108 970 76 21 5017ANA-T HEMSAMARIT 5T1 4 00004000 K 077907 000000000000 000000
KUNDNR:20290 SIDA: 023 70677 GFROM:19760701 GTOM:19800108 PERSONS NAME 010206-3008
000001L 2 000001010122 33399CMT011MATRIKELKORT Matrikelkort 000001CMZ029050330-7118 01-01-22 CMZ02901
120290
**The errors I got**
runfile('C:/Users/atutt-wi/Desktop/USB/regex personnummer och datum matrikelkort tool.py', wdir='C:/Users/atutt-wi/Desktop/USB')
Traceback (most recent call last):
File "<ipython-input-21-f7cd01adb9a3>", line 1, in <module>
runfile('C:/Users/atutt-wi/Desktop/USB/regex personnummer och datum matrikelkort tool.py', wdir='C:/Users/atutt-wi/Desktop/USB')
File "C:\Users\atutt-wi\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py",
line 827, in runfile
execfile(filename, namespace)
File "C:\Users\atutt-wi\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py",
line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/atutt-wi/Desktop/USB/regex personnummer och datum matrikelkort tool.py", line 24, in <module>
os.rename(archpath, newpath)
OSError: [WinError 123] Incorrect syntax for file name,
directory name or volume label: 'C:/Users/atutt-wi/Desktop/USB/Matrikelkort/matrikelkort prov\\File17.txt' ->
'C:/Users/atutt-wi/Desktop/USB/Matrikelkort/matrikelkort prov\\010206-3008 20GFROM:19760701 Matrikelkort.txt'
**Update: When I removed the ':' from GFROM, I got the following error**
File "C:\Users\atutt-wi\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:\Users\atutt-wi\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/atutt-wi/Desktop/USB/regex personnummer och datum matrikelkort tool.py", line 22, in <module>
name = '19' + s.group() + ' ' + '20' + t.group() + ' Matrikelkort'+ '.txt'
AttributeError: 'NoneType' object has no attribute 'group'
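Here is a snippet you can try:

```python
import os
import re

# Directory taken from the question
mydir = 'C:/Users/atutt-wi/Desktop/USB/Matrikelkort/matrikelkort prov'

rx_num = re.compile(r"\s(\d{6}-\d{4})\s", re.M)
rx_dat = re.compile(r"GFROM:(\d{8})\s", re.M)

for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()

    # group(1) returns only the captured digits, without the surrounding
    # whitespace or the "GFROM:" prefix (":" is not allowed in Windows file names)
    s_match = rx_num.search(txt)
    s = s_match.group(1) if s_match is not None else "[Missing]"
    t_match = rx_dat.search(txt)
    t = t_match.group(1) if t_match is not None else "[Missing]"

    # Name format kept from the question
    name = '19' + s + ' ' + '20' + t + ' Matrikelkort' + '.txt'
    newpath = os.path.join(mydir, name)
    os.rename(archpath, newpath)
```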
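Using `compile` is optional, but I find it cleaner. I also added `re.M`, the "multiline" flag; note that it only changes how `^` and `$` match, so it has no effect on these particular patterns. Finally, I added `\s` before and after the group so that strings like 'abd123456-7890def' do not match. Also keep in mind that this code only gives you the first match. If you want every match, try `findall`.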
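To see why the two errors in the question occur, it may help to run the patterns on a couple of lines from the sample file without touching the file system. The sketch below is only an illustration: `build_name` is a hypothetical helper, and its name format (keeping the `'19'` prefix but dropping the `'20'` prefix, since the GFROM value already has a four-digit year) is an assumption about what is wanted, not something the question confirms.

```python
import re

# Two lines copied from the sample input file in the question.
sample = (
    "PERSONS NAME UTB-KOD ANS.DAT: 010206-3008\n"
    "KUNDNR:20290 SIDA: 023 70677 GFROM:19760701 GTOM:19800108 PERSONS NAME 010206-3008\n"
)

rx_num = re.compile(r"\s(\d{6}-\d{4})\s")
rx_dat = re.compile(r"GFROM:(\d{8})\s")

m = rx_dat.search(sample)
whole = m.group()    # entire match: still contains "GFROM:" and the trailing space
digits = m.group(1)  # only the text captured by the parentheses

# Dropping the ":" from the pattern means it no longer matches the text,
# so search() returns None and calling .group() on it raises AttributeError.
no_colon = re.search(r"GFROM(\d{8})", sample)

# findall() returns the captured group of every match, not just the first one.
all_numbers = rx_num.findall(sample)


def build_name(txt):
    """Hypothetical helper: build the new file name from a file's text."""
    num = rx_num.search(txt)
    dat = rx_dat.search(txt)
    s = "19" + num.group(1) if num else "[Missing]"
    t = dat.group(1) if dat else "[Missing]"
    return f"{s} {t} Matrikelkort.txt"


new_name = build_name(sample)
print(repr(whole), repr(digits), no_colon, all_numbers, new_name)
```

The first error (WinError 123) comes from `whole`: it contains a `:`, which Windows does not accept in a file name. The second error (`'NoneType' object has no attribute 'group'`) is the `no_colon` case, which is why checking the match for `None` before calling `.group()` is worthwhile.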