I'm trying to rename a bunch of payslip txt files in Python using regex. The elements that I want to use for this are personnummer (social security number) and datum (date). Personnummer is formatted like this: `\d\d\d\d\d\d-\d\d\d\d`, and it works fine by itself using the code below.
But when I try to add datum as well as personnummer, which is formatted like this: `GFROM:\d\d\d\d\d\d\d\d` (I only want the numbers, not the GFROM part), I run into a syntax error.
Do you have any suggestions? I've looked through the previous posts but haven't really found anything there.
Many thanks in advance.
/Andrew
```
import os
import re
mydir = 'C:/Users/atutt-wi/Desktop/USB/Matrikelkort/matrikelkort prov'
personnummer = "(\d\d\d\d\d\d\-\d\d\d\d)"
datum = "(GFROM:(\d\d\d\d\d\d\d\d))"
for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(personnummer, txt)
    t = re.search(datum, txt)
    name = '19' + s.group() + ' ' + '20' + t.group() + ' Matrikelkort'+ '.txt'
    newpath = os.path.join(mydir, name)
    os.rename(archpath, newpath)
```
**The input files look like this:**
DATUM: 010122 KUND:20290
XXX KOMMUN SIDA: 23 70677
PERSONS NAME UTB-KOD ANS.DAT: 010206-3008
BOK/ G T ARBETS- ARB ARB L L P B BRUT L FAST
GÄLLER GÄLLER AVG LÖP AV CAK/ BEFATTNINGS R Y ANST TIDS TID TID P G L L AVDR K BLPP BELOPP LÖNE UPP DEL
FR O M T O M KOD FÖR DB NR TAL BSK -BENÄMNING P P FORM VILLKOR % HEL L R G G FROM L FROM FIP*A lÖN TIML OMF PEN
----------------------------------------------------------------------------------------------------------------------------------------
760701 790630 110 83 20 5070LOK HEMSAMARIT 5 1 4 10004000 Ö 7607 000000 800 000000
790701 800108 970 76 21 5017ANA-T HEMSAMARIT 5T1 4 00004000 K 077907 000000000000 000000
KUNDNR:20290 SIDA: 023 70677 GFROM:19760701 GTOM:19800108 PERSONS NAME 010206-3008
000001L 2 000001010122 33399CMT011MATRIKELKORT Matrikelkort 000001CMZ029050330-7118 01-01-22 CMZ02901
120290
**The errors I got**
runfile('C:/Users/atutt-wi/Desktop/USB/regex personnummer och datum matrikelkort tool.py', wdir='C:/Users/atutt-wi/Desktop/USB')
Traceback (most recent call last):
File "<ipython-input-21-f7cd01adb9a3>", line 1, in <module>
runfile('C:/Users/atutt-wi/Desktop/USB/regex personnummer och datum matrikelkort tool.py', wdir='C:/Users/atutt-wi/Desktop/USB')
File "C:\Users\atutt-wi\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:\Users\atutt-wi\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/atutt-wi/Desktop/USB/regex personnummer och datum matrikelkort tool.py", line 24, in <module>
os.rename(archpath, newpath)
OSError: [WinError 123] Incorrect syntax for file name, directory name or volume label: 'C:/Users/atutt-wi/Desktop/USB/Matrikelkort/matrikelkort prov\\File17.txt' -> 'C:/Users/atutt-wi/Desktop/USB/Matrikelkort/matrikelkort prov\\010206-3008 20GFROM:19760701 Matrikelkort.txt'
**Update: When I removed the ':' from GFROM, I got the following error**
File "C:\Users\atutt-wi\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:\Users\atutt-wi\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/atutt-wi/Desktop/USB/regex personnummer och datum matrikelkort tool.py", line 22, in <module>
name = '19' + s.group() + ' ' + '20' + t.group() + ' Matrikelkort'+ '.txt'
AttributeError: 'NoneType' object has no attribute 'group'
Here is a snippet you could try:
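```
import os
import re

# Folder containing the payslip files (path taken from the question).
mydir = 'C:/Users/atutt-wi/Desktop/USB/Matrikelkort/matrikelkort prov'

rx_num = re.compile(r"\s(\d{6}-\d{4})\s", re.M)
rx_dat = re.compile(r"GFROM:(\d{8})\s", re.M)

for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    # group(1) returns only the captured digits, without the surrounding
    # whitespace or the "GFROM:" prefix (':' is not allowed in Windows file names).
    s_match = rx_num.search(txt)
    s = s_match.group(1) if s_match is not None else "[Missing]"
    t_match = rx_dat.search(txt)
    t = t_match.group(1) if t_match is not None else "[Missing]"
    name = '19' + s + ' ' + '20' + t + ' Matrikelkort' + '.txt'
    newpath = os.path.join(mydir, name)
    # Rename after the with block so the file is already closed.
    os.rename(archpath, newpath)
```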
The use of `compile` is optional, but I find it clearer. I also added `re.M`, which is the flag for 'Multiline'; note that it only changes how `^` and `$` match, so it has no effect on these two patterns. Lastly, I added `\s` before and after the groups to ensure that a string like 'abd123456-7890def' does not match. Also, keep in mind that you will only get the first match with this code. If you want every match, try using `findall` instead.
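To see what the two patterns actually capture, here is a minimal sketch that runs them on two lines copied from the sample input in the question, without touching any files. The helper `build_name` is hypothetical and only for illustration. It also assumes that, since the GFROM date already carries the century (`19760701`), no extra `'20'` prefix is wanted; adjust that if your naming scheme differs.

```python
import re

# Two lines copied from the sample input file shown in the question.
sample = (
    "PERSONS NAME UTB-KOD ANS.DAT: 010206-3008\n"
    "KUNDNR:20290 SIDA: 023 70677 GFROM:19760701 GTOM:19800108 PERSONS NAME 010206-3008\n"
)

rx_num = re.compile(r"\s(\d{6}-\d{4})\s")
rx_dat = re.compile(r"GFROM:(\d{8})\s")


def build_name(txt):
    """Hypothetical helper: return the new file name, or None if a field is missing."""
    num = rx_num.search(txt)
    dat = rx_dat.search(txt)
    if num is None or dat is None:
        # re.search returns None when nothing matches; calling .group() on it
        # is what raises the AttributeError.
        return None
    # group(1) is only the parenthesised part; group() would also include
    # the surrounding whitespace and the "GFROM:" prefix.
    # Assumption: the date is used as-is, without a '20' prefix.
    return "19" + num.group(1) + " " + dat.group(1) + " Matrikelkort.txt"


print(repr(rx_dat.search(sample).group()))   # whole match, including "GFROM:"
print(repr(rx_dat.search(sample).group(1)))  # captured digits only
print(build_name(sample))
print(rx_num.findall(sample))                # every personnummer in the text
```

Because each pattern has exactly one capturing group, `findall` returns a plain list of the captured strings rather than match objects. Returning `None` from the helper lets the calling loop skip a file instead of renaming it to a placeholder name, which is a design choice rather than a requirement.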