This is the structure of the files:
(number)/firstdirectory/unimportant/unimportant/lastdirectory.DAT
I need to write a regex that will place the number, the first directory, and the last directory in groups 1, 2, and 3 respectively.
Examples of other files (the ones I use for testing):
(1)/Downloads/Maps/Map of Places.pdf
(25)/Publications/1995Publications.pdf
(31)/Table-of-Contents.pdf
This is what I have:
import re

reggie = r"^.* \(([0-9]*)\)(.*)\/([^\/]*)\.(.*)$"

with open('test2.txt') as f:
    lines = f.readlines()

for line in lines:
    match = re.search(reggie, line)
    if match:
        num = match.group(1)
        sub = match.group(2)
        file = match.group(3)
        print(num, sub, file)
What I hope to get is:
1 Downloads Map of Places
25 Publications 1995Publications
31 Table-of-Contents (here there is no first directory, so it should just take the last part)
What I end up getting is:
1 /Downloads/Maps Map of Places
25 /Publications 1995Publications
31 Table-of-Contents
It's very close. The only problems are that when there are more than two directories, the middle ones are included with the first one, and there is an unnecessary forward slash before the first directory.
I've been thinking about this for a couple of hours and I'm stumped. My best attempt was to force a forward slash after the number, to remove the unnecessary one from the output, and then to add an optional one after the first directory for cases where there are more than two directories.
Like this:
reggie = r"^.*\(([0-9]*)\)\/(.*)\/*([^\/]*)\.(.*)$"
However, with this, all the directories merge into one and there is no last directory.
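The reason the second attempt collapses everything into one group is that (.*) is greedy, while both \/* and ([^\/]*) are allowed to match nothing. A minimal check, running that pattern unchanged against one of the test paths, makes the captures visible:

```python
import re

# The second attempt from the question, unchanged.
attempt = re.compile(r"^.*\(([0-9]*)\)\/(.*)\/*([^\/]*)\.(.*)$")

m = attempt.search("(1)/Downloads/Maps/Map of Places.pdf")
print(m.groups())
# (.*) grabs as much as it can; since \/* and ([^\/]*) may both be empty,
# group 2 stops just before the final dot and group 3 is left empty:
# ('1', 'Downloads/Maps/Map of Places', '', 'pdf')
```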
Any help would be appreciated. It seems like it should have a simple solution, but I must be looking at it all wrong.
First of all, a regex is not the best tool here; pathlib should be used instead.
Here is a regex solution anyway, if you do wish to use one:
import re
regex = re.compile(r"\((\d+)\)(?:/([^/]+))?.*/([^\.]+)\..*$")
paths = ["(1)/Downloads/Maps/Map of Places.pdf","(25)/Publications/1995Publications.pdf","(31)/Table-of-Contents.pdf"]
for path in paths:
    print(regex.match(path).groups())
Output:
('1', 'Downloads', 'Map of Places')
('25', 'Publications', '1995Publications')
('31', None, 'Table-of-Contents')
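For readability, the same pattern can be written with re.VERBOSE so that each part carries a comment. The sketch below also uses search instead of match and skips lines that don't match, since regex.match(path).groups() raises AttributeError when match returns None. The helper name parse_lines and the sample lines (with trailing newlines, as readlines() returns them) are illustrative assumptions, not part of the original answer:

```python
import re

# Same pattern as above, spread out with re.VERBOSE (whitespace and # comments are ignored).
PATTERN = re.compile(
    r"""
    \( (\d+) \)          # group 1: the number inside the parentheses
    (?: / ([^/]+) )?     # group 2: optional first directory (None if absent)
    .* /                 # skip any middle directories, up to the slash before the file name
    ([^.]+)              # group 3: file name, up to its first dot
    \. .* $              # the extension
    """,
    re.VERBOSE,
)


def parse_lines(lines):
    """Yield (number, first_directory, file_name) for every line that matches."""
    for line in lines:
        m = PATTERN.search(line.rstrip("\n"))
        if m:  # lines that don't look like "(n)/.../name.ext" are skipped
            yield m.groups()


sample = [
    "(1)/Downloads/Maps/Map of Places.pdf\n",
    "(25)/Publications/1995Publications.pdf\n",
    "(31)/Table-of-Contents.pdf\n",
    "this line has no number\n",
]
for groups in parse_lines(sample):
    print(groups)
```

To process test2.txt as in the question, the open file object can be passed to parse_lines directly, since iterating over a file yields its lines.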
Instead of using a regex, you should use pathlib. It is more reliable and supports different operating systems:
import pathlib
paths = ["(1)/Downloads/Maps/Map of Places.pdf","(25)/Publications/1995Publications.pdf","(31)/Table-of-Contents.pdf"]
for path in map(pathlib.PurePath, paths):  # Convert all paths to PurePaths
    path_parts = path.parts
    number = path_parts[0]
    filename = path.stem
    root_directory = path_parts[1] if len(path_parts) > 2 else None
    print((number, root_directory, filename))
Output:
('(1)', 'Downloads', 'Map of Places')
('(25)', 'Publications', '1995Publications')
('(31)', None, 'Table-of-Contents')
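The pathlib version returns '(1)' rather than the '1' the question asks for. A small sketch that strips the parentheses and wraps the same logic in a function could look like this; parse_path is a hypothetical name, and the code assumes every path starts with a "(n)" component:

```python
from pathlib import PurePath


def parse_path(text):
    """Return (number, first_directory, file_stem) for a '(n)/.../name.ext' string.

    Hypothetical helper: assumes the first path component is always '(n)'.
    """
    path = PurePath(text.strip())  # strip() drops the newline left by readlines()
    parts = path.parts
    number = parts[0].strip("()")  # '(25)' -> '25'
    # parts[1] is only a directory if something else follows it.
    first_directory = parts[1] if len(parts) > 2 else None
    return number, first_directory, path.stem


paths = ["(1)/Downloads/Maps/Map of Places.pdf",
         "(25)/Publications/1995Publications.pdf",
         "(31)/Table-of-Contents.pdf"]
for p in paths:
    print(parse_path(p))
```

Because parse_path strips surrounding whitespace first, it can also be applied directly to lines read from test2.txt.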