
Create a list of lists from reading a text file

I'm trying to automate a tedious task.

I have a file, test.txt, that contains file paths to some PDF files:

 "L:\Advertentie woningplattegronden\Definitieve plattegronden\Gemeente Delft\Complex 1004\Copy\1004A0Oa00 Jacob Gillishof 10.pdf"
"L:\Advertentie woningplattegronden\Definitieve plattegronden\Gemeente Delft\Complex 1004\Copy\1004A0Oa00 Jacob Gillishof 11.pdf"
"L:\Advertentie woningplattegronden\Definitieve plattegronden\Gemeente Delft\Complex 1004\Copy\1004A0Oa00 Jacob Gillishof 14.pdf"

As step 1, my script needs to make a list with one item per line, which I did with:

with open('Test.txt') as f:
    textlines = f.read().splitlines()
print(textlines)

which results in:

[
    '"L:\\Advertentie woningplattegronden\\Definitieve plattegronden\\Gemeente Delft\\Complex 1004\\Copy\\1004A0Oa00 Jacob Gillishof 10.pdf"',
    '"L:\\Advertentie woningplattegronden\\Definitieve plattegronden\\Gemeente Delft\\Complex 1004\\Copy\\1004A0Oa00 Jacob Gillishof 11.pdf"',
    '"L:\\Advertentie woningplattegronden\\Definitieve plattegronden\\Gemeente Delft\\Complex 1004\\Copy\\1004A0Oa00 Jacob Gillishof 14.pdf"',
    "",
    "",
]

I'm not sure why the last two items are empty strings, though.

Then I want to create another list by looping through the textlines list and separating each path into the parts between the backslashes.

So I want a list containing:

some_list = [
    "L:",
    "Advertentie woningplattegronden",
    "Definitieve plattegronden",
    "Gemeente Delft",
    "Complex 1004",
    "Copy",
    "1004A0Oa00 Jacob Gillishof 10.pdf",
]

Eventually I want to be able to put some elements of some_list into new variables so that I can later create a CSV file containing them.

Every time I try to loop through the first list, I get an error telling me the string index is out of range.

I'm not asking for a complete script, by the way, but some guidance on how to proceed would be nice.

Thanks in advance!

Something like this, maybe? I've peppered some helpful comments here and there.

filenames = []

with open("file.txt", "r") as file:
    for line in file:
        line = line.strip()  # remove leading/trailing whitespace, including the newline
        line = line.strip('"')  # remove wrapping quotes
        if line:  # if there still is content...
            filenames.append(line)  # save the valid line.

filename_components = [
    filename.split("\\")  # Split the filename by backslashes
    for filename in filenames  # for each filename we just stored
]

for split_name in filename_components:
    print(split_name)  # print out each split name

This outputs, for example:

['L:', 'Advertentie woningplattegronden', 'Definitieve plattegronden', 'Gemeente Delft', 'Complex 1004', 'Copy', '1004A0Oa00 Jacob Gillishof 10.pdf']
['L:', 'Advertentie woningplattegronden', 'Definitieve plattegronden', 'Gemeente Delft', 'Complex 1004', 'Copy', '1004A0Oa00 Jacob Gillishof 11.pdf']
['L:', 'Advertentie woningplattegronden', 'Definitieve plattegronden', 'Gemeente Delft', 'Complex 1004', 'Copy', '1004A0Oa00 Jacob Gillishof 14.pdf']
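The question also mentions eventually writing some of these components to a CSV file. The following is only a rough sketch of that step using the standard csv module. The column names (municipality, complex, filename) and the choice of indexes 3, 4 and -1 are assumptions made for illustration, not something the question specifies. The sketch writes to an in-memory buffer so it runs without touching the disk; for a real file you would use open("output.csv", "w", newline="") instead, where output.csv is a hypothetical name.

```python
import csv
import io

# Sample data shaped like the split paths printed above.
filename_components = [
    ["L:", "Advertentie woningplattegronden", "Definitieve plattegronden",
     "Gemeente Delft", "Complex 1004", "Copy", "1004A0Oa00 Jacob Gillishof 10.pdf"],
    ["L:", "Advertentie woningplattegronden", "Definitieve plattegronden",
     "Gemeente Delft", "Complex 1004", "Copy", "1004A0Oa00 Jacob Gillishof 11.pdf"],
]

rows = []
for parts in filename_components:
    if len(parts) < 5:
        continue  # skip paths too short to have the components we index below
    # Assumed layout: index 3 = municipality, index 4 = complex, last = file name.
    rows.append([parts[3], parts[4], parts[-1]])

buffer = io.StringIO()  # stand-in for a real file opened with newline=""
writer = csv.writer(buffer)
writer.writerow(["municipality", "complex", "filename"])  # hypothetical header
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Checking the length of each split path before indexing avoids the same kind of IndexError the question ran into if a line turns out to be shorter than expected.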

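You could try using .split("\\"). The backslash has to be doubled inside the string literal, because a lone backslash would escape the closing quote:

splittedLines = [l.split("\\") for l in textlines]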

First, you need to clean your inputs a little bit. Those empty strings are probably empty lines at the end of the file, so you will have to ignore those. Also, notice that your lines come wrapped in double quotes, which is probably not what you want. You can remove them with .strip('"').

Lastly, I guess the IndexErrors probably come from trying to find the backslash in the empty lines, which makes me think you're manually searching for them instead of using split. As @Bernd said, using .split("\\") on every line will cut the string into all the pieces you want and return a list with them.
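To make the points above concrete, here is a small sketch that reproduces the error on an empty line and then cleans and splits the input. It uses an in-memory list in place of the file, and the variable names (raw_lines, cleaned, split_lines) are made up for the example. Whether the original script indexed into the strings this way is a guess.

```python
# Stand-in for the result of f.read().splitlines(), including the empty lines.
raw_lines = [
    '"L:\\Advertentie woningplattegronden\\Definitieve plattegronden\\'
    'Gemeente Delft\\Complex 1004\\Copy\\1004A0Oa00 Jacob Gillishof 10.pdf"',
    "",
    "",
]

# Indexing into an empty string is one way to get the reported error.
try:
    first_char = raw_lines[1][0]
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)

# Strip whitespace and the wrapping quotes, then drop lines that end up empty.
stripped = (line.strip().strip('"') for line in raw_lines)
cleaned = [line for line in stripped if line]

# Split every remaining path on the backslash.
split_lines = [line.split("\\") for line in cleaned]
print(split_lines)
```

Filtering after stripping also discards lines that contain only whitespace or only a pair of quotes, which a plain check for empty strings would let through.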
