
Create a list of lists from reading a text file

So I'm trying to automate a tedious task.

I have a file, test.txt, that contains the file paths of some PDF files:

"L:\Advertentie woningplattegronden\Definitieve plattegronden\Gemeente Delft\Complex 1004\Copy\1004A0Oa00 Jacob Gillishof 10.pdf"
"L:\Advertentie woningplattegronden\Definitieve plattegronden\Gemeente Delft\Complex 1004\Copy\1004A0Oa00 Jacob Gillishof 11.pdf"
"L:\Advertentie woningplattegronden\Definitieve plattegronden\Gemeente Delft\Complex 1004\Copy\1004A0Oa00 Jacob Gillishof 14.pdf"

For step 1, my script needs to make a list of every line, which I did with:

with open('Test.txt') as f:
    textlines = f.read().splitlines()  # one string per line, without the line breaks
print(textlines)

which results in:

[
    '"L:\\Advertentie woningplattegronden\\Definitieve plattegronden\\Gemeente Delft\\Complex 1004\\Copy\\1004A0Oa00 Jacob Gillishof 10.pdf"',
    '"L:\\Advertentie woningplattegronden\\Definitieve plattegronden\\Gemeente Delft\\Complex 1004\\Copy\\1004A0Oa00 Jacob Gillishof 11.pdf"',
    '"L:\\Advertentie woningplattegronden\\Definitieve plattegronden\\Gemeente Delft\\Complex 1004\\Copy\\1004A0Oa00 Jacob Gillishof 14.pdf"',
    "",
    "",
]

I'm not sure why the last two items are empty strings, though.
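The two empty strings most likely come from blank lines at the end of test.txt: str.splitlines() turns every line into a separate item, so extra newlines after the last path show up as empty strings. A minimal sketch, assuming the file ends with two extra blank lines (the contents below are hypothetical sample text, not the real test.txt):

```python
# Hypothetical file contents: one quoted path followed by two extra blank lines
content = '"L:\\Copy\\a.pdf"\n\n\n'

lines = content.splitlines()
print(lines)  # ['"L:\\Copy\\a.pdf"', '', '']

# Keeping only lines that contain something other than whitespace drops the empty strings
non_empty = [line for line in lines if line.strip()]
print(non_empty)  # ['"L:\\Copy\\a.pdf"']
```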

Then I want to create another list by looping through the textlines list and splitting each path at its \ separators.

So I want a list containing:

some_list = [
    "L:",
    "Advertentie woningplattegronden",
    "Definitieve plattegronden",
    "Gemeente Delft",
    "Complex 1004",
    "Copy",
    "1004A0Oa00 Jacob Gillishof 10.pdf",
]
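Besides splitting the string on backslashes, the standard library's pathlib can break a path into components. A sketch using PureWindowsPath, which parses Windows-style paths on any operating system; its .parts keeps the drive as 'L:\\' (with the trailing backslash), so the sketch uses .drive to get the bare 'L:' shown in some_list above. This is an alternative technique, not what the question's code does:

```python
from pathlib import PureWindowsPath

# One line from test.txt, with its wrapping double quotes already removed
raw = r"L:\Advertentie woningplattegronden\Definitieve plattegronden\Gemeente Delft\Complex 1004\Copy\1004A0Oa00 Jacob Gillishof 10.pdf"

path = PureWindowsPath(raw)
# path.parts[0] is the anchor "L:\\"; use the bare drive "L:" instead
some_list = [path.drive, *path.parts[1:]]
print(some_list)
```

PureWindowsPath also exposes path.name ("1004A0Oa00 Jacob Gillishof 10.pdf") and path.stem (the name without ".pdf"), which may save some index juggling later.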

Eventually I want to be able to put some items of some_list (picked by index) into new variables, so that I can later create a CSV file containing those variables.
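Once each path is split, picking items by index and writing them out is a job for the standard csv module. A minimal sketch; the chosen columns (indexes 3 and 4 plus the last item), the header names, and the helpers select_fields and write_csv are all hypothetical:

```python
import csv
import io

# Sample input: paths already split into components, like some_list above
split_paths = [
    ["L:", "Advertentie woningplattegronden", "Definitieve plattegronden",
     "Gemeente Delft", "Complex 1004", "Copy", "1004A0Oa00 Jacob Gillishof 10.pdf"],
    ["L:", "Advertentie woningplattegronden", "Definitieve plattegronden",
     "Gemeente Delft", "Complex 1004", "Copy", "1004A0Oa00 Jacob Gillishof 11.pdf"],
]


def select_fields(components):
    """Hypothetical column choice: municipality, complex, file name."""
    return [components[3], components[4], components[-1]]


def write_csv(rows, fh):
    """Write a (hypothetical) header row followed by the data rows."""
    writer = csv.writer(fh)
    writer.writerow(["municipality", "complex", "file"])
    writer.writerows(rows)


# An in-memory buffer stands in for a real file here
buffer = io.StringIO()
write_csv([select_fields(c) for c in split_paths], buffer)
print(buffer.getvalue())
```

To write to disk instead, open the file with newline="" as the csv documentation recommends, e.g. `with open("output.csv", "w", newline="") as fh: write_csv(rows, fh)` (the file name is just an example).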

Every time I try to loop through the first list, I get an error telling me the string index is out of range.

I'm not asking for a complete script, by the way, but some guidance on how to proceed with this script would be nice.

Thanks in advance!

Something like this, maybe? I've peppered some helpful comments here and there.

filenames = []

with open("file.txt", "r") as file:
    for line in file:
        line = line.strip()  # remove leading/trailing whitespace, including the newline
        line = line.strip('"')  # remove wrapping quotes
        if line:  # if there still is content...
            filenames.append(line)  # save the valid line.

filename_components = [
    filename.split("\\")  # Split the filename by backslashes
    for filename in filenames  # for each filename in the filenames we just stored
]

for split_name in filename_components:
    print(split_name)  # print out each split name

outputs, e.g.:

['L:', 'Advertentie woningplattegronden', 'Definitieve plattegronden', 'Gemeente Delft', 'Complex 1004', 'Copy', '1004A0Oa00 Jacob Gillishof 10.pdf']
['L:', 'Advertentie woningplattegronden', 'Definitieve plattegronden', 'Gemeente Delft', 'Complex 1004', 'Copy', '1004A0Oa00 Jacob Gillishof 11.pdf']
['L:', 'Advertentie woningplattegronden', 'Definitieve plattegronden', 'Gemeente Delft', 'Complex 1004', 'Copy', '1004A0Oa00 Jacob Gillishof 14.pdf']

You could try using .split("\\"):

splittedLines = [l.split("\\") for l in textlines]  # "\\" is a single backslash; "\" would be a syntax error
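Applied directly to the question's textlines, the split works, but leftovers from the raw input come along: the double quotes stay attached to the first and last pieces, and each empty line becomes ['']. A small sketch with shortened, hypothetical sample lines:

```python
# Shortened stand-ins for the question's textlines (quotes included, plus an empty line)
textlines = ['"L:\\Copy\\a.pdf"', ""]

splittedLines = [l.split("\\") for l in textlines]
print(splittedLines)  # [['"L:', 'Copy', 'a.pdf"'], ['']]
```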

First, you need to clean up your input a little. Those empty strings are probably empty lines at the end of the file, so you will have to skip them. Also, notice that your lines are wrapped in double quotes, which is probably not what you want. You can remove them with .strip('"').
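Both clean-up steps fit in a couple of comprehensions over the textlines list read earlier. A sketch, using shortened sample lines in place of the real file; filtering after stripping also drops a line that contained nothing but quotes:

```python
# Shortened stand-ins for the question's textlines
textlines = ['"L:\\Copy\\a.pdf"', '"L:\\Copy\\b.pdf"', "", ""]

# Strip surrounding whitespace and the wrapping quotes, then drop lines that end up empty
stripped = (line.strip().strip('"') for line in textlines)
cleaned = [line for line in stripped if line]

split_lines = [line.split("\\") for line in cleaned]
print(split_lines)  # [['L:', 'Copy', 'a.pdf'], ['L:', 'Copy', 'b.pdf']]
```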

Lastly, I guess the IndexErrors come from trying to find the backslash in the empty lines, which makes me think you're searching for it manually instead of using split. As @Bernd said, using .split("\\") on every line will cut the string into all the pieces you want and return them as a list.
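To illustrate that guess: a hand-written scan that indexes the string character by character fails on an empty line, because there is no character at index 0. The function scan_to_backslash below is a hypothetical reconstruction, not the asker's actual code; str.split needs no index bookkeeping at all:

```python
def scan_to_backslash(line):
    """Hypothetical manual search: return the text before the first backslash."""
    i = 0
    while line[i] != "\\":  # raises IndexError once i runs past the end of line
        i += 1
    return line[:i]


print(scan_to_backslash("L:\\Copy\\a.pdf"))  # L:

try:
    scan_to_backslash("")  # an empty line from the end of the file
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: string index out of range

# split() handles both cases without any indexing
print("L:\\Copy\\a.pdf".split("\\"))  # ['L:', 'Copy', 'a.pdf']
print("".split("\\"))  # ['']
```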

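Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.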

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM