How to add the file path in a Python script

Question

I have a script that walks a given directory recursively, finds every file with a specific file name, and writes the contents of all the files it finds into a single file.

The script does what I need in terms of finding the files and writing all their contents to a single XML file. However, I would also like it to add the file path, so that I know where each file it found is located. This is my code so far:

import glob

lines=[] #list
for inputfile in glob.glob('D:\path\to\scan\**\project_information.xml', recursive=True): 
    with open ('D:\path\result\output.xml', 'w') as out_file:
        with open (inputfile, "rt") as in_file:  
            print("<projectpath>", inputfile, file=out_file)
            for line in in_file: 
                lines.append(line.rstrip('\n'))
            for linenum, line in enumerate(lines):
                print(line, file=out_file)

Each project_information.xml the script looks for looks like this:

<project>
<name>project1</name>
<Projection>UTM84-30N</Projection>
<Build>Jan 30 2015</Build>
</project>

Ideally, for every entry that is found and added to my output, I would like it to write the file path inside the parent tag. (You may have guessed that this is an XML table destined for Excel.) For example:

<project>
<filepath>C:\path\to\file\
<name>project1</name>
<Projection>UTM84-30N</Projection>
<Build>Jan 30 2015</Build>
</project>
<project>
<filepath>C:\path\to\file\blah\
<name>project2</name>
<Projection>UTM84-30N</Projection>
<Build>Jan 30 2015</Build>
</project>

I thought adding a line after the with statement would do it, but that only seems to print the first occurrence and never gets into the loop. Besides, even if it did work, the path would not be inside the parent tag.

print("<projectpath>", inputfile, file=out_file)

I am pretty new to Python, so any suggestions or changes to my crude code are very much appreciated!

UPDATE

So, as @olinox14 suggested, I had a look at the lxml module and got it to add a new tag with that:

tree = ET.parse(in_file)
root = tree.getroot()
for child in root:
    folder = ET.SubElement(child, 'folder')
    folder.text = (in_file.name)
tree.write('C:\\output\\output2.xml')
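One thing worth checking in the update above: if the root of each parsed file is the <project> element, then looping over its children attaches the new tag inside <name>, <Projection> and so on, rather than directly under <project>. The sketch below illustrates the difference. It assumes the standard library's xml.etree.ElementTree (imported as ET) and a shortened sample document; the folder path is a placeholder, not a value from the original post.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for one project_information.xml (assumption for illustration)
SAMPLE = "<project><name>project1</name><Build>Jan 30 2015</Build></project>"

# Looping over the root's children puts a <folder> inside <name> and <Build>
looped = ET.fromstring(SAMPLE)
for child in looped:
    ET.SubElement(child, "folder").text = "C:\\path\\to\\file"

# Calling SubElement on the root itself adds one <folder> directly under <project>
direct = ET.fromstring(SAMPLE)
ET.SubElement(direct, "folder").text = "C:\\path\\to\\file"

looped_xml = ET.tostring(looped, encoding="unicode")
direct_xml = ET.tostring(direct, encoding="unicode")
```

Printing looped_xml and direct_xml side by side shows where the new element ends up in each case.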

Answer 1

Several things here:

As GPhilo said in the comments, consider using a dedicated library to handle XML files; the best known is lxml.
You should swap your for inputfile in... and with open(...) as out_file so that the output file is not reopened (and, in 'w' mode, truncated) on every iteration.
Do not use print(..., file=...) when you can use out_file.write(...).
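As a rough illustration of the last two points without any XML library, the sketch below opens the output file once, loops over the matches inside that with block, and writes a <filepath> line right after each opening <project> line. The function name merge_with_paths and its parameters are hypothetical, and it assumes each input file has <project> on a line of its own, as in the sample from the question.

```python
import glob
from xml.sax.saxutils import escape


def merge_with_paths(pattern, out_path):
    """Copy every file matching pattern into out_path, adding a <filepath> line."""
    count = 0
    # Open the output once, outside the loop, so earlier results are not overwritten
    with open(out_path, "w", encoding="utf-8") as out_file:
        for inputfile in glob.glob(pattern, recursive=True):
            with open(inputfile, "rt", encoding="utf-8") as in_file:
                for line in in_file:
                    out_file.write(line.rstrip("\n") + "\n")
                    # Emit the path right after the opening tag so it sits inside <project>
                    if line.strip() == "<project>":
                        out_file.write("<filepath>" + escape(inputfile) + "</filepath>\n")
            count += 1
    return count
```

On Windows, passing the pattern as a raw string, for example r'D:\path\to\scan\**\project_information.xml', keeps sequences such as \t and \r from being interpreted as escape characters.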

Here is a good starting point; you may need to adapt it:

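import glob
from lxml import etree

root = etree.Element("root")

# Raw strings stop \t and \r in the Windows paths from being read as escapes.
# The output is opened in binary mode because lxml writes encoded bytes.
with open(r'D:\path\result\output.xml', 'wb') as out_file:
    for inputfile in glob.glob(r'D:\path\to\scan\**\project_information.xml', recursive=True):

        with open(inputfile, "r") as in_file:
            # Attribute values must be strings, so pass the path, not the file object
            project = etree.SubElement(root, "project", path=inputfile)
            # ... continue with your processing

    # Elements have no write() method; wrap the root in an ElementTree first
    etree.ElementTree(root).write(out_file, pretty_print=True)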
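One possible way to fill in the "continue with your processing" step is sketched below. To stay dependency-free it uses the standard library's xml.etree.ElementTree instead of lxml (a deliberate swap; the calls shown here have close lxml equivalents). The function name combine_projects and the <filepath> child element are illustrative choices rather than part of the original answer, and the sketch assumes the root of each input file is the <project> element.

```python
import glob
import xml.etree.ElementTree as ET


def combine_projects(pattern, out_path):
    """Collect every matching <project> under one <root>, tagging each with its path."""
    root = ET.Element("root")
    for inputfile in glob.glob(pattern, recursive=True):
        project = ET.parse(inputfile).getroot()  # the <project> element of this file
        filepath = ET.Element("filepath")
        filepath.text = inputfile
        filepath.tail = "\n"
        project.insert(0, filepath)  # first child, ahead of <name>
        project.tail = "\n"
        root.append(project)
    # Write once, after all input files have been merged
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return root
```

Building the whole tree in memory and writing it once gives a single well-formed document with one root element, which concatenating raw file contents does not guarantee.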

Answer 1
0 Accepted 2019-03-12 10:01:53
