How to add a new XML attribute to a set of XML documents?

Question

I am normalizing a bunch of XML documents from a directory into a pandas DataFrame. To do this correctly, I think it is easier to assign each XML document an ID so that I can later concatenate all the documents into a single DataFrame. Therefore, I tried to add a numerical ID as follows:

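import os
from glob import glob
from bs4 import BeautifulSoup

for filepath in glob(os.path.join('../data/trainingFiles/', '*.xml')):
    with open(filepath) as f:
        xml_doc = BeautifulSoup(f.read(), 'lxml')
        # i restarts at 0 in every file, so sentences of one document get ids 0, 1, 2, ...
        for i, sentence in enumerate(xml_doc.find_all("sentence")):
            sentence['pandas_id'] = str(i)
            print(sentence)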

However, the code above assigns a different numerical ID to each sentence within the same document. How can I assign a different ID to each document (i.e., add the same ID to all the elements inside a document)? Is there any way to keep a reference to the document to which I am adding the pandas_id attribute?

Answer 1

You can try this code; document_id changes only from one file to the next:

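import os
from glob import glob
from bs4 import BeautifulSoup

# Enumerate the files, not the sentences: document_id advances once per file
for document_id, filepath in enumerate(glob(os.path.join('../data/trainingFiles/', '*.xml'))):
    with open(filepath) as f:
        xml_doc = BeautifulSoup(f.read(), 'lxml')
        # Every <sentence> in this file receives the same document_id
        for sentence in xml_doc.find_all("sentence"):
            sentence['pandas_id'] = str(document_id)
            print(sentence)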
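The same idea can be checked without touching the file system. This is a sketch under stated assumptions, not the answerer's code: it swaps BeautifulSoup for the standard library's xml.etree.ElementTree, and the function name tag_documents and the two sample documents are hypothetical stand-ins for the files in ../data/trainingFiles/. As in the answer, the counter comes from enumerate on the outer, per-document loop, so every <sentence> in one document receives the same pandas_id.

```python
import xml.etree.ElementTree as ET


def tag_documents(xml_texts):
    """Hypothetical helper: parse each XML string and tag its <sentence> elements.

    The id comes from the outer loop, so it changes once per document,
    not once per sentence.
    """
    roots = []
    for document_id, xml_text in enumerate(xml_texts):
        root = ET.fromstring(xml_text)
        # iter() walks the whole tree, so nested <sentence> elements are tagged too
        for sentence in root.iter("sentence"):
            sentence.set("pandas_id", str(document_id))
        roots.append(root)
    return roots


# Hypothetical in-memory documents standing in for the *.xml files
docs = [
    "<doc><sentence>First.</sentence><sentence>Second.</sentence></doc>",
    "<doc><sentence>Third.</sentence></doc>",
]
roots = tag_documents(docs)

# One row per sentence, carrying the document id read back from the new attribute
rows = [
    {"pandas_id": s.get("pandas_id"), "text": s.text}
    for root in roots
    for s in root.iter("sentence")
]

print(ET.tostring(roots[0], encoding="unicode"))
# → <doc><sentence pandas_id="0">First.</sentence><sentence pandas_id="0">Second.</sentence></doc>
```

Because rows is a list of dicts, pandas.DataFrame(rows) should turn it into a single frame whose pandas_id column identifies the source document. Note also that glob returns paths in arbitrary order, so wrapping the glob(...) call in sorted() keeps the IDs stable between runs.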

Accepted answer: Answer 1 (2018-08-23 21:36:52)
