Identifying and replacing XML elements with BeautifulSoup in Python

Question

I am trying to use BeautifulSoup4 to find and replace specific elements within an XML file. More specifically, I want to find every instance of 'file_name' (in the example below the file name is 'Cyp26A1_atRA_minus_tet_plus.txt') and replace it with the full path for that document, which is saved in the 'file_name_replacement_dir' variable. My first task, and the bit I'm stuck on, is to isolate the section of interest so that I can replace it using the replaceWith() method.

The XML

      <ParameterGroup name="Experiment_22">
        <Parameter name="Data is Row Oriented" type="bool" value="1"/>
        <Parameter name="Experiment Type" type="unsignedInteger" value="0"/>
        <Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
        <Parameter name="First Row" type="unsignedInteger" value="1"/>

There are actually 44 experiments with 4 different file names (11 with file name 1, 11 with file name 2, and so on). So the above snippet of XML is repeated 44 times, just with a different file stored in the "File Name" line.

My code so far

from bs4 import BeautifulSoup

# Raw strings keep the Windows backslashes literal
xml_dir = r'D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models\Model_Line_2'
xml_file_name = 'RARa_RXR_M22.cps'
xml = xml_dir + '\\' + xml_file_name
file_name_replacement_dir = r'D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models'
soup = BeautifulSoup(open(xml))
print(soup.find_all('parametergroup name="Experiment_22"'))

This, however, returns an empty list. I've also tried a few other functions in place of 'soup.find_all()' but still haven't been able to get a handle on the file name. Does anybody know how to do what I'm trying to do?

Answer 1

xml = '<ParameterGroup name="Experiment_22">\
<Parameter name="Data is Row Oriented" type="bool" value="1"/>\
<Parameter name="Experiment Type" type="unsignedInteger" value="0"/>\
<Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>\
<Parameter name="First Row" type="unsignedInteger" value="1"/>\
</ParameterGroup>'

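from bs4 import BeautifulSoup
import os

# Name the parser explicitly; html.parser converts tag names to lower case
soup = BeautifulSoup(xml, 'html.parser')

# Prefix the value of every "File Name" parameter with the new directory
for tag in soup.find_all("parameter", {'name': 'File Name'}):
    tag['value'] = os.path.join('new_dir', tag['value'])

print(soup)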

Close your XML 'ParameterGroup' tag.
BeautifulSoup's HTML parsers convert tag names to lower case, so a capitalised tag name may not match; search for 'parameter' in lower case.
Use os.path to manipulate paths so that the code works across platforms.
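
To show how the same loop behaves when several experiments are present, here is a minimal sketch. The second file name ('other_data.txt'), the directory ('data/RAR_Models') and the helper name add_directory are made up for illustration and are not taken from the question.

```python
import os
from bs4 import BeautifulSoup

# Two experiment groups; 'other_data.txt' is a made-up second file name.
xml = """
<ParameterGroup name="Experiment_1">
  <Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
  <Parameter name="First Row" type="unsignedInteger" value="1"/>
</ParameterGroup>
<ParameterGroup name="Experiment_2">
  <Parameter name="File Name" type="file" value="other_data.txt"/>
  <Parameter name="First Row" type="unsignedInteger" value="1"/>
</ParameterGroup>
"""

# Hypothetical directory; substitute the real data directory.
file_name_replacement_dir = os.path.join("data", "RAR_Models")

soup = BeautifulSoup(xml, "html.parser")


def add_directory(soup, directory):
    """Prefix every "File Name" value with directory and return the new paths."""
    changed = []
    for tag in soup.find_all("parameter", {"name": "File Name"}):
        # basename() guards against prefixing a value that already has a path
        base = os.path.basename(tag["value"])
        tag["value"] = os.path.join(directory, base)
        changed.append(tag["value"])
    return changed


new_paths = add_directory(soup, file_name_replacement_dir)
print(new_paths)
```

Only the parameters named "File Name" are touched; the other parameters in each group keep their original values.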

Answer 2

Your selector for find_all is wrong; you need to separate the tag name and the attribute, like so:

find_all("Parameter",{'name':'File Name'})

That will get you all the file name tags directly. If you really need the parent tag, then pass in "ParameterGroup" without the attribute dictionary.
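
A rough sketch of moving between a file name tag and its parent group, assuming the 'ParameterGroup' tags are closed and the document is parsed with html.parser (hence the lower-case names). The 'Experiment_21' group and 'first.txt' are invented for the example.

```python
from bs4 import BeautifulSoup

# 'Experiment_21' and 'first.txt' are made-up values for illustration.
xml = """
<ParameterGroup name="Experiment_21">
  <Parameter name="File Name" type="file" value="first.txt"/>
</ParameterGroup>
<ParameterGroup name="Experiment_22">
  <Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
</ParameterGroup>
"""

soup = BeautifulSoup(xml, "html.parser")

# Tag name and attribute filter are passed as separate arguments.
group = soup.find("parametergroup", {"name": "Experiment_22"})
file_tag = group.find("parameter", {"name": "File Name"})
print(file_tag["value"])

# Going the other way: start from each file tag and look up its parent group.
experiments = {}
for tag in soup.find_all("parameter", {"name": "File Name"}):
    parent = tag.find_parent("parametergroup")
    experiments[parent["name"]] = tag["value"]
print(experiments)
```

Starting from the group is convenient when only one experiment matters; starting from the file tags is simpler when all 44 need the same treatment.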

I'm not sure whether BeautifulSoup requires lower-casing your tags; you may have to experiment with that.
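
One way to run that experiment, as a minimal sketch: parse a closed copy of the snippet with Python's built-in html.parser and compare a capitalised search with a lower-case one. The choice of parser is an assumption here; a different parser may treat case differently.

```python
from bs4 import BeautifulSoup

xml = (
    '<ParameterGroup name="Experiment_22">'
    '<Parameter name="File Name" type="file" '
    'value="Cyp26A1_atRA_minus_tet_plus.txt"/>'
    '</ParameterGroup>'
)

# html.parser is an HTML parser, so it stores tag names in lower case.
soup = BeautifulSoup(xml, "html.parser")

upper_hits = soup.find_all("Parameter", {"name": "File Name"})
lower_hits = soup.find_all("parameter", {"name": "File Name"})

# Compare how many tags each spelling finds, and how the name is stored.
print(len(upper_hits), len(lower_hits))
print([tag.name for tag in lower_hits])
```

If the capitalised search comes back empty while the lower-case one finds the tag, the parser in use has lower-cased the tag names.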

Answer 1: score 4, accepted, 2015-06-26 09:56:43

Answer 2: score 2, 2015-06-26 09:43:12
