Identify and replace elements of XML using BeautifulSoup in Python
I am trying to use BeautifulSoup4 to find and replace specific elements within an XML file. More specifically, I want to find all instances of 'file_name' (in the example below the file name is 'Cyp26A1_atRA_minus_tet_plus.txt') and replace each one with the full path for that document, which is saved in the 'file_name_replacement_dir' variable. My first task, and the bit I'm stuck on, is to isolate the section of interest so that I can replace it using the replaceWith() method.
The XML
<ParameterGroup name="Experiment_22">
<Parameter name="Data is Row Oriented" type="bool" value="1"/>
<Parameter name="Experiment Type" type="unsignedInteger" value="0"/>
<Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
<Parameter name="First Row" type="unsignedInteger" value="1"/>
There are actually 44 experiments with 4 different file names (11 with file name 1, 11 with file name 2, and so on). So the above snippet of XML is repeated 44 times, just with a different file stored in the "File Name" line.
My code so far
from bs4 import BeautifulSoup
xml_dir = r'D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models\Model_Line_2'
xml_file_name = 'RARa_RXR_M22.cps'
xml = xml_dir + '\\' + xml_file_name
file_name_replacement_dir = r'D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models'
soup = BeautifulSoup(open(xml))
print(soup.find_all('parametergroup name="Experiment_22"'))
This, however, returns an empty list. I've also tried a few other functions in place of 'soup.find_all()' but still haven't been able to get a handle on the file name. Does anybody know how to do what I'm trying to do?
xml = '<ParameterGroup name="Experiment_22">\
<Parameter name="Data is Row Oriented" type="bool" value="1"/>\
<Parameter name="Experiment Type" type="unsignedInteger" value="0"/>\
<Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>\
<Parameter name="First Row" type="unsignedInteger" value="1"/>\
</ParameterGroup>'
from bs4 import BeautifulSoup
import os
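# The built-in HTML parser lowercases tag names, hence "parameter" below.
soup = BeautifulSoup(xml, 'html.parser')
for tag in soup.find_all("parameter", {'name': 'File Name'}):
    # Prefix the bare file name with the target directory.
    tag['value'] = os.path.join('new_dir', tag['value'])
print(soup)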
Note that the tag name is given as parameter, in lower case, because BeautifulSoup's HTML parsers convert tag names to lower case. Use os.path to manipulate paths so that the code works across platforms.

Your selector for find_all is wrong. You need to separate the tag name and the attributes, like so:
find_all("Parameter",{'name':'File Name'})
That will get you all the file name tags directly. If you really need the parent tag, pass in "ParameterGroup" without the attribute dictionary.
I'm not sure whether BeautifulSoup requires lowercase tag names; you may have to experiment with that.
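On the lowercase question, here is a minimal sketch of that experiment, assuming Python's built-in "html.parser" is the parser in use (the question does not say which one was picked up). It compares a mixed-case and a lowercase tag name in find_all, then walks from the parent group down to its "File Name" parameter. The directory "new_dir" is a hypothetical stand-in for file_name_replacement_dir, and the XML is a shortened copy of the snippet from the question.

```python
import os

from bs4 import BeautifulSoup

# Shortened copy of the snippet from the question.
xml = """<ParameterGroup name="Experiment_22">
<Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
<Parameter name="First Row" type="unsignedInteger" value="1"/>
</ParameterGroup>"""

# Assumption: the built-in HTML parser, which lowercases tag names.
soup = BeautifulSoup(xml, "html.parser")

# Tag name and attribute filter are passed separately.
mixed_case = soup.find_all("Parameter", {"name": "File Name"})
lower_case = soup.find_all("parameter", {"name": "File Name"})

# Hypothetical target directory standing in for file_name_replacement_dir.
replacement_dir = "new_dir"
for tag in lower_case:
    tag["value"] = os.path.join(replacement_dir, tag["value"])

# Going through the parent group instead of searching the whole document.
group = soup.find("parametergroup", {"name": "Experiment_22"})
file_param = group.find("parameter", {"name": "File Name"})

print(len(mixed_case), len(lower_case))
print(file_param["value"])
```

With an HTML parser, only the lowercase name should match, and the serialized output also has lowercase tag names, which matters if the result is written back to a case-sensitive XML file. Passing "xml" as the parser (this requires lxml to be installed) keeps the original casing, in which case "Parameter" and "ParameterGroup" would be the names to search for.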