BeautifulSoup: lxml navigation for tags with "-" in the tag name?

Question

<body>
<response status="success">
<policy>
<shared/>
<panorama>
<address>
    <entry name="text">
    <tag1></tag1>>
    <tag2></tag2>
    </entry>>
</address>
<service>
....
</service>
<pre-rulebase>
</pre-rulebase>
<security>
<rules>
    <entry name="some text">
    <tag1>text</tag1>>
    <tag2>text</tag2>
    </entry>
    <entry name="more text">
    <tag1>text</tag1>
    <tag2>text</tag2>
    </entry>
    ...
    </rules>
</security>
<post-rulebase>
    <entry name="some text">
    <tag1>text</tag1>>
    <tag2>text</tag2>
    </entry>
    <entry name="more text">
    <tag1>text</tag1>>
    <tag2>text</tag2>
    </entry>
</post-rulebase>
</panorama>
</policy>
</response> 
</body>

Hi,

I am trying to parse the XML file above using Python with BeautifulSoup and lxml. Usually I navigate to an element using '.', e.g.:

from bs4 import BeautifulSoup
with open('sample.xml', 'r') as xml_file:
    soup = BeautifulSoup(xml_file, 'lxml')

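# find_all() returns every <entry>; iterating over find() would walk the children of the first one
for item in soup.body.response.policy.panorama.address.find_all('entry'):
    pass  # some code action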

My problem is navigating this way to tags like '<pre-rulebase>' and '<post-rulebase>'. Since there is a "-" in the tag name, the "." navigation does not work. Also, since the child tags have the same names, I cannot use find directly. How can I navigate to and iterate through the tags under '<post-rulebase>', i.e. the '<entry>' tags?

Answer 1

You can probably do it like this:

from lxml import etree
rules = """[your xml, fixed]"""
doc = etree.XML(rules)
for i in doc.xpath('//post-rulebase//entry'):
    print(i.tag,i.attrib['name'])
    for t in i.xpath('.//*'):
        print(t.tag,t.text)

Output:

entry some text
tag1 text
tag2 text
entry more text
tag1 text
tag2 text
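
The same navigation can probably be done in BeautifulSoup itself. The dotted form fails because Python reads soup.post-rulebase as soup.post minus rulebase, so the hyphenated name has to be passed as a string to find(). The snippet below is only a sketch: the XML is a trimmed, cleaned-up stand-in for the file in the question, the names post and rows are made up for illustration, and html.parser is an assumption chosen so that the snippet does not depend on lxml (the 'lxml' parser from the question should behave the same way for this input).

```python
from bs4 import BeautifulSoup

# Trimmed, well-formed stand-in for the XML in the question (assumption).
xml = """
<policy>
  <pre-rulebase></pre-rulebase>
  <post-rulebase>
    <entry name="some text"><tag1>text</tag1><tag2>text</tag2></entry>
    <entry name="more text"><tag1>text</tag1><tag2>text</tag2></entry>
  </post-rulebase>
</policy>
"""

# html.parser is used here only to avoid the lxml dependency (assumption).
soup = BeautifulSoup(xml, "html.parser")

# Dotted access works for plain names; the hyphenated name goes through find().
post = soup.policy.find("post-rulebase")

rows = []
for entry in post.find_all("entry"):
    # find_all(True, recursive=False) yields only the direct child tags.
    children = [(child.name, child.get_text())
                for child in entry.find_all(True, recursive=False)]
    rows.append((entry["name"], children))
    print(entry["name"], children)
```

Scoping the search to the result of find("post-rulebase") is what keeps the identically named entry tags under address or rules out of the loop.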

Answer 2

Another method.

from simplified_scrapy import SimplifiedDoc, req, utils
html = '''
<address>
    <entry name="text">
    <tag1></tag1>>
    <tag2></tag2>
    </entry>>
</address>
<service>
....
</service>
<post-rulebase>
    <entry name="some text">
    <tag1>text</tag1>>
    <tag2>text</tag2>
    </entry>
    <entry name="more text">
    <tag1>text</tag1>>
    <tag2>text</tag2>
    </entry>
</post-rulebase>'''
doc = SimplifiedDoc(html)
entry = doc.select('post-rulebase').entry
print(entry)
print(entry.children.text)

Result:

{'name': 'some text', 'tag': 'entry', 'html': '\n    <tag1>text</tag1>>\n    <tag2>text</tag2>\n    '}
['text', 'text']

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
