Extract a specific tag from an XML file using Beautiful Soup in Python

I have an XML file (let's call it abc.xml) that looks like this:

<?xml version="1.0" encoding="UTF-8"?>

<properties>
  <product name="XYZ" version="123"/>
  <application-links>
    <application-links>
      <id>111111111111111</id>
      <name>Link_1</name>
      <primary>true</primary>
      <type>applinks.ABC</type>
      <display-url>http://ABC.displayURL</display-url>
      <rpc-url>http://ABC.displayURL</rpc-url>
    </application-links>
  </application-links>
</properties>

My Python code looks like this:

f = open ('file.xml', 'r')
from bs4 import BeautifulSoup
soup = BeautifulSoup(f,'lxml')

print(soup.product)

for applinks in soup.application-links:
    print(applinks)

which prints the following:

<product name="XYZ" version="123"></product>
Traceback (most recent call last):
  File "parse.py", line 7, in <module>
    for applinks in soup.application-links:
NameError: name 'links' is not defined

Can you please help me understand how to access and print tags whose names contain a dash/hyphen ('-')?

I don't know whether Beautiful Soup is the best option here, but I would suggest using Python's built-in xml.etree.ElementTree module, like so:

>>> import xml.etree.ElementTree as ET
>>> root = ET.parse('file.xml').getroot()
>>> for app in root.findall('*/application-links/*'):
...     print(app.text)
111111111111111
Link_1
true
applinks.ABC
http://ABC.displayURL
http://ABC.displayURL
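
The loop above prints only the text of each child, so it is not obvious which value belongs to which tag. As a minimal sketch, assuming the XML from the question is pasted into a string (the `xml_text` and `link` names are illustrative, not from the original post), the tag names can be kept alongside the values. The path is spelled with an explicit `*` as its last step, which selects every child element of the inner `application-links`:

```python
import xml.etree.ElementTree as ET

# The XML from the question, inlined so the example needs no file on disk.
xml_text = """<properties>
  <product name="XYZ" version="123"/>
  <application-links>
    <application-links>
      <id>111111111111111</id>
      <name>Link_1</name>
      <primary>true</primary>
      <type>applinks.ABC</type>
      <display-url>http://ABC.displayURL</display-url>
      <rpc-url>http://ABC.displayURL</rpc-url>
    </application-links>
  </application-links>
</properties>"""

root = ET.fromstring(xml_text)

# Hyphens are fine here because the tag name is part of a string path,
# not a Python attribute access.
link = {field.tag: field.text for field in root.findall("*/application-links/*")}

for tag, text in link.items():
    print(f"{tag}: {text}")
```

With the values in a dict, a single field can be looked up by its tag name, for example `link["display-url"]`.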

So, to print the value inside the <name> tag, you can do this:

>>> for app in root.findall('*/application-links/name'):
...     print(app.text)
Link_1
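
If you would rather stay with Beautiful Soup, the NameError in the question comes from Python itself: `soup.application-links` is parsed as `soup.application` minus a variable called `links`. Attribute-style access only works for tag names that are valid Python identifiers, so a hyphenated tag has to be looked up by passing its name as a string to `find()` or `find_all()`. The following is a minimal sketch under a few assumptions: the XML is inlined as a string, the `fields` name is illustrative, and the built-in `"html.parser"` is used so that lxml is not required (the question used `'lxml'`).

```python
from bs4 import BeautifulSoup

# The XML from the question, inlined so the example needs no file on disk.
xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <product name="XYZ" version="123"/>
  <application-links>
    <application-links>
      <id>111111111111111</id>
      <name>Link_1</name>
      <primary>true</primary>
      <type>applinks.ABC</type>
      <display-url>http://ABC.displayURL</display-url>
      <rpc-url>http://ABC.displayURL</rpc-url>
    </application-links>
  </application-links>
</properties>
"""

soup = BeautifulSoup(xml_text, "html.parser")

# Look the hyphenated tag up by name instead of soup.application-links.
outer = soup.find("application-links")
inner = outer.find("application-links")

# find_all(True, recursive=False) returns only the direct child tags.
fields = {child.name: child.get_text() for child in inner.find_all(True, recursive=False)}

for name, text in fields.items():
    print(f"{name}: {text}")
```

The same `find("application-links")` call works with the `'lxml'` parser from the question; only the way the tag is looked up needs to change.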
