How to remove specific tags in XML using Python
I have to remove some specific tags from the apache-tomcat web.xml file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
version="3.0">
<!-- ======================== Introduction ============================== -->
<!-- This document defines default values for *all* web applications -->
<!-- loaded into this instance of Tomcat. As each application is -->
<!-- deployed, this file is processed, followed by the -->
<!-- "/WEB-INF/web.xml" deployment descriptor from your own -->
<!-- applications. -->
<!-- -->
<!-- WARNING: Do not configure application-specific resources here! -->
<!-- They should go in the "/WEB-INF/web.xml" file in your application. -->
<servlet>
<servlet-name>default</servlet-name>
<servlet-class>org.apache.catalina.servlets.DefaultServlet</servlet-class>
<init-param>
<param-name>debug</param-name>
<param-value>0</param-value>
</init-param>
<init-param>
<param-name>listings</param-name>
<param-value>false</param-value>
</init-param>
<load-on-startup>1</load-on-startup>
</servlet>
<servlet>
<servlet-name>jsp</servlet-name>
<servlet-class>org.apache.jasper.servlet.JspServlet</servlet-class>
<init-param>
<param-name>fork</param-name>
<param-value>false</param-value>
</init-param>
<init-param>
<param-name>xpoweredBy</param-name>
<param-value>false</param-value>
</init-param>
<load-on-startup>3</load-on-startup>
</servlet>
<servlet>
<servlet-name>cgi</servlet-name>
<servlet-class>org.apache.catalina.servlets.CGIServlet</servlet-class>
<init-param>
<param-name>debug</param-name>
<param-value>0</param-value>
</init-param>
<init-param>
<param-name>cgiPathPrefix</param-name>
<param-value>WEB-INF/cgi</param-value>
</init-param>
<load-on-startup>5</load-on-startup>
</servlet>
</web-app>
I need to remove the entire servlet tag when servlet-name == cgi. My code is as follows:
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
servlets = tree.findall('servlet')
print "servlets : ",servlets
for servlet in servlets:
    servlet_names = foo.findall('servlet-name')
    for servlet_name in servlet_names:
        if servlet_name == "cgi" :
            print "servlet_name :", servlet_name
            servlet.remove(servlet-name)
The output I get is servlets : [] instead of all the servlets, so the for loop is never entered. Can anyone help me?
#!/usr/bin/python
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
ns = '{http://java.sun.com/xml/ns/javaee}'
servlets = root.findall(ns + 'servlet')
print "servlets : ",servlets
for servlet in servlets:
    servlet_names = servlet.findall(ns + 'servlet-name')
    for servlet_name in servlet_names:
        if servlet_name.text == "cgi" :
            print "servlet_name :", servlet_name.text
            print "removed the cgi serverlet", root.remove(servlet)
===== Output =====
servlets :  [<Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b35a8>, <Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b3878>, <Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b3bd8>]
servlet_name : cgi
removed the cgi serverlet None
==== Tracing with pdb, I found that the element (servlet) value is shown as \n.
> /apps/manu/python/manunamespace.py(10)<module>()
-> servlet_name=servlet.find('{http://java.sun.com/xml/ns/javaee}servlet-name')
(Pdb) servlet_name
<Element {http://java.sun.com/xml/ns/javaee}servlet-name at 882878>
(Pdb) servlet_name.text
'jsp'
(Pdb) n
> /apps/manu/python/manunamespace.py(11)<module>()
-> print "servlet_name:", servlet_name.text
(Pdb) servlet_name.text
'cgi'
(Pdb) servlet.text
'\n '
(Pdb) n
servlet_name: cgi
> /apps/manu/python/manunamespace.py(12)<module>()
-> if servlet_name.text == "cgi":
(Pdb) n
> /apps/manu/python/manunamespace.py(13)<module>()
-> print "remove the element"
(Pdb) n
remove the element
> /apps/manu/python/manunamespace.py(14)<module>()
-> print "remove : ",root.remove(servlet)
(Pdb) servlet
<Element {http://java.sun.com/xml/ns/javaee}servlet at 882d88>
(Pdb) servlet.text
'\n '
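The two values that look suspicious in the trace above (the None printed by root.remove(servlet) and the '\n ' shown for servlet.text) are probably expected behaviour rather than a failure. The sketch below, written for Python 3, illustrates this on a shortened stand-in document; the SAMPLE string and the variable names are illustrative and not taken from the original post.

```python
import xml.etree.ElementTree as ET

NS = "{http://java.sun.com/xml/ns/javaee}"

# Shortened, hypothetical stand-in for web.xml with a single servlet.
SAMPLE = """<web-app xmlns="http://java.sun.com/xml/ns/javaee">
  <servlet>
    <servlet-name>cgi</servlet-name>
  </servlet>
</web-app>"""

root = ET.fromstring(SAMPLE)
servlet = root.find(NS + "servlet")

# .text holds only the character data before the first child element,
# which here is the newline and indentation that follow <servlet>.
leading_text = servlet.text

# remove() changes the parent in place and returns None.
return_value = root.remove(servlet)

# Looking the element up again shows whether the removal took effect.
still_present = root.find(NS + "servlet") is not None

print(repr(leading_text), return_value, still_present)
```

To confirm a removal, it is therefore more reliable to inspect the parent afterwards than to print the return value of remove().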
This is failing:
servlets = tree.findall('servlet')
because there is no element named servlet in your document. The root element specifies:
xmlns="http://java.sun.com/xml/ns/javaee"
This means that, unless stated otherwise, all elements are in this XML namespace. So you want:
>>> tree.findall('{http://java.sun.com/xml/ns/javaee}servlet')
[<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec681b8>,
<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec68200>,
<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec682d8>]
>>>
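The effect of the default namespace can be reproduced on a small inline document. This is a minimal sketch for Python 3; the SAMPLE string is a shortened stand-in for web.xml, and the NS constant is an illustrative name, not something from the original post.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for web.xml; only the parts relevant
# to the lookup are kept.
SAMPLE = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>jsp</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

NS = "{http://java.sun.com/xml/ns/javaee}"

root = ET.fromstring(SAMPLE)

# The root tag itself carries the namespace in {uri}name form.
print(root.tag)

# An unqualified name matches nothing: no element is named plain "servlet".
unqualified = root.findall("servlet")

# The qualified name matches all three servlet elements.
qualified = root.findall(NS + "servlet")

print(len(unqualified), len(qualified))
```

The empty list for the unqualified lookup is the same symptom as the servlets : [] output in the question.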
You cannot find the tags you are searching for because they are in the default namespace (http://java.sun.com/xml/ns/javaee).
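Also, if you want to test the content of an element, you need to use its text attribute rather than comparing against the element itself. And when it matches, you need to remove the servlet tag from the root, not the servlet-name tag from the servlet tag.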
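Both points can be checked in isolation. The following Python 3 sketch uses a shortened stand-in document (SAMPLE, NS, comparisons and remaining are illustrative names, not from the original post) to show that comparing an element with a string is always false, while comparing its text works, and that the servlet is removed through its direct parent.

```python
import xml.etree.ElementTree as ET

NS = "{http://java.sun.com/xml/ns/javaee}"

# Shortened, hypothetical stand-in for web.xml.
SAMPLE = """<web-app xmlns="http://java.sun.com/xml/ns/javaee">
  <servlet><servlet-name>jsp</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

root = ET.fromstring(SAMPLE)

comparisons = []
# findall() returns a list, so removing from root inside the loop is safe.
for servlet in root.findall(NS + "servlet"):
    name_element = servlet.find(NS + "servlet-name")
    # First value: element compared with a string (never equal).
    # Second value: the element's text compared with a string.
    comparisons.append((name_element == "cgi", name_element.text == "cgi"))
    if name_element.text == "cgi":
        # The servlet is a direct child of root, so root does the removing.
        root.remove(servlet)

remaining = [s.find(NS + "servlet-name").text
             for s in root.findall(NS + "servlet")]
print(comparisons, remaining)
```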
Try this:
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
nsmap = {'jee': 'http://java.sun.com/xml/ns/javaee'}
servlets = root.findall('jee:servlet', nsmap)
print "servlets : ",servlets
for servlet in servlets:
    servlet_names = servlet.findall('jee:servlet-name', nsmap)
    for servlet_name in servlet_names:
        if servlet_name.text == "cgi" :
            print "servlet_name :", servlet_name.text
            root.remove(servlet)
Or, more efficiently, using the supported XPath syntax:
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
nsmap = {'jee': 'http://java.sun.com/xml/ns/javaee'}
for servlet in root.findall("./jee:servlet[jee:servlet-name='cgi']", nsmap):
    root.remove(servlet)
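The snippets in this answer only change the tree in memory; nothing is written back. As a rough end-to-end sketch in Python 3, the same XPath lookup can be wrapped in a function that also serializes the result. The function name remove_servlet, the SAMPLE string and the constants are hypothetical, and the call to register_namespace is an addition that is not part of the original answer.

```python
import xml.etree.ElementTree as ET

JEE_URI = "http://java.sun.com/xml/ns/javaee"
NSMAP = {"jee": JEE_URI}

# Shortened, hypothetical stand-in for web.xml.
SAMPLE = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>jsp</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""


def remove_servlet(xml_text, servlet_name):
    """Return xml_text without the top-level servlets named servlet_name."""
    # Serialize with a default namespace instead of generated ns0: prefixes.
    # Note that this updates a registry shared by the whole module.
    ET.register_namespace("", JEE_URI)
    root = ET.fromstring(xml_text)
    # Assumes servlet_name contains no single quote, since it is pasted
    # directly into the path expression.
    path = "./jee:servlet[jee:servlet-name='%s']" % servlet_name
    for servlet in root.findall(path, NSMAP):
        root.remove(servlet)
    return ET.tostring(root, encoding="unicode")


result = remove_servlet(SAMPLE, "cgi")
print(result)
```

When working on the file itself, the equivalent would be to parse it with ET.parse('web.xml') and call tree.write(...) after the removal. Keep in mind that the default parser discards XML comments, so the comment block at the top of web.xml would not survive such a round trip.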
Edit: for older Python versions (tested with Python 2.5):
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
ns = '{http://java.sun.com/xml/ns/javaee}'
servlets = root.findall(ns + 'servlet')
print "servlets : ",servlets
for servlet in servlets:
    servlet_names = servlet.findall(ns + 'servlet-name')
    for servlet_name in servlet_names:
        if servlet_name.text == "cgi" :
            print "servlet_name :", servlet_name.text
            root.remove(servlet)