
How to remove specific tags in XML using Python

I have to remove a specific tag from Apache Tomcat web.xml files.

web.xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
                      http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
  version="3.0">

  <!-- ======================== Introduction ============================== -->
  <!-- This document defines default values for *all* web applications      -->
  <!-- loaded into this instance of Tomcat.  As each application is         -->
  <!-- deployed, this file is processed, followed by the                    -->
  <!-- "/WEB-INF/web.xml" deployment descriptor from your own               -->
  <!-- applications.                                                        -->
  <!--                                                                      -->
  <!-- WARNING:  Do not configure application-specific resources here!      -->
  <!-- They should go in the "/WEB-INF/web.xml" file in your application.   -->

     <servlet>
        <servlet-name>default</servlet-name>
        <servlet-class>org.apache.catalina.servlets.DefaultServlet</servlet-class>
        <init-param>
            <param-name>debug</param-name>
            <param-value>0</param-value>
        </init-param>
        <init-param>
            <param-name>listings</param-name>
            <param-value>false</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>
   <servlet>
        <servlet-name>jsp</servlet-name>
        <servlet-class>org.apache.jasper.servlet.JspServlet</servlet-class>
        <init-param>
            <param-name>fork</param-name>
            <param-value>false</param-value>
        </init-param>
        <init-param>
            <param-name>xpoweredBy</param-name>
            <param-value>false</param-value>
        </init-param>
        <load-on-startup>3</load-on-startup>
    </servlet>

    <servlet>
        <servlet-name>cgi</servlet-name>
        <servlet-class>org.apache.catalina.servlets.CGIServlet</servlet-class>
        <init-param>
          <param-name>debug</param-name>
          <param-value>0</param-value>
        </init-param>
        <init-param>
          <param-name>cgiPathPrefix</param-name>
          <param-value>WEB-INF/cgi</param-value>
        </init-param>
         <load-on-startup>5</load-on-startup>
    </servlet>
</web-app>

If servlet-name == cgi, I need to remove the entire servlet tag. My code is as follows:

    from xml.etree.ElementTree import ElementTree
    tree = ElementTree()
    tree.parse('web.xml')
    servlets = tree.findall('servlet')
    print "servlets : ",servlets
    for servlet in servlets:
      servlet_names = foo.findall('servlet-name')
      for servlet_name  in servlet_names:
            if servlet_name == "cgi" :
                    print "servlet_name :", servlet_name
                    servlet.remove(servlet-name)

The output is servlets : [] instead of the list of all servlets, so the for loop is never entered. Can anyone help me?

I am not getting any exception.

#!/usr/bin/python
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
ns = '{http://java.sun.com/xml/ns/javaee}'
servlets = root.findall(ns + 'servlet')
print "servlets : ",servlets
for servlet in servlets:
  servlet_names = servlet.findall(ns + 'servlet-name')
  for servlet_name  in servlet_names:
        if servlet_name.text == "cgi" :
                print "servlet_name :", servlet_name.text
                print "removed the cgi serverlet", root.remove(servlet)

=====output===============
servlets : [<Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b35a8>, <Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b3878>, <Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b3bd8>]
servlet_name : cgi
removed the cgi serverlet None

I used pdb to inspect the servlet element, and its text shows up as '\n' followed by spaces:

> /apps/manu/python/manunamespace.py(10)<module>()
-> servlet_name=servlet.find('{http://java.sun.com/xml/ns/javaee}servlet-name')
(Pdb) servlet_name
<Element {http://java.sun.com/xml/ns/javaee}servlet-name at 882878>
(Pdb) servlet_name.text
'jsp'
(Pdb) n
> /apps/manu/python/manunamespace.py(11)<module>()
-> print "servlet_name:", servlet_name.text
(Pdb) servlet_name.text
'cgi'
(Pdb) servlet.text
'\n        '
(Pdb) n
servlet_name: cgi
> /apps/manu/python/manunamespace.py(12)<module>()
-> if servlet_name.text == "cgi":
(Pdb) n
> /apps/manu/python/manunamespace.py(13)<module>()
-> print "remove the element"
(Pdb) n
remove the element
> /apps/manu/python/manunamespace.py(14)<module>()
-> print "remove : ",root.remove(servlet)
(Pdb) servlet
<Element {http://java.sun.com/xml/ns/javaee}servlet at 882d88>
(Pdb) servlet.text
'\n        '

This is failing:

servlets = tree.findall('servlet')

It fails because your document contains no elements whose name is plain servlet. The root element specifies:

xmlns="http://java.sun.com/xml/ns/javaee"

This means that all elements, unless otherwise specified, are in this XML namespace. So you want:

>>> tree.findall('{http://java.sun.com/xml/ns/javaee}servlet')
[<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec681b8>,
<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec68200>, 
<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec682d8>]
>>> 
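As a minimal sketch of the point above, the following shows how the parser stores namespaced tags and why the bare name finds nothing. It assumes Python 3 and uses a trimmed-down, hypothetical stand-in for web.xml (the WEB_XML string) rather than the real file:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the Tomcat web.xml shown in the question.
WEB_XML = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

root = ET.fromstring(WEB_XML)

# The parser stores every tag as "{namespace-uri}local-name".
print(root.tag)

# A bare name only matches elements that are in no namespace, so this finds nothing.
unqualified = root.findall('servlet')

# Prefixing the namespace URI in braces matches the tag names as stored.
NS = '{http://java.sun.com/xml/ns/javaee}'
qualified = root.findall(NS + 'servlet')

print(len(unqualified), len(qualified))
```

Printing root.tag on the real file is a quick way to check which namespace, if any, a document uses before writing the findall calls.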

You are not finding the tags you are searching for because they are in the default namespace ( http://java.sun.com/xml/ns/javaee ).

Also, if you want to test an element's content, you need to compare its text attribute, not the element itself. If it matches, you need to remove the servlet tag from the root, not the servlet-name tag from the servlet.
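The two points can be checked in isolation with a small sketch. It assumes Python 3 and a hypothetical two-servlet document (WEB_XML) in place of the real web.xml; the variable names are illustrative only:

```python
import xml.etree.ElementTree as ET

NS = '{http://java.sun.com/xml/ns/javaee}'

# Hypothetical, trimmed-down stand-in for the Tomcat web.xml shown in the question.
WEB_XML = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <servlet>
    <servlet-name>default</servlet-name>
  </servlet>
  <servlet>
    <servlet-name>cgi</servlet-name>
  </servlet>
</web-app>"""

root = ET.fromstring(WEB_XML)
cgi_servlet = root.findall(NS + 'servlet')[1]
name = cgi_servlet.find(NS + 'servlet-name')

# An Element object is never equal to a string; its content lives in .text.
element_matches = (name == 'cgi')
text_matches = (name.text == 'cgi')

# The servlet element's own .text is only the whitespace before its first child.
whitespace_only = (cgi_servlet.text.strip() == '')

# remove() is called on the direct parent and returns None on success.
result = root.remove(cgi_servlet)

remaining = [s.find(NS + 'servlet-name').text for s in root.findall(NS + 'servlet')]
print(element_matches, text_matches, whitespace_only, result, remaining)
```

This also accounts for two things seen in the question: printing the return value of root.remove(servlet) shows None even when the removal worked, and servlet.text being whitespace does not mean the element is empty.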

Try this:

from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
nsmap = {'jee': 'http://java.sun.com/xml/ns/javaee'}
servlets = root.findall('jee:servlet', nsmap)
print "servlets : ",servlets
for servlet in servlets:
    servlet_names = servlet.findall('jee:servlet-name', nsmap)
    for servlet_name in servlet_names:
        if servlet_name.text == "cgi":
            print "servlet_name :", servlet_name.text
            root.remove(servlet)

Or, more efficiently, using the supported XPath syntax:

from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
nsmap = {'jee': 'http://java.sun.com/xml/ns/javaee'}
for servlet in root.findall("./jee:servlet[jee:servlet-name='cgi']", nsmap):
    root.remove(servlet)
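None of the snippets here saves the result. A minimal sketch of one way to serialize the modified tree follows, assuming Python 3 and a hypothetical trimmed-down document (WEB_XML) in place of the real file. The register_namespace call is an addition that is not part of the answer above; it maps the namespace URI to the empty prefix so the output keeps xmlns="..." instead of generated ns0: prefixes:

```python
import xml.etree.ElementTree as ET

URI = 'http://java.sun.com/xml/ns/javaee'

# Hypothetical, trimmed-down stand-in for the Tomcat web.xml shown in the question.
WEB_XML = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

# Serialize this namespace as the default one (no prefix) in the output.
ET.register_namespace('', URI)

root = ET.fromstring(WEB_XML)
nsmap = {'jee': URI}

# findall returns a list, so removing from root inside the loop is safe.
for servlet in root.findall("./jee:servlet[jee:servlet-name='cgi']", nsmap):
    root.remove(servlet)

# Serialize to a string here; a real script would write to a file instead.
output = ET.tostring(root, encoding='unicode')
print(output)
```

For a real file, the equivalent would be to parse with ET.parse('web.xml') and call tree.write('web.xml', encoding='ISO-8859-1', xml_declaration=True) after the removal. Note that the default parser drops XML comments, so the comment block at the top of web.xml would not survive the round trip.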

Edit: For older Python versions (tested with Python 2.5):

from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
ns = '{http://java.sun.com/xml/ns/javaee}'
servlets = root.findall(ns + 'servlet')
print "servlets : ",servlets
for servlet in servlets:
    servlet_names = servlet.findall(ns + 'servlet-name')
    for servlet_name in servlet_names:
        if servlet_name.text == "cgi":
            print "servlet_name :", servlet_name.text
            root.remove(servlet)
