Using a regular expression to remove selected characters from XML tags

Question

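I am trying to remove only selected characters from XML tags: ns, plus whatever digit follows it, plus the : after that. For example, <ns2:projectArea alias= should become <projectArea alias= and <ns9:name> should become <name>.

Basically, the digit is random (any digit from 1 to 9), and it is always followed by a : that has to be removed as well.

So far I have: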

import argparse
import re

# Initiates argument
parser = argparse.ArgumentParser()

parser.add_argument("--input", "-i", help="Set the input xml to clean up")
parser.add_argument("--output", "-o", help="Set the output xml location")

args = parser.parse_args()
inputfile = args.input
outputfile = args.output
if args.input:
  print("inputfile location is %s" % args.input)
if args.output:
  print("outputfile location is %s" % args.output)
# End argument

text = re.sub('<[^<]+>', "", open(inputfile).read())
with open(outputfile, "w") as f:
    f.write(text)

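This part of the code is the problem: '<[^<]+>' removes the entire tag, so if I need to search the text later, I am essentially searching plain text rather than tags.

What can I replace '<[^<]+>' with so that it removes ns + the digit that follows (which could be any digit) + the trailing : ?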

Answer 1

This is happening because of the regular expression. Try the following regex instead:

   text = re.sub('^<[a-zA-Z0-9]+:','<',open(inputfile).read())
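One caveat is worth checking before relying on this pattern: without re.MULTILINE, ^ matches only at the very start of the string, so only a tag at position 0 is rewritten, and closing tags such as </ns9:name> are not covered. The sketch below compares the pattern as posted with an unanchored variant. The variant and the sample string are illustrative assumptions, not part of the original answer.

```python
import re

# Illustrative input (an assumption, not taken from the question's real file).
sample = '<ns2:projectArea alias="a"><ns9:name>x</ns9:name></ns2:projectArea>'

# Pattern as posted: '^' anchors the match to the start of the string,
# so only the first tag is rewritten.
anchored = re.sub('^<[a-zA-Z0-9]+:', '<', sample)

# Hypothetical variant: no anchor, and the optional '/' is captured so that
# closing tags keep their slash after the prefix is dropped.
unanchored = re.sub(r'<(/?)ns[0-9]+:', r'<\1', sample)

print(anchored)
print(unanchored)
```

Note that the unanchored variant rewrites every <nsN: sequence it finds, including any that happen to appear inside text content, so it is best treated as a quick cleanup rather than a general XML transformation.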

Answer 2

This works:

Find: r"<(?:(?:(/?)\w+[1-9]:(\w+\s*/?))|(?:\w+[1-9]:(\w+\s+(?:\"[\S\s]*?\"|'[\S\s]*?'|[^>]?)+\s*/?)))>"
Replace: <$1$2$3>

https://regex101.com/r/yRhMI9/1

Readable version:

 <
 (?:
      (?:
           ( /? )                        # (1)
           \w+ [1-9] :
           ( \w+ \s* /? )                # (2)
      )
   |  (?:
           \w+ [1-9] :
           (                             # (3 start)
                \w+ \s+ 
                (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]? )+
                \s* /?
           )                             # (3 end)
      )
 )
 >
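In Python, the replacement is written with \1-style backreferences rather than $1. The following is a minimal sketch of how the pattern might be applied, assuming Python 3.5 or later (where groups that did not take part in the match are substituted as empty strings) and an illustrative sample string that is not from the question.

```python
import re

# Pattern from this answer, split across two raw-string literals for length.
pattern = re.compile(
    r"<(?:(?:(/?)\w+[1-9]:(\w+\s*/?))"
    r"|(?:\w+[1-9]:(\w+\s+(?:\"[\S\s]*?\"|'[\S\s]*?'|[^>]?)+\s*/?)))>"
)

# Illustrative input (an assumption).
sample = '<ns2:projectArea alias="a"><ns9:name>x</ns9:name></ns2:projectArea>'

# Groups 1 and 2 cover tags without attributes; group 3 covers tags with attributes.
cleaned = pattern.sub(r"<\1\2\3>", sample)
print(cleaned)
```

Because the prefix is matched with \w+[1-9]: rather than a literal ns, this pattern strips any prefix that ends in a digit from 1 to 9, not only the ns ones.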

Answer 3

Regex: (?:(?<=<)|(?<=<\/))(ns[0-9]+:)(?=[^>]*?>)

Demo
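A short sketch of how this pattern could be used from Python, replacing each matched prefix with an empty string. The helper name strip_ns_prefixes and the sample string are hypothetical and used only for illustration. Python's re module requires fixed-width lookbehinds, which holds here because each lookbehind is a separate alternative.

```python
import re

# Pattern from this answer: matches "nsN:" only when it directly follows
# "<" or "</" and a closing ">" appears later in the text.
NS_PREFIX = re.compile(r"(?:(?<=<)|(?<=<\/))(ns[0-9]+:)(?=[^>]*?>)")


def strip_ns_prefixes(text):
    """Hypothetical helper: drop nsN: prefixes from opening and closing tags."""
    return NS_PREFIX.sub("", text)


# Illustrative input (an assumption); "ns3:" inside the text content stays as-is
# because it does not follow "<" or "</".
sample = '<ns2:projectArea alias="a"><ns9:name>ns3: x</ns9:name></ns2:projectArea>'
cleaned = strip_ns_prefixes(sample)
print(cleaned)
```

In the script from the question, the re.sub call could then be swapped for a call to this helper on the file contents.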

Answer 1
1 2018-05-18 20:19:14

Answer 2
0

Answer 3
0 2018-05-19 01:54:04