Python and regex to remove parentheses in a file

I have an .xml file with about 2000 (texthere) parentheses. I need to remove the parentheses and the text within them. I tried, but I am getting an error :(

import re, sys

fileName = (sys.argv[2])

with open(fileName) as f:
    input = f.read()
    output = re.sub(r'\(\w*\)', '', input)
    print fileName + " cleaned of all parenthesis"

And my error:

Traceback (most recent call last):
  File "/Users/eeamesX/work/data-scripts/removeParenFromXml.py", line 4, in <module>
    fileName = (sys.argv[2])
IndexError: list index out of range

I changed it to (sys.argv[1]). Now I get no errors, but the parentheses in my file.xml do not get removed?

Since you're calling the script as follows:

python removeparenthesis.py filename.xml

the XML file name will appear in sys.argv[1] (sys.argv[0] is the script name itself).

Also, you could use lazy matching in your pattern, although with \w* it does not change the result, since \w can never match ")" anyway:

r'\(\w*?\)'    # notice the ?
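A quick way to see what the original pattern actually matches is to run it and its lazy variant side by side. The sample string below is made up for illustration; it is not from the question's file.

```python
import re

sample = "keep (drop) this (and drop this) too"

# \w matches only letters, digits and underscore, so it can never
# consume ")" or a space; greedy and lazy therefore behave the same.
greedy = re.sub(r'\(\w*\)', '', sample)
lazy = re.sub(r'\(\w*?\)', '', sample)

print(repr(greedy))
print(repr(lazy))
```

Both calls give the same result, and the group containing spaces survives untouched, which is one reason parentheses can be left behind in the file.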

A better pattern, which matches any run of characters other than ")" (spaces included), would be:

r'\([^)]*\)'
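Putting the fixes together, a minimal sketch of a corrected script might look like this. Besides reading the file name from sys.argv[1] and using the [^)]* pattern, it writes the result back to disk: the original script only stored the result in output, so the file itself never changed. It is written for Python 3 (the question's print statement is Python 2 syntax), assumes the file is UTF-8, and overwrites it in place, so keep a backup. The names remove_parens and main are illustrative.

```python
import re
import sys

# "(" followed by any run of characters other than ")" (spaces and
# newlines included), then ")". Nested groups are not handled.
PAREN_RE = re.compile(r'\([^)]*\)')


def remove_parens(text):
    """Return text with every (...) group removed (no nesting support)."""
    return PAREN_RE.sub('', text)


def main(argv):
    # argv[0] is the script itself; the XML file name is argv[1].
    if len(argv) < 2:
        print("usage: python removeparenthesis.py filename.xml")
        return 1
    file_name = argv[1]
    # Assumption: the file is UTF-8 encoded.
    with open(file_name, encoding='utf-8') as f:
        text = f.read()
    cleaned = remove_parens(text)
    # re.sub returns a new string; it must be written back explicitly.
    # Note: this overwrites the original file in place.
    with open(file_name, 'w', encoding='utf-8') as f:
        f.write(cleaned)
    print(file_name + " cleaned of all parentheses")
    return 0


if __name__ == '__main__':
    main(sys.argv)
```

Run it exactly as before: python removeparenthesis.py filename.xml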

Do you have nested parens?

stuff (words (inside (other) words) eww)

Will you have multiple groups of parens?

stuff (first group) stuff (second group)

Does text within parens have spaces?

stuff (single_word)
stuff (multiple words)

A simple regex could be \(.*?\), although you'll see that nested parens are not caught correctly (which is fine if you do NOT expect nested parens):

https://regex101.com/r/kB2lU1/1
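To see how the lazy \(.*?\) pattern treats the three kinds of input listed above, here is an illustrative check on those sample lines (not part of the original answer):

```python
import re

# "(" then as few characters as possible, then the first ")".
LAZY_ANY = re.compile(r'\(.*?\)')

samples = [
    "stuff (words (inside (other) words) eww)",  # nested
    "stuff (first group) stuff (second group)",  # several groups
    "stuff (multiple words)",                    # spaces inside
]

results = [LAZY_ANY.sub('', s) for s in samples]
for before, after in zip(samples, results):
    print(repr(before), '->', repr(after))
```

The nested line comes out as 'stuff  words) eww)', because each match stops at the first ")" it reaches. Note also that . does not match a newline unless you pass re.DOTALL, so a group split across lines in the XML would not be removed by this pattern.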

Edit:

https://regex101.com/r/kB2lU1/2 may be able to handle some of those nested parens, but it may still break on various edge cases.
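As an alternative to the pattern behind that link (a different technique, not what the link contains): Python's re module has no recursive patterns, but nested groups can be stripped by repeatedly removing the innermost group, one that contains no parentheses itself, until nothing matches. A sketch, with remove_nested_parens as a hypothetical helper name:

```python
import re

# Innermost group: "(" then anything except parentheses, then ")".
INNERMOST = re.compile(r'\([^()]*\)')


def remove_nested_parens(text):
    """Strip (...) groups, including nested ones, innermost first."""
    while True:
        # subn returns (new_text, number_of_substitutions).
        text, count = INNERMOST.subn('', text)
        if count == 0:
            return text


print(repr(remove_nested_parens("stuff (words (inside (other) words) eww)")))
```

Each pass peels off one level of nesting, so 'stuff (words (inside (other) words) eww)' becomes 'stuff ' after three substitution passes. Unbalanced parentheses, such as a stray "(" with no closing partner, are left in place.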

You'll need to specify what kinds of edge cases you expect so the answer can be better tailored to your needs.

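Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.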

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM