简体   繁体   中英

How can I remove text within parentheses with a regex?

I'm trying to handle a bunch of files, and I need to alter then to remove extraneous information in the filenames; notably, I'm trying to remove text inside parentheses. For example:

filename = "Example_file_(extra_descriptor).ext"

and I want to regex a whole bunch of files where the parenthetical expression might be in the middle or at the end, and of variable length.

What would the regex look like? Perl or Python syntax would be preferred.

s/\([^)]*\)//

So in Python, you'd do:

re.sub(r'\([^)]*\)', '', filename)

The pattern that matches substrings in parentheses having no other ( and ) characters in between (like (xyz 123) in Text (abc(xyz 123) ) is

\([^()]*\)

Details :

Removing code snippets:

  • JavaScript : string.replace(/\([^()]*\)/g, '')
  • PHP : preg_replace('~\([^()]*\)~', '', $string)
  • Perl : $s =~ s/\([^()]*\)//g
  • Python : re.sub(r'\([^()]*\)', '', s)
  • C# : Regex.Replace(str, @"\([^()]*\)", string.Empty)
  • VB.NET : Regex.Replace(str, "\([^()]*\)", "")
  • Java : s.replaceAll("\\([^()]*\\)", "")
  • Ruby : s.gsub(/\([^()]*\)/, '')
  • R : gsub("\\([^()]*\\)", "", x)
  • Lua : string.gsub(s, "%([^()]*%)", "")
  • Bash/sed : sed 's/([^()]*)//g'
  • Tcl : regsub -all {\([^()]*\)} $s "" result
  • C++ std::regex : std::regex_replace(s, std::regex(R"(\([^()]*\))"), "")
  • Objective-C :
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\([^()]*\\)" options:NSRegularExpressionCaseInsensitive error:&error]; NSString *modifiedString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@""];
  • Swift : s.replacingOccurrences(of: "\\([^()]*\\)", with: "", options: [.regularExpression])
  • Google BigQuery : REGEXP_REPLACE(col, "\\([^()]*\\)", "")

I would use:

\([^)]*\)

If you don't absolutely need to use a regex, use consider using Perl's Text::Balanced to remove the parenthesis.

use Text::Balanced qw(extract_bracketed);

my ($extracted, $remainder, $prefix) = extract_bracketed( $filename, '()', '[^(]*' );

{   no warnings 'uninitialized';

    $filename = (defined $prefix or defined $remainder)
                ? $prefix . $remainder
                : $extracted;
}

You may be thinking, "Why do all this when a regex does the trick in one line?"

$filename =~ s/\([^}]*\)//;

Text::Balanced handles nested parenthesis. So $filename = 'foo_(bar(baz)buz)).foo' will be extracted properly. The regex based solutions offered here will fail on this string. The one will stop at the first closing paren, and the other will eat them all.

   $filename =~ s/\([^}]*\)//;
   # returns 'foo_buz)).foo'

   $filename =~ s/\(.*\)//;
   # returns 'foo_.foo'

   # text balanced example returns 'foo_).foo'

If either of the regex behaviors is acceptable, use a regex--but document the limitations and the assumptions being made.

For those who want to use Python, here's a simple routine that removes parenthesized substrings, including those with nested parentheses. Okay, it's not a regex, but it'll do the job!

def remove_nested_parens(input_str):
    """Returns a copy of 'input_str' with any parenthesized text removed. Nested parentheses are handled."""
    result = ''
    paren_level = 0
    for ch in input_str:
        if ch == '(':
            paren_level += 1
        elif (ch == ')') and paren_level:
            paren_level -= 1
        elif not paren_level:
            result += ch
    return result

remove_nested_parens('example_(extra(qualifier)_text)_test(more_parens).ext')

If a path may contain parentheses then the r'\(.*?\)' regex is not enough:

import os, re

def remove_parenthesized_chunks(path, safeext=True, safedir=True):
    dirpath, basename = os.path.split(path) if safedir else ('', path)
    name, ext = os.path.splitext(basename) if safeext else (basename, '')
    name = re.sub(r'\(.*?\)', '', name)
    return os.path.join(dirpath, name+ext)

By default the function preserves parenthesized chunks in directory and extention parts of the path.

Example:

>>> f = remove_parenthesized_chunks
>>> f("Example_file_(extra_descriptor).ext")
'Example_file_.ext'
>>> path = r"c:\dir_(important)\example(extra).ext(untouchable)"
>>> f(path)
'c:\\dir_(important)\\example.ext(untouchable)'
>>> f(path, safeext=False)
'c:\\dir_(important)\\example.ext'
>>> f(path, safedir=False)
'c:\\dir_\\example.ext(untouchable)'
>>> f(path, False, False)
'c:\\dir_\\example.ext'
>>> f(r"c:\(extra)\example(extra).ext", safedir=False)
'c:\\\\example.ext'

If you can stand to use sed (possibly execute from within your program, it'd be as simple as:

sed 's/(.*)//g'

Java code:

Pattern pattern1 = Pattern.compile("(\\_\\(.*?\\))");
System.out.println(fileName.replace(matcher1.group(1), ""));
>>> import re
>>> filename = "Example_file_(extra_descriptor).ext"
>>> p = re.compile(r'\([^)]*\)')
>>> re.sub(p, '', filename)
'Example_file_.ext'

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM