How to remove a string ending with a specific string

I have file names like:

ios_g1_v1_yyyymmdd
ios_g1_v1_h1_yyyymmddhhmmss
ios_g1_v1_h1_YYYYMMDDHHMMSS
ios_g1_v1_g1_YYYY
ios_g1_v1_j1_YYYYmmdd
ios_g1_v1
ios_g1_v1_t1_h1
ios_g1_v1_ty1_f1

I would like to remove only the suffix, and only when it matches one of the strings YYYYMMDDHHMMSS, yyyymmdd, YYYYmmdd or YYYY.

My expected output would be:

ios_g1_v1
ios_g1_v1_h1
ios_g1_v1_h1
ios_g1_v1_g1
ios_g1_v1_j1
ios_g1_v1
ios_g1_v1_t1_h1
ios_g1_v1_ty1_f1

How can I achieve this in Python using regex? I tried something like the line below, but it didn't work:

word_trimmed_stage1 = re.sub('.*[^YYYYMMDDHHMMSS]$', '', filename)
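A note on why this attempt fails: [^YYYYMMDDHHMMSS] is a negated character class, so it matches one single character that is not Y, M, D, H or S; it does not mean "not this string". A minimal sketch of the resulting behaviour, using two of the file names from the question (the variable names are only illustrative):

```python
import re

pattern = '.*[^YYYYMMDDHHMMSS]$'

# The last character is 'S', which is inside the class, so the negated
# class cannot match it: there is no match and the name is left unchanged.
attempt_dated = re.sub(pattern, '', 'ios_g1_v1_h1_YYYYMMDDHHMMSS')

# The last character is '1', which is outside the class, so '.*' plus that
# character match the whole name and everything is replaced by ''.
attempt_plain = re.sub(pattern, '', 'ios_g1_v1')

print(repr(attempt_dated))
print(repr(attempt_plain))
```

So the pattern leaves the dated name alone and wipes out the name that should have been kept.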

IIUC, your pattern consists of the Year, Month, Day, Hour, Minute and Second characters, each repeated any number of times, in that order, starting with an underscore and case-insensitive.

Try this pattern, r"_Y+M*D*H*M*S*":

import re

# l is the list of file names from the question
regex_pattern = r"_Y+M*D*H*M*S*"
result = [re.sub(regex_pattern, '', i, flags=re.IGNORECASE) for i in l]
result
['ios_g1_v1',
 'ios_g1_v1_h1',
 'ios_g1_v1_h1',
 'ios_g1_v1_g1',
 'ios_g1_v1_j1',
 'ios_g1_v1',
 'ios_g1_v1_t1_h1',
 'ios_g1_v1_ty1_f1']

EXPLANATION

  1. The _ matches the underscore at the start of the pattern
  2. The flags=re.IGNORECASE makes this pattern search case-insensitive
  3. The Y+ matches at least 1 instance of Y
  4. Then M*D*H*M*S* matches any number of these specific characters (0 or more of each) after the initial Y, in that order
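One caveat worth checking: as written, the pattern has no end-of-string anchor, so it also matches an underscore followed by y in the middle of a name. The sketch below compares it with an anchored variant; the trailing $ is an addition that is not part of the answer above, and the file name 'ios_year_v1_YYYY' is a made-up example, not one from the question.

```python
import re

unanchored = r"_Y+M*D*H*M*S*"
anchored = r"_Y+M*D*H*M*S*$"  # assumption: only a suffix should be removed

name = 'ios_year_v1_YYYY'  # hypothetical file name for illustration

# Without '$', the '_y' of '_year' is removed as well as the date suffix.
loose = re.sub(unanchored, '', name, flags=re.IGNORECASE)

# With '$', only the match that ends at the end of the string is removed.
strict = re.sub(anchored, '', name, flags=re.IGNORECASE)

print(loose)   # iosear_v1
print(strict)  # ios_year_v1
```

For the eight names in the question both variants give the same result, since none of them contains _y before the suffix.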

You can be explicit and use the exact patterns that you have identified, optionally case-insensitive with re.I:

import re

files = ['ios_g1_v1_yyyymmdd',
 'ios_g1_v1_h1_yyyymmddhhmmss',
 'ios_g1_v1_h1_YYYYMMDDHHMMSS',
 'ios_g1_v1_g1_YYYY',
 'ios_g1_v1_j1_YYYYmmdd',
 'ios_g1_v1',
 'ios_g1_v1_t1_h1',
 'ios_g1_v1_ty1_f1']

files2 = [re.sub('_(?:YYYYMMDDHHMMSS|yyyymmdd|YYYYmmdd|YYYY)$', '', x, flags=re.I)
          for x in files]

NB. With re.I you only need one of yyyymmdd / YYYYmmdd.

Compressed variant:

files2 = [re.sub('_YYYY(?:MMDD(?:HHMMSS)?)?$', '', x, flags=re.I) for x in files]

Output:

['ios_g1_v1',
 'ios_g1_v1_h1',
 'ios_g1_v1_h1',
 'ios_g1_v1_g1',
 'ios_g1_v1_j1',
 'ios_g1_v1',
 'ios_g1_v1_t1_h1',
 'ios_g1_v1_ty1_f1']
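If the substitution is needed in more than one place, the compressed pattern can be compiled once and wrapped in a small helper. This is only a sketch: the names DATE_SUFFIX and strip_date_suffix are made up for illustration.

```python
import re

# Compile once; re.I makes the match case-insensitive, '$' anchors it
# to the end of the name so only a trailing suffix is removed.
DATE_SUFFIX = re.compile(r'_YYYY(?:MMDD(?:HHMMSS)?)?$', flags=re.I)


def strip_date_suffix(name):
    """Return name without a trailing _YYYY, _YYYYMMDD or _YYYYMMDDHHMMSS."""
    return DATE_SUFFIX.sub('', name)


files = ['ios_g1_v1_yyyymmdd',
         'ios_g1_v1_h1_yyyymmddhhmmss',
         'ios_g1_v1_h1_YYYYMMDDHHMMSS',
         'ios_g1_v1_g1_YYYY',
         'ios_g1_v1_j1_YYYYmmdd',
         'ios_g1_v1',
         'ios_g1_v1_t1_h1',
         'ios_g1_v1_ty1_f1']

cleaned = [strip_date_suffix(x) for x in files]
print(cleaned)
```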

To remove a string ending with "YYYYMMDDHHMMSS" or one of the other specified formats, you can use the rstrip method. This method removes every trailing character that appears in the specified string. Note that it treats its argument as a set of characters rather than as an exact suffix, so it can strip more than intended.

Here's an example of how you can use it:

s = "abcdefgYYYYMMDDHHMMSS"
suffix = "YYYYMMDDHHMMSS"

You can also remove the other specified formats by replacing "YYYYMMDDHHMMSS" with the appropriate format string.
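A short sketch of how rstrip behaves here, next to an exact-suffix alternative. The helper remove_suffix and the file name "logs_DMY_YYYYMMDD" are made up for illustration; on Python 3.9+ the built-in str.removesuffix does the same job as the helper.

```python
def remove_suffix(text, suffix):
    """Remove suffix from text only if text ends with exactly that suffix."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


s = "abcdefgYYYYMMDDHHMMSS"
suffix = "YYYYMMDDHHMMSS"

# Works here because no character before the suffix is one of Y, M, D, H, S.
stripped = s.rstrip(suffix)

# rstrip treats "_YYYYMMDD" as the character set {_, Y, M, D}, so it keeps
# going through "_DMY" and removes it too.
over_stripped = "logs_DMY_YYYYMMDD".rstrip("_YYYYMMDD")

# The exact-suffix helper removes only the suffix itself.
exact = remove_suffix("logs_DMY_YYYYMMDD", "_YYYYMMDD")

print(stripped)       # abcdefg
print(over_stripped)  # logs
print(exact)          # logs_DMY
```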

Disclaimer: this is a non-regex approach; @mozway posted a good regex approach.

files = ['ios_g1_v1_yyyymmdd',
 'ios_g1_v1_h1_yyyymmddhhmmss',
 'ios_g1_v1_h1_YYYYMMDDHHMMSS',
 'ios_g1_v1_g1_YYYY',
 'ios_g1_v1_j1_YYYYmmdd',
 'ios_g1_v1',
 'ios_g1_v1_t1_h1',
 'ios_g1_v1_ty1_f1']

lst = []
for filename in files:
    k = []
    for x in range(len(filename)):
        # stop at the first pair of consecutive 'y'/'Y' characters
        if filename[x] in 'yY' and filename[x + 1:x + 2] in ('y', 'Y'):
            break
        k.append(filename[x])
    # drop the underscore left in front of the removed suffix
    if k and k[-1] == '_':
        k.pop()
    lst.append(''.join(k))

print(lst)

# ['ios_g1_v1', 'ios_g1_v1_h1', 'ios_g1_v1_h1', 'ios_g1_v1_g1', 'ios_g1_v1_j1', 'ios_g1_v1', 'ios_g1_v1_t1_h1', 'ios_g1_v1_ty1_f1']

This can be another approach:

# filenames is the list of file names from the question
out = []
for filename in filenames:
    if filename.split("_")[-1].lower().startswith("y"):
        out.append("_".join(filename.split("_")[:-1]))
    else:
        out.append(filename)
        
print(out)

You can also build the list in one expression by passing a generator to list(), instead of appending one element at a time:

out = list(
    "_".join(filename.split("_")[:-1])
    if filename.split("_")[-1].lower().startswith("y")
    else filename
    for filename in filenames
    )

Both approaches should produce the same output:

['ios_g1_v1',
 'ios_g1_v1_h1',
 'ios_g1_v1_h1',
 'ios_g1_v1_g1',
 'ios_g1_v1_j1',
 'ios_g1_v1',
 'ios_g1_v1_t1_h1',
 'ios_g1_v1_ty1_f1']

Try removing everything after the last _ detected.
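Applied unconditionally, that would also shorten names without a date suffix (for example, 'ios_g1_v1' would become 'ios_g1'), so a guard is needed. A minimal sketch, assuming the only suffixes to drop are the ones listed in the question; the names SUFFIXES and drop_date_suffix are made up for illustration:

```python
# Lower-cased forms of the suffixes named in the question.
SUFFIXES = {"yyyymmddhhmmss", "yyyymmdd", "yyyy"}


def drop_date_suffix(name):
    """Drop the part after the last '_' only if it is a known date suffix."""
    head, sep, tail = name.rpartition("_")
    if sep and tail.lower() in SUFFIXES:
        return head
    return name


files = ['ios_g1_v1_yyyymmdd',
         'ios_g1_v1_h1_yyyymmddhhmmss',
         'ios_g1_v1_h1_YYYYMMDDHHMMSS',
         'ios_g1_v1_g1_YYYY',
         'ios_g1_v1_j1_YYYYmmdd',
         'ios_g1_v1',
         'ios_g1_v1_t1_h1',
         'ios_g1_v1_ty1_f1']

trimmed = [drop_date_suffix(f) for f in files]
print(trimmed)
```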
