
More succinct way to remove multiple elements from a list?

I am trying to slice and strip a string. I have written the following code:

my_list = ['from ab1c_table in WXY\nprevious in time',
        'from abc3_table in MNO\nprevious in time',
        'from ab1_cow_table in DZMC1_IN tab\ncurrent in time',
        'from abc4_table in ERDU\ncurrent in time']
my_list_1 = []
for j in my_list:
  s = j.split(" ")
  s.remove('from')
  s.remove('in')
  s.remove('in')
  s.remove('time')

  for k in s:
    k = k.replace('current', '')
    k = k.replace('previous', '')
    k = k.replace('\n', '')
  my_list_1.append(k)
  if 'tab' in my_list_1:
    my_list_1.remove('tab')

print(my_list_1)

It is working fine, but the issue is that I have to remove each word separately. Is there a way to do it in fewer lines? The output I am looking for is:

['WXY', 'MNO']

EDIT 1 -

How do I get this output -

['ab1c_table', 'WXY', 'abc3_table', 'MNO', 'ab1_cow_table', 'DZMC1_IN', 'abc4_table', 'ERDU']

I am not sure if this is what you have in mind, but regular expressions are usually useful for extracting patterns from strings. For example:

import re
my_list = ['from ab1c_table in WXY\nprevious in time', 
           'from abc3_table in MNO\nprevious in time']

# Capture the three uppercase letters that sit right before the newline
my_list1 = [re.findall(r" ([A-Z]{3})\n", s)[0] for s in my_list]
print(my_list1)

Edit:

Here is a modification of the regex pattern reflecting the additional string samples provided by the OP in a comment below:

mylist = ['from ab1c_table in WXY\nprevious in time', 
          'from abc3_table in MNO\nprevious in time', 
          'from ab1_cow_table in DZMC1_IN tab\ncurrent in time', 
          'from abc4_table in ERDU\ncurrent in time']

my_list1 = [re.findall(r"_table in (\S+)(?:| tab)\n.* in time", s)[0] for s in mylist]

print(my_list1)

This gives:

['WXY', 'MNO', 'DZMC1_IN', 'ERDU']

Edit 2:

A version that also captures the _table names:

import re
from itertools import chain

mylist = ['from ab1c_table in WXY\nprevious in time', 
          'from abc3_table in MNO\nprevious in time', 
          'from ab1_cow_table in DZMC1_IN tab\ncurrent in time', 
          'from abc4_table in ERDU\ncurrent in time']

my_list1 = list(chain(*[re.findall(r"from (\S+_table) in (\S+).*?\n.* in time", s)[0] for s in mylist]))

print(my_list1)

It gives:

['ab1c_table', 'WXY', 'abc3_table', 'MNO', 'ab1_cow_table', 'DZMC1_IN', 'abc4_table', 'ERDU']
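
One caveat with the findall(...)[0] idiom: it raises IndexError as soon as a string does not match. Below is a minimal sketch of a more forgiving variant, assuming you would rather skip such strings. The helper name extract_pairs and the non-matching sample are made up for illustration and are not part of the original answer.

```python
import re
from itertools import chain

# Same idea as the pattern above, compiled once.
# `extract_pairs` is a hypothetical helper name used only for this sketch.
PATTERN = re.compile(r"from (\S+_table) in (\S+)")

def extract_pairs(strings):
    """Return the table name and the word after 'in' for each matching string, flattened."""
    matches = (PATTERN.search(s) for s in strings)
    # Skip strings that do not match instead of raising IndexError.
    return list(chain.from_iterable(m.groups() for m in matches if m))

mylist = ['from ab1c_table in WXY\nprevious in time',
          'from ab1_cow_table in DZMC1_IN tab\ncurrent in time',
          'something unrelated']  # made-up non-matching sample

print(extract_pairs(mylist))
# ['ab1c_table', 'WXY', 'ab1_cow_table', 'DZMC1_IN']
```

chain.from_iterable does the same flattening as chain(*[...]) without unpacking the whole list into arguments first.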

It's not clear from the question which parts of the strings vary, but this regular expression seems to do the job. The idea is to match the static text literally, use wildcards for the parts that vary, and put parenthesized capture groups around the data you want in the result. Since you want two pieces of data in the order they appear in the string, you can create two capture groups and extend the result list with them.

import re
  
my_list = ['from ab1c_table in WXY\nprevious in time',
        'from abc3_table in MNO\nprevious in time',
        'from ab1_cow_table in DZMC1_IN tab\ncurrent in time',
        'from abc4_table in ERDU\ncurrent in time']

result = []
for value in my_list:
    result.extend(re.match(r"from (.+_table) in (\S+)", value).groups())
print(result)

Result

['ab1c_table', 'WXY', 'abc3_table', 'MNO', 'ab1_cow_table', 'DZMC1_IN', 'abc4_table', 'ERDU']
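
To make the capture groups more concrete, here is a small sketch that inspects a single match from the sample data. It reuses the pattern from the loop above; nothing else is assumed.

```python
import re

value = 'from ab1_cow_table in DZMC1_IN tab\ncurrent in time'
m = re.match(r"from (.+_table) in (\S+)", value)

# group(0) is the whole matched text; groups 1 and 2 are the parenthesized parts.
print(m.group(0))   # from ab1_cow_table in DZMC1_IN
print(m.group(1))   # ab1_cow_table
print(m.group(2))   # DZMC1_IN
print(m.groups())   # ('ab1_cow_table', 'DZMC1_IN')
```

Note that re.match returns None when the string does not start with the pattern, so calling .groups() on unexpected input would raise AttributeError.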

As I previously suggested, I think this can be done much more easily with a simple split(). The strings always follow the same pattern. All you need to do is split at whitespace and take the second and fourth elements from each resulting list.

elems = list()
for e in my_list:
    # e.g., the first element becomes
    # ['from', 'ab1c_table', 'in', 'WXY', 'previous', 'in', 'time']
    parts = e.split()
    elems.extend([parts[1], parts[3]])

print(elems)

Result:

['ab1c_table',
 'WXY',
 'abc3_table',
 'MNO',
 'ab1_cow_table',
 'DZMC1_IN',
 'abc4_table',
 'ERDU']
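
It may help to see why split() with no argument behaves differently from the split(" ") used in the question, and how several words can be dropped in one pass. This is only a sketch: the set of unwanted words is an assumption based on the sample strings, not something the question specifies.

```python
line = 'from ab1c_table in WXY\nprevious in time'

# split(" ") only breaks on single spaces, so the newline stays inside a token.
print(line.split(" "))
# ['from', 'ab1c_table', 'in', 'WXY\nprevious', 'in', 'time']

# split() with no argument breaks on any run of whitespace, newline included.
print(line.split())
# ['from', 'ab1c_table', 'in', 'WXY', 'previous', 'in', 'time']

# Drop several words at once by filtering against a set,
# instead of calling remove() once per word.
# Assumption: these are the only filler words that occur in the data.
unwanted = {'from', 'in', 'time', 'previous', 'current', 'tab'}
print([word for word in line.split() if word not in unwanted])
# ['ab1c_table', 'WXY']
```

The set-based filter does not depend on word positions, so it also copes with the extra tab token in the third sample string, whereas indexing with parts[1] and parts[3] relies on the layout staying fixed.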

You can write a pattern that matches the whole string, uses (?:previous|current) to match either previous or current, and captures the last part of the first line in group 1.

First check if there is a match, and if there is, set the new value to the group 1 value.

If there is no match, leave the value unmodified.

\bfrom \w+ in (\w+)\n(?:previous|current) in time\b

See the capture group value highlighted in green in this regex demo.

import re

pattern = r"\bfrom \w+ in (\w+)\n(?:previous|current) in time\b"
my_list = ['from ab1c_table in WXY\nprevious in time', 'from abc3_table in MNO\nprevious in time']

for n, i in enumerate(my_list):
    m = re.match(pattern, i)
    if m:
        my_list[n] = m.group(1)

print(my_list)

Output

['WXY', 'MNO']
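
The pattern above expects the captured word to be the last thing on the first line, so it would not match the question's third sample, which has an extra tab before the newline. Here is a hedged sketch of one way to allow for that. The optional (?: tab)? group, the helper name capture_or_keep, and the non-matching sample are assumptions added for illustration.

```python
import re

# Assumption: an optional " tab" may follow the captured word,
# as in the question's third sample string.
pattern = re.compile(r"\bfrom \w+ in (\w+)(?: tab)?\n(?:previous|current) in time\b")

my_list = ['from ab1c_table in WXY\nprevious in time',
           'from ab1_cow_table in DZMC1_IN tab\ncurrent in time',
           'no match here']  # made-up sample that should be kept unchanged

def capture_or_keep(s):
    """Return group 1 if the pattern matches, otherwise the original string."""
    m = pattern.match(s)
    return m.group(1) if m else s

# Build a new list rather than assigning into my_list inside the loop.
cleaned = [capture_or_keep(s) for s in my_list]
print(cleaned)
# ['WXY', 'DZMC1_IN', 'no match here']
```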
