Remove strings from a list if they are not in another list of single-item lists of strings
I have two lists of strings, as follows:
good_tags = ['c#', '.net', 'java']
all_tags = [['c# .net datetime'],
['c# datetime time datediff relative-time-span'],
['html browser timezone user-agent timezone-offset']]
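My goal is to keep, within each list of strings in "all_tags", only the tags that also appear in "good_tags".
I tried using "in" instead of "not in", based on the question "Remove all the elements that occur in one list from another":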
y3 = [x for x in all_tags if x in good_tags]
print ('y3: ', y3)
y4 = [x for x in good_tags if x in all_tags]
print ('y4: ', y4)
Output:
y3: []
y4: []
First of all, you don't have two lists of strings. You have one list of strings and one list of lists of strings.
good_tags = ['c#', '.net', 'java']
all_tags = [['c# .net datetime'],['c# datetime time datediff relative-time-span'], ['html browser timezone user-agent timezone-offset']]
all_tags_with_good_tags = []
for tags in all_tags:
    new_good_tags = set()
    # each inner list holds a single string, so take element 0
    # and split it on whitespace to get the individual tags
    for tag in tags[0].split():
        if tag in good_tags:
            new_good_tags.add(tag)
    if new_good_tags:
        all_tags_with_good_tags.append(' '.join(new_good_tags))
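This gives you the following (the order of the tags inside each string may vary, because a set is unordered):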
['.net c#', 'c#']
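Because new_good_tags is a set, the order of the tags in each joined string is not guaranteed. Below is a minimal sketch of an order-preserving variant, assuming each inner list holds exactly one whitespace-separated string. The helper name keep_good_tags is hypothetical and not part of the answer above.

```python
good_tags = ['c#', '.net', 'java']
all_tags = [['c# .net datetime'],
            ['c# datetime time datediff relative-time-span'],
            ['html browser timezone user-agent timezone-offset']]


def keep_good_tags(rows, allowed):
    """Return one space-joined string per row that has at least one allowed tag."""
    allowed = set(allowed)  # set membership tests are fast
    result = []
    for row in rows:
        # dict.fromkeys drops duplicates while keeping first-seen order
        kept = dict.fromkeys(t for t in row[0].split() if t in allowed)
        if kept:
            result.append(' '.join(kept))
    return result


print(keep_good_tags(all_tags, good_tags))
# ['c# .net', 'c#']
```

Rows without any good tag are skipped, which matches the behaviour of the loop above.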
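Your all_tags is a list containing three lists, each of which contains a single string. So the first thing you need to do is convert each sublist into a list of separate tag strings rather than one long string.
Since the tags are separated only by whitespace, with no commas, you have to turn the list ['c# .net datetime'] into ['c#', '.net', 'datetime']: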
[x for segments in all_tags[0] for x in segments.split()]
Then you can do this for the whole list by iterating over its length:
[[x for segments in all_tags[entry] for x in segments.split()] for entry in range(len(all_tags))]
This returns:
[['c#', '.net', 'datetime'],
['c#', 'datetime', 'time', 'datediff', 'relative-time-span'],
['html', 'browser', 'timezone', 'user-agent', 'timezone-offset']]
Now you can filter this list against your good tags:
y3 = [[x for x in [words for segments in all_tags[entry] for words in segments.split()] if x in good_tags] for entry in range(len(all_tags))]
Output:
[['c#', '.net'], ['c#'], []]
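The same result can be reached without indexing through range(len(all_tags)). The sketch below splits first and filters second, iterating over the sublists directly; it assumes the data shown in the question, and split_tags is an illustrative name only.

```python
good_tags = ['c#', '.net', 'java']
all_tags = [['c# .net datetime'],
            ['c# datetime time datediff relative-time-span'],
            ['html browser timezone user-agent timezone-offset']]

# Step 1: turn each sublist of strings into a list of individual tags.
split_tags = [[tag for segment in sublist for tag in segment.split()]
              for sublist in all_tags]

# Step 2: keep only the tags that appear in good_tags.
y3 = [[tag for tag in tags if tag in good_tags] for tags in split_tags]

print(y3)
# [['c#', '.net'], ['c#'], []]
```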
good_tags = ['c#', '.net', 'java']
all_tags = [
['c# .net datetime'],
['c# datetime time datediff relative-time-span'],
['html browser timezone user-agent timezone-offset']
]
filtered_tags = [[" ".join(filter(lambda tag: tag in good_tags, row[0].split()))] for row in all_tags]
print(filtered_tags)
Output:
[['c# .net'], ['c#'], ['']]
A short solution using a set instead of a list:
good_tags = {'c#', '.net', 'java'} # this is a set
all_tags = [['c# .net datetime'],
['c# datetime time datediff relative-time-span'],
['html browser timezone user-agent timezone-offset']]
result = [set(lst[0].split()) & good_tags for lst in all_tags]
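The & operator computes the intersection of two sets.
But the real question is: why does all_tags contain lists that each hold only one element? There is probably a better way to build this list in the first place.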
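Each element of result is a set, so tag order is not preserved and duplicates collapse. If a reproducible list is wanted, one option is to sort each intersection. This is an assumption about the desired output, not part of the answer above.

```python
good_tags = {'c#', '.net', 'java'}  # a set, as in the answer above
all_tags = [['c# .net datetime'],
            ['c# datetime time datediff relative-time-span'],
            ['html browser timezone user-agent timezone-offset']]

# Intersect each row's tags with good_tags, then sort for a stable order.
result = [sorted(set(lst[0].split()) & good_tags) for lst in all_tags]

print(result)
# [['.net', 'c#'], ['c#'], []]
```

Sorting gives alphabetical order rather than the original order of the tags in the string.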
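First statement: when "for x in all_tags" runs, x is the list ['c# .net datetime']. Its only element, 'c# .net datetime', is a single string, so the individual tags are never looked at separately.
Second statement: with x = ['c# .net datetime'], the whole list is then searched for in good_tags. good_tags does not contain that list, so nothing is returned.
Condition 1: if good_tags were ['c#', '.net', 'java', ['c# .net datetime']], then y3 would contain ['c# .net datetime'].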
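The membership behaviour described above can be checked directly. This is a small illustrative sketch using the data from the question; good_tags_with_list is a name introduced here only for the demonstration.

```python
good_tags = ['c#', '.net', 'java']
all_tags = [['c# .net datetime'],
            ['c# datetime time datediff relative-time-span'],
            ['html browser timezone user-agent timezone-offset']]

x = all_tags[0]                    # ['c# .net datetime'], a list
print(x in good_tags)              # False: good_tags holds strings, not lists
print('c#' in all_tags)            # False: all_tags holds lists, not strings
print('c#' in x[0].split())        # True: compare tag strings with tag strings

# Only if the whole list were itself an element would the test succeed.
good_tags_with_list = good_tags + [x]
print(x in good_tags_with_list)    # True
```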
Here is a solution to your problem:
good_tags = ['c#', '.net', 'java']
all_tags = [['c# .net datetime'], ['c# datetime time datediff relative-time-span'],
            ['html browser timezone user-agent timezone-offset']]
# y3 = [x for x in all_tags if x in good_tags]
all_tags_refine = []
for x in all_tags:
    y = x[0].split()                       # split the single string into tags
    z = [k for k in y if k in good_tags]   # keep only the good tags
    all_tags_refine.append(z)
print(all_tags_refine)
There may be a better way to do this, but here is one:
good_tags = ['c#', '.net', 'java']
all_tags = [['c# .net datetime'],['c# datetime time datediff relative-time-span'], ['html browser timezone user-agent timezone-offset']]
for tags in all_tags:
    empty = []
    for tag in tags[0].split(" "):
        if tag in good_tags:
            empty.append(tag)
    print(" ".join(empty))
good_tags = ['c#', '.net', 'java']
all_tags = [['c# .net datetime'],['c# datetime time datediff relative-time-span'], ['html browser timezone user-agent timezone-offset']]
new_tags = []
for _ in all_tags:
    tags = _[0].split()
    newtag = ''
    for tag in tags:
        if tag in good_tags:
            if newtag == '':
                newtag = tag
            else:
                newtag = newtag + ' ' + tag
    if newtag != '':
        l = []
        l.append(newtag)
        new_tags.append(l)
print(new_tags)
good_set = set(good_tags)
kept_tags = [[t for t in tags[0].split() if t in good_set]
for tags in all_tags]
print(kept_tags)
# [['c#', '.net'], ['c#'], []]