Python unique values in a list
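I'm new to Python and I find set() a bit confusing. Can someone offer some help with finding and creating a new list of unique numbers (in other words, eliminating duplicates)?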
import string
import re

def go():
    import re
    file = open("C:/Cryptography/Pollard/Pollard/newfile.txt","w")
    filename = "C:/Cryptography/Pollard/Pollard/primeFactors.txt"
    with open(filename, 'r') as f:
        lines = f.read()
        found = re.findall(r'[\d]+[^\d.\d+()+\s]+[^\s]+[\d+\w+\d]+[\d+\^+\d]+[\d+\w+\d]+', lines)
        a = found
        for i in range(5):
            a[i] = str(found[i])
            print(a[i].split('x'))
Now print(a[i].split('x')) gives the following output:
['2', '3', '1451', '40591', '258983', '11409589', '8337580729',
'1932261797039146667']
['2897', '514081', '585530047', '108785617538783538760452408483163']
['2', '3', '5', '19', '28087', '4947999059',
'2182718359336613102811898933144207']
['3', '5', '53', '293', '31159', '201911', '7511070764480753',
'22798192180727861167']
['2', '164493637239099960712719840940483950285726027116731']
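How can I output a list of only the non-repeated numbers? I read on a forum that set() can do this, but nothing I have tried has worked. Any help is greatly appreciated!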
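A set is a collection (like a list or tuple), but it does not allow duplicates and it has very fast membership testing. You can use a list comprehension to filter out the values in one list that have already appeared in a previous list: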
data = [['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
        ['2897', '514081', '585530047', '108785617538783538760452408483163'],
        ['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
        ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
        ['2', '164493637239099960712719840940483950285726027116731']]

seen = set()  # set of seen values, which starts out empty

for lst in data:
    deduped = [x for x in lst if x not in seen]  # filter out previously seen values
    seen.update(deduped)                         # add the new values to the set
    print(deduped)                               # do whatever with deduped list
Output:
['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667']
['2897', '514081', '585530047', '108785617538783538760452408483163']
['5', '19', '28087', '4947999059', '2182718359336613102811898933144207']
['53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']
['164493637239099960712719840940483950285726027116731']
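Note that this version does not filter out values that are duplicated within a single list (unless they are already duplicates of a value in a previous list). You could work around that by replacing the list comprehension with an explicit loop that checks each individual value against the seen set (and adds it if it is new) before appending it to a list for output. Or, if the order of the items in your sublists is not important, you could turn them into sets of their own: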
seen = set()
for lst in data:
    lst_as_set = set(lst)            # this step eliminates internal duplicates
    deduped_set = lst_as_set - seen  # set subtraction!
    seen.update(deduped_set)
    # now do stuff with deduped_set, which is iterable, but in an arbitrary order
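
The explicit-loop alternative mentioned above is not spelled out in the answer, so here is a minimal sketch of how it might look. The function name dedupe_across_lists and the sample data are made up for illustration; the sample is a shortened variant of the question's lists with an extra repeated value inside one sublist.

```python
def dedupe_across_lists(data):
    """Yield one deduplicated list per sublist, keeping first-seen order."""
    seen = set()  # every value emitted so far, across all sublists
    for lst in data:
        deduped = []
        for x in lst:
            if x not in seen:  # new value: record it and keep it
                seen.add(x)
                deduped.append(x)
        yield deduped


# Hypothetical sample data; note the repeated '3' inside the first sublist.
sample = [['2', '3', '3', '1451'], ['2', '5', '3'], ['5', '5', '19']]
result = list(dedupe_across_lists(sample))
print(result)  # [['2', '3', '1451'], ['5'], ['19']]
```

Because each value is added to seen as soon as it is kept, a duplicate later in the same sublist is dropped too, and the order within each sublist is preserved.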
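Finally, if the internal sublists are a red herring entirely and you simply want to filter a flattened list down to its unique values, that sounds like a job for the unique_everseen recipe from the itertools documentation: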
from itertools import filterfalse  # this function is named ifilterfalse in Python 2

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        # filterfalse skips every element that is already in seen
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
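
The answer does not show the recipe applied to the nested lists, so the following is a sketch of one way to do that, assuming Python 3. To stay self-contained it uses a simplified unique_everseen without the key argument, and flattens with itertools.chain.from_iterable, which is my choice rather than something the answer prescribes. The sample data is abridged from the question.

```python
from itertools import chain, filterfalse


def unique_everseen(iterable):
    """Simplified recipe (no key argument): yield each element the first time it appears."""
    seen = set()
    for element in filterfalse(seen.__contains__, iterable):
        seen.add(element)
        yield element


# Abridged sample of the question's data.
sample = [['2', '3', '1451'], ['2897'], ['2', '3', '5', '19'], ['3', '5', '53'], ['2']]
flat = chain.from_iterable(sample)  # lazily walks every sublist in order
unique = list(unique_everseen(flat))
print(unique)  # ['2', '3', '1451', '2897', '5', '19', '53']
```

Unlike a plain set, this keeps the values in the order they were first seen.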
set should work in this situation. You can try the following:
# Concat all your lists into a single list
>>> a = ['2', '3', '1451', '40591', '258983', '11409589', '8337580729','1932261797039146667'] +['2897', '514081', '585530047', '108785617538783538760452408483163'] +['2', '3', '5', '19', '28087', '4947999059','2182718359336613102811898933144207'] + ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']+ ['2', '164493637239099960712719840940483950285726027116731']
>>> len(a)
29
>>> set(a)
set(['514081', '258983', '40591', '201911', '11409589', '585530047', '3', '2', '5',
     '108785617538783538760452408483163', '22798192180727861167',
     '164493637239099960712719840940483950285726027116731', '8337580729', '4947999059',
     '19', '2897', '7511070764480753', '53', '28087', '2182718359336613102811898933144207',
     '1451', '31159', '1932261797039146667', '293'])
>>> len(set(a))
24
>>>
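
As the session shows, set(a) returns the values in an arbitrary order. If the first-seen order matters, one alternative (my suggestion, not part of the answer above) is dict.fromkeys, which relies on dictionaries preserving insertion order in Python 3.7 and later. The list below is a shortened, hypothetical stand-in for the concatenated list a.

```python
# Shortened stand-in for the concatenated list from the session above.
a = ['2', '3', '1451', '2897', '2', '3', '5', '19', '3', '5', '53', '2']

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while keeping the first occurrence of each value.
unique_in_order = list(dict.fromkeys(a))

print(len(a), len(unique_in_order))  # 12 7
print(unique_in_order)  # ['2', '3', '1451', '2897', '5', '19', '53']
```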
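If you want to get the unique values from the flattened list, you can use reduce() to flatten the list, then use the frozenset() constructor to get the resulting list: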
>>> from functools import reduce  # reduce is a builtin only in Python 2
>>> data = [
...     ['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
...     ['2897', '514081', '585530047', '108785617538783538760452408483163'],
...     ['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
...     ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
...     ['2', '164493637239099960712719840940483950285726027116731']]
>>> print(list(frozenset(reduce(lambda a, b: a + b, data))))
['514081', '258983', '40591', '201911', '11409589', '585530047', '3',
'2', '5', '108785617538783538760452408483163', '22798192180727861167',
'164493637239099960712719840940483950285726027116731', '8337580729',
'4947999059', '19', '2897', '7511070764480753', '53', '28087',
'2182718359336613102811898933144207', '1451', '31159',
'1932261797039146667', '293']
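
The order of the list above is arbitrary, because a frozenset has no defined ordering. If a deterministic listing is wanted, one possibility (an assumption on my part, not part of the answer) is to sort the unique strings by their integer value. The sketch also flattens with itertools.chain.from_iterable instead of reduce, which avoids building a new intermediate list at every a + b step. The sample data is abridged from the question.

```python
from itertools import chain

# Abridged sample of the question's data.
sample = [['2', '3', '1451'], ['2897'], ['2', '3', '5', '19'], ['3', '5', '53'], ['2']]

# Flatten, deduplicate with a set, then sort the strings by numeric value.
unique_sorted = sorted(set(chain.from_iterable(sample)), key=int)

print(unique_sorted)  # ['2', '3', '5', '19', '53', '1451', '2897']
```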