I have list of s3 objects like this:
list1 = ['uid=123/2020/06/01/625e2ghvh.parquet','uid=876/2020/04/01/hgdshct7.parquet','uid=0987/2019/03/01/323dc.parquet']
list2 = ['123','876']
result_list = ['uid=0987/2019/03/01/323dc.parquet']
With out using any loop is there any efficient way to achieve this considering large no of elements in list1?
You could build a set
from list2
for a faster lookup and use a list comprehension to check for membership using the substring of interest:
list1 = ['uid=123/2020/06/01/625e2ghvh.parquet','uid=876/2020/04/01/hgdshct7.parquet',
'uid=0987/2019/03/01/323dc.parquet']
list2 = ['123','876']
set2 = set(list2)
[i for i in list1 if i.lstrip('uid=').split('/',1)[0] not in set2]
# ['uid=0987/2019/03/01/323dc.parquet']
The substring is obtained through:
s = 'uid=123/2020/06/01/625e2ghvh.parquet'
s.lstrip('uid=').split('/',1)[0]
# '123'
This does the job. For different patterns though, or to also cover slight variations, you could go for a regex. For this example you'd need something like:
import re
[i for i in list1 if re.search(r'^uid=(\d+).*?', i).group(1) not in set2]
# ['uid=0987/2019/03/01/323dc.parquet']
This is one way to do it without loops
def filter_function(item):
uid = int(item[4:].split('/')[0])
if uid not in list2:
return True
return False
list1 = ['uid=123/2020/06/01/625e2ghvh.parquet','uid=876/2020/04/01/hgdshct7.parquet','uid=0987/2019/03/01/323dc.parquet']
list2 = [123, 876]
result_list = list(filter(filter_function, list1))
How about this one:
_list2 = [f'uid={number}' for number in list2]
result = [item for item in list1 if not any([item.startswith(i) for i in _list2])] # ['uid=0987/2019/03/01/323dc.parquet']
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.