I have the following lists:
target_list = ["FOLD/AAA.RST.TXT"]
and
mylist =
[
"FOLD/AAA.RST.12345.TXT",
"FOLD/BBB.RST.12345.TXT",
"RUNS/AAA.FGT.12345.TXT",
"FOLD/AAA.RST.87589.TXT",
"RUNS/AAA.RST.11111.TXT"
]
How can I filter only those records of mylist that correspond to target_list? The expected result is:
"FOLD/AAA.RST.12345.TXT"
"FOLD/AAA.RST.87589.TXT"
The following mask is considered for filtering mylist:
xxx/yyy.zzz.nnn.txt
If xxx, yyy and zzz coincide with target_list, then the record should be selected. Otherwise it should be dropped from the result.
How can I solve this task without using a for loop? This is my current code:
selected_list = []
for t in target_list:
    # Split the target into folder (xxx), name (yyy) and type (zzz)
    r1 = t.split("/")[0]
    a1 = t.split("/")[1].split(".")[0]
    b1 = t.split("/")[1].split(".")[1]
    for l in mylist:
        r2 = l.split("/")[0]
        a2 = l.split("/")[1].split(".")[0]
        b2 = l.split("/")[1].split(".")[1]
        if (r1 == r2) and (a1 == a2) and (b1 == b2):
            selected_list.append(l)
Define a function to filter values:
target_list = ["FOLD/AAA.RST.TXT"]
def keep(path):
    template = get_template(path)
    return template in target_list

def get_template(path):
    front, numbers, ext = path.rsplit('.', 2)
    template = '.'.join([front, ext])
    return template
This uses str.rsplit, which scans the string from the right and splits it on the given separator, '.' in this case. The argument 2 means it performs at most two splits. This gives us three parts: the front, the numbers, and the extension:
>>> 'FOLD/AAA.RST.12345.TXT'.rsplit('.', 2)
['FOLD/AAA.RST', '12345', 'TXT']
We assign these to front, numbers and ext.
We then build a string again using str.join:
>>> '.'.join(['FOLD/AAA.RST', 'TXT'])
'FOLD/AAA.RST.TXT'
So this is what get_template returns:
>>> get_template('FOLD/AAA.RST.12345.TXT')
'FOLD/AAA.RST.TXT'
We can use it like so:
mylist = [
"FOLD/AAA.RST.12345.TXT",
"FOLD/BBB.RST.12345.TXT",
"RUNS/AAA.FGT.12345.TXT",
"FOLD/AAA.RST.87589.TXT",
"RUNS/AAA.RST.11111.TXT"
]
from pprint import pprint
# In Python 3, filter returns a lazy iterator, so wrap it in list() before printing
pprint(list(filter(keep, mylist)))
Output:
['FOLD/AAA.RST.12345.TXT', 'FOLD/AAA.RST.87589.TXT']
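One thing to watch with the rsplit approach: unpacking into three names raises ValueError when a path contains fewer than two dots. The following is a minimal sketch, assuming such paths should simply be dropped. The "README" entry and the extra targets parameter of keep are hypothetical additions for illustration, not part of the original answer.

```python
def get_template(path):
    """Return the path without its second-to-last dot-separated field, or None."""
    parts = path.rsplit('.', 2)
    if len(parts) != 3:
        return None  # fewer than two dots: there is no numeric field to remove
    front, _numbers, ext = parts
    return '.'.join([front, ext])

def keep(path, targets):
    # targets is passed explicitly here instead of being read from a global
    return get_template(path) in targets

target_list = ["FOLD/AAA.RST.TXT"]
mylist = [
    "FOLD/AAA.RST.12345.TXT",
    "FOLD/BBB.RST.12345.TXT",
    "RUNS/AAA.FGT.12345.TXT",
    "FOLD/AAA.RST.87589.TXT",
    "RUNS/AAA.RST.11111.TXT",
    "README",  # hypothetical malformed entry
]

selected = [p for p in mylist if keep(p, target_list)]
print(selected)
```

The list comprehension is equivalent to list(filter(...)) here; which one to use is mostly a matter of taste.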
You can define a "filter-making function" that preprocesses the target list. The advantage of this is that the prefixes of target_list are stored in a set: the total time is O(N_target_list) + O(N), where N is the length of the list being filtered, since set lookups are O(1) on average.
def prefixes(target):
    """
    >>> prefixes("FOLD/AAA.RST.TXT")
    ('FOLD', 'AAA', 'RST')
    >>> prefixes("FOLD/AAA.RST.12345.TXT")
    ('FOLD', 'AAA', 'RST')
    """
    x, rest = target.split('/')
    y, z, *_ = rest.split('.')
    return x, y, z

def matcher(target_list):
    targets = set(prefixes(target) for target in target_list)
    def is_target(t):
        return prefixes(t) in targets
    return is_target
Then, you could do:
>>> list(filter(matcher(target_list), mylist))
['FOLD/AAA.RST.12345.TXT', 'FOLD/AAA.RST.87589.TXT']
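The set-based matcher pays off mostly when there are several targets. Here is a small self-contained sketch of that case; the second target, "RUNS/AAA.FGT.TXT", is a hypothetical addition, and the helper names folder, name and kind are only illustrative.

```python
def prefixes(path):
    """Return the (xxx, yyy, zzz) fields of a path shaped like xxx/yyy.zzz[...]."""
    folder, rest = path.split('/')
    name, kind, *_ = rest.split('.')
    return folder, name, kind

def matcher(target_list):
    # Build the lookup set once; each later membership test is O(1) on average
    targets = {prefixes(t) for t in target_list}
    def is_target(path):
        return prefixes(path) in targets
    return is_target

target_list = ["FOLD/AAA.RST.TXT", "RUNS/AAA.FGT.TXT"]  # second entry is hypothetical
mylist = [
    "FOLD/AAA.RST.12345.TXT",
    "FOLD/BBB.RST.12345.TXT",
    "RUNS/AAA.FGT.12345.TXT",
    "FOLD/AAA.RST.87589.TXT",
    "RUNS/AAA.RST.11111.TXT",
]

is_target = matcher(target_list)  # reusable for any number of lists
selected = [p for p in mylist if is_target(p)]
print(selected)
```

Note that this sketch assumes every path contains exactly one "/" and at least one "." after it; other shapes would raise ValueError during unpacking.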
You can use regular expressions to define a pattern, and check if your strings match that pattern.
In this case, split the target and insert a \d+ in between the xxx/yyy.zzz. part and the .txt part. Use this as the pattern.
The pattern \d+ means one or more digits. The rest of the pattern will be created based on the literal values of xxx/yyy.zzz and .txt. Since the period has a special meaning in regular expressions, we have to escape it with a \.
import re
selected_list = []
for target in target_list:
    base, ext = target.rsplit(".", 1)
    # Raw strings keep the backslashes intact and avoid invalid-escape warnings
    pat = ".".join([base, r"\d+", ext]).replace(".", r"\.")
    selected_list.append([s for s in mylist if re.match(pat, s) is not None])
print(selected_list)
#[['FOLD/AAA.RST.12345.TXT', 'FOLD/AAA.RST.87589.TXT']]
If the pattern does not match, re.match returns None.
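Two details of the regex approach may matter on other data: re.match only anchors at the start of the string, and replacing "." by hand leaves other special characters unescaped. A hedged variant, assuming the whole string must match, could use re.escape and fullmatch. The build_pattern helper and the ".bak" entry are hypothetical names added for illustration.

```python
import re

def build_pattern(target):
    """Compile a pattern for target with a numeric field before the extension."""
    base, ext = target.rsplit(".", 1)
    # re.escape handles every regex metacharacter, not just the period
    return re.compile(re.escape(base) + r"\.\d+\." + re.escape(ext))

target_list = ["FOLD/AAA.RST.TXT"]
mylist = [
    "FOLD/AAA.RST.12345.TXT",
    "FOLD/BBB.RST.12345.TXT",
    "RUNS/AAA.FGT.12345.TXT",
    "FOLD/AAA.RST.87589.TXT",
    "RUNS/AAA.RST.11111.TXT",
    "FOLD/AAA.RST.12345.TXT.bak",  # hypothetical entry with trailing text
]

patterns = [build_pattern(t) for t in target_list]
# fullmatch requires the entire string to match, so the .bak entry is rejected
selected = [s for s in mylist if any(p.fullmatch(s) for p in patterns)]
print(selected)
```

Unlike the loop above, this produces one flat list rather than one sub-list per target.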
Why not use filter with a lambda function:
import re
result = list(filter(lambda item: re.sub(r'\.[0-9]+', '', item) == target_list[0], mylist))
Some comments:
In the lambda function, for each mylist item we replace the digits (together with the period in front of them) with '', then compare against the only item in target_list, target_list[0].
filter will keep all items for which the lambda function returns True.
list is used to convert from a filter object to a list object.