
Check if two strings contain the same pattern in Python

I have the following list:

names = ['s06_215','s06_235b','s06_235','s08_014','18:s08_014','s08_056','s08_169']

s06_235b and s06_235 are duplicates, as are s08_014 and 18:s08_014. However, as the example shows, the naming follows no specific pattern. I need to do a pairwise comparison of the elements of the list:

for i in range(0, len(names)-1):
    for index, value in enumerate(names):
        print(names[i], names[index])

Then, for each pair, I need to check whether the two names share a common substring longer than 4 characters. That is, s06_235b and s06_235, and s08_014 and 18:s08_014, would pass this criterion, but s08_056 and s08_169 would not.

How can I achieve this in Python?

You could iterate over all the combinations, join each pair with a special character that cannot occur in those strings, and use a regular expression like (\w{5,}).*#.*\1 to find a group that is repeated within the pair. Unlike a plain s1 in s2 test, this also works if only a part of the first string is contained in the second, or vice versa.

Here, (\w{5,}) is the shared substring of at least 5 characters (from the \w class in this case, but feel free to adapt), followed by more characters .*, the separator (# in this case), more filler .*, and then another instance of the first group, \1.

import itertools
import re

p = re.compile(r"(\w{5,}).*#.*\1")
for pair in itertools.combinations(names, 2):
    m = p.search("#".join(pair))
    if m:
        print("%r shares %r" % (pair, m.group(1)))

Output:

('s06_215', 's06_235b') shares 's06_2'
('s06_215', 's06_235') shares 's06_2'
('s06_235b', 's06_235') shares 's06_235'
('s08_014', '18:s08_014') shares 's08_014'
('s08_014', 's08_056') shares 's08_0'
('18:s08_014', 's08_056') shares 's08_0'

Of course, you can tweak the regex to fit your needs. E.g., if you do not want the repeated region to start or end with _, you could use a regex like p = r"([a-z0-9]\w{3,}[a-z0-9]).*#.*\1".
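Another tweak is the minimum length. The sketch below wraps the same regex idea in a helper so the threshold can be varied. The name shared_pairs and its min_len and sep parameters are made up for illustration, and the threshold of 6 is an assumption chosen to fit this particular list, not part of the question.

```python
import itertools
import re

names = ['s06_215', 's06_235b', 's06_235', 's08_014', '18:s08_014', 's08_056', 's08_169']


def shared_pairs(items, min_len=5, sep="#"):
    # Hypothetical helper: sep must be a character that occurs in none of the items.
    pattern = re.compile(r"(\w{%d,}).*%s.*\1" % (min_len, re.escape(sep)))
    found = []
    for pair in itertools.combinations(items, 2):
        m = pattern.search(sep.join(pair))
        if m:
            found.append((pair, m.group(1)))
    return found


for pair, shared in shared_pairs(names, min_len=6):
    print("%r shares %r" % (pair, shared))
```

With min_len=6, only ('s06_235b', 's06_235') and ('s08_014', '18:s08_014') are reported for this list, because the other pairs share at most 5 characters such as 's06_2' or 's08_0'.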

You can use the 'in' operator to check whether one string contains another:

if "example" in "this is an example":
    print("found")

Try this:

for i in range(len(names)):
    for index, value in enumerate(names):
        # skip comparing a name with itself; require more than 4 characters
        if i != index and len(names[i]) > 4 and names[i] in value:
            print(names[i], value)

Edit: As tobias_k mentions, note that this only works if one entire string is contained in the other string.
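To make that limitation concrete, here is a minimal sketch. The helper contains_either_way is hypothetical, and the name '18:s06_235' in the last call is invented for the example and does not come from the question's list.

```python
def contains_either_way(a, b, min_len=5):
    # Hypothetical helper: order the two names by length, then test containment.
    shorter, longer = sorted((a, b), key=len)
    return len(shorter) >= min_len and shorter in longer


print(contains_either_way('s06_235b', 's06_235'))     # whole name inside the other
print(contains_either_way('s08_014', '18:s08_014'))   # whole name inside the other
print(contains_either_way('s06_235b', '18:s06_235'))  # only partial overlap
```

The first two calls print True. The last prints False even though both names contain 's06_235', because neither name occurs whole inside the other.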
