find all items in list with partial content matches

Question

I am trying to match up items in a list if the items contain the same strings as other items in the list.

I have a list, and currently I only check the items that have a '.' in them.

for g in groups:
    if '.' in g:
        print(g)

663.ord1,664.ord1
947.dfw3,949.dfw3
663.ord1
665.ord1,664.ord1
663.ord1,665.ord1
949.dfw3,948.dfw3
949.dfw3
947.dfw3,948.dfw3

What I want to do is print a two-item list whenever the first part of one item matches the first part of another item (split on '.').

For the input listed above, I am looking for the following, not necessarily in this order:

['663.ord1,664.ord1', '663.ord1']
['947.dfw3,949.dfw3','949.dfw3,948.dfw3']
['947.dfw3,949.dfw3','949.dfw3']
['947.dfw3,949.dfw3','947.dfw3,948.dfw3']
['665.ord1,664.ord1','663.ord1,665.ord1']
['663.ord1,665.ord1','663.ord1']
['949.dfw3,948.dfw3','947.dfw3,948.dfw3']

...I think I got them all...

Anyone have an idea how this could be done?

Answer 1

This could be done with regular expressions. For example, you could use something along the lines of pattern.search(g).span() and slicing to pull out the part of each string you want to compare. I'm not quite clear on which criteria you actually want the lists to be based on, but I'll do something that handles your example to show how it would work. In actual code, that would look like this:

import re

def match_parts(groups):
    # The list that's going to contain our results
    results = []
    # Capture the first part of an item: everything before the first '.'
    pattern = re.compile(r'^([^.]+)\.')
    for g in groups:
        m = pattern.search(g)
        if m is None:                 # no '.' in this item, so skip it
            continue
        start, end = m.span(1)
        part = g[start:end]           # Substring containing the data we want compared
        for h in groups:
            m2 = pattern.search(h)
            if m2 is None or h == g:  # skip items without a '.' and self-pairs
                continue
            start2, end2 = m2.span(1)
            if part == h[start2:end2]:
                results.append([g, h])

    for pair in results:
        print(pair)

This should roughly do what you want. There might be some redundancy (a pair can show up once in each order), because matched items are not removed from the list, but other than that it should be good. Please comment if I made a mistake and I'll correct it.
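
The redundancy mentioned above can be avoided by pairing each item only with the items that come after it. Here is a minimal sketch without regular expressions, assuming the criterion is "same text before the first '.'". The helper name pair_by_first_part is made up for illustration; itertools.combinations yields each unordered pair exactly once.

```python
from itertools import combinations

def pair_by_first_part(groups):
    """Pair items whose text before the first '.' is equal (assumed criterion)."""
    # Keep only the items that contain a '.', as in the question
    dotted = [g for g in groups if '.' in g]
    # combinations(dotted, 2) visits every unordered pair once, so no reversed duplicates
    return [[a, b] for a, b in combinations(dotted, 2)
            if a.split('.', 1)[0] == b.split('.', 1)[0]]

groups = ['663.ord1,664.ord1', '947.dfw3,949.dfw3', '663.ord1',
          '665.ord1,664.ord1', '663.ord1,665.ord1', '949.dfw3,948.dfw3',
          '949.dfw3', '947.dfw3,948.dfw3']

pairs = pair_by_first_part(groups)
for pair in pairs:
    print(pair)
```

Note that this compares only the leading number of each item, so it will not pair items that share a later comma-separated element.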

Answer 2

Here is a possible solution:

groups = ['663.ord1,664.ord1', '947.dfw3,949.dfw3', '663.ord1', '665.ord1,664.ord1', '663.ord1,665.ord1', '949.dfw3,948.dfw3', '949.dfw3', '947.dfw3,948.dfw3', 'plus other stuff']

# use a list comprehension to get items that have '.'
g1 = [g for g in groups if '.' in g]
g2 = g1
# use 'set' to get unique combinations
# use split(',')[0] to get the first element in e1
# check whether that split element is in e2
# check that elements in each tuple are not identical, i.e. e1 not equal to e2
s = set((e1,e2) for e1 in g1 for e2 in g2 if e1.split(',')[0] in e2 and e1 != e2)

# You would also want to get rid of reverse duplicates:
# For explanation see accepted answer at: https://stackoverflow.com/questions/41164630/pythonic-way-of-removing-reversed-duplicates-in-list/41173005#41173005
s2 = {tuple(sorted([e1,e2])) for (e1,e2) in s}

# Then you can print out the lists
for (e1,e2) in s2:
    print([e1,e2])
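
Note that e1.split(',')[0] in e2 is a substring test on the first element of e1 only. If the goal is instead to pair items that share any whole comma-separated element, one possible variant compares sets of elements. This is a sketch under that assumption, which the question does not confirm; shared_element_pairs is a hypothetical name.

```python
from itertools import combinations

groups = ['663.ord1,664.ord1', '947.dfw3,949.dfw3', '663.ord1',
          '665.ord1,664.ord1', '663.ord1,665.ord1', '949.dfw3,948.dfw3',
          '949.dfw3', '947.dfw3,948.dfw3', 'plus other stuff']

def shared_element_pairs(items):
    """Return each unordered pair of items sharing a whole comma-separated element."""
    # Keep only the items that contain a '.'
    dotted = [g for g in items if '.' in g]
    # Split every item once into a set of its elements
    parts = {g: set(g.split(',')) for g in dotted}
    # A non-empty intersection means the two items share at least one element
    return [[a, b] for a, b in combinations(dotted, 2) if parts[a] & parts[b]]

shared = shared_element_pairs(groups)
for pair in shared:
    print(pair)
```

On the sample data this finds the seven pairings listed in the question (possibly with the two items swapped) plus three more that the question does not list, which share 664.ord1, 663.ord1, and 949.dfw3 respectively.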

2 answers

solution1
0 2019-08-06 16:19:33

solution2
0 2019-08-06 16:57:19
