Finding duplicates across multiple lists in Python

Question

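I want to find the duplicates across multiple lists that are held inside a single list (not remove the duplicates, but extract the duplicated values). I have a list named chunks that contains 13 lists.

My data looks like this: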

[[@TestRun
    And user set text "#Surname" on textbox name "surname"
    And user validate message on screen "Switch to paperless" 
    And user click on "Manage accounts" label 
    And user click link with label "View all online services" 
    And user waits for 10 seconds 
    Then page is successfully launched 
    And user click link with label "Go paperless for complete convenience" 
    Then page is successfully launched 
    And user validate message on screen "#EmailAddress" 
    And user clicks on the button "Confirm" 
    Then page is successfully launched 
    And user validate message on screen "#MessageValidate" 
    Then page is successfully launched 
    And user click on "menu open user preferences" label 
    And user clicks on the link "Statement and letter preferences" 
    Then page is successfully launched 
    And user validate "Switch to paperless" button is disabled 
    And user validate message on screen "Online only" 
    When user click on "Log out" label 
    Then page is successfully launched]

[@TestRun 
    And user click on link "Mobile site" 
    And user set text "#Surname" on textbox name "surname" 
    Then page is successfully launched 
    And user click on link "#Account" 
    Then page is successfully launched 
    And user verify message on screen "#Account" 
    And user verify message on screen "Manage statements" 
    And user verify message on screen "Step 1 of 3" 
    Then page is successfully launched 
    And user verify message on screen "Current format type"  
    And user verify message on screen "Online" 
    When user selects the radio button "Paper" ]


[@TestRun
 And user set text "#Surname" on textbox name "surname"
Then user wait for page load
And user click on button "Continue to Online Banking"
Then user wait for page load
    And user click on "menu open user preferences" label 
    And user clicks on the link "Statement and letter preferences" 
    Then page is successfully launched 
    And page is successfully launched 
    And user waits for 10 seconds ]
[ @TestRun
    And user set text "#Surname" on textbox name "surname"
    Then page is successfully launched 
    And user waits for 10 seconds 
    And user click checkbox "Telephone" 
    And user click checkbox "Post" 
    And user clicks on the button "Save" 
    Then page is successfully launched ]]

I have extracted each test case into its own list, i.e. the lines between two @TestRun markers form one list:

import itertools as it
import more_itertools as mit
import pandas as pd

# Split the file into one list per test case (13 test cases -> 13 lists)
with open('cust_pref.txt', "r") as f1:
    lines_1 = f1.readlines()

    pred_1 = lambda x: x.startswith("@TestRun")
    inv_pred_1 = lambda x: not pred_1(x)

    # Skip everything before the first @TestRun, then start a new chunk at each marker
    lines_1 = it.dropwhile(inv_pred_1, lines_1)
    chunks_1 = list(mit.split_before(lines_1, pred_1))

# Print the list of test cases
print(chunks_1)
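
For readers who do not have more_itertools installed, the same chunking can be sketched with the standard library alone. This is only a sketch and not the asker's code: `split_before_marker` and the sample `lines` are hypothetical names introduced for illustration, and it assumes every test case starts with a line beginning with `@TestRun`.

```python
def split_before_marker(lines, marker="@TestRun"):
    """Group lines into chunks, opening a new chunk at each marker line.

    Lines before the first marker are dropped, mirroring the dropwhile step.
    """
    chunks = []
    for line in lines:
        if line.startswith(marker):
            chunks.append([line])       # a marker line opens a new chunk
        elif chunks:
            chunks[-1].append(line)     # any other line extends the current chunk
    return chunks


# Hypothetical sample input standing in for f1.readlines()
lines = [
    "header to skip\n",
    "@TestRun\n",
    "    Then page is successfully launched \n",
    "@TestRun\n",
    "    And user waits for 10 seconds \n",
]
chunks = split_before_marker(lines)
print(len(chunks))
```

Each chunk still starts with its `@TestRun` line, as with `mit.split_before`.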

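Now I need to work out how to find the lines that are common to all of these lists, and how to tell which list each common line comes from.

I tried the following: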

def get_duplicated_element(array):
    global result, checked_elements
    checked_elements = []
    result = -1
    def array_recursive_check(array):
        global result, checked_elements
        if result != -1: return
        for i in array:
            if type(i) == list:
                if i in checked_elements:
                    result = i
                    return
                checked_elements.append(i)
                array_recursive_check(i)
    array_recursive_check(array)
    return result

get_duplicated_element(chunks_1) ## this gives the answer as -1 , which is not expected

Expected output: find the common values (lines, in my case) and, if possible, the number of the Python list each step belongs to.

The desired output is:

{  
    And user set text "#Surname" on textbox name "surname"
    Then page is successfully launched 
}

These steps are repeated in every list, so they should be the output.

I have also tried the following to find duplicates:
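
One way to get exactly the steps that occur in every list is a set intersection. The following is a minimal sketch, assuming `chunks` is a list of lists of strings shaped like `chunks_1` above. `common_to_all` is a hypothetical helper name, the sample `chunks` is a shortened excerpt of the data, and lines are stripped so that differences in indentation or trailing spaces do not prevent a match. The `@TestRun` marker is skipped because it trivially appears in every list.

```python
def common_to_all(chunks, marker="@TestRun"):
    """Return the set of stripped lines that appear in every chunk."""
    line_sets = []
    for chunk in chunks:
        stripped = {line.strip() for line in chunk}
        # ignore blank lines and the marker line itself
        line_sets.append({s for s in stripped if s and not s.startswith(marker)})
    if not line_sets:
        return set()
    return set.intersection(*line_sets)


# Shortened excerpt of the data, for illustration only
chunks = [
    ["@TestRun",
     '    And user set text "#Surname" on textbox name "surname"',
     "    Then page is successfully launched ",
     "    And user waits for 10 seconds "],
    ["@TestRun ",
     '    And user set text "#Surname" on textbox name "surname" ',
     "    Then page is successfully launched ",
     '    And user click on link "#Account" '],
]
print(sorted(common_to_all(chunks)))
```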

def find_dupe(lists, target):
    seen = set()
    for lst in lists:
        for item in lst:
            if item == target and item in seen:
                return True
            seen.add(item)

seen, dups = set(), set()
for l in chunks:
    dups = dups.union(seen.intersection(set(l)))
    seen = seen.union(set(l))

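This gives me some of the duplicates, but now my problem is that I don't know which line comes from which list. Is there a way to build this mapping, i.e. which values correspond to which list?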

Answer 1

This is not exactly the output you want, but it should give you a hint for further processing. Check this out:

>>> data = [['@TestRun',
  '    And user set text "#Surname" on textbox name "surname"',
  '    And user validate message on screen "Switch to paperless" ',
  '    And user click on "Manage accounts" label ',
  '    And user click link with label "View all online services" ',
  '    And user waits for 10 seconds ',
  '    Then page is successfully launched ',
  '    And user click link with label "Go paperless for complete convenience" ',
  '    Then page is successfully launched ',
  '    And user validate message on screen "#EmailAddress" ',
  '    And user clicks on the button "Confirm" ',
  '    Then page is successfully launched ',
  '    And user validate message on screen "#MessageValidate" ',
  '    Then page is successfully launched ',
  '    And user click on "menu open user preferences" label ',
  '    And user clicks on the link "Statement and letter preferences" ',
  '    Then page is successfully launched ',
  '    And user validate "Switch to paperless" button is disabled ',
  '    And user validate message on screen "Online only" ',
  '    When user click on "Log out" label ',
  '    Then page is successfully launched'],
 ['@TestRun ',
  '    And user click on link "Mobile site" ',
  '    And user set text "#Surname" on textbox name "surname" ',
  '    Then page is successfully launched ',
  '    And user click on link "#Account" ',
  '    Then page is successfully launched ',
  '    And user verify message on screen "#Account" ',
  '    And user verify message on screen "Manage statements" ',
  '    And user verify message on screen "Step 1 of 3" ',
  '    Then page is successfully launched ',
  '    And user verify message on screen "Current format type"  ',
  '    And user verify message on screen "Online" ',
  '    When user selects the radio button "Paper"'],
 ['@TestRun',
  ' And user set text "#Surname" on textbox name "surname"',
  'Then user wait for page load',
  'And user click on button "Continue to Online Banking"',
  'Then user wait for page load',
  '    And user click on "menu open user preferences" label ',
  '    And user clicks on the link "Statement and letter preferences" ',
  '    Then page is successfully launched ',
  '    And page is successfully launched ',
  '    And user waits for 10 seconds']]
>>> data = [[line.strip() for line in test_set] for test_set in data]
>>> linewise_counts = {}
>>> for list_index,test_set in enumerate(data):
        for line in test_set:
            linewise_counts.setdefault(line,set()).add(list_index)


>>> duplicates = ["{} -> {}".format(line, in_list) for line,in_list in linewise_counts.items() if len(in_list)>1]
>>> duplicates
['And user clicks on the link "Statement and letter preferences" -> set([0, 2])',
 'And user waits for 10 seconds -> set([0, 2])',
 'Then page is successfully launched -> set([0, 1, 2])',
 '@TestRun -> set([0, 1, 2])',
 'And user set text "#Surname" on textbox name "surname" -> set([0, 1, 2])',
 'And user click on "menu open user preferences" label -> set([0, 2])']
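
The session above prints sets in the Python 2 style (`set([0, 2])`). Under Python 3 the same idea could be wrapped in a function along these lines. This is a sketch only: `lines_in_multiple_lists` is a hypothetical name, and the small `data` below is made-up sample input rather than the asker's file.

```python
def lines_in_multiple_lists(data):
    """Map each stripped line to the sorted indices of the lists containing it,
    keeping only lines that are found in more than one list."""
    linewise = {}
    for list_index, test_set in enumerate(data):
        for line in test_set:
            linewise.setdefault(line.strip(), set()).add(list_index)
    return {line: sorted(idx) for line, idx in linewise.items() if len(idx) > 1}


# Made-up sample input, for illustration only
data = [
    ["@TestRun", "    Then page is successfully launched ", "    And user waits for 10 seconds "],
    ["@TestRun ", "    Then page is successfully launched "],
    ["@TestRun", "    And user waits for 10 seconds"],
]
for line, where in lines_in_multiple_lists(data).items():
    print("{} -> {}".format(line, where))
```

Because the indices are collected in a set, a line repeated several times inside one list still counts only once for that list.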

Answer 2

You can use re and defaultdict:

import collections
import mmap
import re

def read_file(filehandle):
    '''Yield the chunks of the file, delimited by `@TestRun`.'''
    count = 0
    # map the whole file into memory (read-only, since the file is opened with 'r')
    text = mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
    # https://stackoverflow.com/a/454589/1562285
    string_pattern = re.compile(rb'(?:\[\@TestRun(.+?)\].*?)*', re.DOTALL)
    for item in string_pattern.findall(text):
        if item:  # skip the empty matches that the pattern also produces
            yield count, [i.strip() for i in item.decode('utf8').strip().split('\n')]
            count += 1

def parse_chunks(chunks):
    """Put the lines of these chunks into a dict, with the line as key and a list of the positions of this line `(chunk_no, line_no)` as value."""
    result = collections.defaultdict(list)
    for chunk_no, lines in chunks:
        for i, line in enumerate(lines):
            result[line].append((chunk_no, i))
    return dict(result)

Then you can use it like this:

with open(filename, 'r') as file:  # filename: path to the input file
    chunks = read_file(file)
    result = parse_chunks(chunks)

{
    'And user set text "#Surname" on textbox name "surname"': [(0, 0), (1, 1), (2, 0), (3, 0)],
    'And user validate message on screen "Switch to paperless"': [(0, 1)],
    'And user click on "Manage accounts" label': [(0, 2)],
    'And user click link with label "View all online services"': [(0, 3)],
    'And user waits for 10 seconds': [(0, 4), (2, 8), (3, 2)],
    'Then page is successfully launched': [(0, 5), (0, 7), (0, 10), (0, 12), (0, 15), (0, 19), (1, 2), (1, 4), (1, 8), (2, 6), (3, 1), (3, 6)],
    'And user click link with label "Go paperless for complete convenience"': [(0, 6)],
    'And user validate message on screen "#EmailAddress"': [(0, 8)],
    'And user clicks on the button "Confirm"': [(0, 9)],
    'And user validate message on screen "#MessageValidate"': [(0, 11)],
    'And user click on "menu open user preferences" label': [(0, 13), (2, 4)],
    'And user clicks on the link "Statement and letter preferences"': [(0, 14), (2, 5)],
    'And user validate "Switch to paperless" button is disabled': [(0, 16)],
    'And user validate message on screen "Online only"': [(0, 17)],
    'When user click on "Log out" label': [(0, 18)],
    'And user click on link "Mobile site"': [(1, 0)],
    'And user click on link "#Account"': [(1, 3)],
    'And user verify message on screen "#Account"': [(1, 5)],
    'And user verify message on screen "Manage statements"': [(1, 6)],
    'And user verify message on screen "Step 1 of 3"': [(1, 7)],
    'And user verify message on screen "Current format type"': [(1, 9)],
    'And user verify message on screen "Online"': [(1, 10)],
    'When user selects the radio button "Paper"': [(1, 11)],
    'Then user wait for page load': [(2, 1), (2, 3)],
    'And user click on button "Continue to Online Banking"': [(2, 2)],
    'And page is successfully launched': [(2, 7)],
    'And user click checkbox "Telephone"': [(3, 3)],
    'And user click checkbox "Post"': [(3, 4)],
    'And user clicks on the button "Save"': [(3, 5)],
}

To keep only the lines that occur more than once, you can use:

{key: value for key, value in result.items() if len(value)> 1}
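
To approach the "which line comes from which list" part from the other direction, the dictionary can be inverted so that each chunk number lists its shared lines. The following is a sketch, assuming `result` has the shape returned by `parse_chunks` (each line mapped to a list of `(chunk_no, line_no)` tuples). `shared_lines_by_chunk` is a hypothetical helper name and the sample `result` is a three-entry excerpt of the output above. It treats a line as shared only when it occurs in at least two different chunks, which is stricter than `len(value) > 1`: that filter also keeps lines such as 'Then user wait for page load', which is repeated only inside chunk 2.

```python
import collections


def shared_lines_by_chunk(result):
    """Invert {line: [(chunk_no, line_no), ...]} into
    {chunk_no: [(line_no, line), ...]} for lines found in 2+ different chunks."""
    by_chunk = collections.defaultdict(list)
    for line, positions in result.items():
        chunk_numbers = {chunk_no for chunk_no, _ in positions}
        if len(chunk_numbers) < 2:
            continue  # the line never leaves a single chunk
        for chunk_no, line_no in positions:
            by_chunk[chunk_no].append((line_no, line))
    # sort chunks by number and the entries of each chunk by line number
    return {chunk_no: sorted(entries) for chunk_no, entries in sorted(by_chunk.items())}


# Three-entry excerpt of the output above, for illustration only
result = {
    'And user waits for 10 seconds': [(0, 4), (2, 8), (3, 2)],
    'Then user wait for page load': [(2, 1), (2, 3)],
    'And user click checkbox "Post"': [(3, 4)],
}
print(shared_lines_by_chunk(result))
```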

Answer 1
0 2017-12-04 06:30:46

Answer 2
0 2017-12-04 09:33:13