I want to find duplicates (not to remove them, but to extract the repeated values) across multiple lists that are nested inside a single list. For example, I have a list called Chunks which has 13 lists inside it.
My data is as follows:
[[@TestRun
And user set text "#Surname" on textbox name "surname"
And user validate message on screen "Switch to paperless"
And user click on "Manage accounts" label
And user click link with label "View all online services"
And user waits for 10 seconds
Then page is successfully launched
And user click link with label "Go paperless for complete convenience"
Then page is successfully launched
And user validate message on screen "#EmailAddress"
And user clicks on the button "Confirm"
Then page is successfully launched
And user validate message on screen "#MessageValidate"
Then page is successfully launched
And user click on "menu open user preferences" label
And user clicks on the link "Statement and letter preferences"
Then page is successfully launched
And user validate "Switch to paperless" button is disabled
And user validate message on screen "Online only"
When user click on "Log out" label
Then page is successfully launched]
[@TestRun
And user click on link "Mobile site"
And user set text "#Surname" on textbox name "surname"
Then page is successfully launched
And user click on link "#Account"
Then page is successfully launched
And user verify message on screen "#Account"
And user verify message on screen "Manage statements"
And user verify message on screen "Step 1 of 3"
Then page is successfully launched
And user verify message on screen "Current format type"
And user verify message on screen "Online"
When user selects the radio button "Paper" ]
[@TestRun
And user set text "#Surname" on textbox name "surname"
Then user wait for page load
And user click on button "Continue to Online Banking"
Then user wait for page load
And user click on "menu open user preferences" label
And user clicks on the link "Statement and letter preferences"
Then page is successfully launched
And page is successfully launched
And user waits for 10 seconds ]
[ @TestRun
And user set text "#Surname" on textbox name "surname"
Then page is successfully launched
And user waits for 10 seconds
And user click checkbox "Telephone"
And user click checkbox "Post"
And user clicks on the button "Save"
Then page is successfully launched ]]
I have extracted every test case into its own list, i.e. the lines between two @TestRun markers form one list:
import itertools as it
import more_itertools as mit
import pandas as pd

# Separate every test case into its own list, i.e. 13 test cases in 13 lists
with open('cust_pref.txt', "r") as f1:
    lines_1 = f1.readlines()

pred_1 = lambda x: x.startswith("@TestRun")
inv_pred_1 = lambda x: not pred_1(x)
# Skip everything before the first @TestRun, then start a new chunk at each @TestRun
lines_1 = it.dropwhile(inv_pred_1, lines_1)
chunks_1 = list(mit.split_before(lines_1, pred_1))

# Print the list of test cases
print(chunks_1)
Now I need to find the lines that are common to all of these lists, and to know which lists each common line comes from.
I tried the following:
def get_duplicated_element(array):
    global result, checked_elements
    checked_elements = []
    result = -1

    def array_recursive_check(array):
        global result, checked_elements
        if result != -1: return
        for i in array:
            if type(i) == list:
                if i in checked_elements:
                    result = i
                    return
                checked_elements.append(i)
                array_recursive_check(i)

    array_recursive_check(array)
    return result

get_duplicated_element(chunks_1)  # this returns -1, which is not expected
Expected output: the common values (lines, in my case) and, if possible, which list numbers each step appears in.
Desired output:
{
And user set text "#Surname" on textbox name "surname"
Then page is successfully launched
}
As these steps are repeated in every list, they should be the output.
I have used the following to get duplicates:
def find_dupe(lists, target):
    seen = set()
    for lst in lists:
        for item in lst:
            if item == target and item in seen:
                return True
            seen.add(item)

# Collect every line that has already been seen in an earlier list
seen, dups = set(), set()
for l in chunks_1:
    dups = dups.union(seen.intersection(set(l)))
    seen = seen.union(set(l))
I get some duplicates from this, but now my problem is that I don't know which line comes from which list. Is there any way to map each value to the lists it appears in?
This is not quite the output you asked for, but it should give you a starting point for further processing. Check this:
>>> data = [['@TestRun',
' And user set text "#Surname" on textbox name "surname"',
' And user validate message on screen "Switch to paperless" ',
' And user click on "Manage accounts" label ',
' And user click link with label "View all online services" ',
' And user waits for 10 seconds ',
' Then page is successfully launched ',
' And user click link with label "Go paperless for complete convenience" ',
' Then page is successfully launched ',
' And user validate message on screen "#EmailAddress" ',
' And user clicks on the button "Confirm" ',
' Then page is successfully launched ',
' And user validate message on screen "#MessageValidate" ',
' Then page is successfully launched ',
' And user click on "menu open user preferences" label ',
' And user clicks on the link "Statement and letter preferences" ',
' Then page is successfully launched ',
' And user validate "Switch to paperless" button is disabled ',
' And user validate message on screen "Online only" ',
' When user click on "Log out" label ',
' Then page is successfully launched'],
['@TestRun ',
' And user click on link "Mobile site" ',
' And user set text "#Surname" on textbox name "surname" ',
' Then page is successfully launched ',
' And user click on link "#Account" ',
' Then page is successfully launched ',
' And user verify message on screen "#Account" ',
' And user verify message on screen "Manage statements" ',
' And user verify message on screen "Step 1 of 3" ',
' Then page is successfully launched ',
' And user verify message on screen "Current format type" ',
' And user verify message on screen "Online" ',
' When user selects the radio button "Paper"'],
['@TestRun',
' And user set text "#Surname" on textbox name "surname"',
'Then user wait for page load',
'And user click on button "Continue to Online Banking"',
'Then user wait for page load',
' And user click on "menu open user preferences" label ',
' And user clicks on the link "Statement and letter preferences" ',
' Then page is successfully launched ',
' And page is successfully launched ',
' And user waits for 10 seconds']]
>>> data = [[line.strip() for line in test_set] for test_set in data]
>>> linewise_counts = {}
>>> for list_index, test_set in enumerate(data):
...     for line in test_set:
...         linewise_counts.setdefault(line, set()).add(list_index)
...
>>> duplicates = ["{} -> {}".format(line, in_list) for line,in_list in linewise_counts.items() if len(in_list)>1]
>>> duplicates
['And user clicks on the link "Statement and letter preferences" -> set([0, 2])',
'And user waits for 10 seconds -> set([0, 2])',
'Then page is successfully launched -> set([0, 1, 2])',
'@TestRun -> set([0, 1, 2])',
'And user set text "#Surname" on textbox name "surname" -> set([0, 1, 2])',
'And user click on "menu open user preferences" label -> set([0, 2])']
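The session above lists every line that occurs in more than one list. To get closer to the output asked for in the question (only the steps present in every list), the same index can be compared against the full set of list indices. This is a minimal sketch: the function names `index_lines` and `lines_in_every_list`, the `ignore` parameter, and the shortened sample data are made up for illustration.

```python
def index_lines(list_of_lists):
    """Map each stripped line to the set of list indices it appears in."""
    where = {}
    for list_index, lines in enumerate(list_of_lists):
        for line in lines:
            where.setdefault(line.strip(), set()).add(list_index)
    return where


def lines_in_every_list(list_of_lists, ignore=("@TestRun",)):
    """Return the lines found in all lists, in first-seen order."""
    where = index_lines(list_of_lists)
    everywhere = set(range(len(list_of_lists)))
    return [line for line, found_in in where.items()
            if found_in == everywhere and line not in ignore]


# Shortened, hypothetical sample in the same shape as `data`
sample = [
    ["@TestRun", " And step A ", " Then page is successfully launched "],
    ["@TestRun", " And step B", " And step A", "Then page is successfully launched"],
    ["@TestRun", "And step A", "Then page is successfully launched", "And step C"],
]
common = lines_in_every_list(sample)
print(common)
```

The `@TestRun` marker is skipped by default because it opens every list and is not a step. The first-seen ordering relies on dicts preserving insertion order, which holds on Python 3.7 and later.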
You can do something with re, mmap and collections.defaultdict:
import collections
import mmap
import re

def read_file(filehandle):
    ''' yields the chunks of the file, delimited by the `@TestRun`'''
    count = 0
    # map the whole file into memory, read-only so a file opened with 'r' works
    text = mmap.mmap(filehandle.fileno(), 0, access=mmap.ACCESS_READ)
    # https://stackoverflow.com/a/454589/1562285
    string_pattern = re.compile(rb'(?:\[\@TestRun(.+?)\].*?)*', re.DOTALL)
    for item in string_pattern.findall(text):
        if item:  # skip the empty matches between chunks
            yield count, [i.strip() for i in item.decode('utf8').strip().split('\n')]
            count += 1

def parse_chunks(chunks):
    """ puts the lines of these chunks into a dict, with the line as key and a list of the positions of this line `(chunk_no, line_no)` as value"""
    result = collections.defaultdict(list)
    for chunk_no, lines in chunks:
        for i, line in enumerate(lines):
            result[line].append((chunk_no, i))
    return dict(result)
Then you can use it like this:
with open(file, 'r') as file:
    chunks = read_file(file)
    # read_file is a generator, so consume it while the file is still open
    result = parse_chunks(chunks)
{
 'And user set text "#Surname" on textbox name "surname"': [(0, 0), (1, 1), (2, 0), (3, 0)],
 'And user validate message on screen "Switch to paperless"': [(0, 1)],
 'And user click on "Manage accounts" label': [(0, 2)],
 'And user click link with label "View all online services"': [(0, 3)],
 'And user waits for 10 seconds': [(0, 4), (2, 8), (3, 2)],
 'Then page is successfully launched': [(0, 5), (0, 7), (0, 10), (0, 12), (0, 15), (0, 19), (1, 2), (1, 4), (1, 8), (2, 6), (3, 1), (3, 6)],
 'And user click link with label "Go paperless for complete convenience"': [(0, 6)],
 'And user validate message on screen "#EmailAddress"': [(0, 8)],
 'And user clicks on the button "Confirm"': [(0, 9)],
 'And user validate message on screen "#MessageValidate"': [(0, 11)],
 'And user click on "menu open user preferences" label': [(0, 13), (2, 4)],
 'And user clicks on the link "Statement and letter preferences"': [(0, 14), (2, 5)],
 'And user validate "Switch to paperless" button is disabled': [(0, 16)],
 'And user validate message on screen "Online only"': [(0, 17)],
 'When user click on "Log out" label': [(0, 18)],
 'And user click on link "Mobile site"': [(1, 0)],
 'And user click on link "#Account"': [(1, 3)],
 'And user verify message on screen "#Account"': [(1, 5)],
 'And user verify message on screen "Manage statements"': [(1, 6)],
 'And user verify message on screen "Step 1 of 3"': [(1, 7)],
 'And user verify message on screen "Current format type"': [(1, 9)],
 'And user verify message on screen "Online"': [(1, 10)],
 'When user selects the radio button "Paper"': [(1, 11)],
 'Then user wait for page load': [(2, 1), (2, 3)],
 'And user click on button "Continue to Online Banking"': [(2, 2)],
 'And page is successfully launched': [(2, 7)],
 'And user click checkbox "Telephone"': [(3, 3)],
 'And user click checkbox "Post"': [(3, 4)],
 'And user clicks on the button "Save"': [(3, 5)]
}
You can filter those duplicates with
{key: value for key, value in result.items() if len(value)> 1}
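Note that a `len(value) > 1` filter also keeps lines that repeat inside a single chunk, such as 'Then user wait for page load', which only occurs in chunk 2. If the goal is lines shared between chunks, one option is to reduce each position list to a set of chunk numbers first. The sketch below uses a hand-written, shortened stand-in for `result`; the names `chunks_per_line`, `shared`, `in_all` and `shared_by_chunk` are illustrative, and it assumes every chunk contributes at least one line to the dict.

```python
import collections

# Hand-written, shortened stand-in for the `result` dict built by parse_chunks
result = {
    'And user set text "#Surname" on textbox name "surname"': [(0, 0), (1, 1), (2, 0)],
    'Then page is successfully launched': [(0, 5), (0, 7), (1, 2), (2, 6)],
    'And user waits for 10 seconds': [(0, 4), (2, 8)],
    'Then user wait for page load': [(2, 1), (2, 3)],
    'And user click on link "Mobile site"': [(1, 0)],
}

# Which chunks does each line occur in? (positions within a chunk are dropped)
chunks_per_line = {line: {chunk_no for chunk_no, _ in positions}
                   for line, positions in result.items()}

# Every chunk number that shows up anywhere in the mapping
all_chunks = set().union(*chunks_per_line.values())

# Lines shared by at least two chunks, and lines present in every chunk
shared = {line: sorted(c) for line, c in chunks_per_line.items() if len(c) > 1}
in_all = [line for line, c in chunks_per_line.items() if c == all_chunks]

# Inverse view: for each chunk, the shared lines it contains
shared_by_chunk = collections.defaultdict(list)
for line, chunk_nos in shared.items():
    for chunk_no in chunk_nos:
        shared_by_chunk[chunk_no].append(line)

print(in_all)
print(dict(shared_by_chunk))
```

Here `in_all` corresponds to the desired output in the question, while `shared_by_chunk` answers the "which line is from which list" part from the chunk's point of view.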