Eliminate duplicates in dictionary Python

Question

I have a csv file separated by tabs:

I only need to look at the first two columns and find, for example, whether the pair AB appears again in the document as BA, and print AB if BA appears. The same applies to the rest of the pairs.

For the proposed example, the output is: AB and CD

    import pandas as pd

    dic = {}

    colnames = ['col1', 'col2', 'col3', 'col4', 'col5']

    data = pd.read_csv('koko.csv', names=colnames, delimiter='\t')

    col1 = data.col1.tolist()
    col2 = data.col2.tolist()

    dataset = list(zip(col1, col2))
    for a, b in dataset:
        # (a, b) is always truthy, so this only tests whether (b, a) is present
        if (a, b) and (b, a) in dataset:
            dic[a] = b
    print(dic)

Output: {'A': 'B', 'B': 'A', 'D': 'C', 'C': 'D'}

How can I avoid duplicated (or swapped) results in the dictionary?

Answer 1

Does this work?

import pandas as pd
import numpy as np

col_1 = ['A', 'B', 'C', 'B', 'D']
col_2 = ['B', 'C', 'D', 'A', 'C']

df = pd.DataFrame(np.column_stack([col_1,col_2]), columns = ['Col1', 'Col2'])

df['combined'] = list(zip(df['Col1'], df['Col2']))

final_set = set(tuple(sorted(t)) for t in df['combined'])

final_set looks like this:

 {('C', 'D'), ('A', 'B'), ('B', 'C')}

The output contains more than AB and CD because the second row holds the pair BC, which is kept even though CB never appears.
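If only the pairs that occur in both orders are wanted, one possible extension is to check, for each pair, whether its reverse is also present. This is a minimal sketch and not part of the original answer; the helper name `swapped_pairs` is made up for illustration, and skipping pairs such as AA is an assumption about what the asker wants.

```python
def swapped_pairs(pairs):
    """Return the sorted form of every pair whose reverse also occurs."""
    seen = set(pairs)  # set lookups avoid rescanning a list for every pair
    # a != b skips self-pairs such as ('A', 'A'), which are their own reverse
    return {tuple(sorted((a, b))) for a, b in seen if a != b and (b, a) in seen}


col_1 = ['A', 'B', 'C', 'B', 'D']
col_2 = ['B', 'C', 'D', 'A', 'C']

result = swapped_pairs(zip(col_1, col_2))
print(sorted(result))  # [('A', 'B'), ('C', 'D')]
```

Because each pair is stored in sorted form inside a set, AB and BA collapse into a single entry, which avoids the swapped duplicates seen in the dictionary version.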

Answer 2

The following should work.

Example df used:

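import pandas as pd
df = pd.DataFrame({'Col1' : ['A','C','D','B','D','A'], 'Col2' : ['B','D','C','A','C','B']})

This is the code I used:

 # Sort the values within each row; result_type='broadcast' keeps the Col1/Col2 columns on newer pandas.
 temp = df[['Col1','Col2']].apply(lambda row: sorted(row), axis=1, result_type='broadcast')
 print(temp[['Col1','Col2']].drop_duplicates())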
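As a variant (my assumption, not what the answer used), the row-wise sort can also be done with `numpy.sort` on the underlying array. This sidesteps the question of how `apply` shapes its result in a given pandas version. The names `sorted_values` and `unique_pairs` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'C', 'D', 'B', 'D', 'A'],
                   'Col2': ['B', 'D', 'C', 'A', 'C', 'B']})

# Sort the two values within each row, then rebuild a DataFrame
# with the original index and column names.
sorted_values = np.sort(df[['Col1', 'Col2']].to_numpy(), axis=1)
temp = pd.DataFrame(sorted_values, index=df.index, columns=['Col1', 'Col2'])

# After sorting, AB and BA are identical rows, so drop_duplicates merges them.
unique_pairs = temp.drop_duplicates()
print(unique_pairs)
```

Like the code above, this keeps every distinct pair; it does not check that a pair actually appeared in both orders.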

Useful links:

checking if a string is in alphabetical order in python

Difference between map, applymap and apply methods in Pandas

Answer 3

Here is one way.

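import pandas as pd

df = pd.DataFrame({'Col1' : ['A','C','D','B','D','A','E'],
                   'Col2' : ['B','D','C','A','C','B','F']})

# The lambda tests for duplicates on the sorted rows rather than on the original df.
df = df.drop_duplicates()\
       .apply(sorted, axis=1, result_type='broadcast')\
       .loc[lambda d: d.duplicated(subset=['Col1', 'Col2'], keep=False)]\
       .drop_duplicates()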

#   Col1 Col2
# 0    A    B
# 1    C    D

Explanation

The steps are:

Remove exact duplicate rows.
Sort the values within each row, so that BA becomes AB.
Keep only the rows that now occur more than once; these are the swapped pairs.
Remove duplicate rows again, leaving one row per pair.
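The steps above can be sketched one at a time, which makes the intermediate frames easy to inspect. This is a hedged illustration: the names `step1` to `step3` are made up here, and `result_type='broadcast'` is an addition assumed to be needed on newer pandas so that `apply` returns a DataFrame with the original columns.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'C', 'D', 'B', 'D', 'A', 'E'],
                   'Col2': ['B', 'D', 'C', 'A', 'C', 'B', 'F']})

# 1. Remove exact duplicate rows so a repeated (A, B) is not mistaken for a swap.
step1 = df.drop_duplicates()

# 2. Sort the values within each row: (B, A) becomes (A, B).
step2 = step1.apply(sorted, axis=1, result_type='broadcast')

# 3. Keep only rows that now occur more than once (the swapped pairs).
step3 = step2[step2.duplicated(keep=False)]

# 4. Collapse each surviving pair to a single row.
result = step3.drop_duplicates()
print(result)
```

The pair EF is dropped in step 3 because FE never occurs, so only AB and CD remain.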

3 answers

solution1
0 2018-03-05 20:44:56

solution2
0 2018-03-05 21:03:36

solution3
0 2018-03-06 00:15:57
