Percentage of values matching between two columns of a CSV in Python

Question

I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame({"true_key": ["Astral", "Blob", "Blob", "Cat", "Astral"], "true_key2": ["Japan", "Astral", "Blob", "quics", "Cat"]})

How do I calculate the percentage of values in true_key that are also present in true_key2, and vice versa?

For this data, 100% of the true_key values are present in true_key2, and 60% of the true_key2 values are present in true_key.

Is there a way to do this in Python?

Thanks in advance.

Answer 1

One way is to take the set intersection of the two columns and divide its length by the number of unique values in each column:

mutual_len = len(set(df['true_key']).intersection(set(df['true_key2'])))
mutual_len / df['true_key'].nunique(), mutual_len / df['true_key2'].nunique()

(1.0, 0.6)
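Note that the snippet above counts distinct values, so duplicates such as the repeated "Blob" do not affect the result. If you want the share of rows instead, Series.isin is one option. The following is only a sketch: the helper names unique_overlap and row_overlap are made up for illustration, and it assumes each column has at least one non-null value.

```python
import pandas as pd

df = pd.DataFrame({
    "true_key": ["Astral", "Blob", "Blob", "Cat", "Astral"],
    "true_key2": ["Japan", "Astral", "Blob", "quics", "Cat"],
})


def unique_overlap(a: pd.Series, b: pd.Series) -> float:
    """Share of the distinct values in `a` that also occur in `b`.

    Hypothetical helper; assumes `a` has at least one non-null value.
    """
    a_unique = set(a.dropna())
    return len(a_unique & set(b.dropna())) / len(a_unique)


def row_overlap(a: pd.Series, b: pd.Series) -> float:
    """Share of the rows in `a` whose value occurs anywhere in `b`.

    Hypothetical helper; duplicates in `a` are counted once per row.
    """
    return float(a.isin(b).mean())


# Distinct-value based, same idea as the set intersection above.
unique_pct = (
    unique_overlap(df["true_key"], df["true_key2"]),
    unique_overlap(df["true_key2"], df["true_key"]),
)

# Row based: every row counts, including repeated values.
row_pct = (
    row_overlap(df["true_key"], df["true_key2"]),
    row_overlap(df["true_key2"], df["true_key"]),
)

print(unique_pct, row_pct)
```

For the sample data both approaches happen to agree, but they can differ when a column contains many repeated values, so pick the one that matches what "percentage" should mean for your CSV.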

solution1
1 ACCEPTED 2021-10-04 18:49:12
