
Percentage of values matching between two columns of a CSV in Python

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({"true_key": ["Astral", "Blob", "Blob", "Cat", "Astral"], "true_key2": ["Japan", "Astral", "Blob", "quics", "Cat"]})

How do I calculate the percentage of values in true_key that are also present in true_key2, and vice versa?

As we can see, 100% of the true_key values are present in true_key2, and 60% of the true_key2 values are present in true_key.
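To make the expected numbers concrete, here is a small worked check in plain Python (no pandas). It is only a sketch that assumes "values" means distinct values, so the repeated "Blob" and "Astral" in true_key are counted once: true_key has 3 distinct values, all of which appear in true_key2 (3/3), while true_key2 has 5 distinct values, 3 of which appear in true_key (3/5).

```python
# Column values copied from the example DataFrame in the question
true_key = ["Astral", "Blob", "Blob", "Cat", "Astral"]
true_key2 = ["Japan", "Astral", "Blob", "quics", "Cat"]

# Assumption: compare distinct values only, so duplicates count once
unique1 = set(true_key)     # {"Astral", "Blob", "Cat"} -> 3 values
unique2 = set(true_key2)    # {"Japan", "Astral", "Blob", "quics", "Cat"} -> 5 values
shared = unique1 & unique2  # {"Astral", "Blob", "Cat"} -> 3 values

pct_1_in_2 = 100 * len(shared) / len(unique1)  # 3 / 3 -> 100.0
pct_2_in_1 = 100 * len(shared) / len(unique2)  # 3 / 5 -> 60.0
print(pct_1_in_2, pct_2_in_1)  # 100.0 60.0
```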

Is there any other method to do it in Python?

Thanks in advance.

One way is to take the set intersection of the two columns and divide its size by the number of unique values in each column:

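# Count the distinct values that appear in both columns
mutual_len = len(set(df['true_key']).intersection(set(df['true_key2'])))
# Divide by the number of distinct values in each column
mutual_len / df['true_key'].nunique(), mutual_len / df['true_key2'].nunique()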

(1.0, 0.6)
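A more pandas-native alternative is `Series.isin`, sketched below under the assumption that pandas is installed (the example frame is rebuilt so the snippet runs on its own). Note that applying `isin` to the raw columns answers a slightly different question than the set intersection: it counts rows, so a value repeated in a column is counted every time it appears. Applying it to the `unique()` values instead reproduces the distinct-value result. With this particular data both versions happen to give 1.0 and 0.6.

```python
import pandas as pd

df = pd.DataFrame({
    "true_key": ["Astral", "Blob", "Blob", "Cat", "Astral"],
    "true_key2": ["Japan", "Astral", "Blob", "quics", "Cat"],
})

# Row-wise: fraction of rows in one column whose value appears anywhere in the other.
# isin() returns a boolean Series; mean() of booleans is the fraction of True values.
row_share_1_in_2 = df["true_key"].isin(df["true_key2"]).mean()
row_share_2_in_1 = df["true_key2"].isin(df["true_key"]).mean()

# Distinct-value version, matching the set-intersection approach
uniq1 = pd.Series(df["true_key"].unique())
uniq2 = pd.Series(df["true_key2"].unique())
distinct_share_1_in_2 = uniq1.isin(uniq2).mean()
distinct_share_2_in_1 = uniq2.isin(uniq1).mean()

print(row_share_1_in_2, row_share_2_in_1)            # 1.0 0.6
print(distinct_share_1_in_2, distinct_share_2_in_1)  # 1.0 0.6
```

The two variants diverge once duplicates are unevenly matched, so pick the one that matches whether you care about rows or about distinct keys.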


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM