I have two dicts created from csv files (see file below):
a_data = {
'78567908': {'26.01.21', '02.03.24', '26.01.12', '02.03.01', '04.03.03', '01.01.13', '01.01.10', '26.01.17'},
'85789070': {'02.03.17', '02.05.01', '02.05.04', '26.01.02', '09.01.04'},
'87140110': {'03.15.19', '03.15.25', '03.15.24'},
'87142218': {'26.17.13', '02.03.22', '02.11.01'},
'87006826': {'28.01.03'}
}
p_data = {
'78567908': {'24.11.01', '26.01.21', '24.11.02', '02.03.24', '02.03.01', '04.03.03', '01.01.13', '26.01.18', '01.01.10'},
'85789070': {'02.05.05', '02.03.17', '02.05.24', '02.05.01', '02.05.04', '26.01.02', '09.01.04'},
'87140110': {'03.15.19', '03.15.25', '03.15.10', '03.15.24'},
'87142218': {'26.17.13', '02.03.22', '02.11.01', '02.03.02', '02.03.24', '02.11.13'},
'87006826': {'28.01.03'}
}
I am trying to compare p_data to a_data. For each key present in both a_data and p_data, I want to know the intersection, and which values are in a_data but not in p_data.
For example, for key 78567908, p_data has 6 of the 8 values in a_data. The common values are
01.01.10
01.01.13
02.03.01
02.03.24
04.03.03
26.01.21
and the missing values are
26.01.12
26.01.17
The csv files look like this:
78567908,01.01.10,01.01.13,02.03.01,02.03.24,04.03.03,26.01.12,26.01.17,26.01.21
85789070,02.03.17,02.05.01,02.05.04,09.01.04,26.01.02
87140110,03.15.19,03.15.24,03.15.25
87142218,02.03.22,02.11.01,26.17.13
87006826,28.01.03
I created the dicts using this code:
import csv

a_data = {}
with open(cvsfile) as fin:
    reader = csv.reader(fin, skipinitialspace=True)
    for row in reader:
        # first column is the key; the remaining columns become a set of values
        a_data[row[0]] = set(row[1:])
If there is a better way than dicts (such as data frames) to arrive at the same result, I will accept that as an answer. So far I have only managed to create the two dictionaries or data frames; I have made no progress on comparing them.
You can try this using pandas:
import pandas as pd
a_data = {'78567908': {'26.01.21', '02.03.24', '26.01.12', '02.03.01', '04.03.03', '01.01.13', '01.01.10', '26.01.17'}, '85789070': {'02.03.17', '02.05.01', '02.05.04', '26.01.02', '09.01.04'}, '87140110': {'03.15.19', '03.15.25', '03.15.24'}, '87142218': {'26.17.13', '02.03.22', '02.11.01'}, '87006826': {'28.01.03'}}
p_data = {'78567908': {'24.11.01', '26.01.21', '24.11.02', '02.03.24', '02.03.01', '04.03.03', '01.01.13', '26.01.18', '01.01.10'}, '85789070': {'02.05.05', '02.03.17', '02.05.24', '02.05.01', '02.05.04', '26.01.02', '09.01.04'}, '87140110': {'03.15.19', '03.15.25', '03.15.10', '03.15.24'}, '87142218': {'26.17.13', '02.03.22', '02.11.01', '02.03.02', '02.03.24', '02.11.13'}, '87006826': {'28.01.03'}}
a = pd.DataFrame.from_dict(a_data, orient='index')
p = pd.DataFrame.from_dict(p_data, orient='index')
# for each key (row) in a, count how many of its values also appear in the matching row of p
a.apply(lambda x: sum(i in p.loc[x.name,:].tolist() for i in x.dropna()), axis=1)
Output:
78567908    6
85789070    5
87140110    3
87142218    3
87006826    1
dtype: int64
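The pandas snippet above only reports how many values match. Since the values in both dicts are already sets, the intersection and the missing values can also be obtained directly with the set operators `&` and `-`. The following is a minimal sketch, not a definitive implementation: `compare` is a hypothetical helper name, only two of the keys from the question are reproduced for brevity, and a key absent from p_data is assumed to count as having no values.

```python
# Subset of the data from the question (two keys only, for brevity).
a_data = {
    '78567908': {'26.01.21', '02.03.24', '26.01.12', '02.03.01',
                 '04.03.03', '01.01.13', '01.01.10', '26.01.17'},
    '87006826': {'28.01.03'},
}
p_data = {
    '78567908': {'24.11.01', '26.01.21', '24.11.02', '02.03.24', '02.03.01',
                 '04.03.03', '01.01.13', '26.01.18', '01.01.10'},
    '87006826': {'28.01.03'},
}


def compare(a_data, p_data):
    """Return {key: (common, missing)} for every key in a_data.

    common  = values found in both a_data[key] and p_data[key]
    missing = values in a_data[key] that are not in p_data[key]
    """
    result = {}
    for key, a_values in a_data.items():
        # Assumption: a key missing from p_data is treated as an empty set.
        p_values = p_data.get(key, set())
        result[key] = (a_values & p_values, a_values - p_values)
    return result


for key, (common, missing) in compare(a_data, p_data).items():
    print(f"{key}: p_data has {len(common)} out of {len(a_data[key])} values")
    print("  common: ", ", ".join(sorted(common)))
    print("  missing:", ", ".join(sorted(missing)) or "none")
```

For key 78567908 this yields the six common values and the two missing values (26.01.12 and 26.01.17) listed in the question.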