I have a dataframe with 3 columns:
'Date'
unique for each row'Group'
group to which each row belongs'ID'
identifier for each rowIn the 'ID'
columns there are some duplicate values. I want to count the number of repetitions for each group and for each date.
Example of dataframe:
Date Group ID
0 2021-10-29 09:15:52 B 9352
1 2021-10-29 10:58:57 A 9352
2 2021-10-29 11:20:46 C 9352
3 2021-10-29 12:47:34 C 6274
4 2021-10-29 15:41:35 C 1677
5 2021-10-29 16:12:39 B 1677
6 2021-10-29 18:57:56 B 9225
7 2021-10-29 19:46:46 C 9225
8 2021-10-30 01:23:07 C 9225
9 2021-10-30 02:13:57 A 9225
10 2021-10-30 05:03:52 B 9329
11 2021-10-30 07:48:39 B 9329
12 2021-10-30 08:45:00 A 9329
13 2021-10-30 11:17:47 C 9329
14 2021-10-30 21:46:07 C 9496
15 2021-10-30 22:13:29 A 1218
16 2021-10-31 05:39:38 C 2422
17 2021-10-31 05:39:41 C 9654
18 2021-10-31 10:10:21 A 1951
19 2021-10-31 10:19:45 A 1951
20 2021-10-31 16:40:10 A 1951
21 2021-10-31 16:41:23 C 1951
22 2021-10-31 22:07:16 A 1951
23 2021-11-01 00:26:30 C 2867
24 2021-11-01 01:25:46 B 2867
25 2021-11-01 01:53:16 B 4262
26 2021-11-01 01:58:30 A 4581
27 2021-11-01 05:23:26 C 1734
28 2021-11-01 05:38:22 C 1734
29 2021-11-01 05:47:43 C 1734
30 2021-11-01 06:49:27 A 4813
31 2021-11-01 07:54:02 C 4813
32 2021-11-01 12:10:48 C 8661
33 2021-11-01 14:32:50 C 4138
34 2021-11-01 18:38:16 B 4138
35 2021-11-01 21:33:37 C 4138
36 2021-11-02 02:34:53 B 4138
37 2021-11-02 04:45:56 C 4138
38 2021-11-02 07:33:38 C 4138
39 2021-11-02 07:40:33 C 4138
40 2021-11-02 08:06:21 B 4138
41 2021-11-02 08:32:20 C 4138
42 2021-11-02 09:47:26 A 4138
43 2021-11-02 15:51:33 C 4138
44 2021-11-02 16:04:33 B 2433
45 2021-11-02 20:47:01 B 2433
46 2021-11-03 04:11:57 A 4594
47 2021-11-03 12:36:16 A 6829
48 2021-11-03 14:42:14 A 6829
49 2021-11-03 18:03:27 B 7138
50 2021-11-03 18:46:46 C 7138
51 2021-11-03 19:07:01 C 7138
52 2021-11-03 19:23:02 A 9752
53 2021-11-03 21:10:51 A 2699
54 2021-11-04 00:58:12 C 2699
55 2021-11-04 03:44:12 A 7463
56 2021-11-04 05:40:07 C 4558
57 2021-11-04 05:56:51 C 7855
58 2021-11-04 06:27:28 C 7855
59 2021-11-04 07:50:46 C 7855
Desired result:
Date Group Repetitions
0 2021-10-29 A 1
1 2021-10-29 B 3
2 2021-10-29 C 3
3 2021-10-30 A 2
4 2021-10-30 B 2
5 2021-10-30 C 2
6 2021-10-31 A 4
7 2021-10-31 B 0
8 2021-10-31 C 1
9 2021-11-01 A 1
10 2021-11-01 B 2
11 2021-11-01 C 7
12 2021-11-02 A 1
13 2021-11-02 B 4
14 2021-11-02 C 5
15 2021-11-03 A 3
16 2021-11-03 B 1
17 2021-11-03 C 2
18 2021-11-04 A 0
19 2021-11-04 B 0
20 2021-11-04 C 4
Note that the duplicate conditions spans over dates and groups: the 'ID'
2699
in the example above counts as duplicate even if those repetitions belongs to different dates and groups.
IIUC, you can replace the duplicated values with 1, the others with zero, then groupby
+ agg
+ sum
:
(df.assign(Date=pd.to_datetime(df['Date']).dt.normalize(),
ID=df['ID'].duplicated(keep=False).astype(int)
)
.groupby(['Date', 'Group'], as_index=False).agg(repetitions=('ID', 'sum'))
)
output:
Date Group repetitions
0 2021-10-29 A 1
1 2021-10-29 B 3
2 2021-10-29 C 3
3 2021-10-30 A 2
4 2021-10-30 B 2
5 2021-10-30 C 2
6 2021-10-31 A 4
7 2021-10-31 C 1
8 2021-11-01 A 1
9 2021-11-01 B 2
10 2021-11-01 C 7
11 2021-11-02 A 1
12 2021-11-02 B 4
13 2021-11-02 C 5
14 2021-11-03 A 3
15 2021-11-03 B 1
16 2021-11-03 C 2
17 2021-11-04 A 0
18 2021-11-04 C 4
dates = pd.to_datetime(df['Date']).dt.normalize()
idx = pd.MultiIndex.from_product([dates.unique(), df['Group'].unique()], names=['Date', 'Group'])
(df.assign(Date=dates,
ID=df['ID'].duplicated(keep=False).astype(int)
)
.groupby(['Date', 'Group']).agg(repetitions=('ID', 'sum'))
.reindex(idx, fill_value=0)
.reset_index()
)
output:
Date Group repetitions
0 2021-10-29 B 3
1 2021-10-29 A 1
2 2021-10-29 C 3
3 2021-10-30 B 2
4 2021-10-30 A 2
5 2021-10-30 C 2
6 2021-10-31 B 0
7 2021-10-31 A 4
8 2021-10-31 C 1
9 2021-11-01 B 2
10 2021-11-01 A 1
11 2021-11-01 C 7
12 2021-11-02 B 4
13 2021-11-02 A 1
14 2021-11-02 C 5
15 2021-11-03 B 1
16 2021-11-03 A 3
17 2021-11-03 C 2
18 2021-11-04 B 0
19 2021-11-04 A 0
20 2021-11-04 C 4
df['ID'].value_counts()
应该给你每个 id 出现的次数。
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.