I am trying to generate a list from a pandas DataFrame based on conditions on its column values. My df looks something like this:
df =
48 150 39 0
0 BE0974302342 0 0 21
1 BE0974302342 3 3 19
2 BE0974302342 F 2 2
3 FR0000073843 0 0 22
4 FR0000073843 3 3 20
5 FR0000073843 F 2 2
6 FR0000076861 0 0 21
7 FR0000076861 3 3 18
8 FR0000076861 F 1 3
9 FR0000076861 F 2 3
10 FR0000076887 0 0 13
11 FR0000076887 3 3 11
12 FR0000076887 8 8 19
13 FR0000076887 F 2 2
14 FR0000077562 0 0 22
15 FR0000077562 3 3 19
16 FR0000077562 F 2 3
17 FR0000079147 0 0 20
18 FR0000079147 3 3 16
19 FR0000079147 F 1 1
20 FR0000079147 F 2 4
21 FR0004034072 0 0 14
22 FR0004034072 3 3 12
23 FR0004034072 8 8 21
24 FR0004034072 F 2 2
25 FR0004152874 0 0 22
26 FR0004152874 3 3 20
27 FR0004152874 F 1 1
28 FR0004152874 F 2 2
29 FR0004178572 0 0 21
...
Here the combination of columns 150 and 39 has a meaning, so I want to extract a count for each combination. There are 6 possible combinations:
150 39
0 0
3 3
4 4
8 8
F 1
F 2
I want to build a final_list that holds the count of each of these combinations for every value in column 48.
For example, for 'BE0974302342':
the (150=0, 39=0) record count is 21, (150=3, 39=3) is 19, (150=4, 39=4) is 0, (150=8, 39=8) is 0, (150=F, 39=1) is 0, and (150=F, 39=2) is 2,
so the final record list would be something like
[[BE0974302342,21,19,0,0,0,2],
[FR0000073843,22,20,0,0,0,2],
[FR0000076861,21,18,0,0,1,3]...]
What I tried: I converted the df into a list, then traversed each sublist and checked the combination of the 150 and 39 values. That partially worked, but I would like a cleaner solution that works reliably. I would appreciate any help or suggestions on the approach I should follow. Thanks in advance.
Use crosstab, then convert the resulting DataFrame to a list of lists:
df1 = pd.crosstab(df[48], [df[150], df[39]])
#alternative solutions
#df1 = df.groupby([48, 150, 39]).size().unstack(level=[1,2], fill_value=0)
#df1 = df.pivot_table(index=48, columns=[150, 39], aggfunc='size', fill_value=0)
print (df1)
150 0 3 8 F
39 0 3 8 1 2
48
BE0974302342 1 1 0 0 1
FR0000073843 1 1 0 0 1
FR0000076861 1 1 0 1 1
FR0000076887 1 1 1 0 1
FR0000077562 1 1 0 0 1
FR0000079147 1 1 0 1 1
FR0004034072 1 1 1 0 1
FR0004152874 1 1 0 1 1
FR0004178572 1 0 0 0 0
L = df1.reset_index().values.tolist()
print (L)
[['BE0974302342', 1, 1, 0, 0, 1],
['FR0000073843', 1, 1, 0, 0, 1],
['FR0000076861', 1, 1, 0, 1, 1],
['FR0000076887', 1, 1, 1, 0, 1],
['FR0000077562', 1, 1, 0, 0, 1],
['FR0000079147', 1, 1, 0, 1, 1],
['FR0004034072', 1, 1, 1, 0, 1],
['FR0004152874', 1, 1, 0, 1, 1],
['FR0004178572', 1, 0, 0, 0, 0]]
If you also need the combinations, convert the MultiIndex in the columns to a list of tuples:
print (df1.columns.tolist())
[('0', 0), ('3', 3), ('8', 8), ('F', 1), ('F', 2)]
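Those tuples can be paired with each row so that every count stays labelled by its combination. A small sketch, assuming the same dtypes as in the answer (strings in column 150, integers in column 39); the sample df and the name labelled are illustrative only:

```python
import pandas as pd

# Tiny sample in the shape of the question's data (dtypes are an assumption).
df = pd.DataFrame({
    48: ["BE0974302342"] * 3 + ["FR0000076861"] * 4,
    150: ["0", "3", "F", "0", "3", "F", "F"],
    39: [0, 3, 2, 0, 3, 1, 2],
})

df1 = pd.crosstab(df[48], [df[150], df[39]])

# Map each identifier to {(150 value, 39 value): row count}.
labelled = {
    key: dict(zip(df1.columns.tolist(), row))
    for key, row in zip(df1.index, df1.to_numpy().tolist())
}
print(labelled["BE0974302342"])
```

A dictionary keyed by the tuples avoids relying on column position when looking up a particular combination.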