Creating a python list from data frame based on conditions

I am trying to generate a list from a pandas data frame based on conditions on its column values. My df looks something like this:

        df =      
                       48 150  39   0
        0    BE0974302342   0   0  21
        1    BE0974302342   3   3  19
        2    BE0974302342   F   2   2
        3    FR0000073843   0   0  22
        4    FR0000073843   3   3  20
        5    FR0000073843   F   2   2
        6    FR0000076861   0   0  21
        7    FR0000076861   3   3  18
        8    FR0000076861   F   1   3
        9    FR0000076861   F   2   3
        10   FR0000076887   0   0  13
        11   FR0000076887   3   3  11
        12   FR0000076887   8   8  19
        13   FR0000076887   F   2   2
        14   FR0000077562   0   0  22
        15   FR0000077562   3   3  19
        16   FR0000077562   F   2   3
        17   FR0000079147   0   0  20
        18   FR0000079147   3   3  16
        19   FR0000079147   F   1   1
        20   FR0000079147   F   2   4
        21   FR0004034072   0   0  14
        22   FR0004034072   3   3  12
        23   FR0004034072   8   8  21
        24   FR0004034072   F   2   2
        25   FR0004152874   0   0  22
        26   FR0004152874   3   3  20
        27   FR0004152874   F   1   1
        28   FR0004152874   F   2   2
        29   FR0004178572   0   0  21
        ...

Here each combination of columns 150 and 39 has a meaning, so I want to extract the count for each combination. There are 6 possible combinations:

    150 39
    0   0
    3   3
    4   4
    8   8
    F   1
    F   2

I want to build a final_list that holds the count of each of these combinations for every value in column '48'.

For example, for 'BE0974302342' the record count for (150=0, 39=0) is 21, for (150=3, 39=3) it is 19, for (150=4, 39=4) it is 0, for (150=8, 39=8) it is 0, for (150=F, 39=1) it is 0, and for (150=F, 39=2) it is 2.

So the final record list would be something like:

[[BE0974302342,21,19,0,0,0,2], 
[FR0000073843,22,20,0,0,0,2],
[FR0000076861,21,18,0,0,1,3]...]

What I have tried: I converted the df into a list, then traversed each sublist and checked the combination of the 150 and 39 values. That partially worked, but I would like a cleaner and more reliable solution. I would appreciate any help or a suggestion for the approach I should follow. Thanks in advance.

Use crosstab to count the rows for each combination, then convert the resulting DataFrame to a list of lists:

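import pandas as pd

# count rows per value of column 48 (index) and per (150, 39) pair (columns)
df1 = pd.crosstab(df[48], [df[150], df[39]])
# alternative solutions that give the same table
#df1 = df.groupby([48, 150, 39]).size().unstack(level=[1,2], fill_value=0)
#df1 = df.pivot_table(index=48, columns=[150, 39], aggfunc='size', fill_value=0)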
print (df1)
150           0  3  8  F   
39            0  3  8  1  2
48                         
BE0974302342  1  1  0  0  1
FR0000073843  1  1  0  0  1
FR0000076861  1  1  0  1  1
FR0000076887  1  1  1  0  1
FR0000077562  1  1  0  0  1
FR0000079147  1  1  0  1  1
FR0004034072  1  1  1  0  1
FR0004152874  1  1  0  1  1
FR0004178572  1  0  0  0  0

L = df1.reset_index().values.tolist()
print (L)

[['BE0974302342', 1, 1, 0, 0, 1], 
 ['FR0000073843', 1, 1, 0, 0, 1], 
 ['FR0000076861', 1, 1, 0, 1, 1], 
 ['FR0000076887', 1, 1, 1, 0, 1], 
 ['FR0000077562', 1, 1, 0, 0, 1], 
 ['FR0000079147', 1, 1, 0, 1, 1], 
 ['FR0004034072', 1, 1, 1, 0, 1], 
 ['FR0004152874', 1, 1, 0, 1, 1], 
 ['FR0004178572', 1, 0, 0, 0, 0]]

If you also need the combinations, convert the MultiIndex in the columns to a list of tuples:

print (df1.columns.tolist())
[('0', 0), ('3', 3), ('8', 8), ('F', 1), ('F', 2)]
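The list of tuples can also be paired with each row so that every count stays attached to its combination. The following is a minimal sketch under the same dtype assumptions as the output above (strings in column 150, integers in column 39); the sample frame and the name labelled are hypothetical.

```python
import pandas as pd

# Hypothetical sample using the same column labels as the question.
df = pd.DataFrame({
    48: ['BE0974302342', 'BE0974302342', 'FR0000076861', 'FR0000076861'],
    150: ['0', 'F', '0', 'F'],
    39: [0, 2, 0, 1],
})

df1 = pd.crosstab(df[48], [df[150], df[39]])
combos = df1.columns.tolist()  # list of (150, 39) tuples

# Map each value of column 48 to a {combination: count} dictionary.
labelled = {
    key: dict(zip(combos, row))
    for key, row in zip(df1.index, df1.values.tolist())
}
print(labelled['BE0974302342'])
```

A dictionary keyed by tuple avoids relying on column position when looking up a specific combination later.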
