
Conditionally converting a matrix into columns using Python

I have a large DataFrame (circa 2500x2500) and I would like to select all values in it that meet a condition (in this specific case, those that are > 50) and then read them into columns.

I have the following code to select those values, but it's the step of turning the result into columns that I am missing:

data[(data >= 50)]

A smaller version of my data would be

     AAAA  BBBB  CCCC  DDDD  EEEE  FFFF  GGGG  HHHH IIII
AAAA 80    4     0     65    17    32    42    93   27
BBBB 4     21    37    256   12    0     1     32   62
CCCC 0     37    0     32    67    34    2     0    26
DDDD 65    256   32    12    8     31    53    61   1
EEEE 17    12    67    8     8     3     74    1    6
FFFF 32    0     34    31    3     23    15    93   23
GGGG 42    1     2     53    74    15    180   123  32
HHHH 93    32    0     61    1     93    123   8    7
IIII 27    62    26    1     6     23    32    7    10

What I would like to get to is a three-column list: column 1 holds the index label, column 2 the column header, and column 3 the value, with one row for every value greater than 50. This would look as follows:

index   Header  Value
AAAA    AAAA    80
AAAA    DDDD    65
AAAA    HHHH    93
BBBB    DDDD    256
BBBB    IIII    62
CCCC    EEEE    67
DDDD    AAAA    65
DDDD    BBBB    256
DDDD    GGGG    53
DDDD    HHHH    61
EEEE    CCCC    67
EEEE    GGGG    74
FFFF    HHHH    93
GGGG    DDDD    53
GGGG    EEEE    74
GGGG    GGGG    180
GGGG    HHHH    123
HHHH    AAAA    93
HHHH    DDDD    61
HHHH    FFFF    93
HHHH    GGGG    123
IIII    BBBB    62

One way of achieving this is to use pandas.melt(). First you need to create an id variable from the index of the DataFrame:

data['index'] = data.index

You can then melt the DataFrame (i.e. reshape it from wide to long format), specifying the 'index' column as id_vars:

data_melt = pd.melt(data, id_vars='index')

data_melt then looks like this (first rows shown):

    index   variable    value
0   AAAA    AAAA    80
1   BBBB    AAAA    4
2   CCCC    AAAA    0
3   DDDD    AAAA    65
4   EEEE    AAAA    17
5   FFFF    AAAA    32
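
To make the reshaping concrete, here is a minimal sketch on a 2x2 frame built from the top-left corner of the sample data. It uses reset_index() instead of assigning data['index'], which is an assumption of this sketch rather than part of the answer above; it produces the same 'index' column (as long as the index is unnamed) without modifying data.

```python
import pandas as pd

# Top-left 2x2 corner of the sample data from the question.
data = pd.DataFrame(
    [[80, 4], [4, 21]],
    index=["AAAA", "BBBB"],
    columns=["AAAA", "BBBB"],
)

# reset_index() copies the row labels into a column named "index"
# and leaves the original frame untouched.
wide = data.reset_index()

# Each (row label, column label) pair becomes one row of the long frame.
data_melt = pd.melt(wide, id_vars="index")
print(data_melt)
```

The long frame has one row per cell of the original matrix, so a 2500x2500 input would produce 6.25 million rows before filtering.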

The last step is to keep only the rows whose value is >= 50 (use > instead if a value of exactly 50 should be excluded):

data_melt[data_melt['value'] >= 50]

This gives the desired values (first rows shown), ordered by column header rather than by index:

    index   variable    value
0   AAAA    AAAA    80
3   DDDD    AAAA    65
7   HHHH    AAAA    93
12  DDDD    BBBB    256
17  IIII    BBBB    62
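
Putting the steps together, the following sketch runs the whole approach on the sample data from the question. The final sort and the renaming to 'Header' and 'Value' are additions made here to match the layout requested in the question; they are not part of the answer above, and reset_index() again stands in for the data['index'] assignment.

```python
import pandas as pd

labels = ["AAAA", "BBBB", "CCCC", "DDDD", "EEEE", "FFFF", "GGGG", "HHHH", "IIII"]
values = [
    [80, 4, 0, 65, 17, 32, 42, 93, 27],
    [4, 21, 37, 256, 12, 0, 1, 32, 62],
    [0, 37, 0, 32, 67, 34, 2, 0, 26],
    [65, 256, 32, 12, 8, 31, 53, 61, 1],
    [17, 12, 67, 8, 8, 3, 74, 1, 6],
    [32, 0, 34, 31, 3, 23, 15, 93, 23],
    [42, 1, 2, 53, 74, 15, 180, 123, 32],
    [93, 32, 0, 61, 1, 93, 123, 8, 7],
    [27, 62, 26, 1, 6, 23, 32, 7, 10],
]
data = pd.DataFrame(values, index=labels, columns=labels)

# Wide to long: one row per (index, column header) pair.
data_melt = pd.melt(data.reset_index(), id_vars="index")

# Keep the large values, then order and label them as in the question.
result = (
    data_melt[data_melt["value"] >= 50]
    .sort_values(["index", "variable"])
    .rename(columns={"variable": "Header", "value": "Value"})
    .reset_index(drop=True)
)
print(result)
```

Because the sample matrix is symmetric, every off-diagonal value appears twice in the result, once for each (index, Header) orientation.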
