I have a large DataFrame (circa 2500x2500) and I would like to select all values in it that meet a condition (in this specific case, those that are >= 50) and then read them into columns.
I have the following code to select the values >= 50, but the part I am missing is how to turn the result into columns:
data[(data >= 50)]
A smaller version of my data would be
AAAA BBBB CCCC DDDD EEEE FFFF GGGG HHHH IIII
AAAA 80 4 0 65 17 32 42 93 27
BBBB 4 21 37 256 12 0 1 32 62
CCCC 0 37 0 32 67 34 2 0 26
DDDD 65 256 32 12 8 31 53 61 1
EEEE 17 12 67 8 8 3 74 1 6
FFFF 32 0 34 31 3 23 15 93 23
GGGG 42 1 2 53 74 15 180 123 32
HHHH 93 32 0 61 1 93 123 8 7
IIII 27 62 26 1 6 23 32 7 10
What I would like to end up with is a table with the index label in column 1, the column header in column 2, and the value in column 3, for every value that is 50 or greater. It would look as follows:
index Header Value
AAAA AAAA 80
AAAA DDDD 65
AAAA HHHH 93
BBBB DDDD 256
BBBB IIII 62
CCCC EEEE 67
DDDD AAAA 65
DDDD BBBB 256
DDDD GGGG 53
DDDD HHHH 61
EEEE CCCC 67
EEEE GGGG 74
FFFF HHHH 93
GGGG DDDD 53
GGGG EEEE 74
GGGG GGGG 180
GGGG HHHH 123
HHHH AAAA 93
HHHH DDDD 61
HHHH FFFF 93
HHHH GGGG 123
IIII BBBB 62
One way of achieving this is using pandas.melt(). First you need to create an id variable from the index of the DataFrame:
data['index'] = data.index
You can then melt the DataFrame (i.e. reshape it from wide to long format), specifying the 'index' column as id_vars:
data_melt = pd.melt(data, id_vars='index')
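As a minimal sketch of the same melt step, assuming you would rather not add a column to the original frame: reset_index() returns a copy with the (unnamed) index moved into a column called "index", and the var_name and value_name arguments of pd.melt() name the output columns to match the layout asked for in the question. The 3x3 frame here is just a slice of the sample data, used for illustration.

```python
import pandas as pd

# Small 3x3 slice of the sample data from the question.
labels = ["AAAA", "BBBB", "CCCC"]
data = pd.DataFrame(
    [[80, 4, 0], [4, 21, 37], [0, 37, 0]],
    index=labels,
    columns=labels,
)

# reset_index() works on a copy: the unnamed index becomes a column called
# "index", and the original frame is left unchanged.
data_melt = pd.melt(
    data.reset_index(),
    id_vars="index",
    var_name="Header",   # name for the former column labels
    value_name="Value",  # name for the cell values
)
print(data_melt)
```

With this variant, the later filter would use data_melt['Value'] instead of data_melt['value'].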
The first rows of data_melt look like this:
index variable value
0 AAAA AAAA 80
1 BBBB AAAA 4
2 CCCC AAAA 0
3 DDDD AAAA 65
4 EEEE AAAA 17
5 FFFF AAAA 32
The last step is to keep only the rows whose value is >= 50:
data_melt[data_melt['value'] >= 50]
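Note that melt() emits the rows column by column, whereas the table in the question is grouped by index label. If that ordering matters, one option is to sort after filtering. The following is a sketch on a 4x4 slice of the sample data, not a required step:

```python
import pandas as pd

# 4x4 slice of the sample data from the question.
labels = ["AAAA", "BBBB", "CCCC", "DDDD"]
data = pd.DataFrame(
    [[80, 4, 0, 65], [4, 21, 37, 256], [0, 37, 0, 32], [65, 256, 32, 12]],
    index=labels,
    columns=labels,
)
data_melt = pd.melt(data.reset_index(), id_vars="index")

# Keep rows at or above the threshold, then order them by row label and
# column label, and renumber the rows from 0.
result = (
    data_melt[data_melt["value"] >= 50]
    .sort_values(["index", "variable"])
    .reset_index(drop=True)
)
print(result)
```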
This gives the values you are after, in long format (first rows shown):
index variable value
0 AAAA AAAA 80
3 DDDD AAAA 65
7 HHHH AAAA 93
12 DDDD BBBB 256
17 IIII BBBB 62
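For comparison, here is an alternative that is not part of the melt approach above: DataFrame.stack() also reshapes wide to long, producing a Series keyed by (row label, column label) in row-by-row order. The sketch below uses the full sample data from the question. The output column names "index", "Header" and "Value" are chosen here to mirror the table in the question.

```python
import pandas as pd

labels = ["AAAA", "BBBB", "CCCC", "DDDD", "EEEE", "FFFF", "GGGG", "HHHH", "IIII"]
values = [
    [80, 4, 0, 65, 17, 32, 42, 93, 27],
    [4, 21, 37, 256, 12, 0, 1, 32, 62],
    [0, 37, 0, 32, 67, 34, 2, 0, 26],
    [65, 256, 32, 12, 8, 31, 53, 61, 1],
    [17, 12, 67, 8, 8, 3, 74, 1, 6],
    [32, 0, 34, 31, 3, 23, 15, 93, 23],
    [42, 1, 2, 53, 74, 15, 180, 123, 32],
    [93, 32, 0, 61, 1, 93, 123, 8, 7],
    [27, 62, 26, 1, 6, 23, 32, 7, 10],
]
data = pd.DataFrame(values, index=labels, columns=labels)

# stack() turns the frame into a Series with a two-level index:
# (row label, column label).
stacked = data.stack()

# Filter the Series directly, so no NaN placeholders are created and the
# values keep their integer dtype.
selected = stacked[stacked >= 50]

# Name the two index levels and move them into ordinary columns.
result = selected.rename_axis(["index", "Header"]).reset_index(name="Value")
print(result)
```

Whether this is faster or slower than melt() on a 2500x2500 frame is not something I have measured, so treat it as an equivalent formulation rather than an optimization.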