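In a data frame of probabilities over time, return the first column name where the value is < .5 for each row
Given a pandas data frame like the one below, where the column names are times, each row is a subject, and the values are probabilities, return for each subject the column name (i.e. the time) at which the probability first drops below .50. The probabilities always decrease from 1 towards 0. I tried looping over the data frame, but that is not computationally efficient.
subject id | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | … | 669 | 670 | 671 |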
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0.997913 | 0.993116 | 0.989017 | 0.976157 | 0.973078 | 0.968056 | 0.963685 | … | 0.156092 | 0.156092 | 0.156092 |
2 | 1 | 0.990335 | 0.988685 | 0.983145 | 0.964912 | 0.958 | 0.952 | 0.946995 | … | 0.148434 | 0.148434 | 0.148434 |
3 | 1 | 0.996231 | 0.990571 | 0.985775 | 0.976809 | 0.972736 | 0.969633 | 0.966116 | … | 0.17037 | 0.17037 | 0.17037 |
4 | 1 | 0.997129 | 0.994417 | 0.991054 | 0.978795 | 0.974216 | 0.96806 | 0.963039 | … | 0.15192 | 0.15192 | 0.15192 |
5 | 1 | 0.997728 | 0.993598 | 0.986641 | 0.98246 | 0.977371 | 0.972874 | 0.96816 | … | 0.154545 | 0.154545 | 0.154545 |
6 | 1 | 0.998134 | 0.995564 | 0.989901 | 0.986941 | 0.982313 | 0.972951 | 0.969645 | … | 0.17473 | 0.17473 | 0.17473 |
7 | 1 | 0.995681 | 0.994131 | 0.990401 | 0.974494 | 0.967941 | 0.961859 | 0.956636 | … | 0.144753 | 0.144753 | 0.144753 |
8 | 1 | 0.997541 | 0.994904 | 0.991941 | 0.983389 | 0.979375 | 0.973158 | 0.966358 | … | 0.158763 | 0.158763 | 0.158763 |
9 | 1 | 0.992253 | 0.989064 | 0.979258 | 0.955747 | 0.948842 | 0.942899 | 0.935784 | … | 0.150291 | 0.150291 | 0.150291 |
Desired output
subject id | time probability < .5 |
---|---|
1 | 100 |
2 | 99 |
3 | 34 |
4 | 19 |
5 | 600 |
6 | 500 |
7 | 222 |
8 | 111 |
9 | 332 |
Since the probabilities are always decreasing, you can count how many values per row are above the threshold:
>>> df.set_index("subject id").gt(.98).sum(1)
subject id
1 4
2 4
3 4
4 4
5 5
6 6
7 4
8 5
9 3
dtype: int64
Note: I used .98 instead of .5 because I only had part of the data.
Data used:
{'subject id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9},
'0': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1},
'1': {0: 0.997913,
1: 0.990335,
2: 0.996231,
3: 0.997129,
4: 0.997728,
5: 0.998134,
6: 0.995681,
7: 0.997541,
8: 0.992253},
'2': {0: 0.993116,
1: 0.988685,
2: 0.990571,
3: 0.994417,
4: 0.993598,
5: 0.995564,
6: 0.994131,
7: 0.994904,
8: 0.989064},
'3': {0: 0.989017,
1: 0.983145,
2: 0.985775,
3: 0.991054,
4: 0.986641,
5: 0.989901,
6: 0.990401,
7: 0.991941,
8: 0.979258},
'4': {0: 0.976157,
1: 0.964912,
2: 0.976809,
3: 0.978795,
4: 0.98246,
5: 0.986941,
6: 0.974494,
7: 0.983389,
8: 0.955747},
'5': {0: 0.973078,
1: 0.958,
2: 0.972736,
3: 0.974216,
4: 0.977371,
5: 0.982313,
6: 0.967941,
7: 0.979375,
8: 0.948842},
'6': {0: 0.968056,
1: 0.952,
2: 0.969633,
3: 0.96806,
4: 0.972874,
5: 0.972951,
6: 0.961859,
7: 0.973158,
8: 0.942899},
'7': {0: 0.963685,
1: 0.946995,
2: 0.966116,
3: 0.963039,
4: 0.96816,
5: 0.969645,
6: 0.956636,
7: 0.966358,
8: 0.935784}}
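The count returned above is a position, and it equals the column name only because the columns happen to be labelled 0, 1, 2 and so on. The following is a minimal sketch of one way to map the count back to the actual column label and to flag subjects that never reach the threshold. It uses a small slice of the data above; the names `threshold`, `probs`, `n_above`, `crossed` and `first_time` are illustrative, not part of the original answer.

```python
import pandas as pd

# A small slice of the data above (subjects 1, 5 and 9, times "0" to "4").
df = pd.DataFrame({
    "subject id": [1, 5, 9],
    "0": [1.0, 1.0, 1.0],
    "1": [0.997913, 0.997728, 0.992253],
    "2": [0.993116, 0.993598, 0.989064],
    "3": [0.989017, 0.986641, 0.979258],
    "4": [0.976157, 0.98246, 0.955747],
})

threshold = 0.98  # stand-in for .5, as in the note above

probs = df.set_index("subject id")
n_cols = probs.shape[1]

# Each row is non-increasing, so the number of values strictly above the
# threshold is also the position of the first value at or below it.
# (Use .ge() instead of .gt() if "strictly below" is required.)
n_above = probs.gt(threshold).sum(axis=1)

# A row that never reaches the threshold has n_above == n_cols, which is
# not a valid position, so such rows are reported as missing.
crossed = n_above < n_cols
positions = n_above.clip(upper=n_cols - 1).to_numpy()
labels = pd.Series(probs.columns.to_numpy()[positions], index=probs.index)
first_time = labels.where(crossed)

print(first_time)
```

In this slice, subject 5 stays above .98 in every column, so it comes back as missing rather than as a misleading column label.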
If I understand correctly, I think this is what you are looking for:
df.where(df.lt(.5)).idxmax(axis=1)
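One thing to watch with this one-liner is a row that never drops below the threshold: every value in that row is masked out, and how `idxmax` treats an all-missing row depends on the pandas version. The sketch below is a hedged variant, not the answerer's code. It takes `idxmax` of the boolean mask instead, which gives the first `True` column, and then blanks out rows that never cross. It reuses a slice of the question's data with .98 standing in for .5; the variable names are illustrative.

```python
import pandas as pd

# A small slice of the question's data (subjects 2, 6 and 9, times "0" to "4").
df = pd.DataFrame({
    "subject id": [2, 6, 9],
    "0": [1.0, 1.0, 1.0],
    "1": [0.990335, 0.998134, 0.992253],
    "2": [0.988685, 0.995564, 0.989064],
    "3": [0.983145, 0.989901, 0.979258],
    "4": [0.964912, 0.986941, 0.955747],
})

threshold = 0.98  # stand-in for .5, since the slice never gets near .5

probs = df.set_index("subject id")

# True wherever the probability is strictly below the threshold.
below = probs.lt(threshold)

# Rows with at least one value below the threshold.
crossed = below.any(axis=1)

# idxmax on a boolean frame returns the label of the first True per row;
# for an all-False row it returns the first column, so mask those rows out.
first_below = below.idxmax(axis=1).where(crossed)

# The where/idxmax form from the answer, restricted to rows that cross,
# gives the same labels because each row is non-increasing.
via_where = probs[crossed].where(below[crossed]).idxmax(axis=1)

print(first_below)
```

Both forms rely on the rows being non-increasing: the largest value below the threshold is then also the earliest one.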