In a data frame of probabilities over time, return the first column name where the value is < .5 for each row

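Given a pandas data frame like the one below, where the column names are times, each row is a subject, and the values are probabilities, return for each subject the column name (i.e. the time) at which the probability first drops below .5. The probabilities always decrease from 1 toward 0. I tried looping over the data frame, but that is not computationally efficient.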

subject id 0 1 2 3 4 5 6 7 669 670 671
1 1 0.997913 0.993116 0.989017 0.976157 0.973078 0.968056 0.963685 0.156092 0.156092 0.156092
2 1 0.990335 0.988685 0.983145 0.964912 0.958 0.952 0.946995 0.148434 0.148434 0.148434
3 1 0.996231 0.990571 0.985775 0.976809 0.972736 0.969633 0.966116 0.17037 0.17037 0.17037
4 1 0.997129 0.994417 0.991054 0.978795 0.974216 0.96806 0.963039 0.15192 0.15192 0.15192
5 1 0.997728 0.993598 0.986641 0.98246 0.977371 0.972874 0.96816 0.154545 0.154545 0.154545
6 1 0.998134 0.995564 0.989901 0.986941 0.982313 0.972951 0.969645 0.17473 0.17473 0.17473
7 1 0.995681 0.994131 0.990401 0.974494 0.967941 0.961859 0.956636 0.144753 0.144753 0.144753
8 1 0.997541 0.994904 0.991941 0.983389 0.979375 0.973158 0.966358 0.158763 0.158763 0.158763
9 1 0.992253 0.989064 0.979258 0.955747 0.948842 0.942899 0.935784 0.150291 0.150291 0.150291

Desired output

subject id   time at which probability < .5
1 100
2 99
3 34
4 19
5 600
6 500
7 222
8 111
9 332

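Since the probabilities are always decreasing, you can count, per row, how many values are still above the threshold. With columns labelled 0, 1, 2, ... that count is also the label of the first column at or below the threshold: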

>>> df.set_index("subject id").gt(.98).sum(1)
subject id
1    4
2    4
3    4
4    4
5    5
6    6
7    4
8    5
9    3
dtype: int64

Note: I used .98 instead of .5 because I only worked with a subset of the data.
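The counting trick returns a position, which matches the column label only when the columns are labelled 0, 1, 2, ... without gaps. A minimal sketch of mapping the position back to an arbitrary time label follows; the small frame, the time labels "10", "20", "30" and the names `probs`, `first_pos` and `first_time` are made up for illustration, and `ge` is used so that the first value strictly below the threshold is found.

```python
import pandas as pd

# Made-up example: columns are times, rows are subjects, values never increase.
df = pd.DataFrame(
    {
        "subject id": [1, 2, 3, 4],
        "0": [1.0, 1.0, 1.0, 1.0],
        "10": [0.8, 0.6, 0.4, 0.9],
        "20": [0.6, 0.45, 0.3, 0.8],
        "30": [0.4, 0.3, 0.2, 0.7],
    }
)

probs = df.set_index("subject id")
threshold = 0.5

# Because each row is non-increasing, the number of values >= threshold
# equals the position of the first value below the threshold.
first_pos = probs.ge(threshold).sum(axis=1)

# Look the position up in the column labels; a subject that never drops
# below the threshold gets a position past the last column and maps to NaN.
labels = pd.Series(probs.columns, dtype=object)
first_time = first_pos.map(labels)

print(first_time)
```

Subject 4 never falls below the threshold in this toy data, so its entry is missing rather than pointing at a wrong column.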


Data used

{'subject id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9},
 '0': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1},
 '1': {0: 0.997913,
  1: 0.990335,
  2: 0.996231,
  3: 0.997129,
  4: 0.997728,
  5: 0.998134,
  6: 0.995681,
  7: 0.997541,
  8: 0.992253},
 '2': {0: 0.993116,
  1: 0.988685,
  2: 0.990571,
  3: 0.994417,
  4: 0.993598,
  5: 0.995564,
  6: 0.994131,
  7: 0.994904,
  8: 0.989064},
 '3': {0: 0.989017,
  1: 0.983145,
  2: 0.985775,
  3: 0.991054,
  4: 0.986641,
  5: 0.989901,
  6: 0.990401,
  7: 0.991941,
  8: 0.979258},
 '4': {0: 0.976157,
  1: 0.964912,
  2: 0.976809,
  3: 0.978795,
  4: 0.98246,
  5: 0.986941,
  6: 0.974494,
  7: 0.983389,
  8: 0.955747},
 '5': {0: 0.973078,
  1: 0.958,
  2: 0.972736,
  3: 0.974216,
  4: 0.977371,
  5: 0.982313,
  6: 0.967941,
  7: 0.979375,
  8: 0.948842},
 '6': {0: 0.968056,
  1: 0.952,
  2: 0.969633,
  3: 0.96806,
  4: 0.972874,
  5: 0.972951,
  6: 0.961859,
  7: 0.973158,
  8: 0.942899},
 '7': {0: 0.963685,
  1: 0.946995,
  2: 0.966116,
  3: 0.963039,
  4: 0.96816,
  5: 0.969645,
  6: 0.956636,
  7: 0.966358,
  8: 0.935784}}

If I understand correctly, I think this is what you are looking for:

df.where(df.lt(.5)).idxmax(axis=1)
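A closely related variant, shown here only as a sketch, calls `idxmax` on the boolean mask itself instead of on the masked values. It returns the first column where the condition holds, and an extra `where` blanks out rows that never cross the threshold, since `idxmax` on an all-NaN row may warn or raise depending on the pandas version. The frame below and the names `below` and `first_below` are assumptions for illustration, with the subject id already set as the index.

```python
import pandas as pd

# Made-up example with the subject id as the index and times as column labels.
df = pd.DataFrame(
    {
        "0": [1.0, 1.0, 1.0],
        "10": [0.8, 0.4, 0.9],
        "20": [0.45, 0.3, 0.8],
        "30": [0.45, 0.2, 0.7],
    },
    index=pd.Index([1, 2, 3], name="subject id"),
)

# True wherever the probability is below the threshold.
below = df.lt(0.5)

# idxmax on booleans returns the label of the first True in each row.
# Rows with no True at all would report the first column, so mask them out.
first_below = below.idxmax(axis=1).where(below.any(axis=1))

print(first_below)
```

This version does not rely on the values being monotone, and a plateau such as the repeated 0.45 for subject 1 still resolves to the earliest column.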
