Counting null / NaN values in a DataFrame across columns

Question

I am trying to count the number of unique values in each row across the columns of a DataFrame.

Here is the current DataFrame:

[in] df
[out] 
         PID         CID      PPID        PPPID       PPPPID        PPPPPID
    0   2015-01-02   456      2014-01-02  2014-01-02  2014-01-02    2014-01-02
    1   2015-02-02   500      2014-02-02  2013-02-02  2012-02-02    2012-02-10  
    2   2010-12-04   300      2010-12-04  2010-12-04  2010-12-04    2010-12-04

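All columns except CID (contract_ID) are datetimes. I want to add another column to the DataFrame that counts the number of unique datetimes in each row (to find out how many contracts are in the "chain").

I have tried different implementations of the .count() and .sum() methods, but I cannot get them to work row by row (the output is the same value for every row).

Example: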

df_merged['COUNT'] = df_merged2.count(axis=1)

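This fills the entire 'COUNT' column with '6', when I want a different value for each row.

Removing the axis=1 argument makes the whole column 'NaN'.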

Answer 1

You need apply(your_func, axis=1) to work row by row.

df

Out[19]: 
          PID  CID        PPID       PPPID      PPPPID     PPPPPID
0  2015-01-02  456  2014-01-02  2014-01-02  2014-01-02  2014-01-02
1  2015-02-02  500  2014-02-02  2013-02-02  2012-02-02  2012-02-10
2  2010-12-04  300  2010-12-04  2010-12-04  2010-12-04  2010-12-04



df['counts'] = df.drop('CID', axis=1).apply(lambda row: len(pd.unique(row)), axis=1)

Out[20]: 
          PID  CID        PPID       PPPID      PPPPID     PPPPPID  counts
0  2015-01-02  456  2014-01-02  2014-01-02  2014-01-02  2014-01-02       2
1  2015-02-02  500  2014-02-02  2013-02-02  2012-02-02  2012-02-10       5
2  2010-12-04  300  2010-12-04  2010-12-04  2010-12-04  2010-12-04       1

[3 rows x 7 columns]
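For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand, and it assumes the date columns should be parsed with pd.to_datetime (the question only says they are datetimes). The `date_cols` name is introduced here for illustration.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the printed output).
df = pd.DataFrame({
    "PID":     ["2015-01-02", "2015-02-02", "2010-12-04"],
    "CID":     [456, 500, 300],
    "PPID":    ["2014-01-02", "2014-02-02", "2010-12-04"],
    "PPPID":   ["2014-01-02", "2013-02-02", "2010-12-04"],
    "PPPPID":  ["2014-01-02", "2012-02-02", "2010-12-04"],
    "PPPPPID": ["2014-01-02", "2012-02-10", "2010-12-04"],
})

# Assumption: every column except CID holds dates, so parse those to datetime64.
date_cols = [c for c in df.columns if c != "CID"]
df[date_cols] = df[date_cols].apply(pd.to_datetime)

# axis=1 hands each row to the lambda as a Series; count its distinct values.
df["counts"] = df[date_cols].apply(lambda row: len(pd.unique(row)), axis=1)

print(df["counts"].tolist())
```

Selecting `date_cols` has the same effect as `df.drop('CID', axis=1)` here; either way the contract ID is kept out of the count.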

Answer 2

Another approach is to call unique on the transpose of df:

In [26]:    
df['counts'] = df.drop('CID', axis=1).T.apply(lambda x: len(pd.Series.unique(x)))
df

Out[26]:
          PID  CID        PPID       PPPID      PPPPID     PPPPPID  counts
0  2015-01-02  456  2014-01-02  2014-01-02  2014-01-02  2014-01-02       2
1  2015-02-02  500  2014-02-02  2013-02-02  2012-02-02  2012-02-10       5
2  2010-12-04  300  2010-12-04  2010-12-04  2010-12-04  2010-12-04       1
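The transpose works because apply defaults to axis=0, so on df.T each column it visits is one of the original rows, and the result is indexed by the original row labels. A small sketch on a trimmed-down, hypothetical three-column version of the data suggests the two spellings agree:

```python
import pandas as pd

# Hypothetical reduced frame: only three of the date columns from the question.
dates = pd.DataFrame({
    "PID":   pd.to_datetime(["2015-01-02", "2015-02-02", "2010-12-04"]),
    "PPID":  pd.to_datetime(["2014-01-02", "2014-02-02", "2010-12-04"]),
    "PPPID": pd.to_datetime(["2014-01-02", "2013-02-02", "2010-12-04"]),
})

# Row-wise apply on the original frame.
by_row = dates.apply(lambda row: len(row.unique()), axis=1)

# Column-wise apply (the default) on the transpose: each column is an original row.
by_transpose = dates.T.apply(lambda col: len(col.unique()))

print(by_row.tolist(), by_transpose.tolist())
```

Both results carry the original row index, so either one can be assigned straight back as a new column.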

Answer 3

You can call nunique directly on the DataFrame. This is available as of pd.__version__ == u'0.20.0'.

In [169]: df['counts'] = df.drop('CID', axis=1).nunique(axis=1)

In [170]: df
Out[170]:
          PID  CID        PPID       PPPID      PPPPID     PPPPPID  counts
0  2015-01-02  456  2014-01-02  2014-01-02  2014-01-02  2014-01-02       2
1  2015-02-02  500  2014-02-02  2013-02-02  2012-02-02  2012-02-10       5
2  2010-12-04  300  2010-12-04  2010-12-04  2010-12-04  2010-12-04       1
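One thing to be aware of: count(axis=1) returns the number of non-null cells per row, which is why the question's attempt gave the same value everywhere, while nunique(axis=1) returns the number of distinct values and skips missing ones by default. A rough sketch on a hypothetical frame with one missing date (not part of the original data):

```python
import pandas as pd

# Hypothetical frame; the second row has a missing PPID (NaT).
dates = pd.DataFrame({
    "PID":   pd.to_datetime(["2015-01-02", "2015-02-02"]),
    "PPID":  pd.to_datetime(["2014-01-02", None]),
    "PPPID": pd.to_datetime(["2014-01-02", "2013-02-02"]),
})

# Non-null cells per row (what the question's .count(axis=1) computes).
non_null = dates.count(axis=1)

# Distinct values per row; NaT is ignored by default (dropna=True).
distinct = dates.nunique(axis=1)

# Distinct values per row with NaT counted as its own value.
distinct_with_nat = dates.nunique(axis=1, dropna=False)

print(non_null.tolist(), distinct.tolist(), distinct_with_nat.tolist())
```

If a missing date should not count as a link in the chain, the default dropna=True is probably what you want.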
