
Problem getting the most frequent value row-wise in a DataFrame with Pandas

I have a DataFrame with the following columns:

>>> print(df.columns)

Index(['iteration0', 'iteration1', 'iteration2', 'iteration3', 'iteration4',
   'iteration5', 'iteration6', 'iteration7', 'iteration8', 'iteration9',
   'iteration10', 'iteration11', 'iteration12', 'iteration13',
   'iteration14', 'iteration15', 'iteration16', 'iteration17',
   'iteration18', 'iteration19', 'iteration20', 'iteration21',
   'iteration22', 'iteration23', 'iteration24', 'iteration25',
   'iteration26', 'iteration27', 'iteration28', 'iteration29',
   'iteration30', 'iteration31', 'iteration32', 'iteration33',
   'iteration34', 'iteration35', 'iteration36', 'iteration37',
   'iteration38', 'iteration39', 'iteration40', 'iteration41',
   'iteration42', 'iteration43', 'iteration44', 'iteration45',
   'iteration46', 'iteration47', 'iteration48', 'iteration49',
   'iteration50', 'iteration51', 'iteration52', 'iteration53',
   'iteration54', 'iteration55', 'iteration56', 'iteration57',
   'iteration58', 'iteration59', 'iteration60', 'iteration61',
   'iteration62', 'iteration63', 'iteration64', 'iteration65',
   'iteration66', 'iteration67', 'iteration68', 'iteration69',
   'iteration70', 'iteration71', 'iteration72', 'iteration73',
   'iteration74', 'iteration75', 'iteration76', 'iteration77',
   'iteration78', 'iteration79', 'iteration80', 'iteration81',
   'iteration82', 'iteration83', 'iteration84', 'iteration85',
   'iteration86', 'iteration87', 'iteration88', 'iteration89',
   'iteration90', 'iteration91', 'iteration92', 'iteration93',
   'iteration94', 'iteration95', 'iteration96', 'iteration97',
   'iteration98', 'iteration99'],
  dtype='object')

Each row of the DataFrame also has an index, which is the date:

>>> print(df.index)
Index(['05/12/2009', '05/13/2009', '05/14/2009', '05/15/2009', '05/18/2009',
   '05/19/2009', '05/20/2009', '05/21/2009', '05/22/2009', '05/25/2009',
   ...
   '10/23/2009', '10/26/2009', '10/27/2009', '10/28/2009', '10/29/2009',
   '10/30/2009', '11/02/2009', '11/03/2009', '11/04/2009', '11/05/2009'],
  dtype='object', name='Date', length=127)

So I have a DataFrame with 127 rows and 100 columns. Each value in this dataset is 0, 1, or 2.

What I want is simply the mode of each row, that is, the most frequent value for each Date. Here is what I did:

most_frequent = df.mode(axis=1)

Then I store the mode of each row as a new column in another DataFrame:

local_df['ensemble'] = most_frequent 

But when I run the code, I get this error:

 File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 3370, in __setitem__
    self._set_item(key, value)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 3446, in _set_item
    NDFrame._set_item(self, key, value)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 3172, in _set_item
    self._data.set(key, value)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/managers.py", line 1056, in set
    self.insert(len(self.items), item, value)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/managers.py", line 1158, in insert
    placement=slice(loc, loc + 1))
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/blocks.py", line 3095, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/blocks.py", line 87, in __init__
    '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
ValueError: Wrong number of items passed 2, placement implies 1

When I print the most_frequent DataFrame, I see some very strange behavior:

09/25/2009  0.0 NaN
09/28/2009  0.0 NaN
09/29/2009  0.0 NaN
09/30/2009  1.0 NaN
10/01/2009  0.0 NaN
10/02/2009  0.0 NaN
10/05/2009  0.0 NaN
10/06/2009  1.0 NaN
10/07/2009  0.0 NaN
10/08/2009  0.0 NaN
10/09/2009  0.0 NaN
10/12/2009  0.0 NaN
10/13/2009  1.0 NaN
10/14/2009  0.0 NaN
10/15/2009  0.0 NaN
10/16/2009  0.0 NaN
10/19/2009  0.0 NaN
10/20/2009  0.0 NaN
10/21/2009  0.0 NaN
10/22/2009  0.0 NaN
10/23/2009  0.0 NaN
10/26/2009  0.0 NaN
10/27/2009  0.0 NaN

In other words, the result has an extra column.

I don't know whether that is what caused the problem. Either way, what was my mistake here?

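There is no mistake. The mode method sometimes returns more than one value, here per row: when several values tie for most frequent in a row, all of them are returned, so the result has more than one column and rows without a tie are padded with NaN. Assigning that two-column DataFrame to the single column 'ensemble' is what raises the ValueError.

So try selecting the first column by position with DataFrame.iloc: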
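The extra column can be reproduced on a small scale. The following is only a sketch: the four-column frame and its values are made up for illustration (the real data has 100 iteration columns), and the second row is built so that 0 and 1 tie.

```python
import pandas as pd

# Hypothetical toy data: values are 0, 1 or 2, as in the question.
df = pd.DataFrame(
    {
        "iteration0": [0, 1, 0],
        "iteration1": [0, 1, 1],
        "iteration2": [1, 0, 2],
        "iteration3": [2, 0, 2],
    },
    index=pd.Index(["05/12/2009", "05/13/2009", "05/14/2009"], name="Date"),
)

# Row-wise mode. The second row has a tie between 0 and 1,
# so the result needs two columns; the other rows get NaN in the second one.
most_frequent = df.mode(axis=1)
print(most_frequent)
print(most_frequent.shape)  # (3, 2)
```

A single tied row anywhere in the 127 dates is enough to produce the second, mostly NaN column seen in the question.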

local_df['ensemble'] = df.mode(axis=1).iloc[:, 0]
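A self-contained sketch of this fix, using made-up toy data rather than the asker's frame. The `tied` column is an optional extra that is not part of the answer; it only flags rows where more than one mode was returned, since `iloc[:, 0]` then keeps just the first of the tied values (the smallest one, as the modes come back sorted).

```python
import pandas as pd

# Hypothetical toy data: values are 0, 1 or 2, as in the question.
df = pd.DataFrame(
    {
        "iteration0": [0, 1, 0],
        "iteration1": [0, 1, 1],
        "iteration2": [1, 0, 2],
        "iteration3": [2, 0, 2],
    },
    index=pd.Index(["05/12/2009", "05/13/2009", "05/14/2009"], name="Date"),
)

modes = df.mode(axis=1)

# Stand-in for the asker's local_df; here just an empty frame on the same index.
local_df = pd.DataFrame(index=df.index)

# Keep only the first mode of each row, so a single column is assigned.
local_df["ensemble"] = modes.iloc[:, 0].astype(int)

# Optional: mark rows where several values tied for most frequent.
local_df["tied"] = modes.notna().sum(axis=1) > 1

print(local_df)
```

Whether taking the first tied value is acceptable depends on the use case; the flag makes it possible to handle those dates differently.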
