Assigning values to a subset of rows in a Pandas DataFrame

Question

I want to assign values based on a condition on the index of a Pandas DataFrame.

import numpy as np
import pandas as pd

class test():
    def __init__(self):
        self.l = 1396633637830123000
        self.dfa = pd.DataFrame(np.arange(20).reshape(10,2), columns = ['A', 'B'], index = np.arange(self.l,self.l+10))
        self.dfb = pd.DataFrame([[self.l+1,self.l+3], [self.l+6,self.l+9]], columns = ['beg', 'end'])

    def update(self):
        self.dfa['true'] = False
        self.dfa['idx'] = np.nan
        # for each [beg, end] range in dfb, flag the matching rows of dfa
        for i, beg, end in zip(self.dfb.index, self.dfb['beg'], self.dfb['end']):
            self.dfa.ix[beg:end]['true'] = True
            self.dfa.ix[beg:end]['idx'] = i

    def do(self):
        self.update()
        print(self.dfa)

t = test()
t.do()

Result:

                      A   B   true  idx
1396633637830123000   0   1  False  NaN
1396633637830123001   2   3   True  NaN
1396633637830123002   4   5   True  NaN
1396633637830123003   6   7   True  NaN
1396633637830123004   8   9  False  NaN
1396633637830123005  10  11  False  NaN
1396633637830123006  12  13   True  NaN
1396633637830123007  14  15   True  NaN
1396633637830123008  16  17   True  NaN
1396633637830123009  18  19   True  NaN

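The true column is assigned correctly, but the idx column is not. Moreover, this seems to depend on how the columns are initialized, because if I do this: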

    def update(self):
        self.dfa['true'] = False
        self.dfa['idx'] = False

then the true column is not assigned correctly either.

What am I doing wrong?

P.S. The expected result is:

                      A   B   true  idx
1396633637830123000   0   1  False  NaN
1396633637830123001   2   3   True  0
1396633637830123002   4   5   True  0
1396633637830123003   6   7   True  0
1396633637830123004   8   9  False  NaN
1396633637830123005  10  11  False  NaN
1396633637830123006  12  13   True  1
1396633637830123007  14  15   True  1
1396633637830123008  16  17   True  1
1396633637830123009  18  19   True  1

Edit: I tried assigning with both loc and iloc, but neither seems to work. loc:

self.dfa.loc[beg:end]['true'] = True
self.dfa.loc[beg:end]['idx'] = i

iloc:

self.dfa.loc[self.dfa.index.get_loc(beg):self.dfa.index.get_loc(end)]['true'] = True
self.dfa.loc[self.dfa.index.get_loc(beg):self.dfa.index.get_loc(end)]['idx'] = i

Answer 1

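You are chain indexing (see the pandas documentation on returning a view versus a copy). The warning is not guaranteed to be raised.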
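To make the term concrete: `dfa.ix[beg:end]['true'] = True` is two indexing operations in a row. The first may return a temporary copy, in which case the second writes to that copy and the original frame is left unchanged. Here is a minimal sketch of the difference, using a small hypothetical frame `df` and column `flag` that are not part of the question:

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
df = pd.DataFrame({'A': range(5), 'flag': False})

# Chained form (two steps; the second may write to a temporary copy):
#     df.loc[1:3]['flag'] = True

# Single-step form: the row selector and the column label go into one
# .loc call, so the assignment targets df itself.
df.loc[1:3, 'flag'] = True
print(df)
```

Whether the chained form happens to work depends on the dtypes and memory layout of the frame, which would be consistent with the behaviour changing when the columns are initialized differently.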

You should do this instead. There is no real need to keep track of the index in b, by the way.

In [44]: dfa = pd.DataFrame(np.arange(20).reshape(10,2), columns = ['A', 'B'], index = np.arange(l,l+10))

In [45]: dfb = pd.DataFrame([[l+1,l+3], [l+6,l+9]], columns = ['beg', 'end'])

In [46]: dfa['in_b'] = False

In [47]: for i, s in dfb.iterrows():
   ....:     dfa.loc[s['beg']:s['end'],'in_b'] = True
   ....:

Or, if you have a non-integer dtype:

In [36]: for i, s in dfb.iterrows():
   ....:     dfa.loc[(dfa.index>=s['beg']) & (dfa.index<=s['end']),'in_b'] = True
   ....:


In [48]: dfa
Out[48]: 
                      A   B   in_b
1396633637830123000   0   1  False
1396633637830123001   2   3   True
1396633637830123002   4   5   True
1396633637830123003   6   7   True
1396633637830123004   8   9  False
1396633637830123005  10  11  False
1396633637830123006  12  13   True
1396633637830123007  14  15   True
1396633637830123008  16  17   True
1396633637830123009  18  19   True

[10 rows x 3 columns]
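If you do want the idx column from the question as well, the same single-step `.loc` pattern should cover it. The following is only a sketch that reuses the question's column names (`true`, `idx`) and iterates with `zip` rather than `iterrows`:

```python
import numpy as np
import pandas as pd

l = 1396633637830123000
dfa = pd.DataFrame(np.arange(20).reshape(10, 2), columns=['A', 'B'],
                   index=np.arange(l, l + 10))
dfb = pd.DataFrame([[l + 1, l + 3], [l + 6, l + 9]], columns=['beg', 'end'])

dfa['true'] = False
dfa['idx'] = np.nan

# One .loc call per column: rows and column label are given together,
# so each assignment writes into dfa directly (label slices are inclusive).
for i, beg, end in zip(dfb.index, dfb['beg'], dfb['end']):
    dfa.loc[beg:end, 'true'] = True
    dfa.loc[beg:end, 'idx'] = i

print(dfa)
```

Because idx starts out as NaN it is a float column, so the assigned values print as 0.0 and 1.0 rather than 0 and 1.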

If b is huge, this might not be very performant.
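For a large b, one possible way to avoid the Python-level loop is a binary search with `np.searchsorted`. This is a sketch under two assumptions that the question does not state: the ranges in `dfb` are sorted by `beg`, and they do not overlap. The names `begs`, `ends`, `vals`, `pos` and `inside` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

l = 1396633637830123000
dfa = pd.DataFrame(np.arange(20).reshape(10, 2), columns=['A', 'B'],
                   index=np.arange(l, l + 10))
dfb = pd.DataFrame([[l + 1, l + 3], [l + 6, l + 9]], columns=['beg', 'end'])

begs = dfb['beg'].to_numpy()
ends = dfb['end'].to_numpy()
vals = dfa.index.to_numpy()

# For every index value, find the last range whose beg is <= the value.
# A result of -1 means the value lies before the first range.
pos = np.searchsorted(begs, vals, side='right') - 1
safe_pos = np.clip(pos, 0, None)  # avoid indexing with -1

# The value is inside that range only if it is also <= the range's end.
inside = (pos >= 0) & (vals <= ends[safe_pos])

dfa['true'] = inside
dfa['idx'] = np.where(inside, dfb.index.to_numpy()[safe_pos], np.nan)
print(dfa)
```

Everything stays in int64 here, which matters because values of this size cannot be represented exactly as float64.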

By the way, these look like nanosecond timestamps. They can be made friendlier by converting them.

In [49]: pd.to_datetime(dfa.index)
Out[49]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-04-04 17:47:17.830123, ..., 2014-04-04 17:47:17.830123009]
Length: 10, Freq: None, Timezone: None
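If you go that route, the range assignment should work the same way with timestamps as labels. This is a sketch that assumes the integers really are nanoseconds since the Unix epoch, and that the `beg` and `end` columns of `dfb` are converted too:

```python
import numpy as np
import pandas as pd

l = 1396633637830123000
dfa = pd.DataFrame(np.arange(20).reshape(10, 2), columns=['A', 'B'],
                   index=np.arange(l, l + 10))
dfb = pd.DataFrame([[l + 1, l + 3], [l + 6, l + 9]], columns=['beg', 'end'])

# Assumption: the integers are nanoseconds since the Unix epoch.
dfa.index = pd.to_datetime(dfa.index, unit='ns')
dfb['beg'] = pd.to_datetime(dfb['beg'], unit='ns')
dfb['end'] = pd.to_datetime(dfb['end'], unit='ns')

dfa['in_b'] = False
# Label slicing on a DatetimeIndex includes both endpoints.
for beg, end in zip(dfb['beg'], dfb['end']):
    dfa.loc[beg:end, 'in_b'] = True

print(dfa)
```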

Answer 1
1 accepted, 2014-04-04 18:26:34
