
Pandas dataframe - update the values of certain rows based on the condition of a groupby object

Given the following data:

import pandas as pd

# "h" is the hourly alias; the uppercase "H" is deprecated in recent pandas
idx_a = pd.date_range(start="2000-01-01 00:00:00", periods=5, freq="h")
idx_b = pd.date_range(start="2000-01-01 00:05:00", periods=5, freq="h")
idx_c = pd.date_range(start="2000-01-02 00:00:00", periods=5, freq="h")
idx_d = pd.date_range(start="2000-01-02 00:05:00", periods=5, freq="h")

df = pd.DataFrame({'article': ['a', 'b']*10, 'view': range(1, 21)}, index=idx_a.union(idx_b).union(idx_c).union(idx_d))
print(df)
                     article    view
2000-01-01 00:00:00        a    1
2000-01-01 00:05:00        b    2
2000-01-01 01:00:00        a    3
2000-01-01 01:05:00        b    4
2000-01-01 02:00:00        a    5
2000-01-01 02:05:00        b    6
2000-01-01 03:00:00        a    7
2000-01-01 03:05:00        b    8
2000-01-01 04:00:00        a    9
2000-01-01 04:05:00        b    10
2000-01-02 00:00:00        a    11
2000-01-02 00:05:00        b    12
2000-01-02 01:00:00        a    13
2000-01-02 01:05:00        b    14
2000-01-02 02:00:00        a    15
2000-01-02 02:05:00        b    16
2000-01-02 03:00:00        a    17
2000-01-02 03:05:00        b    18
2000-01-02 04:00:00        a    19
2000-01-02 04:05:00        b    20

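For each day and each article, I want to update only the view value at 2 AM, replacing it with that article's view value at 3 AM. The desired result should look like this ("<===" marks an updated row):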

                     article    view
2000-01-01 00:00:00        a    1
2000-01-01 00:05:00        b    2
2000-01-01 01:00:00        a    3
2000-01-01 01:05:00        b    4
2000-01-01 02:00:00        a    7   <===
2000-01-01 02:05:00        b    8   <===
2000-01-01 03:00:00        a    7
2000-01-01 03:05:00        b    8
2000-01-01 04:00:00        a    9
2000-01-01 04:05:00        b    10
2000-01-02 00:00:00        a    11
2000-01-02 00:05:00        b    12
2000-01-02 01:00:00        a    13
2000-01-02 01:05:00        b    14
2000-01-02 02:00:00        a    17  <===
2000-01-02 02:05:00        b    18  <===
2000-01-02 03:00:00        a    17
2000-01-02 03:05:00        b    18
2000-01-02 04:00:00        a    19
2000-01-02 04:05:00        b    20

After many attempts, the closest I got was with this code:

df.groupby([pd.Grouper(freq="D"), 'article']).view.transform(lambda s: s.where( ~(s.index.hour==2), s[s.index.hour==3]))

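However, np.nan shows up in the cells I want to update. Interestingly, if I replace s[s.index.hour==3] with an integer, the cells (i.e. the 2 AM values) are correctly updated to that integer. How can I get each article's 3 AM value and use it to update the 2 AM value on the same day?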

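If filtering rows by hour is enough, select the 3 AM rows, relabel their index to 2 AM with rename, and then use DataFrame.update. Note that in the output below, update has converted view to float: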

df1 = df[df.index.hour==3].rename(lambda x: x.replace(hour=2))
print (df1)                   
                    article  view
2000-01-01 02:00:00       a     7
2000-01-01 02:05:00       b     8
2000-01-02 02:00:00       a    17
2000-01-02 02:05:00       b    18

df.update(df1)
print (df)
                    article  view
2000-01-01 00:00:00       a   1.0
2000-01-01 00:05:00       b   2.0
2000-01-01 01:00:00       a   3.0
2000-01-01 01:05:00       b   4.0
2000-01-01 02:00:00       a   7.0
2000-01-01 02:05:00       b   8.0
2000-01-01 03:00:00       a   7.0
2000-01-01 03:05:00       b   8.0
2000-01-01 04:00:00       a   9.0
2000-01-01 04:05:00       b  10.0
2000-01-02 00:00:00       a  11.0
2000-01-02 00:05:00       b  12.0
2000-01-02 01:00:00       a  13.0
2000-01-02 01:05:00       b  14.0
2000-01-02 02:00:00       a  17.0
2000-01-02 02:05:00       b  18.0
2000-01-02 03:00:00       a  17.0
2000-01-02 03:05:00       b  18.0
2000-01-02 04:00:00       a  19.0
2000-01-02 04:05:00       b  20.0
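
If the float conversion is unwanted, one possible variation is to write the 3 AM values into the 2 AM rows with a plain .loc assignment. This is a minimal sketch, not part of the original answer: it rebuilds the sample frame from the question, and it assumes every 3 AM row has a matching row exactly one hour earlier and that the index labels are unique. The names three_am and target_labels are made up for illustration.

```python
import pandas as pd

# Rebuild the sample frame from the question.
starts = ["2000-01-01 00:00", "2000-01-01 00:05",
          "2000-01-02 00:00", "2000-01-02 00:05"]
index = pd.DatetimeIndex(
    [ts for start in starts for ts in pd.date_range(start, periods=5, freq="h")]
).sort_values()
df = pd.DataFrame({"article": ["a", "b"] * 10, "view": range(1, 21)}, index=index)

# Values recorded at 3 AM (03:00 for article a, 03:05 for article b).
three_am = df.loc[df.index.hour == 3, "view"]

# Labels of the rows one hour earlier (assumption: they all exist in df).
target_labels = three_am.index - pd.Timedelta(hours=1)

# Assign by position via a NumPy array, so no label alignment takes place
# and the integer dtype of the column is kept.
df.loc[target_labels, "view"] = three_am.to_numpy()

print(df)
```

If a 2 AM row were missing for some 3 AM row, the .loc assignment would raise a KeyError rather than silently skip it, so this sketch only fits data shaped like the sample.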

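Your solution can be fixed by converting the values to a list. Series.where aligns a Series passed as the replacement on its index, and the 3 AM label never matches the 2 AM label, so the result is NaN; a list carries no index and is not aligned. You can also use != instead of inverting the mask with ~: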

df['view1'] = (df.groupby([pd.Grouper(freq="D"), 'article']).view
                .transform(lambda s: s.where(s.index.hour!=2,s[s.index.hour==3].tolist())))
print (df)
                    article  view  view1
2000-01-01 00:00:00       a     1      1
2000-01-01 00:05:00       b     2      2
2000-01-01 01:00:00       a     3      3
2000-01-01 01:05:00       b     4      4
2000-01-01 02:00:00       a     5      7
2000-01-01 02:05:00       b     6      8
2000-01-01 03:00:00       a     7      7
2000-01-01 03:05:00       b     8      8
2000-01-01 04:00:00       a     9      9
2000-01-01 04:05:00       b    10     10
2000-01-02 00:00:00       a    11     11
2000-01-02 00:05:00       b    12     12
2000-01-02 01:00:00       a    13     13
2000-01-02 01:05:00       b    14     14
2000-01-02 02:00:00       a    15     17
2000-01-02 02:05:00       b    16     18
2000-01-02 03:00:00       a    17     17
2000-01-02 03:05:00       b    18     18
2000-01-02 04:00:00       a    19     19
2000-01-02 04:05:00       b    20     20
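
The alignment behaviour behind the NaN values can be seen on a two-row Series. The sketch below is illustrative only: fill_2am_from_3am is a hypothetical helper, not from the answer, and it passes the 3 AM value as a scalar (via .iloc[0]) instead of a list, assuming each (day, article) group has at most one 3 AM row.

```python
import pandas as pd

# Part 1: why a Series replacement produces NaN.
s = pd.Series(
    [5, 7],
    index=pd.to_datetime(["2000-01-01 02:00", "2000-01-01 03:00"]),
)
is_2am = s.index.hour == 2
at_3am = s[s.index.hour == 3]

# A Series replacement is aligned by label: there is no value labelled 02:00.
aligned = s.where(~is_2am, at_3am)
# A scalar replacement is broadcast, so no alignment is involved.
by_scalar = s.where(~is_2am, at_3am.iloc[0])
print(aligned)
print(by_scalar)


# Part 2: the same idea applied per (day, article) group.
def fill_2am_from_3am(group):
    """Replace the 2 AM value with the group's 3 AM value (hypothetical helper)."""
    at_3am = group[group.index.hour == 3]
    if at_3am.empty:
        # Nothing to copy from; leave the group unchanged.
        return group
    return group.where(group.index.hour != 2, at_3am.iloc[0])


starts = ["2000-01-01 00:00", "2000-01-01 00:05",
          "2000-01-02 00:00", "2000-01-02 00:05"]
index = pd.DatetimeIndex(
    [ts for start in starts for ts in pd.date_range(start, periods=5, freq="h")]
).sort_values()
df = pd.DataFrame({"article": ["a", "b"] * 10, "view": range(1, 21)}, index=index)

df["view1"] = (
    df.groupby([pd.Grouper(freq="D"), "article"])["view"]
      .transform(fill_2am_from_3am)
)
print(df)
```

Extracting a scalar makes the "one 3 AM row per group" assumption explicit; with several 3 AM rows per group, only the first would be used.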

