Pandas fill incremental values for NAs according to another column in the DataFrame

I have a DataFrame with sessions for each user. One of the columns is the number of sessions so far. Some of these values are null. I believe I could use the fillna and transform methods to fill the DataFrame appropriately.

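import numpy as np
import pandas as pd

NaN = np.nan  # alias so the list below reads as in the question

df = pd.DataFrame({'user': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],  'sessions': [28, NaN, NaN, NaN, 32, NaN, NaN, NaN, 12, NaN, 15, NaN, 17, NaN]})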

Expected output DataFrame:

df_out = pd.DataFrame({'user': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],  'sessions': [28, 29, 30, 31, 32, 9, 10, 11, 12, 14, 15, 16, 17, 18]})

Tried code:

df['sessions'] = df['sessions'].fillna(df.groupby('user')['sessions'].transform('mean'))

This works if I want to fill with the mean, and it is as far as I could get. Please suggest a few approaches.

PS: The starting value of sessions is not 1, because the data comes from a snapshot taken at some point in time. I do not have data going back to session number 1 for every user.

Assuming the known (non-NaN) values within each user are consistent with one another, you could do the following:

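def fun(x):
    _, diff = (~x.reset_index().isna()).idxmax()  # 0-based position of the first non-NaN value
    start = x[(~x.isna()).idxmax()] - diff  # value the group would have at its first row
    result = pd.RangeIndex(start, start + len(x))  # range based on the start value and group length
    return pd.Series(data=result.values, index=x.index)  # keep the group's original index


# group_keys=False keeps the original index so the result aligns with df;
# on pandas 2.0 and later, apply otherwise prepends the group key to the index
df['count'] = df.groupby('user', group_keys=False)['sessions'].apply(fun)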

print(df)

Output:

   user  sessions  count
0     A      28.0     28
1     A       NaN     29
2     A       NaN     30
3     A       NaN     31
4     A      32.0     32
5     B       NaN      9
6     B       NaN     10
7     B       NaN     11
8     B      12.0     12
9     C       NaN     14
10    C      15.0     15
11    C       NaN     16
12    C      17.0     17
13    C       NaN     18

The first line of the function fun is equivalent to:

diff = (~x.reset_index().isna()).idxmax()[1]

In other words, it finds the position of the first non-NaN value in a normalized (starting from 0) index.
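To make the positional lookup concrete, here is a minimal sketch on a single hypothetical group (the Series `x` and the names `pos`, `pos_alt`, and `start` are made up for illustration; `x` mimics what user B's rows look like inside the groupby). It also shows an alternative based on `argmax` that should give the same position without `reset_index`.

```python
import numpy as np
import pandas as pd

# Hypothetical group: user B's sessions, keeping the original (non-zero-based) index.
x = pd.Series([np.nan, np.nan, np.nan, 12.0], index=[5, 6, 7, 8], name="sessions")

# reset_index() moves the old index into a column and renumbers rows from 0,
# so idxmax() on the boolean "not NaN" mask returns a 0-based position.
mask = ~x.reset_index().isna()
pos = mask.idxmax()["sessions"]  # position of the first known value in the group

# Alternative without reset_index(): argmax on the boolean array.
pos_alt = int(x.notna().to_numpy().argmax())

# Value the group would have at its first row if sessions grow by 1 per row.
start = x.iloc[pos] - pos
```

Here the first known value sits at position 3, so the implied starting value is 12 - 3 = 9, which matches the first row for user B in the output above.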

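Use cumsum with fillna(1) for each group. Note that this gives the expected result only when the single known value of a group is in its first row; any later known value is added to the running total instead of being kept as is.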

df.groupby('user',sort=False)['sessions'].apply(lambda x: x.fillna(1).cumsum()).reset_index()
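As a rough illustration of how this behaves, the sketch below runs the same idea on a small hypothetical frame (the data and the name `result` are assumptions, not from the question). It uses `fillna(1)` followed by a grouped `cumsum`, which should be equivalent to the `apply` form above while keeping the original index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: user A starts with a known value, user B ends with one.
df = pd.DataFrame({
    "user": ["A", "A", "A", "B", "B", "B"],
    "sessions": [28, np.nan, np.nan, np.nan, np.nan, 12],
})

# Replace each NaN with 1, then take a running sum within each user.
result = df["sessions"].fillna(1).groupby(df["user"]).cumsum()

# User A: 28, 29, 30 (the known value comes first, so the increments line up).
# User B: 1, 2, 14 (the known 12 is added to the running total of the filled 1s).
```

So the running sum reproduces the incremental pattern for user A but not for user B, where the known value is not in the first row.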

You may reconstruct sessions by using groupby with cumcount and first:

s = df.groupby('user').sessions.cumcount()
s1 = (df.sessions - s).groupby(df.user).transform('first')

df['sessions'] = s1 + s

In [867]: df
Out[867]:
   user  sessions
0     A      28.0
1     A      29.0
2     A      30.0
3     A      31.0
4     A      32.0
5     B       9.0
6     B      10.0
7     B      11.0
8     B      12.0
9     C      14.0
10    C      15.0
11    C      16.0
12    C      17.0
13    C      18.0
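For reference, here is a self-contained sketch of the same cumcount and first idea on the question's data, with an added consistency check. The check and the names `pos`, `offset`, `consistent`, `start`, and `filled` are assumptions introduced for illustration; the check simply verifies that every known value of a user implies the same starting value before anything is filled.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "user": list("AAAAABBBBCCCCC"),
    "sessions": [28, nan, nan, nan, 32, nan, nan, nan, 12, nan, 15, nan, 17, nan],
})

pos = df.groupby("user").cumcount()  # 0-based row position within each user
offset = df["sessions"] - pos        # implied starting value; NaN where sessions is NaN

# Hypothetical sanity check: all known values of a user should imply one start.
# nunique() ignores NaN, so only the known rows are compared.
consistent = bool(offset.groupby(df["user"]).nunique().le(1).all())

# 'first' picks the first non-NaN offset in each group and broadcasts it.
start = offset.groupby(df["user"]).transform("first")

# Casting to int assumes every user has at least one known value.
filled = (start + pos).astype("int64")
```

If `consistent` comes out False, some user has known values that do not differ by exactly their row distance, and filling from the first known value would silently overwrite the others.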


