
Create a new column in a dataframe and add 1 to the previous row of that column

I am looking to derive a new column from an existing column in my dataframe, adding 1 to the previous row's value to keep a kind of running total:

df['Touch_No'] = np.where((df.Time_btween_steps.isnull()) | (df.Time_btween_steps > 30), 1, df.First_touch.shift().add(1))

I basically want to check whether the column value is null. If it is, set the row to "First Activity" (that is, reset the counter to 1); if not, add 1 to the "previous activity". This gives me a running total of the number of outreach touches we are making to specific people.

Expected outcome:

Time Between Steps | Touch_No
null               | 1
0                  | 2
5.4                | 3
6.7                | 4
2                  | 5
null               | 1
1                  | 2

Edited according to your clarification:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Time_btween_steps': [None, 0, 5.4, 6.7, 2, None, 1], 'Touch_No': [50, 1, 2, 3, 4, 35, 1]})
# Parentheses are required: | binds more tightly than >
mask = df['Time_btween_steps'].isna() | (df['Time_btween_steps'] > 30)
# Use .loc rather than chained indexing so the assignment hits df itself
df.loc[~mask, 'Touch_No'] += 1
df.loc[mask, 'Touch_No'] = 1

Returns:

   Time_btween_steps  Touch_No
0                NaN         1
1                0.0         2
2                5.4         3
3                6.7         4
4                2.0         5
5                NaN         1
6                1.0         2

In my opinion a solution like this is much more readable. We increment by 1 where the condition is not met, and we set the rows where the condition is true to 1. You can combine these into a single line if you wish.
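
As a rough sketch of the single-line variant mentioned above, the two assignments can be folded into one np.where call. The sample data mirrors the question; the 30 threshold comes from the question's own attempt, and this is only one possible way to write it:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time_btween_steps': [None, 0, 5.4, 6.7, 2, None, 1],
    'Touch_No': [50, 1, 2, 3, 4, 35, 1],
})

mask = df['Time_btween_steps'].isna() | (df['Time_btween_steps'] > 30)
# Reset to 1 where the condition holds, otherwise increment the existing value.
df['Touch_No'] = np.where(mask, 1, df['Touch_No'] + 1)
```

Note that this only increments values already stored in Touch_No; it does not build the counter from scratch.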

Old answer for posterity.

Here is a simple solution using pandas' apply functionality, which takes a function.

import pandas as pd

df = pd.DataFrame(data=[1,2,3,4,None,5,0],columns=['test'])
df.test.apply(lambda x: 0 if pd.isna(x) else x+1)

Which returns:

0    2.0
1    3.0
2    4.0
3    5.0
4    0.0
5    6.0
6    1.0

Here I wrote the function in place, but if you have more complicated logic (such as resetting when the number is some other value), you can write a custom function and pass it in instead of the lambda. This is not the only way to do it, but if your data frame isn't huge (hundreds of thousands of rows), it should be performant. apply returns a copy; if you want to overwrite the column instead, assign the result back by prepending

df['test'] = to the last line.
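
A minimal sketch of the custom-function variant described above, with the result assigned back to the column. The helper name next_value is made up for illustration and is not part of pandas:

```python
import pandas as pd

def next_value(x):
    """Hypothetical helper: return 0 for missing values, otherwise x + 1."""
    if pd.isna(x):
        return 0
    return x + 1

df = pd.DataFrame(data=[1, 2, 3, 4, None, 5, 0], columns=['test'])
# Assign back to overwrite the column instead of only producing a copy.
df['test'] = df['test'].apply(next_value)
```

Any extra reset rules would go inside next_value, which keeps the apply call itself unchanged.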

If you want the output to be ints, you can also do:

df['test'].astype(int), but be careful: a column that still contains None/NaN cannot be converted to a plain int and will raise an error.
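
To illustrate the caveat about missing values, here is a small sketch of two ways to get integers from a column that contains None. The Series s and the fill value 0 are assumptions chosen for the example, not part of the original answer:

```python
import pandas as pd

s = pd.Series([1, 2, None, 5], name='test')  # stored as float64 with NaN

# Option 1: replace missing values first, then cast to a plain int dtype.
as_int = s.fillna(0).astype(int)

# Option 2: keep the gaps by using pandas' nullable integer dtype.
as_nullable = s.astype('Int64')
```

Option 2 preserves the distinction between "missing" and 0, which may matter if 0 is a meaningful value in your data.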

Answer using this. A combination of cumsum(), groupby(), and cumcount():

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[None, 0, 5.4, 6.7, 2, None, 1], columns=['Time_btween_steps'])
# True on the rows where the counter restarts (null, or a gap above 30)
reset = df.Time_btween_steps.isnull() | (df.Time_btween_steps > 30)
df['Touch_No'] = df.groupby(reset.cumsum()).cumcount() + 1
df.head(10)
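
To make the cumsum / groupby / cumcount combination easier to follow, here is a step-by-step sketch on the question's sample data. The intermediate names reset, group_id, and touch_no are illustrative, and the 30 threshold is taken from the question's own attempt:

```python
import pandas as pd

df = pd.DataFrame({'Time_btween_steps': [None, 0, 5.4, 6.7, 2, None, 1]})

# Step 1: flag the rows where the counter should restart.
reset = df['Time_btween_steps'].isna() | (df['Time_btween_steps'] > 30)
# reset    -> [True, False, False, False, False, True, False]

# Step 2: a cumulative sum of the flags gives every run its own group id.
group_id = reset.cumsum()
# group_id -> [1, 1, 1, 1, 1, 2, 2]

# Step 3: cumcount numbers rows within each group from 0, so add 1.
touch_no = df.groupby(group_id).cumcount() + 1
# touch_no -> [1, 2, 3, 4, 5, 1, 2]

df['Touch_No'] = touch_no
```

This stays vectorized, so it should scale better than a row-by-row apply on large frames.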

Using np.where, index values with ffill for partitioning, and a simple rank:

import numpy as np
import pandas as pd

sodf = pd.DataFrame({'time_bw_steps': [None, 0, 5.4, 6.7, 2, None, 1]})
# Mark each null row with its own index value; other rows stay NaN for now
sodf['touch_partition'] = np.where(sodf.time_bw_steps.isna(), sodf.index, np.nan)
# Forward-fill so every row carries the index of the most recent null row
sodf['touch_partition'] = sodf['touch_partition'].ffill()
# Rank rows within each partition in order of appearance: 1, 2, 3, ...
sodf['touch_no'] = sodf.groupby('touch_partition')['touch_partition'].rank(method='first', ascending=True).astype(int)
sodf.drop(columns=['touch_partition'], inplace=True)
sodf


