Create a new column in a dataframe and add 1 to the previous row of that column
I am looking to derive a new column from an existing column in my dataframe, adding 1 to the previous row's value to keep a kind of running total.
df['Touch_No'] = np.where((df.Time_btween_steps.isnull()) | (df.Time_btween_steps > 30), 1, df.First_touch.shift().add(1))
I basically want to check whether the column value is null. If it is, that row is a "First Activity" and the counter resets to 1; if not, add 1 to the count of the "previous activity". This gives me a running total of the number of outreach touches we make to specific people.
Expected outcome:
Time Between Steps | Touch_No
null | 1
0 | 2
5.4 | 3
6.7 | 4
2 | 5
null | 1
1 | 2
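import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.array(([None, 0, 5.4, 6.7, 2, None, 1],[50,1,2,3,4,35,1])).T, columns=['Time_btween_steps', 'Touch_No'])
# Parenthesize the comparison: | binds more tightly than >
mask = pd.isna(df['Time_btween_steps']) | (df['Time_btween_steps'] > 30)
# Use .loc rather than chained indexing so the assignment is applied to df itself
df.loc[~mask, 'Touch_No'] += 1
df.loc[mask, 'Touch_No'] = 1
Returns:
  Time_btween_steps Touch_No
0 None 1
1 0 2
2 5.4 3
3 6.7 4
4 2 5
5 None 1
6 1 2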
In my opinion a solution like this is much more readable. We increment by 1 where the condition is not met, and we set the rows where the condition is true to 1. You can combine these into a single line if you wish.
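The two assignments described above can be folded into a single np.where call. This is only a minimal sketch: it assumes the same sample values, and, like the example it follows, it assumes Touch_No already holds the count to be incremented. Building the frame from a dict of floats is my own choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Time_btween_steps": [np.nan, 0, 5.4, 6.7, 2, np.nan, 1],
    "Touch_No": [50, 1, 2, 3, 4, 35, 1],
})

# Reset rows: the gap is missing or longer than 30.
mask = df["Time_btween_steps"].isna() | (df["Time_btween_steps"] > 30)

# Where the mask holds, write 1; elsewhere, increment the existing value.
df["Touch_No"] = np.where(mask, 1, df["Touch_No"] + 1)
print(df["Touch_No"].tolist())
```

Note that np.where evaluates both branches for every row and then picks one per row, so the increment is computed even for rows that end up being reset.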
Here is a simple solution using pandas' apply functionality, which takes a function.
import pandas as pd
df = pd.DataFrame(data=[1,2,3,4,None,5,0],columns=['test'])
df.test.apply(lambda x: 0 if pd.isna(x) else x+1)
Which returns:
0 2.0
1 3.0
2 4.0
3 5.0
4 0.0
5 6.0
6 1.0
Here I wrote the function inline, but if you have more complicated logic, such as resetting when the number is some other value, you can write a custom function and pass it in instead of the lambda. This is not the only way to do it, but unless your data frame is huge (hundreds of thousands of rows), it should be performant. If you don't want a copy but want to overwrite the column, simply assign the result back by prepending
df['test'] =
to the last line.
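As a rough illustration of passing a named function instead of the lambda, and of assigning the result back to the column, something like the following should work. The helper name bump_or_reset is hypothetical and chosen only for this sketch; its logic is the same as the lambda's.

```python
import pandas as pd


def bump_or_reset(value):
    """Hypothetical helper: return 0 for a missing value, otherwise value + 1."""
    if pd.isna(value):
        return 0
    return value + 1


df = pd.DataFrame(data=[1, 2, 3, 4, None, 5, 0], columns=["test"])

# Assign back to the column to overwrite it rather than produce a copy.
df["test"] = df["test"].apply(bump_or_reset)
print(df["test"].tolist())
```

A named function is easier to extend with extra branches (for example a threshold check) and can be unit tested separately from the data frame.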
If you want the output to be ints, you can also do:
df['test'].astype(int)
but be careful about converting None/Null to int.
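To make the caution about missing values concrete, here is a small sketch of two ways around it. It assumes the column still contains a missing value, and the variable names nullable and filled are mine, used only for illustration.

```python
import pandas as pd

df = pd.DataFrame(data=[1, 2, 3, 4, None, 5, 0], columns=["test"])

# A plain int column cannot hold a missing value, so this conversion raises.
try:
    df["test"].astype(int)
except (ValueError, TypeError) as exc:
    print("astype(int) failed:", type(exc).__name__)

# Option 1: pandas' nullable integer dtype keeps the gap as <NA>.
nullable = df["test"].astype("Int64")

# Option 2: fill the gap with a chosen value first, then convert.
filled = df["test"].fillna(0).astype(int)

print(nullable.tolist())
print(filled.tolist())
```

Which option fits depends on whether a missing value should stay visibly missing or be treated as a real 0.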
Answer using a combination of cumsum(), groupby(), and cumcount():
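import numpy as np
import pandas as pd
df = pd.DataFrame(data=[None, 0, 5.4, 6.7, 2, None, 1], columns=['Time_btween_steps'])
# Flag reset rows with 0 (missing gap or gap over 30) and all other rows with 1
df['Touch_No'] = np.where(df.Time_btween_steps.isnull() | (df.Time_btween_steps > 30), 0, 1)
# cumsum() over the reset flags labels each run; cumcount() is 0-based, so add 1
df['consec'] = df['Touch_No'].groupby((df['Touch_No']==0).cumsum()).cumcount() + 1
df.head(10)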
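Since the question mentions tracking outreach for specific people, the same cumsum/groupby/cumcount idea can probably be extended to restart the count per person as well. The sketch below is an assumption-laden illustration: the person_id column and the sample values are hypothetical and do not come from the original post.

```python
import numpy as np
import pandas as pd

# "person_id" and these sample values are hypothetical, for illustration only.
df = pd.DataFrame({
    "person_id": ["a", "a", "a", "b", "b", "b", "b"],
    "Time_btween_steps": [np.nan, 0, 5.4, np.nan, 2, 45, 1],
})

# A row starts a new sequence when the gap is missing or longer than 30.
is_reset = df["Time_btween_steps"].isna() | (df["Time_btween_steps"] > 30)

# Running sum of resets within each person labels that person's sequences.
sequence_id = is_reset.astype(int).groupby(df["person_id"]).cumsum()

# Number the rows inside each (person, sequence) pair; cumcount() starts at 0.
df["Touch_No"] = df.groupby(["person_id", sequence_id]).cumcount() + 1
print(df["Touch_No"].tolist())
```

Grouping on person_id as well means a new person always starts again at 1, even if their first row happens not to be null.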
Using np.where, index values with ffill for partitioning, and a simple rank:
import numpy as np
import pandas as pd
sodf = pd.DataFrame({'time_bw_steps': [None, 0, 5.4, 6.7, 2, None, 1]})
# Mark each reset row with its own index value; other rows stay NaN for now
sodf['touch_partition'] = np.where(sodf.time_bw_steps.isna(), sodf.index, np.nan)
# Forward-fill so every row carries the index of its most recent reset
sodf['touch_partition'] = sodf['touch_partition'].ffill()
# Values within a partition are all equal, so method='first' ranks them in row order
sodf['touch_no'] = sodf.groupby('touch_partition')['touch_partition'].rank(method='first', ascending=True)
sodf.drop(columns=['touch_partition'], inplace=True)
sodf