Pandas duplicating row if condition met, and assigning value
I have a Pandas dataframe like the one below, where column A is a series of string values and column B keeps a running count of the number of times the value in column A differs from the value in the previous row.
A B
1 1
1 1
1b 2
1b 2
1b 2
1 3
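For reference, the sample frame can be rebuilt roughly as follows. This is only a sketch: it assumes column A holds strings and that B is derived from A by counting changes (the names `df`, `A` and `B` simply mirror the table above).

```python
import pandas as pd

# Column A from the sample table, stored as strings
df = pd.DataFrame({"A": ["1", "1", "1b", "1b", "1b", "1"]})

# A row starts a new run when A differs from the previous row.
# The first row is compared with a missing value, so it counts as a change
# and the running count starts at 1.
df["B"] = df["A"].ne(df["A"].shift()).cumsum()

print(df)
```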
Every time the value in column A changes, I would like to duplicate the preceding row and assign the copy the incremented value of column B. For example, with the input dataframe above, the output would look like:
A B
1 1
1 1
1 2
1b 2
1b 2
1b 2
1b 3
1 3
Any thoughts on how to do this efficiently?
Filter the last row of each run of equal values in B, shift B up by one within that filtered frame and assign it back, drop the last row (it has no following value to take), then join the result to the original with concat and sort by index:
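# Last row of each run of equal B, with B replaced by the next run's value
df1 = (df[df['B'].ne(df['B'].shift(-1))]
         .assign(B=lambda x: x['B'].shift(-1))
         .iloc[:-1]
         .astype({'B': int}))
# A stable sort keeps each copy directly after the row it duplicates
df = pd.concat([df, df1]).sort_index(kind='mergesort', ignore_index=True)
print(df)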
A B
0 1 1
1 1 1
2 1 2
3 1b 2
4 1b 2
5 1b 2
6 1b 3
7 1 3
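As an alternative, here is a sketch that is not part of the answer above. It repeats the boundary rows in place with `Index.repeat` instead of concatenating and sorting. It assumes the frame has a unique index and that B always rises by exactly 1 between runs, as described in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["1", "1", "1b", "1b", "1b", "1"],
                   "B": [1, 1, 2, 2, 2, 3]})

# True on the last row of each run of equal B values
is_last = df["B"].ne(df["B"].shift(-1))
# The final row has no following run, so it is not duplicated
is_last.iloc[-1] = False

# Repeat boundary rows twice and every other row once
reps = np.where(is_last, 2, 1)
out = df.loc[df.index.repeat(reps)].copy()

# The second copy of a repeated row carries a duplicated index label;
# bump its B by 1 (assumption: consecutive runs differ by exactly 1)
out["B"] += out.index.duplicated().astype(int)
out = out.reset_index(drop=True)

print(out)
```

Because the rows are repeated where they sit, no sort is needed and the result does not depend on how equal index labels are ordered.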