Python - assign unique identifier for repeated rows
Using the repeat function I repeat a row, but I want to be able to differentiate between the copies with a unique identifier such as 'Instance'.
Currently the code is as follows:
import pandas as pd
import numpy as np
table = pd.DataFrame(
{'ID':['a','b'], 'freq': [2, 3], '2007' : [1000, 1500], '2008': [0,2000], '2009': [2000,3000]})
Output:
  ID  freq  2007  2008  2009
0  a     2  1000     0  2000
1  b     3  1500  2000  3000
I then replicate the row when freq is greater than 2:
rep = [val-1 if val>2 else 1 for val in table.freq]
table.loc[np.repeat(table.index.values, rep)]
Output:
  ID  freq  2007  2008  2009
0  a     2  1000     0  2000
1  b     3  1500  2000  3000
1  b     3  1500  2000  3000
Desired output:
  ID  Instance  freq  2007  2008  2009
0  a         1     2  1000     0  2000
1  b         1     3  1500  2000  3000
1  b         2     3  1500  2000  3000
Any suggestions on an efficient approach to take?
You can try this after grouping by all columns:
import pandas as pd
import numpy as np
table = pd.DataFrame(
{
"ID": ["a", "b"],
"freq": [2, 3],
"2007": [1000, 1500],
"2008": [0, 2000],
"2009": [2000, 3000],
}
)
rep = [val - 1 if val > 2 else 1 for val in table.freq]
res = table.loc[np.repeat(table.index.values, rep)]
res["Instance"] = res.groupby([*res], sort=False).ngroup().add(1)
print(res)
Output:
  ID  freq  2007  2008  2009  Instance
0  a     2  1000     0  2000         1
1  b     3  1500  2000  3000         2
1  b     3  1500  2000  3000         2
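Note that ngroup numbers each distinct group of rows, so both copies of row b get the same value (2) rather than the 1 and 2 shown in the desired output. If the goal is a counter per copy, one possible alternative is cumcount, which numbers rows within each group. The sketch below is not part of the answer above; it assumes the copies can be grouped by their original index label, and it uses Index.repeat and DataFrame.insert as my own choices to build the frame and place the new column after ID.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(
    {
        "ID": ["a", "b"],
        "freq": [2, 3],
        "2007": [1000, 1500],
        "2008": [0, 2000],
        "2009": [2000, 3000],
    }
)

# Same repeat rule as in the question: freq - 1 copies when freq > 2, else 1.
rep = np.where(table["freq"] > 2, table["freq"] - 1, 1)

# Repeat the index labels and take an explicit copy before adding a column.
res = table.loc[table.index.repeat(rep)].copy()

# cumcount numbers the rows within each original index label from 0,
# so adding 1 gives a 1-based counter per copy.
instance = res.groupby(level=0).cumcount() + 1

# Assumption: the new column should sit right after ID, as in the desired output.
res.insert(1, "Instance", instance.to_numpy())
print(res)
```

With the sample data this should give Instance values 1, 1, 2 for the rows indexed 0, 1, 1, which matches the desired output. Grouping by the index level rather than by all columns also keeps the counter correct if two different source rows happen to hold identical values.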