Python - assign unique identifier for repeated rows

Using NumPy's repeat function, I repeat a row, but I want to be able to tell the copies apart with a unique identifier such as 'Instance'.

Currently the code is as follows:

import pandas as pd
import numpy as np
table = pd.DataFrame(
    {'ID':['a','b'], 'freq': [2, 3], '2007' : [1000, 1500], '2008': [0,2000], '2009': [2000,3000]})

Output:

   ID  freq 2007    2008    2009
0   a   2   1000    0       2000
1   b   3   1500    2000    3000

I then replicate a row when its freq is greater than 2 (such a row then appears freq - 1 times):

rep = [val-1 if val>2 else 1 for val in table.freq]
table.loc[np.repeat(table.index.values, rep)]
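A minimal sketch of the same replication step in vectorized form, assuming the sample table above: np.where stands in for the list comprehension and Index.repeat stands in for np.repeat on the index values. Both forms should select the same rows.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
table = pd.DataFrame(
    {"ID": ["a", "b"], "freq": [2, 3],
     "2007": [1000, 1500], "2008": [0, 2000], "2009": [2000, 3000]}
)

# Vectorized form of the list comprehension:
# freq - 1 copies when freq > 2, otherwise a single copy.
rep = np.where(table["freq"] > 2, table["freq"] - 1, 1)

# Index.repeat repeats each index label rep[i] times; .loc then
# selects those rows, keeping the duplicated labels.
res = table.loc[table.index.repeat(rep)]
print(res)
```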

Output:

   ID  freq 2007    2008    2009
0   a   2   1000    0       2000
1   b   3   1500    2000    3000
1   b   3   1500    2000    3000

Desired output:

   ID  Instance  freq   2007    2008    2009
0   a      1       2    1000    0       2000
1   b      1       3    1500    2000    3000
1   b      2       3    1500    2000    3000 

Any suggestions on an efficient approach?
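One groupby-free sketch, assuming the repeat counts rep are computed as in the question: the copies of each original row sit in one contiguous block, so a row's instance number is its position minus the start of its block, plus 1. This is plain NumPy arithmetic, offered as an illustration rather than a benchmarked recommendation.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(
    {"ID": ["a", "b"], "freq": [2, 3],
     "2007": [1000, 1500], "2008": [0, 2000], "2009": [2000, 3000]}
)

rep = np.asarray([val - 1 if val > 2 else 1 for val in table.freq])
res = table.loc[np.repeat(table.index.values, rep)].copy()

# Position (0-based) at which each original row's block of copies starts.
starts = np.cumsum(rep) - rep
# Row position minus the start of its block gives 0, 1, ... within
# each block; add 1 so the numbering starts at 1.
instance = np.arange(len(res)) - np.repeat(starts, rep) + 1

# Put Instance right after ID, as in the desired output.
res.insert(1, "Instance", instance)
print(res)
```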

After repeating the rows, you can try grouping by all columns:

import pandas as pd
import numpy as np
 
table = pd.DataFrame(
    {
        "ID": ["a", "b"],
        "freq": [2, 3],
        "2007": [1000, 1500],
        "2008": [0, 2000],
        "2009": [2000, 3000],
    }
)
 
 
rep = [val - 1 if val > 2 else 1 for val in table.freq]
res = table.loc[np.repeat(table.index.values, rep)]
 
 
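# [*res] unpacks the column labels, so each group is a set of identical rows.
# cumcount() numbers the rows within each group (0, 1, ...); add(1) starts at 1.
# (ngroup() would instead number the groups themselves, giving 1, 2, 2.)
res["Instance"] = res.groupby([*res], sort=False).cumcount().add(1)
 
print(res)
  ID  freq  2007  2008  2009  Instance
0  a     2  1000     0  2000         1
1  b     3  1500  2000  3000         1
1  b     3  1500  2000  3000         2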
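An alternative sketch that groups by the index instead of by every column: each copy keeps its original index label, so groupby(level=0).cumcount() numbers the copies of each original row. This assumes the original table has a unique index; unlike grouping by all columns, it would not merge two originally identical rows into one group. DataFrame.insert places Instance as the second column, matching the desired output.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame(
    {"ID": ["a", "b"], "freq": [2, 3],
     "2007": [1000, 1500], "2008": [0, 2000], "2009": [2000, 3000]}
)

rep = [val - 1 if val > 2 else 1 for val in table.freq]
# .copy() makes res an independent frame before we add a column to it.
res = table.loc[np.repeat(table.index.values, rep)].copy()

# Every copy keeps its original index label, so grouping by the index
# (level=0) numbers the copies of each original row 0, 1, ...
# .to_numpy() inserts the values positionally, with no index alignment.
instance = res.groupby(level=0).cumcount().add(1).to_numpy()
res.insert(1, "Instance", instance)
print(res)
```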
