Collapse identical rows / columns in pandas DataFrame to intervals
In pandas, I have a DataFrame where several columns define a configuration. I want to identify the rows with identical configurations:
import pandas as pd

df = pd.DataFrame({'id': [101, 102, 103, 104, 105, 106, 107],
                   'color': ['blue', 'blue', 'blue', 'red', 'blue', 'red', 'blue'],
                   'location': ['there', 'here', 'there', 'here', 'here', 'there', 'here']})
df
Out[12]:
    id color location
0  101  blue    there
1  102  blue     here
2  103  blue    there
3  104   red     here
4  105  blue     here
5  106   red    there
6  107  blue     here
I want to create a column that labels each row by its combination of color and location, like this:
    id color location group
0  101  blue    there     A
1  102  blue     here     B
2  103  blue    there     A
3  104   red     here     C
4  105  blue     here     B
5  106   red    there     D
6  107  blue     here     B
This looks like a job for groupby().ngroup():
df['group'] = df.groupby(['color','location'], sort=False).ngroup()
Output:
    id color location  group
0  101  blue    there      0
1  102  blue     here      1
2  103  blue    there      0
3  104   red     here      2
4  105  blue     here      1
5  106   red    there      3
6  107  blue     here      1
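ngroup() returns integers, while the question asks for letters. One possible way to get the letter labels is to map each integer onto the uppercase alphabet. This is only a sketch and not part of the original answer: it assumes there are at most 26 distinct configurations, and the intermediate name `codes` is made up for illustration.

```python
import string

import pandas as pd

df = pd.DataFrame({'id': [101, 102, 103, 104, 105, 106, 107],
                   'color': ['blue', 'blue', 'blue', 'red', 'blue', 'red', 'blue'],
                   'location': ['there', 'here', 'there', 'here', 'here', 'there', 'here']})

# Number each (color, location) pair in order of first appearance.
codes = df.groupby(['color', 'location'], sort=False).ngroup()

# Assumption: no more than 26 groups, so each code indexes a single letter.
df['group'] = codes.map(lambda i: string.ascii_uppercase[i])

print(df)
```

With more than 26 groups the lookup raises an IndexError, so a different labelling scheme would be needed in that case.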
I would use factorize:
df[['color','location']].agg(','.join, axis=1).factorize()[0]
Out[12]: array([0, 1, 0, 2, 1, 3, 1], dtype=int64)
# To store the result as a new column:
#df['group']=df[['color','location']].agg(','.join, axis=1).factorize()[0]
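One caveat with joining the columns into a single string: if the values themselves can contain the separator, two different configurations may collapse into the same key. The sketch below uses a made-up two-row frame (not data from the question) to show the collision, and compares it with groupby().ngroup(), which keys on the column values directly.

```python
import pandas as pd

# Hypothetical data chosen so that the comma-joined strings collide.
tricky = pd.DataFrame({'color': ['a,b', 'a'],
                       'location': ['c', 'b,c']})

# Both rows join to the string 'a,b,c', so factorize sees one group.
joined = tricky[['color', 'location']].agg(','.join, axis=1).factorize()[0]

# Grouping on the columns themselves keeps the two rows apart.
grouped = tricky.groupby(['color', 'location'], sort=False).ngroup()

print(joined.tolist())
print(grouped.tolist())
```

For the data in the question this does not matter, since none of the values contain a comma.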