GraphLab SFrames - How to retain all columns in groupby

I have an SFrame on which I want to do a groupby with some operator on a column. But this returns an SFrame with only the key columns I specified. How can I do the operation on some columns but still keep all the columns?

As far as I understand your question, you want to operate on some columns without losing their initial state. The example below illustrates this. Suppose we have a movie dataset as an SFrame sf:

movieId    userId    actors    rating
102        10        A,B,C     5
204        8         B,C,D     4
333        3         K,L,M     3
204        11        P,Q,R     1
423        3         K,B,C     4
533        31        K,A,C     2
633        3         P,L,A     3
...

In the above SFrame, user 3 gave multiple ratings, so you may want to work with each user's mean rating:

 import graphlab.aggregate as agg
 rating_stats = sf.groupby(key_columns='userId', operations={'mean_rating': agg.MEAN('rating')})
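groupby() drops the other columns because it collapses many rows into one row per key, so only the key and the aggregated values have a single value per group. The following library-free Python sketch mimics what the groupby above computes, with plain dicts standing in for SFrame rows (the rows mirror the example table; group_mean is a hypothetical helper, not a GraphLab function):

```python
from collections import defaultdict

# Rows mirroring the example table; plain dicts stand in for an SFrame.
rows = [
    {"movieId": 102, "userId": 10, "actors": "A,B,C", "rating": 5},
    {"movieId": 204, "userId": 8, "actors": "B,C,D", "rating": 4},
    {"movieId": 333, "userId": 3, "actors": "K,L,M", "rating": 3},
    {"movieId": 204, "userId": 11, "actors": "P,Q,R", "rating": 1},
    {"movieId": 423, "userId": 3, "actors": "K,B,C", "rating": 4},
    {"movieId": 533, "userId": 31, "actors": "K,A,C", "rating": 2},
    {"movieId": 633, "userId": 3, "actors": "P,L,A", "rating": 3},
]


def group_mean(rows, key, value):
    """Hypothetical helper: one output row per distinct key, holding only
    the key and the mean of `value` (like groupby() with agg.MEAN)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return [{key: k, "mean_" + value: sum(v) / len(v)} for k, v in groups.items()]


rating_stats = group_mean(rows, "userId", "rating")
for stat in rating_stats:
    print(stat)
```

Every output row carries only userId and mean_rating; movieId and actors are gone, which is exactly the behaviour described in the question.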

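Then you may want to add the computed column back to the SFrame without affecting the columns already present, i.e. you retain the original SFrame. Because rating_stats has one row per user rather than one row per rating, its column cannot simply be assigned to sf; join it on the key instead:

 sf = sf.join(rating_stats, on='userId', how='left')

Every original column of sf is kept, and each row gains a mean_rating column holding that user's mean rating.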
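Attaching per-user results back onto every rating row is a left join on userId. This library-free sketch shows what that join does, with plain dicts standing in for SFrame rows; group_mean and left_join are hypothetical helpers, not GraphLab functions:

```python
from collections import defaultdict

# Rows mirroring the example table; plain dicts stand in for an SFrame.
rows = [
    {"movieId": 102, "userId": 10, "actors": "A,B,C", "rating": 5},
    {"movieId": 204, "userId": 8, "actors": "B,C,D", "rating": 4},
    {"movieId": 333, "userId": 3, "actors": "K,L,M", "rating": 3},
    {"movieId": 204, "userId": 11, "actors": "P,Q,R", "rating": 1},
    {"movieId": 423, "userId": 3, "actors": "K,B,C", "rating": 4},
    {"movieId": 533, "userId": 31, "actors": "K,A,C", "rating": 2},
    {"movieId": 633, "userId": 3, "actors": "P,L,A", "rating": 3},
]


def group_mean(rows, key, value):
    """Hypothetical helper: one row per key with the mean of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return [{key: k, "mean_" + value: sum(v) / len(v)} for k, v in groups.items()]


def left_join(left, right, key):
    """Hypothetical helper: keep every row of `left` and copy in the
    non-key columns of the matching `right` row (one row per key)."""
    lookup = {r[key]: r for r in right}
    joined = []
    for row in left:
        merged = dict(row)  # copy, so the original rows stay untouched
        for col, val in lookup.get(row[key], {}).items():
            if col != key:
                merged[col] = val
        joined.append(merged)
    return joined


rating_stats = group_mean(rows, "userId", "rating")
sf_joined = left_join(rows, rating_stats, "userId")
```

All seven rows survive with their original columns, and the three rows of user 3 each receive the same mean_rating.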

So, to answer your question: when you use the groupby() method, it is better to keep its result in a separate SFrame dedicated to that operation. From there you can use it on its own, add its columns back to the original SFrame, carry the remaining columns along by aggregating them in the groupby() call itself, or join the grouped SFrame with the original one. Repeatedly modifying the original SFrame in order to operate on it is not good practice.
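Carrying the remaining columns through the groupby itself means aggregating them as well, for example collecting every movieId a user rated into a list. GraphLab's aggregate module has collector aggregators for this, such as agg.CONCAT (check its documentation for the exact behaviour). The plain-Python sketch below only illustrates the idea; group_mean_and_collect is a hypothetical helper and the rows mirror the example table:

```python
from collections import defaultdict

# Rows mirroring the example table; plain dicts stand in for an SFrame.
rows = [
    {"movieId": 102, "userId": 10, "actors": "A,B,C", "rating": 5},
    {"movieId": 204, "userId": 8, "actors": "B,C,D", "rating": 4},
    {"movieId": 333, "userId": 3, "actors": "K,L,M", "rating": 3},
    {"movieId": 204, "userId": 11, "actors": "P,Q,R", "rating": 1},
    {"movieId": 423, "userId": 3, "actors": "K,B,C", "rating": 4},
    {"movieId": 533, "userId": 31, "actors": "K,A,C", "rating": 2},
    {"movieId": 633, "userId": 3, "actors": "P,L,A", "rating": 3},
]


def group_mean_and_collect(rows, key, value, collect):
    """Hypothetical helper: one row per key with the mean of `value`
    plus a list of every `collect` value seen in that group."""
    values = defaultdict(list)
    collected = defaultdict(list)
    for row in rows:
        values[row[key]].append(row[value])
        collected[row[key]].append(row[collect])
    return [
        {key: k, "mean_" + value: sum(v) / len(v), collect + "s": collected[k]}
        for k, v in values.items()
    ]


user_stats = group_mean_and_collect(rows, "userId", "rating", "movieId")
```

The grouped result still has one row per user, but the movieIds are no longer lost; they are kept in a list column.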

Also note that when a column holds multiple entities per row, like actors in this SFrame, calling the stack() method before groupby() makes such data easy to work with. stack() operates on list, dict, or array columns, so if actors is stored as a comma-separated string it needs to be split into a list first. I hope that helps.
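Stacking turns one row with several actors into one row per actor, after which a per-actor groupby is straightforward. This plain-Python sketch splits the comma-separated string and then averages ratings per actor; stack_rows and group_mean are hypothetical helpers that only resemble SFrame.stack() and groupby() in spirit:

```python
from collections import defaultdict

# Rows mirroring the example table; plain dicts stand in for an SFrame.
rows = [
    {"movieId": 102, "userId": 10, "actors": "A,B,C", "rating": 5},
    {"movieId": 204, "userId": 8, "actors": "B,C,D", "rating": 4},
    {"movieId": 333, "userId": 3, "actors": "K,L,M", "rating": 3},
    {"movieId": 204, "userId": 11, "actors": "P,Q,R", "rating": 1},
    {"movieId": 423, "userId": 3, "actors": "K,B,C", "rating": 4},
    {"movieId": 533, "userId": 31, "actors": "K,A,C", "rating": 2},
    {"movieId": 633, "userId": 3, "actors": "P,L,A", "rating": 3},
]


def stack_rows(rows, column, new_column, sep=","):
    """Hypothetical helper: emit one row per item of a separator-joined
    string column, copying all the other columns unchanged."""
    stacked = []
    for row in rows:
        for item in row[column].split(sep):
            new_row = {k: v for k, v in row.items() if k != column}
            new_row[new_column] = item
            stacked.append(new_row)
    return stacked


def group_mean(rows, key, value):
    """Hypothetical helper: one row per key with the mean of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return [{key: k, "mean_" + value: sum(v) / len(v)} for k, v in groups.items()]


by_actor = stack_rows(rows, "actors", "actor")
actor_stats = {r["actor"]: r["mean_rating"] for r in group_mean(by_actor, "actor", "rating")}
```

Actor C, for instance, appears in four movies rated 5, 4, 4 and 2, so its mean rating is 3.75.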
