How to represent each user by a unique row (Python)?

Question

I have data like this:

UserId  Date        Part_of_day  Apps        Category       Frequency  Duration_ToT
1       2020-09-10  evening      Settings    System tool    1          3.436
1       2020-09-11  afternoon    Calendar    Calendar       5          9.965
1       2020-09-11  afternoon    Contacts    Phone_and_SMS  7          2.606
2       2020-09-11  afternoon    Facebook    Social         15         50.799
2       2020-09-11  afternoon    clock       System tool    2          5.223
3       2020-11-18  morning      Contacts    Phone_and_SMS  3          1.726
3       2020-11-18  morning      Google      Productivity   1          4.147
3       2020-11-18  morning      Instagram   Social         1          0.501
.......................................
67      2020-11-18  morning      Truecaller  Communication  1          1.246
67      2020-11-18  night        Instagram   Social         3          58.02

I am trying to reduce the dimensionality of my dataframe to prepare the input for k-means. Is it possible to represent each user by a single row? What do you think about using an embedding? How can I do this, please? I can't find any solution.

Answer 1

This depends on how you want to aggregate the values. Here is a small example of how to do it with groupby and agg.

First I create some sample data.

import pandas as pd
import random

df = pd.DataFrame({
   "id":   [int(i/3) for i in range(20)], 
   "val1": [random.random() for _ in range(20)], 
   "val2": [str(int(random.random()*100)) for _ in range(20)]
})
>>> df.head()
   id      val1 val2
0   0  0.174553   49
1   0  0.724547   95
2   0  0.369883    3
3   1  0.243191   64
4   1  0.575982   16
>>> df.dtypes
id        int64
val1    float64
val2     object
dtype: object

Then we group by the id and aggregate the values according to the functions specified in the dictionary passed to agg. In this example I sum the float values and join the strings with a double-underscore separator. You could also, e.g., pass the list function to store the values in a list.

>>> df.groupby("id").agg({"val1": "sum", "val2": "__".join})
        val1        val2
id
0   1.268984   49__95__3
1   0.856992  64__16__54
2   2.186370  30__59__21
3   1.486925  29__47__77
4   1.523898  19__78__99
5   0.855413  59__74__73
6   0.201787      63__33
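Applied to the columns from the question, the same pattern might look like the sketch below. The DataFrame is a hand-typed excerpt of the first five rows of the question's data, and the choice of aggregations (sum for the numeric columns, list for the app names) is only an assumption about what would be useful; other functions can be swapped in.

```python
import pandas as pd

# Hand-typed excerpt of the question's data (first five rows only).
usage = pd.DataFrame({
    "UserId": [1, 1, 1, 2, 2],
    "Apps": ["Settings", "Calendar", "Contacts", "Facebook", "clock"],
    "Frequency": [1, 5, 7, 15, 2],
    "Duration_ToT": [3.436, 9.965, 2.606, 50.799, 5.223],
})

# One row per user: numeric columns are summed,
# app names are collected into a list (assumed aggregations).
per_user = usage.groupby("UserId").agg({
    "Frequency": "sum",
    "Duration_ToT": "sum",
    "Apps": list,
})

print(per_user)
```

The result is indexed by UserId, so each user occupies exactly one row. Note that a list column is not numeric, so it would still need to be encoded before being passed to k-means.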

EDIT regarding the comment "But how can we make val2 contain the top 5 applications according to the duration of the application?":

The agg method is restricted in the sense that you cannot access other columns while aggregating. To do that you should use the apply method. You pass it a function that processes the whole group and returns a row as a Series object.

In this example I still use the sum for val1, but for val2 I return the val2 of the row with the highest val1. This should make clear how to make the aggregation depend on other columns.

def apply_func(group):
   return pd.Series({
      "id": group["id"].iat[0], 
      "val1": group["val1"].sum(), 
      "val2": group["val2"].iat[group["val1"].argmax()]
   })

>>> df.groupby("id").apply(apply_func)
    id      val1 val2
id
0    0  1.749955   95
1    1  0.344372   65
2    2  2.019035   70
3    3  2.444691   36
4    4  2.573576   92
5    5  1.453769   72
6    6  1.811516   94
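For the "top 5 applications by duration" case from the comment, one possible variant is sketched below. It does not use apply; instead it sorts by duration and keeps the first rows of each group with head, which is a different technique from the one above. The DataFrame is a hand-typed excerpt of the question's data, and TOP_N is a hypothetical parameter set to 2 only because the excerpt is so small (the comment asks for 5).

```python
import pandas as pd

# Hand-typed excerpt of the question's data (first five rows only).
usage = pd.DataFrame({
    "UserId": [1, 1, 1, 2, 2],
    "Apps": ["Settings", "Calendar", "Contacts", "Facebook", "clock"],
    "Duration_ToT": [3.436, 9.965, 2.606, 50.799, 5.223],
})

TOP_N = 2  # hypothetical; the comment asks for 5

# Total duration per (user, app), in case an app appears on several rows.
totals = usage.groupby(["UserId", "Apps"], as_index=False)["Duration_ToT"].sum()

# Sort each user's apps by descending duration, keep the first TOP_N,
# then collect the app names into one list per user.
top_apps = (
    totals.sort_values(["UserId", "Duration_ToT"], ascending=[True, False])
    .groupby("UserId")
    .head(TOP_N)
    .groupby("UserId")["Apps"]
    .agg(list)
)

print(top_apps)
```

The result is a Series indexed by UserId whose values are lists of app names ordered from longest to shortest total duration.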

Answer 1 (2021-01-28 08:49:36)
