How to represent each user by a unique row (Python)?

I have data like this:

UserId  Date        Part_of_day  Apps        Category       Frequency  Duration_ToT
1       2020-09-10  evening      Settings    System tool    1          3.436
1       2020-09-11  afternoon    Calendar    Calendar       5          9.965
1       2020-09-11  afternoon    Contacts    Phone_and_SMS  7          2.606
2       2020-09-11  afternoon    Facebook    Social         15         50.799
2       2020-09-11  afternoon    clock       System tool    2          5.223
3       2020-11-18  morning      Contacts    Phone_and_SMS  3          1.726
3       2020-11-18  morning      Google      Productivity   1          4.147
3       2020-11-18  morning      Instagram   Social         1          0.501
...
67      2020-11-18  morning      Truecaller  Communication  1          1.246
67      2020-11-18  night        Instagram   Social         3          58.02

I'm trying to reduce the dimensionality of my DataFrame to prepare the input for k-means. Is it possible to represent each user by a single row? What do you think about embeddings? How can I do this? I can't find any solution.
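For the k-means use case, one hedged option is to turn the long per-app rows into a purely numeric matrix with one row per user. The sketch below assumes that total Duration_ToT per Category is a meaningful feature (that choice is an assumption; the question does not fix it) and uses pandas `pivot_table` on a few rows copied from the sample above. `features` is just an illustrative name.

```python
import pandas as pd

# A few rows copied from the sample data in the question
df = pd.DataFrame({
    "UserId": [1, 1, 1, 2, 2, 3, 3, 3],
    "Apps": ["Settings", "Calendar", "Contacts", "Facebook", "clock",
             "Contacts", "Google", "Instagram"],
    "Category": ["System tool", "Calendar", "Phone_and_SMS", "Social",
                 "System tool", "Phone_and_SMS", "Productivity", "Social"],
    "Frequency": [1, 5, 7, 15, 2, 3, 1, 1],
    "Duration_ToT": [3.436, 9.965, 2.606, 50.799, 5.223, 1.726, 4.147, 0.501],
})

# One row per user, one column per category, cells = total duration.
# Categories a user never used become 0 instead of NaN.
features = df.pivot_table(
    index="UserId",
    columns="Category",
    values="Duration_ToT",
    aggfunc="sum",
    fill_value=0,
)
print(features)
```

k-means is distance-based, so you would usually standardize these columns (and possibly add Frequency the same way) before clustering.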

This depends on how you want to aggregate the values. Here is a small example of how to do it with groupby and agg.

First, I create some sample data.

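import pandas as pd
import random

# 20 rows: ids 0..6 with three rows each (id 6 gets only two),
# a random float column and a column of numeric strings
df = pd.DataFrame({
    "id":   [i // 3 for i in range(20)],
    "val1": [random.random() for _ in range(20)],
    "val2": [str(int(random.random() * 100)) for _ in range(20)],
})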
>>> df.head()
   id      val1 val2
0   0  0.174553   49
1   0  0.724547   95
2   0  0.369883    3
3   1  0.243191   64
4   1  0.575982   16
>>> df.dtypes
id        int64
val1    float64
val2     object
dtype: object

Then we group by the id and aggregate the values according to the functions you specify in the dictionary you pass to agg. In this example I sum up the float values and join the strings with a double-underscore separator. You could, e.g., also pass the built-in list function to store the values in a list.

>>> df.groupby("id").agg({"val1": "sum", "val2": "__".join})
        val1        val2
id
0   1.268984   49__95__3
1   0.856992  64__16__54
2   2.186370  30__59__21
3   1.486925  29__47__77
4   1.523898  19__78__99
5   0.855413  59__74__73
6   0.201787      63__33
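A sketch of the list variant mentioned above, plus pandas' named aggregation (available since pandas 0.25), which lets you pick output column names and apply several functions to one column. It uses a small hand-made frame so the results are predictable; `as_lists` and `named` are illustrative names.

```python
import pandas as pd

# Small deterministic frame in the same shape as the answer's df
df = pd.DataFrame({
    "id":   [0, 0, 0, 1, 1],
    "val1": [0.5, 0.25, 0.25, 1.0, 2.0],
    "val2": ["49", "95", "3", "64", "16"],
})

# Dict form, as above, but collecting val2 into a Python list
as_lists = df.groupby("id").agg({"val1": "sum", "val2": list})

# Named aggregation: output_name=(input_column, function)
named = df.groupby("id").agg(
    val1_sum=("val1", "sum"),
    val1_max=("val1", "max"),
    val2_all=("val2", list),
)
print(as_lists)
print(named)
```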

EDIT regarding the comment "But how can we make val2 contain the top 5 applications according to the duration of the application?":

The agg method is limited in that each aggregation function only sees a single column, so you cannot access other columns while aggregating. To do that, use the apply method instead. You pass it a function that receives the whole group as a DataFrame and returns one row as a Series object.

In this example I still use the sum for val1, but for val2 I return the val2 of the row with the highest val1. This shows how to make the aggregation of one column depend on another.

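def apply_func(group):
    # group is the sub-DataFrame holding all rows of one id
    return pd.Series({
        "id": group["id"].iat[0],
        # total of val1 over the group
        "val1": group["val1"].sum(),
        # val2 of the row with the largest val1
        # (argmax returns a position, hence .iat)
        "val2": group["val2"].iat[group["val1"].argmax()],
    })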

>>> df.groupby("id").apply(apply_func)
    id      val1 val2
id
0    0  1.749955   95
1    1  0.344372   65
2    2  2.019035   70
3    3  2.444691   36
4    4  2.573576   92
5    5  1.453769   72
6    6  1.811516   94
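For the question in the comment itself, here is a sketch on the question's columns: sum Duration_ToT per user and app, sort by duration in descending order, and keep the first n rows per user. Sorting before grouping means plain agg already sees each user's apps in duration order, so apply is not needed for this particular case. `top_apps` is a hypothetical helper name; n defaults to 5 as in the comment and is set to 2 in the call only so the small sample shows the cut-off.

```python
import pandas as pd

# Rows taken from the sample data in the question
df = pd.DataFrame({
    "UserId": [1, 1, 1, 2, 2, 3, 3, 3, 67, 67],
    "Apps": ["Settings", "Calendar", "Contacts", "Facebook", "clock",
             "Contacts", "Google", "Instagram", "Truecaller", "Instagram"],
    "Duration_ToT": [3.436, 9.965, 2.606, 50.799, 5.223,
                     1.726, 4.147, 0.501, 1.246, 58.02],
})


def top_apps(usage, n=5):
    """Return one string per user with their n longest-used apps."""
    # Total duration per (user, app): an app may occur on several
    # days or parts of the day, so sum those rows first
    totals = usage.groupby(["UserId", "Apps"], as_index=False)["Duration_ToT"].sum()
    # Longest duration first within each user
    totals = totals.sort_values(["UserId", "Duration_ToT"], ascending=[True, False])
    # head(n) keeps the first n rows of every group, i.e. the top n apps;
    # agg then joins the names in that order
    return totals.groupby("UserId").head(n).groupby("UserId")["Apps"].agg("__".join)


print(top_apps(df, n=2))
```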
