Get unique values in column B for each unique record in column A using python/pandas

I'm looking for a quick and efficient way to handle the following task.

I need to create a separate column that holds, for each DeviceID, an array of the unique SessionStartDate values for that DeviceID.

For example:

  • 8846620190473426378 | [2018-08-01, 2018-08-02]
  • 381156181455864495 | [2018-08-01]

Though user 8846620190473426378 may have had 30 sessions on 2018-08-01 and 25 sessions on 2018-08-02, I'm only interested in the unique dates on which these sessions occurred.

Currently, I'm using this approach:

df_main['active_days'] = [
sorted(
    list(
        set(
            sessions['SessionStartDate'].loc[sessions['DeviceID'] == x['DeviceID']]
            )
        )
    )  
for _, x in df_main.iterrows()
]

Here df_main is another DataFrame, containing aggregated data grouped by DeviceID.

The approach seems to be very slow (Wall time: 1h 45min 58s), and I believe there's a better solution for the task.

Thanks in advance!

I believe you need sort_values with SeriesGroupBy.unique:

import pandas as pd

rng = pd.date_range('2017-04-03', periods=4)
sessions = pd.DataFrame({'SessionStartDate': rng, 'DeviceID':[1,2,1,2]})
print (sessions)
  SessionStartDate  DeviceID
0       2017-04-03         1
1       2017-04-04         2
2       2017-04-05         1
3       2017-04-06         2

#if necessary convert datetimes to dates
sessions['SessionStartDate'] = sessions['SessionStartDate'].dt.date
out = (sessions.sort_values('SessionStartDate')
               .groupby('DeviceID')['SessionStartDate']
               .unique())
print (out)
DeviceID
1    [2017-04-03, 2017-04-05]
2    [2017-04-04, 2017-04-06]
Name: SessionStartDate, dtype: object
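
The sample above has no repeated dates within a device, so here is a minimal sketch with repeated sessions to show that unique collapses them into one entry per day. The timestamps and the small integer device IDs are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample: several sessions per device, some on the same day
sessions = pd.DataFrame({
    'DeviceID': [1, 1, 1, 2, 2],
    'SessionStartDate': pd.to_datetime([
        '2018-08-02 09:00', '2018-08-01 10:30', '2018-08-01 18:45',
        '2018-08-01 08:15', '2018-08-01 23:59',
    ]),
})

# Keep only the calendar date, so sessions on the same day compare equal
sessions['SessionStartDate'] = sessions['SessionStartDate'].dt.date

# Sort first: unique() keeps order of appearance, so each array ends up sorted
out = (sessions.sort_values('SessionStartDate')
               .groupby('DeviceID')['SessionStartDate']
               .unique())

# Convert to plain strings only to make the result easy to inspect
active_days = {device: [str(d) for d in days] for device, days in out.items()}
print(active_days)
```

Sorting the whole frame once before grouping is what replaces the per-row sorted(list(set(...))) call from the question.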

Another solution is to remove duplicates with drop_duplicates and then use groupby with conversion to lists:

#if necessary convert datetimes to dates (skip if the column was already converted above)
sessions['SessionStartDate'] = sessions['SessionStartDate'].dt.date
out = (sessions.sort_values('SessionStartDate')
               .drop_duplicates(['DeviceID', 'SessionStartDate'])
               .groupby('DeviceID')['SessionStartDate']
               .apply(list))
print (out)
DeviceID
1    [2017-04-03, 2017-04-05]
2    [2017-04-04, 2017-04-06]
Name: SessionStartDate, dtype: object
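
Either result is a Series indexed by DeviceID, so it still has to be attached to df_main. One possible way, sketched here under the assumption that df_main has a DeviceID column as in the question's loop, is Series.map. The df_main frame below and its sessions_total column are hypothetical stand-ins.

```python
import pandas as pd

sessions = pd.DataFrame({
    'DeviceID': [1, 2, 1, 2, 1],
    'SessionStartDate': pd.to_datetime(
        ['2017-04-03', '2017-04-04', '2017-04-05', '2017-04-06', '2017-04-03']),
})

# Hypothetical stand-in for the aggregated frame from the question
df_main = pd.DataFrame({'DeviceID': [2, 1, 3], 'sessions_total': [2, 3, 0]})

sessions['SessionStartDate'] = sessions['SessionStartDate'].dt.date
out = (sessions.sort_values('SessionStartDate')
               .drop_duplicates(['DeviceID', 'SessionStartDate'])
               .groupby('DeviceID')['SessionStartDate']
               .apply(list))

# Look up each DeviceID in the index of `out`;
# devices without any sessions (DeviceID 3 here) get NaN
df_main['active_days'] = df_main['DeviceID'].map(out)
print(df_main)
```

This does a single lookup by DeviceID instead of filtering sessions once per row of df_main, which is the part that makes the original loop slow.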
