How to make more efficient code? (pandas dataframe)

This is my code: [image]

I want to find rows with the same key (user_id) and merge their info column.

For example, if the input is:

user_id  info
test.    [1,2,3]
test.    [2,3,4]

==>>

user_id  info
test.    [1,2,3,4]

My code is too slow, so I want to know how to make it more efficient.

Thanks for reading!

You can try this approach using sets:

import pandas as pd

# Sample data: each row holds a user_id and a list of values.
data = {"user_id": ["user1", "user2", "user1"], "info": [[1, 2, 3], [10, 20, 30], [2, 3, 4]]}
df = pd.DataFrame(data=data)

# Keep only the rows for one user and union their lists into a single set.
aggregate_info = set()
user_df = df[df["user_id"] == "user1"]
for item in user_df["info"]:
    aggregate_info |= set(item)

# Sort so the printed order is deterministic.
print(f"user1: {sorted(aggregate_info)}")

This will give you:

user1: [1, 2, 3, 4]
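
The snippet above only handles user1. If you need every user at once, the same set idea can be applied in a single pass over the rows. This is a minimal sketch rather than a tuned solution; the names `merged` and `result` are only illustrative, and the sample data is the same as above.

```python
import pandas as pd

data = {"user_id": ["user1", "user2", "user1"], "info": [[1, 2, 3], [10, 20, 30], [2, 3, 4]]}
df = pd.DataFrame(data)

# Build one set per user while walking the rows exactly once.
merged = {}
for user_id, values in zip(df["user_id"], df["info"]):
    merged.setdefault(user_id, set()).update(values)

# Turn the sets back into a DataFrame, with each set stored as a sorted list.
result = pd.DataFrame(
    {"user_id": list(merged), "info": [sorted(v) for v in merged.values()]}
)
print(result)
```

Each list is visited once and the sets take care of duplicates, so this avoids filtering the DataFrame separately for every user.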

A simple one-liner should do the trick:

import pandas as pd


df = pd.DataFrame.from_dict({
    'user_id': ['test.', 'test.', 'user1', 'user1', 'user1'], 
    'info': [[1, 2, 3], [2, 3, 4], [1], [1, 2, 3, 4, 5], [1, 5, 7]]
})
print(df)
#   user_id             info
# 0   test.        [1, 2, 3]
# 1   test.        [2, 3, 4]
# 2   user1              [1]
# 3   user1  [1, 2, 3, 4, 5]
# 4   user1        [1, 5, 7]

distinct_df = df.groupby('user_id').sum()['info'].apply(lambda x: sorted(set(x))).reset_index()
print(distinct_df)
#   user_id                info
# 0   test.        [1, 2, 3, 4]
# 1   user1  [1, 2, 3, 4, 5, 7]
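
One caveat: summing lists builds a new, longer list at every step, which may get slow when a user has many rows. A possible alternative, assuming pandas 0.25 or newer (where `DataFrame.explode` is available), is to put every element on its own row first and deduplicate per group. This is a sketch I have not benchmarked against your data; the names `exploded` and `merged` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": ["test.", "test.", "user1", "user1", "user1"],
    "info": [[1, 2, 3], [2, 3, 4], [1], [1, 2, 3, 4, 5], [1, 5, 7]],
})

# explode() gives each list element its own row, so no lists are concatenated.
exploded = df.explode("info")

# Per user, drop duplicates with a set and store the values as a sorted list.
merged = (
    exploded.groupby("user_id")["info"]
    .agg(lambda values: sorted(set(values)))
    .reset_index()
)
print(merged)
```

The result has one row per user_id, like distinct_df above.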

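You could try:

result = df.groupby('user_id')['info'].apply(lambda x: set(x.sum())).reset_index()

(or list(set(x.sum())) if you still want the info values to be lists)

Note that the column is selected with ['info'] rather than .info, because df.info is a DataFrame method, and the output goes into a new variable because it has one row per user_id instead of one row per original row.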
