How can I make this code more efficient? (pandas DataFrame)
I want to find rows with the same key (user_id) and merge their info column.
For example, given:
user_id  info
test.    [1,2,3]
test.    [2,3,4]
==>>
user_id  info
test.    [1,2,3,4]
My code is too slow, so I would like to know how to make it more efficient.
Thanks for reading!
You can try this approach using sets:
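import pandas as pd

list1 = {"user_id": ["user1", "user2", "user1"], "info": [[1, 2, 3], [10, 20, 30], [2, 3, 4]]}
df = pd.DataFrame(data=list1)

aggregate_info = set()
# Keep only the rows for the user of interest.
df = df[df["user_id"] == "user1"]
info = list(df["info"])
# Union every list into one set to drop duplicates.
for item in info:
    aggregate_info = set(item).union(aggregate_info)
# Sort so the printed order does not depend on set iteration order.
print(f"user1: {sorted(aggregate_info)}")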
This will give you:
user1: [1, 2, 3, 4]
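The snippet above handles a single user. A minimal sketch of the same set-based idea applied to every user in one pass might look like the following. The names `merged` and `result` are illustrative and not from the original answer, and it is worth timing this on your own data before assuming it is faster.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "user_id": ["user1", "user2", "user1"],
    "info": [[1, 2, 3], [10, 20, 30], [2, 3, 4]],
})

# One set per user_id; update() adds items in place.
merged = defaultdict(set)
for user_id, items in zip(df["user_id"], df["info"]):
    merged[user_id].update(items)

# Sort each set so the resulting lists have a stable order.
result = pd.DataFrame({
    "user_id": list(merged),
    "info": [sorted(values) for values in merged.values()],
})
print(result)
```

Users appear in `result` in the order they are first seen in `df`.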
A simple one-liner should do the trick:
import pandas as pd

df = pd.DataFrame.from_dict({
    'user_id': ['test.', 'test.', 'user1', 'user1', 'user1'],
    'info': [[1, 2, 3], [2, 3, 4], [1], [1, 2, 3, 4, 5], [1, 5, 7]]
})
print(df)
#   user_id             info
# 0   test.        [1, 2, 3]
# 1   test.        [2, 3, 4]
# 2   user1              [1]
# 3   user1  [1, 2, 3, 4, 5]
# 4   user1        [1, 5, 7]

distinct_df = df.groupby('user_id').sum()['info'].apply(lambda x: sorted(set(x))).reset_index()
print(distinct_df)
#   user_id                info
# 0   test.        [1, 2, 3, 4]
# 1   user1  [1, 2, 3, 4, 5, 7]
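Summing lists builds a new, longer list at every step, which can get slow when a user has many rows. As a hedged alternative (not the answerer's method), the sketch below flattens the lists with `DataFrame.explode` (available since pandas 0.25) and then deduplicates within each group. Whether it is actually faster depends on the data, so it is worth benchmarking.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": ["test.", "test.", "user1", "user1", "user1"],
    "info": [[1, 2, 3], [2, 3, 4], [1], [1, 2, 3, 4, 5], [1, 5, 7]],
})

distinct_df = (
    df.explode("info")                        # one row per list element
      .groupby("user_id")["info"]
      .apply(lambda s: sorted(s.unique()))    # distinct values per user, sorted
      .reset_index()
)
print(distinct_df)
```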
You could try:
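# Use ['info'] rather than .info: df.info is a DataFrame method, not the column.
# The result is indexed by user_id, so store it in a new variable.
merged = df.groupby('user_id')['info'].apply(lambda x: set(x.sum()))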
(or list(set(x.sum())) if you still want the info values to be lists)
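If the `x.sum()` concatenation turns out to be the bottleneck, one possible variation is to feed the lists to `set` through `itertools.chain.from_iterable`, which avoids building the intermediate concatenated list. This is only a sketch under that assumption, and the variable name `merged` is illustrative.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "user_id": ["test.", "test."],
    "info": [[1, 2, 3], [2, 3, 4]],
})

merged = (
    df.groupby("user_id")["info"]
      # chain.from_iterable walks each list lazily; set() drops duplicates.
      .apply(lambda lists: sorted(set(chain.from_iterable(lists))))
      .reset_index()
)
print(merged)
```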