Pandas Dataframe: fastest way of updating multiple rows based on a list of dictionaries

What is the fastest way to replace values in multiple rows of a pandas DataFrame based on a list of dictionaries (messages)? Eventually, I want to process real-time streaming data (from a websocket connection) at a peak rate of ~1000 messages per second. Below is a simplified, artificial example that illustrates the task. At the moment, the performance is not fast enough.

import random

import numpy as np
import pandas as pd

names = ["Jim", "Bryan", "Roy", "Axel", "Billy", "Charlie", "Peter", "Marie", "Paul"]

# 10,000 rows with consecutive ids, a random name and a random age
data = {'id':  np.arange(10000, 20000, 1).tolist(),
        'name': [random.choice(names) for _ in range(10000)],
        'age': np.random.randint(18, 67, size=10000),
        }

df = pd.DataFrame(data, columns=['id', 'name', 'age'])
df

The DataFrame looks something like this:

      id     name  age
0  10000  Charlie   45
1  10001    Peter   36
2  10002    Billy   34
3  10003     Axel   62
4  10004     Paul   20
...  ...      ...  ...

This is an example list of dictionaries used to update the DataFrame:

message_list = [
     {
    "id": 10002,
    "name": "Peter",
    "age": 65,
    },
     {
    "id": 10036,
    "name": "John",
    "age": 26,
    },
     {
    "id": 10789,
    "name": "Lisa",
    "age": 41,
    },
]

This is my current approach to updating the name:

def update_df(df, message):
    df.loc[df.id == message["id"], 'name'] = message['name']

%%time
[update_df(df, message) for message in message_list]

CPU times: user 5.79 ms, sys: 494 µs, total: 6.29 ms
Wall time: 5.95 ms
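The `%%time` magic only works inside IPython or Jupyter. As a rough sketch, the same measurement could be reproduced in a plain Python script with `time.perf_counter`; the variable `elapsed_ms` and the explicit `for` loop are illustrative choices, not part of the original post, and the measured time will vary by machine.

```python
import random
import time

import numpy as np
import pandas as pd

names = ["Jim", "Bryan", "Roy", "Axel", "Billy", "Charlie", "Peter", "Marie", "Paul"]
df = pd.DataFrame({
    "id": np.arange(10000, 20000, 1).tolist(),
    "name": [random.choice(names) for _ in range(10000)],
    "age": np.random.randint(18, 67, size=10000),
})

message_list = [
    {"id": 10002, "name": "Peter", "age": 65},
    {"id": 10036, "name": "John", "age": 26},
    {"id": 10789, "name": "Lisa", "age": 41},
]

def update_df(df, message):
    # One boolean scan over the whole 'id' column per message
    df.loc[df["id"] == message["id"], "name"] = message["name"]

start = time.perf_counter()
for message in message_list:
    update_df(df, message)
elapsed_ms = (time.perf_counter() - start) * 1000  # hypothetical name

print(f"{len(message_list)} updates took {elapsed_ms:.2f} ms")
```

Each call compares every `id` in the frame against a single value, so the cost grows with both the number of rows and the number of messages.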

Is there a faster way of doing this kind of DataFrame update? Maybe by replacing the list comprehension with a more vectorized approach?

You could build a dictionary that maps each id to its new name, apply .map to the id column, and then use fillna to keep the existing value for every id that is not in the dictionary.

In [260]: mapper = {d['id']:d['name'] for d in message_list}

In [261]: df['name'] = df['id'].map(mapper).fillna(df['name'])

In [262]: df
Out[262]: 
      id     name  age
0  10000  Charlie   45
1  10001    Peter   36
2  10002    Peter   34
3  10003     Axel   62
4  10004     Paul   20

In [269]: mapper
Out[269]: {10002: 'Peter', 10036: 'John', 10789: 'Lisa'}
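The session above only updates `name`. A minimal sketch of how the same `.map` plus `fillna` idea might be extended to several columns follows; the helper `update_columns` is a hypothetical name, and casting back to the original dtype is an assumption added here because `.map` yields NaN (and therefore floats) for ids that receive no update. It also assumes no message carries a missing value, since `fillna` would then keep the old one.

```python
import pandas as pd

def update_columns(df, message_list, columns):
    """Hypothetical helper: overwrite `columns` for rows whose id is in message_list."""
    for col in columns:
        mapper = {m["id"]: m[col] for m in message_list}
        original_dtype = df[col].dtype
        # ids without an update map to NaN; fillna restores the existing value
        updated = df["id"].map(mapper).fillna(df[col])
        # cast back so an integer column such as 'age' does not stay float
        df[col] = updated.astype(original_dtype)
    return df

df = pd.DataFrame({
    "id": [10000, 10001, 10002],
    "name": ["Charlie", "Peter", "Billy"],
    "age": [45, 36, 34],
})
message_list = [
    {"id": 10002, "name": "Peter", "age": 65},
    {"id": 10036, "name": "John", "age": 26},  # id not in df, so it is ignored
]

update_columns(df, message_list, ["name", "age"])
print(df)
```

Messages whose id does not occur in the frame are silently dropped, which matches the behaviour of the single-column version.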
