
Python: most efficient way to apply dictionary mappings in a pandas DataFrame

I have a dictionary of dictionaries, each of which contains the mapping for one column of my dataframe.

My goal is to find the most efficient way to apply these mappings to a dataframe with 1 row and 300 columns.

My dataframe is randomly sampled from range(mapping_size), and my dictionaries map values from range(mapping_size) to random.randint(mapping_size+1, mapping_size*2).

I can see from the answer provided by jpp that map is possibly the most efficient way to go, but I am looking for something even faster than map. Can you think of anything? I am happy for the input to be a data structure other than a pandas dataframe.

Here is the code for setting up the question, followed by the results using map and replace:

# import packages
import random
import pandas as pd
import numpy as np
import timeit

# specify parameters
ncol = 300  # number of columns
nrow = 1  # number of rows
mapping_size = 10  # length of each dictionary

# create a dictionary of dictionaries for mapping
mapping_dict = {}

random.seed(123)

for idx1 in range(ncol):
    # create empty dictionary
    mapping_dict['col_' + str(idx1)] = {}
    for inx2 in range(mapping_size):
        # add mapping_size entries: keys 1..mapping_size map to random.randint(mapping_size+1, mapping_size*2)
        mapping_dict['col_' + str(idx1)][inx2+1] = random.randint(mapping_size+1,mapping_size*2)
        
# Create a dataframe with values sampled from range(mapping_size)
d={}

np.random.seed(123)  # np.random.choice below uses NumPy's generator, not the random module

for idx1 in range(ncol):
    d['col_' + str(idx1)] = np.random.choice(range(mapping_size),nrow)
    
df = pd.DataFrame(data=d)
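
One detail of the setup is worth spelling out: the dictionary keys in the code above run from 1 to mapping_size, while the sampled values run from 0 to mapping_size - 1, so a value of 0 has no mapping. The sketch below uses toy values (not the data above) to show how Series.map and Series.replace treat an unmapped value, which is presumably why one of the map timings adds fillna.

```python
import pandas as pd

# Toy column and mapping (illustrative values only); 0 has no key in the mapping
s = pd.Series([0, 1, 2])
mapping = {1: 11, 2: 12}

mapped = s.map(mapping)            # unmapped values become NaN
filled = s.map(mapping).fillna(s)  # fall back to the original value
replaced = s.replace(mapping)      # replace leaves unmapped values untouched

print(mapped.tolist())
print(filled.tolist())
print(replaced.tolist())
```

Note that map turns the column into floats once a NaN appears, whereas replace keeps the integer dtype here.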

Results using map and replace:

%%timeit -n 20
df.replace(mapping_dict) #296 ms

%%timeit -n 20
for key in mapping_dict.keys():
    df[key] = df[key].map(mapping_dict[key]).fillna(df[key]) #221ms

%%timeit -n 20
for key in mapping_dict.keys():
    df[key] = df[key].map(mapping_dict[key]) #181ms
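
Since the question allows an input structure other than a DataFrame, one option for a single row is to skip pandas entirely and look each value up in its column's dictionary. This is only a sketch and is not taken from the original post: holding the row as a plain dict named row is an assumption, and no timing was measured for it, so compare it against the numbers above with timeit on your own data.

```python
import random

random.seed(123)
ncol, mapping_size = 300, 10

# Same shape of mapping as in the question: one dict per column, keys 1..mapping_size
mapping_dict = {
    f"col_{i}": {
        k + 1: random.randint(mapping_size + 1, mapping_size * 2)
        for k in range(mapping_size)
    }
    for i in range(ncol)
}

# Assumption: the single row is held as a plain dict {column: value}
row = {f"col_{i}": random.randrange(mapping_size) for i in range(ncol)}

# dict.get(val, val) keeps the original value when no mapping exists
mapped_row = {col: mapping_dict[col].get(val, val) for col, val in row.items()}
```

If a DataFrame is needed afterwards, pd.DataFrame([mapped_row]) builds a one-row frame from the result.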

Just use pandas, without a Python for loop.

# runtime ~ 1s (1000 rows)

# create a mapping Series with a MultiIndex
df_dict = pd.DataFrame(mapping_dict)
obj_dict = df_dict.T.stack()

# obj_dict

    # col_0    1     10
    #          2     14
    #          3     11
    # Length: 3000, dtype: int64

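# convert df to an index into the mapping Series; df can have more than 1 row
obj_idx = pd.Series(df.values.flatten())
obj_idx.index = pd.Index(df.columns.to_list() * df.shape[0])
idx = obj_idx.to_frame().reset_index().set_index(['index', 0]).index
# reindex yields NaN for (column, value) pairs with no mapping;
# obj_dict[idx] may raise KeyError on missing labels in recent pandas versions
result = obj_dict.reindex(idx)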

# handle null values
cond = result.isnull()
result[cond] = pd.Series(result[cond].index.values).str[1].values

# transform to the result DataFrame
df_result = pd.DataFrame(result.values.reshape(df.shape))
df_result.columns = df.columns

df_result
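
To make the steps above easier to follow, here is the same idea on a toy 2x2 frame. This is a minimal sketch with illustrative values; it uses MultiIndex.from_arrays and np.where as shortcuts that do not appear in the answer's code, and the names lookup, pairs and flat are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative values only): two columns, two rows
mapping_dict = {"a": {1: 10, 2: 20}, "b": {1: 30, 2: 40}}
df = pd.DataFrame({"a": [1, 0], "b": [2, 1]})

# One Series holding every mapping, indexed by (column, key)
lookup = pd.DataFrame(mapping_dict).T.stack()

# One (column, value) pair per cell, in row-major order
flat = df.to_numpy().ravel()
pairs = pd.MultiIndex.from_arrays([list(df.columns) * len(df), flat])

# reindex returns NaN for pairs that have no mapping (here: column a, value 0)
result = lookup.reindex(pairs)

# Fall back to the original value where no mapping exists
values = np.where(result.isna().to_numpy(), flat, result.to_numpy())

df_result = pd.DataFrame(values.reshape(df.shape), columns=df.columns)
print(df_result)
```

The whole frame is mapped with a single reindex call, so the cost of the lookup does not grow with one pandas call per column.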
