Change column value in pandas depending on some dictionary data

I have some data in a dictionary and a pandas DataFrame like this:

import pandas as pd

s_dict = {('A1','B1'):100, ('A3','B3'):300}

df = pd.DataFrame(data={'A': ['A1', 'A2'], 'B': ['B1', 'B2'], 
                        'C': ['C1', 'C2'], 'count':[1,2]})

#    A   B   C  count
#0  A1  B1  C1      1
#1  A2  B2  C2      2

I want to replace a row's value in the count column of df with the value from s_dict whenever that row's (A, B) pair is a key in s_dict. So I want the following output:

#    A   B   C  count
#0  A1  B1  C1    100
#1  A2  B2  C2      2

You can use:

df['count'] = df[['A', 'B']].apply(tuple, axis=1).map(s_dict).fillna(df['count'])
  • apply(tuple, axis=1) builds a tuple from each row's A and B values, e.g. ('A1', 'B1').
  • map(s_dict) looks each tuple up in s_dict; tuples that are not keys become NaN.
  • fillna(df['count']) replaces those NaN values with the row's original count.
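The three steps can be run one at a time to see the intermediate results. This is a minimal sketch on the question's own data; the names keys and looked_up are just illustrative. Because map produces NaN for unmatched rows, the resulting count column comes out as float64 (100.0 rather than 100), so the last step casts it back to integers.

```python
import pandas as pd

s_dict = {('A1', 'B1'): 100, ('A3', 'B3'): 300}
df = pd.DataFrame(data={'A': ['A1', 'A2'], 'B': ['B1', 'B2'],
                        'C': ['C1', 'C2'], 'count': [1, 2]})

# Step 1: one tuple per row, e.g. ('A1', 'B1')
keys = df[['A', 'B']].apply(tuple, axis=1)

# Step 2: look each tuple up in s_dict; tuples that are not keys become NaN
looked_up = keys.map(s_dict)

# Step 3: fall back to the existing count wherever the lookup gave NaN
df['count'] = looked_up.fillna(df['count'])

# The NaN in step 2 made the column float, so cast back if integers are needed
df['count'] = df['count'].astype(int)
print(df)
```

The astype(int) step is only safe here because every row ends up with a value after fillna; if any NaN could remain, it would raise.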

Here is one way using zip(), which is generally faster than .apply():

import pandas as pd

s_dict = {('A1','B1'):100, ('A3','B3'):300}
df = pd.DataFrame(data={'A': ['A1', 'A2'], 'B': ['B1', 'B2'], 
                       'C': ['C1', 'C2'], 'count':[1,2]})

# Look up each (A, B) pair in s_dict and drop the rows with no match (NaN)
m = pd.Series(list(zip(df['A'],df['B']))).map(s_dict).dropna()

# Write the matched values back to count, only at the row labels that remain
df.loc[m.index, 'count'] = m
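One caveat with this version: pd.Series(list(zip(...))) gets a default 0..n-1 index, so m.index only lines up with df when df also has a default RangeIndex. A minimal sketch of the same dropna + .loc idea that passes index=df.index so the labels always match; the string index ['r1', 'r2'] is a hypothetical example, and the astype call is an added step that keeps count as an integer column.

```python
import pandas as pd

s_dict = {('A1', 'B1'): 100, ('A3', 'B3'): 300}
# Hypothetical non-default index, to show why the keys reuse df.index
df = pd.DataFrame(data={'A': ['A1', 'A2'], 'B': ['B1', 'B2'],
                        'C': ['C1', 'C2'], 'count': [1, 2]},
                  index=['r1', 'r2'])

# Build the (A, B) keys with zip and give them df's own row labels
keys = pd.Series(list(zip(df['A'], df['B'])), index=df.index)

# Keep only the rows whose key is in s_dict (here just 'r1')
m = keys.map(s_dict).dropna()

# .loc writes by label, so only the matched rows are overwritten;
# casting m back to count's dtype keeps the column integer
df.loc[m.index, 'count'] = m.astype(df['count'].dtype)
print(df)
```

Without index=df.index, m.index would be [0] and df.loc[[0], 'count'] would fail to find that label in this frame.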

Inspired by filling the NaN values with the column's own values, you could also do this (it seems to be the quickest):

df['count'] = pd.Series(list(zip(df['A'],df['B']))).map(s_dict).fillna(df['count'])
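A further variant, not taken from the answers above and not included in their timings: do the lookup with dict.get and a per-row default, so no NaN is ever produced and count keeps its integer dtype. This is a sketch on the question's data; whether it is faster than the map-based versions on a given frame is untested here.

```python
import pandas as pd

s_dict = {('A1', 'B1'): 100, ('A3', 'B3'): 300}
df = pd.DataFrame(data={'A': ['A1', 'A2'], 'B': ['B1', 'B2'],
                        'C': ['C1', 'C2'], 'count': [1, 2]})

# For each row, look up the (A, B) key and fall back to that row's own count.
# Because no NaN is produced, the column stays integer.
df['count'] = [s_dict.get((a, b), c)
               for a, b, c in zip(df['A'], df['B'], df['count'])]
print(df)
```

The list is assigned by position, so this works regardless of what index df has.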

Timings

df['count'] = pd.Series(list(zip(df['A'],df['B']))).map(s_dict).fillna(df['count'])
# 1.52 ms ± 85.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df['count'] = df[['A', 'B']].apply(tuple, axis=1).map(s_dict).fillna(df['count'])
# 1.88 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# dropna + .loc approach (the two-line version above)
# 1.93 ms ± 55.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
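The timings above do not state the frame size, and on very small frames fixed per-call overhead tends to dominate. A rough sketch for re-running the comparison with timeit on a larger frame; the 10,000-row frame, the function names, and number=5 are all hypothetical choices, and results will vary by machine and pandas version. It first checks that the three approaches agree on the values (their dtypes differ: float vs int).

```python
import timeit

import numpy as np
import pandas as pd

s_dict = {('A1', 'B1'): 100, ('A3', 'B3'): 300}

# Hypothetical 10,000-row frame: even rows match ('A1', 'B1'), odd rows do not
n = 10_000
even = np.arange(n) % 2 == 0
base = pd.DataFrame({'A': np.where(even, 'A1', 'A2'),
                     'B': np.where(even, 'B1', 'B2'),
                     'C': 'C1',
                     'count': np.arange(n)})


def with_apply(df):
    return df[['A', 'B']].apply(tuple, axis=1).map(s_dict).fillna(df['count'])


def with_zip_fillna(df):
    keys = pd.Series(list(zip(df['A'], df['B'])), index=df.index)
    return keys.map(s_dict).fillna(df['count'])


def with_zip_loc(df):
    df = df.copy()  # leave the shared frame untouched between runs
    keys = pd.Series(list(zip(df['A'], df['B'])), index=df.index)
    m = keys.map(s_dict).dropna()
    df.loc[m.index, 'count'] = m.astype(df['count'].dtype)
    return df['count']


funcs = (with_apply, with_zip_fillna, with_zip_loc)
results = [f(base) for f in funcs]

# All three should produce the same values once cast to int
for r in results[1:]:
    assert np.array_equal(r.astype(int).to_numpy(),
                          results[0].astype(int).to_numpy())

for f in funcs:
    t = timeit.timeit(lambda: f(base), number=5)
    print(f'{f.__name__}: {t / 5 * 1e3:.2f} ms per call')
```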
