[英]Fastest way to map a dict on a df (multiple columns)
I have the following data frame:我有以下数据框:
df = pd.DataFrame({0: {0: 'EFG',
1: 'EFG',
2: 'EFG',
3: 'EFG',
4: 'EFG',
5: 'EFG',
6: 'EFG',
7: 'ABC',
8: 'EFG',
9: 'EFG',
10: 'EFG',
11: 'EFG',
12: 'EFG',
13: 'EFG',
14: 'EFG',
15: 'EFG',
16: 'EFG',
17: 'EFG',
18: 'EFG',
19: 'EFG',
20: 'ABC',
21: 'EFG',
22: 'EFG',
23: 'EFG',
24: 'EFG',
25: 'EFG',
26: 'EFG',
27: 'EFG',
28: 'EFG',
29: 'EFG'},
1: {0: 'DS',
1: 'DS',
2: 'DS',
3: 'Q',
4: 'DS',
5: 'DS',
6: 'DS',
7: 'DS',
8: 'DS',
9: 'DS',
10: 'DS',
11: 'DS',
12: 'DS',
13: 'DS',
14: 'DS',
15: 'DS',
16: 'DS',
17: 'DS',
18: 'DS',
19: 'DS',
20: 'DS',
21: 'DS',
22: 'DAS',
23: 'DAS',
24: 'DAS',
25: 'DS',
26: 'DS',
27: 'Q',
28: 'DS',
29: 'DS'},
2: {0: '321',
1: '900',
2: '900',
3: '900',
4: '1000',
5: '1000',
6: '1000',
7: '444',
8: '900',
9: '900',
10: '321',
11: '900',
12: '1000',
13: '900',
14: '321',
15: '321',
16: '1000',
17: '1000',
18: '1000',
19: '1000',
20: '444',
21: '900',
22: '12345',
23: '12345',
24: '321',
25: '321',
26: '12345',
27: '1000',
28: '900',
29: '321'}})
and the following dict:和以下字典:
{('ABC', 'AS', '1000'): 123,
('ABC', 'AS', '444'): 321,
('ABC', 'AS', '231341'): 421,
('ABC', 'AS', '888'): 412,
('ABC', 'AS', '087'): 4215,
('ABC', 'DAS', '1000'): 3415,
('ABC', 'DAS', '444'): 4215,
('ABC', 'DAS', '231341'): 3214,
('ABC', 'DAS', '888'): 321,
('ABC', 'DAS', '087'): 111,
('ABC', 'Q', '1000'): 222,
('ABC', 'Q', '444'): 3214,
('ABC', 'Q', '231341'): 421,
('ABC', 'Q', '888'): 321,
('ABC', 'Q', '087'): 41,
('ABC', 'DS', '1000'): 421,
('ABC', 'DS', '444'): 421,
('ABC', 'DS', '231341'): 321,
('ABC', 'DS', '888'): 41,
('ABC', 'DS', '087'): 41,
('EFG', 'AS', '1000'): 213,
('EFG', 'AS', '900'): 32,
('EFG', 'AS', '12345'): 1,
('EFG', 'AS', '321'): 3,
('EFG', 'DAS', '1000'): 421,
('EFG', 'DAS', '900'): 321,
('EFG', 'DAS', '12345'): 123,
('EFG', 'DAS', '321'): 31,
('EFG', 'Q', '1000'): 41,
('EFG', 'Q', '900'): 51,
('EFG', 'Q', '12345'): 321,
('EFG', 'Q', '321'): 321,
('EFG', 'DS', '1000'): 41,
('EFG', 'DS', '900'): 51,
('EFG', 'DS', '12345'): 321,
('EFG', 'DS', '321'): 1}
This is of course only a sample of my df, and I have multiple dicts like this one.这当然只是我的 df 的一个样本,我有多个像这样的字典。 I am looking for the fastest way to map this dict to this dataframe, based on the 3 columns.我正在寻找 map 这个字典到这个 dataframe 的最快方法,基于 3 列。 I need to run this multiple times during,y analysis, so I am looking for the optimal solution in term of running time.我需要在分析期间多次运行它,所以我正在寻找运行时间方面的最佳解决方案。
what i tried already:我已经尝试过的:
def map_d(a,b,c):
return d1[(a,b,c)]
res = [map_d(*a) for a in tuple(zip(df[0], df[1], df[2]))]
23.1 µs ± 335 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
performance on real data:
170 ms ± 5.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
res = df.apply(lambda x: d1[(x[0],x[1],x[2])],axis=1)
742 µs ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
performance on real data:
7.27 s ± 201 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I am looking for fastest solutions (I can build the dict differently if needed) Thanks我正在寻找最快的解决方案(如果需要,我可以以不同的方式构建字典)谢谢
For improve performance in larger DataFrames is possible use Series
with DataFrame.join
:为了提高更大数据帧的性能,可以使用带有DataFrame.join
的Series
:
df = pd.DataFrame(df)
df1 = df.join(pd.Series(d, name='new'), on=[0,1,2])
print (df1.head(10))
0 1 2 new
0 EFG DS 321 1
1 EFG DS 900 51
2 EFG DS 900 51
3 EFG Q 900 51
4 EFG DS 1000 41
5 EFG DS 1000 41
6 EFG DS 1000 41
7 ABC DS 444 421
8 EFG DS 900 51
9 EFG DS 900 51
Another idea (similar solution like in question):另一个想法(类似问题的解决方案):
res = [d[(a,b,c)] for a,b,c in zip(df[0], df[1], df[2])]
Or:或者:
res = [d[(a,b,c)] for a,b,c in df[[0,1,2]].to_numpy()]
Convert your dataframe as MultiIndex
and extract values from your dict ( Series
):将您的 dataframe 转换为MultiIndex
并从您的 dict ( Series
) 中提取值:
out = pd.Series(d).loc[pd.MultiIndex.from_frame(df)] \
.rename('values').reset_index()
Output: Output:
>>> out
0 1 2 values
0 EFG DS 321 1
1 EFG DS 900 51
2 EFG DS 900 51
3 EFG Q 900 51
4 EFG DS 1000 41
5 EFG DS 1000 41
6 EFG DS 1000 41
7 ABC DS 444 421
8 EFG DS 900 51
9 EFG DS 900 51
10 EFG DS 321 1
11 EFG DS 900 51
12 EFG DS 1000 41
13 EFG DS 900 51
14 EFG DS 321 1
15 EFG DS 321 1
16 EFG DS 1000 41
17 EFG DS 1000 41
18 EFG DS 1000 41
19 EFG DS 1000 41
20 ABC DS 444 421
21 EFG DS 900 51
22 EFG DAS 12345 123
23 EFG DAS 12345 123
24 EFG DAS 321 31
25 EFG DS 321 1
26 EFG DS 12345 321
27 EFG Q 1000 41
28 EFG DS 900 51
29 EFG DS 321 1
You can apply tuple
constructor for each row and then map
the dictionary.您可以为每一行应用tuple
构造函数,然后map
字典。 Note that your df
is not a DataFrame.请注意,您的df
不是 DataFrame。 It needs to be cast into a DataFrame constructor.需要将其转换为 DataFrame 构造函数。
df = pd.DataFrame(df)
df['vals'] = df.apply(tuple, axis=1).map(dct)
Output: Output:
0 1 2 vals
0 EFG DS 321 1
1 EFG DS 900 51
2 EFG DS 900 51
3 EFG Q 900 51
4 EFG DS 1000 41
...
25 EFG DS 321 1
26 EFG DS 12345 321
27 EFG Q 1000 41
28 EFG DS 900 51
29 EFG DS 321 1
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.