Given a list of values in one column of a pandas DataFrame, how do I output the values from another column in the same rows?
The problem is simple. The input is a list of non-container objects (int, str, etc.), and every element of the list appears in one column of a DataFrame. The task is, for each element of the list, to find the value (the scalar itself, not an array) in another column of the same row.
The problem is better demonstrated in code:
from pandas import DataFrame
digits = '0123456789abcdef'
df = DataFrame([(a,b) for a, b in zip(digits, range(16))], columns=['hex', 'dec'])
df
df.loc[df.dec == 12, 'hex']
df.loc[df.dec == 12, 'hex'].values[0]
import random
eight = random.sample(range(16), 8)
eight
fun = lambda x: df.loc[df.dec == x, 'hex'].values[0]
''.join(fun(i) for i in eight)
''.join(map(fun, eight))
As you can see, I can already do this, but I am using a for loop and the performance isn't very impressive. I know pandas and numpy are all about vectorization, so I wonder whether there is a built-in way to do this.
In [1]: from pandas import DataFrame
In [2]: digits = '0123456789abcdef'
In [3]: df = DataFrame([(a,b) for a, b in zip(digits, range(16))], columns=['hex', 'dec'])
In [4]: df
Out[4]:
   hex  dec
0    0    0
1    1    1
2    2    2
3    3    3
4    4    4
5    5    5
6    6    6
7    7    7
8    8    8
9    9    9
10   a   10
11   b   11
12   c   12
13   d   13
14   e   14
15   f   15
In [5]: df.loc[df.dec == 12, 'hex']
Out[5]:
12    c
Name: hex, dtype: object
In [6]: df.loc[df.dec == 12, 'hex'].values[0]
Out[6]: 'c'
In [7]: import random
In [8]: eight = random.sample(range(16), 8)
In [9]: eight
Out[9]: [9, 7, 1, 6, 11, 12, 14, 10]
In [10]: fun = lambda x: df.loc[df.dec == x, 'hex'].values[0]
In [11]: ''.join(fun(i) for i in eight)
Out[11]: '9716bcea'
In [12]: ''.join(map(fun, eight))
Out[12]: '9716bcea'
In [13]: %timeit ''.join(fun(i) for i in eight)
2.34 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [14]: %timeit ''.join(map(fun, eight))
2.34 ms ± 134 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So what is a vectorized way to achieve the same result as the method demonstrated in the code?
A vectorized way would be to construct a Series indexed by dec and select from it with the whole list at once:
series = df.set_index('dec')['hex']
''.join(series[eight])
Output: '9716bcea'
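For anyone who wants to try this outside an IPython session, here is a minimal self-contained sketch of the Series lookup. The name `lookup` and the uniqueness check are my own additions, not part of the answer above. The sketch assumes the values in dec are unique and that every element of the list is present in dec; it uses `.loc` to make the label-based selection explicit, and it cross-checks the result against the loop from the question.

```python
import random

import pandas as pd

digits = '0123456789abcdef'
df = pd.DataFrame({'hex': list(digits), 'dec': range(16)})

# Build the lookup once: 'dec' becomes the index, 'hex' the values.
# Assumption: 'dec' values are unique, so each label maps to one row.
lookup = df.set_index('dec')['hex']
assert lookup.index.is_unique

eight = random.sample(range(16), 8)

# One label-based selection for the whole list; the result follows
# the order of `eight`, not the order of the rows in df.
vectorized = ''.join(lookup.loc[eight])

# The per-element approach from the question, kept only as a cross-check.
looped = ''.join(df.loc[df.dec == x, 'hex'].values[0] for x in eight)

print(vectorized == looped)
```

If dec contained duplicates, the selection would return every matching row for a label rather than a single value, so the uniqueness check is worth keeping when adapting this to other data.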