pandas dataframe check if arrays in row are same as sample
I have a pandas DataFrame that looks like this (thousands of rows):
pk_id ses_id data zero_val goal
5410 0 [4, 6, 7, 43, 4, 4, 4, 4, 4, 4, 2, 2, ... 9541 1
...
where each array in the data column has shape (64,). Now, I have another sample ndarray, say sample_array, also of shape (64,), and I would like to test it against all the arrays in the "data" column and return the corresponding pk_id. To this end I do:
self.pd_data.index[self.pd_data['data'] == sample_array].tolist()
but I keep getting:
pandas/core/ops/array_ops.py", line 234, in comparison_op
raise ValueError("Lengths must match to compare")
ValueError: Lengths must match to compare
I don't really understand what could be wrong. I have checked the lengths of the arrays and they are indeed (64,), as I expect.
Any pointers would be much appreciated.
When you compare series == sample_array, you actually unfold self.pd_data['data'], which is a Series, and compare it element-wise to sample_array. That is, roughly:
[x == y for x,y in zip(series, sample_array)]
Now, your sample_array has length 64, while series has one entry per row of the frame (thousands in your case), so its length is not 64. Pandas refuses to compare the two when their lengths differ and raises the error above.
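To see the mismatch in isolation, here is a minimal sketch that reproduces the error on toy data. The sizes (6 rows, arrays of length 4) and the variable names are made up for illustration and are not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 6 rows, each holding an array of length 4.
series = pd.Series([np.arange(4) + i for i in range(6)])
sample_array = np.arange(4)

try:
    # pandas compares the Series (length 6) against the array (length 4)
    # position by position, so the lengths have to agree.
    series == sample_array
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The length of each stored array never enters the check; only the number of rows in the Series versus the length of sample_array does.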
One way around that is to use np.vstack:
(np.vstack(self.pd_data['data'])==sample_array).all(1)
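As a rough end-to-end sketch of the np.vstack approach: stack the per-row arrays into one 2-D array, compare every row against sample_array by broadcasting, and use the resulting boolean mask to pull out pk_id. The frame below is toy data; every pk_id value except 5410 is invented, and I am assuming pk_id is an ordinary column rather than the index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 5 rows, each "data" entry is an array of shape (64,).
arrays = [np.arange(64) + i for i in range(5)]
df = pd.DataFrame({"pk_id": [5410, 5411, 5412, 5413, 5414], "data": arrays})

# The sample we are looking for; here it is a copy of the third row's array.
sample_array = arrays[2].copy()

# Stack the arrays into a single 2-D array of shape (n_rows, 64).
stacked = np.vstack(df["data"].tolist())

# Broadcasting compares each row with sample_array; all(axis=1) keeps only
# the rows in which every one of the 64 values matches.
mask = (stacked == sample_array).all(axis=1)

matching_pk_ids = df.loc[mask, "pk_id"].tolist()
print(matching_pk_ids)
```

If pk_id is the index of your frame instead, df.index[mask].tolist() should give the same result, which is closer to the expression in the question. Note that this relies on all arrays in the column having the same length; np.vstack will fail otherwise.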