
pandas DataFrame: check if arrays in a row are the same as a sample

I have a pandas DataFrame which looks like so (thousands of rows):

 pk_id   ses_id       data                                            zero_val    goal
 5410         0     [4, 6, 7, 43, 4, 4, 4, 4, 4, 4, 2, 2, ...        9541       1
 ...

where each data array has shape (64,). Now, I have another sample NumPy ndarray, say sample_array, also of shape (64,), and I would like to test it against all the arrays in the "data" column and return the corresponding pk_id. To this end I do:

self.pd_data.index[self.pd_data['data'] == sample_array].tolist()

but I keep getting:

pandas/core/ops/array_ops.py", line 234, in comparison_op
raise ValueError("Lengths must match to compare")
ValueError: Lengths must match to compare

I don't really understand what could be wrong. I have checked the lengths and they are indeed (64,), as I expect.

Any pointers would be much appreciated.

When you compare

series == sample_array

you actually iterate over self.pd_data['data'], which is a Series with one entry per row, and compare it element-wise to sample_array. That is, roughly

[x == y for x,y in zip(series, sample_array)]

Now, your sample_array has length 64 while series does not: its length is the number of rows in the frame. pandas rejects the mismatch and raises the error above.

One way around that is to use np.vstack:
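To see the mismatch concretely, here is a minimal sketch on made-up data. The frame df, its three rows, and the length-4 arrays are assumptions chosen for brevity; they stand in for the real frame and its length-64 arrays.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 rows, each holding a length-4 array (instead of 64).
df = pd.DataFrame({
    "pk_id": [5410, 5411, 5412],
    "data": [
        np.array([4, 6, 7, 43]),
        np.array([1, 2, 3, 4]),
        np.array([9, 9, 9, 9]),
    ],
})
sample_array = np.array([1, 2, 3, 4])

# The comparison lines up the Series (one entry per row) with sample_array,
# so the two lengths being compared are 3 and 4, not 4 and 4.
series_length = len(df["data"])
sample_length = len(sample_array)
print(series_length, sample_length)

try:
    df["data"] == sample_array
    error_message = None
except ValueError as exc:
    # pandas refuses to compare a Series with an ndarray of another length.
    error_message = str(exc)

print(error_message)
```

The length of each array inside the column never enters the check; only the number of rows does.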

(np.vstack(self.pd_data['data'])==sample_array).all(1)
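Put together, the approach might look like the sketch below. It assumes pk_id is an ordinary column (if it is the index, df.index[mask] would be used instead), and the toy frame df with length-4 arrays is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 rows, each holding a length-4 array (instead of 64).
df = pd.DataFrame({
    "pk_id": [5410, 5411, 5412],
    "data": [
        np.array([4, 6, 7, 43]),
        np.array([1, 2, 3, 4]),
        np.array([9, 9, 9, 9]),
    ],
})
sample_array = np.array([1, 2, 3, 4])

# Stack the per-row arrays into one 2-D array of shape (n_rows, array_length).
# The column is passed as a plain list of arrays here.
stacked = np.vstack(df["data"].tolist())

# Broadcasting compares sample_array against every row; .all(axis=1) keeps
# only the rows where every element matches.
mask = (stacked == sample_array).all(axis=1)

# Use the boolean mask to pull out the matching pk_id values.
matching_pk_ids = df.loc[mask, "pk_id"].tolist()
print(matching_pk_ids)
```

This relies on every array in the column having the same length, since np.vstack cannot stack rows of different sizes. For floating-point data, np.isclose(stacked, sample_array).all(axis=1) may be a better fit than exact equality.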
