I have a pandas DataFrame that looks like this (thousands of rows):
pk_id ses_id data zero_val goal
5410 0 [4, 6, 7, 43, 4, 4, 4, 4, 4, 4, 2, 2, ... 9541 1
...
where each array in the data column has shape (64,). I also have another NumPy array, sample_array, of shape (64,), and I would like to compare it against every array in the "data" column and return the pk_id of the matching row. To this end I do:
self.pd_data.index[self.pd_data['data'] == sample_array].tolist()
but I keep getting:
pandas/core/ops/array_ops.py", line 234, in comparison_op
raise ValueError("Lengths must match to compare")
ValueError: Lengths must match to compare
I don't really understand what could be wrong; I have checked the lengths and they are indeed (64,), as I expect.
Any pointers would be much appreciated.
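When you compare
series == sample_array
where series is self.pd_data['data'], pandas does not compare sample_array against each stored array. It treats sample_array as a sequence of 64 values and compares the Series element-wise against it, pairing the i-th row with the i-th value, roughly like
[x == y for x, y in zip(series, sample_array)]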
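For that to work, both sides must have the same length. Your sample_array has length 64, but the Series has one entry per row (thousands of them), so pandas raises the error above. The (64,) you checked is the shape of each array stored in the column, not the length of the column itself.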
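A small reproduction can make the mismatch visible. This is a sketch on made-up data: df is a hypothetical stand-in for self.pd_data with only 5 rows, and the pk_id values are illustrative. The exact wording of the error message varies between pandas versions (newer ones also include the two shapes), but it starts with "Lengths must match to compare".

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for self.pd_data: 5 rows, each holding a (64,) array
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "pk_id": [5410, 5411, 5412, 5413, 5414],
    "data": [rng.integers(0, 10, 64) for _ in range(5)],
})
sample_array = df["data"].iloc[2].copy()

print(len(df["data"]))           # length of the column = number of rows (5)
print(df["data"].iloc[0].shape)  # shape of each stored array: (64,)
print(sample_array.shape)        # shape of the array being searched for: (64,)

# 5 rows vs 64 values: pandas refuses the element-wise comparison
try:
    df["data"] == sample_array
except ValueError as exc:
    print(exc)
```

The column has 5 entries while sample_array has 64, which is exactly the mismatch pandas complains about, even though every stored array is (64,).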
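One way around this is to stack the column into a single 2-D array of shape (n_rows, 64) with np.vstack, compare it against sample_array (which broadcasts across the rows), and keep the rows where all 64 values match:
mask = (np.vstack(self.pd_data['data']) == sample_array).all(axis=1)
self.pd_data.loc[mask, 'pk_id'].tolist()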
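A self-contained version of the np.vstack approach, as a sketch on made-up data: df is a hypothetical stand-in for self.pd_data, and rows 0 and 2 are deliberately identical to sample_array so that two pk_id values come back.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for self.pd_data (same layout as in the question)
df = pd.DataFrame({
    "pk_id": [5410, 5411, 5412],
    "data": [np.arange(64), np.arange(64) * 2, np.arange(64)],
})
sample_array = np.arange(64)

# Stack the column of (64,) arrays into one (n_rows, 64) array
stacked = np.vstack(df["data"])

# sample_array (64,) broadcasts against every row of stacked (n_rows, 64);
# all(axis=1) keeps only rows where every one of the 64 elements matches
mask = (stacked == sample_array).all(axis=1)

matching_ids = df.loc[mask, "pk_id"].tolist()
print(matching_ids)  # [5410, 5412]
```

np.vstack only works if every stored array has the same length. Also note that == tests exact equality, so for float data np.isclose may be a better fit than ==.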