
pandas DataFrame: check whether the arrays in a column match a sample array

I have a pandas DataFrame which looks like this (thousands of rows):

 pk_id   ses_id       data                                            zero_val    goal
 5410         0     [4, 6, 7, 43, 4, 4, 4, 4, 4, 4, 2, 2, ...        9541       1
 ...

where each array in the data column has shape (64,). I also have a sample NumPy array, sample_array, of shape (64,), and I would like to test it against every array in the "data" column and return the pk_id of each matching row. To this end I do:

self.pd_data.index[self.pd_data['data'] == sample_array].tolist()

but I keep getting:

pandas/core/ops/array_ops.py", line 234, in comparison_op
raise ValueError("Lengths must match to compare")
ValueError: Lengths must match to compare

I don't really understand what could be wrong. I have checked the lengths of the arrays and they are indeed (64,), as I expect.

Any pointers would be much appreciated.

When you compare

series == sample_array

pandas treats self.pd_data['data'] as a Series with one element per row and compares it element-wise against sample_array, pairing row i of the Series with element i of sample_array. That is, roughly:

[x == y for x,y in zip(series, sample_array)]

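Now, your sample_array has length 64, while the Series has one entry per row (thousands, in your case). The two lengths differ, so pandas raises the error above. The length of the arrays stored inside each row never enters into that check.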
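
A minimal sketch of the mismatch, using a made-up 3-row Series of length-4 arrays in place of the real 64-element data (the toy values and the names n_rows, n_elements and raised are illustrative assumptions, not from the question):

```python
import numpy as np
import pandas as pd

# Toy stand-in for self.pd_data['data']: 3 rows, each holding a length-4 array.
series = pd.Series([np.array([1, 2, 3, 4]),
                    np.array([1, 2, 3, 4]),
                    np.array([5, 6, 7, 8])])
sample_array = np.array([1, 2, 3, 4])

# pandas lines up rows of the Series with elements of sample_array,
# so the lengths it checks are 3 (rows) and 4 (array elements).
n_rows, n_elements = len(series), len(sample_array)

try:
    series == sample_array
    raised = False
except ValueError:
    # Same "Lengths must match to compare" error as in the question.
    raised = True
```

Here n_rows is 3 and n_elements is 4, which is the pair of lengths pandas compares before raising.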

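One way around this is np.vstack, which stacks the per-row arrays into a single 2-D array of shape (n_rows, 64). Comparing that against sample_array broadcasts across the rows, and .all(1) keeps only the rows in which every element matches: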

(np.vstack(self.pd_data['data'])==sample_array).all(1)
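
To get from that boolean mask to the pk_id values the question asks for, the mask can be used to index the frame. The following is a sketch on a hypothetical 3-row frame with length-4 arrays (pd_data stands in for self.pd_data, and the values are invented). It converts the column with .tolist() before stacking so that np.vstack receives a plain list of arrays, and it cross-checks the result against a slower row-by-row comparison with np.array_equal:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the real one has 64-element arrays per row.
pd_data = pd.DataFrame({
    "pk_id": [5410, 5411, 5412],
    "data": [np.array([4, 6, 7, 43]),
             np.array([1, 2, 3, 4]),
             np.array([4, 6, 7, 43])],
})
sample_array = np.array([4, 6, 7, 43])

# Stack the per-row arrays into one 2-D array of shape (n_rows, 4).
stacked = np.vstack(pd_data["data"].tolist())

# Broadcast the comparison over rows; a row matches only if all elements match.
mask = (stacked == sample_array).all(axis=1)

# Select the pk_id values of the matching rows.
matching_ids = pd_data.loc[mask, "pk_id"].tolist()

# Alternative (slower, but does not require equal-length arrays): compare row by row.
mask_alt = pd_data["data"].apply(lambda a: np.array_equal(a, sample_array))
```

Note that np.vstack needs every array in the column to have the same length. If some rows may differ in length, the row-by-row variant is the safer choice.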


 