
Python numpy array filter

I have a numpy array named 'data'. It consists of 15118 rows and 2 columns. The first column mostly consists of 0.01 steps, but sometimes there is a step in between (shown in red in the original plot) which I would like to remove/filter out.
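For illustration (hypothetical values, not the poster's actual measurements), data could look like this, with an off-grid row between the regular 0.01 ticks:

import numpy as np

data = np.array([[0.00, 1.5],
                 [0.01, 1.6],
                 [0.015, 1.4],   # off-grid step to filter out
                 [0.02, 1.7]])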

I achieved this with the following code:

# Create array [0, 0.01, ..., 140], rounded to 2 decimals to prevent floating point error
b = np.round(np.arange(0, 140.01, 0.01), 2)

# New empty data array
new_data = np.empty(shape=[0, 2])

# Loop over grid values, keeping the first matching row for each tick
for x in b:
    index = np.where(data[:, 0] == x)[0][0]
    new_data = np.vstack([new_data, data[index]])

I feel this code is far from optimal; does anyone know a faster/better way of achieving this?

Here's a solution using pandas for resampling. You can probably achieve the same result in pure numpy, but there are a number of floating point and rounding pitfalls you would face, so it may be better to let a trusted library do the work for you.
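For reference, a vectorized pure-numpy sketch of that route could look as follows (a sketch, assuming, as in the question, that the wanted rows round exactly onto the 0.01 grid):

import numpy as np

# Round the first column so grid values compare exactly
first_col = np.round(data[:, 0], 2)
b = np.round(np.arange(0, 140.01, 0.01), 2)

# Keep only rows whose first value lies on the 0.01 grid
on_grid = data[np.isin(first_col, b)]

# If a tick occurs more than once, keep its first occurrence,
# mirroring the [0][0] in the original loop
_, first_idx = np.unique(np.round(on_grid[:, 0], 2), return_index=True)
new_data = on_grid[first_idx]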

Let's say arr is your data array, and assume your index is in fractions of a second. You can convert your array to a dataframe with a timedelta index:

import pandas as pd

# Use the first column as the index and the second as the data
df = pd.DataFrame(arr[:, 1], index=arr[:, 0])
# Interpret the index values as seconds
df.index = pd.to_timedelta(df.index, unit="s")

Then resampling is pretty easy: 10ms is the frequency you want, and first() should give you the expected result, dropping everything but the records at the 10ms ticks. Feel free to experiment with other aggregation functions:

df = df.resample("10ms").first()
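For instance, if you would rather fold the off-grid samples into their enclosing 10ms bin instead of discarding them, other aggregations such as mean() work the same way (just an option, not what the question asked for):

df = df.resample("10ms").mean()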

Finally, you can get back to your array with something like:

# The timedelta index is in nanoseconds, so divide by 1e9 to recover seconds
np.vstack([pd.to_numeric(df.index, downcast="float").values / 1e9,
           df.values.squeeze()]).T
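As a small variation (not part of the original answer), TimedeltaIndex also exposes total_seconds() directly, which avoids the manual nanosecond division:

np.vstack([df.index.total_seconds().values,
           df.values.squeeze()]).T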
