I have a Pandas dataframe with two columns, x and y, that correspond to a large signal. It is about 3 million rows in size.
I am trying to isolate the peaks from the signal. After using scipy, I got a 1D Python list corresponding to the indexes of the peaks. However, they are not the actual x-values of the signal, but just the index of their corresponding row:
from scipy.signal import find_peaks
peaks, _ = find_peaks(y, height=(None, peakline))
So, I decided I would just filter the original dataframe by setting all values in its y column to NaN unless they were on an index found in the peak list. I did this iteratively, however, since it is 3000000 rows, it is extremely slow:
peak_index = 0
for data_index in list(data.index):
if data_index != peaks[peak_index]:
data[data_index, 1] = float('NaN')
else:
peak_index += 1
Does anyone know what a faster method of filtering a Pandas dataframe might be?
Looping in most cases is extremely inefficient when it comes to pandas. Assuming you just need filtered DataFrame that contains the values of both x
and y
columns only when y
is a peak, you may use the following piece of code:
df.iloc[peaks]
Alternatively, if you are hoping to retrieve an original DataFrame with y
column retaining its peak values and having NaN
otherwise, then please use:
df.y = df.y.where(df.y.iloc[peaks] == df.y.iloc[peaks])
Finally, since you seem to care about just the x
values of the peaks, you might just rework the first piece in the following way:
df.iloc[peaks].x
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.