Baseline subtraction/removal of a pandas DataFrame (Python)

I have some experimental time-lapse data of light emission of cells. Unfortunately, the baseline changes over time (see attached image for example, https://i.stack.imgur.com/zjlBR.png ) which makes it harder to analyse the data. For the different samples, the baseline changes in somewhat different ways (for example, some are linearly decreasing/increasing).

I'm wondering if there's some way to remove the baseline of each column in my DataFrame. I've looked into scipy's signal.detrend, but since the baseline is not exactly linear, it doesn't seem useful in this case. I searched for days before posting this question, but I have yet to find a proper solution. I considered finding the local minima and subtracting them, but that seemed too blunt a tool, and unwise to apply to a whole DataFrame of 40 columns.

I also found the peakutils baseline module, but I found it unsatisfying. Is there anything I've missed? This should be far from a unique problem within experimental data so I would be very surprised if SciPy doesn't have a proper module. Below is an example of the type of data that I would like to be able to subtract a baseline from, effectively removing the periodicity and making it more or less linear.

import numpy as np

n = 1000
limit_low = 0
limit_high = 0.48
# Gaussian noise plus slowly varying periodic terms that mimic a drifting baseline
my_data = np.random.normal(0, 0.5, n) \
      + np.abs(np.random.normal(0, 2, n) \
               * np.sin(np.linspace(0, 3*np.pi, n)) ) \
      + np.sin(np.linspace(0, 5*np.pi, n))**2 \
      + np.sin(np.linspace(1, 6*np.pi, n))**2
# Rescale the data so that it spans the range [limit_low, limit_high]
scaling = (limit_high - limit_low) / (max(my_data) - min(my_data))
my_data = my_data * scaling
my_data = my_data + (limit_low - min(my_data))

(Code courtesy of user Swier)

If I understand your question correctly, you are dealing with a non-stationary time series. This is a common problem in time series analysis, and there are ways either to work with non-stationary data directly or to make it stationary, for example by detrending. These operations are not trivial, and there are various approaches, most of them ported from more statistics-oriented languages such as R (e.g. https://www.statsmodels.org/dev/generated/statsmodels.tsa.tsatools.detrend.html ).
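As a rough illustration of detrending beyond the linear case, here is a minimal sketch that subtracts a least-squares polynomial fit from every column of a DataFrame, using only NumPy and pandas. The function name detrend_polynomial, the default order of 3, and the demo column names are all assumptions made for this example, not part of any library. It also assumes the columns contain no NaN values and are evenly sampled. Note that a polynomial fit follows the whole signal, peaks included, so it removes a slow trend rather than tracing a true lower envelope.

```python
import numpy as np
import pandas as pd


def detrend_polynomial(df: pd.DataFrame, order: int = 3) -> pd.DataFrame:
    """Subtract a least-squares polynomial fit from every column of df.

    Hypothetical helper for illustration; assumes numeric columns without NaNs.
    """
    # Rescale the sample positions to [-1, 1] to keep the fit well conditioned.
    x = np.linspace(-1.0, 1.0, len(df))
    values = df.to_numpy(dtype=float)
    # np.polyfit accepts a 2-D y and fits each column independently.
    coeffs = np.polyfit(x, values, order)        # shape (order + 1, n_columns)
    # Vandermonde matrix with the highest power first, matching np.polyfit.
    baseline = np.vander(x, order + 1) @ coeffs  # shape (n_rows, n_columns)
    return pd.DataFrame(values - baseline, index=df.index, columns=df.columns)


# Synthetic demo data: one linearly decreasing and one curved baseline.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
demo = pd.DataFrame({
    "sample_a": 2.0 - 1.5 * t + rng.normal(0, 0.05, t.size),
    "sample_b": 0.5 + t**2 + rng.normal(0, 0.05, t.size),
})
flat = detrend_polynomial(demo, order=2)
```

The polynomial order is a tuning choice: too low and part of the drift remains, too high and the fit starts to absorb the signal of interest. For a baseline with a periodic component like the one in the question, a low-order polynomial may not be flexible enough.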

An LSTM can also deal with non-stationary data. See this answer for more helpful discussion.
