
Modifying a Wide Pandas DataFrame Based on a Condition

I am attempting to edit values in wide-form time series data based on a condition, using Python and the pandas library. The data are satellite observation values, one per date (see photo below). The first column is a unique id and all subsequent columns are dates. This means that each row is a time series for that specific id.

The idea is this:

If n1 is the current observation, n2 is the next observation, and n3 is the observation after that, then:

if ((n2 - n1) > 0.3) and (n3 >= (0.9 * n1)):
    n2 = (n1 + n3) / 2

Just to be clear, n1, n2, n3 are the first three values of a row in this DataFrame, not column names. For the attached example n1 would be 0.25916876 and n2 would be 0.25916876 and n3 would be 0.23824187.
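To make the rule concrete, here is a minimal sketch on three plain numbers. The function name adjust, the parameter names jump and recover, and the sample values are made up for illustration and are not taken from the data above.

```python
def adjust(n1, n2, n3, jump=0.3, recover=0.9):
    """Return the value n2 should take under the rule above."""
    # n2 counts as a spike if it rises more than `jump` above n1
    # and n3 is back to at least `recover` times n1.
    if (n2 - n1) > jump and n3 >= recover * n1:
        return (n1 + n3) / 2
    return n2

spike = adjust(0.20, 0.60, 0.22)  # rise of 0.40, then recovery: n2 is replaced
flat = adjust(0.25, 0.26, 0.24)   # rise of only 0.01: n2 is kept
```

Only the middle value can change; n1 and n3 are read but never modified.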

How can I modify my DataFrame with this rule? Could this be done with a list comprehension?

This is what df looks like

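If your dataframe is named df and the three observations are in columns named n1, n2 and n3, then you can try:

# True where n2 jumps more than 0.3 above n1 and n3 is at least 0.9 * n1
mask = (df.n2 - df.n1 > 0.3) & (df.n3 >= 0.9 * df.n1)
# where() returns a new Series, so assign it back to the column
df['n2'] = df['n2'].where(~mask, (df.n1 + df.n3) / 2)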
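Since the question says n1, n2 and n3 are positions along each row rather than column names, one way to apply the same mask to the whole wide table is to shift the table sideways. This is a minimal sketch, assuming the id column is literally named id and every other column holds numeric observations; the sample values and the helper name smooth_spikes are invented for illustration.

```python
import pandas as pd

# Hypothetical wide frame: one row per id, one column per date (values invented)
df = pd.DataFrame(
    {
        "id": ["a", "b"],
        "2020-01-01": [0.20, 0.50],
        "2020-01-09": [0.60, 0.55],
        "2020-01-17": [0.22, 0.52],
        "2020-01-25": [0.25, 0.50],
    }
)

def smooth_spikes(wide, id_col="id", jump=0.3, recover=0.9):
    """Replace n2 with (n1 + n3) / 2 where n2 - n1 > jump and n3 >= recover * n1."""
    values = wide.set_index(id_col)
    prev = values.shift(1, axis=1)   # n1: observation on the previous date
    nxt = values.shift(-1, axis=1)   # n3: observation on the next date
    # NaN at the row edges makes the comparisons False, so the first and
    # last date columns are never modified.
    spike = ((values - prev) > jump) & (nxt >= recover * prev)
    return values.mask(spike, (prev + nxt) / 2).reset_index()

result = smooth_spikes(df)
```

Note that this version always compares against the original neighbours, so a value that has just been corrected is not reused as n1 for the next date.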

I assume you want to do this process for each column of the dataframe. This works on a fake dataframe I created to replicate the process:

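import numpy as np

# Iterate over each column, treating consecutive rows as consecutive observations
for c in list(df):
    prev = df[c].shift(1)   # n1: previous observation (NaN for the first row)
    nxt = df[c].shift(-1)   # n3: next observation (NaN for the last row)
    # Comparisons against NaN are False, so the first and last rows stay unchanged
    spike = ((df[c] - prev) > 0.3) & (nxt >= 0.9 * prev)
    # np.mean(a, b) would treat b as the axis argument, so average explicitly
    df[c] = np.where(spike, (prev + nxt) / 2, df[c])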
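On the list comprehension part of the question: if a corrected value should be reused as n1 for the following date, the rule has to be scanned in order, which a vectorised shift does not do. A hedged sketch of that variant follows, with a plain loop per row and a list comprehension over the rows. The column names id and d1 to d4, the sample values, and the helper name smooth_row are assumptions for illustration only.

```python
import pandas as pd

def smooth_row(values, jump=0.3, recover=0.9):
    """Scan one time series left to right; a corrected value becomes the next n1."""
    out = list(values)
    for i in range(1, len(out) - 1):
        n1, n2, n3 = out[i - 1], out[i], out[i + 1]
        if (n2 - n1) > jump and n3 >= recover * n1:
            out[i] = (n1 + n3) / 2
    return out

# Hypothetical wide frame: one row per id, one column per date (values invented)
df = pd.DataFrame(
    {
        "id": ["a", "b"],
        "d1": [0.20, 0.50],
        "d2": [0.60, 0.55],
        "d3": [0.22, 0.52],
        "d4": [0.25, 0.50],
    }
)

date_cols = [c for c in df.columns if c != "id"]
# One list comprehension over the rows; each row is handled by the loop above
rows = [smooth_row(row) for row in df[date_cols].to_numpy()]
df[date_cols] = pd.DataFrame(rows, columns=date_cols, index=df.index)
```

This is slower than the shift-based approach on large tables, so it is probably only worth it if the sequential behaviour is actually wanted.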


 