
Applying corrections to a subsampled copy of a dataframe back to the original dataframe?

I'm a Pandas newbie, so please bear with me.

Overview: I started with a free-form text file created by a data-harvesting script that remotely accessed dozens of different kinds of devices, with multiple instances of each. I used OpenRefine (a truly wonderful tool) to munge that into a CSV, which I then loaded into a DataFrame, df, with pandas in a JupyterLab notebook.

My first inspection of the data showed that the 'Timestamp' column was not monotonic. I pulled out each data source separately, as shown here for the 'T-meter' source. (I took the technique from a search result; I don't really understand it, but it worked.)

import pandas as pd

# Boolean mask selecting only the 'T-meter' rows
cond = df['Source'] == 'T-meter'
rows = df.loc[cond, :]
# Independent copy with a fresh 0..n-1 index; the original row labels are dropped
df_tmeter = rows.reset_index(drop=True)
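A boolean mask with `.loc` already returns the matching rows with their original index labels; it is the re-indexing step (`ignore_index=True`, or equivalently `reset_index(drop=True)`) that throws those labels away. A minimal sketch on a small made-up DataFrame (not the real data) comparing the two:

```python
import pandas as pd

# Small illustrative frame; the real df comes from the OpenRefine CSV
df = pd.DataFrame({
    "Source": ["T-meter", "P-gauge", "T-meter", "P-gauge", "T-meter"],
    "Timestamp": [10, 11, 12, 13, 14],
})

cond = df["Source"] == "T-meter"

# Re-indexed copy: labels become 0..n-1, so the link back to df is lost
df_tmeter_reset = df.loc[cond].reset_index(drop=True)

# Label-preserving copy: .copy() makes it independent of df,
# but every row keeps the index label it had in df
df_tmeter = df.loc[cond].copy()

print(list(df_tmeter_reset.index))  # [0, 1, 2]
print(list(df_tmeter.index))        # [0, 2, 4]
```

Keeping the labels means any column computed on `df_tmeter` can later be matched to `df` by index rather than by position.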

I then checked each one like this:

df_tmeter['Timestamp'].is_monotonic_increasing   # is_monotonic was removed in pandas 2.0
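Rather than building one copy per source, `groupby` can run the same check on every source at once, since it keeps the rows of each group in their original order. A sketch on made-up data, using the column names from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Source": ["T-meter", "P-gauge", "T-meter", "P-gauge", "T-meter"],
    "Timestamp": [10, 11, 12, 9, 14],
})

# One boolean per source: True if that source's timestamps never decrease
monotonic = df.groupby("Source")["Timestamp"].apply(
    lambda s: s.is_monotonic_increasing
)
print(monotonic)

# Sources whose timestamps need healing
bad_sources = [src for src, ok in monotonic.items() if not ok]
print(bad_sources)  # ['P-gauge']
```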

Fortunately, the problem was easy to identify and fix: Some sensors were resetting, then sending bad (but still monotonic) timestamps until their clocks were updated. I wrote the function healing() to cleanly patch such errors, and it worked a treat:

df_tmeter['healed'] = df_tmeter['Timestamp'].apply(healing)

Now for my questions:

  1. How do I get the 'healed' values back into the original df['Timestamp'] column for only the 'T-meter' items in df['Source']?

  2. Given the function healing(), is there a clean way to do this directly on df?
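A minimal sketch of both routes. It assumes `df_tmeter` was built from `df` with a boolean mask and then re-indexed, as in the question (a mask keeps the rows in their original order), and it uses a hypothetical placeholder `healing()` because the real function's logic isn't shown:

```python
import pandas as pd

# Hypothetical stand-in for the real healing(): shifts small (bad) values up
def healing(ts):
    return ts + 1000 if ts < 100 else ts

df = pd.DataFrame({
    "Source": ["T-meter", "P-gauge", "T-meter", "P-gauge", "T-meter"],
    "Timestamp": [500, 11, 20, 13, 30],
})
cond = df["Source"] == "T-meter"

# Question 1: copy with a fresh 0..n-1 index, then heal the copy
df_tmeter = df.loc[cond].reset_index(drop=True)
df_tmeter["healed"] = df_tmeter["Timestamp"].apply(healing)

# The copy's index no longer matches df, so assign by position:
# the masked rows of df appear in the same order as the rows of df_tmeter
df_q1 = df.copy()
df_q1.loc[cond, "Timestamp"] = df_tmeter["healed"].to_numpy()

# Question 2: skip the copy and heal the masked slice of df directly
df_q2 = df.copy()
df_q2.loc[cond, "Timestamp"] = df_q2.loc[cond, "Timestamp"].apply(healing)

print(df_q2["Timestamp"].tolist())  # [500, 11, 1020, 13, 1030]
```

Assigning through `df.loc[mask, column]` writes into `df` itself, so there is no view-versus-copy question; `.apply` on the masked Series also visits the 'T-meter' rows in order.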

Thanks!

Edit: I first thought I should be using 'views' into df, but other operations on the data would either raise errors or silently turn the views into copies.

I wrote a wrapper function, heal_row(), for healing():

def heal_row(row):
    # Only heal rows from the 'T-meter' source
    if row['Source'] == 'T-meter':   # Redundant check, but safe!
        row['Timestamp'] = healing(row['Timestamp'])
    return row

then did the following:

df = df.apply(lambda row: row if row['Source'] != 'T-meter' else heal_row(row), axis=1)

This ordering is important: healing() is stateful (its result depends on the prior row(s)), so it must only ever see 'T-meter' rows and can't be the default branch of the conditional.
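Because the healing carries state from row to row, one way to apply it to every source without `apply(axis=1)` is to give each source its own fresh state through `groupby(...).transform`. A sketch, assuming a hypothetical factory `make_healer()` whose returned function stands in for healing(); here it simply clamps any timestamp that goes backwards to the last accepted value, which is not the question's actual logic:

```python
import pandas as pd

def make_healer():
    """Hypothetical stand-in for a stateful healing(): each call returns a
    new function with its own memory of the last accepted timestamp."""
    last = None

    def heal(ts):
        nonlocal last
        if last is not None and ts < last:
            return last          # went backwards: reuse the last good value
        last = ts
        return ts

    return heal

def heal_series(s):
    # A new healer per group, so one source's state never leaks into another
    return s.apply(make_healer())

df = pd.DataFrame({
    "Source": ["T-meter", "P-gauge", "T-meter", "P-gauge", "T-meter"],
    "Timestamp": [10, 50, 12, 40, 11],
})

# transform returns a Series aligned with df's original index
df["Timestamp"] = df.groupby("Source")["Timestamp"].transform(heal_series)
print(df["Timestamp"].tolist())  # [10, 50, 12, 50, 12]
```

Creating the state inside `heal_series` rather than at module level is the design choice that matters: each source's rows are processed in their original order, and no source sees another's history.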
