Apply corrections made to a subsampled copy of a dataframe back to the original dataframe?

Question

I'm a Pandas newbie, so please bear with me.

Overview: I started with a free-form text file created by a data harvesting script that remotely accessed dozens of different kinds of devices, and multiple instances of each. I used OpenRefine (a truly wonderful tool) to munge that into a CSV, which I then loaded into the dataframe df using Pandas in a JupyterLab notebook.

My first inspection of the data showed that the 'Timestamp' column was not monotonic. I accessed individual data sources as follows, in this case for the 'T-meter' data source. (The technique was taken from a search result. I don't really understand it, but it worked.)

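cond = df['Source'] == 'T-meter'   # boolean mask: True where the row is a T-meter reading
rows = df.loc[cond, :]             # select the matching rows, all columns
# DataFrame.append was removed in pandas 2.0; resetting the index gives the
# same renumbered rows that append(rows, ignore_index=True) produced
df_tmeter = rows.reset_index(drop=True)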
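Since the selection technique above is described as not fully understood, here is a minimal sketch of what the boolean mask does. The sample data (including the 'P-gauge' source name and the integer timestamps) is made up for illustration and is not from the original dataset.

```python
import pandas as pd

# Hypothetical sample data; the real df was loaded from the CSV export.
df = pd.DataFrame({
    "Source": ["T-meter", "P-gauge", "T-meter", "P-gauge"],
    "Timestamp": [100, 101, 105, 103],
})

# Comparing a column to a value yields a boolean Series, one entry per row.
cond = df["Source"] == "T-meter"

# .loc with a boolean Series keeps only the rows where the mask is True.
df_tmeter = df.loc[cond]

print(cond.tolist())             # [True, False, True, False]
print(df_tmeter.index.tolist())  # [0, 2]  (original row labels are preserved)
```

Note that the selected rows keep their original index labels unless the index is reset, which matters when matching them back to df later.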

I then checked each one as follows:

df_tmeter['Timestamp'].is_monotonic_increasing   # .is_monotonic was removed in pandas 2.0
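As a possible shortcut, the same check can be run for every source at once with groupby, rather than building one dataframe per source. This is only a sketch on made-up data; the source names other than 'T-meter' and all timestamp values are assumptions.

```python
import pandas as pd

# Hypothetical sample data: the T-meter timestamps drop from 100 to 5.
df = pd.DataFrame({
    "Source": ["T-meter", "P-gauge", "T-meter", "P-gauge", "T-meter"],
    "Timestamp": [100, 101, 5, 103, 110],
})

# For each source, test whether its timestamps never decrease
# (rows are visited in their original order within each group).
monotonic_by_source = df.groupby("Source")["Timestamp"].apply(
    lambda s: s.is_monotonic_increasing
)

print(monotonic_by_source)
```

Any source reported as False is a candidate for repair.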

Fortunately, the problem was easy to identify and fix: some sensors were resetting, then sending bad (but still monotonic) timestamps until their clocks were updated. I wrote the function healing() to cleanly patch such errors, and it worked a treat:

df_tmeter['healed'] = df_tmeter['Timestamp'].apply(healing)

Now for my questions:

How do I get the 'healed' values back into the original df['Timestamp'] column for only the 'T-meter' items in df['Source']?
Given the function healing(), is there a clean way to do this directly on df?

Thanks!

Edit: I first thought I should be using 'views' into df, but other operations on the data would either generate errors or silently turn the views into copies.
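To illustrate the view-versus-copy point, the sketch below shows that a boolean-mask selection behaves as independent data: changes made to it do not reach df. The data is hypothetical, and the explicit .copy() is only there to make the intent unambiguous.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Source": ["T-meter", "P-gauge", "T-meter"],
    "Timestamp": [100, 101, 105],
})

cond = df["Source"] == "T-meter"

# Take an explicit, independent copy of the T-meter rows.
subset = df.loc[cond].copy()
subset["Timestamp"] = subset["Timestamp"] + 1000

print(df["Timestamp"].tolist())      # [100, 101, 105]  (df is unchanged)
print(subset["Timestamp"].tolist())  # [1100, 1105]
```

Because the subset is separate data, any corrections made on it have to be written back to df explicitly.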

Answer 1

I wrote a wrapper function heal_row() for healing():

def heal_row( row ):
    if row['Source'] == 'T-meter':   # Redundant check, but safe!
        row['Timestamp'] = healing(row['Timestamp'])
    return row

then did the following:

df = df.apply(lambda row: row if row['Source'] != 'T-meter' else heal_row(row), axis=1)

This ordering is important: healing() is stateful, depending on the prior row(s), so it can't be the default operation and must only be called for 'T-meter' rows.
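An alternative worth considering is to assign through a boolean mask with .loc, which avoids row-wise apply and only ever calls healing() on 'T-meter' timestamps. The sketch below is not the original healing(): make_healer() is a hypothetical stand-in that bumps any timestamp that fails to increase, and the data is invented. It shows both writing back from a separate df_tmeter and healing directly on df.

```python
import pandas as pd

def make_healer(step=1):
    """Hypothetical stand-in for healing(): remembers the last value it
    returned and replaces any non-increasing timestamp with last + step."""
    last = None
    def healing(ts):
        nonlocal last
        if last is not None and ts <= last:
            ts = last + step
        last = ts
        return ts
    return healing

# Hypothetical sample data: the T-meter clock resets after 110.
original = pd.DataFrame({
    "Source": ["T-meter", "P-gauge", "T-meter", "T-meter", "P-gauge", "T-meter"],
    "Timestamp": [100, 50, 110, 3, 60, 4],
})
cond = original["Source"] == "T-meter"

# Option 1: heal directly on df. The right-hand side keeps the original
# index labels, so the assignment lines up with the masked rows.
df = original.copy()
df.loc[cond, "Timestamp"] = df.loc[cond, "Timestamp"].apply(make_healer())

# Option 2: write back from a separate, re-indexed df_tmeter. Its index no
# longer matches df, so pass the raw values; row order is unchanged.
df2 = original.copy()
df_tmeter = df2.loc[cond].reset_index(drop=True)
df_tmeter["healed"] = df_tmeter["Timestamp"].apply(make_healer())
df2.loc[cond, "Timestamp"] = df_tmeter["healed"].to_numpy()

print(df["Timestamp"].tolist())   # [100, 50, 110, 111, 60, 112]
print(df2["Timestamp"].tolist())  # [100, 50, 110, 111, 60, 112]
```

Option 2 relies on df_tmeter having the same rows in the same order as the masked selection; if rows were dropped or reordered in between, keeping the original index and assigning the Series itself would be the safer route.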

Answer 1
0 2020-12-16 18:51:25
