Python, Pandas Dataframe - creating a derived column with conditional statements

I have a Pandas DataFrame with 50 columns and 50k rows. There is one column with measurement data that needs correcting with a calibration factor. The factor is an integer value to be added or subtracted. There are multiple (about 10) sets of measurements in the same column of measurement data ['T_calibrated'], each with a unique serial number in a separate column ['serial'].

I can calibrate a single sensor as follows using .where:

data['T_calibrated'] = data['T_uncalibrated'].where(data['serial'] == 12345)-2.7

12345 is the unique serial number and -2.7 is the calibration factor.
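To illustrate why this only handles one sensor at a time, here is a minimal sketch on made-up data (the second serial number and all readings are hypothetical, not from my real dataset). Series.where keeps the values where the condition is true and replaces the rest with NaN, so rows belonging to other sensors end up empty in the new column.

```python
import pandas as pd

# Hypothetical toy data; the readings and the serial 67890 are invented.
data = pd.DataFrame({
    "serial": [12345, 12345, 67890],
    "T_uncalibrated": [20.0, 21.5, 19.0],
})

# Keep readings of sensor 12345, turn all other rows into NaN,
# then apply that sensor's calibration factor.
data["T_calibrated"] = data["T_uncalibrated"].where(data["serial"] == 12345) - 2.7

print(data)
```

Repeating this for every serial number would need one such statement per sensor plus a step to combine the results, which is the brute-force route I'd like to avoid.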

How would I write this in a more generic form, so that I can apply the unique calibration factor associated with each serial number and store the result as a single combined column ['T_calibrated']? So far I'm getting stuck with brute-force approaches. I'm sure there must be some very elegant way to do this.

I have a second dataframe with the serial numbers and calibration factors, which can of course be looped over or compared against.

Shortly after posting my question I saw the light.

I joined the two dataframes on the serial numbers, preserving the index of the original dataframe (because I want that). Then I created another column by subtracting the two values. I didn't know how to add "inplace=True" to the join statement.

Here's my code:

calibrated_data=data.join(calibration_dataframe.set_index('serial'),on='serial')
calibrated_data['T_calibrated'] = calibrated_data.T_uncalibrated - calibrated_data.calibration_factor
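For anyone who wants to try this, below is a self-contained sketch of the same join on invented data. The column names follow my code above; the serial 67890, the readings, the factors and the index values are assumptions made up for the example. The non-default index is there to show that the original index survives the join.

```python
import pandas as pd

# Hypothetical measurements with a non-default index.
data = pd.DataFrame(
    {"serial": [12345, 67890, 12345], "T_uncalibrated": [20.0, 19.0, 21.5]},
    index=[10, 11, 12],
)

# Hypothetical calibration table: one row per sensor.
calibration_dataframe = pd.DataFrame(
    {"serial": [12345, 67890], "calibration_factor": [2.7, -1.0]}
)

# Look up each row's factor by matching data['serial'] against the
# calibration table's index; the index of `data` is kept as-is.
calibrated_data = data.join(calibration_dataframe.set_index("serial"), on="serial")
calibrated_data["T_calibrated"] = (
    calibrated_data["T_uncalibrated"] - calibrated_data["calibration_factor"]
)

print(calibrated_data)
```

As far as I can tell, DataFrame.join has no inplace parameter, so the closest equivalent is to assign the result back to the original name (data = data.join(...)).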

You describe two data frames structured as below. The simplest approach is to merge them, then calculate the required column from the merged data frame.

import numpy as np
import pandas as pd

serial = [f"{a}{ord(a)}" for a in list("abcdef")]

df = pd.DataFrame({"serial":np.random.choice(serial, 50), "T_uncalibrated":np.random.randint(20,30,50)})
dfs = pd.DataFrame({"serial":serial, "calibration":np.random.randint(-2,2,len(serial))})

df.merge(dfs, on="serial").assign(T_calibrated=lambda d: d["T_uncalibrated"]+d["calibration"])

sample output

serial  T_uncalibrated  calibration  T_calibrated
c99     20              -2           18
c99     27              -2           25
c99     28              -2           26
c99     28              -2           26
c99     20              -2           18
c99     22              -2           20
c99     24              -2           22
c99     24              -2           22
d100    21              -1           20
d100    26              -1           25
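If the row order and index of the measurement frame need to stay untouched, one possible alternative to the merge (a sketch, not part of the approach above) is to turn the calibration table into a lookup Series and use Series.map. The frame and column names mirror the example above, but the values here are fixed, made-up data rather than random ones, and factor_by_serial is a name introduced only for this illustration.

```python
import pandas as pd

# Fixed, invented data in the same shape as df and dfs above.
df = pd.DataFrame({
    "serial": ["a97", "c99", "a97", "d100"],
    "T_uncalibrated": [20, 27, 28, 21],
})
dfs = pd.DataFrame({
    "serial": ["a97", "c99", "d100"],
    "calibration": [1, -2, -1],
})

# Lookup Series: index = serial, value = calibration factor.
factor_by_serial = dfs.set_index("serial")["calibration"]

# Map each row's serial to its factor and add it to the raw reading.
df["T_calibrated"] = df["T_uncalibrated"] + df["serial"].map(factor_by_serial)

print(df)
```

A serial that is missing from the calibration table maps to NaN, so the corresponding T_calibrated value would be NaN as well rather than raising an error.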
