Python, Pandas DataFrame - creating a derived column with conditional statements

Question

I have a Pandas DataFrame with 50 columns and 50k rows. One column holds measurement data that needs correcting with a calibration factor. The factor is a value to be added or subtracted. About 10 sensors share the same column of measurement data ['T_uncalibrated'], and each has a unique serial number in a separate column ['serial'].

I can calibrate a single sensor as follows using .where:

data['T_calibrated'] = data['T_uncalibrated'].where(data['serial'] == 12345)-2.7

Here 12345 is the unique serial number and -2.7 is the calibration factor.

How would I write this in a more generic form, so that the calibration factor associated with each serial number is applied and the results end up in a single combined column ['T_calibrated']? So far I'm stuck with brute-force approaches. I'm sure there must be a more elegant way to do this.

I have a second dataframe with the serial numbers and calibration factors that can, of course, be looped over or compared against.

Answer 1

Shortly after posting my question I saw the light.

I joined the two dataframes on the serial numbers, preserving the index of the original (because I want that). Then I created another column by subtracting the two values. I didn't know how to add "inplace=True" to the join statement.

Here's my code:

calibrated_data=data.join(calibration_dataframe.set_index('serial'),on='serial')
calibrated_data['T_calibrated'] = calibrated_data.T_uncalibrated - calibrated_data.calibration_factor
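As far as I know, DataFrame.join has no inplace parameter, so assigning the result to a name as above is the usual pattern. If the goal is only to add the new column to the original frame, one alternative is to look up each row's factor with Series.map. The following is a minimal sketch, not the poster's method: the sample values are made up for illustration, and the column names and the subtraction follow the code above.

```python
import pandas as pd

# Hypothetical sample data; the non-default index shows that it is kept as-is.
data = pd.DataFrame(
    {"serial": [12345, 67890, 12345, 11111],
     "T_uncalibrated": [20.0, 21.5, 19.0, 22.0]},
    index=[10, 11, 12, 13],
)
calibration_dataframe = pd.DataFrame(
    {"serial": [12345, 67890], "calibration_factor": [2.7, -0.5]}
)

# Build a serial -> factor lookup Series, then map it onto each row's serial.
factors = calibration_dataframe.set_index("serial")["calibration_factor"]

# Rows whose serial has no entry in the lookup (11111 here) get NaN.
data["T_calibrated"] = data["T_uncalibrated"] - data["serial"].map(factors)
```

This writes straight into data, so no second frame is created and the original index is untouched.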

Answer 2

You describe two data frames structured as below. The simplest approach is to merge them, then calculate the required column from the merged data frame.

import numpy as np
import pandas as pd

serial = [f"{a}{ord(a)}" for a in list("abcdef")]

df = pd.DataFrame({"serial":np.random.choice(serial, 50), "T_uncalibrated":np.random.randint(20,30,50)})
dfs = pd.DataFrame({"serial":serial, "calibration":np.random.randint(-2,2,len(serial))})

df.merge(dfs, on="serial").assign(T_calibrated=lambda d: d["T_uncalibrated"]+d["calibration"])

Sample output (first rows):

serial  T_uncalibrated  calibration  T_calibrated
c99     20              -2           18
c99     27              -2           25
c99     28              -2           26
c99     28              -2           26
c99     20              -2           18
c99     22              -2           20
c99     24              -2           22
c99     24              -2           22
d100    21              -1           20
d100    26              -1           25
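One thing to be aware of: merge defaults to an inner join, so rows whose serial has no calibration entry are dropped, and the result gets a fresh integer index. The sketch below is a variation on the approach above under stated assumptions, not a replacement for it. The sample values and the serial "z122" (which deliberately has no calibration entry) are made up for illustration. It uses how="left" to keep every measurement row in its original order and validate="many_to_one" to raise an error if a serial appears twice in the calibration table.

```python
import pandas as pd

# Hypothetical measurements; "z122" deliberately has no calibration entry.
df = pd.DataFrame({
    "serial": ["c99", "a97", "z122", "c99", "b98"],
    "T_uncalibrated": [20, 27, 25, 28, 21],
})
dfs = pd.DataFrame({"serial": ["a97", "b98", "c99"], "calibration": [-2, -1, 1]})

# Left merge keeps all rows of df; validate guards against duplicate serials in dfs.
merged = df.merge(dfs, on="serial", how="left", validate="many_to_one")
merged["T_calibrated"] = merged["T_uncalibrated"] + merged["calibration"]

# Serials that had no calibration factor show up as NaN and can be listed.
missing = merged.loc[merged["calibration"].isna(), "serial"].unique().tolist()
```

Checking missing before using T_calibrated makes it obvious when the calibration table is incomplete, rather than silently losing rows.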
