
Add a column (in Pandas) that is calculated based on another column

I have a simple database that has every month's earnings, with Year (values 1991-2020), Month (Jan-Dec) and Earnings. I want to make a new column, where for years 1991-2005 I divide the Earnings column by 10000, but for 2006-2020 I want it to be the same as in the Earnings column.

I am a beginner, but what I was thinking is that I want the new column (TrueEarn) to be Earnings / 10000, but only for the years 1991-2005.

df['TrueEarn'] = df['Earnings']/10000 for (['Year']=('1991':"2005"))

Since I am a newb with Python, this may not make sense to you, but that is how I logically wanted to write it.

Can you help me, please?

You should provide a minimal reproducible example. But assuming that you have the year in another column, the way to go could be:

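import numpy as np
# Divide Earnings by 10000 where YEAR is 1991-2005; keep Earnings unchanged otherwise
df['TrueEarn'] = np.where((df['YEAR'] >= 1991) & (df['YEAR'] <= 2005),
                          df['Earnings'] / 10000, df['Earnings'])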
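To make the behaviour of the np.where approach concrete, here is a minimal sketch on a tiny dataframe. The four rows are made-up values chosen for illustration, and the column name YEAR follows the answer rather than the question's Year, so adjust it to match your data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the question), covering both year ranges.
df = pd.DataFrame({
    "YEAR": [1991, 2005, 2006, 2020],
    "Earnings": [50000.0, 120000.0, 7.5, 9.0],
})

# Boolean mask: True for rows whose year falls in 1991-2005 (inclusive).
in_old_range = (df["YEAR"] >= 1991) & (df["YEAR"] <= 2005)

# np.where picks the first value where the mask is True, the second otherwise.
df["TrueEarn"] = np.where(in_old_range, df["Earnings"] / 10000, df["Earnings"])

print(df)
```

The first two rows end up divided by 10000 and the last two are copied unchanged. If the year column is stored as strings (as the quoted years in the question's attempt suggest it might be), the comparison would need numeric values first, for example via pd.to_numeric.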

As @wjandrea says, this can be done directly with pandas, but numpy was faster in the benchmark below, run on a toy dataframe:

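import numpy as np
import pandas as pd
# Note: randint's upper bound is exclusive, so YEAR spans 1991-2019 here
df = pd.DataFrame(
    {"YEAR": np.random.randint(1991, 2020, size=50000), "Earnings": np.random.uniform(0, 2e10, size=50000)}
)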

   
%timeit df["TrueEarn"] = np.where((df["YEAR"] >= 1991) & (df["YEAR"] <= 2005), df["Earnings"] / 10000, df["Earnings"])

695 µs ± 3.17 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Versus pandas mask:

%timeit df["TrueEarn"] = df["Earnings"].mask(df["YEAR"].between(1991, 2005), df["Earnings"] / 10000)

959 µs ± 4.45 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
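Since the two variants differ only in speed, it may be worth checking that they give the same values. The sketch below does that on random data, and also includes a .loc-based assignment as a third pure-pandas option. The seed, the size, and the names via_where, via_mask and via_loc are arbitrary choices for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical random data; the seed only makes the run repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "YEAR": rng.integers(1991, 2021, size=1000),  # upper bound is exclusive
    "Earnings": rng.uniform(0, 2e10, size=1000),
})

# Series.between is inclusive on both ends by default.
old = df["YEAR"].between(1991, 2005)

# Variant 1: numpy.where, wrapped back into a Series for comparison.
via_where = pd.Series(
    np.where(old, df["Earnings"] / 10000, df["Earnings"]), index=df.index
)

# Variant 2: Series.mask replaces values where the condition is True.
via_mask = df["Earnings"].mask(old, df["Earnings"] / 10000)

# Variant 3: copy the column, then overwrite only the selected rows.
via_loc = df["Earnings"].copy()
via_loc.loc[old] = via_loc.loc[old] / 10000

print(np.array_equal(via_where.to_numpy(), via_mask.to_numpy()))
print(np.array_equal(via_mask.to_numpy(), via_loc.to_numpy()))
```

All three perform the same division on the same rows, so the choice between them is mostly about readability and speed. Timings will vary by machine and pandas/numpy version.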


 