Summing a column based on a condition in another column in a pandas data frame
I just started using Python and I am trying to create programs to help monitor some of my investments. Right now I have a function set up that gives me my current returns based on my initial buy price and the current price. Here is what my data frame looks like:
Ticker Expiration Contracts Call Buy Prem 12/22 Prem 12/23 Prem 12/25
0 x date 1 1 $ 0.13 0.15 0.12 0.13
1 y date 2 1 $ 0.33 0.34 0.34 0.39
2 z date 3 1 $ 0.25 NaN NaN 0.25
This is the function I currently have for returns:
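def returns(op):
    """
    Calculates the current return for each option
    """
    # Sum every column from "Prem 12/22" onward, skipping NaN values
    totalPrem = op.sum(axis=0, skipna=True)["Prem 12/22":]
    # Sum of ALL Buy values, including rows whose premium is NaN
    buy = op.sum(axis=0, skipna=True)["Buy"]
    return (totalPrem - buy) * 100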
This gives me the results by summing each column from Prem 12/22 onward and subtracting the sum of the Buy column from it. My problem is that on 12/22 and 12/23, z had not been bought yet, but the returns function still sums all of Buy. So the returns for 12/22 and 12/23 add the two data points in each of those columns and subtract the sum of all 3 data points in Buy. This leads to the result:
Prem 12/22: -22
Prem 12/23: -25
Prem 12/25: 6
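To make the problem concrete, here is a minimal reproduction. It assumes a hypothetical frame that keeps only the numeric columns from the table above (Buy and the three Prem columns, with the values shown); Ticker, Expiration, Contracts and Call are left out. Running the same logic on it gives the -22 / -25 / 6 figures:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric-only reconstruction of the question's table
df = pd.DataFrame({
    "Buy": [0.13, 0.33, 0.25],
    "Prem 12/22": [0.15, 0.34, np.nan],
    "Prem 12/23": [0.12, 0.34, np.nan],
    "Prem 12/25": [0.13, 0.39, 0.25],
})

def returns(op):
    """Original approach: subtract the total of ALL Buy values."""
    totals = op.sum(axis=0, skipna=True)
    return (totals["Prem 12/22":] - totals["Buy"]) * 100

# Round to whole cents to hide floating-point noise before converting to int
print(returns(df).round().astype(int))
```

The 12/22 and 12/23 columns subtract 0.71 (all three buys) even though only 0.46 was actually spent on x and y.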
I want to alter my code so that, for 12/22 and 12/23, the Buy sum only adds the first two rows. Is there a way to calculate buy by summing the Buy column so that a value is only included if the Prem value in the same row is not NaN? The output I am looking for is:
Prem 12/22: 3
Prem 12/23: 0
Prem 12/25: 6
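One vectorized way to get these numbers, sketched here as an alternative technique (not the asker's code, and not one of the answers below), is to turn the Prem columns into a not-NaN mask and weight Buy by it, so each date only subtracts the buy prices of rows that have a premium on that date. It assumes the same hypothetical numeric-only frame:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric-only reconstruction of the question's table
df = pd.DataFrame({
    "Buy": [0.13, 0.33, 0.25],
    "Prem 12/22": [0.15, 0.34, np.nan],
    "Prem 12/23": [0.12, 0.34, np.nan],
    "Prem 12/25": [0.13, 0.39, 0.25],
})

def returns(op):
    """Subtract only the Buy prices of rows that have a premium that day."""
    prem = op.loc[:, "Prem 12/22":]
    # Boolean frame: True where a premium exists for that row and date
    held = prem.notna()
    # Multiply each mask column by Buy (aligned on rows), then sum per column
    buy = held.mul(op["Buy"], axis=0).sum()
    # DataFrame.sum() skips NaN by default
    return (prem.sum() - buy) * 100

print(returns(df).round().astype(int))
```

The mask approach avoids a Python-level loop over columns, at the cost of building one extra boolean frame the size of the Prem block.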
Thanks!
You can use a list comprehension to filter for notnull() rows in each column and do the calculation per column. To apply it only to the columns with Prem in their names, I create a cols index object so we can dynamically apply the calculation to those columns:
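cols = df.columns[df.columns.str.contains('Prem')]
# For each Prem column, keep only the rows where that column is not NaN,
# then subtract the Buy total of those same rows. Scale by 100 before
# rounding so float error (e.g. 28.999...) cannot be truncated by int().
res = [int(round((df.loc[df[col].notnull(), col].sum() -
                  df.loc[df[col].notnull(), 'Buy'].sum()) * 100))
       for col in cols]
for c, r in zip(cols, res):
    print(f'{c}: {r}')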
Prem 12/22: 3
Prem 12/23: 0
Prem 12/25: 6
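One detail worth checking is the order of rounding and truncation. int() truncates toward zero, so if a difference is rounded first and only then multiplied by 100, floating-point error can drop a whole unit. The value 0.29 below is a hypothetical example, not taken from the question's data:

```python
# Hypothetical return of 0.29 per share
diff = 0.29

# Round, then scale, then truncate: 0.29 * 100 is 28.999999999999996
truncated = int(round(diff, 3) * 100)
# Scale first, then round to the nearest integer
rounded = int(round(diff * 100))

print(truncated, rounded)  # 28 29
```

Scaling by 100 before rounding, as in the second expression, avoids the off-by-one.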
Your purpose is to calculate the return of the Buy column. Suppose the data frame that contains the Buy column is called df; then you can simply call the following:
def returnBuy(df):
    tsBuy = df["Buy"]
    # Relative change of each Buy value versus the previous row
    return tsBuy.pct_change()
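Note that pct_change() measures the relative change between consecutive rows, so applied to Buy it compares one ticker's buy price with the previous ticker's rather than comparing Buy with the later Prem columns. A quick check on the Buy values from the question's table:

```python
import pandas as pd

# Buy prices from the question's table
buy = pd.Series([0.13, 0.33, 0.25], name="Buy")

# Each value is (current - previous) / previous
changes = buy.pct_change()
print(changes)
```

The first entry is NaN because row 0 has no previous row; the others are about 1.54 (x to y) and -0.24 (y to z).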
Maybe this: you could create a boolean column that flags NA/NaN values, then take the rows without missing data and sum them:
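import pandas as pd

data = {'col_1': [1, 2, 3, 4], 'col_2': [5, 7, pd.NA, 8]}
df = pd.DataFrame.from_dict(data)
# True where col_2 is missing, i.e. rows that should NOT be summed
df['should_add'] = pd.isna(df["col_2"])
print(df)
# Keep only the rows where col_2 is present, then sum each column
sums = df[~df['should_add']].sum(axis=0)
print(sums)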
Or in one line:
sums= df[~pd.isna(df["col_2"])].sum(axis=0)
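Applied to the question's data, the same filter-then-sum idea has to run once per Prem column, because each date has a different set of missing rows. A minimal sketch, assuming the same hypothetical numeric-only frame (Buy plus the three Prem columns):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric-only reconstruction of the question's table
df = pd.DataFrame({
    "Buy": [0.13, 0.33, 0.25],
    "Prem 12/22": [0.15, 0.34, np.nan],
    "Prem 12/23": [0.12, 0.34, np.nan],
    "Prem 12/25": [0.13, 0.39, 0.25],
})

results = {}
for col in df.columns[1:]:  # every Prem column
    # Drop the rows that have no premium on this date
    rows = df[~pd.isna(df[col])]
    diff = rows[col].sum() - rows["Buy"].sum()
    # Scale to cents before rounding; int() guards against a float result
    results[col] = int(round(diff * 100))

for col, value in results.items():
    print(f"{col}: {value}")
```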