
Summing a column based on a condition in another column in a pandas data frame

I just started using Python and I am trying to create programs to help monitor some of my investments. Right now I have a function set up that gives me my current returns based on my initial buy price and the current price. Here is what my data frame looks like:

    Ticker   Expiration    Contracts   Call   Buy   Prem 12/22  Prem 12/23  Prem 12/25
0   x          date 1        1         $      0.13   0.15         0.12       0.13
1   y          date 2        1         $      0.33   0.34         0.34       0.39
2   z          date 3        1         $      0.25   NaN          NaN        0.25

Here is the function I currently use to compute returns:

def returns(op):
    """
    Calculates the current return for each option
    """
    totalPrem=op.sum(axis=0,skipna=True)["Prem 12/22":]
    buy=op.sum(axis=0,skipna=True)["Buy"]
    return (totalPrem-buy)*100

This gives me the results by summing each column from Prem 12/22 onward and subtracting the sum of the Buy column from it. My problem is that on 12/22 and 12/23, z had not yet been bought. However, the returns function sums all of Buy. So for 12/22 and 12/23 it adds the two available data points in the premium column and then subtracts all three data points in Buy. This leads to the result:

Prem 12/22: -22
Prem 12/23: -25
Prem 12/25: 6

I want to alter my code so that for 12/22 and 12/23, only the first two values of the Buy column are summed. Is there a way to calculate buy by summing the Buy column so that a value is only included if there is no NaN in that row for the premium column being evaluated? The output I am looking for is:

Prem 12/22: 3
Prem 12/23: 0
Prem 12/25: 6

Thanks!

You can use a list comprehension to filter for notnull() rows by column and do the calculation per column. To apply this only to the columns with Prem in their names, I create a cols index object so the calculation can be applied dynamically to those columns:

cols = df.columns[df.columns.str.contains('Prem')]
res = [int(round((df.loc[df[col].notnull(), col].sum() -
                  df.loc[df[col].notnull(), 'Buy'].sum()), 3) 
                  * 100)  for col in cols]
for c,r in zip(cols, res):
    print(f'{c}: {r}')

Prem 12/22: 3
Prem 12/23: 0
Prem 12/25: 6
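
As a possible alternative to the loop, the same per-column filtering can be sketched with a row-wise subtraction: subtracting Buy from each Prem column leaves NaN wherever the premium is missing, and sum() skips NaN by default. The data frame below is retyped from the question's table (only the numeric columns), and the names prem and returns are just illustrative:

```python
import pandas as pd

nan = float("nan")
# Data retyped from the question's table; z has no premium on 12/22 and 12/23
df = pd.DataFrame({
    "Ticker": ["x", "y", "z"],
    "Buy": [0.13, 0.33, 0.25],
    "Prem 12/22": [0.15, 0.34, nan],
    "Prem 12/23": [0.12, 0.34, nan],
    "Prem 12/25": [0.13, 0.39, 0.25],
})

# Select the premium columns by name
prem = df.filter(like="Prem")

# Row-wise (premium - Buy); rows with a missing premium stay NaN
# and are skipped by sum(), so their Buy value is not counted
returns = (prem.sub(df["Buy"], axis=0).sum() * 100).round().astype(int)

for col, value in returns.items():
    print(f"{col}: {value}")
```

Rounding before the integer conversion guards against floating-point noise such as 2.9999999 being truncated to 2.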

Your goal is to calculate the return of the Buy column. Suppose the data frame that contains the Buy column is called df; then you can simply call the following:

def returnBuy(df):
    tsBuy=df["Buy"]
    return tsBuy.pct_change()
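
For reference, here is a small sketch of what pct_change() returns on the question's Buy values. It gives the fractional change between consecutive rows (x to y, then y to z), so the first entry is NaN. The Series is retyped from the question's table:

```python
import pandas as pd

# Buy prices for x, y, z, retyped from the question's table
buy = pd.Series([0.13, 0.33, 0.25], name="Buy")

# Fractional change from each row to the next; the first row has no predecessor
changes = buy.pct_change()
print(changes)
```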

Maybe this: you could create a boolean column that flags NA/NaN values, then take the rows without missing data and sum them:

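import pandas as pd

data = {'col_1': [1, 2, 3, 4], 'col_2': [5, 7, pd.NA, 8]}
df = pd.DataFrame.from_dict(data)
# Flag rows where col_2 is missing (True means the row is left out of the sum)
df['is_missing'] = pd.isna(df["col_2"])
print(df)
# Keep only the rows where col_2 is present, then sum each column
sums = df[~df['is_missing']].sum(axis=0)
print(sums)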

or in one line:

sums= df[~pd.isna(df["col_2"])].sum(axis=0)
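
The same filter can also be written with notna() or with dropna(subset=...). The sketch below reuses the toy data from this answer and shows that both forms give the same sums; the names mask, sums_mask and sums_dropna are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"col_1": [1, 2, 3, 4], "col_2": [5, 7, pd.NA, 8]})

# Boolean mask: True where col_2 has a value
mask = df["col_2"].notna()
sums_mask = df[mask].sum(axis=0)

# Equivalent: drop the rows where col_2 is missing, then sum
sums_dropna = df.dropna(subset=["col_2"]).sum(axis=0)

print(sums_mask)
print(sums_dropna)
```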
