Pandas DataFrame: how to calculate a new column element based on multiple rows

Question

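I am currently trying to run a statistical test for specific rows based on the content of other rows. The dataframe is shown in the image below: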

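[Image: DataFrame]

I would like to create a new column based on a function that takes into account all rows of the dataframe that have the same string in the "Template" column.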

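For example, in this case there are two rows with the template "[Are | Off]". For each of these rows I need to create an element in a new column based on the Clicks, Impressions and Conversions of both rows.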

How would you best approach this problem?

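PS: I apologize in advance for the way I describe the problem. As you may have noticed, I am not a professional coder :D But I would really appreciate your help!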

This is the formula I used to solve this in Excel:

[Image: Excel chi-squared test]

Answer 1

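This may be overly general, but if different things should be done depending on the template name, I would use some kind of function map: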

import pandas as pd
import numpy as np
import collections

template_column = ['are|off', 'are|off', 'comp', 'comp', 'comp|city']
n = len(template_column)
df = pd.DataFrame(np.random.random((n, 3)), index=range(n), columns=['Clicks', 'Impressions', 'Conversions'])
df['template'] = template_column

# Use a defaultdict so that you can define a default value if a template is
# not defined
function_map = collections.defaultdict(lambda: lambda df: np.nan)

# Now define functions to compute what the new columns should do depending on
# the template.
function_map.update({
    'are|off': lambda df: df.sum().sum(),
    'comp': lambda df: df.mean().mean(),
    'something else': lambda df: df.mean().max()
})

# The lambda functions are just placeholders.  You could do whatever you want in these functions... for example:

def do_special_stuff(df):
    """Do something that uses rows and columns...
    you could also do looping or whatever you want as long
    as the result is a scalar, or a sequence with one value
    per row of the template's DataFrame
    """
    crazy_stuff = np.prod(np.sum(df.values,axis=1)[:,None] + 2*df.values, axis=1)
    return crazy_stuff

function_map['comp'] = do_special_stuff

def wrap(f):
    """Wrap a function so that it returns an updated dataframe"""

    def wrapped(df):
        df = df.copy()
        new_column_data = f(df.drop('template', axis=1))
        df['new_column'] = new_column_data
        return df

    return wrapped

# wrap all the functions so that each template has a function defined that does
# the correct thing
series_function_map = {k: wrap(function_map[k]) for k in df['template'].unique()}

# throw everything back together
new_df = pd.concat([series_function_map[label](group)
                    for label, group in df.groupby('template')],
                   ignore_index=True)

# print your shiny new dataframe
print(new_df)

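The result below was produced with the placeholder lambdas, before 'comp' is reassigned to do_special_stuff. The numbers will differ between runs because the data is random: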

     Clicks  Impressions  Conversions   template  new_column
0  0.959765     0.111648     0.769329    are|off    4.030594
1  0.809917     0.696348     0.683587    are|off    4.030594
2  0.265642     0.656780     0.182373       comp    0.502015
3  0.753788     0.175305     0.978205       comp    0.502015
4  0.269434     0.966951     0.478056  comp|city         NaN
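
As one possible thing to plug into such a per-template function: the question mentions a chi-squared test, so here is a minimal sketch that computes a Pearson chi-squared statistic per template using only numpy. The table layout (clicks vs. impressions without a click, one table row per DataFrame row), the sample numbers and the column name `chi2` are all my assumptions, since the Excel formula is only available as an image.

```python
import numpy as np
import pandas as pd

def chi2_statistic(group):
    """Pearson chi-squared statistic for a clicks / no-clicks table.

    Assumed layout (not taken from the question): one table row per
    DataFrame row, columns = [Clicks, Impressions - Clicks].
    """
    observed = np.column_stack([group["Clicks"],
                                group["Impressions"] - group["Clicks"]]).astype(float)
    if len(observed) < 2:
        return np.nan  # a single row has nothing to be compared against
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected counts under independence: row total * column total / grand total
    expected = row_totals * col_totals / observed.sum()
    return float(((observed - expected) ** 2 / expected).sum())

# Made-up sample data, only for illustration
df = pd.DataFrame({
    "Template": ["are|off", "are|off", "comp", "comp", "comp|city"],
    "Clicks": [10, 30, 20, 20, 8],
    "Impressions": [100, 100, 200, 200, 80],
})

# One statistic per template, then map it back onto every row of that template
stats = {label: chi2_statistic(group) for label, group in df.groupby("Template")}
df["chi2"] = df["Template"].map(stats)
print(df)
```

If you need a p-value rather than the raw statistic, a statistics library would be the better tool; this only shows how a value computed from all rows of a group can end up on each of those rows.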

Hope this helps!

Answer 2

OK, so after the groupby you need to apply this formula, and you can do that in pandas as well:

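import numpy as np
t = df.groupby("Template")  # group the rows that share a template
def calculater(b5, b6, c5, c6):
    return b5 / (b5 + b6) * (c5 + c6)
# A GroupBy object does not support item assignment, so store the result on df
df['result'] = np.vectorize(calculater)(df["b5"], df["b6"], df["c5"], df["c6"])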

Here b5, b6, ... are the column names for the cells shown in the image.

This should work for you, or it may need some small changes to the math.
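
If the cells in the formula are really values from different rows of the same template (rather than separate columns), the same calculation can be sketched with groupby and transform. This is only a sketch under assumptions: I am guessing that the b cells correspond to Clicks and the c cells to Impressions, and the sample numbers are made up.

```python
import pandas as pd

# Made-up sample data, only for illustration
df = pd.DataFrame({
    "Template": ["are|off", "are|off", "comp", "comp", "comp|city"],
    "Clicks": [10.0, 30.0, 5.0, 15.0, 8.0],
    "Impressions": [100.0, 300.0, 50.0, 50.0, 80.0],
})

grouped = df.groupby("Template")
# transform("sum") returns a Series aligned with df's index that holds,
# for every row, the total of that row's template group
clicks_total = grouped["Clicks"].transform("sum")            # b5 + b6
impressions_total = grouped["Impressions"].transform("sum")  # c5 + c6

# Row-wise version of b5 / (b5 + b6) * (c5 + c6)
df["result"] = df["Clicks"] / clicks_total * impressions_total
print(df)
```

Because transform keeps the original index, no concat or merge is needed to get the group totals back onto the rows.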

Answer 1
3 Accepted 2016-03-11 12:02:33

Answer 2
2 2016-03-11 10:59:09
