Calculate t-test statistic for each group in pandas DataFrame

Question

Given a pandas DataFrame with columns group, x, and y (multiple records per group value), I'd like to create a new DataFrame with one row per group, holding the t-statistic that compares the x and y values in that group. I'd like to do this with groupby, not a loop.

Example:

import pandas as pd
import numpy as np
from scipy import stats

N = 100  # Observations per group.
tt_df = pd.DataFrame({'group': np.append(['A'] * N, ['B'] * N),
                      'x': np.random.randn(2 * N)})
tt_df['y'] = tt_df['x'] + np.random.randn(2 * N)
stats.ttest_ind(tt_df['x'], tt_df['y'])[0]  # Global t statistic, e.g. -0.32 (varies per run; no seed is set).

Answer 1

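# Select x and y explicitly so the grouping column is not passed to the lambda
# (recent pandas versions warn when apply operates on the grouping column).
tt_df.groupby('group')[['x', 'y']].apply(lambda df: stats.ttest_ind(df['x'], df['y'])[0])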
# group
# A   -0.292413
# B   -0.167816
# dtype: float64
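
The call above returns a Series, while the question asks for a DataFrame with one row per group. One way to get that, sketched here under a few assumptions, is to have the applied function return a pd.Series, which groupby expands into columns. The helper name group_ttest, the column names t_stat and p_value, and the fixed seed are illustrative choices and not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Rebuild the example data. The fixed seed is an assumption added for reproducibility.
rng = np.random.default_rng(0)
N = 100  # Observations per group.
tt_df = pd.DataFrame({'group': np.repeat(['A', 'B'], N),
                      'x': rng.standard_normal(2 * N)})
tt_df['y'] = tt_df['x'] + rng.standard_normal(2 * N)


def group_ttest(df):
    """Hypothetical helper: independent two-sample t-test of x vs. y in one group."""
    t_stat, p_value = stats.ttest_ind(df['x'], df['y'])
    return pd.Series({'t_stat': t_stat, 'p_value': p_value})


# Returning a Series from the applied function yields one row per group
# and one column per Series entry.
result = tt_df.groupby('group')[['x', 'y']].apply(group_ttest)
print(result)
```

Calling result.reset_index() turns the group index into an ordinary column if that layout is preferred.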

solution1
2 2018-01-25 02:04:13