I am working with a dataframe and want to run a Wilcoxon rank-sum test (scipy.stats.ranksums) on it in Python.
Group Count
B 21
B 13
A 25
A 75
A 11
B 15
As long as there is only a single category, the test works fine:
import pandas as pd
import scipy.stats as stats
valuespergroup = [col for col_name, col in df.groupby('Group')['Count']]
stats.ranksums(*valuespergroup)
Now, consider the following:
Category Group Count
S1 P 21
S1 P 13
S1 A 25
S1 A 75
S1 A 10
S1 P 10
S2 P 21
S2 P 14
S2 A 29
S2 A 95
S2 A 15
S2 P 18
I need to run the test per category, meaning the data for S1 is passed first, then S2, and so on. I tried adding Category to the groupby, but that does not work. The function takes only two arguments.
Update: I tried the following code, but it prints the entire data for each category, and I don't think the data is passed to the test correctly either. It is along the lines of what I want to do. The final output should be:
S1 test results
S2 test results
groupby_Category = df.groupby('Category')
for Category in groupby_Category:
    values_per_group = [col for col_name, col in df3.groupby(['Group'])['Count']]
    print(Category, stats.ranksums(*values_per_group))
It seems you need to group by both 'Group' and 'Category':
for x, y in df.groupby(['Group', 'Category'])['Count']:
    print(x, y)
('A', 'S1') 2 25
3 75
4 10
Name: Count, dtype: int64
('A', 'S2') 8 29
9 95
10 15
Name: Count, dtype: int64
('P', 'S1') 0 21
1 13
5 10
Name: Count, dtype: int64
('P', 'S2') 6 21
7 14
11 18
Name: Count, dtype: int64
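Grouping by both keys yields four separate series, so they still have to be paired up per category before being handed to the test. The following is a minimal sketch of one way to do that, not part of the original answer. It assumes the group labels are exactly 'A' and 'P' as in the question's data; the names series_by_key and results are made up for illustration.

```python
import pandas as pd
from scipy import stats

# Sample data copied from the question.
df = pd.DataFrame({
    "Category": ["S1"] * 6 + ["S2"] * 6,
    "Group": ["P", "P", "A", "A", "A", "P"] * 2,
    "Count": [21, 13, 25, 75, 10, 10, 21, 14, 29, 95, 15, 18],
})

# Store each (Group, Category) series in a dict keyed by that tuple.
series_by_key = {
    key: counts for key, counts in df.groupby(["Group", "Category"])["Count"]
}

# Pair the 'A' and 'P' series of each category and run the test on the pair.
# Assumption: every category contains both an 'A' and a 'P' group.
results = {}
for category in df["Category"].unique():
    res = stats.ranksums(
        series_by_key[("A", category)],
        series_by_key[("P", category)],
    )
    results[category] = res
    print(category, res.statistic, res.pvalue)
```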
Your attempt is close to working. However, it uses an undefined df3. Simply replace it with the sub-dataframe that each iteration of your groupby object yields, sub_df below. You can also extend the loop to collect the results in a list of dictionaries and build a dataframe from it.
groupby_Category = df.groupby('Category')
data_list = []
for i, sub_df in groupby_Category:
    values_per_group = [col for col_name, col in sub_df.groupby(['Group'])['Count']]
    res = stats.ranksums(*values_per_group)
    print(i, res)
    # S1 RanksumsResult(statistic=0.8728715609439696, pvalue=0.38273308888522595)
    # S2 RanksumsResult(statistic=1.091089451179962, pvalue=0.27523352407483426)
    data_list.append({'Category': i, 'statistic': res[0], 'p_value': res[1]})
ranksums_df = pd.DataFrame(data_list)
print(ranksums_df)
#   Category   p_value  statistic
# 0       S1  0.382733   0.872872
# 1       S2  0.275234   1.091089
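Since ranksums takes exactly two samples, the loop above relies on every category containing exactly two groups. As a hedged variation, the same idea can be wrapped in a helper that skips any category where that does not hold. The function name ranksums_by_category, its parameters, and the skip-on-mismatch policy are assumptions made for this sketch, not something from the original answer.

```python
import pandas as pd
from scipy import stats

# Sample data copied from the question.
df = pd.DataFrame({
    "Category": ["S1"] * 6 + ["S2"] * 6,
    "Group": ["P", "P", "A", "A", "A", "P"] * 2,
    "Count": [21, 13, 25, 75, 10, 10, 21, 14, 29, 95, 15, 18],
})

def ranksums_by_category(frame, category_col="Category",
                         group_col="Group", value_col="Count"):
    """Run the rank-sum test once per category (hypothetical helper)."""
    rows = []
    for category, sub_df in frame.groupby(category_col):
        # Map each group label to its series of values within this category.
        samples = {name: vals for name, vals in sub_df.groupby(group_col)[value_col]}
        if len(samples) != 2:
            # ranksums compares exactly two samples, so skip anything else.
            continue
        # Sort the labels so the sample order is explicit and reproducible.
        first, second = sorted(samples)
        res = stats.ranksums(samples[first], samples[second])
        rows.append({
            "Category": category,
            "groups": f"{first} vs {second}",
            "statistic": res.statistic,
            "p_value": res.pvalue,
        })
    return pd.DataFrame(rows)

ranksums_df = ranksums_by_category(df)
print(ranksums_df)
```

Sorting the labels puts 'A' before 'P', which matches the order pandas' groupby produces by default, so the sign of the statistic agrees with the output shown above.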