Crosstab of three columns

Question

I have a DataFrame similar to the one below, which extends to about 20,000 rows.

Colors can be Blue, Yellow, Green, or Red.

Values in the BIG, MED, and SM columns can be FN, FP, TP, or blank (a single space, ' ', in the sample below).

import pandas as pd

df = pd.DataFrame({'Color': ['Blue', 'Yellow', 'Green', 'Red', 'Yellow', 'Green'],
                   'BIG': ['FN', ' ', 'FP', ' ', ' ', 'FN'],
                   'MED': ['FP', ' ', 'FN', ' ', 'TP', ' '],
                   'SM' : [' ', 'TP', ' ', ' ', ' ', 'FP']})

What I would like is a count for each combination of color, size column (BIG, MED, SM), and value.

Example: Blue/BIG/TP = 105 means 105 rows have Color == 'Blue' and BIG == 'TP'. The counts in the table below come from the full dataset, not the sample.

| Color |BIG_TP|BIG_FN|BIG_FP|MED_TP|MED_FN|MED_FP|SM_TP|SM_FN|SM_FP|   
|:-----:|:----:|:----:|:----:|:----:|:----:|:----:|:---:|:---:|:---:|   
|Blue   | 105  |   35 |  42  | 199  |   75 |  49  | 115 | 135 |  13 |
|Yellow |  85  |    5 |  23  |  05  |  111 |  68  |  99 |  42 |  42 |
|Green  | 365  |   66 |  74  |  35  |    2 |  31  | 207 | 190 |  61 |
|Red    | 245  |    3 |  8   |  25  |    7 |  49  |   7 |  55 |  69 |
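One way to get exactly this layout, including combinations that never occur in the data (which would otherwise simply be missing columns or rows), is to melt the frame to long format, drop blanks, crosstab, and then reindex. This is a sketch that swaps in `DataFrame.melt` for reshaping; it assumes blanks are whitespace-only strings, as in the sample, and that the nine column labels are the ones shown in the table above.

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Blue', 'Yellow', 'Green', 'Red', 'Yellow', 'Green'],
                   'BIG': ['FN', ' ', 'FP', ' ', ' ', 'FN'],
                   'MED': ['FP', ' ', 'FN', ' ', 'TP', ' '],
                   'SM': [' ', 'TP', ' ', ' ', ' ', 'FP']})

# Long format: one row per (Color, size column, value) cell
long = df.melt(id_vars='Color', var_name='size', value_name='value')
# Drop blank cells (whitespace-only strings in the sample)
long = long[long['value'].str.strip() != '']

# Count every (size, value) pair per color; columns are a MultiIndex
ct = pd.crosstab(long['Color'], [long['size'], long['value']])

# Force all 9 size/value columns in the table's order and keep every color
# (Red has only blanks, so it was dropped above); absent combos become 0
cols = pd.MultiIndex.from_product([['BIG', 'MED', 'SM'], ['TP', 'FN', 'FP']])
ct = ct.reindex(index=pd.unique(df['Color']), columns=cols, fill_value=0)

# Flatten ('BIG', 'TP') -> 'BIG_TP'
ct.columns = ct.columns.map('_'.join)
print(ct)
```

The reindex step is what guarantees a fixed 9-column shape regardless of which combinations happen to appear in a given dataset.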

What I've tried:

color_summary = pd.crosstab(index=[df['Color']], columns= [df['BIG'], df['MED'], df['SM']], values=[df[df['BIG']], df[df['MED']], df[df['SM']]], aggfunc=sum)

This was not very close to what I was looking for. I did manage to get the solution in a totally roundabout, nasty way with lots of repetition (shown below for the BIG column only). I'm looking for a much more concise solution, perhaps using crosstab.

test_1 = df['BIG']=='TP'
test_2 = df['BIG']=='FN'
test_3 = df['BIG']=='FP'

big_tp = pd.crosstab(df['Color'], [df.loc[test_1, 'BIG']])
big_fn = pd.crosstab(df['Color'], [df.loc[test_2, 'BIG']])
big_fp = pd.crosstab(df['Color'], [df.loc[test_3, 'BIG']])

big_tp_df = pd.DataFrame(big_tp.to_records())
big_fn_df = pd.DataFrame(big_fn.to_records())
big_fp_df = pd.DataFrame(big_fp.to_records())

Big_TP = pd.Series(big_tp_df.TP.values, index=big_tp_df.Color).to_dict()
Big_FN = pd.Series(big_fn_df.FN.values, index=big_fn_df.Color).to_dict()
Big_FP = pd.Series(big_fp_df.FP.values, index=big_fp_df.Color).to_dict()

a = pd.Series(Big_TP, name='BIG_TP')
b = pd.Series(Big_FN, name='BIG_FN')
c = pd.Series(Big_FP, name='BIG_FP')

a.index.name = 'Color'
b.index.name = 'Color'
c.index.name = 'Color'

a.reset_index()
b.reset_index()
c.reset_index()

color_summary = pd.DataFrame(columns=['Color'])

color_summary['Color'] = big_tp_df['Color']

color_summary = pd.merge(color_summary, a, on='Color')
color_summary = pd.merge(color_summary, b, on='Color')
color_summary = pd.merge(color_summary, c, on='Color')

color_summary.head()
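For the BIG column alone, the three filters, three crosstabs, and three merges above collapse into a single `pd.crosstab` call. A minimal sketch on the sample data, assuming blanks are whitespace-only strings:

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Blue', 'Yellow', 'Green', 'Red', 'Yellow', 'Green'],
                   'BIG': ['FN', ' ', 'FP', ' ', ' ', 'FN'],
                   'MED': ['FP', ' ', 'FN', ' ', 'TP', ' '],
                   'SM': [' ', 'TP', ' ', ' ', ' ', 'FP']})

# Keep only rows with a non-blank BIG value
big = df[df['BIG'].str.strip() != '']

# One crosstab counts every BIG value per color; add_prefix renames FN -> BIG_FN
big_summary = pd.crosstab(big['Color'], big['BIG']).add_prefix('BIG_')
print(big_summary)
```

Like the inner merges above, this only lists colors that have at least one non-blank BIG value.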

Answer 1

Try this. I ran it on the sample you shared: it reshapes the frame with df.unstack and then counts with pd.crosstab.

import pandas as pd

df = pd.DataFrame({'Color': ['Blue', 'Yellow', 'Green', 'Red', 'Yellow', 'Green'],
                   'BIG': ['FN', ' ', 'FP', ' ', ' ', 'FN'],
                   'MED': ['FP', ' ', 'FN', ' ', 'TP', ' '],
                   'SM' : [' ', 'TP', ' ', ' ', ' ', 'FP']})

#Unstack to long format: one row per (size column, color) cell, with 3 columns: size, color, f (the value)
ddf = pd.DataFrame(df.set_index('Color').unstack()).reset_index().set_axis(['size','color','f'], axis=1)

#Create crosstab with multiindex columns
ct = pd.crosstab(ddf['color'], [ddf['size'], ddf['f']])

#Join each (size, value) column tuple into a single label such as 'BIG_FN'
ct.columns = ct.columns.map('_'.join)

#Drop the blank-value columns ('BIG_ ', 'MED_ ', 'SM_ ') and keep only the FN/FP/TP counts
out = ct.reset_index().drop(ddf['size'].unique()+'_ ', axis=1)
print(out)

    color  BIG_FN  BIG_FP  MED_FN  MED_FP  MED_TP  SM_FP  SM_TP
0    Blue       1       0       0       1       0      0      0
1   Green       1       1       1       0       0      1      0
2     Red       0       0       0       0       0      0      0
3  Yellow       0       0       0       0       1      0      1
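A shorter alternative, not from the original answer, is `pd.get_dummies`: applied to a DataFrame it names its indicator columns `<column>_<value>` (for example `BIG_FN`), so summing the indicators per color gives the same counts as above. A sketch on the sample data, assuming whitespace-only cells should be treated as missing:

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Blue', 'Yellow', 'Green', 'Red', 'Yellow', 'Green'],
                   'BIG': ['FN', ' ', 'FP', ' ', ' ', 'FN'],
                   'MED': ['FP', ' ', 'FN', ' ', 'TP', ' '],
                   'SM': [' ', 'TP', ' ', ' ', ' ', 'FP']})

# Strip whitespace, then turn empty strings into NaN so get_dummies ignores them
stripped = df.set_index('Color').apply(lambda s: s.str.strip())
clean = stripped.where(stripped != '')

# One 0/1 column per (column, value) pair, e.g. BIG_FN, MED_TP
dummies = pd.get_dummies(clean, dtype=int)

# Sum the indicators per color; Red keeps a row of zeros
summary = dummies.groupby(level=0).sum()
print(summary)
```

As with the crosstab answer, combinations absent from the data (such as BIG_TP in the sample) get no column; `summary.reindex(columns=[...], fill_value=0)` can add them if a fixed set of columns is needed.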

0 2020-08-08 01:04:55