I have a python dataframe with columns, 'Expected' vs 'Actual' that shows a product (A,B,C or D) for each record
ID | Expected | Actual |
---|---|---|
1 | A | B |
2 | A | A |
3 | C | B |
4 | B | D |
5 | C | D |
6 | A | A |
7 | B | B |
8 | A | D |
I want to get a count from both columns for each unique value found in both columns (both columns dont share all the same products). So the result should look like this,
Value | Expected | Actual |
---|---|---|
A | 4 | 2 |
B | 2 | 3 |
C | 2 | 0 |
D | 0 | 3 |
Thank you for all your help
I would do it following way
import pandas as pd
df = pd.DataFrame({'Expected':['A','A','C','B','C','A','B','A'],'Actual':['B','A','B','D','D','A','B','D']})
ecnt = df['Expected'].value_counts()
acnt = df['Actual'].value_counts()
known = sorted(set(df['Expected']).union(df['Actual']))
cntdf = pd.DataFrame({'Value':known,'Expected':[ecnt.get(k,0) for k in known],'Actual':[acnt.get(k,0) for k in known]})
print(cntdf)
output
Value Expected Actual
0 A 4 2
1 B 2 3
2 C 2 0
3 D 0 3
Explanation: main idea here is having separate value counts for Expected
column and Actual
column. If you wish to rather have Value as Index of your pandas.DataFrame
you can do
...
cntdf = pd.DataFrame([acnt,ecnt]).T.fillna(0)
print(cntdf)
output
Actual Expected
D 3.0 0.0
B 3.0 2.0
A 2.0 4.0
C 0.0 2.0
You can use apply
and value_counts
df = pd.DataFrame({'Expected':['A','A','C','B','C','A','B','A'],'Actual':['B','A','B','D','D','A','B','D']})
df.apply(pd.Series.value_counts).fillna(0)
output:
Expected Actual
A 4.0 2.0
B 2.0 3.0
C 2.0 0.0
D 0.0 3.0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.