Counting/Pivot of Table With Multiple Values In Cell
I've got some data that looks like this:
Class Instructor
Intro to Philosophy Jake
Algorithms Ashley/Jake
Spanish I Ashley
Vector Calculus Jake
Intro to Philosophy Jake
How can I get to a count or pivot like the one below, where a class taught by both Ashley and Jake is counted once for each of them? A single instructor per cell is trivial, but two or more instructors for one class in the same cell is tripping me up.
I'd like to get to something like this:
Jake Ashley
Intro to Philosophy 2 0
Algorithms 1 1
Spanish I 0 1
Vector Calculus 1 0
Total 4 2
You can use .str.get_dummies to split the Instructor field on '/' and binarize it. Then you can group by Class and sum:
ret = (df['Instructor'].str.get_dummies('/')
.groupby(df['Class']).sum()
)
ret.loc['Total'] = ret.sum()
Output:
Ashley Jake
Class
Algorithms 1 1
Intro to Philosophy 0 2
Spanish I 1 0
Vector Calculus 0 1
Total 2 4
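For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt by hand from the table in the question, and the variable names dummies and counts are only illustrative, not from the original answer.

```python
import pandas as pd

# Sample data copied from the question; instructors sharing a class are joined by "/".
df = pd.DataFrame({
    "Class": ["Intro to Philosophy", "Algorithms", "Spanish I",
              "Vector Calculus", "Intro to Philosophy"],
    "Instructor": ["Jake", "Ashley/Jake", "Ashley", "Jake", "Jake"],
})

# One 0/1 indicator column per instructor name found after splitting on "/".
dummies = df["Instructor"].str.get_dummies("/")

# Sum the indicators per class, then append a row of column totals.
counts = dummies.groupby(df["Class"]).sum()
counts.loc["Total"] = counts.sum()

print(counts)
```

Because the indicators are 0/1, a class listed on several rows (Intro to Philosophy here) adds up to the number of times each instructor appears for it.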
You can do this:
In [1746]: df.Instructor = df.Instructor.str.split('/')
In [1747]: df = df.explode('Instructor')
In [1751]: x = df.groupby('Instructor').Class.value_counts().reset_index(level=0).pivot(columns='Instructor', values='Class').fillna(0)
In [1754]: x.loc['Total'] = x.sum()
In [1755]: x
Out[1755]:
Instructor Ashley Jake
Class
Algorithms 1.0 1.0
Intro_to_Philosophy 0.0 2.0
Spanish_I 1.0 0.0
Vector_Calculus 0.0 1.0
Total 2.0 4.0
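The counts above come out as floats because fillna(0) runs after the pivot has introduced NaN. A possible variation, sketched here under the assumption that integer counts are preferred, counts the exploded rows with groupby(...).size() and fills the gaps during unstack instead. The names long and table are illustrative, and the DataFrame is rebuilt from the question's table (with spaces rather than underscores in the class names).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class": ["Intro to Philosophy", "Algorithms", "Spanish I",
              "Vector Calculus", "Intro to Philosophy"],
    "Instructor": ["Jake", "Ashley/Jake", "Ashley", "Jake", "Jake"],
})

# Split "Ashley/Jake" into a list, then give each instructor a row of their own.
long = df.assign(Instructor=df["Instructor"].str.split("/")).explode("Instructor")

# Count rows per (Class, Instructor) pair and spread instructors into columns;
# fill_value=0 fills missing pairs without going through NaN.
table = long.groupby(["Class", "Instructor"]).size().unstack(fill_value=0)
table.loc["Total"] = table.sum()

print(table)
```

Using assign leaves the original df untouched, which may be convenient if the slash-separated column is still needed later.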
Let us do crosstab after explode:
df.Instructor = df.Instructor.str.split('/')
df = df.explode('Instructor')
out = pd.crosstab(df['Class'], df['Instructor'])
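The snippet above gives the per-class counts but not the Total row the question asks for. One way to get it, sketched here as an assumption about what is wanted rather than as part of the original answer, is crosstab's margins option. It also adds a Total column of row sums, which this sketch drops to match the requested layout. The names long and out_total are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Class": ["Intro to Philosophy", "Algorithms", "Spanish I",
              "Vector Calculus", "Intro to Philosophy"],
    "Instructor": ["Jake", "Ashley/Jake", "Ashley", "Jake", "Jake"],
})

# One row per (class, instructor) pair.
long = df.assign(Instructor=df["Instructor"].str.split("/")).explode("Instructor")

# margins=True appends a "Total" row (column sums) and a "Total" column (row sums).
out_total = pd.crosstab(long["Class"], long["Instructor"],
                        margins=True, margins_name="Total")

# Keep only the Total row; the question's layout has no row-sum column.
out_total = out_total.drop(columns="Total")

print(out_total)
```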