Shortest way to split a pandas DataFrame column by another column

Question

Inspiration

In R, this is very easy:

data("iris")
bartlett.test(Sepal.Length ~ Species,data = iris)

The important thing about the data set is that the column Sepal.Length is numerical and Species is categorical.

Problem

In Python, scipy.stats.bartlett needs a separate array for each species (see the docs).
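For reference, here is a minimal sketch of the call shape scipy.stats.bartlett expects: each sample is passed as its own positional argument, with no formula-plus-data form as in R. The three lists are illustrative numbers chosen for the example, not a full species column.

```python
from scipy import stats as ss

# Illustrative measurements for three groups (assumed values, not the full data set)
setosa = [5.1, 4.9, 4.7, 4.6, 5.0]
versicolor = [7.0, 6.4, 6.9, 5.5, 6.5]
virginica = [6.3, 5.8, 7.1, 6.3, 6.5]

# One positional argument per sample
result = ss.bartlett(setosa, versicolor, virginica)
print(result.statistic, result.pvalue)
```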

What would be the easiest way to achieve this?

An easy way to get the dataset in Python:

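import numpy as np
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
# 'species' holds the numeric target codes (0.0, 1.0, 2.0), not names
iris = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=["sepal.length", "sepal.width", "petal.length", "petal.width"] + ['species'])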

I really wanted this to work:

import scipy.stats as ss
iris.groupby("species")["sepal.length"].apply(ss.bartlett)

but it didn't, because bartlett needs multiple sample vectors in a single call, while apply calls it once per group with only one sample.
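A small sketch reproducing the failure: since apply hands bartlett one group at a time, calling bartlett by hand on a single group shows what goes wrong. The two-species frame and the labels "a" and "b" are hypothetical stand-ins for the iris data.

```python
import pandas as pd
from scipy import stats as ss

# Hypothetical mini data set using the question's column names
df = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b"],
    "sepal.length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9],
})
gb = df.groupby("species")["sepal.length"]

# bartlett with a single sample is what apply effectively does per group;
# scipy rejects it because the test compares at least two samples.
try:
    ss.bartlett(gb.get_group("a"))
    error = None
except ValueError as exc:
    error = exc
print(error)
```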

Answer 1

Following the groupby pattern, you can do a bit of manipulation and write this:

import scipy.stats as ss
gb = iris.groupby('species')["sepal.length"]
ss.bartlett(*[gb.get_group(x).values for x in gb.groups])

The * unpacks the list into the function's positional arguments; the rest just gets the groups into the right form for the function to take. As mentioned in the comments, the .values isn't needed here, so we can write it as:

gb = iris.groupby('species')["sepal.length"]
ss.bartlett(*[gb.get_group(x) for x in gb.groups])
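To check the two claims above (that * is the same as listing every group by hand, and that .values makes no difference), here is a sketch on a hypothetical three-group frame; the labels "a", "b", "c" and the numbers are illustrative assumptions.

```python
import pandas as pd
from scipy import stats as ss

# Hypothetical data: three groups of four measurements each
df = pd.DataFrame({
    "species": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "sepal.length": [5.1, 4.9, 4.7, 4.6, 7.0, 6.4, 6.9, 5.5, 6.3, 5.8, 7.1, 6.3],
})
gb = df.groupby("species")["sepal.length"]

# Star form: the list of group Series becomes separate positional arguments
unpacked = ss.bartlett(*[gb.get_group(x) for x in gb.groups])
# The same call with every group written out explicitly
explicit = ss.bartlett(gb.get_group("a"), gb.get_group("b"), gb.get_group("c"))
# Passing NumPy arrays via .values instead of Series
with_values = ss.bartlett(*[gb.get_group(x).values for x in gb.groups])
print(unpacked, explicit, with_values)
```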

And just for completeness, if you really want to do it in one line:

ss.bartlett(*[x[1] for x in iris.groupby('species')["sepal.length"]])

But I personally find that less readable.
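The one-liner works because iterating over a groupby yields (group name, group values) pairs, and x[1] picks the values. One possible middle ground, sketched here on hypothetical data, is to unpack each pair with named variables so the intent is visible without a separate get_group call.

```python
import pandas as pd
from scipy import stats as ss

# Hypothetical data: three groups of four measurements each
df = pd.DataFrame({
    "species": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "sepal.length": [5.1, 4.9, 4.7, 4.6, 7.0, 6.4, 6.9, 5.5, 6.3, 5.8, 7.1, 6.3],
})
gb = df.groupby("species")["sepal.length"]

# Each iteration step gives (name, Series); keep only the Series
samples = [values for _name, values in gb]
result = ss.bartlett(*samples)
print(result)
```

This trades the x[1] indexing for tuple unpacking; the call that reaches bartlett is the same.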

Answer 1: 4 votes, accepted, 2018-10-16 17:36:53
