How to sample from Pandas DataFrame while keeping row order

Given any 2-dimensional DataFrame, you can call e.g. df.sample(frac=0.3) to retrieve a sample. But this sample will have completely shuffled row order.

Is there a simple way to get a subsample that preserves the row order?

What we can do instead is use df.sample(), and then sort the resulting rows back into the original row order. Appending a sort_index() call does the trick. Here's my code:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(100, 10))
result = df.sample(frac=0.3).sort_index()

The result comes back in ascending index order, which is the sort_index() default. See the pandas documentation for DataFrame.sort_index.
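A quick way to check this for yourself is sketched below. The random_state=0 argument is my addition so the run is repeatable; it is not part of the answer above. Note that this check only makes sense because the DataFrame has the default 0..99 index.

```python
import numpy as np
import pandas as pd

# DataFrame with the default RangeIndex (0..99), as in the answer above
df = pd.DataFrame(np.random.randn(100, 10))

# random_state is only here to make the run repeatable (an assumption,
# not part of the original answer)
sample = df.sample(frac=0.3, random_state=0).sort_index()

# The sampled index labels should now be in increasing order
print(len(sample))
print(sample.index.is_monotonic_increasing)
```

Without the sort_index() call, the same check on the index would normally report False, since sample() returns rows in shuffled order.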

The way the question is phrased, it sounds like the accepted answer does not provide a valid solution. I'm not sure what the OP really wanted; however, if we don't assume the original index is already sorted, we can't rely on sort_index() to reorder the rows according to their original order.

Assuming we have a DataFrame with an arbitrary index:

df = pd.DataFrame(np.random.randn(100, 10), index=np.random.rand(100))

We can reset the index first to get a RangeIndex, then sample, reorder, and reinstate the original index:

df_sample = df.reset_index().sample(frac=0.3).sort_index().set_index("index")

This guarantees we maintain the original order, whatever it was and whatever the index.
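Here is a small sketch to confirm the order really is preserved with an unsorted index. The seeded generator, random_state=0, and the get_indexer() check are my additions for illustration, and the check assumes the index labels are unique (random floats almost certainly are).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the run is repeatable

# DataFrame whose index is a set of unsorted random floats
df = pd.DataFrame(rng.standard_normal((100, 10)), index=rng.random(100))

df_sample = (
    df.reset_index()                    # old index becomes a column named "index"
      .sample(frac=0.3, random_state=0)
      .sort_index()                     # sorts by original row position
      .set_index("index")               # put the original labels back
)

# Look up where each sampled label sat in the original frame
# (get_indexer needs unique labels)
positions = df.index.get_indexer(df_sample.index)
print((np.diff(positions) > 0).all())
```

One side effect to be aware of: after set_index("index") the index is named "index", whereas the original index had no name. df_sample.rename_axis(None) removes the name if that matters.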

Finally, in case there's already a column named "index", we'll need to do something slightly different, such as renaming the index first or keeping it in a separate variable while we sample. But the principle remains the same.
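As a rough sketch of the "separate variable" variant: the example data, the variable names, and random_state=0 below are all made up for illustration. The original index is kept aside, dropped while sampling, and re-attached by position afterwards.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the run is repeatable

# Hypothetical data: a column that is itself called "index",
# plus an unsorted row index
df = pd.DataFrame(
    {"index": rng.integers(0, 5, size=100), "value": rng.standard_normal(100)},
    index=rng.permutation(100),
)

original_index = df.index  # keep the real index in a separate variable

# drop=True discards the index instead of turning it into a column,
# so there is no clash with the existing "index" column
sampled = df.reset_index(drop=True).sample(frac=0.3, random_state=0).sort_index()

# After sort_index(), the labels are the original row positions
positions = sampled.index.to_numpy()

df_sample = sampled.copy()
df_sample.index = original_index.take(positions)  # reinstate original labels
```

Since nothing here looks labels up in the index, this version should also cope with duplicate index labels.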
