How to sample from Pandas DataFrame while keeping row order

Question

Given any 2-dimensional DataFrame, you can call e.g. df.sample(frac=0.3) to retrieve a sample. But the rows of this sample come back in shuffled order.

Is there a simple way to get a subsample that preserves the row order?

Answer 1

What we can do instead is call df.sample() and then sort the result by its index, which restores the original row order when the index was sorted to begin with. Appending a sort_index() call does the trick. Here's my code:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(100, 10))
result = df.sample(frac=0.3).sort_index()

sort_index() sorts in ascending order by default; pass ascending=False to reverse it. See the pandas documentation for DataFrame.sort_index.
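As a quick check of the approach above, here is a minimal sketch assuming the default RangeIndex. The random_state value is an arbitrary choice added here only to make the run reproducible; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# DataFrame with the default RangeIndex 0..99, as in the answer above.
df = pd.DataFrame(np.random.randn(100, 10))

# Sample 30% of the rows, then sort by index to undo the shuffling.
# random_state=0 is an assumption made here for reproducibility.
result = df.sample(frac=0.3, random_state=0).sort_index()

# The index labels are increasing again, i.e. the rows are in original order.
print(result.index.is_monotonic_increasing)
print(len(result))
```

This only works because the index was sorted before sampling; the next answer covers the general case.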

Answer 2

The way the question is phrased, it sounds like the accepted answer does not provide a valid solution. I'm not sure what the OP really wanted; however, if we don't assume the original index is already sorted, we can't rely on sort_index() to reorder the rows according to their original order.

Assume we have a DataFrame with an arbitrary index:

df = pd.DataFrame(np.random.randn(100, 10), np.random.rand(100))

We can reset the index first to get a RangeIndex, then sample, reorder, and reinstate the original index:

df_sample = df.reset_index().sample(frac=0.3).sort_index().set_index("index")

And this guarantees we maintain the original order, whatever it was, whatever the index.
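To convince yourself that the order really is preserved, here is a small sketch that compares the sampled rows against their positions in the original frame. The seeded generator and the random_state value are assumptions added for reproducibility, and the check relies on the random float index being unique.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption made here so the run is reproducible).
rng = np.random.default_rng(0)

# DataFrame with an arbitrary, unsorted float index.
df = pd.DataFrame(rng.standard_normal((100, 10)), index=rng.random(100))

# Reset to a RangeIndex, sample, restore row order, reinstate the old index.
df_sample = (
    df.reset_index()
      .sample(frac=0.3, random_state=0)
      .sort_index()
      .set_index("index")
)

# Position of each sampled label in the original frame.
positions = df.index.get_indexer(df_sample.index)

# Strictly increasing positions mean the original row order was kept.
print((np.diff(positions) > 0).all())
```

Note that set_index("index") leaves the index named "index"; if the original index had no name, rename_axis(None) removes it again.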

Finally, in case there's already a column named "index", we'll need to do something slightly different such as rename the index first, or keep it in a separate variable while we sample. But the principle remains the same.
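One way to sidestep the column-name clash altogether is to sample row positions instead of labels. This is a different technique from the reset_index() chain above, sketched under the assumption that sampling without replacement is what's wanted; sample_keep_order is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

def sample_keep_order(df, frac, seed=None):
    """Hypothetical helper: sample a fraction of rows, keeping their order."""
    rng = np.random.default_rng(seed)
    n = round(frac * len(df))
    # Draw distinct row positions, then sort them to keep the original order.
    positions = np.sort(rng.choice(len(df), size=n, replace=False))
    return df.iloc[positions]

# Example with a column literally named "index" and duplicate index labels.
df = pd.DataFrame(
    {"index": range(10), "value": list("abcdefghij")},
    index=[3, 1, 3, 0, 2, 2, 9, 7, 5, 5],
)
out = sample_keep_order(df, 0.5, seed=1)
print(out)
```

Because it never touches the index or the column names, this also works when the index contains duplicate labels.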

2 answers

solution1
3 ACCEPTED 2020-01-04 20:27:55

solution2
0 2020-12-21 12:32:40