How to restrict the rows of a DataFrame to the first X unique values in a certain column?

Question

Say for example we have the following DataFrame:

Suppose we know that we want only x (say 3) unique values in column A. Then the desired output would be:

I thought about looping through the column in question, keeping track of the unique values seen so far, and then taking the subset of the DataFrame with the matching index. I am still new to Python and I believe there is a more efficient way to do this. Please share your solutions. Appreciated!
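For reference, the loop described above could look roughly like the sketch below. The sample data and the helper name first_n_unique_loop are made up for illustration, since the original DataFrame is not shown here.

```python
import pandas as pd

# Hypothetical sample data; the DataFrame from the question is not shown.
df = pd.DataFrame({
    "A": ["a", "a", "b", "c", "b", "d", "e"],
    "B": range(7),
})


def first_n_unique_loop(frame, column, n):
    """Keep rows whose value in `column` is among the first n distinct values."""
    seen = []        # distinct values, in order of first appearance
    keep_index = []  # index labels of the rows to keep
    for idx, value in frame[column].items():
        if value not in seen:
            if len(seen) >= n:
                continue  # a new value beyond the first n: skip this row
            seen.append(value)
        keep_index.append(idx)
    return frame.loc[keep_index]


looped = first_n_unique_loop(df, "A", 3)
print(looped)
```

This works, but it visits every row in Python, which is slow on large frames compared with a vectorized approach.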

Answer 1

You can try Series.factorize, which assigns an integer code to each unique value in order of first appearance, starting at 0. Then select the rows whose code is <= n-1 (because the codes start at 0). This also preserves the original row order:

n = 3
# factorize()[0] holds the integer codes; keep rows with codes 0 .. n-1
df[df['A'].factorize()[0] <= n - 1]
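A small worked example may make the codes easier to see. The data below is an assumption for illustration (the question's DataFrame is not shown), so treat it as a sketch rather than the asker's exact case.

```python
import pandas as pd

# Hypothetical sample data; the DataFrame from the question is not shown.
df = pd.DataFrame({
    "A": ["a", "a", "b", "c", "b", "d", "e"],
    "B": [1, 2, 3, 4, 5, 6, 7],
})

n = 3

# factorize() returns (codes, uniques). Each distinct value gets a code in
# order of first appearance: a -> 0, b -> 1, c -> 2, d -> 3, e -> 4.
codes, uniques = df["A"].factorize()

# Keep the rows whose value is among the first n distinct values.
result = df[codes < n]
print(result)
```

Note that a row is kept whenever its value is one of the first n distinct values, even if it appears after a later value (the second "b" here). If the goal were instead to cut the frame off at the first row of the (n+1)th value, a different filter would be needed.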

Answer 2

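You can use np.random.choice to pick the unique ids at random, then isin to select the rows with those ids. Note that this returns a random set of 3 unique values, not necessarily the first 3 in order of appearance: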

import numpy as np

# draw 3 distinct values from column A without replacement
selected_ids = np.random.choice(df['A'].unique(), replace=False, size=3)

df[df['A'].isin(selected_ids)]
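As a hedged sketch of the same idea, the example below uses a seeded NumPy Generator so the random draw is repeatable, and contrasts it with a deterministic variant that takes the first values returned by unique(). The sample data and the seed are assumptions for illustration, and the Generator is swapped in for np.random.choice only for reproducibility.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the DataFrame from the question is not shown.
df = pd.DataFrame({
    "A": ["a", "a", "b", "c", "b", "d", "e"],
    "B": [1, 2, 3, 4, 5, 6, 7],
})

n = 3

# Random variant: draw n distinct values without replacement.
rng = np.random.default_rng(0)  # the seed is arbitrary, for repeatability
selected_ids = rng.choice(df["A"].unique(), size=n, replace=False)
sampled = df[df["A"].isin(selected_ids)]

# Deterministic variant: unique() keeps order of first appearance,
# so slicing it gives the first n distinct values.
first_ids = df["A"].unique()[:n]
first_rows = df[df["A"].isin(first_ids)]

print(sampled)
print(first_rows)
```

The random variant suits sampling a few groups; the deterministic one matches the "first x unique values" reading of the question.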

2 answers

solution1
2 2021-03-02 16:24:45

solution2
1 2021-03-02 16:18:19
