What is the best way to shuffle a group of rows in a data frame? I need this to build a shuffled training set for a model.
For example, shuffle every 10 rows as a separate group, or use some logical condition to create separate groups and shuffle each of them as a group.
If you create a new column with the index you're grouping on, you could do something like:
import pandas
# use a separate loop variable so the original df is not shadowed
groups = [group.sample(frac=1) for _, group in df.groupby('index_to_group_on')]
shuffled = pandas.concat(groups)
If, for example, you want to shuffle every group of 10 rows, you could create this index via:
import numpy
df['group_of_ten'] = numpy.arange(len(df)) // 10  # 0 for rows 0-9, 1 for rows 10-19, ...
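Putting the two steps above together, a minimal sketch might look like the following. The function name `shuffle_within_groups` and its `group_size` and `seed` parameters are made up for illustration; it assumes the groups are simply consecutive blocks of rows.

```python
import numpy as np
import pandas as pd


def shuffle_within_groups(df, group_size=10, seed=None):
    """Shuffle rows inside consecutive blocks of `group_size` rows.

    Hypothetical helper: blocks stay in their original order, only the
    rows inside each block are permuted.
    """
    rng = np.random.RandomState(seed)
    # Block label per row: 0,0,...,1,1,... (the last block may be shorter).
    block_ids = np.arange(len(df)) // group_size
    shuffled = [
        block.sample(frac=1, random_state=rng)
        for _, block in df.groupby(block_ids)
    ]
    return pd.concat(shuffled)


df = pd.DataFrame({"x": np.arange(25)})
out = shuffle_within_groups(df, group_size=10, seed=0)
```

Passing a seed makes the shuffle reproducible, which is usually handy when the result feeds a training run.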
If you're trying to do cross validation, you can look into scikit-learn's train_test_split: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
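For reference, a small sketch of what a train_test_split call could look like on a data frame. The column names and the 80/20 split are assumptions chosen for the example, not something from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data frame; "feature" and "label" are hypothetical column names.
df = pd.DataFrame({"feature": np.arange(100), "label": np.arange(100) % 2})

# shuffle=True (the default) shuffles rows before splitting;
# random_state fixes the shuffle so the split is reproducible.
train_df, test_df = train_test_split(
    df, test_size=0.2, shuffle=True, random_state=42
)
```

Note that this shuffles individual rows across the whole frame, so it does not keep your groups of rows together.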
There may be other ways too. One is to use shuffle from sklearn: slice the n rows that you want to shuffle, then join the remaining rows onto the shuffled result. This used to be done with .append, but DataFrame.append was removed in pandas 2.0, so pd.concat is used here.
import pandas as pd
from sklearn.utils import shuffle
# if df is the dataframe, then:
n = 10  # number of rows to shuffle
shuffled_df = pd.concat([shuffle(df[:n]), df[n:]])
What you can do is create a column that identifies the group, then group by that column and randomly shuffle each group.
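import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df['group_id'] = np.arange(df.shape[0]) // 10 # // is floor division: rows 0-9 get 0, rows 10-19 get 1, ...
shuffled_groups = [v.drop(['group_id'], axis=1).sample(frac=1).reset_index(drop=True) for k, v in df.groupby('group_id')]
shuffled_df = pd.concat(shuffled_groups, ignore_index=True) # optional: stitch the list of groups back into one frame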
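If "shuffle them as a group" instead means moving whole groups around while keeping the rows inside each group in their original order, something along these lines might work. This is a different reading of the question than the answers above, and `shuffle_group_order` is a made-up name for the sketch.

```python
import numpy as np
import pandas as pd


def shuffle_group_order(df, group_col, seed=None):
    """Reorder whole groups at random; rows inside a group keep their order.

    Hypothetical helper, assuming `group_col` already labels the groups.
    """
    rng = np.random.RandomState(seed)
    # Random order of the distinct group labels.
    order = rng.permutation(df[group_col].unique())
    grouped = df.groupby(group_col, sort=False)
    return pd.concat([grouped.get_group(g) for g in order])


df = pd.DataFrame({"x": np.arange(30)})
df["group_id"] = np.arange(len(df)) // 10
out = shuffle_group_order(df, "group_id", seed=1)
```

The two ideas can be combined: shuffle the rows inside each group first, then shuffle the order of the groups.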