Python dataframe: shuffle rows randomly

Question

What is the best way to shuffle a group of rows in a data frame? I need this to build a shuffled training set for a model.

For example, shuffle every 10 rows as a separate group, or use some logical condition to create separate groups and shuffle each of them as a group.

Answer 1

If you create a new column with the index you're grouping on, you could do something like:

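import pandas
# Shuffle the rows within each group, then concatenate the groups in order.
groups = [group.sample(frac=1) for _, group in df.groupby('index_to_group_on')]
shuffled = pandas.concat(groups)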

If, for example, you want to shuffle every group of 10 rows, you could create this index via:

# one label per row: rows 0-9 get 0, rows 10-19 get 1, and so on
df['group_of_ten'] = numpy.arange(len(df)) // 10
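Putting the two steps of this answer together, a minimal self-contained sketch might look like the following. The helper name `shuffle_within_blocks`, the `block_size` and `seed` parameters, and the sample data are all assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd


def shuffle_within_blocks(df, block_size=10, seed=None):
    """Shuffle rows inside consecutive blocks of `block_size` rows.

    Hypothetical helper: the order of the blocks is preserved,
    only the rows inside each block are permuted.
    """
    if df.empty:
        return df.copy()
    rng = np.random.RandomState(seed)
    # Rows 0..block_size-1 get label 0, the next block gets label 1, and so on.
    block_ids = np.arange(len(df)) // block_size
    shuffled = [block.sample(frac=1, random_state=rng)
                for _, block in df.groupby(block_ids)]
    return pd.concat(shuffled)


# Illustrative data: 25 rows, so the last block holds only 5 rows.
df = pd.DataFrame({"value": np.arange(25)})
result = shuffle_within_blocks(df, block_size=10, seed=0)
```

The original index labels are kept, so you can still see where each row came from; call `reset_index(drop=True)` on the result if you want a fresh index.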

If you're trying to do cross validation, you can look into scikit-learn's train_test_split: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
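As a rough sketch of what that looks like on a data frame (the sample data, the 80/20 split and the `random_state` value are assumptions for illustration). Note that `train_test_split` shuffles individual rows across the whole frame, so it does not keep rows of a group together.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative data, not from the question.
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))

# shuffle=True is the default; random_state makes the split repeatable.
train_df, test_df = train_test_split(df, test_size=0.2, shuffle=True, random_state=0)
```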

Answer 2

There may be other ways too; one is to use shuffle from sklearn. You can slice the n rows that you want to shuffle and then concatenate the remaining rows onto the shuffled result. (The original version of this answer used .append for that step, but DataFrame.append was removed in pandas 2.0, so pandas.concat is used here.)

from sklearn.utils import shuffle
import pandas as pd

# if df is the dataframe, then:
n = 10  # number of rows to shuffle
# shuffle the first n rows and put the untouched remainder after them
shuffled_df = pd.concat([shuffle(df[:n]), df[n:]])
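If you would rather not pull in sklearn just for this, the same idea can be sketched with pandas alone, using `DataFrame.sample(frac=1)` to permute the slice. The sample data and the `random_state` value below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data, not from the question.
df = pd.DataFrame({"value": np.arange(25)})
n = 10  # number of leading rows to shuffle

# sample(frac=1) returns all rows of the slice in random order.
head = df.iloc[:n].sample(frac=1, random_state=0)
tail = df.iloc[n:]  # left in its original order
shuffled_df = pd.concat([head, tail])
```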

Answer 3

What you can do is create a column that identifies the group, group by that column, and then randomly shuffle the rows of each group.

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df['group_id'] = np.arange(df.shape[0]) // 10  # floor division: rows 0-9 get group 0, rows 10-19 get group 1, ...
# drop the helper column, shuffle each group, and reset each group's index
shuffled_groups = [v.drop(['group_id'], axis=1).sample(frac=1).reset_index(drop=True) for k, v in df.groupby('group_id')]
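The question's "shuffle them as a group" could also be read as moving whole groups around while leaving the rows inside each group in place. Under that reading, one possible sketch is to permute the group labels and concatenate the groups in the new order. The sample data, the seed, and the variable names here are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)

# Illustrative data: 30 rows in three groups of 10.
df = pd.DataFrame({"A": np.arange(30)})
df["group_id"] = np.arange(len(df)) // 10

# Collect each group once, keyed by its label.
groups = {key: group for key, group in df.groupby("group_id")}

# Randomly permute the labels, then stitch the groups together in that order.
order = rng.permutation(list(groups.keys()))
reordered = pd.concat([groups[key] for key in order]).reset_index(drop=True)
```

The `group_id` column can come from any logical condition instead of the arithmetic above, as long as it assigns one label per row.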

3 answers

Answer 1 0 2018-08-09 22:01:52

Answer 2 0 2018-08-09 22:57:36

Answer 3 0 2018-08-09 23:23:21
