Randomizing/Shuffling rows in a dataframe in pandas

I am currently trying to find a way to randomize the items in a dataframe row-wise. I found this thread on column-wise shuffling/permutation in pandas (shuffling/permutating a DataFrame in pandas), but for my purposes, is there a way to do something like

import pandas as pd

data = {'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri'],
       'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
       'Number': [11, 8, 10, 15, 11]}

dataframe = pd.DataFrame(data)
    Number   color    day
0      11    Blue    Mon
1       8     Red   Tues
2      10   Green    Wed
3      15  Yellow  Thurs
4      11   Black    Fri

and randomize the rows into something like

   Number   color    day
0     Mon    Blue     11
1     Red    Tues      8
2      10     Wed  Green
3      15  Yellow  Thurs
4   Black      11    Fri

If the column headers would have to go away in order to do this, or something of the like, I understand.

EDIT: In the thread I posted, part of the code refers to an "axis" parameter. My understanding is that axis=0 refers to the columns and axis=1 refers to the rows. I tried taking the code and changing the axis to 1, and it seems to randomize my dataframe only if the table consists entirely of numbers (as opposed to strings, or a combination of the two).

That said, should I consider not using dataframes? Is there a better 2D structure in which I can randomize the rows and the columns if my data consists only of strings, or of a combination of ints and strings?

Edit: I misunderstood the question, which was just to shuffle within rows and not across the whole table (right?)

I think using dataframes does not make much sense here, because the column names become useless. So you can just use 2D numpy arrays:

In [1]: A
Out[1]: 
array([[11, 'Blue', 'Mon'],
       [8, 'Red', 'Tues'],
       [10, 'Green', 'Wed'],
       [15, 'Yellow', 'Thurs'],
       [11, 'Black', 'Fri']], dtype=object)

In [2]: _ = [np.random.shuffle(i) for i in A]  # shuffles each row in place; shuffle() returns None

In [3]: A
Out[3]: 
array([['Mon', 11, 'Blue'],
       [8, 'Tues', 'Red'],
       ['Wed', 10, 'Green'],
       ['Thurs', 15, 'Yellow'],
       [11, 'Black', 'Fri']], dtype=object)

And if you want to keep a dataframe (here data is the original DataFrame):

In [4]: pd.DataFrame(A, columns=data.columns)
Out[4]: 
  Number  color     day
0    Mon     11    Blue
1      8   Tues     Red
2    Wed     10   Green
3  Thurs     15  Yellow
4     11  Black     Fri
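
The per-row loop above can also be written without a Python-level loop. The following is only a sketch, not part of the original answer: the helper name shuffle_within_rows and its seed parameter are made up for illustration. It draws one random key per cell, argsorts the keys along each row to get an independent permutation per row, and applies it with np.take_along_axis.

```python
import numpy as np
import pandas as pd

def shuffle_within_rows(df, seed=None):
    """Hypothetical helper: shuffle the values inside each row independently."""
    rng = np.random.default_rng(seed)
    # Object dtype keeps ints and strings side by side without converting them.
    values = df.to_numpy(dtype=object)
    # Argsorting random keys along axis=1 yields one random permutation per row.
    order = rng.random(values.shape).argsort(axis=1)
    shuffled = np.take_along_axis(values, order, axis=1)
    return pd.DataFrame(shuffled, index=df.index, columns=df.columns)

data = {'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri'],
        'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
        'Number': [11, 8, 10, 15, 11]}
dataframe = pd.DataFrame(data)
print(shuffle_within_rows(dataframe, seed=0))
```

Every value stays in its original row; only its position within the row changes, so the column headers no longer describe the contents.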

Here is a function to shuffle across both rows and columns:

import numpy as np
import pandas as pd

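def shuffle(df):
    """Shuffle every cell of df across both rows and columns."""
    col = df.columns
    val = df.values
    shape = val.shape
    val_flat = val.flatten()     # flatten() returns a copy, so df itself is untouched
    np.random.shuffle(val_flat)  # shuffles the 1-D copy in place
    return pd.DataFrame(val_flat.reshape(shape), columns=col)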

In [2]: data
Out[2]: 
   Number   color    day
0      11    Blue    Mon
1       8     Red   Tues
2      10   Green    Wed
3      15  Yellow  Thurs
4      11   Black    Fri

In [3]: shuffle(data)
Out[3]: 
  Number  color     day
0    Fri    Wed  Yellow
1  Thurs  Black     Red
2  Green   Blue      11
3     11      8      10
4    Mon   Tues      15

Hope this helps.

Maybe flatten the 2D array and then shuffle?

In [21]: data2 = dataframe.values.flatten()

In [22]: np.random.shuffle(data2)

In [23]: dataframe2 = pd.DataFrame(data2.reshape(dataframe.shape), columns=dataframe.columns)

In [24]: dataframe2
Out[24]: 
  Number   color    day
0   Tues  Yellow     11
1    Red   Green    Wed
2  Thurs     Mon   Blue
3     15       8  Black
4    Fri      11     10
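
If the shuffle needs to be repeatable, the same flatten-and-reshape idea can be driven by a seeded generator instead of the global np.random state. This is a sketch under assumptions, not code from the answers here: the function name shuffle_all_cells and the seed parameter are illustrative only.

```python
import numpy as np
import pandas as pd

def shuffle_all_cells(df, seed=None):
    """Hypothetical helper: shuffle every cell across the whole table."""
    rng = np.random.default_rng(seed)
    # ravel() gives a 1-D view or copy of all cells in row-major order.
    flat = df.to_numpy(dtype=object).ravel()
    # Indexing with a random permutation of positions builds a new shuffled array.
    shuffled = flat[rng.permutation(flat.size)]
    return pd.DataFrame(shuffled.reshape(df.shape),
                        index=df.index, columns=df.columns)

dataframe = pd.DataFrame({'Number': [11, 8, 10, 15, 11],
                          'color': ['Blue', 'Red', 'Green', 'Yellow', 'Black'],
                          'day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri']})
print(shuffle_all_cells(dataframe, seed=42))
```

Calling it twice with the same seed gives the same arrangement, while seed=None draws fresh randomness on each call.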

Building on @jrjc's answer, I have posted https://stackoverflow.com/a/44686455/5009287, which uses np.apply_along_axis():

a = np.array([[10, 11, 12], [20, 21, 22], [30, 31, 32],[40, 41, 42]])
print(a)
[[10 11 12]
 [20 21 22]
 [30 31 32]
 [40 41 42]]

print(np.apply_along_axis(np.random.permutation, 1, a))
[[11 12 10]
 [22 21 20]
 [31 30 32]
 [40 41 42]]

See the full answer for how this can be integrated with a pandas df.
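
As a rough idea of what that integration might look like (this is a sketch, not the code from the linked answer; the helper name permute_each_row, the seed parameter, and the column names x, y, z are assumptions), the DataFrame can be converted to an array, permuted row by row, and wrapped back up with the original index and columns:

```python
import numpy as np
import pandas as pd

def permute_each_row(df, seed=None):
    """Hypothetical helper: permute the values inside every row of df."""
    rng = np.random.default_rng(seed)
    values = df.to_numpy()
    # With axis=1, apply_along_axis passes each row to rng.permutation as a 1-D array.
    shuffled = np.apply_along_axis(rng.permutation, 1, values)
    return pd.DataFrame(shuffled, index=df.index, columns=df.columns)

a = np.array([[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]])
df = pd.DataFrame(a, columns=["x", "y", "z"])
print(permute_each_row(df, seed=0))
```

For a frame that mixes ints and strings, to_numpy() returns an object array, so the same call should apply, although the resulting columns then hold mixed types.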
