Randomly Select Rows in Pandas DataFrame Vectorized Operation
I want to select a random row during a vectorized operation on a DataFrame. This is what my inpDF looks like:
string1 string2
0 abc dfe
1 ghi jkl
2 mno pqr
3 stu vwx
I'm trying to find a function that would play the role of getRandomRow() here:
outDF['string1'] = inpDF['string1']
outDF['string2'] = inpDF.getRandomRow()['string2']
so that outDF ends up looking (for example) like this:
string1 string2
0 abc jkl
1 ghi pqr
2 mno dfe
3 stu pqr
EDIT 1:
I tried using the sample() function as suggested in this answer, but that just causes the same sample to get replicated across all rows:
outDF['string1'] = inpDF['string1']
outDF['string2'] = inpDF.sample(n=1).iloc[0,:]['string2']
which gives:
string1 string2
0 abc pqr
1 ghi pqr
2 mno pqr
3 stu pqr
EDIT 2:
For my particular use case, even picking the value from 'n' rows down would suffice. So I tried this (I'm using inpDF.index based on what I read in this answer):
numRows = len(inpDF)
outDF['string1'] = inpDF['string1']
outDF['string2'] = inpDF.iloc[(inpDF.index + 2)%numRows,:]['string2']
but it just ends up picking the value from the same row, and outDF comes out like this:
string1 string2
0 abc dfe
1 ghi jkl
2 mno pqr
3 stu vwx
whereas I expected this:
string1 string2
0 abc pqr
1 ghi vwx
2 mno dfe
3 stu jkl
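A likely reason the EDIT 2 attempt comes back in the original order: when a Series is assigned to a DataFrame column, pandas aligns it on index labels, so the rows reordered by iloc are put back under their original labels. The following is a minimal sketch, assuming the inpDF shown above (rebuilt here by hand) and a fixed shift of 2 rows. Converting the reordered values to a NumPy array drops the index, so the assignment becomes positional:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed contents).
inpDF = pd.DataFrame({
    "string1": ["abc", "ghi", "mno", "stu"],
    "string2": ["dfe", "jkl", "pqr", "vwx"],
})

numRows = len(inpDF)
outDF = inpDF[["string1"]].copy()

# Row i takes its value from row (i + 2) % numRows.
positions = (np.arange(numRows) + 2) % numRows

# .to_numpy() strips the index so pandas cannot re-align the values.
outDF["string2"] = inpDF["string2"].iloc[positions].to_numpy()
print(outDF)
```

With a shift of 2 this should reproduce the expected frame above (pqr, vwx, dfe, jkl).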
Try np.random.shuffle():
np.random.shuffle(df.string2)
print(df)
string1 string2
0 abc pqr
1 ghi vwx
2 mno dfe
3 stu jkl
If you don't want to shuffle in place, try:
df['string3']=np.random.permutation(df.string2)
print(df)
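Note that a shuffle or permutation uses each value exactly once, whereas the example output in the question repeats pqr, which suggests an independent random pick per row. A minimal sketch of that variant, assuming the same four-row frame (rebuilt here by hand); the seed and the column name string3 are only for illustration:

```python
import numpy as np
import pandas as pd

# Example frame from the question (assumed contents).
df = pd.DataFrame({
    "string1": ["abc", "ghi", "mno", "stu"],
    "string2": ["dfe", "jkl", "pqr", "vwx"],
})

# Seeded generator so the run is repeatable; drop the seed for fresh draws.
rng = np.random.default_rng(seed=0)

# One random row position per row, drawn with replacement.
picks = rng.integers(0, len(df), size=len(df))

# Index the raw values by position; the result has no pandas index to align on.
df["string3"] = df["string2"].to_numpy()[picks]
print(df)
```

Because the picks are drawn with replacement, some values may appear more than once and others not at all.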
You can use pandas.DataFrame.sample for this:
df['string2'] = df.string2.sample(len(df.string2)).to_list()
print(df)
string1 string2
0 abc vwx
1 ghi jkl
2 mno dfe
3 stu pqr
Or
df['string2'] = df.string2.sample(len(df.string2)).values
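If repeatable results matter, sample() also accepts random_state, and replace=True allows the same value to be drawn more than once. A small sketch under those assumptions, using the example frame rebuilt by hand; the seed value and the new column names are arbitrary:

```python
import pandas as pd

# Example frame from the question (assumed contents).
df = pd.DataFrame({
    "string1": ["abc", "ghi", "mno", "stu"],
    "string2": ["dfe", "jkl", "pqr", "vwx"],
})

# frac=1 samples every row once (a shuffle); the same seed gives the same order.
first = df["string2"].sample(frac=1, random_state=42).to_numpy()
second = df["string2"].sample(frac=1, random_state=42).to_numpy()

# replace=True draws each row independently, so values may repeat.
with_repeats = df["string2"].sample(
    n=len(df), replace=True, random_state=42
).to_numpy()

df["shuffled"] = first
df["with_repeats"] = with_repeats
print(df)
```

As in the snippets above, the sampled Series is converted to plain values before assignment so that its shuffled index is not used for alignment.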