Pandas data frame - Group a column values then Randomize new values of that column

Question

I have a column (X) that contains duplicated values: several rows share the same value, and those rows are consecutive. To test an issue, I need to replace the values in that column with random ones, so I tried:

np.random.seed(RSEED)
df["X"] = np.random.randint(100, 500, df.shape[0])

But this is not enough, because I need to keep the sequences: all rows that share the same original value should get the same new random number, and this should be done for every distinct value in the original column. For example:

X	new X (randomized)
210	500
210	500
.	.
.	.
340	100
340	100
.	.
.	.

I looked for something built into pandas. I can group with pandas.DataFrame.groupby, but I couldn't find anything like a pandas.DataFrame.random that can be applied per group.

Answer 1

A simple approach is to use groupby with transform, which broadcasts one random integer to every row of each group:

df.groupby('X')['X'].transform(lambda _: np.random.randint(100, 500))

0    137
1    137
2    .
3    .
4    335
5    335
Name: X, dtype: int64
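
The one-liner above draws each group's number independently, so two different groups can occasionally receive the same new value. If that matters for the test, one possible variation is to draw one number per distinct original value without replacement and map it back. This is only a sketch: the sample frame, the RSEED value, and the new_X column name are assumptions made for illustration, not taken from the question.

```python
import numpy as np
import pandas as pd

RSEED = 42  # hypothetical seed; the question does not show the real value

# Stand-in data; the actual df from the question is not shown.
df = pd.DataFrame({"X": [210, 210, 210, 340, 340, 125]})

rng = np.random.default_rng(RSEED)

# One draw per distinct original value, sampled without replacement from
# 100..499 so that different groups never share a new number.
originals = df["X"].unique()
new_values = rng.choice(np.arange(100, 500), size=len(originals), replace=False)
mapping = dict(zip(originals, new_values))

# Every row with the same original value receives the same new value.
df["new_X"] = df["X"].map(mapping)
print(df)
```

Note that both this sketch and the groupby version group by value, not by position. If the same value can reappear in a later, separate run and each run should get its own number, a run identifier such as (df["X"] != df["X"].shift()).cumsum() could be used as the grouping key instead.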

solution1
2 ACCEPTED 2023-01-02 17:38:18
