
How to map the integer values in a column of a pandas DataFrame to random n-digit numbers?

I have a pandas data frame like df:

import pandas as pd

df = pd.DataFrame([[111, 7, 8], [409, 6, 4], [333, 9, 0], [111, 3, 2], [111, 0, 0], [409, 7, 0]], columns=['A', 'B', 'C'])
df
     A  B  C
0  111  7  8
1  409  6  4
2  333  9  0
3  111  3  2
4  111  0  0
5  409  7  0

How can I map column A to random 10-digit integers such that equal values in column A (such as 111) get the same 10-digit integer in the new column? For example, I want something like this:

            A  B  C
0  8765479834  7  8
1  7653780954  6  4
2  9400211346  9  0
3  8765479834  3  2
4  8765479834  0  0
5  7653780954  7  0

Thank you!

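One way is via hashlib. Note that the modulus 10 ** 8 used here yields numbers of at most 8 digits; use 10 ** 10 for up to 10 digits: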

import hashlib
df['A'] = df['A'].apply(lambda s: int(hashlib.sha1(str(s).encode("utf-8")).hexdigest(), 16) % (10 ** 8))

OUTPUT:

          A  B  C
0  22445762  7  8
1  63857454  6  4
2  61248669  9  0
3  22445762  3  2
4  22445762  0  0
5  63857454  7  0

NOTE: If the number of digits does not matter, you can also use:

df['A'] = pd.util.hash_pandas_object(df['A'], index=False)
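
If every mapped value must have exactly 10 digits, the hashlib idea can be adjusted by folding the digest into the range 10**9 to 10**10 - 1 instead of taking a plain modulus. A minimal sketch, assuming a hypothetical helper named `to_ten_digits` (not part of pandas or hashlib). The result is deterministic rather than truly random, and two different inputs could in principle collide:

```python
import hashlib

import pandas as pd


def to_ten_digits(value):
    """Hypothetical helper: map a value to a deterministic 10-digit integer."""
    digest = hashlib.sha1(str(value).encode("utf-8")).hexdigest()
    # Fold the hash into [10**9, 10**10 - 1] so the result always has 10 digits.
    return 10**9 + int(digest, 16) % (9 * 10**9)


df = pd.DataFrame(
    [[111, 7, 8], [409, 6, 4], [333, 9, 0], [111, 3, 2], [111, 0, 0], [409, 7, 0]],
    columns=["A", "B", "C"],
)
# Equal values in A hash to the same 10-digit integer.
df["A"] = df["A"].apply(to_ten_digits)
```

Because the mapping depends only on the input value, rerunning it on new data gives the same codes for the same values of A.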

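You can use map and numpy:

import numpy as np

# find the unique values in A
unique = df['A'].unique()
# generate one random 10-digit int per unique value (the upper bound is exclusive);
# dtype=np.int64 avoids an error on platforms where the default integer is 32-bit
data = np.random.randint(10**9, 10**10, len(unique), dtype=np.int64)
# zip the unique values with the random ints and map the result onto column A
df['A'] = df['A'].map(dict(zip(unique, data)))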

            A  B  C
0  8059444826  7  8
1  2465745168  6  4
2  8408792865  9  0
3  8059444826  3  2
4  8059444826  0  0
5  2465745168  7  0
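
`np.random.randint` draws with replacement, so two different values in A could, on rare occasions, receive the same number. A minimal sketch that avoids this by sampling without replacement with the standard library's `random.sample`; the helper name `map_to_random_ids` and its `seed` parameter are assumptions made for illustration:

```python
import random

import pandas as pd


def map_to_random_ids(series, seed=None):
    """Hypothetical helper: give each distinct value its own random 10-digit integer."""
    rng = random.Random(seed)
    unique = series.unique()
    # Sampling from a range without replacement guarantees distinct codes.
    codes = rng.sample(range(10**9, 10**10), len(unique))
    return series.map(dict(zip(unique, codes)))


df = pd.DataFrame(
    [[111, 7, 8], [409, 6, 4], [333, 9, 0], [111, 3, 2], [111, 0, 0], [409, 7, 0]],
    columns=["A", "B", "C"],
)
df["A"] = map_to_random_ids(df["A"], seed=0)
```

Passing a fixed `seed` makes the mapping reproducible between runs; leaving it as `None` gives a fresh mapping each time.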
