Replacing values in a Pandas data frame with the order of their columns

How can I replace specific values in a data frame with the position (1st, 2nd, 3rd, ...) of the column they sit in? For example, I have this data frame:

A  B  C
0  0  1
1  0  0 
1  0  0
0  1  0
1  0  1

I want to replace every 1 in this data frame with the position of its column (1st, 2nd, 3rd, etc.), so that it looks like this:

A  B  C
0  0  3
1  0  0 
1  0  0
0  2  0
1  0  3

This is what I thought would work, but it did not:

 DF_2= [(0 if i== 0 else j  for i in DF.iloc[:,j]  ) for j in range(DF.shape[1]) ]

If the frame holds only 1 and 0 values, you can multiply the underlying NumPy array (df.values) by np.arange:

print (np.arange(1, len(df.columns)+1))
[1 2 3]


print (df.values * np.arange(1, len(df.columns)+1))
[[0 0 3]
 [1 0 0]
 [1 0 0]
 [0 2 0]
 [1 0 3]]

df = pd.DataFrame(df.values * np.arange(1, len(df.columns)+1),
                  index=df.index, columns=df.columns)
print (df)
   A  B  C
0  0  0  3
1  1  0  0
2  1  0  0
3  0  2  0
4  1  0  3
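The multiplication above relies on NumPy broadcasting: a 1-D array with one entry per column is stretched across every row of the 2-D array. Here is a minimal self-contained sketch of that step, rebuilding the question's frame by hand. The names `positions` and `result` are only illustrative, and `to_numpy()` is used as the newer equivalent of `.values`.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({"A": [0, 1, 1, 0, 1],
                   "B": [0, 0, 0, 1, 0],
                   "C": [1, 0, 0, 0, 1]})

# 1-based column positions: [1, 2, 3]
positions = np.arange(1, df.shape[1] + 1)

# A (5, 3) array times a (3,) array: NumPy broadcasts the 1-D array
# across every row, so each column is scaled by its own position.
result = pd.DataFrame(df.to_numpy() * positions,
                      index=df.index, columns=df.columns)
print(result)
```

Because 0 times anything stays 0 and 1 times n gives n, this only produces the wanted result when the frame contains nothing but 0 and 1.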

A more general solution, for frames holding 0 and any other numeric value, is to convert the values to bool first:

print (df)
   A  B  C
0  0  0  4
1  1  0  0
2  1  0  0
3  0  6  0
4  1  0  1

df = pd.DataFrame(df.astype(bool).values * np.arange(1, len(df.columns)+1),
                  index=df.index, columns=df.columns)
print (df)
   A  B  C
0  0  0  3
1  1  0  0
2  1  0  0
3  0  2  0
4  1  0  3

Thank you for the other solutions (Jon Clements and MaxU):

df = df.replace({col: {1: n} for n, col in enumerate(df.columns[1:], 2)})
print (df)
   A  B  C
0  0  0  3
1  1  0  0
2  1  0  0
3  0  2  0
4  1  0  3
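To see what the replace call is doing, it may help to print the nested dictionary it receives. In the sketch below, `mapping` and `result` are illustrative names. The outer keys are column labels and each inner dict maps an old value to its replacement in that column only.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 1, 0, 1],
                   "B": [0, 0, 0, 1, 0],
                   "C": [1, 0, 0, 0, 1]})

# Column A is skipped on purpose: its position is 1, so its ones
# are already correct. Counting therefore starts at 2 for column B.
mapping = {col: {1: n} for n, col in enumerate(df.columns[1:], 2)}
print(mapping)  # {'B': {1: 2}, 'C': {1: 3}}

# Nested-dict form of replace: per column, swap 1 for the position.
result = df.replace(mapping)
print(result)
```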

df = df * np.arange(1, df.shape[1]+1)
print (df)
   A  B  C
0  0  0  3
1  1  0  0
2  1  0  0
3  0  2  0
4  1  0  3

Timings:

N = 100
cols = ['col' + str(i) for i in range(N)]
df = pd.DataFrame(np.random.choice([0,1], size=(100000,N)), columns=cols)
#print (df)
#[100000 rows x 100 columns]

In [101]: %timeit pd.DataFrame(df.values * np.arange(1, len(df.columns)+1), index=df.index, columns=df.columns)
10 loops, best of 3: 25.1 ms per loop

In [102]: %timeit df.replace({col: {1: n} for n, col in enumerate(df.columns[1:], 2)})
1 loop, best of 3: 1.39 s per loop

In [103]: %timeit df * np.arange(1, df.shape[1]+1)
10 loops, best of 3: 21 ms per loop

# Wen's solution
In [104]: %timeit (df.mul(list(range(1, len(df.columns)+1))))
10 loops, best of 3: 38.7 ms per loop
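The %timeit lines above are IPython magics and will not run in a plain Python script. A rough equivalent with the standard-library timeit module might look like the sketch below. The frame is deliberately much smaller than the one timed above, the `candidates` dict and its labels are made up for illustration, and absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Small random 0/1 frame (assumption: 1000 x 20, far smaller than above).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(1000, 20)),
                  columns=["col" + str(i) for i in range(20)])
positions = np.arange(1, df.shape[1] + 1)

# The three vectorized variants from the timings above.
candidates = {
    "values * arange": lambda: pd.DataFrame(df.to_numpy() * positions,
                                            index=df.index,
                                            columns=df.columns),
    "df * arange": lambda: df * positions,
    "df.mul(list)": lambda: df.mul(list(range(1, df.shape[1] + 1))),
}

for name, func in candidates.items():
    # Best of 3 repeats, 10 calls each, reported per call.
    seconds = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```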

Or you can try this. (PS: you can use range to generate the list: list(range(1, df.shape[1]+1)).)

df.mul([1,2,3])
Out[433]: 
   A  B  C
0  0  0  3
1  1  0  0
2  1  0  0
3  0  2  0
4  1  0  3
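Putting the PS together with the call above, a sketch that builds the multipliers from the frame's width instead of hard-coding [1, 2, 3] could look like this. `multipliers` and `result` are illustrative names. Passing axis="columns" only spells out what DataFrame.mul already does by default, namely matching the list to the columns by position.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 1, 0, 1],
                   "B": [0, 0, 0, 1, 0],
                   "C": [1, 0, 0, 0, 1]})

# One multiplier per column, generated from the number of columns.
multipliers = list(range(1, df.shape[1] + 1))

# Each column is multiplied by the list entry at the same position.
result = df.mul(multipliers, axis="columns")
print(result)
```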
