Previously, in R, I used sub and paste to concatenate strings and numbers. I found this harder in Python. Here is some sample code in Python:
import numpy as np
import pandas as pd

np.random.seed(1)  # seed NumPy's generator so the output is reproducible
testtt = pd.DataFrame(np.random.rand(5, 4)).round(3)
print(testtt)
#        0      1      2      3
# 0  0.417  0.720  0.000  0.302
# 1  0.147  0.092  0.186  0.346
# 2  0.397  0.539  0.419  0.685
# 3  0.204  0.878  0.027  0.670
# 4  0.417  0.559  0.140  0.198
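# Convert to object dtype first so the float columns are allowed to hold strings
testtt = testtt.astype(object)
for i in range(testtt.shape[1]):      # loop over columns
    for j in range(testtt.shape[0]):  # loop over rows
        testtt.iloc[j, i] = str(i) + '_' + str(testtt.iloc[j, i])
print(testtt)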
#          0        1        2        3
# 0  0_0.417   1_0.72    2_0.0  3_0.302
# 1  0_0.147  1_0.092  2_0.186  3_0.346
# 2  0_0.397  1_0.539  2_0.419  3_0.685
# 3  0_0.204  1_0.878  2_0.027   3_0.67
# 4  0_0.417  1_0.559   2_0.14  3_0.198
What I actually want is to prefix each element with its column index. As you can see, "0_" is added to every element in the first column, "1_" to every element in the second, and so on.
I don't think for loops are the best way to do this, since my real data is a 90000 × 20 matrix and the loops take too long to run.
Here is my previous R code, which is much faster because it loops only over the 20 columns rather than over every element:
for (i in 1:(ncol(testtt))){
  testtt[,i] <- sub("^", paste(i,"_",sep = ""), testtt[,i] )
}
I am very new to Python, so any help would be appreciated.
In Python, strings are concatenated with addition (+). Using broadcasting, you can do something like this:
df.astype(str).radd(df.add_suffix('_').columns)
Out:
         0        1        2        3
0  0_0.972  1_0.661  2_0.872  3_0.876
1  0_0.751  1_0.097  2_0.673  3_0.978
2  0_0.662  1_0.645  2_0.498  3_0.769
3  0_0.587  1_0.538  2_0.032  3_0.279
4  0_0.739  1_0.663  2_0.769  3_0.475
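The same broadcasting can be spelled out in plain NumPy, which may make it easier to see what happens: a row of prefixes with shape (4,) is stretched across every row of a (5, 4) matrix of strings. This is a sketch of an alternative technique using object arrays, not a description of how pandas implements radd internally; the names prefixes, values and combined are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.rand(5, 4)).round(3)

# Row vector of prefixes, shape (4,), stored as Python str objects
prefixes = np.array([f'{c}_' for c in df.columns], dtype=object)

# Matrix of string values, shape (5, 4), also stored as Python str objects
values = df.astype(str).to_numpy(dtype=object)

# Broadcasting repeats the (4,) prefixes for each of the 5 rows; with object
# dtype, '+' calls str.__add__ element by element
combined = pd.DataFrame(prefixes + values, index=df.index, columns=df.columns)
print(combined)
```

Object dtype is used so that '+' means ordinary string concatenation; the result is wrapped back into a DataFrame with the original labels.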
Here is how it works: the add_suffix method appends _ to the end of each column name.
df.add_suffix('_').columns
Out: Index(['0_', '1_', '2_', '3_'], dtype='object')
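add_suffix is simply a convenient way to get the prefixes. A minimal sketch, on an illustrative one-row frame, showing that converting the column labels to strings and appending '_' gives the same list:

```python
import pandas as pd

# Illustrative frame with the default integer column labels 0, 1, 2
df = pd.DataFrame([[1.0, 2.0, 3.0]])

# add_suffix renames the columns, so .columns holds the prefixes
via_suffix = df.add_suffix('_').columns

# The same prefixes built directly from the column labels
via_astype = df.columns.astype(str) + '_'

print(list(via_suffix))  # ['0_', '1_', '2_']
print(list(via_astype))  # ['0_', '1_', '2_']
```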
Now it is only a matter of addition to get the desired output. However, if you add df to df.columns directly, you get this:
df.add_suffix('_').columns + df.astype('str')
Out:
Index([('0_0.972', '1_0.661', '2_0.872', '3_0.876'),
('0_0.751', '1_0.097', '2_0.673', '3_0.978'),
('0_0.662', '1_0.645', '2_0.498', '3_0.769'),
('0_0.587', '1_0.538', '2_0.032', '3_0.279'),
('0_0.739', '1_0.663', '2_0.769', '3_0.475')],
dtype='object')
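Since df.add_suffix('_').columns is an Index object, the result of that addition is also an Index (this is what the pandas version used for this answer returned; newer versions may defer to the DataFrame instead). We want the result to be a DataFrame, so we call the method on the DataFrame: df.radd(other) computes other + df, which places each prefix from df.add_suffix('_').columns to the left of the values in its column.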
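A small sketch of the radd semantics, using a hypothetical two-column frame with string labels x and y: df.radd(other) gives the same result as other + df, and when other is a Series the prefixes are matched to columns by label, not by position.

```python
import pandas as pd

# Hypothetical frame with string column labels
df = pd.DataFrame([['a', 'b'], ['c', 'd']], columns=['x', 'y'])

# Prefixes indexed by column label, deliberately listed in reverse order
# to show that they are matched to columns by label
prefixes = pd.Series({'y': 'Y_', 'x': 'X_'})

# radd(other) computes other + df, so the prefix ends up on the left
r1 = df.radd(prefixes)
r2 = prefixes + df  # the same operation written with the + operator

print(r1)
print(r1.equals(r2))
```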
You can achieve the same with a for loop:
df = df.astype('str')
for col in df:
    df[col] = '{}_'.format(col) + df[col]
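To check that the column loop and the radd one-liner agree, here is a sketch on the question's seeded data. The loop runs once per column (4 times here, 20 for the real data) rather than once per cell, which is why it should be far cheaper than the nested iloc loop in the question.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.rand(5, 4)).round(3)

# Broadcasting one-liner
one_liner = df.astype(str).radd(df.add_suffix('_').columns)

# Column loop: one iteration per column, each a whole-column string addition
looped = df.astype(str)
for col in looped:
    looped[col] = '{}_'.format(col) + looped[col]

print(one_liner.equals(looped))
```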
Your R snippet translates into pandas as something like this:
for i in range(len(testtt.columns)):
    col = testtt.columns[i]
    # Replace the whole column so its dtype can change from float to string
    testtt[col] = str(i) + '_' + testtt[col].round(3).astype(str)
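The R code relies on sub("^", ...), a regular-expression replacement that inserts text at the start of each string. pandas has a close analogue in Series.str.replace with regex=True. A sketch, assuming the prefix is the 0-based column position i (so the first column gets 0_ rather than R's 1_):

```python
import numpy as np
import pandas as pd

np.random.seed(1)
testtt = pd.DataFrame(np.random.rand(5, 4)).round(3)

# '^' matches the empty string at the start of each value, so replacing it
# inserts the prefix, just like R's sub("^", prefix, x)
for i, col in enumerate(testtt.columns):
    testtt[col] = testtt[col].astype(str).str.replace('^', f'{i}_', regex=True)

print(testtt)
```

Plain concatenation with + does the same job without a regex, so this form is mainly useful when translating existing R code line by line.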
A more concise alternative is to use the name attribute of each Series (column) in your DataFrame, which, given your numeric column labels, is exactly the prefix we need, and to perform the concatenation by applying a lambda (i.e. anonymous) function:
testtt = testtt.apply(lambda x: str(x.name) + '_' + x.round(3).astype(str))
The pd.DataFrame.apply method works on one column of the DataFrame at a time (the default, axis=0; with axis=1 it works row-wise), which removes the need for an explicit for loop in this case.
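A minimal sketch of the axis difference, on an illustrative frame with row labels r0 and r1: with the default axis=0 each x passed to the lambda is a column, so x.name is the column label; with axis=1 each x is a row, so x.name is the row label and the values would be prefixed by row instead.

```python
import pandas as pd

# Illustrative frame with integer column labels and string row labels
df = pd.DataFrame([[0.5, 0.25], [0.75, 1.0]], index=['r0', 'r1'])

# axis=0 (default): x is a column, x.name is the column label (0 or 1)
by_col = df.apply(lambda x: str(x.name) + '_' + x.astype(str))

# axis=1: x is a row, x.name is the row label ('r0' or 'r1')
by_row = df.apply(lambda x: str(x.name) + '_' + x.astype(str), axis=1)

print(by_col)
print(by_row)
```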