I have the following dataframe:
df = pd.DataFrame([['A', 1],['B', 2],['C', 3]], columns=['index', 'result'])
index | result |
---|---|
A | 1 |
B | 2 |
C | 3 |
I would like to create a new column, for example multiply the column 'result' by two, and I am just curious to know if there is a way to do it in pandas as pyspark does it.
In pyspark:
df = df\
.withColumn("result_multiplied", F.col("result")*2)
I don't like the fact of writing the name of the dataframe everytime I have to perform an operation as it is done in pandas such as:
In pandas:
df['result_multiplied'] = df['result']*2
Use DataFrame.assign
:
df = df.assign(result_multiplied = df['result']*2)
Or if column result
is processing in code before is necessary lambda function for processing counted values in column result
:
df = df.assign(result_multiplied = lambda x: x['result']*2)
Sample for see difference column result_multiplied
is count by multiple original df['result']
, for result_multiplied1
is used multiplied column after mul(2)
:
df = df.mul(2).assign(result_multiplied = df['result']*2,
result_multiplied1 = lambda x: x['result']*2)
print (df)
index result result_multiplied result_multiplied1
0 AA 2 2 4
1 BB 4 4 8
2 CC 6 6 12
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.