
Assign a new column in pandas in a similar way as in pyspark

I have the following dataframe:

import pandas as pd
df = pd.DataFrame([['A', 1], ['B', 2], ['C', 3]], columns=['index', 'result'])

index  result
A      1
B      2
C      3

I would like to create a new column, for example by multiplying the column 'result' by two, and I am curious whether pandas offers a way to do this in the same style as pyspark.

In pyspark:
from pyspark.sql import functions as F
df = df\
    .withColumn("result_multiplied", F.col("result")*2)

I don't like having to write the name of the dataframe every time I perform an operation, as is usual in pandas:

In pandas:
df['result_multiplied'] = df['result']*2

Use DataFrame.assign:

df = df.assign(result_multiplied = df['result']*2)

Or, if the column result is modified earlier in the same chain, a lambda function is necessary so that the new column is computed from the already processed values of result:

df = df.assign(result_multiplied = lambda x: x['result']*2)
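To mirror the chained withColumn style from the question, several assign calls can be chained, with each lambda receiving the intermediate DataFrame. This is only a minimal sketch: the variable name out and the extra column result_plus_one are hypothetical and not part of the original question.

```python
import pandas as pd

df = pd.DataFrame([['A', 1], ['B', 2], ['C', 3]], columns=['index', 'result'])

# Each lambda receives the DataFrame as it exists at that point in the chain,
# so the name `df` never has to be repeated on the right-hand side.
out = (
    df
    .assign(result_multiplied=lambda x: x['result'] * 2)
    # Hypothetical second step, built on the column created just above.
    .assign(result_plus_one=lambda x: x['result_multiplied'] + 1)
)
print(out)
```

assign returns a new DataFrame and leaves df unchanged, so the result has to be bound to a name (or reassigned to df) to be kept.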

A sample to see the difference: result_multiplied is computed from the original df['result'], while result_multiplied1 is computed from the column as it is after mul(2):

df = df.mul(2).assign(result_multiplied = df['result']*2,
                      result_multiplied1 = lambda x: x['result']*2)
print (df)
  index  result  result_multiplied  result_multiplied1
0    AA       2                  2                   4
1    BB       4                  4                   8
2    CC       6                  6                  12
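The difference comes from when each argument is evaluated. An expression such as df['result']*2 is computed from the original df before assign is even called, whereas a lambda is called by assign with the DataFrame it is extending. The sketch below isolates that behaviour on a simplified, numeric-only frame; the names doubled, eager and deferred are illustrative assumptions.

```python
import pandas as pd

# Simplified numeric-only frame (the 'index' column is left out here).
df = pd.DataFrame({'result': [1, 2, 3]})

doubled = df.mul(2)  # 'result' is now 2, 4, 6

# Eager: the right-hand side is built from the original `df` before assign runs.
eager = doubled.assign(result_multiplied=df['result'] * 2)

# Deferred: assign calls the lambda with `doubled`, the frame being extended.
deferred = doubled.assign(result_multiplied=lambda x: x['result'] * 2)

print(eager['result_multiplied'].tolist())     # [2, 4, 6]
print(deferred['result_multiplied'].tolist())  # [4, 8, 12]
```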


 