How to rename a column name of a dataframe in PySpark?
I want to rename one column of a dataframe. Currently the column name is rate%year, and I want to rename it to rateyear in PySpark.
Possibly we could rename columns at the dataframe and table level after registering the dataframe as a table, but at the table level the "%" will create problems, so I want to rename at the dataframe level itself.
I tried this:
data.selectExpr("rate%year as rateyear")
but I get this error:
pyspark.sql.utils.AnalysisException: u"cannot resolve 'rate' given input columns
Thanks.
Try this:
sqlContext.registerDataFrameAsTable(data, "myTable")
data = sqlContext.sql("SELECT `rate%year` AS rateyear FROM myTable")
Note that the column name must be wrapped in backticks; otherwise Spark SQL parses rate%year as the modulo expression rate % year, which is exactly why the selectExpr attempt failed with "cannot resolve 'rate'".
I wrote an easy and fast function for you to remove % from column names. Enjoy! :)
def rename_cols(rename_df):
    # Rename every column, stripping '%' from its name
    for column in rename_df.columns:
        new_column = column.replace('%', '')
        rename_df = rename_df.withColumnRenamed(column, new_column)
    return rename_df
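The name transformation that rename_cols applies is just a per-column string replace, so it can be sketched without a Spark session (the id and price%usd columns are hypothetical, added for illustration):

```python
# Plain-Python sketch of the per-column rename performed by rename_cols;
# "id" and "price%usd" are made-up column names for illustration.
cols = ["rate%year", "id", "price%usd"]
new_cols = [c.replace("%", "") for c in cols]
print(new_cols)  # ['rateyear', 'id', 'priceusd']
```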
A possible way of renaming at the dataframe level:
from functools import reduce  # reduce is a builtin only in Python 2

oldColumns = ['rate%year']
newColumns = ['rateyear']
df1 = reduce(lambda df, idx: df.withColumnRenamed(oldColumns[idx], newColumns[idx]), range(len(oldColumns)), df)
This is working fine at the dataframe level. Any suggestion on how to resolve it at the table level?
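The reduce call above folds one withColumnRenamed per index over the dataframe. The same fold pattern can be sketched with a plain dict standing in for the dataframe's schema (the price%usd and id columns are hypothetical, added for illustration):

```python
from functools import reduce

old_cols = ["rate%year", "price%usd"]  # "price%usd" is a made-up column
new_cols = ["rateyear", "priceusd"]

# The accumulator is a dict of column names standing in for the dataframe;
# each fold step applies one rename, mirroring df.withColumnRenamed(...).
schema = {c: None for c in old_cols + ["id"]}

def rename_step(acc, idx):
    acc[new_cols[idx]] = acc.pop(old_cols[idx])
    return acc

result = reduce(rename_step, range(len(old_cols)), schema)
print(sorted(result))  # ['id', 'priceusd', 'rateyear']
```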
A simple and quick way to alter dataframe column names:
def format_col(df):
    # Build the cleaned name list, then rebuild the dataframe with toDF
    cols = [col.replace("%", "") for col in df.columns]
    res_df = df.toDF(*cols)
    return res_df
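toDF(*cols) assigns the new names positionally, so the comprehension must preserve the order of df.columns. A minimal sketch of the resulting old-to-new mapping (the id column is hypothetical, added for illustration):

```python
# toDF(*new) renames positionally, so zipping old and new names shows
# exactly which column gets which name; "id" is a made-up column.
old = ["rate%year", "id"]
new = [c.replace("%", "") for c in old]
print(list(zip(old, new)))  # [('rate%year', 'rateyear'), ('id', 'id')]
```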