
How to rename a column of a DataFrame in PySpark?

I want to rename one column of a DataFrame. The column is currently named rate%year, and I want to rename it to rateyear in PySpark.

We could rename the column either at the DataFrame level or at the table level (after registering the DataFrame as a table), but at the table level the "%" causes problems, so I want to do the rename at the DataFrame level itself.

I tried this:

data.selectExpr("rate%year as rateyear")

but I get this error:

pyspark.sql.utils.AnalysisException: u"cannot resolve 'rate' given input columns

Thanks.
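Why the selectExpr call fails: selectExpr parses each string as a Spark SQL expression, and in Spark SQL "%" is the modulo operator. So "rate%year" is read as rate % year, a computation over two columns named rate and year, which do not exist; hence "cannot resolve 'rate'". Wrapping the name in backticks makes Spark treat it as a single identifier. The sketch below is an illustration, not part of the original thread; quote_ident, rename_exprs and rename_with_select_expr are hypothetical helper names. It also keeps every other column, since selectExpr only returns the columns you list.

```python
def quote_ident(name):
    # Backtick-quote a column name so Spark SQL treats it as one identifier.
    # A literal backtick inside the name is escaped by doubling it.
    return "`" + name.replace("`", "``") + "`"


def rename_exprs(columns, old, new):
    # Build one selectExpr() string per column: every column is kept,
    # and only the column named `old` receives an alias.
    exprs = []
    for c in columns:
        if c == old:
            exprs.append(f"{quote_ident(c)} AS {quote_ident(new)}")
        else:
            exprs.append(quote_ident(c))
    return exprs


def rename_with_select_expr(df, old, new):
    # df is assumed to be a pyspark.sql.DataFrame.
    return df.selectExpr(*rename_exprs(df.columns, old, new))
```

Usage, assuming data is the DataFrame from the question: data = rename_with_select_expr(data, "rate%year", "rateyear"). If only the renamed column is needed, the one-liner is data.selectExpr("`rate%year` AS rateyear"). withColumnRenamed, by contrast, does not parse its arguments as SQL, so a name like rate%year needs no quoting there.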

Try this:

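# Backticks make Spark SQL treat rate%year as one column name;
# unquoted, % is parsed as the modulo operator.
sqlContext.registerDataFrameAsTable(data, "myTable")
data = sqlContext.sql("SELECT `rate%year` AS rateyear FROM myTable")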
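On Spark 2.0 and later, the same table-level approach is usually written with a SparkSession: df.createOrReplaceTempView(name) registers the view and spark.sql(query) runs the query. As with selectExpr, the column name must be backtick-quoted in the SQL text, and a SELECT that lists only the renamed column drops all the others. The sketch below builds a query that keeps every column; build_rename_query, rename_via_sql and the default view name "my_table" are hypothetical.

```python
def quote_ident(name):
    # Backtick-quote an identifier for Spark SQL; inner backticks are doubled.
    return "`" + name.replace("`", "``") + "`"


def build_rename_query(columns, old, new, view_name):
    # Keep every column in its original order, aliasing only `old` to `new`.
    select_list = ", ".join(
        f"{quote_ident(c)} AS {quote_ident(new)}" if c == old else quote_ident(c)
        for c in columns
    )
    return f"SELECT {select_list} FROM {quote_ident(view_name)}"


def rename_via_sql(spark, df, old, new, view_name="my_table"):
    # spark: a pyspark.sql.SparkSession; df: a pyspark.sql.DataFrame.
    # Choose a view name that does not clash with an existing temp view.
    df.createOrReplaceTempView(view_name)
    return spark.sql(build_rename_query(df.columns, old, new, view_name))
```

For the question's data, build_rename_query(["id", "rate%year"], "rate%year", "rateyear", "my_table") yields SELECT `id`, `rate%year` AS `rateyear` FROM `my_table` (the id column is only an example).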

I wrote a simple, fast function that removes % from column names. Enjoy! :)

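def rename_cols(rename_df):
    # Strip every "%" from each column name. withColumnRenamed returns a new
    # DataFrame, so the result is reassigned on each iteration.
    for column in rename_df.columns:
        new_column = column.replace('%', '')
        rename_df = rename_df.withColumnRenamed(column, new_column)
    return rename_df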
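rename_cols only strips "%". If other characters (spaces, dashes, dots) also cause trouble, one possible generalization, not from the original answers, keeps only ASCII letters, digits and underscores, and refuses to continue if two columns would end up with the same (or an empty) name. sanitize_name and sanitize_columns are hypothetical helper names, and the allowed character set is an assumption you may want to change.

```python
import re


def sanitize_name(name):
    # Assumption: only ASCII letters, digits and "_" are kept; all else is dropped.
    return re.sub(r"[^0-9A-Za-z_]", "", name)


def sanitize_columns(df):
    # df is assumed to be a pyspark.sql.DataFrame.
    old_names = list(df.columns)
    new_names = [sanitize_name(c) for c in old_names]
    # Different names can collapse to the same result (e.g. "a%b" and "ab"),
    # and a name made only of special characters becomes "". Fail loudly.
    if "" in new_names or len(set(new_names)) != len(new_names):
        raise ValueError(f"sanitizing would give empty or duplicate names: {new_names}")
    for old, new in zip(old_names, new_names):
        if old != new:
            df = df.withColumnRenamed(old, new)
    return df
```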

A possible way to rename at the DataFrame level:

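from functools import reduce  # reduce is not a builtin in Python 3

oldColumns = ['rate%year']
newColumns = ['rateyear']
# Apply withColumnRenamed once per (old, new) pair, starting from df.
df1 = reduce(lambda df, idx: df.withColumnRenamed(oldColumns[idx], newColumns[idx]), range(len(oldColumns)), df)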

This works fine at the DataFrame level. Any suggestions on how to resolve it at the table level?
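Instead of reducing over indices into two parallel lists, the lists can be zipped directly. One detail worth guarding against: withColumnRenamed silently does nothing when the old name is not in the schema, so a typo in oldColumns would go unnoticed. The sketch below checks for that and for mismatched list lengths; rename_columns is a hypothetical name, and names are checked against the original schema only. Newer PySpark releases (3.4 and later) also offer DataFrame.withColumnsRenamed, which takes a dict of old-to-new names in one call; the sketch sticks to withColumnRenamed so it also runs on older versions.

```python
def rename_columns(df, old_columns, new_columns):
    # df is assumed to be a pyspark.sql.DataFrame.
    if len(old_columns) != len(new_columns):
        raise ValueError("old_columns and new_columns must have the same length")
    # withColumnRenamed is a no-op for unknown names, so check them up front.
    missing = [c for c in old_columns if c not in df.columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    for old, new in zip(old_columns, new_columns):
        df = df.withColumnRenamed(old, new)
    return df
```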

A simple, quick way to change DataFrame column names:

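def format_col(df):
    # toDF() replaces all column names positionally, so pass one new name
    # per existing column, in the same order.
    cols = [col.replace("%", "") for col in df.columns]
    res_df = df.toDF(*cols)
    return res_df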
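Because toDF() needs the complete list of names in column order, renaming only a few columns with it means rebuilding that list. A dict can drive this: look up each current name and fall back to the name itself when it is not in the mapping. rename_with_mapping and its mapping argument are hypothetical names used for illustration.

```python
def rename_with_mapping(df, mapping):
    # df is assumed to be a pyspark.sql.DataFrame; mapping is {old: new}.
    # Names not in mapping are kept unchanged, and the original order is preserved.
    new_names = [mapping.get(c, c) for c in df.columns]
    return df.toDF(*new_names)
```

For the question's case: rename_with_mapping(data, {"rate%year": "rateyear"}).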

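Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.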
