Rename columns with special characters in python or Pyspark dataframe
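I have a data frame in python/pyspark. The column names contain special characters such as dots (.), spaces, parentheses (()), and braces ({}).
Now I want to rename the columns so that dots and spaces are replaced with underscores, and () and {} are removed from the column names.
I have done this: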
df1 = df.toDF(*(re.sub(r'[\.\s]+', '_', c) for c in df.columns))
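With this I can replace the dots and spaces with underscores, but I cannot do the second part, i.e. remove () and {} from the column names when they are present.
How can we achieve this?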
Python 3.x solution:
tran_tab = str.maketrans({x:None for x in list('{()}')})
df1 = df.toDF(*(re.sub(r'[\.\s]+', '_', c).translate(tran_tab) for c in df.columns))
Python 2.x solution:
df1 = df.toDF(*(re.sub(r'[\.\s]+', '_', c).translate(None, '(){}') for c in df.columns))
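The two steps above (regex substitution, then character deletion) can also be wrapped in a small helper, which makes them easy to check without a Spark session. This is only a sketch for Python 3: the name `clean_column_name` and the sample list `columns` are made up for illustration, and the `toDF` call is left as a comment because it assumes an existing PySpark DataFrame `df`.

```python
import re

# Translation table that deletes parentheses and braces (Python 3 form).
_DROP = str.maketrans("", "", "(){}")


def clean_column_name(name):
    """Turn runs of dots/whitespace into '_' and strip ( ) { } characters."""
    return re.sub(r"[.\s]+", "_", name).translate(_DROP)


# Hypothetical sample names, matching the shape described in the question.
columns = [
    "col1.some valwith(in)and{around}",
    "col2.some valwith()and{}",
    "col3 some()valwith.and{}",
]
new_columns = [clean_column_name(c) for c in columns]
print(new_columns)

# With a PySpark DataFrame `df` (assumed to exist), the helper could be used as:
# df1 = df.toDF(*[clean_column_name(c) for c in df.columns])
```

Keeping the cleaning logic in a plain function means the same code can be reused for a pandas data frame, for example through `df.rename(columns=clean_column_name)`.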
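If you have a pyspark data frame, you can try the withColumnRenamed function to rename the columns. I tried it as shown below; have a look and customize it for your changes.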
>>> from pyspark.sql.types import StructType, StructField, StringType
>>> l=[('some value1','some value2','some value 3'),('some value4','some value5','some value 6')]
>>> l_schema = StructType([StructField("col1.some valwith(in)and{around}",StringType(),True),StructField("col2.some valwith()and{}",StringType(),True),StructField("col3 some()valwith.and{}",StringType(),True)])
>>> reps=('.','_'),(' ','_'),('(',''),(')',''),('{',''),('}','')
>>> rdd = sc.parallelize(l)
>>> df = sqlContext.createDataFrame(rdd,l_schema)
>>> df.printSchema()
root
|-- col1.some valwith(in)and{around}: string (nullable = true)
|-- col2.some valwith()and{}: string (nullable = true)
|-- col3 some()valwith.and{}: string (nullable = true)
>>> df.show()
+--------------------------------+------------------------+------------------------+
|col1.some valwith(in)and{around}|col2.some valwith()and{}|col3 some()valwith.and{}|
+--------------------------------+------------------------+------------------------+
|                     some value1|             some value2|            some value 3|
|                     some value4|             some value5|            some value 6|
+--------------------------------+------------------------+------------------------+
>>> from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2
>>> def colrename(x):
...     return reduce(lambda a,kv : a.replace(*kv),reps,x)
...
>>> for i in df.schema.names:
...     df = df.withColumnRenamed(i,colrename(i))
...
>>> df.printSchema()
root
|-- col1_some_valwithinandaround: string (nullable = true)
|-- col2_some_valwithand: string (nullable = true)
|-- col3_somevalwith_and: string (nullable = true)
>>> df.show()
+----------------------------+--------------------+--------------------+
|col1_some_valwithinandaround|col2_some_valwithand|col3_somevalwith_and|
+----------------------------+--------------------+--------------------+
|                 some value1|         some value2|        some value 3|
|                 some value4|         some value5|        some value 6|
+----------------------------+--------------------+--------------------+
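One thing to watch with either approach: once characters are replaced or stripped, two different original names can end up identical. The sketch below reuses the reduce-based `colrename` idea from this answer in plain Python and adds a check for such clashes before any renaming is done. The helper `find_collisions` and the sample `names` list are hypothetical, added only for illustration.

```python
from collections import Counter
from functools import reduce

# (old, new) replacement pairs, applied left to right.
REPS = ((".", "_"), (" ", "_"), ("(", ""), (")", ""), ("{", ""), ("}", ""))


def colrename(name, reps=REPS):
    """Apply every (old, new) replacement to the name in order."""
    return reduce(lambda acc, kv: acc.replace(*kv), reps, name)


def find_collisions(names):
    """Return the cleaned names that more than one original name maps to."""
    counts = Counter(colrename(n) for n in names)
    return sorted(n for n, c in counts.items() if c > 1)


# Hypothetical column names chosen so that two of them clash after cleaning.
names = ["a.b", "a b", "c(d)"]
print([colrename(n) for n in names])
print(find_collisions(names))
```

Note that `str.replace` handles each character separately, so a name such as `a. b` becomes `a__b` here, whereas the regex `[\.\s]+` used in the other answer collapses the run into a single underscore.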