
Iterate through each column and find the max length

I want to get the maximum length of the values in each column of a PySpark dataframe.

Here is a sample dataframe:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

data2 = [("James", "", "Smith", "36636", "M", 3000),
    ("Michael", "Rose", "", "40288", "M", 4000),
    ("Robert", "", "Williams", "42114", "M", 4000),
    ("Maria", "Anne", "Jones", "39192", "F", 4000),
    ("Jen", "Mary", "Brown", "", "F", -1)
  ]

schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True)
  ])

df = spark.createDataFrame(data=data2, schema=schema)
df.show(truncate=False)

I tried to implement the solution available in Scala, but I could not translate it: Scala solution

I am new to Python. Could you help me?

This works:

from pyspark.sql.functions import col, length, max

# For every column, compute the maximum length of its values across all rows.
df = df.select([max(length(col(name))) for name in df.schema.names])
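For intuition, the same per-column max-length logic can be sketched in plain Python against the sample rows above, without a Spark session. Note that `str()` here stands in for the implicit cast Spark applies when `length` is called on the non-string `salary` column; this is a sanity-check sketch, not the PySpark answer itself:

```python
# Plain-Python equivalent of max(length(col(name))) over the sample data.
data2 = [
    ("James", "", "Smith", "36636", "M", 3000),
    ("Michael", "Rose", "", "40288", "M", 4000),
    ("Robert", "", "Williams", "42114", "M", 4000),
    ("Maria", "Anne", "Jones", "39192", "F", 4000),
    ("Jen", "Mary", "Brown", "", "F", -1),
]
names = ["firstname", "middlename", "lastname", "id", "gender", "salary"]

# str() mimics the cast-to-string that Spark performs before length().
max_lengths = {
    name: max(len(str(row[i])) for row in data2)
    for i, name in enumerate(names)
}
print(max_lengths)
```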


Disclaimer: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For any questions, contact yoyou2525@163.com.

粵ICP備18138465號  © 2020-2024 STACKOOM.COM