
Iterate through each column and find the max length

I want to get the maximum length of each column of a PySpark dataframe.

Following is the sample dataframe:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Get or create a SparkSession so the snippet runs outside the pyspark shell
spark = SparkSession.builder.getOrCreate()

data2 = [("James", "", "Smith", "36636", "M", 3000),
         ("Michael", "Rose", "", "40288", "M", 4000),
         ("Robert", "", "Williams", "42114", "M", 4000),
         ("Maria", "Anne", "Jones", "39192", "F", 4000),
         ("Jen", "Mary", "Brown", "", "F", -1)]

schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True)
])

df = spark.createDataFrame(data=data2, schema=schema)
df.show(truncate=False)

I tried to implement the solution provided in Scala but could not convert it.

Solution in Scala

I am fairly new to Python, can you please help me out?

This would work

from pyspark.sql.functions import col, length, max

# Build one max(length(...)) aggregate per column; the result is a single-row
# dataframe holding the maximum string length of each column.
# Note: max here is pyspark.sql.functions.max, not Python's built-in max.
df = df.select([max(length(col(name))) for name in df.schema.names])

Result:
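With the sample data above, this yields a single row of per-column maxima (for example 7 for firstname, since "Michael" is the longest value), but the auto-generated column names such as max(length(firstname)) are hard to read. A minimal sketch of one possible variation, where the F alias, the max_lengths name, and the explicit cast to string are my own choices rather than part of the original answer, aliases each aggregate and collects the row into a plain Python dict:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Same sample data as in the question, rebuilt here so the snippet runs on its own.
data2 = [("James", "", "Smith", "36636", "M", 3000),
         ("Michael", "Rose", "", "40288", "M", 4000),
         ("Robert", "", "Williams", "42114", "M", 4000),
         ("Maria", "Anne", "Jones", "39192", "F", 4000),
         ("Jen", "Mary", "Brown", "", "F", -1)]
columns = ["firstname", "middlename", "lastname", "id", "gender", "salary"]
df = spark.createDataFrame(data2, columns)

# Cast each column to string so non-string columns (e.g. the integer salary)
# are measured on their string form, then alias every aggregate after its
# source column to keep the result names readable.
max_lengths = df.select(
    [F.max(F.length(F.col(c).cast("string"))).alias(c) for c in df.columns]
)

# first() returns the single Row of maxima; asDict() gives {column: max_length}.
print(max_lengths.first().asDict())
# e.g. {'firstname': 7, 'middlename': 4, 'lastname': 8, 'id': 5, 'gender': 1, 'salary': 4}

Aliasing each aggregate with .alias(c) keeps the original column name on the result, which makes the collected dict straightforward to use downstream.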
