
How do I convert the value of a pyspark dataframe column?

I have a column in a pyspark dataframe for the age of an electronic device, and these values are given in milliseconds. Is there an easy way to convert that column's values to years? I am not well versed in Spark.

EDIT: I understand that milliseconds can be converted to years with basic math; what I'm trying to do is take a column of a pyspark dataframe, iterate through it, and convert every value in that column. Is there a specific pyspark function that makes this easier? I have a column whose values are all very large integers representing time in milliseconds, and I'm trying to filter out values that are too small or too large to make sense given the lifespan of the device.

table.filter(F.col("age")>0).filter(F.col("age")<yearsToSeconds(20))

where yearsToSeconds is a very basic function converting years to seconds. I'd prefer to convert the column values to years directly, but I haven't worked with Spark before and don't know the optimal way to do that.
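Since the column stores milliseconds, the comparison bound needs to be in milliseconds as well. A minimal sketch of such a helper (the name yearsToMillis and the 365-day year are assumptions for illustration; the Spark filter is shown commented out):

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365  # approximate: ignores leap years

def yearsToMillis(years):
    """Convert years to milliseconds, for comparison against a ms-valued column."""
    return years * MS_PER_YEAR

# With a DataFrame `table` and `from pyspark.sql import functions as F`:
# filtered = table.filter(F.col("age") > 0).filter(F.col("age") < yearsToMillis(20))
print(yearsToMillis(20))
```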

Well, one way is to use withColumn.

Here I'm demonstrating adding a new column called "ageinMin" to the dataframe, calculated from the "age" column by dividing by 60,000 (1,000 ms per second × 60 seconds per minute) to get the equivalent minutes:

from pyspark.sql.functions import col

df = df.withColumn("ageinMin", col("age") / 60000)  # 60,000 ms per minute
