case statement with regex in pyspark sql
I have the pyspark code below. In it, I create a dataframe from another dataframe that has been registered as a temp view, and then use a sql query to build a new field in the final query. The expression for the field I'm trying to create originally comes from postgresql, and I'd like to know the correct pyspark sql equivalent of the case statement and the regular expression:
case when a.field2::varchar ~ '^[0-9]+$' then a.field2::varchar else '0' end
Do I just cast(field2 as string)?
And what is the correct pyspark sql version of the regex test?
Code:
import datetime
import math
import sys
import time
import traceback

from pyspark import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import SparkSession, SQLContext, Window
from pyspark.sql.types import *
from pyspark.sql.functions import (
    UserDefinedFunction, coalesce, col, date_add, date_format, date_trunc,
    datediff, dayofmonth, dayofyear, first, format_number, hour, lag, length,
    lit, min, month, substring, to_date, trim, udf, unix_timestamp, upper,
    weekofyear, when, year,
)
table_df.createOrReplaceTempView("table")
query="""select
case when a.field2::varchar ~ '^[0-9]+$' then a.field2::varchar else '0' end as field1
from table a"""
df=spark.sql(query)
You can try the following. Spark SQL uses `rlike` for regex matching instead of Postgres's `~` operator, and the `::varchar` casts can be written as `cast(field2 as string)` if you need them to be explicit:
query = """
select
case when a.field2 rlike '^[0-9]+$'
then a.field2
else '0'
end as field1
from table a
"""
df = spark.sql(query)