Create another DataFrame from an existing DataFrame with alias values in Spark SQL

I am using Spark 1.6 with Scala.

I have created a DataFrame that looks like this:

DATA
SKU, MAKE,    MODEL,    GROUP, SUBCLS, IDENT
IM,  AN4032X, ADH3M032, RM,    1011,   0
IM,  A3M4936, MP3M4936, RM,    1011,   0
IM,  AK116BC, 3M4936P,  05,    ABC,    0
IM,  A-116-B, 16ECAPS,  RM,    1011,   0

I am validating the data and want to capture every record that violates a rule in a new DataFrame.

Rules:

Column “GROUP” must contain only alphabetic characters
Column “SUBCLS” must be numeric
Column “IDENT” must be 0

The new DataFrame should look like this:

AUDIT TABLE

SKU, MAKE,       AUDIT_SKU, AUDIT_MAKE, AUDIT_MODEL, AUDIT_GRP, AUDIT_SUBCLS, Audit_IDENT
IM,  A-K12216BC, N,         N,          N,           Y,         Y,            N

Y means the rule was violated and N means the rule passed.

I have validated each rule with isnull or a regex. For example, to check the column GROUP:

df.where($"GROUP".rlike("^[A-Za-z]+$")).show

Could someone please help me do this in Spark SQL? Is it possible to create a DataFrame like the one described above?

Thanks

You can use rlike with | (regex alternation) inside when/otherwise:

scala> df.withColumn("Group1",when($"GROUP".rlike("^[\\d+]|[A-Za-z]\\d+"),"Y").otherwise("N")).withColumn("SUBCLS1",when($"SUBCLS".rlike("^[0-9]"),"N").otherwise("Y")).withColumn("IDENT1",when($"IDENT"==="0","N").otherwise("Y")).show()
+---+-------+--------+-----+------+-----+------+-------+------+
|SKU|   MAKE|   MODEL|GROUP|SUBCLS|IDENT|Group1|SUBCLS1|IDENT1|
+---+-------+--------+-----+------+-----+------+-------+------+
| IM|AN4032X|ADH3M032|   RM|  1011|    0|     N|      N|     N|
| IM|A3M4936|MP3M4936|  1RM|  1011|    0|     Y|      N|     N|
| IM|AK116BC| 3M4936P|   05|   ABC|    0|     Y|      Y|     N|
| IM|A-116-B| 16ECAPS|  RM1|  1011|    0|     Y|      N|     N|
+---+-------+--------+-----+------+-----+------+-------+------+
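To get closer to the audit layout in the question, the same checks can be written as one select in which every flag gets its alias directly. This is only a sketch: it assumes spark-shell on Spark 1.6 (where sqlContext is predefined), all columns read as strings, and the stricter anchored patterns ^[A-Za-z]+$ and ^[0-9]+$. The helper audit and the value names are made up for illustration, and the AUDIT_SKU, AUDIT_MAKE and AUDIT_MODEL columns are left out because the question gives no rules for them.

```scala
// Sketch for spark-shell on Spark 1.6; `sqlContext` is assumed to be predefined.
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, trim, when}
import sqlContext.implicits._

// Sample rows copied from the question; every column is a string here.
val df = Seq(
  ("IM", "AN4032X", "ADH3M032", "RM", "1011", "0"),
  ("IM", "A3M4936", "MP3M4936", "RM", "1011", "0"),
  ("IM", "AK116BC", "3M4936P", "05", "ABC", "0"),
  ("IM", "A-116-B", "16ECAPS", "RM", "1011", "0")
).toDF("SKU", "MAKE", "MODEL", "GROUP", "SUBCLS", "IDENT")

// Hypothetical helper: "N" when the rule passes, "Y" otherwise.
// A null input makes the condition null, so it falls through to "Y".
def audit(rulePasses: Column, name: String): Column =
  when(rulePasses, "N").otherwise("Y").alias(name)

val audited = df.select(
  col("SKU"),
  col("MAKE"),
  audit(trim(col("GROUP")).rlike("^[A-Za-z]+$"), "AUDIT_GRP"),
  audit(trim(col("SUBCLS")).rlike("^[0-9]+$"), "AUDIT_SUBCLS"),
  audit(trim(col("IDENT")) === "0", "Audit_IDENT")
)

// Keep only the records that violate at least one rule.
val violations = audited.where(
  col("AUDIT_GRP") === "Y" || col("AUDIT_SUBCLS") === "Y" || col("Audit_IDENT") === "Y"
)
violations.show()
```

With the sample rows from the question, only the AK116BC record (GROUP 05, SUBCLS ABC) should remain in violations. The trim calls are there in case the raw file keeps the spaces that follow each comma.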

In the withColumn example, the columns with the 1 suffix (Group1, SUBCLS1, IDENT1) are added only to make the result easy to follow; you can overwrite the original columns instead. Let me know if you need any more help with this.
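Since the question asks about Spark SQL specifically, the same rules can probably also be expressed as a query over a temporary table. The following is a sketch under assumptions rather than a tested recipe: it assumes spark-shell on Spark 1.6 with sqlContext predefined, the table name items is hypothetical, and GROUP is wrapped in backticks because it is a SQL keyword.

```scala
// Sketch for spark-shell on Spark 1.6; `sqlContext` is assumed to be predefined.
import sqlContext.implicits._

// Sample rows copied from the question; every column is a string here.
val items = Seq(
  ("IM", "AN4032X", "ADH3M032", "RM", "1011", "0"),
  ("IM", "A3M4936", "MP3M4936", "RM", "1011", "0"),
  ("IM", "AK116BC", "3M4936P", "05", "ABC", "0"),
  ("IM", "A-116-B", "16ECAPS", "RM", "1011", "0")
).toDF("SKU", "MAKE", "MODEL", "GROUP", "SUBCLS", "IDENT")

// Spark 1.6 API; "items" is a hypothetical table name.
items.registerTempTable("items")

// CASE WHEN ... AS gives each audit flag its alias inside the query itself.
val auditSql = sqlContext.sql("""
  SELECT SKU,
         MAKE,
         CASE WHEN `GROUP` RLIKE '^[A-Za-z]+$' THEN 'N' ELSE 'Y' END AS AUDIT_GRP,
         CASE WHEN SUBCLS RLIKE '^[0-9]+$' THEN 'N' ELSE 'Y' END AS AUDIT_SUBCLS,
         CASE WHEN IDENT = '0' THEN 'N' ELSE 'Y' END AS Audit_IDENT
  FROM items
""")
auditSql.show()
```

Null values make each WHEN condition null, so they fall into the ELSE branch and are reported as Y, which matches treating missing data as a violation.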
