
Extracting a string with PySpark

I have a column named targeting in a Spark DataFrame, with values like the following:

ab=px_d_1200;ab=9;ab=t_d_o_1000;artid=delish.recipe.46338;artid=delish_recipe_46338;avb=85;cat=recipes;role=3;sect=cooking
ab=px_d_1200;ab=8;ab=t_d_o_1000;apn=640x480_370;artid=delish.recipe.25860457;artid=delish_recipe_25860457;avb=90;cat=recipes;clc=chicken-breast-recipes;clc=insanely-easy-chicken-dinners;clc=weeknight-dinners;embedid=a6311e94-3b66-4712-8fca-eaa423e4e69a;gs_cat=response_check;gs_cat=gl_english;role=3;sect=cooking;sub=recipe-ideas;tool=recipe;urlhash=5425cac3a9c2959917d0634f5bd6d842

I need to extract role=X and store the value after the equals sign in a separate column. The desired output is:

role
3
3

This may be a workable solution for you.

Create the DataFrame:

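from pyspark.sql import functions as F
# `spark` is an existing SparkSession, e.g. the one provided by the pyspark shell
df = spark.createDataFrame([(1, "ab=px_d_1200;ab=9;ab=t_d_o_1000;artid=delish.recipe.46338;artid=delish_recipe_46338;avb=85;cat=recipes;role=3;sect=cooking")], ["col1", "col2"])
df.show(truncate=False)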
+----+--------------------------------------------------------------------------------------------------------------------------+
|col1|col2                                                                                                                      |
+----+--------------------------------------------------------------------------------------------------------------------------+
|1   |ab=px_d_1200;ab=9;ab=t_d_o_1000;artid=delish.recipe.46338;artid=delish_recipe_46338;avb=85;cat=recipes;role=3;sect=cooking|
+----+--------------------------------------------------------------------------------------------------------------------------+

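# Keep only rows that contain a role pair at all
df_new = df.filter(F.col("col2").contains("role="))
# Split on ";" and explode so that each key=value pair gets its own row
df_new = df_new.withColumn("split_col", F.explode(F.split(F.col("col2"), ";")))
# Keep only the role pair; startswith avoids other keys that merely contain "role"
df_new = df_new.filter(F.col("split_col").startswith("role="))
# Split the pair on "=" and take the last element as the value
df_new = df_new.withColumn("final_col", F.split(F.col("split_col"), "="))
df_new = df_new.withColumn("role", F.element_at(F.col("final_col"), -1))
df_new.show()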

+----+--------------------+---------+---------+----+
|col1|                col2|split_col|final_col|role|
+----+--------------------+---------+---------+----+
|   1|ab=px_d_1200;ab=9...|   role=3|[role, 3]|   3|
+----+--------------------+---------+---------+----+
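If the intermediate split_col and final_col columns are not needed, a single regular expression might be enough. The sketch below is not part of the answer above: the pattern and the helper name extract_role are assumptions made for illustration, and the logic is shown in plain Python so the pattern can be checked without a Spark session.

```python
import re

# Assumed pattern: "role=" must sit at the start of the string or right after ";",
# so keys that merely end in "role" are not matched. Group 1 is the value.
ROLE_PATTERN = r"(?:^|;)role=([^;]*)"


def extract_role(targeting):
    """Return the value of the first role=... pair, or None if there is none."""
    if targeting is None:
        return None
    match = re.search(ROLE_PATTERN, targeting)
    return match.group(1) if match else None


sample = (
    "ab=px_d_1200;ab=9;ab=t_d_o_1000;artid=delish.recipe.46338;"
    "artid=delish_recipe_46338;avb=85;cat=recipes;role=3;sect=cooking"
)
print(extract_role(sample))  # prints 3
```

In Spark, the same pattern could be passed to the built-in function, for example df.withColumn("role", F.regexp_extract(F.col("col2"), ROLE_PATTERN, 1)). Note that regexp_extract returns an empty string rather than null when nothing matches, so rows without a role pair are kept instead of being filtered out.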
