String extract with PySpark
I have a column named targeting in a Spark DataFrame. The values look like this:
ab=px_d_1200;ab=9;ab=t_d_o_1000;artid=delish.recipe.46338;artid=delish_recipe_46338;avb=85;cat=recipes;role=3;sect=cooking
ab=px_d_1200;ab=8;ab=t_d_o_1000;apn=640x480_370;artid=delish.recipe.25860457;artid=delish_recipe_25860457;avb=90;cat=recipes;clc=chicken-breast-recipes;clc=insanely-easy-chicken-dinners;clc=weeknight-dinners;embedid=a6311e94-3b66-4712-8fca-eaa423e4e69a;gs_cat=response_check;gs_cat=gl_english;role=3;sect=cooking;sub=recipe-ideas;tool=recipe;urlhash=5425cac3a9c2959917d0634f5bd6d842
I need to extract role=X. I also need to save the value after the equals sign in a separate column. The desired output is:
role
3
3
This may be a workable solution for you.
Create the DataFrame here:
# Assumes an active SparkSession named spark
from pyspark.sql import functions as F
df = spark.createDataFrame([(1,"ab=px_d_1200;ab=9;ab=t_d_o_1000;artid=delish.recipe.46338;artid=delish_recipe_46338;avb=85;cat=recipes;role=3;sect=cooking")],[ "col1","col2"])
df.show(truncate=False)
+----+--------------------------------------------------------------------------------------------------------------------------+
|col1|col2 |
+----+--------------------------------------------------------------------------------------------------------------------------+
|1 |ab=px_d_1200;ab=9;ab=t_d_o_1000;artid=delish.recipe.46338;artid=delish_recipe_46338;avb=85;cat=recipes;role=3;sect=cooking|
+----+--------------------------------------------------------------------------------------------------------------------------+
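# Keep only rows whose string contains a role key
df_new = df.filter(F.col("col2").contains("role="))
# Produce one row per key=value pair
df_new = df_new.withColumn("split_col", F.explode(F.split(F.col("col2"), ";")))
# Keep only the role pair itself (startswith avoids matching other keys that merely contain "role")
df_new = df_new.filter(F.col("split_col").startswith("role="))
# Split the pair into [key, value] and take the last element as the role
df_new = df_new.withColumn("final_col", F.split(F.col("split_col"), "="))
df_new = df_new.withColumn("role", F.element_at(F.col("final_col"), -1))
df_new.show()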
+----+--------------------+---------+---------+----+
|col1| col2|split_col|final_col|role|
+----+--------------------+---------+---------+----+
| 1|ab=px_d_1200;ab=9...| role=3|[role, 3]| 3|
+----+--------------------+---------+---------+----+
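If the intermediate split_col and final_col columns are not needed, a single regular expression may be enough. The following is only a sketch, not part of the answer above: ROLE_PATTERN, extract_role and add_role_column are hypothetical names introduced for illustration, and the column name col2 is taken from the example DataFrame. The plain-Python helper is there only to check the pattern without a Spark session; the Spark version relies on F.regexp_extract, which returns an empty string when the pattern does not match, so rows without a role key are kept rather than filtered out.

```python
import re

# Matches role=<value> only at the start of the string or right after a ';',
# so a key that merely ends in "role" (a hypothetical "xrole=1") is skipped.
ROLE_PATTERN = r"(?:^|;)role=([^;]*)"


def extract_role(targeting):
    """Plain-Python check of the pattern; returns '' when there is no role key."""
    match = re.search(ROLE_PATTERN, targeting)
    return match.group(1) if match else ""


def add_role_column(df, source_col="col2"):
    """Add a 'role' column to a Spark DataFrame using the same pattern."""
    # Imported lazily so extract_role above can be tried without Spark installed
    from pyspark.sql import functions as F

    return df.withColumn("role", F.regexp_extract(F.col(source_col), ROLE_PATTERN, 1))


sample = "ab=px_d_1200;ab=9;avb=85;cat=recipes;role=3;sect=cooking"
print(extract_role(sample))  # prints 3
```

With the DataFrame from the answer, add_role_column(df).select("role").show() would be the corresponding Spark call. The extracted value is a string; cast it if a numeric column is needed.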