I am trying to split a string in Scala and store the result in a DataFrame to use with Apache Spark. The strings I have are the following:
fromTo: NT=xxx_bt_bsns_m,OD=ntis,OS=wnd,SX=xs,SZ=ddp,
fromTo: NT=xds_bt2_bswns_m,OD=nis,OS=wnd,SX=xs,SZ=ddp,
fromTo: NT=xxa_bt1_b1ns_m,OD=nts,OS=nd,SX=xs,SZ=ddp
I just want to get the following substrings:
xxx_bt_bsns_m
xds_bt2_bswns_m
xxa_bt1_b1ns_m
and then store them in a DataFrame that shows something like:
+--------------------+
| Name |
+--------------------+
| xxx_bt_bsns_m |
| xds_bt2_bswns_m |
| xxa_bt1_b1ns_m |
+--------------------+
So do I have to extract every substring that starts with "NT=" and ends with a ",", maybe using a regex pattern, and then store the results in a DataFrame?
I am just starting with Scala, which is why I am unsure about this.
Thanks in advance!
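You can do this using a UDF:
import org.apache.spark.sql.functions.udf
val rgx = "^fromTo: NT=([a-zA-Z0-9_]+),(.*)".r
// Fall back to null for lines that do not match, instead of throwing a MatchError
val udfToExtract = udf { str: String => str match { case rgx(gr1, _) => gr1; case _ => null } }
Applied to the text column, for example with df.withColumn("textNew", udfToExtract($"text")), it gives: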
+-----------------------------------------------------+-------------+
|text |textNew |
+-----------------------------------------------------+-------------+
|fromTo: NT=xxx_bt_bsns_m,OD=ntis,OS=wnd,SX=xs,SZ=ddp,|xxx_bt_bsns_m|
+-----------------------------------------------------+-------------+
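To see what the pattern does before wiring it into Spark, here is a minimal plain-Scala sketch that runs without Spark. It uses a variant of the pattern above with a single capture group. The helper name extractName is hypothetical, and returning an Option is just one way to handle lines that do not match.

```scala
// Variant of the answer's pattern: one capture group holding the NT value.
val ntPattern = "^fromTo: NT=([a-zA-Z0-9_]+),.*".r

// Hypothetical helper: Some(name) when the line matches, None otherwise.
def extractName(line: String): Option[String] = line match {
  case ntPattern(name) => Some(name)
  case _               => None
}

// The three sample strings from the question.
val lines = Seq(
  "fromTo: NT=xxx_bt_bsns_m,OD=ntis,OS=wnd,SX=xs,SZ=ddp,",
  "fromTo: NT=xds_bt2_bswns_m,OD=nis,OS=wnd,SX=xs,SZ=ddp,",
  "fromTo: NT=xxa_bt1_b1ns_m,OD=nts,OS=nd,SX=xs,SZ=ddp"
)

// flatMap drops the None results, keeping only the extracted names.
val names = lines.flatMap(extractName)
```

Using a Regex in a match expression requires the whole string to match, which is why the pattern ends with .* to absorb the rest of the line.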
Or using regexp_extract:
df.select(regexp_extract($"text", "^fromTo: NT=([a-zA-Z0-9_]+),(.*)", 1).as("textNew")).show()
It also gives:
+-------------+
| textNew|
+-------------+
|xxx_bt_bsns_m|
+-------------+
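Putting the pieces together for all three strings from the question, a sketch written in spark-shell style might look like the following. The local master setting, the application name, and the column names text and Name are assumptions for illustration, and the pattern is shortened because regexp_extract only needs to find the prefix.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.regexp_extract

// Assumption: a local session for experimenting; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .appName("extract-nt")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Build a one-column DataFrame from the sample strings.
val df = Seq(
  "fromTo: NT=xxx_bt_bsns_m,OD=ntis,OS=wnd,SX=xs,SZ=ddp,",
  "fromTo: NT=xds_bt2_bswns_m,OD=nis,OS=wnd,SX=xs,SZ=ddp,",
  "fromTo: NT=xxa_bt1_b1ns_m,OD=nts,OS=nd,SX=xs,SZ=ddp"
).toDF("text")

// Keep only capture group 1 (the NT value) and name the column as in the question.
val names = df.select(
  regexp_extract($"text", "^fromTo: NT=([a-zA-Z0-9_]+),", 1).as("Name")
)

names.show(false)
```

Note that regexp_extract returns an empty string for rows where the pattern does not match, so those rows may need to be filtered out afterwards if the input is not uniform.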