StringBuilder to RDD

I have a StringBuilder (sb) in Scala IDE that holds the following data:

CellId,Date,Time,MeasType,MeasResult

251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.emergency,0

251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.highPriorityAccess,0

251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.mt-Access,4

Now I want to convert this string into an RDD using Scala. Please help me.

I am using this code, but with no luck. Thanks in advance.

 val headerFile = sc.parallelize(sb)
 headerFile.collect()

StringBuilder builds a string from a mutable sequence of characters, so whatever you add to the builder is appended into one single string. Passing sb directly to sc.parallelize therefore does not give you one element per line.

You need to separate the appended text back into individual strings so that it can be passed to the SparkContext as a list of strings.

Assuming each line was appended with a trailing line feed, you can split the builder's contents on the line feed and parallelize the result into an RDD:

val headerFile = sc.parallelize(sb.toString.split("\n"))
headerFile.collect()
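If the builder might contain blank lines or Windows-style line endings, the split can be made a little more tolerant. The following is only a sketch in plain Scala (no Spark needed to try it); the helper name toLines and the sample builder are hypothetical and not part of the original answer.

```scala
// Hypothetical helper: split a StringBuilder into clean, non-empty lines.
def toLines(sb: StringBuilder): Array[String] =
  sb.toString
    .split("\\r?\\n")    // accept both \n and \r\n line endings
    .map(_.trim)         // drop stray whitespace around each line
    .filter(_.nonEmpty)  // skip blank lines

// Small sample builder, assumed for illustration only.
val sample = new StringBuilder
sample.append("CellId,Date,Time,MeasType,MeasResult\n")
sample.append("\n") // a blank line that should be ignored
sample.append("251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.emergency,0\r\n")

val lines = toLines(sample)
```

With a SparkContext available, the result could then be passed on as sc.parallelize(toLines(sb)).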

To see the data, you have to print it or save it to a file.
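Both options can be sketched roughly as follows. This assumes Spark 2.x or later running in local mode; the application name, the sample rows, and the temporary output directory are placeholders chosen for illustration, not something the original answer specifies. It is written as a script (or for pasting into spark-shell).

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumed local session; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().appName("sb-to-rdd").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Sample builder with the data from the question.
val sb = new StringBuilder
sb.append("CellId,Date,Time,MeasType,MeasResult\n")
sb.append("251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.emergency,0\n")
sb.append("251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.highPriorityAccess,0\n")
sb.append("251498240,2016-12-02,20:45:00,RRC.ConnEstabAtt.mt-Access,4\n")

val headerFile = sc.parallelize(sb.toString.split("\n"))

// Option 1: bring the lines back to the driver and print them.
val collected = headerFile.collect()
collected.foreach(println)

// Option 2: write the lines out as text files.
// saveAsTextFile needs a path that does not exist yet, so a fresh
// subdirectory of a temporary directory is used here (placeholder location).
val outDir = Files.createTempDirectory("sb-rdd").resolve("out")
headerFile.saveAsTextFile(outDir.toString)

spark.stop()
```

collect() pulls every element to the driver, so printing this way is only sensible for small data such as the sample above.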

If you want to convert the data to a DataFrame before saving, you can do the following:

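import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
// Split the builder's contents into lines; the first line is the header
val data = sb.toString.split("\n")
// Build an all-string schema from the header columns
val schema = StructType(data.head.split(",").map(StructField(_, StringType, true)))
// Turn every remaining line into a Row
val rdd = sc.parallelize(data.tail.map(line => Row.fromSeq(line.split(","))))
spark.createDataFrame(rdd, schema).show(false)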

This should give you:

+---------+----------+--------+-----------------------------------+----------+
|CellId   |Date      |Time    |MeasType                           |MeasResult|
+---------+----------+--------+-----------------------------------+----------+
|251498240|2016-12-02|20:45:00|RRC.ConnEstabAtt.emergency         |0         |
|251498240|2016-12-02|20:45:00|RRC.ConnEstabAtt.highPriorityAccess|0         |
|251498240|2016-12-02|20:45:00|RRC.ConnEstabAtt.mt-Access         |4         |
+---------+----------+--------+-----------------------------------+----------+
