Create a Spark RDD or DataFrame from Strings coming from an InputStream in Java
I have a stream of Strings in Java that comes from a CSV file on another machine. I create an InputStream and read the CSV file line by line through a BufferedReader, as follows.
// Call a method that returns the InputStream of the remote file
InputStream stream = getInputStreamOfFile();
BufferedReader lineStream = new BufferedReader(new InputStreamReader(stream));
String inputLine;
while ((inputLine = lineStream.readLine()) != null) {
    System.out.println("******************new Line***********");
    System.out.println(inputLine);
}
lineStream.close();
stream.close();
Now I want to create a Spark RDD or DataFrame from this. One solution is to create a new RDD for each line, maintain a global RDD, and keep taking the union of the two. Is there any other solution?
Note: the file is not on the same machine. It comes from remote storage, and I do have the HTTP URL of the file.
If the contents of the InputStream fit in memory, we can use the following:
private static List<String> displayTextInputStream(InputStream input) throws IOException {
    // Read the text input stream one line at a time and collect each line.
    BufferedReader reader = new BufferedReader(new InputStreamReader(input));
    String line = null;
    List<String> result = new ArrayList<String>();
    while ((line = reader.readLine()) != null) {
        result.add(line);
    }
    return result;
}
Now we can convert the List<String> to the corresponding RDD.
S3Object fullObject = s3Client.getObject(new GetObjectRequest("bigdataanalytics", each.getKey()));
List<String> listVals = displayTextInputStream(fullObject.getObjectContent());
JavaRDD<String> s3Rdd = sc.parallelize(listVals);
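Since the question mentions an HTTP URL rather than S3, here is a minimal sketch of the same idea using only the JDK. The class name RemoteLines and both method names are hypothetical, UTF-8 is assumed as the file encoding, and the whole file is still assumed to fit in driver memory. Unlike the helper above, this version closes the reader once all lines have been read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class; the names are illustrative, not from the original post.
final class RemoteLines {

    // Reads every line of the stream into memory. try-with-resources closes
    // the reader (and with it the underlying stream) when reading finishes.
    // Assumption: the content is UTF-8 text.
    public static List<String> readLines(InputStream input) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(input, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    // Opens the file's HTTP URL and delegates to readLines(InputStream).
    public static List<String> readLinesFromUrl(String fileUrl) throws IOException {
        return readLines(URI.create(fileUrl).toURL().openStream());
    }
}
```

The returned list can then be passed to sc.parallelize(...) in the same way as listVals above.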