
Upload Spark RDD to REST webservice POST method

Frankly, I'm not sure whether this feature even exists; sorry if it doesn't.

My requirement is to send Spark-analysed data to a file server on a daily basis. The file server supports file transfer through SFTP and through a REST webservice POST call.

My initial thought was to save the Spark RDD to HDFS and transfer it to the file server through SFTP. I would like to know whether it is possible to upload the RDD directly by calling the REST service from the Spark driver class, without saving to HDFS first. The size of the data is less than 2 MB.

Sorry for my bad English!

There is no Spark-specific way to do that. With data of that size it is not worth going through HDFS or another type of storage. You can collect the data into your driver's memory and send it directly. For a POST call you can just use plain old java.net.URL, which would look something like this:

import java.net.{URL, HttpURLConnection}

// The RDD you want to send
val rdd = ???

// Gather the data on the driver and turn it into a newline-separated string
val body = rdd.collect.mkString("\n")

// Open a connection
val url = new URL("http://www.example.com/resource")
val conn = url.openConnection.asInstanceOf[HttpURLConnection]

// Configure for a POST request
conn.setDoOutput(true)
conn.setRequestMethod("POST")

// Write the request body
val os = conn.getOutputStream
os.write(body.getBytes("UTF-8"))
os.flush()
os.close()

// Reading the response code forces the request to complete
val status = conn.getResponseCode

A much more complete discussion of using java.net.URL can be found at this question. You could also use a Scala library to handle the ugly Java stuff for you, like akka-http or Dispatch.

Spark itself does not provide this functionality (it is not a general-purpose HTTP client). You might consider using an existing REST client library such as akka-http, spray, or some other Java/Scala client library.
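As a rough sketch of the library route (assuming akka-http 10.2+ is on the classpath; the URL and payload here are placeholders, and in practice the body would come from rdd.collect.mkString("\n")), a client-side POST could look like:

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._

implicit val system: ActorSystem = ActorSystem("uploader")
import system.dispatcher

// Placeholder payload; replace with the collected RDD contents
val body = "line1\nline2"

val request = HttpRequest(
  method = HttpMethods.POST,
  uri    = "http://www.example.com/resource",
  entity = HttpEntity(ContentTypes.`text/plain(UTF-8)`, body)
)

// Fire the request asynchronously and log the response status
Http().singleRequest(request).foreach { response =>
  println(response.status)
  response.discardEntityBytes()
  system.terminate()
}
```

The main advantage over raw java.net.URL is that the request runs asynchronously and the entity/content-type handling is taken care of for you.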

That said, you are by no means obliged to save your data to disk before operating on it. You could, for example, use the collect() or foreach methods on your RDD in combination with your REST client library.
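A minimal sketch of the foreach-based variant with plain java.net (the endpoint name and helper are hypothetical; note this issues one POST per partition from the executors, so for a single small daily file the collect-on-the-driver approach above is usually simpler):

```scala
import java.net.{URL, HttpURLConnection}
import org.apache.spark.rdd.RDD

// Hypothetical helper: POST one partition's records as newline-separated text
def postPartition(endpoint: String)(records: Iterator[String]): Unit = {
  val conn = new URL(endpoint).openConnection.asInstanceOf[HttpURLConnection]
  conn.setDoOutput(true)
  conn.setRequestMethod("POST")
  val os = conn.getOutputStream
  records.foreach(r => os.write((r + "\n").getBytes("UTF-8")))
  os.close()
  // Reading the response code forces the request to complete
  if (conn.getResponseCode >= 400)
    throw new RuntimeException(s"POST failed with HTTP ${conn.getResponseCode}")
}

// Runs on the executors, one request per partition
def upload(rdd: RDD[String], endpoint: String): Unit =
  rdd.foreachPartition(postPartition(endpoint))
```

This avoids pulling everything through the driver, but only makes sense if the file server can accept multiple POSTs per day.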
