How to extract records from Dstream and write into Cassandra (Spark Streaming)

I am pulling data from Kafka, processing it in Spark Streaming, and writing the results to Cassandra.

I am trying to filter the DStream records, but the filter is not applied and the complete set of records is written to Cassandra.

Any suggestion, with sample code, for filtering the records on multiple columns would be appreciated; I have researched this but could not find a solution.

class SparkKafkaConsumer1(val recordStream: org.apache.spark.streaming.dstream.DStream[String], val streaming: StreamingContext) {

  val internationalAddress = recordStream.map(line => line.split("\\|")(10).toUpperCase)

  def timeToStr(epochMillis: Long): String =
    DateTimeFormat.forPattern("YYYYMMddHHmmss").print(epochMillis)

  if (internationalAddress == "INDIA") {
    print("-----------------------------------------------")
    recordStream.print()
    val riskScore = "1"
    val timestamp: Long = System.currentTimeMillis
    val formatedTimeStamp = timeToStr(timestamp)
    var wc1 = recordStream.map(_.split("\\|")).map(r => Row(r(0),r(1),r(2),r(3),r(4).toInt,r(5).toInt,r(6).toInt,r(7),r(8),r(9),r(10),r(11),r(12),r(13),r(14),r(15),r(16),riskScore.toInt,0,0,0,formatedTimeStamp))
    implicit val rowWriter = SqlRowWriter.Factory
    wc1.saveToCassandra("fraud", "fraudrating", SomeColumns("purchasetimestamp","sessionid","productdetails","emailid","productprice","itemcount","totalprice","itemtype","luxaryitem","shippingaddress","country","bank","typeofcard","creditordebitcardnumber","contactdetails","multipleitem","ipaddress","consumer1score","consumer2score","consumer3score","consumer4score","recordedtimestamp"))
  }
}

(Note: I do have records with internationalAddress = INDIA in Kafka, and I am very new to Scala.)

I am not sure exactly what you are trying to do, but if you simply want to filter the records relating to India, you can do something like this:

implicit val rowWriter = SqlRowWriter.Factory
recordStream
   .filter(_.split("\\|")(10).toUpperCase == "INDIA")
   .map(_.split("\\|"))
   .map(r => Row(...))
   .saveToCassandra(...)
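
If you need to filter on more than one column at once (as the question mentions), the same pattern extends: split once up front and test several fields in one predicate. The second condition below (a threshold on totalprice) is purely an assumed example:

recordStream
   .map(_.split("\\|"))
   .filter(r => r(10).toUpperCase == "INDIA" && r(6).toInt > 1000) // country AND totalprice, for example
   // ...then the same .map(r => Row(...)) and .saveToCassandra(...) as above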

Incidentally, I think case classes would be really helpful for you here.
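
For example, here is a minimal sketch of how that could look, assuming a hypothetical FraudRecord case class whose fields match three of the columns in fraud.fraudrating (the choice of fields and column indices is illustrative only):

import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._ // enables saveToCassandra on DStreams

// Hypothetical case class; field names are chosen to match the Cassandra column names
case class FraudRecord(sessionid: String, country: String, consumer1score: Int)

recordStream
   .map(_.split("\\|"))
   .filter(r => r(10).toUpperCase == "INDIA")
   .map(r => FraudRecord(sessionid = r(1), country = r(10), consumer1score = 1))
   .saveToCassandra("fraud", "fraudrating",
      SomeColumns("sessionid", "country", "consumer1score"))

With a case class the implicit SqlRowWriter is no longer needed, because the connector derives the column mapping from the case class fields.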
