java.lang.IllegalArgumentException: 'path' is not specified // Spark Consumer Issue
I am trying to create a Spark consumer so that I can send messages, in this case a CSV file, to Kafka through Spark Streaming. But I get an error saying that 'path' is not specified.
My code is as follows:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.streaming.FileStreamSource.Timestamp
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.streaming.OutputMode

object sparkConsumer extends App {
  val conf = new SparkConf().setMaster("local").setAppName("Name")
  val sc = new SparkContext(conf)
  val rootLogger = Logger.getRootLogger()
  rootLogger.setLevel(Level.ERROR)

  val spark = SparkSession
    .builder()
    .appName("Spark-Kafka-Integration")
    .master("local")
    .getOrCreate()

  val schema = StructType(Array(
    StructField("InvoiceNo", StringType, nullable = true),
    StructField("StockCode", StringType, nullable = true),
    StructField("Description", StringType, nullable = true),
    StructField("Quantity", StringType, nullable = true)
  ))

  val streamingDataFrame = spark.readStream.schema(schema).csv("C:/Users/me/Desktop/Tasks/Tasks1/test.csv")

  streamingDataFrame.selectExpr("CAST(InvoiceNo AS STRING) AS key", "to_json(struct(*)) AS value").
    writeStream
    .format("csv")
    .option("topic", "topic_test")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("checkpointLocation", "C:/Users/me/IdeaProjects/SparkStreaming/checkpointLocation/")
    .start()

  import spark.implicits._

  val df = spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "topic_test")
    .load()

  val df1 = df.selectExpr("CAST(value AS STRING)", "CAST(timestamp AS TIMESTAMP)").as[(String, Timestamp)]
    .select(from_json($"value", schema).as("data"), $"timestamp")
    .select("data.*", "timestamp")

  df1.writeStream
    .format("console")
    .option("truncate","false")
    .outputMode(OutputMode.Append)
    .start()
    .awaitTermination()
}
I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: 'path' is not specified
Does anyone know what I am missing?
The problem seems to be in this part of your code:
  streamingDataFrame.selectExpr("CAST(InvoiceNo AS STRING) AS key", "to_json(struct(*)) AS value").
    writeStream
    .format("csv")
    .option("topic", "topic_test")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("checkpointLocation", "C:/Users/me/IdeaProjects/SparkStreaming/checkpointLocation/")
    .start()
because you use the "csv" format but don't set the output path that this file sink requires, which is what the 'path' is not specified message refers to. Instead, you set Kafka options (the topic and the bootstrap servers), which suggests that you want a Kafka topic as your sink. So if you change the format to "kafka", it should work.
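As a rough sketch of that change, the write side could look like the following. It reuses the schema, topic, broker address, and paths from the question; the object name CsvToKafka is made up for illustration, and it assumes the spark-sql-kafka-0-10 connector matching your Spark version is on the classpath and that a broker is reachable at localhost:9092.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical object name; everything else mirrors the question's code.
object CsvToKafka extends App {
  val spark = SparkSession
    .builder()
    .appName("Spark-Kafka-Integration")
    .master("local")
    .getOrCreate()

  val schema = StructType(Array(
    StructField("InvoiceNo", StringType, nullable = true),
    StructField("StockCode", StringType, nullable = true),
    StructField("Description", StringType, nullable = true),
    StructField("Quantity", StringType, nullable = true)
  ))

  // Assumption: this path points to a directory that contains the CSV file(s).
  val streamingDataFrame = spark.readStream
    .schema(schema)
    .csv("C:/Users/me/Desktop/Tasks/Tasks1/test.csv")

  streamingDataFrame
    .selectExpr("CAST(InvoiceNo AS STRING) AS key", "to_json(struct(*)) AS value")
    .writeStream
    .format("kafka") // Kafka sink instead of the "csv" file sink
    .option("topic", "topic_test")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("checkpointLocation", "C:/Users/me/IdeaProjects/SparkStreaming/checkpointLocation/")
    .start()
    .awaitTermination()
}
```

The Kafka sink takes its destination from the topic option, so no path option is needed here.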
Another problem you may run into when using CSV as a streaming source is that the path should be a directory, not a file. In your case, if you create a directory and move your CSV file into it, it will work.
Just for testing, create a directory named C:/Users/me/Desktop/Tasks/Tasks1/test.csv and create a file named part-0000.csv inside it. Then put your CSV content into this new file and start the process again.
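To check the directory-based source on its own, something like the sketch below may help. The directory name csv_input, the object name CsvDirectorySource, and the sample row are invented for illustration; only the schema and the part-0000.csv file name come from this thread. It prints whatever the stream picks up to the console, so Kafka is not involved.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical object name and input directory, chosen only for this test.
object CsvDirectorySource extends App {
  val inputDir = "C:/Users/me/Desktop/Tasks/Tasks1/csv_input"

  // Create the directory and drop one CSV file into it.
  Files.createDirectories(Paths.get(inputDir))
  Files.write(
    Paths.get(inputDir, "part-0000.csv"),
    "1001,A1,Sample item,3\n".getBytes(StandardCharsets.UTF_8) // made-up sample row
  )

  val spark = SparkSession
    .builder()
    .appName("Csv-Directory-Source")
    .master("local")
    .getOrCreate()

  val schema = StructType(Array(
    StructField("InvoiceNo", StringType, nullable = true),
    StructField("StockCode", StringType, nullable = true),
    StructField("Description", StringType, nullable = true),
    StructField("Quantity", StringType, nullable = true)
  ))

  // The streaming file source is pointed at the directory, not at a single file.
  spark.readStream
    .schema(schema)
    .csv(inputDir)
    .writeStream
    .format("console")
    .option("truncate", "false")
    .start()
    .awaitTermination()
}
```

If the rows show up on the console, the same directory path can then be used as the source for the Kafka write.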