
AttributeError: 'DataFrameWriter' object has no attribute 'start'

I am trying to write code using Kafka, Python, and Spark. The problem statement is: read data from XML; the consumed data will be in binary format, and it has to be stored in a data frame.

I am getting the error below:

Error: File "C:/Users/HP/PycharmProjects/xml_streaming/ConS.py", line 55, in <module>
    .format("console")
AttributeError: 'DataFrameWriter' object has no attribute 'start'

Here is my code for reference:

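    # Imports required by the names used below
    import kafka
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import ArrayType, IntegerType, StringType, StructType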
    
    # Set spark environments
    #os.environ['PYSPARK_PYTHON'] = <PATH>
    #os.environ['PYSPARK_DRIVER_PYTHON'] = <PATH>
    
    spark = SparkSession\
             .builder\
             .master("local[1]")\
             .appName("Consumer")\
             .getOrCreate()
    
    topic_Name = 'XML_File_Processing3'
    consumer = kafka.KafkaConsumer(topic_Name, bootstrap_servers=['localhost:9092'], auto_offset_reset='latest')
    
    kafka_df = spark\
        .read \
        .format("kafka") \
        .option("kafka.bootstrap.servers", "localhost:9092") \
        .option("kafka.security.protocol", "SSL") \
        .option("failOnDataLoss", "false") \
        .option("subscribe", topic_Name) \
        .load()
    #.option("startingOffsets", "earliest") \
    print("Loaded to DataFrame kafka_df")
    kafka_df.printSchema()
    new_df = kafka_df.selectExpr("CAST(value AS STRING)")
    schema = ArrayType(StructType()\
            .add("book_id", IntegerType())\
            .add("author", StringType())\
            .add("title", StringType())\
            .add("genre",StringType())\
            .add("price",IntegerType())\
            .add("publish_date", IntegerType())\
            .add("description", StringType()))
    book_DF = new_df.select(from_json(col("value"), schema).alias("dataf"))     #.('data')).select("data.*")
    book_DF.printSchema()
    #book_DF.select("dataf.author").show()
    
    book_DF.write\
           .format("console")\
           .start()

I don't have a lot of experience with Kafka, but at the end of your code you call start() on the result of book_DF.write.format("console"), which is a DataFrameWriter object. DataFrameWriter has no start() method; start() belongs to DataStreamWriter, which is what writeStream returns.

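Do you want to write this as a stream? Then you'll need writeStream instead of write, and the DataFrame itself has to be a streaming one, i.e. created with spark.readStream rather than spark.read. Something like:

    book_DF.writeStream \
           .format("console") \
           .start()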
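
As a fuller illustration of the streaming route, here is a minimal sketch rather than a drop-in fix. It assumes the DataFrame is created with `spark.readStream`, that the Spark Kafka connector (the `spark-sql-kafka-0-10` package) is available to the session, and that the broker is at `localhost:9092` as in the question. The function name `start_console_stream` and its parameters are hypothetical, introduced only for this example.

```python
def start_console_stream(spark, topic, bootstrap_servers="localhost:9092"):
    """Start a streaming query that prints Kafka message values to the console.

    `spark` is an existing SparkSession. The function name and parameters
    are hypothetical and exist only for this sketch.
    """
    # readStream (not read) yields a streaming DataFrame, which is what
    # writeStream requires.
    stream_df = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )

    # Kafka delivers the payload as binary, so cast it to a string first.
    values = stream_df.selectExpr("CAST(value AS STRING) AS value")

    # writeStream returns a DataStreamWriter, which does have start().
    return (
        values.writeStream
        .format("console")
        .outputMode("append")
        .start()
    )
```

With the session from the question, this would be called as `query = start_console_stream(spark, topic_Name)` followed by `query.awaitTermination()`, which keeps the script alive while micro-batches are printed.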

More information and examples can be found in the Spark Structured Streaming documentation.

If you simply want to print your DataFrame to the console, you can use the show method instead. In your case: book_DF.show()
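
The batch route can be sketched in the same way. This is only an illustration, assuming the same broker address as in the question and that the Spark Kafka connector is available to the session. The function name `show_kafka_batch` and its parameters are hypothetical.

```python
def show_kafka_batch(spark, topic, bootstrap_servers="localhost:9092", rows=20):
    """Read the topic once as a batch and print the message values.

    `spark` is an existing SparkSession. The function name and parameters
    are hypothetical and exist only for this sketch.
    """
    # spark.read performs a one-off batch read, so no streaming query
    # (and no start()) is involved.
    batch_df = (
        spark.read
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )

    # Cast the binary Kafka payload to a readable string column.
    values = batch_df.selectExpr("CAST(value AS STRING) AS value")

    # show() prints the rows directly; truncate=False keeps long values intact.
    values.show(rows, truncate=False)
    return values
```

Because this is a plain batch DataFrame, the usual DataFrame methods (show, select, write) apply to the returned value, while writeStream and start() do not.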
