
How to Compile a While Loop statement in PySpark on Apache Spark with Databricks

I'm trying to send data to my Data Lake with a While Loop.

Basically, the intention is to continually loop through the code and send data to my Data Lake whenever data is received from my Azure Service Bus, using the following code:

This code receives a message from my Service Bus:

import json

from azure.servicebus import ServiceBusClient

# CONNECTION_STR, QUEUE_NAME and session_id are defined elsewhere

def myfunc():
    with ServiceBusClient.from_connection_string(CONNECTION_STR) as client:
        # max_wait_time specifies how long the receiver should wait with no
        # incoming messages before stopping receipt. Default is None (receive forever).
        with client.get_queue_receiver(QUEUE_NAME, session_id=session_id, max_wait_time=5) as receiver:
            for msg in receiver:
                # print("Received: " + str(msg))
                themsg = json.loads(str(msg))
                # complete the message so that it is removed from the queue
                receiver.complete_message(msg)
                return themsg

This code assigns the message to a variable:

result = myfunc()

The following code sends the message to my data lake:

# parse the JSON string into a DataFrame and write it out as JSON files
rdd = sc.parallelize([json.dumps(result)])
spark.read.json(rdd) \
    .write.mode("overwrite").json('/mnt/lake/RAW/FormulaClassification/F1Area/')

I would like help looping through the code so that it continually checks for messages and sends the results to my data lake.

I believe the solution is accomplished with a While Loop, but I'm not sure.

Just because you're using Spark doesn't mean you cannot loop.

First of all, you're only returning the first message from your receiver, so it should look like this:

with client.get_queue_receiver(QUEUE_NAME, session_id=session_id, max_wait_time=5) as receiver:
    msg = next(receiver)
    # print("Received: " + str(msg))
    themsg = json.loads(str(msg))
    # complete the message object (not its string form) so that it is removed from the queue
    receiver.complete_message(msg)
    return themsg
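
Note that if max_wait_time expires with nothing on the queue, next(receiver) raises StopIteration. A minimal guard, assuming you're happy with the function returning None in that case, could look like this:

with client.get_queue_receiver(QUEUE_NAME, session_id=session_id, max_wait_time=5) as receiver:
    # next(receiver, None) yields None instead of raising StopIteration
    # when the wait time expires with no message available
    msg = next(receiver, None)
    if msg is None:
        return None
    themsg = json.loads(str(msg))
    receiver.complete_message(msg)
    return themsg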

To answer your question:

while True:
    result = json.dumps(myfunc())

    rdd = sc.parallelize([result])
    # You could use rdd.toDF() here instead of spark.read.json(rdd)
    spark.read.json(rdd) \
        .write.mode("overwrite").json('/mnt/lake/RAW/FormulaClassification/F1Area/')

Keep in mind that the output file names aren't consistent, and you might not want them to be overwritten.
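
If you do want to keep every message, one sketch is to write each result to its own timestamped path instead of overwriting; the batch= folder convention below is made up for illustration, not anything your code requires:

import json
import time

while True:
    result = myfunc()
    if result is None:
        continue  # queue was empty this cycle

    rdd = sc.parallelize([json.dumps(result)])
    # one subfolder per message, keyed by a millisecond timestamp,
    # so earlier output is never overwritten
    path = '/mnt/lake/RAW/FormulaClassification/F1Area/batch=%d' % int(time.time() * 1000)
    spark.read.json(rdd).write.json(path)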

Alternatively, you could look into writing your own Source / SparkDataStream class that defines a Spark SQL source, so that you don't need a loop in your main method and the polling is handled natively by Spark.
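
For illustration only, here is the loop-free shape this takes with Spark Structured Streaming. The built-in rate source below is just a stand-in for a real Service Bus source (which is exactly the custom piece you would have to write or obtain); the streaming query replaces the hand-written while loop, and Spark handles the polling and batching:

def write_batch(df, batch_id):
    # called by Spark for every micro-batch; no manual loop needed
    df.write.mode("append").json('/mnt/lake/RAW/FormulaClassification/F1Area/')

(spark.readStream
      .format("rate")  # placeholder source; swap in a Service Bus source here
      .option("rowsPerSecond", 1)
      .load()
      .writeStream
      .foreachBatch(write_batch)
      .option("checkpointLocation", '/mnt/lake/_checkpoints/F1Area')  # hypothetical path
      .start())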
