
How to Read multiple files in different Pyspark Dataframes using spark.read.jdbc

I have code that reads multiple files (>10) into different dataframes in PySpark. However, I would like to optimize this piece of code using a for loop and a reference variable, or something similar. My code is as follows:

Features_PM = (spark.read
          .jdbc(url=jdbcUrl, table='Features_PM',
                properties=connectionProperties))

Features_CM = (spark.read
          .jdbc(url=jdbcUrl, table='Features_CM',
                properties=connectionProperties))

I tried something like this, but it didn't work:

table_list = ['table1', 'table2', 'table3', 'table4']

for table in table_list:
    # jdbcDF is reassigned on every pass, so only the
    # last table's dataframe survives the loop
    jdbcDF = spark.read \
        .format("jdbc") \
        .option("url", "jdbc:postgresql:dbserver") \
        .option("dbtable", "schema.{}".format(table)) \
        .option("user", "username") \
        .option("password", "password") \
        .load()

Source for the above snippet: https://community.cloudera.com/t5/Support-Questions/read-multiple-table-parallel-using-Spark/td-p/286498

Any help would be appreciated. Thanks

You can use the following code to achieve your end goal. You will get a dictionary of dataframes where the key is the table name and the value is the corresponding dataframe:

def read_table(opts):
    # build a JDBC reader from a dict of options and load the table
    return spark.read.format("jdbc").options(**opts).load()

table_list = ['table1', 'table2', 'table3', 'table4']

# map each table name to its dataframe
table_df_dict = {table: read_table({"url": "jdbc:postgresql:dbserver",
                                    "dbtable": "schema.{}".format(table),
                                    "user": "username",
                                    "password": "password"})
                 for table in table_list}
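
If the reads succeed, you can then fetch any dataframe out of the dictionary by its table name. A minimal usage sketch (the table names, URL, and credentials above are placeholders from the question):

# look up a single dataframe by table name
df_table1 = table_df_dict['table1']
df_table1.printSchema()   # inspect the schema read from the JDBC source
df_table1.show(5)         # preview the first five rows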

