
Read a part of a MySQL table in Spark using the JDBC connector

I am trying to read a table from a MySQL database using a JDBC connector in PySpark. My script to read the table is:

query = "SELECT * FROM C WHERE hitId = 4235441"

readConfig = {
  "driver": driver,
  "url": url,
  "dbtable": tableName,
  "user": user,
  "password": password,
  "query_custom": query
}

saveLocation = mountPoint + "/" + tableName
print(saveLocation)

readDF = spark.read.format("jdbc").options(**readConfig).schema(tableSchema).load()
readDF.write.format("delta").option("mergeSchema", "true").mode("overwrite").save(saveLocation)

I am trying to read only the particular rows that have a hitId of 4235441.

The issue is that the whole table is still being read instead of only the rows satisfying the custom query. Can anyone point out what is wrong in my script, or suggest another method to achieve this?

I have been stuck for quite some time, so any help is highly appreciated.

In readConfig you are passing the table name to the dbtable option, and query_custom is not an option the JDBC source recognizes, so it is silently ignored and the whole table is fetched. Instead, pass the query itself through dbtable, wrapped as a parenthesized subquery with an alias (which is what the JDBC source expects for anything other than a bare table name), like below:

query = "(SELECT * FROM C WHERE hitId = 4235441) AS filtered"

readConfig = {
  "driver": driver,
  "url": url,
  "dbtable": query,
  "user": user,
  "password": password,
}
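As a sketch of the two ways to push the filter down to MySQL: the subquery-in-dbtable form above, or (on Spark 2.4+) the dedicated query option, which takes the bare SELECT statement and is mutually exclusive with dbtable. The connection values below are placeholders, not taken from the question:

```python
# Two JDBC read configurations that push "WHERE hitId = 4235441" down to MySQL.
# Driver class, URL, and credentials here are placeholder values.

query = "SELECT * FROM C WHERE hitId = 4235441"

# Option 1: dbtable expects a table name or a parenthesized subquery with an alias.
read_config_dbtable = {
    "driver": "com.mysql.cj.jdbc.Driver",
    "url": "jdbc:mysql://host:3306/db",
    "dbtable": f"({query}) AS filtered",
    "user": "user",
    "password": "password",
}

# Option 2 (Spark 2.4+): pass the statement directly via the `query` option.
# Do not set `dbtable` at the same time; the two options are mutually exclusive.
read_config_query = {
    "driver": "com.mysql.cj.jdbc.Driver",
    "url": "jdbc:mysql://host:3306/db",
    "query": query,
    "user": "user",
    "password": "password",
}

# Either dict can then be splatted into the reader:
# readDF = spark.read.format("jdbc").options(**read_config_query).load()
```

With either form, only the matching rows cross the network; the filtering happens inside MySQL rather than in Spark after a full table scan.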

