
Read a part of a MySQL table in Spark using the JDBC connector

I am trying to read a table from a MySQL database using a JDBC connector in PySpark. My script to read the table is:

query = "SELECT * FROM C WHERE hitId = 4235441"

readConfig = {
  "driver": driver,
  "url": url,
  "dbtable": tableName,
  "user": user,
  "password": password,
  "query_custom": query
}

saveLocation = mountPoint + "/" + tableName
print(saveLocation)

readDF = spark.read.format("jdbc").options(**readConfig).schema(tableSchema).load()
readDF.write.format("delta").option("mergeSchema", "true").mode("overwrite").save(saveLocation)

I am trying to read only the particular rows that have a hitId of 4235441.

The issue is that the whole table is still being read instead of only the rows satisfying the custom query. Can anyone point out what is wrong in my script, or suggest another method to achieve this?

I have been stuck for quite some time, so any help is highly appreciated.

In readConfig you are passing the table name to the dbtable option, and query_custom is not an option the JDBC source recognizes, so it is silently ignored and the whole table is fetched. Instead, pass the query itself through dbtable, wrapped as a parenthesized subquery with an alias (which is what the JDBC source expects for anything other than a bare table name), like below:

query = "(SELECT * FROM C WHERE hitId = 4235441) AS filtered"

readConfig = {
  "driver": driver,
  "url": url,
  "dbtable": query,
  "user": user,
  "password": password,
}
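As a sketch of the two ways to push the filter down to MySQL: the subquery-in-dbtable form above, or (on Spark 2.4+) the dedicated query option, which takes the bare SELECT statement and is mutually exclusive with dbtable. The connection values below are placeholders, not taken from the question:

```python
# Two JDBC read configurations that push "WHERE hitId = 4235441" down to MySQL.
# Driver class, URL, and credentials here are placeholder values.

query = "SELECT * FROM C WHERE hitId = 4235441"

# Option 1: dbtable expects a table name or a parenthesized subquery with an alias.
read_config_dbtable = {
    "driver": "com.mysql.cj.jdbc.Driver",
    "url": "jdbc:mysql://host:3306/db",
    "dbtable": f"({query}) AS filtered",
    "user": "user",
    "password": "password",
}

# Option 2 (Spark 2.4+): pass the statement directly via the `query` option.
# Do not set `dbtable` at the same time; the two options are mutually exclusive.
read_config_query = {
    "driver": "com.mysql.cj.jdbc.Driver",
    "url": "jdbc:mysql://host:3306/db",
    "query": query,
    "user": "user",
    "password": "password",
}

# Either dict can then be splatted into the reader:
# readDF = spark.read.format("jdbc").options(**read_config_query).load()
```

With either form, only the matching rows cross the network; the filtering happens inside MySQL rather than in Spark after a full table scan.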

