Read a part of a MySQL table in Spark using JDBC connector
I am trying to read a table from a MySQL database using the JDBC connector in PySpark. My script to read the table is:
query = "SELECT * FROM C WHERE hitId = 4235441"
readConfig = {
    "driver": driver,
    "url": url,
    "dbtable": tableName,
    "user": user,
    "password": password,
    "query_custom": query
}
saveLocation = mountPoint + "/" + tableName
print(saveLocation)
readDF = spark.read.format("jdbc").options(**readConfig).schema(tableSchema).load()
readDF.write.format("delta").option("mergeSchema", "true").mode("overwrite").save(saveLocation)
I am trying to read only the rows that have a hitId of 4235441. The issue is that the whole table is still being read, instead of only the rows satisfying the custom query. Can anyone point out what is wrong in my script, or suggest another method to achieve this objective?

I have been stuck for quite a while, so any help is highly appreciated.
In readConfig, the dbtable option is where you are specifying the table name, so Spark reads the entire table; query_custom is not a recognized Spark JDBC option, so your query is ignored. Instead, try passing the query through dbtable as an aliased subquery, like below:
# A subquery passed via "dbtable" must be parenthesized and aliased
query = "(SELECT * FROM C WHERE hitId = 4235441) AS filtered"
readConfig = {
    "driver": driver,
    "url": url,
    "dbtable": query,
    "user": user,
    "password": password,
}
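Alternatively, Spark 2.4+ supports a dedicated "query" option on the JDBC source, which takes a plain SELECT statement without the subquery-alias wrapping; note that "query" and "dbtable" are mutually exclusive, so pass only one of them. A minimal sketch, with placeholder driver/url/credentials standing in for your own connection details:

```python
# Sketch: read only the matching rows via the JDBC "query" option (Spark 2.4+).
# The driver, url, user, and password values below are placeholders.
query = "SELECT * FROM C WHERE hitId = 4235441"

readConfig = {
    "driver": "com.mysql.cj.jdbc.Driver",
    "url": "jdbc:mysql://host:3306/mydb",
    "user": "user",
    "password": "password",
    "query": query,  # use "query" OR "dbtable", never both
}

# With a live SparkSession and reachable MySQL instance:
# readDF = spark.read.format("jdbc").options(**readConfig).load()
```

Either way, the WHERE clause is executed on the MySQL side, so only the filtered rows are transferred to Spark.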