Error writing with Apache Spark SQLContext
I am a novice at using Spark SQL. I followed the online guide from DataBricks here: https://docs.databricks.com/spark/latest/data-sources/sql-databases.html
I can successfully get a connection to the MySQL instance and also read from it. But I keep getting variations of NoTableFound or NoDatabaseFound errors from Spark SQL when I try to write. Here is what my entire test class looks like:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
public class MySqlConnectionTest {
private static final String MYSQL_USERNAME = "";
private static final String MYSQL_PASSWORD = "";
private static final String MYSQL_HOSTNAME = "";
private static final String MYSQL_PORT = "";
private static final String MYSQL_DATABASE = "";
private static final String MYSQL_URL = "jdbc:mysql://" + MYSQL_HOSTNAME + ":" + MYSQL_PORT + "/" + MYSQL_DATABASE + "?user=" + MYSQL_USERNAME + "&password=" + MYSQL_PASSWORD;
public static void main(String[] args) {
Properties connectionProperties = new Properties();
connectionProperties.put("user", MYSQL_USERNAME);
connectionProperties.put("password", MYSQL_PASSWORD);
/* First verify we are getting a valid connection!
try {
testConnection();
} catch(Exception e) {
e.printStackTrace();
} */
/*
* NONE of the writeToSummary methods work! The readFromSummary methods work fine...
* */
// writeToSummary(connectionProperties);
// writeToSummaryV2(connectionProperties);
writeToSummaryV3(connectionProperties);
}
private static void testConnection() throws ClassNotFoundException, SQLException {
Class.forName("com.mysql.jdbc.Driver");
Connection connection = DriverManager.getConnection(MYSQL_URL, MYSQL_USERNAME, MYSQL_PASSWORD);
boolean result = connection.isClosed();
System.out.println("@@ is connection closed?? ==> " + result);
}
private static SparkSession getSparkSession(){
return SparkSession.builder().master("local[2]").appName("readUsageSummaryV2").getOrCreate();
}
private static SQLContext getSqlContext() {
SparkConf sparkConf = new SparkConf()
.setAppName("saveUsageSummary")
.setMaster("local[2]");
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
return new SQLContext(javaSparkContext);
}
private static void readFromSummary(Properties connectionProperties) {
Dataset dataSet = getSqlContext().read().jdbc(MYSQL_URL, "summary", connectionProperties);
dataSet.printSchema();
dataSet.select("id","cycle_key", "product", "access_method", "billed", "received_date")
.limit(5)
.show();
}
private static void readFromSummaryV2(Properties connectionProperties) {
Dataset dataSet = getSparkSession().read().jdbc(MYSQL_URL, "summary", connectionProperties);
dataSet.select("id","cycle_key", "product", "access_method", "billed", "received_date")
.limit(5)
.show();
}
private static void writeToSummary(Properties connectionProperties) {
SQLContext sqlContext = getSqlContext();
sqlContext.tables("usages")
.write()
// .mode(SaveMode.Append)
.jdbc(MYSQL_URL, "summary", connectionProperties);
}
private static void writeToSummaryV2(Properties connectionProperties) {
SQLContext sqlContext = getSqlContext();
sqlContext.table("summary")
.write()
// .mode(SaveMode.Append)
.jdbc(MYSQL_URL, "summary", connectionProperties);
}
private static void writeToSummaryV3(Properties connectionProperties) {
SQLContext sqlContext = getSqlContext();
sqlContext.sql("SELECT * FROM summary LIMIT 5")
.write()
// .mode(SaveMode.Append)
.jdbc(MYSQL_URL, "summary", connectionProperties);
}
}
The answer is always a simple one... I re-read the documentation with fresh eyes and understood that for it to work, the Dataset.write() method must write something that already exists in the Spark SQL context. So I can make it write a Dataset which is created by reading from the database, like so:
private static void writeToSummaryV4(Properties connectionProperties) {
Dataset summary = getSparkSession().read().jdbc(MYSQL_URL, "summary", connectionProperties);
summary.select("comp_code","cycle_key", "product", "access_method", "billed", "received_date")
.limit(5)
.show();
summary.write().mode(SaveMode.Append).jdbc(MYSQL_URL, "summary", connectionProperties);
}
Another simple way to do this is to take a Spark Dataset and write it to any database you want by passing the correct DB connection options, as in the example below, which writes to a MySQL database:
private static void writeToSummaryV4(Dataset summary) {
summary.write()
.format("jdbc")
.option("url", MYSQL_URL)
.option("dbtable", MYSQL_DATABASE + "." + MYSQL_SUMMARY_TABLE)
.option("user", MYSQL_USERNAME)
.option("password", MYSQL_PASSWORD)
.mode(SaveMode.Append)
.save();
}
In my case, I need to read something from a Cassandra database and then load it into the MySQL database. So I can easily get the Dataset from Cassandra like so:
private static Dataset readFromCassandraSummary() {
return getSparkSession().read()
.format("org.apache.spark.sql.cassandra")
.option("keyspace", "usage")
.option("table", "summary")
.load();
}