Writing DataFrame to MemSQL Table in Spark

I'm trying to load a .parquet file into a MemSQL database with Spark and the MemSQL Connector.

package com.memsql.spark

import com.memsql.spark.context._

import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.types._

import com.memsql.spark.connector._
import com.mysql.jdbc._

object readParquet {
    def main(args: Array[String]){
        val conf = new SparkConf().setAppName("ReadParquet")
        val sc = new SparkContext(conf)
        sc.addJar("/data/applications/spark-1.5.1-bin-hadoop2.6/lib/mysql-connector-java-5.1.37-bin.jar")
        sc.addJar("/data/applications/spark-1.5.1-bin-hadoop2.6/lib/memsql-connector_2.10-1.1.0.jar")
        Class.forName("com.mysql.jdbc.Driver")

        val host = "xxxx"
        val port = 3306
        val dbName = "WP1"
        val user = "root"
        val password = ""
        val tableName = "rt_acc"

        val memsqlContext = new com.memsql.spark.context.MemSQLContext(sc, host, port, user, password)

        val rt_acc = memsqlContext.read.parquet("tachyon://localhost:19998/rt_acc.parquet")
        val func_rt_acc = new com.memsql.spark.connector.DataFrameFunctions(rt_acc)
        func_rt_acc.saveToMemSQL(dbName, tableName, host, port, user, password)
    }
}

I'm fairly certain that Tachyon is not causing the problem, as the same exceptions occur when the file is loaded from disk, and I can run SQL queries on the DataFrame. I've seen people suggest df.saveToMemSQL(..), but it seems this method is in DataFrameFunctions now.

Also, the table doesn't exist yet, but saveToMemSQL should issue a CREATE TABLE, according to the documentation and source code.

Edit: OK, I guess I misread something. saveToMemSQL doesn't create the table. Thanks.

Try using createMemSQLTableAs instead of saveToMemSQL.
saveToMemSQL loads a DataFrame into an existing table, whereas createMemSQLTableAs creates the table and then loads it. It also returns a handy DataFrame wrapping that MemSQL table :).
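
For illustration, here is a rough sketch of that change applied to the code in the question. It assumes that createMemSQLTableAs accepts the same leading arguments the question passes to saveToMemSQL (database, table, host, port, user, password); check the signature in the connector version you actually use. The object name WriteParquetToMemSQL is made up for this example, and the host is still the question's "xxxx" placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import com.memsql.spark.context.MemSQLContext
import com.memsql.spark.connector.DataFrameFunctions

// Hypothetical object name; the connection values are copied from the question.
object WriteParquetToMemSQL {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WriteParquetToMemSQL"))

    val host = "xxxx" // placeholder from the question, replace with your MemSQL host
    val port = 3306
    val dbName = "WP1"
    val user = "root"
    val password = ""
    val tableName = "rt_acc"

    val memsqlContext = new MemSQLContext(sc, host, port, user, password)
    val rt_acc = memsqlContext.read.parquet("tachyon://localhost:19998/rt_acc.parquet")

    // Assumption: same leading arguments as the saveToMemSQL call in the question.
    // Unlike saveToMemSQL, this creates the table before loading the rows.
    val created: DataFrame = new DataFrameFunctions(rt_acc)
      .createMemSQLTableAs(dbName, tableName, host, port, user, password)

    // The returned DataFrame wraps the new MemSQL table, so it can be inspected directly.
    created.printSchema()
    created.show(5)

    sc.stop()
  }
}
```

If the table already exists from an earlier run, saveToMemSQL as used in the question is the call that appends to it.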
