
How to use UPDATE command in spark-sql & DataFrames

I am trying to implement an UPDATE command on DataFrames in Spark, but I get the error below. Please suggest what I should do.

17/01/19 11:49:39 INFO Replace$: query --> UPDATE temp SET c2 = REPLACE(c2,"i","a");
17/01/19 11:49:39 ERROR Main$: [1.1] failure: ``with'' expected but identifier UPDATE found

UPDATE temp SET c2 = REPLACE(c2,"i","a");
^
java.lang.RuntimeException: [1.1] failure: ``with'' expected but identifier UPDATE found

UPDATE temp SET c2 = REPLACE(c2,"i","a");

Here is the program:

object Replace extends SparkPipelineJob{
  val logger = LoggerFactory.getLogger(getClass)
  protected implicit val jsonFormats: Formats = DefaultFormats

  def createSetCondition(colTypeMap:List[(String,DataType)], pattern:String, replacement:String):String = {
    val res = colTypeMap map {
      case (c,t) =>
        if(t == StringType)
          c+" = REPLACE(" + c + ",\"" + pattern + "\",\"" + replacement + "\")"
        else
          c+" = REPLACE(" + c + "," + pattern + "," + replacement + ")"
    }
    return res.mkString(" , ")
  }

  override def execute(dataFrames: List[DataFrame], sc: SparkContext, sqlContext: SQLContext, params: String, productId: Int) : List[DataFrame] = {
    import sqlContext.implicits._

    val replaceData = ((parse(params)).extractOpt[ReplaceDataSchema]).get
    logger.info(s"Replace-replaceData --> ${replaceData}")

    val (inputDf, (columnsMap, colTypeMap)) = (dataFrames(0), LoadInput.colMaps(dataFrames(0)))

    val tableName = Constants.TEMP_TABLE
    inputDf.registerTempTable(tableName)

    val colMap = replaceData.colName map {
      x => (x,colTypeMap.get(x).get)
    }
    logger.info(s"colMap --> ${colMap}")

    val setCondition = createSetCondition(colMap,replaceData.input,replaceData.output)
    val query = "UPDATE "+tableName+" SET "+setCondition+";"
    logger.info(s"query --> ${query}")

    val outputDf = sqlContext.sql(query)
    List(outputDf)
  }
}

Here is some additional information:

17/01/19 11:49:39 INFO Replace$: Replace-replaceData --> ReplaceDataSchema(List(SchemaDetectData(s3n://fakepath/data37.csv,None,None)),List(c2),i,a)
17/01/19 11:49:39 INFO Replace$: colMap --> List((c2,StringType))

data37.csv

c1 c2
90 nine

Please ask if you need more information.

Spark SQL does not support UPDATE queries. If you want to "modify" the data, you should create a new table with SELECT:

SELECT c1, REPLACE(c2, 'i', 'a') AS c2 FROM temp
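
As a rough sketch of how the query-building step in the question could emit a SELECT instead of an UPDATE, something like the following might work. The object name `SelectInsteadOfUpdate` and the helper `buildSelect` are hypothetical and not part of the original program. The sketch uses `regexp_replace` rather than `REPLACE`, on the assumption that the Spark version in the question may not ship a built-in `replace` SQL function; note that `regexp_replace` treats the pattern as a regular expression. It also assumes the target columns are strings and that the pattern and replacement contain no single quotes.

```scala
// Hypothetical helper (not from the original program): builds a SELECT that
// returns every column, with the target columns rewritten via regexp_replace.
object SelectInsteadOfUpdate {

  def buildSelect(
      tableName: String,
      allColumns: Seq[String],
      targetColumns: Set[String],
      pattern: String,
      replacement: String): String = {
    // Keep untouched columns as-is; wrap target columns and alias them
    // back to their original names so the output schema is unchanged.
    val projections = allColumns.map { c =>
      if (targetColumns.contains(c))
        s"regexp_replace($c, '$pattern', '$replacement') AS $c"
      else
        c
    }
    s"SELECT ${projections.mkString(", ")} FROM $tableName"
  }

  def main(args: Array[String]): Unit = {
    // Same inputs as in the question's log: table temp, column c2, "i" -> "a".
    println(buildSelect("temp", Seq("c1", "c2"), Set("c2"), "i", "a"))
  }
}
```

Passing the resulting string to `sqlContext.sql(...)` should return a new DataFrame and leave the registered temp table unchanged. The same effect can probably be had without SQL by calling `inputDf.withColumn("c2", regexp_replace(col("c2"), "i", "a"))` with `regexp_replace` and `col` imported from `org.apache.spark.sql.functions`.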
