Using python and scala to rename output file from azure/databricks
I am trying to rename the output file from the random-character default name that the Python write produces to something more sensible, including the date/time so that the filename is unique.
Here is the code I have been using. The Python version delivers the file to the shared drive, but the name it gets is unusable. I searched for some way to rename the file from Python code, but without success. I then turned to Scala, which almost does what I want: it appears to run fine, but no output file is produced — probably an error by the developer, i.e. me!
Any help would be much appreciated.
%python
try:
    dfsql = spark.sql("select * from dbsmets1mig02_technical_build.tbl_Temp_Output_CS_Notes_Final order by record1")  # Replace with your SQL
except Exception as e:
    dfsql = None
    print("Exception occurred:", e)

if dfsql is None or dfsql.count() == 0:
    print("No data rows")
else:
    (dfsql.coalesce(1)
        .write.format("com.databricks.spark.csv")
        .option("quote", "")
        .option("header", "false")
        .option("delimiter", "|")
        .mode("overwrite")
        .save("/mnt/publisheddatasmets1mig/metering/smets1mig/cs/system_data_build/notes/outbound/"))
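Spark names the single part file itself (something like `part-00000-<uuid>.csv` inside the save directory), so a rename step after the write is needed to get a sensible name. On Databricks that step is usually done with `dbutils.fs.ls` and `dbutils.fs.mv`; as a sketch of the same pattern against a plain local filesystem (the function name and paths here are illustrative, not from the original post):

```python
import glob
import os

def rename_spark_output(output_dir: str, target_path: str) -> str:
    """Find the single part-* file Spark wrote into output_dir and
    rename it to target_path. Assumes coalesce(1) left exactly one part file."""
    parts = glob.glob(os.path.join(output_dir, "part-*"))
    if len(parts) != 1:
        raise RuntimeError(f"expected exactly one part file, found {len(parts)}")
    os.replace(parts[0], target_path)  # atomic rename on the same filesystem
    return target_path
```

On a DBFS mount the same idea would be expressed with `dbutils.fs.ls(output_dir)` to find the part file and `dbutils.fs.mv(...)` to move it, since the file lives in object storage rather than on local disk.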
%scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs._
import org.apache.spark.sql.functions._

// Note: FileUtil.copyMerge was removed in Hadoop 3, so on newer runtimes an
// equivalent copy loop (or dbutils.fs) is needed instead.
def merge(srcPath: String, dstPath: String): Unit = {
  val hadoopConfig = new Configuration()
  val hdfs = FileSystem.get(hadoopConfig)
  // the "true" flag deletes the source files once they are merged into the new output file
  FileUtil.copyMerge(hdfs, new Path(srcPath), hdfs, new Path(dstPath), true, hadoopConfig, null)
}

// replace the table, path and filename values with your own
val dfsql = sqlContext.sql("select * from dbsmets1mig02_technical_build.tbl_Temp_Output_CS_Notes_Final order by record1") // SQL here
val outputfile = "/mnt/publisheddatasmets1mig/metering/smets1mig/cs/system_data_build/notes/outbound" // path here
var filename = "CS_Notes" // filename here
val fileext = ".csv"
//val dateFormat = "yyyyMMdd_HHmm"
val dateFormat = "dd-MM-yyyy_HH-mm-ss"
val dateValue = spark.range(1).select(date_format(current_timestamp, dateFormat)).as[String].first
filename = filename + "_" + dateValue

// Write to a temporary directory, then merge its part files into the final
// single file. In the original code the save path and the merge destination
// were the same string, so the merge had no distinct target to create and no
// output file appeared.
val tmpOutputDir = outputfile + "/tmp_" + filename
val mergedFileName = outputfile + "/" + filename + fileext

dfsql.write.format("com.databricks.spark.csv")
  .option("header", "false")
  .option("delimiter", "|")
  .option("quote", "\u0000")
  .mode("overwrite")
  .save(tmpOutputDir)

merge(tmpOutputDir, mergedFileName)
dfsql.unpersist()
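What `FileUtil.copyMerge` does is simple: it concatenates every file under `srcPath` into `dstPath`, optionally deleting the sources. The same behaviour can be sketched in pure Python for a local directory, which also makes it clear why the source directory and the merged destination must be distinct paths (the function and directory names here are illustrative):

```python
import os

def copy_merge(src_dir: str, dst_path: str, delete_source: bool = True) -> None:
    """Concatenate all regular files in src_dir into dst_path, in sorted
    name order (Spark part files sort correctly), mimicking copyMerge."""
    names = sorted(f for f in os.listdir(src_dir)
                   if os.path.isfile(os.path.join(src_dir, f)))
    with open(dst_path, "wb") as out:
        for name in names:
            with open(os.path.join(src_dir, name), "rb") as part:
                out.write(part.read())
    if delete_source:
        for name in names:
            os.remove(os.path.join(src_dir, name))
        os.rmdir(src_dir)
```

Note that `dst_path` must lie outside `src_dir`; if it were inside, it would be swept up in the concatenation and the cleanup, which is the same trap the original Scala code fell into.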
創建單個大文件到路徑然后重命名,但是由於違反Hadoop分區概念,因此不建議這樣做。
val outputFilePath = "/mnt/data/output"
dfsql.coalesce(1).write.format("com.databricks.spark.csv")
  .option("header", "false")
  .option("delimiter", "|")
  .option("quote", "\u0000")
  .mode("overwrite")
  .save(outputFilePath)
// Spark names the part file itself, e.g. /mnt/data/output/part-00000-1234565125435.csv
val outputFileName = "/mnt/data/output/filename.csv"
// Rename the generated part file to /mnt/data/output/filename.csv.
// `rename` is pseudocode here; on Databricks the move can be done with
// dbutils.fs.mv(sourcePartFile, outputFileName).
rename(outputFilePath, outputFileName)
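To meet the question's goal of a unique, timestamped target name, the suffix can be built in Python with `strftime`; the format string below mirrors the Scala `dateFormat` `"dd-MM-yyyy_HH-mm-ss"` used earlier (the helper name and base filename are illustrative):

```python
from datetime import datetime

def timestamped_filename(base: str, ext: str = ".csv") -> str:
    """Append a dd-MM-yyyy_HH-mm-ss timestamp to base, matching the
    Scala dateFormat used in the question."""
    stamp = datetime.now().strftime("%d-%m-%Y_%H-%M-%S")
    return f"{base}_{stamp}{ext}"
```

Building the name in plain Python avoids the round trip through `spark.range(1).select(date_format(...))` that the Scala version uses just to format a timestamp.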