
Read and write a file from MS Azure using Python

I am new to Python and Spark, and I am trying to load a file from Azure into a table. Below is my simple code.

    import os
    import sys

    os.environ['SPARK_HOME'] = "C:\\spark-2.0.0-bin-hadoop2.74"
    sys.path.append("C:\\spark-2.0.0-bin-hadoop2.7\\python")
    sys.path.append("C:\\spark-2.0.0-bin-hadoop2.7\\python\\lib\\py4j-0.10.1-src.zip")

    from pyspark import SparkContext
    from pyspark import SparkConf
    from pyspark.sql.types import *
    from pyspark.sql import *

    sc = SparkContext("local", "Simple App")

    def loadFile(path, rowDelimeter, columnDelimeter, firstHeaderColName):
        loadedFile = sc.newAPIHadoopFile(path,
                                         "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
                                         "org.apache.hadoop.io.LongWritable",
                                         "org.apache.hadoop.io.Text",
                                         conf={"textinputformat.record.delimiter": rowDelimeter})
        rddData = loadedFile.map(lambda l:l[1].split(columnDelimeter)).filter(lambda f: f[0] != firstHeaderColName)
        return rddData

    Schema= StructType([
        StructField("Column1", StringType(), True),
        StructField("Column2", StringType(), True),
        StructField("Column3", StringType(), True),
        StructField("Column4", StringType(), True)
    ])

    rData= loadFile("wasbs://Storagename@Accountname.blob.core.windows.net/File.txt", '\\r\\n',"#|#","Column1")
    DF = sc.createDataFrame(Data,Schema)
    DF.write.saveAsTable("Table1")

I am getting an error like this: FileNotFoundError: [WinError 2] The system cannot find the file specified

@Miruthan, as far as I know, if we'd like to read data from WASB into Spark, the URL syntax is as follows:

wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
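
As a small illustration of that syntax, the sketch below assembles and splits such a URL in plain Python. The helper names `build_wasb_url` and `parse_wasb_url` are made up for this example and are not part of Spark or any Azure SDK; the container, account, and path values you pass in are placeholders.

```python
from urllib.parse import urlsplit

SUFFIX = ".blob.core.windows.net"


def build_wasb_url(container, account, path, secure=True):
    """Assemble wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>."""
    scheme = "wasbs" if secure else "wasb"
    return "{}://{}@{}{}/{}".format(scheme, container, account, SUFFIX, path.lstrip("/"))


def parse_wasb_url(url):
    """Split a WASB URL into (container, account, path); raise ValueError if malformed."""
    parts = urlsplit(url)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError("scheme must be wasb or wasbs: " + url)
    host = parts.hostname or ""  # note: urlsplit lowercases the host name
    if parts.username is None or not host.endswith(SUFFIX):
        raise ValueError("expected <containername>@<accountname>" + SUFFIX + ": " + url)
    account = host[:-len(SUFFIX)]
    return parts.username, account, parts.path.lstrip("/")
```

Running `parse_wasb_url` on the URL from the question makes it easy to see which part is being treated as the container and which as the account.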

Also, note that Azure Storage Blob (WASB) is normally used this way as the storage account associated with an HDInsight cluster. Could you please double-check your URL and setup? If there is any update, please let me know.
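
If the code runs outside HDInsight (for example on a local Windows machine, as in the question), the storage account is not pre-configured. In that case Spark typically needs the `hadoop-azure` connector on its classpath and the account key supplied as a Hadoop property. The sketch below only builds that property. `wasb_account_key_conf` is a hypothetical helper, and the account name and key are placeholders. Whether this resolves the error in the question is an assumption and is not confirmed in this thread.

```python
def wasb_account_key_conf(account, key):
    """Return the Spark setting that passes a storage account key to hadoop-azure.

    The "spark.hadoop." prefix makes Spark copy the property into the Hadoop configuration.
    """
    prop = "fs.azure.account.key.{}.blob.core.windows.net".format(account)
    return {"spark.hadoop." + prop: key}


# Usage sketch (needs pyspark and the hadoop-azure jars, so it is left commented out):
# conf = SparkConf().setAll(wasb_account_key_conf("myaccount", "<access key>").items())
# sc = SparkContext("local", "Simple App", conf=conf)
```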
