
Mount S3 to Databricks

I'm trying to understand how mount works. I have an S3 bucket named myB, and a folder in it called test. I did a mount using

val AwsBucketName = "myB"
val MountName = "myB"

My question is: does this create a link between the S3 bucket myB and Databricks, and can Databricks access all the files, including the files under the test folder? (Or, if I mount with AwsBucketName = "myB/test", does it link Databricks only to the test folder and not to any other files outside that folder?)

If so, how do I list the files in the test folder, read a file, or count() the rows of a CSV file in Scala? I ran display(dbutils.fs.ls("/mnt/myB")) and it only shows the test folder, not the files in it. I'm quite new here. Many thanks for your help!

From the Databricks documentation:

// Replace with your values
val AccessKey = "YOUR_ACCESS_KEY"
// Encode the Secret Key as that can contain "/"
val SecretKey = "YOUR_SECRET_KEY".replace("/", "%2F")
val AwsBucketName = "MY_BUCKET"
val MountName = "MOUNT_NAME"

dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey@$AwsBucketName", s"/mnt/$MountName")
display(dbutils.fs.ls(s"/mnt/$MountName"))
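To picture what the mount does for the myB versus myB/test question, here is a minimal plain-Scala model. It assumes a mount simply rewrites the /mnt/<MOUNT-NAME> prefix onto the S3 source passed to dbutils.fs.mount. MountPoint, resolve, /mnt/myBTest and data.csv are hypothetical names for illustration, not Databricks APIs, and credentials are left out.

```scala
// Hypothetical model of a DBFS mount: paths under mountPoint are
// rewritten onto the S3 source. This is NOT a Databricks API.
final case class MountPoint(mountPoint: String, source: String) {
  // Returns the S3 location a DBFS path maps to, or None if the
  // path is not under this mount point.
  def resolve(path: String): Option[String] =
    if (path == mountPoint) Some(source)
    else if (path.startsWith(mountPoint + "/"))
      Some(source + path.stripPrefix(mountPoint))
    else None
}

object MountDemo {
  // Mounting the bucket root: every key in the bucket, including test/, is reachable.
  val root = MountPoint("/mnt/myB", "s3a://myB")
  // Mounting a folder: only keys under test/ are reachable through this mount.
  val testOnly = MountPoint("/mnt/myBTest", "s3a://myB/test")

  def main(args: Array[String]): Unit = {
    println(root.resolve("/mnt/myB/test/data.csv"))   // Some(s3a://myB/test/data.csv)
    println(testOnly.resolve("/mnt/myBTest/data.csv")) // Some(s3a://myB/test/data.csv)
    println(testOnly.resolve("/mnt/myB/other.csv"))    // None
  }
}
```

In this model every path resolved through testOnly starts with s3a://myB/test, so files elsewhere in myB are simply not addressable through that mount point.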

If you are unable to see files in your mounted directory, you may have created a directory under /mnt that is not a link to the S3 bucket. If that is the case, try deleting the directory (dbutils.fs.rm) and remounting using the code sample above. Note that you will need your AWS credentials (AccessKey and SecretKey above). If you don't know them, ask your AWS account admin for them.
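Because the AccessKey and SecretKey are embedded in the mount source string, a "/" in the secret key has to be escaped, which is what the .replace("/", "%2F") in the snippet does. The sketch below mirrors that string construction in plain Scala. mountSource is a hypothetical helper and the key values are fake; it also shows the string is built the same way when the bucket part carries a folder, as in the myB/test variant from the question.

```scala
// Hypothetical helper mirroring the source string built in the documentation snippet.
object MountSourceDemo {
  // bucketPath may be a bare bucket ("myB") or a bucket plus folder ("myB/test").
  def mountSource(accessKey: String, secretKey: String, bucketPath: String): String = {
    // A raw '/' in the secret would end the credentials part of the URI, so escape it.
    val encodedSecret = secretKey.replace("/", "%2F")
    s"s3a://$accessKey:$encodedSecret@$bucketPath"
  }

  def main(args: Array[String]): Unit = {
    // Fake credentials, for illustration only.
    println(mountSource("AKIAEXAMPLE", "abc/def", "myB"))      // s3a://AKIAEXAMPLE:abc%2Fdef@myB
    println(mountSource("AKIAEXAMPLE", "abc/def", "myB/test")) // s3a://AKIAEXAMPLE:abc%2Fdef@myB/test
  }
}
```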

It only lists the folders and files directly under the bucket; the contents of subfolders such as test are not included.

In S3:

<bucket-name>/<Files & Folders>

In Databricks:

/mnt/<MOUNT-NAME>/<Bucket-Data-List>

For example, the output of dbutils.fs.ls(s"/mnt/$MountName") looks like this:

dbfs:/mnt/<MOUNT-NAME>/Folder/  
dbfs:/mnt/<MOUNT-NAME>/file1.csv
dbfs:/mnt/<MOUNT-NAME>/file2.csv
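The one-level listing described above can be modeled in plain Scala. This minimal sketch assumes the bucket is a flat list of object keys (which is how S3 stores them) and that a "folder" is just a shared key prefix ending in "/". ListingDemo, listChildren, listRecursive and the sample keys are hypothetical, not dbutils functions.

```scala
// Hypothetical model: S3 stores flat keys; "folders" are shared prefixes.
object ListingDemo {
  val keys = Seq("test/a.csv", "test/b.csv", "test/sub/c.csv")

  // One level only, like dbutils.fs.ls: files directly under `prefix`
  // plus each immediate sub-"folder" (reported once, with a trailing '/').
  // `prefix` is expected to be "" (bucket root) or to end in '/'.
  def listChildren(keys: Seq[String], prefix: String): Seq[String] =
    keys
      .filter(_.startsWith(prefix))
      .map(_.stripPrefix(prefix))
      .map { rest =>
        val slash = rest.indexOf('/')
        if (slash >= 0) rest.take(slash + 1) else rest
      }
      .distinct
      .sorted

  // Every key at or below `prefix`, i.e. what a recursive walk would return.
  def listRecursive(keys: Seq[String], prefix: String): Seq[String] =
    keys.filter(_.startsWith(prefix)).sorted

  def main(args: Array[String]): Unit = {
    println(listChildren(keys, ""))       // List(test/)
    println(listChildren(keys, "test/"))  // List(a.csv, b.csv, sub/)
    println(listRecursive(keys, "test/")) // List(test/a.csv, test/b.csv, test/sub/c.csv)
  }
}
```

So listing the mount root only shows test/; to see the files you list the folder itself. In a Databricks notebook that is display(dbutils.fs.ls("/mnt/myB/test")), and a CSV in that folder can be read and counted with spark.read.option("header", "true").csv("/mnt/myB/test/<file>.csv").count(), where <file> is a placeholder for an actual file name.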
