
How can I read files from a MapR cluster using Go?

I have a Go application running in a Kubernetes cluster which needs to read files from a large MapR cluster. The two clusters are separate, and the Kubernetes cluster does not permit us to use the CSI driver. All I can do is run userspace apps in Docker containers inside Kubernetes pods, and I am given maprtickets to connect to the MapR cluster.

I'm able to use the com.mapr.hadoop maprfs jar to write a Java app which is able to connect and read files using a maprticket, but we need to integrate this into a Go app, which, ideally, shouldn't require a Java sidecar process.

This is a good question because it highlights the way that some environments impose limits that violate the assumptions external software may hold.

And just for reference, MapR was acquired by HPE, so a MapR cluster is now an HPE Ezmeral Data Fabric cluster. I am still training myself to say that.

Anyway, the accepted method for a generic program in language X to communicate with the Ezmeral Data Fabric (the file system formerly known as MapR FS) is to mount the file system and just talk to it using file APIs like open/read/write and such. This applies to Go, Python, C, Julia or whatever. Inside Kubernetes, the normal way to do this mount is to use a CSI driver that has some kind of operator working in the background. That operator isn't particularly magical... it just does what is needful. In the case of the data fabric, the operator mounts the data fabric using NFS or FUSE and then bind mounts[1] part of that into the pod's awareness.
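Once such a mount exists, the Go side is just ordinary file I/O. Here is a minimal sketch, assuming the fabric has been mounted (by whatever means) under the conventional /mapr/&lt;cluster-name&gt; prefix; the cluster name and file path below are placeholders, not anything MapR-specific.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

func main() {
	// Hypothetical path: assumes the data fabric is mounted (via NFS or
	// FUSE) under the conventional /mapr/<cluster-name> prefix.
	const path = "/mapr/my.cluster.com/user/app/data.csv"

	f, err := os.Open(path)
	if err != nil {
		log.Fatalf("open %s: %v", path, err)
	}
	defer f.Close()

	// Once the mount exists, ordinary file APIs are all that is needed;
	// nothing in this program knows it is talking to a data fabric.
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		log.Fatalf("read %s: %v", path, err)
	}
}
```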

But this question is cool because it precludes all of that. If you can't install an operator, then this other stuff is just a dead letter.

There are four alternative approaches that may work.

  1. NFS mounts were included in Kubernetes as a native capability before the CSI plugin approach was standardized. It might still be possible to use that on a very vanilla Kubernetes cluster, and that could give access to the data cluster (a sketch of such a volume appears after this list).

  2. It is possible to integrate a container into your pod that does the necessary FUSE mount in an unprivileged way. This will be kind of painful because you would have to tease apart the FUSE driver from the data fabric install and get it to work. That would let you see the data fabric inside the pod. Even then, there is no guarantee Kubernetes or the OS will allow this to work.

  3. There is an unpublished Go file system client that uses the low-level data fabric API directly. We don't yet release that separately. For more information on that, folks should ping me directly (my contact info is everywhere... email to ted.dunning hpe.com or gmail.com works).

  4. The data fabric allows you to access data via S3. With the 7.0 release of Ezmeral Data Fabric, this capability is heavily revamped to give massive performance, especially since you can scale up the number of gateways essentially without limit (I have heard numbers like 3-5GB/s per stateless connection to a gateway, but YMMV). This will require the least futzing and should give plenty of performance. You can even access files as if they were S3 objects (a Go sketch of this follows below).
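For option 1, the relevant piece is Kubernetes' in-tree nfs volume type, which needs no CSI driver. The sketch below renders such a pod spec using the Kubernetes Go types (so everything here stays in Go); the NFS server address, export path, and image name are placeholders that depend entirely on your own cluster.

```go
package main

import (
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Hypothetical pod: the in-tree "nfs" volume type predates CSI and
	// needs no driver or operator installed on the cluster.
	pod := corev1.Pod{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Pod"},
		ObjectMeta: metav1.ObjectMeta{Name: "mapr-reader"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "my-go-app:latest", // placeholder image
				VolumeMounts: []corev1.VolumeMount{{
					Name:      "mapr-nfs",
					MountPath: "/mapr",
					ReadOnly:  true,
				}},
			}},
			Volumes: []corev1.Volume{{
				Name: "mapr-nfs",
				VolumeSource: corev1.VolumeSource{
					NFS: &corev1.NFSVolumeSource{
						// Placeholder: an NFS gateway node of the data cluster.
						Server:   "mapr-nfs-gateway.example.com",
						Path:     "/mapr/my.cluster.com",
						ReadOnly: true,
					},
				},
			}},
		},
	}

	// Emit the equivalent YAML manifest for kubectl apply.
	out, err := yaml.Marshal(pod)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```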
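And for option 4, any S3 client library will do, pointed at the data fabric's S3 gateway instead of AWS. Here is a minimal sketch using the AWS SDK for Go; the endpoint URL, credentials, bucket, and object key are all placeholders, and it assumes the gateway speaks the standard S3 API.

```go
package main

import (
	"fmt"
	"io"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Hypothetical values: endpoint, keys, bucket, and key depend on how
	// your data fabric S3 gateway is set up.
	sess, err := session.NewSession(&aws.Config{
		Endpoint:         aws.String("https://mapr-s3-gateway.example.com:9000"),
		Region:           aws.String("us-east-1"), // arbitrary, but the SDK requires one
		Credentials:      credentials.NewStaticCredentials("ACCESS_KEY", "SECRET_KEY", ""),
		S3ForcePathStyle: aws.Bool(true), // path-style addressing is typical for non-AWS endpoints
	})
	if err != nil {
		log.Fatalf("session: %v", err)
	}

	svc := s3.New(sess)
	obj, err := svc.GetObject(&s3.GetObjectInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("user/app/data.csv"),
	})
	if err != nil {
		log.Fatalf("get object: %v", err)
	}
	defer obj.Body.Close()

	body, err := io.ReadAll(obj.Body)
	if err != nil {
		log.Fatalf("read body: %v", err)
	}
	fmt.Printf("read %d bytes\n", len(body))
}
```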

[1] https://unix.stackexchange.com/questions/198590/what-is-a-bind-mount#:~:text=A%20bind%20mount%20is%20an,the%20same%20as%20the%20original
