
Mounting an NVMe disk on AWS EC2

So I created an i3.large instance with an NVMe disk on each node; here was my process:

  1. lsblk -> nvme0n1 (check that the NVMe disk isn't mounted yet)
  2. sudo mkfs.ext4 -E nodiscard /dev/nvme0n1
  3. sudo mount -o discard /dev/nvme0n1 /mnt/my-data
  4. Added to /etc/fstab: /dev/nvme0n1 /mnt/my-data ext4 defaults,nofail,discard 0 2
  5. sudo mount -a (check that everything is OK)
  6. sudo reboot

So all of this works, and I can connect back to the instance. I have 500 GB on my new partition.

But after I stop and restart the EC2 machines, some of them randomly become inaccessible (AWS warns that only 1/2 status checks passed).

When I look at the logs for why an instance is inaccessible, they point to the NVMe partition (but I did sudo mount -a to check that it was OK, so I don't understand).

I don't have the exact AWS logs, but I got some lines from them:

Bad magic number in super-block while trying to open

then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:

/dev/fd/9: line 2: plymouth: command not found

Stopping and starting an instance erases the ephemeral disks, moves the instance to new host hardware, and gives you new, empty disks... so the ephemeral disks will always be blank after a stop/start. When an instance is stopped, it doesn't exist on any physical host -- the resources are freed.

So, if you are going to be stopping and starting instances, the best approach is not to add them to /etc/fstab, but rather to just format them on first boot and mount them after that. One way of testing whether a filesystem is already present is using the file utility and grepping its output. If grep doesn't find a match, it returns false.
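That check can be sketched like this. The device path is just an example; the key point is that file -s prints only ": data" for a blank device, but names the filesystem for a formatted one:

```shell
fs_present() {
  # `sudo file -s /dev/nvme0n1` prints something like
  #   "/dev/nvme0n1: Linux rev 1.0 ext4 filesystem data, ..."  (formatted)
  #   "/dev/nvme0n1: data"                                     (blank)
  # so grepping for "filesystem" distinguishes the two cases
  echo "$1" | grep -q filesystem
}

# simulated outputs; on a real instance you'd pass "$(sudo file -s /dev/nvme0n1)"
fs_present "/dev/nvme0n1: data" && echo formatted || echo blank
fs_present "/dev/nvme0n1: Linux rev 1.0 ext4 filesystem data" && echo formatted || echo blank
```

On first boot you would run mkfs only in the "blank" case, then mount unconditionally.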

The NVMe SSD on the i3 instance class is an example of an Instance Store Volume, also known as an Ephemeral [ Disk | Volume | Drive ]. They are physically inside the instance and extremely fast, but not redundant and not intended for persistent data... hence, "ephemeral." Persistent data needs to be on an Elastic Block Store (EBS) volume or an Elastic File System (EFS), both of which survive instance stop/start, hardware failures, and maintenance.

It isn't clear why your instances are failing to boot, but nofail may not be doing what you expect when a volume is present but has no filesystem. My impression has been that eventually it should succeed.

But, you may need to apt-get install linux-aws if running Ubuntu 16.04. Ubuntu 14.04 NVMe support is not really stable and not recommended.

Each of these three storage solutions has its advantages and disadvantages.

The Instance Store is local, so it's quite fast... but it's ephemeral. It survives hard and soft reboots, but not stop/start cycles. If your instance suffers a hardware failure, or is scheduled for retirement, as eventually happens to all hardware, you will have to stop and start the instance to move it to new hardware. Reserved and dedicated instances don't change ephemeral disk behavior.

EBS is persistent, redundant storage that can be detached from one instance and moved to another (and this happens automatically across a stop/start). EBS supports point-in-time snapshots, and these are incremental at the block level, so you don't pay for storing data that didn't change across snapshots... but through some excellent witchcraft, you also don't have to keep track of "full" vs. "incremental" snapshots -- the snapshots are only logical containers of pointers to the backed-up data blocks, so they are in essence all "full" snapshots, but only billed as incremental. When you delete a snapshot, only the blocks no longer needed to restore either that snapshot or any other snapshot are purged from the back-end storage system (which, transparently to you, actually uses Amazon S3).

EBS volumes are available as both SSD and spinning-platter magnetic volumes, again with tradeoffs in cost, performance, and appropriate applications. See EBS Volume Types. EBS volumes mimic ordinary hard drives, except that their capacity can be manually increased on demand (but not decreased), and they can be converted from one volume type to another without shutting down the system. EBS does all of the data migration on the fly, with a reduction in performance but no disruption. This is a relatively recent innovation.

EFS uses NFS, so you can mount an EFS filesystem on as many instances as you like, even across availability zones within one region. The size limit for any one file in EFS is 52 terabytes, and your instance will actually report 8 exabytes of free space. The actual free space is, for all practical purposes, unlimited, but EFS is also the most expensive -- if you did have a 52 TiB file stored there for one month, that storage would cost over $15,000. The most I ever stored was about 20 TiB for 2 weeks, which cost me about $5k, but if you need the space, the space is there. It's billed hourly, so if you stored the 52 TiB file for just a couple of hours and then deleted it, you'd pay maybe $50. The "Elastic" in EFS refers to the capacity and the price. You don't pre-provision space on EFS. You use what you need and delete what you don't, and the billable size is calculated hourly.
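The $15,000 figure is consistent with EFS standard storage at roughly $0.30 per GiB-month (the historical us-east-1 rate; an assumption here, since pricing varies by region and over time). A quick sanity check:

```shell
# 52 TiB * 1024 GiB/TiB * $0.30 per GiB-month (assumed historical standard rate)
awk 'BEGIN { printf "~$%d/month\n", 52 * 1024 * 0.30 }'
# prints: ~$15974/month
```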

A discussion of storage wouldn't be complete without S3. It's not a filesystem, it's an object store. At about 1/10 the price of EFS, S3 also has effectively infinite capacity, and a maximum object size of 5 TB. Some applications would be better designed using S3 objects instead of files.

S3 can also be easily used by systems outside of AWS, whether in your data center or in another cloud. The other storage technologies are intended for use inside EC2, though there is an undocumented workaround that allows EFS to be used externally or across regions, with proxies and tunnels.

I have been using "c5" type instances for almost a month, mostly "c5d.4xlarge" with NVMe drives. So, here's what has worked for me on Ubuntu instances:

First, find where the NVMe drive is located:

lsblk

Mine was always mounted at nvme1n1. Then check whether it is an empty volume that doesn't have any file system (it mostly doesn't, unless you are remounting). For empty drives the output should be /dev/nvme1n1: data :

sudo file -s /dev/nvme1n1

Then format it (if the last step showed that your drive has a file system and isn't empty, skip this and go to the next step):

sudo mkfs -t xfs /dev/nvme1n1

Then create a folder and mount the NVMe drive:

sudo mkdir /data
sudo mount /dev/nvme1n1 /data

You can now verify it by running:

df -h
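The steps above can be combined into one idempotent sketch (device name and mount point are assumptions; check lsblk first, since the device name can differ by instance type and attached volumes):

```shell
setup_nvme() {
  dev="$1"; mnt="$2"
  # format only if `file -s` shows no existing filesystem on the device,
  # so rerunning the script never wipes data
  if ! sudo file -s "$dev" | grep -q filesystem; then
    sudo mkfs -t xfs "$dev"
  fi
  sudo mkdir -p "$mnt"
  sudo mount "$dev" "$mnt"
  df -h "$mnt"
}

# run only when the device actually exists
if [ -b /dev/nvme1n1 ]; then
  setup_nvme /dev/nvme1n1 /data
fi
```

Run at each boot (e.g. from user data), this handles both the first-boot format and later remounts.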

I just had a similar experience! My c5.xlarge instance detects an EBS volume as nvme1n1. I added this line to fstab:

 /dev/nvme1n1 /data ext4 discard,defaults,nofail 0 2

After a couple of reboots, it looked to be working, and it kept running for weeks. But today I got an alert that the instance could not be connected to. I tried rebooting it from the AWS console with no luck; it looks like the culprit is the fstab entry. The disk mount failed.

I raised a ticket with AWS support, no feedback yet. I had to start a new instance to recover my service.

In another test instance, I tried using the UUID (obtained with the blkid command) instead of /dev/nvme1n1. So far it still looks to be working... I will see if it causes any issues.
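A sketch of that UUID approach (the UUID shown is a hypothetical placeholder; on a real instance you would take it from blkid):

```shell
fstab_line() {
  # build an fstab entry from a UUID, with the same options as the
  # device-path entry above (discard,defaults,nofail 0 2)
  echo "UUID=$1 /data ext4 discard,defaults,nofail 0 2"
}

# on a real instance:
#   fstab_line "$(sudo blkid -s UUID -o value /dev/nvme1n1)" | sudo tee -a /etc/fstab
fstab_line "0123abcd-0000-0000-0000-000000000000"
# prints: UUID=0123abcd-0000-0000-0000-000000000000 /data ext4 discard,defaults,nofail 0 2
```

The advantage is that the UUID identifies the filesystem itself, while NVMe device names can be assigned in a different order across reboots.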

I will update here if there is any feedback from AWS support.

================ EDIT with my fix ================

AWS hasn't given me feedback yet, but I found the issue. Actually, in fstab, it doesn't matter whether you mount /dev/nvme1n1 or the UUID. My issue was that my EBS volume had some file system errors. I attached it to an instance and then ran

fsck.ext4 /dev/nvme1n1

After fixing a couple of file system errors, I put it in fstab, rebooted, and there was no problem anymore!
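If e2fsck itself complains about a bad superblock (as in the question's log), it can be pointed at a backup copy. A guarded sketch, assuming the same device name; 32768 is the typical location of the first backup superblock on an ext4 filesystem with 4 KiB blocks:

```shell
DEV=/dev/nvme1n1
if [ -b "$DEV" ]; then
  # -b: use the backup superblock at block 32768 instead of the primary
  sudo e2fsck -b 32768 "$DEV"
else
  echo "no such device"
fi
```

Run this only on an unmounted volume, e.g. after attaching it to a rescue instance as above.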

You may find useful the new EC2 instance family equipped with local NVMe storage: C5d.

See the announcement blog post: https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-instances-with-local-nvme-storage-c5d/


Some excerpts from the blog post:

  • You don't have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
  • Other than the addition of local storage, the C5 and C5d share the same specs.
  • You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe.
  • Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key.
  • Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.
