
Cannot SSH into the GCP VM instances that used to work

I created a few GCP VM instances yesterday, all using the same configuration but running different tasks. I could SSH into those instances via the GCP console and they were all working fine.
Today I want to check whether the tasks are done, but I can no longer SSH into any of those instances via the browser. The error message reads:

Connection via Cloud Identity-Aware Proxy Failed
Code: 4010
Reason: destination read failed
You may be able to connect without using the Cloud Identity-Aware Proxy.

So I retried with Cloud Identity-Aware Proxy disabled. But then it reads:

Connection Failed
An error occurred while communicating with the SSH server. Check the server and the network configuration.

Running

gcloud compute instances list

displayed all my instances, each with the status RUNNING. But when I ran

gcloud compute instances get-serial-port-output [instance-name]

using the [instance-name] returned by the above command (this is to check whether the boot disk of the instance has run out of free space), it returned

(gcloud.compute.instances.get-serial-port-output) Could not fetch serial port output: The resource '...' was not found

Some extra info:
I'm accessing the VM instances from the same network (my home internet) and everything else is the same
I'm the owner of the project
My account is using a GCP free trial with $300 credit
The instances have machine type c2-standard-4 and are using Linux Deep Learning
The gcloud config looks right to me:

$ gcloud config list
[component_manager]
disable_update_check = True
[compute]
gce_metadata_read_timeout_sec = 5
[core]
account = [my_account]
disable_usage_reporting = True
project = [my_project]
[metrics]
environment = devshell

Update:
I reset one of the instances and now I can successfully SSH into that instance. However, the job running on the instance stopped after the reset.
I want to keep the jobs running on the other instances. Is there a way to SSH into the other instances without a reset?

Your issue is on the VM side. The tasks you're running made the ssh service unable to accept incoming connections, and only after the restart were you able to connect.

You should be able to see the instance's serial console output using gcloud compute instances get-serial-port-output [instance-name], but if for some reason you can't, you may instead try the GCP console: go to the instance's details and click on Serial port 1 (console), and you will see the output.
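One thing that may be worth ruling out for the "was not found" error is that the command was resolved against a different zone than the one the instance lives in. This is only an assumption; the question does not confirm it. The sketch below looks up the zone first and then passes it explicitly. The function name serial_tail is hypothetical.

```shell
#!/bin/sh
# serial_tail INSTANCE: print the last 50 lines of the instance's serial
# console log (port 1), resolving the instance's zone first.
# The function name is hypothetical; the gcloud subcommands and flags are real.
serial_tail() {
  instance="$1"
  # Ask gcloud which zone the instance is in (short name, e.g. us-central1-a).
  zone="$(gcloud compute instances list \
    --filter="name=${instance}" \
    --format="value(zone.basename())")"
  if [ -z "${zone}" ]; then
    echo "instance ${instance} not found in the current project" >&2
    return 1
  fi
  gcloud compute instances get-serial-port-output "${instance}" \
    --zone="${zone}" --port=1 | tail -n 50
}
```

Calling serial_tail [instance-name] then either prints the end of the console log or reports that the instance is not visible in the configured project.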

You may even interact with your VM (log in) via the serial console. This is particularly useful if something stopped the ssh service, but for that you need a login and password, so you first have to access the VM or use a startup script to add a user with a password. But then again, this requires a restart.
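For reference, interactive serial console access can be switched on from gcloud through instance metadata. A minimal sketch, assuming the zone is already known; the function name open_serial_console is hypothetical. It does not remove the need for a local account with a password at the login prompt.

```shell
#!/bin/sh
# open_serial_console INSTANCE ZONE: enable interactive serial console
# access on the instance, then connect to it.
# The function name is hypothetical; the gcloud subcommands are real.
open_serial_console() {
  instance="$1"
  zone="$2"
  # Metadata can be changed while the instance is running.
  gcloud compute instances add-metadata "${instance}" --zone="${zone}" \
    --metadata serial-port-enable=TRUE || return 1
  # Opens an interactive session on serial port 1; a login prompt follows.
  gcloud compute connect-to-serial-port "${instance}" --zone="${zone}"
}
```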

In either case, restarting your VMs seems to be the best option. But you may try to figure out what is causing the ssh service to stop after some time by inspecting the logs. Or you can create your own log (disk space, memory, CPU, etc.) by using cron with df -Th /mountpoint/path | tail -n1 >> /name_of_the_log_file.log.
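A timestamp on each line makes such a log easier to line up with the moment SSH stopped responding. Here is a minimal sketch of a logging helper meant to be called from cron; the function name log_usage and the paths in the comments are hypothetical.

```shell
#!/bin/sh
# log_usage MOUNTPOINT LOGFILE: append one timestamped line of disk usage.
# The function name and the example paths below are hypothetical.
log_usage() {
  mountpoint="$1"
  logfile="$2"
  # -P keeps each filesystem on a single line; tail -n1 drops the header.
  # On GNU df, -T can be added to include the filesystem type, as in the
  # command above.
  usage="$(df -Ph "${mountpoint}" | tail -n1)"
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${usage}" >> "${logfile}"
}

# Example crontab entry, every 5 minutes, for a script that calls
# log_usage / /var/log/disk_usage.log:
# */5 * * * * /usr/local/bin/log_usage.sh
```

The same pattern could log memory by swapping the df call for free -m, at the cost of multi-line output per sample.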

For example, you can use cron to check and start the ssh service.
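A minimal sketch of such a check, assuming a systemd-based image. The function name ensure_running and the paths in the comments are hypothetical, and the unit is typically called ssh on Debian and Ubuntu but sshd on RHEL-family systems. This only helps when the ssh daemon has exited; it will not help if the whole machine is too starved of memory or disk for cron to run.

```shell
#!/bin/sh
# ensure_running SERVICE: start SERVICE if systemd does not report it active.
# The function name and the example paths below are hypothetical.
ensure_running() {
  service="$1"
  if systemctl is-active --quiet "${service}"; then
    return 0
  fi
  # Leave a timestamped trace so restarts can be correlated with the jobs.
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) ${service} inactive, starting it"
  systemctl start "${service}"
}

# Example root crontab entry, every 5 minutes, for a script that calls
# ensure_running ssh:
# */5 * * * * /usr/local/sbin/ensure_ssh.sh >> /var/log/ensure_ssh.log 2>&1
```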

And if something doesn't work as it is supposed to (according to the documentation), go to the IssueTracker and create a new issue to get more help.
