
Cannot SSH into the GCP VM instances that used to work

I created a few GCP VM instances yesterday all using the same configuration but running different tasks. I could SSH into those instances via the GCP console and they were all working fine.
Today I want to check whether the tasks are done, but I can no longer SSH into any of those instances via the browser. The error message reads:

Connection via Cloud Identity-Aware Proxy Failed
Code: 4010
Reason: destination read failed
You may be able to connect without using the Cloud Identity-Aware Proxy.

So I retried with the Cloud Identity-Aware Proxy disabled. But then it reads:

Connection Failed
An error occurred while communicating with the SSH server. Check the server and the network configuration.

Running

gcloud compute instances list

displayed all my instances, and their status is RUNNING. But when I ran

gcloud compute instances get-serial-port-output [instance-name]

with the [instance-name] returned by the command above (to check whether the instance's boot disk had run out of free space), it returned

(gcloud.compute.instances.get-serial-port-output) Could not fetch serial port output: The resource '...' was not found
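(Side note: one common cause of this "resource not found" error is that the command looked for the instance in the wrong zone. Passing `--zone` explicitly may help; a hedged sketch, where the instance name and zone are placeholders to substitute with your own values:)

```shell
# Look up the zone the instance actually lives in
gcloud compute instances list \
    --filter="name=my-instance" --format="value(zone)"

# Then request the serial port output for that exact zone
# (my-instance and us-central1-a are placeholders)
gcloud compute instances get-serial-port-output my-instance \
    --zone=us-central1-a --port=1
```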

Some extra info:
I'm accessing the VM instances from the same network (my home internet) and everything else is unchanged
I'm the owner of the project
My account is using a GCP free trial with $300 credit
The instances have machine type c2-standard-4 and use a Deep Learning on Linux image
The gcloud config looks right to me:

$ gcloud config list
[component_manager]
disable_update_check = True
[compute]
gce_metadata_read_timeout_sec = 5
[core]
account = [my_account]
disable_usage_reporting = True
project = [my_project]
[metrics]
environment = devshell

Update:
I reset one of the instances and now I can successfully SSH into it. However, the job that was running on that instance stopped after the reset.
I want to keep the jobs running on the other instances. Is there a way to SSH into them without resetting?

Your issue is on the VM side. The tasks you're running are making the SSH service unable to accept incoming connections, which is why you could only connect after the restart.

You should be able to see the instance's serial console output with gcloud compute instances get-serial-port-output [instance-name], but if for some reason you can't, you may instead use the GCP console: go to the instance's details and click on [Serial port 1 (console)][1] to see the output.

You may even interact with your VM (log in) via the serial console. This is particularly useful if something has stopped the SSH service, but it requires a login/password, so you first have to access the VM or use a startup script to add a user with a password. But then again, that requires a restart.
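A hedged sketch of those steps (the instance name, zone, username, and password are placeholders; verify the exact flags against your gcloud version):

```shell
# Enable the interactive serial console on the instance
gcloud compute instances add-metadata my-instance \
    --zone=us-central1-a \
    --metadata serial-port-enable=TRUE

# Connect interactively to serial port 1
gcloud compute connect-to-serial-port my-instance \
    --zone=us-central1-a

# Startup script that creates a local user with a password.
# It only runs on the next boot, so this path needs the restart
# mentioned above. "rescue" and "CHANGE_ME" are placeholders.
gcloud compute instances add-metadata my-instance \
    --zone=us-central1-a \
    --metadata startup-script='#! /bin/bash
useradd -m rescue
echo "rescue:CHANGE_ME" | chpasswd'
```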

In either case it seems that restarting your VMs is the best option. But you may try to figure out what is causing the SSH service to stop after some time by inspecting the logs. Or you can create your own logs (disk space, memory, CPU, etc.) with cron, for example df -Th /mountpoint/path | tail -n1 >> /name_of_the_log_file.log.
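A minimal sketch of that disk-usage logging idea, wrapped in a script so cron can run it (the script and log paths are placeholders, not anything from the original post):

```shell
#!/bin/sh
# log_disk.sh -- append the root filesystem's usage line
# (filesystem, type, size, used, avail, use%, mount) to a log file.
# /var/log/disk_usage.log is a placeholder path; pick your own.
df -Th / | tail -n1 >> /var/log/disk_usage.log

# Install with cron, e.g. every 10 minutes (crontab -e):
#   */10 * * * * /usr/local/sbin/log_disk.sh
```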

You can, for example, use cron to check and restart the SSH service.
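One way to sketch that check, assuming a systemd-based image (the service may be named ssh rather than sshd on Debian/Ubuntu, so adjust for your distro; the script path in the cron line is a placeholder):

```shell
#!/bin/sh
# check_sshd.sh -- restart the SSH daemon if it is not active.
if ! systemctl is-active --quiet sshd; then
    systemctl restart sshd
fi

# Run it every minute from root's crontab (crontab -e):
#   * * * * * /usr/local/sbin/check_sshd.sh
```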

And if something doesn't work as it's supposed to (according to the documentation), go to the IssueTracker and create a new issue to get more help.
