
BOSH Director Installation fails on vSphere

This is my first BOSH installation for PKS. Environment:

  • vSphere 6.5 with VCSA 6.5u2
  • OpsMgr 2.2 build 296
  • bosh stemcell vsphere-ubuntu-trusty build 3586.25
  • Using a flat 100.x network, no routing/firewall involved.

Summary - After deploying the OpsMgr OVF template, I'm configuring and installing BOSH Director. However, it fails at "Waiting for Agent" in the dashboard. A look at the 'current' log in the OpsMgr VM shows that it keeps trying to read settings from /dev/sr0, because the agent.json specifies the settings Source as CDROM. It cannot find any CDROM, so it fails.

A few questions:

  1. How do I log in to the VM that BOSH creates when I change the setting to "default BOSH password" for all VMs in Ops Mgr?
  2. There is no bosh.yml under /var/tempest/workspaces/default/deployments, but some docs point to it, so I don't know what settings it's applying. Is the location wrong?
  3. Is there a way to change the stemcell used by the OpsMgr VM? Maybe I can try using the previous build?
  4. How is the agent.json actually populated?
  5. Any suggestions on troubleshooting this?

All logs/JSON files are below.

The GUI dashboard log:

===== 2018-07-30 08:20:52 UTC Running "/usr/local/bin/bosh --no-color --non-interactive --tty create-env /var/tempest/workspaces/default/deployments/bosh.yml"
Deployment manifest: '/var/tempest/workspaces/default/deployments/bosh.yml'
Deployment state: '/var/tempest/workspaces/default/deployments/bosh-state.json'

Started validating
Validating release 'bosh'... Finished (00:00:00)
Validating release 'bosh-vsphere-cpi'... Finished (00:00:00)
Validating release 'uaa'... Finished (00:00:00)
Validating release 'credhub'... Finished (00:00:01)
Validating release 'bosh-system-metrics-server'... Finished (00:00:01)
Validating release 'os-conf'... Finished (00:00:00)
Validating release 'backup-and-restore-sdk'... Finished (00:00:04)
Validating release 'bpm'... Finished (00:00:02)
Validating cpi release... Finished (00:00:00)
Validating deployment manifest... Finished (00:00:00)
Validating stemcell... Finished (00:00:14)
Finished validating (00:00:26)

Started installing CPI
Compiling package 'ruby-2.4-r4/0cdc60ed7fdb326e605479e9275346200af30a25'... Finished (00:00:00)
Compiling package 'vsphere_cpi/e1a84e5bd82eb1abfe9088a2d547e2cecf6cf315'... Finished (00:00:00)
Compiling package 'iso9660wrap/82cd03afdce1985db8c9d7dba5e5200bcc6b5aa8'... Finished (00:00:00)
Installing packages... Finished (00:00:15)
Rendering job templates... Finished (00:00:06)
Installing job 'vsphere_cpi'... Finished (00:00:00)
Finished installing CPI (00:00:23)

Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-vsphere-esxi-ubuntu-trusty-go_agent/3586.25'... Skipped [Stemcell already uploaded] (00:00:00)

Started deploying
Waiting for the agent on VM 'vm-87b3299a-a994-4544-8043-032ce89d685b'... Failed (00:00:11)
Deleting VM 'vm-87b3299a-a994-4544-8043-032ce89d685b'... Finished (00:00:10)
Creating VM for instance 'bosh/0' from stemcell 'sc-536fea79-cfa6-46a9-a53e-9de19505216f'... Finished (00:00:12)
Waiting for the agent on VM 'vm-fb90eee8-f3ac-45b7-95d3-4e8483c91a5c' to be ready... Failed (00:09:59)
Failed deploying (00:10:38)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Deploying:
Creating instance 'bosh/0':
    Waiting until instance is ready:
    Post https://vcap:<redacted>@192.168.100.201:6868/agent: dial tcp 192.168.100.201:6868: connect: no route to host

Exit code 1
===== 2018-07-30 08:32:20 UTC Finished "/usr/local/bin/bosh --no-color --non-interactive --tty create-env /var/tempest/workspaces/default/deployments/bosh.yml"; Duration: 688s; Exit Status: 1
Exited with 1.

The bosh-state.json:

ubuntu@opsmanager-2-2:~$ sudo cat /var/tempest/workspaces/default/deployments/bosh-state.json

{
    "director_id": "851f70ef-7c4b-4c65-73ed-d382ad3df1b7",
    "installation_id": "f29df8af-7141-4aff-5e52-2d109a84cd84",
    "current_vm_cid": "vm-87b3299a-a994-4544-8043-032ce89d685b",
    "current_stemcell_id": "dcca340c-d612-4098-7c90-479193fa9090",
    "current_disk_id": "",
    "current_release_ids": [],
    "current_manifest_sha": "",
    "disks": null,
    "stemcells": [
        {
            "id": "dcca340c-d612-4098-7c90-479193fa9090",
            "name": "bosh-vsphere-esxi-ubuntu-trusty-go_agent",
            "version": "3586.25",
            "cid": "sc-536fea79-cfa6-46a9-a53e-9de19505216f"
        }
    ],
    "releases": []

The agent.json:

ubuntu@opsmanager-2-2:~$ sudo cat /var/vcap/bosh/agent.json
{
  "Platform": {
    "Linux": {
      "DevicePathResolutionType": "scsi"
    }
  },
  "Infrastructure": {
    "Settings": {
      "Sources": [
        {
          "Type": "CDROM",
          "FileName": "env"
        }
      ]
    }
  }
}
ubuntu@opsmanager-2-2:~$

Finally, the current BOSH log:

/var/vcap/bosh/log/current


2018-07-30_08:42:22.69934 [main] 2018/07/30 08:42:22 DEBUG - Starting agent
2018-07-30_08:42:22.69936 [File System] 2018/07/30 08:42:22 DEBUG - Reading file /var/vcap/bosh/agent.json
2018-07-30_08:42:22.69937 [File System] 2018/07/30 08:42:22 DEBUG - Read content
2018-07-30_08:42:22.69937 ********************
2018-07-30_08:42:22.69938 {
2018-07-30_08:42:22.69938   "Platform": {
2018-07-30_08:42:22.69939     "Linux": {
2018-07-30_08:42:22.69939
2018-07-30_08:42:22.69939       "DevicePathResolutionType": "scsi"
2018-07-30_08:42:22.69939     }
2018-07-30_08:42:22.69939   },
2018-07-30_08:42:22.69939   "Infrastructure": {
2018-07-30_08:42:22.69940     "Settings": {
2018-07-30_08:42:22.69940       "Sources": [
2018-07-30_08:42:22.69940         {
2018-07-30_08:42:22.69940           "Type": "CDROM",
2018-07-30_08:42:22.69940           "FileName": "env"
2018-07-30_08:42:22.69940         }
2018-07-30_08:42:22.69941       ]
2018-07-30_08:42:22.69941     }
2018-07-30_08:42:22.69941   }
2018-07-30_08:42:22.69941 }
2018-07-30_08:42:22.69941
2018-07-30_08:42:22.69941 ********************
2018-07-30_08:42:22.69943 [File System] 2018/07/30 08:42:22 DEBUG - Reading file /var/vcap/bosh/etc/stemcell_version
2018-07-30_08:42:22.69944 [File System] 2018/07/30 08:42:22 DEBUG - Read content
2018-07-30_08:42:22.69944 ********************
2018-07-30_08:42:22.69944 3586.25
2018-07-30_08:42:22.69944 ********************
2018-07-30_08:42:22.69945 [File System] 2018/07/30 08:42:22 DEBUG - Reading file /var/vcap/bosh/etc/stemcell_git_sha1
2018-07-30_08:42:22.69946 [File System] 2018/07/30 08:42:22 DEBUG - Read content
2018-07-30_08:42:22.69946 ********************
2018-07-30_08:42:22.69946 dbbb73800373356315a4c16ee40d2db3189bf2db
2018-07-30_08:42:22.69947 ********************
2018-07-30_08:42:22.69948 [App] 2018/07/30 08:42:22 INFO - Running on stemcell version '3586.25' (git: dbbb73800373356315a4c16ee40d2db3189bf2db)
2018-07-30_08:42:22.69949 [File System] 2018/07/30 08:42:22 DEBUG - Checking if file exists /var/vcap/bosh/agent_state.json
2018-07-30_08:42:22.69950 [File System] 2018/07/30 08:42:22 DEBUG - Stat '/var/vcap/bosh/agent_state.json'
2018-07-30_08:42:22.69951 [Cmd Runner] 2018/07/30 08:42:22 DEBUG - Running command 'bosh-agent-rc'
2018-07-30_08:42:22.70116 [unlimitedRetryStrategy] 2018/07/30 08:42:22 DEBUG - Making attempt #0
2018-07-30_08:42:22.70117 [DelayedAuditLogger] 2018/07/30 08:42:22 DEBUG - Starting logging to syslog...
2018-07-30_08:42:22.70181 [Cmd Runner] 2018/07/30 08:42:22 DEBUG - Stdout:
2018-07-30_08:42:22.70182 [Cmd Runner] 2018/07/30 08:42:22 DEBUG - Stderr:
2018-07-30_08:42:22.70183 [Cmd Runner] 2018/07/30 08:42:22 DEBUG - Successful: true (0)
2018-07-30_08:42:22.70184 [settingsService] 2018/07/30 08:42:22 DEBUG - Loading settings from fetcher
2018-07-30_08:42:22.70185 [ConcreteUdevDevice] 2018/07/30 08:42:22 DEBUG - Kicking device, attempt 0 of 5
2018-07-30_08:42:22.70187 [ConcreteUdevDevice] 2018/07/30 08:42:22 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:23.20204 [ConcreteUdevDevice] 2018/07/30 08:42:23 DEBUG - Kicking device, attempt 1 of 5
2018-07-30_08:42:23.20206 [ConcreteUdevDevice] 2018/07/30 08:42:23 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:23.70217 [ConcreteUdevDevice] 2018/07/30 08:42:23 DEBUG - Kicking device, attempt 2 of 5
2018-07-30_08:42:23.70220 [ConcreteUdevDevice] 2018/07/30 08:42:23 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:24.20229 [ConcreteUdevDevice] 2018/07/30 08:42:24 DEBUG - Kicking device, attempt 3 of 5
2018-07-30_08:42:24.20294 [ConcreteUdevDevice] 2018/07/30 08:42:24 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:24.70249 [ConcreteUdevDevice] 2018/07/30 08:42:24 DEBUG - Kicking device, attempt 4 of 5
2018-07-30_08:42:24.70253 [ConcreteUdevDevice] 2018/07/30 08:42:24 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:25.20317 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:25.20320 [ConcreteUdevDevice] 2018/07/30 08:42:25 ERROR - Failed to red byte from device: open /dev/sr0: no such file or directory
2018-07-30_08:42:25.20321 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - Settling UdevDevice
2018-07-30_08:42:25.20322 [Cmd Runner] 2018/07/30 08:42:25 DEBUG - Running command 'udevadm settle'
2018-07-30_08:42:25.20458 [Cmd Runner] 2018/07/30 08:42:25 DEBUG - Stdout:
2018-07-30_08:42:25.20460 [Cmd Runner] 2018/07/30 08:42:25 DEBUG - Stderr:
2018-07-30_08:42:25.20461 [Cmd Runner] 2018/07/30 08:42:25 DEBUG - Successful: true (0)
2018-07-30_08:42:25.20462 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - Ensuring Device Readable, Attempt 0 out of 5
2018-07-30_08:42:25.20463 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:25.20464 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - Ignorable error from readByte: open /dev/sr0: no such file or directory
2018-07-30_08:42:25.70473 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - Ensuring Device Readable, Attempt 1 out of 5
2018-07-30_08:42:25.70476 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:25.70477 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - Ignorable error from readByte: open /dev/sr0: no such file or directory
2018-07-30_08:42:26.20492 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - Ensuring Device Readable, Attempt 2 out of 5
2018-07-30_08:42:26.20496 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:26.20497 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - Ignorable error from readByte: open /dev/sr0: no such file or directory
2018-07-30_08:42:26.70509 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - Ensuring Device Readable, Attempt 3 out of 5
2018-07-30_08:42:26.70512 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:26.70513 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - Ignorable error from readByte: open /dev/sr0: no such file or directory
2018-07-30_08:42:27.20530 [ConcreteUdevDevice] 2018/07/30 08:42:27 DEBUG - Ensuring Device Readable, Attempt 4 out of 5
2018-07-30_08:42:27.20533 [ConcreteUdevDevice] 2018/07/30 08:42:27 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:27.20534 [ConcreteUdevDevice] 2018/07/30 08:42:27 DEBUG - Ignorable error from readByte: open /dev/sr0: no such file or directory
2018-07-30_08:42:27.70554 [ConcreteUdevDevice] 2018/07/30 08:42:27 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:27.70557 [settingsService] 2018/07/30 08:42:27 ERROR - Failed loading settings via fetcher: Getting settings from all sources: Reading files from CDROM: Waiting for CDROM to be ready: Reading udev device: open /dev/sr0: no such file or directory
2018-07-30_08:42:27.70559 [settingsService] 2018/07/30 08:42:27 ERROR - Failed reading settings from file Opening file /var/vcap/bosh/settings.json: open /var/vcap/bosh/settings.json: no such file or directory
2018-07-30_08:42:27.70560 [main] 2018/07/30 08:42:27 ERROR - App setup Running bootstrap: Fetching settings: Invoking settings fetcher: Getting settings from all sources: Reading files from CDROM: Waiting for CDROM to be ready: Reading udev device: open /dev/sr0: no such file or directory
2018-07-30_08:42:27.70561 [main] 2018/07/30 08:42:27 ERROR - Agent exited with error: Running bootstrap: Fetching settings: Invoking settings fetcher: Getting settings from all sources: Reading files from CDROM: Waiting for CDROM to be ready: Reading udev device: open /dev/sr0: no such file or directory
2018-07-30_08:42:27.71258 [main] 2018/07/30 08:42:27 DEBUG - Starting agent


<and this whole block just keeps repeating>

How do I log in to the VM that BOSH creates when I change the setting to "default BOSH password" for all VMs in Ops Mgr?

That's not a good idea. The default password is well-known, and you should almost always use randomly generated passwords. Honestly, I'm not sure why that's even an option; the only thing that comes to mind is some extremely rare troubleshooting scenario.

That said, if you need to access a VM manually, you can securely obtain the randomly generated password through Ops Manager. You can also securely access VMs via bosh ssh, where credentials are handled automatically. Even for troubleshooting, you don't usually need that option.
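For example, once the Director is healthy, something like the following gets you an SSH session on a BOSH-managed VM without ever handling a password (the environment alias, deployment name, and instance group here are placeholders, not values from this install):

bosh -e my-env -d my-deployment ssh some-instance-group/0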

There is no bosh.yml under /var/tempest/workspaces/default/deployments. Some docs point to it, so I don't know what settings it's applying. Is the location wrong?

The location is correct, but the file contains sensitive information, so Ops Manager deletes it immediately after it's done being used.

If you want to see the contents of the file, the easy way is to navigate to https://ops-man-fqdn/debug/files, where you can see all of the configuration files, including your bosh.yml. The hard way is to watch the folder above while a deploy is running; the file exists for a short period of time, and you can make a copy during that window. The only advantage of the hard way is that you'll get the actual file, whereas the debug endpoint shows a file with sensitive info redacted.
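If you do try the hard way, a minimal sketch: on the Ops Manager VM, while an Apply Changes is in progress, watch the folder in one session and grab the file from another the moment it appears (the /tmp destination is just an example):

watch -n 1 'ls -l /var/tempest/workspaces/default/deployments/'
sudo cp /var/tempest/workspaces/default/deployments/bosh.yml /tmp/bosh.yml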

Is there a way to change the stemcell used by the OpsMgr VM? Maybe I can try using the previous build?

I don't think this is an issue with the stemcell. Lots of people are using those stemcells without hitting this problem. If a larger issue like this were found in a stemcell, you would see a notice up on Pivotal Network, and Pivotal would publish a new, fixed stemcell.

The problem also seems to be with how the VM is receiving its initial bootstrap configuration. I'd suggest looking into that more before messing with the stemcells. See below.

How is the agent.json actually populated?

Believe it or not, for vSphere environments, those settings are read from a virtual CD-ROM that's attached to the VM. There's not a lot documented, but it's mentioned briefly in the BOSH docs here:

https://bosh.io/docs/cpi-api-v1-method/create-vm/#agent-settings
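You can see a hint of this in the dashboard log above: the CPI compiles an iso9660wrap package, which is used to build the env.iso that gets attached as that CD-ROM. If you happen to have govc handy, a quick sketch for confirming the device is attached (the VM name is taken from the failed deploy above; GOVC_URL and credentials are assumed to already be set in your environment):

govc device.ls -vm vm-fb90eee8-f3ac-45b7-95d3-4e8483c91a5c | grep -i cdrom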

Any suggestions on troubleshooting this?

Look to understand why the CD-ROM can't be mounted. BOSH needs it to get its bootstrap configuration, so you need to make that work. If something in your vSphere environment is preventing the CD-ROM from being mounted, you'll need to modify it so the CD-ROM can be mounted.

If there's nothing on the vSphere side, I think the next step would be to check the standard system logs under /var/log and the dmesg output to see if there are any errors or clues as to why the CD-ROM can't be loaded or read from.
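A rough starting point for that, run from the VM's console (the grep patterns are just guesses at what's likely to be relevant):

dmesg | grep -i -E 'sr0|cdrom|cd-rom|ata'
sudo grep -ri cdrom /var/log/syslog /var/log/kern.log 2>/dev/null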

Lastly, try doing some manual tests to mount and read from the CD-ROM. Start by looking at one of the BOSH-deployed VMs in the vSphere client, check the hardware settings, and make sure there is a CD-ROM attached. It should point to a file called env.iso in the same folder as the VM on your datastore. If that's attached and connected, start up the VM and try to mount the CD-ROM. You should be able to see the BOSH config files on that drive, as sketched below.
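A minimal in-guest sketch of that test, assuming the device shows up under the stemcell's usual name of /dev/sr0 and that /mnt is free to use as a mount point:

ls -l /dev/sr0
sudo mount -o ro /dev/sr0 /mnt
ls -l /mnt          # should show the settings file, named 'env' per the agent.json above
cat /mnt/env
sudo umount /mnt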

Hope that helps!
