EC2 user data script runs very slowly on CentOS 7 AMI

Question

There appears to be a 25-second delay every time a userdata script touches the disk on the CentOS 7 AMI from the AWS Marketplace.

Here's my script:

#!/bin/bash -ex
echo "[TIMER] START $(date +%s.%N)"
current_user=$(whoami)
echo "Running as: $current_user"
sudo id -u myuser &>/dev/null || sudo useradd myuser
echo "[TIMER] CreatedUser $(date +%s.%N)"
time sudo yum update -y
echo "[TIMER] Yum Update $(date +%s.%N)"
sudo mkdir -p /opt/myuser/resources
echo "[TIMER] Create /opt/myuser/resources $(date +%s.%N)"

sudo bash -c "cat > /etc/systemd/system/my-service.service" <<EOF
[Unit]
Description=My Service
After=network-online.target

[Service]
User=myuser
Group=myuser
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -ex -c 'echo "Hello World"'

[Install]
Alias=my-service
WantedBy=default.target
EOF

echo "[TIMER] Make my-service.service $(date +%s.%N)"
sudo chmod 644 /etc/systemd/system/my-service.service
echo "[TIMER] Chmod $(date +%s.%N)"
sudo systemctl daemon-reload
echo "[TIMER] daemon-reload $(date +%s.%N)"
sudo systemctl enable my-service
echo "[TIMER] enable $(date +%s.%N)"
sudo systemctl start my-service
echo "[TIMER] END: my-service $(date +%s.%N)"

I launch a c5.large from this AMI and use the above as my userdata script: https://aws.amazon.com/marketplace/pp/B00O7WM7QW

Timer results:

[TIMER] START 1546978269.809559549
[TIMER] CreatedUser 1546978320.472706964
[TIMER] Yum Update 1546978356.991642552
[TIMER] Create /opt/myuser/resources 1546978382.033044767
[TIMER] Make my-service.service 1546978407.074353857
[TIMER] Chmod 1546978432.111791937
[TIMER] daemon-reload 1546978457.195078083
[TIMER] enable 1546978482.265036318
[TIMER] END: my-service 1546978507.313735938

| CentOS 7 log                                              | timestamp            | seconds     |
|-----------------------------------------------------------|----------------------|-------------|
| [TIMER] START 1546978269.809559549                        | 1546978269.809559549 |             |
| [TIMER] CreatedUser 1546978320.472706964                  | 1546978320.472706964 | 50.66315007 |
| [TIMER] Yum Update 1546978356.991642552                   | 1546978356.991642552 | 36.51893997 |
| [TIMER] Create /opt/myuser/resources 1546978382.033044767 | 1546978382.033044767 | 25.04139996 |
| [TIMER] Make my-service.service 1546978407.074353857      | 1546978407.074353857 | 25.04131007 |
| [TIMER] Chmod 1546978432.111791937                        | 1546978432.111791937 | 25.03743982 |
| [TIMER] daemon-reload 1546978457.195078083                | 1546978457.195078083 | 25.08328009 |
| [TIMER] enable 1546978482.265036318                       | 1546978482.265036318 | 25.06995988 |
| [TIMER] END: my-service 1546978507.313735938              | 1546978507.313735938 | 25.04870009 |
|                                                           | total (s)            | 237.50418   |
|                                                           | total (m)            | 3.958402999 |

The seconds column of the table shows that simple commands like mkdir, chmod, and useradd are taking 25 seconds. Why does this happen?
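The per-step figures in the seconds column can be reproduced straight from the log instead of a spreadsheet. The following is a minimal sketch, assuming the `[TIMER] <label> <epoch seconds>` line format printed by the script above; the function name `timer_deltas` is made up for illustration. It prints each label with the time elapsed since the previous timer line, tab-separated.

```shell
#!/bin/bash
# timer_deltas: read log text on stdin and print "<label><TAB><seconds>"
# for every [TIMER] line after the first one.
# Hypothetical helper; only lines that START with "[TIMER]" are used, so the
# "+ echo '[TIMER] ...'" trace lines produced by "bash -x" are ignored.
timer_deltas() {
  awk '/^\[TIMER\] / {
    ts = $NF                         # the epoch timestamp is the last field
    label = $0
    sub(/^\[TIMER\] /, "", label)    # drop the "[TIMER] " prefix
    sub(/ [0-9.]+$/, "", label)      # drop the trailing timestamp
    if (seen) printf "%s\t%.3f\n", label, ts - prev
    prev = ts
    seen = 1
  }'
}
```

Saving the timer lines to a file and running `timer_deltas < timers.log` (the file name is an assumption) gives one row per step. The roughly 50 seconds before CreatedUser would be consistent with that step making two sudo calls (`sudo id` followed by `sudo useradd`), though the log alone does not prove it.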

EDIT:

Contents of /etc/hosts:

$ hostname
ip-172-31-40-213.us-west-2.compute.internal
$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

Example log from /var/log/messages; the systemd entries show that creating the sudo session is what takes the 25 seconds:

Jan  9 23:50:32 ip-172-31-35-166 cloud-init: + echo '[TIMER] Make my-service.service 1547077832.899069408'
Jan  9 23:50:32 ip-172-31-35-166 cloud-init: [TIMER] Make my-service.service 1547077832.899069408
Jan  9 23:50:32 ip-172-31-35-166 cloud-init: + sudo chmod 644 /etc/systemd/system/my-service.service
Jan  9 23:50:32 ip-172-31-35-166 systemd: Removed slice User Slice of root.
Jan  9 23:50:32 ip-172-31-35-166 systemd: Created slice User Slice of root.
Jan  9 23:50:32 ip-172-31-35-166 systemd: Started Session c3 of user root.
Jan  9 23:50:57 ip-172-31-35-166 cloud-init: ++ date +%s.%N
Jan  9 23:50:57 ip-172-31-35-166 cloud-init: + echo '[TIMER] Chmod 1547077857.946078493'
Jan  9 23:50:57 ip-172-31-35-166 cloud-init: [TIMER] Chmod 1547077857.946078493

The journalctl log shows the likely culprit:

Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: + echo '[TIMER] Make my-service.service 1547077832.899069408'
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: [TIMER] Make my-service.service 1547077832.899069408
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: + sudo chmod 644 /etc/systemd/system/my-service.service
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal systemd[1]: Removed slice User Slice of root.
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal sudo[13392]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/chmod 644 /etc/systemd/system/my-service.service
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal systemd[1]: Created slice User Slice of root.
Jan 09 23:50:32 ip-172-31-35-166.us-west-2.compute.internal systemd[1]: Started Session c3 of user root.
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal sudo[13392]: pam_systemd(sudo:session): Failed to create session: Connection timed out
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal sudo[13392]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal sudo[13392]: pam_unix(sudo:session): session closed for user root
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: ++ date +%s.%N
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: + echo '[TIMER] Chmod 1547077857.946078493'
Jan 09 23:50:57 ip-172-31-35-166.us-west-2.compute.internal cloud-init[1197]: [TIMER] Chmod 1547077857.946078493

Googling more, I found: https://github.com/systemd/systemd/issues/2863

This has been fixed in a later version of systemd, but CentOS on AWS EC2 ships with systemd version 219 and I can't really update it myself. Any suggestions? Is there some config I can put in place to avoid this issue? I can remove most instances of sudo in my userdata script, but I do need it for things like:

sudo -H -u myuser bash -ex <<EOF
  ... commands
EOF

FWIW Amazon Linux 2 comes with the same version of systemd but does not exhibit this behavior.

Answer 1

The issue and its solution are described in this Red Hat article: https://access.redhat.com/solutions/5692661

In summary, it's not normal to run commands through sudo in a userdata script, so the default policy does not allow it. Each call then stalls for 25 seconds while pam_systemd tries to create a session and gives up when the 25-second dbus timeout expires.
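One way to sidestep the delay is to stop calling sudo from the script at all. The journal in the question shows sudo being invoked by root (`root : ... USER=root`), so plain commands such as mkdir or chmod need no prefix. For the remaining case of running something as another user, the sketch below uses `runuser` from util-linux. The helper name `run_as` is hypothetical, and it is an assumption, not something confirmed by the linked article, that the PAM stack for runuser on this AMI skips pam_systemd; `/etc/pam.d/runuser` is the file to check. Environment handling (for example HOME) may also differ from `sudo -H` and is not verified here.

```shell
#!/bin/bash
# run_as USER CMD [ARGS...]
# Hypothetical helper: run CMD as USER without going through sudo.
run_as() {
  local target_user=$1
  shift
  if [ "$(id -un)" = "$target_user" ]; then
    # Already running as the requested user: execute the command directly.
    "$@"
  else
    # runuser only works when called by root, which is the case for
    # userdata scripts. "--" ends option parsing before the command.
    runuser -u "$target_user" -- "$@"
  fi
}
```

In the userdata script this would look like `run_as myuser bash -ex -c '... commands'`, with `myuser` taken from the question. Timing one such call with the [TIMER] echo lines would show whether the 25-second stall is actually gone on a given image.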

In my case I was attempting to run su <user> -c "command". I found the error by running journalctl -b (-b limits the output to the current boot). The related error log looks like this:

pam_systemd(su:session): Failed to create session: Connection timed out
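To check whether an instance is affected, the journal can be filtered for that message. This is a small sketch, assuming the message wording shown in the logs above; the function name `pam_session_timeouts` is invented for illustration. It reads log text on stdin, so it works on journalctl output as well as on a saved copy of /var/log/messages.

```shell
#!/bin/bash
# pam_session_timeouts: keep only the lines where pam_systemd failed to
# create a session, for any PAM service name (sudo, su, ...).
# Hypothetical helper; reads log text on stdin.
pam_session_timeouts() {
  grep -E 'pam_systemd\([^)]*:session\): Failed to create session'
}
```

A typical call would be `journalctl -b | pam_session_timeouts`. Each matching line corresponds to one stalled sudo or su invocation, so the number of matches multiplied by 25 seconds gives a rough estimate of the time lost during boot.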
