Why might a nohup command in Terraform "aws_instance" user_data fail to run at instance launch?

I'm using Terraform v0.11.7 and AWS provider 1.30 to build an environment for running load tests with locust, based on a Debian 9.5 AMI.

My module exposes a num_instances variable that determines which locust command line is used. Below is my configuration.

resource "aws_instance" "locust_master" {
  count                   = 1

  ami                     = "${var.instance_ami}"
  instance_type           = "${var.instance_type}"
  key_name                = "${var.instance_ssh_key}"
  subnet_id               = "${var.subnet}"
  tags                    = "${local.tags}"
  vpc_security_group_ids  = ["${local.vpc_security_group_ids}"]

  user_data = <<-EOF
              #!/bin/bash
              # Install pip on instance.
              curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
              sudo python3 get-pip.py
              rm get-pip.py
              # Install locust and pyzmq on instance.
              sudo pip3 install locustio pyzmq
              # Write locustfile to instance.
              echo "${data.local_file.locustfile.content}" > ${local.locustfile_py}
              # Write locust start script to instance.
              echo "nohup ${var.num_instances > 1 ? local.locust_master_cmd : local.locust_base_cmd} &" > ${local.start_sh}
              # Start locust.
              sh ${local.start_sh}
              EOF
}

resource "aws_instance" "locust_slave" {
  count                   = "${var.num_instances - 1}"

  ami                     = "${var.instance_ami}"
  instance_type           = "${var.instance_type}"
  key_name                = "${var.instance_ssh_key}"
  subnet_id               = "${var.subnet}"
  tags                    = "${local.tags}"
  vpc_security_group_ids  = ["${local.vpc_security_group_ids}"]

  user_data = <<-EOF
              #!/bin/bash
              set -x
              # Install pip on instance.
              curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
              sudo python3 get-pip.py
              rm get-pip.py
              # Install locust and pyzmq on instance.
              sudo pip3 install locustio pyzmq
              # Write locustfile to instance.
              echo "${data.local_file.locustfile.content}" > ${local.locustfile_py}
              # Write locust master dns name to instance.
              echo ${aws_instance.locust_master.private_dns} > ${local.locust_master_host_file}
              # Write locust start script to instance.
              echo "nohup ${local.locust_slave_cmd} &" > ${local.start_sh}
              # Start locust.
              sh ${local.start_sh}
              EOF
}

If I SSH into the locust_master instance after it has launched, I see the /home/admin/start.sh script, but it does not appear to have run: there is no nohup.out file and locust is not among the running processes. If I manually run sh /home/admin/start.sh on that host, the service starts, and I can disconnect from the host and still access it. The locust_slave host(s) exhibit the same problem.

What might cause start.sh to fail when run from aws_instance user_data? Are there any gotchas I should be aware of when executing scripts in user_data?

Many thanks in advance!
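One way to investigate, sketched here under assumptions: on cloud-init based images the output of user_data scripts is typically captured in /var/log/cloud-init-output.log (that path is an assumption about the AMI, not something stated in the question). Note also that nohup only creates nohup.out when its standard output is a terminal, which is not the case at boot, so a missing nohup.out does not by itself prove the script never ran. The sketch below redirects the background command's output to an explicit log file so that failures stay visible. LOG_FILE, WORK_DIR and the cat command are hypothetical stand-ins for a real log path, the boot-time working directory and the locust command line.

```shell
#!/bin/bash
# Hypothetical stand-ins: on a real instance LOG_FILE might be a fixed path
# under /var/log, and the command would be the locust command line.
LOG_FILE="$(mktemp)"
WORK_DIR="$(mktemp -d)"

# Simulate user_data running from a directory that does not hold locustfile.py.
cd "$WORK_DIR"

# Redirect stdout and stderr explicitly instead of relying on nohup.out.
nohup sh -c 'cat locustfile.py' > "$LOG_FILE" 2>&1 &

# Wait only so this sketch can report the exit status; a real start script
# would leave the process running in the background.
wait $!
status=$?

echo "exit status: $status"
cat "$LOG_FILE"
```

With the output captured this way, a failure such as a missing file shows up in the log instead of disappearing silently.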

Thanks for the tip! I was not aware of that log file, and it did point to the problem: a relative path issue. I had assumed that user_data commands would be executed with /home/admin as the working directory. They are not, so locust couldn't find the locustfile.py file. Using an absolute path to locustfile.py solved the problem.
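The relative path problem can be illustrated with a small sketch. APP_DIR is a hypothetical stand-in for /home/admin and OTHER_DIR for whatever directory the user_data script actually runs in; neither name comes from the original configuration.

```shell
#!/bin/bash
# APP_DIR stands in for the directory where the locustfile was written.
APP_DIR="$(mktemp -d)"
echo "# placeholder locustfile" > "$APP_DIR/locustfile.py"

# OTHER_DIR stands in for the working directory at boot, which is not APP_DIR.
OTHER_DIR="$(mktemp -d)"
cd "$OTHER_DIR"

# A relative lookup resolves against the current directory and fails.
if [ -f locustfile.py ]; then relative_found=yes; else relative_found=no; fi

# An absolute lookup succeeds regardless of the working directory.
if [ -f "$APP_DIR/locustfile.py" ]; then absolute_found=yes; else absolute_found=no; fi

echo "relative: $relative_found, absolute: $absolute_found"
```

In the configuration above, this would presumably correspond to using absolute paths for the locustfile in the locust command line, or to adding a cd into the intended directory at the top of the user_data script.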

