Eucalyptus instance will not start, immediately fails from stopped --> pending --> stopping with errors

I was running Eucalyptus 4.0. The environment is sound and had been up for a couple of years without issue. I went through the shutdown procedure (stop all instances, stop eucalyptus-cloud, stop eucalyptus-cc, stop each node) and recently shut down the environment for a move.

When I restored the environment, all of the services came back online but no instances would start, whether new or old. I noticed some issues with IP allocation (the network did not change during this process), so I released all of the addresses back to the cloud and then re-allocated them.

I then came across some online information about other errors I was observing and ended up modifying two parameters:

euca-modify-property -p cloud.network.global_max_network_tag=2048
euca-modify-property -p cloud.network.global_min_network_tag=1024

Once this was done and I restarted the cloud again, I was able to successfully launch new instances. With no luck on the existing instances, I upgraded to 4.0.1 and then to 4.0.2. Everything appeared to upgrade without issue (my console still reports 4.0.0, but euca-version reports eucalyptus 4.0.2 with euca2ools 3.1.1/Omega).

However, I'm about 14 hours into this and I still cannot start an old [EBS-backed] instance. It goes from stopped --> pending --> stopping --> stopped in a matter of seconds, and you can only tell that from the logs. I believe there is some extra data left over in the "metadata_extant_network" table (maybe something did not shut down properly?), but I cannot identify what, nor can I remove records manually due to FK constraints, and I don't want to risk corrupting the database. Here are my logs from when I attempt to start an instance. There must be a "proper" way to do this:

cloud-exhaust.log

Tue Dec 9 10:04:29 2014  WARN [org.jboss.netty.channel.DefaultChannelPipeline:Eucalyptus.eucalyptus:Ephemeral
[bitronix.tm.twopc.Preparer:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] executing transaction with 0 enlisted resource
Tue Dec 9 10:04:30 2014  WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] SQL Error: 0, SQLState: 23503
Tue Dec 9 10:04:30 2014 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] ERROR: update or delete on table "metadata_extant_network" violates foreign key constraint "fk6a62681ed068841d" on table "metadata_network_group"
  Detail: Key (id)=(c75a9938419237320141929ac6a02eea) is still referenced from table "metadata_network_group".

postgresql-Tue.log

ERROR:  update or delete on table "metadata_extant_network" violates foreign key constraint "fk6a62681ed068841d" on table "metadata_network_group"
DETAIL:  Key (id)=(c75a9938419237320141929ac6a02eea) is still referenced from table "metadata_network_group".
STATEMENT:  delete from metadata_extant_network where id=$1 and version=$2
ERROR:  update or delete on table "metadata_extant_network" violates foreign key constraint "fk6a62681ed068841d" on table "metadata_network_group"
DETAIL:  Key (id)=(c75a9938419237320141929ac6a02eea) is still referenced from table "metadata_network_group".
STATEMENT:  delete from metadata_extant_network where id=$1 and version=$2
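The same foreign-key violation repeats across several of these logs, so it can help to collapse them into the distinct constraint and key pairs before touching the database. The following is a minimal Python sketch, not part of Eucalyptus or PostgreSQL; the function name `fk_violations` and the `SAMPLE` text are illustrative, and the patterns assume the message wording shown in the log above.

```python
import re

# Matches the PostgreSQL foreign-key violation message as it appears in the logs above.
FK_ERROR = re.compile(
    r'ERROR:\s+update or delete on table "(?P<parent>[^"]+)" violates foreign key '
    r'constraint "(?P<constraint>[^"]+)" on table "(?P<child>[^"]+)"'
)
# Matches the follow-up detail line ("DETAIL:" in PostgreSQL, "Detail:" in the cloud logs).
FK_DETAIL = re.compile(
    r'DETAIL:\s+Key \((?P<column>[^)]+)\)=\((?P<key>[^)]+)\)', re.IGNORECASE
)


def fk_violations(log_text):
    """Return distinct (parent, constraint, child, column, key) tuples in first-seen order."""
    seen = []
    current = None  # most recent ERROR match still waiting for its DETAIL line
    for line in log_text.splitlines():
        error = FK_ERROR.search(line)
        if error:
            current = error
            continue
        detail = FK_DETAIL.search(line)
        if detail and current:
            row = (
                current["parent"],
                current["constraint"],
                current["child"],
                detail["column"],
                detail["key"],
            )
            if row not in seen:
                seen.append(row)
            current = None
    return seen


# Sample input copied from the postgresql-Tue.log excerpt above.
SAMPLE = '''ERROR:  update or delete on table "metadata_extant_network" violates foreign key constraint "fk6a62681ed068841d" on table "metadata_network_group"
DETAIL:  Key (id)=(c75a9938419237320141929ac6a02eea) is still referenced from table "metadata_network_group".
STATEMENT:  delete from metadata_extant_network where id=$1 and version=$2
ERROR:  update or delete on table "metadata_extant_network" violates foreign key constraint "fk6a62681ed068841d" on table "metadata_network_group"
DETAIL:  Key (id)=(c75a9938419237320141929ac6a02eea) is still referenced from table "metadata_network_group".
STATEMENT:  delete from metadata_extant_network where id=$1 and version=$2
'''

for row in fk_violations(SAMPLE):
    print(row)
```

For this excerpt the repeated errors reduce to a single row in "metadata_extant_network" that is still referenced from "metadata_network_group".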

cloud-output.log

2014-12-09 10:04:30 ERROR | org.hibernate.exception.ConstraintViolationException: could not execute statement
2014-12-09 10:04:41  INFO | :1418144681687:Address:ADDRESS_STATE:TOP:Address 192.168.0.216 arn:aws:euare:000000000001:user/nobody available 0.0.0.0  AddressTransition system:unallocated->impending(true)
2014-12-09 10:04:41 ERROR | com.eucalyptus.cloud.util.MetadataException: org.hibernate.LazyInitializationException: could not initialize proxy - no Session
2014-12-09 10:04:41  WARN | Aborting resource token: ResourceToken:i-812D40D4:resources=TypedContext:{com.eucalyptus.util.TypedKey(NetworkResources)=[com.eucalyptus.compute.common.network.PrivateNetworkIndexResource(5), com.eucalyptus.compute.common.network.PublicIPResource()]}

cloud-debug.log

Tue Dec 9 10:04:30 2014 ERROR [NetworkGroups:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] org.hibernate.exception.ConstraintViolationException: could not execute statement
Tue Dec 9 10:04:41 2014  INFO [AdmissionControl:Compute.10] Found authorized clusters: [cc-192.168.0.150]
Tue Dec 9 10:04:41 2014  INFO [AdmissionControl:Compute.10] Availability: cc-192.168.0.150 -> 5
Tue Dec 9 10:04:41 2014 ERROR [ClusterAllocator:Eucalyptus.cluster:ClusterConfiguration:arn:euca:eucalyptus:cluster01:cluster:cc-192.168.0.150/.class java.util.concurrent.ThreadPoolExecutor$Worker#458] com.eucalyptus.cloud.util.MetadataException: org.hibernate.LazyInitializationException: could not initialize proxy - no Session
Tue Dec 9 10:04:41 2014  WARN [Allocations:Eucalyptus.cluster:ClusterConfiguration:arn:euca:eucalyptus:cluster01:cluster:cc-192.168.0.150/.class java.util.concurrent.ThreadPoolExecutor$Worker#458] Aborting resource token: ResourceToken:i-812D40D4:resources=TypedContext:{com.eucalyptus.util.TypedKey(NetworkResources)=[com.eucalyptus.compute.common.network.PrivateNetworkIndexResource(5), com.eucalyptus.compute.common.network.PublicIPResource()]}

cloud-error.log

Tue Dec 9 10:04:30 2014 ERROR [NetworkGroups:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] org.hibernate.exception.ConstraintViolationException: could not execute statement
Tue Dec 9 10:04:41 2014 ERROR [ClusterAllocator:Eucalyptus.cluster:ClusterConfiguration:arn:euca:eucalyptus:cluster01:cluster:cc-192.168.0.150/.class java.util.concurrent.ThreadPoolExecutor$Worker#458] [com.eucalyptus.cloud.run.ClusterAllocator.cleanupOnFailure(ClusterAllocator.java):274] com.eucalyptus.cloud.util.MetadataException: org.hibernate.LazyInitializationException: could not initialize proxy - no Session

So I then logged into the PostgreSQL database directly, removed the FK constraints, and manually removed the row identified in the logs:

ALTER TABLE metadata_extant_network DROP CONSTRAINT "fk45157a25f1ac537e";
ALTER TABLE metadata_network_group DROP CONSTRAINT "fk6a62681ed068841d";
DELETE FROM metadata_extant_network WHERE id='c75a9938419237320141929ac6a02eea';

The delete was successful, but after attempting to restart the instances I receive a new error:

euca-start-instances: error (InternalFailure): Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free.

Tue Dec 9 11:04:23 2014 ERROR [org.mule.exception.DefaultMessagingExceptionStrategy:Compute.15] 
********************************************************************************
Message               : Component that caused exception is: DefaultJavaComponent{Compute.component}. Message payload is of type: StartInstancesType
Code                  : MULE_ERROR--2
--------------------------------------------------------------------------------
Exception stack is:
1. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (com.eucalyptus.cloud.util.NotEnoughResourcesException)
  com.eucalyptus.network.NetworkGroup:325 (null)
2. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (com.eucalyptus.cloud.util.NotEnoughResourcesException)
  com.eucalyptus.cloud.run.AdmissionControl$RunAdmissionControl:148 (null)
3. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (java.lang.RuntimeException)
  com.eucalyptus.util.Exceptions:255 (null)
4. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (com.eucalyptus.util.EucalyptusCloudException)
  com.eucalyptus.compute.service.ComputeService:69 (null)
5. Component that caused exception is: DefaultJavaComponent{Compute.component}. Message payload is of type: StartInstancesType (org.mule.component.ComponentException)
  org.mule.component.DefaultComponentLifecycleAdapter:352 (http://www.mulesoft.org/docs/site/current3/apidocs/org/mule/component/ComponentException.html)
--------------------------------------------------------------------------------
Root Exception stack trace:
com.eucalyptus.cloud.util.NotEnoughResourcesException: Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free.
    at com.eucalyptus.network.NetworkGroup.extantNetwork(NetworkGroup.java:325)
    at com.eucalyptus.network.GenericNetworkingService$_prepareSecurityGroup_closure3_closure12.doCall(GenericNetworkingService.groovy:198)
    at sun.reflect.GeneratedMethodAccessor770.invoke(Unknown Source)
    + 3 more (set debug level logging or '-Dmule.verbose.exceptions=true' for everything)
********************************************************************************

It looks like you have configured a value for VLAN tags that is not compatible with your security group settings. You should not restrict the global range unless you need to reserve VLAN tags for some other use.

https://www.eucalyptus.com/docs/eucalyptus/4.0.2/#install-guide/configuring_security_groups.html
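To picture why a narrowed range can end in "no network tags are free", here is a toy Python model. It is not Eucalyptus code and makes no claim about how Eucalyptus actually allocates tags. It assumes only that each active security-group network holds one tag from an inclusive range bounded by cloud.network.global_min_network_tag and cloud.network.global_max_network_tag. The function name `first_free_tag` and the in-use sets are hypothetical.

```python
def first_free_tag(min_tag, max_tag, in_use):
    """Return the lowest tag in the inclusive range [min_tag, max_tag] that is not in use, or None."""
    for tag in range(min_tag, max_tag + 1):
        if tag not in in_use:
            return tag
    return None


# With the range from the question (1024..2048) and two tags taken, the next tag is free.
print(first_free_tag(1024, 2048, {1024, 1025}))  # 1026

# If every tag in the configured range is held (for example by leftover records),
# nothing can be allocated; this models the "no network tags are free" condition.
print(first_free_tag(1024, 2048, set(range(1024, 2049))))  # None
```

The narrower the configured range, the fewer tags there are to hand out, which is one reason to leave the global range unrestricted unless some tags really must be reserved.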
