Why does Gnocchi apply 'server_group' to a resource so slowly?

I add the metadata "metering.server_group":"corey-group" to an instance at creation time, and nova show confirms that it is applied. However, when I check the Gnocchi resource with gnocchi resource show --type instance ${instance-id} , the server_group attribute is None at first. It is only applied after a while, and always on the hour (e.g. 07:00, 08:00...). I don't know why this happens. I suspect it causes Gnocchi to aggregate over incorrect datasets, so I spent some time troubleshooting it.

First of all, the attributes of a Gnocchi resource are stored in the database:

MariaDB [(none)]> use gnocchi
MariaDB [gnocchi]> select * from resource_type where name='instance';
# check its tablename, ex: rt_xxxxxx
MariaDB [gnocchi]> select * from rt_xxxxxx where display_name='corey-vm';

+--------------+--------------------+-----------+--------------------------------------+--------------+--------+--------------+
| display_name | host               | image_ref | flavor_id                            | server_group | id     | flavor_name  |
+--------------+--------------------+-----------+--------------------------------------+--------------+--------+--------------+
| corey-vm     | corey-test-com-001 | NULL      | 26e46b4c-23bd-4224-a609-29bd3094a18e | NULL         | xxxxxx | corey-flavor |
+--------------+--------------------+-----------+--------------------------------------+--------------+--------+--------------+

As you can see, the server_group column should be corey-group , but it is always NULL right after the instance is created, and it looks like ceilometer only fills it in once per hour, on the hour.

I added some logging to the file ceilometer/publisher/gnocchi.py and found that it updates the resource every minute, but the variable resource_extra contains server_group only on the hour, which is why it is None in the beginning.

Here are some parts of the logs:

2020-11-09 11:59:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_id': u'xxx', 'flavor_name': u'xxx'} publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345

2020-11-09 12:00:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_name': u'xxx', 'server_group': 'corey-group' } publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345

2020-11-09 12:01:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_id': u'xxx', 'flavor_name': u'xxx'} publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345
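
To check over a longer log which publish cycles actually carry server_group, the lines can be parsed with a short script. This is only a sketch: it assumes every line has the same shape as the custom debug lines above (timestamp, then the word Resource followed by a Python dict repr, then publish_samples), and the function name server_group_by_timestamp is made up for illustration.

```python
import ast
import re

# Assumed line shape (taken from the sample lines above):
# "<date> <time> DEBUG <logger> Resource {<dict>} publish_samples <path>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*? Resource (\{.*?\}) publish_samples"
)


def server_group_by_timestamp(lines):
    """Return (timestamp, server_group or None) for every matching log line."""
    results = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not one of the custom debug lines
        timestamp, raw_dict = match.groups()
        # The dict is a Python 2 repr; u'...' literals still parse on Python 3.
        attrs = ast.literal_eval(raw_dict)
        results.append((timestamp, attrs.get("server_group")))
    return results


SAMPLE_LOG = [
    "2020-11-09 11:59:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_id': u'xxx', 'flavor_name': u'xxx'} publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345",
    "2020-11-09 12:00:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_name': u'xxx', 'server_group': 'corey-group' } publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345",
    "2020-11-09 12:01:15 DEBUG ceilometer.publisher.gnocchi Resource {'host': u'test-com-002', 'display_name': u'vm-001', 'flavor_id': u'xxx', 'flavor_name': u'xxx'} publish_samples /usr/lib/python2.7/site-packages/ceilometer/publisher/gnocchi.py:345",
]

for timestamp, group in server_group_by_timestamp(SAMPLE_LOG):
    print(timestamp, group)
```

On the three sample lines, only the 12:00:15 entry reports a server_group; the other two report None.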

But I am stuck at this point: I can't understand why the variable resource_extra doesn't get server_group every time. What exactly causes this? (Running on Queens)

I would appreciate any ideas.

Update 09/11/2020

After some days of troubleshooting, I still can't find the root cause.

But I found a command that applies 'server_group' manually, which keeps Gnocchi from aggregating over incorrect datasets.

Here it is:

gnocchi resource update --type instance -a server_group:corey-group ${resource_id}
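
If several instances need the same workaround, the command can be wrapped in a small script. This is a rough sketch, not a tested tool: the helper names build_update_command and apply_server_group are invented here, and it assumes the gnocchi CLI is on PATH with OpenStack credentials already loaded in the environment. With dry_run=True (the default) nothing is executed and the commands are only returned for inspection.

```python
import subprocess


def build_update_command(resource_id, server_group):
    """Build the argv for the `gnocchi resource update` call shown above."""
    return [
        "gnocchi", "resource", "update",
        "--type", "instance",
        "-a", "server_group:{}".format(server_group),
        resource_id,
    ]


def apply_server_group(resource_ids, server_group, dry_run=True):
    """Collect one update command per resource; run them unless dry_run is True."""
    commands = [build_update_command(rid, server_group) for rid in resource_ids]
    if not dry_run:
        for command in commands:
            # Assumption: gnocchi CLI installed and OS_* credentials exported.
            subprocess.run(command, check=True)
    return commands


# Dry run with a placeholder resource id: prints the command, executes nothing.
for command in apply_server_group(["<resource_id>"], "corey-group"):
    print(" ".join(command))
```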

Update 11/11/2020

I tried grepping for the integer 3600 and changing each occurrence to 300, but nothing changed. Below is what I've tried.

/etc/ceilometer/ceilometer.conf

[compute]
resource_cache_expiry = 300

ceilometer/compute/discovery.py

cfg.IntOpt('resource_cache_expiry',
            default=300,

ceilometer/publisher/zaqar.py

DEFAULT_TTL = 300

Update 12/11/2020

I can't reproduce this issue on Pike.

Maybe you can refer to the following discussion:

Heat autoscaling with gnocchi based aodh alarms requires use of naive instance_discovery_method setting with ceilometer compute agents?

According to that reference, try changing instance_discovery_method from the default "libvirt_metadata" to "naive" in the ceilometer config file, like this:

[compute]
instance_discovery_method = naive

Switching to "naive" resolves this issue, but it generates additional load on the Nova API, because the metadata then has to be retrieved from Nova.
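
To confirm which discovery method a compute node is actually configured with, the [compute] section can be read back with a few lines of Python. This is a minimal sketch using the standard configparser module rather than oslo.config, so it only shows what is written in the file, not the effective default. SAMPLE_CONF and the function name discovery_method are illustrative; on a real node you would read /etc/ceilometer/ceilometer.conf instead.

```python
import configparser

# Illustrative stand-in for /etc/ceilometer/ceilometer.conf.
SAMPLE_CONF = """\
[DEFAULT]
debug = False

[compute]
resource_cache_expiry = 300
"""


def discovery_method(conf_text):
    """Return [compute] instance_discovery_method, or None when it is not set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.get("compute", "instance_discovery_method", fallback=None)


# None means the option is absent, so ceilometer falls back to its default.
print(discovery_method(SAMPLE_CONF))

# After adding the line suggested above, the explicit value is reported.
print(discovery_method(SAMPLE_CONF + "instance_discovery_method = naive\n"))
```

Remember that the compute agent has to be restarted before a changed value takes effect.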
