Tag Archives: auto-scaling

How to make auto-scaling work for nova with heat and ceilometer ?

I was trying to test this feature for a very long time but never got a chance to dig into it. Today, I got a opportunity to work on this feature. I prepared a packstack OSP 7 [Kilo] setup and took the reference from wonderful official Red Hat documentation [1] to make this work.

In this article I am going to cover only scale-up scenario.

Step 1 : While installing packstack we need to make below options as “yes” so that required components can be installed.

# egrep “HEAT|CEILOMETER” /root/answer.txt | grep INSTALL
CONFIG_CEILOMETER_INSTALL=y
CONFIG_HEAT_INSTALL=y
CONFIG_HEAT_CLOUDWATCH_INSTALL=y
CONFIG_HEAT_CFN_INSTALL=y

If you have already deployed packstack setup no need to worry just enable these in answer.txt file which is used for creating existing setup and run the packstack installation command again.

Step 2 : Created three templates to make this work.

cirros.yaml – Contains the information for spawning an instance. Script is used to generate the cpu utilization alarm.

environment.yaml – Environment file to call cirros.yaml template.

sample.yaml — Containing the main logic for scaling-up.

# cat cirros.yaml
heat_template_version: 2014-10-16
description: A simple server.
resources:
server:
type: OS::Nova::Server
properties:
#block_device_mapping:
#  – device_name: vda
#    delete_on_termination: true
#    volume_id: { get_resource: volume }
image: cirros
flavor: m1.tiny
networks:
– network: internal1
user_data_format: RAW
user_data: |
#!/bin/sh
while [ 1 ] ; do echo $((13**99)) 1>/dev/null 2>&1; done

# cat environment.yaml
resource_registry:
“OS::Nova::Server::Cirros”: “cirros.yaml”

# cat sample.yaml
heat_template_version: 2014-10-16
description: A simple auto scaling group.
resources:
scale_group:
type: OS::Heat::AutoScalingGroup
properties:
cooldown: 60
desired_capacity: 1
max_size: 3
min_size: 1
resource:
type: OS::Nova::Server::Cirros
scaleup_policy:
type: OS::Heat::ScalingPolicy
properties:
adjustment_type: change_in_capacity
auto_scaling_group_id: { get_resource: scale_group }
cooldown: 60
scaling_adjustment: +1
cpu_alarm_high:
type: OS::Ceilometer::Alarm
properties:
meter_name: cpu_util
statistic: avg
period: 60
evaluation_periods: 1
threshold: 20
alarm_actions:
– {get_attr: [scaleup_policy, alarm_url]}
comparison_operator: gt

 

Shedding some information on sample.yaml file, initially I am spawning only one instance and scaling this up-to maximum of 3 instances. Threshold of ceilometer set to 20.

Step 3 : Modify the ceilometer sampling interval for cpu_util in “/etc/ceilometer/pipeline.yaml” file. Changed this value from default of 10mins to 1 min.

– name: cpu_source
interval: 60
meters:
– “cpu”
sinks:
– cpu_sink

Restart all openstack services after making this change.

Step 4 : Let’s create a stack now.

[root@allinone7 VIKRANT(keystone_admin)]# heat stack-create teststack1 -f sample.yaml -e environment.yaml
+————————————–+————+——————–+———————-+
| id                                   | stack_name | stack_status       | creation_time        |
+————————————–+————+——————–+———————-+
| 0f163366-c599-4fd5-a797-86cf40f05150 | teststack1 | CREATE_IN_PROGRESS | 2016-10-10T12:02:37Z |
+————————————–+————+——————–+———————-+

Instance spawned successfully and alarm is created once the heat stack creation is completed.

[root@allinone7 VIKRANT(keystone_admin)]# nova list
+————————————–+——————————————————-+——–+————+————-+———————–+
| ID                                   | Name                                                  | Status | Task State | Power State | Networks              |
+————————————–+——————————————————-+——–+————+————-+———————–+
| 845abae0-9834-443b-82ec-d55bce2243ab | te-yvfr-ws5tn26msbub-zpeebwwwa67w-server-pxu6pqcssmmb | ACTIVE | –          | Running     | internal1=10.10.10.53 |
+————————————–+——————————————————-+——–+————+————-+———————–+

[root@allinone7 VIKRANT(keystone_admin)]# ceilometer alarm-list
+————————————–+—————————————-+——————-+———-+———+————+——————————–+——————+
| Alarm ID                             | Name                                   | State             | Severity | Enabled | Continuous | Alarm condition                | Time constraints |
+————————————–+—————————————-+——————-+———-+———+————+——————————–+——————+
| 7746e457-9114-4cc6-8408-16b14322e937 | teststack1-cpu_alarm_high-sctookginoqz | insufficient data | low      | True    | True       | cpu_util > 20.0 during 1 x 60s | None             |
+————————————–+—————————————-+——————-+———-+———+————+——————————–+——————+

Checking the events in heat-engine.log file.

~~~
2016-10-10 12:02:37.499 22212 INFO heat.engine.stack [-] Stack CREATE IN_PROGRESS (teststack1): Stack CREATE started
2016-10-10 12:02:37.510 22212 INFO heat.engine.resource [-] creating AutoScalingResourceGroup “scale_group” Stack “teststack1” [0f163366-c599-4fd5-a797-86cf40f05150]
2016-10-10 12:02:37.558 22215 INFO heat.engine.service [req-681ddfb8-3ca6-4ecb-a8af-f35ceb358138 f6a950be30fd41488cf85b907dfa41b5 41294ddb9af747c8b46dc258c3fa61e1] Creating stack teststack1-scale_group-ujt3ixg3yvfr
2016-10-10 12:02:37.572 22215 INFO heat.engine.resource [req-681ddfb8-3ca6-4ecb-a8af-f35ceb358138 f6a950be30fd41488cf85b907dfa41b5 41294ddb9af747c8b46dc258c3fa61e1] Validating TemplateResource “ws5tn26msbub”
2016-10-10 12:02:37.585 22215 INFO heat.engine.resource [req-681ddfb8-3ca6-4ecb-a8af-f35ceb358138 f6a950be30fd41488cf85b907dfa41b5 41294ddb9af747c8b46dc258c3fa61e1] Validating Server “server”
2016-10-10 12:02:37.639 22215 INFO heat.engine.stack [-] Stack CREATE IN_PROGRESS (teststack1-scale_group-ujt3ixg3yvfr): Stack CREATE started
2016-10-10 12:02:37.650 22215 INFO heat.engine.resource [-] creating TemplateResource “ws5tn26msbub” Stack “teststack1-scale_group-ujt3ixg3yvfr” [0c311ad5-cb76-4956-b038-ab2e44721cf1]
2016-10-10 12:02:37.699 22214 INFO heat.engine.service [req-681ddfb8-3ca6-4ecb-a8af-f35ceb358138 f6a950be30fd41488cf85b907dfa41b5 41294ddb9af747c8b46dc258c3fa61e1] Creating stack teststack1-scale_group-ujt3ixg3yvfr-ws5tn26msbub-zpeebwwwa67w
2016-10-10 12:02:37.712 22214 INFO heat.engine.resource [req-681ddfb8-3ca6-4ecb-a8af-f35ceb358138 f6a950be30fd41488cf85b907dfa41b5 41294ddb9af747c8b46dc258c3fa61e1] Validating Server “server”
2016-10-10 12:02:38.004 22214 INFO heat.engine.stack [-] Stack CREATE IN_PROGRESS (teststack1-scale_group-ujt3ixg3yvfr-ws5tn26msbub-zpeebwwwa67w): Stack CREATE started
2016-10-10 12:02:38.022 22214 INFO heat.engine.resource [-] creating Server “server” Stack “teststack1-scale_group-ujt3ixg3yvfr-ws5tn26msbub-zpeebwwwa67w” [11dbdc5d-dc67-489b-9738-7ee6984c286e]
2016-10-10 12:02:42.965 22213 INFO heat.engine.service [req-e6410d2a-6f85-404d-a675-897c8a254241 – -] Service 13d36b70-a2f6-4fec-8d2a-c904a2f9c461 is updated
2016-10-10 12:02:42.969 22214 INFO heat.engine.service [req-531481c7-5fd1-4c25-837c-172b2b7c9423 – -] Service 71fb5520-7064-4cee-9123-74f6d7b86955 is updated
2016-10-10 12:02:42.970 22215 INFO heat.engine.service [req-6fe46418-bf3d-4555-a77c-8c800a414ba8 – -] Service f0706340-54f8-42f1-a647-c77513aef3a5 is updated
2016-10-10 12:02:42.971 22212 INFO heat.engine.service [req-82f5974f-0e77-4b60-ac5e-f3c849812fe1 – -] Service 083acd77-cb7f-45fc-80f0-9d41eaf2a37d is updated
2016-10-10 12:02:53.228 22214 INFO heat.engine.stack [-] Stack CREATE COMPLETE (teststack1-scale_group-ujt3ixg3yvfr-ws5tn26msbub-zpeebwwwa67w): Stack CREATE completed successfully
2016-10-10 12:02:53.549 22215 INFO heat.engine.stack [-] Stack CREATE COMPLETE (teststack1-scale_group-ujt3ixg3yvfr): Stack CREATE completed successfully
2016-10-10 12:02:53.960 22212 INFO heat.engine.resource [-] creating AutoScalingPolicy “scaleup_policy” Stack “teststack1” [0f163366-c599-4fd5-a797-86cf40f05150]
2016-10-10 12:02:55.152 22212 INFO heat.engine.resource [-] creating CeilometerAlarm “cpu_alarm_high” Stack “teststack1” [0f163366-c599-4fd5-a797-86cf40f05150]
2016-10-10 12:02:56.379 22212 INFO heat.engine.stack [-] Stack CREATE COMPLETE (teststack1): Stack CREATE completed successfully
~~~

Step 5 : Once the alarm is triggered, it will initiate the creation of one more instance.

[root@allinone7 VIKRANT(keystone_admin)]# ceilometer alarm-history 7746e457-9114-4cc6-8408-16b14322e937
+——————+—————————-+———————————————————————-+
| Type             | Timestamp                  | Detail                                                               |
+——————+—————————-+———————————————————————-+
| state transition | 2016-10-10T12:04:48.492000 | state: alarm                                                         |
| creation         | 2016-10-10T12:02:55.247000 | name: teststack1-cpu_alarm_high-sctookginoqz                         |
|                  |                            | description: Alarm when cpu_util is gt a avg of 20.0 over 60 seconds |
|                  |                            | type: threshold                                                      |
|                  |                            | rule: cpu_util > 20.0 during 1 x 60s                                 |
|                  |                            | time_constraints: None                                               |
+——————+—————————-+———————————————————————-+

Log from ceilometer log file.

~~~
From : /var/log/ceilometer/alarm-evaluator.log

2016-10-10 12:04:48.488 16550 INFO ceilometer.alarm.evaluator [-] alarm 7746e457-9114-4cc6-8408-16b14322e937 transitioning to alarm because Transition to alarm due to 1 samples outside threshold, most recent: 97.05
~~~

Step 6 : In the heat-engine.log file, we can see that triggered alarm has started the scaleup_policy and stack came in “UPDATE IN_PROGRESS” state. We are seeing two events because 2 instances are getting spawned, remember we set the max number of instances to 3, first instance got deployed during stack creation and remaining 2 instances are triggered at alarm. Actually at first alarm, first instance got triggered, as utilization stayed more than threshold for next min hence 3rd instance got triggered.

~~~

2016-10-10 12:04:48.641 22213 INFO heat.engine.resources.openstack.heat.scaling_policy [-] Alarm scaleup_policy, new state alarm
2016-10-10 12:04:48.680 22213 INFO heat.engine.resources.openstack.heat.scaling_policy [-] scaleup_policy Alarm, adjusting Group scale_group with id teststack1-scale_group-ujt3ixg3yvfr by 1
2016-10-10 12:04:48.802 22215 INFO heat.engine.stack [-] Stack UPDATE IN_PROGRESS (teststack1-scale_group-ujt3ixg3yvfr): Stack UPDATE started
2016-10-10 12:04:48.858 22215 INFO heat.engine.resource [-] updating TemplateResource “ws5tn26msbub” [11dbdc5d-dc67-489b-9738-7ee6984c286e] Stack “teststack1-scale_group-ujt3ixg3yvfr” [0c311ad5-cb76-4956-b038-ab2e44721cf1]
2016-10-10 12:04:48.919 22214 INFO heat.engine.service [req-ddf93f69-5fdc-4218-a427-aae312f4a02d – 41294ddb9af747c8b46dc258c3fa61e1] Updating stack teststack1-scale_group-ujt3ixg3yvfr-ws5tn26msbub-zpeebwwwa67w
2016-10-10 12:04:48.922 22214 INFO heat.engine.resource [req-ddf93f69-5fdc-4218-a427-aae312f4a02d – 41294ddb9af747c8b46dc258c3fa61e1] Validating Server “server”
2016-10-10 12:04:49.317 22214 INFO heat.engine.stack [-] Stack UPDATE IN_PROGRESS (teststack1-scale_group-ujt3ixg3yvfr-ws5tn26msbub-zpeebwwwa67w): Stack UPDATE started
2016-10-10 12:04:49.346 22215 INFO heat.engine.resource [-] creating TemplateResource “mmm6uxmlf3om” Stack “teststack1-scale_group-ujt3ixg3yvfr” [0c311ad5-cb76-4956-b038-ab2e44721cf1]
2016-10-10 12:04:49.366 22214 INFO heat.engine.update [-] Resource server for stack teststack1-scale_group-ujt3ixg3yvfr-ws5tn26msbub-zpeebwwwa67w updated
2016-10-10 12:04:49.405 22212 INFO heat.engine.service [req-ddf93f69-5fdc-4218-a427-aae312f4a02d – 41294ddb9af747c8b46dc258c3fa61e1] Creating stack teststack1-scale_group-ujt3ixg3yvfr-mmm6uxmlf3om-m5idcplscfcx
2016-10-10 12:04:49.419 22212 INFO heat.engine.resource [req-ddf93f69-5fdc-4218-a427-aae312f4a02d – 41294ddb9af747c8b46dc258c3fa61e1] Validating Server “server”
2016-10-10 12:04:49.879 22212 INFO heat.engine.stack [-] Stack CREATE IN_PROGRESS (teststack1-scale_group-ujt3ixg3yvfr-mmm6uxmlf3om-m5idcplscfcx): Stack CREATE started
2016-10-10 12:04:49.889 22212 INFO heat.engine.resource [-] creating Server “server” Stack “teststack1-scale_group-ujt3ixg3yvfr-mmm6uxmlf3om-m5idcplscfcx” [36c613d1-b89f-4409-b965-521b1ae2cbf3]
2016-10-10 12:04:50.406 22214 INFO heat.engine.stack [-] Stack DELETE IN_PROGRESS (teststack1-scale_group-ujt3ixg3yvfr-ws5tn26msbub-zpeebwwwa67w): Stack DELETE started
2016-10-10 12:04:50.443 22214 INFO heat.engine.stack [-] Stack DELETE COMPLETE (teststack1-scale_group-ujt3ixg3yvfr-ws5tn26msbub-zpeebwwwa67w): Stack DELETE completed successfully
2016-10-10 12:04:50.930 22215 INFO heat.engine.update [-] Resource ws5tn26msbub for stack teststack1-scale_group-ujt3ixg3yvfr updated
2016-10-10 12:05:07.865 22212 INFO heat.engine.stack [-] Stack CREATE COMPLETE (teststack1-scale_group-ujt3ixg3yvfr-mmm6uxmlf3om-m5idcplscfcx): Stack CREATE completed successfully

~~~

Step 7 : We can see the event-list of created stack for more understanding.

[root@allinone7 VIKRANT(keystone_admin)]# heat event-list 0f163366-c599-4fd5-a797-86cf40f05150
+—————-+————————————–+———————————————————————————————————————————-+——————–+———————-+
| resource_name  | id                                   | resource_status_reason                                                                                                           | resource_status    | event_time           |
+—————-+————————————–+———————————————————————————————————————————-+——————–+———————-+
| teststack1     | 6ddf5a0c-c345-43ad-8c20-54d67cf8e2a6 | Stack CREATE started                                                                                                             | CREATE_IN_PROGRESS | 2016-10-10T12:02:37Z |
| scale_group    | 528ed942-551d-482b-95ee-ab72a6f59280 | state changed                                                                                                                    | CREATE_IN_PROGRESS | 2016-10-10T12:02:37Z |
| scale_group    | 9d7cf5f4-027f-4c97-92f2-86d208a4be77 | state changed                                                                                                                    | CREATE_COMPLETE    | 2016-10-10T12:02:53Z |
| scaleup_policy | a78e9577-1251-4221-a1c7-9da4636550b7 | state changed                                                                                                                    | CREATE_IN_PROGRESS | 2016-10-10T12:02:53Z |
| scaleup_policy | cb690cd5-5243-47f0-8f9f-2d88ca13780f | state changed                                                                                                                    | CREATE_COMPLETE    | 2016-10-10T12:02:55Z |
| cpu_alarm_high | 9addbccf-cc18-410a-b1f6-401b56b09065 | state changed                                                                                                                    | CREATE_IN_PROGRESS | 2016-10-10T12:02:55Z |
| cpu_alarm_high | ed9a5f49-d4ea-4f68-af9e-355d2e1b9113 | state changed                                                                                                                    | CREATE_COMPLETE    | 2016-10-10T12:02:56Z |
| teststack1     | 14be65fc-1b33-478e-9f81-413b694c8312 | Stack CREATE completed successfully                                                                                              | CREATE_COMPLETE    | 2016-10-10T12:02:56Z |
| scaleup_policy | e65de9b1-6854-4f27-8256-f5f9a13890df | alarm state changed from insufficient data to alarm (Transition to alarm due to 1 samples outside threshold, most recent: 97.05) | SIGNAL_COMPLETE    | 2016-10-10T12:05:09Z |
| scaleup_policy | a499bfef-1824-4ef3-8c7f-e86cf14e11d6 | alarm state changed from alarm to alarm (Remaining as alarm due to 1 samples outside threshold, most recent: 95.7083333333)      | SIGNAL_COMPLETE    | 2016-10-10T12:07:14Z |
| scaleup_policy | 2a801848-bf9f-41e0-acac-e526d60f5791 | alarm state changed from alarm to alarm (Remaining as alarm due to 1 samples outside threshold, most recent: 95.0833333333)      | SIGNAL_COMPLETE    | 2016-10-10T12:08:55Z |
| scaleup_policy | f57fda03-2017-4408-b4b9-f302a1fad430 | alarm state changed from alarm to alarm (Remaining as alarm due to 1 samples outside threshold, most recent: 95.1444444444)      | SIGNAL_COMPLETE    | 2016-10-10T12:10:55Z |
+—————-+————————————–+———————————————————————————————————————————-+——————–+———————-+

We can see three instances running.

[root@allinone7 VIKRANT(keystone_admin)]# nova list
+————————————–+——————————————————-+——–+————+————-+———————–+
| ID                                   | Name                                                  | Status | Task State | Power State | Networks              |
+————————————–+——————————————————-+——–+————+————-+———————–+
| 041345cc-4ebf-429c-ab2b-ef0f757bfeaa | te-yvfr-mmm6uxmlf3om-m5idcplscfcx-server-hxaqqmxzv4jp | ACTIVE | –          | Running     | internal1=10.10.10.54 |
| bebbd5a0-e0b2-40b4-8810-978b86626267 | te-yvfr-r7vn2e5c34b6-by4oq22vnxbo-server-ktblt3evhvd6 | ACTIVE | –          | Running     | internal1=10.10.10.55 |
| 845abae0-9834-443b-82ec-d55bce2243ab | te-yvfr-ws5tn26msbub-zpeebwwwa67w-server-pxu6pqcssmmb | ACTIVE | –          | Running     | internal1=10.10.10.53 |
+————————————–+——————————————————-+——–+————+————-+———————–+

 

[1] https://access.redhat.com/documentation/en/red-hat-enterprise-linux-openstack-platform/7/single/auto-scaling-for-compute/#example_auto_scaling_based_on_cpu_usage

Advertisements