EV Observe - Configure Services

Last modified on 2023/11/28 14:19

Services are controls run during host monitoring to check that the hosts are working correctly.

Each service:

  • Is associated with a company/site.
  • Is associated with a service template from which it inherits a set of properties, e.g. availability and control information required for monitoring the corresponding host.
  • Can be associated with a host.
  • Can trigger the sending of notifications when there is a change in status. If the status is not acknowledged by the operations team, this will trigger an escalation to successively higher levels.

Examples

  • The SQL - Connection failed service is associated with several servers. It is associated with the F_MS-Azure-PaaS-Metrics service template from which it inherits controls for monitoring Azure PaaS metrics.
  • Notification policy in the event of an incident in the SQL - Connection failed service:
    • Notification timeslots defined:
      • From 5 am to 12 noon for team A
      • From 12 noon to 8 pm for team B
    • An incident occurs on one of the servers associated with the service:
      • Warning at 10 am: Notification sent to team A.
      • Warning at 1 pm: Notification sent to team B.
      • Warning between 8 pm and 5 am: No team is informed. If the incident is still present at 5 am, a notification will be sent to team A.
  • Escalation policy in the event of an incident in the SQL - Connection failed service:
    • 24/7 notifications sent to the Level 1 On-call contact group.
    • If an incident occurs on one of the servers associated with the service, notifications will be sent to the On-call contact group every three minutes.
    • Once the warning is acknowledged by a member of the On-call team, notifications will stop and the escalation process will be interrupted.
    • If the warning is still not acknowledged after 15 minutes, this will trigger an escalation to the Level 2 On-call managers contact group who will receive notifications.
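The notification timeslots in the example above can be sketched as follows. This is a minimal illustration only: the `TIMESLOTS` table and the `team_to_notify` helper are hypothetical names, and the sketch assumes that the end of each timeslot is exclusive. It does not describe how EV Observe evaluates notification periods internally.

```python
from datetime import time

# Hypothetical timeslots taken from the example: (start, end, team).
# Assumption: the end of each slot is exclusive.
TIMESLOTS = [
    (time(5, 0), time(12, 0), "team A"),
    (time(12, 0), time(20, 0), "team B"),
]


def team_to_notify(at):
    """Return the team whose timeslot covers `at`, or None outside all slots."""
    for start, end, team in TIMESLOTS:
        if start <= at < end:
            return team
    return None


print(team_to_notify(time(10, 0)))  # team A
print(team_to_notify(time(13, 0)))  # team B
print(team_to_notify(time(22, 0)))  # None
```

A warning raised at 10 pm returns `None`, which matches the case where no team is informed until the next timeslot starts at 5 am.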

Notes

  • You can associate services with hosts directly in Host forms by selecting a host template.
  • You must first configure the notification policy for the service before you can configure the escalation process.
  • Thresholds defined for detecting instability must comply with Nagios syntax.

Best Practice

  • Use the modification wizards or run an import to apply changes to an entire group of services. See the procedure.

Example: Define a single notification policy for all services whose business impact is High using the Modify the notification policy wizard.

Menu access

Configuration > Services > List

Note: To access the service Detail forms, select Monitoring > Monitoring.

Screens description


General information

Service template: Service template associated with the service.

  • The values defined in the selected service template will automatically be inherited in the Availability and checks tab.

Name: Name of the monitored service.

  • By default, the service name is identical to the service template name.

   If you modify the service template, the service name will automatically be modified. You must enter it again.

Business impact: Impact of the service within the corporate information system in the event of failure.

Instruction: User-defined text or automatically clickable link displayed when the status is not OK. This enables the operations team to process the incident faster and more efficiently. See Instruction URLs.

Document: Similarly to the instruction, this is used to enter additional information to help speed up processing.

Additional information: User-defined text.

Description: Role of the service within the corporate information system.

Availability and checks

Information on availability rate:

  • Availability rate: Target availability rate for the service.
  • Availability period: Timeslot during which the availability rate is calculated. This usually corresponds to the SLA availability target.
     

Check properties:

  • Check Timeslot: Timeslot during which service monitoring is performed and controls are run.
    • The period must be greater than or equal to the entire timeslot defined for calculating the availability rate.
  • Normal check interval: Interval between the running of two controls (in minutes).
  • Additional checks: Number of times the control is repeatedly run, if its initial status is not OK, before sending the first notification.
    • If additional controls are defined, the status to be confirmed (SOFT) corresponds to the initial control status and the confirmed status (HARD) corresponds to the status returned after the last additional control is run.
    • The sending of notifications and the calculation of the availability rate are based on the confirmed status.
    • Interval: Interval between the running of two additional controls (in minutes).
    • Time before first notification: Time automatically calculated based on the number of additional controls and the interval between the running of two additional controls.

Example: Additional controls = 4; Interval = 5

  • If an incident is detected during the initial control, the monitoring control will be run every five minutes, up to a maximum of four times, as long as the status to be confirmed is not OK.
  • The time before the first notification is sent or before the first confirmed status will be equal to 20 minutes (4 * 5).
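The arithmetic in this example can be sketched as follows. The function names are hypothetical, and the sketch assumes that every additional control returns a status other than OK, so that all of them are run.

```python
def additional_check_offsets(additional_checks, interval_minutes):
    """Minutes after the initial control at which each additional control runs."""
    return [n * interval_minutes for n in range(1, additional_checks + 1)]


def time_before_first_notification(additional_checks, interval_minutes):
    """Minutes between the initial non-OK control and the confirmed status."""
    return additional_checks * interval_minutes


print(additional_check_offsets(4, 5))        # [5, 10, 15, 20]
print(time_before_first_notification(4, 5))  # 20
```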

  • Locked monitoring account:
     

       This field will appear only if monitoring account information is required for running the service.

    • By default, the service will use the monitoring account information inherited from the host, or alternatively, from the parent site of the host, from a higher-level site or from the company.
    • If a configuration specific to the service is required, the account must be locked.
       

      Example: SNMP authentication credentials for the service different from those inherited from the host

    • Select Yes to lock the monitoring account for the service. Next, enter the configuration information specific to the type of account.  The values defined will apply to all current and future hosts associated with the service.
    • To restore the inherited values for the monitoring account, select No to unlock the monitoring account.  The level from which monitoring account information is inherited will appear next to the field.
       

      Example: My Company inheritance account
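The inheritance order described above can be sketched as a simple lookup. This is an illustration under stated assumptions: the `resolve_monitoring_account` function, the level names and the account dictionaries are all hypothetical and do not correspond to an EV Observe API.

```python
def resolve_monitoring_account(locked_account, inherited_levels):
    """Return (source, account) for the account a service would use.

    locked_account: account entered on the service when it is locked,
        or None when the monitoring account is unlocked.
    inherited_levels: ordered list of (level_name, account_or_None),
        from the host up to the company.
    """
    if locked_account is not None:
        return ("service (locked)", locked_account)
    for level_name, account in inherited_levels:
        if account is not None:
            return (level_name, account)
    # No account found by inheritance or in locked mode.
    return (None, None)


# Hypothetical inheritance chain: only the company defines an account.
levels = [
    ("host", None),
    ("parent site", None),
    ("My Company", {"snmp_community": "example"}),
]

print(resolve_monitoring_account(None, levels)[0])                        # My Company
print(resolve_monitoring_account({"snmp_community": "svc"}, levels)[0])   # service (locked)
```

A result of `(None, None)` corresponds to the case where the required account cannot be found and must be specified before the configuration can be applied.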

Actions

Action template associated with the service, used to perform an action when there is a change in the service status.

Example: Restart the Windows Update service automatically on the COPCGRE61 server if the service stops running.

  • The parameters to be specified depend on the selected action template.
  • Locked monitoring account:
     

       This field will appear only if monitoring account information is required for running the action template.

    • By default, the action template will use the monitoring account information inherited from the host, or alternatively, from the parent site of the host, from a higher-level site or from the company.
    • If a configuration specific to the action template run by the service is required, the account must be locked. To do this, select Yes to lock the monitoring account and enter the configuration information specific to the type of account.  The values defined will apply to all current and future hosts associated with the service.
       

      Example: SNMP authentication credentials for the service different from those inherited from the company

    • To restore the inherited values for the monitoring account, select No to unlock the monitoring account.

Notifications

Notification policy defined for the service, indicating trigger events and timeslots as well as notification recipients.


Enable notifications: Used to define a notification policy for the service. If you select Yes, you must specify the contextual fields that will appear. You can disable notifications by selecting No.

Fields for defining a notification policy for the service

Notification period: Timeslot during which events occurring in the monitored service will trigger notifications.

  • Events outside this period will not trigger any notification. If the incident is still present when the next notification period is applicable, then a notification will be triggered.

For these events: Type of event that will trigger a notification.

  • Warning: Notification sent when the service is operational but requires close monitoring in order to anticipate and prevent a status change to Critical.
  • Unknown: Notification sent when the service status is unknown to monitoring.
  • Critical: Notification sent when the service is non-operational.
  • Up: Notification sent when the service is operating normally again.
  • Unstable: Notification sent when the service is considered to be unstable based on the high and low flapping thresholds defined for detecting instability.
    • The service instability rate is calculated using the last 21 reports stored. It is recalculated each time a monitoring control is run. Older values are weighted less heavily than more recent ones.
    • The service is considered to be unstable when the instability rate exceeds the high flapping threshold.
    • It will once again be considered stable when the instability rate drops below the low flapping threshold.

   Thresholds defined for detecting instability must comply with Nagios syntax.

   When the state of the service is unstable, notifications will be disabled to restrict the number of warnings triggered. They will remain disabled until the state of the service is once again stable.

Best Practice: You can view the instability rate in real time in the General information tab of the service Detail form (menu Monitoring > Monitoring).
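The instability calculation described above can be sketched as a weighted percentage of status changes across the stored reports. This is a sketch under assumptions, not the product's exact formula: the linear weights from 0.75 (oldest change) to 1.25 (newest change), the function names and the thresholds used in the usage lines are all chosen for illustration.

```python
def instability_rate(states, low_weight=0.75, high_weight=1.25):
    """Weighted percentage of status changes across the stored reports.

    `states` is ordered from oldest to newest; 21 reports give 20 possible
    changes. Assumption: weights grow linearly from `low_weight` for the
    oldest change to `high_weight` for the newest one.
    """
    transitions = len(states) - 1
    if transitions < 1:
        return 0.0
    weighted_changes = 0.0
    for i in range(1, len(states)):
        if states[i] != states[i - 1]:
            # 0.0 for the oldest possible change, 1.0 for the newest.
            position = (i - 1) / (transitions - 1) if transitions > 1 else 0.5
            weighted_changes += low_weight + position * (high_weight - low_weight)
    return 100.0 * weighted_changes / transitions


def is_unstable(rate, currently_unstable, low_threshold, high_threshold):
    """Apply the high and low flapping thresholds with hysteresis."""
    if rate > high_threshold:
        return True
    if rate < low_threshold:
        return False
    # Between the two thresholds, the previous state is kept.
    return currently_unstable


recent_change = ["OK"] * 20 + ["CRITICAL"]
old_change = ["CRITICAL"] + ["OK"] * 20
print(instability_rate(recent_change))  # 6.25
print(instability_rate(old_change))     # 3.75
print(is_unstable(25.0, True, low_threshold=20.0, high_threshold=30.0))  # True
```

The same single status change contributes more to the rate when it is recent than when it is old. The two thresholds give hysteresis: a service that became unstable stays unstable until the rate drops below the low threshold.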

Level 1 contact(s) and contact group(s): List of Level 1 contacts and groups to whom notifications should be sent during the notification timeslots specified.

  • Only active contacts and contact groups will appear.

Escalations

   You must first configure the notification policy for the service before you can configure the escalation process.

      See the escalation policy example in the Examples section above.

Level 1 escalation: Used to indicate that the notification must be repeated when the status is not acknowledged by the Level 1 operations team, after the number of controls defined is reached.

  • Level 1 contacts are defined in the Notifications tab.

Level 2 escalation / Level 3 escalation: Used to send notifications to the contact or contact groups specified when the status is not acknowledged by the lower-level operations team, after the number of notifications defined is reached. The notification will be repeated if the status is not acknowledged, after the number of controls defined is reached.
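The escalation example from the Examples section (a notification every three minutes, escalation to Level 2 after 15 minutes without acknowledgement) can be sketched as a timeline. The `notification_timeline` function is hypothetical. The sketch assumes that the first notification is sent at minute 0 and that Level 2 notifications repeat at the same interval; it expresses the escalation trigger in minutes, whereas the fields above are defined as a number of notifications or controls.

```python
def notification_timeline(repeat_minutes, escalate_after_minutes,
                          acknowledged_at, horizon_minutes):
    """List (minute, level) notifications until acknowledgement or the horizon.

    acknowledged_at: minute at which the warning is acknowledged, or None.
    Assumption: repeat_minutes is greater than zero.
    """
    timeline = []
    minute = 0
    while minute <= horizon_minutes:
        if acknowledged_at is not None and minute >= acknowledged_at:
            break  # Acknowledgement stops notifications and the escalation.
        level = "Level 2" if minute >= escalate_after_minutes else "Level 1"
        timeline.append((minute, level))
        minute += repeat_minutes
    return timeline


# Never acknowledged: escalation to Level 2 from minute 15.
print(notification_timeline(3, 15, None, 18))
# Acknowledged at minute 7: only Level 1 is notified, at minutes 0, 3 and 6.
print(notification_timeline(3, 15, 7, 18))
```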

Relations

List of hosts associated with the service.

  • You can search for hosts by name, company, site, category, tag or business impact.

   If you select multiple hosts, this will duplicate the service for each host. A service is always associated with a single host.

Procedures

How to create a service

Step 1: Select the company where you want to implement the new service


1. Go to the Web app.

2. Select the company from the company tree structure.

Notes:

  • The selected company must be associated with a Box.
  • You can create a new company. See the procedure.

Step 2: Create the new service 

1. Select Configuration > Services > List in the menu.

2. Select the Mode: Box tab or the Mode: Agent tab depending on whether monitoring is performed via a Box or an agent.

3. Click Add.

4. Select each tab and specify the information on the new service.

5. Click Apply.

The service will be created. It will be visible to the company and its lower-level sites.

   If the new service requires monitoring account information, you must check that the account is correctly configured for the associated hosts. You can do this in the Accounts tab in the Host form.

Step 3: Set up monitoring for the new service

1. Generate the Box configuration to ensure that the new service is taken into account.

  • Select Configuration > General > Loading in the menu.
    All of the Boxes you are authorized to access as administrator and whose configuration is not up-to-date will appear.
  • Click Apply.
    • The Box configuration will be updated.
    • The monitoring of the new service will start on the associated Box.
    • Notifications will be sent as defined in the notification policy.
    • If the monitoring account information required by a service cannot be found, by inheritance or in locked mode, the configuration cannot be applied. A message will appear, indicating the accounts to be specified.

2. Check that monitoring data for the service is correctly reported in the Box in the Monitoring > Monitoring menu.

How to apply changes to multiple services at the same time

Best Practice: You can also run an import. See the procedure.

1. In the company tree, select the parent company of the services you want to modify.

2. Select Configuration > Services > List and select the Mode: Box tab or the Mode: Agent tab depending on whether monitoring is performed via a Box or an agent.

3. Select the services to be modified.

4. Click More in the toolbar and select the wizard you want.


5. Specify the information specific to the wizard.

6. Click Apply.

The modifications will be applied to all of the selected services.

7. Generate the Box configuration to ensure that the modifications are taken into account for each service. You can do this in the Configuration > General > Loading menu.
