Status: in pre-design phase. Date: Sep 26 2003; Last revised:
Network operators often need to monitor the service level in their networks. The service level is normally covered by a Service Level Agreement (SLA), which defines the following parameters:
Describes the particular service in terms of functionality and means of monitoring. Examples are: IP VPN connectivity, WAN uplink, SQL database engine.
Describes the periodic time intervals when a service outage is possible due to maintenance work. It may be unconditional (an outage is always possible within the window) or conditional (customer confirmation is required for an outage within the window). A notification period is normally defined for maintenance outages. Example: every 1st Tuesday of the month between 6 AM and 8 AM, with 96 hours notification time.
Outages may be caused by: 1) system failure; 2) failure of the service provider's infrastructure; 3) customer activity.
These are the guarantees that the service provider gives to the customer. Violations of these guarantees are compensated by the penalties defined in the agreement.
These may include: maximum maintenance downtime per specified period; maximum downtime due to failures on the service provider side; minimum service availability per specified period.
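The availability guarantee can be illustrated with a small worked calculation. This is only a sketch: the 30-day period and the 99.9% threshold are hypothetical values chosen for the example, and the function names are not part of RRFW.

```python
def availability(period_seconds: int, downtime_seconds: int) -> float:
    """Fraction of the period during which the service was available."""
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    if not 0 <= downtime_seconds <= period_seconds:
        raise ValueError("downtime must lie within the period")
    return 1.0 - downtime_seconds / period_seconds


def max_allowed_downtime(period_seconds: int, min_availability: float) -> float:
    """Downtime budget (seconds) implied by a minimum availability guarantee."""
    return period_seconds * (1.0 - min_availability)


# Hypothetical SLA values, assumed for illustration only.
PERIOD = 30 * 24 * 3600        # a 30-day period, in seconds
MIN_AVAILABILITY = 0.999       # "three nines"

budget = max_allowed_downtime(PERIOD, MIN_AVAILABILITY)
one_hour_outage_ok = availability(PERIOD, 3600) >= MIN_AVAILABILITY
half_hour_outage_ok = availability(PERIOD, 1800) >= MIN_AVAILABILITY
```

Under these assumed numbers the downtime budget is roughly 43 minutes per period, so a one-hour outage violates the guarantee while a half-hour outage does not.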
In order to store the service level information, we need a new datasource type in RRFW: event. It represents atomic information about a single event in time, i.e. it cannot be divided into more specific elements or sub-events. Its attributes are as follows:
Each event belongs to one and only one group, and a group may contain several events. An event group is a unique entity that describes the service.
Unique name within the event group. Describes the type of the event, such as maintenance or downtime. Events with the same name cannot overlap in time.
Timestamp of the event start.
Non-negative integer that specifies the length of the event in seconds. Zero duration means that the event has not yet finished.
Event-specific (name, value) pairs.
Events are uniquely identified by the (Event group, Event name, Start time) triple.
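The attributes and constraints above could be modelled roughly as follows. This is a minimal sketch and not the RRFW implementation: the class names, field names, and the in-memory store are assumptions made for illustration, and an unfinished event (zero duration) is treated as extending indefinitely when checking for overlaps.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Event:
    group: str                 # event group (describes the service)
    name: str                  # event type, e.g. "maintenance" or "downtime"
    start: int                 # timestamp of the event start, in seconds
    duration: int = 0          # length in seconds; 0 means not yet finished
    params: Dict[str, str] = field(default_factory=dict)

    def key(self) -> Tuple[str, str, int]:
        """The identifying (group, name, start) triple."""
        return (self.group, self.name, self.start)

    def end(self) -> float:
        """End time; an unfinished event is assumed to be open-ended."""
        return math.inf if self.duration == 0 else self.start + self.duration


class EventStore:
    """Hypothetical in-memory store that enforces the two constraints."""

    def __init__(self) -> None:
        self._events: Dict[Tuple[str, str, int], Event] = {}

    def add(self, event: Event) -> None:
        if event.key() in self._events:
            raise ValueError("duplicate (group, name, start) triple")
        for other in self._events.values():
            same_series = (other.group, other.name) == (event.group, event.name)
            # Half-open intervals [start, end) overlap if each starts
            # before the other ends.
            if same_series and event.start < other.end() and other.start < event.end():
                raise ValueError("events with the same name cannot overlap")
        self._events[event.key()] = event

    def __len__(self) -> int:
        return len(self._events)
```

Events with different names in the same group may overlap freely, which is what allows a downtime event to fall inside a maintenance event.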
The renderer should be able to display events at different summary levels and in different combinations. Event reports should be specified by expressions, as follows:
downtime AND NOT maintenance
(downtime AND NOT maintenance)[-2DAYS,NOW]
(downtime[-2DAYS,NOW] AND NOT maintenance AND
NOT downtime[200309151200,200309151300])
Sum of durations, subtraction of durations...
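One way to evaluate such expressions is to treat every event name as a set of time intervals and apply set operations to them. The sketch below assumes half-open [start, end) intervals in seconds; the function names are hypothetical, and parsing of the expression syntax itself is not covered.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]     # half-open [start, end), in seconds


def normalize(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort intervals, drop empty ones, merge those that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if start >= end:
            continue
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def and_not(a: Iterable[Interval], b: Iterable[Interval]) -> List[Interval]:
    """'a AND NOT b': the parts of a that are not covered by b."""
    b_norm = normalize(b)
    result: List[Interval] = []
    for start, end in normalize(a):
        cursor = start
        for b_start, b_end in b_norm:
            if b_end <= cursor:
                continue           # b interval lies entirely before the cursor
            if b_start >= end:
                break              # b interval lies entirely after this one
            if b_start > cursor:
                result.append((cursor, b_start))
            cursor = max(cursor, b_end)
        if cursor < end:
            result.append((cursor, end))
    return result


def clip(a: Iterable[Interval], t1: int, t2: int) -> List[Interval]:
    """'a[t1,t2]': restrict a to the time window [t1, t2)."""
    clipped = [(max(s, t1), min(e, t2)) for s, e in normalize(a)]
    return [(s, e) for s, e in clipped if s < e]


def total_duration(a: Iterable[Interval]) -> int:
    """Sum of durations, counting overlapping time only once."""
    return sum(e - s for s, e in normalize(a))


# Made-up sample data for illustration.
downtime = [(100, 400), (1000, 1200)]
maintenance = [(300, 500)]

unplanned = and_not(downtime, maintenance)     # downtime AND NOT maintenance
windowed = clip(unplanned, 200, 1100)          # (...)[200,1100]
```

With this sample data, `unplanned` is `[(100, 300), (1000, 1200)]` with a total duration of 400 seconds, and `windowed` is `[(200, 300), (1000, 1100)]`.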
Events may be generated by the following sources:
The SNMP collector may create events on certain fault conditions, such as an unreachable host, or on changes in SNMP variables, such as interface status. It is also possible to create an ICMP Echo collector type, which would generate events based on pinging the hosts.
Obviously, a new monitor action will be to create events.
First from the command-line interface, and later from the Web interface, human operators may create scheduled events, such as maintenance outages. A security policy should protect certain types of events from human intervention.
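As an illustration of how a collector might turn periodic reachability checks into events, the following sketch converts a series of poll results into downtime events. It does not send any ICMP packets; the sample format and the function name are assumptions, and a zero duration marks an outage that is still in progress, as defined for the event attributes.

```python
from typing import Iterable, List, Tuple


def samples_to_events(samples: Iterable[Tuple[int, bool]]) -> List[Tuple[int, int]]:
    """Convert (timestamp, reachable) samples, ordered by time, into
    (start, duration) downtime events. Duration 0 means still down."""
    events: List[Tuple[int, int]] = []
    down_since = None
    for timestamp, reachable in samples:
        if not reachable and down_since is None:
            down_since = timestamp                 # outage begins
        elif reachable and down_since is not None:
            events.append((down_since, timestamp - down_since))
            down_since = None                      # outage ends
    if down_since is not None:
        events.append((down_since, 0))             # outage not yet finished
    return events


# Made-up poll results at 60-second intervals.
polls = [(0, True), (60, False), (120, False), (180, True), (240, False)]
downtime_events = samples_to_events(polls)
```

Here `downtime_events` is `[(60, 120), (240, 0)]`: one finished outage of 120 seconds and one that is still open. The measured duration is only as precise as the polling interval.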
Copyright (c) 2003 Stanislav Sinyagin <ssinyagin@k-open.com>