|
|
Aim of the project

Hedeby is a Service Domain Management system that makes it possible to manage scalable services. The project is developed by the Sun Grid Engine Management team. Like the Sun Grid Engine project, Hedeby has been open sourced under the SISSL license. The Service Domain Manager is designed to handle very different kinds of services. Its main purpose is to resolve resource shortages of such services. Hedeby is interesting for all administrators managing large services with an administration interface. The Service Domain Manager will be able to detect scalability problems and resolve them. For the first release we (the Hedeby team) will concentrate on using Hedeby to manage the Sun Grid Engine service. In the future it should be able to support other services.

What is a resource?

In the Hedeby project a resource can be nearly anything. It can be hardware (e.g. a host or a printer) or software (a specific application or licenses). In general, a resource is something a service uses to provide its service. If you give a service more resources, it can do more work in the same time. To be fully usable by Hedeby, each resource should have a system-wide unique id, the resource id. This id must identify the resource. For a host resource this can be the fully qualified hostname.

A resource in Hedeby is seen as a single entity. Resources should not be shared between services. For example, if Hedeby should manage a license that allows ten concurrent users of a piece of software, the administrator has to add ten resources to the Hedeby system. Each license resource must have a unique id, as sharing a license may violate the license agreement and cause a service malfunction. Sharing a resource between services can also degrade service performance, so ideally each resource should be assigned to a single service exclusively. See Section 1.1.1.2, “Ambiguous Resources” for details about non-unique resource ids.
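The "one resource per license seat" rule can be made concrete with a small sketch. It is not Hedeby code: the `Resource` record, the id scheme `<product>-license-<n>` and the helper names are assumptions made for illustration. The sketch only shows a ten-user license modelled as ten resources with distinct ids, and how repeated ids could be spotted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource record: a system-wide unique id plus a type."""
    resource_id: str
    resource_type: str


def expand_license(product: str, seats: int) -> list[Resource]:
    """Model an N-seat license as N single-entity resources with unique ids."""
    return [Resource(f"{product}-license-{n}", "license") for n in range(1, seats + 1)]


def find_duplicate_ids(resources: list[Resource]) -> set[str]:
    """Return every resource id that occurs more than once."""
    counts = Counter(r.resource_id for r in resources)
    return {rid for rid, count in counts.items() if count > 1}


# One host (identified by its fully qualified hostname) and a ten-seat license.
resources = [Resource("node01.example.com", "host")] + expand_license("cad", 10)
print(len(resources))
print(find_duplicate_ids(resources))
```

Each seat is a separate entry, so each one can be assigned to exactly one service at a time.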
A Hedeby system stores a set of properties for each registered resource. These resource properties describe the resource. Examples are the number of CPUs, memory and architecture of a host, the version number of a software package, or the number of licenses. Each resource property has a name and a value.

[Figure: Resources in a Hedeby system]

The Hedeby system distinguishes between static and dynamic resources. A static resource cannot be removed from a service. The Hedeby system will never touch a static resource. When a service is removed, all static resources assigned to it disappear from the Hedeby system. Dynamic resources can be removed (unassigned) from a service and added (assigned) to another.

A service can use a number of resources even before it is managed by Hedeby, e.g. a web server cluster (service) running on four servers (resources). When a service becomes managed, it reports its resources to Hedeby (auto-discovery of resources). In this case the service may report a resource with a resource id that is already used in the Hedeby system. This may signal either that a resource is shared between two services or that there is just a name collision. Neither case is wanted, and we call such a resource AMBIGUOUS because it is not clear which service should use the resource exclusively. The case has to be solved manually by removing one instance of the ambiguous resource from the system. The presence of an ambiguous resource does not mean that the system is not functional at all. An ambiguous resource may be fully or partially functional (depending on the service), but to avoid possible problems the system puts several constraints on an ambiguous resource:
The only operations allowed on an ambiguous resource are:
The Hedeby system knows the following resource states
A service in Hedeby terms is a piece of software. It can be a database, an application server or any other software. The only constraint is that the software has to provide a service management interface. To make a service manageable, Hedeby needs a driver for the service. Such a driver is called a service adapter. The service adapter is packaged in a jar file. It has its own configuration, and in the current version it runs inside a service container.

[Figure: Services in a Hedeby system]

From the view of a Hedeby system, a service has the following states:
State changes of a service can be triggered by the Hedeby system or by the service itself. The service adapter has to be smart enough to detect external service state changes (e.g. a service adapter for Grid Engine has to catch the "qmaster goes down" event). What really happens when a Hedeby system starts or stops a service is an implementation detail of the service adapter (e.g. does the service adapter for Grid Engine shut down qmaster?). The Hedeby system knows only the states reported by the service adapter, which interprets the states of the real service.

The Hedeby system distinguishes between registered and unregistered services. Unregistered services are not included in the decision making process: no resources are assigned to or unassigned from them. A service also reports no needs while it is unregistered.

To make performance measurable, each service defines a set of key performance indicators (KPIs). The service adapter collects these numerical values by using the service management interface. Each KPI can have additional properties which identify the performance of a logical unit of the service.

Example 1.2. KPIs with properties
The service adapter is responsible for mapping the properties of the KPIs to resource properties. It has to provide mapping tables for this.

Hedeby allows the definition of rules which describe the current state of a service. A single rule is called a service level objective (SLO). At any specific time an SLO is either fulfilled or not. If an SLO is fulfilled, we say the service is in compliance with this SLO. With a set of SLOs the administrator of the Hedeby system implicitly defines a service level agreement (SLA) for a service. If all SLOs are fulfilled, the service works for a defined scenario. The SLA itself cannot be defined in the Hedeby system, only the set of SLOs.

Example 1.3. Typical SLA for a service

The following graphic shows a typical SLA definition for a service. The SLA is formulated with two SLOs: the number of pending requests to the service should always be less than 10, and the throughput time of a request should be less than 3s.

[Figure: SLOs of a service]

If some SLOs are not fulfilled, the service has a need for additional resources. The service has to describe what kind of resources are needed by specifying resource properties. The Hedeby system (Section 1.1.3, “Resource Provider”) will try to solve this lack of resources by assigning new resources to the service. A need contains information about the needed resource (type of resource, resource properties) and an urgency. The urgency is a non-negative number (0 and above), where a higher number specifies a more urgent need. The administrator of the Hedeby system has to define which need is generated if SLOs are not fulfilled.

Example 1.4. Service reports a need

The following graphic shows the SLA definition described in Example 1.3, “Typical SLA for a service”. If one of the SLOs is not fulfilled, the service reports the need that a new resource of type host is needed. The urgency of the need is 75 (relatively high).
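The relation between KPIs, SLOs and a reported need in Examples 1.3 and 1.4 can be sketched in a few lines. This is an illustration under assumptions, not the Hedeby API: the `Need` and `SLO` classes, the KPI names `pending_requests` and `throughput_seconds`, and the function `report_need` are hypothetical. The thresholds (10 pending requests, 3s) and the urgency of 75 come from the examples above.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Need:
    """Hypothetical need: a resource type, resource properties and an urgency."""
    resource_type: str
    urgency: int
    properties: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The text defines urgency as a non-negative number.
        if self.urgency < 0:
            raise ValueError("urgency must be non-negative")


@dataclass
class SLO:
    """Hypothetical SLO: a name and a predicate over the current KPI values."""
    name: str
    is_fulfilled: Callable[[dict], bool]


# The two SLOs of Example 1.3 (KPI names are assumptions).
SLOS = [
    SLO("pending requests", lambda kpis: kpis["pending_requests"] < 10),
    SLO("throughput time", lambda kpis: kpis["throughput_seconds"] < 3.0),
]

# The need of Example 1.4: one more host, urgency 75.
HOST_NEED = Need(resource_type="host", urgency=75)


def report_need(slos: list[SLO], kpis: dict, need: Need) -> Optional[Need]:
    """Return the configured need if at least one SLO is violated, else None."""
    violated = [slo.name for slo in slos if not slo.is_fulfilled(kpis)]
    return need if violated else None


print(report_need(SLOS, {"pending_requests": 4, "throughput_seconds": 1.2}, HOST_NEED))
print(report_need(SLOS, {"pending_requests": 14, "throughput_seconds": 1.2}, HOST_NEED))
```

The first call returns `None` because the service is in compliance with both SLOs. The second returns the host need because the pending-request SLO is violated.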
[Figure: A service reports a need]

The calculated urgency is absolute only for this service. Settings in the Policy Engine put the urgency in relation to other services (see Section 1.1.3.1, “Policy Engine”).

Hedeby provides a special service and component in each Hedeby system, named the Spare Pool. This spare_pool service collects all resources which are not heavily used by the service to which they are currently assigned. It does so by sending a constant request. More than one spare_pool component can be installed. The Spare Pool supports only one SLO. No matter how many resources are assigned to the Spare Pool, the SLO is never fulfilled. The urgency of the need generated by the Spare Pool is configurable by the administrator. It should be small enough that no resource is assigned to the Spare Pool while other services need it.

Example 1.5. Service gets resource from Spare Pool

In this example we have a Hedeby system with three services (including the Spare Pool). The Spare Pool currently contains six resources. Service #1 is in compliance with its SLOs. Service #2 has a need for an additional resource. The urgency of the Spare Pool is lower than the urgency of service #2. The service domain manager takes one resource out of the Spare Pool and assigns it to service #2.

[Figure: Role of the Spare Pool in a Hedeby system]

The resource usage tells the Hedeby system how important a resource is for its service. The usage is a non-negative number (0 is allowed). It is the responsibility of the service to keep the usage of the resource up to date (e.g. if a KPI of the service has changed). In general, the usage of a resource is the maximum urgency of the SLOs which need the resource to be fulfilled.

Example 1.6. Resource Usage

A service has six resources assigned (R1-R6). There exist two SLOs for this service. SLO1 has urgency 50 and SLO2 has urgency 30.
Resources R2, R3 and R4 have a usage of 30, because they are needed to fulfill SLO2. Resources R5 and R6 have a usage of 50 (urgency of SLO1). Resource R1 is needed by SLO1 and SLO2. In such cases the resource has the maximum urgency of all associated SLOs, which means R1 has usage 50 (= max(urgency of SLO1, urgency of SLO2)).

[Figure: Usage of assigned resource]

The Resource Provider is the central component in a Hedeby system. It has control over all services and resources. Each service adapter must inform the Resource Provider if the state of a service or a resource has changed. The Resource Provider decides whether a service gets a resource or not. The following image illustrates the decision making process:
At startup the Resource Provider discovers the Hedeby system. It asks all services what resources they possess and stores that information in its local storage.

With the Policy Engine it is possible to define policies which influence the decisions of the Resource Provider. The Policy Engine calculates a new urgency from the need of a service.

[Figure: The Policy Engine rules the decision making process of the Resource Provider]

The Policy Engine has access to statistical values of the resource usage. The following information can be provided:
Note: To make time-based decisions the Policy Engine will need information about how long a resource has been assigned to a service.

The Policy Engine provides a generic interface which makes it possible to plug other implementations into the Hedeby system.

Example 1.7. Example for a simple Policy Engine

A simple implementation of a Policy Engine can weight the importance of services by giving them different priorities. The Policy Engine multiplies the urgency that a service reports in a need by the priority of the service and thus obtains the weighted needs.
For the services the following SLOs are defined:
The Policy Engine weights the urgencies of the reported needs by multiplying them by the priority of the service:
The Policy Engine reports the needs with the newly calculated urgencies to the Resource Provider. The Resource Provider signals service B that the free resource from the Spare Pool can be assigned. After the assignment is finished, service B sends an event to the Resource Provider. The next scheduling run starts.
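The weighting step of such a simple Policy Engine amounts to one multiplication per service, as the sketch below shows. It is an illustration only. The priorities and urgencies are invented for the sketch and are not the values of Example 1.7, and the function names are hypothetical. It also assumes that the Resource Provider serves the highest weighted urgency first, which the text implies but does not spell out.

```python
def weight_needs(urgencies: dict[str, int], priorities: dict[str, int]) -> dict[str, int]:
    """Multiply each reported urgency by the priority of the reporting service."""
    return {service: urgency * priorities[service] for service, urgency in urgencies.items()}


def most_urgent(weighted: dict[str, int]) -> str:
    """Pick the service with the highest weighted urgency (assumed selection rule)."""
    return max(weighted, key=weighted.get)


# Invented values: service A reports the higher raw urgency,
# but service B has the higher priority.
priorities = {"A": 1, "B": 2, "spare_pool": 1}
urgencies = {"A": 60, "B": 40, "spare_pool": 1}

weighted = weight_needs(urgencies, priorities)
print(weighted)
print(most_urgent(weighted))
```

With these numbers service B wins the free resource although service A reported the higher raw urgency. This is the effect the priority setting is meant to have.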
Warning: Misconfiguration of the SLOs and the policies will lead to an oscillating system. We have to implement mechanisms to prevent such situations.

There is no strict definition of a policy setting. The Policy Engine is open for third-party enhancements and therefore does not rely on any special definition or implementation of a policy setting. An example of a policy setting can be the following rule:

Hedeby currently includes only a simple Policy Engine implementation which takes into account only the priority setting. Priority is a value assigned to a service adapter (generally to a managed service) and expresses the subjective importance of the service (defined by a Hedeby administrator).

The decision process is based on an algorithm that takes into account the requirements of the service, which are specified by a Need, and data provided by a policy manager.

Note: By Resource Provider (RP) we understand an interface that encloses a set of managers that are responsible for the whole decision making process (service manager, resource manager, request processor, order processor).

Note: A Need is a quantified request for resources with certain properties. One possible Need: "4 resources of host type with 4GB of memory", which means that a service asks for 4 hosts with 4GB of memory. Another possible Need: "1 resource of SW license type", which means that a service asks for a license (for the specific software).

The complete algorithm can basically be divided into two cases:
The first case is described in detail in the following steps:
The second case is described in detail in the following steps:
The Reporter component is a logging and monitoring tool for Hedeby. Its role is to intercept and gather information about what is going on in the system. The administrator can specify what kind of data is of interest. The Reporter is able to store information and notifications that come from the Configuration Service, the Resource Provider and all services that are installed in the system. The Reporter component is prepared to store data in the ARCo database (Grid Engine's Accounting and Reporting Console). By prepared we mean that a special ARCo-format file is created that stores data suitable for ARCo. The data in the ARCo file is not very readable for a normal user, which is why the administrator can retrieve the data and print it on the screen using CLI commands. The data can be filtered using the available filters. More about the Reporter component can be found in Section 2.2.5.4, “Reporter Component”.

The Executor is used whenever there is a need to set up (or destroy) a service component on a resource that has to be part of the service, especially when there is no other way to communicate with the resource. Once the resource is configured by the executor, the service adapter can use a different way of communicating with the resource (usually a communication channel provided by the managed service). The Executor gives Hedeby the possibility to execute actions or commands on a resource. For this purpose the administrator can install an executor component on each resource in a Hedeby system. The features of the executor component depend highly on the type of resource. In general the executor executes a command on a resource. For host resources user switching will be possible. The service adapters will mainly use the executors for installing and uninstalling software on a resource. However, the usage of executors is not restricted to the service adapters.

This section describes the basic actions or use cases which can be executed on a Hedeby system.
Adding a service is triggered from the UI. The administrator has to provide the following information:
When adding a service, the Hedeby system validates the configuration parameters. On any error the action is rejected. With a valid configuration the service adapter is instantiated. The state of the service is UNKNOWN. The service is registered in the RP.

Removing a service is only possible if the service is in state SHUTDOWN or ERROR. This action is triggered from the UI. The following steps are executed:
Starting a service can be triggered from the UI. The administrator has to provide the name of the service.
Note: The service adapter does not observe the service if the service adapter is in state UNKNOWN or SHUTDOWN. If a service is started without Hedeby, it has no effect on the Hedeby system.

There are two possibilities to stop a service:
The following shows all possible state transitions of a resource in a Hedeby system. Each state transition requires a couple of component interactions.

To add a new resource to a Hedeby system the administrator uses the user interface. The administrator has to specify the following information:
When adding a resource, the UI sends a corresponding request directly to the Resource Provider. The Resource Provider validates the input parameters, stores the resource in its local storage and assigns the resource to the first service willing to accept it.

A resource is added automatically to the Hedeby system each time a service adapter discovers that a service uses a resource that is unknown in the Hedeby system. Such a resource may be marked as static if the service adapter is not able to remove the resource from the service. The state of a discovered resource reflects the actual resource state as determined by the service adapter (for the GE adapter, it may be ASSIGNED if the discovered resource has execd running, or ERROR if it does not).

The assignment of a resource can be triggered in two ways:
For a Resource assignment the following actions are executed:
As with the assignment process, the unassignment can be triggered manually (via the UI) or automatically (by the RP). The following actions are executed during the unassignment:
Removing a resource is possible if the resource is owned by a service (it is up to the service adapter to check the resource state to allow or disallow the removal; ideally only resources in ASSIGNED or ERROR state can be removed). The administrator uses the UI to trigger this action. Only the name of the resource must be specified.

The administrator can remove a resource even if the resource is owned by the Resource Provider (the resource state is not checked, as this operation should be performed only if the system is in an inconsistent state). The administrator uses the UI to trigger this action. Only the name of the resource must be specified.

Any unforeseen error during assignment or unassignment sets a resource into ERROR state. If a resource is in ERROR state, the Hedeby system treats it as unusable. If the service adapter of the service that owns the ERROR resource does not support active reset of the resource (automatic cleanup), the administrator must clean up the resource manually. After the cleanup the resource state can be reset manually (UI). Only the name of the resource must be specified. |