2.4. Hedeby system configuration

2.4.1. Basics

2.4.1.1. Preferences

Hedeby can be installed in two ways, which we call preferences. Preferences can be set for a particular user (USER preferences) or for the whole system platform (SYSTEM preferences). An installation with SYSTEM preferences has to be performed by the superuser. SYSTEM preferences make features such as autostart and SMF support available for the hosts within the Hedeby system.

Hedeby uses its own implementation of Java preferences. SYSTEM preferences are located in /etc/sdm/bootstrap/<system_name>. USER preferences are located in <USER_HOME>/.sdm/bootstrap/<system_name>. For background on Java preferences, see the documentation of the Java Preferences API (java.util.prefs).

The file structure of the preferences looks like this:

<system_name> --\
                |
                |-- hosts --\
                |           |-- <host_name> --\
                |           |                 |-- smf --\
                |           |                 |         |
                |           |                 |         \----prefs.properties
                |           |                 |
                |           |                 \----prefs.properties
                |           |                          
                |           \-- prefs.properties
                |
                \-- prefs.properties
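
As a rough illustration of the two locations and the tree above, the following Python sketch builds the paths for a given system. The function names bootstrap_dir and host_prefs_file and their parameters are hypothetical and not part of Hedeby. Only the path layouts are taken from the text.

```python
from pathlib import PurePosixPath


def bootstrap_dir(system_name, preference_type, user_home=None):
    """Return the directory holding the preferences of one Hedeby system.

    Hypothetical helper: only the two path layouts come from the manual.
    """
    if preference_type == "SYSTEM":
        return PurePosixPath("/etc/sdm/bootstrap") / system_name
    if preference_type == "USER":
        if user_home is None:
            raise ValueError("USER preferences need the user's home directory")
        return PurePosixPath(user_home) / ".sdm" / "bootstrap" / system_name
    raise ValueError(f"unknown preference type: {preference_type!r}")


def host_prefs_file(system_name, preference_type, host_name, user_home=None):
    """Path of the host-specific prefs.properties, following the tree above."""
    base = bootstrap_dir(system_name, preference_type, user_home)
    return base / "hosts" / host_name / "prefs.properties"
```

For example, bootstrap_dir("mySystem", "USER", "/home/alice") yields /home/alice/.sdm/bootstrap/mySystem.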

Example content of the prefs.properties file in the <system_name> directory. It holds the main bootstrap information about the system:

version=0.1
localspool=/var/spool/sdm/localspool/
csInfo=foo\:2324
smf=true
ssl_disable=false
dist=/net/foo/sdm_dist
auto_start=false

Table 2.5. Description of the file content:

Parameter | Value | Description
auto_start | true/false | Optional. Defines whether the Hedeby system is started during the machine boot process. WARNING: This works only when Hedeby is installed with SYSTEM preferences.
version | float value | Version of the system preferences.
smf | true/false | Optional. Defines whether the Hedeby system is installed with the SMF support feature.
csInfo | String | Location of the CS component of the Hedeby system. The format is host_name:port.
dist | String (path to the Hedeby dist directory) | See Section 2.4.1.2.1, “Dist directory”.
localspool | String (path to the Hedeby local spool directory) | See Section 2.4.1.2.2, “Local Spool directory”.
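
The example file stores the CS location as foo\:2324 because a colon is escaped in the properties format. The following Python sketch reads such a file and splits the value into host and port. It is only an illustration: the function names are hypothetical, and the parser handles just a small subset of the Java properties syntax (no line continuations, no special meaning for escapes such as \n or \u).

```python
def unescape(value):
    """Drop each backslash and keep the character that follows it."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            out.append(next(chars, ""))
        else:
            out.append(ch)
    return "".join(out)


def parse_prefs(text):
    """Parse 'key=value' lines into a dict, skipping blanks and comments."""
    prefs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        prefs[key.strip()] = unescape(value.strip())
    return prefs


def split_cs_info(cs_info):
    """Split 'host_name:port' into (host_name, port)."""
    host, _, port = cs_info.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host_name:port, got {cs_info!r}")
    return host, int(port)
```

With the example file above, parse_prefs returns the value foo:2324 for the key csInfo, and split_cs_info turns it into the host foo and the port 2324.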

Example content of the prefs.properties file in the <system_name>/hosts/<host_name> directory. It holds the host-specific information about the system:

localspool=/var/spool/sdm/localspool/
master=false
dist=/net/foo/sdm_dist

Table 2.6. Description of the file prefs.properties content. File located in <system_name>/hosts/<host_name>

Parameter | Value | Description
master | true/false | Defines whether this host is the master host or a managed host.
dist | String (path to the Hedeby dist directory) | Optional. If not defined, the value is taken from the main prefs.properties file.
localspool | String (path to the Hedeby local spool directory) | Optional. If not defined, the value is taken from the main prefs.properties file.
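
The fallback from the host file to the main file can be pictured with a short Python sketch. The function effective_host_prefs is hypothetical and works on plain dictionaries. The key names dist and localspool follow the example files.

```python
def effective_host_prefs(main_prefs, host_prefs, inheritable=("dist", "localspool")):
    """Return the host preferences with missing optional keys filled in.

    Only the keys listed in 'inheritable' fall back to the main file;
    values defined in the host file always win.
    """
    merged = dict(host_prefs)
    for key in inheritable:
        if key not in merged and key in main_prefs:
            merged[key] = main_prefs[key]
    return merged
```

A host file that defines only master and localspool would thus inherit dist from the main prefs.properties file, while its own localspool stays untouched.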

Example content of the prefs.properties file in the smf directory. It holds the SMF information about the system:

rp_vm=svc\:/application/management/sdm/mySystem/jvm\:rp_vm
executor_vm=svc\:/application/management/sdm/mySystem/jvm\:executor_vm

Table 2.7. Description of the file prefs.properties content. File located in <system_name>/hosts/<host_name>/smf

Parameter | Value | Description
<jvm_name> | String (SMF service name for the JVM) | Each pair defines which JVM was installed with SMF support and which SMF service name is assigned to that JVM. More information about SMF can be found in Section 2.2.6.2, “SMF Support”.

2.4.1.2. Directories

2.4.1.2.1. Dist directory

This directory contains the Hedeby system installation files: binaries, installation scripts, libraries, manuals and state files that are used by the Installer.

sdm_dist --\
           |
           |-- bin --\
           |         \-- sdmadm
           |         
           |-- util --\
           |          |-- arch
           |          |-- arch_variables
           |          |-- sdmsmf.sh
           |          |-- smf_sdmsvc
           |          |-- supportRc.sh
           |          |
           |          |-- sdmST --\
           |          |           |-- sdm_st
           |          |           \-- st_settings.sh
           |          |
           |          |-- templates --\
           |                          |-- sdm.env.template
           |                          |-- jaas.config.template
           |                          |-- java.policy.template
           |                          |-- sdmsvc
           |                          |-- sdm_template.xml
           |                          |-- sdm_template_masterhost.xml
           |                          |-- start_sh.template
           |                          |-- logging.properties.template
           |                          \-- ge-adapter --\
           |                                           |-- install_execd.conf
           |                                           |-- install_execd.sh
           |                                           |-- uninstall_execd.conf
           |                                           |-- uninstall_execd.sh
           |
           |-- lib --\
           |         |-- ext --\
           |         |         |-- endorsed --\
           |         |         |              \-- *.jar
           |         |         \-- *.jar
           |         |
           |         |-- <PLATFORM_ARCHITECTURE> --\
           |         |                             \-- libplatform.so
           |         |-- sdm-common.jar               
           |         |-- sdm-starter.jar              
           |         |-- sdm-ge-adapter.jar           
           |         |-- sdm-ge-adapter-impl.jar  
           |         |-- sdm-security.jar  
           |         \-- sdm-security-impl.jar
           |
           \-- man --\
                     \-- man1 --\
                                |-- sdmadm.1

Table 2.8. Description of Dist directory content:

File/Directory | Description
bin | Contains the executables of the Hedeby system.
sdmadm | Command line utility for administrating Hedeby. See Section 2.2, “Hedeby system administration”.
util/templates | Contains the templates needed for installation.
util | Contains utility scripts used by Hedeby, including the arch script that detects the system architecture.
util/arch | Script that detects the architecture of the machine on which Hedeby runs.
util/arch_variables | Script containing definitions of variables for the different architectures supported by Hedeby.
util/supportRc.sh | Script used by Hedeby to install and uninstall RC script support.
util/sdmsmf.sh | Script used by Hedeby to install and uninstall SMF support.
util/smf_sdmsvc | Script used by the Hedeby SMF support to manage the lifecycle of the Hedeby system.
man | Contains the manual for sdmadm.
lib | Contains the libraries of the Hedeby system.

2.4.1.2.2. Local Spool directory

A local spool directory has to be specified on the local file system of each managed host. The local spool directory can be different for each host. It stores information about the running components (only for the JVMs on that host) and the logs of those JVMs. It is also the place for spooling local data and for security information such as certificates and keystores.

 localSpool --\
              |-- log --\
              |         \-- log files 
              |
              |-- run --\
              |         \-- files with the pids of running components 
              |             on local host
              |
              |-- security --\
              |              |-- ca --\
              |              |        \ (files for Grid Engine CA)
              |              |
              |              |-- deamons --\
              |              |             \-- keystores for JVM`s
              |              |
              |              \-- users --\
              |                          \-- keystores for Hedeby users
              |
              |-- spool --\
              |           \-- cs    spool directory for configuration
              |           \-- ...   spool directories for Hedeby components
              |
              |-- logging.properties
              |
              |
              \-- tmp --\
                        \-- tmp directories for Hedeby components

Table 2.9. Description of Local spool directory content:

File/Directory | Description
log | Keeps the log files of the JVMs that are running on this host. See Section 2.2.5, “How to Monitor the System”.
run | Keeps track of the JVMs running on this host. Each file is named after a JVM and contains the PID of the process in which that JVM runs.
security | Contains security information such as certificates and keystores of components and Hedeby users. See Section 2.4.3.6, “Security”.
spool | Used by the Hedeby components as persistent data storage. It contains the complete configuration of the Hedeby system (only on the host where the configuration service is running).
tmp | Used by the Hedeby components as temporary data storage.
logging.properties | Stores the logging settings of the Hedeby system. For more about monitoring the Hedeby system see Section 2.2.5, “How to Monitor the System”.
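
The run directory can be inspected with a few lines of Python. The sketch below is only an illustration and rests on two assumptions not confirmed by the text: each file in run is named exactly after its JVM, and it contains nothing but the PID as a decimal number. The function name running_jvms is hypothetical.

```python
from pathlib import Path


def running_jvms(local_spool):
    """Map JVM names to PIDs by reading the files in <local_spool>/run."""
    run_dir = Path(local_spool) / "run"
    pids = {}
    if not run_dir.is_dir():
        return pids
    for entry in sorted(run_dir.iterdir()):
        if not entry.is_file():
            continue
        content = entry.read_text().strip()
        # Skip files that do not hold a plain decimal PID.
        if content.isdigit():
            pids[entry.name] = int(content)
    return pids
```

Calling running_jvms("/var/spool/sdm/localspool") would return a dictionary such as one entry per JVM that currently has a PID file on that host.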

2.4.2. Java Virtual Machines

2.4.2.1. Overview (new)

The typical starting point for the Hedeby system configuration is to define the Java virtual machines (JVMs) in which the components run. This includes defining on which host each component runs.

When Hedeby components are started on a local host (for example with the sdmadm start command), the global configuration is used to find out which components have to be started in which JVMs on the local host.

One important task in the JVM configuration is to define the environment variables for the JVM. The current configuration supports setting the LD_LIBRARY_PATH environment variable. All components started in a JVM see this environment setting.

The global configuration file is loaded/stored by the configuration service. It is stored in the local spool directory of the host where CS is running (<local spool>/spool/cs/global.xml).

The configuration file is written in XML format. The structure of the file is defined in the xml schema hedeby-common.xsd.

2.4.2.1.1. Definition of JVMs

Example 2.3. JVM configuration

<?xml version="1.0" encoding="UTF-8"?>
<global ...>
    <!-- (1) jvm element, (2) name, (3) user, (4) port -->
    <jvm name="root_jvm" user="root" port="0">
        <component ...>  <!-- (5) -->
        </component>
        ...
        <jvmArg>-Dfoo.bar.prop=aaa</jvmArg>  <!-- (6) -->
        <jvmArg>-Xmx512M</jvmArg>

        <ldLibraryPath>  <!-- (7) -->
            <pathelement>/opt/sge/lib/${ARCH}</pathelement>
        </ldLibraryPath>
    </jvm>
    ...
</global>
1

Each Hedeby system defines a set of Java virtual machines (JVMs). Each JVM is defined in a <jvm> tag.

2

Each JVM needs a name. The name has to be unique in the Hedeby system.

3

The user attribute specifies the name of the process owner of the JVM. Hedeby will try to start the JVM under this user account. This is only possible if a privileged user starts Hedeby (on Unix systems, the user root).

4

Each JVM hosts a JMX server. The port attribute specifies the port on which incoming requests are accepted. If the value is 0, the port is allocated dynamically.

5

Hedeby components (see Section 2.4.2.1.2, “Definition of Components”) are defined in the <component> tags of each JVM.

6

With <jvmArg> tags it is possible to pass additional startup parameters to the JVM.

7

The <ldLibraryPath> tag allows the definition of the LD_LIBRARY_PATH for the JVM in a platform-independent way.
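
How a platform-independent <ldLibraryPath> could be turned into a concrete LD_LIBRARY_PATH value is sketched below in Python. This is an assumption about the mechanism, not Hedeby's implementation: the sketch supposes that ${ARCH} is replaced by an architecture string (sol-amd64 is a made-up example value) and that the path elements are joined with the platform's path separator. The function name build_ld_library_path is hypothetical.

```python
import os


def build_ld_library_path(path_elements, arch, separator=os.pathsep):
    """Resolve ${ARCH} in every path element and join the results."""
    resolved = [element.replace("${ARCH}", arch) for element in path_elements]
    return separator.join(resolved)
```

For the example configuration, build_ld_library_path(["/opt/sge/lib/${ARCH}"], "sol-amd64") would give /opt/sge/lib/sol-amd64.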


2.4.2.1.2. Definition of Components

Inside the JVM configuration it is possible to define the components for this JVM. We have two different types of components:

Component types

Singleton

Only one instance of a Singleton can exist in a Hedeby system. The configuration of a Singleton allows only one host.

MultiComponent

One instance of this component can exist on each host of the Hedeby system. The configuration defines a pattern for the hosts.

Example 2.4. Component configuration

<?xml version="1.0" encoding="UTF-8"?>
<global ...>
    <jvm ...>
        <!-- (1) xsi:type, (2) name, (3) classname -->
        <component xsi:type="MultiComponent"
                   name="executor"
                   classname="com.sun.grid.grm....ExecutorImpl">
            <hosts>  <!-- (4) -->
                <include>.*</include>
                <exclude>foo.bar</exclude>
            </hosts>
            <config>executor</config>  <!-- (5) -->
        </component>
        <!-- (6) xsi:type, (7) host -->
        <component xsi:type="Singleton"
                   name="ge_service"
                   classname="com.sun.grid.grm....GEServiceImpl"
                   host="ge_master">
            <config>ge_service</config>
        </component>
        ...
    </jvm>
    ...
</global>
1

This component is a MultiComponent. One instance of the component can exist on each host of a Hedeby system.

2

Each component has a system-wide unique name.

3

The classname attribute specifies the name of the Java class that implements the component. This class must be available in the classpath of the JVM.

4

Not every component is started on every host. The <hosts> tag defines the hosts where the component is started. It is possible to include and exclude hosts. The value of an <include> or <exclude> tag can be a Java regular expression. The <hosts> tag is only allowed for MultiComponents.

5

Defines the path to the configuration of the component. With this parameter the component can load its configuration from the config service.

6, 7

The component ge_service is a Singleton. Only one instance of this component exists in the Hedeby system. The host attribute defines the name of the host where this Singleton is started.
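
The effect of <include> and <exclude> patterns can be illustrated with a small Python sketch. It makes two assumptions that the text does not state: a pattern has to match the whole host name, and an exclude wins over an include. Python's re module is used here as a stand-in for Java regular expressions, which behave the same for simple patterns like the ones in the example. The function name component_runs_on is hypothetical.

```python
import re


def component_runs_on(host, includes, excludes=()):
    """Decide whether a MultiComponent would be started on 'host'."""
    included = any(re.fullmatch(pattern, host) for pattern in includes)
    excluded = any(re.fullmatch(pattern, host) for pattern in excludes)
    return included and not excluded
```

Note that foo.bar is itself a regular expression, so under these assumptions it excludes not only the host foo.bar but also a host named fooxbar. A literal dot would have to be written as foo\.bar.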


2.4.2.1.3. Configuration Service

The Configuration Service (CS) is a special component. It is not explicitly defined in a component tag. Instead, the Hedeby system automatically detects the JVM that hosts CS by comparing the hostname and the port of the JVM with the CS URL stored in the preferences. The CS JVM must therefore have a static port.

% sdmadm -s system1 show_configs -f
system  type    host       port   properties
--------------------------------------------
system1 SYSTEM master_host 31006             (1)
    spool=/var/spool/hedeby/system1
     dist=/opt/hedeby
     
    <?xml version="1.0" encoding="UTF-8"?>
    <global ...>
    <jvm name="cs_jvm" user="sdm_admin" port="31006">  <!-- (2) -->
    </jvm>
    ...
    </global>
1

The sdmadm show_config command shows the CS contact information, including the hostname and the port of the CS component.

2

At startup of the CS JVM on the master host, the Hedeby system automatically detects that this JVM hosts CS.
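
The detection described above amounts to a comparison of host and port. The Python sketch below shows the idea under simplifying assumptions: the CS contact information is available in the host_name:port form used in the preferences, and host names are compared as plain strings (a real implementation would probably normalize host names first). The function name hosts_cs is hypothetical.

```python
def hosts_cs(jvm_port, local_host, cs_info):
    """Return True if a JVM with this port on this host matches the CS contact."""
    cs_host, _, cs_port = cs_info.rpartition(":")
    # A JVM with port 0 gets a dynamic port and can never match a fixed CS port.
    if jvm_port == 0:
        return False
    return local_host == cs_host and str(jvm_port) == cs_port
```

With the values from the example, hosts_cs(31006, "master_host", "master_host:31006") is true, while a JVM configured with port="0" never qualifies.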

2.4.3. Components and their Configuration

This chapter describes configuration of the Hedeby components.

2.4.3.1. Resource Provider

2.4.3.1.1. Overview

The Resource Provider is the main component of the Hedeby system. It provides information about resources and processes all information from the managed services. Only one Resource Provider exists in a Hedeby system, and it should run in a JVM started as the admin user (no root privileges are required).

Once the Resource Provider component has been started the command line clients can get information about resources and which resources are assigned to services.

It is necessary to define the following component configurations before the Resource Provider can be started:

2.4.3.1.2. Configure the Resource Provider in the JVM system configuration

A typical component configuration entry in the system configuration file for the Resource Provider looks as follows:

Example 2.5. Example for Hedeby Resource Provider system component configuration

    
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<common:global name="mySystem"
               xmlns:executor="http://hedeby.sunsource.net/hedeby-executor"
               xmlns:reporter="http://hedeby.sunsource.net/hedeby-reporter"
               xmlns:security="http://hedeby.sunsource.net/hedeby-security"
               xmlns:resource_provider="http://hedeby.sunsource.net/hedeby-resource-provider"
               xmlns:common="http://hedeby.sunsource.net/hedeby-common"
               xmlns:ge_adapter="http://hedeby.sunsource.net/hedeby-gridengine-adapter">

    <common:jvm port="0"
                user="root"
                name="rp_vm">
        <common:component xsi:type="common:Singleton"
                          host="foo"
                          autostart="true"
                          classname="com.sun.grid.grm.resource.impl.ResourceProviderImpl"
                          name="resource_provider"
                          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>

        ...
    </common:jvm>
    ...
</common:global>

2.4.3.1.3. Resource Provider configuration

The Resource Provider component configuration is stored as a component configuration in the config service.

The configuration file is written in XML format. The structure is defined in the xml schema hedeby-resource-provider.xsd.

Example 2.6. Example for Resource Provider configuration

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- (1) xsi:type, (2) period -->
<common:componentConfig xsi:type="resource_provider:ResourceProviderConfig"
                        xmlns:executor="http://hedeby.sunsource.net/hedeby-executor"
                        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                        xmlns:reporter="http://hedeby.sunsource.net/hedeby-reporter"
                        xmlns:security="http://hedeby.sunsource.net/hedeby-security"
                        xmlns:resource_provider="http://hedeby.sunsource.net/hedeby-resource-provider"
                        xmlns:common="http://hedeby.sunsource.net/hedeby-common"
                        xmlns:ge_adapter="http://hedeby.sunsource.net/hedeby-gridengine-adapter"
    period="60">

    <resource_provider:policies xsi:type="resource_provider:PriorityPolicyManagerConfig"
       defaultPriority="49">  <!-- (3) -->
        <resource_provider:priority service="spare_pool"
                                    name="spare_pool_priority">1</resource_provider:priority>  <!-- (4) -->
    </resource_provider:policies>
</common:componentConfig>


1

The Resource Provider configuration is a special component configuration. The xsi:type attribute defines the real type (it must be ResourceProviderConfig for the Resource Provider).

2

The polling period at which unsatisfied requests are reprocessed. The Resource Provider periodically queries the managed Service Containers in search of available resources, until either the needed resources are located or the Service Container rescinds the resource request.

3

The <policies> tag defines the configuration for the Policy Engine. Currently only a priority based Policy Engine is defined (xsi:type="resource_provider:PriorityPolicies").

4

The <priority> tag defines the priority of each service. The priority is used to weight the urgency of each service's needs.
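
To make the structure of the policies section more tangible, the following Python sketch reads the priorities from such a configuration with the standard xml.etree.ElementTree module. The helper names read_priorities and priority_of are hypothetical. The sketch also assumes that a service without its own <priority> entry gets the defaultPriority, which the text suggests but does not state.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared in the example configuration.
RP_NS = "http://hedeby.sunsource.net/hedeby-resource-provider"


def read_priorities(xml_text):
    """Return (default_priority, {service: priority}) from a config document."""
    root = ET.fromstring(xml_text)
    policies = root.find(f"{{{RP_NS}}}policies")
    default = int(policies.get("defaultPriority"))
    explicit = {
        entry.get("service"): int(entry.text)
        for entry in policies.findall(f"{{{RP_NS}}}priority")
    }
    return default, explicit


def priority_of(service, default, explicit):
    """Assumed rule: fall back to the default when no entry exists."""
    return explicit.get(service, default)
```

For the example above this yields a default of 49 and an explicit priority of 1 for spare_pool. Any other service would be treated with priority 49 under the stated assumption.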

2.4.3.2. Reporter

2.4.3.2.1. Overview

The Reporter component is a logging and monitoring tool for Hedeby.

The role of the Reporter component is to intercept and gather information about what is going on in the system. The administrator can specify what kind of data is of interest. Currently the Reporter is able to store information and notifications that come from the Configuration Service, the Resource Provider and all services that are installed in the system.

The Reporter component is prepared to store data in an Arco database. By prepared we mean that a special Arco format file is created that stores data suitable for Arco.

The data in the Arco file is not very readable for a normal user. Therefore the administrator can retrieve the data and print it on the screen using CLI commands. The data can be filtered using the available filters.

2.4.3.2.2. Reporter configuration

The Reporter component configuration is stored as a component configuration in the config service. The path to the configuration is defined in the <config> tag of the component definition.

The configuration file is written in XML format. The structure is defined in the xml schema hedeby-reporter.xsd.

Example 2.7. Example for Reporter configuration


<?xml version="1.0" encoding="UTF-8"?>
<reporter:reporter 
    xmlns:reporter="http://hedeby.sunsource.net/hedeby-reporter" 
    xmlns:common="http://hedeby.sunsource.net/hedeby-common" 
    filePattern="report-%g.log"
    fileCount="4"
    fileSize="5242880"/>                 
                
               


  • level

    Default: ALL. Specifies the verbosity level of the Reporter component. With the default ALL, all kinds of rows are stored in the file. With INFO, only NOTIFICATION rows are stored. With FINE, RESOURCE, STATE, CONFIGURATION and NEED rows are stored as well. With FINER, NEED_PROP and RES_PROP rows are stored.

  • filePattern

    Default: report-%g.log. The pattern for the names of the reporting files in which records are stored. The files are located in the local spool directory of the Reporter component.

  • fileCount

    Default: 4. The maximum number of reporting files that can be created for reporting data.

  • fileSize

    Default: 5242880. The maximum size of a reporting file.

  • fileAppend

    Default: true. Indicates whether data is appended to an existing file or a new file is created instead.

  • showResourceProviderEvents

    Default: true. Indicates whether Resource Provider events are reported.

  • showManagementEvents

    Default: true. Indicates whether Resource Provider resource processing events are reported.

  • showCSEvents

    Default: true. Indicates whether CS events are reported.
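
The relation between the level setting and the row types can be written down as a small lookup. The Python sketch below is an interpretation of the level description, not Reporter code: it assumes that the levels are cumulative, so that FINER contains everything FINE contains plus NEED_PROP and RES_PROP. The names ROWS_BY_LEVEL and is_reported are hypothetical.

```python
# Row types stored per level, assuming each level extends the previous one.
ROWS_BY_LEVEL = {
    "INFO": {"NOTIFICATION"},
    "FINE": {"NOTIFICATION", "RESOURCE", "STATE", "CONFIGURATION", "NEED"},
}
ROWS_BY_LEVEL["FINER"] = ROWS_BY_LEVEL["FINE"] | {"NEED_PROP", "RES_PROP"}


def is_reported(row_type, level="ALL"):
    """Return True if a row of this type is stored at the given level."""
    if level == "ALL":
        return True
    return row_type in ROWS_BY_LEVEL[level]
```

Under this reading, a RESOURCE row is dropped at level INFO but stored at FINE, FINER and ALL.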

2.4.3.3. Service Adapters, Grid Engine Adapter

A Service Adapter is a component that represents a service managed by the Hedeby system. Currently Hedeby supports only two types of services: Spare Pools and the GE Adapter (managing Grid Engine).

The Service Adapter is capable of starting and stopping the currently running Service. To start a Service Adapter without starting a Service means to connect the Service Adapter to an already running Service. To stop a Service Adapter without stopping the associated Service means to disconnect from the Service but leave it running. Stopping a Service Adapter without stopping the associated Service also means that any resources assigned to that Service are effectively lost to Hedeby system.

A Service Adapter talks to a specific Service. As Hedeby currently supports only Grid Engine (GE), only the Grid Engine Adapter exists. The Service Adapter handles the specifics of gathering information in order to evaluate SLOs, as well as the process of preparing and adding a new resource or releasing a current resource. To achieve this, the GE Adapter uses JGDI and the Executor (see Section 2.4.3.4, “Executors”).

The Service Adapter acts as the container for the SLOs associated with its Service. The Service Adapter interacts with the Service to maintain a current perspective on the Service's SLOs. When an SLO is not being met, the Service Adapter must normalize that SLO into an urgency, an integer from 0 to 99. The Service Adapter also maintains a list of SLO priorities. When SLO non-compliance is raised to the Resource Provider, the Service Adapter first applies the SLO priority to the event before sending it to the Resource Provider.

2.4.3.3.1. GEAdapter Configuration

A Grid Engine service in Hedeby is defined by an entry in the global configuration. Normally it is not necessary to modify this configuration, because it is created automatically by the sdmadm add_ge_service command. The following sample shows a typical definition of a Grid Engine service.

Example 2.8. Example for Hedeby service system component configuration

<?xml version="1.0" encoding="UTF-8"?>
<global name="new_hedeby_system"
        xmlns='http://hedeby.sunsource.net/hedeby-common'
        xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
    ...
    <jvm name="ge_jvm" user="grm_admin" port="0">
        <component name="ge_service" 
                   classname="com.sun.grid.grm.service.impl.ge.GEServiceImpl">  <!-- (1) -->
            <classpath>
                <pathelement>${GRM_DIST}/lib/sdm-ge-adapter.jar</pathelement>  <!-- (2) -->
            </classpath>
            <hosts>  <!-- (3) -->
                <include>master_host</include>
            </hosts>
            <config>ge_service</config>
        </component>
    </jvm>
    ...
</global>
1

Name of the class which implements the service. In the case of the Grid Engine service this is always com.sun.grid.grm.service.impl.ge.GEServiceImpl.

2

Necessary classpath for loading the service implementation. The Grid Engine service also needs jgdi.jar in the classpath. It is loaded in an extra classloader since it depends on SGE_ROOT (defined in the GEServiceConfig).

3

Defines the name of the host where the GE service will run. Normally a service component is only started on one host. A Grid Engine service component runs normally on the host where qmaster is installed.


Like all other configurations, the Grid Engine service configuration is defined in an XML structure. It can be modified with the sdmadm modify_component -c <component name> command, which opens the configuration in an editor. The following section describes the configuration of a GEAdapter:

Example 2.9. Example for Grid Engine service configuration

<?xml version="1.0" encoding="UTF-8"?>

<!-- (1) xsi:type -->
<componentConfig xsi:type="ge:GEServiceConfig"
                    xmlns='http://hedeby.sunsource.net/hedeby-common'
                    xmlns:ge='http://hedeby.sunsource.net/hedeby-gridengine-adapter'
                    xmlns:ge_adapter='http://hedeby.sunsource.net/hedeby-gridengine-adapter'
                    xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
    
    <slos>  <!-- (2) -->
        ...
    </slos>
    
    <!-- (3) -->
    <ge:connection clusterName="ge"
                   root="/opt/sge" cell="default"
                   masterPort="31002" execdPort="31003" jmxPort="23011" 
                   username="sge_admin" password="secret"
                   keystore="/var/sgeCA/port31002/default/userkeys/sge_admin/keystore"/>
    
    <ge:sloUpdateInterval unit="minutes" value="5"/>  <!-- (4) -->
    
    <!-- (5) -->
    <ge_adapter:jobSuspendPolicy
           suspendMethods="reschedule_jobs_in_rerun_queue
                           reschedule_restartable_jobs
                           suspend_jobs_with_checkpoint">
        <ge_adapter:timeout unit="minutes" value="2"/>
    </ge_adapter:jobSuspendPolicy>
    
    <ge:qmasterReconnectInterval unit="seconds" value="2"/>  <!-- (6) -->
    <ge:execdStartupTimeout unit="seconds" value="60"/>  <!-- (7) -->
    <ge:execdShutdownTimeout unit="seconds" value="60"/>  <!-- (8) -->
    
    <ge:staticHosts>  <!-- (9) -->
        <include>foo.bar</include>
    </ge:staticHosts>
    
    
    <!-- Configuration for Solaris execds -->
    <!-- (10) -->
    <ge:execd adminHost="true"
              submitHost="false" cleanupDefault="false" rcScript="false"
              ignoreFQDN="false" defaultDomain="foo.bar">

       <ge:filter>operatingSystemName = "SunOs" | operatingSystemName = "Solaris" </ge:filter>  <!-- (11) -->

       <ge:localSpoolDir>/export/home/sge/execd</ge:localSpoolDir>
       
       <ge_adapter:installTemplate executeOn="exec_host">
            <ge_adapter:maxRuntime value="2" unit="minutes"/>  <!-- (12) -->
            <ge_adapter:script>/opt/hedeby/grm_util/templates/ge-adapter/install_execd.sh</ge_adapter:script>  <!-- (13) -->
            <ge_adapter:conf>/opt/hedeby/grm_util/templates/ge-adapter/install_execd.conf</ge_adapter:conf>
        </ge_adapter:installTemplate>
        <ge_adapter:uninstallTemplate executeOn="exec_host">
            <ge_adapter:script>/opt/hedeby/util/templates/ge-adapter/uninstall_execd.sh</ge_adapter:script>
            <ge_adapter:conf>/opt/hedeby/grm_util/templates/ge-adapter/uninstall_execd.conf</ge_adapter:conf>
       </ge_adapter:uninstallTemplate>
       
    </ge:execd>
       
    <!-- Default execd install configuration -->
    <ge:execd mapping="default">
       <ge:localSpoolDir>/usr/local/sge/execd</ge:localSpoolDir>  <!-- (14) -->
    </ge:execd>

</componentConfig>


1

A GEServiceConfiguration is a special component configuration. The xsi:type attribute specifies the type. It must be ge:GEServiceConfig for a Grid Engine service.

2

The first section of a GEServiceConfig always defines the list of SLOs (for details see the section called “SLO definition”).

3

The <connection> defines the connection parameters to the Grid Engine instance. The following parameters are required:

clusterName

Name of the cluster. The cluster name for Grid Engine has been introduced with version 6.2. It defines a unique name for the cluster. This parameter is necessary for the execd installation/uninstallation.

root

Path to root directory of the Grid Engine instance (SGE_ROOT).

cell

Cell name of the Grid Engine instance (SGE_CELL).

masterPort

Port where qmaster is listening. Required for installing an execd on a managed host. (SGE_MASTER_PORT).

execdPort

Port where execd is listening. Required for installing an execd on a managed host. (SGE_EXECD_PORT).

jmxPort

The Grid Engine Adapter communicates with qmaster via JMX. The jmxPort defines the port on which qmaster's MBeanServer is listening.

username

Username for JMX authentication. Must be a valid Grid Engine admin user.

password

Password for JMX authentication. Storing the password inside the GEAdapter configuration is not advisable. If possible, a keystore should be used for authentication.

keystore

Defines the path to the keystore which contains the private key of the admin user. This credential will be used for JMX authentication against qmaster. The keystore can be created with the sge_ca script (included in Grid Engine 6.2 distribution, $SGE_ROOT/util/sgeCA/sge_ca). If the keystore is password protected the credentials can be specified in the password attribute of the connection element.

4

The Grid Engine Adapter periodically updates its SLOs. The sloUpdateInterval defines how often this update is executed. For each update a qstat is executed over the JMX connection.

The administrator has to specify the value and the unit of the update interval. Valid values for unit are seconds, minutes and hours.
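
Interval elements such as sloUpdateInterval carry a value and a unit. A minimal Python sketch of converting such a pair to seconds is shown below. The function name interval_seconds is hypothetical. Only the three valid units come from the text.

```python
# The three units the configuration accepts, expressed in seconds.
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600}


def interval_seconds(value, unit):
    """Convert a value/unit pair from the configuration into seconds."""
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unit must be one of {sorted(UNIT_SECONDS)}, got {unit!r}")
    return int(value) * UNIT_SECONDS[unit]
```

The setting unit="minutes" value="5" from the example therefore corresponds to 300 seconds between two SLO updates.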

5

Before GEAdapter uninstalls an execd it checks the active jobs on this host. Depending on the job suspend policy configuration it can migrate jobs to different hosts. GEAdapter supports the following jobs suspend methods:

reschedule_jobs_in_rerun_queue

Any job running in a queue or a queue instance with the rerun flag can be rescheduled. If the reschedule_jobs_in_rerun_queue is specified GEAdapter will reschedule such jobs.

reschedule_restartable_jobs

Any active job with the restart flag (see qstat -j) can be rescheduled. If reschedule_restartable_jobs is specified, GEAdapter will reschedule such jobs.

suspend_jobs_with_checkpoint

Any active job with a checkpointing environment can be suspended. Grid Engine will automatically reschedule the job on a different execd. If suspend_jobs_with_checkpoint is specified, GEAdapter will suspend jobs with a checkpointing environment.

If an active job cannot be rescheduled or suspended (suspending requires a checkpointing environment), GEAdapter waits for the end of the job. It waits until these jobs end or until the timeout of the jobSuspendPolicy occurs.
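
The decision that the job suspend policy describes can be summarized in a short Python sketch. It is not GEAdapter code: the job is represented by a plain dictionary with the hypothetical keys queue_rerun, restartable and checkpoint_env, and the order in which the methods are tried is an assumption.

```python
def plan_for_job(job, suspend_methods):
    """Return 'reschedule', 'suspend' or 'wait' for one active job.

    'suspend_methods' is the set of method names from the suspendMethods
    attribute; 'job' is a dict with hypothetical boolean-like flags.
    """
    if "reschedule_jobs_in_rerun_queue" in suspend_methods and job.get("queue_rerun"):
        return "reschedule"
    if "reschedule_restartable_jobs" in suspend_methods and job.get("restartable"):
        return "reschedule"
    if "suspend_jobs_with_checkpoint" in suspend_methods and job.get("checkpoint_env"):
        return "suspend"
    # No configured method applies: wait for the job to end or for the timeout.
    return "wait"
```

Since suspendMethods is a whitespace-separated attribute value, the set could be obtained with set(attribute_value.split()). A restartable job is only rescheduled if reschedule_restartable_jobs is among the configured methods. Otherwise the sketch falls through to waiting.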

6

If the connection to qmaster breaks down GEAdapter periodically tries to reconnect. The qmasterReconnectInterval parameter defines how often the reconnect is attempted.

The administrator has to specify the value and the unit of the reconnect interval. Valid values for unit are seconds, minutes and hours.

7

Whenever GEAdapter installs an execd on a managed host it waits for the signal from qmaster that the execd is now running. The execdStartupTimeout element defines how long GEAdapter waits for this signal. If this timeout occurs GEAdapter sets the resource into error state.

8

Whenever GEAdapter uninstalls an execd from a managed host it waits for the signal from qmaster that the execd has been stopped. The execdShutdownTimeout element defines how long GEAdapter waits for this signal. If this timeout occurs GEAdapter sets the resource into error state.

9

In the staticHosts list the administrator defines a list of hosts the Grid Engine Adapter will never give up.

10

Defines the parameters for installing/uninstalling an execd on a managed host. The following attributes are defined:

adminHost

The default value is false. If this attribute is set to true the host will become a Grid Engine admin host. If it is set to false the host will only be an admin host during the execution of the execd installation script.

submitHost

If this attribute is set to true the assigned host will become a Grid Engine submit host. The default value is false.

cleanupDefault

The default execd installation adds the new host on the Grid Engine side to the @allhosts host group and to the all.q cluster queue. If this parameter is set to true GEAdapter will remove this host from the @allhosts host group and the all.q cluster queue after uninstalling the execd. The default value for this attribute is false.

rcScript

If this parameter is set to true the boot time startup scripts for the execd will be installed/uninstalled for the host.

ignoreFQDN

If true, the execd will ignore the domain names during hostname resolution. If false, the execd will use the fully qualified domain name. The default value is

defaultDomain

Name of the default domain of the execd.

mapping

Name of the complex to resource property mapping to use (see the section called “Complex to Resource Property Mapping”).

11

A Grid Engine Adapter configuration can have several execd definitions. The filter element defines which execd settings are used for a specific host. The value of this element must be a valid resource property filter expression (the section called “Filtering”). The order of the execd elements matters. The first element whose filter expression matches the resource properties of a host resource is used for the execd installation/uninstallation.

An execd element without a filter element matches every host resource. Each GEAdapter should have such an element with the default settings at the end of the configuration.
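
The first-match rule can be sketched in Python as follows. The function select_execd and the representation of execd elements as (filter, settings) pairs are assumptions of this example; real filters are expressions in the filter language, modelled here as plain Python predicates.

```python
def select_execd(execd_defs, resource_props):
    """Return the settings of the first execd definition whose filter matches.

    execd_defs is an ordered list of (filter, settings) pairs. A filter of
    None models an execd element without a filter element.
    """
    for resource_filter, settings in execd_defs:
        if resource_filter is None or resource_filter(resource_props):
            return settings
    return None  # no definition matched and no default element is configured


execd_defs = [
    (lambda p: p.get("operatingSystemName") == "Solaris", {"rcScript": True}),
    (None, {"rcScript": False}),  # default element, placed last
]

print(select_execd(execd_defs, {"operatingSystemName": "Solaris"}))  # {'rcScript': True}
print(select_execd(execd_defs, {"operatingSystemName": "Linux"}))    # {'rcScript': False}
```

If the default element were placed first, it would shadow every later definition, which is why the text above recommends placing it at the end.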

14

Defines the local spool directory used for the execd.

13

The default installation of an execd is done with the Grid Engine auto install feature. GEAdapter uses templates and replaces the placeholders in these templates with the configured values. The Hedeby distribution contains default templates. The administrator can override the paths to the templates to customize the installation/uninstallation (Section 2.4.3.3.2, “Execd installation/uninstallation”). Each template has a script and a configuration. In both files the configured values are replaced. The following templates are available:

installTemplate

This template is used for creating the script for installing an execd (default script is <dist>/grm_util/templates/ge-adapter/install_execd.sh, default configuration template is <dist>/grm_util/templates/ge-adapter/install_execd.conf).

uninstallTemplate

This template is used for creating the script for uninstalling an execd (default script is <dist>/grm_util/templates/ge-adapter/uninstall_execd.sh.template, default configuration template is <dist>/grm_util/templates/ge-adapter/uninstall_execd.conf).

The executeOn attribute of a template defines where the scripts generated from the template are executed. If executeOn is set to qmaster_host, the script is executed by the executor component running on the Grid Engine qmaster host. If executeOn is set to exec_host, the script is executed by the executor of the execd host (managed host).

In the above example the GEAdapter will create the files install_execd.sh and install_execd.conf in a temporary directory on the managed host. The executor will execute the following commands on the managed host:

# cd <tmp directory>
# ./install_execd.sh install_execd.conf
                   

12

The maxRuntime element limits the maximum runtime of the execd installation or uninstallation. If the script does not finish within this time it will be interrupted.

SLO definition
<?xml version="1.0" encoding="UTF-8"?>

<componentConfig ...>
    <slos>
        <slo name="<name of SLO>"1
             xsi:type="<type of SLO>"2
             urgency="13"3
             ...4>
             <resourceFilter>5
             <![CDATA[
                 hardwareCpuArchitecture = "amd64" & operatingSystemName = "Solaris"
             ]]>
             </resourceFilter>
             ...6
        </slo>
    </slos>
</componentConfig>
1

Each SLO has a service wide unique name.

2

Concrete type of the SLO. Valid values for a Grid Engine Adapter are MinResourceSLOConfig, FixedUsageSLOConfig and MaxPendingJobsSLOConfig.

4

Depending on the SLO type additional attributes can follow.

3

Urgency of this SLO. The urgency is a number between 0 and 99. Each resource which is required by this SLO has a usage which is greater than or equal to the urgency of the SLO (see also Section 1.1.2.7, “Resource Usage”).

5

The request element defines the resource which is requested when the SLO is not met. The request defines the properties a resource must have to fulfill this SLO.

Depending on the SLO type additional elements can follow.

MinResourceSLO

The MinResourceSLO counts the number of assigned resources of a service which match a resource filter. This SLO ensures that the number of these resources is not less than the defined minimum.

It further defines that all resources required for this SLO have a usage which is equal to or higher than the urgency of the SLO.
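
The counting rule can be illustrated with a short Python sketch. The function min_resource_need, the predicate is_amd64_solaris and the sample data are hypothetical; the sketch only shows how the number of missing resources follows from the filter and the minimum, not how Hedeby implements the SLO.

```python
def min_resource_need(assigned, resource_filter, minimum):
    """Return how many more matching resources are needed (0 if the SLO is met)."""
    matching = sum(1 for props in assigned if resource_filter(props))
    return max(0, minimum - matching)


def is_amd64_solaris(props):
    # Python stand-in for the filter
    # hardwareCpuArchitecture = "amd64" & operatingSystemName = "Solaris"
    return (props.get("hardwareCpuArchitecture") == "amd64"
            and props.get("operatingSystemName") == "Solaris")


assigned = [
    {"hardwareCpuArchitecture": "amd64", "operatingSystemName": "Solaris"},
    {"hardwareCpuArchitecture": "sparcv9", "operatingSystemName": "Solaris"},
]
print(min_resource_need(assigned, is_amd64_solaris, minimum=20))  # 19
```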

<?xml version="1.0" encoding="UTF-8"?>

<componentConfig ...>
    <slos>
        <slo name="minHostResourceSample"
             xsi:type="MinResourceSLOConfig"1
             urgency="13"
             min="20"2>
             <request>...</request>
             <resourceFilter>3
             <![CDATA[
                 hardwareCpuArchitecture = "amd64" & 
                 operatingSystemName ="Solaris"
             ]]>
             </resourceFilter>
        </slo>
    </slos>
</componentConfig>
1

The concrete type is MinResourceSLOConfig.

2

The min attribute defines the minimum number of matching resources required for the SLO to be met.

3

The MinResourceSLO allows a resourceFilter. This filter defines the set of resources which are considered by this SLO. The filter is applied against the properties of the assigned resources (assigned to the service, see also the section called “Filtering”).

PermanentRequestSLO

The PermanentRequestSLO permanently sends requests, that is, needs for a resource. This SLO is mainly used in spare pool components to gather unused resources. Typically it has the lowest urgency set.

It is also possible to define an urgency and resource type that is wanted by the service which is using this SLO.

    <?xml version="1.0" encoding="UTF-8"?>
    
    <common:slos>
        <common:slo xsi:type="common:PermanentRequestSLOConfig"1
                    urgency="1" 2
                    name="PermanentRequestSLO">
            <common:request>type = "host"</common:request>3
        </common:slo>
    </common:slos>
1

The concrete type is PermanentRequestSLOConfig.

2

The urgency parameter defines the urgency for the needs sent by this service.

3

This element defines the properties that describe a need which is sent by this service. It could be the resource type host, for example.

FixedUsageSLO

The FixedUsageSLO is a special SLO. It never produces any need. It only ensures that all assigned resources of the service which match the resourceFilter of this SLO have a usage which is equal to or higher than the urgency of the SLO.

<?xml version="1.0" encoding="UTF-8"?>

<componentConfig ...>
    <slos>
        <slo name="fixUsageSample"
             xsi:type="FixedUsageSLOConfig"1
             urgency="13">
             <resourceFilter>2
             <![CDATA[
                 hardwareCpuArchitecture = "amd64" & 
                 operatingSystemName = "Solaris"
             ]]>
             </resourceFilter>
        </slo>
        ...
    </slos>
    ...
</componentConfig>
1

The concrete type is FixedUsageSLOConfig.

2

All resources which match this resourceFilter are considered by the SLO (see the section called “Filtering”).

MaxPendingJobsSLO

The MaxPendingJobsSLO counts the number of pending jobs of a Grid Engine instance which match a job filter. If the number of jobs is higher than a maximum value a need is produced.

Each host resource which matches the request filter and which runs jobs matching the job filter will have at least the usage of the MaxPendingJobsSLO.
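
As a minimal sketch of the counting rule, the following Python code decides whether a need is produced. The function produces_need, the predicate wants_license_1 and the sample job list are hypothetical; pending jobs are modelled as dictionaries of their hard resource requests.

```python
def produces_need(pending_jobs, job_filter, maximum):
    """Return True if more than `maximum` pending jobs match the job filter."""
    matching = sum(1 for hard_requests in pending_jobs if job_filter(hard_requests))
    return matching > maximum


def wants_license_1(hard_requests):
    # Python stand-in for the job filter license_1 = "1"
    return hard_requests.get("license_1") == "1"


# 101 pending jobs request license_1, 50 others do not.
pending = [{"license_1": "1"}] * 101 + [{"arch": "sol-amd64"}] * 50
print(produces_need(pending, wants_license_1, maximum=100))  # True
```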

<?xml version="1.0" encoding="UTF-8"?>

<componentConfig xmlns:ge="http://hedeby.sunsource.net/hedeby-gridengine-adapter"
    ...>
    <slos>
        <slo name="maxPendingJobs"
             xsi:type="ge:MaxPendingJobsSLOConfig"1
             urgency="13"
             max="100"2>
             <resourceFilter>3
             <![CDATA[
                 hardwareCpuArchitecture="amd64" & 
                 operatingSystemName="Solaris"
             ]]>
             </resourceFilter>
             <ge:jobFilter> license_1 = "1"</ge:jobFilter>4
        </slo>
        ...
    </slos>
    ...
</componentConfig>
1

The concrete type is ge:MaxPendingJobsSLOConfig. The configuration of this SLO is defined in a different xml namespace (xmlns:ge="http://hedeby.sunsource.net/hedeby-gridengine-adapter"). The type of the configuration must have a namespace prefix. All elements which are specific to the MaxPendingJobsSLO must also have the prefix.

3

If the MaxPendingJobsSLO detects that the cluster has more pending jobs than defined in the max attribute it produces a need. The properties of the needed resources are defined in the request element.

4

All pending jobs which match this jobFilter are considered by the SLO (see the section called “Filtering”). The jobFilter allows filtering of pending jobs according to their hard resource requests.

Filtering

The configuration of the Hedeby components allows the definition of filters in several places. All these filters use the same filter language. The following listing shows the generic definition of a filter in Backus–Naur form (BNF):


   filter:         orExpr <EOF>
   orExpr:         andExpr ("|" andExpr)*
   andExpr:        expr ("&" expr)*
   expr:           "(" orExpr ")" | "!" orExpr | booleanExpr
   booleanExpr:    compareExpr | matchExpr
   compareExpr:    value ( "<"|"<="|"="|"!="|">="|">" ) value
   matchExpr:      value "matches" string_literal
   value:          identifier | constant   
   constant:       int_literal | float_literal | string_literal | bool_literal | null_literal
   identifier:     ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9","."] )*  
   int_literal:    integer literal (e.g. 10, 1G, 1g, xFF)
   float_literal:  floating point literal as in java (e.g. 12.0E3G)
   string_literal: string literal as in java (e.g. "a")
   bool_literal:   "true" | "false"
   null_literal:   "null"
    

The basic expression of the filter language is the compareExpr. It compares two operands based on an operator (<, <=, =, !=, >=, >) and the result is the boolean value true or false. The two operands are defined by the value rule. An operand can be a constant or the value of a variable. The name of the variable is the identifier. Constant values can be:

Constant                       Description
string_literal                 Sequence of characters enclosed by " or ' ('aaaa', "bbbb"). The ' and " can be escaped with a backslash ("aaaa\"dd").
bool_literal                   true or false
null_literal                   null, can be used to check whether a variable has a value (sample_var = null)
float_literal or int_literal   0, 0.1, -10.0, 1E-7 (scientific format)

Additionally the number can have the following multiplier suffixes:

Multiplier   Description
G            Decimal giga, the value is multiplied by 10^9
M            Decimal mega, the value is multiplied by 10^6
K            Decimal kilo, the value is multiplied by 10^3
g            Binary giga, the value is multiplied by 2^30
m            Binary mega, the value is multiplied by 2^20
k            Binary kilo, the value is multiplied by 2^10
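
The effect of the suffixes can be illustrated with a small Python sketch. The function parse_number is hypothetical and handles only decimal literals with an optional suffix; hexadecimal literals from the grammar are not covered.

```python
# Multiplier suffixes as listed in the table above.
MULTIPLIERS = {
    "K": 10**3, "M": 10**6, "G": 10**9,  # decimal
    "k": 2**10, "m": 2**20, "g": 2**30,  # binary
}


def parse_number(literal: str) -> float:
    """Parse a decimal literal with an optional multiplier suffix."""
    factor = 1
    if literal and literal[-1] in MULTIPLIERS:
        factor = MULTIPLIERS[literal[-1]]
        literal = literal[:-1]
    return float(literal) * factor


print(parse_number("1G"))  # 1000000000.0
print(parse_number("1g"))  # 1073741824.0
```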

The basic expressions can be combined with or/and expressions. The and expression binds more tightly than the or expression. The binding can be changed by setting brackets. The basic expressions can also be negated:

Expression                  Result
true & false | true         true
false | false & true        false
(true | false) & true       true
!(1 < 12.0G)                false
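
The binding rules can be reproduced with a small recursive descent evaluator. The Python sketch below is an illustration only: it accepts just the constants true and false, the operators |, & and !, and brackets. Compare and match expressions are left out, and it assumes that ! applies to the expression immediately following it.

```python
import re

TOKEN = re.compile(r"\s*(true|false|[|&!()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def or_expr():  # orExpr: andExpr ("|" andExpr)*
        value = and_expr()
        while peek() == "|":
            take()
            rhs = and_expr()
            value = value or rhs
        return value

    def and_expr():  # andExpr: expr ("&" expr)*
        value = expr()
        while peek() == "&":
            take()
            rhs = expr()
            value = value and rhs
        return value

    def expr():
        token = take()
        if token == "(":
            value = or_expr()
            if take() != ")":
                raise ValueError("missing closing bracket")
            return value
        if token == "!":
            return not expr()
        if token in ("true", "false"):
            return token == "true"
        raise ValueError(f"unexpected token {token!r}")

    result = or_expr()
    if peek() is not None:
        raise ValueError(f"trailing token {peek()!r}")
    return result


print(evaluate("true & false | true"))    # True
print(evaluate("false | false & true"))   # False
print(evaluate("(true | false) & true"))  # True
```

Because or_expr calls and_expr for each of its operands, every & group is evaluated before the surrounding | is applied, which is exactly the higher binding of the and expression.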

In line with the constants, the filter language knows four different data types (string, boolean, number and null). If a compare expression has to compare values of different types it tries to convert them according to the following rules:

Operand 1   Operand 2   Description
string      string      No conversion necessary. A case sensitive string comparison is done.
string      boolean     The string is converted into the boolean value true if its content is "true" or "TRUE". Otherwise the string is converted into the boolean value false.
string      number      If the content of the string is a valid numeric literal it is converted into a number. If it does not contain a valid numeric literal only a comparison with the != operator has the result true ("a" = 10 => false, "a" != 10 => true, "a" < 10 => false, "a" > 10 => false).
string      null        Only the != operator returns true ("a" = null => false, "a" != null => true, "a" < null => false, "a" > null => false).
boolean     boolean     true = true => true, true = false => false, true != true => false, true != false => true, true > false => true, true < false => false
boolean     number      Only the != operator returns true (true = 1.0 => false, false = 1 => false, true != 1 => true, false != 1 => true, true > 1 => false, false > 1 => false, true < 1 => false, false < 1 => false).
boolean     null        Only the != operator returns true (true = null => false, false = null => false, true != null => true, false != null => true, true > null => false, false > null => false, true < null => false, false < null => false).
number      null        Only the != operator returns true (1 = null => false, 1 != null => true, 1 > null => false, 1 < null => false, ...).
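
The conversion rules can be summarized in a short Python sketch. The function compare and its helpers are hypothetical and operate on Python values (str, bool, int/float, None) instead of parsed filter constants. Two points are assumptions of this sketch because the table does not cover them: numeric strings are parsed with Python's float (no multiplier suffixes), and null compared with null is treated as equal.

```python
import operator

OPERATORS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}


def kind(value):
    """Classify a Python value as one of the four filter data types."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be tested before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def to_number(text):
    """Return the numeric value of text, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def compare(left, op, right):
    kinds = {kind(left), kind(right)}
    if len(kinds) == 2:
        if kinds == {"string", "boolean"}:
            left, right = [v in ("true", "TRUE") if kind(v) == "string" else v
                           for v in (left, right)]
        elif kinds == {"string", "number"}:
            left, right = [to_number(v) if kind(v) == "string" else v
                           for v in (left, right)]
            if left is None or right is None:
                return op == "!="  # non-numeric string against a number
        else:
            return op == "!="      # all remaining mixed pairs
    elif kinds == {"null"}:
        return op in ("=", "<=", ">=")  # assumption: null equals null
    return OPERATORS[op](left, right)


print(compare("a", "=", 10))     # False
print(compare("a", "!=", 10))    # True
print(compare("10", "=", 10))    # True
print(compare(True, ">", False)) # True
```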

The matchExpr can be used to match the string representation of a value against a regular expression:

"a" matches "[a-z]*" => true

The filter expression uses the standard Java implementation of regular expressions. For more details please have a look at http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html

Example 2.10. Job filters for the MaxPendingJobsSLO

The MaxPendingJobsSLO produces needs if the number of pending jobs in a Grid Engine cluster exceeds a certain number. With a filter expression the administrator can configure which jobs are considered by the SLO. The variables in the filter expressions (identifiers) can refer to all hard resource requests of the jobs. In the following example job 22377 is pending in the Grid Engine cluster:

% qstat -j 22377
==============================================================
job_number:                 22377
exec_file:                  job_scripts/8
...
hard resource_list:         arch=sol-amd64, num_proc=2, lic=1
...
               

Job 22377 would be considered if the MaxPendingJobsSLO has the following job filter:

                   
arch = "sol-amd64" & lic = "1"

               

The above filter will match if a Grid Engine job is submitted with qsub -l arch=sol-amd64 -l lic=1.

                   
arch matches "sol-.*"

               

With matches, a regular expression can be used for filtering. The above filter matches all jobs with a hard request for the complex arch whose value starts with sol-.
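
The behaviour of the matches filter can be approximated in Python. This sketch assumes that the whole value has to match the pattern (as with Java's Matcher.matches) and uses Python's re module, whose syntax is close to but not identical with java.util.regex. The function name and the sample values are illustrative only.

```python
import re


def matches(value, pattern):
    """Emulate `value matches "pattern"` as a full match on the string form."""
    return re.fullmatch(pattern, str(value)) is not None


# Illustrative hard resource requests of a pending job.
hard_requests = {"arch": "sol-amd64", "num_proc": "2", "lic": "1"}

print(matches(hard_requests["arch"], "sol-.*"))  # True
print(matches("lx24-amd64", "sol-.*"))           # False
```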


Example 2.11. Resource Request Filter for SLOs

If an SLO has a need it sends a request to the Resource Provider. This request contains a request filter which is used to find matching resources. The request filter has access to any resource property of the resources.

                   
(hardwareCpuArchitecture = "amd64" & hardwareCpuCount >= 2)  |
(hardwareCpuArchitecture = "sol-sparc64" & hardwareCpuCount >= 4)

               

The above resource filter matches all amd64 resources with at least 2 CPUs and all sol-sparc64 resources with at least 4 CPUs.

                   
state != "ERROR"
                   
               

The last example shows that the request filter can also filter resources according to their state. Only resources which are not in the error state will match.


Complex to Resource Property Mapping

GEAdapter automatically updates the properties of the assigned host resources. With the "Complex to Resource Property Mapping" the administrator can define which complex values are used. After the installation of the first Grid Engine Service the default mapping is installed. It can be displayed with the sdmadm show_ge_complex_mapping command.

For a complex mapping the administrator has to define the name of the complex, the value of the complex and the list of resource properties which will be set.

<ge:mapping>
    <ge:resource>
        <source name="arch">sol-sparc64</source>1
        <target>
            <property name="hardwareCpuArchitecture">sparcv9</property>
            <property name="operatingSystemName">Solaris</property>
        </target>
    </ge:resource>
    <ge:resource>
        <source name="num_proc">*</source>2
        <target>
            <property name="hardwareCpuCount">$VALUE</property>
        </target>
    </ge:resource>
</ge:mapping>

1

Each host which has the arch complex set to sol-sparc64 is mapped into a host resource with the resource properties hardwareCpuArchitecture=sparcv9, operatingSystemName=Solaris.

2

If a host has defined the complex num_proc the value of this complex is transformed into the resource property hardwareCpuCount. The value of the resource property is the same as the value of the complex.
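
How such a mapping is applied can be sketched in Python. The data structure MAPPING and the function map_complexes are hypothetical; the sketch assumes that * matches any complex value and that $VALUE is replaced by the value of the complex, as described in the two callouts above.

```python
# Each entry: (complex name, expected complex value or "*", target properties).
MAPPING = [
    ("arch", "sol-sparc64", {"hardwareCpuArchitecture": "sparcv9",
                             "operatingSystemName": "Solaris"}),
    ("num_proc", "*", {"hardwareCpuCount": "$VALUE"}),
]


def map_complexes(complexes):
    """Translate Grid Engine complex values into resource properties."""
    props = {}
    for name, expected, targets in MAPPING:
        if name not in complexes:
            continue
        value = complexes[name]
        if expected != "*" and expected != value:
            continue
        for prop, template in targets.items():
            props[prop] = template.replace("$VALUE", value)
    return props


print(map_complexes({"arch": "sol-sparc64", "num_proc": "4"}))
# {'hardwareCpuArchitecture': 'sparcv9', 'operatingSystemName': 'Solaris', 'hardwareCpuCount': '4'}
```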

2.4.3.3.2. Execd installation/uninstallation

Whenever a resource is assigned to a Grid Engine service the Grid Engine Adapter tries to install an execd on the managed host. Whenever a resource is removed from the service it uninstalls the execd from the managed host. This section describes how the installation and uninstallation are done.

By default GEAdapter uses the Grid Engine auto installation feature to perform an execd installation/uninstallation. The installation is done by executing an install script with a configuration file on the executor of the managed host.

GEAdapter expects the following exit values from the install/uninstall scripts:

0

Execd daemon has been successfully installed/uninstalled.

1

Installation/uninstallation was not successful and the host has been modified. In this case the host resource will be set into error state. The administrator has to clean up the host. It will be reused after a reset (sdmadm reset_resource).

2

Installation was not successful. However the host has not been modified. In this case the GEAdapter will reject the resource. This means it tells the Resource Provider that it cannot use the resource. The Resource Provider will assign the resource to a different service.
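
The exit value contract can be summarized in a few lines of Python. The function name and the returned labels are hypothetical; the treatment of exit values other than 0, 1 and 2 is not specified above, so this sketch simply raises an error for them.

```python
def outcome_for_exit_value(exit_value: int) -> str:
    """Map the exit value of an install/uninstall script to the reaction."""
    if exit_value == 0:
        return "ok"      # execd installed/uninstalled successfully
    if exit_value == 1:
        return "error"   # host modified: resource goes into error state
    if exit_value == 2:
        return "reject"  # host untouched: resource is rejected
    # Not covered by the documentation; assumption of this sketch.
    raise ValueError(f"unexpected exit value {exit_value}")


print(outcome_for_exit_value(2))  # reject
```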

The installation script and the configuration file are generated from templates. The paths to the templates can be defined in the execd elements of the GEAdapter configuration (see also Example 2.9, “Example for Grid Engine service configuration”). GEAdapter replaces the placeholders in these templates with the installation settings. The following placeholders are used:

@@@SGE_CLUSTER_NAME@@@

Name of the Grid Engine cluster (content of the $SGE_ROOT/$SGE_CELL/common/cluster_name file).

@@@SGE_ROOT@@@

The Grid Engine root directory ($SGE_ROOT).

@@@CELL_NAME@@@

Name of the Grid Engine cell ($SGE_CELL)

@@@EXEC_HOST@@@

Name of the host where the execd will be installed.

@@@SGE_EXECD_PORT@@@

Port of the execd daemon of the cluster

@@@SGE_QMASTER_PORT@@@

Port where qmaster is listening for incoming requests.

@@@EXEC_HOST_LIST@@@

Contains the name of the execd which will be installed (only set for installation).

@@@EXEC_HOST_LIST_RM@@@

Contains the name of the execd which will be uninstalled (only set for uninstallation).

@@@DEFAULT_DOMAIN@@@

Name of the default domain of the cluster (needed for hostname resolution).

@@@HOSTNAME_RESOLVING@@@

If set to false, hostnames will be resolved by the fully qualified host name.

@@@ADD_TO_RC@@@

Should the boot time startup scripts be installed (only set for installation, this does not include SMF support).

@@@REMOVE_RC@@@

If this value is set to true the boot time startup script will be removed (only set for uninstallation).

@@@SUBMIT_HOST_LIST@@@

If the execd should become a submit host the hostname is included in the @@@SUBMIT_HOST_LIST@@@.

@@@EXECD_SPOOL_DIR_LOCAL@@@

Contains the path of the local spool directory for an execd.
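
The placeholder replacement can be pictured with the following Python sketch. The function render, the sample template text and the sample values are hypothetical; in particular, failing on a placeholder without a configured value is an assumption of this example, not documented GEAdapter behaviour.

```python
import re

# Placeholders have the form @@@NAME@@@.
PLACEHOLDER = re.compile(r"@@@([A-Z_]+)@@@")


def render(template: str, values: dict) -> str:
    """Replace every @@@NAME@@@ placeholder with its configured value."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value configured for placeholder {name}")
        return values[name]
    return PLACEHOLDER.sub(replace, template)


template = 'SGE_ROOT="@@@SGE_ROOT@@@"\nSGE_CELL="@@@CELL_NAME@@@"\n'
print(render(template, {"SGE_ROOT": "/opt/sge", "CELL_NAME": "default"}))
```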

Default install template
# Directory the script is started from. The generated
# install_execd.conf is expected in the same directory.
BASEDIR=`pwd`
SGE_ROOT="@@@SGE_ROOT@@@"

# All checks below exit with 2: the host has not been modified yet.
if [ ! -d "$SGE_ROOT" ]; then
    echo "SGE_ROOT directory $SGE_ROOT does not exist"
    exit 2
fi
if [ ! -f "$SGE_ROOT/inst_sge" ]; then
    echo "inst_sge script in directory $SGE_ROOT not found"
    exit 2
fi

if [ ! -x "$SGE_ROOT/inst_sge" ]; then
    echo "inst_sge script in directory $SGE_ROOT is not executable"
    exit 2
fi

if [ ! -f "$BASEDIR/install_execd.conf" ]; then
    echo "auto config file $BASEDIR/install_execd.conf not found"
    exit 2
fi

# Run the Grid Engine auto installation and pass on its exit value.
cd "$SGE_ROOT"
./inst_sge -x -noremote -auto "$BASEDIR/install_execd.conf"
res=$?
exit $res

2.4.3.4. Executors

2.4.3.4.1. Overview

A general overview of the Executor component can be found in Section 1.2.1.4, “Executor” and Section 1.1.5, “Executor”.

2.4.3.4.2. Definition of Executor in Hedeby System

The general configuration of the Executor component is specified in the Java Virtual Machines section. For detailed information about specific fields see Section 2.4.2, “Java Virtual Machines”.

Example 2.12. Executor configuration

<?xml version="1.0" encoding="UTF-8"?>
<executor:executor 
                 idleTimeout="60"1
                 maxPoolSize="10"2 
                 corePoolSize="3"3 
                 keepFiles="false"4
                 xmlns:executor="http://hedeby.sunsource.net/hedeby-executor"
                 xmlns:reporter="http://hedeby.sunsource.net/hedeby-reporter"
                 xmlns:security="http://hedeby.sunsource.net/hedeby-security"
                 xmlns:resource_provider="http://hedeby.sunsource.net/hedeby-resource-provider"
                 xmlns:common="http://hedeby.sunsource.net/hedeby-common"
                 xmlns:ge_adapter="http://hedeby.sunsource.net/hedeby-gridengine-adapter">
   <executor:maxCommandRuntime unit="minutes" value="10"/>5
</executor:executor>

1

The executor manages a thread pool. Each thread executes and observes a command. The idleTimeout defines how long a thread is kept alive in the thread pool if it is idle.

2

Maximum number of threads in the executor thread pool. This parameter limits the number of concurrently executed commands.

3

The minimum number of threads in the thread pool. This number of threads is always kept alive, even if they are idle.

4

If the keepFiles flag is set the executor does not delete the temporary files of the executed commands. This is helpful for debugging purposes.

5

The maxCommandRuntime element limits the maximum runtime of a command executed by this executor. If the runtime is exceeded the executor will interrupt the command. The unit attribute can have the values seconds, minutes or hours. The value attribute specifies the time interval in the given unit. The default value is 10 minutes.
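
The effect of a maximum command runtime can be illustrated with Python's subprocess module. This is not the executor's implementation; the function run_with_max_runtime is hypothetical and only shows the idea of interrupting a command that exceeds a value/unit time limit.

```python
import subprocess
import sys

UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600}


def run_with_max_runtime(command, value, unit):
    """Run command; return its exit code, or None if it was interrupted."""
    try:
        completed = subprocess.run(command, timeout=value * UNIT_SECONDS[unit])
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return None
    return completed.returncode


# A command that finishes immediately, run with a limit of 10 seconds.
print(run_with_max_runtime([sys.executable, "-c", "pass"], 10, "seconds"))  # 0
```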

Note

To be fully operative, the Executor has to run in a JVM that runs as the "root" user (uid 0). Otherwise user switching does not work. An Executor should be defined on each resource host.

Example 2.13. Typical executor component definition

    
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<common:global name="mySystem"
               xmlns:executor="http://hedeby.sunsource.net/hedeby-executor"
               xmlns:reporter="http://hedeby.sunsource.net/hedeby-reporter"
               xmlns:security="http://hedeby.sunsource.net/hedeby-security"
               xmlns:resource_provider="http://hedeby.sunsource.net/hedeby-resource-provider"
               xmlns:common="http://hedeby.sunsource.net/hedeby-common"
               xmlns:ge_adapter="http://hedeby.sunsource.net/hedeby-gridengine-adapter">
    <common:jvm port="0"
                user="root"
                name="executor_vm">
        <common:c