       Upgrading to Grid Engine 5.3/Installing a SGE(EE) 5.3 patch
       -----------------------------------------------------------

   All users who plan to make an upgrade from a previous version of
   CODINE/GRD/SGE or who install a Grid Engine 5.3 patch should carefully
   read this document.

   This document contains all information needed to successfully carry out
   an upgrade from a previous version of the product to Grid Engine 5.3
   (including Enterprise Edition). The update procedure does not support
   upgrades from CODINE 4.x or GRD 1.x.

   If you are going to install a Grid Engine 5.3 patch, please begin
   reading this document at section 1.5) "Installing a SGE(EE) 5.3 patch".


Content
-------

1) Introduction
   1.1) Terms used in this document
   1.2) Supported upgrade paths
   1.3) Estimated time for the upgrade procedure
   1.4) Upgrading from SGE 5.3beta2
   1.5) Installing a SGE(EE) 5.3 patch
2) Installation changes
   2.1) Name of daemons and commands
   2.2) Daemon startup scripts
   2.3) Environment variables
   2.4) Service name for communication daemon
   2.5) Other file name changes
   2.6) QSI
3) Upgrading to SGE(EE) 5.3
4) Cluster shutdown: No running daemons, no running or pending jobs
5) Backup of the old CODINE/GRD/SGE system
6) Deleting directories of the old CODINE/GRD/SGE system
   6.1) Do not delete old cluster configuration
   6.2) Check any references in configuration to old files
   6.3) Delete old distribution installed with 'pkgadd' on Solaris
   6.4) Delete old distribution installed from 'tar.gz' files
7) Unpacking the SGE distribution
   7.1) Prerequisites
   7.2) Installing SGE with 'pkgadd' on Solaris
   7.3) Installing SGE 'tar.gz' distribution
8) Rename the service codine_commd/grd_commd
9) Decide about using the 'adminuser' feature
10) Running the update script
   10.1) Login as user root or as 'adminuser'
   10.2) Set SGE_ROOT and environment variables
   10.3) Run the update script
11) Installing the new startup script on all execution hosts
12) Restart SGE
13) Upgrading from SGE(EE) 5.3beta2
14) Installing a SGE(EE) 5.3 patch 
15) Copyright


1) Introduction
---------------

   1.1 Terms used in this document
   -------------------------------

   The term "Grid Engine" refers to the Sun product "Sun ONE Grid Engine"
   and the software from the Grid Engine open source project.

   The term "Grid Engine, Enterprise Edition" refers to the Sun product "Sun
   ONE Grid Engine, Enterprise Edition" and the software from the Grid
   Engine open source project.

   SGE and SGEEE are used as abbreviations for "Grid Engine" and "Grid
   Engine, Enterprise Edition".

   This document usually refers to SGE(EE) 5.3. Version "5.3" always refers
   to the most recent patch release of this software. E.g. SGE(EE) 5.3p1 and
   SGE(EE) 5.3p2 are patch releases of SGE(EE) 5.3.

   A CDROM or a download of SGE(EE) 5.3 may automatically provide you with
   the most recent patch release of this software. See 1.5 for more
   information about SGE(EE) patch releases.

   1.2 Supported upgrade paths
   ---------------------------

   Upgrading to Grid Engine and Grid Engine, Enterprise Edition is supported
   through the script '$SGE_ROOT/util/sge_update.sh' of the Grid Engine 5.3
   distribution. You can upgrade from any version of

      CODINE 5.0.x
      CODINE 5.1.x
      GRD 5.0.x
      GRD 5.1.x
      Sun Grid Engine 5.2.x
      Grid Engine 5.3beta1
      Grid Engine 5.3beta2  (see special notes in 1.4))

   to this version of Grid Engine 5.3 (SGE 5.3) or Grid Engine Enterprise
   Edition 5.3 (SGEEE 5.3). You cannot "downgrade" from GRD to SGE 5.3.


   1.3 Estimated time for the upgrade procedure
   --------------------------------------------

   We estimate the following time for the various steps of the upgrade:

   Reading this document and preparing the upgrade:  0:30-2 hours
   Shutting down all SGE daemons, verification:      1-3 minutes per host
   Deleting old spool files:
      - spool directories on shared file system      5 minutes
      - spool directories installed locally          1-3 minutes per host
   Making a backup of the old configuration:         5-10 minutes
   Deleting the old version:                         5-10 minutes
   Installing the new version:                       5-15 minutes
   Running the upgrade procedure:                    5-10 minutes
   Updating the local startup scripts and starting
   the daemons:                                      1-3 minutes per host

   Depending on the complexity of your installation it may be necessary to
   carry out additional functional tests for the various SGE objects, like

        - tests of parallel environments
        - tests of checkpointing environments
        - load sensor scripts
        - cluster and queue configuration settings like
           prolog, epilog, terminate, suspend methods
        - command tests (qsub, qrsh, qlogin, qsh)

   If you modified original sample scripts and configuration examples and
   you are referencing these files located in the original distribution, or
   if you decide not to use the compatibility mode for environment variables
   (which is the recommended choice), these additional tests may take
   several hours to ensure the full functionality of Grid Engine 5.3.

   The upgrade procedure will be easier to carry out if you have access to
   user root (rsh or ssh) without providing a password to all your execution
   hosts (this does not necessarily need to be from your qmaster machine).
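   As a quick pre-check, a loop along the following lines can verify
   passwordless root access to each host. This is only a sketch: the host
   names in the example are hypothetical and the remote shell command (RSH)
   is an assumption you must adjust for your site.

```shell
# check_root_access: verify passwordless remote root access to each host.
# RSH is an assumption - e.g. "rsh -l root" or "ssh -o BatchMode=yes -l root".
RSH=${RSH:-"rsh -l root"}

check_root_access() {
    for h in "$@"; do
        # run a trivial command remotely; success means no password was asked
        if $RSH "$h" true >/dev/null 2>&1; then
            echo "$h: ok"
        else
            echo "$h: no passwordless root access"
        fi
    done
}

# Example (hypothetical host names):
# check_root_access exec01 exec02 exec03
```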


   1.4 Upgrading from SGE 5.3beta2
   -------------------------------

   If you are upgrading from SGE(EE) 5.3beta2 to SGE(EE) 5.3 and you are not
   changing the product mode from SGE --> SGEEE, you basically only need to
   change the binaries. See 13) for detailed directions.

   If you wish to upgrade from SGE 5.3beta2 to SGEEE 5.3 please follow the
   directions below.


   1.5 Installing a SGE(EE) 5.3 patch
   ----------------------------------

   If you already installed SGE(EE) 5.3 or a SGE(EE) 5.3 patch release
   (e.g. 5.3p1 etc.) and you want to upgrade to the newest patch level,
   please proceed directly to section 14) "Installing a SGE(EE) 5.3 patch".

   If you are upgrading from a previous version please make sure to

       - install the latest patch release if available

       - install SGE(EE) 5.3 and all patches for SGE(EE) 5.3 if a patch
         release is not available.

   A patch release of SGE(EE) is a release which contains the original
   version of SGE(EE) and all available patches. After installation of a
   patch release it is not necessary to install any further patches.

2) Installation changes compared to previous versions
-----------------------------------------------------

   Probably the biggest installation-related changes since versions 5.0-5.2
   are the names of the daemons, the environment variables and the name of
   the service for the communication daemon. There are no more differences
   between the names if the product is running as Grid Engine (SGE) or as
   Grid Engine Enterprise Edition (SGEEE). This change will make it much
   easier to upgrade an SGE cluster to SGEEE.

   2.1) Name of daemons and commands
   ---------------------------------

   The name of all Grid Engine daemons now begins with the "sge_" prefix:

      sge_commd
      sge_coshepherd
      sge_execd
      sge_qmaster
      sge_schedd
      sge_shadowd
      sge_shepherd
   
   The command "{cod|grd}commdcntl" has been renamed to 'sgecommdcntl'.

   It is not supported to rename any of the binaries of the distribution!

   If you use any scripts or tools in your SGE cluster which monitor Grid
   Engine daemons, you need to make sure to check for the new names.


   2.2) Daemon startup scripts
   ---------------------------

   The name of the startup script has changed. The system wide startup
   script in

      <codine_root>/<cell>/codine5
      <grd_root>/<cell>/grd5

   now has the name

       <sge_root>/<cell>/rcsge

   The per machine startup script, which is often installed in

       /etc/init.d/{codine5|grd5}

   and the symbolic link which is often installed in

       /etc/rc2.d/S95{codine5|grd5}

   is now called

       rcsge 

   and

       S95rcsge

   It is necessary to delete the old local startup scripts and the symbolic
   links and replace them with the new startup script.

   On Linux systems with the "insserv" binary (see section (11) below), the
   number for the "S95rcsge" script will be determined dynamically.
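   The manual replacement of the per-machine startup files can be done with
   commands along these lines. This is a sketch: the init and rc directory
   locations vary per operating system, so they are passed in as parameters
   here.

```shell
# install_rcsge: remove the old codine5/grd5 startup files and install the
# new rcsge script plus its run-level symbolic link.
install_rcsge() {
    sge_root=$1; init_dir=$2; rc_dir=$3
    # remove the old startup script and its symbolic link
    rm -f "$init_dir/codine5" "$init_dir/grd5"
    rm -f "$rc_dir/S95codine5" "$rc_dir/S95grd5"
    # install the new startup script and link it into the run level
    cp "$sge_root/default/common/rcsge" "$init_dir/rcsge" &&
    ln -s "$init_dir/rcsge" "$rc_dir/S95rcsge"
}

# Example (typical Solaris/Linux locations):
# install_rcsge /gridware/sge /etc/init.d /etc/rc2.d
```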
    

   2.3) Environment variables
   --------------------------

   The names of environment variables which in versions 5.0-5.2 begin with
   the CODINE_/COD_/GRD_ prefix now begin with the "SGE_" prefix. Here are
   a few examples:

       CODINE_ROOT/GRD_ROOT --> SGE_ROOT
       COD_CELL/GRD_CELL    --> SGE_CELL
       COD_O_HOME           --> SGE_O_HOME
       GRD_STDOUT_PATH      --> SGE_STDOUT_PATH

   SGE(EE) 5.3 supports a compatibility mode where the old names of
   variables still can be used. The upgrade script will ask if you want to
   set this mode.

   Only SGE(EE) 5.3 will support this mode. Future SGE(EE) versions may not
   support this compatibility mode anymore. 

   It is not recommended to use this compatibility mode. If the size of a
   typical user environment is already near its limit you may encounter
   problems with the proper setting of all environment variables when a job
   is started.
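   To judge whether you need the compatibility mode at all, you can sweep
   your job scripts for the old variable prefixes. A sketch (the directory
   to scan is an assumption; the command relies on 'grep -r'):

```shell
# find_old_vars: list files under a directory that still reference the old
# CODINE_/COD_/GRD_ environment variable prefixes.
find_old_vars() {
    grep -r -l -E 'CODINE_|COD_|GRD_' "$1" 2>/dev/null
}

# Example:
# find_old_vars /home/jobscripts
```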


   2.4) Service name for communication daemon
   ------------------------------------------

   The communication daemon (commd) service name has changed from

       codine_commd/grd_commd --> sge_commd

   The old name of the service is no longer supported.


   2.5) Other file name changes
   ----------------------------

   All files and manual pages which had the prefix codine_/cod_/grd_ are
   renamed and use the prefix "sge_".

   In <root_dir>/<cell>

      codine_aliases/grd_aliases    --> sge_aliases
      cod_request/grd_request       --> sge_request      

   In the user's home directory and the submit directory:

       .cod_request/.grd_request    --> .sge_request
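   The per-user files can be renamed with a small loop. A sketch that takes
   home directories as arguments (note that if both an old CODINE and an
   old GRD file exist in the same directory, the GRD file wins here):

```shell
# rename_requests: rename old .cod_request/.grd_request files to
# .sge_request in each directory given on the command line.
rename_requests() {
    for d in "$@"; do
        for f in "$d/.cod_request" "$d/.grd_request"; do
            if [ -f "$f" ]; then
                mv "$f" "$d/.sge_request"
            fi
        done
    done
}

# Example:
# rename_requests /home/user1 /home/user2
```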

   
   2.6) QSI
   --------

   The QSI (Queuing System Interface) is no longer part of the product.


3) Upgrading to SGE(EE) 5.3
---------------------------

   The upgrade procedure comprises the following steps:
   
         - shutting down the cluster                 (see (4))
         - backing up your old CODINE/GRD/SGE system (see (5))
         - deleting old files and directories        (see (6))
         - unpacking the distribution                (see (7))
         - renaming the service                      (see (8))
         - deciding about "admin_user"               (see (9))
         - running the update script                 (see (10))
         - installing the new startup script on
           every host                                (see (11))
         - restarting Grid Engine                    (see (12))

   It is recommended that, after making a backup of your old CODINE/GRD/SGE
   cluster, you delete all files of the old distribution. See (5) and (6)
   for more information.


4) Cluster shutdown: No running daemons, no running or pending jobs
-------------------------------------------------------------------

   Make sure to shutdown your cluster before upgrading. There must be no
   running or pending jobs at qmaster or at the execution hosts. You should
   also make sure that there are no more running communication daemons
   (cod_commd, grd_commd, sge_commd) or a running scheduler daemon
   (cod_schedd, grd_schedd, sge_schedd) on the qmaster host.

   To be absolutely safe, you can login to every host of your CODINE/GRD/SGE
   cluster and execute an appropriate 'ps' command. You might want to 'grep'
   for the string 'cod_', 'grd_' or 'sge_' in your 'ps' output to identify
   remaining CODINE/GRD/SGE processes (if you execute the commands described
   in the next section, all CODINE/GRD components should be gone).
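   The 'ps' check described above can be written as a small filter. A
   sketch; feed it the output of your system's 'ps -ef' (or 'ps aux'):

```shell
# find_leftovers: filter a process listing (on stdin) for remaining
# CODINE/GRD/SGE processes, ignoring the grep command itself.
find_leftovers() {
    grep -E 'cod_|grd_|sge_' | grep -v 'grep -E'
}

# Example:
# ps -ef | find_leftovers
```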

   You can shutdown your CODINE/GRD/SGE cluster with the following commands:

      # qconf -kej                 (kills all execution daemons and jobs)
      # qconf -ks                  (kills the scheduler daemon)
      # qconf -km                  (kills the master daemon)   
      # {cod|grd|sge}commdcntl -k  (kills the communication daemon)
      --> Execute the last command from every execution host and from
          the master host.

   To be absolutely sure that there are no old jobs in your configuration
   you should delete the execution daemon spool directories and the spooled
   jobs in the qmaster spool directory:

      # rm -rf <execd_spool_dir>/<hostname>
      # rm -rf <qmaster_spool_dir>/jobs
      # rm -rf <qmaster_spool_dir>/job_scripts
      # rm -rf <qmaster_spool_dir>/zombies


5) Backup of the old CODINE/GRD/SGE system
------------------------------------------

   Before beginning the upgrade procedure it is highly recommended to make
   a backup of your old CODINE/GRD/SGE system.

   To minimize the size of the backup you may safely delete:

      - the execd spool directories (no configuration is stored here) in:
                   
         <execd_spool>/<hostname>

      - old "messages" files of qmaster and scheduler:

         <qmaster_spool_dir>/messages
         <qmaster_spool_dir>/schedd/messages

   You can make the backup with the command:

      % tar cvf OLDSGE-BACKUP.tar <your_sge_root_dir>

   You should also create a separate backup of your CODINE/GRD/SGE
   configuration (assuming everything is installed in standard locations):

      % tar cvf OLDSGE-CONFIG.tar $CODINE_ROOT/default/common \
                                  $CODINE_ROOT/default/spool/qmaster


6) Deleting directories of the old CODINE/GRD/SGE system
--------------------------------------------------------

   6.1 Do not delete old cluster configuration
   -------------------------------------------

   You must not delete your old cluster configuration. Thus you must not
   delete your 'common' directory which is located in:

      <your_sge_root>/<cell>/common

   This directory is usually located in: 

       <your_sge_root>/default/common
   
   You also must not delete your qmaster spool directory (the path to your
   qmaster spool directory is defined in the global cluster configuration).
   Often the qmaster spool directory is located in:

       <your_sge_root>/default/spool/qmaster


   6.2 Check any references in configuration to old files
   ------------------------------------------------------

   If you made any local changes to files of the distribution, please make
   sure to make a separate backup of these files. Typically these might be
   files which were modified and are now referenced in your cluster
   configuration or are used by jobs from your users:

       - files in the 'mpi/', 'pvm/' or 'ckpt/' directories which are used
         by a parallel environment (PE) or checkpointing environment (CKPT)

       - wrapper commands in 'mpi/' used by batch job scripts

       - load sensor scripts in 'util/resources/loadsensors'

   
   6.3 Delete old distribution installed with 'pkgadd' on Solaris
   --------------------------------------------------------------

   If you installed SGE(EE) 5.2.x or SGE(EE) 5.3beta on Solaris with
   "pkgadd" it is typically safe to remove the following packages (not all
   of these packages are usually available on your system):

      SDRMdoc
      SDRMcomm
      SDRMsp32
      SDRMsp64
      SDRMsia
      SDRMEdoc
      SDRMEcomm
      SDRMEsp32
      SDRMEsp64

   with the "pkgrm" command. To see what SGE packages are installed you can
   run the command:

      # pkginfo | grep SDRM


   6.4 Delete old distribution installed from 'tar.gz' files
   ---------------------------------------------------------

   If you installed a previous version by unpacking the "tar" distribution
   it is typically safe to delete the following files and directories (not
   all of these files may exist in your CODINE/GRD/SGE root directory):

      # cd <your_sge_root>
      # rm -rf 3rd_party
      # rm -f README*
      # rm -f LICENSE*
      # rm -f UPGRADE*
      # rm -rf api
      # rm -rf bin
      # rm -rf catman
      # rm -rf ckpt
      # rm -rf doc
      # rm -rf examples
      # rm -f inst_codine
      # rm -f inst_grd
      # rm -f install_execd
      # rm -f install_qmaster
      # rm -rf locale
      # rm -rf man
      # rm -rf mpi
      # rm -rf pvm
      # rm -rf qmon
      # rm -rf qsi
      # rm -rf security
      # rm -rf util
      # rm -rf utilbin


7) Unpacking the SGE distribution
---------------------------------

   7.1 Prerequisites
   -----------------

   For unpacking the SGE distribution you need to login to the machine where
   user root has read/write permissions in the $SGE_ROOT directory. This is
   either your fileserver or a machine where the NFS mount point is
   configured appropriately. 

   Installation of the distribution needs to be done by user 'root'. The NFS
   clients must not mount the $SGE_ROOT directory with the '-nosuid' NFS
   mount option - otherwise the 'qrsh' command (and related commands like
   'qmake' and 'qtcsh') will not work. If you cannot mount $SGE_ROOT without
   the '-nosuid' you can configure the path to your "qrsh" command in the
   global and local cluster configuration.

   Apart from this, it is not necessary to allow read/write permissions for
   user root to install and run SGE successfully (see section (9) about the
   'adminuser' feature).
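   Whether 'nosuid' is in effect for the file system can be checked from
   the 'mount' output. A sketch; the output format of 'mount' differs per
   operating system, so the patterns may need adjusting for your platform:

```shell
# check_nosuid: scan a 'mount' listing (stdin) for a nosuid mount of the
# given mount point.
check_nosuid() {
    if grep "$1" | grep -q nosuid; then
        echo "WARNING: $1 is mounted nosuid - qrsh/qmake/qtcsh will not work"
    else
        echo "$1: mount options look ok"
    fi
}

# Example:
# mount | check_nosuid /gridware/sge
```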


   7.2) Installing SGE with 'pkgadd' on Solaris
   --------------------------------------------

   If you plan to install SGE for Solaris with 'pkgadd' please make sure to
   remove all previous 'SDRM*' packages or all directories of the
   distribution which are mentioned in the previous section.

   The default base directory ($SGE_ROOT) and default 'adminuser' has
   changed to:

      Base directory: /gridware/sge (or /gridware/sgeee)   
      Adminuser:      sgeadmin

   You should select your previous base directory and your previous
   'adminuser' name for the new installation.

   It is recommended to install SGE with 'pkgadd' on Solaris if it's
   technically possible (the file server is a Solaris machine or the file
   system is mounted by a Solaris machine). This will allow you to manage
   the software and patches with standard mechanisms provided by Sun. If
   you are going to install SGE for other binary architectures you can
   easily add and unpack the 'tar.gz' files for these architectures in
   your $SGE_ROOT directory.


   7.3) Installing SGE 'tar.gz' distribution
   -----------------------------------------

   If you downloaded the SGE distribution as 'tar.gz' files please login as
   user root to the machine where user root has appropriate file
   permissions and execute the following commands:

      # cd <your_sge_root>
      # umask 022
      # gzip -dc sge-<version>-common.tar.gz | tar xvpf -
      # gzip -dc sge-<version>-doc.tar.gz | tar xvpf -
      # gzip -dc sge-<version>-bin-<arch>.tar.gz | tar xvpf -

   Now you should set the file permissions:

      # cd <your_sge_root>
      # util/setfileperm.sh <adminuser> <unixgroup> <your_sge_root>

   where 

      <adminuser>     is the Unix user account under which SGE should
                      create its spool files (please see section (9) for
                      more information about the "adminuser" functionality
                      of SGE).

      <unixgroup>     is the Unix group name.

      <your_sge_root> is the absolute path where your $SGE_ROOT directory
                      is located.

   Example:

      # cd <your_sge_root>
      # util/setfileperm.sh sgeadmin adm `/bin/pwd`


8) Rename the service codine_commd/grd_commd
--------------------------------------------

   If you are using the TCP service 'codine_commd' or 'grd_commd' for
   defining the communication port in /etc/services or your NIS server you
   need to rename the service to 'sge_commd'.
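   For a local /etc/services file the rename is a one-line edit. A sketch
   that prints the rewritten file to stdout so you can inspect it before
   installing it (if the entry lives in NIS, change the NIS source map
   instead):

```shell
# rename_commd_service: rewrite codine_commd/grd_commd entries to sge_commd
# in a services file, writing the result to stdout.
rename_commd_service() {
    sed -e 's/^codine_commd/sge_commd/' -e 's/^grd_commd/sge_commd/' "$1"
}

# Example:
# rename_commd_service /etc/services > /tmp/services.new
# (inspect /tmp/services.new, then install it as /etc/services)
```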


9) Decide about using the 'adminuser' feature
---------------------------------------------

   Since CODINE/GRD 5.0, Grid Engine supports the so-called 'adminuser'
   feature.

   The purpose of this feature is to allow starting and running Grid Engine
   daemons on NFS clients where user root has no read/write permissions.
   For security reasons many sites do not want to export their NFS file
   systems to the NFS clients with read/write permissions for user root.
   If you want to configure Grid Engine to use the 'adminuser', you should
   now carry out the following steps:

   a) create the 'adminuser' account on all your Grid Engine hosts (qmaster
      and execution hosts) or create the 'adminuser' in your NIS passwd
      database.

      The recommended username is 'sgeadmin'. The password for the
      'adminuser' should have the same protection as the password for user
      root. The password for the SGE 'adminuser' must not be given to any
      users who are not entitled to get the 'root' password on your systems.

   b) login as user root on your NFS file server or on a NFS client where
      user root has read/write permissions in the $SGE_ROOT directory

   c) if you did not install the Grid Engine 5.3 distribution with 'pkgadd'
      on Solaris please run the script:

         $SGE_ROOT/util/setfileperm.sh <adminuser> <group> <your_sge_root>

      to set the file permissions of your Grid Engine distribution

   d) edit your global cluster configuration file

        $SGE_ROOT/<cell>/common/configuration

     and edit the configuration entry

        'admin_user'

     and enter the name of the 'adminuser', e.g.

        admin_user   sgeadmin

   e) change (recursively) the owner of the following directories

         $SGE_ROOT/<cell>/common/
         <qmaster_spool_directory>
         <execd_spool_directory>/hostname

      to the 'adminuser', e.g.

         # chown -R sgeadmin $SGE_ROOT/default/common
         # chown -R sgeadmin $SGE_ROOT/default/spool/*

      If the spool directories of your execution daemons are not installed
      in the default location $SGE_ROOT/default/spool/<hostname> or if they
      are installed on a local file system you need to login to every
      execution host and change the owner of the execution daemon spool
      directory.
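      For local spool directories, the per-host 'chown' can be scripted. A
      sketch where the remote shell command, the admin user name and the
      spool base directory are assumptions to adjust for your site:

```shell
# chown_spools: change spool directory ownership on each execution host.
# REMOTE is an assumption - e.g. "rsh -l root" or an equivalent ssh command.
REMOTE=${REMOTE:-"rsh -l root"}

chown_spools() {
    adminuser=$1; spool=$2; shift 2
    for h in "$@"; do
        # change the owner of the host-local spool directory remotely
        $REMOTE "$h" "chown -R $adminuser $spool/$h"
    done
}

# Example (hypothetical hosts and spool path):
# chown_spools sgeadmin /var/spool/sge exec01 exec02
```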


10) Running the update script
----------------------------- 

   10.1 Login as user root or as 'adminuser'
   ----------------------------------------

   If you are using the 'adminuser' feature, please login as the
   'adminuser' before running the update script.


   10.2 Set SGE_ROOT and environment variables
   -------------------------------------------

   Please now set your SGE_ROOT variable and your SGE_CELL variable if
   applicable (if you used the default cell name 'default' this is not
   necessary) and start the upgrade script. Make sure that the setting of
   $SGE_ROOT does not contain any automounter prefixes like '/tmp_mnt' and
   that $SGE_ROOT is set to a path under which this directory can be
   accessed from all your execution and submit hosts. The variable
   COMMD_PORT should be set if you are not using the service 'sge_commd'.
   If you are using the COMMD_PORT variable, please make sure to use an
   unused reserved port number.

      # SGE_ROOT=<your_sge_root>; export SGE_ROOT  (mandatory)
      # SGE_CELL=<yourcell>; export SGE_CELL      (depends on installation)
      # COMMD_PORT=<portnumber>                   (depends on installation)


   10.3 Run the update script
   --------------------------

      # cd $SGE_ROOT
      # util/sge_update.sh
           or
      # $SGE_ROOT/utilbin/<arch>/util/sge_update.sh

   The script will ask you about your current version of CODINE/GRD/SGE
   and if you want to upgrade to Grid Engine (SGE) or Grid Engine
   Enterprise Edition (SGEEE).

   The script will modify and delete the following files:

   In $SGE_ROOT/<cell>/common
   --------------------------

      Delete: 
         codine5|grd5
         history/
         license
         qsi/
         statistics

      Rename:
         codine_aliases|grd_aliases --> sge_aliases
         cod_request                --> sge_request (if exists)
         
      Update:
         configuration
         product_mode
         settings.csh
         settings.sh
  
      Create:
         rcsge   (the new startup script which replaces codine5|grd5)


   In <qmaster_spool_directory>
   ----------------------------
      Delete:
         messages
         jobs/
         job_scripts/
         zombies/
         schedd/messages
         
      Update:
         complexes/queue
         exec_hosts/
         schedd/

   A copy of your changed files and directories will be saved in 

      <sge_root>/<cell>/common/<YYYYmmdd-hh:mm:ss>


11) Installing the new startup script on all execution hosts
------------------------------------------------------------

   You should now remove the old per machine startup script and install the
   new script. Depending on your operating system the startup script is
   installed in /etc/init.d/ or /sbin/init.d/ with a symbolic link in the
   corresponding 'rc2.d' or 'rc3.d' directory.

   Please call the script

      # cd $SGE_ROOT
      # util/update_commands/sge_startupscript.sh

   to

      - remove your old startup script
      - add the new startup script

   On IBM AIX and on Cray Unicos this procedure is not supported.

   The procedure will fail to delete the old startup scripts if you copied
   or renamed the script.

   On Linux systems with the "/sbin/insserv" binary (e.g. SuSE 7.1 or
   higher) the script will install the new startup script with that
   mechanism (and try to remove the old startup script with
   "/sbin/insserv -r").

   You need to replace the startup script on all exec hosts. If you have
   access for user root without providing a password from one of your
   machines you may use the script

      # cd $SGE_ROOT
      # util/sgeremoterun -all -- util/update_commands/sge_startupscript.sh

   to login to all hosts which are configured as execution hosts in your
   qmaster spool directory and run the given command there.

   'sgeremoterun' by default uses 'rsh'; with the parameter '-ssh' it will
   use 'ssh' to login to the remote host.

   Call "sgeremoterun" without any parameters to see all supported command
   line options:

   usage: util/sgeremoterun [-noexec] [-ssh] [-all] [-noqmaster] \
                            [host1]... -- command

      -noexec    do nothing, just print what would be done
      -ssh       uses "ssh" instead of "rsh"
      -sshpath   path where ssh is installed if not in /bin:/usr/bin:/usr/local/bin
      -all       run command on all execution hosts found in qmaster spool dir.
      -noqmaster do not run command on current qmaster host


12) Restart SGE
---------------

   You should now restart your qmaster daemon:

      # $SGE_ROOT/$SGE_CELL/common/rcsge -qmaster

   If your qmaster host is also an execution host you can also start the
   execution daemon:

      # $SGE_ROOT/$SGE_CELL/common/rcsge -execd

   To start SGE execution daemons on all your hosts you may use the command
   'sgeremoterun':

      # cd $SGE_ROOT
      # util/sgeremoterun -all -- $SGE_ROOT/$SGE_CELL/common/rcsge -execd

13) Upgrading from SGE(EE) 5.3beta2
-----------------------------------

   An upgrade from SGE(EE) 5.3 beta2 to SGE(EE) 5.3 is possible by changing
   the binaries only.

   There may be pending jobs in the system.

   Most types of running jobs may also remain in the system if special care
   is taken with the 'sge_shepherd' binary. However it is not supported to
   have any running jobs of the following types in the system:

          - qmake
          - qrsh 
          - qtcsh
          - qlogin
          - tightly integrated parallel jobs
 
   Upgrade
   -------

   1) Make a backup of your old binaries, distribution and configuration
   2) shutdown your cluster, make sure there are also no more running
      sge_commd's:

           # qconf -ke all -ks -km
             (wait one minute)

           On qmaster host:
           # $SGE_ROOT/util/shutdown_commd.sh -all

      Verify that no SGE daemons are running. Do NOT kill the sge_shepherd's
      if there are running jobs.

   3) Rename your shepherd binaries with the "mv" command. DO NOT COPY the
      binary!

          # cd $SGE_ROOT/bin
          # mv <arch>/sge_shepherd <arch>/sge_shepherd.sge53b2
          
   4) Unpack the distribution

   5) Set the file permissions with

         $SGE_ROOT/util/setfileperm.sh

   6) Restart SGE(EE) 5.3 on your qmaster host and your execution hosts
      (see 12 above).


14) Installing a SGE(EE) 5.3 patch
----------------------------------

   If you are installing a patch release you don't need to read the
   following section.

   14.1 Introduction
   -----------------

   There are two types of patches available:

      - patch in "tar.gz" format

      - patch in Sun Microsystems patch format to be installed with
        "patchadd"

   A patch in "tar.gz" format will usually contain all binaries, even those
   which were not changed by the patch. A patch in "patchadd" format will
   only contain the files which were changed by the patch.

   14.2 Patch installation
   -----------------------

   These installation instructions assume that you are running a
   homogeneous Sun Grid Engine cluster where all hosts share the same
   directory for the binaries. If you are running Sun Grid Engine in a
   heterogeneous environment (mix of 32-bit and 64-bit binaries for
   Solaris and/or other
   operating systems) it is only necessary to shutdown the daemons for the
   architecture for which the patch is applied. If you installed the
   binaries on a local partition, you only need to stop the SGE daemons for
   that host on which you are installing the patch.

   Before you unpack the patch in "tar.gz" format or install the patch
   with "patchadd" on Solaris you need to carry out the following steps.

   By default there may be no running jobs when the patch is installed.
   There may be pending batch jobs, but no pending interactive jobs (qrsh,
   qmake, qsh, qtcsh).

   It is possible to install the patch with running batch jobs. To avoid a
   failure of the active "sge_shepherd" binary it is necessary to move the
   old shepherd binary (and copy it back prior to the installation of the
   patch).

   In no case is it supported to install the patch with running interactive
   jobs, running qmake jobs or other jobs which use the tight parallel
   integration support of SGE.

   If the patch contains a new "sge_commd" binary (this is of course always
   the case if you are installing the patch in "tar.gz" format) you also
   need to move away the old "sge_commd" binary.

   Stopping the Sun Grid Engine cluster from starting new jobs
   -----------------------------------------------------------

   Disable all queues so that no new jobs are started:

      # qmod -d '*'

   Optional (only needed if there are running jobs which should continue to
   run when the patch is installed):

      # cd $SGE_ROOT/bin
      # mv solaris64/sge_shepherd solaris64/sge_shepherd.sge53
      # cp -p solaris64/sge_shepherd.sge53 solaris64/sge_shepherd

   It is important that the binary is first moved and then copied back to
   the original location using the "-p" switch of the cp command.


   Shutting down the Grid Engine qmaster, scheduler and communication daemon
   -------------------------------------------------------------------------

   You need to shutdown (and restart) the qmaster and scheduler daemon and
   all execution daemons on all SGE hosts. It is only necessary to shutdown
   the communication daemons (sge_commd) if the patch contains a new
   "sge_commd" binary.

   Shutdown your execution hosts and qmaster/scheduler:

      # qconf -ke all
      (wait 30 seconds)
      # qconf -ks
      # qconf -km
      
   If the patch contains the "sge_commd" binary you need to shutdown all
   communication daemons to avoid a crash of the running "sge_commd" when
   the new binary is installed. Login to your qmaster machine as user root
   and enter:

      # $SGE_ROOT/util/shutdown_commd.sh -all

   Now please verify with the 'ps' command that the qmaster and scheduler
   daemon (sge_qmaster, sge_schedd), the execution daemons (sge_execd) and
   the communication daemons (sge_commd) on all your hosts are stopped.


   Installing the patch and restarting Sun Grid Engine
   ---------------------------------------------------

   Now please install the patch with 'patchadd' or by unpacking the "tar.gz"
   files in $SGE_ROOT.

   After installing the patch you need to restart your SGE cluster. Please
   login to your qmaster machine and enter:

      # /etc/init.d/rcsge  (or the location of the startup script on your
                            operating system)

   Now you should repeat this step on all your execution hosts.

   After restarting SGE you may again enable your queues:

      # qmod -e '*'

   If you renamed the shepherd binary you may safely delete the old binary
   once all jobs which were running prior to the patch installation have
   finished.


15) Copyright
-------------
   (c) 2002 Sun Microsystems, Inc. Use is subject to license terms.
