Command Line Interface Guide

HPC Gateway provides a set of command-line tools for tasks such as checks and administration. These commands are mostly written in Python and are located in two directories:

  • $HPCG_HOME/core/bin
  • $HPCG_HOME/core/sys

The HPC Gateway environment must be sourced before running any command.

To list all available commands:

$ source /opt/hpcg/core/etc/profile.sh
$ hpcg     (press Tab-Tab to get completion)
hpcg.sh                           hpcg_backup.py                    hpcg_cluster_update_utilities.py  hpcg_users_list.py
hpcg_app_delete.py                hpcg_cluster_set_scheduler.py     hpcg_gridfs_delete.py             hpcg_watch_log.sh
hpcg_app_export.py                hpcg_cluster_update_all.py        hpcg_gridfs_list.py               hpcg_watch_monitor.sh
hpcg_app_import.py                hpcg_cluster_update_forge.py      hpcg_users_create.py              hpcg_watch_processes.sh
hpcg_app_list.py                  hpcg_cluster_update_lib.py        hpcg_users_init.py                
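The same listing can be produced without shell completion. The following is a minimal sketch, not part of HPC Gateway: the helper name list_hpcg_commands is hypothetical, and it assumes that HPCG_HOME points to the installation root (for example /opt/hpcg) and that command names start with "hpcg", as in the listing above.

```python
import os


def list_hpcg_commands(hpcg_home):
    """Return the sorted names of files starting with 'hpcg' found in
    <hpcg_home>/core/bin and <hpcg_home>/core/sys (hypothetical helper)."""
    names = set()
    for sub in ("bin", "sys"):
        directory = os.path.join(hpcg_home, "core", sub)
        if not os.path.isdir(directory):
            # Skip a directory that does not exist on this installation.
            continue
        for entry in os.listdir(directory):
            full_path = os.path.join(directory, entry)
            if entry.startswith("hpcg") and os.path.isfile(full_path):
                names.add(entry)
    return sorted(names)
```

After sourcing profile.sh, it could be called as list_hpcg_commands(os.environ["HPCG_HOME"]).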

The most important commands are:

  • hpcg_cluster_set_scheduler.py: configure the batch system for a specific cluster agent
  • hpcg_users_create.py: create and configure a list of users
  • hpcg_app_import.py: import an application package

hpcg_cluster_set_scheduler.py configures the batch system for a specific cluster agent. If the scheduler is not specified, the command tries to detect it from default directories. Supported batch systems are:

  • Torque
  • PBS-pro
  • SGE
  • Slurm

- Example: setting the SGE batch system

$ source /opt/hpcg/core/etc/profile.sh
$ $HPCG_HOME/core/sys/hpcg_cluster_set_scheduler.py 
usage: hpcg_cluster_set_scheduler.py --help --cluster=<cluster_name> --scheduler=<scheduler_name> --directory=<scheduler_directory>
usage: hpcg_cluster_set_scheduler.py -h -c <cluster_name> -s <scheduler_name> -d <scheduler_directory>

$ $HPCG_HOME/core/sys/hpcg_cluster_set_scheduler.py -s sge -d $HPCG_HOME/core/cluster/tpl/mediators/sge
2015/12/08 14:01:29 - INFO  - CLUSTER_NAME = hcs01
2015/12/08 14:01:29 - INFO  - SCHEDULER_NAME = sge
2015/12/08 14:01:29 - INFO  - SCHEDULER_DIRECTORY = /opt/hpcg/core/cluster/tpl/mediators/sge
2015/12/08 14:01:29 - INFO  - Put gridfs files in mediators directory for hcs01
removed all instances of 'clusters/hcs01/mediators/batch_monitor' from GridFS
added file: clusters/hcs01/mediators/batch_monitor
removed all instances of 'clusters/hcs01/mediators/batch_commands.json' from GridFS
added file: clusters/hcs01/mediators/batch_commands.json
removed all instances of 'clusters/hcs01/mediators/batch_detail' from GridFS
added file: clusters/hcs01/mediators/batch_detail
removed all instances of 'clusters/hcs01/mediators/batch_output.txt' from GridFS
added file: clusters/hcs01/mediators/batch_output.txt
removed all instances of 'clusters/hcs01/mediators/batch_status' from GridFS
added file: clusters/hcs01/mediators/batch_status
removed all instances of 'clusters/hcs01/mediators/batch_release' from GridFS
added file: clusters/hcs01/mediators/batch_release
removed all instances of 'clusters/hcs01/mediators/batch_options.201512071733.json' from GridFS
added file: clusters/hcs01/mediators/batch_options.201512071733.json
removed all instances of 'clusters/hcs01/mediators/batch_submit' from GridFS
added file: clusters/hcs01/mediators/batch_submit
2015/12/08 14:01:29 - INFO  - Configure scheduler options for hcs01
2015/12/08 14:01:29 - INFO  - 
2015/12/08 14:01:29 - INFO  - Cluster agent need to be restarted to get new files !
2015/12/08 14:01:29 - INFO  - 

Note: the Cluster Agent must be restarted after running the command for the changes to take effect.

$ hpcg.sh -s restart -l cluster -c vijay
=> Signal to stop cluster ...
Stopping cluster hcs01 [pid=29261]: OK


=> List hpcgadmin processes ...
=> ps -edf | grep -i /opt/hpcg
10020     1194  8620  0 14:03 pts/2    00:00:00 /bin/sh /opt/hpcg/core/bin/hpcg.sh -s restart -l cluster -c vijay
...
=> Signal to start cluster ...
/opt/hpcg/external/jdk1.8.0_60/bin/java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=7007 
-Dtorii.agent.cluster.name=hcs01 -Djava.util.logging.config.file=/opt/hpcg/repo/etc/logging.hcs01.properties 
-Dtorii.mongo.host.name=hcs01 -Dtorii.mongo.host.port=27017 
-jar /opt/hpcg/live/clusters/hcs01/lib/cluster-agent-jar-with-dependencies.jar
Listening for transport dt_socket at address: 7007
OK Tue Dec  8 14:03:30 GMT 2015 (pid=1275)


=> List hpcgadmin processes ...
=> ps -edf | grep -i /opt/hpcg
10020     1194  8620  0 14:03 pts/2    00:00:00 /bin/sh /opt/hpcg/core/bin/hpcg.sh -s restart -l cluster -c vijay
...
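The two steps shown above (set the scheduler, then restart the Cluster Agent) can be scripted. The following is a minimal sketch that only builds the argument lists; the function names are hypothetical, and the options used (-c, -s, -d for hpcg_cluster_set_scheduler.py and -s restart -l cluster -c for hpcg.sh) are the ones that appear in the transcripts above.

```python
def set_scheduler_command(hpcg_home, scheduler=None, directory=None, cluster=None):
    """Build the argument list for hpcg_cluster_set_scheduler.py.

    Options left as None are omitted, so the command falls back to its
    own defaults (for example automatic scheduler detection)."""
    cmd = [hpcg_home.rstrip("/") + "/core/sys/hpcg_cluster_set_scheduler.py"]
    if cluster:
        cmd += ["-c", cluster]
    if scheduler:
        cmd += ["-s", scheduler]
    if directory:
        cmd += ["-d", directory]
    return cmd


def restart_cluster_command(cluster):
    """Build the argument list that restarts the Cluster Agent of one cluster."""
    return ["hpcg.sh", "-s", "restart", "-l", "cluster", "-c", cluster]
```

Each list can be passed to subprocess.run(cmd, check=True) from a shell in which the HPC Gateway profile has been sourced.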

- Algorithm for automatic detection

The algorithm for automatic detection is:

    if os.path.isfile("/var/spool/torque/server_name"):
        batch = "torque"
        common.log("Found torque batch system (-f /var/spool/torque/server_name)")
        blist = blist + " " + batch
        found += 1
    elif os.path.isfile("/etc/pbs.conf"):
        batch = "pbspro"
        common.log("Found pbspro batch system (-f /etc/pbs.conf)")
        blist = blist + " " + batch
        found += 1
    elif os.path.isdir("/opt/sge"):
        batch = "sge"
        common.log("Found sge batch system (-d /opt/sge)")
        blist = blist + " " + batch
        found += 1
    elif os.path.isfile("/etc/slurm/slurm.conf"):
        batch = "slurm"
        common.log("Found slurm batch system (-f /etc/slurm/slurm.conf)")
        blist = blist + " " + batch
        found += 1

    if found == 0:
        common.log_warning("No batch system found. Install dummy batch system.")
    elif found > 1:
        common.log_warning("More than 1 batch system has been found: " + blist)
        common.log_warning("Batch system selected is " + batch)
        common.log_warning("Use the -s option to force a specific batch system")

If auto-detection fails, for example because the batch system is not installed in its default directory, set up the batch system manually with the hpcg_cluster_set_scheduler.py command, using the -s and -d options.
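To check in advance what auto-detection is likely to find on a host, the path tests from the excerpt above can be run on their own. The following is a minimal sketch, not the tool's implementation: the function names and the root parameter are hypothetical, and unlike the elif chain as printed (which stops at the first match), it evaluates every check so that several installed batch systems can be reported.

```python
import os

# (batch system, kind of check, path) taken from the excerpt above.
DEFAULT_CHECKS = [
    ("torque", "file", "/var/spool/torque/server_name"),
    ("pbspro", "file", "/etc/pbs.conf"),
    ("sge", "dir", "/opt/sge"),
    ("slurm", "file", "/etc/slurm/slurm.conf"),
]


def detect_batch_systems(root="/", checks=DEFAULT_CHECKS):
    """Return the names of all batch systems whose default path exists.

    root is a hypothetical parameter that lets the checks run against
    another directory tree (useful for testing)."""
    found = []
    for name, kind, path in checks:
        candidate = os.path.join(root, path.lstrip("/"))
        if kind == "file":
            exists = os.path.isfile(candidate)
        else:
            exists = os.path.isdir(candidate)
        if exists:
            found.append(name)
    return found


def choose_batch_system(found, forced=None):
    """Pick the forced value (as with -s), else the first detected
    system in the order of the checks, else None."""
    if forced:
        return forced
    return found[0] if found else None
```

If detect_batch_systems() returns more than one name, or none, that is a sign that the -s option should be given explicitly.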

- Updating the batch system queues listed in HPC Gateway queue dropdown lists

Several panels of the HPC Gateway web interface present a queue dropdown list based on the queues configured in the cluster's batch system. If you reconfigure your batch system to add or remove a queue, you must update the HPC Gateway queue list to match, using the following command as hpcgadmin:

    $ whoami
    hpcgadmin
    $ source /opt/hpcg/core/etc/profile.sh
    $ hpcg_cluster_update_queues.py 

    Fujitsu - HPC Gateway - queue update tool  - V1.0
    -------------------------------------------------

    Host         : pachinko
    Job manager  : PBS

    Queue list   :
                 : workq
                 : superfast

    Update of DB successful
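When the update is run from a script, it can be useful to confirm which queues were registered. The following is a minimal sketch that extracts the queue names from the command output; the function name is hypothetical, and the parser assumes the output keeps the layout of the sample above (a "Queue list" line followed by one ": <queue>" line per queue), which may differ between versions.

```python
import re


def parse_queue_list(output):
    """Return the queue names printed after the 'Queue list' line.

    Parsing stops at the first line that is not of the form ': <name>'."""
    queues = []
    in_list = False
    for line in output.splitlines():
        stripped = line.strip()
        if not in_list:
            # Look for the header that introduces the queue names.
            if re.match(r"Queue list\s*:", stripped):
                in_list = True
            continue
        match = re.match(r":\s*(\S+)$", stripped)
        if match is None:
            break
        queues.append(match.group(1))
    return queues
```

Applied to the sample output above, the function yields the two queues workq and superfast.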

hpcg_users_create.py creates one or more users on the system and configures them in the HPC Gateway database.

$ source /opt/hpcg/core/etc/profile.sh
$ hpcg_users_create.py user [user_list]

hpcg_app_import.py imports an application package into HPC Gateway. By default, the imported application is owned by hpcgadmin.

$ source /opt/hpcg/core/etc/profile.sh
$ hpcg_app_import.py  <path_to_application_package>
$ hpcg_app_import.py  <path_to_folder_containing_application_definition>