This is a list of problems an end user may encounter when using HPC Gateway, together with their solutions. The list is not exhaustive and will be extended based on feedback from HPC Gateway usage. It is highly recommended that you also build and share your own local troubleshooting wiki page for your end users.

You can also consult the Administrator's TroubleShooting.



Problem: There is no cluster available in the cluster list

Contact your HPC Gateway administrator to check the MongoDB content. The administrator may need to restart the mongo process.



Problem: The application status stays in REGISTERED

Contact your HPC Gateway administrator to check the status of the cluster agent. The administrator may need to restart it.



Problem: I get a "Failed to submit the jobs" message in the task monitor

Open the task monitor. The message "Failed to submit the jobs" means that the submission to the batch system has failed.

[Screenshot: "Failed to submit the jobs" message]

Review the full message by hovering over it to see the tooltip or, better, by opening the Task Details window (double-click the task) and looking in the Execution Details. Typical causes are:

  • Batch system resources are not available: change the request in the Scheduler section.
  • An option specified to the scheduler is not valid: fix it.

If the message ends like "qsub: cannot connect to server … connection refused", contact your HPC Gateway administrator to check the status of the batch system. The administrator may need to restart it.

If the message is related to batch system options, check the options you have set in the Scheduler section and fix them.
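
As a rough illustration of the triage described above, the following Python sketch sorts a submission error message into one of three suggested actions. Only the qsub "cannot connect to server … connection refused" wording comes from this page; the function name, the returned labels, and the "option" keyword test are assumptions made for the example, and real scheduler messages may be worded differently.

```python
import re

# Assumption: only the qsub wording below is taken from this page.
# The labels and the "option" keyword test are illustrative guesses.
QSUB_UNREACHABLE = re.compile(
    r"qsub: cannot connect to server.*connection refused", re.DOTALL
)


def classify_submit_failure(message: str) -> str:
    """Suggest a next step for a 'Failed to submit the jobs' message."""
    text = message.lower()
    if QSUB_UNREACHABLE.search(text):
        # The batch system itself is unreachable: only an administrator can fix it.
        return "contact-administrator"
    if "option" in text:
        # The message mentions options: review the Scheduler section.
        return "fix-scheduler-options"
    # Anything else: read the full text in the Task Details window.
    return "review-full-message"


if __name__ == "__main__":
    sample = "qsub: cannot connect to server … connection refused"
    print(classify_submit_failure(sample))
```

The function only suggests where to look first. The full text in the Execution Details remains the reference.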



Problem: The application status is FAILED

Open the task monitor and check the Exit status and Message columns. They should give an initial indication of why the application status is FAILED.

Check the run directory

To get more detailed application information, open the Task Details window, then open the run directory (RunDir) located in the left panel under Application Parameters. The application writes its logs to this directory. You can open the logs with Notepad and review the application messages.

[Screenshot: Opening the file explorer on the application run directory]

Check the run log directory

To get full information including batch system logs, open the Task Details window, then open the run directory (RunDir) located in the right panel under Execution Details.

[Screenshot: Opening the file explorer on the system run log directory]

The run log directory typically contains:

  • batch_detail.log : output of batch_detail command
  • batch_submit.log : output of batch_submit command
  • <xxx>.<phase>.batch : full script that is submitted to the batch system
  • <xxx>.<phase>.batch.out : script execution output
  • <xxx>.<phase>.eo<jobid> : batch system output
  • <xxx>.<phase>.out : application output - this is the output shown in the Task Details window
  • <xxx>.<phase>.status : application status - this is the status shown in the Task Monitor window

You can open the logs with Notepad and review the system messages. If you see error messages, contact your HPC Gateway administrator and send them these system log files.
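
If the run log directory is reachable from a machine with Python, the manual review can be approximated with a short script. The sketch below is only an illustration: the file patterns are derived from the list above, while the function name, the "error" keyword, and the assumption that the logs are readable as text are ours.

```python
from pathlib import Path

# File patterns derived from the run log directory listing above.
LOG_PATTERNS = (
    "batch_detail.log",
    "batch_submit.log",
    "*.batch.out",
    "*.eo*",
    "*.out",
    "*.status",
)


def find_error_lines(run_log_dir, keyword="error"):
    """Return (file name, line number, line) for each line containing keyword.

    The keyword is an assumption: logs may report failures with other words.
    """
    hits = []
    seen = set()
    for pattern in LOG_PATTERNS:
        for path in sorted(Path(run_log_dir).glob(pattern)):
            # "*.out" also matches "*.batch.out", so skip files already read.
            if path in seen or not path.is_file():
                continue
            seen.add(path)
            text = path.read_text(errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if keyword.lower() in line.lower():
                    hits.append((path.name, number, line.strip()))
    return hits
```

Calling `find_error_lines` with the path of the run log directory returns the matching lines, which can be pasted into the message sent to the administrator along with the log files themselves.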



Problem: The upload doesn't work correctly with Internet Explorer 11 (IE11)

On IE 11, the file upload is reported as failed even though the upload actually completes successfully. This is a well-known Windows configuration problem.

To make the file upload work correctly on IE 11, you need to create a new application/json key in the registry.

  • Open the regedit tool
  • Go to Computer\HKEY_CLASSES_ROOT\MIME\Database\Content Type
  • Create a New key: application/json
  • Add a New String value: CLSID with a value of {25336920-03F9-11cf-8FD0-00AA00686F13}
  • Add a New DWORD value: Encoding with a value of 80000 (hexadecimal)

[Screenshot: Windows registry]
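
The same change can also be described as a .reg file instead of being made by hand. The Python sketch below only builds the text of such a file from the values listed above. The helper name is hypothetical, and the key path is written with a space in "Content Type", which is how this key is normally spelled in regedit. Check the path on your own system, and review the generated text before importing it.

```python
def build_reg_file(key_path: str, clsid: str, encoding: int) -> str:
    """Return the text of a .reg file that creates key_path with both values."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        f'"CLSID"="{clsid}"',
        # DWORD values are written as eight hexadecimal digits.
        f'"Encoding"=dword:{encoding:08x}',
        "",
    ]
    return "\r\n".join(lines)


# Values copied from the steps above; 0x80000 is the hexadecimal 80000.
reg_text = build_reg_file(
    r"HKEY_CLASSES_ROOT\MIME\Database\Content Type\application/json",
    "{25336920-03F9-11cf-8FD0-00AA00686F13}",
    0x80000,
)

if __name__ == "__main__":
    print(reg_text)
```

Saving the printed text to a file with a .reg extension and importing it is left to the reader. Modifying the registry generally requires administrator rights.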



Problem: A user cannot use File Explorer or execute a script

One of the most frequent problems is a bad SSH configuration for a given user. As a consequence, the system cannot act as the user and cannot do anything on the user's behalf.

Possible causes:

  • The user does not exist in Unix. Make sure the user can log in to the system over SSH under their own name.
  • The user has not granted SSH access to Gateway. Make sure hpcgadmin can SSH in the name of the user (ssh -i).
  • The user's .bashrc prints messages in non-interactive mode, which disturbs SFTP. Make sure the user can use SFTP.
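
An administrator can approximate these checks with a single key-based SSH call that runs the no-op command `true`. A failed login points to the first two causes, while any text on standard output points to a noisy .bashrc. The sketch below is a minimal illustration: the function names, the user name, the host, and the key path are hypothetical, and the script is meant to be run from the account that holds the key (hpcgadmin in the text above).

```python
import subprocess


def build_ssh_check(user, host, key_file):
    """Build an ssh command that logs in as user with a key and runs `true`."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-i", key_file, "-o", "BatchMode=yes", f"{user}@{host}", "true"]


def interpret(returncode, stdout):
    """Map the result of the ssh check to the causes listed above."""
    if returncode != 0:
        return "login failed: check that the user exists and has granted SSH access"
    if stdout.strip():
        # `true` prints nothing, so any output comes from the shell start-up files.
        return "login works but the shell prints output: check .bashrc"
    return "ok"


def check_user(user, host, key_file):
    """Run the check; needs network access to host, so it is not called here."""
    result = subprocess.run(
        build_ssh_check(user, host, key_file),
        capture_output=True,
        text=True,
        timeout=30,
    )
    return interpret(result.returncode, result.stdout)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(" ".join(build_ssh_check("alice", "cluster.example.org", "/path/to/key")))
```

Separating the command construction from its interpretation lets you review the command before running it against a real cluster.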






If you are following this page as part of the training programme, return to the topics page.