fujitsu:hpcgateway:guides:faq:troubleshooting_user [2018/12/10 18:05] (current)
 +===== User's TroubleShooting =====
 +
 +This is a list of problems that an end user may encounter when using HPC Gateway, together with their solutions. The list is not exhaustive and will be extended based on feedback from HPC Gateway usage. It is highly recommended that you also build and share your own troubleshooting wiki page locally for your end users.
 +
 +
 +You can also consult the [[fujitsu:hpcgateway:guides:faq:troubleshooting_admin|Administrator's TroubleShooting]].
 +
 +
 +\\
 +----
 +==== Problem: There is no cluster available in the cluster list ====
 +Contact your HPC Gateway administrator to check the MongoDB content. They might need to restart the mongo process.
 +
 +
 +\\
 +----
 +==== Problem: The application status stays in REGISTERED ====
 +Contact your HPC Gateway administrator to check the status of the cluster agent. They might need to restart it.
 +
 +
 +\\
 +----
 +==== Problem: I get a "Failed to submit the jobs" message in the task monitor ====
 +
 +Open the task monitor. The message "Failed to submit the jobs" means that the submission to the batch system has failed.
 +
 +{{ :fujitsu:hpcgateway:guides:faq:submission_error_000.jpg?direct&400 | Failed to submit the jobs message}}
 +
 +
 +Review the full message by hovering over it to display the tooltip or, better, by opening the Task Details window (double-click the task) and looking in the Execution Details.
 +
 +{{:fujitsu:hpcgateway:guides:faq:submission_error_message_000.jpg?direct&200|Batch system resources are not available - change the request in Scheduler}}
 +{{:fujitsu:hpcgateway:guides:faq:submission_error_message_001.jpg?direct&200|Option specified to the scheduler is not valid, Fix it}}
 +{{:fujitsu:hpcgateway:guides:faq:submission_error_message_002.jpg?direct&200|}}
 +
 +
 +If the message ends like "qsub: cannot connect to server ... connection refused", contact your HPC Gateway administrator to check the status of the batch system. They might need to restart it.
 +
 +If the message is related to batch system options, check the options you have set in the Scheduler section and fix them.
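 +
 +The two cases above can be told apart mechanically. The following is a minimal sketch, not part of HPC Gateway: the function name and the returned labels are hypothetical, and it assumes the full message text has been copied from the Execution Details.
 +
```python
def classify_submission_error(message: str) -> str:
    """Map the detail text of a failed submission to a suggested action.

    The labels returned here are illustrative only (hypothetical names).
    """
    text = message.strip().lower()
    # Case 1: the batch server itself cannot be reached.
    if "cannot connect to server" in text and "connection refused" in text:
        return "contact-admin"
    # Case 2 (assumption): anything else is treated as a Scheduler option issue.
    return "check-scheduler-options"
```
 +
 +A message containing both "cannot connect to server" and "connection refused" maps to the administrator case; every other message is assumed to point at the Scheduler options.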
 +
 +
 +\\
 +----
 +==== Problem: The application status is FAILED ====
 +Open the task monitor and check the Exit status and Message columns. They should give an initial indication of why the application status is FAILED.
 +
 +=== Check the run directory ===
 +To get more detailed application information, open the Task Details window, then open the run directory (RunDir) located in the left panel under Application Parameters. The application writes its logs to this directory. You can open the logs with Notepad and review the application messages.
 +
 +{{ :fujitsu:hpcgateway:guides:faq:task_details_rundir.jpg?direct&400 | Open the file explorer to the application run directory}}
 +
 +
 +=== Check the runlog directory ===
 +To get full information, including batch system logs, open the Task Details window, then open the run log directory located in the right panel under Execution Details.
 +
 +{{ :fujitsu:hpcgateway:guides:faq:task_details_runlog.jpg?direct&400 | Open the file explorer to the system run log directory}}
 +
 +
 +The run log directory contains:
 +   * batch_detail.log : output of batch_detail command
 +   * batch_submit.log : output of batch_submit command
 +   * <xxx>.<phase>.batch : full script that is submitted to the batch system
 +   * <xxx>.<phase>.batch.out : script execution output
 +   * <xxx>.<phase>.eo<jobid> : batch system output
 +   * <xxx>.<phase>.out : application output - this is the output shown in the Task Details window
 +   * <xxx>.<phase>.status : application status - this is the status shown in the Task Monitor window
 +
 +
 +You can open the logs with Notepad and review the system messages. If you see error messages, contact your HPC Gateway administrator and send them these system log files.
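 +
 +Reviewing many log files by hand can be slow. The following is a minimal sketch that lists the lines worth looking at first. It assumes the run log directory is reachable as a local path (for example after downloading it); the function name and the keyword list are hypothetical, not part of HPC Gateway.
 +
```python
from pathlib import Path


def find_error_lines(runlog_dir, keywords=("error", "failed")):
    """Return (file name, line number, line) for each log line matching a keyword."""
    hits = []
    for path in sorted(Path(runlog_dir).iterdir()):
        if not path.is_file():
            continue
        # Logs may contain bytes that are not valid UTF-8, so decode leniently.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            # Case-insensitive substring match against each keyword.
            if any(word in line.lower() for word in keywords):
                hits.append((path.name, number, line.strip()))
    return hits
```
 +
 +The returned file names and line numbers can be quoted when sending the log files to the administrator.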
 +
 +\\
 +----
 +==== Problem: The upload doesn't work correctly with Internet Explorer 11 (IE11) ====
 +
 +On IE 11, the file upload is reported as failed even though the file is actually uploaded successfully. This is a well-known [[https://connect.microsoft.com/IE/Feedback/Details/793307|Windows configuration problem]].
 +
 +To make file uploads report correctly on IE 11, create a new key **application/json** in the registry:
 +
 +  * Open the regedit tool
 +  * Go to Computer\HKEY_CLASSES_ROOT\MIME\Database\Content Type
 +  * Create a new key: **application/json**
 +  * Add a new String value: **CLSID** with a value of **{25336920-03F9-11cf-8FD0-00AA00686F13}**
 +  * Add a new DWORD value: **Encoding** with a value of **80000** (hexadecimal)
 +
 +
 +{{ :fujitsu:hpcgateway:guides:faq:error_upload_ie11_registry_setup.png?direct&400 |Windows registry}}
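 +
 +The same registry values can be expressed as a .reg file instead of being typed into regedit. The following is a minimal sketch that only builds the file content. The function name is hypothetical, and it assumes the key is named ''Content Type'' (with a space), as on a standard Windows installation. The hexadecimal value 80000 is written as the DWORD 00080000.
 +
```python
def build_ie11_json_reg() -> str:
    """Build .reg file text for the application/json MIME key described above."""
    # Assumption: the key name contains a space ("Content Type").
    key = r"HKEY_CLASSES_ROOT\MIME\Database\Content Type\application/json"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        '"CLSID"="{25336920-03F9-11cf-8FD0-00AA00686F13}"',
        # 80000 hexadecimal, padded to the 8 digits used by the .reg format.
        '"Encoding"=dword:00080000',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)
```
 +
 +Saving the returned text to a file with a .reg extension and importing it should be equivalent to the manual steps. Review the file before importing it; changing HKEY_CLASSES_ROOT typically requires administrator rights.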
 +
 +\\
 +----
 +==== Problem: A user cannot use File Explorer or execute a script ====
 +
 +One of the most frequent problems is a misconfigured SSH setup for a given user.
 +As a consequence, the system cannot act as the user and cannot do anything on their behalf.
 +
 +Possible causes:
 +  * The user does not exist in Unix. Make sure the user can SSH to the system under their own name.
 +  * The user has not granted SSH access to Gateway. Make sure hpcgadmin can SSH as the user (ssh -i).
 +  * The user's .bashrc prints messages in non-interactive mode. Make sure the user can use SFTP.
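 +
 +The SSH checks above can be scripted. The following is a minimal sketch, not an HPC Gateway tool: the function names are hypothetical, and the user, host and key file are supplied by the caller. It runs the remote command ''true'' without prompting, so a non-zero exit code suggests a login problem, and any text on standard output suggests that a startup file such as .bashrc prints in non-interactive mode.
 +
```python
import subprocess


def ssh_command(user, host, key_file=None):
    """Build a non-interactive ssh command that runs `true` on the remote host."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    cmd = ["ssh", "-o", "BatchMode=yes"]
    if key_file:
        cmd += ["-i", key_file]
    cmd += [f"{user}@{host}", "true"]
    return cmd


def check_quiet_login(user, host, key_file=None, run=subprocess.run):
    """Return (ok, detail); ok is False if login fails or prints to stdout."""
    result = run(
        ssh_command(user, host, key_file),
        capture_output=True,
        text=True,
        timeout=30,
    )
    if result.returncode != 0:
        return False, f"ssh failed: {result.stderr.strip()}"
    # `true` prints nothing, so any output comes from the login environment.
    if result.stdout.strip():
        return False, f"unexpected output: {result.stdout.strip()}"
    return True, "ok"
```
 +
 +The ''run'' parameter exists only so that the check can be exercised without a real connection; by default it calls ''subprocess.run''.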
 +
 + 
 +
 +\\
 +----
 +If you are following this page as part of the training programme, return to the [[fujitsu:hpcgateway:training:programme#fault_analysis_and_diagnosis|topics]] page.