Use Linux Cluster

This page introduces the FIMM Linux cluster and explains how to get your jobs distributed among its calculation nodes.

Connecting to the Cluster

Please refer to this page for connection information.

Basic Commands and Usage of the Cluster

Now that you've logged in, there are a number of things that you can do. Here are some of the basics that you should be familiar with.

How to use the Linux cluster?

Our Linux cluster uses the N1GE load sharing system. When you log into our Linux system, your session runs on ssh.fimm.fi. However, to have your jobs run on the actual calculation nodes, you must submit them through the N1GE system.

The easiest way to do this is to use grun, a program we made for this purpose. Just prefix any normal command with grun and omit any redirection of STDOUT etc. For example, to run a chrmatcher job, type:

grun chrmatcher MySeq.fasta human 96 1

This will submit your job to the N1GE system, and it will run on the first available calculation node. To see the other options of grun, run grun with no parameters. You can always get a listing of your jobs with the qstat command.

NOTE! Because ssh.fimm.fi is shared by all interactive logins to our Linux cluster, it must not be used for running longer jobs (taking more than a minute). If long jobs are running on ssh.fimm.fi, our sysadmins may have to kill them to keep the server available for interactive logins. Just use grun to submit your long jobs!

Working with the Linux Command Line

Some of the installed tools (especially bioinformatics tools) are only available from the Linux command line interface.

Once you open a connection to the Linux command line, you'll be able to run all of the command-line-based tools. Usually each tool has a specific command name associated with it. For example, to use GeneHunter, you can enter the command 'gh'. You may have to look online for documentation that's specific to each tool.

The choice is yours whether to use a graphical Linux desktop or a text terminal (command prompt) connection. Both give access to the same system, the FIMM Linux cluster. The graphical interface gives you access to all the graphical programs in addition to the programs run from the command prompt.

Submitting Jobs to the Cluster

By default, jobs that you start after logging in run only on ssh.fimm.fi. Jobs run in this manner are given a low priority, and the machine will likely be busy.

Jobs can be submitted to the cluster in order to have them run on an optimal node. It is recommended that all jobs lasting more than a second or two be submitted to the cluster in the manner described below.

How to Submit Jobs to the Cluster

The command grun is used to submit jobs to the cluster. To use it, just prefix your desired command with grun.

For example:

Your desired command is:

  • $ merlin -p merlin.ped -d merlin.dat -m merlin.map --vc --quiet --pdf

In order to run the same command, but utilizing the cluster:

  • $ grun merlin -p merlin.ped -d merlin.dat -m merlin.map --vc --quiet --pdf

If you want to give a specific name for your job, use the "-n" parameter with grun, e.g.:

  • $ grun -n MyMerlinJob merlin -p merlin.ped -d merlin.dat -m merlin.map --vc --quiet --pdf
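
When several similar jobs need to be submitted, the grun prefix and the job name can be generated in a loop. The following is only a sketch under stated assumptions: the helper name merlin_grun_cmd, the file prefixes family1 and family2, and the merlin_ job-name scheme are made up for illustration and are not part of grun. It prints the command lines instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the grun command for one merlin data set.
# $1 is the common file prefix, e.g. "family1" for family1.ped/.dat/.map.
merlin_grun_cmd() {
    prefix=$1
    printf 'grun -n %s merlin -p %s.ped -d %s.dat -m %s.map --vc --quiet --pdf\n' \
        "merlin_$prefix" "$prefix" "$prefix" "$prefix"
}

# Dry run: print one submission command per prefix (the prefixes are made up).
for p in family1 family2; do
    merlin_grun_cmd "$p"
done
```

Once the printed lines look right, they can be run by hand or piped to sh on ssh.fimm.fi.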

Monitoring Job Status

The command qstat will list your jobs and their status. In the status column, you'll see a character that represents the status.

The status can be one of the following:

  • Q -- Queued. The job hasn't started running yet. Currently the cluster is too busy. Your job will run when resources are available.
  • R -- Running. The job is currently running. You can use the command $ qstat -n to see where it is running.
  • E -- Finished (Ended). The job has completed running. The E status will remain displayed in the queue for informational purposes. You can work with the results as you would usually with a finished job.
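
With many jobs in the queue, a per-status count can be easier to read than the full listing. The sketch below assumes you know which whitespace-separated column of the qstat output holds the status letter; that column number is passed as an argument because the exact layout is not described on this page. The function name count_by_status and the sample lines are hypothetical.

```shell
#!/bin/sh
# Count jobs per status letter (Q, R or E) in qstat-style lines read from stdin.
# $1 = 1-based number of the column holding the status letter
#      (an assumption: check the header of your own qstat output).
# Lines whose column is not Q, R or E (e.g. headers) are ignored.
count_by_status() {
    awk -v col="$1" '$col ~ /^[QRE]$/ { n[$col]++ }
                     END { for (s in n) print s, n[s] }' | sort
}

# Example with made-up lines (job ID, name, status in column 3):
printf '%s\n' '414 jobA R' '415 jobB Q' '416 jobC R' | count_by_status 3
```

In real use the input would come from qstat itself, for example qstat | count_by_status N, with N set to the status column.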

Visual Report of Cluster Status (For Experts) (disabled for now)

We use Ganglia to monitor the status of the cluster. You can access the Ganglia pages, which are only available within the cluster, by using SSH port forwarding. Here is an example:

ssh -L18080:localhost:80 myusername@ssh.fimm.fi

Once that port is forwarded, you can use your web browser to view the pages at the following URL: http://localhost:18080/ganglia-webfrontend/ (not working right now).
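
The -L option takes the form local_port:target_host:target_port, where the target is resolved from the machine you log into. The sketch below only builds and prints the command for a chosen local port, in case 18080 is already in use on your own computer; the function name tunnel_cmd is made up for illustration.

```shell
#!/bin/sh
# Print the ssh port-forwarding command for a given user and local port.
# "localhost:80" is resolved on ssh.fimm.fi, so it means port 80 of that host.
# $1 = your username, $2 = local port (defaults to 18080 as in the example above).
tunnel_cmd() {
    user=$1
    local_port=${2:-18080}
    printf 'ssh -L %s:localhost:80 %s@ssh.fimm.fi\n' "$local_port" "$user"
}

tunnel_cmd myusername 18080
```

If a different local port is used, the port in the browser URL has to be changed to match.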

Getting information about jobs

Here are some useful commands for displaying jobs:

  • qjobs - displays all running jobs on all nodes
  • qstat -g c - displays queues, load and available slots. Note: only all.q is active now.
  • status -a - displays all running jobs
  • qhost - displays nodes and information about their status

Job Status Report Notifications

Notifications about job status (e.g. completed or failed jobs) are sent to the email address that you have given to FIMM. By default, you will get an email when the submitted job starts, when it ends, and if, for any reason, your job fails.

Deleting Jobs

Both running and queued jobs (R or Q status) can be deleted using the command:

  • qdel x

Where x is the job identifier (an integer) listed by qstat. You may delete all your jobs with the command:

  • qdel -u myusername
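
Because a deleted job cannot be recovered, it can help to check the argument before calling qdel. The following is a minimal sketch, not a FIMM-provided tool: is_job_id and safe_qdel are hypothetical names, and the wrapper only verifies that the argument consists of digits, as the job identifiers listed by qstat do.

```shell
#!/bin/sh
# Succeed only if the argument is non-empty and consists solely of digits.
is_job_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical wrapper: refuse to call qdel with anything but an integer ID.
safe_qdel() {
    if is_job_id "$1"; then
        qdel "$1"
    else
        echo "not a job ID: $1" >&2
        return 2
    fi
}
```

For example, safe_qdel 414 passes the identifier on to qdel, while safe_qdel 41a prints an error and deletes nothing.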

Managing Job Output

If your job produces output by creating new files, or modifying existing files, then using the cluster will be no different than running the commands normally.

If your command produces output to standard out or standard error (i.e. what would normally show on the screen if the command is run normally), then the contents of the output will be written to files in your home directory.

The naming pattern for those files is as follows:

  • standard out: JOBID.Opteronix.OU, e.g. 414.Opteronix.OU
  • standard err: JOBID.Opteronix.ER, e.g. 414.Opteronix.ER
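
To find and inspect these files for a given job, the names can be built from the job identifier. The sketch below assumes the files follow the 414.Opteronix.OU example above and sit directly in your home directory; the function names job_output_files and show_job_errors are made up for illustration.

```shell
#!/bin/sh
# Print the expected standard out and standard err file paths for a job ID.
# $1 = job identifier as listed by qstat, e.g. 414
job_output_files() {
    printf '%s/%s.Opteronix.OU\n' "$HOME" "$1"
    printf '%s/%s.Opteronix.ER\n' "$HOME" "$1"
}

# Show the last 20 lines of a job's standard err file, if the file exists.
show_job_errors() {
    err_file="$HOME/$1.Opteronix.ER"
    if [ -f "$err_file" ]; then
        tail -n 20 "$err_file"
    else
        echo "no error file found: $err_file" >&2
        return 1
    fi
}
```

For example, show_job_errors 414 prints the end of 414.Opteronix.ER, which is usually the first place to look when a job has failed.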

If your job fails

If your job fails, you will by default get an error email. Please see the standard error file (.ER) for hints about why the job failed. You may also want to first try running your job interactively (i.e. omitting the grun -n etc. command prefix); if your job starts OK, you can stop it (Ctrl-C) and submit it to the cluster. Should you need help, please contact FIMM by emailing bbu-support helsinki.fi.

References

  • Sun N1 Gridengine Manual. This is an in-depth reference on GridEngine, the job management system used to submit jobs to the cluster. Consult this documentation if you need advanced command options.