Compute Jobs and Simulations


For a long time, people in MRSEC used public desktop machines like joule, fresnel, and alder, and even some more ancient and forgotten machines like romulan, for long-term computational jobs and simulations. Then we set up a small server called pondermatic (which had initially been set up as an actual cluster) for that purpose, but it was inadequate.

That era ended long ago. MRSEC has purchased a new computational system called superabacus (named by Jin Wang in 2011), which will replace joule as JFI/MRSEC's server for logging in, getting a UNIX shell environment, handling mail and home directory files, and, most importantly, running simulations in an informal setting. However, there are some things to know about superabacus before you get started.

Who can use superabacus?

Anyone with a JFI account. If you don't have a JFI account, please get one -- see here for information on how to get one and what benefits it offers. Please note that this is not the same thing as the CNet account that NSIT gave you when you came to the University.

What can I do on superabacus?

The short answer is, whatever you used to do on joule and harper when those systems were available, except better, because it's a real cluster with worker nodes and not just a standalone UNIX machine. Superabacus offers access to your JFI home directory, JFI email (alpine, mutt, procmail, etc.), automated nightly backups from our backup server "feynman", and a full UNIX environment with compilers and scripting tools. If you used to do it on harper and/or joule, you can still do it on superabacus.

However, since superabacus is a cluster, there are some things you'll need to know if you want to run jobs. Rather than run your jobs on superabacus itself, you'll want to submit them through the Sun Grid Engine queueing system. The machine you log in on is not the worker node for most types of computations (there are exceptions, though; see below). Sun Grid Engine (SGE) has its own commands for controlling your jobs and seeing the queues.

Much like on the old joule and fresnel, you can also access your daily backups from superabacus via the NFS mounts at /nfs/feynman. See the discussion of backups in JFI for more information.

Changing your password

Just use the passwd command. It will prompt you for your existing password, and then you can follow that by entering (twice) the password you'd like to have. It will be changed for all JFI machines, not just superabacus.

Layout of the System

Superabacus has a login node (or "head node") that faces the network, which is where you will be when you log in. This machine is relatively new, so it is not really that slow, but it only has 8 processor cores, so we try to keep the running of jobs on the login node itself to a minimum. Behind it, there are three worker nodes with 48 processor cores and 64GB of memory each, capable of running up to 48 serial jobs or 48 parallel threads per node. Use these nodes by submitting jobs through SGE's queueing system (using SGE commands like qsub); do not ssh directly into the worker nodes. If I deem that computation on the login machine is getting excessive or is slowing things down for other users, I WILL KILL JOBS/SIMULATIONS until usability is restored.

How to use SGE

To see your own jobs, type: qstat
To see everyone's jobs, type: qstat -f -u "*"
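
Beyond listing jobs, a couple of other SGE commands come up often. The following is a minimal sketch rather than a complete reference: the job ID 12345 is a made-up placeholder, and the check for qstat is only there so that the snippet does nothing harmful on a machine without SGE installed.

```shell
# Hypothetical job ID: substitute the number from the first column of qstat.
JOB_ID=12345

if command -v qstat >/dev/null 2>&1; then
    # Full details for one job (resources requested, why it is still waiting).
    qstat -j "$JOB_ID"
    # To cancel a job you own, uncomment the next line.
    # qdel "$JOB_ID"
else
    echo "SGE commands not found; run this on the superabacus login node." >&2
fi
```

The qdel line is left commented out so that nothing is cancelled by accident when trying this out.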

To submit a job, use the qsub command, which has a long manual that's worth reading over briefly (type man qsub to see it). Please note! SGE jobs can never be binary executables (like C or Fortran programs). They are always shell scripts! That means that if you've coded a simulation that runs as a compiled C or Fortran program, you will still need to write a wrapper script that runs that program. SGE jobs are never bare executables!

The reason is that SGE jobs are controlled by "active comments" in your scripts: commented-out lines (by default, lines beginning with #$) that contain settings for SGE. SGE interprets them as settings even though the shell treats them as comments. Since these settings are needed, your jobs must always be in the form of a shell script.
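
As an illustration of the wrapper-script idea, here is a minimal sketch, not a recommended configuration for superabacus. It writes a small job script and submits it if qsub is available. The names run_sim.sh, my_sim, my_sim.log, my_simulation and input.dat are all hypothetical placeholders; the options shown (-N, -cwd, -j, -o, -S) are standard qsub options, but which ones you need depends on your job.

```shell
# Write a hypothetical wrapper script, run_sim.sh, for a compiled program.
cat > run_sim.sh <<'EOF'
#!/bin/bash
# Lines starting with "#$" are SGE active comments; bash itself ignores them.
# Job name shown in qstat (hypothetical).
#$ -N my_sim
# Run the job in the directory it was submitted from.
#$ -cwd
# Merge error messages into the normal output file.
#$ -j y
# File that collects the job's output (hypothetical name).
#$ -o my_sim.log
# Interpret this script with bash.
#$ -S /bin/bash

echo "Job started on $(hostname)"
# Hypothetical compiled C or Fortran program and input file.
./my_simulation input.dat
EOF

# Submit the script if SGE is available on this machine.
if command -v qsub >/dev/null 2>&1; then
    qsub run_sim.sh
fi
```

Each explanatory comment sits on its own line above the active comment it describes, so that nothing extra follows the option on the #$ line itself.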

SGE active comments

There are many guides on the Internet that discuss how to set up your script environment for an SGE job, but one of the best I've seen is this one from UCLA, which is a pretty good starter guide. Note that SGE has many more commands than these, and they all have man pages for full information.

Also, if you need help, please let us know. We administer SGE clusters for three different research groups here at the JFI, so this is something we deal with a lot.

Some additional information on using SGE:

New Jersey Institute of Technology

University of California at San Diego

Other Locally Installed Applications on Superabacus:

The Message Of The Day ("MOTD") bulletin that you see when you log in may mention locally installed or commercially bought applications that are available on superabacus. Most of these are simple to launch and run, but here are some advisories that may apply:

Intel Compilers ("ifort", "icc", "icpc"...)

The Intel compilers need to have a script run in your environment to work correctly. If you're a bash user, you can put this into your ~/.bashrc file (creating it if you don't have one):

# For Intel Fortran/C/C++ Compiler Support:
source /opt/intel/composerxe-2011.0.084/bin/ intel64

If you're a tcsh user, use this in your ~/.cshrc instead (creating the file if you don't have one). Note that the conditional test below is needed by tcsh so that this runs only in an interactive session. If you already have a test block like this in your .cshrc file, just put the source line inside it:

# For Intel Fortran/C/C++ Compiler Support:
if ($?prompt) then
    source /opt/intel/composerxe-2011.0.084/bin/ intel64
endif

Exceptions for Computation on the Login Machine

Some of the programs that we may have installed on superabacus, such as Matlab and Mathematica, may use a graphical user interface (rather than a command line shell), and require you to connect to the system with a command like ssh -Y superabacus to allow graphics windows from superabacus to be opened on your desktop. These types of programs can also be troublesome to run on the worker machines behind the login node, especially if your program relies intrinsically on this type of graphics. Computation on the login node is allowed for programs like these (Matlab, Mathematica, etc.). I will also allow light computation of other kinds as long as it does not use up all 8 of the login node's processor cores or slow down the system for other users. If you can, please use the SGE queueing system to run managed jobs on the worker nodes instead.

Future Expansion

Currently, we have only three 48-core worker nodes behind the login/head node. However, it is possible for us to add more, and they do not all have to be the same type of machine. It is likely that at some point we will put whatever spare computational resources we have available behind the front machine and make them available as worker nodes too. SGE's commands allow users to specify which types of machines they would like their jobs to run on, for cases where users have a preference about this.
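
As a rough sketch of how such a preference could be expressed: qhost lists the execution hosts SGE knows about, and qsub's -l option can request a specific host through the built-in hostname resource. The node name node01 and the script name run_sim.sh are hypothetical placeholders, not actual names on superabacus.

```shell
# Hypothetical worker node name and job script; substitute real ones.
NODE="node01"
JOB_SCRIPT="run_sim.sh"

if command -v qhost >/dev/null 2>&1; then
    # List execution hosts with their architecture, core count, and memory.
    qhost
    # Ask SGE to run the job only on the chosen host.
    qsub -l hostname="$NODE" "$JOB_SCRIPT"
else
    echo "SGE commands not found; run this on the superabacus login node." >&2
fi
```

Pinning a job to one host means it waits until that host has a free slot, so this is only worth doing when the job really needs a particular kind of machine.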