BobSCEd Cluster

From Earlham Cluster Department

Todo

Howtos

Updating nodes to be kickstarted & adding new packages

Adding post install scripts to kickstart

Using 411 tools

cluster-fork

Disabling reinstall (kickstart) after hard reset

Generating an up-to-date machinefile

Ethernet:

cluster-fork /sbin/ifconfig -a | grep -1 Ethernet | awk '{printf("%s slots=4\n",$2)}' | cut -d : -f 2 > bs-eth-hosts

Infiniband:

cluster-fork /sbin/ifconfig -a | grep -1 UNSPEC | awk '{printf("%s slots=4\n",$2)}' | cut -d : -f 2 > bs-ib-hosts 

The two commands differ only in the pattern given to the first grep. UNSPEC is the link encapsulation that ifconfig reports for the Infiniband interfaces, while Ethernet matches the ordinary network interfaces.
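The same selection can be sketched as a small shell function that keeps only the address lines, which may be easier to adapt than the grep/cut pipeline. This is a minimal sketch, not the command used on BobSCEd: the function name machinefile_lines is hypothetical, the sample addresses are made up, and it assumes the older net-tools ifconfig layout ("Link encap:..." followed by "inet addr:...").

```shell
# Hypothetical helper: print "<ip> slots=<n>" for every interface whose
# link encapsulation matches $1 (for example Ethernet or UNSPEC).
# Assumes old net-tools ifconfig output; slots defaults to 4 as in the
# commands above.
machinefile_lines() {
    awk -v encap="$1" -v slots="${2:-4}" '
        # A "Link encap:" line starts a new interface stanza.
        /Link encap:/ { wanted = ($0 ~ ("Link encap:" encap)) }
        # Inside a wanted stanza, $2 looks like addr:10.1.255.253
        wanted && /inet addr:/ {
            split($2, parts, ":")
            print parts[2] " slots=" slots
        }
    '
}

# Made-up sample standing in for the output of cluster-fork /sbin/ifconfig -a
sample='compute-0-0:
eth0      Link encap:Ethernet  HWaddr 00:00:00:00:00:01
          inet addr:10.1.255.253  Bcast:10.1.255.255  Mask:255.255.0.0
ib0       Link encap:UNSPEC  HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.168.1.253  Bcast:192.168.1.255  Mask:255.255.255.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0'

printf '%s\n' "$sample" | machinefile_lines Ethernet   # prints 10.1.255.253 slots=4
printf '%s\n' "$sample" | machinefile_lines UNSPEC     # prints 192.168.1.253 slots=4
```

On the cluster the function would presumably be fed the real output, for example cluster-fork /sbin/ifconfig -a | machinefile_lines UNSPEC > bs-ib-hosts, but that has not been verified on BobSCEd.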

General Info

NIS Importing

http

/cluster

/cluster/bobsced/etc/

Known Error Messages

Check this section if you encounter an error message on BobSCEd that you cannot resolve.

RLIMIT_MEMLOCK

$ ./area_mpi
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited". The failure occured
here:
 Local host:    bobsced0
 OMPI source:   btl_openib_component.c:1040
 Function:      ompi_free_list_init_ex_new()
 Device:        mthca0
 Memlock limit: 32768
You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:
   http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
 Local host:   bobsced0
 Local device: mthca0
--------------------------------------------------------------------------

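This error is specific to Infiniband on BobSCEd: Open MPI could not initialize its Infiniband (openib) transport because the locked-memory limit is too low. If you are running over Ethernet, tell Open MPI so by creating the file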

~/.openmpi/mca-params.conf

with the first line being

btl = ^openib

This tells Open MPI not to try the openib (Infiniband) transport, which stops the error. If you are using Infiniband and find a solution to this error, please add it to the wiki.
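The file can be created from the shell as sketched below. This is a minimal sketch that assumes the file does not already set btl to something else; it appends the line rather than forcing it to be the first line, on the assumption that the position of the line within the file does not matter.

```shell
# Per-user Open MPI parameter file described above.
conf_dir="$HOME/.openmpi"
conf_file="$conf_dir/mca-params.conf"
mkdir -p "$conf_dir"

# Add the setting only if it is not there yet.
# The leading ^ in the value means "every BTL except the ones listed".
if ! grep -qs '^btl *= *\^openib' "$conf_file"; then
    echo 'btl = ^openib' >> "$conf_file"
fi
```

For a one-off run, the same parameter can be passed on the command line instead, for example mpirun --mca btl ^openib ./area_mpi.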

References

Rocks Documentation

Troubleshooting Platform Open Cluster Stack (OCS) and Platform Lava

411 Tools

RHEL
