Cluster: New BobSCEd Install Log

From Earlham Cluster Department

Line 17: Line 17:
* MPICH1 installed with <code>./configure --prefix=/mounts/bobsced/usr/local/modules-sw/mpich1/1.2.7p1</code>
** Make note: to uninstall - /mounts/bobsced/usr/local/modules-sw/mpich1/1.2.7p1/sbin/mpiuninstall
-
** If run without a machine file, MPICH1 falls back to /mounts/bobsced/usr/local/modules-sw/mpich1/1.2.7p1/share/machines.ARCH instead of just running on the host machine. I removed this file to force people to supply a machine file, which will have to be generated from PBS as part of the qsub script.
+
** If run without a machine file, MPICH1 falls back to /mounts/bobsced/usr/local/modules-sw/mpich1/1.2.7p1/share/machines.ARCH instead of just running on the host machine. I removed this file to force people to supply a machine file, which will have to be generated from PBS as part of the qsub script. <font color="green">Gave up and added it back in until I finish writing a qsub script that incorporates it.</font>
* MPICH2 installed with
:<code>./configure --prefix=/mounts/bobsced/usr/local/modules-sw/mpich2/2.1.2 --enable-cxx --enable-f90 --enable-f77 --enable-threads=multiple --with-thread-package=posix</code>
Line 176: Line 176:
| align="center" style="background:#f0f0f0;"|'''Notes'''
|-
-
| mpich1||OK, except allocates 1 process to bs0||OK, except allocates 1 process to bs0||OK||OK||/mounts/bobsced/usr/local/modules-sw/mpich1/1.2.7p1/share/machines.LINUX||Creates temporary file PIxxxxx while running under qsub; MPI does not respect qsub # of nodes given
+
| mpich1||OK, except allocates 1 process to bs0||OK, except allocates 1 process to bs0||OK||OK||/mounts/bobsced/usr/local/modules-sw/mpich1/1.2.7p1/share/machines.LINUX||Creates temporary file PIxxxxx while running under qsub; MPI does not respect the number of nodes given to qsub; could get around this by creating a machinefile at runtime
|-
| mpich2-osc||N/A||N/A||OK, MUST specify # nodes, ppn||OK, MUST specify # nodes, ppn||Generated by PBS||Currently uses OSC's pbs-specific mpiexec (cannot be run outside of qsub)
Line 182: Line 182:
| mpich2||OK, must set up mpd ring||OK, must set up mpd ring||OK, must set up mpd ring||OK, must set up mpd ring||N/A (mpd ring)||Requires chmod 600 .mpd.conf in home directory with MPD_SECRETWORD=somevalue; set up ring with <code><nowiki>sort $PBS_NODEFILE | uniq -c | awk '{print $2":"$1}' > /cluster/home/kwanous/tmp/mpd.nodes; mpdboot -f /cluster/home/kwanous/tmp/mpd.nodes -n 2</nowiki></code>
|-
-
| openmpi 1.*||OK (can send machinefile)||OK (can send machinefile)||All runs on one node (can send machinefile)||||
+
| openmpi 1.*||OK for running on one node||OK for running on one node||Uses PBS_NODES automatically||Uses PBS_NODES automatically||
|}
Line 204: Line 204:
mpiexec -np 9 /cluster/home/kwanous/a.out
mpdallexit
 +
 +
rm -rf /cluster/home/kwanous/tmp/mpd.nodes
</pre>
Line 220: Line 222:
'''Openmpi'''
 +
<pre>#!/bin/bash
 +
#PBS -N testopenmpi
 +
#PBS -l cput=00:60:00
 +
#PBS -l nodes=2:ppn=4
 +
 +
hostname
 +
. /etc/profile.d/modules.sh
 +
module load modules modules-init modules-bobsced
 +
module load openmpi/1.3.1
 +
mpirun -np 9 /cluster/home/kwanous/a.out</pre>

Latest revision as of 00:16, 22 April 2010

The source code for *anything* installed locally is in /usr/local/src. The source for *anything* installed on NFS is in /mounts/bobsced/usr/local/src.


Modules Software

Intel Compilers

Openmpi

MPICH

./configure --prefix=/mounts/bobsced/usr/local/modules-sw/mpich2/2.1.2 --enable-cxx --enable-f90 --enable-f77 --enable-threads=multiple --with-thread-package=posix

Using Modules

Important commands -

Log

Green color indicates something that still needs to be done.

Cloning

Head Node

Yum installed:

Install C3 tools from http://www.csm.ornl.gov/torc/C3/C3softwarepage.shtml

Ganglia

Networking

static_routes="bs0"
route_bs0="192.168.0.1 159.28.234.200"

Modules

Torque

Maui

Intel Firmware Updates

Mail

NFS

WebMO

Path to perl:         /usr/bin/perl
Webserver name:       bs0-new.cluster.earlham.edu
HTML directory:       /var/www/webmo
HTML URL:             /webmo
CGI script directory: /var/www/cgi-bin
CGI script URL:       /cgi-bin
User files directory: /mounts/bobsced/WebMO
SuexecUserGroup bob users
Erroneous write during file extend. write 160 instead of 4096
Probably out of disk space.
Write error in NtrExt1: No such file or directory

or

Write error in NtrExt1: Bad address

Infiniband

The 2.6.18-164.el5 kernel is installed, but do not have drivers available. 
Cannot continue.
ln -s /usr/mst/lib/2.6.18-128.el5/ /usr/mst/lib/2.6.18-164.2.1.el5.plus
[root@bs0-new ~]# mst start
    Starting MST (Mellanox Software Tools) driver set: 
Loading MST PCI module                                     [  OK  ]
Loading MST PCI configuration module                       [  OK  ]
Saving configuration for PCI device 01:00.0                [  OK  ]
Create devices
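
The symlink above points the newer kernel's MST directory at the drivers built for 2.6.18-128.el5. A minimal sketch of the same workaround as a reusable function follows. The name `link_mst_drivers` is hypothetical, and whether drivers built for one kernel load cleanly under another is an assumption taken from the log, not something the function checks.

```shell
#!/bin/bash
# Sketch only: link_mst_drivers is a hypothetical helper, not part of the
# Mellanox tools. It points a kernel version that has no MST drivers at the
# driver directory of a kernel version that does.
link_mst_drivers() {
    local root="$1"   # MST library root, e.g. /usr/mst/lib
    local have="$2"   # kernel version that already has drivers
    local want="$3"   # kernel version that needs them

    if [ ! -d "$root/$have" ]; then
        echo "no drivers for $have under $root" >&2
        return 1
    fi
    # Leave an existing directory or link alone
    if [ -e "$root/$want" ]; then
        return 0
    fi
    ln -s "$root/$have" "$root/$want"
}

# Possible use on the head node (as root), before running "mst start":
#   link_mst_drivers /usr/mst/lib 2.6.18-128.el5 "$(uname -r)"
```

Taking the target name from `uname -r` avoids typing the kernel version by hand.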

Cluster Image Review

{|
| align="center" style="background:#f0f0f0;"|
| align="center" style="background:#f0f0f0;"|'''Command Line -np 8'''
| align="center" style="background:#f0f0f0;"|'''Command Line -np 9'''
| align="center" style="background:#f0f0f0;"|'''Torque -np 8'''
| align="center" style="background:#f0f0f0;"|'''Torque -np 9'''
| align="center" style="background:#f0f0f0;"|'''Machinefile'''
| align="center" style="background:#f0f0f0;"|'''Notes'''
|-
| mpich1||OK, except allocates 1 process to bs0||OK, except allocates 1 process to bs0||OK||OK||/mounts/bobsced/usr/local/modules-sw/mpich1/1.2.7p1/share/machines.LINUX||Creates temporary file PIxxxxx while running under qsub; MPI does not respect the number of nodes given to qsub; could get around this by creating a machinefile at runtime
|-
| mpich2-osc||N/A||N/A||OK, MUST specify # nodes, ppn||OK, MUST specify # nodes, ppn||Generated by PBS||Currently uses OSC's pbs-specific mpiexec (cannot be run outside of qsub)
|-
| mpich2||OK, must set up mpd ring||OK, must set up mpd ring||OK, must set up mpd ring||OK, must set up mpd ring||N/A (mpd ring)||Requires chmod 600 .mpd.conf in home directory with MPD_SECRETWORD=somevalue; set up ring with <code><nowiki>sort $PBS_NODEFILE | uniq -c | awk '{print $2":"$1}' > /cluster/home/kwanous/tmp/mpd.nodes; mpdboot -f /cluster/home/kwanous/tmp/mpd.nodes -n 2</nowiki></code>
|-
| openmpi 1.*||OK for running on one node||OK for running on one node||Uses PBS_NODES automatically||Uses PBS_NODES automatically||
|}
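
The mpich1 note suggests creating the machinefile at runtime. A minimal sketch of how a qsub script might do that follows. The helper name `make_machinefile` is hypothetical. The sketch assumes MPICH1's `mpirun` accepts `-machinefile` with `host:count` lines (the same format the mpich2 row writes to mpd.nodes), and it has not been run on this cluster.

```shell
#!/bin/bash
# Sketch only: make_machinefile is a hypothetical helper name.
# It collapses a Torque/PBS node file (one hostname per allocated slot)
# into one "hostname:slots" line per distinct host.
make_machinefile() {
    sort "$1" | uniq -c | awk '{print $2":"$1}'
}

# Only runs inside a qsub job, where Torque sets $PBS_NODEFILE.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    machinefile=$(mktemp "${TMPDIR:-/tmp}/machines.XXXXXX")
    make_machinefile "$PBS_NODEFILE" > "$machinefile"

    # One MPI process per slot that qsub actually granted
    nprocs=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')

    # Assumption: MPICH1 mpirun with -np and -machinefile
    mpirun -np "$nprocs" -machinefile "$machinefile" /cluster/home/kwanous/a.out
    rm -f "$machinefile"
fi
```

Because the file is built from `$PBS_NODEFILE`, the job should only use the nodes the scheduler assigned, and nothing is started on bs0 unless bs0 is part of the allocation.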

Submission scripts:
Mpich1

Mpich2

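#!/bin/bash
#PBS -N testmpich2
#PBS -l cput=00:60:00
#PBS -l nodes=2:ppn=4

# Load the modules environment, then MPICH2
. /etc/profile.d/modules.sh
module load modules modules-init modules-bobsced
module load mpich2

# Build an mpd hosts file (host:slots) from the PBS node list
sort "$PBS_NODEFILE" | uniq -c | awk '{print $2":"$1}' > /cluster/home/kwanous/tmp/mpd.nodes

# Start the mpd ring on both nodes, run the job, then shut the ring down
mpdboot -f /cluster/home/kwanous/tmp/mpd.nodes -n 2
mpiexec -np 9 /cluster/home/kwanous/a.out
mpdallexit

# Remove the temporary hosts file
rm -f /cluster/home/kwanous/tmp/mpd.nodes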
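
The Mpich2 script above depends on a `.mpd.conf` in the user's home directory, readable only by its owner and containing `MPD_SECRETWORD=somevalue`, as the table notes. A minimal sketch of creating that file follows. The helper name `write_mpd_conf` is hypothetical, and the secret word shown is a placeholder each user should replace.

```shell
#!/bin/bash
# Sketch only: write_mpd_conf is a hypothetical helper name.
# Writes an mpd configuration file containing MPD_SECRETWORD and
# restricts it to mode 600, as mpd requires.
write_mpd_conf() {
    local conf="${1:-$HOME/.mpd.conf}"
    local secret="${2:-somevalue}"   # placeholder; choose your own

    # umask in a subshell so the file is never created world-readable
    ( umask 077; printf 'MPD_SECRETWORD=%s\n' "$secret" > "$conf" )
    chmod 600 "$conf"
}

# Possible one-time use before the first mpdboot:
#   write_mpd_conf "$HOME/.mpd.conf" my-own-secret
```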

Mpich2 - OSC

#!/bin/bash
#PBS -N testmpich1
#PBS -l cput=00:60:00
#PBS -l nodes=2:ppn=4

hostname
. /etc/profile.d/modules.sh
module load modules modules-init modules-bobsced
module load mpich2
mpirun -np 8 /cluster/home/kwanous/a.out

Openmpi

#!/bin/bash
#PBS -N testopenmpi
#PBS -l cput=00:60:00
#PBS -l nodes=2:ppn=4

hostname
. /etc/profile.d/modules.sh
module load modules modules-init modules-bobsced
module load openmpi/1.3.1
mpirun -np 9 /cluster/home/kwanous/a.out
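
The test scripts above hard-code the ring size (`-n 2`) and the process count (`-np 8` or `-np 9`, the latter apparently chosen to oversubscribe the 2x4 allocation). For ordinary jobs, both numbers could instead be read from the node file Torque provides. A minimal sketch follows. The helper names `count_hosts` and `count_slots` are hypothetical, and the sketch assumes `$PBS_NODEFILE` lists one hostname per allocated processor slot.

```shell
#!/bin/bash
# Sketch only: count_slots and count_hosts are hypothetical helper names.

# Total processor slots: one non-empty line per slot in the node file
count_slots() {
    grep -c . "$1"
}

# Distinct hosts: unique hostnames in the node file
count_hosts() {
    sort -u "$1" | grep -c .
}

# Only meaningful inside a qsub job, where Torque sets $PBS_NODEFILE
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "hosts=$(count_hosts "$PBS_NODEFILE") slots=$(count_slots "$PBS_NODEFILE")"
fi
```

In a job script these could replace the literals, for example `mpdboot -f mpd.nodes -n "$(count_hosts "$PBS_NODEFILE")"` and `mpirun -np "$(count_slots "$PBS_NODEFILE")" /cluster/home/kwanous/a.out`.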