Cluster: New BobSCEd Install Log

From Earlham Cluster Department

<pre>Write error in NtrExt1: Bad address</pre>
** To fix this, do <code>echo 0 > /proc/sys/kernel/randomize_va_space</code>
** <font color="green">This needs to be set automatically on every boot</font>

Revision as of 00:40, 18 October 2009


Scratch Space

Log

Green color indicates something that still needs to be done.

Cloning

Head Node

Yum installed:

Install C3 tools from http://www.csm.ornl.gov/torc/C3/C3softwarepage.shtml

Ganglia

Networking

static_routes="bs0"
route_bs0="192.168.0.1 159.28.234.200"

Modules

Torque

Maui

Intel Firmware Updates

Mail

NFS

WebMO

Path to perl:         /usr/bin/perl
Webserver name:       bs0-new.cluster.earlham.edu
HTML directory:       /var/www/webmo
HTML URL:             /webmo
CGI script directory: /var/www/cgi-bin
CGI script URL:       /cgi-bin
User files directory: /mounts/bobsced/WebMO
SuexecUserGroup bob users
Erroneous write during file extend. write 160 instead of 4096
Probably out of disk space.
Write error in NtrExt1: No such file or directory

or

Write error in NtrExt1: Bad address
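
The fix recorded in this log for these write errors is echo 0 > /proc/sys/kernel/randomize_va_space, which only lasts until the next reboot. Below is a minimal sketch of one way to make it stick, assuming the node applies /etc/sysctl.conf at boot (usual on Red Hat style systems, but not confirmed by this log). The helper name set_sysctl_conf is hypothetical.

```shell
# Hypothetical helper: make sure a sysctl-style file contains exactly one
# "key = value" line for the given key, updating it if it is already there.
set_sysctl_conf() {
    conf_file="$1"   # file to edit, e.g. /etc/sysctl.conf
    key="$2"         # e.g. kernel.randomize_va_space
    value="$3"       # e.g. 0
    touch "$conf_file" || return 1
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$conf_file"; then
        # Key already present: rewrite that line in place (GNU sed).
        sed -i "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf_file"
    else
        # Key missing: append it.
        printf '%s = %s\n' "$key" "$value" >> "$conf_file"
    fi
}

# Usage (as root, assumed):
#   set_sysctl_conf /etc/sysctl.conf kernel.randomize_va_space 0
#   sysctl -p    # apply the file now instead of waiting for a reboot
```

Editing the file rather than adding the echo to a boot script keeps the setting with the other kernel parameters, though either approach should work.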

Infiniband

  • Drivers downloaded from here (the Red Hat 5.3 ones)
    • Mount the ISO as a loopback somewhere (e.g. mount -o loop /mounts/bobsced/usr/src/MLNX_OFED_LINUX-1.4-rhel5.3.iso /media/)
    • Run the installer with --msm (e.g. /media/mlnxofedinstall --msm)
  • Need to boot into the kernel that came with the original install (2.6.18-128.el5); otherwise you get a message like this:
The 2.6.18-164.el5 kernel is installed, but do not have drivers available. 
Cannot continue.
  • Then (before rebooting back to the old kernel), run mst start
  • Then symlink the MST library directory to the current kernel version, like this (see the Rocks discussion about it):
    • The version number to use appears in the error that mst start gives you
ln -s /usr/mst/lib/2.6.18-128.el5/ /usr/mst/lib/2.6.18-164.2.1.el5.plus
  • Success looks like this:
[root@bs0-new ~]# mst start
    Starting MST (Mellanox Software Tools) driver set: 
Loading MST PCI module                                     [  OK  ]
Loading MST PCI configuration module                       [  OK  ]
Saving configuration for PCI device 01:00.0                [  OK  ]
Create devices
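
The symlink step above can be wrapped in a small guard so it is safe to rerun. This is only a sketch: the function name link_mst_lib and its argument layout are hypothetical, and it assumes the MST libraries live in one directory per kernel version, as the ln -s command in this log suggests.

```shell
# Hypothetical helper: point the MST library directory for one kernel
# version at the directory that was installed for another.
link_mst_lib() {
    mst_root="$1"    # e.g. /usr/mst/lib
    built_for="$2"   # kernel the drivers were installed under
    running="$3"     # version named in the error from mst start
    if [ ! -d "$mst_root/$built_for" ]; then
        echo "no MST libraries found at $mst_root/$built_for" >&2
        return 1
    fi
    if [ -e "$mst_root/$running" ] || [ -L "$mst_root/$running" ]; then
        # Do not overwrite an existing directory or link.
        echo "$mst_root/$running already exists, leaving it alone" >&2
        return 0
    fi
    ln -s "$mst_root/$built_for" "$mst_root/$running"
}

# Usage (as root, assumed), with the version copied from the mst start error:
#   link_mst_lib /usr/mst/lib 2.6.18-128.el5 <version from the error>
#   mst start
```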