Ccg-admin
From Earlham Cluster Department
Current To Do
- This section has been moved to a Google Drive document: https://docs.google.com/document/d/1_qsH4eFZRW_rmqq2kgMF_DbKBL-HukjBuAdr93j0oJ0/edit
Cluster Pages
New Software
- Figure out whether this should be a yum package or a source kit, whether it should be installed into the system space or into Modules, and what library dependencies it has. Proceed accordingly.
Building software packages
- Download the source tarball into /root/install
- Unpack it there.
- Make a <package>-<version>.config.sh script that runs ./configure with all your options, so that it's kept around in case we need to reinstall.
- In that script, give configure the option --prefix=/mounts/al-salam/software/<package>-<version>
- Run your config.sh, then continue building and installing as normal (typically make, then make install).
- Create a symbolic (soft) link from /mounts/al-salam/software/<package> to /mounts/al-salam/software/<package>-<version>
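The config-script and symlink steps above can be sketched as two POSIX shell functions. This is a minimal sketch, not an existing cluster tool. It assumes you run write_config_script from inside the unpacked source directory. SOFTWARE_ROOT is an illustrative variable that defaults to the al-salam software tree, so the functions can be tried somewhere else. The package fftw 3.3.4 in the usage note is hypothetical.

```shell
# Root of the shared software tree; override SOFTWARE_ROOT to try this elsewhere.
SOFTWARE_ROOT=${SOFTWARE_ROOT:-/mounts/al-salam/software}

# write_config_script PACKAGE VERSION [CONFIGURE_OPTION...]
# Writes ./PACKAGE-VERSION.config.sh, which re-runs ./configure with the same
# prefix and options. Options containing spaces are not re-quoted.
write_config_script() {
    pkg=$1; ver=$2; shift 2
    opts="--prefix=$SOFTWARE_ROOT/$pkg-$ver"
    for o in "$@"; do
        opts="$opts $o"
    done
    script="$pkg-$ver.config.sh"
    printf '#!/bin/sh\n./configure %s\n' "$opts" > "$script"
    chmod 755 "$script"
}

# link_current PACKAGE VERSION
# Points SOFTWARE_ROOT/PACKAGE at SOFTWARE_ROOT/PACKAGE-VERSION, replacing any old link.
link_current() {
    ln -sfn "$SOFTWARE_ROOT/$1-$2" "$SOFTWARE_ROOT/$1"
}
```

Usage with the hypothetical package: write_config_script fftw 3.3.4 --enable-shared, then ./fftw-3.3.4.config.sh && make && make install, and finally link_current fftw 3.3.4. The -n flag to ln makes an existing <package> link get replaced rather than followed, which matters when you upgrade to a newer version later.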
Installing a yum package into Modules
- How?
Enabling built software packages within Modules
- Become root: sudo su -
- cd /mounts/al-salam/software/Modules/3.2.7/modulefiles
- Run ls and look for another package that has a similar usage model to the package you're installing (e.g., Python module, C/C++ library, utility, library+utilities)
- Copy that directory to your new package, e.g., cp -r openmpi <software>
- cd <software>; ls and note the filename that appears (it is the version of the package you copied).
- Rename that file to your package's <version>.
- Edit <version>
- Change references to the package you copied to the new one you're installing, including the version number, path, variable names, etc.
- See the modulefile(4) man page for the keywords you can use within the module file.
- Usually, that's all you need to do. Verify that it shows up in module avail and that module load <software> doesn't throw any errors and allows your package to work.
If you think your new package is important enough to be loaded by default, then add it to the list in /mounts/al-salam/software/Modules/3.2.7/init/al-salam.{sh,csh}
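As an alternative to copying an existing modulefile, you can write one from scratch. The sketch below is not the template the existing modulefiles use. It uses only standard modulefile(4) commands (module-whatis, conflict, set, prepend-path) and assumes the package was installed with the <package>-<version> layout from the previous section, with bin, lib and share/man subdirectories. MODULEFILES and SOFTWARE_ROOT are illustrative overrides, and fftw 3.3.4 in the usage note is hypothetical.

```shell
# Where modulefiles and installed packages live; override either to try this elsewhere.
MODULEFILES=${MODULEFILES:-/mounts/al-salam/software/Modules/3.2.7/modulefiles}
SOFTWARE_ROOT=${SOFTWARE_ROOT:-/mounts/al-salam/software}

# write_modulefile PACKAGE VERSION
# Creates MODULEFILES/PACKAGE/VERSION for a package installed in
# SOFTWARE_ROOT/PACKAGE-VERSION (bin, lib and share/man subdirectories assumed).
write_modulefile() {
    pkg=$1; ver=$2
    mkdir -p "$MODULEFILES/$pkg" || return 1
    # \$root is escaped so the Tcl variable reference is written literally.
    cat > "$MODULEFILES/$pkg/$ver" <<EOF
#%Module1.0
module-whatis "$pkg $ver"
conflict $pkg
set root $SOFTWARE_ROOT/$pkg-$ver
prepend-path PATH \$root/bin
prepend-path LD_LIBRARY_PATH \$root/lib
prepend-path MANPATH \$root/share/man
EOF
}
```

Usage: write_modulefile fftw 3.3.4, then check module avail and module load fftw as described above. The conflict line stops two versions of the same package from being loaded at once.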
Adding a New Host
DNS/DHCP for a single host
- Find an IP address that's not in use. The easiest way to do that is to look in /var/named/master/cluster.zone.
- Add the name and IP following the pattern in the file.
- Change the serial number at the top of the file so that it represents the year, month, day, and version.
- Save the file. Every time you add an entry to the zone file, you also have to edit the reverse zone file, /var/named/master/159.28.234.zone.
- Add an entry there for the host you added to the zone file.
- Restart DNS: run service named stop, then service named start.
- Now that DNS is updated, we have to update DHCP.
- The file you want to edit is /etc/dhcp/dhcpd.conf. Towards the bottom of the file, add host <hostname> { hardware ethernet <MACaddress>; fixed-address <hostname>.cluster.earlham.edu; }
- Save the file. Just as with DNS, restart DHCP: run service dhcpd stop, then service dhcpd start.
- As a test, reboot the client.
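The text each step adds to the zone files and to dhcpd.conf can be generated by small helpers. This is a sketch under these assumptions:
- The serial has the form YYYYMMDDNN, with a two-digit version, and there are fewer than 99 changes per day.
- Records are tab-separated "name IN A address" lines. Compare these with the existing entries before pasting.

The hostname as9, the address 159.28.234.57 and the MAC address in the usage note are hypothetical. BIND's named-checkzone can check the edited files before you restart named.

```shell
# next_serial OLD_SERIAL [TODAY]
# Bumps the two-digit version if OLD_SERIAL is from TODAY (YYYYMMDD),
# otherwise starts TODAY at version 01.
next_serial() {
    old=$1
    today=${2:-$(date +%Y%m%d)}
    if [ "${old%??}" = "$today" ]; then
        nn=${old#????????}
        nn=${nn#0}   # drop a leading zero so "08" is not parsed as octal
        printf '%s%02d\n' "$today" $((nn + 1))
    else
        printf '%s01\n' "$today"
    fi
}

# a_record HOSTNAME IP: forward entry for /var/named/master/cluster.zone
a_record() {
    printf '%s\tIN\tA\t%s\n' "$1" "$2"
}

# ptr_record IP HOSTNAME: reverse entry for /var/named/master/159.28.234.zone
# (only the last octet is used as the record name)
ptr_record() {
    printf '%s\tIN\tPTR\t%s.cluster.earlham.edu.\n' "${1##*.}" "$2"
}

# dhcp_host_entry HOSTNAME MAC: stanza for the bottom of /etc/dhcp/dhcpd.conf
dhcp_host_entry() {
    printf 'host %s {\n    hardware ethernet %s;\n    fixed-address %s.cluster.earlham.edu;\n}\n' "$1" "$2" "$1"
}
```

Usage: next_serial 2015031501 prints the serial to use today. Run a_record as9 159.28.234.57 and ptr_record 159.28.234.57 as9 to get the two DNS lines. Run dhcp_host_entry as9 00:11:22:33:44:55 to get the DHCP stanza.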
Users and Groups
Users are authenticated using an LDAP (Lightweight Directory Access Protocol) server running on Hopper. This is how users are authenticated throughout the entire cluster realm. We use LDAP for all users and groups except for the ccg-admin user, the root user, and the wheel group; those users and that group are local to each cluster. Every user is a member of the users LDAP group, which is GID 115, and all clusters should look at LDAP first and then at local files. This is specified in the /etc/nsswitch.conf file.
A user can change their password with the passwd command while on Hopper. It prompts for their current password and then for their desired new password. If the change succeeds, it says so at the end with a message such as 'All LDAP tokens successfully changed.'
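The helper below checks that a node is set up as described, with LDAP consulted before local files. It is a sketch that looks at one database line in /etc/nsswitch.conf and assumes the line uses the literal source names ldap and files. For an end-to-end check that lookups actually work, use getent passwd <username>.

```shell
# ldap_first DATABASE [FILE]
# Succeeds if the DATABASE line (e.g. passwd, group) lists ldap before files.
ldap_first() {
    awk -v db="$1:" '
        $1 == db {
            for (i = 2; i <= NF; i++) {
                if ($i == "ldap") { found = 1; exit }
                if ($i == "files") exit
            }
        }
        END { exit !found }' "${2:-/etc/nsswitch.conf}"
}
```

Usage: for db in passwd shadow group; do ldap_first "$db" || echo "$db: ldap is not consulted first"; done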
Creating New Users
Because creating users in LDAP is somewhat confusing, sample files and a Python script were written to help. The script is addusers.py and lives in ~root/ldap-files/ on Hopper. I'll explain things here, but there's a README file in that directory that explains everything as well.
To create a user in LDAP, you must create an .ldif file for that user; addusers.py does this for you. addusers.py takes a file of new users as a command-line argument. Each line of the file specifies one user in the form First Name:username:email. The file add-user.ldif is an example of what the file should look like.
Running sudo python addusers.py add-user.ldif creates an .ldif file for each user and uses ldapadd to add them to the LDAP database. The contents of each user's .ldif file are printed to the screen, and each user is sent an email with their username and password. addusers.py cleans up after itself, so you don't need to worry about that. One more thing has to be done after this step: we need to set up the SSH keys for each user. For each user created:
- Become that user: su - <user>
- SSH to as0: ssh as0
- It will ask you for the user's password. Then it will prompt you for where to save the public key file and for a passphrase. For all of these, just hit Enter to accept the defaults.
- Go back to Hopper and do the same thing for each of the other new users.
It is VERY important that you use UID and GID numbers that have not already been taken. If new users and groups have been added correctly, this shouldn't be a problem. maxuid is a file that specifies the next UID to use when creating a new user. addusers.py reads from that file when creating the .ldif file for each user, and at the end overwrites it with the new maxuid. If you're nervous about the UID numbers, it's fine to double-check. Running ldapsearch -x should output everything in the database, with the latest entry at the bottom. Look for the UID in that entry and compare it with maxuid. If maxuid is one above that number, all is golden. It's also a good idea to check /etc/passwd to make sure no one is using that number either.
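The double-check described above can be scripted. This is a sketch with made-up helper names. It assumes the maxuid file contains just the next UID number. It does not rely on the newest entry being listed last: it asks ldapsearch for every uidNumber (-LLL prints plain LDIF) and takes the largest, then checks the local passwd file.

```shell
# max_ldap_uid: read LDIF on stdin and print the largest uidNumber (0 if none)
max_ldap_uid() {
    awk '/^uidNumber:/ { if ($2 + 0 > max) max = $2 + 0 } END { print max + 0 }'
}

# check_maxuid MAXUID_FILE HIGHEST_LDAP_UID [PASSWD_FILE]
# Succeeds if MAXUID_FILE holds HIGHEST_LDAP_UID + 1 and that UID is unused locally.
check_maxuid() {
    nextuid=$(cat "$1") || return 1
    if [ "$nextuid" -ne $(($2 + 1)) ]; then
        echo "maxuid is $nextuid, but the highest LDAP uidNumber is $2" >&2
        return 1
    fi
    if awk -F: -v uid="$nextuid" '$3 == uid { found = 1 } END { exit !found }' "${3:-/etc/passwd}"; then
        echo "UID $nextuid is already used in ${3:-/etc/passwd}" >&2
        return 1
    fi
}
```

Usage on Hopper: check_maxuid ~root/ldap-files/maxuid "$(ldapsearch -x -LLL '(uidNumber=*)' uidNumber | max_ldap_uid)"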
Other modifications to the DB
Other modifications to the database, such as adding a new group, adding users to a group, or deleting users, all use .ldif files similar to the ones for adding users. The same directory as the files above contains sample .ldif files for these operations, each named for what it does:
- add-group.ldif adds a group, using the ldapadd command.
- add-user-to-group.ldif adds a user to a group.
- del-from-grp.ldif deletes users from groups.
- chg-pw shows an example of changing a user's password.

The last three use the ldapmodify command. Make sure you modify the files to fit what you need. In particular, don't forget to change the GID when you add a group, and make sure it's a GID that's not already in use.
For example, the command below adds a user to a group. You'll need to supply the Manager password at the end of this command, either with -w followed by the password or with -W to be prompted for it. The password has been redacted here but can be found in the README file, which can only be read with root privileges.
ldapmodify -f add-user-to-group.ldif -D "cn=Manager,dc=cluster,dc=loc"
If you're adding a group, change ldapmodify to ldapadd. The cn=Manager part specifies the manager of the database, the account that is allowed to change it.
To delete a user from the ldap database, it's a little simpler. Use the command below. Again, the Manager password must be specified and has been redacted here, but it will be the same as the password used in the other commands, and can be found in the README file. You will need to change the uid to be equal to the uid of the person you want to delete. The uid here is just the person's username.
ldapdelete "uid=sbsp,ou=people,dc=cluster,dc=loc" -D "cn=Manager,dc=cluster,dc=loc"
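For reference, the LDIF that an ldapmodify group-membership change usually takes looks like the output of the function below. This is a sketch, not a copy of add-user-to-group.ldif. It assumes groups live under ou=group,dc=cluster,dc=loc and list their members with the memberUid attribute, so compare it with the sample files before using it. The group charmm in the usage note is hypothetical, and sbsp is the username from the ldapdelete example.

```shell
# group_member_ldif add|delete GROUP USER
# Prints an LDIF change that adds USER to, or deletes USER from, GROUP's memberUid list.
group_member_ldif() {
    case $1 in
        add|delete) ;;
        *) echo "usage: group_member_ldif add|delete GROUP USER" >&2; return 1 ;;
    esac
    printf 'dn: cn=%s,ou=group,dc=cluster,dc=loc\nchangetype: modify\n%s: memberUid\nmemberUid: %s\n' "$2" "$1" "$3"
}
```

Usage: group_member_ldif add charmm sbsp > /tmp/charmm-sbsp.ldif, then ldapmodify -f /tmp/charmm-sbsp.ldif -D "cn=Manager,dc=cluster,dc=loc" -W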
Monitoring
- Ganglia - http://cluster.earlham.edu/ganglia/?r=month&s=descending&c=
- Machine room environment -
New Hopper
- Currently known as Megamind
Disable graphical booting screen in CentOS
To enable verbose booting and remove the loading-bar graphic, simply remove rhgb quiet from the kernel line(s) in /boot/grub/grub.conf.
rhgb stands for Red Hat graphical boot; the quiet option tells CentOS to suppress even more boot information.
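That edit can be scripted with sed. This is a sketch that assumes rhgb and quiet appear as separate words on lines beginning with kernel, which is the GRUB legacy layout CentOS 6 uses. It keeps a grub.conf.bak copy of the original. Check the result before rebooting.

```shell
# strip_rhgb_quiet [GRUB_CONF]
# Removes the words rhgb and quiet from kernel lines; the original is kept as GRUB_CONF.bak.
strip_rhgb_quiet() {
    sed -i.bak -e '/^[[:space:]]*kernel[[:space:]]/{
s/ rhgb / /g
s/ quiet / /g
s/ rhgb$//
s/ quiet$//
}' "${1:-/boot/grub/grub.conf}"
}
```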
Rebuilding and Creating RAID 1 Arrays with mdadm
Creating Arrays
To create a mirrored array from two drives, sda and sdb, using the partitions sda1 and sdb1:
mdadm --create --verbose /dev/md0 --level=raid1 --raid-devices=2 /dev/sda1 /dev/sdb1
Now you can monitor the status of the building of the array with:
cat /proc/mdstat
Once finished, save your mdadm configuration with:
mdadm --verbose --detail --scan > /etc/mdadm.conf
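You may need to edit this file to remove unwanted lines, or to add an email address to MAILADDR so that you are notified if a drive failure occurs. Alerts are sent by mdadm running in --monitor mode (the mdmonitor service on CentOS). According to mdadm.conf(5), MAILADDR takes a single address and any additional addresses are silently ignored, so use a mail alias if several people need the alerts:
MAILADDR user1@dom1.com
On some systems, mdadm's configuration file is /etc/mdadm/mdadm.conf instead; it is very important to put the configuration in the correct location.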
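The save step can be combined with writing a single MAILADDR line. This is a sketch with an assumed path heuristic: it uses /etc/mdadm/mdadm.conf when an /etc/mdadm directory exists and /etc/mdadm.conf otherwise. That heuristic reflects the usual Debian versus Red Hat layouts and is not something mdadm guarantees, so confirm it for your system. The address admin@example.com is a placeholder. Like the redirect above, this overwrites any existing file.

```shell
# mdadm_conf_path [ROOT]: guess where this system keeps mdadm's configuration
mdadm_conf_path() {
    if [ -d "${1:-}/etc/mdadm" ]; then
        echo "${1:-}/etc/mdadm/mdadm.conf"
    else
        echo "${1:-}/etc/mdadm.conf"
    fi
}

# write_mdadm_conf FILE ADDRESS
# Writes a MAILADDR line followed by standard input (the scan output) to FILE.
write_mdadm_conf() {
    { printf 'MAILADDR %s\n' "$2"; cat; } > "$1"
}
```

Usage: mdadm --verbose --detail --scan | write_mdadm_conf "$(mdadm_conf_path)" admin@example.com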
Rebuilding Arrays
If a drive ever fails, or if the system is booted with a drive removed, you will need to add it back into the array.
Failed Drive
In this example, /dev/sda1 and /dev/sdb1 make up the RAID 1 array /dev/md0. Let us say that /dev/sdb fails.
Determining failed drive
Run cat /proc/mdstat:
[root@lo4 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      204736 blocks super 1.0 [2/1] [U_]

unused devices: <none>
When a drive fails or is missing, you will see an underscore in the array output ([U_] instead of [UU]). (F) will be displayed next to the failed drive (sdb1[1](F)).
If not, running lsblk or fdisk -l may help you determine which drive failed.
hdparm -I /dev/sda | grep "Serial Number"
will give you the serial number of /dev/sda, which may also help you identify the physical disk.
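Reading /proc/mdstat by eye is fine for one array. To script the same checks, for example from cron, the two awk-based helpers below are a sketch. They assume the /proc/mdstat layout shown above: an mdN line listing the members, followed by a line that ends with the [UU]-style status.

```shell
# degraded_arrays [MDSTAT]: print each md device whose status shows a missing member
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { dev = $1; next }
        dev != "" && $NF ~ /^\[[U_]+\]$/ {
            if ($NF ~ /_/) print dev
            dev = ""
        }' "${1:-/proc/mdstat}"
}

# failed_members [MDSTAT]: print "mdN: member" for each member marked (F)
failed_members() {
    awk '/^md[0-9]+ :/ {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\(F\)$/) {
                m = $i
                sub(/\[[0-9]+\]\(F\)$/, "", m)
                print $1 ": " m
            }
        }
    }' "${1:-/proc/mdstat}"
}
```

For the output shown above, degraded_arrays prints md0 and failed_members prints md0: sdb1.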
Remove Failed Drive
If a drive has failed, it should be removed from the mdadm array before being replaced.
mdadm --manage /dev/md0 --fail /dev/sdb1
[root@lo4 ~]# mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
Now, we can remove it from the array.
mdadm --manage /dev/md0 --remove /dev/sdb1
[root@lo4 ~]# mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1 from /dev/md0
Check /proc/mdstat. There should no longer be any (F), nor any listed drive besides sda1[0].
Power down the system.
shutdown -h now
Replace Drive
Now that everything is powered down, remove the failed HDD then replace it with the new one.
Once the drive is replaced, boot the system back up.
Add New Drive to Array
Recreate the partitioning scheme of /dev/sda on the new drive:
sfdisk -d /dev/sda | sfdisk /dev/sdb
Then verify with lsblk or fdisk -l, and add the new partition to the array:
mdadm --manage /dev/md0 --add /dev/sdb1
[root@lo4 ~]# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
Finally, check the status of the rebuilding with
cat /proc/mdstat
[root@lo4 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      204736 blocks super 1.0 [2/1] [U_]
      [===========>.........]  recovery = 57.7% (118400/204736) finish=0.0min speed=118400K/sec

unused devices: <none>
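The failed-drive procedure above can be condensed into two shell functions: retire_member runs before the shutdown and restore_member runs after the new disk is installed. This is a sketch, and the function names are made up. With DRY_RUN=1 the functions only print the commands they would run, which is a good way to double-check device names first. The sfdisk dump-and-restore assumes an MBR partition table, because older sfdisk releases such as the one in CentOS 6 do not handle GPT.

```shell
# run COMMAND...: execute it, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# retire_member ARRAY PARTITION: mark the partition failed, then remove it
retire_member() {
    run mdadm --manage "$1" --fail "$2" &&
        run mdadm --manage "$1" --remove "$2"
}

# restore_member ARRAY GOOD_DISK NEW_DISK PARTITION: copy the partition table
# from the surviving disk to the new disk, then add the partition to the array
restore_member() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sfdisk -d $2 | sfdisk $3"
    else
        sfdisk -d "$2" | sfdisk "$3" || return 1
    fi
    run mdadm --manage "$1" --add "$4"
}
```

Usage for this example: DRY_RUN=1 retire_member /dev/md0 /dev/sdb1 to preview, then retire_member /dev/md0 /dev/sdb1 and shutdown -h now. After the swap, run restore_member /dev/md0 /dev/sda /dev/sdb /dev/sdb1 and watch /proc/mdstat.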
Missing Drive
In this example, /dev/sda1 and /dev/sdb1 make up the RAID 1 array /dev/md0. Let us say that /dev/sdb1 is missing.
Use lsblk to examine the HDD partitions and their block sizes. Alternatively, you can use fdisk -l or any other utility you prefer.
Now check the status of mdadm with:
cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      204736 blocks super 1.0 [2/1] [U_]

unused devices: <none>
When a drive fails or is missing, you will see an underscore in the array output ([U_] instead of [UU]).
Use the output from lsblk and /proc/mdstat to match the drive present in the active mdadm array (/dev/md0) with the corresponding partition on the missing drive. For example, "match" /dev/sda1 with /dev/sdb1 (after verifying that their block sizes are the same).
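The size check in that matching step can be scripted. This sketch compares two partitions' sizes in bytes using lsblk's -b (bytes), -n (no header) and -o SIZE (size column only) options. same_size is a made-up helper name.

```shell
# same_size PART_A PART_B: succeed if both partitions report the same size in bytes
same_size() {
    a=$(lsblk -bno SIZE "$1" | tr -d ' ')
    b=$(lsblk -bno SIZE "$2" | tr -d ' ')
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage: same_size /dev/sda1 /dev/sdb1 && echo "sizes match"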
Now add /dev/sdb1 back into the array:
mdadm --manage /dev/md0 --add /dev/sdb1
[root@lo4 ~]# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
You can view the status of the rebuilding array with:
cat /proc/mdstat
[root@lo4 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      204736 blocks super 1.0 [2/1] [U_]
      [===========>.........]  recovery = 57.7% (118400/204736) finish=0.0min speed=118400K/sec

unused devices: <none>
References and Resources
- http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array
- http://ubuntuforums.org/showthread.php?t=1760217
- https://raid.wiki.kernel.org/index.php/RAID_setup#RAID-1
- http://unix.stackexchange.com/questions/80501/no-etc-mdadm-conf-in-centos-6
Torque PBS
Modifying pbs_server Configuration
- Back up the old qmgr/pbs_server configuration.
qmgr -c 'print server' > qmgr_pbs_server.backup
Note: you can simply list the qmgr/pbs_server configuration with qmgr -c 'p s'.
Modify qmgr Server Variable
An example of modifying a server variable for pbs_server with qmgr:
- Unset acl_hosts and reset it to only headnode.hostname.
$ qmgr
Qmgr: unset server acl_hosts
Qmgr: set server acl_hosts = headnode.hostname
Qmgr: quit
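You can confirm that a change like this took effect, or compare the current settings against the backup made earlier. The sketch below prints the values of one server attribute from qmgr -c 'print server' output or from a saved backup. It assumes print server writes lines of the form set server NAME = VALUE, with += for additional values, and it keeps only the first word of each value. To undo a change, feeding the backup back in with qmgr < qmgr_pbs_server.backup should replay the saved settings. Create lines for objects that already exist may report errors.

```shell
# server_attr FILE NAME: print each value of server attribute NAME found in
# qmgr "print server" output; use - as FILE to read standard input
server_attr() {
    awk -v name="$2" '$1 == "set" && $2 == "server" && $3 == name { print $5 }' "$1"
}
```

Usage: qmgr -c 'print server' | server_attr - acl_hosts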
Restarting pbs_server
Sometimes you need to restart pbs_server after making a change to it (or adding a new node).
The following shuts down pbs_server without killing running jobs, and then starts it again:
$ qterm -t quick
$ pbs_server
Running the Head Node as a Compute Node
Make sure the correct hostname (local to the compute nodes) is specified in /var/spool/torque/server_priv/nodes and /var/spool/torque/mom_priv/config.
When running pbs_mom on the head node, it may be necessary to specify the local hostname with:
pbs_mom -H headnode.hostname
If you are running pbs as a service, you may also need to modify the init script for pbs_mom.
Starting pbs_mom at boot (pbs as a Service)
Copy the pbs_mom init script into /etc/init.d/
To find where the pbs_mom init script is located, use the locate command:
$ locate /init.d/pbs_mom
/mounts/layout/software/torque/torque-4.1.0/contrib/init.d/pbs_mom
/mounts/layout/software/torque/torque-4.1.0/contrib/init.d/pbs_mom.in
cp /path/to/contrib/init.d/pbs_mom /etc/init.d/pbs_mom
Add to chkconfig
$ chkconfig --add pbs_mom
$ chkconfig pbs_mom on
Custom init script for pbs_mom
You can make a copy of /etc/init.d/pbs_mom called something like my_pbs_mom in order to specify your own pbs_mom flags, for example if you need to specify the local hostname with -H headnode.hostname. If you do this, issue the chkconfig commands above with my_pbs_mom instead of pbs_mom.
Note: init scripts should have permissions 0755.
Infiniband
Installing
Mellanox vs. RedHat Open Fabrics distributions (OFED)
You can either get the required Infiniband packages from the RHEL package manager, or directly from Mellanox.
When making your choice, keep in mind the following:
We had oddities with our IB network until we started using the Mellanox OFED. One of the joys of OFED as an industry standard is that every IB vendor has their own perversion of it. What makes it especially frustrating is that RHEL/CentOS ship their own OFED and disentangling them in an automated way can be challenging. Mellanox OFED will uninstall RHEL OFED during its installation, but woe be unto the one who tries to do a "yum upgrade" at some point in the future. -Skylar Thompson
We opted to install the latest Mellanox OFED, which removes any previous installation of RHEL OFED. After installation, we have to keep the Mellanox OFED "infiniband support" yum group separate, because a yum upgrade would otherwise overwrite its files with RHEL's packages.
Once the appropriate Mellanox OFED package for your CentOS release is downloaded, run the following commands:
tar -xvzf /path/to/MLNX.tgz
cd MLNX
cd MLNX_OFED_LINUX-2.4-1.0.0-rhel6.5-x86_64/
ls
./mlnxofedinstall [OPTIONS]
Then reboot for good measure.
IPoIB vs. native IB, or NFS / RDMA
IPoIB implements an IP network layer on top of Infiniband and presents the Host Channel Adapter (HCA) to the system as a Network Interface Card (NIC), e.g., ib0.
Using Infiniband natively, for example with NFS over RDMA, can potentially send messages (packets) with greater bandwidth and significantly less CPU involvement, provided you have RDMA-capable hardware.
Unreliable Datagram vs. Connected Mode
I have read that connected mode is comparable to using jumbo frames (thus favorable), but recently it seems datagram has become more stable and is preferred. In any case you can switch between modes at run-time with:
echo datagram > /sys/class/net/ibX/mode
echo connected > /sys/class/net/ibX/mode
Set up IPoIB
Installing the Mellanox Infiniband drivers with the --all flag should already configure much of IPoIB. There is a network configuration file in /etc/sysconfig/network-scripts/ifcfg-ib0.
You can configure IPoIB to use its own static IP address, or to derive its configuration from an existing Ethernet interface.
Here are example ifcfg-ib<n> settings taken from the Mellanox OFED user manual (see References). They show three alternative ways to configure ib0; use only one.
# Static settings; all values provided by this file
IPADDR_ib0=11.4.3.175
NETMASK_ib0=255.255.0.0
NETWORK_ib0=11.4.0.0
BROADCAST_ib0=11.4.255.255
ONBOOT_ib0=1

# Based on eth0; each '*' will be replaced with a corresponding octet
# from eth0.
LAN_INTERFACE_ib0=eth0
IPADDR_ib0=11.4.'*'.'*'
NETMASK_ib0=255.255.0.0
NETWORK_ib0=11.4.0.0
BROADCAST_ib0=11.4.255.255
ONBOOT_ib0=1

# Based on the first eth<n> interface that is found (for n=0,1,...);
# each '*' will be replaced with a corresponding octet from eth<n>.
LAN_INTERFACE_ib0=
IPADDR_ib0=11.4.'*'.'*'
NETMASK_ib0=255.255.0.0
NETWORK_ib0=11.4.0.0
BROADCAST_ib0=11.4.255.255
ONBOOT_ib0=1
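If you prefer the standard Red Hat network-scripts format to the Mellanox-style variables above, a file like the one below is a reasonable starting point. This is a sketch. The keys DEVICE, TYPE=InfiniBand, ONBOOT, BOOTPROTO, IPADDR, NETMASK and CONNECTED_MODE are those read by the RHEL/CentOS network scripts. The address reuses the static example above, and NETWORK_SCRIPTS is an illustrative override for testing. CONNECTED_MODE=yes makes the connected-mode choice from the previous section persist across reboots.

```shell
# write_ifcfg_ib IFACE IPADDR NETMASK [CONNECTED_MODE]
# Writes a Red Hat style ifcfg file for an IPoIB interface (CONNECTED_MODE defaults to no).
write_ifcfg_ib() {
    dir=${NETWORK_SCRIPTS:-/etc/sysconfig/network-scripts}
    cat > "$dir/ifcfg-$1" <<EOF
DEVICE=$1
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=none
IPADDR=$2
NETMASK=$3
CONNECTED_MODE=${4:-no}
EOF
}
```

Usage: write_ifcfg_ib ib0 11.4.3.175 255.255.0.0 yes, then ifup ib0.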
Subnet Manager (OpenSM)
OpenSM setup
If your Infiniband switch does not support running a subnet manager in hardware, you will need to set up OpenSM to run on the head node.
Upon installation, the OpenSM daemon's init script will be found at /etc/init.d/opensmd. To streamline things, you can enable tab completion of this and any other init script name for the service command with:
complete -W "$(ls /etc/init.d/)" service
Next, attempt to start OpenSM with:
service opensmd start
Make sure that opensmd is set to start on boot:
chkconfig --list opensmd
If it is not on for your runlevels, enable it with chkconfig opensmd on.
Troubleshooting
Should the service fail to start for any reason, use lsmod | grep ^ib to check which Infiniband modules are loaded. Here is an example of what you should see:
ib_ucm                 12120  0
ib_ipoib              122881  0
ib_cm                  42214  3 ib_ucm,rdma_cm,ib_ipoib
ib_uverbs              61976  2 rdma_ucm,ib_ucm
ib_umad                12562  0
ib_sa                  35753  5 rdma_ucm,rdma_cm,ib_ipoib,ib_cm,mlx4_ib
ib_mad                 43632  4 ib_cm,ib_umad,mlx4_ib,ib_sa
ib_core               117605  12 rdma_ucm,ib_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_uverbs,ib_umad,mlx5_ib,mlx4_ib,ib_sa,ib_mad
ib_addr                 7796  3 rdma_cm,ib_uverbs,ib_core
I found that the ib_umad module is directly related to OpenSM. If it or any other needed module isn't loaded, you will need to add it to the /etc/rc.modules file:
echo modprobe <module name> >> /etc/rc.modules
and then make the file executable:
chmod +x /etc/rc.modules
Example:
echo modprobe ib_umad >> /etc/rc.modules
chmod +x /etc/rc.modules
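These checks can be scripted. The sketch below lists required modules missing from lsmod output, and it appends modprobe lines to /etc/rc.modules only once, so rerunning it does not pile up duplicates. The list in REQUIRED_IB_MODULES is an assumption taken from the example lsmod output above; adjust it to your stack.

```shell
# Modules this sketch treats as required; adjust to your stack.
REQUIRED_IB_MODULES=${REQUIRED_IB_MODULES:-"ib_core ib_mad ib_sa ib_umad ib_cm ib_ipoib"}

# missing_ib_modules: read lsmod output (with its header line) on stdin and
# print each required module that is not loaded
missing_ib_modules() {
    loaded=$(awk 'NR > 1 { print $1 }')
    for m in $REQUIRED_IB_MODULES; do
        printf '%s\n' "$loaded" | grep -qx "$m" || echo "$m"
    done
}

# add_to_rc_modules MODULE [FILE]: append "modprobe MODULE" once and keep FILE executable
add_to_rc_modules() {
    f=${2:-/etc/rc.modules}
    grep -qx "modprobe $1" "$f" 2>/dev/null || echo "modprobe $1" >> "$f"
    chmod +x "$f"
}
```

Usage: lsmod | missing_ib_modules | while read -r m; do add_to_rc_modules "$m"; done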
Subnet Manager Failover
Setting up failover for opensm isn't challenging, but it is good to document which nodes are the subnet managers as the behavior of the network will be strange without any of the managers running. We discovered that with our GPFS cluster when we accidentally rebooted both managers at the same time - no nodes could join the network, including the subnet managers, until we took some manual action. -Skylar Thompson
Failover is necessary when running the subnet manager (SM) on your Infiniband machines rather than on a switch. Failover is a configuration that ensures that if one machine goes down, an SM is still running on another machine. With Infiniband, an SM must be active; otherwise, the machines will not be able to communicate with each other.
Switch Configuration
Initial Setup
Certain Infiniband switches can run a subnet manager. This is ideal, and in this situation failover is not necessary. To configure our switch, the Mellanox SX6018, connect its console port to the serial port of one of the Infiniband machines. Next, install and run the serial terminal program minicom, and log in with username admin and password admin. Go through the configuration wizard (the defaults are fine). We did not enable IPv6.
Installing minicom
yum install minicom
Then set the serial port to /dev/ttyS0 in minicom's setup menu:
minicom -s
Running the switch setup wizard
Run minicom and log in. Then run the following commands, typing only what follows the > or # prompt.
switch > enable
switch # configure terminal
switch (config) # jump-start
From the Mellanox switch manual:
Before attempting a remote (for example, SSH) connection to the switch, check the mgmt0 interface configuration. Specifically, verify the existence of an IP address. To check the current mgmt0 configuration, enter the following commands:
Note that the commands start after the > or #.
switch > enable
switch # configure terminal
switch (config) # show interfaces mgmt0
Enabling / Running the Subnet Manager
You can enable, manage, configure, and run the subnet manager (along with many other things) through the Mellanox switch's web management interface. However, if you don't want to bother getting that working, you can simply enable the subnet manager straight from a logged-in minicom switch prompt.
Again, the commands start after the > and #.
switch > enable
switch # configure terminal
switch (config) # ib sm
OpenMPI testing
We tested OpenMPI using a prime number generator found in /cluster/home/charliep/cvs-hopper/primes/. We ran primes_batch with mpirun, specifying the desired number of machines with a machinefile/hostfile.
Creating a Machine / Hosts File
A machinefile, or hostfile, lists the nodes for mpirun to use. You should be able to make an appropriate file and then run the program on the machines connected with Infiniband:
make primes_batch
mpirun -np 4 --hostfile hostfile ./primes_batch
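With Open MPI, each line of a hostfile names a node, optionally followed by slots=N to say how many processes that node may run. The small generator below is a sketch that assumes every node gets the same slot count. as0 appears earlier on this page; as1 and the slot count are hypothetical.

```shell
# make_hostfile SLOTS NODE...: print an Open MPI hostfile giving every NODE SLOTS slots
make_hostfile() {
    slots=$1
    shift
    for node in "$@"; do
        printf '%s slots=%s\n' "$node" "$slots"
    done
}
```

Usage: make_hostfile 2 as0 as1 > hostfile, then mpirun -np 4 --hostfile hostfile ./primes_batch runs two processes on each node.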
General testing
The OFED comes with loads of testing programs, including:
- ibping
- ibdiagnet
- ibstatus
- ibstat
Testing with CHARMM
References
- http://www.mellanox.com/page/products_dyn?product_family=26
- http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v2.2-1.0.1.pdf
- http://www.shocksolution.com/2012/12/installing-and-configuring-infiniband-on-a-red-hat-system/
- https://access.redhat.com/solutions/301643
- https://niktips.wordpress.com/2011/02/02/activating-infiniband-stack-in-linux/
- https://software.intel.com/en-us/articles/understanding-the-infiniband-subnet-manager/
- https://docs.oracle.com/cd/E19802-01/820-2189-10/ib-nem-sw-overview.html
- http://people.redhat.com/dledford/infiniband_get_started.html
- http://pkg-ofed.alioth.debian.org/howto/infiniband-howto.html
- https://www.kernel.org/doc/Documentation/infiniband/ipoib.txt
- http://www.mellanox.com/pdf/whitepapers/InfiniBandFAQ_FQ_100.pdf
- http://www.mcs.anl.gov/~balaji/pubs/2010/ispass/ispass10.ipoib.pdf
- http://www.bctes.com/nat-linux-iptables.html
- http://www.mellanox.com/page/products_dyn?product_family=150&mtag=sx6015_sx6018
- https://thegeekinthecorner.wordpress.com/category/infiniband-verbs-rdma/
- http://www.mellanox.com/related-docs/user_manuals/SX60XX_User_Manual.pdf
- http://www.cyberciti.biz/tips/connect-soekris-single-board-computer-using-minicom.html
- http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/linux/bks/SGI_Admin/books/ICEX_Admin_Guide/sgi_html/ch04.html#Z1226348317tls
- https://www.kernel.org/doc/Documentation/filesystems/nfs/nfs-rdma.txt
Installing CHARMM
Load latest openMPI
module load modules
module load gcc/4.9.0
module load openmpi
Install libquadmath.so.0
./install.com gnu M
Clean (if needed)
./install.com gnu M distclean