From Earlham Cluster Department

Revision as of 17:16, 15 January 2017 by Anschwa (Talk | contribs)
Jump to: navigation, search


Current To Do

Cluster Pages

New Software

For a lot of the software we install on the clusters, we install them as modules. Environment modules are an easy way of installing multiple versions of software and allowing users to trivially change their environment variables path (PATH, LD_LIBRARY_PATH, C_INCLUDE_PATH, etc. ) to point to which version they want to use. If I want to use gcc version 5.1.0 instead of the system default 4.4.7, all I would have to type is:

    module load gcc/5.1.0  

If gcc 4.4.7 were also a module and I wanted to swap them to make sure I'm using gcc 5.1.0, all I would type is:

    module swap gcc/4.4.7 gcc/5.1.0 

Before installing the software, figure-out if this should be a yum package or a source kit, installed into the system space or modules, and what library dependencies it has. Proceed as appropriate.

Building software packages

Needs to be Updated

Installing a yum package into Modules structure

Enabling built software packages within Modules structure

On the head node all of the clusters, modules are installed into /mounts/machine/software where machine is the name of the actual machine. This directory is visible to all of the nodes of the machine. In that directory, there should be subdirectories of the different modules that are available, and within those each version has it's own directory (as of July 15 2015, the modules setup and organization is messy on many of the cluster, the notes here are how it should be set up from here on out).

So, for example, if we have gcc versions 5.1.0, 4.7.1, and 4.9.0 installed on layout, it would look like this:

   $ ls -1 /mounts/fatboy/software/gcc

If you think your new package is important enough to be loaded by default, then add it to the list in /mounts/al-salam/software/Modules/3.2.7/init/al-salam.{sh,csh}

DNS/DCHP for a single host

Find an IP that's not in use. Easiest way to do that is look in this file /var/named/master/ Add name and IP like the pattern in the file, like below. At the top of the file, be sure to change the serial number at the top to represent the year, month, day, and version.	IN	A 

Save the file. Every time you add an entry to the zone file, you have to edit the reverse zone file. The reverse zone file is /var/named/master/ Add an entry for the host you added in the zone file. Notice the first number there is the last octet of the IP that you gave the host.

    126	IN	PTR 

Next you'll want to stop DNS and then start DNS with the following command.

   service named stop
   service named start

Now that DNS is updated, we have to update DHCP. The file you want to edit is /etc/dhcp/dhcpd.conf. Towards the bottom of the file you'll add

    host <hostname> { hardware ethernet <MACaddress> fixed-address <hostname>; .

Save the file. Just like we did for the DNS config file, we need to stop and the start DHCP.

   service dhcpd stop 
   service dhcpd start

As a test, reboot the client.

Setting up LDAP

When installing and configuring ldap, it can be tedious and frustrating, but no worries! I went through the troubles and took notes as I went so no one else would have to suffer like I did! These notes are pretty detailed, but I would suggest using one of the other servers with a newer centos version (layout, fatboy) as a resource when installing and configuring, especially if you are configuring it for a cluster.

Packages that need to be installed (both head and compute nodes):

We use NSS and NSLCD in conjunction with PAM for ldap authentication. It may be older than SSSD, but we already know how to do it. So, we want to turn off SSSD. If sssd is not running, then great, that'll make your life a lot easier!

   service stop sssd
   chkconfig sssd off #so it doesn't restart if the machines reboots
   chkconfig --del sssd #delete it as a service because we don't want it

There are a lot of files that need to be modified in order for ldap to work correctly.

   URI ldap://
   BASE dc=cluster, dc=loc
   TLS_CACERTDIR /etc/openldap/cacerts
    passwd:  ldap files
    group:  ldap files
    shadow: ldap files

    ethers:     files
    netmasks:   files
    networks:   files
    protocols:  files
    rpc:        files
    services:   files ldap

    netgroup:   ldap files

    publickey:  nisplus

    automount:  files ldap
    aliases:    files

    sudoers:    ldap files
   rootbinddn cn=Manager,dc=cluster,dc=loc
   nss_base_passwd ou=people,dc=cluster,dc=loc?one
   nss_base_shadow ou=people,dc=cluster,dc=loc?one
   nss_base_group ou=group,dc=cluster,dc=loc?one
   nss_map_objectclass posixAccount User
   nss_map_objectclass shadowAccount User
   nss_map_objectclass posixGroup Group
   nss_map_attribute uid userName
   nss_map_attribute gidNumber gid
   nss_map_attribute uidNumber uid
   nss_map_attribute cn groupName
   base dc=cluster,dc=loc
   pam_password crypt
   uri ldap://
   ssl no
   tls_cacertdir /etc/openldap/cacert
   uri ldap://
   instead of pam_sss.o, it should be
   base   group  ou=group,dc=cluster,dc=loc
   base   passwd ou=people,dc=cluster,dc=loc
   base   shadow ou=people,dc=cluster,dc=loc
   uid nslcd
   gid ldap
   uri ldap://
   base dc=cluster, dc=loc
   ssl no
   tls_cacertdir /etc/openldap/cacerts
   UsePAM yes

Since we deleted sssd, we need to start the alternative, and make sure it starts on boot up.

   service nslcd start
   chkconfig nslcd on

Check to make sure nscd is turned off. That is a caching service for ldap. Since we're so small here, we don't really need that.

   service nscd stop
   chkconfig nscd off

Users and Groups

Users are authenticated using an LDAP (Lightweight Directory Access Protocol) server running on Hopper. This is how users are authenticated throughout the entire cluster realm. We use LDAP for all users and groups except for ccg-admin user, root user, and the wheel group. Those users and that group are local to each cluster. Every user is apart of the users LDAP group, which is group number 115, and all clusters should look at ldap first and then files. This is specified in the /etc/nsswitch.conf file.

A user can change their password by the passwd command while on Hopper. It will prompt them for their current password, and then their desired new password. If it's successful it will tell you that at the end, with something like 'All LDAP tokens successfully changed.' or something close to that.

Creating New Users

Because creating users in LDAP is somewhat confusing, sample files and a python script were written to help. The script is and lives in ~root/ldap-files/ on hopper. I'll explain things on here, but there's a README file in that directory that will explain everything as well.

To create a user in LDAP, you must create an .ldif file for that user. This is what does for you. takes a file of new users as a command line argument. The file must specify First Name:username:email for each user, and each user should be on a separate line. The file add-user.ldif is an example of what the file should look like.

sudo python add-user.ldif will create an .ldif file for each user, and use ldapadd to add them to the LDAP database. The contents of the .ldif file for each user added will be printed to the screen, and each user will be sent an email with their username and password. is set to clean up after itself, so you don't need to worry about that. There's one more thing that has to be done after this step. We need to setup the ssh keys for each user. For each user created:

Become that user:

    su - user 

SSH to as0:

    ssh as0  

It will ask you for the password of the user. Then it will prompt you for information about where to save the public key file and for a passphrase. For all of these, just hit enter. That will set it to the default. Go back to hopper and do the same thing for all the new users.

It is VERY important that you use UID and GID numbers that have not already been taken. If new users and groups have been being added correctly, then there shouldn't be a problem with that. maxuid is a file that specifies the next UID to use when creating a new user. reads from that file when creating the .ldif files for each user, and at the end overwrites that file with the new naxuid. If you're nervous about the UID numbers, it is ok to double check. Doing ldapsearch -x should output everything in the database with the latest entry at the bottom. Look for the UID in that entry and compare it with maxuid. If maxuid is one above that number then all is golden. It's also safe to look in /etc/passwd to make sure no one is using that number either.

Other modifications to the DB

Other modifications to the database, like adding a new group, adding users to that group, deleting users, all use .ldif files similar to adding users. In the same directory as the above files, there are sample .ldif files that do these operations. Each file's name should be what it does. add-group.ldif will add a group, using ldapadd command. add-user-to-group.dif can be used to add a user to a group, and del-from-grp.ldif can delete users from groups, and chg-pw show an example of changing the password of a user. All three using the ldapmodify command. Make sure you modify the files to what you need, especially don't forget to change the gid if you add a group. Make sure it's a GID that's not already in use.

The command for modifying the database, if i was adding a user to a group. You'll need to specify the Manager password to the end of this command. It has been redacted here but can be found in the README file, only with root privileges.

    ldapmodify -f add-user-to-group.ldif -D "cn=Manager,dc=cluster,dc=loc" 

If you're adding a group, change ldapmodify to ldapadd. The cn=Manager stuff is just specifying the manager of the database, which will allow you to change it.

To delete a user from the ldap database, it's a little simpler. Use the command below. Again, the Manager password must be specified and has been redacted here, but it will be the same as the password used in the other commands, and can be found in the README file. You will need to change the uid to be equal to the uid of the person you want to delete. The uid here is just the person's username.

    ldapdelete "uid=sbsp,ou=people,dc=cluster,dc=loc" -D "cn=Manager,dc=cluster,dc=loc" 


Disable graphical booting screen in CentOS

To enable verbose booting and remove the loading bar graphic, simply remove rhgb quiet from the file /boot/grub/grub.conf.

rhgb stands for redhat graphical boot, the quiet option tells CentOS to suppress even more boot information.

Rebuilding and Creating RAID arrays with mdadm

mdadm RAID Documentation

Torque PBS

Modifying pbs_server Configuration

  qmgr -c 'print server' > qmgr_pbs_server.backup

Note: you can simply list the qmgr pbs_server configuration with: qmgr -c 'p s'?.

Modify qmgr Server Variable

An example for modifying a server variable for pbs_server with qmgr.

  $ qmgr
  $ unset server acl_hosts
  $ set server acl_hosts = headnode.hostname

Restarting pbs_server

Sometimes you need to make a change to the pbs_server (or add a new node).

The following shuts down pbs_server without killing jobs.

  $ qterm -t quick
  $ pbs_server

Running the Head Node as a Compute Node

Make sure the correct hostname (local to the compute nodes) is specified in /var/spool/torque/server_priv/nodes and /var/spool/torque/mom_priv/config.

When running pbs_mom from the head node, it may be necessary to specify the local hostname with:

  pbs_mom -H headnode.hostname

If you are running pbs as a service, you may also need to modify the init script for pbs_mom.

Starting pbs_mom at boot (pbs as a Service)

Copy pbs_mom init script into /etc/init.d/

To find where the pbs_mom init script is located, use the locate command.

$ locate /init.d/pbs_mom

  cp /path/to/contrib/init.d/pbs_mom /etc/init.d/pbs_mom

Add to chkconfig

  $ chkconfig --add pbs_mom
  $ chkconfig pbs_mom on

Custom init script for pbs_mom

You can make a copy of /etc/init.d/pbs_mom called something like my_pbs_mom in order to specify your own pbs_mom flags. For example if you need to specify the local hostname with -H headnode.hostname. If you do this, the above chkconfig commands should be issued with my_pbs_mom instead of pbs_mom.

Note: init scripts should have permissions 0755.


Infiniband Documentation

Installing CHARMM

Load latest openMPI

module load modules
module load gcc/4.9.0
module load openmpi


./ gnu M

Clean (if needed)

  ./ gnu M distclean

Bacula Backup Management


Fluorite (Machine)

The jail quartz is the CS bacula director which lives on the machine fluorite,

Location of configuration file on BSD (i.e. quartz) can be found here: /usr/local/etc/bacula/bacula-dir.conf/ and /usr/local/etc/bacula/bacula-fd.conf/. Each bacula client has its own bacula-fd.conf configuration file that points back to quartz.

Helpful Commands for working with jails

Bacula Commands (on Quartz)

Personal tools
this semester