Ccg-admin
From Earlham Cluster Department
Current To Do
- Space-a.cluster.earlham.edu
- ldap [KM]
- make/export volume [AR]
- sudo [KM]
- Fatboy.cluster.earlham.edu
- sudo [KM]
- figure out groups problem [KM]
- anaconda python [AR]
- Layout.cluster.earlham.edu
- sudo on all nodes [KM]
- change passwords on all nodes (root, exxact) [KM]
- michael's list, including anaconda python (https://store.continuum.io/cshop/anaconda), on all nodes [KM]
- qsub install (torque) on all nodes [KM]
- Install WebMo [KM]
- Put CD in fatboy and get stuff off of it to copy to layout. Install Gaussian, then Torque, then WebMo in /mounts/software
- Al-salam.cluster.earlham.edu
- anaconda python on all nodes
- sudo on all nodes [KM]
- update WebMo on Al-Salam (after layout install/test) [KM]
- All machines
- Lock down sshd's configuration (on head node only) to disallow remote root login [KM]
- Develop a global file system naming structure and implement it [KM,AR,CP]
- ntp client config (using proto) on hopper, layout*, al-salam*, fatboy, bigfe [AR]
- General
- Low latency network - fiber or copper based 10GbE [IB]
- Rack consolidation -
- Permission masks for places where N people will be working, layout, fatboy, shared filesystems (/cluster, etc.) [AR]
- Ganglia or equivalent setup, consider MRTG too?
This list should be annotated with the initials of who is working on each item.
Cluster Pages
Installing Software
- Download the source tarball into /root/install
- Unpack
- Make a <package>-<version>.config.sh script that runs ./configure with all your options (so that it's kept around in case we need to reinstall).
- To configure, give --prefix=/mounts/al-salam/software/<package>-<version>
- Run your config.sh and continue building/installing as normal
- Create a soft link from /mounts/al-salam/software/<package> to /mounts/al-salam/software/<package>-<version>
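The steps above can be sketched as two small shell helpers. This is only an illustration, not an existing script on the cluster: the function names write_config_script and link_package and the SOFTWARE_ROOT variable are hypothetical, and the default path is simply the one used on this page.

```shell
#!/bin/sh
# Sketch only: helper names and SOFTWARE_ROOT are assumptions, not cluster tools.
# SOFTWARE_ROOT defaults to the install path used on this page.
SOFTWARE_ROOT="${SOFTWARE_ROOT:-/mounts/al-salam/software}"

# write_config_script PACKAGE VERSION OUTDIR [extra configure options...]
# Writes OUTDIR/<package>-<version>.config.sh so the configure options
# are kept around for a later reinstall, and prints the script's path.
write_config_script() {
    pkg="$1"; ver="$2"; outdir="$3"
    shift 3
    script="$outdir/$pkg-$ver.config.sh"
    {
        echo '#!/bin/sh'
        echo "# configure options for $pkg $ver, kept for reinstalls"
        echo "./configure --prefix=$SOFTWARE_ROOT/$pkg-$ver $*"
    } > "$script"
    chmod +x "$script"
    echo "$script"
}

# link_package PACKAGE VERSION
# Points <package> at <package>-<version> (replaces an existing link).
link_package() {
    ln -sfn "$SOFTWARE_ROOT/$1-$2" "$SOFTWARE_ROOT/$1"
}
```

A typical use would be to call write_config_script from the unpacked source directory under /root/install, run the generated script there, build and install as normal, and call link_package last.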
Enabling a package within Modules
- sudo su -
- cd /mounts/al-salam/software/Modules/3.2.7/modulefiles
- ls and look for another package that has a similar usage model as the package you're installing (e.g., Python module, C/C++ library, utility, library+utilities)
- Copy that to your new package, e.g., cp -r openmpi <software>
- cd <software>; ls and note the filename that appears.
- Rename that file to your package's <version>, e.g., mv <filename> <version>
- Edit <version>
- Change references to the package you copied to the new one you're installing, including the version number, path, variable names, etc.
- Check modulefile(4) for keywords, etc., within the module file.
- Usually, that's all you need to do. Verify that it shows up in module avail and that module load <software> doesn't throw any errors and allows your package to work.
If you think your new package is important enough to be loaded by default, then add it to the list in /mounts/al-salam/software/Modules/3.2.7/init/al-salam.{sh,csh}
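The copy-and-rename part of this procedure could be scripted roughly as follows. This is a sketch under assumptions: clone_modulefile and the MODULEFILES variable are hypothetical names, and it assumes the template directory contains exactly one version file. Editing the new file still has to be done by hand.

```shell
#!/bin/sh
# Sketch only: clone_modulefile and MODULEFILES are assumptions, not cluster tools.
# MODULEFILES defaults to the modulefiles directory named on this page.
MODULEFILES="${MODULEFILES:-/mounts/al-salam/software/Modules/3.2.7/modulefiles}"

# clone_modulefile TEMPLATE SOFTWARE VERSION
# Copies an existing package's modulefile directory to a new package and
# renames the copied file to the new version. Prints the new file's path.
clone_modulefile() {
    template="$1"; software="$2"; version="$3"
    # Refuse to overwrite an existing package directory.
    if [ -e "$MODULEFILES/$software" ]; then
        echo "clone_modulefile: $software already exists" >&2
        return 1
    fi
    cp -r "$MODULEFILES/$template" "$MODULEFILES/$software" || return 1
    # Assumes the template directory holds exactly one version file.
    old=$(ls "$MODULEFILES/$software" | head -n 1)
    if [ "$old" != "$version" ]; then
        mv "$MODULEFILES/$software/$old" "$MODULEFILES/$software/$version"
    fi
    echo "$MODULEFILES/$software/$version"
}
```

After running something like clone_modulefile openmpi <software> <version> as root, edit the printed file to replace the old package's name, version, paths, and variable names, then check module avail and module load <software>.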
DNS/DHCP for a single host
- 1) Find an IP that's not in use. The easiest way to do that is to look in the zone file, /var/named/etc/namedb/master/cluster.zone.
- 2) Add the name and IP, following the pattern in the file.
- 3) Update the serial number at the top of the file so that it represents the year, month, day, and version.
- 4) Save the file. Every time you add an entry to the zone file, you also have to edit the reverse zone file, /var/named/etc/namedb/master/159.28.234.zone.
- 5) Add an entry to the reverse zone file for the host you added to the zone file.
- 6) Restart DNS: run service named stop and then service named start.
- 7) Now that DNS is updated, we have to update DHCP.
- 8) The file you want to edit is /usr/local/etc/dhcpd.conf. Towards the bottom of the file, add host <hostname> { hardware ethernet <MACaddress>; fixed-address <hostname>.cluster.earlham.edu; }
- 9) Save the file. Just as we did for DNS, we need to stop and then start DHCP. To stop it, use /usr/local/etc/rc.d/isc-dhcpd stop. Then start it with /usr/local/etc/rc.d/isc-dhcpd start.
- 10) As a test, reboot the client.
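Two pieces of this procedure are easy to get wrong by hand: bumping the zone serial and typing the dhcpd.conf host entry. The helpers below are a sketch under assumptions. The names next_serial and dhcp_host_entry are hypothetical, and the serial layout YYYYMMDDNN (two-digit version, starting at 01 each day) is an assumed reading of "year, month, day, and version"; check the existing serial in cluster.zone before relying on it.

```shell
#!/bin/sh
# Sketch only: both helper names and the YYYYMMDDNN layout are assumptions.

# next_serial CURRENT [TODAY]
# Prints the next zone serial. TODAY is YYYYMMDD and defaults to today's date.
# Does not handle more than 99 changes in one day.
next_serial() {
    current="$1"; today="${2:-$(date +%Y%m%d)}"
    case "$current" in
        "$today"??)
            # Same day: bump the two-digit version.
            ver="${current#"$today"}"
            ver="${ver#0}"   # drop a leading zero so 08/09 are not read as octal
            printf '%s%02d\n' "$today" $((ver + 1))
            ;;
        *)
            # New day: start over at version 01.
            printf '%s01\n' "$today"
            ;;
    esac
}

# dhcp_host_entry HOSTNAME MACADDRESS
# Prints a host entry in the form described above, ready to paste
# near the bottom of dhcpd.conf.
dhcp_host_entry() {
    printf 'host %s {\n    hardware ethernet %s;\n    fixed-address %s.cluster.earlham.edu;\n}\n' "$1" "$2" "$1"
}
```

For example, next_serial 2014031201 20140312 yields 2014031202, while the same serial on the following day yields 2014031301.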
Users and Groups
Users are authenticated based on an LDAP server running on Hopper. cpu is installed on Hopper as an LDAP-user management tool. You should use it to view/edit/create users unless you're super comfortable with ldapmodify and LDIF. Passwords can be changed easily with the ldpasswd command on Hopper. It can be used both by users to change their own password and root to change another user's password.
Groups are also in LDAP. Check the tail end of the result of cpu cat for group info.
man cpu-ldap will tell you all about using cpu for user/group management. For the most part, its format is pretty similar to pw, but there are some minor differences. Read the man page.
Monitoring
- Ganglia - http://cluster.earlham.edu/ganglia/?r=month&s=descending&c=
- Machine room environment -
New Hopper
- Currently known as Megamind