Cluster Information


Pre-Tapia

SC12 Buildout

To Do Saturday in SLC

  1. Test BCCD release candidate, make a release
  2. Wiki shuffle for website move
  3. Plan Buildout - include shutdown, startup, run loops
  4. Update assembly instructions - harvest the pre-assembled frames section, change the network switch mounting screws, note which unit is the head node (no fan)
  5. Review assembly video
  6. Make 20 BCCD USB drives
  7. Scout buildout room
  8. Pick up the key for 258 from the Committee Office
  9. Changes to poster (FedEx Office Print & Ship Center, Salt Lake City, UT; open until 9 pm)
  10. Print 15 sets of instructions
  11. Update flyer and print 100

To Do Sunday in SLC

  1. Put stickers on
  2. Pre-assemble frame rails to plates, ease the ventilation holes in the center plate, enlarge the switch hole on the 2-hole plates
  3. Check inventory sheets and contents for each kit
  4. Install RAM in each main board
  5. Spot-check one main board per kit: has the BIOS been upgraded?
  6. Put serial numbers on every unit (this batch and previous) that we see, e.g. v4-# and v4a-#

Fall, 2011

26 October

21 September

LittleFe v4

Before Shipping Date for SC11

Before SC11

Before OK/PUPR:

Saturday July 30th:

Take to OK/PUPR:


MicroFe:

LF v4:

LF v3:

BW Internship

BobSCEd:

Mobeen:

Events

BW Undergraduate Petascale Institute @ NCSA - 29 May through 11 June

Introduction to Parallel Programming and Cluster Computing workshop @ UW-ISU - 25 June through 1 July

Intermediate Parallel Programming and Distributed Computing workshop @ OU-PUPR - 29 July through 6 August

Projects

Logs

Personal Schedules

Al-Salam/CCG Downtime tasks

How to use the PetaKit

  1. If you do not already have it, obtain the PetaKit source from the CVS repository on hopper (curriculum-modules/PetaKit).
  2. cd to the Subkits directory of PetaKit and run area-subkit.sh to build an area subkit tarball, or GalaxSee-subkit.sh to build a GalaxSee subkit tarball.
  3. scp the tarball to the target resource and unpack it.
  4. cd into the directory and run ./configure --with-mpi --with-openmp
  5. Use stat.pl and args_man to make an appropriate statistics run. See args_man for a description of predicates. Example:
perl -w stat.pl --program area --style serial,mpi,openmp,hybrid --scheduler lsf \
  --user leemasa --problem_size 200000000000 --processes 1,2,3,4,5,6,7,8-16-64 \
  --repetitions 10 -m -tag Sooner-strongest-newest --mpirun mpirun.lsf --ppn 8
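Put together, steps 2 through 4 look roughly like this (a sketch only: the tarball name and target host are assumptions, not recorded on this page):

cd PetaKit/Subkits
sh area-subkit.sh                        # builds the area subkit tarball (name assumed below)
scp area-subkit.tar.gz you@target.example.edu:
ssh you@target.example.edu
tar xzf area-subkit.tar.gz
cd area-subkit
./configure --with-mpi --with-openmp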

Modifying Programs for Use with PetaKit
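As a generic illustration only (not PetaKit's actual reporting protocol), a program is typically modified to time its core computation and print one machine-readable line that a harness such as stat.pl can collect:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    double t0 = MPI_Wtime();

    /* ... problem setup and compute phase go here ... */

    double t1 = MPI_Wtime();
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)  /* one line of stats output; this format is hypothetical */
        printf("program=area style=mpi walltime=%f\n", t1 - t0);
    MPI_Finalize();
    return 0;
}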

Old To Do

Dates represent the last meeting at which each item was discussed.

 * Nice new functionality, see 
 * Waiting on clean data to finish multiple resource displays
 * Error bars for left and right y-axes with checkboxes for each

In the first box: the initials of whoever is doing the run.

In the second box:
 * B = builds
 * R = runs
 * D = reports back to the database
 * S = there is a good set of strong-scaling runs (10 per data point) in the database that appear on a graph
 * W = there is a good set of weak-scaling runs (10 per data point) in the database that appear on a graph

                  area under curve                GalaxSee
                  Serial  MPI    OpenMP  Hybrid   Serial  MPI  OpenMP  Hybrid
ACLs              Sam     Sam    Sam     Sam      AW      AW   AW      AW
BobSCEd
BigRed            Sam     Sam    Sam     Sam
Sooner            Sam     Sam    Sam     Sam
pople             AW/CP   AW/CP  AW/CP   AW/CP

Problem-sizes


 * wiki page
 * Decommission Cairo
 * Figure out how to mount on Telco Rack
 * Get pdfs of all materials -- post them on wiki
 * Get Fitz's liberation instructions into wiki
 * Get Kevin's VirtualBox instructions into wiki
 * PXE booting -- check whether the nodes booted, whether you can ssh to them, and whether the run matrix works
 * Send /etc/bccd-revision with each email
 * Send the output of netstat -rn and /sbin/ifconfig -a with each email (see the sketch after this list)
 * Run Matrix
 * For the future:  scripts to boot & change bios, watchdog timer, 'test' mode in bccd, send emails about errors
 * USB scripts -- we don't need the "copy" script
 * Leaving 8:00 Wednesday
 * Brad, Sam, or Gus pick up the van around 7, bring it by loading dock outside Noyes
 * Posters -- new area runs for graphs, start implementing stats collection and OpenMP, print at small size (what is that?)
 * Take 2 LittleFes, small switch, monitor/kyb/mouse (wireless), printed matter
 * Next meeting: Saturday 6/Feb @ 3 pm
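The BCCD reporting items above could be collected with a short script along these lines (a sketch; the output path is an assumption, the commands are the ones the list names):

#!/bin/sh
# Gather the files and command output the list asks to send with each email.
{
  echo "== /etc/bccd-revision =="
  cat /etc/bccd-revision
  echo "== netstat -rn =="
  netstat -rn
  echo "== ifconfig -a =="
  /sbin/ifconfig -a
} > /tmp/bccd-report.txt
# Attach or inline /tmp/bccd-report.txt in the report email.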

Generalized, Modular Parallel Framework

10,000-foot view of problems

Note: this conceptual view may not reflect the current code.
For each problem: what the parent process sends out, what the children send back, and how the results are compiled.

 * Area: the parent sends out the function, the bounds, and a segment size or count; children send back the sum of the area for their bounds; results are compiled by summing.
 * GalaxSee: the parent sends out the complete array of stars and bounds (which stars to compute); children send back an array containing the computed stars; results are compiled by constructing a new array of stars and repeating for the next time step.
 * Matrix x Matrix: the parent sends out n rows from matrix A and n columns from matrix B, plus the locations of those rows and columns; children send back the resulting matrix position values and their locations in the result matrix; results are compiled by constructing the new result array.
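As a concrete illustration of the Area item above, here is a minimal MPI sketch (an illustrative toy, not the PetaKit area code): the bounds and segment count are fixed at compile time, every rank sums its own share of the segments, and MPI_Reduce compiles the total on rank 0.

#include <stdio.h>
#include <mpi.h>

/* The curve to integrate; x*x is a stand-in for the real function. */
static double f(double x) { return x * x; }

int main(int argc, char **argv) {
    int rank, size;
    long i, n = 100000000;        /* number of segments (illustrative) */
    double a = 0.0, b = 1.0;      /* bounds */
    double h, local = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    h = (b - a) / n;
    /* Each child sums the area of every size-th segment (midpoint rule). */
    for (i = rank; i < n; i += size)
        local += f(a + (i + 0.5) * h) * h;

    /* The parent compiles the results by summing. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("area = %.8f\n", total);
    MPI_Finalize();
    return 0;
}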

Visualizing Parallel Framework

http://cs.earlham.edu/~carrick/parallel/parallelism-approaches.png

Parallel Problem Space

Summer of Fun (2009)

An external doc for GalaxSee
Documentation for OpenSim GalaxSee

What's in the database?

np ranges (number of processes) by program and resource:

                                     acl0-5   bs0-5 GigE   bs0-5 IB
GalaxSee (MPI)                       2-20     2-48         2-48
area-under-curve (MPI, openmpi)      2-12     2-48         2-48
area-under-curve (Hybrid, openmpi)   2-20     2-48         2-48

What works so far? B = builds, R = runs, W = works

                  area under curve                 GalaxSee (standalone)
                  Serial  MPI   OpenMP  Hybrid     Serial  MPI   OpenMP  Hybrid
acls              BRW     BRW   BRW     BRW        BR
bobsced0          BRW     BRW   BRW     BRW        BR
c13               BR
BigRed            BRW     BRW   BRW     BRW
Sooner            BRW     BRW   BRW     BRW
pople
Charlie's laptop  BR

To Do

Implementations of area under the curve

GalaxSee Goals

GalaxSee: scale to the petascale with a hybrid MPI and OpenMP implementation (see the sketch below).
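A minimal sketch of that hybrid decomposition (hypothetical, not the GalaxSee source): MPI ranks each own a block of stars, OpenMP threads split the block, and MPI_Allgather rebuilds the full array for the next time step, mirroring the GalaxSee item in the framework list above.

#include <stdio.h>
#include <math.h>
#include <mpi.h>
#include <omp.h>

#define N 1024  /* total number of stars; assumed divisible by the number of ranks */

static double mass[N], pos[N], acc[N];  /* 1-D toy model for brevity */

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < N; i++) { mass[i] = 1.0; pos[i] = (double)i; }

    int chunk = N / size, lo = rank * chunk;

    /* OpenMP threads split this rank's block of stars. */
    #pragma omp parallel for
    for (int i = lo; i < lo + chunk; i++) {
        double a = 0.0;
        for (int j = 0; j < N; j++) {
            if (j == i) continue;
            double d = pos[j] - pos[i];
            a += mass[j] * d / pow(d * d + 1e-9, 1.5);  /* softened gravity */
        }
        acc[i] = a;
    }

    /* Every rank gets the complete acceleration array back for the next step. */
    MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                  acc, chunk, MPI_DOUBLE, MPI_COMM_WORLD);

    if (rank == 0) printf("acc[0] = %f\n", acc[0]);
    MPI_Finalize();
    return 0;
}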

LittleFe

Notes from May 21, 2009 Review

BobSCEd Upgrade

Build a new image for BobSCEd:

  1. One of the OS versions supported for Gaussian09 on EM64T [v11.1]: Red Hat Enterprise Linux 5.3; SuSE Linux 9.3, 10.3, or 11.1; or SuSE Linux Enterprise 10 (see the G09 platform list). Note: CentOS 5.3 runs the RHEL Gaussian binaries OK.
  2. Firmware update?
  3. C3 tools and configuration [v4.0.1]
  4. Ganglia and configuration [v3.1.2]
  5. PBS and configuration [v2.3.16]
  6. /cluster/bobsced local to bs0
  7. /cluster/... passed-through to compute nodes
  8. Large local scratch space on each node
  9. Gaussian09
  10. WebMO and configuration [v9.1] - Gamess, Gaussian, Mopac, Tinker
  11. Infiniband and configuration
  12. GNU toolchain with OpenMPI and MPICH [GCC v4.4.0], [OpenMPI v1.3.2] [MPICH v1.2.7p1]
  13. Intel toolchain with OpenMPI and native libraries
  14. Sage with do-dads (see Charlie)
  15. Systemimager for the client nodes?
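Once the image is built, a quick smoke test of the stack above might look like this (a sketch; the test program and submit script names are hypothetical):

cexec date          # C3 tools: verify ssh connectivity and clock skew on all nodes
pbsnodes -l         # PBS: list nodes that are down or offline
gstat               # Ganglia: confirm every host is reporting
mpicc -o cpi cpi.c  # GCC/OpenMPI toolchain check (cpi.c is any small MPI test program)
qsub cpi.sub        # push a small test job through PBS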

Installed:

Fix the broken nodes.

(Old) To Do

BCCD Liberation

Curriculum Modules

LittleFe

Infrastructure

SC Education

Current Projects

Past Projects

General Stuff

Items Particular to a Specific Cluster

Curriculum Modules

Possible Future Projects

Archive
