Cluster Information
From Earlham Cluster Department
Revision as of 11:08, 9 November 2012
SC12 Buildout
Friday in Richmond
Shopping:
- #6-32 x 3/4" pan head (how many?)
- HST for wiring harness
- MicroFe bolts and fasteners
- 1" velcro
- Bolts, nuts, nylon washers for WiFi/BlueTooth mount (? x 1/2" - 14)
- Superglue
To Do:
- Finish distributing fasteners, shock cords, etc. to kits
- Cut MicroFe plywood and plexiglass
- Organize frame bars (4 x 14 units)
- Cut main board plate pads for 14 units
- Finish 14? + 1 wiring harnesses
- Test BCCD release candidate
- QA kits (2) - Kristin
- Organize 15 tool kits
- Organize extra fasteners kit and extra parts kit
- Organize MicroFe parts - disk, 2 main boards, USB-Ethernet, switch plate,
- Build MicroFe
Take to SLC
- Projectors, power strips, network jumpers, assembly tools (ordered 10 + 5 from lab + charlie)
- Files (ease vent holes, enlarge switch hole, clean plate slots), soldering iron and solder, drill, bits, electric driver,
- Shirts from Charlie's office
To Do Saturday in SLC
- Wiki shuffle for website move
- Plan Buildout - include shutdown, startup, run loops
- Update assembly instructions - harvest pre-assembled frames section, change network switch mounting screws, note which is head node (no fan)
- Review assembly video
- Make 20 BCCD USBs
- Scout buildout room
- Call Freeman for pick up
- Changes to poster (FedEx Office Print & Ship Center, Salt Lake City, UT) - open until 9 pm
To Do Sunday in SLC
- Put stickers on
- Pre-assemble frame rails to plates, ease ventilation holes in center plate, enlarge switch hole on 2 hole plates
- Check inventory sheets and contents for each kit
- Install RAM in each main board
- Spot-check one main board per kit, BIOS upgraded?
- Put serial numbers on every unit (this batch and previous) that we see, e.g. v4-# and v4a-#
- Check one board in each kit for BIOS upgrade
Fall, 2011
26 October
- Things to do before Wednesday Nov 2nd
- Drilling
- BIOS flashing
- Clean aluminum plates
- Drill holes on handles and L (light holders)
- Cut the foam
- Assemble the plates (2 from each LF)
- Things to do before Wednesday Nov 9th
- Harnesses
- Main board plate pads
- Things to order
- projector
- stickers
- packaging
- Things to take to SC11
- LF poster and signs
- gigabit switch
- hand tools
- gear bag(s)
- projector (new) and tripods
- big swag bag
- flip cam
- Business cards
- Tool box
- USBs
- Things to ship to SC11 (on Nov 3)
- 15 kits
- foam cut in 2 pieces
- frame plates (middle and end) with 3 bars mounted on each
- Keep your log files up-to-date
- Inventory to CVS
- Things to do at SC11
- harvest swag for students
21 September
- Update LF v4 inventory - 22 units
- Update MF v2 inventory - 4 units
- MMM - check CVS update
- Paper - Oct 10 deadline (whole paper), see Google Doc
- BCCD testing -
LittleFe v4
- Setup/enroll littlefe-users list (just OU and PR plus developers)
- Setup/enroll littlefe-developers list
- Message to v4 users about screw length on guide bars and header pins, adding fan to last board(s)
- Bill Dave Naugler for a frame
- Complete and deliver remaining Intel units and Shodor unit
- Publish updated parts list
- Reconcile accounting
- Inventory for SC11, ordering
- Document modified process (frames partially pre-assembled, disks pre-liberated), pictures included in manual
- Document WiFi setup, BT setup
- Assembly video
- Promotional video - use in booth and Monday night at opening gala
Before Shipping Date for SC11
- Build frame sub-assemblies
- Assemble 15+ kits
- Take - projectors, MicroFes, LF v4 p1, toolbox, table-top screen
Before SC11
- Test VLC and promotional video under BCCD for Monday evening gala
Before OK/PUPR:
- Acquisitions
  - Shock cords
  - Nuts and bolts
  - Projector (in OK or BestBuy?)
  - Small drill, glue gun
  - Padding for under mainboard plates
  - Antenna mounting material
- Software
  - BT keyboard and mouse support and documentation (including MicroFe)
  - Setup v4/bccd documentation station
  - WiFi documentation
  - BIOS flash documentation
  - BIOS settings documentation (PXE, hyperthreading)
  - Assembly documentation based on script
  - BCCD release process, ISO build
- Hardware
  - Inventory mounting hardware
  - Test different locations and mounting for WiFi antenna
  - Drill switch/LED plates
  - Assemble 10 hardware kits as defined in wiki
  - Populate spares box
Saturday July 30th:
- Practice BCCD release and ISO build processes
- Make 10 BCCD bootable USBs
- Items from the Software list above
Take to OK/PUPR:
- Small drill, drill bits, driver bits, pin vice, countersink
- Six sets of hand tools - 3/8 and 5/16 ND, #2 Phillips, #0 Phillips, 3/8 wrench (have 4)
- Short #2 Phillips, needle-nose pliers, slip joint pliers, files (round and flat), glue guns and sticks, scissors, razor knife, thermometers, small screwdriver set
- Spares box
- Watts Up
- 16 port switch and jumpers
- Powered USB hub (for duplicating flash drives with ISO)
- Toolbox with all of this stuff
- Projector
- MicroFe, v4 p1, v4 p2
- USB keyboard and mouse
MicroFe:
- Screen
- Disk mounting (2, easier switching)
- Apple kyb and mouse setup
- Power supply
- Mount and install wireless adapter
LF v4:
- Kit Inventories
- Tune-up v4 prototype 1
- Consider assembling v4 prototype 2
- Order
- network cables, one end right angle to the right?
- stickers - LittleFe, EAPF, others?
- Serial number labels? v4 - p1, v4 - p2, v4 - 100 .. v4 - 125
- SATA cables
- additional network switches (plus gig for v3 cudafe?)
- disk drive mounting screws
- others?
- Wireless adapter inventory
- USB to Ethernet adapters
- Make wiring harnesses
- Pre-assemble frame kits
LF v3:
- Tune-up unit for Aaron to take to ISU
BW Internship
- Plan the march (see email to JH and additional notes)
- March the plan
BobSCEd:
- Hardware tour and re-deploy
- Lean Debian layer with VirtualBox?
- BCCD
- Preserve and upgrade plumbing
Mobeen:
- Make 26 wiring harnesses - cut 7 to length(s), strip, solder ends, heat shrink tubing
- Identify short SATA cables with right-angle end for the mainboard side
- Identify network cables with correct lengths and right-angle connectors (to the right) on one end
Events
BW Undergraduate Petascale Institute @ NCSA - 29 May through 11 June
- Charlie, Ivan and Mobeen will be gone
Introduction to Parallel Programming and Cluster Computing workshop @ UW-ISU - 25 June through 1 July
- Charlie, Ivan, and Aaron will be gone
Intermediate Parallel Programming and Distributed Computing workshop @ OU-PUPR - 29 July through 6 August
- Charlie, Ivan, and Aaron will be gone
- with first LittleFe buildout at OU (Sunday) and PUPR (Thursday?)
Projects
- LittleFe v4 Build-out @ Earlham - n weeks of x people's time
- BobSCEd -> BCCD design and deployment - n weeks of x people's time
- BW/Petascale project work - n weeks of x people's time
- T-voc hardware and software - n weeks of x people's time
Logs
Personal Schedules
- Charlie: Out 10-18 June (beach), 25 June - 2 July (workshop), 18-22 July (TG11), 28 July - 7 August (workshop)
- Ivan:
- Aaron:
- Fitz:
- Mobeen:
- Brad:
Al-Salam/CCG Downtime tasks
- LDAP Server migration from bs-new -> hopper
- yum update over all nodes
- Turn HT off
- PVFS
- PBS server on Hopper
How to use the PetaKit
- If you do not already have it, obtain the source for the PetaKit from the CVS repository on hopper (curriculum-modules/PetaKit).
- cd to the Subkits directory of PetaKit and run area-subkit.sh to make an area subkit tarball, or GalaxSee-subkit.sh to make a GalaxSee subkit tarball.
- scp the tarball to the target resource and unpack it.
- cd into the directory and run ./configure --with-mpi --with-openmp
- Use stat.pl and args_man to make an appropriate statistics run. See args_man for a description of predicates. Example:
perl -w stat.pl --program area --style serial,mpi,openmp,hybrid --scheduler lsf --user leemasa --problem_size 200000000000 --processes 1,2,3,4,5,6,7,8-16-64 --repetitions 10 -m -tag Sooner-strongest-newest --mpirun mpirun.lsf --ppn 8
Modifying Programs for Use with PetaKit
Old To Do
Date represents last meeting where we discussed the item
- Brad's Graphing Tool (28/Feb/10)
* Nice new functionality, see
* Waiting on clean data to finish multiple resource displays
* Error bars for left and right y-axes with checkboxes for each
- TeraGrid Runs (28/Feb/10)
In first box: Put initial of who is doing run
In second box: B = builds, R = runs, D = reports back to database, S = there is a good set of runs (10 per data point) for strong scaling in the database that appear on a graph, W = there is a good set of runs (10 per data point) for weak scaling in the database that appear on a graph
|  | area under curve |  |  |  | GalaxSee |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | Serial | MPI | OpenMP | Hybrid | Serial | MPI | OpenMP | Hybrid |
| ACLs | Sam | Sam | Sam | Sam | AW | AW | AW | AW |
| BobSCEd |  |  |  |  |  |  |  |  |
| BigRed | Sam | Sam | Sam | Sam |  |  |  |  |
| Sooner | Sam | Sam | Sam | Sam |  |  |  |  |
| pople | AW/CP | AW/CP | AW/CP | AW/CP |  |  |  |  |
Problem-sizes
- Big Red: 750000000000
- Sooner: 200000000000
- BobSCEd: 18000000000
- ACL: 18000000000
- New cluster (5/Feb/10)
* wiki page
* Decommission Cairo
* Figure out how to mount on Telco Rack
* Get pdfs of all materials -- post them on wiki
- BCCD Testing (5/Feb/10)
* Get Fitz's liberation instructions into wiki
* Get Kevin's VirtualBox instructions into wiki
* pxe booting -- see if they booted, if you can ssh to them, if the run matrix works
* Send /etc/bccd-revision with each email
* Send output of netstat -rn and /sbin/ifconfig -a with each email
* Run Matrix
* For the future: scripts to boot & change bios, watchdog timer, 'test' mode in bccd, send emails about errors
* USB scripts -- we don't need the "copy" script
- SIGCSE Conference -- March 10-13 (28/Feb/10)
* Leaving 8:00 Wednesday
* Brad, Sam, or Gus pick up the van around 7, bring it by loading dock outside Noyes
* Posters -- new area runs for graphs, start implementing stats collection and OpenMP, print at small size (what is that?)
* Take 2 LittleFes, small switch, monitor/kyb/mouse (wireless), printed matter
- Spring Cleaning (Noyes Basement) (5/Feb/10)
* Next meeting: Saturday 6/Feb @ 3 pm
Generalized, Modular Parallel Framework
10,000 foot view of problems
| Problem | Parent Process Sends Out | Children Send Back | Results Compiled By |
| --- | --- | --- | --- |
| Area | function, bounds, segment size or count | sum of area for specified bounds | sum |
| GalaxSee | complete array of stars, bounds (which stars to compute) | an array containing the computed stars | construct a new array of stars and repeat for next time step |
| Matrix x Matrix | n rows from Matrix A and n columns from Matrix B, location of rows and cols | n resulting matrix position values, their location in results matrix | construct new result array |
Visualizing Parallel Framework
http://cs.earlham.edu/~carrick/parallel/parallelism-approaches.png
Parallel Problem Space
- Dwarf (algorithm family)
- Style of parallelism (shared, distributed, GPGPU, hybrid)
- Tiling (mapping problem to work units to workers)
- Distribution algorithm (getting work units to workers)
Summer of Fun (2009)
An external doc for GalaxSee
Documentation for OpenSim GalaxSee
What's in the database?
|  | GalaxSee (MPI) |  |  | area-under-curve (MPI, openmpi) |  |  | area-under-curve (Hybrid, openmpi) |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | acl0-5 | bs0-5 GigE | bs0-5 IB | acl0-5 | bs0-5 GigE | bs0-5 IB | acl0-5 | bs0-5 GigE | bs0-5 IB |
| np X-XX | 2-20 | 2-48 | 2-48 | 2-12 | 2-48 | 2-48 | 2-20 | 2-48 | 2-48 |
What works so far? B = builds, R = runs, W = works
|  | area under curve |  |  |  | GalaxSee (standalone) |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | Serial | MPI | OpenMP | Hybrid | Serial | MPI | OpenMP | Hybrid |
| acls | BRW | BRW | BRW | BRW | BR |  |  |  |
| bobsced0 | BRW | BRW | BRW | BRW | BR |  |  |  |
| c13 | BR |  |  |  |  |  |  |  |
| BigRed | BRW | BRW | BRW | BRW |  |  |  |  |
| Sooner | BRW | BRW | BRW | BRW |  |  |  |  |
| pople |  |  |  |  |  |  |  |  |
| Charlie's laptop | BR |  |  |  |  |  |  |  |
To Do
- Fitz/Charlie's message
- Petascale review
- BobSCEd stress test
Implementations of area under the curve
- Serial
- OpenMP (shared)
- MPI (message passing)
- MPI (hybrid mp and shared)
- OpenMP + MPI (hybrid)
GalaxSee Goals
- Good piece of code, serves as teaching example for n-body problems in petascale.
- Dials, knobs, etc. in place to easily control how work is distributed when running in parallel.
- Architecture generally supports hybrid model running on large-scale constellations.
- Produces runtime data that enables nice comparisons across multiple resources (scaling, speedup, efficiency).
- Render in BCCD, metaverse, and /dev/null environments.
- Serial version
- Improve performance on math?
- GalaxSee - scale to petascale with MPI and OpenMP hybrid.
- GalaxSee - render in-world and steer from in-world.
- Area under a curve - serial, MPI, and OpenMP implementations.
- OpenMPI - testing, performance.
- Start May 11th
LittleFe
- Testing
- Documentation
- Touch screen interface
Notes from May 21, 2009 Review
- Combined Makefiles with defines to build on a particular platform
- Write a driver script for GalaxSee ala the area under the curve script, consider combining
- Schema
- date, program_name, program_version, style, command line, compute_resource, NP, wall_time
- Document the process from start to finish
- Consider how we might iterate over e.g. number of stars, number of segments, etc.
- Command line option to stat.pl that provides a Torque wrapper for the scripts.
- Lint all code, consistent formatting
- Install latest and greatest Intel compiler in /cluster/bobsced
BobSCEd Upgrade
Build a new image for BobSCEd:
- One of the OS versions supported for Gaussian09 on EM64T [v11.1]: Red Hat Enterprise Linux 5.3; SuSE Linux 9.3, 10.3, or 11.1; or SuSE Linux Enterprise 10 (see the G09 platform list). Note: CentOS 5.3 runs Gaussian binaries for RHEL OK.
- Firmware update?
- C3 tools and configuration [v4.0.1]
- Ganglia and configuration [v3.1.2]
- PBS and configuration [v2.3.16]
- /cluster/bobsced local to bs0
- /cluster/... passed-through to compute nodes
- Large local scratch space on each node
- Gaussian09
- WebMO and configuration [v9.1] - Gamess, Gaussian, Mopac, Tinker
- Infiniband and configuration
- GNU toolchain with OpenMPI and MPICH [GCC v4.4.0], [OpenMPI v1.3.2] [MPICH v1.2.7p1]
- Intel toolchain with OpenMPI and native libraries
- Sage with do-dads (see Charlie)
- Systemimager for the client nodes?
Installed:
Fix the broken nodes.
(Old) To Do
BCCD Liberation
- v1.1 release - upgrade procedures
Curriculum Modules
- POVRay
- GROMACS
- Energy and Weather
- Dave's math modules
- Standard format, templates, how-to for V and V
LittleFe
- Explore machines from first Intel donation (notes and pictures)
- Build 4 SCED units
Infrastructure
- Masa's GROMACS interface on Cairo
- gridgate configuration, Open Science Grid peering
- hopper
SC Education
- Scott's homework (see the message)
- SC10 brainstorming
Current Projects
Past Projects
General Stuff
- Todo
- General
- Hopper
- Howto's
- Networking
- 2005-11-30 Meeting
- 2006-12-12 Meeting
- 2006-02-02 Meeting
- 2006-03-16 Meeting
- 2006-04-06 Meeting
- Node usage
- Numbers for Netgear switches
- Latex Poster Creation
- Bugzilla Etiquette
- Modules
Items Particular to a Specific Cluster
Curriculum Modules
- gprof - statistical source code profiler
- Curriculum
- Fluid Dynamics
- Population Ecology
- GROMACS Web Interface
- Wiki Life for Academics
- PetaKit
Possible Future Projects
Archive
- TeraGrid '06 (Indianapolis, June 12-15, 2006)
- SIAM Parallel Processing 2006 (San Francisco, February 22-24, 2006)
- Conference webpage
- Little-Fe abstract
- Low Latency Kernel abstract
- Folding@Clusters
- Best practices for teaching parallel programming to science faculty (Charlie only)