Al-salam

From Earlham Cluster Department

Latest revision as of 19:38, 15 December 2013
Al-Salam is the working name for the Earlham Computer Science Department's upcoming cluster computer.

== Installation Notes ==

=== headnode ===

I'll be maintaining a script, <tt>/root/install/al-salam.sh</tt>, that will also serve as a log. I'm also following along with the [[Cluster: New BobSCEd Install Log#Head Node|BobSCEd-new logs]] to keep the two clusters consistent.

==== TODO ====

* MPI
* Software installations into /cluster/al-salam
* User auth via bs0-new's ldap
* torque/maui
* Ganglia
* shorewall
* modules

==== Have done ====

* yum install:
<pre>
gcc.x86_64 gcc-c++.x86_64 gcc-gfortran.x86_64 \
gcc44.x86_64 gcc44-c++.x86_64 gcc44-gfortran.x86_64 \
apr.x86_64 apr-devel.x86_64 expat-devel.x86_64 \
blas.x86_64 dhcp.x86_64
</pre>
* rpm install:
** c3
** libconfuse
** libconfuse-devel
* /etc/c3.conf:
<pre>
cluster al-salam {
    as0.cluster.earlham.edu:as0.al-salam.loc
    as[1-12]
}
</pre>
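With /etc/c3.conf in place the C3 tools should address the whole cluster by default. A quick sanity check might look like this (a sketch; it assumes the c3 RPM put <tt>cexec</tt>/<tt>cpush</tt> on root's PATH and that passwordless ssh from as0 to the compute nodes is already working):
<pre>
# run a command on every node defined in /etc/c3.conf
cexec uname -r

# push a file to all nodes, then confirm it landed
cpush /etc/hosts /etc/hosts
cexec md5sum /etc/hosts
</pre>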
* in hopper:/etc/rc.conf:
<pre>
static_routes="bs0 as0"
route_as0="192.168.1.1 159.28.234.150"
</pre>
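To pick the route up on hopper without a reboot, something like this should do it (a sketch; FreeBSD syntax matching the rc.conf entry above):
<pre>
route add 192.168.1.1 159.28.234.150
netstat -rn | grep 192.168.1
</pre>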
* hopper:/etc/namedb/master/cluster.zone:
<pre>
as0.cluster.earlham.edu.      IN  A     159.28.234.150
as.cluster.earlham.edu.       IN  CNAME as0
al-salam.cluster.earlham.edu. IN  CNAME as0
</pre>
* hopper: /etc/namedb/named.conf
<pre>
acl al-salam {
        192.168.1.0/24; // Al-Salam internal network
        159.28.234.150; // Al-Salam headnode
};

view al-salam {
        match-clients { al-salam; };

        zone "al-salam.loc" {
                type master;
                allow-transfer { none; };
                file "master/al-salam.loc";
        };

        zone "1.168.192.in-addr.arpa" {
                type master;
                allow-transfer { none; };
                file "master/1.168.192.in-addr.arpa";
        };
        zone "cluster.earlham.edu" {
                type master;
                allow-transfer { servers; };
                file "master/cluster.zone";
        };
        zone "234.28.159.IN-ADDR.ARPA" {
                type master;
                allow-transfer { servers; };
                file "master/159.28.234.zone";
        };

        zone "." {
                type hint;
                file "master/named.root";
        };
};
</pre>
* hopper:/etc/namedb/master/al-salam.loc
** copy from bobsced.loc, amend as necessary
* hopper:/etc/namedb/master/1.168.192.in-addr.arpa
** copy from 0.168.192.in-addr.arpa, amend as necessary
* hopper:/etc/namedb/master/159.28.234.zone
<pre>
150 IN  PTR as0.cluster.earlham.edu.
</pre>
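After editing named.conf and the zone files it's worth a syntax check and reload (a sketch; assumes BIND's stock named-checkconf/named-checkzone/rndc are available on hopper and that the amended al-salam.loc zone defines the asN hosts):
<pre>
named-checkconf /etc/namedb/named.conf
named-checkzone al-salam.loc /etc/namedb/master/al-salam.loc
named-checkzone 1.168.192.in-addr.arpa /etc/namedb/master/1.168.192.in-addr.arpa
rndc reload
dig @159.28.234.1 as1.al-salam.loc    # spot-check from as0, which matches the al-salam acl
</pre>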
* hopper:/usr/local/etc/dhcpd.conf
<pre>
        subnet 192.168.1.0 netmask 255.255.255.0 {

                option routers                  192.168.1.1;
                option subnet-mask              255.255.255.0;
                option domain-name              "al-salam.loc";
                option domain-name-servers      159.28.234.1;

                next-server                     159.28.234.17;
                filename "pxelinux.0";

                host as1.al-salam.loc { hardware ethernet 00:30:48:F2:99:DC; fixed-address 192.168.1.101; }
                host as2.al-salam.loc { hardware ethernet 00:30:48:F3:0D:32; fixed-address 192.168.1.102; }
                host as3.al-salam.loc { hardware ethernet 00:30:48:F2:99:DA; fixed-address 192.168.1.103; }
                host as4.al-salam.loc { hardware ethernet 00:30:48:F2:99:CC; fixed-address 192.168.1.104; }
                host as5.al-salam.loc { hardware ethernet 00:30:48:F2:99:C4; fixed-address 192.168.1.105; }
                host as6.al-salam.loc { hardware ethernet 00:30:48:F2:9A:06; fixed-address 192.168.1.106; }
                host as7.al-salam.loc { hardware ethernet 00:30:48:F3:0D:30; fixed-address 192.168.1.107; }
                host as8.al-salam.loc { hardware ethernet 00:30:48:F2:99:D6; fixed-address 192.168.1.108; }
                host as9.al-salam.loc { hardware ethernet 00:30:48:F2:99:C6; fixed-address 192.168.1.109; }
                host as10.al-salam.loc { hardware ethernet 00:30:48:F2:9A:0A; fixed-address 192.168.1.110; }
                host as11.al-salam.loc { hardware ethernet 00:30:48:F2:99:E0; fixed-address 192.168.1.111; }
                host as12.al-salam.loc { hardware ethernet 00:30:48:F2:99:A2; fixed-address 192.168.1.112; }
        }
</pre>
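Before bouncing dhcpd on hopper the config can be syntax-checked first (a sketch; -t/-cf are standard ISC dhcpd flags, but the rc.d script name depends on how the port was installed):
<pre>
dhcpd -t -cf /usr/local/etc/dhcpd.conf
/usr/local/etc/rc.d/isc-dhcpd restart    # script may be named dhcpd on older ports
</pre>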
* as0:/etc/dhcrelay
<pre>
# Command line options here
INTERFACES="eth0 eth1"  # on layout both interfaces are required, originally only one was listed here
DHCPSERVERS="cluster.earlham.edu"
</pre>
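To turn the relay on and confirm requests are actually being forwarded (a sketch; assumes as0 uses CentOS-style init scripts, as the service/chkconfig usage elsewhere on this page suggests):
<pre>
chkconfig dhcrelay on
service dhcrelay start

# watch for DHCP traffic; use whichever interface faces the compute nodes
tcpdump -n -i eth1 port 67 or port 68
</pre>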
 
* The extra bits on layout (all as root)
<pre>
$ yum install -y dhcp
$ vi /etc/sysconfig/dhcrelay     # set:
INTERFACES="eth0 eth1"
DHCPSERVERS="cluster.earlham.edu"
$ iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
$ service iptables save
</pre>
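One piece the MASQUERADE rule needs that isn't captured above: IP forwarding has to be enabled on layout (a sketch):
<pre>
sysctl -w net.ipv4.ip_forward=1
# and set net.ipv4.ip_forward = 1 in /etc/sysctl.conf so it survives a reboot
</pre>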
 
* hopper:
  vi /etc/exports     # add entries (check to make sure they aren't already covered by existing rules); see the sketch below
  vi /etc/hosts.allow # add entries
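For reference, the kind of entries these files need (a sketch only; the export paths here are hypothetical, so check hopper's existing rules before adding anything):
<pre>
# /etc/exports
/cluster  -network 192.168.1.0 -mask 255.255.255.0
/cluster  159.28.234.150

# /etc/hosts.allow
ALL : 192.168.1.0/255.255.255.0 159.28.234.150 : allow
</pre>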
 
=== compute nodes ===

* clone from bs1-new using udpcast (rough sketch below)
* modify network, etc. settings from ''bobsced'' to ''al-salam''
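Rough outline of the udpcast clone (a sketch; the disk device and interface names are assumptions, and both ends should be booted into an environment where the target disk isn't mounted):
<pre>
# on bs1-new (sender): stream the system disk onto the cluster network
udp-sender --interface eth1 --file /dev/sda

# on each al-salam compute node (receiver), booted from PXE/rescue media
udp-receiver --file /dev/sda

# afterwards fix the per-node identity: /etc/sysconfig/network, ifcfg-eth*, /etc/hosts, etc.
</pre>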
== Latest Overarching Questions ==

== Parts List ==

# Nodes - case, motherboard(s), power supply, CPU, RAM, GPGPU cards
# Switch - managed, cut-through
## Fitz: Having a hard time finding anyone who sells cut-through switches
### How about [http://www.tigerdirect.com/applications/searchtools/item-Details.asp?sku=H24-J9021A&SRCCODE=CHANNELINC&cisrccode=cii_7240393&cpncode=20-3634289 this] store-and-forward switch from HP?
# Power distribution - rack-mount PDUs

== Tentative Specifications ==

=== Budget ===

=== Nodes ===

=== Specialty Nodes ===

Educationally, we could expect to get significant use out of GPGPUs, but the production use is limited. Increasing the variance of the architecture landscape would be a bonus to education.

=== Network ===

=== Disk ===

=== OS ===

== Quick breakdown ==

=== Nodes ===

{| border="1"
|-
!
! ION #61116
! ION #61164
! SM #174536
! [[Al-salam#Newegg_Quote_.231|Newegg #1]]
! [[Al-salam#Newegg_Quote_.232|Newegg #2]]
! [[Al-salam#Intel_List_.231|Intel List #1]]
! [[Al-salam#AMD_List_.231|AMD List #1]]
! [[Al-salam#AMD_List_.232|AMD List #2]]
|-
| '''CPU'''
| 72 2.4GHz Intel E5530
| 80 2.4GHz Intel E5530
| 80 2.4GHz Intel E5530
| 128 2.4GHz Intel E5530
| 112 2.4GHz Intel E5530
| 100 2.4GHz Intel E5530
| 156 2.0GHz [http://www.newegg.com/Product/Product.aspx?Item=N82E16819105189 AMD Opteron 2350]
| 126 2.6GHz [http://www.newegg.com/Product/Product.aspx?Item=N82E16819105189 AMD Opteron 2435]
|-
| '''RAM'''
| 108GB PC3-10600
| 120GB PC3-10600
| 120GB DDR3-1333
| 192GB DDR3-1333
| 168GB DDR3-1333
| 144GB DDR3-1333
| 160GB DDR2-800
| 120GB DDR2-800
|-
| '''GPU'''
| 2 Tesla C1060
| 2 Tesla C1060
| 2 Tesla C1060
| None
| 4 Tesla C1060
| 2 Tesla C1060
| 2 Tesla C1060
| 2 Tesla C1060
|-
| '''Local disk'''
| Yes
| Yes
| Yes
| Yes
| Yes
| Yes
| Yes
| Yes
|-
| '''Shared chassis'''
| No
| Yes
| Yes
| No
| No
| No
| No
| No
|-
| '''Remote mgmt'''
| No
| No
| IPMI
| No
| IPMI on GPU nodes
| IPMI
| IPMI
| IPMI
|-
| '''Size (just nodes)'''
| 9U
| 6U
| 6U
| 16U
| 12U
| 12U
| 20U
| 10U
|-
| '''Price'''
| $33,173.20
| $33,054.30
| $30,078.00
| $32,910.56
| $34,696.78
| $35,846.00
| $35,275.00
| $33,755.00
|}

=== Power distribution ===

{| border="1"
|-
!
! PDU1220
! PDUMH20
! AP9563
! AP7801
|-
| '''Vendor'''
| TrippLite
| TrippLite
| APC
| APC
|-
| '''Size'''
| 1U
| 1U
| 1U
| 1U
|-
| '''Capabilities'''
| Dumb
| Metered
| Dumb
| Metered
|-
| '''Input power'''
| 20A, 1x NEMA 5-20P
| 20A, 1x NEMA L5-20P w/ NEMA 5-20P adapter
| 20A, 1x NEMA 5-20P
| 20A, 1x NEMA 5-20P
|-
| '''Output power'''
| 13x NEMA 5-20R
| 12x NEMA 5-20R
| 10x NEMA 5-20R
| 8x NEMA 5-20R
|-
| '''Price'''
| $195
| $230
| $120
| $380
|}

==ION Computer Systems Quotation #61116==

==ION Computer Systems Quotation #61164==

==Silicon Mechanics Quote #174536==

==Newegg Quote #1==

* Price Tag: $32,910.56
==Newegg Quote #2 (the one we purchased?)==

* 2x [http://secure.newegg.com/WishList/PublicWishDetail.aspx?WishListNumber=8560589 Newegg list]
** 1U
* Price tag: $34,696.78
==Intel List #1==
* 13x Chassis + mainboard: http://www.provantage.com/supermicro-sys-6016t-gtf~7SUP91FA.htm
* 25x CPU: http://www.newegg.com/Product/Product.aspx?Item=N82E16819117184
* 39x RAM: http://www.newegg.com/Product/Product.aspx?Item=N82E16820139041
* 14x HDD: http://www.newegg.com/Product/Product.aspx?Item=N82E16822136280
* 2x Tesla card: http://www.tigerdirect.com/applications/searchtools/item-details.asp?EdpNo=4259469&SRCCODE=GOOGLEBASE&cm_mmc_o=VRqCjC7BBTkwCjCECjCE
* Notes:
** ~$2600/node without Tesla
** ~$3850/node with Tesla
** 1.5GB RAM/core
** 2 dies/node (8 cores/node)
** Yes IPMI
** 1 headnode (compute node - one die + one HDD) + 11 compute nodes + 2 Tesla nodes ~= $35,846
==AMD List #1==
* 20x Chassis: http://www.newegg.com/Product/Product.aspx?Item=N82E16811152128
* 20x Mainboard: http://www.newegg.com/Product/Product.aspx?Item=N82E16813182108
* 39x CPU: http://www.newegg.com/Product/Product.aspx?Item=N82E16819105189
* 40x RAM: http://www.newegg.com/Product/Product.aspx?Item=N82E16820134936
* 21x HDD: http://www.newegg.com/Product/Product.aspx?Item=N82E16822136280
* 2x Tesla: http://www.tigerdirect.com/applications/searchtools/item-details.asp?EdpNo=4259469&SRCCODE=GOOGLEBASE&cm_mmc_o=VRqCjC7BBTkwCjCECjCE
* Notes:
** $1650/node without Tesla
** $2900/node with Tesla
** 1GB RAM/core
** 2 dies/node (8 cores/node)
** Yes IPMI
** 1 headnode (compute node - one die + one HDD) + 17 compute nodes + 2 Tesla nodes ~= $35,275
==AMD List #2==
* 10x Chassis: http://www.newegg.com/Product/Product.aspx?Item=N82E16811152128
* 10x Mainboard: http://www.newegg.com/Product/Product.aspx?Item=N82E16813182108
* 19x CPU: http://www.newegg.com/Product/Product.aspx?Item=N82E16819105189
* 30x RAM: http://www.newegg.com/Product/Product.aspx?Item=N82E16820134936
* 11x HDD: http://www.newegg.com/Product/Product.aspx?Item=N82E16822136280
* 2x Tesla: http://www.tigerdirect.com/applications/searchtools/item-details.asp?EdpNo=4259469&SRCCODE=GOOGLEBASE&cm_mmc_o=VRqCjC7BBTkwCjCECjCE
* Notes:
** $3130/node without Tesla
** $4350/node with Tesla
** 1GB RAM/core
** 2 dies/node (12 cores/node)
** Yes IPMI
** 1 headnode (compute node - one die + one HDD) + 7 compute nodes + 2 Tesla nodes ~= $33,755
