Batch-computing

From Earlham Cluster Department


Revision as of 15:44, 30 May 2011


Introduction to Using Earlham's Linux Cluster Computer

This exercise will help you learn to use Al-Salam, the Linux cluster computer administered by the Earlham College Cluster Computing Group (CCG), a speciality group of the Computer Science department.

Actions and commands that you should perform or enter are in the computer boldface font.

After each Unix command you type at the Unix prompt (explained below), press the Enter key.

An account has been set up for you. Your user name is denoted here as yourusername, but may actually be of the form ncsi####, where #### is a specific user ID number. Or, if you’re a permanent CCG user, your user name may be tied to your name; for example, fitz (Andrew Fitz Gibbon).

If you have any difficulty with the steps outlined here, then please send e-mail to ccg@cs.earlham.edu

This intro is adapted from Henry Neeman's "Intro to Batch Computing" at OSCER. See http://www.oscer.ou.edu/education.php for more info.

Log In

  1. From the PC of your choice (Windows, MacOS, Linux or whatever), bring up your web browser (Internet Explorer, Firefox, Opera, Safari, Chrome or whatever) and go to:

    http://www.oscer.ou.edu/ssh_install.php

    NOTICE the underscore in the URL, between ssh and install.

    Following the instructions on that page, log in to:

    cluster.earlham.edu

  2. Once you log in, you’ll get some text, and then a Unix prompt (often, though not necessarily, a percent sign) with the cursor after it, like so:

    %

    There may be some information before the prompt character, such as the name of the computer that you’ve logged in to (which may be different from cluster.earlham.edu), your user name, and so on. For purposes of these materials, we’ll generally use the percent sign % to indicate the Unix prompt.

  3. cluster.earlham.edu, also called hopper, is what's called a login node. It is not a cluster, and you should not run things there! Instead, you should be on Al-Salam. You can get there by simply entering:

    % ssh al-salam

  4. Check the lines of text immediately above the Unix prompt. If there are lines of text that read something like:

    No directory /cluster/home/yourusername! Logging in with home = "/".

    then you should log out immediately by entering

    % exit

    and then log back in. If you repeatedly have this problem, then please send e-mail to:

    ccg@cs.earlham.edu

  5. Check to be sure that you’re in your home directory (a directory in Unix is like a folder in Windows, and your home directory in Unix is like your desktop in Windows):

    % pwd
    /cluster/home/yourusername

    This command is short for “print working directory”; that is, “print the full name of the directory that I’m currently in.” The output is the name of the directory that you’re currently in.

    If your current working directory is just a slash (which means the root directory, which is like C:\ in Windows), rather than something like /cluster/home/yourusername, then you should log out immediately (as above), then log back in.

    If you repeatedly have this problem, then please send e-mail to:

    ccg@cs.earlham.edu
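
The home-directory check in the last step can also be sketched as a small shell function. This is only an illustration: the function name check_working_dir is made up for this example and is not a command installed on the cluster.

```shell
#!/bin/sh
# Hypothetical helper (not a cluster command): report whether a directory
# looks like a usable working directory or like the bare root directory.
check_working_dir() {
    if [ "$1" = "/" ]; then
        # A bare slash means the home directory was not found at login.
        echo "bad"
    else
        echo "ok"
    fi
}

# Check the directory you are in right now.
check_working_dir "$(pwd)"
```

If this prints bad, log out with exit and log back in, as described above.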

Set Up (First Time Logging In ONLY)

  1. You should immediately do the following to change your password away from the default:

    % ldpasswd

    It will ask you for your old password (the one you used to log in the first time) and then your new password twice (to make sure you didn't mistype it the first time).

    You ONLY have to do this the first time you log in.

  2. Please IMMEDIATELY enter the following:

    % echo youremailaddress@yourinstitution.edu > ~/.forward

    You WON’T have to do this for future logins.

    NOTES

    • You should replace youremailaddress@yourinstitution.edu with your e-mail address.
    • After your e-mail address comes a blank space, then a greater-than symbol, then another blank space, then tilde slash period forward, with no spaces between them.
    • The greater-than symbol sends the output of echo (your e-mail address) into the file named after it, replacing whatever that file contained before.
    • The Unix copy command, which you’ll use later in this exercise, is cp.
    • The first filename after cp is the source (the thing that you’re making a copy of); the second is the destination (the name and/or location of the copy).
    • The filename .forward begins with a period (very important). The filename is pronounced “dot forward.”
    • In Unix, filenames are case sensitive, meaning that it matters whether you use upper case (capital) or lower case (small) for each letter in a filename.
    • In Unix, pieces of a filename (or actually of the directory that it’s in) are separated by slashes, NOT by backslashes as in Windows.
    • The symbol ~ (known as a tilde, pronounced “TILL-duh”) denotes your home directory (another way to denote your home directory is ~yourusername).
    • The substring ~fitz means “the home directory of the user named fitz.”

  3. Create a subdirectory of your home directory named MPI_exercises, like so:

    % mkdir ~/MPI_exercises

    NOTICE: In the subdirectory name MPI_exercises, the MPI MUST BE CAPITALIZED; that is, the directory name is “capital-M capital-P capital-I underscore exercises” with no spaces or other characters in between.

    This command means: “Create a directory named MPI_exercises as a subdirectory inside my home directory” (it’s like creating a new folder named MPI_exercises on your desktop in Windows).

    You WON’T have to do this for future logins.

  4. Confirm that you have successfully created your MPI_exercises directory by listing the contents of the current working directory:

    % ls
    MPI_exercises

    This command means: “List the names of the files and subdirectories in my current working directory.”

    NOTICE that the command is “ell ess” (that is, small-L small-S) rather than “one ess,” and that ls is short for “list.”

  5. Set the permissions on your MPI_exercises directory so that only you can access it:

    % chmod u=rwx,go= MPI_exercises

    This command means: “Change the mode (list of permissions) on my subdirectory named MPI_exercises so that I (the user) can read files in it, write files in it, and go into (execute) it, but nobody else can.”

    Your MPI_exercises directory is now accessible only to you and to the system administrators (sysadmins for short) of this Linux cluster computer; that is, the Earlham CCG cluster maintainers.

    You WON’T have to do this for future logins.

  6. Log out of the Linux cluster computer by entering the following command:

    % exit
    

    Once you have completed the setup steps in this section, you WON’T have to do them again when you log in later.
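
To see what the chmod step does, the sketch below repeats the mkdir and chmod commands inside a throwaway directory and then prints the resulting permission string. The throwaway directory (created with mktemp) and the variable names are assumptions made for this illustration, so that nothing in a real home directory is touched.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing in the real home directory changes.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

mkdir MPI_exercises
chmod u=rwx,go= MPI_exercises

# ls -ld lists the directory itself rather than its contents.
# The first ten characters are the type and permission flags:
# d for directory, rwx for the user, and dashes (no access) for
# the group and for everyone else.
perms=$(ls -ld MPI_exercises | cut -c1-10)
echo "$perms"

# Clean up the throwaway directory.
cd / && rm -rf "$sandbox"
```

A permission string of drwx------ confirms that only the owner can read, write, or enter the directory.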

Copy Fitz's Intro Directory Into Your MPI_exercises Directory

  1. Log in again, and make sure you're on Al-Salam, not the login server:

    % hostname

    The result of this command should be as0.cluster.earlham.edu. Here, “as” is short for Al-Salam, and the 0 (that's a zero) means it's what we call the head node. You'll learn more about what that means later.

  2. Confirm that you’re in your home directory:

    % pwd
    /cluster/home/yourusername

  3. Check that you still have an MPI_exercises subdirectory inside your home directory:

    % ls
    MPI_exercises

  4. Go into your MPI_exercises subdirectory:

    % cd MPI_exercises

    This command means: “Change the working directory to MPI_exercises, which is a subdirectory of my current working directory.” (This is like double-clicking a folder in Windows.)

  5. Confirm that you’re in your MPI_exercises subdirectory:

    % pwd
    /cluster/home/yourusername/MPI_exercises

  6. See what files or subdirectories (if any) are in the current working directory:

    % ls

    You may get no output, just the Unix prompt; if so, that indicates that your current working directory has no files or subdirectories in it.

  7. SIDEBAR: To learn more about a particular Unix command, enter:

    % man commandname

    for some command. For example, try

    % man chmod

    which will give you the online manual page for the chmod command.

    The output of man goes through another command, more, which shows one screenful at a time. To get the next screenful, press the spacebar; to get the next line, press the Enter key. To quit the more command, press the Q key.

  8. Copy the subdirectory named Intro from Fitz’s MPI_exercises directory into your MPI_exercises directory:

    % cp -r ~fitz/MPI_exercises/Intro ~/MPI_exercises/

    This command means: “Copy the subdirectory named Intro inside the directory named MPI_exercises under the home directory of user fitz into my directory MPI_exercises under my home directory.”

  9. Confirm that the Intro subdirectory was copied into your MPI_exercises directory:

    % ls
    Intro

  10. Go into your Intro subdirectory:

    % cd Intro

  11. Confirm that you’re in your Intro subdirectory:

    % pwd
    /cluster/home/yourusername/MPI_exercises/Intro

  12. See what files or subdirectories (if any) are in the current working directory (Intro):

    % ls
    C Fortran90

  13. Go into either your C subdirectory or your Fortran90 subdirectory (BUT NOT BOTH):

    % cd C

    OR

    % cd Fortran90

  14. Confirm that you’re in your C or Fortran90 subdirectory:

    % pwd
    /cluster/home/yourusername/MPI_exercises/Intro/C

    OR the output of the pwd command might be:

    % pwd
    /cluster/home/yourusername/MPI_exercises/Intro/Fortran90

  15. See what files or subdirectories (if any) are in the current working directory:

    % ls
    makefile my_number.qsub my_number.c my_number_input.txt
    

    OR the source file might be named my_number.f90 instead of my_number.c.
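
The recursive copy used in this section can be tried safely on stand-in directories. In the sketch below, the directories under a temporary sandbox play the roles of Fitz's home directory and your own; the names fitz_home and my_home, and the stand-in file contents, are assumptions made for this illustration.

```shell
#!/bin/sh
# Build stand-in "home directories" in a throwaway location.
sandbox=$(mktemp -d)
fitz_home="$sandbox/fitz"
my_home="$sandbox/yourusername"
mkdir -p "$fitz_home/MPI_exercises/Intro/C" "$fitz_home/MPI_exercises/Intro/Fortran90"
mkdir -p "$my_home/MPI_exercises"
echo "stand-in source file" > "$fitz_home/MPI_exercises/Intro/C/my_number.c"

# -r copies the directory and everything underneath it.
cp -r "$fitz_home/MPI_exercises/Intro" "$my_home/MPI_exercises/"

# Confirm that the whole tree arrived, not just the top-level directory.
top_level=$(ls "$my_home/MPI_exercises")
inside_intro=$(ls "$my_home/MPI_exercises/Intro")
echo "$top_level"
echo "$inside_intro"

rm -rf "$sandbox"
```

Without the -r option, cp would refuse to copy a directory, which is why the option appears in the command above.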

Edit the Batch Script File To Create Your Own Unique Version

  1. Before you can run the original version of the program, you need to modify your copy of the batch script file my_number.qsub to create a version that’s uniquely yours.

Using your preferred Unix text editor (whether nano, pico, vim, vi, emacs or whatever), edit your copy of my_number.qsub.

For example, if you’re using nano, then the edit command would be:

% nano my_number.qsub

This command means: “Edit the text in the file named my_number.qsub that’s in my current working directory, using the text editor program named nano.”

  1. In nano, notice the little help messages at the bottom of the screen:

^G Get Help ^O WriteOut ^R Read File ^Y Prev Pg ^K Cut Text ^C Cur Pos

^X Exit ^J Justify ^W Where is ^V Next Pg ^U UnCut Text ^T To Spell

For example, consider

^W Where is

This means that you should press Ctrl-W (the caret ^ indicates the Ctrl key) to search for a particular string of characters.

Another example:

^C Cur Pos

This is short for “Cursor Position” and causes nano to tell you what line number the cursor is located at.

Another example:

^K Cut Text

This means “delete the line that the cursor is currently on.”

  1. Using the text editor, make the following changes to my_number.qsub:
  1. Everywhere throughout the file, change yourusername to your user name (which might be of the form ncsi####, or perhaps is based on your name). THIS IS EXTREMELY IMPORTANT!
  2. Everywhere throughout the file, change

youremailaddress@yourinstitution.edu

to your full e-mail address. THIS IS EXTREMELY IMPORTANT!

  1. IMPORTANT! Every few minutes while you’re editing, you should save the work that you’ve done so far, in case your work is interrupted by a computer crashing. In nano, enter Ctrl-O (the letter oh), at which point nano will ask you, near the bottom of the screen:

File Name to write : my_number.qsub

That is, nano wants to know what filename to save the edited text into (with a default filename of my_number.qsub). Press Enter to save to the default filename my_number.qsub.

  1. The lines of text in the batch script file my_number.qsub should be less than 80 characters long, and ideally at most 72 characters long. (Your PuTTY window should be 80 characters wide.)
  2. Some text editors, including nano, try to help keep text lines short, by breaking a long line into multiple short lines. For example, nano might break a line like the following into two separate lines:

#PBS -o

/cluster/home/yourusername/MPI_exercises/Intro/C/my_number_%J_stdout.txt

That is, nano automatically puts a carriage return when the line starts getting too long for its taste.

Unfortunately, the batch scheduler (PBS, for Portable Batch System) will consider this to be an error. Why? Because the batch scheduler cannot allow an individual batch directive – that is, a line starting with #PBS – to use more than one line.

For example, the batch script directive above should be on a single line:

#PBS -o /cluster/home/yourusername/MPI_exercises/Intro/C/my_number_%J_stdout.txt

So, you’ll need to correct any such occurrences.
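
One way to spot a directive that the editor has split is to search for a #PBS -o or #PBS -e line with nothing after the option letter. The sketch below is a minimal illustration, assuming that only those two options are of interest; the function name find_broken_directives and the sample file are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the line numbers of "#PBS -o" or "#PBS -e"
# directives that have nothing after the option letter.
find_broken_directives() {
    grep -n -E '^#PBS[[:space:]]+-[oe][[:space:]]*$' "$1" | cut -d: -f1
}

# Build a small sample file in which the editor has split a directive.
sample=$(mktemp)
printf '%s\n' \
    '#!/bin/sh' \
    '#PBS -o' \
    '/cluster/home/yourusername/MPI_exercises/Intro/C/my_number_%J_stdout.txt' \
    '#PBS -N my_number' > "$sample"

broken=$(find_broken_directives "$sample")
echo "Broken directive on line: $broken"

rm -f "$sample"
```

On your own file, you would call find_broken_directives my_number.qsub and then rejoin each reported line with the line that follows it.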

  1. After you’ve finished editing, go back up to the top of the batch script file, and CAREFULLY READ THE ENTIRE BATCH SCRIPT FILE FROM START TO FINISH. This will give you a much clearer understanding of what batch computing is and how it works.
  2. Understanding batch computing:

As an analogy, imagine that you’re at a football game and you want a drink. You get up and walk to the concession stand. If there are a lot of people at the concession stand, then you’re going to have to wait a while before a server serves you, but if you’re the only person in line, or more generally if there are at least as many servers behind the counter as customers lined up to buy, then you’ll be served quickly.

Batch computing is analogous, except that instead of food and drink, you and the other users want your jobs to be run, and instead of food servers, the servers are computers that can run jobs. Typically, for a production cluster supercomputer, the number of resources requested by the users – that is, total servers requested – is much larger than the number of available resources (servers).

The only way to make this work is for a program known as a scheduler (in this case, PBS, the Portable Batch System) to decide whose jobs run on which servers, and when.

Compare getting food at a football game to getting food at home, where you just walk up to your fridge or cupboard or whatever, and take out what you want. But if you’ve got hundreds of people getting food, that method won’t work: it doesn’t scale to hundreds of people sharing one source of food, because you can’t fit all of them in front of the one fridge; instead, everyone has to wait their turn at the counter, and work with a server to get served.

Likewise with computing: your normal way of interacting with your laptop won’t work when hundreds of people are sharing one source of computing.
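
To connect the analogy to practice: a batch script is an ordinary shell script whose #PBS comment lines tell the scheduler what the job needs. The sketch below is a hypothetical minimal example and NOT the contents of my_number.qsub. It uses only standard PBS options (-N for the job name, -q for the queue, -M and -m for e-mail notification); the queue name ec is taken from the sample qstat output in this exercise, and the output-file directives are deliberately left out.

```shell
#!/bin/sh
# Hypothetical minimal batch script; NOT the contents of my_number.qsub.
# Lines starting with #PBS are comments to the shell but directives to
# the scheduler.
#PBS -N my_number
#PBS -q ec
#PBS -M youremailaddress@yourinstitution.edu
#PBS -m abe

# PBS sets PBS_O_WORKDIR to the directory qsub was run from; fall back
# to the current directory when the script is run by hand.
cd "${PBS_O_WORKDIR:-.}" || exit 1

if [ -x ./my_number ]; then
    # Run the executable, reading its input from the input file.
    ./my_number < my_number_input.txt
    job_status="ran"
else
    echo "my_number has not been built in this directory yet."
    job_status="skipped"
fi
```

Because the directives are comments, the same file can be run directly with sh for a quick check before it is handed to the scheduler.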

  1. After you’ve finished editing and reading the batch script file, exit the text editor.

For example, in nano, enter Ctrl-X. If you have made any changes since the last time you entered Ctrl-O, then nano will ask you, near the bottom of the screen,

Save modified buffer (ANSWERING "No" WILL DESTROY CHANGES)?

To save your most recent changes to the file (which is probably what you want to do), press the Y key; to avoid saving your most recent changes, press the N key.

After that, nano will behave the same as if you had entered Ctrl-O.

Look At, Make (Compile), and Run the Original Version

  1. For your own understanding, look at the contents of the source file:

% cat my_number.c

OR:

% cat my_number.f90

This command means: “Output the contents of the text file named my_number.c (or my_number.f90) to the terminal screen.”

NOTICE that the command to output the contents of a text file to the terminal screen without using the more command is cat, which is short for “concatenate,” a word that means “output one text file after another in sequence.”

The output of the cat command goes to the terminal screen (known as “standard output,” or “standard out” for short, abbreviated stdout), and in this case, we are only concatenating a single text file, so we’re simply outputting the text file’s contents to the terminal screen.

If you’re using PuTTY as your SSH client, and the contents of the file exceeds the height of the PuTTY window, then you can scroll up or down using the scrollbar on the right side of the window; most other SSH clients have similar capability.
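
The name “concatenate” makes more sense when cat is given more than one file. The sketch below is a small illustration using two made-up files in a throwaway directory; the file names one.txt and two.txt are assumptions for this example only.

```shell
#!/bin/sh
# Create two tiny text files in a throwaway directory.
workdir=$(mktemp -d)
printf 'first file\n' > "$workdir/one.txt"
printf 'second file\n' > "$workdir/two.txt"

# cat writes the files to standard output one after another, in the
# order in which they are named on the command line.
combined=$(cat "$workdir/one.txt" "$workdir/two.txt")
echo "$combined"

rm -rf "$workdir"
```

With a single filename, as in this exercise, the same command simply shows that one file.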

  1. For your own understanding, look at the contents of the input file:

% cat my_number_input.txt

  1. For your own understanding, look at the contents of the makefile:

% cat makefile

  1. Make (compile) the executable program for the original version of my_number.c (or my_number.f90):

% make my_number

gcc -O -c my_number.c

gcc -O -o my_number my_number.o

(It could be the case that the compiler is gfortran and the source file is my_number.f90.)

NOTICE:

-o my_number

indicates that my_number is to be the name of the executable.

If that option had been left out, then by default the name of the executable would be a.out (“the output of the assembler”), WHICH WOULD BE BAD, because then the executable’s filename wouldn’t explain the executable’s purpose.


  1. Submit the batch script file my_number.qsub to the batch scheduler:

% qsub my_number.qsub

You should get back output something like this:

######.as0.al-salam.loc

where ###### is replaced by the batch job ID for the batch job that you’ve just submitted.

  1. Check the status of your batch job:

% qstat

You’ll get one of the following outputs. Either you’ll get nothing at all, that is, the Unix prompt immediately returns (if this happens right after you submit the job, try qstat several more times, because sometimes there’s a pause just before the batch job starts showing up), OR:

Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
######.as0                my_number        yourusername           0 Q ec

where ###### is replaced by a batch job ID number, yourusername is replaced by your user name, and Q is short for “queued,” meaning that your job is waiting to start,

OR:

Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
######.as0                my_number        yourusername    00:00:05 R ec

where R is short for “running,” meaning that your job has started.

  1. You may need to check the status of your batch job repeatedly, using the qstat command, until it runs to completion. This may take several minutes (occasionally much longer).

You’ll know that the batch job has finished running when it no longer appears in the list of your batch jobs, or when qstat simply returns nothing.
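
Rather than retyping qstat by hand, the repeated check can be sketched as a loop. This is only an illustration under stated assumptions: the function name wait_for_job is made up, it assumes that the job ID appears at the start of a line of qstat output (as in the samples above), and it assumes that a finished job disappears from that output.

```shell
#!/bin/sh
# Hypothetical helper: wait until a job ID no longer appears in the
# output of qstat. Arguments: the job ID, then an optional number of
# seconds to pause between checks (default 30).
wait_for_job() {
    job_id=$1
    interval=${2:-30}
    # grep -q succeeds while some line of qstat output starts with the job ID.
    while qstat 2>/dev/null | grep -q "^$job_id"; do
        sleep "$interval"
    done
    echo "Job $job_id no longer appears in qstat output."
}

# Example use on the cluster (replace ###### with your batch job ID):
#   wait_for_job ######.as0
```

A pause of 30 seconds or more between checks keeps the loop from putting needless load on the scheduler.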

  1. Once your job has finished running, find the standard output and standard error files from your job:

% ls -ltr

NOTICE that the command is “ell ess space hyphen ell tee are” – that is, small-L small-S blank hyphen small-L small-T small-R – rather than “one ess” or “one tee are” and that ls is short for “list” and -ltr is short for “long detailed listing, sorted by time of most recent modification, in reverse order so that the most recently modified file is at the bottom.”

Using this command, you should see files named

my_number_######_stdout.txt

and

my_number_######_stderr.txt

(where ###### is replaced by the batch job ID).

These files should contain the output of my_number. Ideally, the file length of my_number_######_stderr.txt should be zero.
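
Whether the stderr file is empty can be checked without reading the long listing by eye. The sketch below uses the standard shell test [ -s FILE ], which is true when a file exists and has a size greater than zero. The function name file_state, the stand-in files, and the job ID 123456 are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file has any content.
# [ -s FILE ] is true when FILE exists and its size is greater than zero.
file_state() {
    if [ -s "$1" ]; then
        echo "non-empty"
    else
        echo "empty"
    fi
}

# Demonstrate on two stand-in files in a throwaway directory
# (123456 is a made-up job ID).
workdir=$(mktemp -d)
printf 'sample output\n' > "$workdir/my_number_123456_stdout.txt"
: > "$workdir/my_number_123456_stderr.txt"

stdout_state=$(file_state "$workdir/my_number_123456_stdout.txt")
stderr_state=$(file_state "$workdir/my_number_123456_stderr.txt")
echo "stdout: $stdout_state, stderr: $stderr_state"

rm -rf "$workdir"
```

For a clean run, the stdout file should be reported as non-empty and the stderr file as empty.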


  1. Look at the contents of the standard output file:

% cat my_number_######_stdout.txt

(where ###### is replaced by the batch job ID).

You may want to look at the stderr file as well:

% cat my_number_######_stderr.txt

  1. If this run had ANY problems, then send e-mail to:

ccg@cs.earlham.edu

which reaches the Earlham CCG staff, and attach the following files:

makefile

my_number.c (or my_number.f90)

my_number.qsub

my_number_######_stdout.txt

my_number_######_stderr.txt

10. Congratulations! You’ve just run your first batch job. Now continue on.

Edit the C Source File To Create Your Own Unique Version

  1. Now that you’ve run the original version of the program, it’s time to modify your copy of the source file my_number.c (or my_number.f90) to create a version that’s uniquely yours.

Using your preferred Unix text editor (whether nano, pico, vim, vi, emacs or whatever), edit your copy of my_number.c (or my_number.f90).

For example, if you’re using nano, then the edit command would be:

% nano my_number.c

OR

% nano my_number.f90

  1. Using the text editor, make the following changes to my_number.c (or my_number.f90):
    1. In the declaration section, change the constant values assigned to minimum_number, maximum_number, close_distance and computers_number.

You may select any integer values you want, provided that they differ from the original values (1, 5, 10 and 1, respectively), that minimum_number < computers_number < maximum_number, and that they are sufficiently spread out that you can actually do the runs properly.

DON’T CHANGE THE VALUE OF maximum_guesses!!!

    1. In the execution section (also known as the body of the program), change the following sequences of character text to your own words:
    1. Hey!
    2. That’s amazing!
    3. Close, but no cigar.
    4. Bzzzt! Not even close.
  1. Every few minutes while you’re editing, you should save the work that you’ve done so far, in case your work is interrupted by a computer crashing.

In nano, enter Ctrl-O (the letter oh), at which point nano will ask you, near the bottom of the screen:

File Name to Write [Backup]: my_number.c

That is, nano wants to know what filename to save the edited text into, with a default filename of my_number.c (or my_number.f90). Press Enter to save to the default filename my_number.c (or my_number.f90).

  1. A character string literal constant, also known as a character string literal or a string literal for short, is a sequence of characters between a pair of double quotes.

For example, in the C printf statement

printf("This is a printf statement.\n");

the following is a string literal:

"This is a printf statement.\n"

Likewise, in the Fortran90 PRINT statement

PRINT *, "This is a PRINT statement."

the following is a string literal:

"This is a PRINT statement."

We say that the pair of double quotes delimits the sequence of characters in the string literal. Note that, in C, the \n at the end of the string literal tells the program to output a newline (informally called a carriage return) at the end of the line of output text. (In Fortran90, the newline is implied by the end of the PRINT statement.)


  1. The lines of text in the source file my_number.c (or my_number.f90) should be less than 80 characters long, and ideally no more than 72 characters long. (Your PuTTY window should be 80 characters wide.)
  2. Some text editors, including nano, try to help keep text lines short, by breaking a long line into multiple short lines. For example, nano might break a line like:

printf("This is a long line and nano will probably break part of it off.\n");

into two separate lines:

printf("This is a long line and nano will probably break part

of it off.\n");

That is, nano automatically puts a carriage return when the line starts getting too long for nano’s taste.

Unfortunately, the C compiler (or the Fortran90 compiler) will consider this to be an error. Why? Because in C (or Fortran90) an individual string literal cannot simply continue onto the next line (in Fortran90, there’s a goofy way to do it, but it’s bad practice). So, one correct way to write the above example is:

printf("This is a long line and nano will probably");

printf(" break part of it off.\n");

OR:

PRINT *, "This is a long line and nano will probably"

PRINT *, " break part of it off."

  1. Like the lines of source text, the lines of output text should be less than 80 characters long, and ideally no more than 72 characters long. You can break a long line of output text into shorter pieces by making it into two printf (or PRINT) statements.

For example:

printf("Why you big old stinker! That’s not between %d and %d!\n",

minimum_number, maximum_number);

This single printf statement can be converted into two printf statements, like so:

printf("Why you big old stinker! That’s not between\n");

printf(" %d and %d!\n", minimum_number, maximum_number);

OR:

PRINT *, "Why you big old stinker! That’s not between"

PRINT *, minimum_number, " and ", maximum_number, "!"

  1. After you’ve finished editing, exit the text editor. For example, in nano, enter Ctrl-X.

If you have made any changes since the last time you entered Ctrl-O, then nano will ask you, near the bottom of the screen:

Save modified buffer (ANSWERING "No" WILL DESTROY CHANGES)?

To save your most recent changes to the file (which is probably what you want to do), press the Y key; to avoid saving your most recent changes, press the N key. After that, nano will behave the same as if you had entered Ctrl-O.

  1. Edit the input file my_number_input.txt to replace the original input values with input values relevant to your new unique version of the program, specifically:
    1. an integer value less than your value for minimum_number;
    2. an integer value greater than your value for maximum_number;
    3. an integer value between your value for minimum_number and your value for maximum_number (inclusive), but far from your value for computers_number;
    4. an integer value close to your value for computers_number (that is, within your value for close_distance of your value for computers_number);
    5. your value for computers_number.
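
The five input values described above can be worked out with a little arithmetic. The sketch below does so for one made-up set of constants; the specific values (10, 100, 5 and 40) are assumptions chosen for illustration, and the choice of computers_number + 1 as the “close” value assumes that close_distance is at least 1 and that computers_number is less than maximum_number. How the values should be laid out in my_number_input.txt is not shown here; follow the layout of the original file.

```shell
#!/bin/sh
# Example constants; these particular values are made up for illustration.
minimum_number=10
maximum_number=100
close_distance=5
computers_number=40

below=$((minimum_number - 1))    # less than minimum_number
above=$((maximum_number + 1))    # greater than maximum_number

# In range but far away: use whichever end of the range is farther
# from computers_number.
if [ $((maximum_number - computers_number)) -ge $((computers_number - minimum_number)) ]; then
    far=$maximum_number
else
    far=$minimum_number
fi

close=$((computers_number + 1))  # within close_distance of the target
exact=$computers_number          # the target itself

guesses="$below $above $far $close $exact"
echo "$guesses"
```

For these constants the five values are 9, 101, 100, 41 and 40, in the same order as the list above.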

Make (Compile), Run and Debug Your Own Unique Version

  1. Make (compile) your own unique version of the executable program:

% make my_number

gcc -O -c my_number.c

gcc -O -o my_number my_number.o

(It could be the case that the compiler is gfortran and the source file is my_number.f90.)

  1. If the program doesn’t compile, then you’ll need to edit it and figure out where things went wrong. In the worst case, if you’re totally stumped, then copy the original from Fitz’s directory again, and start editing from the beginning.
  2. Repeat the steps from the section “Look At, Make (Compile), and Run the Original Version” above, starting with submitting the batch script and ending with looking at the output files.
  3. Congratulations! You’ve just completed this exercise.