# Xsede13-prog-contest

### From Earlham Cluster Department

Amweeden06 (Talk | contribs) |
Amweeden06 (Talk | contribs) |
||

Line 224: | Line 224: | ||

Profile the code for different values of n, and discuss the “most efficient” way | Profile the code for different values of n, and discuss the “most efficient” way | ||

to multiply two matrices. | to multiply two matrices. | ||

+ | |||

+ | =Evaluation Criteria / Rubric= | ||

+ | All solutions must be able to run on the LittleFe platform using existing software on the Bootable Cluster CD (BCCD). They must be placed on the provided 4GB USB stick with legible instructions provided to the graders for running each solution, and a lab notebook (series of text files) describing the work done to produce the solution. No software engineering or runtime points will be awarded for un-runnable solutions. USB sticks with solutions are due in the graders’ hands when the contest room clock reads 3:59pm PDT. | ||

+ | |||

+ | ==Good software engineering (40%)== | ||

+ | Clear, concise code written in one of the following programming languages: C, C++, FORTRAN, Perl, Python, and/or Java. (10%) | ||

+ | Good commenting, docs (including clear instructions on how to run the program to generate a solution), and shared libraries implementing common parts of the solutions (15%) | ||

+ | Solution correctness and/or approach towards solution (is the solution/approach correct, and does it solve the correct problem?) (15%) | ||

+ | |||

+ | ==Runtime (30%)== | ||

+ | The team must time the program using the Linux time command, including these times and the timing command clearly in their documentation. Judges will corroborate these run-times (10%). | ||

+ | An explanation of the run-time of each component of the program should be provided. Points will be awarded for an explanation of how to improve the run-time of each component (10%). | ||

+ | An explanation of how the program is or could be made parallel should be included (10%). | ||

+ | |||

+ | ==Analysis of results (30%)== | ||

+ | An analysis of results and algorithm used should be included, which can include visualization (20%). | ||

+ | An explanation of the speedup and scaling of the parallel program should be included. If no parallel implementation is provided, there should instead be an explanation of the theoretical speedup and scaling (10%). | ||

+ | |||

+ | ==Parallel implementation bonus (10%)== | ||

+ | Bonus points will be awarded for an actual, working parallel implementation that uses MPI, OpenMP, pthreads, and/or CUDA. |

## Revision as of 00:03, 24 July 2013

https://www.xsede.org/web/xsede13/student-programming-competition

## Contents |

# Problem Descriptions

## 1. Area under the curve

Description: Scientists need to calculate the area under a curve in many real-world applications, from determining the acceleration of an object in physics to modeling the change in concentration of a drug in pharmacokinetics. Calculus-based approaches work for calculating the area under simple curves, but for more complex curves, the best method is often to apply a discrete approximation using simple shapes and limits.

Problem: Given the function ln(ln x), approximate the area under the curve from x = 3 to 3000.

## 2. Prime number generation

Description: Because prime numbers have a random (as far as we know) distribution across the number line, generating them can be a very useful task for introducing randomness to a scientific model. A famous scientific application of prime numbers is in cryptography, particularly in the RSA encryption algorithm, which uses pairs of prime numbers as keys to an encoded message. Generating prime numbers that are large enough to be used as keys is a computationally-intensive task.

Problem: Implement a program that generates all of the primes under 10,000,000.

## 3. Cellular automata

Description: A number of interesting scientific models use cellular automata to represent real-world situations. A cellular automaton is a grid of cells, each of which has a state that is often determined by the states of its neighbor cells. A famous example of a cellular automaton is Conway’s Game of Life, but there are many examples elsewhere in science and math.

Problem: Develop a cellular automaton model of the spread of a fire through a
forest. The forest consists of a 2-D grid of trees n trees x n trees (known as
the forest size), with only the center tree initially on fire. A tree burns
completely in one step of the simulation (known as the burn cycle). When an
unburned tree is next to a burning tree, there is percent chance (known as burn
rate) of it catching on fire in the next step of the simulation. The simulation
runs till no trees are burning. A series of simulations (known as the
aggregation amount) are made with the same burn rate to determine the average
percent of burned trees and the average number of burn cycles. Forest size,
burn rate, aggregation amount, and rate increment should be command line
arguments. Produce 132 results using forest sizes of 11, 101, 1001, and 10001;
burn rates of 0, 10, 20, … 100; and aggregation amounts of 1, 10, and 100.

## 4. N-body simulation

Description: In physics, the 2-body problem explores how two physical bodies with certain masses interact with each other through gravitational force. The 2-body problem has been solved, but extending the problem to a 3-body problem by adding just 1 more object makes the problem so chaotic that it has only been solved for special cases. In general, if scientists want to study the n-body problem, in which an arbitrary number of bodies interact with each other, they will often use computer models and simulations.

Problem: Create an n-body simulation involving 100,000 bodies, each with uniform
mass, each interacting with all the other bodies through simple Newtonian
gravitation, and run the simulation for 100 time steps.

## 5. Binary tree traversal

Description: In computer science, trees are a common data structure for storing and processing data. A tree is composed of nodes, each of which point to one or more child nodes. There is one node to rule them all, the root node. This is the first node in the tree, and the only node to which other nodes do not point. In addition to pointing to other nodes, each node will have a value (the meaning of which is dependent on the application). Finally, a tree has no loops (or cycles); that is, processing a tree from top to bottom will visit each node either 0 or 1 times.

A tree can be traversed using several different methods. The two most applicable here are in-order depth-first (visit all left children, then the current node, then all right children); and breadth-first (search each level left-to-right before moving to the next level down).

A binary search tree is a tree with special properties. First, each node will point to only 0, 1, or 2 other nodes. Second, the tree is structured such that the value of all of each node’s left children have values (whatever that may be) less than the node itself; and the value of all of each node’s right children have values greater than the node itself. Finally, each value in the tree must be unique.

Problem: You will read in a file “rand_nums.txt” containing 10000000
newline-delimited integers, which will be in no particular order. You must write
a program that will read in these 10000000 integers, sort them into a binary
search tree, and then perform some operations on them:

- Find all the numbers with a divisor of 11.
- Find the shortest-path between the node with value 797917900 and value

210714487.

## 6. Agent-based simulation

Description: An agent-based simulation involves many individual things (“agents”) moving around in an environment, and is useful in many areas of science including biology (spread of disease), chemistry (motion of particles in an ideal gas), and physics (n-body gravitation).

Problem: There is a neurotoxin outbreak in an underground laboratory. The
scientists inside are completely unprepared for this type of epidemic so their
best bet is to get out as fast as possible. The only way out of the lab is
through the escape elevators (they’re too high-tech for stairs). This will
involve random walking (or running in this case), probability, and spread of
poisonous gas.

A rectangular grid, 800mx900m, represents the underground laboratory, with each coordinate one meter on a side. The lab has an open floor plan with no walls or other obstructions. There are 3 elevators that are randomly placed around the edges of the lab. Each scientist is represented by a single pixel/variable/agent and reacts once per one second time-step. They move around randomly at a rate of one meter per second in panic, always in one of the cardinal directions (up, down, left, right). Any number of scientists can coexist within one cell and do not interact with each other. In a tragic accident, a neurotoxin is released in the exact center of the lab. A set of counters will count three things: the number of scientists that die, the number that safely escape, and the amount of time the entire epidemic takes (the simulation should finish when the lab contains no more scientists). Neurotoxin can kill an individual completely in exactly 120 time steps (two minutes). The population of scientists in the laboratory is 700 (but this is a changeable parameter). What is the experimental probability of all of the scientists in the lab being able to escape to an elevator before death?

## 7. Monte Carlo

Description: Monte Carlo simulations are useful where a model does not have a deterministic result. Instead, the model is run many times (this is sometimes called an ensemble) and the results in aggregate are used to produce a probability distribution of expected outcomes.

Problem: The dart game “301” divides a dartboard into 20 wedges (each wedge with
a different point value), a bulls-eye in the middle worth double points, an
inner ring worth triple points, and an outer ring worth double points. From the
top going clockwise, the wedge point values are 20, 1, 18, 4, 13, 6, 10, 15, 2,
17, 3, 19, 7, 16, 8, 11, 14, 19, 9, 12, and 5.

The game play involves each player getting three throws per turn. Each player starts at 301 points, and each time the player hits the board, the points scored are subtracted from their current score (for instance, hitting 6 on the triple-score ring would subtract 18 points). The first player to reach 0 points wins. A common hint is to throw at 1, with the assumption that if you miss you are likely to land on 20 or 18.

How would you verify this? One idea would be to write a program that simulates dart throws. A dart throw can be thought of as an aim point, a random direction from that aim point, and a distance from that aim point. If one runs this lots of times, one could verify this assumption.

## 8. Logistics

Description: The problem comes from elementary math education, particularly fraction addition. A separate file, not given here, lists 2561 fraction addition exercises. Each of the numbers in this file refer to one of those exercises. Each row in this file represents a well-rounded set of ten exercises. The challenge here is to find how many such sets of exercises we can give to students without assigning the same exercise twice.

Problem: The data set live in wordsEn.txt and has 390945 lines of ten numbers
each.

Your goal is to find the largest number of rows, call it n, that can be collected in a set where none of the 10*n elements repeat.

The worst brute force way to search is to test 390945 choose n combinations for increasing values of n until a complete search fails.

Other terrible brute force methods do not take into account some elements appearing in over half of the rows.

Profile the data, and think about separating the set into smaller subsets. Consider eliminating vast subsets. Try different approaches, justifying your choices in your engineering journal.

## 9. Computational Linguistics

Description: The NPR Sunday Puzzles are a rich collection of problems with anagrams and combinations of letters. Tools for genome sequencing are designed to work with the letters A, C, G, and T, but can be used for any letters. PERL is specially designed for these tasks, but any language will suffice. The question will be an expansion of one of the Sunday Puzzles.

Problem: Take the four words “salt,” “afar,” “lava,” and “trap.” Write them one
under the other, and the words will read the same vertically as horizontally.
This is a word square of four-letter words. Using the given file of 109,582
English words, what is the largest word square you can construct?

## 10. Matrix-Matrix Multiply

Description: Matrix-Matrix Multiplication has many applications in computationally-enabled science. Most applications are for sparse matrices, where most of the elements are zero, especially sparse matrices with some structure or pattern, but this exercise deals with dense matrices. Dense matrix-matrix multiplication of square matrices of size n x n is generally an O(n^3) operation. Multiplication of dense matrices is an “embarrassingly parallelizable” problem.

Problem:
0. If your team does not understand the basic algorithm for matrix-matrix
multiplication, just ask and we’ll walk you through it with no penalty. Also,
if you do not understand the jargon, “profile,” as used below, just ask.

1. Write serial code for randomly generating two n x n integer square matrices and multiplying them. Profile the run time for increasing values of n until you get interesting behavior. Explain that behavior in terms of the sizes of the different levels of memory on the chip.

2. Modify the code to first find the transpose of the second matrix so the memory fetching happens in a different order. Profile for different values of n and discuss when this approach is faster.

3. Parallelize the multiplication. You may choose to do any or all of these:

Parallelize on the head node using shared memory over the two cores Parallelize on the twelve CPU cores using distributed memory. Parallelize over the six nodes using distributed memory, and over the two cores on each node using shared memory. Parallelize on one GPU. Parallelize over the six GPU’s. Use all twelve CPU cores and six GPU cores most efficiently.

Profile the code for different values of n, and discuss the “most efficient” way to multiply two matrices.

# Evaluation Criteria / Rubric

All solutions must be able to run on the LittleFe platform using existing software on the Bootable Cluster CD (BCCD). They must be placed on the provided 4GB USB stick with legible instructions provided to the graders for running each solution, and a lab notebook (series of text files) describing the work done to produce the solution. No software engineering or runtime points will be awarded for un-runnable solutions. USB sticks with solutions are due in the graders’ hands when the contest room clock reads 3:59pm PDT.

## Good software engineering (40%)

Clear, concise code written in one of the following programming languages: C, C++, FORTRAN, Perl, Python, and/or Java. (10%) Good commenting, docs (including clear instructions on how to run the program to generate a solution), and shared libraries implementing common parts of the solutions (15%) Solution correctness and/or approach towards solution (is the solution/approach correct, and does it solve the correct problem?) (15%)

## Runtime (30%)

The team must time the program using the Linux time command, including these times and the timing command clearly in their documentation. Judges will corroborate these run-times (10%). An explanation of the run-time of each component of the program should be provided. Points will be awarded for an explanation of how to improve the run-time of each component (10%). An explanation of how the program is or could be made parallel should be included (10%).

## Analysis of results (30%)

An analysis of results and algorithm used should be included, which can include visualization (20%). An explanation of the speedup and scaling of the parallel program should be included. If no parallel implementation is provided, there should instead be an explanation of the theoretical speedup and scaling (10%).

## Parallel implementation bonus (10%)

Bonus points will be awarded for an actual, working parallel implementation that uses MPI, OpenMP, pthreads, and/or CUDA.