Opened 11 years ago

Closed 11 years ago

#495 closed defect (fixed)

Simultaneous exec of snarfhosts during startx

Reported by: fitz
Owned by: fitz
Priority: major
Milestone:
Component: Both
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Estimated Hours: 0
Total Hours: 0

Description

Since startx opens two shells, both load the openmpi module at the same time, so two instances of bccd-snarfhosts run concurrently during startx.

This can intermittently produce a machines file with duplicated entries, such as:

  $ cat machines
  node000.bccd.net slots=1
  node000.bccd.net slots=1
  node009.bccd.net slots=1
  node009.bccd.net slots=1

I'll run some tests to see if I can easily reproduce it.

Change History (3)

comment:1 Changed 11 years ago by fitz

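#!/bin/bash

# Start and stop X ten times, printing the machines file after each run.
for i in {1..10}; do
  echo "---> Run $i"
  startx &> /dev/null &   # launch X in the background
  sleep 10                # give the session time to come up
  killall fluxbox         # stop the fluxbox window manager
  sleep 5                 # give the session time to shut down
  cat machines
done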

With this script, roughly 8 runs out of 10 produce an inconsistent machines file.
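
Checking each run by eye gets tedious, so the check could be automated. The following is only a sketch, and it assumes that "inconsistent" means duplicated host lines, as in the example in the description. The has_duplicates helper and the sample file are hypothetical and not part of the BCCD tools.

```shell
#!/bin/bash

# Hypothetical helper: succeeds (exit 0) if the given file contains
# any line more than once.
has_duplicates() {
  [ -n "$(sort "$1" | uniq -d)" ]
}

# Sample input copied from the ticket description.
sample=$(mktemp)
printf '%s\n' \
  "node000.bccd.net slots=1" \
  "node000.bccd.net slots=1" \
  "node009.bccd.net slots=1" \
  "node009.bccd.net slots=1" > "$sample"

if has_duplicates "$sample"; then
  echo "inconsistent"
else
  echo "ok"
fi
```

In the test loop, a call such as has_duplicates machines could replace or follow cat machines to count failing runs.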

comment:2 Changed 11 years ago by fitz

  • Owner set to fitz
  • Status changed from new to assigned
  • Summary changed from Possible race condition in startx to Simultaneous exec of snarfhosts during startx

Confirmed (with above script) that this happens extremely frequently with the current _latest.

comment:3 Changed 11 years ago by fitz

  • Resolution set to fixed
  • Status changed from assigned to closed

r2330 adds file locking to snarfhosts. Over 200 trial runs with 6 snarfhosts instances starting simultaneously on a dual-core node, this version generated no inconsistent machines files.
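
For readers unfamiliar with the technique, here is a minimal sketch of serializing concurrent writers with flock(1) from util-linux. It is not the code from r2330, whose details are not shown in this ticket. The lock file name, the hard-coded host list, and the write_machines function are all assumptions made for illustration.

```shell
#!/bin/bash

# Illustrative paths only; the real snarfhosts locations may differ.
workdir=$(mktemp -d)
machines="$workdir/machines"
lockfile="$workdir/machines.lock"

# Hypothetical writer: regenerates the machines file under an exclusive lock.
write_machines() {
  (
    # Block until this subshell holds an exclusive lock on fd 9.
    flock -x 9

    # Build the new contents in a temporary file, then rename it into
    # place so readers never see a half-written file.
    tmp=$(mktemp "$workdir/machines.XXXXXX")
    for host in node000.bccd.net node009.bccd.net; do
      echo "$host slots=1" >> "$tmp"
    done
    mv "$tmp" "$machines"
  ) 9> "$lockfile"   # the lock is released when the subshell exits
}

# Start 6 writers at once, mirroring the trial setup described above.
for i in {1..6}; do
  write_machines &
done
wait

cat "$machines"
```

Because each writer holds the lock for its whole update, the writers run one at a time and the final file contains each host exactly once.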
