Winter 2000 IRAM Retreat Feedback
=================================

Harvey Stiegler (TI)
- collect presentations and post them on the web
  - in PowerPoint, hopefully also PDF and HTML
- [Tetzlaff: keep it up between retreats]
- AME
  - benchmarks define the field == what gets measured gets done
  - benchmarks will define what you look at
  - don't do marketing in your own building
  - go talk to outside customers, find out what they want you to
  measure; it will determine what you base your research on later
- Do the projects overlap?
  - more synergy here than people think
  - draw a diagram of how IRAM, ISTORE, OceanStore relate (for
  outsiders)
- IRAM apps
  - PDA, multi-node, SmartSIMM
  - glad Jim is proposing to build all of them
  - probably design is not optimized for all of those
  - think about what would change if optimized for each
- Current IRAM power estimate?
- IRAM Testing
  - really important to do coverage analysis
  - have you exercised all design features, gates
  - want close to 100% with your test sequences
- FFT
  - chart shows architecture of VIRAM is approx. comparable to modern
  DSPs; you're really comparing the architectures, but doesn't take
  into account advantage of having the big memory on-board
  - published FFT numbers from company assume all data stored in mem,
  but they have less mem than you
  - if you start making FFTs bigger or have apps that want to do many
  of them, or want to do 2D FFT, at what point does that break a
  conventional DSP while IRAM continues OK?
    - corollary: on what apps is that impt
  - advantage of big on-chip mem is not showing up here

Konrad Lai (Intel)
- last time here 2 years ago, a lot is new
- IRAM
  - happy to see settled down on some features, near tape-out
  - glad you dropped a lot of features
  - for next 6 months, testing is important
  - also full-chip models
  - need to test using JTAG; make sure JTAG, other debugging support
  works
  - think about packaging
  - need to demonstrate advantage of on-chip RAM
    - could you do the same using DSP + integrated mem?
  - perhaps compare using clock cycles instead of absolute time
- ISTORE
  - getting interesting
  - need to look at how other people run same projects
  - building hw took a lot of (too much) time
    - except for DP, you're building a $400 PC
    - ability to inject faults is important, though
    - walk through qualitatively; thinking process more important than
    building it; can find software mechanism to do equivalent
    - you may have forgotten some stuff, like battery sensor
  - s/w management
    - supercomputer people all have emergency management port on Linux
    clusters
    - major issue is lack of s/w for how to use it
    - VA-Linux has a project in that area
      - 256 or 512-node cluster sold to Argonne National Lab
    - cheaper way to get into problem earlier
      - Dave P: We're using ISTORE-0
  - focus on maintenance is interesting
    - people trying to do OLTP on PCs (e.g. CMU)
    - more interesting to do mgmt research
    - Net-PC stuff is related
      - e.g. control booting through ethernet card
      - LanDesk, Tivoli, etc.
- OceanStore
  - very interesting
  - sounds like a lot of progress in only 6 months

Bill Bolosky (Microsoft)
- IRAM
  - Speech recognition
    - if you'd build it, I'd buy it
- ISTORE
  - sounded like solution looking for a problem
  - sounds like you've iterated over apps and designs last couple of
  years
  - what is it -- need an answer to sell the thing, explain it to
  people
  - design principles are right on
  - history is evolution: HW provide more perf, SW people build more
  features (spend perf to produce features)
    - features we need to spend perf on now are to make the system
    *work*
    - working correctly is a feature
    - tradeoff perf, functionality, ship date
    - spend development time, money, cycles on making things easy to
    use and maintain
  - AME benchmarks
    - response of system to events measurement good, but what is
    frequency of the events
    - concrete suggestion: why do computer systems fail, redo that
    paper (Jim Gray paper)
      - today: web server, big DBs, workstations
      - figure out why stuff stops today
      - also is essential input to your simulations
      - I think you'll find people mismanaging computers is main
      problem
        - mismanagement => lack of availability
    - go from AME to MAE - make manageability go first
    - find friendly people who run these systems
      - e.g. talk to Brewster
      - Hotmail, Amazon, ...
      - Dave: why ISPs stop (e.g. Inktomi)
        - Konrad: difficult to generalize since lots of custom stuff
        at ISPs
        - Bill: real world is heterogeneous
        - have neutral party coordinate (auditing/accounting firm)
- OceanStore
  - giant vision of what you're going to do, impossible to solve in
  total
  - need to narrow your focus
    - what pieces are you going to do
    - security and denial of service fascinating, getting reliability
    out of unreliable parts
  - claiming to run on a truly impressive scale
    - everyone on planet has 10k files => 10^14 files
    - right now mean file size at MS is 32-64K
      - push to 100K you get 10M TB before replicate
  - Kubi: what's most important?
    - you have to get them all right
    - geographic scale stuff, hetero network probably most impt
    - must have introspection work right in this env
    - Bloom Filter work was way off-scale
      - extraordinary claims require extraordinary proof
  - need to weaken file consistency model (like web)
  - they have Sigmetrics paper on file size, etc. distribution
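
The scale estimate in Bill's OceanStore comments (10k files per person, mean
file size pushed to 100K) can be checked with a few lines of arithmetic. This
is only a back-of-envelope sketch: the population figure of 10^10 and the
decimal units (1 KB = 10^3 bytes, 1 TB = 10^12 bytes) are assumptions chosen
to match the round numbers in the notes, and the function names are
hypothetical.

```python
# Back-of-envelope check of the OceanStore scale estimate from the notes.
# Assumed, not stated in the notes: 10^10 users and decimal units.

USERS = 10**10            # assumed round population figure
FILES_PER_USER = 10**4    # "10k files" per person, from the notes
KB = 10**3                # bytes per KB (decimal, assumed)
TB = 10**12               # bytes per TB (decimal, assumed)


def total_files(users: int = USERS, files_per_user: int = FILES_PER_USER) -> int:
    """Total number of files in the system."""
    return users * files_per_user


def raw_capacity_tb(mean_file_kb: int, replicas: int = 1) -> int:
    """Total storage in TB for a given mean file size and replica count."""
    total_bytes = total_files() * mean_file_kb * KB * replicas
    return total_bytes // TB


if __name__ == "__main__":
    print("files:", total_files())
    for kb in (32, 64, 100):
        print(f"{kb}K mean file size: {raw_capacity_tb(kb)} TB before replication")
```

With these assumptions, a 100K mean file size gives 10^7 TB (10M TB) before
replication, which matches the figure quoted above; each extra replica
multiplies that total.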

Brian Hold (Micron)
  - IRAM
    - The chip size might be a problem when we put it on a wafer
      (Response from Steve: IBM has said they'll fab this.)
    - Cost: about $1M to fab (if we were to pay for it).  Could
       be up in the $10M range depending on size.  Retooling and
       engineering time are expensive: piggybacking on an existing
       project would be a big savings.
    - The trick to piggybacking is getting the chip in some
       normal size.  (Dave responded: this is more like an ASIC
       fab, which is why they are willing to do this specialized
       kind of run.)
    - Brian is working on an "Active Memory" project at Micron
       that is very similar to IRAM.
  - ISTORE
    - Look at MS specs for Wired for Management (WfM)
      - for ideas on how to make the data useful (flight recorder
      data)
      - WfM is tool that makes it useful
        - sits in BIOS of PC, so net manager can monitor PC
        - not on scale you're talking about, but is corporation-wide
        type of thing
        - to give you ideas on how and what to do with info you're
        collecting
        - generate red flags for apps; you're managing a network
  - OceanStore
    - I agree with Bill on OceanStore; should be more like LakeStore
      - address at corporation level first
    - Can you convince people to put their confidential data there?
      - corporate intellectual property?
      - someone else handling the hardware spooks Fortune-500
    - Ric Wheeler: mix of AOL servers and laptops is interesting model
    too [Great Lakes Store?]

Bill Tetzlaff (IBM)
  - IRAM hardware
    - things progressed slowly since last summer
    - never mentioned IBM and no IBM people here
      - should be more IBM people involved
  - ISTORE hardware
    - wish I heard more from IBM on that (disks coming from IBM)
    - Diagnostic Processor a good idea
      - we've done it with mainframe
      - dual-processor with hot standby and journaling state to other
      node
      - I'll get open literature references to this stuff; send me
      email
    - Aaron's stuff interesting; like idea of finding out how often
    these things really happen
      - Run a workshop, invite people to come talk about their problems
    - Objectives and principles
      - less dismayed than Bill
      - looks like a rethink from broad/global
      - not far enough along to know what doing next, but good to be
      going through the exercise
  - OceanStore
    - I like the enormous vision
    - biggest challenge is to cope with the scale
    - disappointed with IBM participation; probably my fault

Ric Wheeler (EMC)
- great few days, learned a lot
- great to see something vertical HW -> OS -> new apps
  - like what real systems companies do
  - e.g. EMC
  - good experience for grad students
- enjoyed the AME Benchmarking talk; great way to look at systems
- we have lots of failure data, can't tell you, but some of our
customers might be willing to share it with you
- I'd like to give a talk/write a paper on how enterprise storage
differs from the kind of research academics do; you come out here too
- can get data from big customers of system vendors (e.g. AOL)
- our experience has been you have to use every kind of redundant
hardware you can, because when you need it it won't work; so use
everything all the time, make sure you can get by with less of it
  - everyone doing work, but you can get by with half
- modeling things like 2x or 3x mirroring good; also steep cost factor
- reiterate importance of configuring, administering big boxes
  - Brewster's NFS mount points hellish to use
  - need lots of tools
  - Beowulf cluster people have such tools
  - VA-Linux 24 quad Xeons; headless BIOS boot is important, being
  worked on but not there yet; think about how to install, update, rev
  software
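
Ric's remark that modeling 2x or 3x mirroring is good but carries a steep cost
factor can be illustrated with a toy model. This is a sketch under a strong
assumption that is not in the notes: each copy is unavailable independently
with the same probability `p`, and data is lost to users only when every copy
is down. The function names and the example value of `p` are hypothetical.

```python
# Toy model of n-way mirroring: availability gain versus hardware cost.
# Assumption (not from the notes): copies fail independently, each with
# unavailability p; the data is unavailable only if all copies are down.


def mirrored_unavailability(p: float, copies: int) -> float:
    """Probability that all mirrored copies are unavailable at once."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p ** copies


def relative_cost(copies: int) -> int:
    """Storage hardware cost relative to a single unmirrored copy."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return copies


if __name__ == "__main__":
    p = 0.01  # hypothetical per-copy unavailability
    for n in (1, 2, 3):
        print(n, relative_cost(n), mirrored_unavailability(p, n))
```

The independence assumption is exactly what the comments above call into
question: if redundant hardware tends not to work when it is needed, real
unavailability will be worse than `p ** copies`, which is one reason to
exercise all the redundant hardware all the time.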

Jim Kohn (SGI)
- IRAM
  - interested in s/w, perf issues
  - IRAM hw
    - pleased you're still planning to tape out this year
    - dropping things is fine; seems you've made reasonable tradeoffs
    - dismayed that scalar processor has changed at this late date
    - change towards more simple, classic processor
    - scalar unit raises yellow flags if target is high-perf vector processor
  - mem consistency model
    - could dramatically impact vector perf, particularly high-level
    code with conventions
    - but you have direct control of your app space, so should be able
    to impact choices of mem consistency model, etc.
  - IRAM SW
    - good you have a working processor
    - wouldn't rush into tuning right away; might be opportunity to
    provide flexibility in the what-if area
    - at Cray we're trying to provide a little flexibility in s/w
    conventions in case
    - recommend collaboration with us (we're at finalization stage)
    - focus on memory consistency software model (already well defined
    in hw)
       - need sw model that delivers high perf
       - capture key kernels as libraries to use in many apps
  - expectations
    - as you tune perf, do some eval. of key IRAM applications to see
    how perf breaks out -- what was root cause
  - collaborate on SW conventions, start up a library