The Daily Grind:
“Okay, now why is this happening?”
You deal with pop-up problems all day: user complaints, crashed nodes, app failures. Are the nodes and the OS properly configured, or is PerfMiner reporting them in a separate configuration cluster? Did the configuration change recently? Ask the same questions about software versions and code libraries or modules.
- What was running on that node before it crashed? What was it doing?
- Which resources are under contention?
- Which of your users are running the least efficient code? Is that normal?
- Which developers are producing the least efficient code? Is that necessary?
- Are your batch systems and CPUs oversubscribed?