I've put together a cheat sheet showing how you might initially configure your Nagios load checks. The reasoning behind these starting values is set out in Tuning Nagios Load Checks.
Use | OS | Cores | Warning (1, 5, 15 min) | Critical (1, 5, 15 min) | Notes |
---|---|---|---|---|---|
CMS (Teamsite) | Solaris | 1 | 10,7,5 | 20,15,10 | Testing shows this app to be responsive up until these loads. |
Web Server | Linux | 2 x 4 | 16,10,4 | 32,24,20 | Web servers are paired, so we want to know if they regularly reach 50% capacity. Testing shows performance degrading from a load of 20. |
DB Server | Linux | 2 x 4 | 16,10,4 | 32,24,20 | Same hardware, different use. Nevertheless, using same thresholds. |
Nagios | Linux | 1 x 2 | 6,4,2 | 12,10,7 | Small box, paired with backup. |
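
To make the table easier to apply, here is a minimal Python sketch that turns each row into a `check_load` argument string. The `-w` and `-c` options are the plugin's standard warning and critical flags, each taking the 1, 5 and 15 minute load averages as a comma-separated triple. The names `THRESHOLDS` and `check_load_args` and the role labels are my own, purely for illustration, and the plugin path and surrounding Nagios command definition are left out because they vary by install.

```python
# Thresholds copied from the table above.
# The dictionary keys are illustrative labels, not Nagios host or service names.
THRESHOLDS = {
    "cms": {"warning": (10, 7, 5), "critical": (20, 15, 10)},
    "web": {"warning": (16, 10, 4), "critical": (32, 24, 20)},
    "db": {"warning": (16, 10, 4), "critical": (32, 24, 20)},
    "nagios": {"warning": (6, 4, 2), "critical": (12, 10, 7)},
}


def check_load_args(role):
    """Return a check_load invocation for one row of the table.

    Each triple is ordered as 1, 5 and 15 minute load averages.
    """
    row = THRESHOLDS[role]
    warn = ",".join(str(value) for value in row["warning"])
    crit = ",".join(str(value) for value in row["critical"])
    return f"check_load -w {warn} -c {crit}"


if __name__ == "__main__":
    for role in THRESHOLDS:
        print(f"{role}: {check_load_args(role)}")
```

For the web server row this yields `check_load -w 16,10,4 -c 32,24,20`, which you would then drop into whatever command definition your setup uses.
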
General notes:
- The UNIX servers (particularly the Sun SPARC ones) seem to be able to stay up and responsive even under heavy load. And they don't count processes waiting for I/O in their load counts the way Linux does. I have no explanation for this. :-)
- We track these loads over time to predict demand growth for capacity planning -- the thresholds are short-term alert levels, not long-term targets.
- Transaction or revenue-earning web servers might have lower thresholds because of the different commercial implications of performance degradation. YMMV.
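
To show how a threshold triple acts as a short-term alert, here is a rough Python sketch of the comparison as I understand `check_load` to make it: each of the three load averages is compared with its own threshold, and the worst result wins. The function name `classify_load` is hypothetical, and the strict greater-than comparison is an assumption about the plugin's behaviour rather than something taken from its source.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_load(load, warning, critical):
    """Compare a (1, 5, 15 minute) load triple against two threshold triples.

    Assumption: a value must strictly exceed its threshold to trigger,
    and any single interval over its threshold is enough to raise the state.
    """
    state = OK
    for value, warn, crit in zip(load, warning, critical):
        if value > crit:
            return CRITICAL  # nothing can be worse, so stop early
        if value > warn:
            state = WARNING
    return state


if __name__ == "__main__":
    # Web server thresholds from the cheat sheet; the sample load is made up.
    web_warning = (16, 10, 4)
    web_critical = (32, 24, 20)
    sample_load = (18.0, 9.0, 3.5)
    print(classify_load(sample_load, web_warning, web_critical))
```

On a Unix host, `os.getloadavg()` returns the same three averages, so you could feed its result into `classify_load` to sanity-check thresholds before putting them into Nagios.
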
For more information on the Nagios check_load command, see Tuning Nagios Load Checks.