APM provides many ready-to-use templates out of the box. However, in many cases SolarWinds has not been able to determine recommended threshold values. There are several reasons for this:
- The APM monitor is unique and there is no published documentation describing some particular counters or operations.
- The APM monitor performs environment-specific operations, so the resulting statistic data can vary from one environment to another.
- The APM monitor executes customer-defined scripts, so again, the result is unpredictable.
In these situations, threshold values can only be determined experimentally, and APM has built-in facilities for this approach. The main idea is to use the historical statistic data charts to find suitable threshold values for your specific environment. Two prerequisites apply:
1. Choose the time interval whose statistic data will serve as the baseline. This interval should be long enough to capture representative min/max values for the monitor being investigated. We recommend a one-week interval (“Last 7 Days”), which should be enough for most cases.
2. The APM application/component (target system) under test must be stable for the chosen time period. Make sure that the target system operates normally and provides valid statistic results for the entire interval selected.
Method for Calculating Thresholds:
1. For counters that should be as low as possible:
- Warning: should be the 95th percentile and above (which you can read from the MIN/MAX Average Statistic Data historical charts, as described below)
- Critical: should be the highest sampled piece of data
2. For counters that should be as high as possible:
- Warning: should be the 5th percentile (you should calculate this setting manually as described in the example below) and lower
- Critical: should be the lowest sampled piece of data
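The method above can be sketched in a few lines of Python. This is only an illustration, not part of APM: the function names and the sample data are hypothetical, and the percentile is computed with the drop-the-extreme-5% technique used in the worked example later in this article, so the result may differ slightly from the 95th percentile line that the chart draws.

```python
def drop_count(sample_count, percent=5):
    """Number of extreme samples to discard, rounded up.

    Integer arithmetic is used so the rounding is not affected by float error.
    """
    return (sample_count * percent + 99) // 100


def suggest_thresholds(samples, lower_is_better, percent=5):
    """Return suggested warning/critical values from baseline samples (hypothetical helper)."""
    if not samples:
        raise ValueError("need at least one sample")
    # Put the worst readings first: highest first when low is good,
    # lowest first when high is good.
    ordered = sorted(samples, reverse=lower_is_better)
    # Never drop every sample; keep at least one to act as the warning value.
    k = min(drop_count(len(ordered), percent), len(ordered) - 1)
    return {"warning": ordered[k], "critical": ordered[0]}


# Made-up baseline of 20 samples with the values 1..20.
baseline = list(range(1, 21))
low_is_good = suggest_thresholds(baseline, lower_is_better=True)
high_is_good = suggest_thresholds(baseline, lower_is_better=False)
print(low_is_good)   # {'warning': 19, 'critical': 20}
print(high_is_good)  # {'warning': 2, 'critical': 1}
```

With 20 samples, one value (5%) is dropped from the unfavorable end, so the warning value is the second-worst sample and the critical value is the worst sample.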
Use Case Example:
Consider the following Performance Counter monitor example. Every Windows Performance Counter monitor (for example, from the Active Directory and IIS templates) has its own “MIN/MAX Average Statistic Data” chart. You can find this chart on the Component Details view. This chart is configurable and you can change the time interval by clicking EDIT in the chart and setting the Time Period for the chart to Last 7 Days, as shown in the following screen:
You can also set a sample interval, which controls how often samples are collected. When you are finished editing the chart, click SUBMIT.
After the chart is configured and enough time has passed for data to be collected, the graph may look similar to the following. This example shows statistics from the counter “LDAP Client Sessions”. Here the chart covers only TODAY; for more accurate results, the counter should be monitored for a full week.
Note that the red arrows mark the information about the 95th percentile for the chart. The top arrow marks a line on the chart labeled “95th”. This line marks the value at or below which 95% of the collected samples fall (it is not 95% of the maximum). If you follow the line across the graph to the left axis, you can see that the value for the 95th percentile is 39.00. Underneath the graph, the bottom arrow marks the caption “95th Percentile: 39”, which lists the same value so that you do not need to read it from the chart. The chart also shows the variation in the samples collected over each interval, and the blue line shows the average of the values read.
Using the 95th Percentile information to calculate your thresholds
For counters that should remain as low as possible:
- Find the 95th percentile information. In our example above, this is 39.
- Since counters should remain as low as possible, you set the thresholds as described above:
- Warning: should be the 95th percentile and above (which you can read from the chart and is indicated by the 95th percentile line as 39).
- Critical: should be the highest sampled piece of data (which you can read from the chart as 41 and is indicated by the two highest bars).
For counters that should remain as high as possible, you will need to perform some additional manual calculations to get the 5th percentile line value, because APM charts don’t calculate and display this line. To do this, perform the following:
- Open the extended chart view by clicking the desired chart in the application/component details view:
- Click Raw Data to view the actual chart data in Excel format:
- Next, data should be sorted in ascending order, since you are going to calculate the 5th percentile line. Click the sort ascending (A->Z) button that is highlighted near the top of the screenshot above to sort the data in ascending order:
- See the following article "Understanding 95th Percentile Calculations" for supporting information about calculating the percentile values:
- Following this technique, calculate how many of the lowest values to drop: 29 values * 5% = 1.45, which rounds up to 2. Dropping the two lowest values leaves the third value, 3799, as the 5th percentile. So, the warning threshold is the 5th percentile line value (3799).
- The critical threshold is the minimum available value, which you can find in row 2 near the top of the spreadsheet (2979).
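The arithmetic in the last two steps can be checked with a short Python sketch. The function name is hypothetical, and rounding up is an assumption taken from the example (1.45 becomes 2). The spreadsheet row is also an assumption: because the minimum value sits in row 2, row 1 is taken to be a header row.

```python
def fifth_percentile_position(sample_count, percent=5):
    """Return (values to drop, 1-based position of the percentile value in sorted data)."""
    # Round up using integer arithmetic: 29 * 5 = 145, and 145 / 100 rounds up to 2.
    dropped = (sample_count * percent + 99) // 100
    return dropped, dropped + 1


dropped, position = fifth_percentile_position(29)
# Assumes one header row, so sorted value number 1 sits in spreadsheet row 2.
spreadsheet_row = position + 1
print(dropped, position, spreadsheet_row)  # 2 3 4
```

For the 29 samples in the example, two values are dropped and the third sorted value (3799) is the 5th percentile.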
These threshold settings warn the user when the counter goes below the 5th percentile, and go critical when the counter goes below the minimum value seen in the selected dataset. As new data is collected, you can update the critical and warning thresholds, provided that the system was still performing normally when the new values were read.
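To make the effect of these two thresholds concrete, the following Python sketch classifies a reading for a counter that should stay as high as possible. It only illustrates the rule described above. The function name, the status labels, and the strict "less than" comparisons are assumptions, not a description of how APM evaluates thresholds internally.

```python
WARNING_THRESHOLD = 3799   # 5th percentile value from the example
CRITICAL_THRESHOLD = 2979  # lowest sampled value from the example


def classify(value, warning=WARNING_THRESHOLD, critical=CRITICAL_THRESHOLD):
    """Classify a reading for a counter that should remain as high as possible."""
    if value < critical:
        return "critical"  # lower than anything seen in the baseline
    if value < warning:
        return "warning"   # below the 5th percentile of the baseline
    return "ok"


for reading in (4200, 3500, 2500):
    print(reading, classify(reading))
# 4200 ok
# 3500 warning
# 2500 critical
```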
The procedure described here is not a foolproof method for determining thresholds in all possible situations. However, it can be useful when you have no other way to determine thresholds for particular monitors. It should also work when a monitor in your system behaves differently than described in the Microsoft documentation, but you consider that behavior correct.