<<Project Name>>

Read the Guidance (Arial blue font in brackets) to understand the information that should be placed in each section of this template. Then delete the Guidance and replace the placeholder within <<Begin text here>> with your response. There may be additional Guidance in the Appendix of some documents, which should also be deleted once it has been used.

Some templates have four levels of headings. They are not indented, but can be differentiated by font type and size:

Heading 1 – Arial Bold 16 font
Heading 2 – Arial Bold Italic 14 font
Heading 3 – Arial Bold 13 font
Heading 4 – Arial Bold Italic 12 font

You may elect to indent sections for readability.

Author
Author Position
Date

Version: 1.0

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Microsoft and Visual Basic are either registered trademarks or trademarks of Microsoft in the United States and/or other countries.

Revision & Sign-off Sheet

Change Record

Date	Author	Version	Change Reference

Reviewers

Name	Version Approved	Position	Date

Distribution

Name	Position

Document Properties

Item	Details
Document Title	Monitoring Plan
Author
Creation Date
Last Updated

[Description: The Monitoring Plan defines the process by which the operational environment will monitor the solution. It describes what will be monitored, what monitoring is looking for, how monitoring will be done, and how the results of monitoring will be reported and used. Customers use automated procedures to monitor many aspects of their solutions. Automated monitoring is a key best practice that enables identification of failure conditions and potential problems. Monitoring helps to reduce the time needed to recover from failures.

Justification: The plan will provide the details of the monitoring process, which will be incorporated into the functional specification. Once incorporated into the functional specification, the monitoring process (manual and automated) will be included in the solution design. Monitoring ensures that operators are made aware that a failure has occurred so they can initiate procedures to restore service. Additionally, some organizations monitor their servers’ performance characteristics to spot usage trends. This proactive best practice allows organizations to identify the conditions that contribute to system failure and take action to prevent those conditions from occurring.

{Team Role Primary: Program Management is responsible for ensuring that the plan is completed and has acceptable quality, as well as incorporating it into the Master Project Plan and Operations Plan. Release Management will contribute heavily to the content of the plan in its responsibility for designing an effective solution monitoring process.

Team Role Secondary: Development will review the plan to ensure that the functional specification and project deliverables are in synch with the monitoring plan. Product Management will review the plan to ensure that external customer needs are met by the monitoring plan. Test and User Experience will review the plan to ensure that what is monitored supports their functional areas of interest.}]

Summary

[Justification: Some project participants may need to know only the highlights of the plan, and a summary gives them that view. It also lets readers of the full document grasp its essence before they examine the details.]

Objectives

[Description: The Objectives section describes the business and technical drivers of the monitoring process and what key objectives are targeted for the monitoring process.

Justification: Identifying the drivers and monitoring objectives signals to the customer that Microsoft has carefully considered the situation and solution and created an appropriate monitoring approach.]

Anticipating Failures

Component	Single Point of Failure (yes or no)	Mean Time between Failures	Conditions and Circumstances leading to Failure	Probability of Component Failure	Impacts of Failure

[Justification: Anticipating failures will enable operations either to avoid them or to be prepared to deal with them when they occur.]
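
Example (illustrative only): the Mean Time between Failures and Probability of Component Failure columns can be related by a simple reliability model. The sketch below assumes a constant failure rate (an exponential model), which the template does not prescribe. The function name and the sample figures are hypothetical.

```python
import math


def failure_probability(mtbf_hours: float, interval_hours: float) -> float:
    """Probability of at least one failure within the interval.

    Assumption of this sketch: failures arrive at a constant rate of
    1 / MTBF, so P(failure) = 1 - exp(-interval / MTBF).
    """
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    if interval_hours < 0:
        raise ValueError("interval_hours must be non-negative")
    return 1.0 - math.exp(-interval_hours / mtbf_hours)


# Hypothetical component: MTBF of 50,000 hours, assessed over one year.
HOURS_PER_YEAR = 24 * 365
yearly_risk = failure_probability(50_000, HOURS_PER_YEAR)
```

A value produced this way is only as good as the MTBF estimate behind it, so record the source of each MTBF figure alongside the table.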

Resource Threshold Monitoring

[Description: The Resource Threshold Monitoring section identifies the solution resources that will be monitored, defines the conditions and circumstances to be monitored for each type of resource, and defines the thresholds used to judge whether resources are working properly and are sufficient to support the solution. Resources include hard drives, CPU, memory, and threads.]
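
Example (illustrative only): one way to express resource thresholds is as a small table of warning and critical levels that is evaluated against current readings. The resource names, levels, and function names below are hypothetical. How readings are collected is left to the monitoring tool in use.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    resource: str    # hypothetical resource key, e.g. "disk_used_pct"
    warning: float   # reading at or above this level is a warning
    critical: float  # reading at or above this level is critical


def evaluate(reading: float, threshold: Threshold) -> str:
    """Classify a single reading against its threshold."""
    if reading >= threshold.critical:
        return "critical"
    if reading >= threshold.warning:
        return "warning"
    return "ok"


def check_resources(readings: Dict[str, float],
                    thresholds: List[Threshold]) -> Dict[str, str]:
    """Return a status per monitored resource; absent readings are flagged."""
    results = {}
    for threshold in thresholds:
        if threshold.resource not in readings:
            results[threshold.resource] = "missing"
        else:
            results[threshold.resource] = evaluate(
                readings[threshold.resource], threshold)
    return results
```

Flagging a missing reading separately from "ok" matters here, because a resource that stops reporting may itself indicate a failure.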

Performance Monitoring

[Description: The Performance Monitoring section defines the monitoring process that gathers and records information about the performance of the total solution and the individual components in the solution. For each type of solution event it includes

Trend Analysis

[Description: The Trend Analysis section defines the analysis that will take place on the data collected during performance monitoring. Trend analysis uses the information gathered and recorded by performance monitoring to predict solution and component performance and health under different conditions and circumstances, such as a larger user set and a changing solution environment.]
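
Example (illustrative only): a simple form of trend analysis fits a straight line to recorded performance data and projects when a threshold will be reached. The sketch below uses an ordinary least-squares fit. This technique is an assumption of the example, and real data may call for a different model. All names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_to_threshold(xs: Sequence[float], ys: Sequence[float],
                      threshold: float) -> Optional[float]:
    """Projected x at which the fitted line reaches the threshold.

    Returns None when the trend is flat or falling, because the
    threshold is then never reached under this model.
    """
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope
```

For instance, daily disk-usage samples could be passed as `ys` with day numbers as `xs` to estimate the day on which usage would cross a capacity threshold.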

Application Health and Performance Monitoring

[Description: The Application Health and Performance Monitoring section should list and describe each software application in the solution and describe the plan for monitoring each application:

Detecting Failures (Incidents)

[Description: The Detecting Failures section should describe how the development team, operations, and maintenance will utilize the functional specifications and user acceptance criteria to detect failure incidents. The functional specifications clearly define the success criteria for a solution and for each of its components. User Acceptance Criteria, based on the functional specifications, precisely define user expectations for the correct and effective operation of the solution.]

Error Detection

[Description: The Error Detection section describes the processes, methods, and tools teams will use to detect and diagnose solution errors. The goal of an error detection strategy should be that errors are detected and resolved, and the solution recovered, without the user community noticing.

Justification: Error detection in a Windows environment will enhance a solution’s reliability and availability. Early detection and handling of application and system errors can help avoid a shutdown, or at least allow for an orderly shutdown. It can also increase availability by allowing the solution to continue operating in a degraded state.]

SNMP

[Description: SNMP (Simple Network Management Protocol) captures, or traps, configuration and status information from a Windows NT server.]

Event Logs

[Description: The Event Logs section describes the logs that will provide a system for capturing and reviewing significant application and system events. Describe the logs operations will maintain and the procedures they will use to record events and time in the logs.]
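
Example (illustrative only): the sketch below records events with a timestamp, severity, and source using Python's standard `logging` module. The logger name and format string are hypothetical. An in-memory stream stands in for the log file that operations would maintain in practice.

```python
import io
import logging

# Hypothetical record layout: timestamp, severity, source, message.
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s %(message)s"


def build_event_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes timestamped events to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False       # keep events out of the root logger
    logger.handlers.clear()        # avoid duplicate handlers on re-creation
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
events = build_event_logger("solution.events", buffer)
events.info("service started")
events.error("database connection lost")
```

For a persistent log, the stream handler could be swapped for a file-based handler such as `logging.handlers.RotatingFileHandler`.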

Monitoring for Failure

[Description: The Monitoring for Failure section should describe the processes, methods, and tools teams will use to detect and report solution failures.]

Monitoring for Success

[Description: The Monitoring for Success section describes the processes, methods, and tools teams will use to determine that the solution is working correctly and is meeting user expectations. Monitoring for success includes the use of monitoring tools and interaction with solution users to gather information about solution successes.]

Monitoring for Alarms

[Description: The Monitoring for Alarms section describes how solution alarms will signal that a problem is about to occur or has occurred in a solution. It should identify all solution alarms, indicate how they will signal users and operations, and define what each alarm means.]

Exception Trapping

[Description: The Exception Trapping section describes a type of monitoring built into a solution that recognizes incidents, indicating a solution has produced a result that is an exception to acceptable results (i.e., the result lies outside the range of acceptability). This section should identify where the development team will build exception traps into the solution that continually monitor solutions or that operations will turn on when they suspect problems within a solution. Exception trapping capabilities allow for reliable programmer and program control over responses to exceptions that occur during the execution of a solution.]
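
Example (illustrative only): an exception trap of the kind described above can be sketched as a wrapper that checks each result against its range of acceptability and raises an exception when the result falls outside it. The decorator, exception class, and sample function are hypothetical names chosen for this example.

```python
import functools
import logging


class OutOfRangeResult(Exception):
    """Raised when a result lies outside the range of acceptability."""


def trap_out_of_range(low: float, high: float, logger=None):
    """Decorator that traps results outside the inclusive range [low, high]."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not (low <= result <= high):
                message = (f"{func.__name__} returned {result!r}, "
                           f"outside [{low}, {high}]")
                if logger is not None:
                    logger.error(message)
                raise OutOfRangeResult(message)
            return result
        return wrapper
    return decorator


@trap_out_of_range(0.0, 100.0, logger=logging.getLogger("solution.traps"))
def utilisation_pct(used: float, total: float) -> float:
    """Hypothetical calculation whose result must be a valid percentage."""
    return 100.0 * used / total
```

Raising a dedicated exception type lets calling code decide how to respond, for example by retrying, degrading gracefully, or notifying operations.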

Notifications

[Description: The Notifications section describes how people will be notified when monitoring and exception trapping have detected solution failures. This should include notification of errors and of cases in which user performance expectations have not been met.]
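
Example (illustrative only): notification rules can be sketched as subscribers that each declare the minimum severity they want to hear about. The severity names and the `Notifier` class are hypothetical. The `send` callables stand in for whatever delivery channel (e-mail, pager, console) the plan specifies.

```python
from typing import Callable, List, Tuple

# Hypothetical severity scale, lowest to highest.
SEVERITY_ORDER = ["info", "warning", "critical"]


class Notifier:
    """Routes messages to subscribers according to minimum severity."""

    def __init__(self) -> None:
        self._subscribers: List[Tuple[int, Callable[[str], None]]] = []

    def subscribe(self, min_severity: str,
                  send: Callable[[str], None]) -> None:
        """Register a delivery callable for the given severity and above."""
        if min_severity not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity: {min_severity}")
        self._subscribers.append((SEVERITY_ORDER.index(min_severity), send))

    def notify(self, severity: str, message: str) -> int:
        """Deliver the message; returns the number of subscribers notified."""
        if severity not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity: {severity}")
        level = SEVERITY_ORDER.index(severity)
        delivered = 0
        for min_level, send in self._subscribers:
            if level >= min_level:
                send(f"[{severity.upper()}] {message}")
                delivered += 1
        return delivered
```

Returning the delivery count makes it possible to detect the case where a failure was raised but nobody was subscribed to receive it.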

Diagnosing Failures (Problems)

[Description: The Diagnosing Failures section describes the processes, methods, and tools teams will employ to diagnose the problems detected in solutions by monitoring and exception trapping.]

Resolving Failures (Known Errors)

[Description: The Resolving Failures section describes the procedures teams will use to correct the errors detected and diagnosed in solutions and to improve solutions that do not meet user expectations.]

Recovering from Failures

[Description: The Recovering from Failures section defines how the solution will be recovered from failure or references the Backup and Recovery Plan.]

Tools

[Description: The Tools section lists and describes the tools teams can employ to detect, diagnose, and correct errors and to improve a solution’s performance. The table below is an example of this.]

Tool	Description
Microsoft Systems Management Server	Integrated inventory, distribution, installation, and remote troubleshooting tools for centralized management of hardware and software. Microsoft Systems Management Server can be used in medium to large multi-site Windows-based environments to reduce the cost of change and configuration management of Windows-based desktop and server computers. Details available at http://www.microsoft.com/backoffice
Microsoft Performance Monitor (Perfmon)	Windows NT administrative tool that enables viewing behavior of processors, memory, cache, threads, and process objects. Each object has an associated set of counters that provide information about device usage, queue length, delays, and other data that measures throughput and internal congestion. Details available at http://www.microsoft.com/ntserver
Microsoft Windows NT Resource Kit, version 3.51; Microsoft Windows NT Server 4.0 Resource Kit; Microsoft Windows NT Workstation 4.0 Resource Kit	Microsoft Press® kits contain both technical documentation and a CD-ROM with useful utilities and accessory programs to help install, configure, and troubleshoot Microsoft Windows NT. Details available at http://www.mspress.microsoft.com
Tivoli Management Software	Family of products with a single management framework integrating disparate IBM systems management applications. Details available at http://www.tivoli.com
Microsoft HTTPMon	Multithreaded Windows NT service that monitors web server performance by measuring how quickly the web server responds to requests from client browsers. Details available at http://www.microsoft.com/ntserver
HP OpenView	Hewlett Packard family of products designed to manage distributed computer systems and networks from computers running Windows or UNIX operating systems. Details available at http://www.hp.com
NetManage	Single-source PC-to-host connectivity solutions from NetManage. The company develops integrated applications, servers, and development tools for Microsoft Windows, Windows® 95 and Windows NT operating systems. Details available at http://www.netmanage.com
PerlEx	Utility for Web servers running under Windows NT that improves the performance of Perl scripts. Details available at http://www.activestate.com
SeNTry	An SNMP-based monitoring tool. Details available at http://www.missioncritical.com
