Monday, November 1, 2010

Ways to monitor a TXSeries (CICS) system

CICS TS for the z environment has a rich set of monitoring tools available from both IBM and third-party vendors. Though TXSeries (CICS) is not blessed with similar offerings, it has a rich set of monitoring facilities that conform to the CICS standard. These facilities mainly help you get granular information on the activities occurring within the CICS region.

One of the first questions that comes to mind when monitoring a system is where and how to gather information that indicates the various events occurring within the TXSeries system. The following are the primary facilities in a TXSeries system from which you can gather information:

  • CICS STATISTICS API
    • This facility lets you query the usage of various CICS region resources through the COLLECT STATISTICS API. It gives you aggregate information on the CICS resources consumed within the region. For example, you can inquire on peak transactions, storage usage, terminals installed, etc.
  • CICS Monitoring Facility (CMF)
    • This facility allows you to gather information pertaining to a task running within the CICS region. The CICS system has various pre-defined points called EMPs (Event Monitoring Points) that collect information from the runtime system. For example, you can gather information such as the start/end time of the task, the time spent waiting for I/O or network operations, etc. You can enable or disable the specific EMP fields that interest you by configuring MD (Monitoring Definition) attributes.
  • INQUIRE API
    • This facility provides the ability to query information about CICS resources programmatically.
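
To make the difference between task-level monitoring data and aggregate statistics a bit more concrete, here is a small Python sketch. It does not call any CICS API; the TaskRecord fields and the summarize function are hypothetical names chosen for illustration and are not the actual CMF record layout. It only shows how per-task records (the kind of detail CMF gives you) can be rolled up into aggregate figures per transaction.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """One per-task monitoring record (hypothetical fields, not the CMF layout)."""
    transaction_id: str
    start: float    # task start time, in seconds
    end: float      # task end time, in seconds
    io_wait: float  # seconds spent waiting for I/O


def summarize(records):
    """Roll per-task records up into per-transaction aggregates."""
    summary = {}
    for rec in records:
        entry = summary.setdefault(
            rec.transaction_id,
            {"count": 0, "elapsed": 0.0, "io_wait": 0.0},
        )
        entry["count"] += 1
        entry["elapsed"] += rec.end - rec.start
        entry["io_wait"] += rec.io_wait
    return summary


if __name__ == "__main__":
    sample = [
        TaskRecord("PAY1", 0.0, 2.0, 0.5),
        TaskRecord("PAY1", 1.0, 2.0, 0.25),
        TaskRecord("INQ1", 0.0, 0.5, 0.0),
    ]
    for tran, totals in summarize(sample).items():
        print(tran, totals)
```

In a real setup the records would come from whatever your monitoring configuration writes out; the roll-up step itself would stay much the same.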

Now, the other part of the problem is how to use these facilities to monitor the system. There are several approaches you can adopt; some of them are listed below:

  • User applications or scripts
    • This is probably the most commonly seen approach for monitoring TXSeries CICS regions. For instance, you can watch the TXSeries logs frequently and send an SMS to alert administrators of any critical failures, or take pre-defined actions such as restarting the CICS region for certain abends/failures, archiving files, etc. This approach, however, is not useful for gathering information from within the CICS runtime system, as the scripts do not have access to such information.
  • User defined transactions 
    • This is probably the best approach available for collecting CICS region resource information. It is done by having a long-running transaction (often called a daemon) collect information at regular intervals, using the facilities mentioned above. The collected information can then be presented to administrators or users through the web, scripts, or any other application, via simple mechanisms such as sockets or files. Apart from collecting information, the long-running transaction can also detect transactions that are stuck or consuming too much CPU or memory, and raise an alarm accordingly.
  • Tivoli Agents
    • Enterprises often monitor more than one system at a time, and in such cases it definitely makes sense to monitor them through Tivoli, giving administrators a single window for monitoring their systems. The implementation is much like the approaches mentioned above, except that it is driven by agents written for Tivoli.
  • Third-party tools
    • I came across a nice and simple tool called EcStat2, a GUI for monitoring a TXSeries CICS system. It is a Windows-based application that collects information through the CICS STATISTICS and INQUIRE APIs. The tool connects to the CICS regions using CTG/CUC (ECI).
  • CICS supplied transactions
    • For quick monitoring of the system, you can use the supplied transactions CSTD and CEMT. These differ from the approaches above in that they have to be run manually.
  • Monitoring through TXSeries Web Administration Console (>= TXSeries for Multiplatforms V6.2)
    • Users can monitor resources such as System, Transaction and Program in a web environment. They can specify the fields to be monitored, do off-line analysis, and archive the monitoring data.
    • The advantage of this approach is that you do not need to connect to the CICS region directly; you can monitor it from any machine via a browser.
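
As an illustration of the first approach in the list above (user applications or scripts), here is a minimal Python sketch of a log watcher. Everything specific in it is an assumption: the log line shape matched by ABEND_RE, the codes in POLICY, the action names and the RESTART_THRESHOLD value are made up for the example, so adjust them to whatever your region's logs and your operational policy actually look like. The script only decides what to do; sending the SMS or restarting the region is left to your own tooling.

```python
import re
from collections import Counter

# Assumed message shape: the word ABEND followed by a 4-character code.
# Real TXSeries log messages may look different; adapt the pattern.
ABEND_RE = re.compile(r"\bABEND\s+(?P<code>[A-Z0-9]{4})\b")

# Hypothetical policy: which action to take for which abend code.
POLICY = {"ASRA": "notify", "AICA": "notify"}
DEFAULT_ACTION = "log-only"
RESTART_THRESHOLD = 3  # assumed: this many repeats of one code triggers a restart


def scan_log(lines):
    """Count the abend codes found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ABEND_RE.search(line)
        if match:
            counts[match.group("code")] += 1
    return counts


def decide_actions(counts):
    """Map each observed abend code to an action name."""
    actions = {}
    for code, seen in counts.items():
        if seen >= RESTART_THRESHOLD:
            actions[code] = "restart-region"
        else:
            actions[code] = POLICY.get(code, DEFAULT_ACTION)
    return actions


if __name__ == "__main__":
    sample = [
        "2010-11-01 10:00:01 Transaction PAY1 ABEND ASRA",
        "2010-11-01 10:00:05 Region heartbeat",
        "2010-11-01 10:01:12 Transaction INQ1 ABEND AICA",
    ]
    print(decide_actions(scan_log(sample)))
```

You would typically run something like this from cron against the most recent portion of the log, and hand the resulting actions to your alerting or restart scripts.
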

I have just summarized here the various approaches that can be taken to monitor a TXSeries system. If you need information on the monitoring facilities available and how to use them, you can refer to this white paper: http://www-01.ibm.com/support/docview.wss?uid=pub1gc34711500

In summary, TXSeries provides a rich interface and a wealth of information to aid monitoring; however, it relies heavily on users/administrators to exploit the provided interfaces, and not many third-party tools are available to buy! A reference implementation is available for Tivoli agents, which can be enhanced further (the source is available for download) to suit your requirements. A good start would be to use the TXSeries web-based administration console for monitoring.

So happy monitoring, and do drop in your comments and thoughts here!