Health Check Framework

Proactively identifying performance issues with the HCF plugin

Posted on March 23, 2017 Updated on April 3, 2017

In the last post, we talked about the Health Check Framework (HCF) and its benefits. Since I’ve been using the plugin for over a month I was able to collect useful performance information and identify some potential performance issues even before they occur. In this post, you will learn how to proactively monitor your system performance and prevent potential performance issues from happening.

First, you will need to install the Health Check Framework plugin. The installation process is quite straightforward: all you need is to go to your IBM app store on your QRadar environment, search for “Health Check Framework” and install it following the steps on the screen. With the plugin installed, you can start by browsing the plugin interface and extracting a report about your system performance. In this report, you will see a lot of details about your system, such as CPU Usage, Disk Usage, EPS, FPM, heavy reports, current users, etc.

Report generated by the Health Check Framework (HFC) – A holistic view of your environment, note the number of tabs on the report

In my case, we are planning to expand the scope of servers monitored by QRadar, so I wanted to understand if we would need any hardware upgrades. For testing purposes, I disabled the log collection of 30 Windows servers that were currently being monitored and I noticed that the RAM memory usage reduced by around 5% (see images below). Obviously, this number will vary according to the server usage, but this test gave me a rough estimation that for each 30 servers my RAM memory usage will increase around 5%. So with the help of the HCF plugin, I was able to identify hardware upgrades to accommodate the monitoring of new servers, avoiding system outages due to lack of resources during the scope expansion.

Before disabling Windows test servers – Memory used ~40GB

After disabling the Windows servers – Memory used: ~38GB

Even if you’re not planning to expand your scope, you can use historical performance data to proactively identify issues. For example, let’s say, your QRadar monitors a new e-commerce website. The number of logs you get depends on the traffic your website has. With the historical data, you will be able to identify a performance trend: as the e-commerce website becomes more popular, the EPS increases and the CPU/Memory usage also increases. With this data, you will be able to estimate at which point in time you will need a hardware upgrade, avoiding any unexpected system outages due to lack of resources.

Another very interesting data that I found in this report was the “Event Average Payload Size”, which as the name says, tells you the average size of logs received. This can be very useful to identify hard drive requirements when expanding your EPS.

Using the same plugin I was also able to identify heavy reports and rules that were severely impacting the performance of my QRadar environment. After reviewing and fixing the queries of the reports and rules, it was noticed a considerable reduction in the CPU usage.

Chart extracted from the HCF report – Identifying heavy rules

Monitoring the performance of your system puts you in a proactive posture in relation to your environment. Being proactive means that you will not be firefighting issues as before, but instead, monitoring and planning upgrades ahead to avoid issues even before they happen.

This entry was posted in Administration, Apps, Sizing, Troubleshooting, Tunning and tagged Apps, Design, HCF, Health Check Framework, Sizing, Troubleshooting.