Troubleshooting

Proactively identifying performance issues with the HCF plugin


In the last post, we talked about the Health Check Framework (HCF) and its benefits. Since I have been using the plugin for over a month, I have been able to collect useful performance information and identify some potential performance issues before they occur. In this post, you will learn how to proactively monitor your system's performance and prevent potential issues before they happen.

First, you will need to install the Health Check Framework plugin. The installation process is quite straightforward: all you need to do is go to the IBM app store from your QRadar environment, search for "Health Check Framework", and install it following the on-screen steps. With the plugin installed, you can start by browsing the plugin interface and extracting a report about your system's performance. In this report, you will see many details about your system, such as CPU usage, disk usage, EPS, FPM, heavy reports, current users, etc.

Report generated by the Health Check Framework (HCF) – A holistic view of your environment; note the number of tabs on the report

 

In my case, we are planning to expand the scope of servers monitored by QRadar, so I wanted to understand whether we would need any hardware upgrades. For testing purposes, I disabled log collection for 30 Windows servers that were being monitored and noticed that RAM usage dropped by around 5% (see the images below). Obviously, this number will vary according to server usage, but the test gave me a rough estimate: for every 30 servers added, RAM usage will increase by around 5%. So, with the help of the HCF plugin, I was able to plan the hardware upgrades needed to accommodate the monitoring of new servers, avoiding system outages due to lack of resources during the scope expansion.

Before disabling the Windows test servers – Memory used: ~40GB

 

After disabling the Windows servers – Memory used: ~38GB
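Putting rough numbers on that 5% figure (using only the ~40GB and ~38GB readings above, so treat this as a back-of-the-envelope estimate rather than a sizing guide):

      ~40GB - ~38GB ≈ 2GB freed by 30 servers, or roughly 65-70MB of RAM per server
      adding 90 similar servers ≈ 90 x 70MB ≈ 6GB of extra RAM to plan for

Your per-server footprint will differ, so repeat the measurement with your own log sources before committing to an upgrade.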

 

Even if you're not planning to expand your scope, you can use historical performance data to proactively identify issues. For example, let's say your QRadar monitors a new e-commerce website. The number of logs you receive depends on the traffic the website gets. With the historical data, you will be able to identify a performance trend: as the e-commerce site becomes more popular, the EPS increases and so does CPU/memory usage. With this data, you can estimate at what point in time you will need a hardware upgrade, avoiding unexpected system outages due to lack of resources.
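As an illustration of that kind of estimate (every number here is made up purely for the example): if the HCF reports show average CPU usage climbing from 40% to 55% over the last six months, that is roughly 2.5 percentage points per month; at that pace you would cross an 80% planning threshold in about (80 - 55) / 2.5 = 10 months, which tells you how much time you have to budget and schedule the upgrade.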

Another very interesting metric I found in this report was the "Event Average Payload Size", which, as the name says, tells you the average size of the logs received. This can be very useful for estimating hard drive requirements when expanding your EPS.
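As a back-of-the-envelope example (the 300-byte payload, 5,000 EPS and 30-day retention below are made-up figures, and the math ignores QRadar's compression and indexing overhead, so use it only for a first rough sizing):

      5,000 EPS x 300 bytes ≈ 1.5MB/s of raw payload
      1.5MB/s x 86,400 s/day ≈ 130GB/day
      130GB/day x 30 days of retention ≈ 3.9TB of raw payload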

Using the same plugin, I was also able to identify heavy reports and rules that were severely impacting the performance of my QRadar environment. After reviewing and fixing the queries behind those reports and rules, we noticed a considerable reduction in CPU usage.

Chart extracted from the HCF report – Identifying heavy rules

 

Monitoring the performance of your system puts you in a proactive posture towards your environment. Being proactive means you will no longer be firefighting issues as before; instead, you will be monitoring and planning upgrades ahead of time to avoid issues before they happen.


Checking if the GUI is working on the IMM


Hi guys, sometimes you're not able to log in to the IMM web interface to check information about an event processor or collector. If the GUI is down, you can go through the following easy steps:

  • SSH to the IMM;
  • Check if the SSL (web) server is running:
      system> ssl
    • If it is running, try resetting the IMM;
    • If the server is not running, just turn it on:
      system> ssl -se on
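For reference, a minimal sketch of what that session might look like (the prompt, output and default credentials vary by IMM firmware level; USERID is only the common factory default, and resetsp is the CLI command I use to restart the management module itself):

      ssh USERID@<IMM-IP-ADDRESS>
      system> ssl                (shows whether the SSL/HTTPS server is enabled)
      system> ssl -se on         (enables the SSL server if it is off)
      system> resetsp            (restarts the IMM if the GUI is still unresponsive)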

IMM Troubleshooting

Running commands across the environment


Daily maintenance across a small environment can be an easy job, but when our environment grows to the point where we have several appliances, it can become a tough one. For example, if we need to monitor disk space in an environment with just one appliance, we can simply connect to the QRadar over SSH and run a Linux command such as 'df -h', but in a large environment with several appliances this practice would take a lot of time.

In a QRadar distributed environment, the console acts as a central management point for all the other appliances. In our disk-monitoring example, wouldn't it be easier if we could run a single command on the console to get information about the whole environment? That's exactly what the script 'all_servers.sh' does. The script is located at:
/opt/qradar/support/all_servers.sh

To run the command, you can use the following syntax:
[root@MY_RADAR]# /opt/qradar/support/all_servers.sh 'COMMAND'
(where COMMAND is what you want to run on the appliances)

In our example of monitoring disk space, we could use:
[root@MY_RADAR]# /opt/qradar/support/all_servers.sh 'df -h' > /root/drive_space.txt
This writes the output of the command from all the servers to the following file: /root/drive_space.txt
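A few other examples along the same lines (these are just standard Linux commands that I would expect to be available on any QRadar appliance; adapt them to whatever you need to check):

[root@MY_RADAR]# /opt/qradar/support/all_servers.sh 'uptime'
[root@MY_RADAR]# /opt/qradar/support/all_servers.sh 'free -m'
[root@MY_RADAR]# /opt/qradar/support/all_servers.sh 'ip addr show'

The first shows the load average of every appliance, the second the memory usage, and the third the network configuration.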

The script can be used for several different purposes: monitoring disk space, monitoring CPU, viewing network configurations, checking logs, etc. Can you imagine how it could help in your environment? Have any good ideas for integrating it with your monitoring systems? Let us know in the comments!

 

— This post was suggested and written by our new collaborator, Tomasz Stankiewicz.

Quick Log Collection Troubleshooting


We have already discussed how to configure log sources and how to configure QRadar to receive logs. Let's say everything is ready, you are in front of the customer, and the logs don't show up: do you know how to troubleshoot it? Here are some quick troubleshooting tips that can help you in those situations (a consolidated command cheat sheet follows the list):

  • Verify the connectivity between the log source and the QRadar collector:
    • You can simply ping from the log source to the collector;
    • By default, QRadar's iptables rules drop pings, so you will need to stop the iptables service on the QRadar collector. You can do this by opening a terminal (or SSH session) on the QRadar and using the following command:
      service iptables stop
    • If you cannot even ping the QRadar server from your log source, the issue is in the network;
    • Don't forget to start iptables again after testing; just use the following command:
      service iptables start
  • Verify the firewalls between the log source and the QRadar:
    • The firewalls should allow the ports used for collection. For example, for collecting syslog, the firewalls should allow port 514/UDP;
    • If you have no access to the firewall, a simple way to test it is to use the telnet command from the log source to the QRadar:  telnet [IP] [PORT]
      Example: telnet 10.1.1.1 514
      (Keep in mind that telnet only tests TCP ports, so for UDP sources such as default syslog on 514/UDP, rely on the tcpdump check below.)
    • If the telnet doesn't work, a firewall is dropping packets on that port and you should ask for a firewall rule allowing the traffic;
  • Verify the packets arriving at the QRadar collector:
    • You can use the tcpdump command on the QRadar to verify whether the packets are being received;
    • Syntax: tcpdump -i [INTERFACE] src host [IP-LOGSOURCE] and port [PORT]
    • Example: tcpdump -i eth0 src host 10.2.2.2 and port 514
    • If nothing shows up, either some network issue is dropping the packets or the log source is not properly configured;
  • Verify the QRadar logs:
    • The QRadar logs are stored in the following folder: /var/log/
    • The main log is named qradar.log
    • You can simply access and monitor the log using the following command: tail -f /var/log/qradar.log
    • You can check the current EPS using the following command:
      tail -f /var/log/qradar.log | grep 'Events per Second'
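To tie the list together, here is the same sequence as a quick cheat sheet (10.1.1.1, 10.2.2.2, eth0 and 514 are just the placeholder values from the examples above; swap in your own log source IP, collector IP, interface and port):

  On the log source, pointing at the collector:
      ping 10.1.1.1
      telnet 10.1.1.1 514

  On the QRadar collector:
      service iptables stop          (only while testing ping; re-enable with service iptables start)
      tcpdump -i eth0 src host 10.2.2.2 and port 514
      tail -f /var/log/qradar.log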

I hope this post helps you troubleshoot log collection problems on QRadar. If you have any questions or suggestions, please leave us a comment!