In the last post we discussed how to calculate the EPS of our environment. Now let's discuss how to calculate the required storage size, since with the EPS in hand it becomes much easier to size our database. In this scenario we will consider only log storage, not network flow storage.
First of all, we need to understand how the data is stored on QRadar. Basically, you have 3 types of data:
- Online live data: All the events can be accessed with no latency. In this case the data is not compacted;
- Online compacted data: All the events can be accessed, but with a small latency because the data is compacted. The average compression rate is 10:1;
- Offline data: The events cannot be accessed instantly because the data is on an external backup server. To access this data, the user must import the backup into QRadar (or into a QRadar virtual machine) for analysis.
Now that we understand what each type of data represents, we can start to calculate the storage based on the requirements of the project. In the sizing we only use the online data; the offline backup is not considered (since it is an external, independent server).
To keep the explanation simple, let's use the following requirements:
[Online Live Data: 7 days; Online Compacted: 180 days; EPS: 2500]
Steps to calculate:
- Calculate how much data is generated each second: multiply the EPS by 300 bytes (the average size of a log event):
In the example: 2500 x 300 = 750000 bytes/s = 732.4 KB/s (1 KB = 1024 bytes)
- With the data per second, we can calculate how much data we have in one day (1 day = 86400 seconds):
In the example: 750000 bytes/s * 86400 s = 64800000000 bytes/day = 63281250 KB/day = 61798.1 MB/day = 60.35 GB/day, rounded up to 60.4 GB/day
- Now that we know how much data is generated in one day, let's calculate the Online Live Data size (non-compacted):
In the example: 60.4 GB/day * 7 days = 422.8 GB
- Now, let's calculate the Online Compacted Data. The first 7 of the 180 days are already covered by the online live data, so only the remaining days are compacted. Note that the average compression rate is 10:1:
In the example: 180 days - 7 days (online live data) = 173 days
173 days * 60.4 GB/day = 10449.2 GB
10449.2 GB * 0.1 (compression rate) = 1044.92 GB
- We now have the size of the online live and the online compacted data. We just need to add the two to get the final size:
In the example: 422.8 GB + 1044.92 GB = 1467.72 GB = 1.43 TB
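The steps above can be sketched as a small Python script. This is only an illustration of the arithmetic in this post, not a QRadar tool: the function and parameter names are hypothetical, and the 300-byte average event size and 10:1 compression ratio are the assumptions stated above.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 1024 ** 3  # the post uses 1024-based units


def daily_volume_gb(eps, avg_event_bytes=300):
    """Uncompressed log volume generated per day, in GB."""
    return eps * avg_event_bytes * SECONDS_PER_DAY / BYTES_PER_GB


def online_storage_gb(eps, live_days, total_days,
                      avg_event_bytes=300, compression_ratio=10):
    """Estimate online storage: live (uncompressed) plus compacted data.

    total_days is the full online retention; the first live_days of it
    are stored uncompressed and the rest at the assumed compression ratio.
    """
    per_day = daily_volume_gb(eps, avg_event_bytes)
    live = per_day * live_days
    compacted = per_day * (total_days - live_days) / compression_ratio
    return {
        "per_day": per_day,
        "live": live,
        "compacted": compacted,
        "total": live + compacted,
    }


# Requirements from the example: 7 days live, 180 days online, 2500 EPS.
sizing = online_storage_gb(eps=2500, live_days=7, total_days=180)
for name, value in sizing.items():
    print(f"{name}: {value:.1f} GB")
```

Because the script does not round the intermediate values, its total comes out at roughly 1466.5 GB, slightly below the 1467.72 GB obtained by hand with the rounded 60.4 GB/day figure.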
Following these basic steps gives a good approximation of the necessary storage size. A good practice is to provision storage 20% larger than the estimate.
Do you have any other experience with storage sizing? Let us know in the comments!
UPDATE: According to one of our readers (see comments), starting with version 7.2.7 the stored data is always compressed. So, if you are sizing your environment for the latest QRadar version, you should use only the "compressed data" calculations.
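For that always-compressed case, a minimal sketch of the calculation could look like the following. It assumes the same 300-byte average event size and 10:1 compression ratio used earlier in the post, applies them to the whole retention period, and adds the suggested 20% headroom. The function name and parameters are hypothetical, and the real compression ratio in your environment may differ.

```python
def compressed_storage_gb(eps, retention_days, avg_event_bytes=300,
                          compression_ratio=10, headroom=0.20):
    """Estimate storage when every stored event is compressed.

    Returns (estimate_gb, estimate_with_headroom_gb), 1024-based GB.
    """
    bytes_per_day = eps * avg_event_bytes * 86_400  # 86400 seconds per day
    compressed_gb = (bytes_per_day * retention_days
                     / compression_ratio / 1024 ** 3)
    return compressed_gb, compressed_gb * (1 + headroom)


# Same example requirements: 2500 EPS kept online for 180 days.
estimate, with_headroom = compressed_storage_gb(eps=2500, retention_days=180)
print(f"estimate: {estimate:.1f} GB, with headroom: {with_headroom:.1f} GB")
```

With the example requirements this gives an estimate of a little under 1.1 TB, or about 1.3 TB once the headroom is included.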
4 thoughts on “Storage Sizing”
November 5, 2014 at 10:20 pm
Hi, what do you mean by online live data and online compacted data?
What is the directory to store compacted data? Is it in /store/backup?
June 16, 2016 at 10:14 am
Just FYI, the difference between online live data and online compressed data is gone with the latest QRadar release (7.2.7). Now, data is always compressed on disk and all decompression occurs in memory with no rewrite to disk (i.e. always-on disk compression)
December 4, 2016 at 11:27 pm
Thanks for the article it’s very useful
If I also collect flows, how does it affect my calculations?
Does the number of saved dashboard or graph history affect this number?
December 5, 2016 at 2:19 pm
Flows are a little trickier to calculate, because it depends on the collection method (VFlow, NetFlow, QFlow, Raw Packets, etc). I will write a full post explaining how to calculate it.
The number of saved dashboards and graph history does affect the storage number, but not significantly. At the client I'm currently working with, the dashboards and saved searches take up less than 5% of the total disk.