The adaptive readahead patch benchmark

One of the more interesting patches for the Linux kernel lately has been Wu Fengguang's adaptive readahead patchset, currently at version 12. Talking about its performance benefits, Wu says: "besides file servers and desktops, it is recently found to benefit postgresql databases a lot."

So I decided to do a simple benchmark to see what difference adaptive readahead would make in my case. The idea was to run a very simple database query (a random select) against a PostgreSQL database and see how it performs over time (while the memory is being primed with data from disk).

Test methodology

As a test kernel I used 2.6.17-rc5, patched with Wu's latest patchset. Because I used an unsupported release candidate kernel, I did get some conflicts while applying the patch, but they were quite easy to resolve. Before every run I rebooted the computer to make sure the file cache was empty, so we have a fresh start every time. Immediately after reboot I fired up the attached simple Perl script. Its only task was to query the database in a random fashion. The test database was a table with data on about a million and a half phone subscribers, queried by number. Every ten seconds the script would print the average number of queries per second achieved over the last ten-second period.

The idea was to monitor this over time to see how fast the kernel is able to pull the database data into main memory, a test on which even a simple readahead algorithm should do well. There was enough physical memory to cache the whole table and its indexes, so at the end of the benchmark, when full speed was reached, there was no disk I/O: all data was cached in memory and we were in fact measuring CPU speed (a Pentium M 1.5GHz scaled down to 600MHz, if you need to know).
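The measurement loop described above can be sketched roughly as follows. This is not the attached Perl script, only a minimal Python illustration of the same idea; the function name `measure_throughput`, its parameters, and the `run_query` callback (which would wrap the real database lookup) are all hypothetical.

```python
import random
import time


def measure_throughput(run_query, keys, intervals=3, interval_s=10.0,
                       clock=time.monotonic, rng=None):
    """Issue random lookups and return the average queries/second per interval.

    run_query is a hypothetical callback that performs one lookup for a key
    (for example, a SELECT by phone number). clock and rng are injectable so
    the loop can be exercised without a database or real waiting.
    """
    rng = rng or random.Random()
    rates = []
    for _ in range(intervals):
        start = clock()
        count = 0
        # Keep querying random keys until the reporting interval has passed.
        while clock() - start < interval_s:
            run_query(rng.choice(keys))
            count += 1
        elapsed = clock() - start
        rates.append(count / elapsed)
    return rates
```

Printing each rate as it is computed, rather than collecting a list, would reproduce the "one line every ten seconds" output described in the text. On a cold cache the early rates should be low and then climb until the data is fully cached.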

I must admit that I didn't pick this test by chance. I had noticed before that PostgreSQL was very slow in this kind of test, at least compared to other databases. It would always spend much more time pulling data into memory, due to the now well-known fact that PostgreSQL doesn't implement any readahead algorithm of its own, but instead relies on the kernel to do the magic.


In the picture below you can see the difference between the run on the default (unpatched) kernel (red line) and the run with the adaptive readahead patchset applied (green line).


I think the graph speaks for itself. It took around 6 minutes to prime the memory when run on the standard kernel, while on the other hand, when adaptive readahead was compiled in, the database was fully cached after only 2 minutes. That amounts to a 3x speedup.


This test was so simple that we shouldn't draw any far-reaching conclusions. Yes, Wu has done a good job, and in cases like this it will surely help to get the data from disk cached in memory faster. The hard part is to test the myriad of other setups, and especially the behavior of the patch when memory resources are scarce. No doubt Wu is busy testing at the moment, and I hope other people will join in and report what they have found.



With kernel 2.6.16 and newer you can do "sync; echo 3 > /proc/sys/vm/drop_caches" to clear all caches.
OTOH, nice benchmark.

Wow, that's one nice debugging tunable, didn't know about that, probably because it's so new. Thanks for the tip!

Of course, if you're in production, there's not much use for it, but if you're debugging or benchmarking stuff it's God given.
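For anyone scripting benchmark runs, the drop_caches tip above could be wrapped in a small helper. This is only a sketch: the function name `drop_caches` and its parameters are made up for illustration, it needs root to write to the real /proc file, and the `path` argument exists just so the helper can be tried against an ordinary file first.

```python
import os


def drop_caches(level=3, path="/proc/sys/vm/drop_caches"):
    """Flush dirty pages, then ask the kernel to drop clean caches.

    level 1 drops the page cache, 2 drops dentries and inodes, 3 drops both.
    Writing to the real path requires root privileges.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    # Equivalent of running `sync` first, so dirty pages are written out
    # and become droppable.
    os.sync()
    payload = f"{level}\n"
    with open(path, "w") as f:
        f.write(payload)
    return payload
```

Calling `drop_caches()` as root between runs would replace the "sync; echo 3" one-liner.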

thanks, nice test!

You mention that the system has enough memory to cache the entire database, so I'm wondering if the main benefit you see in this case is due to Wu's patch increasing the maximum amount of readahead the system does by default.

Can you re-run the stock kernel test, except 'echo 2048 > /sys/block/[dev]/queue/read_ahead_kb' first? I think this will increase the performance noticeably, showing that the adaptive readahead's benefit lies in other areas, particularly systems with memory pressure and multiple I/O-bound processes. In other words, its value is that it can improve this workload's performance (by increasing readahead) without damaging other workloads where a high readahead value would be suboptimal on the stock kernel.

Also, it is sufficient to unmount the device you are testing between tests, whereas the drop_caches trick is not necessarily enough for all benchmarks. From the comment in the code:

* invalidate_mapping_pages() will not block on IO activity. It will not
* invalidate pages which are dirty, locked, under writeback or mapped into
* pagetables.


I would love to repeat the test with the change you suggested, but for some reason my system doesn't accept any value bigger than the default 128KB for read_ahead_kb. I don't know why.

Now, I'm pretty sure that a bigger default readahead would also help in this test; the question is just whether it would be good to have it set system-wide. While I don't know for sure, I pretty much expect the adaptive readahead to make much better decisions about how much data to read ahead from disk in which circumstances.

But all in all, I don't disagree with you, this test is too simple and I knew it would show the adaptive readahead in a positive way. Maybe, if I find some time, I'll try to come up with a much more elaborate test that would even prove something. :)

About drop_caches, yes, I know it won't solve everything but it's still damn handy, and the documentation clearly says that it's important to run sync before it to flush out dirty pages to disk so they can be freed too.

Then again, if I want to be sure and I care about what I'm testing I always reboot. Otherwise it's quite hard to know what you're testing (disk or memory speed). Unmounting is a solution that is somewhere in between, but in my case I would have to unmount /var partition, so it was less pain for me to reboot the damn thing and start over. :)

Thanks for your comment!

Nice work.

Here are two more tips:
- run 'blockdev --setra 2048 /dev/sda' before each test, to ensure fairness and avoid the (rather rigid) size limit imposed by the sysfs interface;
- for this test case (randomly populating the cache, until some whole files are pulled into the large enough memory), increasing /proc/sys/vm/readahead_hit_rate may help performance. 1 or 8 will be good starting values to try, and larger values might not help as much.

Wu Fengguang
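A note on units, since the two interfaces mentioned in this thread differ: blockdev --setra counts 512-byte sectors, while read_ahead_kb in sysfs is in kilobytes. The conversion is easy to get wrong, so here is a small sketch; the helper names are made up for illustration.

```python
SECTOR_BYTES = 512  # blockdev --setra counts 512-byte sectors


def setra_sectors_to_kb(sectors):
    """Convert a blockdev --setra value to the equivalent read_ahead_kb."""
    return sectors * SECTOR_BYTES // 1024


def kb_to_setra_sectors(kb):
    """Convert a read_ahead_kb value to the equivalent --setra sector count."""
    return kb * 1024 // SECTOR_BYTES
```

So 'blockdev --setra 2048' corresponds to a read_ahead_kb of 1024, and the default 128KB mentioned earlier corresponds to 256 sectors.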

Hey Wu! Thanks for your comment.

I have rerun the benchmark following your guidelines, and the results are as follows (no graph this time):

default RA algo (ra=2048): 350 seconds
adaptive RA algo (ra=2048, r_h_r=1): 270 seconds
adaptive RA algo (ra=2048, r_h_r=8): 85 seconds

All tests were run with read_ahead_kb of 1024 (set with the above command; the ra=2048 in the list is the --setra value, in 512-byte sectors). The first one was run on -rc5, the last two on the rc5-mm3 kernel, which includes the adaptive readahead patch. As can be seen, something changed in the setup (maybe just the default settings?), because the second run, with readahead_hit_rate at its default value of 1, is now only marginally faster than the stock RA algorithm. But when that parameter was raised to 8, the test script slurped the database data into memory very fast (more than 4 times faster than the default kernel algorithm, very impressive).
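For the record, the speedups implied by those three timings can be checked with a few lines. The dictionary keys and the `speedups` helper are just labels chosen for this sketch; the numbers are the ones listed above.

```python
# Time to fully cache the table, in seconds, as listed above.
results = {
    "default RA (ra=2048)": 350,
    "adaptive RA (ra=2048, r_h_r=1)": 270,
    "adaptive RA (ra=2048, r_h_r=8)": 85,
}


def speedups(times, baseline):
    """Return how many times faster each run is than the baseline run."""
    base = times[baseline]
    return {name: base / seconds for name, seconds in times.items()}


ratios = speedups(results, "default RA (ra=2048)")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

That gives roughly 1.30x for readahead_hit_rate=1 and 4.12x for readahead_hit_rate=8, relative to the stock algorithm.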

I have built an Adaptive Readahead kernel package for Debian users, strongly based on and feature compatible with the kernel which ships with Debian Etch (other than it has ARA turned on!). Looking for feedback from anyone who has tried this in production.