Sunday, August 12, 2012

Speedup file reading on linux

(Actually, this is pretty old post from my previos address.)

It's about how to speedup reading a pile of files from disk drives. Evident, such operation requires not so rarely - parsing couple of files, copying them over fast ethernet connection as like as moving them from one partition to another.

There are several things to speedup and I'm sure you know about IO-buffer size dependence, but one of main advantage achieves by "prereading" of data and doing this in right order, the things you usually can't control from userspace. Since the main obstacle while reading files is non-linear moving of disk drive heads, we should achieve as native physical order as possible.

This native order may be retrieved by using ioctl(FIBMAP) on opened file descriptor, but there are some limits: third argument of `ioctl' call presents pointer to integer - logical block being translated to physical on output, so obviously number of physical block able to be mapped is not very large. It may hit the limit on XFS and other huge FSes. There is also a big disadvantage of FIBMAP - it requires a superuser privilegies (don't know why). Instead of using old ioctl, new linux kernel provides an another one: FS_IOC_FIEMAP. This variant is much more flexible, universal and limit-safe. It also requires no superuser privilegies. This call provides you viewing of file as physical extents (even for filesystems allocating data by bitmaps, see flags). You can find much information in kernel documentation.

Here is the sample of how to retrieve first physical block by methods mentioned above:

#include <linux/fs.h>
#include <linux/fiemap.h>

uint64_t
get_physblock(const char *f)
{
    int fd = open(f, O_RDONLY);
    uint64_t block = ~0ULL;
    if (fd >= 0) {
#ifdef FS_IOC_FIEMAP
         union {
           struct fiemap fm;
           char buf[sizeof(struct fiemap) + sizeof(struct fiemap_extent) * 1];
         };
         memset(&fm, 0, sizeof fm);
         fm.fm_length = 1;       /* one byte mapping from logical offset=0 */
         fm.fm_extent_count = 1; /* buffer for one extent provided */
         if (ioctl(fd, FS_IOC_FIEMAP, &fm) != -1 && fm.fm_mapped_extents == 1)
           block = blk;
#else
         int blk = 0; /* first logical block */
         if (-1 != ioctl(fd, FIBMAP, &blk))
           block = blk;
#endif
        close(fd);
    }

    return block;
}

Test

Clearly right what relying on physical block ID, you may reorder files to read. In addition, there is a readahead(2) syscall which can be used to "preread" file data in VFS cache. It differs from the reading by read(2) since it has no "copy_to_userspace" overhead. Indeed, there is no much to talk about but give a test results. Testing principle is quite simple: read linux sources file by file. At first read them `as is', then apply readahead, and finally - preordering. FS cache between that cases may be purged by

$ echo 2 >/proc/sys/vm/drop_caches
Test results are following:
Method usedTime elapsed (sec)
as-is33
readahead26
reorder + readahead14

It's not difficult to see what applying readahead, especially with preordering, gives much benefit. So, this method may be used for caching - there are several implementation engaged in popular Linux distributions, for example readahead package, used by default in Fedora and Ubuntu.

You can read details about fiemap on LWN page.

I hope this short note will convince you using this tricky calls when FS reading speed is valuable. Surely, this method worthy only for reading much of files, but not for couple of huge files since it's already self-ordered. For myself, I used this approach when developed library which creates dictionaries for classifying phrases - there was over 60000 of input files, and until then read this files consuming the most time.

No comments:

Post a Comment