
Block ram disk fails to read/write with offset

I'm creating a very simple block RAM disk based on sbull.

So far it works fine if I read/write blocks of data using dd, but whenever I try mounting a filesystem on it (and sometimes when creating one) my driver fails.

After long weeks of debugging, I finally found out what is wrong, even though I can't really find a way to solve the problem. Hence my question here :)

Whenever a user space application creates a request to the device WITH AN OFFSET, the driver won't work! Let me show you the source code in order to clarify:

First of all, I'm handling bios directly in a make_request function (rather than a request function that pulls requests off the request_queue):

static void escsi_mk_request(struct request_queue *q, struct bio *bio)
{
        struct block_device *bdev = bio->bi_bdev;
        struct escsi_dev *esd = bdev->bd_disk->private_data;
        int rw;
        struct bio_vec *bvec;
        sector_t sector;
        int i;
        int err = -EIO;

        printk("request received nr. sectors = %lu\n",bio_sectors(bio));

        sector = bio->bi_sector;
        if (bio_end_sector(bio) > get_capacity(bdev->bd_disk))
                goto out;

        if (unlikely(bio->bi_rw & REQ_DISCARD)) {
                err = 0;
                goto out;
        }

        rw = bio_rw(bio);
        if (rw == READA)
            rw = READ;

        bio_for_each_segment(bvec, bio, i) {
                unsigned int len = bvec->bv_len;
                err = esd_do_bvec(esd, bvec->bv_page, len, bvec->bv_offset, rw, sector);
                if (err) {
                        printk("err!\n");
                        break;
                }
                sector += len >> SECTOR_SHIFT;
        }

out:
        bio_endio(bio, err);
}

The esd_do_bvec function:

static int esd_do_bvec(struct escsi_dev *esd, struct page *page,
                        unsigned int len, unsigned int off, int rw,
                        sector_t sector)
{
        void *mem;
        int err = 0;
        unsigned int offset;

        offset = off + sector * 512;

        printk("ESD RW=%d, len=%d, off=%d, offset=%d, sector=%lu\n",rw,len,off,offset,sector);

        mem = kmap_atomic(page);
        if (rw == READ) {
                memcpy(mem,esd->data+offset,len);
        } else {
                memcpy(esd->data+offset,mem,len);
        }
        kunmap_atomic(mem);

        return err;
}

OK, so basically when I read or write data using dd, the variable "off" in esd_do_bvec() is always 0, regardless of where and how many bytes I want to write. The file system obviously always performs I/O in 4KB chunks and will write a full block even when only one byte needs to be replaced.

I am sure that reads and writes are working correctly when there's no offset because I created a file that is the same size as my block RAM disk and dumped the entire file into my device using dd, then got the output of the device (also using dd), and the input and output files are exactly the same. I also wrote the same file into a brd (Linux kernel original block RAM disk driver) and the outputs are the same comparing my device and the brd device.

BUT -- in some specific situations I try to mount or create a new file system on my device and somehow it gets I/O requests with an offset, and at that point my driver fails. I assume that I'm not handling the offset properly. For example, when I try "mount -t ext2 /dev/esda":

linux-xjwl:/home/phil/escsi # mount /dev/esda -t ext2 /mnt/esda1/
mount: wrong fs type, bad option, bad superblock on /dev/esda,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
linux-xjwl:/home/phil/escsi # dmesg|tail -n 10
[ 2239.275901] ESD RW=0, len=4096, off=0, offset=16384, sector=32
[ 2239.275947] request received nr. sectors = 8
[ 2239.275959] ESD RW=0, len=4096, off=0, offset=4096, sector=8
[ 2239.276516] request received nr. sectors = 8
[ 2239.276537] ESD RW=0, len=4096, off=0, offset=2097152, sector=4096
[ 2239.276606] request received nr. sectors = 8
[ 2239.276626] ESD RW=0, len=4096, off=0, offset=28672, sector=56
[ 2239.277535] request received nr. sectors = 2
[ 2239.277535] ESD RW=0, len=1024, off=1024, offset=2048, sector=2
[ 2239.277535] EXT4-fs (esda): VFS: Can't find ext4 filesystem

(ps: the output shows "EXT4" but I am running with "-t ext2")

I have checked the contents of sector 2 on my device and it does contain the ext2 metadata (I ran mkfs.ext2 before trying to mount, of course). So I believe the problem is with the offset. So far I can't really debug my driver, because I haven't been able to come up with a user space request that produces an I/O request with an offset (e.g., if I try writing a single byte to my device, Linux will read the whole block and rewrite it with only one byte changed).
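For reference, the failing request in the log (sector=2, len=1024) matches the ext2 superblock read: the superblock starts at byte 1024 of the device, and its 16-bit little-endian magic 0xEF53 sits 56 bytes into it. Below is a minimal sketch of that check on a raw dump of the device. The helper name has_ext2_magic and the macro names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT2_SB_OFFSET    1024u   /* superblock starts at byte 1024, i.e. sector 2 */
#define EXT2_MAGIC_OFFSET 56u     /* position of s_magic inside the superblock */
#define EXT2_MAGIC        0xEF53u /* stored little-endian on disk */

/*
 * Hypothetical helper: returns 1 if the raw image `img` of `len` bytes
 * carries the ext2 magic number at the expected position, 0 otherwise.
 */
static int has_ext2_magic(const unsigned char *img, size_t len)
{
        size_t pos = EXT2_SB_OFFSET + EXT2_MAGIC_OFFSET;
        uint16_t magic;

        if (len < pos + 2)
                return 0;       /* image too short to hold the magic */

        magic = (uint16_t)(img[pos] | (img[pos + 1] << 8));
        return magic == EXT2_MAGIC;
}
```

If this returns 1 for a dd dump of the device while mount still fails, the data on the device is fine and the copy into the caller's page is the suspect.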

I hope this isn't too simple a question for you.

Thanks in advance, Phil




Please see the answer provided by Peter below.

If you're wondering what the esd_do_bvec() function looks like now, here it comes:

static int esd_do_bvec(struct escsi_dev *esd, char *buf,
                        unsigned int len, int rw, sector_t sector)
{
        int err = 0;
        size_t offset;

        // Please notice that we STILL have an offset to deal with, but
        // this offset comes in sectors and needs to be converted to a
        // byte offset. Using size_t instead of unsigned int keeps the
        // result from being truncated on 64-bit builds.
        offset = (size_t)sector << SECTOR_SHIFT; // or multiply by 512

        //printk("ESD RW=%d, len=%d, off=%d, offset=%d, sector=%lu\n",rw,len,off,offset,sector);

        if (rw == READ) {
                memcpy(buf,esd->data+offset,len);
        } else {
                memcpy(esd->data+offset,buf,len);
        }
        return err;
}

The offset per segment does not refer to an offset from the block device location, but rather an offset into the page. To cause this to be nonzero, you'll probably need to write your own C program that runs read() and write() . Allocate a page-aligned buffer, then read/write to/from different locations in that buffer, and those should show up as offsets in the bvec.
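A minimal sketch of such a test program follows. It rests on an assumption the paragraph above does not state: as far as I know, ordinary buffered reads go through the page cache, so the position inside your own buffer only reaches the driver if the file is opened with O_DIRECT. With O_DIRECT, the buffer position, length and device position are assumed to need 512-byte alignment. The helper name read_into_page_offset is hypothetical.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* fallback: buffered I/O; the buffer offset will
                                   then NOT show up in the driver's bvec */
#endif

/*
 * Hypothetical helper: read `len` bytes from byte position `dev_off` of
 * `path` into a page-aligned buffer, starting `buf_off` bytes into that
 * buffer. Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_into_page_offset(const char *path, size_t buf_off,
                                     size_t len, off_t dev_off)
{
        long page = sysconf(_SC_PAGESIZE);
        void *buf = NULL;
        ssize_t n;
        int fd;

        /* Keep the whole transfer inside a single page. */
        if (page <= 0 || buf_off + len > (size_t)page)
                return -1;

        fd = open(path, O_RDONLY | O_DIRECT);
        if (fd < 0)
                return -1;

        if (posix_memalign(&buf, (size_t)page, (size_t)page) != 0) {
                close(fd);
                return -1;
        }

        /* The destination starts buf_off bytes into the page. */
        n = pread(fd, (char *)buf + buf_off, len, dev_off);

        free(buf);
        close(fd);
        return n;
}
```

Calling read_into_page_offset("/dev/esda", 1024, 1024, 1024) should, under those assumptions, produce a request like the one in your mount log (len=1024, off=1024, sector=2), which gives you a reproducible test case.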

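That said, LWN advises against managing this page offset by hand and recommends the kmap helpers from <linux/bio.h> instead, such as bio_kmap_irq(). They take care of the atomic kmap AND add the segment's bv_offset to the pointer they return. One caveat: bio_kmap_irq(bio, &flags) maps the bio's current segment (bio->bi_idx), which does not advance inside bio_for_each_segment(), so inside the loop use the per-segment variant bvec_kmap_irq() on the loop's bvec (or __bio_kmap_irq(bio, i, &flags)). Source: http://lwn.net/Articles/26404/

Your code will look something like:

    bio_for_each_segment(bvec, bio, i) {
            unsigned int len = bvec->bv_len;
            unsigned long flags;

            /* buf already points bv_offset bytes into the mapped page */
            char *buf = bvec_kmap_irq(bvec, &flags);
            err = esd_do_bvec(esd, buf, len, rw, sector);
            bvec_kunmap_irq(buf, &flags);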

            if (err) {
                    printk("err!\n");
                    break;
            }
            sector += len >> SECTOR_SHIFT;
    }

Of course this changes the signature of esd_do_bvec to accept the memory buffer directly rather than page/offset.
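To see why the original version failed on the request from your mount log (len=1024, off=1024, sector=2), here is a small user space sketch that mimics both copies with plain arrays. It is an illustration, not driver code: sim_read_buggy and sim_read_fixed are made-up names, and `disk` and `page` stand in for esd->data and the mapped page.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SHIFT 9  /* 512-byte sectors */

/* Original logic: the page offset is wrongly added to the device position,
 * and the data lands at the start of the page. */
static void sim_read_buggy(unsigned char *page, const unsigned char *disk,
                           unsigned int len, unsigned int off,
                           unsigned long long sector)
{
        size_t dev_pos = off + ((size_t)sector << SECTOR_SHIFT);

        memcpy(page, disk + dev_pos, len);
}

/* Corrected logic: the sector alone selects the device position, and the
 * page offset selects where in the page the data goes. */
static void sim_read_fixed(unsigned char *page, const unsigned char *disk,
                           unsigned int len, unsigned int off,
                           unsigned long long sector)
{
        size_t dev_pos = (size_t)sector << SECTOR_SHIFT;

        memcpy(page + off, disk + dev_pos, len);
}
```

With len=1024, off=1024 and sector=2, the buggy copy takes device bytes 2048..3071 and puts them at page bytes 0..1023. The caller expected device bytes 1024..2047 at page bytes 1024..2047, which is what the fixed copy delivers.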
