[patch] use cheaper elv_queue_empty when unplug a device

Discussion:

(too old to reply)

Chen, Kenneth W

2005-03-29 03:01:02 UTC

This patch was posted last year and if I remember correctly, Jens said
he is OK with the patch. In function __generic_unplug_deivce(), kernel
can use a cheaper function elv_queue_empty() instead of more expensive
elv_next_request to find whether the queue is empty or not. blk_run_queue
can also made conditional on whether queue's emptiness before calling
request_fn().

Signed-off-by: Ken Chen <***@intel.com>

--- linux-2.6.11/drivers/block/ll_rw_blk.c.orig 2005-03-28 18:22:26.000000000 -0800
+++ linux-2.6.11/drivers/block/ll_rw_blk.c 2005-03-28 18:44:46.000000000 -0800
@@ -1459,7 +1459,7 @@ void __generic_unplug_device(request_que
/*
* was plugged, fire request_fn if queue has stuff to do
*/
- if (elv_next_request(q))
+ if (!elv_queue_empty(q))
q->request_fn(q);
}
EXPORT_SYMBOL(__generic_unplug_device);
@@ -1589,7 +1589,8 @@ void blk_run_queue(struct request_queue

spin_lock_irqsave(q->queue_lock, flags);
blk_remove_plug(q);
- q->request_fn(q);
+ if (!elv_queue_empty(q))
+ q->request_fn(q);
spin_unlock_irqrestore(q->queue_lock, flags);
}
EXPORT_SYMBOL(blk_run_queue);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------

Jens Axboe

2005-03-29 08:48:43 UTC

Permalink

Post by Chen, Kenneth W
This patch was posted last year and if I remember correctly, Jens said
he is OK with the patch. In function __generic_unplug_deivce(), kernel
can use a cheaper function elv_queue_empty() instead of more expensive
elv_next_request to find whether the queue is empty or not. blk_run_queue
can also made conditional on whether queue's emptiness before calling
request_fn().

Looks good, thanks.

Signed-off-by: Jens Axboe <***@suse.de>
--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------

Jens Axboe

2005-03-29 09:38:07 UTC

Permalink

Post by Jens Axboe

Looks good, thanks.

Speaking of which, I've had a few ideas lying around for possible
performance improvement in the block code.
I haven't used a big disk array (or tried any simulation), but I'll
attach the patch if you're looking into that area.
- don't generic_unplug_device unconditionally in get_request_wait,
- removes the relock/retry merge mechanism in __make_request if we
aren't able to get the GFP_ATOMIC allocation. Just fall through
and assume the chances of getting a merge will be small (is this
a valid assumption? Should measure it I guess).
- removes the GFP_ATOMIC allocation. That's always a good thing.

Looks good, I've been toying with something very similar for a long time
myself.

The unplug change is a no-brainer. The retry stuff i __make_request()
will make no real difference on 'typical' hardware, when it was
introduced in 2.4.x it was very useful on slower devices like dvd-rams.
The batch wakeups should take care of this.

The atomic-vs-blocking allocation should be tested. I'd really like it
to be a "don't dive into the vm very much, just wait on the mempool"
type allocation, so we are not at the mercy of long vm reclaim times
hurting the latencies here.
--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------

Nick Piggin

2005-03-29 09:53:01 UTC

Permalink

Post by Jens Axboe
Looks good, I've been toying with something very similar for a long time
myself.
The unplug change is a no-brainer.

Yep - I may have even stolen it from you (or someone) from a patch
which had been forgotten. I can't remember for sure, but it is trivial
enough that anyone could come up with it if they noticed, so I won't
worry about attribution ;)

Post by Jens Axboe
The retry stuff i __make_request()
will make no real difference on 'typical' hardware, when it was
introduced in 2.4.x it was very useful on slower devices like dvd-rams.
The batch wakeups should take care of this.

OK cool, that was the main thing I was unsure of.

Post by Jens Axboe
The atomic-vs-blocking allocation should be tested. I'd really like it
to be a "don't dive into the vm very much, just wait on the mempool"
type allocation, so we are not at the mercy of long vm reclaim times
hurting the latencies here.

Ahh yes I forgot it was backing it with a mempool. The problem I see
with that is that GFP_ATOMIC allocations eat into the mm's "atomic
reserve" pool (main use: networking), which would be nice not to.

So long as we are sure that we'll *eventually* fall back to the mempool,
it should be OK (but I still agree should have testing) - that isn't
entirely clear though, because the page allocator infinitely loops on
small allocations unless __GFP_NORETRY is set.

Andrew - tell me I'm missing something?

Nick Piggin

2005-03-29 10:10:12 UTC

Permalink

Post by Jens Axboe
Looks good, I've been toying with something very similar for a long time
myself.

Here is another thing I just noticed that should further reduce the
locking by at least 1, sometimes 2 lock/unlock pairs per request.
At the cost of uglifying the code somewhat. Although it is pretty
nicely contained, so Jens you might consider it acceptable as is,
or we could investigate how to make it nicer if Kenneth reports some
improvement.

Note, this isn't runtime tested - it could easily have a bug.

Nick Piggin

2005-03-29 09:27:00 UTC

Permalink

I haven't used a big disk array (or tried any simulation), but I'll
attach the patch if you're looking into that area.

Oh, and this one removes a memory barrier. I think we (Jens and I)
agreed this is valid. Whether or not you'll notice a difference is
another story ;)

Nick Piggin

2005-03-29 09:27:01 UTC

Permalink

Post by Jens Axboe

Looks good, thanks.

Speaking of which, I've had a few ideas lying around for possible
performance improvement in the block code.

I haven't used a big disk array (or tried any simulation), but I'll
attach the patch if you're looking into that area.

It puts in a few unlikely()s, but the main changes are:
- don't generic_unplug_device unconditionally in get_request_wait,
- removes the relock/retry merge mechanism in __make_request if we
aren't able to get the GFP_ATOMIC allocation. Just fall through
and assume the chances of getting a merge will be small (is this
a valid assumption? Should measure it I guess).
- removes the GFP_ATOMIC allocation. That's always a good thing.

Jens Axboe

2005-03-29 10:26:28 UTC

Permalink

- removes the relock/retry merge mechanism in __make_request if we
aren't able to get the GFP_ATOMIC allocation. Just fall through
and assume the chances of getting a merge will be small (is this
a valid assumption? Should measure it I guess).

Precisely, see my response as well. If the noretry flag works, that
should be fine.
--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------

Arjan van de Ven

2005-03-29 10:15:14 UTC

Permalink

this may have a potential problem; if the vm gets in trouble, you
suddenly start to generate worse IO patterns, which means IO performance
goes down right when it's most needed.....

of course we need to figure if this actually ever hits in practice,
because if it doesn't I'm all for simpler code.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------

Nick Piggin

2005-03-29 10:30:42 UTC

Permalink

this may have a potential problem; if the vm gets in trouble, you
suddenly start to generate worse IO patterns, which means IO performance
goes down right when it's most needed.....

Sorry my wording was incorrect. It currently *always* retries the
merge if it had at first failed, and after the patch, we never retry.
So it should not result in behavioural shifts when there is a VM load
is high.

It seems to be a clear source of problems for Kenneth though, because
his workload appears to have almost zero merges, so he'll always be
invoking the merge logic twice.

I agree there is potential for subtle interactions. But generally the
block layer is surprisingly well behaved in my experience.

As Jens said, the complete removal of the GFP_ATOMIC allocation probably
has the most potential for problems in this regard, although bios are not
using GFP_ATOMIC allocations, so I would be a little surprised if it made
a really noticable difference.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------

Jens Axboe

2005-03-29 13:17:37 UTC

Permalink

@@ -2577,19 +2577,18 @@ static int __make_request(request_queue_
spin_lock_prefetch(q->queue_lock);
barrier = bio_barrier(bio);
- if (barrier && (q->ordered == QUEUE_ORDERED_NONE)) {
+ if (unlikely(barrier) && (q->ordered == QUEUE_ORDERED_NONE)) {
err = -EOPNOTSUPP;
goto end_io;
}
spin_lock_irq(q->queue_lock);
if (elv_queue_empty(q)) {
blk_plug_device(q);
goto get_rq;
}

This should just goto get_rq, the plug should happen only at the end

req->rq_disk = bio->bi_bdev->bd_disk;
req->start_time = jiffies;
+ spin_lock_irq(q->queue_lock);
+ if (elv_queue_empty(q))
+ blk_plug_device(q);
add_request(q, req);
- if (freereq)
- __blk_put_request(q, freereq);
if (bio_sync(bio))
__generic_unplug_device(q);

--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------

Chen, Kenneth W

2005-03-29 19:04:51 UTC

Permalink

Nick Piggin wrote on Tuesday, March 29, 2005 1:20 AM

Speaking of which, I've had a few ideas lying around for possible
performance improvement in the block code.
I haven't used a big disk array (or tried any simulation), but I'll
attach the patch if you're looking into that area.
linux-2.6-npiggin/drivers/block/ll_rw_blk.c | 63 ++++++++++------------------
1 files changed, 23 insertions(+), 40 deletions(-)
diff -puN drivers/block/ll_rw_blk.c~blk-efficient drivers/block/ll_rw_blk.c
--- linux-2.6/drivers/block/ll_rw_blk.c~blk-efficient 2005-03-29 19:00:07.000000000 +1000
+++ linux-2.6-npiggin/drivers/block/ll_rw_blk.c 2005-03-29 19:10:45.000000000 +1000
@@ -2557,7 +2557,7 @@ EXPORT_SYMBOL(__blk_attempt_remerge);
static int __make_request(request_queue_t *q, struct bio *bio)
{
- struct request *req, *freereq = NULL;
+ struct request *req;
int el_ret, rw, nr_sectors, cur_nr_sectors, barrier, err;
sector_t sector;
@@ -2577,19 +2577,18 @@ static int __make_request(request_queue_
spin_lock_irq(q->queue_lock);
if (elv_queue_empty(q)) {

....

/*
- * Grab a free request from the freelist - if that is empty, check
- * if we are doing read ahead and abort instead of blocking for
- * a free slot.
+ * Grab a free request. This is might sleep but can not fail.
+ */
+ spin_unlock_irq(q->queue_lock);
+ req = get_request_wait(q, rw);
+ /*
+ * After dropping the lock and possibly sleeping here, our request
+ * may now be mergeable after it had proven unmergeable (above).
+ * We don't worry about that case for efficiency. It won't happen
+ * often, and the elevators are able to handle it.
*/
- if (freereq) {
- req = freereq;
- freereq = NULL;
- } else {
- spin_unlock_irq(q->queue_lock);
- if ((freereq = get_request(q, rw, GFP_ATOMIC)) == NULL) {
- /*
- * READA bit set
- */
- err = -EWOULDBLOCK;
- if (bio_rw_ahead(bio))
- goto end_io;
-
- freereq = get_request_wait(q, rw);
- }
- goto again;
- }
req->flags |= REQ_CMD;

....

This part of change in __make_request() is very welcome for enterprise workload.
I have an unpublished version of patch does something similar. I will look through
all other changes in your patch.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------