+#ifdef CONFIG_PREEMPT_RCU
+
+void rcu_read_lock(void)
+{
+	if (current->rcu_read_lock_nesting++ == 0) {
+		current->rcu_data = &get_cpu_var(rcu_data);
+		atomic_inc(&current->rcu_data->active_readers);
+		put_cpu_var(rcu_data);
Need an smp_mb() here for non-x86 CPUs. Otherwise, the CPU can
re-order parts of the critical section to precede the rcu_read_lock().
The smp_mb() could instead precede the put_cpu_var(), but why increase latency?
ok. It's enough to put a barrier into the else branch here, because the
atomic op in the main branch is a barrier by itself.
For x86, the atomic op is a barrier, but not for other architectures.
You don't need a barrier in the else branch, because in that case
you are already in an enclosing RCU read-side critical section, so
any bleeding of code will be into this enclosing section, thus still
safe.
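Putting that exchange together, the lock path being converged on would look
roughly like the sketch below; the field names come from the quoted patch,
while the smp_mb() is the addition under discussion rather than part of the
original:

void rcu_read_lock(void)
{
	if (current->rcu_read_lock_nesting++ == 0) {
		current->rcu_data = &get_cpu_var(rcu_data);
		atomic_inc(&current->rcu_data->active_readers);
		put_cpu_var(rcu_data);
		/*
		 * Keep the critical section from being reordered ahead of
		 * the increment on weakly ordered CPUs; atomic_inc() acts
		 * as a full barrier only on x86.
		 */
		smp_mb();
	}
	/*
	 * No barrier in the nested case: we are already inside an
	 * enclosing read-side critical section, so any code that bleeds
	 * upward still lands in protected territory.
	 */
}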
+void rcu_read_unlock(void)
+{
[...]
And need an smp_mb() here, again for non-x86 CPUs.
ok.
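The body of rcu_read_unlock() is elided above, so the following is only a
guess at its shape, assuming it mirrors the lock path; the point is where the
requested smp_mb() would sit, ahead of the decrement:

void rcu_read_unlock(void)
{
	if (--current->rcu_read_lock_nesting == 0) {
		/*
		 * Keep the critical section from leaking past the
		 * decrement on non-x86 CPUs.
		 */
		smp_mb();
		atomic_dec(&current->rcu_data->active_readers);
		current->rcu_data = NULL;	/* assumed cleanup, not from the patch */
	}
}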
Assuming that the memory barriers are added, I can see a bunch of ways
for races to extend grace periods, but none so far that result in the
fatal too-short grace period. Since rcu_qsctr_inc() refuses to
increment the quiescent-state counter on any CPU that started an RCU
read-side critical section that has not yet completed, any long
critical section will have a corresponding CPU that will refuse to go
through a quiescent state. And that will prevent the grace period
from completing.
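For concreteness, the behaviour being relied on is something like the sketch
below; only active_readers is taken from the quoted patch, the remaining
names are assumptions in the style of the existing RCU code:

static inline void rcu_qsctr_inc(int cpu)
{
	struct rcu_data *rdp = &per_cpu(rcu_data, cpu);

	/*
	 * Refuse to record a quiescent state while this CPU still has
	 * an unfinished read-side critical section charged to it.
	 */
	if (atomic_read(&rdp->active_readers) != 0)
		return;
	rdp->qsctr++;
}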
i'm worried about the following scenario: what happens when a task is
migrated from CPU#1 to CPU#2 while inside an RCU read-side critical section
that it entered on CPU#1, and then queues a callback? E.g. d_free() does a
call_rcu() to queue the freeing of the dentry.
That callback will be queued on CPU#2 - while the task's current->rcu_data
still points at CPU#1's per-CPU data. It also means that CPU#2's read counter
did _not_ get increased - and a too-short grace period may occur.
Let me make sure I understand the sequence of events:
	CPU#1				CPU#2

	task1: rcu_read_lock()

	task1: migrate to CPU#2

					task1: call_rcu()

					task1: rcu_read_unlock()
This sequence will be safe because CPU#1's active_readers field will
be non-zero throughout, so that CPU#1 will refuse to record any
quiescent state from the time that task1 does the rcu_read_lock()
on CPU#1 until the time that it does the rcu_read_unlock() on CPU#2.
Now, it is true that CPU#2 might record a quiescent state during this
time, but this will have no effect because -all- CPUs must pass through
a quiescent state before any callbacks will be invoked. Since CPU#1
is refusing to record a quiescent state, grace periods will be blocked
for the full extent of task 1's RCU read-side critical section.
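The "all CPUs must pass through a quiescent state" part is the usual
grace-period bookkeeping; roughly, and again with names assumed in the style
of the stock RCU code rather than taken from this patch:

static void cpu_quiet(int cpu)
{
	/* This CPU has passed through a quiescent state: note the fact. */
	cpu_clear(cpu, rcu_ctrlblk.cpumask);
	if (cpus_empty(rcu_ctrlblk.cpumask))
		/*
		 * Every CPU has now reported, so the grace period ends and
		 * queued callbacks (such as the d_free() one above) may run.
		 */
		rcu_ctrlblk.completed = rcu_ctrlblk.cur;
}

Since CPU#1 never gets here while task1's read-side critical section is
outstanding, its bit stays set and the callback queued on CPU#2 stays pending.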
Or am I misunderstanding your scenario? Or, for that matter, your code?
it seems to me that the only safe method is to pick an 'RCU CPU' when
first entering the read section, and then stick to it, no matter
where the task gets migrated to. Or to 'migrate' the +1 read count from
one CPU to the other, within the scheduler.
This would work too, but I don't believe it is necessary given what
you are already doing.
the 'migrate read count' solution seems more promising, as it would keep
other parts of the RCU code unchanged. [ But it seems to break the nice
'flip pointers' method you found to force a grace period. If a 'read
section' can migrate from one CPU to another then it can migrate back as
well, at which point it cannot have the 'old' pointer. Maybe it would
still work better than no flip pointers. ]
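In case it helps, a very rough sketch of the 'migrate the +1 read count'
variant, hooked into the scheduler's migration path; the hook name and
everything about its context are hypothetical:

static void rcu_migrate_reader(struct task_struct *p, int dest_cpu)
{
	if (p->rcu_read_lock_nesting == 0)
		return;		/* not inside a read-side critical section */

	/*
	 * Charge the destination CPU before releasing the source CPU,
	 * so that at no instant is the task's critical section left
	 * unaccounted for.
	 */
	atomic_inc(&per_cpu(rcu_data, dest_cpu).active_readers);
	smp_mb();
	atomic_dec(&p->rcu_data->active_readers);
	p->rcu_data = &per_cpu(rcu_data, dest_cpu);
}

The ordering above is the one property that matters: if the decrement were
allowed to happen first, there would be a window in which neither CPU refuses
quiescent states on the task's behalf.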
Well, I do believe that suppressing migration of tasks during RCU read-side
critical sections would simplify the flip-pointers/counters code. But
that is easy to say in advance of actually producing this code. ;-)
Thanx, Paul