Discussion:
[OOPS] 2.6.11 - NMI lockup with CFQ scheduler
(too old to reply)
Chris Rankin
2005-03-27 19:24:17 UTC
Permalink
[gcc-3.4.3, Linux-2.6.11-SMP, Dual P4 Xeon with HT enabled]

Hi,

My Linux 2.6.11 box oopsed when I tried to logout. I have switched to using the anticipatory
scheduler instead.

Cheers,
Chris

NMI Watchdog detected LOCKUP on CPU1, eip c0275cc7, registers:
Modules linked in: snd_pcm_oss snd_mixer_oss snd_usb_audio snd_usb_lib snd_intel8x0 snd_seq_oss
snd_seq_midi snd_emu10k1_synth snd_emu10k1 snd_ac97_codec snd_pcm snd_page_alloc snd_emux_synth
snd_seq_virmidi snd_rawmidi snd_seq_midi_event snd_seq_midi_emul snd_hwdep snd_util_mem snd_seq
snd_seq_device snd_rtctimer snd_timer snd nls_iso8859_1 nls_cp437 vfat fat usb_storage radeon drm
i2c_algo_bit emu10k1_gp gameport deflate zlib_deflate zlib_inflate twofish serpent aes_i586
blowfish des sha256 crypto_null af_key binfmt_misc eeprom i2c_sensor button processor psmouse
pcspkr p4_clockmod speedstep_lib usbserial lp nfsd exportfs md5 ipv6 sd_mod scsi_mod autofs nfs
lockd sunrpc af_packet ohci_hcd parport_pc parport e1000 video1394 raw1394 i2c_i801 i2c_core
ohci1394 ieee1394 ehci_hcd soundcore pwc videodev uhci_hcd usbcore intel_agp agpgart ide_cd cdrom
ext3 jbd
CPU: 1
EIP: 0060:[<c0275cc7>] Not tainted VLI
EFLAGS: 00200086 (2.6.11)
EIP is at _spin_lock+0x7/0xf
eax: f7b8b01c ebx: f7c82b88 ecx: f7c82b94 edx: f6c33714
esi: eb68ad88 edi: f6c33708 ebp: f6c33714 esp: f5b32f70
ds: 007b es: 007b ss: 0068
Process nautilus (pid: 5757, threadinfo=f5b32000 task=f7518020)
Stack: c01f7f79 00200282 f76bda24 f6c323e4 f7518020 00000000 00000000 c01f1d0c
f5b32000 c011d7b3 00000001 00000000 b65ffa40 00000000 f5b32fac 00000000
00000000 00000000 f5b32000 c011d8d6 c0102e7f 00000000 b65ffbf0 b6640bf0
Call Trace:
[<c01f7f79>] cfq_exit_io_context+0x54/0xb3
[<c01f1d0c>] exit_io_context+0x45/0x51
[<c011d7b3>] do_exit+0x205/0x308
[<c011d8d6>] next_thread+0x0/0xc
[<c0102e7f>] syscall_call+0x7/0xb
Code: 05 e8 3a e6 ff ff c3 ba 00 f0 ff ff 21 e2 81 42 14 00 01 00 00 f0 81 28 00 00 00 01 74 05 e8
1d e6 ff ff c3 f0 fe 08 79 09 f3 90 <80> 38 00 7e f9 eb f2 c3 f0 81 28 00 00 00 01 74 05 e8 ff e5
ff
console shuts up ...

Send instant messages to your online friends http://uk.messenger.yahoo.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------
Jens Axboe
2005-03-29 11:35:35 UTC
Permalink
Post by Chris Rankin
[gcc-3.4.3, Linux-2.6.11-SMP, Dual P4 Xeon with HT enabled]
Hi,
My Linux 2.6.11 box oopsed when I tried to logout. I have switched to using the anticipatory
scheduler instead.
Cheers,
Chris
Modules linked in: snd_pcm_oss snd_mixer_oss snd_usb_audio snd_usb_lib snd_intel8x0 snd_seq_oss
snd_seq_midi snd_emu10k1_synth snd_emu10k1 snd_ac97_codec snd_pcm snd_page_alloc snd_emux_synth
snd_seq_virmidi snd_rawmidi snd_seq_midi_event snd_seq_midi_emul snd_hwdep snd_util_mem snd_seq
snd_seq_device snd_rtctimer snd_timer snd nls_iso8859_1 nls_cp437 vfat fat usb_storage radeon drm
i2c_algo_bit emu10k1_gp gameport deflate zlib_deflate zlib_inflate twofish serpent aes_i586
blowfish des sha256 crypto_null af_key binfmt_misc eeprom i2c_sensor button processor psmouse
pcspkr p4_clockmod speedstep_lib usbserial lp nfsd exportfs md5 ipv6 sd_mod scsi_mod autofs nfs
lockd sunrpc af_packet ohci_hcd parport_pc parport e1000 video1394 raw1394 i2c_i801 i2c_core
ohci1394 ieee1394 ehci_hcd soundcore pwc videodev uhci_hcd usbcore intel_agp agpgart ide_cd cdrom
ext3 jbd
CPU: 1
EIP: 0060:[<c0275cc7>] Not tainted VLI
EFLAGS: 00200086 (2.6.11)
EIP is at _spin_lock+0x7/0xf
eax: f7b8b01c ebx: f7c82b88 ecx: f7c82b94 edx: f6c33714
esi: eb68ad88 edi: f6c33708 ebp: f6c33714 esp: f5b32f70
ds: 007b es: 007b ss: 0068
Process nautilus (pid: 5757, threadinfo=f5b32000 task=f7518020)
Stack: c01f7f79 00200282 f76bda24 f6c323e4 f7518020 00000000 00000000 c01f1d0c
f5b32000 c011d7b3 00000001 00000000 b65ffa40 00000000 f5b32fac 00000000
00000000 00000000 f5b32000 c011d8d6 c0102e7f 00000000 b65ffbf0 b6640bf0
[<c01f7f79>] cfq_exit_io_context+0x54/0xb3
[<c01f1d0c>] exit_io_context+0x45/0x51
[<c011d7b3>] do_exit+0x205/0x308
[<c011d8d6>] next_thread+0x0/0xc
[<c0102e7f>] syscall_call+0x7/0xb
Code: 05 e8 3a e6 ff ff c3 ba 00 f0 ff ff 21 e2 81 42 14 00 01 00 00 f0 81 28 00 00 00 01 74 05 e8
1d e6 ff ff c3 f0 fe 08 79 09 f3 90 <80> 38 00 7e f9 eb f2 c3 f0 81 28 00 00 00 01 74 05 e8 ff e5
ff
console shuts up ...
The queue was gone by the time the process exited. What type of storage
do you have attached to the box? At least with SCSI, it has some
problems in this area - it will glady free the scsi device structure
(where the queue lock is located) while the queue reference count still
hasn't dropped to zero.
--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------
Chris Rankin
2005-03-29 12:06:03 UTC
Permalink
I have one IDE hard disc, but I was using a USB memory stick at one point. (Notice the usb-storage
and vfat modules in my list.) Could that be the troublesome SCSI device?
Post by Chris Rankin
Post by Chris Rankin
[gcc-3.4.3, Linux-2.6.11-SMP, Dual P4 Xeon with HT enabled]
Hi,
My Linux 2.6.11 box oopsed when I tried to logout. I have switched to using the anticipatory
scheduler instead.
Cheers,
Chris
Modules linked in: snd_pcm_oss snd_mixer_oss snd_usb_audio snd_usb_lib snd_intel8x0
snd_seq_oss
Post by Chris Rankin
snd_seq_midi snd_emu10k1_synth snd_emu10k1 snd_ac97_codec snd_pcm snd_page_alloc
snd_emux_synth
Post by Chris Rankin
snd_seq_virmidi snd_rawmidi snd_seq_midi_event snd_seq_midi_emul snd_hwdep snd_util_mem
snd_seq
Post by Chris Rankin
snd_seq_device snd_rtctimer snd_timer snd nls_iso8859_1 nls_cp437 vfat fat usb_storage radeon
drm
Post by Chris Rankin
i2c_algo_bit emu10k1_gp gameport deflate zlib_deflate zlib_inflate twofish serpent aes_i586
blowfish des sha256 crypto_null af_key binfmt_misc eeprom i2c_sensor button processor psmouse
pcspkr p4_clockmod speedstep_lib usbserial lp nfsd exportfs md5 ipv6 sd_mod scsi_mod autofs
nfs
Post by Chris Rankin
lockd sunrpc af_packet ohci_hcd parport_pc parport e1000 video1394 raw1394 i2c_i801 i2c_core
ohci1394 ieee1394 ehci_hcd soundcore pwc videodev uhci_hcd usbcore intel_agp agpgart ide_cd
cdrom
Post by Chris Rankin
ext3 jbd
CPU: 1
EIP: 0060:[<c0275cc7>] Not tainted VLI
EFLAGS: 00200086 (2.6.11)
EIP is at _spin_lock+0x7/0xf
eax: f7b8b01c ebx: f7c82b88 ecx: f7c82b94 edx: f6c33714
esi: eb68ad88 edi: f6c33708 ebp: f6c33714 esp: f5b32f70
ds: 007b es: 007b ss: 0068
Process nautilus (pid: 5757, threadinfo=f5b32000 task=f7518020)
Stack: c01f7f79 00200282 f76bda24 f6c323e4 f7518020 00000000 00000000 c01f1d0c
f5b32000 c011d7b3 00000001 00000000 b65ffa40 00000000 f5b32fac 00000000
00000000 00000000 f5b32000 c011d8d6 c0102e7f 00000000 b65ffbf0 b6640bf0
[<c01f7f79>] cfq_exit_io_context+0x54/0xb3
[<c01f1d0c>] exit_io_context+0x45/0x51
[<c011d7b3>] do_exit+0x205/0x308
[<c011d8d6>] next_thread+0x0/0xc
[<c0102e7f>] syscall_call+0x7/0xb
Code: 05 e8 3a e6 ff ff c3 ba 00 f0 ff ff 21 e2 81 42 14 00 01 00 00 f0 81 28 00 00 00 01 74
05 e8
Post by Chris Rankin
1d e6 ff ff c3 f0 fe 08 79 09 f3 90 <80> 38 00 7e f9 eb f2 c3 f0 81 28 00 00 00 01 74 05 e8 ff
e5
Post by Chris Rankin
ff
console shuts up ...
The queue was gone by the time the process exited. What type of storage
do you have attached to the box? At least with SCSI, it has some
problems in this area - it will glady free the scsi device structure
(where the queue lock is located) while the queue reference count still
hasn't dropped to zero.
--
Jens Axboe
Send instant messages to your online friends http://uk.messenger.yahoo.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------
Jens Axboe
2005-03-29 12:15:56 UTC
Permalink
On Tue, Mar 29 2005, Chris Rankin wrote:

(please don't top post)
Post by Chris Rankin
Post by Chris Rankin
Post by Chris Rankin
[gcc-3.4.3, Linux-2.6.11-SMP, Dual P4 Xeon with HT enabled]
Hi,
My Linux 2.6.11 box oopsed when I tried to logout. I have switched to using the anticipatory
scheduler instead.
Cheers,
Chris
Modules linked in: snd_pcm_oss snd_mixer_oss snd_usb_audio snd_usb_lib snd_intel8x0
snd_seq_oss
Post by Chris Rankin
snd_seq_midi snd_emu10k1_synth snd_emu10k1 snd_ac97_codec snd_pcm snd_page_alloc
snd_emux_synth
Post by Chris Rankin
snd_seq_virmidi snd_rawmidi snd_seq_midi_event snd_seq_midi_emul snd_hwdep snd_util_mem
snd_seq
Post by Chris Rankin
snd_seq_device snd_rtctimer snd_timer snd nls_iso8859_1 nls_cp437 vfat fat usb_storage radeon
drm
Post by Chris Rankin
i2c_algo_bit emu10k1_gp gameport deflate zlib_deflate zlib_inflate twofish serpent aes_i586
blowfish des sha256 crypto_null af_key binfmt_misc eeprom i2c_sensor button processor psmouse
pcspkr p4_clockmod speedstep_lib usbserial lp nfsd exportfs md5 ipv6 sd_mod scsi_mod autofs
nfs
Post by Chris Rankin
lockd sunrpc af_packet ohci_hcd parport_pc parport e1000 video1394 raw1394 i2c_i801 i2c_core
ohci1394 ieee1394 ehci_hcd soundcore pwc videodev uhci_hcd usbcore intel_agp agpgart ide_cd
cdrom
Post by Chris Rankin
ext3 jbd
CPU: 1
EIP: 0060:[<c0275cc7>] Not tainted VLI
EFLAGS: 00200086 (2.6.11)
EIP is at _spin_lock+0x7/0xf
eax: f7b8b01c ebx: f7c82b88 ecx: f7c82b94 edx: f6c33714
esi: eb68ad88 edi: f6c33708 ebp: f6c33714 esp: f5b32f70
ds: 007b es: 007b ss: 0068
Process nautilus (pid: 5757, threadinfo=f5b32000 task=f7518020)
Stack: c01f7f79 00200282 f76bda24 f6c323e4 f7518020 00000000 00000000 c01f1d0c
f5b32000 c011d7b3 00000001 00000000 b65ffa40 00000000 f5b32fac 00000000
00000000 00000000 f5b32000 c011d8d6 c0102e7f 00000000 b65ffbf0 b6640bf0
[<c01f7f79>] cfq_exit_io_context+0x54/0xb3
[<c01f1d0c>] exit_io_context+0x45/0x51
[<c011d7b3>] do_exit+0x205/0x308
[<c011d8d6>] next_thread+0x0/0xc
[<c0102e7f>] syscall_call+0x7/0xb
Code: 05 e8 3a e6 ff ff c3 ba 00 f0 ff ff 21 e2 81 42 14 00 01 00 00 f0 81 28 00 00 00 01 74
05 e8
Post by Chris Rankin
1d e6 ff ff c3 f0 fe 08 79 09 f3 90 <80> 38 00 7e f9 eb f2 c3 f0 81 28 00 00 00 01 74 05 e8 ff
e5
Post by Chris Rankin
ff
console shuts up ...
The queue was gone by the time the process exited. What type of storage
do you have attached to the box? At least with SCSI, it has some
problems in this area - it will glady free the scsi device structure
(where the queue lock is located) while the queue reference count still
hasn't dropped to zero.
I have one IDE hard disc, but I was using a USB memory stick at one
point. (Notice the usb-storage and vfat modules in my list.) Could
that be the troublesome SCSI device?
Yes, it probably is. What happens is that you insert the stick and do io
against it, which sets up a process io context for that device. That
context persists until the process exits (or later, if someone still
holds a reference to it), but the queue_lock will be dead when you yank
the usb device.

It is quite a serious problem, not just for CFQ. SCSI referencing is
badly broken there.
--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------
Jens Axboe
2005-03-29 12:33:35 UTC
Permalink
Post by Jens Axboe
Post by Chris Rankin
Post by Chris Rankin
I have one IDE hard disc, but I was using a USB memory stick at one
point. (Notice the usb-storage and vfat modules in my list.) Could
that be the troublesome SCSI device?
Yes, it probably is. What happens is that you insert the stick and do io
against it, which sets up a process io context for that device. That
context persists until the process exits (or later, if someone still
holds a reference to it), but the queue_lock will be dead when you yank
the usb device.
It is quite a serious problem, not just for CFQ. SCSI referencing is
badly broken there.
That would explain why it was nautilus which caused the oops then.
Does this mean that the major distros aren't using the CFQ then?
Because how else can they be avoiding this oops with USB storage
devices?
CFQ with io contexts is relatively new, only there since 2.6.10 or so.
On UP, we don't grab the queue lock effetively so the problem isn't seen
there.

You can work around this issue by using a different default io scheduler
at boot time, and then select cfq for your ide hard drive when the
system has booted with:

# echo cfq > /sys/block/hda/queue/scheduler

(substitute hda for any other solid storage device).
--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



-------------------------------------------------------------------------------
Achtung: diese Newsgruppe ist eine unidirektional gegatete Mailingliste.
Antworten nur per Mail an die im Reply-To-Header angegebene Adresse.
Fragen zum Gateway -> ***@inka.de.
-------------------------------------------------------------------------------
Loading...