Discussion:
Linux 2.4.30-rc3
Marcelo Tosatti
2005-03-26 21:14:49 UTC
Hi,

Here goes -rc3.

A nasty typo happened while merging v2.6 load_elf_library() DoS fix,
which could lead to oopses.

Summary of changes from v2.4.30-rc2 to v2.4.30-rc3
============================================

Marcelo Tosatti:
o Andreas Arens: Fix deadly mismerge of binfmt_elf DoS fix
o Change VERSION to 2.4.30-rc3

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



-------------------------------------------------------------------------------
Attention: this newsgroup is a unidirectionally gated mailing list.
Reply only by mail to the address given in the Reply-To header.
Questions about the gateway -> ***@inka.de.
-------------------------------------------------------------------------------
Ville Herva
2005-03-28 07:36:45 UTC
Post by Marcelo Tosatti
Hi,
Here goes -rc3.
A nasty typo happened while merging v2.6 load_elf_library() DoS fix,
which could lead to oopses.
Summary of changes from v2.4.30-rc2 to v2.4.30-rc3
============================================
o Andreas Arens: Fix deadly mismerge of binfmt_elf DoS fix
o Change VERSION to 2.4.30-rc3
I just upgraded from linux-2.4.21 + vserver 0.17 to 2.4.30-rc3 + vserver
1.2.10. The box has been running stable with 2.4.21 + vserver 0.17/0.16 for
a few years (uptime before reboot was nearly 400 days).

The boot went fine, but after a few hours I got
Message from ***@box at Sun Mar 27 22:07:00 2005 ...
turing kernel: journal commit I/O error

and dmesg is filled with
--8<-----------------------------------------------------------------------
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
--8<-----------------------------------------------------------------------

This is rootfs, on top of software raid1 and two IDE disks. mdstat claims it's
healthy:

--8<-----------------------------------------------------------------------
md3 : active raid1 hdc3[1] hda3[0]
37955648 blocks [2/2] [UU]
--8<-----------------------------------------------------------------------
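
For readers unfamiliar with the mdstat notation above: in "[2/2] [UU]" the
first bracket gives the number of configured and of working member devices,
and the second has one character per slot, U for up and _ for a missing or
failed one. A minimal C sketch of that reading follows. The function name
md_status_healthy is made up for illustration, and it only looks at the
"blocks" line, not at the whole /proc/mdstat file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: interpret the "[total/working] [UU]" part of an
 * mdstat "blocks" line.
 * Returns 1 if every member is up, 0 if the array is degraded,
 * and -1 if the line cannot be parsed.
 */
int md_status_healthy(const char *line)
{
    int total = 0, working = 0, up = 0, slots = 0;
    const char *p;

    assert(line != NULL);

    /* First bracket: configured devices / working devices. */
    p = strchr(line, '[');
    if (p == NULL || sscanf(p, "[%d/%d]", &total, &working) != 2)
        return -1;

    /* Second bracket: one character per slot, 'U' = up, '_' = down. */
    p = strchr(p + 1, '[');
    if (p == NULL)
        return -1;
    for (p++; *p != '\0' && *p != ']'; p++) {
        if (*p == 'U')
            up++;
        else if (*p != '_')
            return -1;
        slots++;
    }
    if (*p != ']' || slots != total)
        return -1;

    return (working == total && up == total) ? 1 : 0;
}
```

Called on the line "37955648 blocks [2/2] [UU]" this reports a healthy
array, which only says that md considers both mirrors present; it says
nothing about whether individual requests succeeded.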

While dmesg has filled up and /var/log/messages is read-only - I can't see
all the kernel messages - there appear to be no I/O errors from the
underlying devices (md, ide). smartctl -a does not report errors for hda or
hdc.

During reboot, fsck was run for md3, and it was clean. Now I get

--8<-----------------------------------------------------------------------
Block bitmap differences: -(7800660--7801060) -(7801934--7802030) -(7802370--7802602) -(7802604--7802613) -(7802681--7802700) -(7802715--7802716) -(7802726--7802732) -(7802744--7802750) -(7802914--7802927) -(7802934--7802937) -(7802946--7802964) -(7803392--7803417) -(7805060--7808825) -(7808976--7809608)
Fix? no

Inode bitmap differences: -3899400
Fix? no
--8<-----------------------------------------------------------------------
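
To get a feel for how much the bitmaps disagree, the ranges in the fsck
output above can be added up. The following is a rough C sketch, not part
of e2fsprogs: count_bitmap_diffs is a hypothetical name, and the parser
assumes entries look like "-(first--last)" or "-number" (a leading "+" is
accepted as well) and that nothing else on the line contains those patterns.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper: count how many blocks or inodes are listed on one
 * "bitmap differences" line. Each "-(a--b)" entry contributes b - a + 1,
 * each single "-n" entry contributes 1. Malformed entries are skipped.
 */
long count_bitmap_diffs(const char *line)
{
    long total = 0;
    const char *p;

    assert(line != NULL);

    p = line;
    while (*p != '\0') {
        long first = 0, last = 0;
        int used = 0;   /* characters consumed by sscanf, set via %n */

        if (*p != '-' && *p != '+') {
            p++;
            continue;
        }
        if (p[1] == '(' &&
            sscanf(p + 1, "(%ld--%ld)%n", &first, &last, &used) == 2 &&
            used > 0 && last >= first) {
            /* A closed range such as -(7800660--7801060). */
            total += last - first + 1;
            p += 1 + used;
        } else if (isdigit((unsigned char)p[1]) &&
                   sscanf(p + 1, "%ld%n", &first, &used) == 1) {
            /* A single entry such as -3899400. */
            total += 1;
            p += 1 + used;
        } else {
            p++;
        }
    }
    return total;
}
```

Feeding it the "Block bitmap differences" line and then the "Inode bitmap
differences" line gives one total per line, which makes it easier to
compare successive fsck runs.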

No errors from the badblocks part of the fsck, though.

Running fsck triggers the "journal commit I/O error" messages again, and
still no IO errors from either md or ide.

This _could_ have something to do with the vserver patch but it doesn't
appear so. Also, it doesn't immediately look like a hardware problem.

Any ideas?

-- v --

***@iki.fi

Ville Herva
2005-03-28 16:58:04 UTC
Post by Ville Herva
I just upgraded from linux-2.4.21 + vserver 0.17 to 2.4.30-rc3 + vserver
1.2.10. The box has been running stable with 2.4.21 + vserver 0.17/0.16 for
a few years (uptime before reboot was nearly 400 days).
The boot went fine, but after a few hours I got
kernel: journal commit I/O error
and dmesg is filled with
--8<-----------------------------------------------------------------------
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
EXT3-fs error (device md(9,3)) in start_transaction: Journal has aborted
--8<-----------------------------------------------------------------------
This is rootfs, on top of software raid1 and two IDE disks. mdstat claims it's
healthy:
--8<-----------------------------------------------------------------------
md3 : active raid1 hdc3[1] hda3[0]
37955648 blocks [2/2] [UU]
--8<-----------------------------------------------------------------------
While dmesg has filled up and /var/log/messages is read-only - I can't see
all the kernel messages - there appear to be no I/O errors from the
underlying devices (md, ide). smartctl -a does not report errors for hda or
hdc.
During reboot, fsck was run for md3, and it was clean. Now I get
--8<-----------------------------------------------------------------------
Block bitmap differences: -(7800660--7801060) -(7801934--7802030) -(7802370--7802602) -(7802604--7802613) -(7802681--7802700) -(7802715--7802716) -(7802726--7802732) -(7802744--7802750) -(7802914--7802927) -(7802934--7802937) -(7802946--7802964) -(7803392--7803417) -(7805060--7808825) -(7808976--7809608)
Fix? no
Inode bitmap differences: -3899400
Fix? no
--8<-----------------------------------------------------------------------
No errors from the badblocks part of the fsck, though.
Running fsck triggers the "journal commit I/O error" messages again, and
still no IO errors from either md or ide.
This _could_ have something to do with the vserver patch but it doesn't
appear so. Also, it doesn't immediately look like a hardware problem.
I rebooted (fsck took the fs errors away, no big offenders), and after a few
minutes, I got the same error ("journal commit I/O error"). So it doesn't
look like random memory corruption. The error happened right when I
logged out, but that might have been a coincidence. No IDE or md errors
this time either.

I don't know what to suspect. From what I gather from the changelogs, there
haven't been any critical-looking ext3 changes in 2.4 lately, but then again,
vserver doesn't mess with the block layer or ext3 journalling either.

Any ideas?

-- v --

***@iki.fi

Willy Tarreau
2005-03-28 17:30:00 UTC
Hi Ville,

On Mon, Mar 28, 2005 at 07:55:01PM +0300, Ville Herva wrote:
(...)
Post by Ville Herva
I rebooted (fsck took the fs errors away, no big offenders), and after a few
minutes, I got the same error ("journal commit I/O error"). So it doesn't
look like random memory corruption. The error happened right when I
logged out, but that might have been a coincidence. No IDE or md errors
this time either.
I don't know what to suspect. From what I gather from the changelogs, there
haven't been any critical-looking ext3 changes in 2.4 lately, but then again,
vserver doesn't mess with the block layer or ext3 journalling either.
Any ideas?
Since you don't seem to be willing to remove vserver, I guess you really
need it on this machine, and to be honest, I too don't see what trouble
it could cause in this area. However, could you try removing the journal,
or simply mount the FS as ext2? It would help to narrow the problem down.

To summarize, you have your root on ext3 on top of soft raid1 consisting of
two IDE disks, which works in 2.4.21 but not in 2.4.30-rc3, is that
correct? There was a fix last week by Neil Brown to the RAID1 rebuild
process (degraded array of 3 disks, etc...). Unless it obviously does
not come from there, you might want to try reverting it first. The
next one is from Doug Ledford on 2004/09/18 and should only affect SMP.

My various raid machines run either reiserfs or xfs on soft raid5 on
top of scsi and with kernel 2.4.27, so there's not much to compare...
Perhaps someone on the list has a setup similar to yours and could test
the kernel?

Cheers,
Willy

Ville Herva
2005-03-28 19:31:15 UTC
Post by Willy Tarreau
Since you don't seem to be willing to remove vserver, I guess you really
need it on this machine, and to be honest,
Yes, the machine is in production, and for that, it needs vserver. The
fact that it is in production also makes it a tad awkward to test
different options, at least the most experimental ones.
Post by Willy Tarreau
I too don't see what trouble it could cause in this area.
Neither do I. Of course it could be memory corruption caused by vserver in
different part of kernel, but the fact that the symptoms are so consistent,
make me think that is unlikely. But not impossible.
Post by Willy Tarreau
However, could you try removing the journal, or simply mount the FS as
ext2? It would help to narrow the problem down.
That is a good idea, if only I could boot the box at will and reliably
reproduce the problem. While it took less than ten minutes to trigger it the
second time, it took more than five hours the first time.

I will try to reproduce the problem on a different setup first, and if I can
do that, I'll try your suggestion.
Post by Willy Tarreau
To summarize, you have your root on ext3 on top of soft raid1 consisting of
two IDE disks, which works in 2.4.21 but not in 2.4.30-rc3, is that
correct?
Correct. This is PII 266MHz, no SMP.
Post by Willy Tarreau
There was a fix last week by Neil Brown to the RAID1 rebuild process
(degraded array of 3 disks, etc...). Unless it obviously does not come
from there, you might want to try reverting it first.
Sounds sane, although the raid array was not in a degraded state at any stage
and no raid rebuild was ever triggered.
Post by Willy Tarreau
The next one is from Doug Ledford on 2004/09/18 and should only affect
SMP.
Ok, as said, this is UP.
Post by Willy Tarreau
My various raid machines run either reiserfs or xfs on soft raid5 on
top of scsi and with kernel 2.4.27, so there's not much to compare...
Perhaps someone on the list has a setup similar to yours and could test
the kernel?
I will try to construct a similar setup on another machine.

Thanks for your insights,

-- v --

***@iki.fi

Neil Brown
2005-03-29 00:12:11 UTC
Post by Ville Herva
I just upgraded from linux-2.4.21 + vserver 0.17 to 2.4.30-rc3 + vserver
1.2.10. The box has been running stable with 2.4.21 + vserver 0.17/0.16 for
a few years (uptime before reboot was nearly 400 days).
The boot went fine, but after a few hours I got
kernel: journal commit I/O error
I got that error on 2.4.30-rc1 a couple of times, and now cannot
reproduce it :-(
But if you got it too, then it wasn't just bad luck.

The ext3 code in 2.4.30-rc does have a few more checks for IO errors
which will cause the journal to be aborted and produce this error, so
I suspect that the change which caused the problem is a change in ext3.
However, that doesn't mean the bug is there.

The extra code in ext3 seems to just check if buffer_uptodate is false
after it has waited on a locked buffer, and triggers a journal abort
if it isn't. This should be perfectly safe, and I cannot find any
logic error nearby. But nor can I find any errors that would cause a
buffer returned from raid1 to not be uptodate (unless there really was
an IO error).
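
The check described above can be modelled with a small toy in C. This is
not the kernel's code: struct toy_buffer, struct toy_journal and the toy_*
functions are hypothetical stand-ins, and the -EIO and -EROFS return values
are assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel's buffer_head or journal structures. */
struct toy_buffer {
    int locked;     /* I/O still in flight */
    int uptodate;   /* set by the lower layer when the I/O succeeded */
};

struct toy_journal {
    int aborted;    /* once set, no new transactions may start */
};

/* Pretend to sleep until the I/O on this buffer has completed. */
static void toy_wait_on_buffer(struct toy_buffer *bh)
{
    bh->locked = 0;
}

/*
 * The extra check: after waiting on a locked buffer, a buffer that is
 * not uptodate is treated as an I/O error and the journal is aborted.
 */
int toy_commit_buffer(struct toy_journal *j, struct toy_buffer *bh)
{
    assert(j != NULL && bh != NULL);

    if (bh->locked)
        toy_wait_on_buffer(bh);
    if (!bh->uptodate) {
        j->aborted = 1;     /* "journal commit I/O error" */
        return -EIO;
    }
    return 0;
}

/* Every later transaction start fails: "Journal has aborted". */
int toy_start_transaction(const struct toy_journal *j)
{
    assert(j != NULL);
    return j->aborted ? -EROFS : 0;
}
```

In this model a single buffer that comes back not uptodate is enough to
make every later toy_start_transaction() call fail, which matches the
pattern of one commit error followed by a stream of "Journal has aborted"
messages. The open question in the thread is why the layer below would
leave the flag unset without reporting an I/O error.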

NeilBrown