Discussion:
swsusp 'disk' fails in bk-current - intel_agp at fault?
Andy Isaacson
2005-03-23 18:52:32 UTC
I was previously running 2.6.11-rc3 and swsusp was working quite nicely:
echo shutdown > /sys/power/disk
echo disk > /sys/power/state
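
For readers who want to script the two commands above, here is a minimal sketch. It only checks that the kernel lists "disk" in /sys/power/state before writing anything. The names has_word and suspend_to_disk, and the overridable STATE_FILE and DISK_FILE variables, are made up for this illustration and are not part of swsusp.

```sh
# Hypothetical helper, not part of swsusp: guard the two writes from the post.
# The paths default to the real sysfs files but can be overridden for testing.
STATE_FILE=${STATE_FILE:-/sys/power/state}
DISK_FILE=${DISK_FILE:-/sys/power/disk}

# Succeed if word $2 appears in the whitespace-separated list stored in file $1.
has_word() {
    [ -r "$1" ] || return 1
    for w in $(cat "$1"); do
        [ "$w" = "$2" ] && return 0
    done
    return 1
}

# Select the "shutdown" method, then enter the "disk" state, as in the post.
suspend_to_disk() {
    if ! has_word "$STATE_FILE" disk; then
        echo "suspend-to-disk is not listed in $STATE_FILE" >&2
        return 1
    fi
    echo shutdown > "$DISK_FILE" && echo disk > "$STATE_FILE"
}
```

Calling suspend_to_disk as root would then behave like the two echo lines, except that it refuses to proceed on a kernel that does not advertise the disk state.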

Now I've upgraded to 2.6.12-rc1, 423b66b6oJOGN68OhmSrBFxxLOtIEA, and it
no longer works reliably. Almost every time I do the above it blocks in
device_resume() (I haven't had time to track it deeper than that).
Here's the output (hand copied):

[ 51.782593] [nosave pfn 0x356]<7>[nosave pfn 0x357]swsusp:critical section/: done (122772 pages copied)
[ 54.305996] PM: writing image.
[ 54.306032] /usr/src/linux-2.6-cvs/kernel/power/swsusp.c:863
[ 54.316885] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
_

(Obviously, I added some printks to track where it's blocking.)

Dmesg is attached; hardware is a Vaio r505te.

Unfortunately, the deadlock (?) is nondeterministic; it *sometimes*
suspends successfully, maybe one time out of 10. And thinking back, I
*sometimes* saw failures to suspend with 2.6.11-rc3, maybe one failure
out of 20 suspends.

Another interesting tidbit - I had more success when I tried it without
the intel_agp module loaded; I haven't seen a lockup yet. (But why
can't I rmmod intel_agp?)

-andy
Stefan Seyfried
2005-03-24 17:05:36 UTC
Post by Andy Isaacson
Dmesg is attached; hardware is a Vaio r505te.
Unfortunately, the deadlock (?) is nondeterministic; it *sometimes*
suspends successfully, maybe one time out of 10. And thinking back, I
*sometimes* saw failures to suspend with 2.6.11-rc3, maybe one failure
out of 20 suspends.
Does it hang hard or is sysrq still working?
If sysrq is still working, please try with "i8042.noaux" (this will kill
your touchpad, which is what i intend :-)

Best regards,

Stefan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



-------------------------------------------------------------------------------
Attention: this newsgroup is a unidirectionally gated mailing list.
Reply only by mail to the address given in the Reply-To header.
Questions about the gateway -> ***@inka.de.
-------------------------------------------------------------------------------
Andy Isaacson
2005-03-24 18:13:35 UTC
Post by Stefan Seyfried
Post by Andy Isaacson
Dmesg is attached; hardware is a Vaio r505te.
Unfortunately, the deadlock (?) is nondeterministic; it *sometimes*
suspends successfully, maybe one time out of 10. And thinking back, I
*sometimes* saw failures to suspend with 2.6.11-rc3, maybe one failure
out of 20 suspends.
Does it hang hard or is sysrq still working?
Sysrq still prints stuff, so IRQs aren't locked. But most of the sysrq
commands don't work... S and U don't seem to do anything (not too
surprising I suppose) but B does reboot.
Post by Stefan Seyfried
If sysrq is still working, please try with "i8042.noaux" (this will kill
your touchpad, which is what i intend :-)
So I added i8042.noaux to my kernel command line, rebooted, insmodded
intel_agp, started X, and verified no touchpad action. Then I
suspended, and it worked fine. After restart, I suspended again - also
fine.

So I think that fixed it. But no touchpad is a bit annoying. :)

-andy
Dmitry Torokhov
2005-03-24 19:21:27 UTC
Post by Andy Isaacson
So I added i8042.noaux to my kernel command line, rebooted, insmodded
intel_agp, started X, and verified no touchpad action. Then I
suspended, and it worked fine. After restart, I suspended again - also
fine.
So I think that fixed it. But no touchpad is a bit annoying. :)
Try adding i8042.nomux instead of i8042.noaux; it should keep your
touchpad in working condition. Please let me know if it still works.
--
Dmitry
Andy Isaacson
2005-03-24 20:27:13 UTC
Post by Dmitry Torokhov
Post by Andy Isaacson
So I added i8042.noaux to my kernel command line, rebooted, insmodded
intel_agp, started X, and verified no touchpad action. Then I
suspended, and it worked fine. After restart, I suspended again - also
fine.
So I think that fixed it. But no touchpad is a bit annoying. :)
Try adding i8042.nomux instead of i8042.noaux; it should keep your
touchpad in working condition. Please let me know if it still works.
With nomux the touchpad works again, but suspend blocks in the same
place as without nomux.

(How can I verify that "nomux" was accepted? It shows up on the "Kernel
command line" but there's no other mention of it in dmesg.)

-andy
Dmitry Torokhov
2005-03-24 21:13:15 UTC
Post by Andy Isaacson
Post by Dmitry Torokhov
Post by Andy Isaacson
So I added i8042.noaux to my kernel command line, rebooted, insmodded
intel_agp, started X, and verified no touchpad action. Then I
suspended, and it worked fine. After restart, I suspended again - also
fine.
So I think that fixed it. But no touchpad is a bit annoying. :)
Try adding i8042.nomux instead of i8042.noaux; it should keep your
touchpad in working condition. Please let me know if it still works.
With nomux the touchpad works again, but suspend blocks in the same
place as without nomux.
(How can I verify that "nomux" was accepted? It shows up on the "Kernel
command line" but there's no other mention of it in dmesg.)
-andy
If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
have MUX mode active.
--
Dmitry
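
Dmitry's rule of thumb can be turned into a small check. The sketch below counts the serioN entries in a directory and reports MUX mode when there are more than three. The threshold comes from the post; the function names and the directory argument are illustrative assumptions.

```sh
# Sketch of the check described above; the function names are hypothetical.

# Print the number of serio ports found in directory $1
# (defaults to the sysfs path from the post).
count_serio_ports() {
    dir=${1:-/sys/bus/serio/devices}
    n=0
    for p in "$dir"/serio*; do
        # An unmatched glob stays literal, so test that the entry exists.
        [ -e "$p" ] && n=$((n + 1))
    done
    echo "$n"
}

# Succeed if more than 3 ports are present, which per the post means
# that MUX mode is active.
mux_active() {
    [ "$(count_serio_ports "$1")" -gt 3 ]
}
```

On a machine that shows only serio0 and serio1, count_serio_ports would print 2 and mux_active would fail, i.e. no MUX mode.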
Dmitry Torokhov
2005-03-24 21:17:50 UTC
Post by Andy Isaacson
Post by Dmitry Torokhov
Post by Andy Isaacson
So I added i8042.noaux to my kernel command line, rebooted, insmodded
intel_agp, started X, and verified no touchpad action. Then I
suspended, and it worked fine. After restart, I suspended again - also
fine.
So I think that fixed it. But no touchpad is a bit annoying. :)
Try adding i8042.nomux instead of i8042.noaux; it should keep your
touchpad in working condition. Please let me know if it still works.
With nomux the touchpad works again, but suspend blocks in the same
place as without nomux.
(How can I verify that "nomux" was accepted? It shows up on the "Kernel
command line" but there's no other mention of it in dmesg.)
Ignore my babbling, I just noticed in your dmesg that your KBC does
not support MUX mode to begin with.
--
Dmitry
Stefan Seyfried
2005-03-24 20:45:04 UTC
Post by Andy Isaacson
Sysrq still prints stuff, so IRQs aren't locked. But most of the sysrq
commands don't work... S and U don't seem to do anything (not too
surprising I suppose) but B does reboot.
sysrq-t will probably show a stuck kseriod. Unfortunately it only
happens on one machine for me (toshiba P10-550 IIRC, P4HT but with
non-smp kernel) which has no serial port for console.
Post by Andy Isaacson
Post by Stefan Seyfried
If sysrq is still working, please try with "i8042.noaux" (this will kill
your touchpad, which is what i intend :-)
So I added i8042.noaux to my kernel command line, rebooted, insmodded
intel_agp, started X, and verified no touchpad action. Then I
suspended, and it worked fine. After restart, I suspended again - also
fine.
So I think that fixed it. But no touchpad is a bit annoying. :)
Yes, it was not meant as a fix but just for verification, since i have
seen something similar.
We have a SUSE bug for this, i believe Vojtech and Pavel will take care
of this one. Thanks for confirming, i almost started to believe i was
seeing ghosts :-)
--
seife
Never trust a computer you can't lift.
Andy Isaacson
2005-03-24 23:56:51 UTC
Post by Dmitry Torokhov
If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
have MUX mode active.
Just serio0 and serio1.
Post by Dmitry Torokhov
Post by Andy Isaacson
(How can I verify that "nomux" was accepted? It shows up on the "Kernel
command line" but there's no other mention of it in dmesg.)
Ignore my babbling, I just noticed in your dmesg that your KBC does
not support MUX mode to begin with.
OK, anything else I should try?

Why does it only fail when I have *both* intel_agp and i8042 aux?

In the SysRq-T trace I see one interesting process: most things are
in D state in refrigerator(), but sh shows the following traceback:

wait_for_completion
call_usermodehelper
kobject_hotplug
kobject_del
class_device_del
class_device_unregister
mousedev_disconnect
input_unregister_device
alps_disconnect
psmouse_disconnect
serio_driver_remove
device_release_driver
serio_release_driver
serio_resume
resume_device
dpm_resume
device_resume
swsusp_write
pm_suspend_disk
enter_state
state_store
subsys_attr_store
flush_write_buffer
sysfs_write_file
...

That seems odd to me...

Also, khelper has the following trace:
io_schedule
sync_buffer
__wait_on_bit
out_of_line_wait_on_bit
ext3_find_entry
ext3_lookup
real_lookup
do_lookup
__link_path_walk
link_path_walk
path_lookup
open_exec
do_execve
...

-andy
Stefan Seyfried
2005-03-25 09:24:21 UTC
Post by Andy Isaacson
OK, anything else I should try?
not really, i just wait for Vojtech and Pavel :-)
Post by Andy Isaacson
Why does it only fail when I have *both* intel_agp and i8042 aux?
later...
Post by Andy Isaacson
In the SysRq-T trace I see one interesting process: most things are
wait_for_completion
call_usermodehelper
kobject_hotplug
kobject_del
class_device_del
class_device_unregister
mousedev_disconnect
input_unregister_device
alps_disconnect
psmouse_disconnect
serio_driver_remove
device_release_driver
serio_release_driver
i think the following happens (but i am by no means an expert on this):
- alps driver suspends
- alps driver unregisters the device
- udev is called via call_usermodehelper (which fails since userspace
is stopped)
- now somebody wants to wait for udev which does not work right.

Why only with the ALPS driver and intel_agp?
I think this is an accident. For me, it happens only with init=/bin/bash
and _no_ other drivers loaded (only IDE drivers and psmouse built-in).
As soon as i load any other drivers (i have only tried ehci_hcd and
8139too, to be honest) it works fine again. This leads me to believe it
is a race condition since the extra driver that has to be suspended may
give the ALPS driver the extra time needed to finish the race. For you,
it may be the other way round.

This is mostly guesswork, i am no kernel expert at all.
--
seife
Never trust a computer you can't lift.
Pavel Machek
2005-03-25 10:17:51 UTC
Hi!
Post by Stefan Seyfried
Post by Andy Isaacson
OK, anything else I should try?
not really, i just wait for Vojtech and Pavel :-)
Try commenting out "call_usermodehelper". If that helps, Stefan's
theory is confirmed, and this waits for Vojtech to fix it.
Post by Stefan Seyfried
Post by Andy Isaacson
In the SysRq-T trace I see one interesting process: most things are
wait_for_completion
call_usermodehelper
kobject_hotplug
kobject_del
class_device_del
class_device_unregister
mousedev_disconnect
input_unregister_device
alps_disconnect
psmouse_disconnect
serio_driver_remove
device_release_driver
serio_release_driver
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Dmitry Torokhov
2005-03-25 14:54:38 UTC
Hi!
Post by Pavel Machek
Post by Stefan Seyfried
Post by Andy Isaacson
OK, anything else I should try?
not really, i just wait for Vojtech and Pavel :-)
Try commenting out "call_usermodehelper". If that helps, Stefan's
theory is confirmed, and this waits for Vojtech to fix it.
This is more of a general swsusp problem I believe - the second phase
when it blindly resumes entire system. Resume of a device can fail
(any reason whatsoever) and it will attempt to clean up after itself,
but userspace is dead and hotplug never completes. While I am
interested to know why ALPS does not want to resume on Andy's laptop,
the issue will never be completely resolved from within the input
system.
When a device fails to resume, what should I do? I think I could

    if (error)
        panic("Device resume failed\n");

, but... that does not look like what you want.
Oh, always panic-happy Pavel ;). It really depends on what kind of
device has failed to resume. If the device is really needed for writing
the image then panic is the only recourse, but if it is some other device
you are resuming, just ignore it, who cares...

Btw, I don't think that doing selective resume (as opposed to selective
suspend and Nigel's partial device trees) would be all that
complicated. You'd always resume sysdevs and then, when iterating over
"normal" devices, just skip ones not in resume path. It can all be
contained in driver core I believe (sorry but no patch, for now at
least).
Pavel, is it possible for swsusp to disable hotplug (probably just do
hotplug_path[0] = 0) before resuming in suspend phase?
It feels like a hack, but yes, I probably could do that. (Do you have
patch to try?)
Not really, I won't be able to write any code until next week I think.
--
Dmitry
Pavel Machek
2005-03-25 15:44:43 UTC
Hi!
Post by Dmitry Torokhov
This is more of a general swsusp problem I believe - the second phase
when it blindly resumes entire system. Resume of a device can fail
(any reason whatsoever) and it will attempt to clean up after itself,
but userspace is dead and hotplug never completes. While I am
interested to know why ALPS does not want to resume on Andy's laptop,
the issue will never be completely resolved from within the input
system.
When a device fails to resume, what should I do? I think I could

    if (error)
        panic("Device resume failed\n");

, but... that does not look like what you want.
Oh, always panic-happy Pavel ;). It really depends on what kind of
device has failed to resume. If the device is really needed for writing
the image then panic is the only recourse, but if it is some other device
you are resuming, just ignore it, who cares...
You are right, for resume-during-suspend, we may as well risk it. We
have consistent state, and if we happen to write it on disk,
everything is okay.

For resume-during-resume, I don't really know how we can handle
that. Running with some devices non-working seems dangerous to me.
Post by Dmitry Torokhov
Btw, I don't think that doing selective resume (as opposed to selective
suspend and Nigel's partial device trees) would be all that
complicated. You'd always resume sysdevs and then, when iterating over
"normal" devices, just skip ones not in resume path. It can all be
contained in driver core I believe (sorry but no patch, for now at
least).
:-) I think we can simply make device freeze/unfreeze fast enough.
[We do not need to do full suspend/resume; freeze is enough].

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Dmitry Torokhov
2005-03-25 16:06:33 UTC
Post by Pavel Machek
Hi!
Post by Dmitry Torokhov
This is more of a general swsusp problem I believe - the second phase
when it blindly resumes entire system. Resume of a device can fail
(any reason whatsoever) and it will attempt to clean up after itself,
but userspace is dead and hotplug never completes. While I am
interested to know why ALPS does not want to resume on Andy's laptop,
the issue will never be completely resolved from within the input
system.
When a device fails to resume, what should I do? I think I could

    if (error)
        panic("Device resume failed\n");

, but... that does not look like what you want.
Oh, always panic-happy Pavel ;). It really depends on what kind of
device has failed to resume. If the device is really needed for writing
the image then panic is the only recourse, but if it is some other device
you are resuming, just ignore it, who cares...
You are right, for resume-during-suspend, we may as well risk it. We
have consistent state, and if we happen to write it on disk,
everything is okay.
For resume-during-resume, I don't really know how we can handle
that. Running with some devices non-working seems dangerous to me.
I think it again varies, and the driver would have to decide what to
do if it cannot resume the hardware. Take USB for example - I believe the
USB guys are aiming at being able to disconnect a device while the box is
suspended and have it removed from the system when resuming. In
Probably every driver that has even the slightest notion of
hot-pluggability should just properly clean up after itself and not
signal an error to the core.
Post by Pavel Machek
Post by Dmitry Torokhov
Btw, I don't think that doing selective resume (as opposed to selective
suspend and Nigel's partial device trees) would be all that
complicated. You'd always resume sysdevs and then, when iterating over
"normal" devices, just skip ones not in resume path. It can all be
contained in driver core I believe (sorry but no patch, for now at
least).
:-) I think we can simply make device freeze/unfreeze fast enough.
[We do not need to do full suspend/resume; freeze is enough].
It is not suspend/freeze here that gets us but resume and with resume
the driver (at least for now) does not have any idea if it is
"unfreeze" or "full-resume". I mean I could have serio just ignore
"unfreeze" requests (as I doubt anyone would ever try to suspend over
PS/2 port ;) ) but I think it should be really handled by the core.
--
Dmitry
Pavel Machek
2005-03-28 23:03:15 UTC
Hi!
Post by Dmitry Torokhov
Post by Pavel Machek
Post by Dmitry Torokhov
Btw, I don't think that doing selective resume (as opposed to selective
suspend and Nigel's partial device trees) would be all that
complicated. You'd always resume sysdevs and then, when iterating over
"normal" devices, just skip ones not in resume path. It can all be
contained in driver core I believe (sorry but no patch, for now at
least).
:-) I think we can simply make device freeze/unfreeze fast enough.
[We do not need to do full suspend/resume; freeze is enough].
It is not suspend/freeze here that gets us but resume and with resume
the driver (at least for now) does not have any idea if it is
"unfreeze" or "full-resume". I mean I could have serio just ignore
"unfreeze" requests (as I doubt anyone would ever try to suspend over
PS/2 port ;) ) but I think it should be really handled by the core.
Please just always do full-resume... for now. Patches that enable you
to detect "unfreeze" are not in, yet. If something fails, just printk
with big enough severity and continue, as you don't have a method of
signaling an error, anyway.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Pavel Machek
2005-03-25 14:26:34 UTC
Hi!
Post by Pavel Machek
Post by Stefan Seyfried
Post by Andy Isaacson
OK, anything else I should try?
not really, i just wait for Vojtech and Pavel :-)
Try commenting out "call_usermodehelper". If that helps, Stefan's
theory is confirmed, and this waits for Vojtech to fix it.
This is more of a general swsusp problem I believe - the second phase
when it blindly resumes entire system. Resume of a device can fail
(any reason whatsoever) and it will attempt to clean up after itself,
but userspace is dead and hotplug never completes. While I am
interested to know why ALPS does not want to resume on Andy's laptop,
the issue will never be completely resolved from within the input
system.
When a device fails to resume, what should I do? I think I could

    if (error)
        panic("Device resume failed\n");

, but... that does not look like what you want.
Pavel, is it possible for swsusp to disable hotplug (probably just do
hotplug_path[0] = 0) before resuming in suspend phase?
It feels like a hack, but yes, I probably could do that. (Do you have
patch to try?)
A bit of a tangent - you need to resume the system so you can write the
image, right? I wonder if we could add a flag to struct device that
would mark a device as "on_resume_path". The flag would be set when you
select the resume partition and propagated to the root of the system.
Then, when resuming after making the image, you could skip all devices
that are not on the resume path.
I'm not going to do that, see FAQ in swsusp.txt.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Dmitry Torokhov
2005-03-25 14:22:26 UTC
Hi,
Post by Pavel Machek
Hi!
Post by Stefan Seyfried
Post by Andy Isaacson
OK, anything else I should try?
not really, i just wait for Vojtech and Pavel :-)
Try commenting out "call_usermodehelper". If that helps, Stefan's
theory is confirmed, and this waits for Vojtech to fix it.
This is more of a general swsusp problem I believe - the second phase
when it blindly resumes entire system. Resume of a device can fail
(any reason whatsoever) and it will attempt to clean up after itself,
but userspace is dead and hotplug never completes. While I am
interested to know why ALPS does not want to resume on Andy's laptop,
the issue will never be completely resolved from within the input
system.

Pavel, is it possible for swsusp to disable hotplug (probably just do
hotplug_path[0] = 0) before resuming in suspend phase?

A bit of a tangent - you need to resume the system so you can write the
image, right? I wonder if we could add a flag to struct device that
would mark a device as "on_resume_path". The flag would be set when you
select the resume partition and propagated to the root of the system.
Then, when resuming after making the image, you could skip all devices
that are not on the resume path.
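
To make the "on_resume_path" idea concrete, here is a toy model in shell rather than kernel code. It walks from the resume device up to the root of a made-up device table and then lists only the devices on that path. The table format, the device names in the usage note and both function names are hypothetical, and Pavel's reply in this thread declines the approach, so treat this purely as an illustration of the proposal.

```sh
# Toy model only: the device table and both function names are hypothetical.
# Table format: one "device parent" pair per line; "-" marks a root device.

# Print the devices on the path from device $1 up to its root,
# looking up parents in table file $2.
resume_path() {
    dev=$1
    while [ -n "$dev" ] && [ "$dev" != "-" ]; do
        echo "$dev"
        dev=$(awk -v d="$dev" '$1 == d { print $2; exit }' "$2")
    done
}

# Print, in table order, every device from table $2 that lies on the
# resume path of device $1. All other devices would be skipped when
# resuming only to write the image.
devices_to_resume() {
    path=" $(resume_path "$1" "$2" | tr '\n' ' ')"
    while read -r name _parent; do
        case "$path" in
            *" $name "*) echo "$name" ;;
        esac
    done < "$2"
}
```

With a table that holds the chain pci0, ide0, hda, hda2 plus an unrelated i8042 and serio1 pair, devices_to_resume hda2 would list only the four disk-side devices, so the PS/2 port would never be touched during the write-out resume.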
--
Dmitry
Andy Isaacson
2005-03-25 18:41:07 UTC
Post by Pavel Machek
Hi!
Post by Stefan Seyfried
Post by Andy Isaacson
OK, anything else I should try?
not really, i just wait for Vojtech and Pavel :-)
Try commenting out "call_usermodehelper". If that helps, Stefan's
theory is confirmed, and this waits for Vojtech to fix it.
Post by Stefan Seyfried
Post by Andy Isaacson
wait_for_completion
call_usermodehelper
kobject_hotplug
kobject_del
Without the call_usermodehelper in kobject_hotplug, the first suspend
seems to work OK (which I think confirms the theory). But after resume,
the second suspend hangs in the same place. It's calling
call_usermodehelper from input_call_hotplug... time to comment out
another one and recompile.

I also tried -mm1 and it hangs in the same place.

-andy
Dmitry Torokhov
2005-03-29 16:21:05 UTC
Post by Stefan Seyfried
Post by Andy Isaacson
In the SysRq-T trace I see one interesting process: most things are
wait_for_completion
call_usermodehelper
kobject_hotplug
kobject_del
class_device_del
class_device_unregister
mousedev_disconnect
input_unregister_device
alps_disconnect
psmouse_disconnect
serio_driver_remove
device_release_driver
serio_release_driver
- alps driver suspends
- alps driver unregisters the device
- udev is called via call_usermodehelper (which fails since userspace
is stopped)
- now somebody wants to wait for udev which does not work right.
The thing is that kobject_uevent calls call_usermodehelper with
wait=0. That means that it only waits for the execve("/sbin/hotplug")
call to complete; it does not wait for the entire process to complete.

If you look at Andy's second trace you will see that we are waiting
for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
know why IO does not complete? khelper is a kernel thread so it is
marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?
--
Dmitry
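
For readers less familiar with the distinction drawn above: a userspace
analogy of the wait=0 case is vfork() followed by execve(), where the
parent is held only until the child has completed (or failed) its exec.
The sketch below is that analogy and nothing more. spawn_nowait() and
spawn_wait() are made-up names, this is not kernel code, and it assumes
(rather than shows) that khelper behaved this way in the kernels under
discussion.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Userspace analogy only (hypothetical helper names).
 * Returns once the child has exec'd or failed to exec: vfork() keeps
 * the parent suspended until then. The child is NOT reaped here.
 */
static pid_t spawn_nowait(const char *path)
{
	pid_t pid = vfork();

	if (pid == 0) {
		execl(path, path, (char *)NULL);
		_exit(127);	/* exec failed */
	}
	return pid;	/* > 0 in the parent, -1 if vfork() failed */
}

/*
 * Same as above, but additionally blocks until the helper has exited
 * and returns its exit status (or -1 on error).
 */
static int spawn_wait(const char *path)
{
	int status = -1;
	pid_t pid = spawn_nowait(path);

	if (pid < 0 || waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the exec itself cannot finish, for example because the read of the
helper binary from disk never completes, even spawn_nowait() does not
return. That is the situation the second trace appears to show.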
Pavel Machek
2005-03-29 18:22:32 UTC
Permalink
Hi!
Post by Dmitry Torokhov
Post by Stefan Seyfried
Post by Andy Isaacson
In the SysRq-T trace I see one interesting process: most things are
wait_for_completion
call_usermodehelper
kobject_hotplug
kobject_del
class_device_del
class_device_unregister
mousedev_disconnect
input_unregister_device
alps_disconnect
psmouse_disconnect
serio_driver_remove
device_release_driver
serio_release_driver
- alps driver suspends
- alps driver unregisters the device
- udev is called via call_usermodehelper (which fails since userspace
is stopped)
- now somebody wants to wait for udev which does not work right.
The thing is that kobject_uevent calls call_usermodehelper with
wait=0. That means that it only waits for the execve("/sbin/hotplug")
call to complete; it does not wait for the entire process to complete.
If you look at Andy's second trace you will see that we are waiting
for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
know why IO does not complete? khelper is a kernel thread so it is
marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?
Uf, no idea about kblockd freezing -- we certainly should not.

*But*, if we are doing execve while system is frozen, something is
very wrong. We should not be doing execve in the first place.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Pavel Machek
2005-03-29 19:37:37 UTC
Permalink
Hi!
Post by Pavel Machek
Post by Dmitry Torokhov
If you look at Andy's second trace you will see that we are waiting
for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
know why IO does not complete? khelper is a kernel thread so it is
marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?
Uf, no idea about kblockd freezing -- we certainly should not.
*But*, if we are doing execve while system is frozen, something is
very wrong. We should not be doing execve in the first place.
Well, there lies a problem - some devices have to do execve because
they need firmware to operate. Also, again, some buses with
hot-pluggable devices will attempt to clean up unsuccessful resume and
this will cause hotplug events. The point is you either resume system
or you don't. We probably need a separate "unfreeze" callback,
although this is kind of messy.
There's a better solution for firmware: You should load your firmware
prior to suspend and store it in RAM. Anything else just plain does
not work. (Because your wireless firmware might be on NFS mounted over
that wireless card).
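
A toy sketch of the "keep it in RAM" idea, in plain userspace C. All
names here (toy_dev, toy_prepare_suspend, toy_resume_upload) are
hypothetical and do not correspond to any kernel interface; the point
is only that the copy is taken while storage is still usable and that
the resume path touches nothing but memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device state; names are illustrative only. */
struct toy_dev {
	unsigned char *fw_cache;	/* private copy of the firmware in RAM */
	size_t fw_len;
};

/* Before suspend: storage still works, so take a private copy. */
static int toy_prepare_suspend(struct toy_dev *d, const void *fw, size_t len)
{
	unsigned char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, fw, len);
	free(d->fw_cache);	/* drop any older copy */
	d->fw_cache = copy;
	d->fw_len = len;
	return 0;
}

/* On resume: upload from RAM only; no filesystem or network access. */
static int toy_resume_upload(const struct toy_dev *d,
			     unsigned char *hw, size_t hw_len)
{
	if (!d->fw_cache || d->fw_len > hw_len)
		return -1;
	memcpy(hw, d->fw_cache, d->fw_len);
	return 0;
}
```

The cost is that the image stays resident for as long as the device may
be suspended, which is the trade-off implied by the suggestion above.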

Hotplug... I guess udev just needs to hold those callbacks until the
system is fully up... it has to do something similar on regular boot,
no?
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Dmitry Torokhov
2005-03-29 19:13:40 UTC
Permalink
Post by Pavel Machek
Hi!
Post by Dmitry Torokhov
If you look at Andy's second trace you will see that we are waiting
for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
know why IO does not complete? khelper is a kernel thread so it is
marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?
Uf, no idea about kblockd freezing -- we certainly should not.
*But*, if we are doing execve while system is frozen, something is
very wrong. We should not be doing execve in the first place.
Well, there lies a problem - some devices have to do execve because
they need firmware to operate. Also, again, some buses with
hot-pluggable devices will attempt to clean up unsuccessful resume and
this will cause hotplug events. The point is you either resume system
or you don't. We probably need a separate "unfreeze" callback,
although this is kind of messy.
--
Dmitry
Dmitry Torokhov
2005-03-25 15:00:37 UTC
Permalink
Post by Andy Isaacson
Post by Dmitry Torokhov
If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
have MUX mode active.
Just serio0 and serio1.
Post by Dmitry Torokhov
Post by Andy Isaacson
(How can I verify that "nomux" was accepted? It shows up on the "Kernel
command line" but there's no other mention of it in dmesg.)
Ignore my babbling, I just noticed in your dmesg that your KBC does
not support MUX mode to begin with.
OK, anything else I should try?
Why does it only fail when I have *both* intel_agp and i8042 aux?
In the SysRq-T trace I see one interesting process: most things are
wait_for_completion
call_usermodehelper
kobject_hotplug
kobject_del
class_device_del
class_device_unregister
mousedev_disconnect
input_unregister_device
alps_disconnect
psmouse_disconnect
serio_driver_remove
device_release_driver
serio_release_driver
serio_resume
I wonder why ALPS reconnect failed. You don't have a serial console
set up, do you? If not then maybe you could make a huge framebuffer to
capture as much info as you can... I hope you have a digital camera ;)

Then do "echo 1 > /sys/module/i8042/parameters/debug" and try to
suspend. I am interested in the data coming in and out of i8042.
--
Dmitry
Dmitry Torokhov
2005-03-29 19:41:30 UTC
Permalink
Post by Stefan Seyfried
Post by Andy Isaacson
So I added i8042.noaux to my kernel command line, rebooted, insmodded
intel_agp, started X, and verified no touchpad action. Then I
suspended, and it worked fine. After restart, I suspended again - also
fine.
So I think that fixed it. But no touchpad is a bit annoying. :)
Yes, it was not meant as a fix but just for verification, since i have
seen something similar.
We have a SUSE bug for this, i believe Vojtech and Pavel will take care
of this one. Thanks for confirming, i almost started to believe i was
seeing ghosts :-)
Could you please try the patch below - it should fix the issues you are
seeing although there may be other devices (really any hot-pluggable
device) that will show the same behaviour. In the long run swsusp should
not attempt resuming devices when the system can not handle the process
properly.
--
Dmitry

===================================================================

Input: serio - do not attempt to immediately disconnect port if
resume failed, let kseriod take care of it. Otherwise we
may attempt to unregister associated input devices which
will generate hotplug events which are not handled well
during swsusp.

Signed-off-by: Dmitry Torokhov <***@mail.ru>

serio.c | 1 -
1 files changed, 1 deletion(-)

Index: dtor/drivers/input/serio/serio.c
===================================================================
--- dtor.orig/drivers/input/serio/serio.c
+++ dtor/drivers/input/serio/serio.c
@@ -779,7 +779,6 @@ static int serio_resume(struct device *d
 	struct serio *serio = to_serio_port(dev);
 
 	if (!serio->drv || !serio->drv->reconnect || serio->drv->reconnect(serio)) {
-		serio_disconnect_port(serio);
 		/*
 		 * Driver re-probing can take a while, so better let kseriod
 		 * deal with it.
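
To illustrate the intent of the change, here is a toy userspace model
of the two behaviours. It is a sketch only: toy_port, toy_resume and
toy_worker are hypothetical names, not the serio code, and it assumes
the worker (standing in for kseriod) runs only once the system is able
to handle hotplug events again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names are the kernel's. */
struct toy_port {
	bool bound;		/* a driver is attached to the port */
	bool reconnect_ok;	/* would the driver's reconnect succeed? */
	bool rescan_queued;	/* work deferred to the worker thread */
	int hotplug_events;	/* hotplug events emitted so far */
};

/* Unbinding unregisters the input device, which emits a hotplug event. */
static void toy_disconnect(struct toy_port *p)
{
	if (p->bound) {
		p->bound = false;
		p->hotplug_events++;
	}
}

/* Resume path; disconnect_now == true models the pre-patch behaviour. */
static void toy_resume(struct toy_port *p, bool disconnect_now)
{
	if (!p->bound || !p->reconnect_ok) {
		if (disconnect_now)
			toy_disconnect(p);	/* event while userspace is stopped */
		p->rescan_queued = true;	/* worker re-probes later */
	}
}

/* Worker, assumed to run after the system is fully resumed. */
static void toy_worker(struct toy_port *p)
{
	if (p->rescan_queued) {
		toy_disconnect(p);	/* no-op if already disconnected */
		p->rescan_queued = false;
	}
}
```

With disconnect_now set to false, toy_resume() emits no event itself;
the single event is produced later from toy_worker().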