Tuesday, May 24, 2016

Ubuntu cloud instance hangs at boot screen

When you reboot an Ubuntu cloud instance, it sometimes hangs at the boot screen, showing lines such as:
random: nonblocking pool is initialized
or
random: init urandom read with 94 bits of entropy available

I rebooted it, then hard rebooted it, but it was still stuck at the same point.

Solution: migrate the instance to another compute host.

Wednesday, May 18, 2016

Kernel hanging tty in CentOS 6.6

On CentOS 6.6 with kernel 2.6.32-504.el6.x86_64, I sometimes could not connect to the server.
I could run ssh but could not get a TTY console.
After googling, I found a question on Stack Overflow about this problem:
http://stackoverflow.com/questions/26628274/kernel-hanging-the-tty-subsystem
The following appears in /var/log/messages:
 kernel: INFO: task bash:44739 blocked for more than 120 seconds.
 kernel:      Not tainted 2.6.32-504.el6.x86_64 #1
 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kernel: bash          D 0000000000000006     0 44739      1 0x00000080
 kernel: ffff88023acaf898 0000000000000046 0000000000000000 ffff8803b5580d00
 kernel: ffff88023acaf868 ffffffff81271aa4 00004e9cd768ce30 ffff88043c279800
 kernel: ffff88043c2799e0 0000000105231542 ffff88023bbe5098 ffff88023acaffd8
 kernel: Call Trace:
 kernel: [] ? blk_queue_bio+0x494/0x610
 kernel: [] schedule_timeout+0x215/0x2e0
 kernel: [] wait_for_common+0x123/0x180
 kernel: [] ? default_wake_function+0x0/0x20
 kernel: [] wait_for_completion+0x1d/0x20
 kernel: [] flush_cpu_workqueue+0x61/0x90
 kernel: [] ? wq_barrier_func+0x0/0x20
 kernel: [] flush_workqueue+0x54/0x80
 kernel: [] flush_scheduled_work+0x15/0x20
 kernel: [] tty_ldisc_release+0x3c/0x90
 kernel: [] tty_release_dev+0x40b/0x5e0
 kernel: [] ? __dec_zone_page_state+0x2e/0x30
 kernel: [] tty_release+0x1e/0x30
 kernel: [] __fput+0xf5/0x210
 kernel: [] fput+0x25/0x30
 kernel: [] filp_close+0x5d/0x90
 kernel: [] put_files_struct+0x7f/0xf0
 kernel: [] exit_files+0x53/0x70
 kernel: [] do_exit+0x18d/0x870
 kernel: [] ? __sigqueue_free+0x3d/0x50
 kernel: [] ? __dequeue_signal+0x102/0x200
 kernel: [] do_group_exit+0x58/0xd0
 kernel: [] get_signal_to_deliver+0x1f6/0x460
 kernel: [] do_signal+0x75/0x800
 kernel: [] ? __audit_syscall_exit+0x25e/0x290
 kernel: [] do_notify_resume+0x90/0xc0
 kernel: [] int_signal+0x12/0x17

or
 kernel: INFO: task sshd:25489 blocked for more than 120 seconds.
 kernel:      Not tainted 2.6.32-504.el6.x86_64 #1
 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kernel: sshd          D 0000000000000000     0 25489  26121 0x00000080
 kernel: ffff88023a8cf728 0000000000000082 0000000000000000 0000000000000000
 kernel: ffff88023a8cf7f8 ffffffff8105ca34 00006b83e91acd8b ffff88023a8cf708
 kernel: ffff88023a8cf880 00000001070847e8 ffff880239e48638 ffff88023a8cffd8
 kernel: Call Trace:
 kernel: [] ? find_busiest_group+0x244/0x9e0
 kernel: [] schedule_timeout+0x215/0x2e0
 kernel: [] wait_for_common+0x123/0x180
 kernel: [] ? default_wake_function+0x0/0x20
 kernel: [] wait_for_completion+0x1d/0x20
 kernel: [] flush_work+0x77/0xc0
 kernel: [] ? wq_barrier_func+0x0/0x20
 kernel: [] flush_delayed_work+0x54/0x70
 kernel: [] tty_flush_to_ldisc+0x15/0x20
 kernel: [] n_tty_poll+0x67/0x1d0
 kernel: [] tty_poll+0x8a/0xa0
 kernel: [] do_select+0x3c5/0x7c0
 kernel: [] ? ip_finish_output+0x148/0x310
 kernel: [] ? __pollwait+0x0/0xf0
 kernel: [] ? pollwake+0x0/0x60
 kernel: [] ? pollwake+0x0/0x60
 kernel: [] ? pollwake+0x0/0x60
 kernel: [] ? pollwake+0x0/0x60
 kernel: [] ? _spin_unlock_bh+0x1b/0x20
 kernel: [] ? release_sock+0xe5/0x110
 kernel: [] ? tcp_sendmsg+0x73c/0xa20
 kernel: [] ? sock_aio_write+0x19b/0x1c0
 kernel: [] ? tty_wakeup+0x3d/0x80
 kernel: [] core_sys_select+0x18a/0x2c0
 kernel: [] ? n_tty_read+0x3ad/0x950
 kernel: [] ? autoremove_wake_function+0x0/0x40
 kernel: [] sys_select+0x47/0x110
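
Both reports above follow the same pattern: an "INFO: task NAME:PID blocked for more than N seconds" line, followed by a call trace that passes through tty functions. A minimal Python sketch for spotting this pattern in a log is shown here. The function names find_hung_tasks and mentions_tty are hypothetical helpers made up for illustration, and the sample lines are copied from the traces above. This is only one way to scan the log, not a complete diagnostic.

```python
import re

# Matches lines such as:
#   kernel: INFO: task bash:44739 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Matches call-trace frames such as tty_release+0x1e/0x30 or n_tty_poll+0x67/0x1d0.
TTY_FRAME = re.compile(r"\b(?:n_)?tty_\w+\+0x")


def find_hung_tasks(lines):
    """Return (task name, pid, timeout in seconds) for each hung-task report."""
    found = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            found.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return found


def mentions_tty(lines):
    """Return True if any line contains a tty-related call-trace frame."""
    return any(TTY_FRAME.search(line) for line in lines)


# Sample lines copied from the log excerpts in this post.
sample = [
    " kernel: INFO: task bash:44739 blocked for more than 120 seconds.",
    " kernel: [] tty_ldisc_release+0x3c/0x90",
    " kernel: INFO: task sshd:25489 blocked for more than 120 seconds.",
    " kernel: [] n_tty_poll+0x67/0x1d0",
]

print(find_hung_tasks(sample))
print(mentions_tty(sample))
```

To check a real system, pass the open log file instead of the sample list, for example find_hung_tasks(open("/var/log/messages", errors="replace")). Reading that file usually requires root.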

Solution: run yum upgrade to install a newer kernel, then reboot the system so that it uses the new kernel.
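
After the reboot, it is worth confirming that the system is no longer running the kernel from the traces above. The sketch below compares the running kernel release (as reported by Python's platform.release(), which matches uname -r on Linux) with 2.6.32-504.el6.x86_64. The helper name still_on_affected_kernel is hypothetical, and the check only tells you whether the release string differs. It does not know which newer kernel actually contains a fix.

```python
import platform

# Kernel release reported in the hung-task messages in this post.
AFFECTED_RELEASE = "2.6.32-504.el6.x86_64"


def still_on_affected_kernel(release=None):
    """Return True if the given (or currently running) kernel is the affected one."""
    if release is None:
        release = platform.release()  # same string as `uname -r` on Linux
    return release == AFFECTED_RELEASE


if still_on_affected_kernel():
    print("Still running", AFFECTED_RELEASE, "- reboot into the new kernel.")
else:
    print("Running kernel:", platform.release())
```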