11 comments

  • bhaney 2 hours ago

    My dmesg is already constantly full of

      INFO: task btrfs:103945 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    
    Until eventually

      Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
    
    So I'm looking forward to getting an actual count of how often this happens without needing to babysit the warning suppressions and count the incidents myself.
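
    For reference, counting the incidents by hand looks roughly like the sketch below. It assumes the report line has exactly the format quoted above; `count_hung_reports` is a made-up helper name, not an existing tool. It also undercounts by construction: once the "reports are suppressed" line appears, later hangs leave no line to match.

```python
import re
from collections import Counter

# Matches the report line quoted above; the pattern is an assumption based on
# that one sample. A leading dmesg timestamp is fine because finditer() scans
# anywhere in the text.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)

def count_hung_reports(log_text):
    """Count hung-task reports per command name in a blob of kernel log text."""
    counts = Counter()
    for match in HUNG_RE.finditer(log_text):
        counts[match.group("comm")] += 1
    return counts
```

    Feed it the text of the kernel log, however you capture it; the result maps command name to number of reports seen.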
    • jeffbee an hour ago

      You could leave this problem behind by switching to a filesystem that isn't full of deadlock bugs.

      • yjftsjthsd-h 26 minutes ago

        I am curious - is this message indicative of a problem in the fs? I would have assumed anything marked "INFO" is, tautologically, not an error, but surely a filesystem shouldn't be locking up? Or is it just suggestive of high system load or poor hardware performance?

        • blueflow 11 minutes ago

          In-kernel btrfs code should never lock up at all. There is a rumor going around that btrfs never reached maturity and suffers from design issues.

  • gcr 4 hours ago

    What counts as a hung task? Blocking on unsatisfiable I/O for more than X seconds? Scheduler hasn’t gotten to it in X seconds?

    If a server process is blocking on accept(), wouldn’t it count as hung until a remote client connects? Or do only certain operations count?

    • westurner 3 hours ago

      torvalds/linux//kernel/hung_task.c :

      static void check_hung_task(struct task_struct *t, unsigned long timeout) https://github.com/torvalds/linux/blob/9f16d5e6f220661f73b36...

      static void check_hung_uninterruptible_tasks(unsigned long timeout) https://github.com/torvalds/linux/blob/9f16d5e6f220661f73b36...
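
      A toy model of what those two functions appear to check, as I read them. This is a simplification, not the kernel's code: the `Task` fields and `is_hung` are invented names, and details such as frozen tasks are left out. The gist is that only uninterruptible (D state) sleeps are considered, killable and idle sleeps are skipped, and a task is reported when its context-switch count has not moved for the whole timeout. A process blocked in accept() sleeps interruptibly, so it never qualifies.

```python
from dataclasses import dataclass

@dataclass
class Task:
    uninterruptible: bool         # sleeping in D state
    killable: bool = False        # killable sleeps are skipped
    idle: bool = False            # idle (no-load) sleeps are skipped
    switch_count: int = 0         # voluntary + involuntary context switches so far
    last_switch_count: int = 0    # value recorded at the previous scan
    last_switch_time: float = 0.0 # when the count was last seen to change

def is_hung(task, now, timeout=120.0):
    """Return True if this simplified model would report the task as hung."""
    if not task.uninterruptible or task.killable or task.idle:
        return False
    if task.switch_count != task.last_switch_count:
        # The task ran since the last scan: remember that and restart the clock.
        task.last_switch_count = task.switch_count
        task.last_switch_time = now
        return False
    # No context switch since last_switch_time: hung once the timeout has elapsed.
    return now - task.last_switch_time >= timeout
```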

      • striking 3 hours ago

        Just to double check my understanding (because being wrong on the internet is perhaps the fastest way to get people to check your work):

        Is this saying that two kinds of tasks are counted: regular tasks that haven't been scheduled for two minutes, and uninterruptible tasks (truly uninterruptible, not idle or killable despite being marked as uninterruptible) that haven't been woken up for two minutes?

        • westurner 2 hours ago

          Your explanation and the Llama's would make good comments for the source and/or the docs, if true.

  • ape4 3 hours ago
    • Polizeiposaune 3 hours ago

      Not the same thing by any means - zombies don't indicate that something is wrong with the kernel or hardware.

      The zombie process state is a normal transient state for all exiting processes where the only remaining function of the process is as a container for the exiting process's id and exit status; they go away once the parent process calls some flavor of the "wait" system call to collect the exit status. A pileup of zombies indicates a userspace bug: a negligent parent process that isn't collecting the exit status in a timely manner.
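
      The lifecycle is easy to watch from userspace. A minimal sketch, assuming Linux with /proc mounted (`proc_state` and `demo_zombie` are made-up names): the child exits immediately, shows up in state "Z" while the parent has not yet waited on it, and disappears once waitpid() collects the exit status.

```python
import os
import time

def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field comes right after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]

def demo_zombie(exit_code=7, timeout=5.0):
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit immediately, skipping cleanup handlers
    # Parent: poll until the exited child is visible as a zombie (or give up).
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    _, status = os.waitpid(pid, 0)  # reap: collect the exit status
    return state, os.WEXITSTATUS(status), proc_state(pid)
```

      A parent that never makes the waitpid() call is the "negligent parent" case: the "Z" entry stays around for as long as that parent lives.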

      • thwarted 2 hours ago

        Additionally, zombie processes hold some process accounting data (rusage) until they are reaped. See wait3(2), wait4(2) and getrusage(2).
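
        A small sketch of collecting that accounting data at reap time, using Python's `os.wait4` wrapper around wait4(2). `child_rusage` and the busy loop are purely illustrative, and the numbers you get back depend on the machine.

```python
import os

def child_rusage(work=200_000):
    """Fork a child that burns a little CPU, then reap it and return its rusage."""
    pid = os.fork()
    if pid == 0:
        try:
            total = 0
            for i in range(work):  # some CPU time for the kernel to account
                total += i * i
        finally:
            os._exit(0)  # child must never fall through into the parent's code
    # wait4() reaps the child and hands back the resource usage it accumulated.
    reaped_pid, status, rusage = os.wait4(pid, 0)
    return reaped_pid == pid, os.WEXITSTATUS(status), rusage
```

        The returned object carries fields such as `ru_utime`, `ru_stime` and `ru_maxrss` for the child that was just reaped.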