Server trouble

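For a while now the server has been unstable and goes down from time to time. I had been logging the CPU temperature, but nothing unusual showed up there. By chance, some syslog output landed in a terminal I had open, so I saved it.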

miyajima@hana:~$
Message from syslogd@hana at Aug 22 21:20:36 ...
kernel:[90419.426703] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 4: ba00000081000402

Message from syslogd@hana at Aug 22 21:20:36 ...
kernel:[90419.427512] mce: [Hardware Error]: RIP !INEXACT! 10:

Message from syslogd@hana at Aug 22 21:20:36 ...
kernel:[90419.428372] mce: [Hardware Error]: TSC 12fa28570028d

Message from syslogd@hana at Aug 22 21:20:36 ...
kernel:[90419.428835] mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1661170836 SOCKET 0 APIC 0 microcode 1c

Message from syslogd@hana at Aug 22 21:20:36 ...
kernel:[90419.429518] mce: [Hardware Error]: Run the above through 'mcelog --ascii'

Message from syslogd@hana at Aug 22 21:20:36 ...
kernel:[90419.430070] mce: [Hardware Error]: Machine check: Processor context corrupt

Message from syslogd@hana at Aug 22 21:20:36 ...
kernel:[90419.430622] Kernel panic - not syncing: Fatal machine check

Message from syslogd@hana at Aug 22 21:20:36 ...
kernel:[90419.431135] Kernel Offset: 0x1000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

CPU 0 looks like the culprit, but I wonder whether this comes down to overheating after all?
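
As a rough aid for reading the log, the Bank 4 status word can be split into its architectural fields by hand. The sketch below assumes the standard IA32_MCi_STATUS bit layout from the Intel SDM (flags in bits 63 to 57, MCA error code in bits 15:0, model-specific code in bits 31:16); the function name `decode_mci_status` is my own, and the +09:00 offset for the TIME field is an assumption that the server clock is JST. It does not interpret the model-specific bits, so it cannot say whether heat was the trigger; `mcelog --ascii` is still the proper tool for that.

```python
from datetime import datetime, timedelta, timezone

# Architectural flag bits of IA32_MCi_STATUS (assumed: Intel SDM layout).
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}

def decode_mci_status(status):
    """Split a 64-bit machine-check status word into flags and error codes."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF, # bits 31:16
    }

# Status value copied from the "Bank 4" line of the log.
result = decode_mci_status(0xBA00000081000402)
print(result["flags"])
print(hex(result["mca_error_code"]), hex(result["model_specific_code"]))

# TIME field from the PROCESSOR line, shown in JST (assumed timezone).
jst = timezone(timedelta(hours=9))
when = datetime.fromtimestamp(1661170836, tz=jst)
print(when.isoformat())
```

The PCC flag (processor context corrupt) and UC flag (uncorrected) are both set, which matches the "Processor context corrupt" line and explains why the kernel panicked instead of carrying on.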