One of our core servers ran out of disk space on all disks, which resulted in frozen VMs serving some core APIs and cryptocurrency nodes. After cleaning up disk space and resuming the VMs, some cryptocurrency nodes remained out of sync with their networks because their system clocks had drifted while the VMs were suspended. We reset the system time on all VMs as well as on the hypervisor, which resolved the problem.
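For reference, a minimal sketch of how the clocks could be stepped back into sync after resuming the VMs, assuming a libvirt/KVM hypervisor with chrony on the host and qemu-guest-agent inside the guests (none of which is stated above; this is not necessarily the exact procedure we used):

```python
#!/usr/bin/env python3
"""Re-sync hypervisor and guest clocks after resuming suspended VMs.

Sketch only: assumes a libvirt/KVM host with chrony installed and
qemu-guest-agent running inside each VM.
"""
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Step the hypervisor clock to the correct time and push it to the RTC.
run(["chronyc", "makestep"])
run(["hwclock", "--systohc"])

# 2. Ask every running guest to re-read the (now correct) hardware clock.
domains = subprocess.run(
    ["virsh", "list", "--name", "--state-running"],
    capture_output=True, text=True, check=True,
).stdout.split()

for dom in domains:
    # guest-set-time without a "time" argument makes the guest agent reset
    # the guest system clock from the virtual RTC.
    run(["virsh", "qemu-agent-command", dom, '{"execute":"guest-set-time"}'])
```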
All pending orders (500+) that were stuck during that time were processed.
Our backend engine node server became unresponsive due to a network adapter failure caused by a kernel bug (the same behaviour has been reported here).
About 10 minutes after the server became unresponsive, we were eventually able to log in via SSH after several attempts and confirmed that the problem was the failing network adapter, caused by an IOMMU-related kernel bug triggered inside our Tor router VM.
We deemed it safe to soft-reboot the server since there were no signs of any physical intervention on the server. We also checked the bootloader integrity and changed the disk encryption keys after the reboot.
No disks or RAM modules were ever pulled from the server, no USB ports were triggered, and no unknown devices were attached to the server.
Conclusion: not a cold-boot attack attempt; the server is safe to operate. Migration to other hardware will still be performed later.
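For reference, a minimal sketch of the post-reboot checks mentioned above (bootloader integrity verification and disk encryption key rotation), assuming a LUKS-encrypted volume and a previously recorded list of boot file hashes; the device path and file locations are placeholders, not the exact procedure used on this server:

```python
#!/usr/bin/env python3
"""Post-reboot checks: verify boot files against known-good hashes and
rotate the LUKS passphrase.

Sketch only: the encrypted device and the hash list (sha256sum output
recorded earlier) are placeholder assumptions.
"""
import hashlib
import subprocess
from pathlib import Path

LUKS_DEVICE = "/dev/sda2"                    # placeholder encrypted volume
KNOWN_GOOD = Path("/root/boot-hashes.txt")   # earlier `sha256sum` output of /boot files

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

# 1. Compare current bootloader/kernel files against the recorded hashes.
for line in KNOWN_GOOD.read_text().splitlines():
    expected, name = line.split(maxsplit=1)
    status = "OK" if sha256(Path(name)) == expected else "MISMATCH"
    print(f"{status}  {name}")

# 2. Replace the old LUKS passphrase with a new one (prompts interactively).
subprocess.run(["cryptsetup", "luksChangeKey", LUKS_DEVICE], check=True)
```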
RPC requests may intermittently fail with { 'code' => -38, 'message' => 'no connection to daemon' }; repeat them after a few minutes. (Node and RPC version 0.18.3.3)
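For illustration, a minimal sketch of retrying affected wallet RPC calls when error -38 appears, assuming a monero-wallet-rpc endpoint on a placeholder port and using get_balance as an example method (not necessarily the calls affected here):

```python
#!/usr/bin/env python3
"""Retry wallet RPC calls that fail with -38 'no connection to daemon'.

Sketch only: the endpoint URL, port, retry delay and example method are
placeholder assumptions; adjust to the actual monero-wallet-rpc setup.
"""
import time
import requests

WALLET_RPC = "http://127.0.0.1:18083/json_rpc"   # placeholder endpoint

def call(method, params=None, retries=5, delay=120):
    payload = {"jsonrpc": "2.0", "id": "0", "method": method,
               "params": params or {}}
    for attempt in range(1, retries + 1):
        resp = requests.post(WALLET_RPC, json=payload, timeout=30).json()
        error = resp.get("error")
        if error and error.get("code") == -38:
            # Daemon temporarily unreachable: wait a few minutes and retry,
            # as described above.
            print(f"attempt {attempt}: {error['message']}, retrying in {delay}s")
            time.sleep(delay)
            continue
        return resp
    raise RuntimeError(f"{method} still failing after {retries} attempts")

print(call("get_balance"))
```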
The issue was fixed by rebuilding the node without libunwind (unset LIBUNWIND_FOUND). This solution was never proposed or mentioned by the core team in the relevant GitHub issues.