In a rare scenario, under heavy load / soft lockups on SecureXL cores, SGM400 may become unresponsive
  • On a setup with SGM400 members, some members are not responsive over the Data plane (BPEthX / Sync interfaces).

Only relevant to SGM400 (due to i40e driver usage in BPEthX interfaces):

In a rare scenario, when SecureXL cores are under heavy load that results in soft lockups, the i40e driver may perform an internal reset of itself. This causes an outage on the BPEthX interfaces and on the relevant pseudo interfaces (e.g., Sync), because a certain configuration is removed during the reset.

This issue is indicated by the following message in /var/log/messages and in the dmesg output:
kernel: i40e 0000:02:00.0: BPEth0: tx_timeout recovery level 1, hung_queue 0
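
To check a saved log for this indication, a short script along the following lines may help. It is only a sketch: the function name find_tx_timeouts and the regular expression are illustrative assumptions derived from the single sample message above, not a vendor-provided tool, and other message variants may not match.

```python
import re

# Pattern modeled on the sample message above (an assumption: real logs
# may differ in PCI address, interface index, recovery level, or queue).
TX_TIMEOUT_RE = re.compile(
    r"i40e (?P<pci>[0-9a-fA-F:.]+): (?P<iface>BPEth\d+): "
    r"tx_timeout recovery level (?P<level>\d+), hung_queue (?P<queue>\d+)"
)


def find_tx_timeouts(lines):
    """Return one dict per log line that reports an i40e tx_timeout on a BPEthX interface."""
    hits = []
    for line in lines:
        match = TX_TIMEOUT_RE.search(line)
        if match:
            hits.append({
                "pci": match["pci"],
                "iface": match["iface"],
                "level": int(match["level"]),
                "queue": int(match["queue"]),
            })
    return hits
```

The function accepts any iterable of strings, so it can be given an open file object for a copy of /var/log/messages or the captured lines of dmesg output. An empty result means only that this specific message was not found in the supplied lines.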

