Monday, February 27, 2006

Niagara Emlxs(7D) fibre channel Panic for stopping CPU context switch

It seems that the fibre channel nexus driver with the SCSA interface for the emlxs(7D) fibre channel adapter is doing auto request sense and tagged queueing, which raises the packet timeout. The first thread to initiate a system panic records the panic and renders the system quiescent by stopping the other processors.

This in turn causes the sun4v-specific xt_sync to wait for the cross-trap (x-trap) to finish. In addition, because of the panic, the uts code forces the other processors to trap into panic idle so that they will not receive cross-calls.
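
The waiting behaviour described above can be illustrated with a toy model. This is only a sketch in Python, not the Solaris implementation: the function name `xt_sync_sketch`, its parameters, and the polling scheme are all made up for illustration. It shows the general idea of an initiator waiting for every target CPU to acknowledge a cross-trap and giving up after a timeout if one of them never answers.

```python
import time


def xt_sync_sketch(targets, responsive, timeout_s=0.05, poll_s=0.001):
    """Toy model of waiting for cross-trap acknowledgements (hypothetical).

    targets    -- set of CPU ids the initiator sent a cross-trap to
    responsive -- set of CPU ids that are still able to acknowledge
    Returns the set of CPUs that never acknowledged (empty on success).
    """
    pending = set(targets)
    deadline = time.monotonic() + timeout_s
    while pending and time.monotonic() < deadline:
        # A responsive CPU acknowledges; one that never takes the
        # trap stays in the pending set until the deadline passes.
        pending -= set(responsive)
        if pending:
            time.sleep(poll_s)
    return pending


if __name__ == "__main__":
    stuck = xt_sync_sketch(targets={0, 1, 2}, responsive={1, 2})
    if stuck:
        print("sync timeout, CPUs still pending:", sorted(stuck))
```

In this model a CPU that cannot take the trap plays the role of cpu0 in the log below: the initiator keeps waiting for it until the timeout expires.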

Perhaps we should detach and attach the driver again to see how the system reacts to the HW interrupts.


An email from the mailing list:

>
> Did anyone else see something similar to this on Solaris 10 3/05 HW2 s10s_hw2wos_05 SPARC on T2000?
> Any suggestions?
>
> Regards,
> Jignesh
>
> Feb 25 01:31:59 bcu3510-1 emlxs: [ID 349649 kern.info] [1.0126]emlxs0: NOTICE: 910: Packet timeout. (chip abort: sbp=60015d1e858 iotag=1e42 tmo=60)
> Feb 25 01:31:59 bcu3510-1 scsi: [ID 107833 kern.warning] WARNING: /pci@7c0/pci@0/pci@1/pci@0,2/SUNW,emlxs@1/fp@0,0/ssd@w226000c0ffa98fc0,1 (ssd10):
> Feb 25 01:31:59 bcu3510-1 SCSI transport failed: reason 'timeout': retrying command
> Feb 25 01:56:14 bcu3510-1 emlxs: [ID 349649 kern.info] [1.0126]emlxs0: NOTICE: 910: Packet timeout. (chip abort: sbp=600154995e8 iotag=34a7 tmo=60)
> Feb 25 02:00:24 bcu3510-1 emlxs: [ID 349649 kern.info] [1.0126]emlxs0: NOTICE: 910: Packet timeout. (chip abort: sbp=3003f4cd168 iotag=3328 tmo=60)
> Feb 25 06:31:09 bcu3510-1 unix: [ID 547063 kern.notice] Cross trap sync timeout at cpu_sync.xword[0]: 0x100000000000000
> Feb 25 06:31:09 bcu3510-1 unix: [ID 350512 kern.notice] panic: failed to stop cpu0
> Feb 25 06:31:09 bcu3510-1 unix: [ID 836849 kern.notice]
> Feb 25 06:31:09 bcu3510-1 ^Mpanic[cpu23]/thread=30001f4a6c0:
> Feb 25 06:31:09 bcu3510-1 unix: [ID 990398 kern.notice] xt_sync: timeout
> Feb 25 06:31:09 bcu3510-1 unix: [ID 100000 kern.notice]
> Feb 25 06:31:09 bcu3510-1 genunix: [ID 723222 kern.notice] 000002a101c461d0 unix:xt_sync+17c (d8e29fb05044, 2a101c46280, 0, 0, d8e29dd37a18, d8e29dd37a20)
> Feb 25 06:31:09 bcu3510-1 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 8000000000000000 0000000000000000 000002a101c46280
> Feb 25 06:31:09 bcu3510-1 %l4-7: 000000000184d800 0000000001038800 0100000000000000 0000000001dcd650
> Feb 25 06:31:09 bcu3510-1 genunix: [ID 723222 kern.notice] 000002a101c462c0 unix:hat_unload_callback+808 (70000000000, 2a101c465f0, 0, 0, 0, 300005b9e08)
>
>
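
To see how often the packet timeouts occur in the hours before the panic, the emlxs notices can be pulled out of the messages file with a short script. The following is a minimal sketch in Python. It assumes each log entry sits on one line (the mail client wrapped them in the quote above) and that the notice format matches the excerpt exactly. The names `PACKET_TIMEOUT` and `packet_timeouts` are made up for this example.

```python
import re

# Pattern for the emlxs "Packet timeout" notice as it appears in the
# excerpt above; the format is assumed from that excerpt only.
PACKET_TIMEOUT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+emlxs:.*?"
    r"(?P<inst>emlxs\d+): NOTICE:\s*910: Packet timeout\.\s*"
    r"\(chip abort: sbp=(?P<sbp>[0-9a-f]+) "
    r"iotag=(?P<iotag>[0-9a-f]+) tmo=(?P<tmo>\d+)\)"
)


def packet_timeouts(lines):
    """Return one dict per emlxs packet-timeout notice found in lines."""
    events = []
    for line in lines:
        # Drop any leading email quote marker ("> ") before matching.
        m = PACKET_TIMEOUT.search(line.lstrip("> "))
        if m:
            events.append(m.groupdict())
    return events


SAMPLE = [
    "> Feb 25 01:31:59 bcu3510-1 emlxs: [ID 349649 kern.info] "
    "[1.0126]emlxs0: NOTICE: 910: Packet timeout. "
    "(chip abort: sbp=60015d1e858 iotag=1e42 tmo=60)",
    "> Feb 25 01:31:59 bcu3510-1 SCSI transport failed: "
    "reason 'timeout': retrying command",
    "> Feb 25 01:56:14 bcu3510-1 emlxs: [ID 349649 kern.info] "
    "[1.0126]emlxs0: NOTICE: 910: Packet timeout. "
    "(chip abort: sbp=600154995e8 iotag=34a7 tmo=60)",
]

if __name__ == "__main__":
    for event in packet_timeouts(SAMPLE):
        print(event["stamp"], event["inst"], event["iotag"], event["tmo"])
```

The same function could be fed the lines of the real messages file instead of `SAMPLE`.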
