Friday, February 24, 2006

Niagara PCIe bus initialization error

According to the Solaris 10 FMA trace, the Fire fabric ereport is raised when
a leaf PCIe device sends an error message to the root complex; the nexus
driver then publishes the ereport.
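
The two ereport.io.fire.fabric entries in the quoted console output carry msg_code=31 and msg_code=33 with req_id=402. A minimal sketch of reading those fields, assuming they are printed in hexadecimal, that msg_code is the PCIe error message code, and that req_id is a standard bus/device/function requester ID; decode_fabric is a made-up helper name, not part of any Solaris tool:

```python
# PCIe error signaling message codes (PCI Express Base Specification).
PCIE_ERR_MESSAGES = {0x30: "ERR_COR", 0x31: "ERR_NONFATAL", 0x33: "ERR_FATAL"}

def decode_fabric(msg_code_hex, req_id_hex):
    """Decode msg_code and req_id from a fabric ereport (assumed hex strings)."""
    code = int(msg_code_hex, 16)
    rid = int(req_id_hex, 16)
    return {
        "message": PCIE_ERR_MESSAGES.get(code, "unknown"),
        "bus": rid >> 8,               # upper 8 bits of the requester ID
        "device": (rid >> 3) & 0x1F,   # next 5 bits
        "function": rid & 0x7,         # lowest 3 bits
    }

print(decode_fabric("31", "402"))
print(decode_fabric("33", "402"))
```

Under those assumptions the first entry is a non-fatal error message and the second a fatal one, both from bus 4, device 0, function 2, which would be consistent with the pci@0,2 node at the end of the deepest device path in the log.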

(1) A faulty PCI device behind a PCI-PCI bridge can produce
ereport.io.pci.mdpe and ereport.io.pci.target-mdpe

(2) A faulty PCI device can produce ereport.io.pci.sec-rserr

(3) A defective PCI device driver may cause ereport.io.pci.sec-dpe
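
The ereport class names in the list above correspond to error bits in the PCI Status and Secondary Status registers, whose raw values appear in the quoted console output (pci-status=110, pci-sec-status=c000 and c2a0). A minimal sketch of decoding them, assuming the values are printed in hexadecimal and using the bit layout from the PCI Local Bus and PCI-to-PCI Bridge specifications; the names below are chosen for illustration only:

```python
# Error bits of the PCI Status register (configuration offset 0x06).
PCI_STATUS_BITS = {
    0x8000: "detected parity error",
    0x4000: "signaled system error",
    0x2000: "received master abort",
    0x1000: "received target abort",
    0x0800: "signaled target abort",
    0x0100: "master data parity error",
}

# Secondary Status register of a PCI-PCI bridge (offset 0x1E): same layout,
# except that bit 14 means a system error was received on the secondary bus.
PCI_SEC_STATUS_BITS = dict(PCI_STATUS_BITS)
PCI_SEC_STATUS_BITS[0x4000] = "received system error"

def decode(value_hex, bits):
    """Return the names of the error bits set in a hex register value."""
    value = int(value_hex, 16)
    return [name for mask, name in sorted(bits.items(), reverse=True)
            if value & mask]

print(decode("110", PCI_STATUS_BITS))       # pci-status from the mdpe ereports
print(decode("c000", PCI_SEC_STATUS_BITS))  # pci-sec-status from sec-dpe / sec-rserr
print(decode("c2a0", PCI_SEC_STATUS_BITS))
```

Read this way, pci-status=110 has the master data parity error bit set (mdpe), and both pci-sec-status values have the detected parity error and received system error bits set (sec-dpe and sec-rserr). The remaining bits in 110 and c2a0 are capability and timing fields, not errors.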

In general, it seems the hardware issues above raise an interrupt to the
PCIe fabric error handler, which then panics with "Fatal PCIe Fabric Error has occurred"
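
To see which nodes reported errors before the panic, the ereports can be pulled out of the quoted console output. A rough sketch, assuming the text is pasted in as-is with mail quote markers and wrapped lines; parse_ereports and deepest_path are hypothetical helper names, and treating the deepest reporting path as the place to start looking is only a heuristic:

```python
import re

def parse_ereports(console_text):
    """Return (class, ena, device_path) tuples found in console output."""
    # Drop mail quote markers, then re-join the wrapped lines.
    lines = [re.sub(r"^[>\s]+", "", line) for line in console_text.splitlines()]
    text = " ".join(lines)
    # Line wrapping may leave whitespace after "=", so allow it.
    pattern = re.compile(
        r'(ereport\.[\w.-]+)\s+ena=\s*([0-9a-f]+).*?device-path=\s*"([^"]+)"'
    )
    return pattern.findall(text)

def deepest_path(reports):
    """Return the detector path with the most components."""
    return max((path for _, _, path in reports), key=lambda p: p.count("/"))
```

Run over the log in this post, the ereports that share an ENA can be grouped together, and the deepest detector path is /pci@7c0/pci@0/pci@1/pci@0,2.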


Thanks

Lei
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Configuring devices.
>>> SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
>>> EVENT-TIME: 0x43ebc150.0x1dcd5ca8 (0x333e1920bc)
>>> PLATFORM: SUNW,Sun-Fire-T200, CSN: -, HOSTNAME:
>>> SOURCE: SunOS, REV: 5.10 Generic_118822-25
>>> DESC: Errors have been detected that require a reboot to ensure system
>>> integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more information.
>>> AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
>>> IMPACT: The system will sync files, save a crash dump if needed, and reboot
>>> REC-ACTION: Save the error summary below in case telemetry cannot be saved
>>>
>>> ereport.io.fire.fabric ena=333e08995805c01 detector=[ version=0 scheme="dev"
>>> device-path="/pci@7c0" ] msg_code=31 req_id=402 cap_off=44 aer_off=100
>>> sts_reg=4110 sts_sreg=0 dev_sts_reg=6 aer_ce=0 aer_ue=0 aer_sev=60010 aer_h1=
>>> 4000001 aer_h2=3 aer_h3=4010000 aer_h4=40100 saer_ue=1080 saer_sev=1340
>>> saer_h1=1f061030 saer_h2=f0 saer_h3=ff114040 saer_h4=0 severity=9
>>>
>>> ereport.io.pci.mdpe ena=333e08995805c01 detector=[ version=0 scheme="dev"
>>> device-path="/pci@7c0/pci@0" ] pci-status=110 pci-command=547
>>>
>>> ereport.io.pci.target-mdpe ena=333e08995805c01 detector=[ version=0 scheme=
>>> "dev" device-path="/pci@7c0" ]
>>>
>>> ereport.io.pci.sec-dpe ena=333e08995805c01 detector=[ version=0 scheme="dev"
>>> device-path="/pci@7c0/pci@0" ] pci-sec-status=c000 pci-bdg-ctrl=3
>>>
>>> ereport.io.pci.sec-rserr ena=333e08995805c01 detector=[ version=0 scheme="dev"
>>> device-path="/pci@7c0/pci@0" ] pci-sec-status=c000 pci-bdg-ctrl=3
>>>
>>> ereport.io.pci.mdpe ena=333e08995805c01 detector=[ version=0 scheme="dev"
>>> device-path="/pci@7c0/pci@0/pci@1" ] pci-status=110 pci-command=547
>>>
>>> ereport.io.pci.target-mdpe ena=333e08995805c01 detector=[ version=0 scheme=
>>> "dev" device-path="/pci@7c0/pci@0" ]
>>>
>>> ereport.io.pci.sec-dpe ena=333e08995805c01 detector=[ version=0 scheme="dev"
>>> device-path="/pci@7c0/pci@0/pci@1" ] pci-sec-status=c000 pci-bdg-ctrl=3
>>>
>>> ereport.io.pci.sec-rserr ena=333e08995805c01 detector=[ version=0 scheme="dev"
>>> device-path="/pci@7c0/pci@0/pci@1" ] pci-sec-status=c000 pci-bdg-ctrl=3
>>>
>>> ereport.io.pci.sec-dpe ena=333e08995805c01 detector=[ version=0 scheme="dev"
>>> device-path="/pci@7c0/pci@0/pci@1/pci@0,2" ] pci-sec-status=c2a0 pci-bdg-ctrl=
>>> 23
>>>
>>> ereport.io.pci.sec-rserr ena=333e08995805c01 detector=[ version=0 scheme="dev"
>>> device-path="/pci@7c0/pci@0/pci@1/pci@0,2" ] pci-sec-status=c2a0 pci-bdg-ctrl=
>>> 23
>>> ereport.io.fire.fabric ena=333e2133d405c01 detector=[ version=0 scheme="dev"
>>> device-path="/pci@7c0" ] msg_code=33 req_id=402 cap_off=44 aer_off=100
>>> sts_reg=10 sts_sreg=0 dev_sts_reg=0 aer_ce=0 aer_ue=0 aer_sev=60010 aer_h1=
>>> 4000001 aer_h2=3 aer_h3=4010000 aer_h4=40100 saer_ue=1000 saer_sev=1340
>>> saer_h1=1f061030 saer_h2=f0 saer_h3=ff114040 saer_h4=0 severity=9
>>>
>>>
>>> panic[cpu23]/thread=2a100f1dcc0: Fatal PCIe Fabric Error has occurred
>>>
>>>
>>> 000002a100f85d70 px:px_err_fabric_intr+c0 (300005afe00, 31, 300008c42e0, 402, 300008d8f20, 402000000000000)
>>> %l0-3: 00000300008c1bd8 00000000ffffffff fffffffffffffffe 0000000000000000
>>> %l4-7: 000000000183e800 0000000001271800 0000000000000000 00000300008c42f0
>>> 000002a100f85e50 px:px_msiq_intr+1a4 (300008e9da8, 0, 1269f54, 0, 300005afe00, 300008d8f20)
>>> %l0-3: 00000300008c1bd8 00000300005bd7a0 0000000000000000 000002a100f85f10
>>> %l4-7: 000002a100f85f40 00000300008d8f20 0000000000000000 0000000000000031
>>> 000002a100f85f50 unix:current_thread+140 (16, 800000, 7fffe7, 7fffe7, 0, 12)
>>> %l0-3: 000000000100994c 000002a100f1d021 000000000000000e 00000000000007f9
>>> %l4-7: 0000000000000000 0000000000000000 0000000000000000 000002a100f1d8d0
>>> 000002a100f1d970 unix:cpu_halt+c0 (0, 17, 30001a68000, 16, 30001a68000, 1)
>>> %l0-3: 00000000018450f8 0000000000000001 0000000000000002 0000000000000000
>>> %l4-7: 0000000000000000 0000000000000000 0000000000000000 000000000103735c
>>> 000002a100f1da20 unix:idle+128 (1814800, 0, 30001a68000, ffffffffffffffff, 17, 1813400)
>>> %l0-3: 0000060001d4f600 000000000000001b 0000000000000000 ffffffffffffffff
>>> %l4-7: 0000000000000000 0000000000000000 0000000000000000 000000000103735c
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Subject:
>>> [Fwd: RE: Sun Niagara System problem]
>>> From:
>>> Prameet Chhabra
>>> Date:
>>> Tue, 21 Feb 2006 13:35:50 -0800
>>> To:
>>> Steve Katzman
>>>
>>>
>>> Steve,
>>>
>>> Looks like they are using the newest version. Can you get me help on this one?
>>>
>>> thanks
>>> Prameet
>>>
>>> -------- Original Message --------
>>> Subject: RE: Sun Niagara System problem
>>> Date: Tue, 21 Feb 2006 16:24:00 -0500
>>> From: Eddie Ng
>>> To: Prameet.Chhabra@Sun.COM , pete salerno
>>>
>>> Yes, we did use the latest version, which is 01/06.
>>> I found the required OS version on the Sun site.
>>> Yes, we know there was a preinstalled version of Solaris 10 on one of the disks. We didn't use that disk because we normally have a special configuration of file systems and tools for our test systems.
>>> I had initial success installing the OS, but after a reboot the error message started to appear. I also tried the preinstalled version, but the same problem occurred; I couldn't get far enough for the installation wizard to come up.
>>>
>>> Thank you,
>>>
>>> Edward Ng
>>> Ulticom, Inc.
>>> System Administrator
>>> 1020 Briggs Rd
>>> Mount Laurel, NJ 08054
>>> 856-638-2608
>>> eddie.ng@ulticom.com
>>>
>>> -----Original Message-----
>>> From: Prameet Chhabra [mailto:Prameet.Chhabra@Sun.COM]
>>> Sent: Tuesday, February 21, 2006 4:12 PM
>>> To: Eddie Ng; pete salerno
>>> Subject: Re: Sun Niagara System problem
>>>
>>>
>>> Eddie,
>>>
>>> Did the box come with Solaris preloaded? It should have. Why did you need to
>>> install Solaris? Just out of curiosity, what version of Solaris are you
>>> using? The reason I ask is that some older versions of Solaris 10
>>> (i.e. not the hardware-specific release HW2) don't have the sun4v components
>>> and won't work on the T2000, so that could be the reason.
>>>
>>> thanks
>>> Prameet
>>>
>>> Eddie Ng wrote:
>>>
>>>> Sun Niagara System Hostid: 83d936ca
>>>> Good day Prameet, thank you for your information.
>>>> We were trying to install the latest version of Solaris 10 on this system. We had success at first, but when we rebooted the system, the OS wouldn't come up. We tried to reinstall the OS but were unsuccessful due to an error message on the console. I unplugged the system and tried again, with no success.
>>>> I've attached the error message from the console; please investigate and let us know our course of action.
>>>>
>>>> Thank you,
>>>>
>>>> Edward Ng
>>>> Ulticom, Inc.
>>>> System Administrator
>>>> 1020 Briggs Rd
>>>> Mount Laurel, NJ 08054
>>>> 856-638-2608
>>>> eddie.ng@ulticom.com
>>>>
>>>>
