Thursday, June 22, 2006

Ubuntu and Solaris 10 x86 on Laptop and Desktop

Cross-pollination of OpenSolaris and GNU:
http://www.gnusolaris.org/gswiki

Tuesday, June 20, 2006

Compilation tuning tips

However, general tuning is easier than this; here's what I would suggest:

1. Run at -O to establish baseline performance
2. Run at -fast -xipo=2 -xtarget=generic[64]

If there's no difference (or no significant difference) in performance, then you can stop. [But still profile the application!]

If there is a difference, then I'd evaluate performance gains due to the following flags (some combinations may be missing):

3. -xO5
4. -xO5 -xalias_level=basic (for C) or -xalias_level=compatible (for C++)
5. -xO5 -xdepend
6. -xO5 -fsimple=2 -fns -xlibmil -lmopt
7. -xO5 -xipo=2

I think that covers the bulk of the things that get enabled at -fast. Hopefully these runs will let you isolate a set of flags that gives you the performance.

You might also want to look into profile feedback for codes which contain lots of branch instructions (or calls).

Obviously, profiling the application (e.g. with spot, http://cooltools.sunsource.net/spot/) will give you insights into what the actual performance issues are, and those insights can guide you to the appropriate compiler flags.
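The flag evaluation above can be sketched as a small script that prints the build-and-time matrix. The compiler driver (cc), source file (app.c), output name, and timing harness are all placeholder assumptions; substitute your own build line and workload:

```shell
#!/bin/sh
# Print one build-and-time command per candidate flag set, to be
# compared against the -O baseline. Nothing is compiled here.
CC=cc          # assumed compiler driver
SRC=app.c      # placeholder source file

print_matrix() {
  for FLAGS in \
      "-xO5" \
      "-xO5 -xalias_level=basic" \
      "-xO5 -xdepend" \
      "-xO5 -fsimple=2 -fns -xlibmil -lmopt" \
      "-xO5 -xipo=2"
  do
    echo "$CC $FLAGS -o app $SRC && /bin/time ./app"
  done
}

print_matrix
```

Each printed line is one experiment; time it against the -O baseline from step 1 before deciding whether a flag is worth keeping.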

compiler -fast and optimization on different systems

Does Sun Studio take a top-down or a bottom-up approach to back-end optimization? One more thing to share: I am wondering whether the -fast macro expansion is NP-hard. In addition, should we further approximate the computation without generating new sub-problems to run on a different underlying system, or should we consider a different algorithm to conquer the problem?

A simplification of the compilation process

1. Parsing ("front end"). Breaks the code down into elementary operations and generates debug information. The C++ front end (but not the C front end) inlines some functions that were explicitly or implicitly declared inline. Both C and C++ can mark functions for the back end as "please generate inline."

2. Code generation ("back end"). Generates code, with variable levels of optimizing. Function inlining can occur at -xO3 by request, and at -xO4 or -xO5 even if not requested.

The -g option affects primarily the front ends. The back ends disable some optimizations when -g is in effect. The -g option disables front-end inlining by the C++ compiler; the -g0 option enables front-end inlining.

Most of the remaining optimization options primarily affect the back end, and are the same for C and C++. A few options are available only in C. You need to check the documentation for the compiler version you use, because new options are added from time to time.

The -fast option is really a macro that expands to a series of options based on the details of the system running the compiler. If you run the resulting program on a different machine, results will be sub-optimal, maybe even slower than if it were not compiled with -fast.

The -xOn options select optimizations that are useful across a range of systems.
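One way to see what -fast expands to on a given machine is Sun Studio's -dryrun option, which prints the component command lines without compiling. The helper below is a sketch (extract_flags is a made-up name, and the driver-output format it parses is only an assumption):

```shell
# extract_flags reads compiler-driver output on stdin and lists the
# unique -x options it finds, one per line.
extract_flags() {
  tr ' ' '\n' | grep '^-x' | sort -u
}

# Intended use on Solaris with Sun Studio (hello.c is a placeholder):
#   cc -dryrun -fast hello.c 2>&1 | extract_flags
```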

Boot NFS from NG-Zone

Booting a zone over NFS (namely, with its root file system on a NAS device) is not supported at the current time. There are, um, interesting workarounds available, like using lofi(7D). Boot support over NFS is definitely something we want to support in the future. Just to make it clear, though: a Solaris 10 system acting as a NAS device can itself host its own non-global zones; they just need to boot off local file systems on the system.

A classic use case for booting a zone over NFS is a grid computing environment,
where it would help with resource allocation. It matters because a zone can be seen as a
resource fabric for accessing large (terabyte-scale) storage over NFS on the underlying network. Without it, grid service providers would need to implement a rather elaborate transactional, grid-enabled network file system management service of their own. And of course, it would have to be zone-aware.

Friday, June 16, 2006

Middleware enters ESB age and SOA is ready

All middleware vendors are adopting ESB solutions.
The SOA age is here. Will ESB be the next standard in
the Java EE stack?

Thursday, June 15, 2006

A real world problem and algorithmic analysis

Problem:
An engineer's Toshiba Tecra M2 laptop is failing on him. The problem seems to be either the power adapter or the battery pack, but he needs to determine which before purchasing a replacement. For that, he would like to do a simple test with the parts on hand.

Objective:

Identify and fix the defective part, rather than just getting the laptop to run.

Algorithm Analysis and Design:


0x00000001 --- AC Adapter
0x00000010 --- Laptop Main

Tested Unit:
Parts[0] <---- 0x00000001
Parts[1] <---- 0x00000010

Good Laptop as Instrumentation Tool:
Tool[0] <---- 0x00000011
Tool[1] <---- 0x00000100


Fix-Laptop(Parts, Tool) return 0-1
    DefectParts <---- Test-Part(Parts, Tool)
    go to dealer for maintenance with DefectParts
    done <---- 1
    return done


Test-Part(Parts, Tool) return DefectParts
    Success <---- Connect(Parts[1], Tool[0])
    if Success
    then DefectParts[0] <---- Parts[0]
    else Success <---- Connect(Parts[1], Tool[1])
         if Success
         then DefectParts[0] <---- Parts[1]
         else inspect again to ensure there is no contact issue;
              if it still fails, DefectParts <---- Parts
    return DefectParts

Solution:

It seems unicast to the local vendor would be a better
algorithm than multicast over the SMTP overlay
network in terms of complexity, cost, and completeness.
Since we need to go to the dealer anyway, why not do the
test there and get the fixed part right away?

CDROM access from NG-Zone

How to access cdrom drive from the zone

(1) Try the following in the global zone (GZ):

/etc/init.d/volmgt start

Make sure the following SMF services are online in the non-global zone (LZ) as well:

svcadm enable svc:/network/rpc/bind:default
svcadm enable svc:/network/rpc/smserver:default

If the previous commands don't succeed, check the following:
run prtconf to see whether the "sd" driver is attached to the cdrom.
If not, try rem_drv sd followed by add_drv sd to see whether sd can be attached to the cdrom.
If the attach fails, you have a problem: no matter what you try, your cdrom cannot be mounted at all.

The bottom line: the "sd" driver must be attached to the cdrom hardware.


(2) To add a CD-ROM:

run zonecfg and add the following statements:

add fs
set dir=/cdrom
set special=/cdrom
set type=lofs
set options=[nodevices]
end

report system configuration with mdb

Run as root

echo "::prtconf" | mdb -k


This reports the configuration of all devices.

Wednesday, June 14, 2006

T1 and e1000g

Enabling e1000g
==============

The Ontario motherboard has an Intel Ophir chip that can be used with either the ipge or the e1000g network driver.
The factory default driver is typically ipge. If you want to exercise the e1000g driver instead of
ipge, follow these steps.

How to switch to e1000g driver from factory default ipge driver:
========================================================
1) Edit /etc/rc2.d/S99bench: plumb e1000g and comment out the plumbing of ipge,
using the command ifconfig e1000g0 plumb

2) In a separate window on the same machine, run prtconf -pv and see which
compatible vendor IDs are shown for the network interface.
You will see entries such as :
>>
compatible: 'pciex8086,105e.108e.105e.6' + 'pciex8086,105e.108e.105e' +
'pciex8086,105e.6' + 'pciex8086,105e' + 'pciexclass,020000' +
'pciexclass,0200'
<<

Check the /etc/driver_aliases file for ipge entries. You will see values such as:
ipge "pciex8086,105e"
ipge "pciex8086,105f"
ipge "pci8086,105e"
ipge "pci8086,105f"

Check whether any of these vendor IDs match the vendor-product IDs already listed
for the e1000g driver. Back up /etc/driver_aliases to /etc/driver_aliases.ipge,
then replace every ipge in /etc/driver_aliases with e1000g.

3) Back up the original /etc/path_to_inst file to path_to_inst.ipge.
Now change all the ipge entries in path_to_inst to e1000g.
Note: the port numbers change too.

port 1 e1000g --> port 0 ipge
port 3 e1000g --> port 1 ipge
port 0 e1000g --> port 2 ipge
port 2 e1000g --> port 3 ipge

Check out the diff below and edit your path_to_inst accordingly.

testmachine> diff path_to_inst path_to_inst.ipge
10,11c10,11
< "/pci@780/pci@0/pci@1/network@0" 1 "e1000g"
< "/pci@780/pci@0/pci@1/network@0,1" 3 "e1000g"
---
> "/pci@780/pci@0/pci@1/network@0" 0 "ipge"
> "/pci@780/pci@0/pci@1/network@0,1" 1 "ipge"
18,19c18,19
< "/pci@7c0/pci@0/pci@1/network@0" 0 "e1000g"
< "/pci@7c0/pci@0/pci@1/network@0,1" 2 "e1000g"
---
> "/pci@7c0/pci@0/pci@1/network@0" 2 "ipge"
> "/pci@7c0/pci@0/pci@1/network@0,1" 3 "ipge"
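The instance renumbering shown in that diff can be scripted too. This is a sketch (the helper name is made up) that applies the port mapping above to path_to_inst-style lines:

```shell
# remap_instance reads path_to_inst lines on stdin and converts ipge
# entries to e1000g, renumbering instances per the mapping above
# (ipge 0 -> e1000g 1, 1 -> 3, 2 -> 0, 3 -> 2); other lines pass through.
remap_instance() {
  awk '$3 == "\"ipge\"" {
         map[0] = 1; map[1] = 3; map[2] = 0; map[3] = 2
         print $1, map[$2], "\"e1000g\""
         next
       }
       { print }'
}

# Intended use (as root):
#   cp /etc/path_to_inst /etc/path_to_inst.ipge
#   remap_instance < /etc/path_to_inst.ipge > /etc/path_to_inst
```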

4) Copy /etc/hostname.ipge2 to /etc/hostname.ipge2.bak,
then rename /etc/hostname.ipge2 to /etc/hostname.e1000g0

5) Reboot the machine

6) When the machine comes up now, run ifconfig -a. You should be able to see
e1000g0
7) Check the inet and netmask for e1000g0 and correct it if necessary

8) Plumb the other ports and set inet addresses and netmasks for them too, using:
#ifconfig e1000gN inet <ip-address> netmask <netmask> up
9) Set up the default gateway (get it from netstat -nr):
#route add default <gateway>

10) Check cables and make sure leds are green
You should now be able to ping through e1000g on all the interfaces


How to use a new e1000g driver on factory installed Ontario:
=============================================================

Obtain the latest e1000g driver files: e1000g and e1000g.conf
(You may want to contact the e1000g driver team)


1) Copy driver and conf file.
copy e1000g binary to /kernel/drv/sparcv9/
copy e1000g.conf file to /kernel/drv/

2) Backup /etc/driver_aliases to /etc/driver_aliases.ipge

3) Modify /etc/driver_aliases

4) Replace all the ipge entries with e1000g.

5) Backup /etc/path_to_inst to /etc/path_to_inst.ipge

6) Modify /etc/path_to_inst, replacing ipge with e1000g.
Note: the port numbers change too.
The port for ipge1 becomes e1000g3, and ipge2 becomes e1000g0.

7) Modify /etc/name_to_major: add a line at the end, such as "e1000g 267".
(Use the next major number after the highest one already in the file; 267 is just an example.)

8) Run: touch /reconfigure

9) cp /etc/hostname.ipge2 /etc/hostname.e1000g0
10) Edit /etc/rc2.d/S99bench to plumb e1000g and comment out ipge.
11) Reboot machine.

12) When the machine comes up now, run ifconfig -a. You should be able to see
e1000g entry

13) Check the inet and netmask for e1000g0 and correct it if necessary

14) Plumb the other ports and set inet addresses and netmasks for them too, using:
#ifconfig e1000gN inet <ip-address> netmask <netmask> up

15) Set up the default gateway (get it from netstat -nr):
#route add default <gateway>

16) Check cables and make sure leds are green
You should now be able to ping through e1000g on all the interfaces

T1 CPI

(7) The T1 hides instruction latency by overlapping execution across threads, so its memory model determines how effective the chip is, with or without memory stalls, across the memory
hierarchy. The empirical data set indicates the problem size; the hidden factors contributing to CPU efficiency require further investigation.

(6) Someone may agree. A CPI of >= 4 as measured on a Niagara would indicate linear thread scaling, but the same data found on a USIII would not necessarily lead to the same conclusion: 1 to 2 CPI on a USIII could actually be >= 4 CPI on a T1 because of the USIII's superscalar design. The amount of thread-level parallelism in an instruction stream is somewhat limited. So the real question is: can a *realistic* instruction stream have enough stalling instructions to give > 4 CPI on a T1 but closer to 1 CPI on a USIII, with the USIII's superscalar issue masking the stalls? I'm not so convinced that this is true. But data would be good.

(5) Someone questioned why a CPI > 4 (as seen on a USIII) is required for a workload to scale linearly on T1: "most of the kernels don't meet the T1 requirement of a CPI of 4 to get thread scaling". The USIII has instruction-level parallelism, so (theoretically) a CPI greater than 4 as seen on a USIII should not be a necessary condition for linear scaling on T1.


(4) Data on which specint tests contain heavy FP. I don't have the data, but I suspect the twolf test also has decent FP, as it's another place-and-route test like vpr. Eon is a graphics visualization test. Also see Brian's comments about Niagara's CPI for specint, indicating that even if you discount the performance on the FP-heavy workloads, specint still won't do as well as a "real world" workload with an average CPI > 4.

(3) the SPECint_rate FP data set

Percent fp...
vpr dataset 1 -> 5.6%
    dataset 2 -> 8%
eon dataset 1 -> 15.9%
    dataset 2 -> 15.2%
    dataset 3 -> 16.3%

Only 0.1% of the instructions in eon are sqrt, so fixing
sqrt will help single core, but not significantly change
the rate result.

CPI
gzip ds1 -> 1.14
ds2 -> 1.10
ds3 -> 0.97
ds4 -> 0.97
ds5 -> 1.17
vpr ds1 -> 1.34
ds2 -> 2.37
gcc ds1 -> 2.19
ds2 -> 1.37
ds3 -> 1.50
ds4 -> 1.64
ds5 -> 1.46
mcf ds1 -> 5.81
crafty ds1 -> 1.00
eon ds1 -> 1.20
ds2 -> 1.25
ds3 -> 1.30
perlbmk ds1 -> 1.27
ds2 -> 1.08
ds3 -> 1.74
ds4 -> 1.03
ds5 -> 1.07
ds6 -> 1.04
ds7 -> 1.06
gap ds1 -> 1.46
vortex ds1 -> 1.38
ds2 -> 1.24
ds3 -> 1.39
bzip2 ds1 -> 1.11
ds2 -> 0.91
ds3 -> 0.97
twolf ds1 -> 1.94

(data collected by Darryl Gove on a US3 1056MHz system)

So, for this benchmark, most of the kernels don't meet the T1
requirement of a CPI of 4 to get thread scaling. That, along
with a single-issue processor, makes it impossible to get good
numbers on this benchmark.

So, the problem really is, int_rate doesn't stall on memory enough
for the T1 processor.
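As a quick sanity check on that claim, the CPI values copied from the table above can be counted to see how many datasets actually reach the CPI >= 4 threshold:

```shell
#!/bin/sh
# Count how many of the measured datasets reach the CPI >= 4 that T1
# reportedly needs for linear thread scaling. The numbers are copied
# from the table above (US3 1056MHz data, collected by Darryl Gove).
count_over_4() {
  printf '%s\n' 1.14 1.10 0.97 0.97 1.17 1.34 2.37 2.19 1.37 1.50 \
      1.64 1.46 5.81 1.00 1.20 1.25 1.30 1.27 1.08 1.74 1.03 1.07 \
      1.04 1.06 1.46 1.38 1.24 1.39 1.11 0.91 0.97 1.94 |
  awk '$1 >= 4 { n++ } END { print n+0 }'
}

count_over_4    # only mcf (5.81) crosses the threshold
```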

I noticed a reply from you to niagara-interest saying that, according to folks at SAE, specint is ~20% floating point. Do you know where I might be able to find a breakdown of FP % for each of the 12 benchmarks?

I have a partner that uses specint results to compare platforms internally and is doing some Niagara testing. They're aware that specint contains floating point instructions and are willing to take suggestions from us on how the different benchmarks should be weighted to emphasize integer performance (I'm hoping, of course, that there are some benchmarks with little to no FP).

D consumer and libdtrace API

How do I use the libdtrace APIs to interact with the DTrace
subsystem? I just want to use certain methods from within
a C program.


Given the user-land system calls involved, which D consumer did you
have in mind for callback invocation from your probes?
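A minimal consumer sketch, assuming a Solaris system with the libdtrace headers installed (link with -ldtrace); the D program and the bare-bones callbacks here are illustrative only, and a real consumer would decode the records it is handed:

```c
/*
 * Minimal libdtrace consumer sketch (assumption: Solaris + libdtrace).
 * Compile with: cc consumer.c -ldtrace
 */
#include <stdio.h>
#include <dtrace.h>

/* Called once per probe firing; a real consumer would inspect data. */
static int
chew(const dtrace_probedata_t *data, void *arg)
{
	return (DTRACE_CONSUME_THIS);
}

/* Called once per record; a NULL rec marks the end of the data. */
static int
chewrec(const dtrace_probedata_t *data, const dtrace_recdesc_t *rec,
    void *arg)
{
	if (rec == NULL)
		return (DTRACE_CONSUME_NEXT);
	return (DTRACE_CONSUME_THIS);
}

int
main(void)
{
	int err;
	dtrace_hdl_t *dtp = dtrace_open(DTRACE_VERSION, 0, &err);

	if (dtp == NULL) {
		fprintf(stderr, "dtrace_open: %s\n",
		    dtrace_errmsg(NULL, err));
		return (1);
	}
	(void) dtrace_setopt(dtp, "bufsize", "4m");
	(void) dtrace_setopt(dtp, "aggsize", "4m");

	/* Compile and enable a trivial D program. */
	dtrace_prog_t *prog = dtrace_program_strcompile(dtp,
	    "syscall:::entry { @counts[probefunc] = count(); }",
	    DTRACE_PROBESPEC_NAME, 0, 0, NULL);
	dtrace_proginfo_t info;

	if (prog == NULL || dtrace_program_exec(dtp, prog, &info) == -1 ||
	    dtrace_go(dtp) != 0) {
		fprintf(stderr, "%s\n",
		    dtrace_errmsg(dtp, dtrace_errno(dtp)));
		dtrace_close(dtp);
		return (1);
	}

	/* Consume one interval of trace data, then print the aggregation. */
	dtrace_sleep(dtp);
	(void) dtrace_work(dtp, stdout, chew, chewrec, NULL);
	(void) dtrace_stop(dtp);
	(void) dtrace_aggregate_print(dtp, stdout, NULL);
	dtrace_close(dtp);
	return (0);
}
```

This mirrors the open/compile/exec/go/work loop that dtrace(1M) itself uses; it must run with sufficient privileges.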