Thursday, December 07, 2006

Friday, November 10, 2006

Hash Distribution Algorithm

For a hash-based distribution algorithm, the hardest part is choosing a hash function that avoids re-distribution. To reduce entry re-distribution, it is also possible to configure the proxy upfront with a fixed maximum number of distribution "slots", say 10, and then associate multiple slots with each service instance.

S1,S2,S3 --> SVC1
S4,S5,S6 --> SVC2
S7,S8,S9,S10 --> SVC3

When the number of entries stored on SVC1 exceeds some limit, a new service instance is set up (say SVC4), one (or more) of the slots formerly managed by SVC1 is moved to SVC4 (the content of SVC1 is re-distributed between SVC1 and SVC4), and the proxy configuration is changed accordingly, for instance

S1,S2 --> SVC1
S3 --> SVC4
S4,S5,S6 --> SVC2
S7,S8,S9,S10 --> SVC3

This does not eliminate the re-distribution problem, but it makes it much easier to deal with: the number of entries to re-distribute is much smaller than in a configuration where the maximum number of slots was not planned upfront.
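The slot scheme above can be sketched in a few lines of Python; the hash function and slot count here are arbitrary illustration choices, not a prescription:

```python
import hashlib

# Keys hash into a fixed number of slots; slots (not keys) are assigned
# to service instances. Moving a slot re-distributes only that slot's
# entries, leaving every other key's placement untouched.
NUM_SLOTS = 10

def slot_for(key: str) -> int:
    """Map a key to a stable slot, independent of the number of services."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SLOTS

# Slot-to-service table kept in the proxy configuration.
slot_map = {0: "SVC1", 1: "SVC1", 2: "SVC1",
            3: "SVC2", 4: "SVC2", 5: "SVC2",
            6: "SVC3", 7: "SVC3", 8: "SVC3", 9: "SVC3"}

def service_for(key: str) -> str:
    return slot_map[slot_for(key)]

# When SVC1 fills up, hand one of its slots to a new instance: only the
# entries that hash into slot 2 have to move.
slot_map[2] = "SVC4"
```

Because `slot_for` never looks at the service list, adding SVC4 changes the mapping only for the re-assigned slot.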

sunkeyvalue

Which I think is not an option in that case, since the sunkeyvalue attribute is meant to be a generic placeholder for any key/value pair and needs to be multi-valued.

The other issue: the value of this attribute being XML, it contains special characters that force the value to be base64-encoded in LDIF (and thus in the DB representation of the entry). This increases the size of the value by about a third, and thus the amount of data to write.
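A quick check of the base64 overhead (the payload below is made up, not a real sunkeyvalue):

```python
import base64

# Base64 emits 4 output bytes for every 3 input bytes, so the encoded
# value is at least 33% larger than the raw data (plus padding).
raw = b"<sunkeyvalue>example</sunkeyvalue>" * 100
encoded = base64.b64encode(raw)
ratio = len(encoded) / len(raw)
print(ratio)  # ~1.33
```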

Directory Server 6 made serious improvements for this use case, and the replication historical information will be lighter than with 5.2...

DN Binding with Empty Password

Binding with a DN and an empty password is a valid LDAP operation per the
LDAPv3 specification (RFC 2251); it results in the user being
identified but not authenticated and not authorized...
The result is that the bind is successful, but the connection is treated
as anonymous.

Note that this behavior is now discouraged in RFC 4513 and Directory
Server 6 has a configuration parameter to accept or reject these requests.

Tuesday, November 07, 2006

DS Instance Life cycle

The "Disorderly shutdown" message is logged when DS starts and does not find the guardian file that DS writes when it closes the database properly.

The server then starts and opens the database in recovery mode. If it doesn't start at all, without even entering recovery, the config file (dse.ldif) may be corrupted. There should be two other copies of dse.ldif in the config directory: dse.ldif.bak (the previous version) and dse.ldif.startok (the last one used to start the server successfully). A working dse.ldif can be rebuilt from these files.

If it does go through recovery mode but fails to recover the database, then you're in trouble: either the DB files or the transaction log file are corrupted.

One way to quickly recover the server is to take a backup of another server with the same configuration (another master) and restore it on this server.

Friday, October 27, 2006

Configuring Multiple NIC interfaces in one Zone

When configuring your zone, just "add net" for each device as below (you
could certainly access multiple disks this way too, but there are still
unresolved issues):

zonecfg -z myzone
create
set zonepath=/zfspool/fs/myzone
set autoboot=true
############### see below ###########
add net
set address=129.148.20.2
set physical=ipge1
end
add net
set address=129.148.30.2
set physical=ipge2
end
################ see above ##############
....
verify
commit

Tuesday, October 24, 2006

Sun Fire System Auto Reboot

At the OBP prompt, use setenv
ok> setenv auto-boot? true

On Solaris, use eeprom(1M)
% eeprom auto-boot?=true

Saturday, October 14, 2006

ZFS Ignores fsflush

ZFS ignores the fsflush. Here's a snippet of the code in zfs_sync():

	/*
	 * SYNC_ATTR is used by fsflush() to force old filesystems like UFS
	 * to sync metadata, which they would otherwise cache indefinitely.
	 * Semantically, the only requirement is that the sync be initiated.
	 * The DMU syncs out txgs frequently, so there's nothing to do.
	 */
	if (flag & SYNC_ATTR)
		return (0);

However, for a user-initiated sync(1M) or sync(2), ZFS does force
all outstanding data/transactions synchronously to disk.
This goes beyond the requirement of sync(2), which only says I/O is
initiated but not waited on (i.e. asynchronous).
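From an application's point of view, the way to ask for the stronger guarantee on purpose is fsync, which ZFS (like other filesystems) honors synchronously. A generic illustration, not ZFS-specific:

```python
import os
import tempfile

# sync(2) merely initiates the writing of dirty data; fsync(3C) does not
# return until the file's data has been pushed to stable storage.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"critical record\n")
    os.fsync(fd)          # block until the data is on disk
finally:
    os.close(fd)

with open(path, "rb") as f:
    print(f.read())       # b'critical record\n'
os.unlink(path)
```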

Wednesday, October 11, 2006

Mount ISO file on Solaris

- make the ISO image file available as a block device with
lofiadm(1M), e.g.

# lofiadm -a /var/tmp/sol-10-u1-companion-ga.iso
/dev/lofi/1

- mount the block device, e.g.

# mount -r -F hsfs /dev/lofi/1 /mnt

- when you're done, unmount the file system and delete the device with

# lofiadm -d /dev/lofi/1

JVM tunables for JVM on x410

Java HotSpot(TM) 32-bit Server VM on Windows, version 1.5.0_06

http://www.spec.org/jbb2005/results/res2006q1/jbb2005-20060117-00061.txt

Java HotSpot(TM) 32-bit Server VM on Solaris, version 1.5.0_08

http://www.spec.org/jbb2005/results/res2006q2/jbb2005-20060512-00112.txt

Monday, October 09, 2006

ZFS on Solaris 11

Find disk and slice

format --> select disk --> partition --> print


# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
0. c0t0d0
/pci@0,600000/pci@1/pci@8/pci@0/scsi@1/sd@0,0
1. c0t1d0
/pci@0,600000/pci@1/pci@8/pci@0/scsi@1/sd@1,0
Specify disk (enter its number): 1
selecting c0t1d0
[disk formatted]


FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
! - execute , then return
quit
format> partition


PARTITION MENU:
0 - change `0' partition
1 - change `1' partition
2 - change `2' partition
3 - change `3' partition
4 - change `4' partition
5 - change `5' partition
6 - change `6' partition
7 - change `7' partition
select - select a predefined table
modify - modify a predefined partition table
name - name the current table
print - display the current table
label - write partition map and label to the disk
! - execute , then return
quit

partition> print
Current partition table (original):
Total disk cylinders available: 14087 + 2 (reserved cylinders)

Part Tag Flag Cylinders Size Blocks
0 root wm 2 - 1135 5.50GB (1134/0/0) 11539584
1 swap wu 1155 - 2309 5.60GB (1155/0/0) 11753280
2 backup wm 0 - 14086 68.35GB (14087/0/0) 143349312
3 unassigned wm 2310 - 3464 5.60GB (1155/0/0) 11753280
4 unassigned wm 3465 - 4619 5.60GB (1155/0/0) 11753280
5 unassigned wm 4620 - 5774 5.60GB (1155/0/0) 11753280
6 unassigned wm 5775 - 12931 34.73GB (7157/0/0) 72829632
7 home wm 12932 - 14086 5.60GB (1155/0/0) 11753280

(2) use c0t1d0s6 for zfs

(3) create a pool on the device

# zpool create ktspool c0t1d0s6

(4) list the pool
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
ktspool 34,5G 33,5K 34,5G 0% ONLINE -
(5) check pool status
# zpool status
pool: ktspool
state: ONLINE
scrub: none requested
(6) the ktspool file system was created; verify the file system

# df -kh
Filesystem size used avail capacity Mounted on
/dev/dsk/c0t0d0s0 9,8G 3,6G 6,2G 37% /
/devices 0K 0K 0K 0% /devices
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 14G 1,1M 14G 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
fd 0K 0K 0K 0% /dev/fd
swap 14G 8K 14G 1% /tmp
swap 14G 48K 14G 1% /var/run
/dev/dsk/c0t0d0s7 50G 56M 49G 1% /export/home
ktspool 34G 9K 34G 1% /ktspool

(7) create a new file system as ktspool/kts

zfs create ktspool/kts

(8) verify the file system creation

# df -kh
Filesystem size used avail capacity Mounted on
/dev/dsk/c0t0d0s0 9,8G 3,6G 6,2G 37% /
/devices 0K 0K 0K 0% /devices
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 14G 1,1M 14G 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
fd 0K 0K 0K 0% /dev/fd
swap 14G 8K 14G 1% /tmp
swap 14G 48K 14G 1% /var/run
/dev/dsk/c0t0d0s7 50G 56M 49G 1% /export/home
ktspool 34G 9K 34G 1% /ktspool
ktspool/kts 34G 9K 34G 1% /ktspool/kts

(9) change the mount point of the zfs file system to /kabirazfs

zfs set mountpoint=/kabirazfs ktspool/kts

(10) verify the new mounted point

# df -kh
Filesystem size used avail capacity Mounted on
/dev/dsk/c0t0d0s0 9,8G 3,6G 6,2G 37% /
/devices 0K 0K 0K 0% /devices
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 14G 1,1M 14G 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
fd 0K 0K 0K 0% /dev/fd
swap 14G 8K 14G 1% /tmp
swap 14G 48K 14G 1% /var/run
/dev/dsk/c0t0d0s7 50G 56M 49G 1% /export/home
ktspool 34G 9K 34G 1% /ktspool
ktspool/kts 34G 9K 34G 1% /kabirazfs


You can also see that /kabirazfs is created under "/"


(11) watch pool I/O statistics: zpool iostat -v 5


(12) check the zfs property settings; for example, compression is disabled

# zfs get all ktspool/kts
NAME PROPERTY VALUE SOURCE
ktspool/kts type filesystem -
ktspool/kts creation lun oct 9 19:08 2006 -
ktspool/kts used 9,50K -
ktspool/kts available 34,2G -
ktspool/kts referenced 9,50K -
ktspool/kts compressratio 1.00x -
ktspool/kts mounted yes -
ktspool/kts quota none default
ktspool/kts reservation none default
ktspool/kts recordsize 128K default
ktspool/kts mountpoint /kabirazfs local
ktspool/kts sharenfs off default
ktspool/kts checksum on default
ktspool/kts compression off default
ktspool/kts atime on default
ktspool/kts devices on default
ktspool/kts exec on default
ktspool/kts setuid on default
ktspool/kts readonly off default
ktspool/kts zoned off default
ktspool/kts snapdir hidden default
ktspool/kts aclmode groupmask default
ktspool/kts aclinherit secure default

OpComm2006

Friday, October 06, 2006

Classic Relational Algebra Algorithm

The algebra on sets of tuples, or relations, can be used to express typical queries about those relations: (1) union (2) set difference (3) Cartesian product (4) selection (5) projection (6) aggregation (7) renaming.


The full list of operations: set operations (union, intersection, difference), selection, projection, Cartesian product, natural join, theta-join, renaming, duplicate elimination, aggregation, grouping, sorting, extended projection, outer join (natural, left, right).


Intersection, theta-join and natural join are dependent operations (derivable from the others).
Union, difference, product, selection, projection and renaming are independent operations.
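The independent operators are easy to sketch on relations modeled as Python sets of tuples; intersection, being dependent, is derived from difference (the relations below are made up):

```python
# Two toy relations as sets of tuples.
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (4, "d")}

union = R | S                             # union
difference = R - S                        # set difference
product = {r + s for r in R for s in S}   # Cartesian product
selection = {t for t in R if t[0] > 1}    # selection (sigma)
projection = {(t[1],) for t in R}         # projection (pi)

# Intersection is a dependent operation: R ∩ S = R - (R - S)
intersection = R - (R - S)
print(intersection)  # {(2, 'b')}
```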

Tuesday, October 03, 2006

R&D Conference Presentation in China

Software R&D is booming in China. People respect R&D there.

NFS Service in NGZ

You cannot run an NFS server in a non-global zone (NGZ), but there is a way to get NFS-like service in an NGZ:

You can make the GZ an NFS server and use a loopback mount from the GZ into the NGZ instead; this simulates NFS. Since it is a loopback, it should be faster and more efficient than NFS. Just something to ponder; it might work. If your requirement is to put the NFS server itself in an NGZ, you cannot do that today. Maybe it will be addressed in an update, depending on demand.

Monday, October 02, 2006

Denial of Service on X.509

(1)
Vulnerability Note VU#423396
X.509 certificate verification may be vulnerable to resource exhaustion:
http://www.kb.cert.org/vuls/id/423396
(2)
NISCC Vulnerability Advisory
729618/NISCC/PARASITIC-KEYS
Denial-of-Service Condition Affecting X.509 Certificates Verification:
http://www.niscc.gov.uk/niscc/docs/re-20060928-00661.pdf?lang=en
(3)
After x unsuccessful logins, it is possible to deactivate the account.
But is it possible to send an email to an administrator when the account is deactivated?
(4)
DS uses the NSS library (Mozilla), which is listed as
not vulnerable in the 729618/NISCC/PARASITIC-KEYS document

Thursday, September 28, 2006

Database Research

(7) Parallelism is fine for traditional small and mid-size data sets. However, for large data sets it may be overkill to parallelize each sub-query under a large volume of concurrent accesses.
(8) How do we speed up access to archived data, and does parallelization help? Does archiving have indexing? If not, a sequential scan is required; can that be parallelized?
(9) How do we move large amounts of data through the memory hierarchy of parallel computers?
(10) Future systems need to deal with searches where part of the data comes from archives.
(11) Current data storage is used as a read/write cache. A new algorithm is required for three-level buffer management.
(12) The current transaction model is good for short transactions. For long-running transactions, we need an entirely new approach to handle data integrity and recovery.
(13) Space-efficient algorithms for versioning, and a configuration model for the DB to handle versions of objects.
(14) Extend existing data models to include much more semantic information about the data.
(15) Browsing with interrogation: the nature of the process that merges data in heterogeneous and distributed databases.
(16) Current distributed DBMS algorithms for query processing, concurrency control and multiple-copy support were designed for a few sites. They must be rethought for 1,000 or 10,000 sites.
(17) Local caching and local replication of remote data become important; efficient cache maintenance is an open problem.

Zero Page Login and manual SSO login

Implementing a POC for a customer. For this POC we're trying to automate the complete authentication process within AM. We've written a servlet that is deployed in the same war file (and context) as Access Manager and that handles authentication (using com.sun.identity.authentication.AuthContext) and creates the token (using com.iplanet.sso.SSOTokenManager), so we don't redirect to /UI/Login if the SSOToken is invalid!

After establishing the session we want to redirect the user to a site that is protected by a policy agent (using response.redirect(targetUrl)). However, SSO fails and a user needs to authenticate again. It seems that the normal AM cookies (iPlanetDirectoryPro - created when you login using /UI/Login) are not automatically created.

One final thing: setup is okay - we did sanity checks using policy agents and that works fine.

Questions:
1. Can some give me some hints and tips on how to create a valid session, SSO token and the according cookies using just the API?

The expected usage of this kind of flow is ideally through a policy agent
protecting a resource, which detects the missing SSOToken and
authenticates on its own. It looks like you are trying to do that
automatically, without user intervention. In that case you can use
zero page login (more details in the auth arch document, pages 24-26),
so you don't have to worry about setting domain cookies etc.

In your approach you would have to set the cookie yourself on the
response. Sample code to do that might look like:

try {
    ServiceSchemaManager scm = new ServiceSchemaManager(
        "iPlanetAMPlatformService", token);
    ServiceSchema platformSchema = scm.getGlobalSchema();
    Set cookieDomains = (Set) platformSchema.getAttributeDefaults()
        .get("iplanet-am-platform-cookie-domains");
    String value = token.getTokenID().toString();
    String cookieName = SystemProperties.get(
        "com.iplanet.am.cookie.name");

    // Host-only cookie (no explicit domain).
    response.addCookie(CookieUtils.newCookie(cookieName, value, "/"));

    // One cookie per configured cookie domain.
    Iterator iter = cookieDomains.iterator();
    while (iter.hasNext()) {
        String cookieDom = (String) iter.next();
        Cookie cookie = com.iplanet.services.util.CookieUtils.newCookie(
            cookieName, value, "/", cookieDom);
        response.addCookie(cookie);
        // setlbCookie() is a site-specific helper that builds the
        // load-balancer cookie for the domain, if one is needed.
        Cookie loadBalancerCookie = setlbCookie(cookieDom);
        if (loadBalancerCookie != null) {
            response.addCookie(loadBalancerCookie);
        }
    }
} catch (Exception e) {
    // At minimum, log the failure instead of swallowing it.
}

JES MF Reference

Even though it is technical, a good starting point is the JES-MF engineering site at
http://twiki.france/twiki/bin/view/JESMF20/WebHome

JES UWC Health check via Layer 7 Switch

In a typical JES Communications Suite installation, we install Communications Express (also known as UWC) to provide a web interface for Mail, Calendar and Address Books.

UWC is a web application running in a web server, and UWC relies on the HTTP interface provided by the Messaging Server (not part of the web server, but a specific daemon: mshttpd) to display some pages.
Both processes bind to the same IP address, but UWC uses port 80 and mshttpd uses port 81.

The problem I'm facing is how to link these two applications in the N2120 configuration. If mshttpd is down, users are still redirected to the running UWC on the same box, but because mshttpd is not running, some pages (after login) cannot be displayed, and the browser shows "Bad Gateway. Processing of this request was delegated to a server not functioning properly".

To Do:

The problem lies in the fact that UWC returns HTTP status code 200 OK. What you need to do is create a check that looks for a certain string, instead of the HTTP status code.
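The idea of a content-based health probe, sketched in Python (the marker string and function names here are hypothetical; pick a string that only appears when mshttpd is actually serving pages):

```python
# The probe passes only if the page body contains a marker string that
# the back end (mshttpd) must render, not merely if UWC answers 200 OK.
HEALTH_MARKER = b"Communications Express"

def probe_ok(status: int, body: bytes) -> bool:
    """Healthy only when the page both loads and contains the marker."""
    return status == 200 and HEALTH_MARKER in body

# UWC up but mshttpd down: status is still 200, but the body is an
# error page, so the probe correctly fails.
print(probe_ok(200, b"Bad Gateway. Processing of this request..."))  # False
print(probe_ok(200, b"<title>Communications Express</title>"))       # True
```

On the switch side, the same logic is usually expressed as a "content match" or "receive string" in the health-check definition.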

How to install another JDK instance with a strong encryption policy

* downloaded JDK 1.4.2 from
http://java.sun.com/j2se/1.4.2/SAPsite/download.html (64bit)

* unpack to /opt

* create a softlink from /opt/j2sdk1.4.2 to /opt/java1.4

* installed the policy manually in /opt/java1.4

* mount /opt as lofs

* start sapinst

Sapinst will detect that the policy is already there and will not try to
install it again.

Wednesday, September 27, 2006

JDK access issue from sparse zone

In the global zone there is already a copy of the JDK installed (by default
in Solaris 10), and all the java links are set up properly in /usr.
However, as this is a sparse zone, /usr is inherited, i.e. read-only.
Installing the JDK anywhere in the sparse zone, while it solves the problem,
still requires the user to change the appropriate links/PATHs/etc. to
ensure the right JDK gets called.

Sunday, September 24, 2006

USDT for JavaScript

JavaScript with DTrace


http://blogs.sun.com/brendan/entry/dtrace_meets_javascript

System vendor configuration: CRITICAL vs OPTIMAL

It is not limited to disk; it applies to any key
performance measurement within a system.

As a system vendor, we need to consider ISV and
even end-user vertical workloads, system architecture,
and deployment considerations from a data-center
operations point of view. Doing so lets us make
a realistic assessment of total cost of ownership
at the end point. That is good for competitive analysis
at the end point and for architecture selection at the
end-user level.

However, I wonder whether a system vendor needs to
implement an end-to-end HW configuration, or should
stay at a critical point and leave the further specific
HW and SW HA deployment as alternatives.

Specifically, I tend to think we need to provide CRITICAL
rather than OPTIMAL configurations, in order to leave
flexibility and headroom for the end deployment to make
a choice.

Regarding CRITICAL vs OPTIMAL, we can classify the
default configurations so that systems meet customer
demands.

Saturday, August 26, 2006

Workload characterization

Cluster analysis, such as k-means and minimum spanning tree methods, categorizes the natural groups of a workload for performance modeling and capacity planning.
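A toy k-means sketch on made-up workload samples (the metrics and data below are hypothetical, just to show the grouping idea):

```python
import random

# Toy k-means on 2-D workload samples (e.g. CPU%, I/O rate) to find the
# natural groups mentioned above.
def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Recompute each center as the mean of its cluster.
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Two obvious workload groups: CPU-bound vs I/O-bound samples.
data = [(90, 5), (85, 10), (95, 8), (10, 80), (15, 90), (5, 85)]
centers, clusters = kmeans(data, 2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

The recovered centers sit near the two natural groups, which is exactly what one would feed into a performance model or capacity plan.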

Thursday, August 24, 2006

T2000 interrupt bound

I have a T2000 server that experiences low CPU
usage. With the load generated, CPU usage is
always 25% and interrupts are handled by a fixed
processor, number 24. Of the 64 threads of the
user-land process (a single multi-threaded JVM process),
only thread number 33 consumes system and user-land
resources; all the other threads are in LOCK mode,
and many VCX (voluntary context switches) happen on those processors.

I have all the ipge settings in /etc/system, and all
the IP module settings in place. I am trying to verify
whether it is a system-specific issue; that's why I am
looking for a different system to test on.

Thursday, August 10, 2006

Disk Management on Solaris

(1) see how many disks there are

AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0
1. c3t1d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@1,0
Specify disk (enter its number):


There are 2 disks

(2) see how the disks are used

# df -kh
Filesystem size used avail capacity Mounted on
/dev/dsk/c3t0d0s0 11G 7.8G 2.7G 75% /
/devices 0K 0K 0K 0% /devices
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 5.6G 1008K 5.6G 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
/platform/sun4v/lib/libc_psr/libc_psr_hwcap1.so.1 11G 7.8G 2.7G 75% /platform/sun4v/lib/libc_psr.so.1
/platform/sun4v/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1 11G 7.8G 2.7G 75% /platform/sun4v/lib/sparcv9/libc_psr.so.1
fd 0K 0K 0K 0% /dev/fd
/dev/dsk/c3t0d0s5 5.8G 3.0G 2.7G 53% /var
swap 6.7G 1.1G 5.6G 16% /tmp
swap 5.6G 48K 5.6G 1% /var/run
/dev/lofi/1 330M 330M 0K 100% /tmp/s10install

you can see that only disk 0 (c3t0d0) is in use


(3) see how the used disk is partitioned


*# format*
Searching for disks...done


AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0
1. c3t1d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@1,0
*Specify disk (enter its number): 0*
selecting c3t0d0
[disk formatted]
Warning: Current Disk has mounted partitions.
/dev/dsk/c3t0d0s0 is currently mounted on /. Please see umount(1M).
/dev/dsk/c3t0d0s1 is currently used by swap. Please see swap(1M).
/dev/dsk/c3t0d0s5 is currently mounted on /var. Please see umount(1M).


FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
! - execute , then return
quit
*format> partition*

PARTITION MENU:
0 - change `0' partition
1 - change `1' partition
2 - change `2' partition
3 - change `3' partition
4 - change `4' partition
5 - change `5' partition
6 - change `6' partition
7 - change `7' partition
select - select a predefined table
modify - modify a predefined partition table
name - name the current table
print - display the current table
label - write partition map and label to the disk
! - execute , then return
quit
*partition> print*
Current partition table (original):
Total disk cylinders available: 14087 + 2 (reserved cylinders)

Part Tag Flag Cylinders Size Blocks
0 root wm 403 - 2616 10.74GB (2214/0/0) 22529664
1 swap wu 0 - 402 1.96GB (403/0/0) 4100928
2 backup wm 0 - 14086 68.35GB (14087/0/0) 143349312
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 var wm 2617 - 3824 5.86GB (1208/0/0) 12292608
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0


(4) see how the free disk is partitioned

*# format*
Searching for disks...done


AVAILABLE DISK SELECTIONS:
0. c3t0d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0
1. c3t1d0
/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@1,0
*Specify disk (enter its number): 1*
selecting c3t1d0
[disk formatted]
*format> partition*


PARTITION MENU:
0 - change `0' partition
1 - change `1' partition
2 - change `2' partition
3 - change `3' partition
4 - change `4' partition
5 - change `5' partition
6 - change `6' partition
7 - change `7' partition
select - select a predefined table
modify - modify a predefined partition table
name - name the current table
print - display the current table
label - write partition map and label to the disk
! - execute , then return
quit
*partition> print*
Current partition table (original):
Total disk cylinders available: 14087 + 2 (reserved cylinders)

Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 25 129.19MB (26/0/0) 264576
1 swap wu 26 - 51 129.19MB (26/0/0) 264576
2 backup wu 0 - 14086 68.35GB (14087/0/0) 143349312
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
* 6 usr wm 52 - 14086 68.10GB (14035/0/0) 142820160*
7 unassigned wm 0 0 (0/0/0) 0


(5) create a mount point for disk 1

create a directory $(dirName)

mount /dev/dsk/c3t1d0s6 $(dirName)


The disk is now accessible.

Monday, August 07, 2006

plockstat hits a bug on the second call

I am on a T2000 with S10 U3 bits, kernel ID Generic_118833-18

# plockstat -x aggsize=500m -x dynvarsize=200m -x bufresize=auto -p 654
^C
Mutex block

Count nsec Lock Caller
--------------------------------------------------------------------- ----------
Segmentation Fault(coredump)


The stack trace from the core file is below:

----------------- lwp# 1 / thread# 1 --------------------
ffffffff7ee39a50 strlen (100003faa, ffffffff7ffff6e8, ffffffff7eea0ed4, ffffffff7fffed99, 0, 100003fa9) + 50
ffffffff7eea4fdc snprintf (ffffffff7ffffa10, 0, 100003fa8, 0, ffffffff7ffff740, 100003000) + 88
0000000100001cd8 ???????? (1003bff20, ff2b9bec, ffffffff7ffffa10, 28, 1, 1d)
0000000100001ff4 ???????? (10040e4d8, 10040e4f8, 2, 10040e4d8, 10040e4c8, 100106990)
ffffffff7f225664 dt_aggregate_walk_sorted (10010b500, 100001de8, 0, ffffffff7f224f08, 0, 1004395c0) + a4
0000000100002e9c main (100001, 100106000, 100000, 100001000, 100107228, 100106) + ad0
000000010000159c _start (0, 0, 0, 0, 0, 0) + 17c
----------------- lwp# 2 / thread# 2 --------------------
ffffffff7eece820 _write (102, ffffffff725fbf48, 8, 0, ffffffff7e300000, 0) + c
ffffffff7f252768 dt_proc_control (0, 1fc000, 8, ffffffff725fbf48, 1, 1) + 1f0
ffffffff7eecd2d8 _lwp_start (0, 0, 0, 0, 0, 0)

Saturday, July 29, 2006

OS internal & many of many

How do OS internals address platform virtualization and abstraction, real-time and embedded systems, dependability, transaction management, availability, DSM, simulation, organism, structured design and trusted computing for core system resource abstraction, energy management, firmware enhancement, ZFS, CMT and open-standard support, streaming support, high-end computing, and so on?

For in-depth analysis, design and implementation, I may consider landing on theoretical computation, synchronous and asynchronous modeling, algorithmic operations on parallel formulations, and complexity proofs.

Thursday, July 27, 2006

IPC SystemV and POSIX IPCs

(1) semaphores, shared memory, message queues: System V and POSIX have different kernel implementations
(2) other IPC implementations, such as mmap(2), named pipes, and Solaris doors

Wednesday, July 26, 2006

Power law of data center on SOA Management

Service techniques eventually transform into management
adaptation, which in turn becomes an overall system and
OS computation.

Is this the power law illustrated in data centers?

http://searchwebservices.techtarget.com/originalContent/0,289142,sid26_gci1204593,00.html

Monday, July 24, 2006

Web2.0 & System Vendor

With respect to traditional tiered enterprise application and service analysis and design, I would like your points of view. Within Web 2.0, beyond the dominant asynchronous industrial theme of Ajax, blogs and RSS present diversified schemes. In parallel with the above, given the heuristic notation of the semantic web, in addition to mobility, Web 2.0 does make sense for conventional service providers and system vendors with machine knowledge.

Hence, the existing industrial move to Web 2.0 would enrich a subset of Grid infrastructure: the access grid, which presents the data grid to human actors. Web semantics and description rules will provide learning and interactive interfaces to service entities such as machines.


Thursday, July 13, 2006

DDoS for mobile ad hoc peer-to-peer networks

R&D tasks have been done with MVS and Tandem before. Open systems will not be the destiny, since parallelism and pipelining still motivate me. As one kind of virtualization technology, Java is appealing.

The key engineering activity is to contribute to greedy security with a minimum spanning directed graph over asymmetric communication links.

With the associative environmental setting of distributed or even parallel communications, traditional centralized attack and detection goals or utilities have been challenged to yield proximity in terms of false-positive ratio within a continuous and stochastic task environment. As for content-aware defense, the locality and uniformity of key distribution need further in-depth discovery, with mobility and wireless sensors, to extract the abstract vector of attributes in order to obtain relaxation toward that proximity.

Wednesday, July 12, 2006

Layer 3 protocol algorithms with SOA XML packets

Pertaining to the OSI reference model designed for conventional fixed infrastructure, complex networking, from small-world to scale-free established network-layer routing, proves optimal routing performance measurement. Industrial practice implements the traditional optimization achieved with the classic link-state and distance-vector algorithms.

Both proactive and reactive path-location strategies represent the industrial and engineering activities: kernel land in Solaris, user-land modules in network-layer devices such as routers and even lower-layer switches, and packet-based tunneling at critical super-nodes over a hierarchical topological deployment. In addition, an adaptive layer 3 protocol analysis and design could be considered, extending the reserved 8-bit header to enable routing control so that path scheduling and planning algorithms can interact with earlier routes. Greedy security routing could be implemented with a minimal protocol enhancement. However, holding to the classic end-to-end argument of the computer science discipline, application-layer processes hosted in user land would argue for complex learning- or reasoning-based packet routing and policy enforcement, with the trade-off of processing overhead. Furthermore, taking dynamic and mobile computing, especially ad-hoc or peer-to-peer networks, into account, with little or no infrastructure and the absence of a head node, routing from the node level and network level to the overlay level will argue for user-land models, algorithms, and protocol design and implementation.

Hence, packet-switched networks tend to hold to the co-existence of the modular design approach aligned with computer science design principles.

Thursday, June 22, 2006

Ubuntu and Solaris 10 x86 on Laptop and Desktop

Cross pollination of OpenSolaris and GNU:
http://www.gnusolaris.org/gswiki

Tuesday, June 20, 2006

Compilation tuning tips

However, general tuning is easier than this; here's what I would suggest:

1. Run at -O to establish baseline performance
2. Run at -fast -xipo=2 -xtarget=generic[64]

If there's no difference (or no significant difference) in performance, then you can stop. [But still profile the application!]

If there is a difference, then I'd evaluate performance gains due to the following flags (some combinations may be missing):

3. -xO5
4. -xO5 -xalias_level=basic (for C) compatible (for C++)
5. -xO5 -xdepend
6. -xO5 -fsimple=2 -fns -xlibmil -lmopt
7. -xO5 -xipo=2

I think that covers the bulk of the things that get enabled at -fast. Hopefully from these runs you'd be able to isolate a set of flags which gives you performance.

You might also want to look into profile feedback for codes which contain lots of branch instructions (or calls).

Obviously profiling the application (eg perhaps with spot http://cooltools.sunsource.net/spot/) will give you insights into what the actual performance issues are, and these insights can guide you to selecting appropriate compiler flags.

compiler -fast and optimization on different systems

Is Sun Studio taking a top-down or a bottom-up approach for back-end optimization? One more thing to share: I am wondering whether -fast macro expansion is NP-hard. In addition, should we further approximate the computation procedures without generating new sub-problems when running on a different underlying system, or should we consider a different algorithm to conquer the problem?

A simplification on Compiler process

1. Parsing ("front end"). Breaks down the code to elementary operations, and generates debug information. The C++ front end (but not the C front end) inlines some functions that were explicitly or implicitly declared inline. Both C and C++ can mark functions for the back end as "please generate inline."

2. Code generation ("back end"). Generates code, with variable levels of optimizing. Function inlining can occur at -xO3 by request, and at -xO4 or -xO5 even if not requested.

The -g option affects primarily the front ends. The back ends disable some optimizations when -g is in effect. The -g option disables front-end inlining by the C++ compiler; the -g0 option enables front-end inlining.

Most of the remaining optimization options primarily affect the back end, and are the same for C and C++. A few options are available only in C. You need to check the documentation for the compiler version you use, because new options are added from time to time.

The -fast option is really a macro that expands to a series of options based on the details of the system running the compiler. If you run the resulting program on a different machine, results will be sub-optimal, maybe even slower than if it were not compiled with -fast.

The -xOn options select optimizations that are useful across a range of systems.

Boot NFS from NG-Zone

Booting a zone over NFS (namely, where its root file system is on a NAS device) is not supported at the current time. There are, um... interesting workarounds available, like using lofi(7D). Boot support over NFS is definitely something we want to support in the future. Just to make it clear though: a Solaris 10 system acting as a NAS device can itself host its own non-global zones. They just need to boot off of local file systems on the system.

A classic use case for zone booting over NFS can be identified within grid computing environments,
in order to achieve resource allocation. It is critical since a zone can be considered a
resource fabric for accessing large (TB-scale) data over NFS across the underlying network. Otherwise, grid service providers need to implement fairly involved transactional, grid-enabled network file system management services. Of course, those have to be zone aware.
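For the record, the lofi(7D) workaround mentioned above could be sketched roughly as follows. This is a hypothetical, unsupported sequence; the image path, size, and zone name are placeholders.

```
# Back a zone root with a file that lives on the NAS share (all names
# are placeholders; this is exactly the unsupported territory noted above).
mkfile 4g /net/nas/zones/zone1.img      # image file on NFS
lofiadm -a /net/nas/zones/zone1.img     # prints the device, e.g. /dev/lofi/1
newfs /dev/rlofi/1                      # build a local file system on it
mount /dev/lofi/1 /zones/zone1          # mount where zonepath points
zoneadm -z zone1 install
```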

Friday, June 16, 2006

Middleware enters ESB age and SOA is ready

All middleware vendors are adopting ESB solutions.
The SOA age is ready. Will ESB be the next standard in
the Java EE stack?

Thursday, June 15, 2006

A real world problem and algorithmic analysis

Problem:
An engineer is using a Toshiba Tecra M2 laptop, and it is failing on him. The problem seems to be either the power adapter or the battery pack, but he needs to determine which before purchasing any replacement. For that, he would like to do a simple test with the existing parts.

Objective:

Fix the problem, rather than merely getting the laptop to boot and run.

Algorithm Analysis and Design:


0x00000001 --- AC Adapter
0x00000010 --- Laptop Main

Tested Unit:
Parts[0] <---- 0x00000001
Parts[1] <---- 0x00000010

Good Laptop as Instrumentation Tool:
Tool[0] <-----0x00000011
Tool[1] <-----0x00000100


Fix-Laptop(Parts, Tool) returns done
    DefectParts <---- Test-Part(Parts, Tool)
    go to the dealer for maintenance with DefectParts
    done <---- 1
    return done


Test-Part(Parts, Tool) returns DefectParts
    Success <---- Connect(Parts[1], Tool[0])
    if Success
        then DefectParts[0] <---- Parts[0]
        else Success <---- Connect(Parts[1], Tool[1])
             if Success
                 then DefectParts[0] <---- Parts[1]
                 else inspect again to ensure there is no contact issue;
                      if it persists, DefectParts <---- Parts
    return DefectParts

Solution:

It seems unicast to the local vendor would be a more optimal
algorithm than multicast over the SMTP overlay
network in terms of complexity, cost and completeness.
Since we need to go to the dealer anyway, why not do the
test there and get the fixed part right away?

CDROM access from NG-Zone

How to access cdrom drive from the zone

(1) Try the following in the global zone (GZ):

/etc/init.d/volmgt start

Make sure in LZ the following SMF services are online too:

svcadm enable svc:/network/rpc/bind:default
svcadm enable svc:/network/rpc/smserver:default

Check the following in case the previous commands don't succeed:
run prtconf to see whether the "sd" driver is attached to the cdrom.
If not, try rem_drv sd and add_drv sd to see if sd can be attached to the cdrom.
If the attach fails, then you have a problem: no matter what you try, your cdrom cannot be mounted at all.

The bottom line: the "sd" driver must be attached to the cdrom hardware.


(2) To add a CD-ROM:

run zonecfg and add the following statements:

add fs
set dir=/cdrom
set special=/cdrom
set type=lofs
set options=[nodevices]
end

report system configuration with mdb

Run as root

echo "::prtconf" | mdb -k


It will report the full device configuration.

Wednesday, June 14, 2006

T1 and e1000g

Enabling e1000g
==============

The Ontario motherboard has an Intel Ophir chip that can be used with the ipge or e1000g network driver.
Typically the factory default driver is ipge. If you want to exercise the e1000g driver instead of
the ipge, follow these steps.

How to switch to e1000g driver from factory default ipge driver:
========================================================
1) Edit /etc/rc2.d/S99bench: plumb e1000g and comment out the plumbing of ipge,
using the command ifconfig e1000g plumb.

2) In a separate window on the same machine, run, prtconf -pv and see what
compatible vendor ids are shown for the network interface
You will see entries such as :
>>
compatible: 'pciex8086,105e.108e.105e.6' + 'pciex8086,105e.108e.105e' +
'pciex8086,105e.6' + 'pciex8086,105e' + 'pciexclass,020000' +
'pciexclass,0200'
<<

Check the driver aliases file for ipge entries. You will see values such as :
ipge "pciex8086,105e"
ipge "pciex8086,105f"
ipge "pci8086,105e"
ipge "pci8086,105f"

Check if any of these vendor ids match vendor-product ids already listed
for the e1000g driver. Back up /etc/driver_aliases to /etc/driver_aliases.ipge, then replace all ipge entries with e1000g in /etc/driver_aliases.
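The backup-and-replace step can be scripted with sed. A sketch, run here against a scratch copy rather than the real /etc/driver_aliases (the two sample entries are taken from the list above):

```shell
#!/bin/sh
# Rewrite ipge entries to e1000g in a scratch copy of driver_aliases.
# On the real system, ALIASES would be /etc/driver_aliases.
ALIASES=${TMPDIR:-/tmp}/driver_aliases
cat > "$ALIASES" <<'EOF'
ipge "pciex8086,105e"
ipge "pciex8086,105f"
EOF
cp "$ALIASES" "$ALIASES.ipge"                   # keep a backup first
sed 's/^ipge /e1000g /' "$ALIASES.ipge" > "$ALIASES"
cat "$ALIASES"
```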

3)
Backup original /etc/path_to_inst file to path_to_inst.ipge.
Now, change all the ipge entries in path_to_inst file to e1000g.
Note: the port numbers change too.

port 1 e1000g --> port 0 ipge
port 3 e1000g --> port 1 ipge
port 0 e1000g --> port 2 ipge
port 2 e1000g --> port 3 ipge

Check out the diff below and edit your path_to_inst accordingly.

testmachine> diff path_to_inst path_to_inst.ipge
10,11c10,11
< "/pci@780/pci@0/pci@1/network@0" 1 "e1000g"
< "/pci@780/pci@0/pci@1/network@0,1" 3 "e1000g"

> "/pci@780/pci@0/pci@1/network@0" 0 "ipge"

> "/pci@780/pci@0/pci@1/network@0,1" 1 "ipge"

18,19c18,19
< "/pci@7c0/pci@0/pci@1/network@0" 0 "e1000g"
< "/pci@7c0/pci@0/pci@1/network@0,1" 2 "e1000g"

> "/pci@7c0/pci@0/pci@1/network@0" 2 "ipge"

> "/pci@7c0/pci@0/pci@1/network@0,1" 3 "ipge"

4) Copy /etc/hostname.ipge2 to /etc/hostname.ipge2.bak
Rename /etc/hostname.ipge2 to hostname.e1000g0

5) Reboot the machine

6) When the machine comes up now, run ifconfig -a. You should be able to see
e1000g0
7) Check the inet and netmask for e1000g0 and correct it if necessary

8) Plumb other ports and set inet and netmasks for them also using:
#ifconfig e1000g0 inet netmask up
9) Setup default gateway (Get default gateway using netstat -nr)
#route add default

10) Check cables and make sure leds are green
You should now be able to ping through e1000g on all the interfaces


How to use a new e1000g driver on factory installed Ontario:
=============================================================

Obtain the latest e1000g driver files: e1000g and e1000g.conf
(You may want to contact the e1000g driver team)


1) Copy driver and conf file.
copy e1000g binary to /kernel/drv/sparcv9/
copy e1000g.conf file to /kernel/drv/

2) Backup /etc/driver_aliases to /etc/driver_aliases.ipge

3) Modify /etc/driver_aliases.

4) Replace all ipge entries with e1000g.

5) Backup /etc/path_to_inst to /etc/path_to_inst.ipge

6) Modify /etc/path_to_inst by replacing ipge with e1000g.
Note, the port numbers will change too.
Port for ipge1 becomes e1000g3 and ipge2 becomes e1000g0.

7) Modify /etc/name_to_major. Add a line at end "e1000g 267".
(Go to the last line and select the number that is consecutively higher).
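Picking the "consecutively higher" number can be done mechanically with awk. A sketch against a scratch copy (the sample entries and the resulting 267 are illustrative; the real file is /etc/name_to_major):

```shell
#!/bin/sh
# Compute the next free major number and append the e1000g line.
N2M=${TMPDIR:-/tmp}/name_to_major
cat > "$N2M" <<'EOF'
sd 32
ipge 266
EOF
NEXT=$(awk '$2 > m { m = $2 } END { print m + 1 }' "$N2M")
echo "e1000g $NEXT" >> "$N2M"
tail -1 "$N2M"    # -> e1000g 267 for this sample input
```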

8) Run: touch /reconfigure

9) cp /etc/hostname.ipge2 /etc/hostname.e1000g0
10) Edit /etc/rc2.d/S99bench to plumb e1000g and comment out ipge.
11) Reboot machine.

12) When the machine comes up now, run ifconfig -a. You should be able to see
e1000g entry

13) Check the inet and netmask for e1000g0 and correct it if necessary

14) Plumb other ports and set inet and netmasks for them also using:
#ifconfig e1000g0 inet netmask up

15) Setup default gateway (Get default gateway using netstat -nr)
#route add default

16) Check cables and make sure leds are green
You should now be able to ping through e1000g on all the interfaces

T1 CPI

(7) The overlapping of instruction-execution latencies means that the T1 memory model addresses effectiveness with or without memory stalls across the memory
hierarchy. The empirical data set indicates the problem size and requires further investigation of the hidden factors contributing to CPU efficiency.

(6) Someone may agree. A CPI of >= 4 as tested on a Niagara would indicate linear thread scaling, but the same data found on a USIII would not necessarily lead to the same conclusion: 1-2 CPI on a USIII could actually be >= 4 CPI on a T1 because the USIII is superscalar. The amount of thread-level parallelism in an instruction stream is somewhat limited. So the real question is: is it possible for a *realistic* instruction stream to have a number of stalling instructions that would give > 4 CPI on a T1 but closer to 1 CPI on a USIII, due to the USIII's superscalar execution masking the stalls? I'm not so convinced that this is true. But data would be good.

(5) Someone questioned why a CPI > 4 [as seen on a USIII] is required for a workload to scale linearly on T1: "most of the kernels don't meet the T1 requirement of a CPI of 4 to get thread scaling". USIII has instruction-level parallelism, so (theoretically) a CPI greater than 4 (as seen on USIII) should not be a necessary condition to scale linearly on T1.


(4) Data on which specint tests contain heavy FP: I don't have the data, but I suspect the twolf test also has decent FP, as it's another place-and-route test like vpr. Eon is a graphics visualization test. Also see Brian's comments about Niagara's CPI for specint, indicating that even if you discount the performance on the FP-heavy workloads, specint still won't do as well as a "real world" workload that has an average CPI > 4.

(3) the SPECint_rate FP data set

Percent fp...
vpr dataset 1 -> 5.6%
dataset 2 -> 8%
eon dataset 1 -> 15.9%
dataset 2 -> 15.2%
dataset 3 -> 16.3%
Only 0.1% of the instructions in eon are sqrt, so fixing
sqrt will help single core, but not significantly change
the rate result.

CPI
gzip ds1 -> 1.14
ds2 -> 1.10
ds3 -> 0.97
ds4 -> 0.97
ds5 -> 1.17
vpr ds1 -> 1.34
ds2 -> 2.37
gcc ds1 -> 2.19
ds2 -> 1.37
ds3 -> 1.50
ds4 -> 1.64
ds5 -> 1.46
mcf ds1 -> 5.81
crafty ds1 -> 1.00
eon ds1 -> 1.20
ds2 -> 1.25
ds3 -> 1.30
perlbmk ds1 -> 1.27
ds2 -> 1.08
ds3 -> 1.74
ds4 -> 1.03
ds5 -> 1.07
ds6 -> 1.04
ds7 -> 1.06
gap ds1 -> 1.46
vortex ds1 -> 1.38
ds2 -> 1.24
ds3 -> 1.39
bzip2 ds1 -> 1.11
ds2 -> 0.91
ds3 -> 0.97
twolf ds1 -> 1.94

(data collected by Darryl Gove on a US3 1056MHz system)

So, for this benchmark, most of the kernels don't meet the T1
requirement of a cpi of 4 to get thread scaling. That, along
with a single issue processor make it impossible to get good
numbers on this benchmark.

So, the problem really is, int_rate doesn't stall on memory enough
for the T1 processor.

I noticed a reply from you to niagara-interest saying that, according to folks at SAE, specint is ~20% floating point. Do you know where I might be able to find a breakdown of FP % for each of the 12 benchmarks?

I have a partner that uses specint results to compare platforms internally and is doing some Niagara testing. They're aware that specint contains floating point instructions and are willing to take suggestions from us on how the different benchmarks should be weighted to emphasize integer performance (I'm hoping, of course, that there are some benchmarks with little to no FP).

D consumer and libtrace api

How do I use the libdtrace APIs to interact with the DTrace
subsystem? I just want to use certain methods from within
a C program.


Given the user-land system calls, which D consumer did you have in
mind for callback invocation from your probes?

Thursday, May 04, 2006

Probabilistic analysis and Random algorithm

(1) The cost model and the running-time model are different.
(2) The technique used to analyze the cost model and the running-time model is the same:
(3) count the number of operations executed by the algorithm.
(4) Probabilistic analysis is a technique that fits use cases where there is an input distribution. If we cannot describe a reasonable input distribution, we cannot use probabilistic analysis.
(5) In many cases we know only a little about the distribution of the input and cannot model that knowledge.
(6) A randomized algorithm is controlled not only by the input distribution but also by a random-number generator,
giving us a level of control: we do not need to guess or assume that input
arrives in random order; instead we choose inputs randomly to ensure the input is definitely random.
We usually call this a pseudorandom-number generator: a deterministic algorithm returning random-looking numbers.
(7) A randomized algorithm thus imposes a distribution on the inputs, rather than assuming a distribution of inputs
as probabilistic analysis does.

Friday, April 07, 2006

Fidelity

As the example in the previous section illustrates, the data accessed
by an Odyssey application may be stored in one or more general-purpose
repositories such as file servers, SQL servers, or Web
servers. Alternatively, it may be stored in more specialized repositories
such as video libraries, query-by-image-content databases, or
back ends of geographical information systems.
The constraints of mobility complicate data access from such
servers. Ideally, a data item available on a mobile client should be
indistinguishable from that available to the accessing application if
it were to be executed on the server storing that item. But this correspondence may
be difficult to preserve as resources become scarce;

Thursday, March 30, 2006

disable zlogin

We currently have scripts
running during the provisioning of zones to populate
SOE packages and perform preliminary configuration that needs to
be done prior to granting access to anyone, including
any global administrator, until after the configuration
is complete.

To prevent zlogin to a local zone,
add the following lines to the
zone's /etc/pam.conf just above
the "other auth" lines:

#
# disable zlogin
zlogin auth required pam_deny.so.1

Tuesday, March 28, 2006

Solaris motd issue

(1) Here is how the login(1) routine works:
/usr/sbin/quota (check quota)
/bin/cat -s /etc/motd (print motd)
/bin/mail -E (check mail)

(2) Here is how mibiisa(1M) works as the SNMP agent utility:
motd is part of the sunsystem group for general
system information reporting; it reports the first line
of /etc/motd. (string[255])

(3) For the JES link: I have JES Q4 on my system, and it does
not show the link. Have you checked which component
creates the link?

(4) A few lines of D code below may help you discover the issue.


Performance analysis of the algorithm
counts the cost of the steps of the random
access machine that is modeled for
the instrumentation.

Consequently, higher-level syscall
instrumentation, i.e. tracing
symlink(char *target, char *linkname),
seems friendly for the implementation
of the code, but it does not give a plus to performance.
Therefore, I would keep the routine as close to
the I/O layer as possible, in order to minimize the cost of
delegation through the layered kernel architecture.

I would suggest adding a directive as a condition
rule pointing to the path of the motd, in order
to filter out the I/O.

symlink(2) does
the link and rename only. The AI and algorithmic
calculation does make sense. The
implementation of my AI and algorithm
is enhanced in the code below.

Please let me know if it works on your system



#! /usr/sbin/dtrace -s
#pragma D option quiet

dtrace:::BEGIN
{
printf("%15s %40s\n", "Executable", "LinkFileName");
}
/* Please note input here is link file name not path */
fbt::fop_symlink:entry
/stringof(args[1]) == $$1/
{
printf("%15s %40s\n", execname,stringof(args[1]));
}

In addition, you can separate the reads and writes to further
narrow down the report.

Here is a script that will print the time, name of the executable,
and ptree output when anyone tries to link /etc/motd

#!/usr/sbin/dtrace -wqs

syscall::symlink:entry
/basename(copyinstr(arg1))=="motd"/
{
printf("Caught the culprit\n");
printf("%20s\t %-20Y\n", "Time",walltimestamp);
printf("%20s\t %-10d\n", "Process id",pid);
printf("%20s\t %-20s\n", "Name of Executable" ,execname);
stop();
system("ptree %d",pid);
system("prun %d",pid);
}

Also if they want to use DTrace to automatically avoid the process
from creating the link they can use the script below. This would cause
any link to /etc/motd to become a link to /tmp/motd and then remove the
/tmp/motd file.

#!/usr/sbin/dtrace -wqs

syscall::symlink:entry
/copyinstr(arg1)=="/etc/motd"/
{
printf("Caught the culprit\n");
printf("%20s\t %-20Y\n", "Time",walltimestamp);
printf("%20s\t %-10d\n", "Process id",pid);
printf("%20s\t %-20s\n", "Name of Executable" ,execname);
copyoutstr("/tmp/motd",arg1,9);
stop();
system("ptree %d",pid);
system("prun %d",pid);
system("rm /tmp/motd");
}

Monday, March 27, 2006

Second hit on OS kernel architecture model design

As computer science illustrates:
a goal-based model agent does routine
perception, rule matching and goal mapping
in order to approximate the actions to deal
with the subset of unobserved conditions.
Consequently, what will be the input of the
change? The kernel, specifically the core kernel
modules, seems relatively stable. What else?
Security patches? Certified third-party
drivers? System libraries? User-land
applications? Is this the time to review the
challenge that the traditional open-system kernel
architecture model faces in dealing with
complexity? It is the common control,
planning and game algorithms and models, along with
economic models, that inspire computer scientists
to review the challenges for OS vendors.
Microsoft is not alone. It would be valuable
to investigate the asymptotic notation
for both best cases and worst cases.

On the other hand, what will be the business
intelligence to protect OS vendors' market
share, along with the user-land applications,
for OS players? Will user-land applications
continue to lock in end users? What will be the
economic and innovative delivery model for
both OS vendors and application providers?
What can OS vendors really get from open
source or open services environments?

Many, many questions and thoughts!


http://www.nytimes.com/2006/03/27/technology/27soft.html?hp&ex=1143522000&en=1c725e1c50ae8d6c&ei=5094&partner=homepage

Tuesday, March 21, 2006

System Event Handling for Non Global Zone via GPEC event queue channel

System events are not available in a non-global zone. However, the system
calls via sysevent(3SYSEVENT) are a working solution:

sysevent_bind_handle(3SYSEVENT) – bind or unbind subscriber handle
sysevent_free(3SYSEVENT) – free memory for sysevent handle
sysevent_get_attr_list(3SYSEVENT) – get attribute list pointer
sysevent_get_class_name(3SYSEVENT) – get class name, subclass name, ID or buffer size of event
sysevent_get_pid(3SYSEVENT) – get vendor name, publisher name or processor ID of event
sysevent_get_pub_name(3SYSEVENT) – get vendor name, publisher name or processor ID of event
sysevent_get_seq(3SYSEVENT) – get class name, subclass name, ID or buffer size of event
sysevent_get_size(3SYSEVENT) – get class name, subclass name, ID or buffer size of event
sysevent_get_subclass_name(3SYSEVENT) – get class name, subclass name, ID or buffer size of event
sysevent_get_time(3SYSEVENT) – get class name, subclass name, ID or buffer size of event
sysevent_get_vendor_name(3SYSEVENT) – get vendor name, publisher name or processor ID of event
sysevent_post_event(3SYSEVENT) – post system event for applications
sysevent_subscribe_event(3SYSEVENT) – register or unregister interest in event receipt
sysevent_unbind_handle(3SYSEVENT) – bind or unbind subscriber handle
sysevent_unsubscribe_event(3SYSEVENT) – register or unregister interest in event receipt

How to resolve fsflush overhead with large memory mappings

To reduce I/O on the Solaris platform, Solaris
offers virtual file systems, which are memory-based file systems
that provide access to kernel-specific resources.
As the name indicates, virtual file systems do not
use file system disk space. However, tmpfs uses the
swap space on a disk. tmpfs is the default file system
type for the /tmp directory in Solaris.

Since tmpfs uses local memory for file system reads and writes,
it has much lower latency than the classic Solaris
UFS. I/O performance can be enhanced by reducing I/O
to a local disk or across the network, which significantly
speeds up file creation, manipulation, etc. Therefore, tmpfs
can be utilized for the memory mapping.

Files in tmpfs file systems are volatile. The files disappear
when the file system is unmounted and when the system is shut down
or rebooted. Files can be moved into or out of the /tmp directory.
This means KTS needs to ensure a complete process-image backup,
as the normal fsflush does.

Please note that tmpfs uses swap space for pageout. Processes may
fail to execute if the system does not have enough swap space, which
means a larger swap space is required for pageout.


Besides the Solaris built-in kernel modules, Sun Storage Cache can also
be utilized to reduce the latency of the I/O activities.


To ensure no paging on the Solaris platform, use shmop(2) with shmget(2)
and the Intimate Shared Memory (ISM) variant of System V shared
memory. ISM mappings are created with the SHM_SHARE_MMU flag,
which locks down the memory used. Then just read the file
into shared memory. But this may require code changes.



If this is not an option, you can tune the flusher with these system parameters:

set segspt_minfree
set swapfs_minfree
set lotsfree
set desfree
set minfree

You can also postpone the time between cleaning with

set autoup
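These tunables live in /etc/system; a fragment for illustration only (the values are placeholders, not recommendations, and autoup lengthens the interval between fsflush cycles):

```
* /etc/system fragment -- illustrative values, not recommendations
set lotsfree=65536
set desfree=32768
set minfree=16384
set autoup=240
```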

Monday, March 20, 2006

Solaris Page-Demand Memory Management

Pertaining to the Solaris kernel process management and memory
management architecture, the heap segment within a process's virtual
address space is allocated for user-land data structures
right above the executable's data segment of the user-land
DB process, and grows through libc.so.1 system
library calls such as malloc(3C), in which malloc_unlocked does
the dirty work of allocating holding blocks or ordinary
blocks for the user-land process. If there is no free block, sbrk(2) is
called.

Zero-fill-on-demand memory page allocation is transparent,
triggered by the page fault.
Page memory is allocated for the process heap, and the
space becomes a permanently allocated block.

(1) However, the allocation will not shrink until the
process exits.
(2) The page scanner daemon runs to page out memory pages
per LRU when there is a shortage of memory.

This is the core of on demand page memory management
architecture of Solaris Operating System.

As for free(3C), free_unlocked does the dirty
work of marking the address space on the free list
for later use, rather than returning the address space to the
memory resource pool.

Solaris Basic Library

These default memory allocation routines are safe for use
in multithreaded applications but are not scalable.
Concurrent accesses by multiple threads are single-threaded
through the use of a single lock. Multithreaded applications
that make heavy use of dynamic memory allocation should be
linked with allocation libraries designed for concurrent access,
such as libumem(3LIB) or libmtmalloc(3LIB). Applications that
want to avoid using heap allocations (with brk(2)) can do so
by using either libumem or libmapmalloc(3LIB). The allocation
libraries libmalloc(3LIB) and libbsdmalloc(3LIB) are
available for special needs.
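As a quick experiment, libumem can usually be swapped in without relinking via a runtime preload; myapp here is a placeholder binary name:

```
# Preload libumem for one run (placeholder binary name):
LD_PRELOAD=libumem.so.1 ./myapp

# Or link against it permanently:
cc -o myapp myapp.c -lumem
```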

Saturday, March 18, 2006

Faster zone provisioning using zoneadm clone and Dtrace to monitoring zone

There is a wonderful blog on zones; I am reproducing one engineer's
tests below:

Faster zone provisioning using zoneadm clone

When creating zones in parallel to reduce the time it takes to provision multiple zones, it was suggested that the new zoneadm clone subcommand could be of help. The zoneadm clone subcommand (available from build 33 onwards) copies an installed and configured zone. Cloning a zone is faster than installing a zone, but how much faster? To find out, an engineer did some quick experiments creating and cloning both whole root and sparse root zones on a V480:

Creating a whole root zone:

# zonecfg -z zone1
zone1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:zone1> create -b
zonecfg:zone1> set zonepath=/zones/zone1
zonecfg:zone1> exit
# time zoneadm -z zone1 install
time zoneadm -z zone1 install
Preparing to install zone .
Creating list of files to copy from the global zone.
Copying <123834> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <986> packages on the zone.
Initialized <986> packages on zone.
Zone is initialized.
Installation of these packages generated errors:
The file contains a log of the zone installation.

real 13m40.647s
user 2m49.840s
sys 4m43.221s

Cloning a whole root zone:

# zonecfg -z zone1 export|sed -e 's/zone1/zone2/'|zonecfg -z zone2
zone2: No such zone configured
Use 'create' to begin configuring a new zone.
# time zoneadm -z zone2 clone zone1
Cloning zonepath /zones/zone1...

real 8m4.615s
user 0m9.780s
sys 2m18.334s

For the whole root zone, cloning is almost twice as fast as a regular install.

Creating a sparse root zone:

# zonecfg -z zone3
zone3: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:zone3> create
zonecfg:zone3> set zonepath=/zones/zone3
zonecfg:zone3> exit
# time zoneadm -z zone3 install
Preparing to install zone .
Creating list of files to copy from the global zone.
Copying <2535> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <986> packages on the zone.
Initialized <986> packages on zone.
Zone is initialized.
Installation of these packages generated errors:
The file contains a log of the zone installation.

real 6m3.227s
user 1m45.902s
sys 2m47.717s

Cloning a sparse root zone:

# zonecfg -z zone3 export|sed -e 's/zone3/zone4/'|zonecfg -z zone4
zone4: No such zone configured
Use 'create' to begin configuring a new zone.
# time zoneadm -z zone4 clone zone3
Cloning zonepath /zones/zone3...

real 0m11.535s
user 0m0.706s
sys 0m6.440s

For the sparse root zone, cloning is more than thirty times faster than installing!

So if you need to provision multiple zones of a certain configuration, zoneadm clone is clearly the way to go.
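The export | sed | zonecfg pattern above generalizes into a provisioning loop. A dry-run sketch (the commands are only echoed here, and the zone names are placeholders; drop the echoes on a real system):

```shell
#!/bin/sh
# Dry-run: show how zone2..zone4 would be cloned from a zone1 template
# using the export | sed | zonecfg pattern demonstrated above.
TEMPLATE=zone1
for NEW in zone2 zone3 zone4
do
    echo "zonecfg -z $TEMPLATE export | sed -e 's/$TEMPLATE/$NEW/' | zonecfg -z $NEW"
    echo "zoneadm -z $NEW clone $TEMPLATE"
done
```

Note that the sed rename rewrites the zonepath as well as the zone name, just as in the transcript above.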

Note that the current clone operation does not (yet) take advantage of ZFS. To see what ZFS can do for zone cloning, have a look at Mike Gerdts' blog: Zone created in 0.922 seconds. Goodness indeed.

Wednesday, May 25, 2005
Monitoring zone boot and shutdown using DTrace

Several people have expressed a desire for a way to monitor zone state transitions such as zone boot or shutdown events. Currently there is no way to get notified when a zone is booted or shutdown. One way would be to run zoneadm list -p at regular intervals and parse the output, but this has some drawbacks that make this solution less ideal:

* it is inefficient because you are polling for events,
* you will probably start at least two processes for each polling cycle (zoneadm(1M) and nawk(1)),
* more importantly, you could miss transitions if your polling interval is too large. Since a zone reboot might take only seconds, you would need to poll often in order not to miss a state change.
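For comparison, the polling approach would look roughly like this. The canned SAMPLE stands in for real zoneadm list -p output, and the id:name:state:path field layout is an assumption to verify on your release:

```shell
#!/bin/sh
# Parse colon-delimited `zoneadm list -p` style output into name/state
# pairs. SAMPLE is canned data standing in for the real command.
SAMPLE='0:global:running:/
1:zone1:running:/zones/zone1
-:zone2:installed:/zones/zone2'
echo "$SAMPLE" | awk -F: '{ print $2, $3 }'
```

Even scripted this way, the loop-and-diff polling cycle still has all the drawbacks listed above, which is what motivates the DTrace approach.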

A better, much more efficient solution can be built using DTrace, the 'Swiss Army knife of system observability'. As mentioned in this message on the DTrace forum, the zone_boot() function looks like a promising way to get notifications when a zone is booted. Listing all FBT probes with the string 'zone_' in their name (dtrace -l fbt|grep zone_) turns up another interesting function: zone_shutdown(). To verify that these probes are fired when a zone is either booted or shutdown, let's enable both probes:

# dtrace -n 'fbt:genunix:zone_boot:entry, fbt:genunix:zone_shutdown:entry {}'
dtrace: description 'fbt:genunix:zone_boot:entry, fbt:genunix:zone_shutdown:entry ' matched 2 probes

When zoneadm -z zone1 boot is executed we see that the zone_boot:entry probe fires:

CPU ID FUNCTION:NAME
0 6722 zone_boot:entry

The zone_shutdown:entry probe fires when the zone is shutdown (either by zoneadm -z zone1 halt or using init 0 from within the zone):

0 6726 zone_shutdown:entry

This gives us the basic 'plumbing' for the monitoring script. By instrumenting the zone_boot() and zone_shutdown() functions with the FBT provider we can wait for zone boot and shutdown with almost zero overhead. Now what is left is finding out the name of the zone that was booted or shutdown. This requires some knowledge of the implementation and access to the source (anyone interested can take a look at the source after OpenSolaris is launched, so stay tuned).

A quick look at the source shows that we can get the zone name by instrumenting a third function, zone_find_all_by_id(), that is called by both zone_boot() and zone_shutdown(). This function returns a pointer to a zone_t structure (defined in /usr/include/sys/zone.h). The DTrace script below uses a common DTrace idiom: in the :entry probe we set a thread-local variable trace that is used as a predicate in the :return probes (the :return probes have the information we're after). The FBT provider :return probe stores the function return value in args[1], so we can access the zone name as args[1]->zone_name in fbt:genunix:zone_find_all_by_id:return and save it for later use in fbt:genunix:zone_boot:return and fbt:genunix:zone_shutdown:return.

#!/usr/sbin/dtrace -qs

self string name;

fbt:genunix:zone_boot:entry
{
self->trace = 1;
}

fbt:genunix:zone_boot:return
/self->trace && args[1] == 0/
{
printf("Zone %s booted\n", self->name);
self->trace = 0;
self->name = 0;
}

fbt:genunix:zone_shutdown:entry
{
self->trace = 1;
}

fbt:genunix:zone_shutdown:return
/self->trace && args[1] == 0/
{
printf("Zone %s shutdown\n", self->name);
self->trace = 0;
self->name = 0;
}

fbt:genunix:zone_find_all_by_id:return
/self->trace/
{
self->name = stringof(args[1]->zone_name);
}


Starting the script and booting and shutting down some Zones gives the following result:

# ./zonemon.d
Zone aap booted
Zone noot booted
Zone noot shutdown
Zone noot booted
Zone aap shutdown

Friday, March 17, 2006

Virtual Machine on x86

OS architecture design and implementation has evolved from
simple monolithic structures to the classic layered Unix approach
that Linux and Windows follow. To simplify
kernel manageability, the microkernel architecture was
proposed, but for performance and scalability reasons
Solaris's modular design won the game. One interesting
thing is that Mac OS X takes a hybrid structure that bridges
the layered BSD kernel design with a microkernel implementation:
the microkernel manages memory, RPC, IPC and kernel-thread scheduling,
while the BSD kernel provides the CLIs, file systems and all the POSIX APIs.

The traditional layered Solaris kernel design includes the concept
of abstracting the HW resources into several execution environments.
With such virtualization techniques, a process is provided with a
virtual copy of the underlying OS and HW resources.

The fundamental requirement for running virtual machines is therefore
to share the HW among different execution environments. The virtual
machine monitor runs in kernel mode while each virtual machine executes
in user mode, with its own relative virtual user and kernel modes. When a
process running in a virtual machine makes a system call, control is
transferred to the virtual machine monitor, which changes the registers
and program counter to simulate the system call. Hence the major difference
is that real I/O takes much more time than virtual I/O does, and CPU instruction
time increases due to the multiple processes running within each virtual
machine. The virtual machine model is a good fit for R&D.

Virtual machines can also help resolve system compatibility
issues. Two popular flavors of virtual machine are VMware and the
Java VM. Since these virtual machines run on top of an OS, traditional
OS design and implementation concepts such as Solaris modules, microkernels
and VM still apply.

VMware abstracts the x86 platform into isolated virtual machines. VMware runs
as a user-land application on top of the host OS, which enables multiple guest
OSs to run concurrently, each within its own virtual machine. However, the
virtualization layer at the core of VMware is the most expensive part of the
design: it abstracts the underlying resources into the various virtual machines
presented to the guest OSs. Each VM has its own CPU, memory, devices etc.

The JVM also abstracts the underlying OS and HW: a class loader
and the Java interpreter execute the byte codes.

In general, the design and utilization of a virtual machine
depend on the level of virtualization that fits the requirements.
For platform- and system-level virtualization across different guest
OSs, VMware is the choice. However, if you only want to virtualize
user-land applications, specifically Java applications, the JVM is the right
technical and political answer across different OSs. As an important
note, significant application-level virtualization has been done by Sun
ISVs such as Cassat for Java EE virtualization.

Thursday, March 16, 2006

Wireless TCP

Traditional TCP does not serve wireless connections efficiently because of conventional TCP's design assumptions about congestion control and the "friendly design" of the protocols, which lead to slow start and fast retransmit/fast recovery. High error rates, packet drops caused by mobility, and TCP's fundamental assumption that a missing ACK means congestion-induced timeout all mean that classic TCP does not work well for mobile computing.

UDP leaves reliability and retransmission to the application layer.


Improved TCP variants address this. Indirect TCP (I-TCP) uses the access point or foreign agent (FA) for the mobile node, segmenting the TCP connection into two connections. Snooping TCP has the FA or access point buffer all data packets. Mobile TCP uses SH-MH connections and a persistent mode to resolve the issues. Selective retransmission is a good solution for lossy links. Transaction-oriented TCP combines the packets for connection establishment and connection release with user data packets, reducing the packets needed for the three-way handshake (WAP does similar things). Header compression does the work for gaming apps.

Wednesday, March 15, 2006

install skype on Ubuntu

mkdir skype
mv skype_1.2.0.18-1_i386.deb skype_1.2.0.18-1_i386.deb.orig
dpkg-deb --extract skype_1.2.0.18-1_i386.deb.orig skype
dpkg-deb --control skype_1.2.0.18-1_i386.deb.orig skype/DEBIAN
vi skype/DEBIAN/control
Change to:
Depends: libc6 (>= 2.3.2.ds1-4), libgcc1 (>= 1:3.4.1-3), libqt3c102-mt (>= 3:3.3.3.2) | libqt3-mt, libstdc++5 (>= 1:3.3.4-1), libx11-6 | xlibs (>> 4.1.0), libxext6 | xlibs (>> 4.1.0)

dpkg --build skype
mv skype.deb skype_1.2.0.18-1_i386.deb
dpkg -i skype_1.2.0.18-1_i386.deb

Tuesday, March 14, 2006

package parameters for zone scope

SUNW_PKG_ALLZONES, SUNW_PKG_HOLLOW,SUNW_PKG_THISZONE

(1) The SUNW_PKG_ALLZONES package parameter describes the zone scope of a package. This parameter defines the following:

* Whether a package is required to be installed in all zones
* Whether a package is required to be identical in all zones

The SUNW_PKG_ALLZONES package parameter has two permissible values, true and false. The default value is false.

(2) The SUNW_PKG_HOLLOW package parameter defines whether a package should be visible in any non-global zone if that package is required to be installed and be identical in all zones.

The SUNW_PKG_HOLLOW package parameter has two permissible values, true and false.

* If SUNW_PKG_HOLLOW is either not set or set to a value other than true or false, the value false is used.
* If SUNW_PKG_ALLZONES is set to false, the SUNW_PKG_HOLLOW parameter is ignored.
* If SUNW_PKG_ALLZONES is set to false, then SUNW_PKG_HOLLOW cannot be set to true.

(3) The SUNW_PKG_THISZONE package parameter defines whether a package must be installed only in the current zone, global or non-global. The SUNW_PKG_THISZONE package parameter has two permissible values, true and false. The default value is false.

(4) If a package is installed with pkgadd -G or has the pkginfo setting SUNW_PKG_THISZONE=true, the package can only be patched with patchadd -G.
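For illustration, these parameters are plain name=value settings in a package's pkginfo(4) file. The fragment below is a hypothetical example (the package name and values are made up), showing the valid combination where SUNW_PKG_ALLZONES=true permits SUNW_PKG_HOLLOW=true:

```
# Hypothetical pkginfo fragment showing zone-scope parameters
PKG=SUNWexample
NAME=Example package
SUNW_PKG_ALLZONES=true
SUNW_PKG_HOLLOW=true
SUNW_PKG_THISZONE=false
```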

Zones and Solaris Hardening

Harden non-global zones using the Solaris Security Toolkit, not pkgrm.
Harden the global zone using the Solaris Security Toolkit so that any
subsequently created non-global zones will automatically be hardened.

pkgrm is the underlying mechanism to remove software packages from Solaris.
If a package is zone-aware, you would use pkgrm to remove it from the zones.
Depending on what the customer's definition of "hardening" may be, it could be possible to satisfy this requirement without using pkgrm.



Basically, hardening the system should not cause issues,
as long as you don't remove basic zones functionality (I'm assuming you'll pkgrm some packages etc.).
I'd suggest just trying it on a test system first.
The minimum cluster that zones functionality is delivered in is SUNWCuser,
but it should be possible to start lower, i.e. SUNWCreq, and build up; the following e-mail threads have some further discussion on this very topic.

Open Source and Open Service (OSS)

(1) Open Source vs Open Service (OSS)
Why, What, When and How for Sun and Partner
What is the nature of the problems to resolve ?
What related OSS has been envisioned
and implemented in the industry ?
(Google, salesforce.com, Microsoft, IBM etc.)
(2) What will be engineering engagement platform for OSS ?
(3) What is the engineering engagement protocol for OSS ?
(4) What will be the engineering engagement execution
language for the OSS ?
(5) What will be the engineering engagement model for OSS ?
(6) What will be the engineering engagement layered service model
for OSS ?
(7) What will be the strength and advantage of OSS vs traditional
engineering engagement ?
(8) What will be the impact to current engineering engagement
routine ?
(9) How to evaluate the performance of engineering engagement for
OSS
(10) How to ensure MDE and Tim's team differentiate and outperform
engineering engagement for OSS
(11) What future work is to be done ?

SPARC IV+ and T1 Flavors

SPARC IV+ and T1 are different lines of platforms driving
industry requirements.

(1) From the uts implementation point of view, sun4v addresses the
future platform architecture down the road, from the core kernel
implementation to the FM architecture design (though I do not
have the specific core structure vs the x64 core affinity structure either).
CMP has fit SPARC IV+ for a long time, since
SunOS 2.6, from processor sets to pid-based resource management.
This means sun4v requires more virtualization of processors
and cores than the traditional sun4u architecture and implementation.
(2) In addition, beyond the firmware and HW architecture and
implementation, Niagara addresses throughput and latency
from the bottom up, as CMT promises.
(3) In process management, the ABI continues to offer the standard
for the Solaris binary interface as part of the ELF format, for both
platform-independent and processor-specific system calls and stack mgt.

(4) Network performance and throughput at the device level, from traditional bge(7D) support, to sustain scalable, throughput-based volume
access.

(5) Other than the core uts processor implementation and the system library,
io, fpc IPC and px, pcb, vm, ebus, along with the genunix implementation
including all loadable kernel modules such as device drivers, the core
Solaris kernel implementation is reused, such as the VM, file system
and scheduling, as addressed in the Solaris 10 core kernel implementation,
and so on.

In general, quite a few platform-specific enhancements have been made
for the T1 processor from the kernel perspective.

(6) At user land, it depends on the application provider's architecture
and implementation to what level process management
and resource allocation are utilized. What user-land thread model is designed
and implemented ? How are LWPs created and how are kthreads leveraged ?
How are lock primitives designed at user land ? What thread library is used
at user land for kthread creation and execution ?

Communication Service with SSO

The comms channels use the SSO adapter to provide SSO with MS Exchange.

It is required to get into MS Exchange quickly without having to use
a full Directory Server DN matching the Active Directory DN.

The Exchange plugin for the Mail channel uses the SSOAdapter
property "uid" for the IMAP user name and "password" for the password.

Niagara Process Image and ABI ELF format

There has been one thing on my mind since I started working
on Niagara. From the Solaris process management point of view,
the process image in ELF format is addressed as part
of the ABI standard, starting with the Solaris binary interface.

ELF addresses both platform-independent and
processor-specific specifications. Processor-dependent
ABI standards include the calling sequence
(system calls, stack mgt etc.) and the Solaris interface for signals,
process initialization etc.

Does this mean Niagara inherits most of the SPARC IV+
function calls and stack mgt, process mgt and signal
interface ?

Monday, March 13, 2006

How to setup cisco VPN client on Ubuntu

1. Install


# sudo su -
# apt-get install build-essential linux-headers-`uname -r`
# echo tun >> /etc/modules
# modprobe tun
# cd VPNCLIENT_SRC_DIR
# ./vpn_install
# /etc/init.d/vpnclient_init start

2. edit sfbay.pcf files in /etc/CiscoSystemsVPNClient/Profiles


Description=Ebay VPN3000
Host=192.18.42.83
AuthType=1
GroupName=vpn
EnableISPConnect=0
ISPConnectType=0
ISPConnect=
ISPCommand=
Username=ll149252
SaveUserPassword=0
EnableBackup=1
BackupServer=ivpn-east.sun.com,ivpn-central.sun.com,ivpn-aus.sun.com
TunnelingMode=1
TCPTunnelingPort=10000
EnableLocalLAN=0
EnableNat=1
CertStore=0
CertName=
CertPath=
CertSubjectName=
CertSerialHash=00000000000000000000000000000000
DHGroup=2
ForceKeepAlives=0

3. connect to VPN

$vpnclient connect sfbay

4. Disconnect VPN

$vpnclient disconnect

kmem(7D) and kstat on NG-Z resource mgt and performance monitoring

(1) Regarding Platform Computing

kmem(7D) is the kernel memory device, read via the three routines below:

openkmem, which opens a /dev/kmem file descriptor
that is then read through by the routines
kemecpy
kstrncpy

It provides access to the virtual address space of the Solaris kernel,
excluding memory associated with an I/O device.

However, kmem(7D) does not have full functionality in
a non-global zone.

(2) Regarding HPOV

From the Solaris process management point of view,
the procfs pseudo file system exports the abstraction
of kernel process management. There are a few user-land
data structures that illustrate the performance management
needs. In addition, kstat(1M) is no longer sufficient
for the NG-Z use case, which addresses the uts structures
tagged with zoneid.

S10 Resource Mgt and HW resource Mgt

(2) S10 SRM and resource pool
Having resource pools enabled allows one to have virtualized statistics
for things like the CPU kstats, APIs like sysinfo(3C) and
getloadavg(3C), and utilities like mpstat(1M) and vmstat(1M). Basically,
if pools are enabled, a zone will see a virtualized view of the relevant
statistics based on the pool the zone is bound to.

(3) FSS and processor resource usage

It's not as fine-grained as using solely FSS, but it does provide a
great deal of flexibility, including the ability to automatically set
the scheduling class of processes bound to the pool.


HW resource management approach: the hypervisor

You can lose a lot of optimization if the
hypervisor abstracts too many hardware details (thread-to-processor
affinity is one such example). So the more general-purpose
the hypervisor (abstracting the most details), the fewer opportunities
for the OS to optimize (and in some cases it futilely optimizes,
like a compulsion for a pointless or destructive activity).

The relevance here is that the abstraction layer presented by
Solaris zones is higher in the stack (near the user-space layer),
so all of the platform-specific optimizations are available to the
kernel. When you begin to think about the impact of optimizations
such as multiple page sizes and memory placement, these details can become
very important. And of course there is reduced VM pressure from sharing a
common (but secure) buffer cache and shared libraries.

Mobile-aware Resource Discovery with Ad-hoc Networks

Scalability & Latency

1. Traditionally, physical entities such as a computer, network or storage system are considered resources. With service-oriented architecture, in addition to traditional physical resources, virtual services provide consistent functionality across the network.
2. Service discovery: it is important for a service consumer to identify a service and its characteristics in order to understand the interfaces and the identity authorized to access the service.
3. Traditionally, service discovery is accomplished by a service registry.
4. Users tend to discover a service based on prior knowledge of the service.
5. Automatic service discovery, search, selection, matching, composition, interoperation, invocation and execution require a service description, which is crucial.
6. Functional classification or categorization of services is important for efficiently querying and indexing a specific service.
7. Discovery means locating network-accessible capabilities; to support heterogeneous environments, a standard RDS protocol and a standard mechanism for expressing resources are required.
8. Clients normally query resources by properties such as capabilities, quality, terms, configuration etc. A description language is therefore needed for resource discovery.
9. Discovery is a lightweight, non-authorized operation with no resource commitment. In addition, aggregating resource information across a large, distributed resource set has overheads that need to be handled.
10. Because of network boundaries, both physical resources and virtual services may not be reachable, so a discovery model based purely on the network structure will not be effective, especially for mobile-aware provider and consumer applications and services, where service transmission should continue across mobility. The main models are: (1) the multicast model; (2) the directory server model; (3) the hierarchical directory server model, with a directory server located at each virtual logical boundary and information about all services aggregated in the top-level logical container. The requester unicasts discovery queries to a server and the queries are forwarded up the hierarchy. The hierarchical model, however, requires deploying directory servers in the upper- and lower-layer structure domains, and the directory servers in turn have to be configured in this hierarchical manner. On the plus side, applications in each local network do not require a global network connection.
11. A flooding-based unstructured P2P discovery model does not scale in terms of message overhead. Some have proposed optimized models that reduce the network traffic but increase query latency. DHT-based systems show scalability and efficiency but cannot handle complex queries. To avoid the overhead of resource discovery queries over the network, a semantics-based P2P query forwarding strategy is valuable: queries are forwarded only to semantically related nodes. RDF is used for resource and query expression; after a related node is identified, the original RDF query is applied to retrieve the designated information.

Sunday, March 12, 2006

Process and Address Space

The address space is the kernel abstraction for managing the memory
pages allocated to a process. A process needs memory address
space to store its text instructions, data segment, heap (as temporary process
space) and stack.

HW context: the platform-specific process execution environment,
e.g. registers (CPU) and the address space (memory).

SW context: process credentials, open file lists, PIDs, signal dispositions, signal handlers, scheduling class and priority, process state.

Most processes have stdin, stdout and stderr, which define the source and destination
of input and output character streams for the process.

In the Solaris kernel, a process is composed of LWPs and kthreads. The kthread is a kernel data structure
linked to the process structure. This threading model separates the user-land threads from the
kernel threads.

A user-land thread is created by calling thr_create(3T) or pthread_create(3T) from
libthread.so or libpthread.so. A user thread has its own scheduling and priority scheme,
different from the kernel scheduling classes and priorities; this is done by calling into
the thread library's dispatcher and switch routines periodically at predefined preemption points.
A user-land thread is scheduled by linking it onto an available LWP, which is required for
the thread to execute. However, creating a user thread does not mean an LWP is created:
the thread must be created with the THR_NEW_LWP or THR_BOUND flag for the kernel to
create an LWP. A threaded process with many LWPs bound to kernel threads will, however,
cause a performance impact.

In addition, thr_setconcurrency(3T) informs the kernel how many threads the programmer wishes to run.
Without a specification from the code, the thread library will maintain a reasonable number of LWPs for user-land thread execution.

The system should balance the cases where there are too many LWPs or not enough LWPs to run:
too many LWPs cause kernel management overhead, and too few LWPs cause many runnable user-land
threads to wait for resources before they can execute.

As in the traditional process model, exec or fork creates a new process.

In the multi-threaded process model, HW context is not shared among user-land threads,
but SW context such as the address space, PIDs, credentials, signal handlers etc. is shared.

ABI & ELF & Process Image

With the ELF format within the ABI standard, the kernel and OS tools create
executable objects that can be loaded into memory and turned into
processes for scheduling and execution.

A program becomes a binary executable object when it is compiled
and linked with the OS's language-specific compiler. When the
executable object is exec'd, the dynamic linking process starts:
ld.so.1(1) is called to link in other shared objects such as
libc.so.1 (the dynamic link library) for instruction execution. Please
note that all references in the program are resolved via ld.so.1.

Static linking, however, can be achieved via the -B static flag at compilation,
which forces all references to be included at build time. Just as the dynamic
linking process needs libc.so.1, a statically linked process requires libc.a.
Building a 64-bit app cannot be done with static linking since there
is no static archive library (libc.a) released with the OS.

A program is compiled into an ELF-format executable object.
ELF defines the format of the process on disk and in memory (the process image).
The ELF format is the part of the ABI standard that states the OS binary
interface for compiled and executed programs.

ELF addresses both platform-independent and processor-specific
specifications. Processor-dependent ABI standards include the
function-calling sequence (system calls, stack mgt etc.) and the OS
interface for signals, process initialization etc.

S10 PCB Structure

Two major abstractions of Solaris (Process and File)

(1) The process is the basic unit of scheduling and execution in Solaris
(2) Multi-threaded process architecture: process, LWP and kernel thread
(3) Solaris kernel process model: procfs, signals, process groups, session mgt
(4) The Solaris kernel maintains a system-wide process table for PIDs and related data
(5) The Solaris process abstraction includes the traditional Unix process model for
HW state and OS-maintained SW context. Additionally, it supports multi-threaded
execution within a single process. Each thread shares the process state
and can be scheduled and executed on a processor independently of the other
threads within the same process
(6) Solaris implements a time-sharing scheduling policy and a round-robin approach
in its process scheduling scheme and algorithm

In addition to Solaris's unique process architecture design, a good tool
has evolved for process monitoring: procfs, the pseudo file system that exports
the process abstraction model to users with a file-system-like interface for
extracting process data.

Saturday, March 11, 2006

S10 Well Known Processes

(1) Memory scheduler

proc_sched

(2) init process

proc_init

(3) pageout daemon

proc_pageout

(4) fsflush

proc_fsflush

S10 zones vs IBM LPARs

- Platform availability - Zones run on SPARC, x86 and x64 - and with
OpenSolaris, who knows what else; LPARs are extremely vendor-specific

- Performance overhead - Zones offer zero performance overhead for applications,
as there is no virtualization layer (a la hypervisor) that apps have to
punch through. The overall system overhead for Zones is minimal, due to
all the resource sharing. Contrast this architecturally against LPARs.

- Manageability - LPARs do nothing for manageability of the datacenter;
all they do is consolidate the hardware footprint. For a large percentage of
apps, Zones resolve a large part of the management headache

- Observability - this is *key*. If an app in an LPAR is not behaving,
there is no way for someone inside that OS instance to see what's going
on around itself. You can't call someone up who can check the entire
platform to try and diagnose the problem. With Zones, the global zone
admin has full visibility into all the local zones, and into the entire
hardware platform, with no virtualization layer in the way.