Tuesday, July 31, 2007

Divide my dataset into subsets to perform some experiments on each

You can add an ID attribute to all your data using the AddID filter in
weka.filters.unsupervised.attribute. Following this, you can create your
splits explicitly using filters in the weka.filters.unsupervised.instance package (e.g. RemovePercentage, RemoveRange and RemoveFolds), or use the cross-validation (or
percentage split) evaluation options in the Explorer. To make
sure that the ID attribute is not used by the learned models, you can
use weka.classifiers.meta.FilteredClassifier in conjunction with
your chosen classifier and the weka.filters.unsupervised.attribute.Remove filter, which removes the ID attribute just prior to constructing a classifier (and at
testing time too). With the current snapshot of the developer
version of Weka, you can also output additional attributes alongside the
predictions (in your case, the ID attribute).
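The steps above can be sketched from the command line; this assumes weka.jar is on the classpath, and the file names (data.arff etc.) are placeholders:

```shell
# Tag every instance with an ID (AddID inserts it as the first attribute by default)
java weka.filters.unsupervised.attribute.AddID -i data.arff -o data-id.arff

# Explicit 70/30 split: -P removes a percentage, -V inverts the selection
java weka.filters.unsupervised.instance.RemovePercentage -P 30 \
    -i data-id.arff -o train.arff
java weka.filters.unsupervised.instance.RemovePercentage -P 30 -V \
    -i data-id.arff -o test.arff

# Train with the ID (attribute 1) removed on the fly, at testing time too
java weka.classifiers.meta.FilteredClassifier \
    -F "weka.filters.unsupervised.attribute.Remove -R 1" \
    -W weka.classifiers.trees.J48 -t train.arff -T test.arff
```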

Applying Quantum Principles to GAs for Multiple Objective Scheduling

The multi-objective scheduling problem has been studied in the literature (T'kindt et al., 2002). However, conventional metaheuristic methods, such as genetic algorithms (GAs), have mostly been studied with a single objective function for combinatorial optimization.

Recent advances in GAs and multi-objective research (Han et al., 2005) applied principles of quantum computing to stochastic optimization problems. In addition, a Q-bit representation combined with a permutation-based GA was advocated by researchers (Li & Wang, 2007). For multi-objective scheduling, a Q-bit GA needs to obtain good approximations in both cooperative and competitive task environments. Moreover, Q-bit-based permutation, crossover operators, selection, encoding, generation processes and fitness values are required to explore and exploit large or high-dimensional state spaces. With the Q-bit representation, it becomes a relaxed minimization problem.

Algorithm evaluation needs to combine a vector of all local objective function values for each job. However, reconciling the local optimum per job with the global optimum for the entire task environment introduces a further layer of constraint satisfaction, with enumeration of iterative policies and state transitions. Beyond optimizing the enumeration, parallelization may need to be considered at both the system and the application level. Furthermore, pipelined processing and data parallelization are critical to reduce both time and space complexity.

Han et al. (2005). A quantum-inspired genetic algorithm for flow shop scheduling. Springer-Verlag.
T'kindt et al. (2002). Multicriteria Scheduling: Theory, Models and Algorithms. Springer-Verlag.

Enable and Disable dhcpagent

To enable or disable the DHCP agent:

sys-unconfig
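Besides a full sys-unconfig, dhcpagent can also be driven per interface through ifconfig on Solaris 10; bge0 below is a placeholder interface name:

```shell
ifconfig bge0 dhcp start     # have dhcpagent acquire a lease on the interface
ifconfig bge0 dhcp status    # show the lease state held by dhcpagent
ifconfig bge0 dhcp release   # release the lease and stop managing the interface
```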

Thursday, July 26, 2007

Balance Dataset

I have a dataset with two unbalanced classes: 7,500 rows belong to Class A
and 2,500 rows belong to Class B. How do I randomly select rows from
Class A and Class B to balance the dataset?


Use weka.filters.supervised.instance.SpreadSubsample with a
value of 1 (uniform) for the distributionSpread parameter.
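The Weka call is shown as a comment below (flags assumed from the filter's CLI options); as a cross-check, here is a plain-shell downsampling sketch on toy data, a headerless CSV whose last field is the class label, standing in for the real 7,500 A / 2,500 B rows:

```shell
# With Weka on the classpath this is simply (sketch, paths are placeholders):
#   java weka.filters.supervised.instance.SpreadSubsample -M 1.0 -c last \
#     -i data.arff -o balanced.arff

# Toy data: 7,500 rows of Class A, 2,500 rows of Class B
for i in $(seq 7500); do echo "row$i,A"; done  > data.csv
for i in $(seq 2500); do echo "row$i,B"; done >> data.csv

grep ',B$' data.csv > balanced.csv                  # keep every Class B row
grep ',A$' data.csv | shuf -n 2500 >> balanced.csv  # 2,500 random Class A rows
```

The result has 5,000 rows, 2,500 per class; shuffle balanced.csv once more before training if row order matters to your learner.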

Tuesday, July 24, 2007

Selective Attribute for classification training

The easiest way to do this would be to first make a copy of the
Instances object (Instances copy = new Instances(train)) holding your
training data. Then, for each instance, set the values that you don't
want to train on to "missing". E.g., assume i is an instance
and the value at attribute 3 is to be skipped when training naive
Bayes; then i.setMissing(2) would do the trick (attribute indices are
zero-based in the API). Note that this approach is specific to the way
naive Bayes works.

BI Feature Selection

A description of BI feature selection:


"The random search method (weka.attributeSelection.RandomSearch)
in the attribute selection package that can be combined with any
feature subset evaluator in order to search for good subsets randomly.
If no start set is supplied, Random search starts from a random point
and reports the best subset found. If a start set is supplied, Random
searches randomly for subsets that are as good or better than the
start point with the same or or fewer attributes."

But heuristically, with some confidence, 50,000
features selected using Chi-squared ranking produce a more
accurate SVM model than 50,000 features selected uniformly at random.
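If I recall the old attribute-selection CLI correctly, an evaluator's main method accepts a search class via -s; treat this as a sketch (flags assumed from the Weka 3.5-era tools, data.arff is a placeholder), not a verified invocation:

```shell
# Evaluate randomly searched feature subsets with the CFS evaluator
java weka.attributeSelection.CfsSubsetEval \
    -s weka.attributeSelection.RandomSearch -i data.arff
```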

Thursday, July 19, 2007

CPU performance counter

(1) AMD Performance Counter, http://developer.amd.com/article_print.jsp?id=90
(2) cpustat, cputrack and trapstat
(3) libcpc
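A usage sketch for (2); the counter names below are chip-specific placeholders, so check cpustat(1M) for the events your CPU actually exposes:

```shell
cputrack -c pic0=Instr_cnt,pic1=DC_miss ls -lR /etc   # per-process counts for one command
cpustat  -c pic0=Instr_cnt,pic1=DC_miss 1 5           # system-wide, five 1-second samples
trapstat 1 5                                          # trap activity, five 1-second samples
```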

ID Problem Formulation

(1) I formulated ID (intrusion detection) as a stochastic DP problem, so it has
dynamic properties.
(2) To handle the large state space and unknown attack types, the DP problem
is transformed into an adaptive tuning problem; adaptiveness here means tuning
the networks interactively.

The properties of the problem formulation have been addressed in the problem
formulation section. I have mathematical proofs for the above items; this is
further addressed in the methodology section of the formal paper.

As for the time constraints, do you mean I need time to implement the
entire mathematical framework as software? If so, it is true that I need to implement it
myself, since existing tools such as MATLAB only handle traditional weight tuning for fixed
neural networks, and there is no RL toolbox yet. This research counts on my own implementation of the proposed algorithmic operations. As for the dataset preprocessing, it will not be an issue for
me since the I/O formatting is straightforward.

The false alarm ratio is only evaluated on known attacks from a research
point of view. In real-time system operation, it cannot be proved by the research framework.
This is the motivation to come up with a "tuning" framework for online detection in order to
reduce the false alarm ratio. Hence, the false alarm problem is relaxed as the problem of "tuning" the DP problem. This is one of the major advantages of this research proposal.

The dataset is only critical for traditional neural-network learning, not for this
research; parameter choice is not a restriction for this research either. All these are traditional
neural-network learning problems, which is the motivation to propose an RL-based "tuning" framework. This is one of the major advantages of this
research proposal. For the specific host and network attacks (spoofing
and memory overflow), I have mathematical proofs. For the implementation, I
have to start with arbitrary parameters and architecture. It is important to know that there is no
ready training set of states and ROC function in the DP context. The feasible way is to evaluate
the ROC function by simulating state decisions, and afterwards use an RL-based interactive
algorithm to improve the ROC value. That is the key point of the research design.

Versioning Manager Output

The singleton VersioningOutputManager can output any JComponent to the versioning console.

NB Back End Threaded Progresses

(1) Runnable for long-running back-end processes and ProgressHandle management
Runnable allRunnable = new Runnable() {
    WorkspaceMgr mgr = null;
    Workspace ws = PerforceConfig.getDefault().getDefaultWorkspace();
    ProgressHandle handle = null;

    public void run() {
        try {
            handle = ProgressHandleFactory.createHandle(
                    NbBundle.getMessage(PerforceAnnotator.class, "CTL_PopupMenuItem_ViewAllChangelist"));
            handle.start();                 // show the progress bar before the long-running work
            mgr = new WorkspaceMgr(ws);
            showAllChangelist(changes);     // long-running back-end work, off the EDT
            SwingUtilities.invokeLater(new Runnable() {  // back to the EDT for UI population
                public void run() {
                    ChangelistView comp = ChangelistView.findInstance();
                    VersioningOutputManager.getInstance().addComponent(
                            NbBundle.getMessage(PerforceAnnotator.class, "CTL_PopupMenuItem_ViewAllChangelist"), comp);
                    comp.setContext(ctx, changes);
                    comp.setDisplayName(NbBundle.getMessage(PerforceAnnotator.class, "CTL_PopupMenuItem_ViewAllChangelist"));
                    //comp.open();
                    comp.requestActive();
                }
            });
        } catch (WorkspaceException ex) {
            Exceptions.printStackTrace(ex);
        } finally {
            if (handle != null) {
                handle.finish();            // always hide the progress bar, even on failure
            }
        }
    }
};

(2) UI current thread context

if (cmd.equals(NbBundle.getMessage(PerforceAnnotator.class, "CTL_PopupMenuItem_ViewAllChangelist"))) {
RequestProcessor.getDefault().post(allRunnable);
}

(3) So: from the current (UI) thread, post a worker thread for the progress bar and the back-end processing, then return to the EDT to populate the UI.

Performance Analysis and Methodology

Performance analysis and methodology are very broad topics. At heart it is about optimization: performance can be cast as a set of Bellman equations to solve. Traditional enumeration of states and performance functions
does not address the performance evaluation issues,
since a traditional MDP formulation results in large state transitions
or high-dimensional performance feature extraction. A networking-only
problem formulation may involve 30+ performance
parameters, and for the Solaris kernel it involves 100+ parameters.
Hence, dynamic and adaptive performance analysis, together with the associated
resource utilization analysis, may reach the optimal
performance function evaluation with fast convergence.

It involves performance metrics (parameters, feature extraction), performance functions, performance evaluation, performance learning, performance instrumentation, performance management, adaptive tuning, etc. It really depends on the specific issue: the problem must be formulated with adequate functions and models to resolve the performance issues at hand.

In addition, CS analysis and design offers methods such as dynamic programming, divide and conquer, greedy algorithms and amortization. These are popular techniques for addressing performance from subproblems up to the global problem. However, to achieve end-to-end performance gains such as network tuning, the global optimum may be the main concern rather than local optima. Queuing theory has also been widely adopted for traditional SMP-based performance management and capacity planning.
Core-based parallelism and pipelining introduce many new issues down
the road. Does queuing theory still work well for the parallelism paradigm?
If not, what is the optimization? If yes, what is the proper queue
partitioning?


In general, quantitative methods should be the main theme of the analysis
and evaluation. It is hard to generalize as a whole; the methods are specific to
the target problem formulations.

P4 Annotation For Security Compliance Auditing

p4 annotate is the way to discover the code changes per version and who submitted each change.
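A sketch of that audit trail (the depot path and change number are placeholders):

```shell
p4 annotate -c //depot/main/src/foo.c   # prefix each line with the changelist that introduced it
p4 filelog //depot/main/src/foo.c       # map those changelists to submitting users and dates
p4 describe -s 1234                     # who submitted change 1234, and its description
```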

Friday, July 13, 2007

P4 job

(1) The P4 job specification can be customized.
(2) Create a p4 job with the above specification.
(3) Look up jobs assigned to specific developers.
(4) The developer edits the sources and submits the changelist.
(5) The developer runs "p4 fix" to associate the submitted change number with the job, ensuring the job moves to the closed state.
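The steps above can be sketched as follows (user, job and change numbers are placeholders):

```shell
p4 job -o newjob | p4 job -i            # (1)(2) create a job from the (customized) spec
p4 jobs -e "user=alice status=open"     # (3) open jobs assigned to a developer
p4 submit                               # (4) submit the edited changelist
p4 fix -c 1234 job000042                # (5) link change 1234 to the job; the job closes
```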

P4 Labeling

Using labels is not encouraged; prefer changelists.

P4 branch

(1) A branch can be created with "integrate" from the "From Files" to the "To Files", followed by "submit". However, a branch spec is the best practice for creating branches, because with branch specs, "integrate -b branchname -r" allows population in both directions.
(2) Any working branch can then propagate changes with "integrate" followed by "submit" again. All conflicts will be reported during the submit phase.
(3) The resolve action is taken by the user.
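A branch-spec sketch of the above (the branch name is a placeholder):

```shell
p4 branch mybranch                  # define the From/To view in the spec editor
p4 integrate -b mybranch            # populate the target branch from the source
p4 submit -d "Create branch"
p4 integrate -b mybranch -r         # -r reverses direction for back-merges
p4 resolve -am                      # auto-accept safe merges; conflicts are left to the user
p4 submit -d "Integrate back"
```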

HTTP/S for Performance Management Analysis/Report

First, it is common engineering practice in the system
management space for an agent to collect data for the
analysis and reporting layers; all the industrial players do this.
In addition, a three-tier performance management
architecture is considered the best practice for scaling
to large data center performance management. It can be
extended further with event collaboration for performance
management across data centers.

Second, HTTPS is a security compliance requirement.
It is part of compliance practice.

The only limitation is that this comes from the host and
server management domain. For managing small devices such
as fans or power controllers, the only methods today are
SNMP or SNMP/S.

Crypto on T2000

The crypto framework supports the SCA6000 crypto provider for all non-global zones, if it is configured. Changes to the crypto framework configuration can only be made from the global zone; you can list the providers from within a non-global zone but cannot change them.
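For example, listing providers works in any zone, while administration is global-zone-only:

```shell
cryptoadm list        # list installed providers (works in any zone)
cryptoadm list -p     # show the policy (enabled mechanisms) per provider
# enabling/disabling a provider succeeds only from the global zone
```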

Thursday, July 05, 2007

Solaris Resource Management

With resource controls at the process level, you pass in a
pid, which only exists at run time. So we create a project
and assign the "progress" user to it, in order to modify
resource controls on a per-project basis without knowing
the pid. Of course, I want to try pids too.


(1) Create a project
# projadd -U progress -p 8888 openedge
(2) Add a comment to the project
# projmod -c "Project for resource control on the openedge database" openedge
(3) List the projects created
# project -l
ksh: project: not found
# projects -l
system (System built-in project, project id 0)
projid : 0
comment: ""
users : (none)
groups : (none)
attribs:
user.root (System built-in project, project id 1)
projid : 1
comment: ""
users : (none)
groups : (none)
attribs:
noproject (System built-in project, project id 2)
projid : 2
comment: ""
users : (none)
groups : (none)
attribs:
default (System built-in project, project id 3)
projid : 3
comment: ""
users : (none)
groups : (none)
attribs:
group.staff (System built-in project, project id 10)
projid : 10
comment: ""
users : (none)
groups : (none)
attribs:
openedge (Project we created with designated id 8888)
projid : 8888
comment: "It is project for resource control on openedge database"
users : progress
groups : (none)
attribs:
(4) Check project membership


id -p

# id -p
uid=0(root) gid=0(root) projid=1(user.root)

You can see that root belongs to built-in project id 1, which is user.root.

# prstat -J
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
707 noaccess 222M 133M sleep 59 0 0:01:47 0.0% java/55
1025 root 4776K 4232K cpu8 59 0 0:00:00 0.0% prstat/1
118 root 5216K 4728K sleep 59 0 0:00:02 0.0% nscd/26
117 root 4640K 4008K sleep 59 0 0:00:00 0.0% picld/4
125 root 2592K 2080K sleep 59 0 0:00:00 0.0% syseventd/14
258 daemon 2752K 2432K sleep 59 0 0:00:00 0.0% statd/1
371 root 4856K 1672K sleep 59 0 0:00:00 0.0% automountd/2
97 root 2552K 2176K sleep 59 0 0:00:00 0.0% snmpdx/1
55 root 9160K 7528K sleep 59 0 0:00:01 0.0% snmpd/1
308 root 2080K 1224K sleep 59 0 0:00:00 0.0% smcboot/1
259 daemon 2432K 2136K sleep 60 -20 0:00:00 0.0% nfs4cbd/2
249 root 2728K 1632K sleep 59 0 0:00:00 0.0% cron/1
9 root 11M 10M sleep 59 0 0:00:19 0.0% svc.configd/17
7 root 19M 17M sleep 59 0 0:00:08 0.0% svc.startd/12
136 daemon 4680K 3528K sleep 59 0 0:00:00 0.0% kcfd/5
PROJID NPROC SIZE RSS MEMORY TIME CPU PROJECT
1 5 10M 9488K 0.0% 0:00:00 0.0% user.root
3 1 1376K 1280K 0.0% 0:00:00 0.0% default
0 37 390M 257M 0.7% 0:02:21 0.0% system


Total: 43 processes, 218 lwps, load averages: 0.01, 0.01, 0.01

# id -p root
uid=0(root) gid=0(root) projid=1(user.root)
# id -p daemon
uid=1(daemon) gid=1(other) projid=3(default)
# id -p noaccess
uid=60002(noaccess) gid=60002(noaccess) projid=3(default)


svcadm enable svc:/system/pools:default (resource pools framework)
svcadm enable svc:/system/pools/dynamic:default (dynamic resource pools, DRP)

Check whether the pools service and the dynamic pools service are enabled:

# svcs *pool*
STATE STIME FMRI
online 11:10:01 svc:/system/pools:default
online 11:11:05 svc:/system/pools/dynamic:default

Shared Memory settings in Solaris 10 or later

(1) If it is a shared memory issue, it may not be zone-specific but S10-specific. Editing
/etc/system is no longer good practice; follow the S10 resource control practices instead.
(2) Use prctl for project-based or even process-based control. However, you need to create a
project and assign the user running the process to it. I have hit
some bugs before when doing this assignment; the workaround is to log in as the user and then su to
root to assign the project.
(3) projmod -s -K "project.max-shm-memory=(privileged,8GB,deny)" xxx

You may need to run it in the global zone first before moving into a local zone.
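Putting (2) and (3) together for a project named openedge (the project name and the 8 GB cap are placeholders):

```shell
projmod -s -K "project.max-shm-memory=(privileged,8gb,deny)" openedge
prctl -n project.max-shm-memory -i project openedge   # verify the active cap
```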

Core and LDOM Performance Management

Core- and LDOM-based system operation introduces parametric
modeling and approximation optimization problems, shifting
from traditional execution time to throughput, IPC, parallelism
and pipelining. This applies to OS modeling and
performance management, and it impacts predictive
monitoring, analysis and reporting. This has been my
engineering interest since I started working in the
system management and integration space.

Tuesday, July 03, 2007

From the CMT perspective, our T1, AMD and Intel x86 platforms do introduce variance from traditional
SMP platforms. First, from the hardware point of view, the physical processor structure has changed to be
core-based. Second, from the Solaris point of view, the kernel CPU structures, CPC counters, cputrack, cpustat
and kstat have changed (including all core-related changes). Third, from the system management point of view,
the SNMP MIB-II database structure has changed. This will impact current system management parametric model learning and system management view design.
From the LDOM perspective, virtualization-based predictive modeling and reporting design needs to
be enhanced to predict and measure physical resources, kernel CPU structures, CPC counters and MIB
structures.

In general, for system management products to be early CMT and LDOM adopters, they may need some level of support from Sun. This is just a proactive assessment.