I have developed numerous algorithms for applied scientific applications. These include:
Machine learning for function approximation:
- Hermite orthogonal polynomial networks
- Radial basis function (RBF) networks
- Kernel regression
Machine learning for classification:
- k-nearest neighbor
- Naïve Bayes classifier based on discretized quantiles
- Linear discriminant analysis (LDA)
- Quadratic discriminant analysis (QDA)
- Fisher's discriminant analysis (FDA)
- Learning vector quantization (LVQ1, LVQ2, GLVQ, RLVQ, GRLVQ)
- Radial basis function (RBF) networks
- L1 soft-norm support vector machines (SVM) for classification
- Multilayer perceptron back-propagation artificial neural networks (ANN)
- Unconditional logistic regression classifier (LOG)
- Polytomous logistic regression classifier (PLOG)
- Particle swarm optimization (PSO)
- Genetic algorithms (GA)
- k-fold cross-validation (CV)
- Leave-one-out cross-validation (LOOCV)
- 0.632 bootstrap cross-validation
- Boosting
- Gini diversity index
- Information gain (entropy)
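As an illustration of the k-fold cross-validation entry above, here is a minimal Python sketch of how the index splits might be generated. It is not taken from the software described on this page; the function name k_fold_indices and the seed parameter are assumptions made for the example.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper for illustration only: n is the number of
    samples, k the number of folds, seed fixes the shuffle.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # randomize sample order once
    folds = [idx[i::k] for i in range(k)]     # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        # Training set is every fold except the held-out one.
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Setting k equal to n makes every fold a single sample, which corresponds to leave-one-out cross-validation.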
Grouping and dimensional reduction:
- Self organizing maps (SOM)
- Hierarchical cluster analysis (HCA)
- Crisp c-means cluster analysis (CCM)
- Fuzzy c-means cluster analysis (FCM)
- Principal component analysis (PCA)
Regression modeling:
- Iteratively reweighted least squares regression (IRLS)
- Linear regression
- Logistic regression (unconditional, conditional, and polytomous)
- Poisson regression (additive, multiplicative, power)
- Linear categorical regression (Grizzle-Starmer-Koch, Forthofer models)
- Non-linear regression with objective function solved through partial derivatives or finite differencing
Survival data analysis:
- Parametric regression (exponential, Gompertz, Weibull, etc.)
- Cox proportional hazards regression
- Kaplan-Meier survival analysis
- Double-decrement life tables for lifetime risk
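For readers unfamiliar with the Kaplan-Meier entry above, the product-limit estimate can be sketched in a few lines of Python. This is a generic textbook version, not the implementation used in the software described here; the function name kaplan_meier and the 1/0 event coding are assumptions.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if censored
    (hypothetical coding chosen for this sketch)
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # events and censorings both leave the risk set
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]     # subjects at time t are removed afterwards
    return curve
```

Subjects censored at a given time are counted as still at risk for events at that same time, which is the usual convention.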
Matrix operations:
- Inner product (kernel methods)
- Outer product
- Transpose
- Multiplication
- Non-symmetric eigenvalue problem with the QR algorithm
- Symmetric eigenvalue problem using Jacobi method for matrix inverse or eigenanalysis
- Singular value decomposition (SVD)
- Method of scoring with Jacobian matrix of first partial derivatives of objective function w.r.t. parameters
- Method of maximum likelihood with Fisher information matrix (Hessian) of first and second partial derivatives of log-likelihood function w.r.t. parameters and score vector of first partial derivatives of log-likelihood w.r.t. parameters
- Conjugate gradient
- Ascending and descending array sorting
- Hadamard matrices
- Jagged arrays
Hypothesis testing:
- 2-sample tests (t-test and Mann-Whitney U test, randomization versions available)
- k-sample tests (F-test one-way ANOVA and Kruskal-Wallis, randomization versions available)
- Chi-square contingency tables
- Incomplete Beta Function
- Gamma Function
- Incomplete Gamma Function
- Bonferroni and Sidak corrections for multiple testing
- Benjamini & Hochberg false discovery rates (FDR)
- Storey q-values for the positive false discovery rate (pFDR)
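To make the Benjamini & Hochberg entry above concrete, the step-up adjustment of p-values can be sketched in Python as follows. This is a generic version of the procedure and not the code behind the software described here; the function name benjamini_hochberg is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Hypothetical helper for illustration: each adjusted value is
    min over ranks j >= i of p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then declared significant at FDR level q when its adjusted p-value is at most q.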
Summary statistics:
- Mean
- Standard deviation (variance)
- Minimum
- Maximum
- Median
- Range
- Skewness
- Kurtosis
- Interquartile range (IQR)
- Confidence limits
- Histograms
- Covariance matrices
- Pearson correlation matrices
- Spearman correlation
Probability distribution simulation:
- Uniform
- Constant
- Normal
- Log-normal
- Triangle
- Binomial
- Poisson
- Chi-square
Databases:
- MS Access
- DAO Jet
- ADO
- SQL
- DataReader
Graphics:
- GDI+
- Histograms
- Boxplots
- Matrix scatter plots
- Line graph
- X-Y Scatter graphs
- Bar graphs
Object Oriented Programming (OOP):
- Class libraries
- Synchronous and asynchronous threading
- Delegates
- Invoke
Physics:
- Monte Carlo-based internal organ radiation dose from diagnostic x-ray
- Monte Carlo uncertainty analysis
- Quantum chromodynamics (QCD) formalism of intermittency with scaled factorial moments (SFM)
- Kernel density estimation (KDE)
- Cellular automata (Wolfram form)
Statistical genetics:
- Allele frequency
- Genotype frequency
- Population drift
- Hardy-Weinberg disequilibrium
- Linkage disequilibrium
- Pedigree management systems
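As a small illustration of the allele frequency and Hardy-Weinberg entries above, the following Python sketch handles a single biallelic locus. It is a textbook calculation, not the implementation in the software described here; the function name hardy_weinberg and the genotype-count arguments are assumptions, and it assumes both alleles are present in the sample.

```python
def hardy_weinberg(n_AA, n_Aa, n_aa):
    """Return (p, chi2, D) for genotype counts at a biallelic locus.

    p    : frequency of allele A
    chi2 : Pearson chi-square against Hardy-Weinberg expectations (1 d.f.)
    D    : disequilibrium coefficient, observed freq(AA) minus p^2
    Assumes 0 < p < 1 so that no expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)     # allele counting
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    D = n_AA / n - p * p
    return p, chi2, D
```

For example, counts of 30, 40, and 30 give p = 0.5, a chi-square of about 4, and D of about 0.05, indicating an excess of homozygotes.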
The CLUSFAVOR algorithm for cluster and principal component analysis of DNA microarray data was released for public distribution on June 19, 2001. CLUSFAVOR has a wide international user base. The recent release of ChipST2C (April 1, 2005) expands that user base and gives users capabilities far beyond those offered in most public domain DNA microarray analysis packages.
BioMedStat is a program with numerous capabilities, ranging from Kaplan-Meier survival analysis to multiplicative Poisson regression and polytomous logistic regression. All of the BioMedStat features listed on the home page have been programmed successfully; however, in the majority of cases the results have not been benchmarked against other software. For the results shown in the BioMedStat screenshots link, many of the cases used for development were based on published material (which used other software), and thus there was preliminary concordance with those data. Benchmarking is being performed with SPSS.