statcpp CLI Command Reference
Basic syntax:
statcpp <category> <command> [options] [file]
statcpp <shortcut> [options] [file]
Input methods:
# 1. Specify a CSV file
statcpp desc mean test/e2e/data/basic.csv --col value
# 2. Pipe from stdin (vertical, one value per line)
cat test/e2e/data/basic.csv | statcpp desc mean --col value
# 3. --row: Pass inline values directly (comma/space-separated)
echo '1,2,3,4,5' | statcpp desc mean --noheader --col 1 --row
echo '1 2 3 4 5' | statcpp desc mean --noheader --col 1 --row
| Input Method | File Argument | Additional Flags | Data Format |
|---|---|---|---|
| CSV file | Required | Not required | Standard CSV/TSV |
| stdin pipe | Not required | Not required | Vertical (one value per line) |
| `--row` | Not required | `--noheader --col 1 --row` | Horizontal (comma/space-separated) |
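
As a rough illustration of what `--row` implies, the sketch below splits one horizontal line on commas or whitespace and treats the result as a single column. It is written in Python purely for illustration: `parse_row` is a hypothetical helper, not part of statcpp, and the tool's real tokenizer may differ (for example in how it treats empty fields).

```python
import re

def parse_row(line: str) -> list[float]:
    """Split one comma- or space-separated line into a single column of floats."""
    # Assumption: any run of commas and/or whitespace acts as one separator.
    tokens = [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]
    return [float(tok) for tok in tokens]

if __name__ == "__main__":
    for text in ("1,2,3,4,5", "1 2 3 4 5"):
        column = parse_row(text)
        print(column, "mean =", sum(column) / len(column))
```

Both input styles from the examples above yield the same five-value column.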
Common Options
These options are available for all commands.
| Flag | Default | Description |
|---|---|---|
| `--delimiter <char>` | Auto-detect | CSV delimiter character |
| `--header` / `--noheader` | `--header` | Whether a header row is present |
| `--na <strings>` | `NA,NaN,nan,N/A,n/a` | Strings treated as missing values |
| `--skip_na` / `--noskip_na` | `--skip_na` | Exclude missing values from calculations |
| `--fail_na` | false | Exit with error if missing values are present |
| `--presorted` | false | Data is already sorted (skip sorting) |
| `--row` | false | Treat horizontal data as a single column (comma/space-separated) |
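
The interaction of `--na`, `--skip_na` and `--fail_na` can be pictured with the following Python sketch. It only mirrors the table's descriptions: `load_column` is a hypothetical function, and two behaviours are assumptions rather than documented facts, namely that `--fail_na` takes precedence over `--skip_na` and that `--noskip_na` keeps missing cells as NaN.

```python
import math

# Default missing-value markers, copied from the --na row of the table.
DEFAULT_NA = {"NA", "NaN", "nan", "N/A", "n/a"}

def load_column(cells, na=DEFAULT_NA, skip_na=True, fail_na=False):
    """Convert raw text cells to floats, handling missing-value markers."""
    values = []
    for cell in cells:
        text = cell.strip()
        if text in na:
            if fail_na:  # assumed to win over skip_na
                raise ValueError(f"missing value found: {text!r}")
            if skip_na:
                continue  # drop the cell before any calculation
            values.append(math.nan)  # assumption: kept as NaN with --noskip_na
        else:
            values.append(float(text))
    return values

if __name__ == "__main__":
    print(load_column(["1", "NA", "3"]))
```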
Output Control
| Flag | Default | Description |
|---|---|---|
| `--json` | false | Output in JSON format |
| `--quiet` | false | Output numbers only (for piping) |
Statistics Common
| Flag | Default | Description |
|---|---|---|
| `--alpha` | 0.05 | Significance level |
| `--level` | 0.95 | Confidence level |
| `--population` | false | Population statistics (ddof=0) |
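
To make the ddof distinction behind `--population` concrete, here is a small Python comparison using the standard `statistics` module. The data values are made up for the illustration; the point is only the divisor, n - 1 for the sample statistic versus n for the population statistic.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, sum of squared deviations = 32

sample_var = statistics.variance(data)       # divides by n - 1 (ddof=1)
population_var = statistics.pvariance(data)  # divides by n     (ddof=0)

if __name__ == "__main__":
    print("sample variance:    ", sample_var)      # 32 / 7
    print("population variance:", population_var)  # 32 / 8
```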
desc - Descriptive Statistics
Computes descriptive statistics for a single column of numeric data.
| Subcommand | `--col` | Additional Options | Description |
|---|---|---|---|
| `summary` | 1 column | `--presorted` | Summary statistics (count, mean, SD, five-number summary, skewness, kurtosis) |
| `mean` | 1 column |  | Arithmetic mean |
| `median` | 1 column | `--presorted` | Median |
| `mode` | 1 column |  | Mode (displays all if multiple) |
| `var` | 1 column | `--population` | Variance (default: sample variance) |
| `sd` | 1 column | `--population` | Standard deviation |
| `range` | 1 column |  | Range (max - min) |
| `iqr` | 1 column | `--presorted` | Interquartile range |
| `cv` | 1 column |  | Coefficient of variation (SD / Mean) |
| `skewness` | 1 column |  | Skewness |
| `kurtosis` | 1 column |  | Excess kurtosis (0 for normal distribution) |
| `percentile` | 1 column | `--p` (required), `--presorted` | Percentile value |
| `quartiles` | 1 column | `--presorted` | Quartiles (Q1, Q2, Q3) |
| `five-number` | 1 column | `--presorted` | Five-number summary |
| `gmean` | 1 column |  | Geometric mean |
| `hmean` | 1 column |  | Harmonic mean |
| `trimmed-mean` | 1 column | `--trim` (default: 0.1) | Trimmed mean (removes upper and lower extremes) |
statcpp desc summary test/e2e/data/basic.csv --col value
statcpp desc mean test/e2e/data/basic.csv --col value
statcpp desc median test/e2e/data/basic.csv --col value
statcpp desc mode test/e2e/data/basic.csv --col value
statcpp desc var test/e2e/data/basic.csv --col score --population
statcpp desc sd test/e2e/data/basic.csv --col score
statcpp desc range test/e2e/data/basic.csv --col value
statcpp desc iqr test/e2e/data/basic.csv --col value
statcpp desc cv test/e2e/data/basic.csv --col value
statcpp desc skewness test/e2e/data/basic.csv --col value
statcpp desc kurtosis test/e2e/data/basic.csv --col value
statcpp desc percentile test/e2e/data/basic.csv --col value --p 0.95
statcpp desc quartiles test/e2e/data/basic.csv --col value
statcpp desc five-number test/e2e/data/basic.csv --col value
statcpp desc gmean test/e2e/data/basic.csv --col value
statcpp desc hmean test/e2e/data/basic.csv --col value
statcpp desc trimmed-mean test/e2e/data/basic.csv --col value --trim 0.1
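
For readers unfamiliar with the trimmed mean, the Python sketch below shows one common definition: sort the data, drop a fraction `trim` from each tail, and average the rest. How statcpp rounds the number of dropped values is not stated here, so the `floor(n * trim)` rule is an assumption, and `trimmed_mean` is a hypothetical name.

```python
import math

def trimmed_mean(values, trim=0.1):
    """Mean after removing a fraction `trim` of the values from each tail."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    xs = sorted(values)
    k = math.floor(len(xs) * trim)  # assumption: values dropped per tail
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(sum(data) / len(data))    # plain mean, pulled up by the extreme value
    print(trimmed_mean(data, 0.1))  # drops 1 and 100 before averaging
```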
test - Statistical Tests
You can specify --alternative (two-sided / less / greater) and --alpha (default 0.05).
| Subcommand | `--col` | Additional Required Options | Additional Optional Options | Description |
|---|---|---|---|---|
| `t` | 1 or 2 columns |  | `--mu0`, `--paired`, `--alternative` | t-test (auto-detected by number of columns: 1 column = one-sample, 2 columns = two-sample) |
| `welch` | 2 columns |  | `--alternative` | Welch's t-test |
| `z` | 1 column | `--sigma` | `--mu0`, `--alternative` | z-test (requires known population standard deviation) |
| `f` | 2 columns |  | `--alternative` | F-test (test for equality of variances) |
| `shapiro` | 1 column |  |  | Shapiro-Wilk normality test |
| `ks` | 1 column |  |  | Lilliefors normality test (a Kolmogorov-Smirnov test with parameters estimated from the data) |
| `mann-whitney` | 2 columns |  | `--alternative` | Mann-Whitney U test |
| `wilcoxon` | 1 or 2 columns |  | `--mu0`, `--alternative` | Wilcoxon signed-rank test (1 column = one-sample, 2 columns = paired) |
| `kruskal` | 2+ columns |  |  | Kruskal-Wallis test |
| `levene` | 2+ columns |  |  | Levene's test (homogeneity of variances) |
| `bartlett` | 2+ columns |  |  | Bartlett's test (homogeneity of variances) |
| `chisq` | 1 or 2 columns |  |  | Chi-squared test (1 column = goodness-of-fit, 2 columns = observed/expected) |
statcpp test t test/e2e/data/two_groups.csv --col group1 --mu0 25
statcpp test t test/e2e/data/two_groups.csv --col group1,group2
statcpp test t test/e2e/data/two_groups.csv --col group1,group2 --paired
statcpp test welch test/e2e/data/two_groups.csv --col group1,group2
statcpp test z test/e2e/data/two_groups.csv --col group1 --mu0 25 --sigma 3
statcpp test f test/e2e/data/two_groups.csv --col group1,group2
statcpp test shapiro test/e2e/data/two_groups.csv --col group1
statcpp test ks test/e2e/data/two_groups.csv --col group1
statcpp test mann-whitney test/e2e/data/two_groups.csv --col group1,group2
statcpp test wilcoxon test/e2e/data/two_groups.csv --col group1 --mu0 25
statcpp test wilcoxon test/e2e/data/two_groups.csv --col group1,group2
statcpp test kruskal test/e2e/data/scores.csv --col math,science,english
statcpp test levene test/e2e/data/scores.csv --col math,science,english
statcpp test bartlett test/e2e/data/scores.csv --col math,science,english
statcpp test chisq test/e2e/data/basic.csv --col value
statcpp test chisq test/e2e/data/two_groups.csv --col group1,group2
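
The column-count dispatch described for `test t` can be sketched in Python as follows. This is an illustration of the textbook formulas, not statcpp's code: `t_statistic` is a hypothetical function, the default `mu0 = 0` is an assumption, and the two-sample branch uses the pooled-variance (Student) form on the assumption that unequal variances are left to `welch`. Only the statistic and degrees of freedom are computed; the p-value needs a t distribution, which the Python standard library does not provide.

```python
import math
import statistics

def t_statistic(columns, mu0=0.0, paired=False):
    """Return (t, degrees of freedom) for one or two columns of data."""
    if len(columns) == 1 or (len(columns) == 2 and paired):
        if len(columns) == 2:
            # Paired design: reduce to a one-sample test on the differences.
            x = [a - b for a, b in zip(*columns)]
        else:
            x = columns[0]
        n = len(x)
        se = statistics.stdev(x) / math.sqrt(n)
        return (statistics.mean(x) - mu0) / se, n - 1
    if len(columns) == 2:
        a, b = columns
        na, nb = len(a), len(b)
        # Pooled variance (assumed): weighted average of the two sample variances.
        pooled = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
        return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2
    raise ValueError("expected 1 or 2 columns")

if __name__ == "__main__":
    print(t_statistic([[1, 2, 3, 4, 5]], mu0=3))          # one-sample
    print(t_statistic([[1, 2, 3], [2, 3, 4]]))            # two-sample
    print(t_statistic([[1, 2, 4], [2, 2, 3]], paired=True))
```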
corr - Correlation & Covariance
| Subcommand | `--col` | Additional Options | Description |
|---|---|---|---|
| `pearson` | 2 columns |  | Pearson correlation coefficient |
| `spearman` | 2 columns |  | Spearman rank correlation coefficient |
| `kendall` | 2 columns |  | Kendall rank correlation coefficient |
| `cov` | 2 columns | `--population` | Covariance |
| `matrix` | 2+ columns |  | Correlation matrix |
statcpp corr pearson test/e2e/data/two_groups.csv --col group1,group2
statcpp corr spearman test/e2e/data/two_groups.csv --col group1,group2
statcpp corr kendall test/e2e/data/two_groups.csv --col group1,group2
statcpp corr cov test/e2e/data/two_groups.csv --col group1,group2
statcpp corr matrix test/e2e/data/scores.csv --col math,science,english
effect - Effect Size
| Subcommand | `--col` | Additional Required Options | Additional Optional Options | Description |
|---|---|---|---|---|
| `cohens-d` | 1 or 2 columns |  | `--mu0` | Cohen's d (1 column = one-sample, 2 columns = two-sample) |
| `hedges-g` | 1 or 2 columns |  | `--mu0` | Hedges' g (with small-sample correction) |
| `glass-delta` | 2 columns |  |  | Glass's delta (standardized by control group SD) |
| `cohens-h` | Not required (no CSV input) | `--p1`, `--p2` |  | Cohen's h (effect size for proportions) |
| `odds-ratio` | 1 column (4 values) |  |  | Odds ratio (4 values: a, b, c, d) |
| `risk-ratio` | 1 column (4 values) |  |  | Risk ratio (4 values: a, b, c, d) |
statcpp effect cohens-d test/e2e/data/two_groups.csv --col group1 --mu0 25
statcpp effect cohens-d test/e2e/data/two_groups.csv --col group1,group2
statcpp effect hedges-g test/e2e/data/two_groups.csv --col group1,group2
statcpp effect glass-delta test/e2e/data/two_groups.csv --col group1,group2
statcpp effect cohens-h --p1 0.6 --p2 0.4
statcpp effect odds-ratio test/e2e/data/contingency.csv --col value
statcpp effect risk-ratio test/e2e/data/contingency.csv --col value
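
The three table-free effect sizes have short closed forms, sketched below in Python. The formulas are the standard textbook ones; what is assumed is the meaning of the four values, taken here as a 2x2 table with `a`, `b` = events and non-events in the first group and `c`, `d` = events and non-events in the second. Check your data layout before relying on this reading.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def odds_ratio(a, b, c, d):
    """Odds of the event in group 1 divided by the odds in group 2."""
    return (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    """Event probability in group 1 divided by event probability in group 2."""
    return (a / (a + b)) / (c / (c + d))

if __name__ == "__main__":
    print(round(cohens_h(0.6, 0.4), 4))  # same proportions as the cohens-h example
    print(odds_ratio(10, 20, 5, 40))
    print(risk_ratio(10, 20, 5, 40))
```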
ci - Confidence Intervals
Specify the confidence level with --level (default 0.95). ci prop and ci sample-size do not require CSV input.
| Subcommand | `--col` | Additional Required Options | Additional Optional Options | Description |
|---|---|---|---|---|
| `mean` | 1 column |  | `--sigma` | Confidence interval for the mean (z-based if `--sigma` is specified, t-based otherwise) |
| `diff` | 2 columns |  |  | Confidence interval for the difference of means |
| `prop` | Not required | `--successes`, `--trials` |  | Confidence interval for a proportion (Wilson) |
| `var` | 1 column |  |  | Confidence interval for the variance |
| `sample-size` | Not required | `--moe` | `--sigma` | Required sample size (for means if `--sigma` is specified, for proportions otherwise) |
statcpp ci mean test/e2e/data/two_groups.csv --col group1
statcpp ci mean test/e2e/data/two_groups.csv --col group1 --level 0.99
statcpp ci mean test/e2e/data/two_groups.csv --col group1 --sigma 3
statcpp ci diff test/e2e/data/two_groups.csv --col group1,group2
statcpp ci prop --successes 45 --trials 100
statcpp ci var test/e2e/data/two_groups.csv --col group1
statcpp ci sample-size --moe 0.03
statcpp ci sample-size --moe 5 --sigma 20
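
The Wilson interval and the two sample-size modes can be reproduced with a few lines of Python, shown here as a sketch of the standard formulas rather than statcpp's implementation. The function names are hypothetical, and the proportion case assumes the conservative worst case p = 0.5, which this reference does not confirm.

```python
import math
from statistics import NormalDist

def z_critical(level=0.95):
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def wilson_interval(successes, trials, level=0.95):
    """Wilson score interval for a binomial proportion."""
    z = z_critical(level)
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return center - half, center + half

def sample_size(moe, sigma=None, level=0.95):
    """Smallest n whose margin of error does not exceed `moe`."""
    z = z_critical(level)
    if sigma is not None:
        return math.ceil((z * sigma / moe) ** 2)  # mean with known sigma
    return math.ceil(z * z * 0.25 / moe ** 2)     # proportion, assuming p = 0.5

if __name__ == "__main__":
    print(wilson_interval(45, 100))
    print(sample_size(0.03))
    print(sample_size(5, sigma=20))
```

The inputs match the `ci prop` and `ci sample-size` examples above, so the results can be compared against the tool's output.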
reg - Regression Analysis
The last column in --col is the response variable; the remaining columns are predictor variables. The exception is vif, which takes predictor columns only.
| Subcommand | `--col` | Description |
|---|---|---|
| `simple` | 2 columns (x, y) | Simple linear regression |
| `multiple` | 3+ columns (x1,...,xp, y) | Multiple linear regression |
| `predict` | 2 columns (x, y) | Predicted values |
| `residuals` | 2 columns (x, y) | Residual diagnostics |
| `vif` | 2+ columns (predictors only) | Variance inflation factor (multicollinearity diagnostics) |
statcpp reg simple test/e2e/data/two_groups.csv --col group1,group2
statcpp reg multiple test/e2e/data/scores.csv --col math,science,english
statcpp reg predict test/e2e/data/two_groups.csv --col group1,group2
statcpp reg residuals test/e2e/data/two_groups.csv --col group1,group2
statcpp reg vif test/e2e/data/two_groups.csv --col group1,group2
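
The "last column is the response" convention is easy to get backwards, so here is a minimal Python sketch of how a column list could be split and fed to ordinary least squares for the simple (one predictor) case. `simple_regression` is a hypothetical name and the data are invented; statcpp's output contains more than the two coefficients computed here.

```python
def simple_regression(columns):
    """Fit y = intercept + slope * x, where y is the LAST column given."""
    *predictors, y = columns  # last column = response, the rest = predictors
    if len(predictors) != 1:
        raise ValueError("simple regression needs exactly one predictor")
    x = predictors[0]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

if __name__ == "__main__":
    group1 = [1, 2, 3, 4]  # plays the role of x
    group2 = [3, 5, 7, 9]  # plays the role of y (listed last)
    print(simple_regression([group1, group2]))
```

Swapping the order of the two columns regresses x on y instead, which generally gives a different line.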
anova - Analysis of Variance
Each column is treated as one group.
| Subcommand | `--col` | Description |
|---|---|---|
| `oneway` | 2+ columns | One-way ANOVA |
| `posthoc-tukey` | 2+ columns | Tukey HSD post hoc test |
| `posthoc-bonferroni` | 2+ columns | Bonferroni post hoc test |
| `posthoc-scheffe` | 2+ columns | Scheffe post hoc test |
| `eta-squared` | 2+ columns | Effect size (eta-squared, omega-squared, Cohen's f) |
statcpp anova oneway test/e2e/data/scores.csv --col math,science,english
statcpp anova posthoc-tukey test/e2e/data/scores.csv --col math,science,english
statcpp anova posthoc-bonferroni test/e2e/data/scores.csv --col math,science,english
statcpp anova posthoc-scheffe test/e2e/data/scores.csv --col math,science,english
statcpp anova eta-squared test/e2e/data/scores.csv --col math,science,english
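
Because each column is one group, the one-way ANOVA decomposition can be written directly over a list of columns. The Python sketch below computes the F statistic, its degrees of freedom and eta-squared from the usual between/within sums of squares. It is an illustration with invented data and a hypothetical function name, and it stops short of the p-value, which needs the F distribution.

```python
def oneway_anova(groups):
    """Return (F, (df_between, df_within), eta_squared) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    eta_squared = ss_between / (ss_between + ss_within)
    return f, (k - 1, n_total - k), eta_squared

if __name__ == "__main__":
    columns = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # three groups, one per column
    print(oneway_anova(columns))
```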
resample - Resampling
Output depends on random numbers, so results will vary between runs.
| Subcommand | `--col` | Additional Options | Description |
|---|---|---|---|
| `bootstrap-mean` | 1 column |  | Bootstrap confidence interval for the mean |
| `bootstrap-median` | 1 column |  | Bootstrap confidence interval for the median |
| `bootstrap-sd` | 1 column |  | Bootstrap confidence interval for the standard deviation |
| `bca` | 1 column |  | BCa method (bias-corrected and accelerated bootstrap) |
| `permtest` | 2 columns | `--paired` | Permutation test (independent two-sample / paired) |
| `permtest-corr` | 2 columns |  | Permutation test for correlation |
statcpp resample bootstrap-mean test/e2e/data/basic.csv --col value
statcpp resample bootstrap-median test/e2e/data/basic.csv --col value
statcpp resample bootstrap-sd test/e2e/data/basic.csv --col value
statcpp resample bca test/e2e/data/basic.csv --col value
statcpp resample permtest test/e2e/data/two_groups.csv --col group1,group2
statcpp resample permtest test/e2e/data/two_groups.csv --col group1,group2 --paired
statcpp resample permtest-corr test/e2e/data/two_groups.csv --col group1,group2
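
To show why resampling output varies between runs, here is a Python sketch of a percentile bootstrap interval for the mean. Everything specific in it is an assumption: the percentile method, the 2000 resamples, and the `seed` parameter are choices made for this illustration, and this reference does not document a seeding option for statcpp.

```python
import random
import statistics

def bootstrap_mean_ci(data, level=0.95, n_resamples=2000, seed=None):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # seed=None gives a different stream on every run
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_resamples)
    )
    alpha = 1 - level
    lower = means[int(n_resamples * alpha / 2)]
    upper = means[min(n_resamples - 1, int(n_resamples * (1 - alpha / 2)))]
    return lower, upper

if __name__ == "__main__":
    values = [2, 4, 4, 4, 5, 5, 7, 9]
    print(bootstrap_mean_ci(values))           # changes from run to run
    print(bootstrap_mean_ci(values, seed=42))  # repeatable within this sketch
```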
ts - Time Series Analysis
| Subcommand | `--col` | Description |
|---|---|---|
| `acf` | 1 column | Autocorrelation function (max lag 20) |
| `pacf` | 1 column | Partial autocorrelation function |
| `ma` | 1 column | Moving average (window = 3) |
| `ema` | 1 column | Exponential moving average (alpha = 0.3) |
| `diff` | 1 column | Differencing (first-order) |
| `mae` | 2 columns (actual, predicted) | Mean absolute error |
| `rmse` | 2 columns (actual, predicted) | Root mean squared error |
| `mape` | 2 columns (actual, predicted) | Mean absolute percentage error |
statcpp ts acf test/e2e/data/two_groups.csv --col group1
statcpp ts pacf test/e2e/data/two_groups.csv --col group1
statcpp ts ma test/e2e/data/two_groups.csv --col group1
statcpp ts ema test/e2e/data/two_groups.csv --col group1
statcpp ts diff test/e2e/data/two_groups.csv --col group1
statcpp ts mae test/e2e/data/forecast.csv --col actual,predicted
statcpp ts rmse test/e2e/data/forecast.csv --col actual,predicted
statcpp ts mape test/e2e/data/forecast.csv --col actual,predicted
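
The smoothing and error commands correspond to short formulas, sketched below in Python with the window and alpha values from the table as defaults. Details that the table does not specify are assumptions: the moving average is taken over trailing complete windows, the EMA is seeded with the first observation, and MAPE is reported in percent.

```python
import math

def moving_average(x, window=3):
    """Mean of each complete window of consecutive values."""
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]

def ema(x, alpha=0.3):
    """Exponential moving average, seeded with the first value (assumption)."""
    out = [x[0]]
    for value in x[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out

def diff(x):
    """First-order differences x[t] - x[t-1]."""
    return [b - a for a, b in zip(x, x[1:])]

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error in percent; undefined if any actual is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

if __name__ == "__main__":
    series = [1, 2, 3, 4, 5]
    print(moving_average(series), ema(series), diff(series))
    print(mae([100, 200], [110, 180]), rmse([100, 200], [110, 180]), mape([100, 200], [110, 180]))
```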
robust - Robust Statistics
| Subcommand | `--col` | Additional Options | Description |
|---|---|---|---|
| `mad` | 1 column |  | Median absolute deviation (MAD) |
| `outliers` | 1 column |  | Outlier detection (IQR method) |
| `outliers-zscore` | 1 column |  | Outlier detection (Z-score method) |
| `outliers-modified` | 1 column |  | Outlier detection (modified Z-score / MAD-based) |
| `winsorize` | 1 column | `--trim` (default: 0.1) | Winsorization |
| `hodges-lehmann` | 1 column |  | Hodges-Lehmann estimator |
| `biweight` | 1 column |  | Biweight midvariance |
statcpp robust mad test/e2e/data/two_groups.csv --col group1
statcpp robust outliers test/e2e/data/basic.csv --col value
statcpp robust outliers-zscore test/e2e/data/basic.csv --col value
statcpp robust outliers-modified test/e2e/data/basic.csv --col value
statcpp robust winsorize test/e2e/data/basic.csv --col value --trim 0.05
statcpp robust hodges-lehmann test/e2e/data/two_groups.csv --col group1
statcpp robust biweight test/e2e/data/two_groups.csv --col group1
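
The MAD and the MAD-based modified Z-score are simple enough to sketch in Python. The constants used below (0.6745 and a cutoff of 3.5) follow the widely cited Iglewicz and Hoaglin convention; whether statcpp uses the same cutoff, or rescales the MAD, is not stated in this reference, so treat both as assumptions.

```python
import statistics

def mad(x):
    """Median absolute deviation from the median (unscaled)."""
    med = statistics.median(x)
    return statistics.median([abs(v - med) for v in x])

def modified_z_outliers(x, threshold=3.5):
    """Values whose modified Z-score exceeds the threshold in absolute value."""
    med = statistics.median(x)
    scale = mad(x)
    if scale == 0:
        return []  # scores are undefined when more than half the values are equal
    return [v for v in x if abs(0.6745 * (v - med) / scale) > threshold]

if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]
    print(mad(data))
    print(modified_z_outliers(data))
```

Unlike the plain Z-score, this score is built from medians, so a single extreme value barely shifts the center and scale it is judged against.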
survival - Survival Analysis
Event column: 1 = event occurred, 0 = censored.
| Subcommand | `--col` | Description |
|---|---|---|
| `kaplan-meier` | 2 columns (time, event) | Kaplan-Meier survival curve |
| `logrank` | 4 columns (time1, event1, time2, event2) | Log-rank test (two-group comparison) |
| `nelson-aalen` | 2 columns (time, event) | Nelson-Aalen cumulative hazard |
statcpp survival kaplan-meier test/e2e/data/survival.csv --col time,event
statcpp survival logrank test/e2e/data/survival_two.csv --col time1,event1,time2,event2
statcpp survival nelson-aalen test/e2e/data/survival.csv --col time,event
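
The event coding (1 = event, 0 = censored) matters because censored rows shrink the risk set without lowering the curve. The Python sketch below implements the textbook Kaplan-Meier product-limit estimate on invented data to show that effect. `kaplan_meier` is a hypothetical function, and statcpp's output format may differ.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Gather every subject whose time equals t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]  # event == 1 counts, event == 0 is censored
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored subjects leave the risk set too
    return curve

if __name__ == "__main__":
    time = [1, 2, 3, 4]
    event = [1, 0, 1, 1]  # the subject at time 2 is censored
    print(kaplan_meier(time, event))
```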
cluster - Clustering
Requires 2 or more columns of numeric data. Output may depend on random numbers.
| Subcommand | `--col` | Description |
|---|---|---|
| `kmeans` | 2+ columns | k-means clustering (k = 3) |
| `hierarchical` | 2+ columns | Hierarchical clustering |
| `silhouette` | 2+ columns | Silhouette analysis (k = 3) |
statcpp cluster kmeans test/e2e/data/scores.csv --col math,science,english
statcpp cluster hierarchical test/e2e/data/scores.csv --col math,science,english
statcpp cluster silhouette test/e2e/data/scores.csv --col math,science,english
multiple - Multiple Testing Correction
Specify a column of p-values.
| Subcommand | `--col` | Description |
|---|---|---|
| `bonferroni` | 1 column | Bonferroni correction |
| `holm` | 1 column | Holm-Bonferroni correction |
| `bh` | 1 column | Benjamini-Hochberg (FDR) correction |
statcpp multiple bonferroni test/e2e/data/pvalues.csv --col pvalue
statcpp multiple holm test/e2e/data/pvalues.csv --col pvalue
statcpp multiple bh test/e2e/data/pvalues.csv --col pvalue
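
The three corrections differ only in how each p-value is scaled by its rank. The Python sketch below computes adjusted p-values with the standard definitions, returned in the original input order. It assumes the tool reports adjusted p-values, which this reference does not spell out, and the function names and sample p-values are illustrative.

```python
def bonferroni(p):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(p)
    return [min(1.0, pi * m) for pi in p]

def holm(p):
    """Step-down Holm adjustment: the multiplier shrinks from m down to 1."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, min(1.0, (m - rank) * p[i]))  # enforce monotonicity
        adjusted[i] = running
    return adjusted

def benjamini_hochberg(p):
    """Step-up BH adjustment: p * m / rank, made monotone from the largest p down."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running = min(running, p[i] * m / (rank + 1))
        adjusted[i] = running
    return adjusted

if __name__ == "__main__":
    pvalues = [0.01, 0.04, 0.03, 0.005]
    for method in (bonferroni, holm, benjamini_hochberg):
        print(method.__name__, method(pvalues))
```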
power - Power Analysis
No CSV input required. If --n is specified, power is calculated; if omitted, the required sample size is calculated.
| Subcommand | `--col` | Required Options | Optional Options | Description |
|---|---|---|---|---|
| `t-one` | Not required | `--effect` | `--n`, `--power` (default: 0.8), `--alternative` | Power for one-sample t-test |
| `t-two` | Not required | `--effect` | `--n`, `--power`, `--ratio` (default: 1.0), `--alternative` | Power for two-sample t-test |
| `prop` | Not required | `--p1`, `--p2` | `--n`, `--power`, `--alternative` | Power for proportion test |
statcpp power t-one --effect 0.5 --n 30
statcpp power t-one --effect 0.5 --power 0.8
statcpp power t-two --effect 0.5 --power 0.8 --ratio 2.0
statcpp power prop --p1 0.3 --p2 0.5 --n 50
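
The "power if `--n` is given, sample size otherwise" behaviour can be mimicked with a normal approximation, sketched below in Python for the one-sample, two-sided case. This is deliberately not the exact calculation: a t-based power analysis uses the noncentral t distribution, so statcpp's numbers will typically be slightly different (the exact sample size is usually a little larger). `power_one_sample` is a hypothetical function.

```python
import math
from statistics import NormalDist

def power_one_sample(effect, n=None, power=0.8, alpha=0.05):
    """Normal-approximation power analysis for a two-sided one-sample test.

    Returns the power when n is given, otherwise the required sample size.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    if n is not None:
        # Power, ignoring the negligible rejection probability in the far tail.
        return norm.cdf(abs(effect) * math.sqrt(n) - z_alpha)
    z_beta = norm.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / abs(effect)) ** 2)

if __name__ == "__main__":
    print(power_one_sample(0.5, n=30))       # n given: returns power
    print(power_one_sample(0.5, power=0.8))  # n omitted: returns sample size
```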
glm - Generalized Linear Models
The last column in --col is the response variable.
| Subcommand | `--col` | Description |
|---|---|---|
| `logistic` | 2+ columns (x1,...,xp, y) | Logistic regression (y is binary 0/1) |
| `poisson` | 2+ columns (x1,...,xp, y) | Poisson regression (y is count data) |
statcpp glm logistic test/e2e/data/binary.csv --col x1,x2,y
statcpp glm poisson test/e2e/data/count.csv --col x1,x2,y
model - Model Selection
The last column in --col is the response variable. Requires 3 or more columns (at least 2 predictor variables).
| Subcommand | `--col` | Description |
|---|---|---|
| `aic` | 3+ columns | Model comparison by AIC |
| `cv` | 3+ columns | Cross-validation (5-fold) |
| `ridge` | 3+ columns | Ridge regression (lambda = 1.0) |
| `lasso` | 3+ columns | LASSO regression (lambda = 1.0) |
statcpp model aic test/e2e/data/scores.csv --col math,science,english
statcpp model cv test/e2e/data/scores.csv --col math,science,english
statcpp model ridge test/e2e/data/scores.csv --col math,science,english
statcpp model lasso test/e2e/data/scores.csv --col math,science,english
Shortcuts
Frequently used commands can be entered without the category. If the first argument matches a shortcut name, it is automatically expanded to the corresponding category and command.
# The following two are equivalent
statcpp mean data.csv --col value
statcpp desc mean data.csv --col value
| Shortcut | Expands To |
|---|---|
| `mean` | `desc mean` |
| `median` | `desc median` |
| `mode` | `desc mode` |
| `sd` | `desc sd` |
| `var` | `desc var` |
| `summary` | `desc summary` |
| `range` | `desc range` |
| `iqr` | `desc iqr` |
| `cv` | `desc cv` |
| `skewness` | `desc skewness` |
| `kurtosis` | `desc kurtosis` |
| `quartiles` | `desc quartiles` |
| `gmean` | `desc gmean` |
| `hmean` | `desc hmean` |
| `ttest` | `test t` |
| `pearson` | `corr pearson` |
| `spearman` | `corr spearman` |
| `kendall` | `corr kendall` |
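
The expansion rule can be modelled as a lookup on the first argument only. The Python sketch below rebuilds the table as a dictionary and rewrites an argument list accordingly. It is a model of the documented behaviour, not statcpp's source; `expand` is a hypothetical name.

```python
# Shortcuts that expand to "desc <name>", as listed in the table.
DESC_SHORTCUTS = [
    "mean", "median", "mode", "sd", "var", "summary", "range", "iqr",
    "cv", "skewness", "kurtosis", "quartiles", "gmean", "hmean",
]

SHORTCUTS = {name: ["desc", name] for name in DESC_SHORTCUTS}
SHORTCUTS.update({
    "ttest": ["test", "t"],
    "pearson": ["corr", "pearson"],
    "spearman": ["corr", "spearman"],
    "kendall": ["corr", "kendall"],
})

def expand(args):
    """Replace a leading shortcut with its category and command."""
    if args and args[0] in SHORTCUTS:
        return SHORTCUTS[args[0]] + list(args[1:])
    return list(args)  # already in <category> <command> form

if __name__ == "__main__":
    print(expand(["mean", "data.csv", "--col", "value"]))
    print(expand(["desc", "mean", "data.csv", "--col", "value"]))
```

Because only the first argument is checked, `cv` on its own resolves to `desc cv`, while `model cv` is left untouched.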