Skip to content

日本語

sqlite3StatisticalLibrary — Function Reference

Overview

This library is an extension module that makes statistical functions directly available from SQL using SQLite3's LOAD_EXTENSION feature.

This reference covers a total of 249 functions.

Category Count Description
Basic Aggregate Functions 24 Single-column aggregates: SELECT stat_xxx(col) FROM table
Parameterized Aggregate Functions 20 Aggregates with parameters: SELECT stat_xxx(col, param) FROM table
Two-Column Aggregate Functions 27 Two-column input aggregates: SELECT stat_xxx(col1, col2) FROM table
Window Functions 23 Full-scan window functions that return a value per row
Complex Aggregate Functions 32 Aggregate functions returning JSON results, two-sample tests, survival analysis, etc.
Scalar Functions — Test Helpers 40 DB-independent: distribution functions, special functions, proportion tests, multiple testing corrections, etc.
Scalar Functions — Distributions & Transformations 83 DB-independent: additional distribution functions, effect size conversions, power analysis, etc.
  • Basic Aggregate Functions through Two-Column Aggregate Functions (71 functions) are all aggregate functions and can be used in any context where SQLite3 standard aggregate functions are available, including GROUP BY, HAVING, and subqueries.
  • Window Functions (23 functions) are implemented as full-scan type and return one value per row.
  • Complex Aggregate Functions (32 functions) are aggregate functions (including those returning JSON results).
  • Scalar Functions (123 functions) compute results solely from their parameters.

Loading the Extension

Loading from sqlite3 CLI

-- .load command (recommended)
.load ./ext_funcs sqlite3_ext_funcs_init

-- load_extension() function
SELECT load_extension('./ext_funcs', 'sqlite3_ext_funcs_init');

Loading from C/C++ Programs

sqlite3_enable_load_extension(db, 1);
sqlite3_load_extension(db, "./ext_funcs.dylib",
                        "sqlite3_ext_funcs_init", &errmsg);

Note: The entry point name sqlite3_ext_funcs_init must be explicitly specified. SQLite's auto-detection removes _ characters.


Common Specifications

NULL Handling

  • Rows with NULL input values are ignored (following SQLite3 aggregate function conventions)
  • Returns NULL when all rows are NULL or the result set is empty
  • Returns NULL when the computation results in NaN or Inf
-- Data with NULL values
CREATE TABLE sample (val REAL);
INSERT INTO sample VALUES (1), (NULL), (3), (NULL), (5);

-- NULLs are ignored; computed over 3 values: 1, 3, 5
SELECT stat_mean(val) FROM sample;
-- → 3.0

-- All rows are NULL
SELECT stat_mean(NULL);
-- → NULL

-- Empty result set
SELECT stat_mean(val) FROM sample WHERE val > 100;
-- → NULL

Minimum Data Count Requirements

Some functions require a minimum number of data points. When insufficient data is provided, NULL is returned.

Function Minimum Data Count
stat_sample_variance, stat_sample_stddev, stat_se, stat_cv 2
stat_population_skewness, stat_skewness 3
stat_population_kurtosis, stat_kurtosis 4
Others 1

Using with GROUP BY

All aggregate functions can be used with GROUP BY.

SELECT category,
       stat_mean(score)   AS mean_score,
       stat_median(score) AS median_score,
       stat_sample_stddev(score) AS stddev_score
FROM exam_results
GROUP BY category;

Function List (Quick Reference)

Basic Aggregate Functions (24 functions)

Function Description Return Min n Details
stat_mean(col) Arithmetic mean REAL 1 Details
stat_median(col) Median REAL 1 Details
stat_mode(col) Mode (smallest value) REAL 1 Details
stat_geometric_mean(col) Geometric mean REAL 1 Details
stat_harmonic_mean(col) Harmonic mean REAL 1 Details
stat_range(col) Range REAL 1 Details
stat_var(col) Variance (population) REAL 1 Details
stat_population_variance(col) Population variance REAL 1 Details
stat_sample_variance(col) Sample variance (unbiased) REAL 2 Details
stat_stdev(col) Standard deviation (population) REAL 1 Details
stat_population_stddev(col) Population standard deviation REAL 1 Details
stat_sample_stddev(col) Sample standard deviation REAL 2 Details
stat_cv(col) Coefficient of variation REAL 2 Details
stat_iqr(col) Interquartile range REAL 1 Details
stat_mad_mean(col) Mean absolute deviation REAL 1 Details
stat_geometric_stddev(col) Geometric standard deviation REAL 1 Details
stat_population_skewness(col) Population skewness REAL 3 Details
stat_skewness(col) Sample skewness REAL 3 Details
stat_population_kurtosis(col) Population kurtosis (excess) REAL 4 Details
stat_kurtosis(col) Sample kurtosis (excess) REAL 4 Details
stat_se(col) Standard error REAL 2 Details
stat_mad(col) Median absolute deviation (MAD) REAL 1 Details
stat_mad_scaled(col) Scaled MAD REAL 1 Details
stat_hodges_lehmann(col) Hodges-Lehmann estimator REAL 1 Details

Parameterized Aggregate Functions (20 functions)

Function Description Return Min n Details
stat_trimmed_mean(col, proportion) Trimmed mean REAL 1 Details
stat_quartile(col) Quartiles (Q1, Q2, Q3) JSON 1 Details
stat_percentile(col, p) Percentile REAL 1 Details
stat_z_test(col, mu0, sigma) One-sample z-test JSON 1 Details
stat_t_test(col, mu0) One-sample t-test JSON 2 Details
stat_chisq_gof_uniform(col) Chi-square goodness-of-fit test JSON 2 Details
stat_shapiro_wilk(col) Shapiro-Wilk test JSON 3 Details
stat_ks_test(col) Lilliefors test (normality) JSON 2 Details
stat_wilcoxon(col, mu0) Wilcoxon signed-rank test JSON 2 Details
stat_ci_mean(col, confidence) Confidence interval for the mean (t) JSON 2 Details
stat_ci_mean_z(col, sigma, confidence) Confidence interval for the mean (z) JSON 2 Details
stat_ci_var(col, confidence) Confidence interval for variance JSON 2 Details
stat_moe_mean(col, confidence) Margin of error for the mean REAL 2 Details
stat_cohens_d(col, mu0) Cohen's d (one-sample) REAL 2 Details
stat_hedges_g(col, mu0) Hedges' g (one-sample) REAL 2 Details
stat_acf_lag(col, lag) Autocorrelation coefficient REAL lag+1 Details
stat_biweight_midvar(col, c) Biweight Midvariance REAL 1 Details
stat_bootstrap_mean(col, n) Bootstrap of the mean JSON 1 Details
stat_bootstrap_median(col, n) Bootstrap of the median JSON 1 Details
stat_bootstrap_stddev(col, n) Bootstrap of standard deviation JSON 2 Details

Two-Column Aggregate Functions (27 functions)

Function Description Return Min n Details
stat_population_covariance(x, y) Population covariance REAL 1 Details
stat_covariance(x, y) Sample covariance REAL 2 Details
stat_pearson_r(x, y) Pearson correlation coefficient REAL 2 Details
stat_spearman_r(x, y) Spearman rank correlation REAL 2 Details
stat_kendall_tau(x, y) Kendall rank correlation REAL 2 Details
stat_weighted_covariance(val, wt) Weighted covariance REAL 2 Details
stat_weighted_mean(val, wt) Weighted mean REAL 1 Details
stat_weighted_harmonic_mean(val, wt) Weighted harmonic mean REAL 1 Details
stat_weighted_variance(val, wt) Weighted variance REAL 2 Details
stat_weighted_stddev(val, wt) Weighted standard deviation REAL 2 Details
stat_weighted_median(val, wt) Weighted median REAL 1 Details
stat_weighted_percentile(val, wt, p) Weighted percentile REAL 1 Details
stat_simple_regression(x, y) Simple linear regression JSON 3 Details
stat_r_squared(actual, pred) Coefficient of determination R² REAL 2 Details
stat_adjusted_r_squared(actual, pred) Adjusted R² REAL 3 Details
stat_t_test_paired(x, y) Paired t-test JSON 2 Details
stat_chisq_gof(obs, exp) Chi-square goodness-of-fit test JSON 2 Details
stat_mae(actual, pred) Mean absolute error REAL 1 Details
stat_mse(actual, pred) Mean squared error REAL 1 Details
stat_rmse(actual, pred) RMSE REAL 1 Details
stat_mape(actual, pred) Mean absolute percentage error REAL 1 Details
stat_euclidean_dist(a, b) Euclidean distance REAL 1 Details
stat_manhattan_dist(a, b) Manhattan distance REAL 1 Details
stat_cosine_sim(a, b) Cosine similarity REAL 1 Details
stat_cosine_dist(a, b) Cosine distance REAL 1 Details
stat_minkowski_dist(a, b, p) Minkowski distance REAL 1 Details
stat_chebyshev_dist(a, b) Chebyshev distance REAL 1 Details

Window Functions (23 functions)

Function Description Return Details
stat_rolling_mean(col, window) Rolling mean REAL/row Details
stat_rolling_std(col, window) Rolling standard deviation REAL/row Details
stat_rolling_min(col, window) Rolling minimum REAL/row Details
stat_rolling_max(col, window) Rolling maximum REAL/row Details
stat_rolling_sum(col, window) Rolling sum REAL/row Details
stat_moving_avg(col, window) Simple moving average REAL/row Details
stat_ema(col, span) Exponential moving average REAL/row Details
stat_rank(col) Rank transformation REAL/row Details
stat_fillna_mean(col) Mean imputation REAL/row Details
stat_fillna_median(col) Median imputation REAL/row Details
stat_fillna_ffill(col) Forward fill REAL/row Details
stat_fillna_bfill(col) Backward fill REAL/row Details
stat_fillna_interp(col) Linear interpolation REAL/row Details
stat_label_encode(col) Label encoding REAL/row Details
stat_bin_width(col, n_bins) Equal-width binning REAL/row Details
stat_bin_freq(col, n_bins) Equal-frequency binning REAL/row Details
stat_lag(col, k) Lag REAL/row Details
stat_diff(col, order) Differencing REAL/row Details
stat_seasonal_diff(col, period) Seasonal differencing REAL/row Details
stat_outliers_iqr(col) Outlier detection (IQR) REAL/row Details
stat_outliers_zscore(col) Outlier detection (Z-score) REAL/row Details
stat_outliers_mzscore(col) Outlier detection (modified Z) REAL/row Details
stat_winsorize(col, pct) Winsorization REAL/row Details

Complex Aggregate Functions (32 functions)

Function Description Return Details
stat_modes(col) Modes (all) JSON Details
stat_five_number_summary(col) Five-number summary JSON Details
stat_frequency_table(col) Frequency table JSON Details
stat_frequency_count(col) Frequency count for each value JSON Details
stat_relative_frequency(col) Relative frequency JSON Details
stat_cumulative_frequency(col) Cumulative frequency JSON Details
stat_cumulative_relative_frequency(col) Cumulative relative frequency JSON Details
stat_t_test2(grp1, grp2) Two-sample t-test JSON Details
stat_t_test_welch(grp1, grp2) Welch t-test JSON Details
stat_chisq_independence(col1, col2) Chi-square test of independence JSON Details
stat_f_test(grp1, grp2) F-test JSON Details
stat_mann_whitney(grp1, grp2) Mann-Whitney U test JSON Details
stat_anova1(val, grp) One-way ANOVA JSON Details
stat_contingency_table(col1, col2) Contingency table JSON Details
stat_cohens_d2(grp1, grp2) Cohen's d (two-sample) REAL Details
stat_hedges_g2(grp1, grp2) Hedges' g (two-sample) REAL Details
stat_glass_delta(ctrl, trt) Glass's Delta REAL Details
stat_ci_mean_diff(grp1, grp2) CI for two-sample mean difference JSON Details
stat_ci_mean_diff_welch(grp1, grp2) Welch CI for mean difference JSON Details
stat_kaplan_meier(time, event) Kaplan-Meier survival curve JSON Details
stat_nelson_aalen(time, event) Nelson-Aalen cumulative hazard JSON Details
stat_logrank(time, event, grp) Log-rank test JSON Details
stat_bootstrap(col, n) General bootstrap JSON Details
stat_bootstrap_bca(col, n) BCa bootstrap JSON Details
stat_bootstrap_sample(col) Bootstrap sample JSON Details
stat_permutation_test2(grp1, grp2) Two-sample permutation test JSON Details
stat_permutation_paired(x, y) Paired permutation test JSON Details
stat_permutation_corr(x, y) Correlation permutation test JSON Details
stat_acf(col, max_lag) Autocorrelation function JSON Details
stat_pacf(col, max_lag) Partial autocorrelation function JSON Details
stat_sample_replace(col, n) Sampling with replacement JSON Details
stat_sample(col, n) Sampling without replacement JSON Details

Scalar Functions — Test Helpers (40 functions)

Function Description Return Details
stat_normal_pdf(x [,mu, sigma]) Normal distribution PDF REAL Details
stat_normal_cdf(x [,mu, sigma]) Normal distribution CDF REAL Details
stat_normal_quantile(p [,mu, sigma]) Normal distribution quantile REAL Details
stat_normal_rand([mu, sigma]) Normal distribution random variate REAL Details
stat_chisq_pdf(x, df) Chi-square distribution PDF REAL Details
stat_chisq_cdf(x, df) Chi-square distribution CDF REAL Details
stat_chisq_quantile(p, df) Chi-square distribution quantile REAL Details
stat_chisq_rand(df) Chi-square distribution random variate REAL Details
stat_t_pdf(x, df) t-distribution PDF REAL Details
stat_t_cdf(x, df) t-distribution CDF REAL Details
stat_t_quantile(p, df) t-distribution quantile REAL Details
stat_t_rand(df) t-distribution random variate REAL Details
stat_f_pdf(x, df1, df2) F-distribution PDF REAL Details
stat_f_cdf(x, df1, df2) F-distribution CDF REAL Details
stat_f_quantile(p, df1, df2) F-distribution quantile REAL Details
stat_f_rand(df1, df2) F-distribution random variate REAL Details
stat_betainc(a, b, x) Regularized incomplete beta function REAL Details
stat_betaincinv(a, b, p) Inverse incomplete beta function REAL Details
stat_norm_cdf(x) Standard normal CDF REAL Details
stat_norm_quantile(p) Standard normal inverse CDF REAL Details
stat_gammainc_lower(a, x) Lower incomplete gamma function REAL Details
stat_gammainc_upper(a, x) Upper incomplete gamma function REAL Details
stat_gammainc_lower_inv(a, p) Inverse incomplete gamma function REAL Details
stat_z_test_prop(x, n, p0) One-sample proportion z-test JSON Details
stat_z_test_prop2(x1, n1, x2, n2) Two-sample proportion z-test JSON Details
stat_bonferroni(p, m) Bonferroni correction REAL Details
stat_bh_correction(p, rank, total) BH correction REAL Details
stat_holm_correction(p, rank, total) Holm correction REAL Details
stat_fisher_exact(a, b, c, d) Fisher's exact test JSON Details
stat_odds_ratio(a, b, c, d) Odds ratio REAL Details
stat_relative_risk(a, b, c, d) Relative risk REAL Details
stat_risk_difference(a, b, c, d) Risk difference REAL Details
stat_nnt(a, b, c, d) Number needed to treat REAL Details
stat_ci_prop(x, n [,conf]) CI for proportion (Wald) JSON Details
stat_ci_prop_wilson(x, n [,conf]) CI for proportion (Wilson) JSON Details
stat_ci_prop_diff(x1, n1, x2, n2 [,conf]) CI for proportion difference JSON Details
stat_aic(ll, k) AIC REAL Details
stat_aicc(ll, n, k) AICc REAL Details
stat_bic(ll, n, k) BIC REAL Details
stat_boxcox(x, lambda) Box-Cox transformation REAL Details

Scalar Functions — Distributions & Transformations (83 functions)

Function Description Return Details
stat_uniform_pdf/cdf/quantile/rand Uniform distribution (4 functions) REAL Details
stat_exponential_pdf/cdf/quantile/rand Exponential distribution (4 functions) REAL Details
stat_gamma_pdf/cdf/quantile/rand Gamma distribution (4 functions) REAL Details
stat_beta_pdf/cdf/quantile/rand Beta distribution (4 functions) REAL Details
stat_lognormal_pdf/cdf/quantile/rand Log-normal distribution (4 functions) REAL Details
stat_weibull_pdf/cdf/quantile/rand Weibull distribution (4 functions) REAL Details
stat_binomial_pmf/cdf/quantile/rand Binomial distribution (4 functions) REAL Details
stat_poisson_pmf/cdf/quantile/rand Poisson distribution (4 functions) REAL Details
stat_geometric_pmf/cdf/quantile/rand Geometric distribution (4 functions) REAL Details
stat_nbinom_pmf/cdf/quantile/rand Negative binomial distribution (4 functions) REAL Details
stat_hypergeom_pmf/cdf/quantile/rand Hypergeometric distribution (4 functions) REAL Details
stat_bernoulli_pmf/cdf/quantile/rand Bernoulli distribution (4 functions) REAL Details
stat_duniform_pmf/cdf/quantile/rand Discrete uniform distribution (4 functions) REAL Details
stat_binomial_coef(n, k) Binomial coefficient INTEGER Details
stat_log_binomial_coef(n, k) Log binomial coefficient REAL Details
stat_log_factorial(n) Log factorial REAL Details
stat_lgamma(x) Log-gamma function REAL Details
stat_tgamma(x) Gamma function REAL Details
stat_beta_func(a, b) Beta function REAL Details
stat_lbeta(a, b) Log-beta function REAL Details
stat_erf(x) Error function REAL Details
stat_erfc(x) Complementary error function REAL Details
stat_logarithmic_mean(a, b) Logarithmic mean REAL Details
stat_hedges_j(n) Hedges correction factor REAL Details
stat_t_to_r(t, df) t to r conversion REAL Details
stat_d_to_r(d) d to r conversion REAL Details
stat_r_to_d(r) r to d conversion REAL Details
stat_eta_squared_ef(ss_eff, ss_total) Eta-squared REAL Details
stat_partial_eta_sq(F, df1, df2) Partial eta-squared REAL Details
stat_omega_squared_ef(ss_eff, ss_tot, ms_err, df_eff) Omega-squared REAL Details
stat_cohens_h(p1, p2) Cohen's h REAL Details
stat_interpret_d(d) Cohen's d interpretation TEXT Details
stat_interpret_r(r) Correlation coefficient interpretation TEXT Details
stat_interpret_eta2(eta2) Eta-squared interpretation TEXT Details
stat_power_t1(d, n, alpha) One-sample power REAL Details
stat_n_t1(d, power, alpha) One-sample required n REAL Details
stat_power_t2(d, n1, n2, alpha) Two-sample power REAL Details
stat_n_t2(d, power, alpha) Two-sample required n REAL Details
stat_power_prop(p1, p2, n, alpha) Proportion power REAL Details
stat_n_prop(p1, p2, power, alpha) Proportion required n REAL Details
stat_moe_prop(x, n [,conf]) Margin of error for proportion REAL Details
stat_moe_prop_worst(n [,conf]) Worst-case margin of error REAL Details
stat_n_moe_prop(moe [,conf [,p]]) Required n for proportion estimation REAL Details
stat_n_moe_mean(moe, sigma [,conf]) Required n for mean estimation REAL Details

Practical Usage Examples

Batch Descriptive Statistics

SELECT
    COUNT(val)                       AS n,
    stat_mean(val)                   AS mean,
    stat_median(val)                 AS median,
    stat_mode(val)                   AS mode,
    stat_sample_stddev(val)          AS stddev,
    stat_sample_variance(val)        AS variance,
    MIN(val)                         AS min,
    MAX(val)                         AS max,
    stat_range(val)                  AS range,
    stat_iqr(val)                    AS iqr,
    stat_se(val)                     AS se,
    stat_skewness(val)               AS skewness,
    stat_kurtosis(val)               AS kurtosis
FROM measurements;

Comparing Robust and Classical Statistics

-- Assess the impact of outliers
SELECT
    'classical' AS type,
    stat_mean(val)            AS location,
    stat_sample_stddev(val)   AS spread
FROM data
UNION ALL
SELECT
    'robust' AS type,
    stat_hodges_lehmann(val)  AS location,
    stat_mad_scaled(val)      AS spread
FROM data;

Detailed Analysis by Group

SELECT
    category,
    COUNT(score) AS n,
    stat_mean(score)              AS mean,
    stat_median(score)            AS median,
    stat_sample_stddev(score)     AS stddev,
    stat_cv(score)                AS cv,
    stat_iqr(score)               AS iqr,
    stat_skewness(score)          AS skewness,
    stat_kurtosis(score)          AS kurtosis,
    stat_mad_scaled(score)        AS robust_spread,
    stat_hodges_lehmann(score)    AS robust_location
FROM exam_results
GROUP BY category
ORDER BY mean DESC;

Quick Normality Assessment

-- Check normality of multiple columns at once
SELECT
    'height' AS variable,
    stat_skewness(height) AS skewness,
    stat_kurtosis(height) AS kurtosis,
    CASE
        WHEN ABS(stat_skewness(height)) < 2 AND ABS(stat_kurtosis(height)) < 7
        THEN 'approximately normal'
        ELSE 'non-normal'
    END AS normality
FROM people
UNION ALL
SELECT
    'weight',
    stat_skewness(weight),
    stat_kurtosis(weight),
    CASE
        WHEN ABS(stat_skewness(weight)) < 2 AND ABS(stat_kurtosis(weight)) < 7
        THEN 'approximately normal'
        ELSE 'non-normal'
    END
FROM people;

Regression Analysis and Prediction Accuracy

-- Run regression analysis and evaluate accuracy at once
SELECT
    stat_simple_regression(x, y) AS regression,
    stat_pearson_r(x, y) AS correlation,
    stat_r_squared(x, y) AS r_squared,
    stat_mae(x, y) AS mae,
    stat_rmse(x, y) AS rmse
FROM experiment_data;

Time Series Analysis Pipeline

-- Comprehensive time series analysis of stock price data
SELECT date, close_price,
       stat_moving_avg(close_price, 20) AS sma_20,
       stat_ema(close_price, 12) AS ema_12,
       stat_rolling_std(close_price, 20) AS volatility,
       stat_diff(close_price, 1) AS daily_change,
       stat_lag(close_price, 1) AS prev_close
FROM stock_prices;

Missing Value Handling and Outlier Removal

-- Impute missing values -> detect outliers -> analyze clean data
SELECT id, raw_value,
       stat_fillna_interp(raw_value) AS filled,
       stat_outliers_iqr(raw_value) AS is_outlier,
       stat_winsorize(raw_value, 5) AS winsorized
FROM sensor_data;