Patterns in static

Apophenia

stats.h File Reference


Defines

Functions


Detailed Description


Define Documentation

#define Apop_col

After this call, v will hold a vector view of the colth column of the apop_data set m.

#define APOP_COL ( m,
col,
v ) 
Value:
gsl_vector apop_vv_##v = gsl_matrix_column((m)->matrix, (col)).vector;\
gsl_vector * v = &( apop_vv_##v );
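
The declare-a-view pattern shared by the macros in this section can be sketched without GSL. What follows is only an illustration, not Apophenia's code: toy_matrix, toy_view, toy_column, and TOY_COL are hypothetical stand-ins for gsl_matrix, gsl_vector, gsl_matrix_column, and Apop_col. It shows how token pasting declares a hidden struct plus a pointer named by the last macro argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for gsl_matrix and gsl_vector (not the GSL types). */
typedef struct { size_t rows, cols; double *data; } toy_matrix;   /* row-major */
typedef struct { size_t size, stride; double *data; } toy_view;

/* Build a view of column `col`; no data is copied. */
toy_view toy_column(toy_matrix *m, size_t col) {
    assert(col < m->cols);
    toy_view out = { m->rows, m->cols, m->data + col };
    return out;
}

/* Read element i of a view. */
double toy_get(const toy_view *v, size_t i) {
    return v->data[i * v->stride];
}

/* Like the macros above: declares a hidden struct toy_vv_<v> and a pointer v to it. */
#define TOY_COL(m, col, v)                        \
    toy_view toy_vv_##v = toy_column((m), (col)); \
    toy_view *v = &(toy_vv_##v);

/* Sum of one column, read through a view declared by the macro. */
double toy_column_sum(toy_matrix *m, size_t col) {
    TOY_COL(m, col, cv)
    double total = 0;
    for (size_t i = 0; i < cv->size; i++) total += toy_get(cv, i);
    return total;
}
```

Because the macro declares local variables, the view is valid only inside the enclosing scope, and each view name can be declared only once per scope.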
#define Apop_col_t

After this call, v will hold a vector view of the column of m named col. Unlike Apop_col, the second argument is a column name, which I'll look up using apop_name_find.

#define APOP_COL_T ( m,
col,
v ) 
Value:
gsl_vector apop_vv_##v = gsl_matrix_column((m)->matrix, apop_name_find((m)->names, col, 'c')).vector;\
gsl_vector * v = &( apop_vv_##v );
#define Apop_matrix_col

After this call, v will hold a vector view of the colth column of m.

#define APOP_MATRIX_COL ( m,
col,
v ) 
Value:
gsl_vector apop_vv_##v = gsl_matrix_column(m, (col)).vector;\
gsl_vector * v = &( apop_vv_##v );
#define Apop_matrix_row

After this call, v will hold a vector view of the rowth row of m.

#define APOP_MATRIX_ROW ( m,
row,
v ) 
Value:
gsl_vector apop_vv_##v = gsl_matrix_row(m, (row)).vector;\
gsl_vector * v = &( apop_vv_##v );
#define Apop_row

After this call, v will hold a vector view of the rowth row of the apop_data set m.

#define APOP_ROW ( m,
row,
v ) 
Value:
gsl_vector apop_vv_##v = gsl_matrix_row((m)->matrix, (row)).vector;\
gsl_vector * v = &( apop_vv_##v );
#define Apop_row_t

After this call, v will hold a vector view of the row of m named row. Unlike Apop_row, the second argument is a row name, which I'll look up using apop_name_find.

#define APOP_ROW_T ( m,
row,
v ) 
Value:
gsl_vector apop_vv_##v = gsl_matrix_row((m)->matrix, apop_name_find((m)->names, row, 'r')).vector;\
gsl_vector * v = &( apop_vv_##v );
#define Apop_submatrix

Pull a pointer to a submatrix into a gsl_matrix. After this call, o will point to a view of the requested block of m.

Parameters:
m The root matrix
srow the first row (in the root matrix) of the top of the submatrix
scol the first column (in the root matrix) of the left edge of the submatrix
nrows number of rows in the submatrix
ncols number of columns in the submatrix
o the name of the gsl_matrix pointer that will hold the view
#define APOP_SUBMATRIX ( m,
srow,
scol,
nrows,
ncols,
o ) 
Value:
gsl_matrix apop_mm_##o = gsl_matrix_submatrix(m, (srow), (scol), (nrows),(ncols)).matrix;\
gsl_matrix * o = &( apop_mm_##o );

Function Documentation

apop_data* apop_anova ( char *  table,
char *  data,
char *  grouping1,
char *  grouping2 
)

This function produces a traditional one- or two-way ANOVA table. It works from data in an SQL table, using queries of the form select data from table group by grouping1, grouping2.

Parameters:
table The table to be queried.
data The name of the column holding the data
grouping1 The name of the first column by which to group data
grouping2 If this is NULL, then the function will return a one-way ANOVA. Otherwise, the name of the second column by which to group data in a two-way ANOVA.
apop_data* apop_data_correlation ( const apop_data in  ) 

Returns the matrix of correlation coefficients $(\sigma^2_{xy}/(\sigma_x\sigma_y))$ relating each column with each other.

This is the apop_data version of apop_matrix_correlation; if you don't have column names or weights, or want the option of the faster, data-destroying version, use that one.

Parameters:
in A data matrix: rows are observations, columns are variables. If you give me a weights vector, I'll use it.
Returns:
Returns the matrix of correlation coefficients relating each column to each other column. This function allocates the matrix for you.
apop_data* apop_data_covariance ( const apop_data in  ) 

Returns the variance/covariance matrix relating each column of the matrix to each other column.

This is the apop_data version of apop_matrix_covariance; if you don't have column names or weights, or would like to use the speed-saving and data-destroying normalization option, use that one.

Parameters:
in An apop_data set. If the weights vector is set, I'll take it into account.
Returns:
Returns an apop_data set holding the variance/covariance matrix relating each column to each other column.
apop_data* apop_data_to_dummies ( apop_data d,
int  col,
char  type,
int  keep_first 
)

A utility to make a matrix of dummy variables. You give me a single vector that lists the category number for each item, and I'll return an apop_data set whose matrix has a single one in each row, in the column specified.

After running this, you will almost certainly want to join the output with your main data set. For example:

apop_data *dummies  = apop_data_to_dummies(main_regression_vars, .col=8, .type='t');
apop_data_stack(main_regression_vars, dummies, 'c');
Parameters:
d The data set with the column to be dummified (No default.)
col The column number to be transformed (default = 0)
type 'd'==data column (-1==vector), 't'==text column. (default = 't')
keep_first if zero, return a matrix where each row has a one in the (column specified MINUS ONE). That is, the zeroth category is dropped, the first category has an entry in column zero, et cetera. If you don't know why this is useful, then this is what you need. If you know what you're doing and need something special, set this to one and the first category won't be dropped. (default = 0)

This function uses the Designated initializers syntax for inputs.
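
The effect of keep_first can be sketched in plain C. This is an illustration under stated assumptions, not the library's implementation: make_dummies is a hypothetical helper, categories are assumed to be already numbered 0 through ncat-1, and the output is a bare row-major array rather than an apop_data set.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill a row-major n x width matrix of dummies from category numbers 0..ncat-1.
   keep_first == 0 drops category zero, so width is ncat-1; otherwise width is ncat.
   The caller frees the result. (Hypothetical helper, not an Apophenia function.) */
double *make_dummies(const int *category, size_t n, int ncat, int keep_first) {
    assert(ncat > 1);
    int width = keep_first ? ncat : ncat - 1;
    double *out = calloc(n * (size_t)width, sizeof(double));
    if (!out) return NULL;
    for (size_t i = 0; i < n; i++) {
        assert(category[i] >= 0 && category[i] < ncat);
        int col = keep_first ? category[i] : category[i] - 1;
        /* A row in the dropped category stays all zeros. */
        if (col >= 0) out[i * (size_t)width + (size_t)col] = 1;
    }
    return out;
}
```

Dropping the zeroth category is the usual choice for a regression that already has a constant column: a full set of dummies sums to one in every row and so would be perfectly collinear with the constant.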

apop_model* apop_estimate_fixed_effects_OLS ( apop_data data,
gsl_vector *  categories 
)

A fixed-effects regression. The input is a data matrix for a regression, plus a single vector giving the fixed-effect category of each observation.

The solution of a fixed-effects regression is via a partitioned regression. Given that the data set is divided into columns $\beta_1$ and $\beta_2$, then the reader may

Todo:
finish this documentation. [Was in a rush today.]
void apop_estimate_parameter_t_tests ( apop_model est  ) 

For many, it is a knee-jerk reaction to a parameter estimation to test whether each individual parameter differs from zero. This function does that.

Parameters:
est The apop_estimate, which includes pre-calculated parameter estimates, var-covar matrix, and the original data set.

Returns nothing. At the end of the routine, the est->parameters->matrix includes a set of t-test values: p value, confidence (=1-pval), t statistic, standard deviation, one-tailed Pval, one-tailed confidence.

apop_data* apop_f_test ( apop_model est,
apop_data contrast,
int  normalize 
)

Runs an F-test specified by ${\bf Q}$ and ${\bf c}$. Your best bet is to see the chapter on hypothesis testing in Modeling With Data, p 309. It will tell you that:

\[{N-K\over q} {({\bf Q}'\hat\beta - {\bf c})' [{\bf Q}' ({\bf X}'{\bf X})^{-1} {\bf Q}]^{-1} ({\bf Q}' \hat\beta - {\bf c}) \over {\bf u}' {\bf u} } \sim F_{q,N-K},\]

and that's what this function is based on.

Parameters:
est an apop_model that you have already calculated. (No default)
contrast The matrix ${\bf Q}$ and the vector ${\bf c}$, where each row represents a hypothesis. (Defaults: if matrix is NULL, it is set to the identity matrix; if the vector is NULL, it is set to zero; if the entire apop_data set is NULL or omitted, both of these settings are made.)
normalize If 1, then I will normalize the data set at est->data in place so that each column has mean zero (that is, I run apop_matrix_normalize (data, 'c', 'm');). If zero, then I will copy off the entire data set and do the normalization on my copy, leaving the input data as-is. (Default: 0)
Returns:
An apop_data set with a few variants on the confidence with which we can reject the joint hypothesis.
Todo:
There should be a way to get OLS and GLS to store $(X'X)^{-1}$. In fact, if you did GLS, this is invalid, because you need $(X'\Sigma X)^{-1}$, and I didn't ask for $\Sigma$.

This function uses the Designated initializers syntax for inputs.

gsl_matrix* apop_matrix_correlation ( gsl_matrix *  in,
const char  normalize 
)

Returns the matrix of correlation coefficients $(\sigma^2_{xy}/(\sigma_x\sigma_y))$ relating each column with each other.

This is the gsl_matrix version of apop_data_correlation; if you have column names, use that one.

Parameters:
in A data matrix: rows are observations, columns are variables. (No default, must not be NULL)
normalize 'n' or 'N' = subtract the mean from each column, thus changing the input data but speeding up the computation.
anything else (like 0) = don't modify the input data (default = no modification)
Returns:
Returns the matrix of correlation coefficients relating each column to each other column. This function allocates the matrix for you.

This function uses the Designated initializers syntax for inputs.

gsl_matrix* apop_matrix_covariance ( gsl_matrix *  in,
const char  normalize 
)

Returns the variance/covariance matrix relating each column with each other.

This is the gsl_matrix version of apop_data_covariance; if you have column names, use that one.

Parameters:
in A data matrix: rows are observations, columns are variables. (No default, must not be NULL)
normalize 'n', 'N', or 1 = subtract the mean from each column, thus changing the input data but speeding up the computation.
anything else (like 0) = don't modify the input data (default = no modification)
Returns:
Returns the variance/covariance matrix relating each column with each other. This function allocates the matrix for you. This is the sample version---dividing by $n-1$, not $n$. It uses the Designated initializers syntax for inputs.
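
The sample version (dividing by $n-1$) can be written out in a few lines of plain C. This is a sketch for illustration only: sample_covariance is a hypothetical helper working on a bare row-major array, not the library's routine, and it never modifies its input.

```c
#include <assert.h>
#include <stdlib.h>

/* Sample covariance of the columns of a row-major n x k matrix.
   Writes the k x k result into cov (caller-allocated). Returns 0 on success.
   (Hypothetical helper, not an Apophenia function.) */
int sample_covariance(const double *x, size_t n, size_t k, double *cov) {
    assert(n > 1);
    double *mean = calloc(k, sizeof(double));
    if (!mean) return -1;
    for (size_t j = 0; j < k; j++) {          /* column means */
        for (size_t i = 0; i < n; i++) mean[j] += x[i * k + j];
        mean[j] /= (double)n;
    }
    for (size_t a = 0; a < k; a++)
        for (size_t b = 0; b < k; b++) {
            double sum = 0;
            for (size_t i = 0; i < n; i++)
                sum += (x[i * k + a] - mean[a]) * (x[i * k + b] - mean[b]);
            cov[a * k + b] = sum / (double)(n - 1);   /* n-1: sample version */
        }
    free(mean);
    return 0;
}
```

Subtracting the column means from the data once, up front, is what the data-destroying normalize option saves: the inner loop then needs no mean lookups, at the cost of changing the input.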
int apop_matrix_is_positive_semidefinite ( gsl_matrix *  m,
char  semi 
)

Test whether the input matrix is positive semidefinite.

A covariance matrix will always be PSD, so this function can tell you whether your matrix is a valid covariance matrix.

Consider the 1x1 matrix in the upper left of the input, then the 2x2 matrix in the upper left, on up to the full matrix. If the matrix is PSD, then each of these has a nonnegative determinant (strictly positive if the matrix is positive definite). This function thus calculates $N$ determinants for an $N\times N$ matrix.

Parameters:
m The matrix to test. If NULL, I will return zero---not PSD.
semi If anything but 's', check for positive definite, not semidefinite. (default 's')

See also apop_matrix_to_positive_semidefinite, which will change the input to something PSD.

This function uses the Designated initializers syntax for inputs.
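
The upper-left determinant test described above can be sketched in plain C for the strict (positive definite) case, where it is Sylvester's criterion. This is an illustration, not the library's code: leading_minor and is_positive_definite are hypothetical helpers on a bare row-major array, and the input is assumed symmetric. The semidefinite case needs more care, because nonnegative upper-left determinants alone do not guarantee a PSD matrix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Determinant of the leading k x k block of a row-major n x n matrix,
   by Gaussian elimination with partial pivoting on a scratch copy. */
double leading_minor(const double *m, size_t n, size_t k) {
    assert(k >= 1 && k <= n);
    double *a = malloc(k * k * sizeof(double));
    assert(a != NULL);
    for (size_t i = 0; i < k; i++)
        memcpy(a + i * k, m + i * n, k * sizeof(double));
    double det = 1;
    for (size_t c = 0; c < k; c++) {
        size_t p = c;                        /* find the largest pivot in column c */
        for (size_t r = c + 1; r < k; r++) {
            double cand = a[r * k + c] < 0 ? -a[r * k + c] : a[r * k + c];
            double best = a[p * k + c] < 0 ? -a[p * k + c] : a[p * k + c];
            if (cand > best) p = r;
        }
        if (a[p * k + c] == 0) { det = 0; break; }
        if (p != c) {                        /* a row swap flips the sign */
            for (size_t j = 0; j < k; j++) {
                double t = a[c * k + j];
                a[c * k + j] = a[p * k + j];
                a[p * k + j] = t;
            }
            det = -det;
        }
        det *= a[c * k + c];
        for (size_t r = c + 1; r < k; r++) {
            double f = a[r * k + c] / a[c * k + c];
            for (size_t j = c; j < k; j++) a[r * k + j] -= f * a[c * k + j];
        }
    }
    free(a);
    return det;
}

/* Sylvester's criterion: a symmetric matrix is positive definite
   exactly when every leading principal minor is strictly positive. */
int is_positive_definite(const double *m, size_t n) {
    for (size_t k = 1; k <= n; k++)
        if (leading_minor(m, n, k) <= 0) return 0;
    return 1;
}
```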

void apop_matrix_normalize ( gsl_matrix *  data,
const char  row_or_col,
const char  normalization 
)

Normalize each row or column in the given matrix, one by one.

Basically just a convenience function to iterate through the columns or rows and run apop_vector_normalize for you.

Parameters:
data The data set to normalize.
row_or_col Either 'r' or 'c'.
normalization see apop_vector_normalize.
double apop_matrix_to_positive_semidefinite ( gsl_matrix *  m  ) 

This function passes its tests, but is still under development.

It takes in a matrix and converts it to the "closest" positive semidefinite matrix.

Parameters:
m On input, any matrix; on output, a positive semidefinite matrix.
Returns:
the distance between the original and new matrices.

See also the test function apop_matrix_is_positive_semidefinite.

Adapted from the R Matrix package's nearPD, which is Copyright (2007) Jens Oehlschlägel [and is GPL].

double apop_multivariate_gamma ( double  a,
double  p 
)

The multivariate generalization of the Gamma distribution. $ \Gamma_p(a)= \pi^{p(p-1)/4}\prod_{j=1}^p \Gamma\left[ a+(1-j)/2\right]. $

See also apop_multivariate_lngamma, which is more numerically stable in most cases.

double apop_multivariate_lngamma ( double  a,
double  p 
)

The log of the multivariate generalization of the Gamma; see also apop_multivariate_gamma.
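
Taking logs of the definition given for apop_multivariate_gamma yields the expression a log-scale routine would presumably evaluate. This is a derivation from that formula, not a statement about the library's internals:

$ \ln\Gamma_p(a)= \frac{p(p-1)}{4}\ln\pi + \sum_{j=1}^p \ln\Gamma\left[ a+(1-j)/2\right]. $

A sum of log-gammas stays finite where the product of gammas would overflow a double, which is why the log form is the more stable one. For $p=1$ the expression reduces to $\ln\Gamma(a)$.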

double apop_random_double ( double  min,
double  max,
gsl_rng *  r 
)

Gives a random double between min and max [inclusive].

This function uses the Designated initializers syntax for inputs. Notice that calling this function with no arguments, apop_random_double(), conveniently produces a number between zero and one. [To do this with less overhead, allocate your own RNG and use gsl_rng_uniform(r).]

Parameters:
min Default = 0
max Default = 1
r A gsl_rng. If NULL, I'll take care of the RNG; see Auto-allocated RNGs. (Default = NULL)
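
One common way to get named arguments with defaults in C99 is a variadic macro that fills a struct through designated initializers. The sketch below is an assumption about the general technique, not Apophenia's actual macros: draw_args, scale_draw_base, and scale_draw are hypothetical names, the uniform draw u is passed in explicitly so the result is deterministic, and unlike the real functions it accepts only the named form, not positional arguments.

```c
#include <assert.h>

/* Hypothetical argument struct; min and max mirror the parameters above. */
typedef struct { double u, min, max; } draw_args;

/* Map a uniform draw u in [0,1) onto the range from min to max. */
double scale_draw_base(draw_args a) {
    assert(a.min <= a.max);
    return a.min + a.u * (a.max - a.min);
}

/* Defaults come first; any named arguments the caller supplies override them,
   because a later designated initializer replaces an earlier one. */
#define scale_draw(...) \
    scale_draw_base((draw_args){ .min = 0, .max = 1, __VA_ARGS__ })
```

With this in place, scale_draw(.u = 0.5) uses the default range of zero to one, while scale_draw(.u = 0.5, .max = 3) overrides only the upper bound. Some compilers warn about the overridden initializers when extra warnings are enabled.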
int apop_random_int ( double  min,
double  max,
const gsl_rng *  r 
)

Gives a random integer between min and max [inclusive].

Parameters:
min (default 0)
max (default 1)
r A gsl_rng. If NULL, I'll take care of the RNG; see Auto-allocated RNGs. (Default = NULL)

Thus, apop_random_int() makes a binary zero-one draw, and

double fivepoints[] = {1, 2, 3, 5, 7};
double y = fivepoints[apop_random_int(0, 4)];
double x = fivepoints[apop_random_int(.max=4)];

gives two draws from a five-item vector. Notice that the max is the largest index, which is the dimension minus one.
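
The inclusive range is why the largest index, not the dimension, is the right max. A minimal sketch of the mapping follows, assuming a uniform draw u in [0,1) is already in hand; uniform_to_int and pick are hypothetical helpers, not the library's implementation.

```c
#include <assert.h>

/* Map a uniform draw u in [0,1) to an integer in [min, max], both ends included. */
int uniform_to_int(double u, int min, int max) {
    assert(u >= 0 && u < 1 && min <= max);
    return min + (int)(u * (max - min + 1));
}

/* Pick one element of an n-item array; the largest valid index is n-1. */
double pick(const double *items, int n, double u) {
    return items[uniform_to_int(u, 0, n - 1)];
}
```

Passing n rather than n-1 as the max would occasionally index one past the end of the array.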

apop_data* apop_test_fisher_exact ( apop_data intab  ) 

Run the Fisher exact test on an input contingency table.

Returns:
An apop_data set with two rows:
"probability of table": Probability of the observed table for fixed marginal totals.
"p value": Table p-value. The probability of a more extreme table, where "extreme" is in a probabilistic sense.
apop_data* apop_text_to_factors ( apop_data d,
size_t  textcol,
int  datacol 
)

Convert a column of text in the text portion of an apop_data set into a column of numeric elements, which you can use for a multinomial probit, for example.

Parameters:
d The data set to be modified in place.
datacol The column in the data set where the numeric factors will be written (-1 means the vector, which I will allocate for you if it is NULL)
textcol The column in the text that will be converted.

For example:

apop_data *d  = apop_query_to_mixed_data("mmt", "select 1, year, color from data");
apop_text_to_factors(d, 0, 0);

Notice that the query pulled a column of ones for the sake of saving room for the factors.

Returns:
A table of the factors used in the code. This is an apop_data set with only one column of text.
apop_data* apop_text_unique_elements ( const apop_data d,
size_t  col 
)

Give me a column of text, and I'll give you a sorted list of the unique elements. This is basically running "select distinct * from datacolumn", but without the aid of the database.

Parameters:
d An apop_data set with a text component
col The text column you want me to use.
Returns:
An apop_data set with a single sorted column of text, where each unique text input appears once.
See also:
apop_vector_unique_elements
void apop_vector_normalize ( gsl_vector *  in,
gsl_vector **  out,
const char  normalization_type 
)

This function will normalize a vector, either such that it has mean zero and variance one, or such that it ranges between zero and one, or sums to one.

Parameters:
in A gsl_vector which you have already allocated and filled. NULL input gives NULL output. (No default)
out If normalizing in place, NULL. If not, the address of a gsl_vector. Do not allocate. (default = NULL.)
normalization_type 'p': normalized vector will sum to one. E.g., start with a set of observations in bins, end with the percentage of observations in each bin. (the default)
'r': normalized vector will range between zero and one. Replace each X with (X-min) / (max - min).
's': normalized vector will have mean zero and variance one. Replace each X with $(X-\mu) / \sigma$, where $\sigma$ is the sample standard deviation.
'm': normalize to mean zero: Replace each X with $(X-\mu)$

Example

#include <apop.h>

int main(void){
    gsl_vector  *in, *out;

    /* Three elements: 0, 1, 2 (calloc zeroes the first). */
    in = gsl_vector_calloc(3);
    gsl_vector_set(in, 1, 1);
    gsl_vector_set(in, 2, 2);

    printf("The original vector:\n");
    apop_vector_show(in);

    /* Results go to out; in is left untouched. */
    apop_vector_normalize(in, &out, 's');
    printf("Standardized with mean zero and variance one:\n");
    apop_vector_show(out);

    apop_vector_normalize(in, &out, 'r');
    printf("Normalized range with max one and min zero:\n");
    apop_vector_show(out);

    /* A NULL out means normalize in place. */
    apop_vector_normalize(in, NULL, 'p');
    printf("Normalized into percentages:\n");
    apop_vector_show(in);
}

This function uses the Designated initializers syntax for inputs.

gsl_vector* apop_vector_unique_elements ( const gsl_vector *  v  ) 

Give me a vector of numbers, and I'll give you a sorted list of the unique elements. This is basically running "select distinct * from datacolumn", but without the aid of the database.

Parameters:
v a vector of items
Returns:
a sorted vector of the distinct elements that appear in the input.
See also:
apop_text_unique_elements
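
The "select distinct" behavior amounts to a sort followed by squeezing out repeats. The following is a plain C sketch of that idea, not the library's code: unique_sorted and compare_doubles are hypothetical helpers on bare arrays, and the caller supplies the output buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Three-way comparison of two doubles for qsort. */
int compare_doubles(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Copy in[0..n-1], sort the copy, and squeeze out repeats.
   Returns the number of distinct values written to out
   (caller-allocated, room for n doubles). The input is not modified. */
size_t unique_sorted(const double *in, size_t n, double *out) {
    if (n == 0) return 0;
    assert(in != NULL && out != NULL);
    memcpy(out, in, n * sizeof(double));
    qsort(out, n, sizeof(double), compare_doubles);
    size_t kept = 1;
    for (size_t i = 1; i < n; i++)
        if (out[i] != out[kept - 1]) out[kept++] = out[i];
    return kept;
}
```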
double apop_vector_weighted_cov ( const gsl_vector *  v1,
const gsl_vector *  v2,
const gsl_vector *  w 
)

Find the sample covariance of a pair of weighted vectors. This only makes sense if the weightings are identical, so the function takes only one weighting vector for both.

Parameters:
v1,v2 The data vectors
w the weight vector. If NULL, assume equal weights.
Returns:
The weighted sample covariance
double apop_vector_weighted_kurt ( const gsl_vector *  v,
const gsl_vector *  w 
)

Find the population kurtosis of a weighted vector.

Parameters:
v The data vector
w the weight vector. If NULL, assume equal weights.
Returns:
The weighted kurtosis. No sample adjustment given weights.
Todo:
apop_vector_weighted_skew and apop_vector_weighted_kurt are lazily written.
double apop_vector_weighted_mean ( const gsl_vector *  v,
const gsl_vector *  w 
)

Find the weighted mean.

Parameters:
v The data vector
w the weight vector. If NULL, assume equal weights.
Returns:
The weighted mean
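
For reference, the weighted mean is the weighted sum divided by the total weight. The sketch below works on bare arrays; weighted_mean is a hypothetical helper, not the library's routine, and treats a NULL weight array as equal weights, as the parameter description says.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted mean: sum(w[i]*v[i]) / sum(w[i]). A NULL w means equal weights. */
double weighted_mean(const double *v, const double *w, size_t n) {
    assert(v != NULL && n > 0);
    double total = 0, weight_sum = 0;
    for (size_t i = 0; i < n; i++) {
        double wi = w ? w[i] : 1;
        total += wi * v[i];
        weight_sum += wi;
    }
    assert(weight_sum != 0);
    return total / weight_sum;
}
```

Because of the division by the total weight, weights that sum to one and weights that count repeated observations give the same mean: {0.25, 0.25, 0.5} and {1, 1, 2} are interchangeable here.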
double apop_vector_weighted_skew ( const gsl_vector *  v,
const gsl_vector *  w 
)

Find the population skew of a weighted vector.

Note: Apophenia tries to be smart about reading the weights. If the weights sum to one, then the system uses w->size as the number of elements, and returns the usual sum over $n-1$. If the weights sum to more than one, then the system uses the total of the weights as $n$. Thus, you can use the weights as standard weightings or to represent elements that appear repeatedly.

Parameters:
v The data vector
w the weight vector. If NULL, assume equal weights.
Returns:
The weighted skew. No sample adjustment given weights.
Todo:
apop_vector_weighted_skew and apop_vector_weighted_kurt are lazily written.
double apop_vector_weighted_var ( const gsl_vector *  v,
const gsl_vector *  w 
)

Find the sample variance of a weighted vector.

Note: Apophenia tries to be smart about reading the weights. If the weights sum to one, then the system uses w->size as the number of elements, and returns the usual sum over $n-1$. If the weights sum to more than one, then the system uses the total of the weights as $n$. Thus, you can use the weights as standard weightings or to represent elements that appear repeatedly.

Parameters:
v The data vector
w the weight vector. If NULL, assume equal weights.
Returns:
The weighted sample variance.


Autogenerated by doxygen on 23 Nov 2009.