Patterns in static

Apophenia

apop_stats.c File Reference

Functions


Detailed Description

Basic moments and some distributions.


Function Documentation

apop_data* apop_data_correlation ( const apop_data in  ) 

Returns the matrix of correlation coefficients $(\sigma^2_{xy}/(\sigma_x\sigma_y))$ relating each column with each other.

This is the apop_data version of apop_matrix_correlation; if you don't have column names or weights, (or want the option for the faster, data-destroying version), use that one.

Parameters:
in A data matrix: rows are observations, columns are variables. If you give me a weights vector, I'll use it.
Returns:
Returns the variance/covariance matrix relating each column with each other. This function allocates the matrix for you.
apop_data* apop_data_covariance ( const apop_data in  ) 

Returns the variance/covariance matrix relating each column of the matrix to each other column.

This is the apop_data version of apop_matrix_covariance; if you don't have column names or weights, or would like to use the speed-saving and data-destroying normalization option, use that one.

Parameters:
in An apop_data set. If the weights vector is set, I'll take it into account.
Returns:
Returns a apop_data set the variance/covariance matrix relating each column with each other.
gsl_matrix* apop_matrix_correlation ( gsl_matrix *  in,
const char  normalize 
)

Returns the matrix of correlation coefficients $(\sigma^2_{xy}/(\sigma_x\sigma_y))$ relating each column with each other.

This is the gsl_matrix version of apop_data_covariance; if you have column names, use that one.

Parameters:
in A data matrix: rows are observations, columns are variables. (No default, must not be NULL)
normalize 'n' or 'N' = subtract the mean from each column, thus changing the input data but speeding up the computation.
anything else (like 0)= don't modify the input data (default = no modification)
Returns:
Returns the variance/covariance matrix relating each column with each other. This function allocates the matrix for you.

This function uses the Designated initializers syntax for inputs.

gsl_matrix* apop_matrix_covariance ( gsl_matrix *  in,
const char  normalize 
)

Returns the variance/covariance matrix relating each column with each other.

This is the gsl_matrix version of apop_data_covariance; if you have column names, use that one.

Parameters:
in A data matrix: rows are observations, columns are variables. (No default, must not be NULL)
normalize 'n', 'N', or 1 = subtract the mean from each column, thus changing the input data but speeding up the computation.
anything else (like 0)= don't modify the input data (default = no modification)
Returns:
Returns the variance/covariance matrix relating each column with each other. This function allocates the matrix for you. This is the sample version---dividing by $n-1$, not $n$. It uses the Designated initializers syntax for inputs.
void apop_matrix_normalize ( gsl_matrix *  data,
const char  row_or_col,
const char  normalization 
)

Normalize each row or column in the given matrix, one by one.

Basically just a convenience fn to iterate through the columns or rows and run apop_vector_normalize for you.

Parameters:
data The data set to normalize.
row_or_col Either 'r' or 'c'.
normalization see apop_vector_normalize.
double apop_random_double ( double  min,
double  max,
gsl_rng *  r 
)

Gives a random double between min and max [inclusive].

This function uses the Designated initializers syntax for inputs. Notice that calling this function with no arguments,

conveniently produces a number between zero and one. [To do this with less overhead, allocate your own RNG and use gsl_ran_uniform(r).]

Parameters:
min Default = 0
max Default = 1
r A gsl_rng. If NULL, I'll take care of the RNG; see Auto-allocated RNGs. (Default = NULL)
int apop_random_int ( double  min,
double  max,
const gsl_rng *  r 
)

Gives a random integer between min and max [inclusive].

Parameters:
min (default 0)
max (default 1)
r A gsl_rng. If NULL, I'll take care of the RNG; see Auto-allocated RNGs. (Default = NULL)

Thus,

makes a binary zero-one draw, and

data fivepoints[] = {1, 2, 3, 5, 7};
y = apop_random_int(0, 4)
x = apop_random_int(.max=4)

gives two draws from a five-item vector. Notice that the max is the largest index, which is one minus the dimension.

void apop_vector_normalize ( gsl_vector *  in,
gsl_vector **  out,
const char  normalization_type 
)

This function will normalize a vector, either such that it has mean zero and variance one, or such that it ranges between zero and one, or sums to one.

Parameters:
in A gsl_vector which you have already allocated and filled. NULL input gives NULL output. (No default)
out If normalizing in place, NULL. If not, the address of a gsl_vector. Do not allocate. (default = NULL.)
normalization_type 'p': normalized vector will sum to one. E.g., start with a set of observations in bins, end with the percentage of observations in each bin. (the default)
'r': normalized vector will range between zero and one. Replace each X with (X-min) / (max - min).
's': normalized vector will have mean zero and variance one. Replace each X with $(X-\mu) / \sigma$, where $\sigma$ is the sample standard deviation.
'm': normalize to mean zero: Replace each X with $(X-\mu)$

Example

#include <apop.h>

int main(void){
gsl_vector  *in, *out;

in = gsl_vector_calloc(3);
gsl_vector_set(in, 1, 1);
gsl_vector_set(in, 2, 2);

printf("The orignal vector:\n");
apop_vector_show(in);

apop_vector_normalize(in, &out, 's');
printf("Standardized with mean zero and variance one:\n");
apop_vector_show(out);

apop_vector_normalize(in, &out, 'r');
printf("Normalized range with max one and min zero:\n");
apop_vector_show(out);

apop_vector_normalize(in, NULL, 'p');
printf("Normalized into percentages:\n");
apop_vector_show(in);
}

This function uses the Designated initializers syntax for inputs.

double apop_vector_weighted_cov ( const gsl_vector *  v1,
const gsl_vector *  v2,
const gsl_vector *  w 
)

Find the sample covariance of a pair of weighted vectors. This only makes sense if the weightings are identical, so the function takes only one weighting vector for both.

Parameters:
v1,v2 The data vectors
w the weight vector. If NULL, assume equal weights.
Returns:
The weighted sample covariance
double apop_vector_weighted_kurt ( const gsl_vector *  v,
const gsl_vector *  w 
)

Find the population kurtosis of a weighted vector.

Parameters:
v The data vector
w the weight vector. If NULL, assume equal weights.
Returns:
The weighted kurtosis. No sample adjustment given weights.
Todo:
apop_vector_weighted_skew and apop_vector_weighted_kurt are lazily written.
double apop_vector_weighted_mean ( const gsl_vector *  v,
const gsl_vector *  w 
)

Find the weighted mean.

Parameters:
v The data vector
w the weight vector. If NULL, assume equal weights.
Returns:
The weighted mean
double apop_vector_weighted_skew ( const gsl_vector *  v,
const gsl_vector *  w 
)

Find the population skew of a weighted vector.

Note: Apophenia tries to be smart about reading the weights. If weights sum to one, then the system uses w->size as the number of elements, and returns the usual sum over $n-1$. If weights > 1, then the system uses the total weights as $n$. Thus, you can use the weights as standard weightings or to represent elements that appear repeatedly.

Parameters:
v The data vector
w the weight vector. If NULL, assume equal weights.
Returns:
The weighted skew. No sample adjustment given weights.
Todo:
apop_vector_weighted_skew and apop_vector_weighted_kurt are lazily written.
double apop_vector_weighted_var ( const gsl_vector *  v,
const gsl_vector *  w 
)

Find the sample variance of a weighted vector.

Note: Apophenia tries to be smart about reading the weights. If weights sum to one, then the system uses w->size as the number of elements, and returns the usual sum over $n-1$. If weights > 1, then the system uses the total weights as $n$. Thus, you can use the weights as standard weightings or to represent elements that appear repeatedly.

Parameters:
v The data vector
w the weight vector. If NULL, assume equal weights.
Returns:
The weighted sample variance.

SourceForge.net Logo

Autogenerated by doxygen on 23 Nov 2009.