#define Apop_col
After this call, v will hold a vector view of the colth column of the apop_data set m.
#define APOP_COL(m, col, v)
gsl_vector apop_vv_##v = gsl_matrix_column((m)->matrix, (col)).vector; \
gsl_vector * v = &( apop_vv_##v );
#define Apop_col_t
After this call, v will hold a vector view of the column of the apop_data set m named col. Unlike Apop_col, the second argument is a column name, which I'll look up using apop_name_find.
#define APOP_COL_T(m, col, v)
gsl_vector apop_vv_##v = gsl_matrix_column((m)->matrix, apop_name_find((m)->names, col, 'c')).vector; \
gsl_vector * v = &( apop_vv_##v );
#define Apop_matrix_col
After this call, v will hold a vector view of the colth column of the gsl_matrix m.
#define APOP_MATRIX_COL(m, col, v)
gsl_vector apop_vv_##v = gsl_matrix_column(m, (col)).vector; \
gsl_vector * v = &( apop_vv_##v );
#define Apop_matrix_row
After this call, v will hold a vector view of the rowth row of the gsl_matrix m.
#define APOP_MATRIX_ROW(m, row, v)
gsl_vector apop_vv_##v = gsl_matrix_row(m, (row)).vector; \
gsl_vector * v = &( apop_vv_##v );
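The row and column macros rely on GSL's view mechanism: the resulting vector points into the parent matrix's storage instead of copying it. The plain-C sketch below illustrates that idea only. The types `toy_matrix` and `toy_view` and their helper functions are hypothetical stand-ins, not GSL or Apophenia names, and the sketch assumes row-major storage.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a dense row-major matrix and a strided view. */
typedef struct { size_t rows, cols; double *data; } toy_matrix;
typedef struct { size_t size, stride; double *data; } toy_view;

/* View of one column: start at element (0, col), step by one full row. */
static toy_view toy_column(toy_matrix *m, size_t col) {
    toy_view v = { m->rows, m->cols, m->data + col };
    return v;
}

/* View of one row: start at element (row, 0), step by one element. */
static toy_view toy_row(toy_matrix *m, size_t row) {
    toy_view v = { m->cols, 1, m->data + row * m->cols };
    return v;
}

/* Element access goes through the stride, so no data is copied. */
static double toy_get(const toy_view *v, size_t i) { return v->data[i * v->stride]; }
static void toy_set(toy_view *v, size_t i, double x) { v->data[i * v->stride] = x; }
```

Because a view shares storage with its parent, writing through the view changes the matrix, and the view is only valid while the matrix is.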
#define Apop_row
After this call, v will hold a vector view of the rowth row of the apop_data set m.
#define APOP_ROW(m, row, v)
gsl_vector apop_vv_##v = gsl_matrix_row((m)->matrix, (row)).vector; \
gsl_vector * v = &( apop_vv_##v );
#define Apop_row_t
After this call, v will hold a vector view of the row of the apop_data set m named row. Unlike Apop_row, the second argument is a row name, which I'll look up using apop_name_find.
#define APOP_ROW_T(m, row, v)
gsl_vector apop_vv_##v = gsl_matrix_row((m)->matrix, apop_name_find((m)->names, row, 'r')).vector; \
gsl_vector * v = &( apop_vv_##v );
#define Apop_submatrix
Pull a pointer to a submatrix into a gsl_matrix.
| m | The root matrix | |
| srow | The first row (in the root matrix) of the top of the submatrix | |
| scol | The first column (in the root matrix) of the left edge of the submatrix | |
| nrows | Number of rows in the submatrix | |
| ncols | Number of columns in the submatrix | |
| o | The name of the gsl_matrix pointer that will point to the submatrix |
#define APOP_SUBMATRIX(m, srow, scol, nrows, ncols, o)
gsl_matrix apop_mm_##o = gsl_matrix_submatrix(m, (srow), (scol), (nrows), (ncols)).matrix; \
gsl_matrix * o = &( apop_mm_##o );
apop_data* apop_anova(char *table, char *data, char *grouping1, char *grouping2)
This function produces a traditional one- or two-way ANOVA table. It works from data in an SQL table, using queries of the form select data from table group by grouping1, grouping2.
| table | The table to be queried. | |
| data | The name of the column holding the data | |
| grouping1 | The name of the first column by which to group data | |
| grouping2 | If this is NULL, then the function will return a one-way ANOVA. Otherwise, the name of the second column by which to group data in a two-way ANOVA. |
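For readers who want to see the arithmetic behind a one-way table, here is a minimal plain-C sketch of the one-way F statistic. It is not Apophenia's implementation and does not touch a database: the function name `oneway_anova_f`, the array-based inputs, and the 64-group limit are all assumptions made for illustration.

```c
#include <stddef.h>

/* One-way ANOVA F statistic for observations x[0..n-1], where group[i] in
   [0, n_groups) labels the group of x[i].
   Assumes 2 <= n_groups <= 64, every group is non-empty, and n > n_groups. */
static double oneway_anova_f(const double *x, const size_t *group,
                             size_t n, size_t n_groups) {
    double sum[64] = {0};
    size_t count[64] = {0};
    double grand = 0;

    for (size_t i = 0; i < n; i++) {
        sum[group[i]] += x[i];
        count[group[i]]++;
        grand += x[i];
    }
    grand /= (double)n;

    /* Between-group sum of squares: group means around the grand mean. */
    double ss_between = 0;
    for (size_t g = 0; g < n_groups; g++) {
        double d = sum[g] / (double)count[g] - grand;
        ss_between += (double)count[g] * d * d;
    }

    /* Within-group sum of squares: observations around their group mean. */
    double ss_within = 0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - sum[group[i]] / (double)count[group[i]];
        ss_within += d * d;
    }

    return (ss_between / (double)(n_groups - 1))
         / (ss_within / (double)(n - n_groups));
}
```

The grouping labels play the role of the grouping1 column, and x plays the role of the data column.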
Returns the matrix of correlation coefficients relating each column to every other column.
This is the apop_data version of apop_matrix_correlation; if you don't have column names or weights (or want the option for the faster, data-destroying version), use that one.
| in | A data matrix: rows are observations, columns are variables. If you give me a weights vector, I'll use it. |
Returns the variance/covariance matrix relating each column of the matrix to each other column.
This is the apop_data version of apop_matrix_covariance; if you don't have column names or weights, or would like to use the speed-saving and data-destroying normalization option, use that one.
| in | An apop_data set. If the weights vector is set, I'll take it into account. |
A utility to make a matrix of dummy variables. You give me a single vector that lists the category number for each item, and I'll return a matrix with a single one in each row, in the column specified.
After running this, you will almost certainly want to join the output with your main data set. For example:
apop_data *dummies = apop_data_to_dummies(main_regression_vars, .col=8, .type='t');
apop_data_stack(main_regression_vars, dummies, 'c');
| d | The data set with the column to be dummified (No default.) | |
| col | The column number to be transformed (default = 0) | |
| type | 'd'==data column (-1==vector), 't'==text column. (default = 't') | |
| keep_first | If zero, return a matrix where each row has a one in the column numbered (category minus one). That is, the zeroth category is dropped, the first category has an entry in column zero, et cetera. If you don't know why this is useful, then this is what you need. If you know what you're doing and need something special, set this to one and the first category won't be dropped. (default = 0) |
This function uses the Designated initializers syntax for inputs.
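The effect of keep_first is easiest to see on a small input. The sketch below is a plain-C illustration of the column layout described above, not Apophenia's code. The function name `make_dummies`, the integer category input, and the caller-allocated row-major output buffer are assumptions.

```c
#include <stddef.h>

/* Fill out (n rows, row-major, caller-allocated) with dummy variables.
   category[i] is in [0, n_categories).
   keep_first == 0: the zeroth category is the omitted baseline, so the
   output has n_categories - 1 columns and category c maps to column c - 1.
   keep_first != 0: the output has n_categories columns, one per category. */
static void make_dummies(const size_t *category, size_t n, size_t n_categories,
                         int keep_first, double *out) {
    size_t n_cols = keep_first ? n_categories : n_categories - 1;

    for (size_t i = 0; i < n * n_cols; i++)
        out[i] = 0.0;

    for (size_t i = 0; i < n; i++) {
        size_t c = category[i];
        if (keep_first)
            out[i * n_cols + c] = 1.0;
        else if (c > 0)                 /* baseline rows stay all zero */
            out[i * n_cols + (c - 1)] = 1.0;
    }
}
```

With keep_first set to zero, an observation in the dropped category produces a row of zeros, which is what keeps the dummies from being collinear with a constant column in a regression.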
apop_model* apop_estimate_fixed_effects_OLS(apop_data *data, gsl_vector *categories)
A fixed-effects regression. The input is a data matrix for a regression, plus a single vector giving the fixed-effect category of each observation.
The solution of a fixed-effects regression is via a partitioned regression, in which the data set is divided into two blocks of columns.
void apop_estimate_parameter_t_tests(apop_model *est)
For many, it is a knee-jerk reaction to a parameter estimation to test whether each individual parameter differs from zero. This function does that.
| est | The apop_estimate, which includes pre-calculated parameter estimates, var-covar matrix, and the original data set. |
Returns nothing. At the end of the routine, the est->parameters->matrix includes a set of t-test values: p value, confidence (=1-pval), t statistic, standard deviation, one-tailed Pval, one-tailed confidence.
apop_data* apop_f_test(apop_model *est, apop_data *contrast, int normalize)
Runs an F-test specified by q and c. Your best bet is to see the chapter on hypothesis testing in Modeling With Data, p 309, which derives the test statistic that this function is based on.
| est | an apop_model that you have already calculated. (No default) | |
| contrast | The matrix q and the vector c, where each row represents a hypothesis. (Defaults: if matrix is NULL, it is set to the identity matrix; if the vector is NULL, it is set to zero; if the entire apop_data set is NULL or omitted, both of these settings are made.) | |
| normalize | If 1, then I will normalize the data set at est->data so that each column has mean zero (that is, I run apop_matrix_normalize(data, 'c', 'm');). If zero, then I will copy off the entire data set and do the normalization on my copy, leaving the input data as-is. (Default: 0) |
Returns an apop_data set with a few variants on the confidence with which we can reject the joint hypothesis.
If you estimated the model via GLS, this test is invalid, because it then needs the GLS weighting matrix, which I didn't ask for.
This function uses the Designated initializers syntax for inputs.
gsl_matrix* apop_matrix_correlation(gsl_matrix *in, const char normalize)
Returns the matrix of correlation coefficients relating each column to every other column.
This is the gsl_matrix version of apop_data_correlation; if you have column names, use that one.
| in | A data matrix: rows are observations, columns are variables. (No default, must not be NULL) | |
| normalize | 'n' or 'N' = subtract the mean from each column, thus changing the input data but speeding up the computation. Anything else (like 0) = don't modify the input data. (default = no modification) |
This function uses the Designated initializers syntax for inputs.
gsl_matrix* apop_matrix_covariance(gsl_matrix *in, const char normalize)
Returns the variance/covariance matrix relating each column with each other.
This is the gsl_matrix version of apop_data_covariance; if you have column names, use that one.
| in | A data matrix: rows are observations, columns are variables. (No default, must not be NULL) | |
| normalize | 'n', 'N', or 1 = subtract the mean from each column, thus changing the input data but speeding up the computation. Anything else (like 0) = don't modify the input data. (default = no modification) |
This function uses the Designated initializers syntax for inputs.
int apop_matrix_is_positive_semidefinite(gsl_matrix *m, char semi)
Test whether the input matrix is positive semidefinite.
A covariance matrix will always be PSD, so this function can tell you whether your matrix is a valid covariance matrix.
Consider the 1x1 matrix in the upper left of the input, then the 2x2 matrix in the upper left, on up to the full matrix. If the matrix is positive definite, then each of these has a positive determinant (for a PSD matrix, each is nonnegative). This function thus calculates N determinants for an N x N matrix.
| m | The matrix to test. If NULL, I will return zero (not PSD). | |
| semi | If anything but 's', check for positive definite, not semidefinite. (default 's') |
See also apop_matrix_to_positive_semidefinite, which will change the input to something PSD.
This function uses the Designated initializers syntax for inputs.
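The leading-submatrix idea can be sketched in plain C. The version below covers only the strict positive-definite check for a symmetric matrix (Sylvester's criterion), and it is not Apophenia's implementation. Rather than computing each determinant from scratch, it runs Gaussian elimination without pivoting, where the k-th pivot is the ratio of the k-th leading determinant to the previous one, so every leading determinant is positive exactly when every pivot is positive. The function name `is_positive_definite` and the row-major array input are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the symmetric n-by-n row-major matrix m is positive definite,
   0 otherwise (including NULL input or allocation failure). */
static int is_positive_definite(const double *m, size_t n) {
    if (!m || n == 0) return 0;

    double *a = malloc(n * n * sizeof *a);   /* work on a copy */
    if (!a) return 0;
    memcpy(a, m, n * n * sizeof *a);

    int ok = 1;
    for (size_t k = 0; k < n; k++) {
        /* Pivot k = (leading determinant k+1) / (leading determinant k). */
        double pivot = a[k * n + k];
        if (pivot <= 0) { ok = 0; break; }

        /* Eliminate column k from the rows below. */
        for (size_t i = k + 1; i < n; i++) {
            double f = a[i * n + k] / pivot;
            for (size_t j = k; j < n; j++)
                a[i * n + j] -= f * a[k * n + j];
        }
    }
    free(a);
    return ok;
}
```

A matrix such as {1, 0; 0, 0} is positive semidefinite but fails this strict check, which is the distinction the semi argument controls.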
void apop_matrix_normalize(gsl_matrix *data, const char row_or_col, const char normalization)
Normalize each row or column in the given matrix, one by one.
Basically just a convenience function that iterates through the columns or rows and runs apop_vector_normalize for you.
| data | The data set to normalize. | |
| row_or_col | Either 'r' or 'c'. | |
| normalization | see apop_vector_normalize. |
double apop_matrix_to_positive_semidefinite(gsl_matrix *m)
This function passes its tests but is still under development.
It takes in a matrix and converts it to the 'closest' positive semidefinite matrix.
| m | On input, any matrix; on output, a positive semidefinite matrix. |
See also the test function apop_matrix_is_positive_semidefinite.
Adapted from the R Matrix package's nearPD, which is Copyright (2007) Jens Oehlschlägel [and is GPL].
double apop_multivariate_gamma(double a, double p)
The multivariate generalization of the Gamma function: $\Gamma_p(a) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\left[a + (1-j)/2\right]$.
See also apop_multivariate_lngamma, which is more numerically stable in most cases.
double apop_multivariate_lngamma(double a, double p)
The log of the multivariate generalization of the Gamma; see also apop_multivariate_gamma.
double apop_random_double(double min, double max, gsl_rng *r)
Gives a random double between min and max [inclusive].
This function uses the Designated initializers syntax for inputs. Notice that calling this function with no arguments, apop_random_double(), conveniently produces a number between zero and one. [To do this with less overhead, allocate your own RNG and use gsl_rng_uniform(r).]
| min | Default = 0 | |
| max | Default = 1 | |
| r | A gsl_rng. If NULL, I'll take care of the RNG; see Auto-allocated RNGs. (Default = NULL) |
int apop_random_int(double min, double max, const gsl_rng *r)
Gives a random integer between min and max [inclusive].
| min | (default 0) | |
| max | (default 1) | |
| r | A gsl_rng. If NULL, I'll take care of the RNG; see Auto-allocated RNGs. (Default = NULL) |
Thus,
x = apop_random_int();
makes a binary zero-one draw, and
double fivepoints[] = {1, 2, 3, 5, 7};
y = fivepoints[apop_random_int(0, 4)];
x = fivepoints[apop_random_int(.max=4)];
gives two draws from a five-item vector. Notice that the max is the largest index, which is the dimension minus one.
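The inclusive upper bound is the detail most likely to cause off-by-one errors, so here is a small plain-C sketch of an inclusive integer draw. It uses the C library's `rand()` rather than a GSL generator and is not how Apophenia implements apop_random_int. The function name `uniform_int_inclusive` is hypothetical, and the sketch assumes min <= max and a range no larger than RAND_MAX.

```c
#include <stdlib.h>

/* Draw an integer uniformly from [min, max], both ends included.
   Assumes min <= max and (max - min + 1) <= RAND_MAX. */
static int uniform_int_inclusive(int min, int max) {
    int span = max - min + 1;                  /* number of possible values */

    /* Largest multiple of span not exceeding RAND_MAX; draws at or above it
       are rejected so that every value is equally likely. */
    int limit = RAND_MAX - (RAND_MAX % span);

    int r;
    do {
        r = rand();
    } while (r >= limit);

    return min + r % span;
}
```

Calling `uniform_int_inclusive(0, 4)` yields a valid index into a five-element array, matching the largest-index convention described above.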
Run the Fisher exact test on an input contingency table.
Convert a column of text in the text portion of an apop_data set into a column of numeric elements, which you can use for a multinomial probit, for example.
| d | The data set to be modified in place. | |
| datacol | The column in the data set where the numeric factors will be written (-1 means the vector, which I will allocate for you if it is NULL) | |
| textcol | The column in the text that will be converted. |
For example:
apop_data *d = apop_query_to_mixed_data("mmt", "select 1, year, color from data");
apop_text_to_factors(d, 0, 0);
Notice that the query pulled a column of ones for the sake of saving room for the factors.
Give me a column of text, and I'll give you a sorted list of the unique elements, returned as an apop_data set with only one column of text. This is basically running "select distinct * from datacolumn", but without the aid of the database.
| d | An apop_data set with a text component | |
| col | The text column you want me to use. |
void apop_vector_normalize(gsl_vector *in, gsl_vector **out, const char normalization_type)
This function will normalize a vector, either such that it has mean zero and variance one, or such that it ranges between zero and one, or sums to one.
| in | A gsl_vector which you have already allocated and filled. NULL input gives NULL output. (No default) | |
| out | If normalizing in place, NULL. If not, the address of a gsl_vector. Do not allocate. (default = NULL.) | |
| normalization_type | 'p': normalized vector will sum to one. E.g., start with a set of observations in bins, end with the percentage of observations in each bin. (the default) 'r': normalized vector will range between zero and one. Replace each X with (X-min) / (max - min). 's': normalized vector will have mean zero and variance one. Replace each X with $(X-\mu)/\sigma$, where $\mu$ is the mean and $\sigma$ is the sample standard deviation. 'm': normalize to mean zero. Replace each X with $(X-\mu)$. |
Example
#include <apop.h>

int main(void){
    gsl_vector *in, *out;

    in = gsl_vector_calloc(3);
    gsl_vector_set(in, 1, 1);
    gsl_vector_set(in, 2, 2);

    printf("The original vector:\n");
    apop_vector_show(in);

    apop_vector_normalize(in, &out, 's');
    printf("Standardized with mean zero and variance one:\n");
    apop_vector_show(out);

    apop_vector_normalize(in, &out, 'r');
    printf("Normalized range with max one and min zero:\n");
    apop_vector_show(out);

    apop_vector_normalize(in, NULL, 'p');
    printf("Normalized into percentages:\n");
    apop_vector_show(in);
}
This function uses the Designated initializers syntax for inputs.
gsl_vector* apop_vector_unique_elements(const gsl_vector *v)
Give me a vector of numbers, and I'll give you a sorted list of the unique elements. This is basically running "select distinct * from datacolumn", but without the aid of the database.
| v | a vector of items |
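The sort-then-deduplicate idea behind a "select distinct" on a numeric vector can be sketched in plain C as follows. Unlike apop_vector_unique_elements, which takes a const input and returns a new vector, this hypothetical `unique_sorted` helper works in place on a double array, an assumption made to keep the example short.

```c
#include <stdlib.h>

/* Three-way comparison of doubles for qsort. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort x[0..n-1] ascending, move the distinct values to the front,
   and return how many distinct values there are. */
static size_t unique_sorted(double *x, size_t n) {
    if (n == 0) return 0;
    qsort(x, n, sizeof *x, cmp_double);

    size_t k = 1;                       /* x[0..k-1] holds the unique values */
    for (size_t i = 1; i < n; i++)
        if (x[i] != x[k - 1])
            x[k++] = x[i];
    return k;
}
```

After sorting, equal values are adjacent, so a single pass comparing each element with the last kept value is enough.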
double apop_vector_weighted_cov(const gsl_vector *v1, const gsl_vector *v2, const gsl_vector *w)
Find the sample covariance of a pair of weighted vectors. This only makes sense if the weightings are identical, so the function takes only one weighting vector for both.
| v1,v2 | The data vectors | |
| w | the weight vector. If NULL, assume equal weights. |
double apop_vector_weighted_kurt(const gsl_vector *v, const gsl_vector *w)
Find the population kurtosis of a weighted vector.
| v | The data vector | |
| w | the weight vector. If NULL, assume equal weights. |
apop_vector_weighted_skew and apop_vector_weighted_kurt are lazily written.
double apop_vector_weighted_mean(const gsl_vector *v, const gsl_vector *w)
Find the weighted mean.
| v | The data vector | |
| w | the weight vector. If NULL, assume equal weights. |
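A weighted mean is the weighted sum divided by the total weight. The plain-C sketch below shows that computation on double arrays, treating a NULL weight pointer as equal weights to mirror the parameter description. The function name `weighted_mean` is hypothetical, and the sketch assumes the weights do not sum to zero.

```c
#include <stddef.h>

/* Weighted mean: sum(w[i] * v[i]) / sum(w[i]).
   A NULL w means every element has weight one.
   Assumes n > 0 and a nonzero total weight. */
static double weighted_mean(const double *v, const double *w, size_t n) {
    double num = 0, den = 0;
    for (size_t i = 0; i < n; i++) {
        double wi = w ? w[i] : 1.0;
        num += wi * v[i];
        den += wi;
    }
    return num / den;
}
```

Because the result is divided by the total weight, it does not matter whether the weights sum to one or are raw repeat counts.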
double apop_vector_weighted_skew(const gsl_vector *v, const gsl_vector *w)
Find the population skew of a weighted vector.
Note: Apophenia tries to be smart about reading the weights. If the weights sum to one, then the system uses w->size as the number of elements n, and returns the usual sum over n-1. If the weights sum to more than one, then the system uses the total weight as n. Thus, you can use the weights as standard weightings or to represent elements that appear repeatedly.
| v | The data vector | |
| w | the weight vector. If NULL, assume equal weights. |
apop_vector_weighted_skew and apop_vector_weighted_kurt are lazily written.
double apop_vector_weighted_var(const gsl_vector *v, const gsl_vector *w)
Find the sample variance of a weighted vector.
Note: Apophenia tries to be smart about reading the weights. If the weights sum to one, then the system uses w->size as the number of elements n, and returns the usual sum over n-1. If the weights sum to more than one, then the system uses the total weight as n. Thus, you can use the weights as standard weightings or to represent elements that appear repeatedly.
| v | The data vector | |
| w | the weight vector. If NULL, assume equal weights. |
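The "elements that appear repeatedly" reading of the weights can be checked by hand. The sketch below covers only that case, where each weight is a repeat count and the total weight stands in for n. It makes no claim about how Apophenia handles weights that sum to one. The function name `count_weighted_var` is hypothetical.

```c
#include <stddef.h>

/* Sample variance when w[i] is the number of times v[i] was observed:
   sum(w[i] * (v[i] - mean)^2) / (sum(w) - 1).
   A NULL w means every element was observed once.
   Assumes the total weight is greater than one. */
static double count_weighted_var(const double *v, const double *w, size_t n) {
    double total = 0, mean = 0, ss = 0;

    for (size_t i = 0; i < n; i++) {
        double wi = w ? w[i] : 1.0;
        total += wi;
        mean += wi * v[i];
    }
    mean /= total;

    for (size_t i = 0; i < n; i++) {
        double wi = w ? w[i] : 1.0;
        ss += wi * (v[i] - mean) * (v[i] - mean);
    }
    return ss / (total - 1.0);
}
```

With values {1, 2, 3} and counts {1, 2, 1}, the result equals the ordinary sample variance of the expanded data {1, 2, 2, 3}.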