Patterns in static

Apophenia

asst.h File Reference

Define Documentation

#define Apop_assert ( test,
returnval,
level,
stop,
...   ) 
Value:
do \
    if (!(test)) {  \
        if (apop_opts.verbose >= level) { fprintf(stderr, "%s: ", __func__); fprintf(stderr, __VA_ARGS__); fprintf(stderr, "\n");}   \
        if (stop == 's' || stop == 'h') assert(test);   \
        return returnval;  \
} while (0);

A convenient front end for apop_error: it tests the first argument and, if the test is false, reports the error much as apop_error would. See also Apop_assert_void and apop_error.

Following the tradition regarding assert functions, this is a macro but is not in all caps.

Parameters:
test The expression that you are asserting is nonzero.
returnval If the assertion fails, return this. If you want to halt on error, this is irrelevant, but still has to match your function's return type.
level Print the warning message only if apop_opts.verbose is greater than or equal to this. Zero usually works, but for minor infractions use one.
stop If 's' (or its synonym 'h'), halt the program (using the standard C assert); if 'c', continue by returning the return value and printing an error message if appropriate.
... The error message in printf form, plus any arguments to be inserted into the printf string. I'll provide the function name and a trailing newline.
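
As a rough usage sketch, the macro can be exercised without the library by supplying a stand-in for apop_opts. The struct below and the function checked_divide are hypothetical names introduced only for illustration; the macro body is copied from the definition above.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the library's global options, holding only the field the
   macro reads. (Assumption: the real apop_opts has more fields.) */
static struct { int verbose; } apop_opts = { 0 };

/* Macro body as documented above. */
#define Apop_assert(test, returnval, level, stop, ...) do \
    if (!(test)) {  \
        if (apop_opts.verbose >= level) { fprintf(stderr, "%s: ", __func__); fprintf(stderr, __VA_ARGS__); fprintf(stderr, "\n");}   \
        if (stop == 's' || stop == 'h') assert(test);   \
        return returnval;  \
} while (0);

/* Hypothetical caller: integer division that reports a zero denominator
   and returns -1 instead of halting ('c' = continue). */
static int checked_divide(int num, int den, int *out){
    Apop_assert(den != 0, -1, 0, 'c', "denominator is %i; cannot divide %i by it", den, num);
    *out = num / den;
    return 0;
}
```

With stop set to 'c', a call such as checked_divide(1, 0, &q) writes the message to stderr, leaves q untouched, and returns -1; with 's' the same call would trip the standard assert.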
#define Apop_assert_void ( test,
level,
stop,
...   ) 
Value:
do \
    if (!(test)) {  \
        if (apop_opts.verbose >= level) { fprintf(stderr, "%s: ", __func__); fprintf(stderr, __VA_ARGS__); fprintf(stderr, "\n");}   \
        if (stop == 's' || stop == 'h') assert(test);   \
} while (0);

Like Apop_assert, but with no return step, which makes it usable in void functions. Note that when stop is 'c', execution continues past the failed test.

Following the tradition regarding assert functions, this is a macro but is not in all caps.

Parameters:
test The expression that you are asserting is nonzero.
level Print the warning message only if apop_opts.verbose is greater than or equal to this. Zero usually works, but for minor infractions use one.
stop If 's' (or its synonym 'h'), halt the program (using the standard C assert); if 'c', print an error message if appropriate and continue.
... The error message in printf form, plus any arguments to be inserted into the printf string. I'll provide the function name and a trailing newline.
#define Apop_settings_alloc ( type,
out,
...   )     apop_ ##type ##_settings *out = apop_ ##type ##_settings_alloc(__VA_ARGS__);

This is obsolete. Use Apop_model_add_group.

For what it's worth, this is a convenience macro. Expands:

 Apop_settings_alloc(mle, ms, data, model);

to:

 apop_mle_settings *ms = apop_mle_settings_alloc(data, model);

As of this writing, options for the first argument include mle, histogram, and update. See the documentation of each allocation function for the arguments it expects. Because this is an obsolete macro, that list may shrink.


Function Documentation

apop_data* apop_data_listwise_delete ( apop_data d,
char  inplace 
)

If there is an NaN anywhere in the row of data (including the matrix, the vector, and the weights) then delete the row from the data set.

The function returns a new data set with the NaNs removed, so the original data set is left unmolested. You may want to apop_data_free the original immediately after this function.

  • If every row has an NaN, then this returns NULL.
  • If there is text, it gets pruned as well.
  • If inplace = 'y', then I'll free each element of the input data set and refill it with the pruned elements. Even so, I'll take up (up to) twice the size of the data set in memory during the function. If every row has an NaN, then your apop_data set will have a lot of NULL elements.
  • This function uses the Designated initializers syntax for inputs.
Parameters:
d The data, with NaNs
inplace If 'y', clear out the pointer-to-apop_data that you sent in and refill with the pruned data. If 'n', leave the set alone and return a new data set.
Returns:
A (potentially shorter) copy of the data set, without NaNs. If inplace=='y', redundant with the input.
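
The row-pruning idea can be sketched on a plain row-major array of doubles. This is only an illustration of the inplace = 'n' behavior under simplifying assumptions: listwise_delete_sketch is a hypothetical name, and the text, weights, and apop_data bookkeeping that the library handles are left out.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Copy every row of a row-major rows x cols matrix that contains no NaN
   from `in` to `out`, preserving row order. Returns the number of rows
   kept. `out` must have room for rows*cols doubles. */
static size_t listwise_delete_sketch(const double *in, size_t rows, size_t cols, double *out){
    assert(in && out);
    size_t kept = 0;
    for (size_t r = 0; r < rows; r++){
        int has_nan = 0;
        for (size_t c = 0; c < cols; c++)
            if (isnan(in[r*cols + c])) { has_nan = 1; break; }
        if (has_nan) continue;              /* drop the whole row */
        for (size_t c = 0; c < cols; c++)
            out[kept*cols + c] = in[r*cols + c];
        kept++;
    }
    return kept;
}
```

A return value of zero corresponds to the case where every row has an NaN, for which the documented function returns NULL.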
apop_data* apop_data_sort ( apop_data data,
int  sortby,
char  asc 
)

This function sorts the whole of an apop_data set based on one column. Sorts in place, with little additional memory used.

Uses the gsl_sort_vector_index function internally, and that function just ignores NaNs; therefore this function just leaves NaNs exactly where they lie.

Parameters:
data The input set to be modified. (No default, must not be NULL.)
sortby The column of data by which the sorting will take place. As usual, -1 indicates the vector element. (default: column zero of the matrix)
asc If 'd' or 'D', sort in descending order; else sort in ascending order. (Default: ascending)
Returns:
A pointer to the data set, so you can do things like apop_data_show(apop_data_sort(d, -1)).

This function uses the Designated initializers syntax for inputs.

void apop_error ( int  level,
char  stop,
char *  msg,
  ... 
)

Inform the user of a faux pas. See also Apop_assert, which allows the function to return a value.

Parameters:
level At what verbosity level should the user be warned? E.g., if level==2, then print iff apop_opts.verbose >= 2. You can set apop_opts.verbose = -1 to turn off virtually all messages, but this is probably ill-advised.
stop Either 's' or 'c', indicating whether the program should stop or continue. If stopping, uses assert(0) for easy debugging. You can use 'h' (halt) as a synonym for 's'.
msg The message to write to stderr (presuming the verbosity level is high enough). This can be a printf-style format with following arguments. You can produce much more informative error messages this way, e.g., apop_error(0, 's', "Beta is %g but should be greater than zero.", beta);.
double apop_generalized_harmonic ( int  N,
double  s 
)

Calculate $\sum_{n=1}^N {1\over n^s}$

apop_model* apop_histogram_model_reset ( apop_model base,
apop_model m,
long int  draws,
gsl_rng *  rng 
)

Give me an existing histogram (i.e., an apop_model) and I'll create a new histogram with the same bins, but with data from a number of random draws (given by the draws argument) from the parametrized model you provide.

Unlike with most other histogram-generating functions, this one will normalize the output to integrate to one. It uses the Designated initializers syntax for inputs.

Parameters:
base An apop_model produced using a form like apop_estimate(yourdata, apop_histogram). I.e. a histogram model to be used as a template. (No default)
m The model to be drawn from. Because this function works via random draws, the model needs to have a draw method. (No default)
draws The number of random draws to make. (arbitrary default = 1e5)
rng The gsl_rng used to make random draws. (default: see note on Auto-allocated RNGs)
apop_model* apop_histogram_moving_average ( apop_model m,
size_t  bandwidth 
)

Return a new histogram that is the moving average of the input histogram.

Parameters:
m A histogram, in apop_model form.
bandwidth The number of elements to be smoothed.
void apop_histogram_normalize ( apop_model m  ) 

Scale a histogram so it integrates to one (and is thus a proper PMF).

apop_model* apop_histogram_vector_reset ( apop_model template,
gsl_vector *  indata 
)

Give me an existing histogram (i.e., an apop_model) and I'll create a new histogram with the same bins, but with data from the vector you provide.

Parameters:
template An apop_model produced using a form like apop_estimate(yourdata, apop_histogram).
indata The new data to be binned.
apop_data* apop_histograms_test_goodness_of_fit ( apop_model m0,
apop_model m1 
)

Test the goodness-of-fit between two histograms (in apop_model form). I assume that the histograms are aligned.

apop_model* apop_ml_imputation ( apop_data d,
apop_model mvn 
)

Impute the most likely data points to replace NaNs in the data, and insert them into the given data. That is, the data set is modified in place.

Parameters:
d The data set. It comes in with NaNs and leaves entirely filled in.
mvn A parametrized apop_model from which you expect the data was derived. If NULL, then I'll use the Multivariate Normal that best fits the data after listwise deletion.
Returns:
An estimated apop_ml_imputation_model. Also, the data input will be filled in and ready to use.
double apop_test ( double  statistic,
char *  distribution,
double  p1,
double  p2,
char  tail 
)

This is a convenience function to look up a given statistic along a given distribution. You give me a statistic, its (hypothesized) distribution, and whether to use the upper tail, lower tail, or both. I will return the probability of a Type I error given the model (in statistician jargon, the $p$-value). [Type I error: rejecting the null hypothesis when it is true.]

For example,

   apop_test(1.3);

will return the probability mass of the standard Normal distribution that lies more than 1.3 from zero. If this function returns a small value, we can be confident that the statistic is significant. Or,

   apop_test(1.3, "t", 10, tail='u');

will give the appropriate probability for an upper-tailed test using the $t$-distribution with 10 degrees of freedom (e.g., a $t$-test of the null hypothesis that the statistic is less than or equal to zero).

Several more distributions are supported; see below.

  • For a two-tailed test (the default), this returns the probability mass outside the range. I'll only do this for symmetric distributions.
  • For an upper-tail test ('u'), this returns the probability mass above the cutoff.
  • For a lower-tail test ('l'), this returns the probability mass below the cutoff.
Parameters:
statistic The scalar value to be tested.
distribution The name of the distribution; see below.
p1 The first parameter for the distribution; see below.
p2 The second parameter for the distribution; see below.
tail 'u' = upper tail; 'l' = lower tail; anything else = two-tailed. (default = two-tailed)
Returns:
The probability of a Type I error given the model (the $p$-value).

Here is a list of distributions you can use, and their parameters.

"normal" or "gaussian"

  • p1=mu, p2=sigma
  • default (0, 1)

"lognormal"

  • p1=mu, p2=sigma
  • default (0, 1)
  • Remember, mu and sigma refer to the Normal one would get after taking the log of the variable, not to the mean and standard deviation of the lognormal itself
  • One-tailed tests only

"uniform"

  • p1=lower edge, p2=upper edge
  • default (0, 1)
  • two-tailed tests are run relative to the center, (p1+p2)/2.

"t"

  • p1=df
  • no default

"chi squared", "chi", "chisq":

  • p1=df
  • no default
  • One-tailed tests only; default='u' ($p$-value for typical cases)

"f"

  • p1=df1, p2=df2
  • no default
  • One-tailed tests only
apop_data* apop_test_kolmogorov ( apop_model m1,
apop_model m2 
)

Run the Kolmogorov test to determine whether two distributions are identical.

Parameters:
m1,m2 Two matching apop_histograms, probably produced via apop_histogram_vector_reset or apop_histogram_model_reset.
Returns:
The $p$-value from the Kolmogorov test that the two distributions are equal.
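
The statistic behind a test of this kind is the largest gap between the two cumulative distributions. The sketch below computes only that gap from two arrays of bin counts over identical bins; it does not compute the $p$-value, and ks_statistic_sketch is a hypothetical name rather than a library function.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute gap between the two empirical CDFs built from bin
   counts over identical bins. Returns a value in [0, 1]. */
static double ks_statistic_sketch(const double *bins1, const double *bins2, size_t nbins){
    assert(bins1 && bins2 && nbins > 0);
    double total1 = 0, total2 = 0;
    for (size_t i = 0; i < nbins; i++){ total1 += bins1[i]; total2 += bins2[i]; }
    assert(total1 > 0 && total2 > 0);
    double cdf1 = 0, cdf2 = 0, max_gap = 0;
    for (size_t i = 0; i < nbins; i++){
        cdf1 += bins1[i] / total1;          /* running CDF of histogram 1 */
        cdf2 += bins2[i] / total2;          /* running CDF of histogram 2 */
        double gap = cdf1 > cdf2 ? cdf1 - cdf2 : cdf2 - cdf1;
        if (gap > max_gap) max_gap = gap;
    }
    return max_gap;
}
```

Because each histogram is divided by its own total, the two inputs need matching bins but not matching sample sizes.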
apop_model* apop_update ( apop_data data,
apop_model prior,
apop_model likelihood,
gsl_rng *  rng 
)

Take in a prior and likelihood distribution, and output a posterior distribution.

This function first checks a table of conjugate distributions for the pair you sent in. If the names match the table, then the function returns a closed-form model with updated parameters. If the pair isn't in the table of conjugate priors/likelihoods, then it uses Markov Chain Monte Carlo to sample from the posterior distribution, and then outputs a histogram model for further analysis. Notably, the histogram can be used as the input to this function, so you can chain Bayesian updating procedures.

To change the default settings (MCMC starting point, periods, burnin...), add an apop_update_settings struct to the prior.

Here are the conjugate distributions currently defined:

Prior        Likelihood   Notes
Beta         Binomial
Beta         Bernoulli
Exponential  Gamma        Gamma likelihood represents the distribution of $\lambda^{-1}$, not plain $\lambda$
Normal       Normal       Assumes prior with fixed $\sigma$; updates distribution for $\mu$
Gamma        Poisson      Uses sum and size of the data
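
As an illustration of what a closed-form update looks like, the sketch below applies the textbook Beta prior with Binomial (or Bernoulli) likelihood rule: successes are added to the first shape parameter and failures to the second. The type beta_params and the function beta_binomial_update_sketch are hypothetical names; the library's own bookkeeping inside apop_model is not reproduced here.

```c
#include <assert.h>

/* Hypothetical container for the two Beta shape parameters. */
typedef struct { double alpha, beta; } beta_params;

/* Posterior of a Beta(alpha, beta) prior after observing `successes`
   hits in `trials` Binomial/Bernoulli draws. */
static beta_params beta_binomial_update_sketch(beta_params prior, unsigned successes, unsigned trials){
    assert(successes <= trials);
    beta_params post = { prior.alpha + successes, prior.beta + (trials - successes) };
    return post;
}
```

Feeding the returned parameters back in as the next prior mirrors the chaining of updates described above.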

Parameters:
data The input data, which will be used by the likelihood function. (default = NULL)
prior The prior apop_model (No default, must not be NULL.)
likelihood The likelihood apop_model. If the system needs to estimate the posterior via MCMC, this needs to have a draw method. (No default, must not be NULL.)
rng A gsl_rng, already initialized (e.g., via apop_rng_alloc). (default: see Auto-allocated RNGs)
Returns:
an apop_model struct representing the posterior, with updated parameters.
Todo:
The table of conjugate prior/posteriors (in its static check_conjugacy subfunction) is a little short, and can always be longer.
gsl_vector* apop_vector_moving_average ( gsl_vector *  v,
size_t  bandwidth 
)

Return a new vector that is the moving average of the input vector.

Parameters:
v The input vector, unsmoothed
bandwidth The number of elements to be smoothed.
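
A moving average over a plain array can be sketched as below. This assumes the simplest convention, in which each output is the mean of bandwidth consecutive inputs and the output has n - bandwidth + 1 elements; the page does not say how the library treats the edges, so the output length here is an assumption, and moving_average_sketch is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Writes n - bandwidth + 1 window means into `out` and returns that count.
   Assumption: no padding at the edges. */
static size_t moving_average_sketch(const double *v, size_t n, size_t bandwidth, double *out){
    assert(v && out && bandwidth > 0 && bandwidth <= n);
    double window_sum = 0;
    for (size_t i = 0; i < bandwidth; i++) window_sum += v[i];
    size_t count = n - bandwidth + 1;
    out[0] = window_sum / bandwidth;
    for (size_t i = 1; i < count; i++){
        /* Slide the window: add the entering element, drop the leaving one. */
        window_sum += v[i + bandwidth - 1] - v[i - 1];
        out[i] = window_sum / bandwidth;
    }
    return count;
}
```

The running sum keeps the cost linear in n regardless of bandwidth.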
double* apop_vector_percentiles ( gsl_vector *  data,
char  rounding 
)

Returns a vector of size 101, where returned_vector[95] gives the value of the 95th percentile, for example. The last element, returned_vector[100], is always the maximum value, and returned_vector[0] is always the minimum (regardless of rounding rule).

Parameters:
data a gsl_vector of data. (No default, must not be NULL.)
rounding This will be either 'u', 'd', or 'a'. Unless the number of data points is exactly a multiple of 101, some percentiles will be ambiguous. If 'u', then round up (use the next highest value); if 'd' (or anything else), round down to the next lowest value; if 'a', take the mean of the two nearest points. If 'u' or 'a', then you can say "5% or more of the sample is below returned_vector[5]"; if 'd' or 'a', then you can say "5% or more of the sample is above returned_vector[5]". (Default = 'd'.)

This function uses the Designated initializers syntax for inputs.
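
The three rounding rules can be sketched on a plain array as follows. The indexing rule used here (percentile p sits at fractional position (n-1)*p/100 in the sorted data) is an assumption chosen so that element 0 is the minimum and element 100 the maximum; the library's exact index arithmetic is not given on this page. The names percentiles_sketch and compare_doubles are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Ascending comparison for qsort. */
static int compare_doubles(const void *a, const void *b){
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Fills pct[0..100] from `data` (n >= 1, no NaNs), sorting a private copy.
   Assumed rule: percentile p sits at fractional position (n-1)*p/100;
   'u' takes the upper neighbor, 'a' the mean of both neighbors, and
   anything else the lower neighbor.
   Returns 0 on success, -1 if the copy cannot be allocated. */
static int percentiles_sketch(const double *data, size_t n, char rounding, double pct[101]){
    assert(data && n > 0);
    double *sorted = malloc(n * sizeof *sorted);
    if (!sorted) return -1;
    for (size_t i = 0; i < n; i++) sorted[i] = data[i];
    qsort(sorted, n, sizeof *sorted, compare_doubles);
    for (size_t p = 0; p <= 100; p++){
        size_t scaled = (n - 1) * p;           /* position times 100 */
        size_t lo = scaled / 100;
        size_t hi = lo + (scaled % 100 != 0);  /* equals lo when the position is exact */
        if (rounding == 'u')      pct[p] = sorted[hi];
        else if (rounding == 'a') pct[p] = (sorted[lo] + sorted[hi]) / 2;
        else                      pct[p] = sorted[lo];
    }
    free(sorted);
    return 0;
}
```

Working in integer arithmetic avoids any floating-point doubt about whether a position falls exactly on a data point.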

Autogenerated by doxygen on 23 Nov 2009.