![]() |
|
If trace_path (in the apop_mle_settings struct) has a name of positive length, then every time the MLE evaluates the function, then the value will be output to a table in the database/a file with the given name (depending on the value of apop_opts.output_type). You can then plot this table to get an idea of the path the estimation routine used to arrive at its MLE.
To write to a pipe or stdout, set apop_opts.output_type appropriately and set trace_path to the literal string "NULL".
Below is a sample of the sort of output one would get:
An ML search, tracing out the surface of the function
Simulated annealing is a controlled random walk. As with the other methods, the system tries a new point, and if it is better, switches. Initially, the system is allowed to make large jumps, and then with each iteration, the jumps get smaller, eventually converging. Also, there is some decreasing probability that if the new point is {less} likely, it will still be chosen. Simulated annealing is best for situations where there may be multiple local optima. Early in the random walk, the system can readily jump from one to another; later it will fine-tune its way toward the optimum. The number of points tested is basically not dependent on the function: if you give it a 4,000 step program, that is basically how many steps it will take. If you know your function is globally convex (as are most standard probability functions), then this method is overkill.
The GSL's simulated annealing system doesn't actually do very much. It basically provides a for loop that calls a half-dozen functions that we the users get to write. So, the file apop_mle.c handles all of this for you. The likelihood function is taken from the model, the metric is the Manhattan metric, the copy/destroy functions are just the usual vector-handling fns., et cetera. The reader who wants further control is welcome to override these functions.
Verbosity: if ep->verbose==1, show likelihood, temp, &c. in a table; if ep->verbose>1, show that plus the vector of params.
| apop_model* apop_estimate_restart | ( | apop_model * | e, | |
| apop_model * | copy, | |||
| char * | starting_pt, | |||
| double | boundary | |||
| ) |
The simplest use of this function is to restart a model at the current parameter estimates.
apop_model *m = apop_estimate(data, model_using_an_MLE_search); for (int i=0; i< 10; i++) m = apop_estimate_restart(m); apop_data_show(m);
By adding a line to reduce the tolerance each round [e.g., Apop_settings_add(m, apop_mle, tolerance, pow(10,-i))], you can hone in on a precise optimum.
You may have a new estimation method, such as first doing a simulated annealing search, then honing in via a conjugate gradient method:
Apop_settings_set(your_base_model, apop_mle, APOP_SIMAN); apop_model *m = apop_estimate(data, your_base_model); Apop_settings_set(m, apop_mle, APOP_CG_PR); m = apop_estimate_restart(data, m); apop_data_show(m);
Only one estimate is returned, either the one you sent in or a new one. The loser (which may be the one you sent in) is freed. That is, there is no memory leak in the above loop.
| e | An apop_model that is the output from a prior MLE estimation. (No default, must not be NULL.) | |
| copy | Another not-yet-parametrized model that will be re-estimated with (1) the same data and (2) a starting_pt as per the next setting (probably to the parameters of e). If this is NULL, then copy off e. (Default = NULL) | |
| starting_pt | "ep"=last estimate of the first model (i.e., its current parameter estimates); "es"= starting point originally used by the first model; "np"=current parameters of the new (second) model; "ns"=starting point specified by the new model's MLE settings. (default = "ep") | |
| boundary | I test whether the starting point you give me is outside this certain bound, so I can warn you if there's divergence in your sequence of re-estimations. (default: 1e8) |
This function uses the Designated initializers syntax for inputs.
| apop_model* apop_maximum_likelihood | ( | apop_data * | data, | |
| apop_model * | dist | |||
| ) |
The maximum likelihood calculations
| data | The data matrix (an apop_data set). | |
| dist | The apop_model object: waring, probit, zipf, &c. You can add an apop_mle_settings struct to it (Apop_model_add_group(your_model, apop_mle, .verbose=1, .method=APOP_CG_FR, and_so_on)). So, see the apop_mle_settings documentation for the many options, such as choice of method and tuning parameters. |
apop_model *est = apop_estimate(your_data, apop_probit); apop_data *info = apop_data_get_page(est->parameters, "Info"); int status = apop_data_get_ti(info, "status", -1); if (status) //trouble else //optimum found