Apophenia is an open statistical library for working with data sets and statistical models. It provides functions
on the same level as those of the typical stats package (such as OLS, probit, or
singular value decomposition) but gives the user more flexibility to be creative in model-building.
The core functions are written in C, but bindings exist for Python, and the functions should be easy to bind to from Perl, Ruby, and other languages.
It is written to scale well, to comfortably work with gigabyte data sets or computationally-intensive agent-based
models. If you have tried using other open source tools for computationally demanding work
and found that those tools weren't up to the task, then Apophenia is the library for you.
The goods
To date, the library has over two hundred functions to facilitate scientific computing, such as:
- OLS and family, discrete choice models like probit and logit, kernel density estimators, and other common models
- database querying and maintenance utilities
- moments, percentiles, and other basic stats utilities
- t-tests, F-tests, et cetera
- several optimization methods available for your own new models
Apophenia does not re-implement basic matrix operations or build yet another database
engine. Instead, it builds upon the excellent GNU
Scientific Library and SQLite. MySQL is also supported.
For the full list, click the index link from the header.
Most users will just want to download the packaged version using the giant green button on the
SourceForge page linked from the
"Download Apophenia here" header.
Those who would like to work on a cutting-edge copy of the source code
can get the latest version (and five years of project history) by cutting and pasting the following onto
the command line. If you follow this route, be sure to read the development README in the
apophenia directory this command will create.
git clone git://apophenia.git.sourceforge.net/gitroot/apophenia/apophenia
The online reference for Apophenia is here. The reader may
also be interested in the textbook
Modeling with Data,
which discusses general methods for doing statistics in C with the GSL
and SQLite, as well as Apophenia itself.
The Frequently Asked Question: Why not use [name of stats package]?
- Matrices and databases. There are things you can
do with a one-line database query that you need a hundred lines of
matrix-manipulation code to do; there are things you can do with matrices
that you simply can't do with a database query. A good stats library
therefore takes both representations of data seriously.
- Models as objects. The apop_model object is unique among stats packages in
providing a consistent interface to linear models, probability distributions, and
exotic models that can only be solved via maximum likelihood. The consistent interface means
that you can compare several models at once, or can construct multilevel models or
creative variant models by using standard models as building blocks. Simply put, having statistical
models as objects is nifty.
- Better MLEs. The package focuses on facilitating maximum
likelihood estimation. The usual OLS and GLS are still there, but since the
world isn't linear, Apophenia focuses on giving you methods of
fitting generally-specified models via MLE.
- Not slow; not limited. First, the software imposes no restrictions on data
size (Stata says: "Your matrix must be less than 4,000
columns"). Second, Apophenia shares C code with certain open source
stats packages, and yet runs that same code over fifty
times faster. Apophenia's speed and effectively unlimited data
handling mean that it is the only open source option for
statistical analysis of very large data sets.
- Open source and portable.
The packages Apophenia uses are ported to almost
any computer you will ever use. You can begin your analysis on the
university/company servers, then send it to a colleague, then copy it to
your laptop for the ride home, and never worry about compatibility or
licensing.
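To make the "models as objects" and "better MLEs" points concrete, here is a minimal sketch in plain C. It is not Apophenia's API: the model struct, the two example models, and mle_estimate are hypothetical names invented for illustration, and the real apop_model is far richer. The sketch assumes a model with a single parameter whose log likelihood has one peak within the given bounds, and it uses a simple golden-section search as the optimizer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model object (not Apophenia's apop_model): every model
   exposes the same interface, a log likelihood of one parameter. */
typedef struct {
    const char *name;
    double (*log_likelihood)(const double *data, size_t n, double param);
    double param_min, param_max;  /* bounds for the parameter search */
} model;

/* Normal with unit variance: log L = -0.5 * sum (x - mu)^2, up to a constant. */
double normal_ll(const double *x, size_t n, double mu) {
    double ll = 0;
    for (size_t i = 0; i < n; i++) ll -= 0.5 * (x[i] - mu) * (x[i] - mu);
    return ll;
}

/* Laplace with unit scale: log L = -sum |x - mu|, up to a constant. */
double laplace_ll(const double *x, size_t n, double mu) {
    double ll = 0;
    for (size_t i = 0; i < n; i++) ll -= (x[i] > mu) ? x[i] - mu : mu - x[i];
    return ll;
}

const model normal_model  = {"normal",  normal_ll,  -100, 100};
const model laplace_model = {"laplace", laplace_ll, -100, 100};

/* Generic estimator: golden-section search for the parameter that
   maximizes the model's log likelihood. It knows nothing about the model
   beyond the shared interface, and assumes a single peak in the bounds. */
double mle_estimate(const model *m, const double *data, size_t n) {
    assert(m->param_min < m->param_max);
    const double ratio = 0.6180339887498949;  /* (sqrt(5) - 1) / 2 */
    double lo = m->param_min, hi = m->param_max;
    double a = hi - ratio * (hi - lo), b = lo + ratio * (hi - lo);
    double fa = m->log_likelihood(data, n, a);
    double fb = m->log_likelihood(data, n, b);
    while (hi - lo > 1e-6) {
        if (fa < fb) {          /* the maximum lies in [a, hi] */
            lo = a; a = b; fa = fb;
            b = lo + ratio * (hi - lo);
            fb = m->log_likelihood(data, n, b);
        } else {                /* the maximum lies in [lo, b] */
            hi = b; b = a; fb = fa;
            a = hi - ratio * (hi - lo);
            fa = m->log_likelihood(data, n, a);
        }
    }
    return (lo + hi) / 2;
}
```

Because both models share one interface, the same mle_estimate call fits either of them: on the data {1, 2, 6} it should return a value close to the sample mean (3) for normal_model and close to the sample median (2) for laplace_model. Swapping in a new model only requires writing its log likelihood.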
Contribute!
- Develop a new model object.
- Contribute your favorite statistical routine.
- Package Apophenia as an RPM, apt, portage, or cygwin package.
- Report bugs or suggest features.
- Write bindings for your preferred language, which may just mean modifying the existing SWIG interface file.
If you're interested, write to the maintainer (Ben Klemens), or join the
SourceForge project.