Apophenia is an open statistical library. It provides functions
on the same level as those of the typical stats package (such as OLS, probit, or
singular value decomposition) but doesn't tie the user to an
ad hoc language or environment. The core functions are written in C, but it should be easy to write bindings for them in Perl, Python, &c.
It is written to scale well. If you have tried to analyze
your gigabyte data set using other open source tools but
found that they weren't up to handling
large data sets or exceptionally computationally intensive
work, Apophenia is the library for you.
More technically, there is a wealth of libraries that work at the level of matrices and matrix
manipulation, but most of a modeler's or statistician's work is at the level of data sets and models.
Apophenia provides objects and tools to work at this layer of abstraction.
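To make the difference between the matrix layer and the data-set layer concrete, here is a minimal sketch of what a data-set object adds on top of a bare matrix: column names, so that code can ask for "income" rather than "column 1". The type `dataset` and the functions `dataset_col`, `dataset_get`, and `dataset_col_mean` are hypothetical illustrations, not Apophenia's actual data structure or API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical data-set type: a row-major matrix plus column names.
   Illustrative only; this is not Apophenia's actual structure. */
typedef struct {
    size_t rows, cols;
    const char **colnames;   /* one name per column */
    double *values;          /* rows * cols entries, row-major */
} dataset;

/* Return the index of the named column, or -1 if there is no such column. */
long dataset_col(const dataset *d, const char *name)
{
    for (size_t j = 0; j < d->cols; j++)
        if (strcmp(d->colnames[j], name) == 0)
            return (long)j;
    return -1;
}

/* Fetch one element by row number and column name.
   Returns 0 on success, -1 for an unknown column or out-of-range row. */
int dataset_get(const dataset *d, size_t row, const char *name, double *out)
{
    long j = dataset_col(d, name);
    if (j < 0 || row >= d->rows)
        return -1;
    *out = d->values[row * d->cols + (size_t)j];
    return 0;
}

/* Mean of a named column: the modeler thinks in variables, not indices.
   Returns 0 on success, -1 for an unknown column or an empty data set. */
int dataset_col_mean(const dataset *d, const char *name, double *out)
{
    long j = dataset_col(d, name);
    if (j < 0 || d->rows == 0)
        return -1;
    double sum = 0.0;
    for (size_t i = 0; i < d->rows; i++)
        sum += d->values[i * d->cols + (size_t)j];
    *out = sum / (double)d->rows;
    return 0;
}
```

The matrix underneath is unchanged; the names are the extra layer of abstraction, and every routine built on top can refer to variables the way the modeler does.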
The goods
To date, the library has over a hundred functions to facilitate statistical computing, including:
- maximum likelihood estimators for probit, Waring, Yule, Zipf, &c. models
- OLS and GLS
- database querying and maintenance utilities
- moments, percentiles, and other basic stats utilities
- singular value decomposition tools
- t-tests, F-tests, et cetera
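Even basic moments are where data size starts to matter: the textbook two-pass formula reads the data twice, and the shortcut E[x^2] - E[x]^2 loses precision to cancellation. A minimal sketch of a one-pass alternative, Welford's algorithm, follows. The type `running_moments` and its functions are hypothetical names for illustration; Apophenia may compute moments differently.

```c
#include <stddef.h>

/* Running moments via Welford's one-pass algorithm. Each observation is
   seen exactly once and memory use does not grow with the data set.
   Hypothetical helper, not part of Apophenia. */
typedef struct {
    size_t n;      /* observations seen so far */
    double mean;   /* running mean */
    double m2;     /* running sum of squared deviations from the mean */
} running_moments;

void moments_init(running_moments *m)
{
    m->n = 0;
    m->mean = 0.0;
    m->m2 = 0.0;
}

/* Fold one observation into the running totals. */
void moments_add(running_moments *m, double x)
{
    m->n++;
    double delta = x - m->mean;          /* deviation from the old mean */
    m->mean += delta / (double)m->n;
    m->m2 += delta * (x - m->mean);      /* old deviation times new deviation */
}

/* Sample variance (divides by n - 1); returns 0 when n < 2. */
double moments_variance(const running_moments *m)
{
    return m->n < 2 ? 0.0 : m->m2 / (double)(m->n - 1);
}
```

Because the state is three numbers, the same loop works whether the observations come from an in-memory array or are streamed row by row from a database query.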
Most users will just want to download the autoconf-packaged library here.
Those who would like to work on a cutting-edge copy of the source code
can get the latest version by cutting and pasting the following onto
the command line.
svn co https://svn.sourceforge.net/svnroot/apophenia/trunk/apophenia
We have the technology
There is no need to
reinvent the wheel in the process of rebuilding our regression
functions. The Apophenia library is based on two lower-level
libraries: the GNU
Scientific Library, which does the number-crunching, and SQLite, which handles the data.
The online reference for Apophenia is here. The reader may
also be interested in this extensive text entitled
Modern Statistical Computing (PDF),
which discusses general methods for doing statistics in C with the GSL
and SQLite, as well as Apophenia itself.
The FAQ: Why not use [name of stats package]?
- Matrices and databases. There are things you can
do with a one-line database query that you need a hundred lines of
matrix-manipulation code to do; there are things you can do with matrices
that you simply can't do with a database query. A good stats library
therefore takes both representations of data seriously.
- Better MLEs. The package focuses on facilitating maximum
likelihood estimation. The usual OLS and GLS are still there, but since the
world isn't linear, Apophenia emphasizes methods for
fitting generally specified models via MLE.
- No ad hoc languages and no glue. Every piece of what Apophenia does is
available in some package somewhere (database libraries, matrix-manipulation
languages, ML estimators), and a savvy user could chain those packages together
through creative use of text files and batch scripts. But with
Apophenia, there is no need to jump between packages, and no need to
simultaneously wrangle multiple idiosyncratic languages:
the entire flow from text files to databases to final estimation is in one place.
- Not slow. First, the software imposes no restrictions on data
size (Stata says: "Your matrix must be less than 4,000
columns"). Second, Apophenia shares C code with certain open source
stats packages, and yet runs that same code over fifty (50)
times faster. Apophenia's speed and effectively unlimited data
handling mean that it is the only open source option for
statistical analysis of very large data sets.
- Open source and portable.
The packages Apophenia uses have been ported to almost
any computer you will ever use. You can begin your analysis on the
university/company servers, then send it to a colleague, then copy it to
your laptop for the ride home, and never worry about compatibility or
licensing.
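The MLE point in the list above rests on a design idea that a short sketch can illustrate: if a model is just a likelihood function handed to a generic maximizer, the same maximizer fits any model the user can write down. Below, a hand-rolled golden-section search (a one-parameter optimizer, swapped in so the example needs only the standard library) maximizes a Bernoulli likelihood. All names here (`likelihood_fn`, `maximize_1d`, `bernoulli_data`, `bernoulli_likelihood`) are hypothetical; this is not Apophenia's MLE code, which handles models with many parameters.

```c
/* A "model" is a likelihood of one parameter plus whatever data it needs;
   the maximizer knows nothing else about it. (Hypothetical interface.) */
typedef double (*likelihood_fn)(double param, const void *data);

/* Golden-section search for the maximum of f on [lo, hi].
   Assumes f is unimodal on the interval and tol > 0. */
double maximize_1d(likelihood_fn f, const void *data,
                   double lo, double hi, double tol)
{
    const double r = 0.6180339887498949;   /* (sqrt(5) - 1) / 2 */
    double a = lo, b = hi;
    double c = b - r * (b - a);             /* interior probe points */
    double d = a + r * (b - a);
    double fc = f(c, data), fd = f(d, data);
    while (b - a > tol) {
        if (fc > fd) {                      /* maximum lies in [a, d] */
            b = d;
            d = c; fd = fc;                 /* reuse the old probe */
            c = b - r * (b - a);
            fc = f(c, data);
        } else {                            /* maximum lies in [c, b] */
            a = c;
            c = d; fc = fd;
            d = a + r * (b - a);
            fd = f(d, data);
        }
    }
    return (a + b) / 2;
}

/* Example model: k successes in n Bernoulli trials. */
typedef struct { unsigned k, n; } bernoulli_data;

/* Likelihood p^k (1-p)^(n-k), computed by repeated multiplication so no
   math library is needed. This underflows for large n; real code would
   maximize the log-likelihood instead. */
double bernoulli_likelihood(double p, const void *data)
{
    const bernoulli_data *obs = data;
    double L = 1.0;
    for (unsigned i = 0; i < obs->k; i++) L *= p;
    for (unsigned i = obs->k; i < obs->n; i++) L *= 1.0 - p;
    return L;
}
```

For 7 successes in 20 trials, `maximize_1d(bernoulli_likelihood, &obs, 0.0, 1.0, 1e-9)` lands near 0.35, the closed-form MLE k/n. Swapping in a different `likelihood_fn` changes the model without touching the optimizer.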
Contribute!
You don't need to eat C code for breakfast to help. Ways you can contribute:
- Report bugs or suggest features.
- Package Apophenia as an RPM, apt, Portage, or Cygwin package.
- Write bindings for your preferred language.
- Contribute your favorite statistical routine.
- Help make the C code base more robust and still faster.
If you're interested, write to the maintainer (Ben Klemens), or join the
SourceForge project.