![]() |
|
The functions to shunt data between text files, database tables, GSL matrices, and plain old arrays.
Use fill_me = apop_query_to_matrix("select * from table_name;"); or fill_me = apop_query_to_data("select * from table_name;");. [See apop_query_to_matrix; apop_query_to_data.]
| apop_data * apop_array_to_data | ( | const double ** | in, | |
| const int | rows, | |||
| const int | cols | |||
| ) |
Convert a double ** array to an apop_data set. It will have no names. Input data is copied.
| in | the array to read in | |
| rows,cols | the size of the array. |
usage:
apop_data *d = apop_array_to_data(indata, 34, 4);
If you want to intialize on the allocation line, this isn't what you want. See apop_line_to_data.
| gsl_matrix * apop_array_to_matrix | ( | const double ** | in, | |
| const int | rows, | |||
| const int | cols | |||
| ) |
Convert a double ** array to a gsl_matrix
| in | the array to read in | |
| rows,cols | the size of the array. |
gsl_matrix, allocated for you and ready to use.usage:
gsl_matrix *m = apop_array_to_matrix(indata, 34, 4);
If you want to intialize on the allocation line, this isn't what you want. See apop_line_to_matrix.
| gsl_vector * apop_array_to_vector | ( | double * | in, | |
| int | size | |||
| ) |
Just copies a one-dimensional array to a gsl_vector. The input array is undisturbed.
| in | An array of doubles. (No default. Must not be NULL); | |
| size | How long line is. If this is zero or omitted, I'll guess using the sizeof(line)/sizeof(line[0]) trick, which will work for most arrays allocated using double [] and won't work for those allocated using double *. (default = auto-guess) |
gsl_vector (which I will allocate for you).This function uses the Designated initializers syntax for inputs.
| gsl_vector * apop_data_pack | ( | const apop_data * | in, | |
| gsl_vector * | out | |||
| ) |
Sometimes, you need to turn an apop_data set into a column of numbers. Thus, this function, that takes in an apop_data set and outputs a gsl_vector. It is valid to use the out_vector->data element as an array of doubles of size out_vector->data->size.
The complement is apop_data_unpack. I.e.,
apop_data_unpack(apop_data_pack(in_data), data_copy)
will return the original data set (stripped of text and names).
| in | an apop_data set. (No default. If NULL, return NULL). | |
| out | If this is not NULL, then put the output here. The dimensions must match exactly. If NULL, then allocate a new data set. |
gsl_vector with the vector data (if any), then each row of data (if any). If out is not NULL, then this is a pointer there. | void apop_data_to_db | ( | const apop_data * | set, | |
| const char * | tabname | |||
| ) |
Dump an apop_data set into the database.
This function is basically preempted by apop_data_print. Use that one; this may soon no longer be available.
Column names are inserted if there are any. If there are, all dots are converted to underscores. Otherwise, the columns will be named c1, c2, c3, &c.
If apop_opts.db_name_column is not blank (the default is "row_name"), then a so-named column is created, and the row names are placed there.
| set | The name of the matrix | |
| tabname | The name of the db table to be created |
| void apop_data_unpack | ( | const gsl_vector * | in, | |
| apop_data * | d | |||
| ) |
This is the complement to apop_data_pack, qv. It converts the gsl_vector produced by that function back to an apop_data set with the given dimensions. It overwrites the data in the vector and matrix elements (and that's it).
| in | a gsl_vector of the form produced by apop_data_pack. | |
| d | that data set to be filled. Must be allocated to the correct size. |
| apop_data * apop_line_to_data | ( | double * | in, | |
| int | vsize, | |||
| int | rows, | |||
| int | cols | |||
| ) |
A convenience function to convert a double ** array to an apop_data set. It will have no names. The input data is copied, not pointed to.
See also apop_line_to_matrix or apop_array_to_vector; this function will use exactly one of these and then wrap an apop_data struct around the output.
| in | The array to read in. If there were appropriately placed line breaks, then this would look like the eventual data set. For example,
double params[] = {0, 1, 2 3, 4, 5}; apop_data_alloc(params, 2, 2, 2); will produce a vector | |
| vsize | The vector size. If there are also rows/cols, I expect this to equal the number or rows. | |
| rows,cols | the size of the array. |
| gsl_matrix * apop_line_to_matrix | ( | double * | line, | |
| int | rows, | |||
| int | cols | |||
| ) |
Convert a double * array to a gsl_matrix. Input data is copied.
| line | the array to read in | |
| rows,cols | the size of the array. |
gsl_matrix, allocated for you and ready to use.usage:
gsl_matrix *m = apop_line_to_matrix(indata, 34, 4);
| void apop_matrix_to_db | ( | const gsl_matrix * | data, | |
| const char * | tabname, | |||
| const char ** | headers | |||
| ) |
Dump a gsl_matrix into the database. This function is basically preempted by apop_matrix_print. Use that one; this may soon no longer be available.
| data | The name of the matrix | |
| tabname | The name of the db table to be created | |
| headers | A list of column names. If NULL, then the columns will be named c1, c2, c3, &c. |
| apop_data* apop_text_to_data | ( | char * | text_file, | |
| int | has_row_names, | |||
| int | has_col_names, | |||
| int * | field_ends | |||
| ) |
Read a delimited text file into the matrix element of an apop_data set.
See Notes on input text file formatting.
| text_file | = "-" The name of the text file to be read in. If "-" (the default), use stdin. | |
| has_row_names | = 'n'. Does the lines of data have row names? | |
| has_col_names | = 'y'. Is the top line a list of column names? If there are row names, then there should be no first entry in this line like 'row names'. That is, for a 100x100 data set with row and column names, there are 100 names in the top row, and 101 entries in each subsequent row (name plus 100 data points). | |
| field_ends | If fields have a fixed size, give the end of each field, e.g. {3, 8 11}. |
example: See apop_ols.
This function uses the Designated initializers syntax for inputs.
| apop_data* apop_text_to_data | ( | char * | text_file, | |
| int | has_row_names, | |||
| int | has_col_names | |||
| ) |
Read a delimited text file into the matrix element of an apop_data set.
See Notes on input text file formatting.
| text_file | = "-" The name of the text file to be read in. If "-" (the default), use stdin. | |
| has_row_names | = 'n'. Does the lines of data have row names? | |
| has_col_names | = 'y'. Is the top line a list of column names? If there are row names, then there should be no first entry in this line like 'row names'. That is, for a 100x100 data set with row and column names, there are 100 names in the top row, and 101 entries in each subsequent row (name plus 100 data points). |
example: See apop_ols.
This function uses the Designated initializers syntax for inputs.
| int apop_text_to_db | ( | char * | text_file, | |
| char * | tabname, | |||
| int | has_row_names, | |||
| int | has_col_names, | |||
| char ** | field_names, | |||
| int * | field_ends | |||
| ) |
Read a text file into a database table.
See Notes on input text file formatting.
Using the data set from the example on the apop_ols page, here's another way to do the regression:
#include <apop.h> int main(void){ apop_data *data; apop_model *est; apop_db_open(NULL); apop_text_to_db("data", "d",.has_row_names= 0,.has_col_names=1); data = apop_query_to_data("select * from d"); est = apop_estimate(data, apop_ols); printf("The OLS coefficients:\n"); apop_model_print(est); }
By the way, there is a begin/commit wrapper that bundles the process into bundles of 2000 inserts per transaction. if you want to change this to more or less frequent commits, you'll need to modify and recompile the code.
| text_file | The name of the text file to be read in. If "-", then read from STDIN. (default = "-") | |
| tabname | The name to give the table in the database (default = apop_strip_dots (text_file, 'd'); default in Python/R interfaces="t") | |
| has_row_names | Does the lines of data have row names? (default = 0) | |
| has_col_names | Is the top line a list of column names? All dots in the column names are converted to underscores, by the way. (default = 1) | |
| field_names | The list of field names, which will be the columns for the table. If has_col_names==1, read the names from the file (and just set this to NULL). If has_col_names == 1 && field_names !=NULL, I'll use the field names. (default = NULL) | |
| field_ends | If fields have a fixed size, give the end of each field, e.g. {3, 8 11}. |
| int apop_text_to_db | ( | char * | text_file, | |
| char * | tabname, | |||
| int | has_row_names, | |||
| int | has_col_names, | |||
| char ** | field_names | |||
| ) |
Read a text file into a database table.
See Notes on input text file formatting.
Using the data set from the example on the apop_ols page, here's another way to do the regression:
#include <apop.h> int main(void){ apop_data *data; apop_model *est; apop_db_open(NULL); apop_text_to_db("data", "d",.has_row_names= 0,.has_col_names=1); data = apop_query_to_data("select * from d"); est = apop_estimate(data, apop_ols); printf("The OLS coefficients:\n"); apop_model_print(est); }
By the way, there is a begin/commit wrapper that bundles the process into bundles of 2000 inserts per transaction. if you want to change this to more or less frequent commits, you'll need to modify and recompile the code.
| text_file | The name of the text file to be read in. If "-", then read from STDIN. (default = "-") | |
| tabname | The name to give the table in the database (default = apop_strip_dots (text_file, 'd'); default in Python/R interfaces="t") | |
| has_row_names | Does the lines of data have row names? (default = 0) | |
| has_col_names | Is the top line a list of column names? All dots in the column names are converted to underscores, by the way. (default = 1) | |
| field_names | The list of field names, which will be the columns for the table. If has_col_names==1, read the names from the file (and just set this to NULL). If has_col_names == 1 && field_names !=NULL, I'll use the field names. (default = NULL) |
| double * apop_vector_to_array | ( | const gsl_vector * | in | ) |
Converts a gsl_vector to an array.
| in | A gsl_vector |
double*, which will be malloced inside the function.