Introduction
Crunch exposes a REST API for third parties, and indeed its own UI, to manage datasets. This API is also used by the Python and R libraries. This User Guide is for developers who are writing applications on top of the Crunch REST API, with or without those language bindings. It describes the existing interfaces for the current version and attempts to provide context and examples to guide development.
The documents are organized in three overlapping scopes: a feature guide, which provides higher-level vignettes that illustrate key features; an endpoint reference, which describes individual URIs in detail; and an object reference, which defines the building blocks of the Crunch platform, such as values, columns, types, variables, and datasets.
Feature Guide
Authentication
POST /api/public/login/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 67
{
"email": "fake.user@example.com",
"password": "password"
}
HTTP/1.1 204 No Content
Set-Cookie: token=dac20c82c79a514d572b4f5d7e11cb53; Domain=.crunch.io; Max-Age=31536000; Path=/
Vary: Cookie, Accept-Encoding
library(crunch)
login("fake.user@example.com", "password")
# See ?login for options, including how to store your credentials
# in your .Rprofile
import pycrunch
site = pycrunch.connect("fake.user@example.com", "password",
                        "https://app.crunch.io/api/")
curl -c cookie-jar \
  -X POST \
  -d '{"email": "fake.user@example.com", "password": "password"}' \
  -H "Content-Type: application/json" \
  https://app.crunch.io/api/public/login/
# The above command performs a login and saves the session cookie to a file
# called 'cookie-jar'. After this, you can access the endpoints via curl
# commands (POST, GET, PATCH) as long as the '-b cookie-jar' flag is present.
# Note: -b, not -c. -c saves cookies; -b submits cookies from the existing
# file. It is good practice to delete this file when you are done.
Replace “fake.user@example.com” and “password” with your email and password, respectively.
Nearly all interactions with the Crunch API must be authenticated. The standard password authentication method involves POSTing credentials and receiving a session cookie back; the client should store that cookie and pass it along with each subsequent request.
Failure will return 401 Unauthorized.
Crunch also supports OAuth 2.0/OpenID Connect. See the public endpoint reference for more on how to authenticate with OAuth.
If you’d like to add your auth provider to the set of supported providers, contact support@crunch.io.
Password policy
Password policy is as follows:
- Password must be 8 characters or longer
- Password must contain at least 4 unique characters
Examples:
- Secure 8 character password: 4B3a8f4$
- Secure passphrase: correct horse battery staple
We highly recommend long passphrases instead of passwords, both for security and for ease of remembering the credentials for accessing the service.
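The two rules above can be expressed as a short local check. This is a sketch of the stated policy only; the server may enforce additional constraints, and `password_acceptable` is a hypothetical helper, not part of any Crunch library.

```python
def password_acceptable(password: str) -> bool:
    """Check the documented policy: at least 8 characters,
    at least 4 unique characters."""
    return len(password) >= 8 and len(set(password)) >= 4

# The examples from the policy above:
print(password_acceptable("4B3a8f4$"))                      # True
print(password_acceptable("correct horse battery staple"))  # True
print(password_acceptable("aabbaabb"))                      # False: 2 unique chars
```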
Importing Data
There are several ways to build a Crunch dataset. The most appropriate method depends primarily on the format in which your data is currently stored.
Import from a data file
In some cases, you already have a file sitting on your computer which has source data, in CSV or SPSS format (or a Zip file containing a single file in CSV or SPSS format). You can upload these to Crunch and then attach them to datasets by following these steps.
1. Create a Dataset entity
POST /datasets/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 974
...
{
"element": "shoji:entity",
"body": {
"name": "my survey",
...
}
}
--------
201 Created
Location: /datasets/{dataset_id}/
ds <- newDatasetFromFile("my.csv", name="my survey")
# All three steps are handled within newDatasetFromFile
POST a Dataset Entity to the datasets catalog. See the documentation for POST /datasets/ for details on valid attributes to include in the POST.
2. Upload the file
POST /sources/ HTTP/1.1
Content-Length: 8874357
Content-Type: multipart/form-data; boundary=df5b17ff463a4cb3aa61cf02224c7303
--df5b17ff463a4cb3aa61cf02224c7303
Content-Disposition: form-data; name="uploaded_file"; filename="my.csv"
Content-Type: text/csv
"case_id","q1","q2"
234375,3,"sometimes"
234376,2,"always"
...
--------
201 Created
Location: /sources/{source_id}/
POST the file to the sources catalog.
Note that if the file is large (>100 megabytes), you should consider uploading it to a file-sharing service, like Dropbox.
To import from a URL (rather than a local file), use a JSON body with a location property giving the URL.
POST /sources/ HTTP/1.1
Content-Length: 71
Content-Type: application/json
{"location": "https://www.dropbox.com/s/znpoawnhg0rdzhw/iris.csv?dl=1"}
3. Add the Source to the Dataset
POST /datasets/{dataset_id}/batches/ HTTP/1.1
Content-Type: application/json
...
{
"element": "shoji:entity",
"body": {
"source": "/sources/{source_id}/"
}
}
--------
202 Accepted
Location: /datasets/{dataset_id}/batches/{batch_id}/
...
{
"element": "shoji:view",
"value": "/progress/{progress_id}/"
}
POST the URL of the just-created source entity (the Location in the 201 response from the previous step) to the batches catalog of the dataset entity created in step 1.
The POST to the batches catalog will return a 202 Accepted status, and the response body contains a progress URL. Poll that URL to monitor the completion of the batch addition. See “Progress” for more. The 202 response will also contain a Location header with the URL of the newly created batch.
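The 202-plus-progress pattern above lends itself to a simple polling loop. The sketch below is illustrative: `fetch_progress` is a hypothetical stand-in for a GET of the progress URL, and the assumption that the value runs from 0 to 100 (negative on failure) follows the “Progress” section of this guide.

```python
import time

def wait_for_completion(fetch_progress, poll_interval=1.0, max_polls=100):
    """Poll a progress resource until the job completes or fails.

    `fetch_progress` is any callable returning the current progress value:
    0 to 100 while running, 100 on success, negative on failure
    (assumption; see the "Progress" section for the real response shape).
    """
    for _ in range(max_polls):
        progress = fetch_progress()
        if progress >= 100:
            return True
        if progress < 0:
            raise RuntimeError("batch import failed")
        time.sleep(poll_interval)
    raise TimeoutError("gave up waiting for the batch to finish")

# Example with a stub that reports 30%, 60%, then done:
values = iter([30, 60, 100])
print(wait_for_completion(lambda: next(values), poll_interval=0))  # True
```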
Metadata document + CSV
This approach may be most natural for importing data from databases that store data by rows. You can dump or export your database to Crunch’s JSON metadata format, plus a CSV of data, and upload those to Crunch, without requiring much back-and-forth with the API.
1. Create a Dataset entity with variable definitions
POST /datasets/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 974
...
{
"element": "shoji:entity",
"body": {
"name": "my survey",
...,
"table": {
"element": "crunch:table",
"metadata": {
"educ": {"name": "Education", "alias": "educ", "type": "categorical", "categories": [...], ...},
"color": {"name": "Favorite color", "alias": "color", "type": "text", ...},
"state": {"name": "State", "alias": "state", "view": {"geodata": [{"geodatum": <uri>, "feature_key": "properties.postal-code"}]}}
},
"order": ["educ", {"my group": ["color"]}]
},
}
}
--------
201 Created
Location: /datasets/{dataset_id}/
POST a Dataset Entity to the datasets catalog, and in the “body”, include a Crunch Table object with variable definitions and order.
The “metadata” member in the table is an object containing all variable definitions, keyed by variable alias. See the Object Reference: Variable Definitions discussion for specific requirements for defining variables of various types, as well as the example below.
The “order” member is a Shoji Order object specifying the order, potentially hierarchically nested, of the variables in the dataset. The example below illustrates how this can be used. Shoji is JSON, which means the “metadata” object is explicitly unordered. If you wish the variables to have an order, you must supply an order object rather than relying on any order of the “metadata” object.
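As a sketch, the body for this request can be assembled as below; the variable definitions and grouping are illustrative, and `order_aliases` is a hypothetical helper used only to show the relationship between “order” and “metadata”.

```python
import json

metadata = {
    "educ": {"name": "Education", "alias": "educ", "type": "categorical",
             "categories": []},  # categories elided for brevity
    "color": {"name": "Favorite color", "alias": "color", "type": "text"},
}

body = {
    "element": "shoji:entity",
    "body": {
        "name": "my survey",
        "table": {
            "element": "crunch:table",
            "metadata": metadata,
            # JSON objects are unordered, so the display order is explicit:
            "order": ["educ", {"my group": ["color"]}],
        },
    },
}

def order_aliases(order):
    """Yield every alias referenced in a (possibly nested) order object."""
    for entry in order:
        if isinstance(entry, dict):  # a named group of entries
            for group in entry.values():
                yield from order_aliases(group)
        else:
            yield entry

# Every alias referenced in "order" should have a definition in "metadata":
assert all(alias in metadata
           for alias in order_aliases(body["body"]["table"]["order"]))
payload = json.dumps(body)
```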
It is possible to create derived variables using any of the available derivation functions in the same request that creates the dataset and its metadata. The variable references inside the derivation expressions must point to declared aliases of variables or subvariables.
POST /datasets/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 3294
...
{
"element": "shoji:entity",
"body": {
"name": "Dataset with derived arrays",
"settings": {
"viewers_can_export": true,
"viewers_can_change_weight": false,
"min_base_size": 3,
"weight": "weight_variable",
"dashboard_deck": null
},
"table": {
"element": "crunch:table",
"metadata": {
"weight_variable": {
"name": "weight variable",
"alias": "weight_variable",
"type": "numeric"
},
"combined": {
"name": "combined CA",
"derivation": {
"function": "combine_categories",
"args": [
{
"variable": "CA1"
},
{
"value": [
{
"combined_ids": [2],
"numeric_value": 2,
"missing": false,
"name": "even",
"id": 1
},
{
"combined_ids": [1],
"numeric_value": 1,
"missing": false,
"name": "odd",
"id": 2
}
]
}
]
}
},
"numeric": {
"name": "numeric variable",
"type": "numeric"
},
"numeric_copy": {
"name": "Copy of numeric",
"derivation": {
"function": "copy_variable",
"args": [{"variable": "numeric"}]
}
},
"MR1": {
"name": "multiple response",
"derivation": {
"function": "select_categories",
"args": [
{
"variable": "CA3"
},
{
"value": [
1
]
}
]
}
},
"CA3": {
"name": "cat array 3",
"derivation": {
"function": "array",
"args": [
{
"function": "select",
"args": [
{
"map": {
"var1": {
"variable": "ca2-subvar-2",
"references": {
"alias": "subvar2",
"name": "Subvar 2"
}
},
"var0": {
"variable": "ca1-subvar-1",
"references": {
"alias": "subvar1",
"name": "Subvar 1"
}
}
}
},
{
"value": ["var1", "var0"]
}
]
}
]
}
},
"CA2": {
"subvariables": [
{
"alias": "ca2-subvar-1",
"name": "ca2-subvar-1"
},
{
"alias": "ca2-subvar-2",
"name": "ca2-subvar-2"
}
],
"type": "categorical_array",
"name": "cat array 2",
"categories": [
{
"numeric_value": null,
"missing": false,
"id": 1,
"name": "yes"
},
{
"numeric_value": null,
"missing": false,
"id": 2,
"name": "no"
},
{
"numeric_value": null,
"missing": true,
"id": -1,
"name": "No Data"
}
]
},
"CA1": {
"subvariables": [
{
"alias": "ca1-subvar-1",
"name": "ca1-subvar-1"
},
{
"alias": "ca1-subvar-2",
"name": "ca1-subvar-2"
},
{
"alias": "ca1-subvar-3",
"name": "ca1-subvar-3"
}
],
"type": "categorical_array",
"name": "cat array 1",
"categories": [
{
"numeric_value": null,
"missing": false,
"id": 1,
"name": "yes"
},
{
"numeric_value": null,
"missing": false,
"id": 2,
"name": "no"
},
{
"numeric_value": null,
"missing": true,
"id": -1,
"name": "No Data"
}
]
}
}
}
}
}
--------
201 Created
Location: /datasets/{dataset_id}/
The example above does a number of things:
- Creates variables numeric and arrays CA1 and CA2.
- Makes a shallow copy of variable numeric as numeric_copy.
- Makes an ad hoc array CA3 reusing subvariables from CA1 and CA2.
- Makes a multiple response view MR1 selecting category 1 from categorical array CA3.
Validation rules
All variables mentioned in the metadata must contain a valid variable definition with a matching alias.
Array variable definitions should contain valid subvariables or subreferences members.
Any attribute that contains a null value will be ignored and receive the attribute’s default value instead.
An empty order for the dataset will be handled as if no order was passed in.
2. Add row data
By file:
POST /datasets/{dataset_id}/batches/ HTTP/1.1
Content-Type: text/csv
Content-Length: 8874357
Content-Disposition: form-data; name="file"; filename="thedata.csv"
...
"educ","color"
3,"red"
2,"yellow"
...
--------
202 Accepted
Location: /datasets/{dataset_id}/batches/{batch_id}/
...
{
"element": "shoji:view",
"value": "/progress/{progress_id}/"
}
By S3 URL:
POST /datasets/{dataset_id}/batches/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 341
...
{
"element": "shoji:entity",
"body": {
"url": "s3://bucket_name/dir/subdir/?accessKey=ASILC6CBA&secretKey=KdJy7ZRK8fDIBQ&token=AQoDYXdzECAa%3D%3D"
}
}
--------
202 Accepted
Location: /datasets/{dataset_id}/batches/{batch_id}/
...
{
"element": "shoji:view",
"value": "/progress/{progress_id}/"
}
POST a CSV file or URL to the new dataset’s batches catalog. The CSV must include a header row of variable identifiers, which should be the aliases of the variables (and array subvariables) defined in step (1).
The values in the CSV MUST be the same format as the values you get out of Crunch, and it must match the metadata specified in the previous step. This includes:
- Categorical variables should have data identified by the integer category ids, not strings, and all values must be defined in the “categories” metadata for each variable.
- Datetimes must all be valid ISO 8601 strings
- Numeric variables must have only (unquoted) numeric values
- The only special value allowed is an empty “cell” in the CSV, which will be read as the system-missing value “No Data”
Violation of any of these validation criteria will result in a 409 Conflict response status. To resolve, you can either (1) fix your CSV locally and re-POST it, or (2) PATCH the variable metadata that is invalid and then re-POST the CSV.
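To avoid a 409 round trip, you can pre-validate cells locally. This sketch is not the server’s actual validator; the alias “q1” and its category ids are illustrative.

```python
categories = {"q1": {1, 2, 3}}  # alias -> category ids defined in metadata

def check_cell(alias, cell):
    """Return (ok, parsed) for one categorical CSV cell."""
    if cell == "":                  # empty cell -> system-missing "No Data"
        return True, None
    try:
        value = int(cell)
    except ValueError:
        return False, cell          # category names/strings are not allowed
    return value in categories[alias], value

print(check_cell("q1", "2"))    # (True, 2)
print(check_cell("q1", ""))     # (True, None)
print(check_cell("q1", "7"))    # (False, 7): id not defined in metadata
print(check_cell("q1", "red"))  # (False, 'red'): strings would 409
```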
Imports are done in “strict” mode by default. Strict imports are faster, and using strict mode will alert you if there is any mismatch between data and metadata. However, in some cases it may be convenient to be more flexible and silently ignore or resolve inconsistencies. For example, you may have a large CSV dumped out of a database whose data format isn’t exactly Crunch’s, and it would be costly to read-munge-write the whole file for minor changes. In cases like this, you may append ?strict=0 to the URL of the POST request to loosen that strictness.
With non-strict imports:
- The CSV may contain columns not described by the metadata; these columns will be ignored, rather than returning an error response
- The metadata may describe variables not contained in the CSV; these variables will be filled with missing values, rather than returning an error response
- And more things to come
The CSV can be sent in one of two ways:
- Upload a file by POSTing a multipart form
- POST a Shoji entity with a “url” in the body, containing all necessary auth keys as query parameters. If the URL points to a single file, it should be a CSV or gzipped CSV, as described above. If the URL points to a directory, the contents will be assumed to be (potentially zipped) batches of a CSV and will be concatenated for appending. In the latter case, only the first CSV in the directory listing should contain a header row.
A 202 response to the POST request indicates success. All rows added in a single request become part of a new Batch, whose URL is returned in the response’s Location header. You may inspect the new rows in isolation by following the batch’s link.
Example
Here’s an example of dataset metadata and a corresponding CSV.
Several things to note:
- Everything (metadata, order, and data) is keyed by variable “alias”, not “name”, because Crunch believes that names are for people, not computers, to understand. Aliases must be unique across the whole dataset, while variable “names” must only be unique within their group or array variable.
- For categorical variables, all values in the CSV correspond to category ids, not category names, and also not “numeric_values”, which need not be unique or present for all categories in a variable.
- The array variables defined in the metadata (“allpets” and “petloc”) don’t themselves have columns in the CSV, but all of their “subvariables” do, keyed by their aliases.
- With the exception of those array variable definitions, all variables and subvariables defined in the metadata have columns in the CSV, and there are no columns in the CSV that are not defined in the metadata.
- For internal variables, such as a case identifier in this example, that you don’t want to be visible in the UI, you can add them as “hidden” from the beginning by including "discarded": true in their definition, as in the example of “caseid”.
- Missing values
  - Variables with categories (categorical, multiple_response, categorical_array) have missing values defined as categories with "missing": true.
  - Text, numeric, and datetime variables have missing values defined as “missing_rules”, which can be “value”, “set”, or “range”. See, for example, “q3” and “ndogs”.
  - Empty cells in the CSV, if present, will automatically be translated as the “No Data” system missing value in Crunch. See, for example, “ndogs_b”.
- Order
  - All variables should be referenced by alias in the “order” object, inside a group’s “entities” key. Any omitted variables (in this case, the hidden variable “caseid”) will automatically be added to a group named “ungrouped”.
  - Variables may appear in multiple groups.
  - Groups may be nested within each other.
Column-by-column
Crunch stores data by column internally, so if your data are stored in a column-major format as well, importing by column may be the most efficient way to import data.
1. Create a Dataset entity
POST /datasets/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 974
...
{
"element": "shoji:entity",
"body": {
"name": "my survey",
...
}
}
--------
201 Created
Location: /datasets/{dataset_id}/
ds <- createDataset("my survey")
POST a Dataset Entity to the datasets catalog, just as in the first import method.
2. Add Variable definitions and column data
POST /datasets/{dataset_id}/variables/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 38475
...
{
"element": "shoji:entity",
"body": {
"name": "Gender",
"alias": "gender",
"type": "categorical",
"categories": [
{
"name": "Male",
"id": 1,
"numeric_value": null,
"missing": false
},
{
"name": "Female",
"id": 2,
"numeric_value": null,
"missing": false
},
{
"name": "Skipped",
"id": 9,
"numeric_value": null,
"missing": true
}
],
"values": [1, 9, 1, 2, 2, 1, 1, 1, 1, 2, 9, 1]
}
}
--------
201 Created
Location: /datasets/{dataset_id}/variables/{variable_id}/
# Here's a similar example. R's factor type becomes "categorical".
gender.names <- c("Male", "Female", "Skipped")
gen <- factor(gender.names[c(1, 3, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1)],
levels=gender.names)
# Assigning an R vector into a dataset will create a variable entity.
ds$gender <- gen
POST a Variable Entity to the newly created dataset’s variables catalog, and include with that Entity definition a “values” key that contains the column of data. Do this for all columns in your dataset.
If the values attribute is not present, the new column will be filled with “No Data” in all rows.
The data passed in values can be either the full data column for the new variable or a single value, in which case that value will be used to fill the entire column. In the case of arrays, the single value should be a list containing the correct categorical values.
If the type of the values passed in does not correspond with the variable’s type, the server will return a 400 response indicating the error, and the variable will not be created.
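The two accepted shapes for values can be sketched as follows; `variable_entity` is a hypothetical helper, and the variable definitions are illustrative.

```python
def variable_entity(definition, values=None):
    """Build a shoji:entity body for POST /datasets/{id}/variables/.

    `values` may be a full column (a list) or a single value; per the
    docs, a single value fills the whole column, and omitting `values`
    fills the column with "No Data".
    """
    body = dict(definition)
    if values is not None:
        body["values"] = values
    return {"element": "shoji:entity", "body": body}

full = variable_entity({"name": "Age", "alias": "age", "type": "numeric"},
                       values=[33, 41, 28])
filled = variable_entity({"name": "Wave", "alias": "wave", "type": "numeric"},
                         values=1)   # single value: fills every row
print(full["body"]["values"], filled["body"]["values"])  # [33, 41, 28] 1
```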
Appending Data
Appending data to an existing Dataset is not much different from uploading the initial data; both use a “Batch” resource which represents the process of importing the data from the source into the dataset. Once you have created a Source for your data, POST its URL to datasets/{id}/batches/ to start the import process. That process may take some time, depending on the size of the dataset. The returned Location is the URI of the new Batch; GET the batches catalog and look up the Batch URI in the catalog’s index and inspect its status attribute until it moves from “analyzing” to “appended”. User interfaces may choose here to show a progress meter or some other widget.
During the “analyzing” stage, the Crunch system imports the data into a temporary table, and matches its variables with any existing variables. During the “importing” stage, the new rows will move to the target Dataset, and once “appended”, the new rows will be included in all queries against that Dataset.
Adding a subsequent Source
Once you have created a Dataset, you can upload new files and append rows to the same Dataset as often as you like. If the structure of each file is the same as that of the first uploaded file, Crunch should automatically pass your new rows through exactly the same process as the old rows. If there are any derived variables in your Dataset, new data will be derived in the new rows following the same rules as the old data. You can follow the progress as above via the batch’s status attribute.
Let’s look at an example: you had uploaded an initial CSV of 3 columns, A, B and C. Then:
- The Crunch system automatically converted column A from the few strings that were found in it to a Categorical type.
- You derived a new column D that consisted of B * C.
Then you decide to upload another CSV of new rows. What will happen?
When you POST to create the second Batch, the service will: 1) match up the new A with the old A and cast the new strings to existing categories by name, and 2) fill column D for you with B * C for each new row.
However, from time to time, the new source has significant differences: a new variable, a renamed variable, and other changes. When you append the first Source to a Dataset, there is nothing with which to conflict. But a subsequent POST to batches/ may result in a conflict if the new source cannot be confidently reconciled with the existing data. Even though you get a 201 Created response for the new batch resource, it will have a status of “conflict”.
Reporting and Resolving Conflicts
When you append a Source to an existing Dataset, the system attempts to match up the new data with the old. If the source’s schema can be automatically aligned with the target Dataset, the new rows from the Batch are appended. When things go wrong, however, the Batch can be inspected to see what conflicted with the target (or vice-versa, in some cases!).
GET the new Batch:
GET /api/datasets/{dataset_id}/batches/{batch_id}/ HTTP/1.1
...
--------
200 OK
Content-Type: application/shoji
{
"element": "shoji:entity",
"body": {
"conflicts": {
"cdbd11/": {
"metadata": {},
"conflicts": [{
"message": "Types do not match and cannot be converted",
}]
}
}
}
}
If any variable conflicts, it will possess one or more “conflicts” members. For example, if the new variable “cdbd11” had a different type that could not be converted compared to the existing variable “cdbd11”, the Batch resource would contain the above message. Only unresolvable conflicts will be shown; if a variable is not reported in the conflicts object, it appended cleanly.
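Given a batch entity shaped like the response above, a client could summarize conflicts with a sketch like this; only the fields shown in the example are assumed, and `conflict_messages` is a hypothetical helper.

```python
def conflict_messages(batch_entity):
    """Yield (variable_id, message) for every reported conflict."""
    conflicts = batch_entity["body"].get("conflicts", {})
    for var_id, info in conflicts.items():
        for conflict in info.get("conflicts", []):
            yield var_id, conflict["message"]

batch = {
    "element": "shoji:entity",
    "body": {
        "conflicts": {
            "cdbd11/": {
                "metadata": {},
                "conflicts": [
                    {"message": "Types do not match and cannot be converted"}
                ],
            }
        }
    },
}

for var_id, message in conflict_messages(batch):
    print(f"{var_id}: {message}")
# cdbd11/: Types do not match and cannot be converted
```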
See Batches for more details on batch entities and conflicts.
Streaming rows
Existing datasets are best sent to Crunch as a single Source, or a handful of subsequent Sources if gathered monthly or on some other schedule. Sometimes, however, you want to “stream” data to Crunch as it is being gathered, even one row at a time, rather than in a single post-processing phase. You do not want to make each row its own batch (it’s simply not worth the overhead). Instead, you should make a Stream and send rows to it, then periodically create a Source and Batch from it.
Send rows to a stream
To send one or more rows to a dataset stream, simply POST one or more lines of line-delimited JSON to the dataset’s stream endpoint:
{"var_id_1": 1, "var_id_2": "a"}
by_alias = ds.variables.by('alias')
while True:
    row = my_system.read_a_row()
    importing.importer.stream_rows(ds, {
        'gender': row['gender'],
        'age': row['age']
    })
Streamed values must be keyed either by id or by alias. The variable ids/aliases must correspond to existing variables in the dataset. The Python code shows how to efficiently map aliases to ids. The data must match the target variable types so that we can process the row as quickly as possible. We want no casting or other guesswork slowing us down here. Among other things, this means that categorical values must be represented as Crunch’s assigned category ids, not names or numeric values.
You may also send more than one row at a time if you prefer. For example, your data collection system may already post-process row data in, say, 5 minute increments. The more rows you can send together, the less overhead spent processing each one and the more you can send in a given time. Send multiple lines of line-delimited JSON, or if using pycrunch, a list of dicts rather than a single dict.
Each time you send a POST, all of the rows in that POST are assembled into a new message which is added to the stream. Each message can contain one or more rows of data.
As when creating a new source, don’t worry about sending values for derived variables; Crunch will fill these out for you for each row using the data you send.
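If you buffer rows between POSTs, the line-delimited JSON body for one POST can be assembled like this sketch; the variable aliases and values are illustrative, and `to_ldjson` is a hypothetical helper.

```python
import json

def to_ldjson(rows):
    """Serialize buffered rows as line-delimited JSON for one stream POST."""
    return "\n".join(json.dumps(row) for row in rows)

buffered = [
    {"gender": 1, "age": 34},   # categorical "gender" uses category ids
    {"gender": 2, "age": 51},
]
payload = to_ldjson(buffered)
print(payload)
```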
Append the new rows to the dataset
The above added new rows to the Stream resource so that you can be confident that your data is completely safe with Crunch. To append those rows to the dataset requires another step. You could stream rows and then, once they are all assembled, append them all as a single Source to the dataset. However, if you’re streaming rows at intervals it’s likely you want to append them to the dataset at intervals, too. But doing so one row at a time is usually counter-productive; it slows the rate at which you can send rows, balloons metadata, and interrupts users who are analyzing the data.
Instead, you control how often you want the streamed rows to be appended to the dataset. When you’re ready, POST to /datasets/{id}/batches/ and provide the “stream” member, plus any extra metadata the new Source should possess:
{
"stream": null,
"type": "ldjson",
"name": "My streamed rows",
"description": "Yet Another batch from the stream"
}
ds.batches.create({"body": {
"stream": None,
"type": "ldjson",
"name": "My streamed rows",
"description": "Yet Another batch from the stream"
}})
The “stream” member tells Crunch to acquire the data from the stream to form this Batch. The “stream” member must be null; the system will then acquire all currently pending messages (any new messages that arrive during the formation of this Batch will be queued and not fetched). If there are no pending messages, 409 Conflict is returned instead of 201/202 for the new Batch.
Pending rows will be added automatically
Every hour, the Crunch system goes through all datasets, and for each that has pending streamed data, it batches up the pending rows and adds them to the dataset automatically, as long as the dataset is not currently in use by someone. That way, streamed data will magically appear in the dataset for the next time a user loads it, but if a user is actively working with the dataset, the system won’t update their view of the data and disrupt their session.
See Stream for more details on streams.
Combining datasets
Combining datasets consists of creating a new dataset formed by stacking a list of datasets together. It works under the same rules as a normal append.
To create a new dataset combined from others, POST to the datasets catalog with a combine_datasets expression:
POST /api/datasets/
{
"element": "shoji:entity",
"body": {
"name": "My combined dataset",
"description": "Consists of dsA and dsB",
"derivation": {
"function": "combine_datasets",
"args": [
{"dataset": "https://app.crunch.io/api/datasets/dsabc/"},
{"dataset": "https://app.crunch.io/api/datasets/ds123/"}
]
}
}
}
The server will verify that the authenticated user has view permission on all the datasets involved; otherwise it will return a 400 error.
The resulting dataset will consist of the matched union of all included datasets, with the rows in the same order. Private/public variable visibility and exclusion filters will be honored in the result.
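As a sketch, the POST body for combining an arbitrary list of dataset URLs can be assembled like this; `combine_body` is a hypothetical helper, and the names are illustrative.

```python
def combine_body(name, dataset_urls, description=""):
    """Build the shoji:entity body for POST /api/datasets/ that stacks
    the given datasets with combine_datasets."""
    return {
        "element": "shoji:entity",
        "body": {
            "name": name,
            "description": description,
            "derivation": {
                "function": "combine_datasets",
                "args": [{"dataset": url} for url in dataset_urls],
            },
        },
    }

body = combine_body(
    "My combined dataset",
    ["https://app.crunch.io/api/datasets/dsabc/",
     "https://app.crunch.io/api/datasets/ds123/"],
)
print(len(body["body"]["derivation"]["args"]))  # 2
```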
Transformations during combination
Combining follows the normal append matching rules: any mismatch in aliases or types will cause the operation to fail, and the result is limited to the union of variables present across the included datasets.
It is possible to provide transformations on the datasets to ensure that they line up during the combination phase, and to add extra columns of constant per-dataset metadata to the combined result.
Each {"dataset"} argument allows an extra frame key that can contain a function expression describing the desired transformation of that dataset, for example:
{
"dataset": "<dataset_url>",
"frame": {
"function": "select",
"args": [{
"map": {
"*": {"variable": "*"},
"dataset_id": {
"value": "<dataset_id>",
"type": "text",
"references": {
"name": "Dataset ID",
"alias": "dataset_id"
}
}
}
}]
}
}
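When combining several datasets, the frame shown above can be generated per dataset. This sketch (with a hypothetical `tagged_dataset` helper and placeholder URL and id values) keeps all variables and adds the constant dataset_id column:

```python
def tagged_dataset(url, dataset_id):
    """Build a combine_datasets arg whose frame keeps every variable and
    adds a constant text variable identifying the source dataset."""
    return {
        "dataset": url,
        "frame": {
            "function": "select",
            "args": [{
                "map": {
                    "*": {"variable": "*"},   # keep all existing variables
                    "dataset_id": {
                        "value": dataset_id,
                        "type": "text",
                        "references": {"name": "Dataset ID",
                                       "alias": "dataset_id"},
                    },
                }
            }],
        },
    }

arg = tagged_dataset("https://app.crunch.io/api/datasets/dsabc/", "dsabc")
print(arg["frame"]["args"][0]["map"]["dataset_id"]["value"])  # dsabc
```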
Selecting a subset of variables to combine
In the same fashion that it is possible to add extra variables to the dataset transforms, it is possible to select which variables only to include.
Note in the example above, we use the "*": {"variable": "*"}
expressions
which instructs the server to include all variables. Omitting that would cause
to only include the selected variables, for example:
{
"dataset": "<dataset_url>",
"frame": {
"function": "select",
"args": [{
"map": {
"A": {"variable": "A"},
"B": {"variable": "B"},
"C": {"variable": "C"},
"dataset_id": {
"value": "<dataset_id>",
"type": "text",
"references": {
"name": "Dataset ID",
"alias": "dataset_id"
}
}
}
}]
}
}
In this example, the expression indicates to include only the variables with IDs A, B, and C from the referenced dataset, as well as to add the new extra variable dataset_id. This effectively appends only these 4 variables instead of the dataset’s full set of variables.
Merging and Joining Datasets
Crunch supports joining variables from one dataset to another by a key variable that maps rows from one to the other. To add a snapshot of those variables to the dataset, POST an adapt
function expression to its variables catalog.
POST /api/datasets/{dataset_id}/variables/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"function": "adapt",
"args": [{
"dataset": "https://app.crunch.io/api/datasets/{other_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_key_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{dataset_id}/variables/{left_key_id}/"
}]
}
-----
HTTP/1.1 202 Accepted
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/{dataset_id}/variables/",
"value": "https://app.crunch.io/api/progress/5be82a/"
}
A successful request returns a 202 Accepted status with a progress resource in the response body; poll that URL to track the status of the asynchronous job that adds the data to your dataset.
Currently Crunch only supports left joins: all rows of the left (current) dataset will be kept, and only rows from the right (incoming) dataset that have a key value present in the left dataset will be brought in. Rows in the left dataset that do not have a corresponding row in the right dataset will be filled with missing values for the incoming variables.
The join key must be of type “numeric” or “text”, must be the same type in both datasets, and must have unique values within each dataset.
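These key requirements can be checked locally before issuing the join. The sketch below uses plain Python lists standing in for the two key columns; it is illustrative, not the server’s validation, and `valid_join_key` is a hypothetical helper.

```python
def valid_join_key(left_column, right_column):
    """Check the documented join-key requirements: the same type
    ("numeric" or "text") on both sides, and unique values within
    each dataset."""
    def column_type(column):
        if all(isinstance(v, (int, float)) for v in column):
            return "numeric"
        if all(isinstance(v, str) for v in column):
            return "text"
        return None

    left_type, right_type = column_type(left_column), column_type(right_column)
    if left_type is None or left_type != right_type:
        return False
    return (len(set(left_column)) == len(left_column)
            and len(set(right_column)) == len(right_column))

print(valid_join_key([1, 2, 3], [2, 3, 4]))  # True
print(valid_join_key([1, 1, 2], [1, 2, 3]))  # False: duplicate key value
print(valid_join_key(["a", "b"], [1, 2]))    # False: mismatched types
```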
Joining a subset of variables
To select certain variables to bring over from the right dataset, include a select function expression around the adapt function described above:
POST /api/datasets/{dataset_id}/variables/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"function": "select",
"args": [{
"map": {
"{right_var1_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var1_id}/"
},
"{right_var2_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var2_id}/"
},
"{right_var3_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var3_id}/"
}
}
}],
"frame": {
"function": "adapt",
"args": [{
"dataset": "https://app.crunch.io/api/datasets/{other_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_key_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{dataset_id}/variables/{left_key_id}/"
}]
}
}
-----
HTTP/1.1 202 Accepted
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/{dataset_id}/variables/",
"value": "https://app.crunch.io/api/progress/5be82a/"
}
Joining a subset of rows
Rows to consider from the right dataset can also be filtered.
To do so, include a filter
attribute on the payload, containing either a filter expression, wrapped under {"expression": <expr>}
, or
an existing filter entity URL (from the right-side dataset), wrapped as {"filter": <url>}
.
POST /api/datasets/{dataset_id}/variables/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"function": "adapt",
"args": [{
"dataset": "https://app.crunch.io/api/datasets/{other_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_key_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{dataset_id}/variables/{left_key_id}/"
}],
"filter": {
"expression": {
"function": "==",
"args": [
{"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{variable_id}/"},
{"value": "<value>"}
]
}
}
}
You can filter both rows and variables in the same request. Note that the “filter” parameter remains on the top-level function in the expression, which, when specifying a variable subset, is “select” rather than “adapt”:
POST /api/datasets/{dataset_id}/variables/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"function": "select",
"args": [{
"map": {
"{right_var1_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var1_id}/"
},
"{right_var2_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var2_id}/"
},
"{right_var3_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var3_id}/"
}
}
}],
"frame": {
"function": "adapt",
"args": [{
"dataset": "https://app.crunch.io/api/datasets/{other_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_key_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{dataset_id}/variables/{left_key_id}/"
}]
},
"filter": {
"filter": "https://app.crunch.io/api/datasets/{other_id}/filters/{filter_id}/"
}
}
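Either flavor of the filter member can be attached to a join payload with a small helper. A hedged sketch follows; the stand-in payload and the expression URL are illustrative only:

```python
# Sketch: attach a "filter" member to a join payload, holding either an
# inline expression or an existing filter entity URL (placeholders here).
def with_filter(payload, expression=None, filter_url=None):
    if expression is not None:
        payload["filter"] = {"expression": expression}
    elif filter_url is not None:
        payload["filter"] = {"filter": filter_url}
    return payload

joined = with_filter(
    {"function": "adapt", "args": []},  # minimal stand-in for the adapt payload
    expression={"function": "==",
                "args": [{"variable": ".../variables/abc/"}, {"value": 1}]},
)
```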
Deriving Variables
Derived variables are variables that, instead of having a column of values backing them, are functionally dependent on other variables. In Crunch, users with view-only permissions on a dataset can still make derived variables of their own, just as they can make filters. Dataset editors can also derive other types of variables as permanent additions to the dataset, available for all viewers.
Combining categories
The “combine_categories” function takes two arguments:
- A reference to the categorical or categorical_array variable to be combined
- A definition of the categories of the new variable, including all members found in categories, plus a “combined_ids” key that maps the derived category to one or more categories (by id) in the input variable.
Given a variable such as:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/",
"body": {
"name": "Education",
"alias": "educ",
"type": "categorical",
"categories": [
{
"numeric_value": null,
"missing": true,
"id": -1,
"name": "No Data"
},
{
"numeric_value": 1,
"missing": false,
"id": 1,
"name": "No HS"
},
{
"numeric_value": 2,
"missing": false,
"id": 2,
"name": "High school graduate"
},
{
"numeric_value": 3,
"missing": false,
"id": 3,
"name": "Some college"
},
{
"numeric_value": 4,
"missing": false,
"id": 4,
"name": "2-year"
},
{
"numeric_value": 5,
"missing": false,
"id": 5,
"name": "4-year"
},
{
"numeric_value": 6,
"missing": false,
"id": 6,
"name": "Post-grad"
},
{
"numeric_value": 8,
"missing": true,
"id": 8,
"name": "Skipped"
},
{
"numeric_value": 9,
"missing": true,
"id": 9,
"name": "Not Asked"
}
],
"description": "Education"
}
}
POSTing to the private variables catalog a Shoji Entity containing a ZCL function like:
{
"element": "shoji:entity",
"body": {
"name": "Education (3 category)",
"description": "Combined from six-category education",
"alias": "educ3",
"derivation": {
"function": "combine_categories",
"args": [
{
"variable": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/"
},
{
"value": [
{
"name": "High school or less",
"numeric_value": null,
"id": 1,
"missing": false,
"combined_ids": [1, 2]
},
{
"name": "Some college",
"numeric_value": null,
"id": 2,
"missing": false,
"combined_ids": [3, 4]
},
{
"name": "4-year college or more",
"numeric_value": null,
"id": 3,
"missing": false,
"combined_ids": [5, 6]
},
{
"name": "Missing",
"numeric_value": null,
"id": 4,
"missing": true,
"combined_ids": [8, 9]
},
{
"name": "No data",
"numeric_value": null,
"id": -1,
"missing": true,
"combined_ids": [-1]
}
]
}
]
}
}
}
results in a private categorical variable with three valid categories.
Combining the categories of a categorical array is the same as it is for categorical variables. The resulting variable is also of type “categorical_array”. This variable type also has a “subvariables_catalog”, like the variable from which it is derived, and the subvariables contained in it are derived “combine_categories” categorical variables.
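The combine_categories request lends itself to a small builder function. A sketch follows, assuming nothing beyond the documented payload shape (the URL and IDs are copied from the example above):

```python
# Sketch: build a "combine_categories" derivation entity as a Python dict.
def combine_categories_entity(variable_url, name, alias, description, categories):
    return {
        "element": "shoji:entity",
        "body": {
            "name": name,
            "description": description,
            "alias": alias,
            "derivation": {
                "function": "combine_categories",
                "args": [{"variable": variable_url}, {"value": categories}],
            },
        },
    }

educ3 = combine_categories_entity(
    "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/",
    "Education (3 category)", "educ3", "Combined from six-category education",
    [{"name": "High school or less", "id": 1, "missing": False,
      "numeric_value": None, "combined_ids": [1, 2]},
     {"name": "Some college", "id": 2, "missing": False,
      "numeric_value": None, "combined_ids": [3, 4]},
     {"name": "4-year college or more", "id": 3, "missing": False,
      "numeric_value": None, "combined_ids": [5, 6]}],
)
```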
Combining responses
For multiple response variables, you may combine responses rather than categories.
Given a variable such as:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/455288/variables/3c2e57/",
"body": {
"name": "Aided awareness",
"alias": "aided",
"subvariables": [
"../870a2d/",
"../a8b0eb/",
"../dc444f/",
"../8e6279/",
"../f775ab/",
"../6405c2/"
],
"type": "multiple_response",
"categories": [
{
"numeric_value": 1,
"selected": true,
"id": 1,
"name": "Selected",
"missing": false
},
{
"numeric_value": 2,
"id": 2,
"name": "Not selected",
"missing": false
},
{
"numeric_value": 8,
"id": 3,
"name": "Skipped",
"missing": true
},
{
"numeric_value": 9,
"id": 4,
"name": "Not asked",
"missing": true
},
{
"numeric_value": null,
"id": -1,
"name": "No data",
"missing": true
}
],
"description": "Which of the following coffee brands do you recognize? Check all that apply."
}
}
POSTing to the variables catalog a Shoji Entity containing a ZCL function like:
{
"element": "shoji:entity",
"body": {
"name": "Aided awareness by region",
"description": "Combined from aided brand awareness",
"alias": "aided_region",
"derivation": {
"function": "combine_responses",
"args": [
{
"variable": "https://app.crunch.io/api/datasets/455288/variables/3c2e57/"
},
{
"value": [
{
"name": "San Francisco",
"combined_ids": [
"../870a2d/",
"../a8b0eb/",
"../dc444f/"
]
},
{
"name": "Portland",
"combined_ids": [
"../8e6279/",
"../f775ab/"
]
},
{
"name": "Chicago",
"combined_ids": [
"../6405c2/"
]
}
]
}
]
}
}
}
results in a multiple response variable with three responses. The “selected” state of the responses in the derived variable is an “OR” of the combined subvariables.
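The OR semantics of combined responses can be illustrated in plain Python; the rows and subvariable aliases below are invented for illustration:

```python
# Plain-Python illustration of combine_responses: a combined response is
# "selected" if ANY of its source subvariables is selected. Data is invented.
def combine_rows(rows, groups):
    """rows: list of {subvariable alias: selected?}; groups: {name: aliases}."""
    return [{name: any(row[a] for a in aliases)
             for name, aliases in groups.items()}
            for row in rows]

rows = [{"sf_a": True, "sf_b": False, "pdx": False},
        {"sf_a": False, "sf_b": False, "pdx": True}]
combined = combine_rows(rows, {"San Francisco": ["sf_a", "sf_b"],
                               "Portland": ["pdx"]})
# combined[0] -> {"San Francisco": True, "Portland": False}
```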
Case statements
The “case” function derives a variable using values from the first argument. Each of the remaining arguments contains a boolean expression. These are evaluated in order in an IF, ELSE IF, ELSE IF, …, ELSE fashion; the first one that matches selects the corresponding value from the first argument. For example, if the first two boolean expressions do not match (return False) but the third one matches, then the third value in the first argument is placed into that row in the output. You may include an extra value for the case when none of the boolean expressions match; if not provided, it defaults to the system “No Data” missing value.
{
"element": "shoji:entity",
"body": {
"name": "Market segmentation",
"description": "Super-scientific classification of people",
"alias": "segments",
"derivation": {
"function": "case",
"args": [
{
"column": [1, 2, 3, 4],
"type": {
"value": {
"class": "categorical",
"categories": [
{"id": 3, "name": "Hipsters", "numeric_value": null, "missing": false},
{"id": 1, "name": "Techies", "numeric_value": null, "missing": false},
{"id": 2, "name": "Yuppies", "numeric_value": null, "missing": false},
{"id": 4, "name": "Other", "numeric_value": null, "missing": true}
]
}
}
},
{
"function": "and",
"args": [
{"function": "in", "args": [{"variable": "55fc29/"}, {"value": [5, 6]}]},
{"function": "<=", "args": [{"variable": "673dde/"}, {"value": 30}]}
]
},
{
"function": "and",
"args": [
{"function": "in", "args": [{"variable": "889dc3/"}, {"value": [4, 5, 6]}]},
{"function": ">", "args": [{"variable": "673dde/"}, {"value": 40}]}
]
},
{"function": "==", "args": [{"variable": "13cbf4/"}, {"value": 1}]}
]
}
}
}
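The evaluation order described above behaves like an if/elif/else chain. A plain-Python illustration of the per-row semantics (not an API call):

```python
# Plain-Python illustration of "case" evaluation for one row: the first
# boolean expression that matches selects the corresponding value.
def case_value(conditions, values, default=None):
    for cond, val in zip(conditions, values):
        if cond:
            return val
    return default  # stands in for the system "No Data" value

# The second condition is the first to match, so category id 2 is chosen.
segment = case_value([False, True, False], [1, 2, 3], default=-1)  # -> 2
```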
Making ad hoc arrays
It is possible to create derived arrays that reuse subvariables from other arrays, using the array function and indicating the reference for each of its subvariables.
The subvariables of an array are specified using the select function, with its first map argument indicating the IDs for each of these virtual subvariables. These IDs are user defined and can be any string. They need only be unique within the parent variable, so they can be reused between different arrays.
The second argument of the select function indicates the order of the subvariables in the array; they are referenced by the user-defined IDs. Each of its variables must point to a variable expression, which can take an optional (but recommended) references attribute to specify a particular name and alias for the subvariable. If not specified, the name of the original subvariable will be used, and the alias will be padded to ensure uniqueness.
{
"CA3": {
"name": "cat array 3",
"derivation": {
"function": "array",
"args": [
{
"function": "select",
"args": [
{
"map": {
"var1": {
"variable": "ca2-subvar-2",
"references": {
"alias": "subvar2",
"name": "Subvar 2"
}
},
"var0": {
"variable": "ca1-subvar-1",
"references": {
"alias": "subvar1",
"name": "Subvar 1"
}
}
}
},
{
"value": [
"var1",
"var0"
]
}
]
}
]
}
},
"CA2": {
"subvariables": [
{
"alias": "ca2-subvar-1",
"name": "ca2-subvar-1"
},
{
"alias": "ca2-subvar-2",
"name": "ca2-subvar-2"
}
],
"type": "categorical_array",
"name": "cat array 2",
"categories": [
{
"numeric_value": null,
"missing": false,
"id": 1,
"name": "yes"
},
{
"numeric_value": null,
"missing": false,
"id": 2,
"name": "no"
},
{
"numeric_value": null,
"missing": true,
"id": -1,
"name": "No Data"
}
]
},
"CA1": {
"subvariables": [
{
"alias": "ca1-subvar-1",
"name": "ca1-subvar-1"
},
{
"alias": "ca1-subvar-2",
"name": "ca1-subvar-2"
},
{
"alias": "ca1-subvar-3",
"name": "ca1-subvar-3"
}
],
"type": "categorical_array",
"name": "cat array 1",
"categories": [
{
"numeric_value": null,
"missing": false,
"id": 1,
"name": "yes"
},
{
"numeric_value": null,
"missing": false,
"id": 2,
"name": "no"
},
{
"numeric_value": null,
"missing": true,
"id": -1,
"name": "No Data"
}
]
}
}
In the above example, the array CA3 uses the array function to reuse subvariables ca1-subvar-1 and ca2-subvar-2 from CA1 and CA2 respectively. The references attribute is used to indicate a specific name/alias for each of these subvariables.
Weights
A numeric variable suitable for use as row weights can be constructed from one or more categorical variables and target proportions of their categories. The sample distribution is “raked” iteratively to each categorical marginal target to produce a set of joint values that can be used as weights. Note that available weight variables are shared by all; you may not create private weights. To create a weight variable, POST a JSON variable definition to the variables catalog describing the properties of the weight variable, with a “derivation” member indicating use of the “rake” function, whose arguments contain an array of variable targets:
POST /api/datasets/{datasetid}/variables/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 739
{
"name": "weight",
"description": "my raked weight",
"derivation": {
"function": "rake",
"args": [{
"variable": "{variable_id}",
"targets": [[1, 0.491], [2, 0.509]]
}]
}
}
---------
201 Created
Location: /api/datasets/{datasetid}/variables/{variableid}/
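For intuition about what raking computes, here is a plain-Python sketch of the single-variable case, where raking reduces to target share divided by observed share per category (the data is invented):

```python
# Sketch: what "rake" computes for a single categorical variable. With one
# variable, each row's weight is target proportion / observed proportion.
from collections import Counter

def rake_one(values, targets):
    """values: category id per row; targets: {category id: desired proportion}."""
    n = len(values)
    observed = {cat: count / n for cat, count in Counter(values).items()}
    return [targets[v] / observed[v] for v in values]

weights = rake_one([1, 1, 2, 2, 2, 2], {1: 0.5, 2: 0.5})
# Category 1 rows (observed 1/3, target 1/2) get weight 1.5; category 2 rows 0.75
```

With multiple target variables, the real function iterates this adjustment over each marginal until the weighted proportions converge.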
Multiple Response Views
The “select_categories” function allows you to form a multiple response array from a categorical array, or alter the “selected” categories in an existing multiple response array. It takes two arguments:
- A reference to a categorical or categorical_array variable
- A list of the category ids or category names to mark as “selected”
Given a variable such as:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/",
"body": {
"name": "Cola",
"alias": "cola",
"type": "categorical",
"categories": [
{"id": -1, "name": "No Data", "numeric_value": null, "missing": true},
{"id": 0, "name": "Never", "numeric_value": null, "missing": false},
{"id": 1, "name": "Sometimes", "numeric_value": null, "missing": false},
{"id": 2, "name": "Frequently", "numeric_value": null, "missing": false},
{"id": 3, "name": "Always", "numeric_value": null, "missing": false}
],
"subvariables": ["0001", "0002", "0003"],
"references": {
"subreferences": {
"0003": {"alias": "Coke"},
"0002": {"alias": "Pepsi"},
"0001": {"alias": "RC"}
}
}
}
}
POSTing to the private variables catalog a Shoji Entity containing a ZCL function like:
{
"element": "shoji:entity",
"body": {
"name": "Cola likes",
"description": "Cola preferences",
"alias": "cola_likes",
"derivation": {
"function": "select_categories",
"args": [
{"variable": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/"},
{"value": [2, 3]}
]
}
}
}
…results in a private multiple_response variable where the “Frequently” and “Always” categories are selected.
Text Analysis
Sentiment Analysis
The “sentiment” function allows you to derive a categorical variable from text variable data, which is classified and accumulated in three categories (positive, negative, and neutral). It takes one parameter:
- A reference to a text variable
Given a variable such as:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/",
"body": {
"name": "Zest",
"alias": "zest",
"type": "text",
"values": [
"Zest is best",
"Zest I can take it or leave it",
"Zest is the worst"
]
}
}
POSTing to the private variables catalog a Shoji Entity containing a ZCL function like:
{
"element": "shoji:entity",
"body": {
"name": "Zesty Sentiment",
"description": "Customer sentiment about Zest",
"alias": "zest_sentiment",
"derivation": {
"function": "sentiment",
"args": [
{"variable": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/"}
]
}
}
}
…results in a new categorical variable, where for each row the text value is classified as “Negative”, “Neutral”, or “Positive” using the VADER English social-media-tuned lexicon.
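To make the output shape concrete, here is a toy three-way classifier in plain Python. The word lists are invented for illustration; the actual service uses the VADER lexicon, not this logic:

```python
# Toy three-way classifier illustrating the output shape only; the actual
# service uses the VADER lexicon, not this invented word list.
POSITIVE, NEGATIVE = {"best", "great", "love"}, {"worst", "bad", "hate"}

def classify(text):
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"

labels = [classify(t) for t in ["Zest is best",
                                "Zest I can take it or leave it",
                                "Zest is the worst"]]
# -> ["Positive", "Neutral", "Negative"]
```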
Other transformations
Arithmetic operations
It is possible to create new numeric variables out of pairs of other numeric variables. The following arithmetic operations are available; each takes two numeric variables as its arguments.
- “+” adds two numeric variables.
- “-” returns the difference between two numeric variables.
- “*” returns the product of two numeric variables.
- “/” real division.
- “//” floor division; always returns an integer.
- “^” raises the first argument to the power of the second argument.
- “%” modulo operation; accepts floats.
The usage is as follows for all operators:
{
"function": "+",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"},
{"variable": "https://app.crunch.io/api/datasets/123/variables/def/"}
]
}
bin
Receives a numeric variable and returns a categorical one where each category represents a bin of the numeric values.
Each category on the new variable is annotated with a “boundaries” member that contains the lower/upper bound of each bin.
{
"function": "bin",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"}
]
}
Optionally, a second argument may be passed indicating the desired bin size to use, instead of letting the API choose the bins.
{
"function": "bin",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"},
{"value": 100}
]
}
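The effect of a fixed bin size can be approximated in plain Python. This is illustrative only; the real function also annotates each resulting category with its “boundaries”:

```python
# Rough plain-Python analogue of fixed-size binning; the real function also
# annotates each category with its "boundaries" member.
def bin_values(values, size):
    """Map each numeric value to the (lower, upper) bounds of its bin."""
    def bounds(v):
        lower = (v // size) * size
        return (lower, lower + size)
    return [bounds(v) for v in values]

bins = bin_values([5, 104, 230], 100)
# -> [(0, 100), (100, 200), (200, 300)]
```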
case
Returns a categorical variable with its categories following the specified conditions from different variables on the dataset. View Case Statements
cast
Returns a new variable with its type and values cast. Not applicable to arrays or date variables; use Date Functions to work with date variables.
{
"function": "cast",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"},
{"value": "numeric"}
]
}
The allowed output variable types are:
- numeric
- text
- categorical
To cast to the categorical type, the second argument value should not be a type name string (as for numeric and text) but a type definition indicating a class and categories, as follows:
{
"function": "cast",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"},
{"value": {
"class": "categorical",
"categories": [
{"id": 1, "name": "one", "missing": false, "numeric_value": null},
{"id": 2, "name": "two", "missing": false, "numeric_value": null},
{"id": -1, "name": "No Data", "missing": true, "numeric_value": null}
]
}
}
]
}
To change the type of a variable, a client should POST to the /variable/:id/cast/ endpoint. See Convert type for API examples.
char_length
Returns a numeric variable containing the text length of each value. Only applicable on text variables.
{
"function": "char_length",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"}
]
}
copy_variable
Returns a shallow copy of the indicated variable maintaining type and data.
{
"function": "copy_variable",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"}
]
}
Changes on the data of the original variable will be reflected on this copy.
combine_categories
Returns a categorical variable with values combined following the specified combination rules. See Combining categories
combine_responses
Given a list of categorical variables, returns the selected value out of them. See Combining responses.
row
Returns a numeric variable with 0-based row indices. It takes no arguments.
{
"function": "row",
"args": []
}
remap_missing
Given a text, numeric, or datetime variable, returns a new variable of the same type with its missing values mapped to new codes:
{
"function": "remap_missing",
"args": [
{"variable": "varid"},
{"value": [
{
"reason": "Combined 1 and 2",
"code": 1,
"mapped_codes": [1, 2]
},
{
"reason": "Only 3",
"code": 2,
"mapped_codes": [3]
},
{
"reason": "No Data",
"code": -1,
"mapped_codes": [-1]
}
]}
]
}
The example above will return a copy of the variable with id varid, with the new missing_reasons grouping and mapping following the original codes.
Integrating variables
“Integrating” a variable means removing its derived properties and turning it into a regular base variable. After integration, the variable stops reflecting its expression: if new data is added to its original parent variable, new rows will be filled with No Data ({"?": -1}).
To integrate a variable, PATCH the variable entity with the derived attribute set to false, as so:
PATCH /api/datasets/abc/variables/123/
{
"element": "shoji:entity",
"body": {
"derived": false
}
}
This effectively integrates the variable; its derivation attribute will contain null from now on. Note that it is only possible to set the derived attribute to false, never to true.
Creating unlinked derivations
It is possible to create a material, one-off copy of a variable or of an expression on it. To do so, create a derived variable as usual with the derivation expression, but also include a derived: false attribute. The variable will be created with the values of the expression but will be unlinked from the original variable.
POST /api/datasets/abc/variables/
{
"element": "shoji:entity",
"body": {
"derivation": {
"function": "copy_variable",
"args": [{"variable": "https://app.crunch.io/api/datasets/abc/variables/123/"}]
},
"derived": false
}
}
Array Variables
Simple variables have only one value per row; sometimes, however, it is convenient to consider multiple values (of the same type) as a single Variable. The Crunch system implements the data as a 2-dimensional array, but the array variable includes two additional attributes: “subvariables”, which is a list of subvariable URLs, and “subreferences”, which is an object of {name, alias, description, …} objects keyed by subvariable URL. There are two types of array variable: categorical array and multiple response.
Categorical arrays
For the “categorical_array” type, a row has multiple values, and may have a different value for each subvariable. For example, you might field a survey where you ask respondents to rate soft drinks by filling in a grid of a set of brands versus a set of ratings:
72. How much do you like each soft drink?
Not at all Not much OK A bit A lot
Coke o o o o o
Pepsi o o o o o
RC o o o o o
The respondent may only select one rating in each row. To represent that answer data in Crunch, you would define an array. For example, you might POST a Variable Entity with the payload:
{
"element": "shoji:entity",
"body": {
"name": "Soft Drinks",
"type": "categorical_array",
"subvariables": [
"./subvariables/001/",
"./subvariables/002/",
"./subvariables/003/"
],
"subreferences": {
"./subvariables/002/": {"name": "Coke", "alias": "coke"},
"./subvariables/003/": {"name": "Pepsi", "alias": "pepsi"},
"./subvariables/001/": {"name": "RC", "alias": "rc"}
},
"categories": [
{"id": -1, "name": "No Data", "numeric_value": null, "missing": true},
{"id": 1, "name": "Not at all", "numeric_value": null, "missing": false},
{"id": 2, "name": "Not much", "numeric_value": null, "missing": false},
{"id": 3, "name": "OK", "numeric_value": null, "missing": false},
{"id": 4, "name": "A bit", "numeric_value": null, "missing": false},
{"id": 5, "name": "A lot", "numeric_value": null, "missing": false},
{"id": 99, "name": "Skipped", "numeric_value": null, "missing": true}
],
"values": [
[1, 2, {"?": 99}],
[{"?": -1}, 4, 3],
[5, 2, {"?": -1}]
]
}
}
The “Soft Drinks” categorical array variable may now be included in analyses like any other variable, but has 2 dimensions instead of the typical 1. For example, a crosstab of a 1-dimensional “Gender” variable with a 1-dimensional “Education” variable yields a 2-D cube. A crosstab of 1-D “Gender” by 2-D “Soft Drinks” yields a 3-D cube.
In rare cases, you may have already added a separate Variable for “Coke”, one for “Pepsi”, and one for “RC”. You may move them to a single array variable by POSTing a Variable Entity for the array that, instead of a “subreferences” attribute, has a “subvariables” attribute: a list of URLs of the variables you’d like to bind together:
{
"body": {
"name": "Soft Drinks",
"type": "categorical_array",
"subvariables": [<URI of the "Coke" variable>, <URI of the "Pepsi" variable>, <URI of the "RC" variable>]
}
}
The existing variables are removed from the normal order and become virtual subvariables of the new array. This approach will cast all subvariables to a common set of categories if they differ. The existing name and alias of each subvariable will be moved to the array’s “subreferences” attribute.
If you wish to analyze a set of categorical variables as an array without moving them, you need to build a derived array instead.
{
"body": {
"name": "Soft Drinks",
"type": "categorical_array",
"derivation": {
"function": "array",
"args": [{
"function": "select",
"args": [{"map": {
"000000": {"variable": <URI of the "Coke" variable>},
"000001": {"variable": <URI of the "Pepsi" variable>},
"000002": {"variable": <URI of the "RC" variable>}
}}]
}]
}
}
}
Your client library may have helper functions to construct the above more easily. This is a bit more advanced, but consequently more powerful: you can grab subvariables from other existing arrays, use more powerful subsetting functions like “deselect” and “subvariables”, cast, combine, what-have-you.
Multiple response
The second type of array is “multiple_response”. These arrays look very similar to categorical_array variables in their data representations, but are usually gathered very differently and behave differently in analyses. For example, you might field a survey where you ask respondents to select countries they have visited:
38. Which countries have you visited?
[] USA
[] Germany
[] Japan
[] None of the above
The respondent may check the box or not for each row. To represent that answer data in Crunch, you would define an array Variable with separate subreferences for “USA”, “Germany”, “Japan”, and “None of the above”:
{
"element": "shoji:entity",
"body": {
"name": "Countries Visited",
"type": "multiple_response",
"subvariables": [
"./subvariables/001/",
"./subvariables/002/",
"./subvariables/003/",
"./subvariables/004/"
],
"subreferences": {
"./subvariables/002/": {"name": "USA", "alias": "visited_usa"},
"./subvariables/004/": {"name": "Germany", "alias": "visited_germany"},
"./subvariables/001/": {"name": "Japan", "alias": "visited_japan"},
"./subvariables/003/": {"name": "None of the above", "alias": "visited_none_of_the_above"}
},
"categories": [
{"id": -1, "name": "No Data", "numeric_value": null, "missing": true},
{"id": 1, "name": "Checked", "numeric_value": null, "missing": false, "selected": true},
{"id": 2, "name": "Not checked", "numeric_value": null, "missing": false},
{"id": 98, "name": "Not shown", "numeric_value": null, "missing": true},
{"id": 99, "name": "Skipped", "numeric_value": null, "missing": true}
]
}
}
Aside from the new type name, the primary difference from the basic categorical array is that one or more categories are marked as “selected”. These are then used to dichotomize the categories such that any subvariable response is treated more as if it were true or false (selected or unselected) than maintaining the difference between each category. If POSTing to create a “multiple_response” variable, you may include a “selected_categories” key in the body, containing an array of category names that indicate the dichotomous selection. If you do not include “selected_categories”, there must be at least one “selected”: true category in the subvariables you are binding into the multiple-response variable to indicate the dichotomous selection; see Object Reference#categories. If neither is present, the request will return a 400 status.
The “Countries Visited” multiple response variable may now be included in analyses like any other variable, but with a noticeable difference. Rather than contributing a dimension of distinct categories, it instead contributes a dimension of distinct subvariables. For example, a crosstab of a 1-dimensional “Gender” variable with a 1-dimensional “Education” variable yields a 2-D cube: one dimension of the categories of Gender and one dimension of the categories of Education. A crosstab of 1-D “Gender” by the multiple response “Countries Visited” also yields a 2-D cube: one dimension of the categories of Gender but the other dimension has one entry for “USA”, one for “Germany”, one for “Japan”, and one for “None of the above”.
A quirk of multiple response variables is that analyses of them often require knowledge across subvariables: which rows had any subvariable selected, which rows had no subvariable selected, and which rows had all subvariables marked as “missing”. The Crunch system calculates these ancillary “subvariables” for you, and includes them in analysis output. Including an explicit “None of the above” subvariable in the example above complicates this, since Crunch has no way of knowing to treat such subvariables specially; it will faithfully consider the “None of the above” subvariable like any other subvariable when calculating the any/none/missing views. Depending on your application, you may wish to 1) not even include that option in your survey, 2) skip adding that variable to your Crunch dataset, 3) add it but do not bind it into the parent array variable, or 4) include it and have it be treated like any other multiple response subvariable in your analyses.
Non-uniform basis
As presented above, multiple response variables assume that subvariables have a consistent, uniform basis, or number of rows in each subvariable. In some cases, the number of valid and missing entries may be wildly different from one subvariable to the next. In a survey example, a new response may be added to a longer-running series, or different responses may be presented to subsets of respondents in the context of an experiment. The boolean field uniform_basis, if false, provides a hint to users that, rather than using the __any__ column (from the selections function output) in an analysis query, they should instead calculate the basis per subvariable by summing the “selected” and “not selected” categories. The field’s default is true.
Adding new subvariables
In the scenario that a variable was left out when creating an array variable, it is possible to modify the array variable so that new subvariables get added (always in the last position).
To do so, the subvariable-to-be should currently be a variable of the dataset and have the same type as the existing subvariables (“categorical”).
Send a PATCH request containing the URL of the new subvariable with an empty object as its tuple:
{
...
"index": {
"http://.../url/new/subvariable/": {}
}
}
A 204 response will indicate that the catalog was updated, and the new subvariable now is part of the array variable.
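The PATCH body is small enough to build by hand. A sketch of just the index member shown above (the subvariable URL is a placeholder, and the elided parts of the payload stay elided):

```python
# Sketch: the "index" member of the PATCH body that adds a subvariable to an
# array variable. The subvariable URL is a placeholder.
def subvariable_patch_body(new_subvar_url):
    # An empty tuple object keyed by the subvariable's URL is all that's needed.
    return {"index": {new_subvar_url: {}}}

body = subvariable_patch_body(
    "https://app.crunch.io/api/datasets/abc/variables/123/")
# PATCH this to the array variable's URL with any HTTP client; expect 204.
```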
Multidimensional analysis
In the Crunch system, any analysis is also referred to as a “cube”. Cubes are the mechanical means of representing analyses to and from the Crunch system; you can think of them as spreadsheets that might have other than two dimensions. A cube consists of two primary parts: “dimensions” which supply the cube axes, and “measures” which populate the cells. Although both the request and response include dimensions and measures, it is important to distinguish between them. The request supplies expressions for each, while the response has data (and metadata) for each. The request declares what variables to use and what to do with them, while the response includes and describes the results. See Object Reference:Cube for complete details.
Dimensions
Each dimension of an analysis can be simply one variable, a function over it, a traversal of its subvariables (for array variables), or even a combination of multiple variables (e.g. A + B). Any expression you can use in a “select” command can be used as a dimension. The big difference is that the system will consider the distinct values rather than all values of the result. Variables which are already “categorical” or “enumerated” will simply use their “categories” or “elements” as the extent. Other variables form their extents from their distinct values.
For example, if “3ffd45” is a categorical variable with three categories (one of which is “No Data”: -1), then the following dimension expressions:
{
"dimensions": [
{"variable": "datasets/ab8832/variables/3ffd45/"},
{"function": "+", "args": [{"variable": "datasets/ab8832/variables/2098f1/"}, {"value": 5}]}
]
}
…would form a result cube with two dimensions: one using the categories of variable “3ffd45”, and one using the distinct values of (variable “2098f1” + 5). If variable “2098f1” has the distinct values [5, 15, 25, 35], then we would obtain a cube with the following extents:
   | 1 | 2 | -1
---|---|---|---
 5 |   |   |
15 |   |   |
25 |   |   |
35 |   |   |
Each dimension used in a cube query needs to be reduced to distinct values. For categorical or enumerated variables, we only need to refer to the variable, and the system will automatically use the “categories” or “elements” metadata to determine the distinct values. For other types, the default is to scan the variable’s data to find the unique values present and use those. Often, however, we want a more sophisticated approach: numeric variables, for example, are usually more useful when binned into a handful of ranges, like “0 to 10, 10 to 20, …90 to 100” rather than 100 distinct points (or many more when dealing with non-integers). The available dimensioning functions vary from type to type; the most common are:
- categorical: {"variable": url}
- text: {"variable": url}
- numeric: group the distinct values into a smaller number of bins via {"function": "bin", "args": [{"variable": url}]}
- datetime: roll up seconds into hours, days into months, or any other grouping via {"function": "rollup", "args": [{"variable": url}, {"value": variable.rollup_resolution}]}
- categorical_array: one dimension for the subvariables ({"each": url}) and one for the categories ({"variable": url})
- multiple response: one dimension for the subvariables ({"each": url}), and one for the selected-ness, which means transforming the array from a set of arbitrary categories to a standard "selected" set of categories (1, 0, -1), via {"function": "selections", "args": [{"variable": url}]}
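These dimension expressions are plain JSON-ready objects and can be assembled client-side; a minimal sketch, in which the helper names are illustrative (not part of any Crunch library) and the variable URLs are placeholders:

```python
# Illustrative builders for ZCL dimension expressions, one per variable type.

def var_dim(url):
    """Categorical/text: the variable reference itself is the dimension."""
    return {"variable": url}

def bin_dim(url):
    """Numeric: ask the server to group the distinct values into bins."""
    return {"function": "bin", "args": [{"variable": url}]}

def rollup_dim(url, resolution):
    """Datetime: roll values up to a coarser resolution, e.g. "M" for month."""
    return {"function": "rollup", "args": [{"variable": url}, {"value": resolution}]}

def array_dims(url):
    """Categorical array: one dimension for subvariables, one for categories."""
    return [{"each": url}, {"variable": url}]

def multiple_response_dims(url):
    """Multiple response: subvariables plus selected-ness (1, 0, -1)."""
    return [{"each": url}, {"function": "selections", "args": [{"variable": url}]}]

# A two-dimensional query like the earlier example:
query = {"dimensions": [var_dim("../variables/3ffd45/"),
                        bin_dim("../variables/2098f1/")]}
```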
Measures
A set of named functions to populate each cell of the cube. You can request multiple functions over the same dimensions (such as “cube_mean” and “cube_stddev”) or more commonly just one (like “cube_count”). For example:
{"measures": {"count": {"function": "cube_count", "args": []}}}
or:
{"measures": {
"mean": {"function": "cube_mean", "args": [{"variable": "datasets/1/variables/3"}]},
"stddev": {"function": "cube_stddev", "args": [{"variable": "datasets/1/variables/3/"}]}
}}
When applied to the dimensions we defined above, this second example might fill the table thusly for the “mean” measure:
mean | 1 | 2 | -1 |
---|---|---|---|
5 | 4.3 | 12.3 | 8.1 |
15 | 13.1 | 0.0 | 9.2 |
25 | 72.4 | 4.2 | 55.5 |
35 | 8.9 | 9.1 | 0.4 |
…and produce a similar one for the “stddev” measure. You can think of multiple measures as producing “overlays” over the same dimensions. However, the actual output format (in JSON) is more compact in that the dimensions are not repeated; see Object Reference:Cube output for details.
ZCL expressions are composable. If you need, for example, to find the mean of a categorical variable’s “numeric_value” attributes, cast the variable to the “numeric” type class before including it as the cube argument:
{"measures": {
"mean": {
"function": "cube_mean",
"args": [{
"function": "cast",
"args": [
{"variable": "datasets/1/variables/3"},
{"class": "numeric"}
]
}]
}
}}
Comparisons
Occasionally, it is useful to compare analyses from different sources. A common example is to define “benchmarks” for a given analysis, so that you can quickly compare an analysis to an established target. These are, in effect, one analysis laid over another in such a way that at least one of their dimensions lines up (and typically, using the same measures). These are also therefore defined in terms of cubes: one set which defines the base analyses, and another which defines the overlay.
For example, if we have an analysis over two categorical variables “88dd88” and “ee4455”:
{
"dimensions": [
{"variable": "../variables/88dd88/"},
{"variable": "../variables/ee4455/"}
],
"measures": {"count": {"function": "cube_count", "args": []}}
}
then we might obtain a cube with the following output:
 | 1 | 2 | -1 |
---|---|---|---|
1 | 15 | 12 | 9 |
2 | 72 | 8 | 3 |
3 | 23 | 4 | 17 |
Let’s say we then want to overlay a comparison showing benchmarks for 88dd88 as follows:
 | 1 | 2 | -1 | benchmarks |
---|---|---|---|---|
1 | 15 | 12 | 9 | 20 |
2 | 72 | 8 | 3 | 70 |
3 | 23 | 4 | 17 | 10 |
Our first pass at this might be to generate the benchmark targets in some other system, and hand-enter them into Crunch. To accomplish this, we need to define a comparison. First, we need to define the “bases”: the cube(s) to which our comparison applies, which in our case is just the above cube:
{
"name": "My benchmark",
"bases": [{
"dimensions": [{"variable": "88dd88"}],
"measures": {"count": {"function": "cube_count", "args": []}}
}]
}
Notice, however, that we’ve left out the second dimension. This means that this comparison will be available for any analysis where “88dd88” is the row dimension. The base cube here is a sort of “supercube”: a superset of the cubes to which we might apply the comparison. We include the measure to indicate that this comparison should apply to a “cube_count” (frequency count) involving variable “88dd88”.
Then, we need to define target data. We are supplying these in a hand-generated way, so our measure is simply a static column instead of a function:
{
"overlay": {
"dimensions": [{"variable": "88dd88"}],
"measures": {
"count": {
"column": [20, 70, 10],
"type": {"function": "typeof", "args": [{"variable": "88dd88"}]}
}
}
}
}
Note that our overlay has to have a dimension, too. In this case, we simply re-use variable “88dd88” as the dimension. This ensures that our target data is interpreted with the same category metadata as our base analysis.
We POST the above to datasets/{id}/comparisons/ and can obtain the overlay output at datasets/{id}/comparisons/{comparison_id}/cube/. See the Endpoint Reference for details.
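Conceptually, the overlay appends its measure as an extra column wherever the shared dimension’s element ids line up; an illustrative client-side sketch (not an API call, and not part of any Crunch library):

```python
def overlay_benchmarks(row_ids, base_rows, overlay_ids, overlay_column):
    """Append the overlay measure as an extra column on the base result,
    matching rows by the shared dimension's element ids."""
    targets = dict(zip(overlay_ids, overlay_column))
    return [row + [targets.get(rid)] for rid, row in zip(row_ids, base_rows)]

# Counts from the two-dimensional cube above, rows keyed by ids 1, 2, 3:
base = [[15, 12, 9], [72, 8, 3], [23, 4, 17]]
merged = overlay_benchmarks([1, 2, 3], base, [1, 2, 3], [20, 70, 10])
# merged reproduces the benchmarks table shown earlier.
```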
Multitables
GET datasets/{id}/multitables/ HTTP/1.1
200 OK
{
"element": "shoji:catalog",
"index": {
"1/": {"name": "Major demographics"},
"2/": {"name": "Political tendencies"}
}
}
POST datasets/{id}/multitables/ HTTP/1.1
{
"element": "shoji:entity",
"body": {
"name": "Geographical indicators",
"template": [
{
"query": [
{
"variable": "../variables/de85b32/"
}
]
},
{
"query": [
{
"variable": "../variables/398620f/"
}
]
},
{
"query": [
{
"function": "bin",
"args": [
{
"variable": "../variables/398620f/"
}
]
}
]
}
],
"is_public": false
}
}
201 Created
Location: datasets/{id}/multitables/3/
Analyses as described above are truly multidimensional; when you add another variable, the resulting cube obtains another dimension. Sometimes, however, you want to compare analyses side by side, typically looking at several (even all) variables against a common set of conditioning variables. For example, you might nominate “Gender”, “Age”, and “Race” as the conditioning variables and cross every other variable with those, in order to quickly discover common correlations.
Multi-table definitions mainly provide a template member that clients can use to construct a valid query with the variable(s) of interest.
Crunch provides a separate catalog where you can define and manage these common sets of variables. Like most catalogs, you can GET it to see which multitables are defined.
Template query
A multitable is a set of queries that form groups of ‘columns’ for ‘row’ variables chosen later. It is defined by a name and a template. At minimum the template must contain a query fragment: this will later be combined with some function of a row variable to form the dimensions of a result. Each template dimension can currently only be a function of one variable.
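To illustrate, a client might pair a chosen row dimension with each template entry like this (a hypothetical helper, not part of the API or any Crunch library):

```python
def build_dimensions(row_dimension, template):
    """Produce one dimensions list per template entry: the chosen row
    variable's dimension followed by that entry's query fragment."""
    return [[row_dimension] + entry["query"] for entry in template]

template = [
    {"query": [{"variable": "../variables/de85b32/"}]},
    {"query": [{"function": "bin", "args": [{"variable": "../variables/398620f/"}]}]},
]
dims = build_dimensions({"variable": "../variables/449b421/"}, template)
# One cube query per template entry, each crossing the row variable
# with one group of 'columns'.
```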
GET datasets/{id}/multitables/3/ HTTP/1.1
{
"element": "shoji:entity",
"body": {
"name": "Geographical indicators",
"template": [
{
"query": [
{
"variable": "../variables/de85b32/"
}
]
},
{
"query": [
{
"variable": "../variables/398620f/"
}
]
},
{
"query": [
{
"function": "bin",
"args": [
{
"variable": "../variables/398620f/"
}
]
}
]
}
]
}
}
Each multi-table template may be a list of variable references and other information used to construct the dimension and transform its output.
Transforming analyses for presentation
The transform member of an analysis specification (or multitable definition) is a declarative definition of what the dimension should look like after computation. The cube result dimension itself will always be derived from the query part of the request ({variable: $variableId}, {function: f, args: [$variableId, …]}, &c.), after which clients should do what is necessary to arrive at the transformed result: changing element names, orders, etc.
Structure
A transform can contain elements or categories, which is an array of target transforms for output-dimension elements. Therefore, to create a valid element/category transform it is generally necessary to make a cube query, inspect the result dimension, and proceed from there. For categorical and multiple response variables, elements may also be obtained from the variable entity.
Transforms are designed for variables that are more stable than not, with element ids that inhere in the underlying elements, such as category or subvariable ids. Dynamic elements, such as the results of binning a numeric variable, may not be transformed.
Transformations stored on a variable’s view are the default transforms for that variable. They may be shorter, alternate versions of category names, or contain insertions, described below.
Insertions
In addition to transforming the categories or elements already defined on
a cube ‘dimension’, it is possible to insert headings and subtotals to the
result. These insertions
are attached after an anchor
element/category id.
Insertions are processed last, after renaming, reordering, or sorting elements according to the elements/categories transform specification. They are “attached” to their anchor, always following it in the result, or else simply appended to the end of the result. If the result is sorted by some column’s value, it may make the most sense to display insertions last rather than inserting them into the result table, because their values are not considered when sorting the non-inserted elements.
An insertion is defined by an anchor and a name, which will be displayed alongside the names of categories/elements. It may also contain "function": {"combine": []}, where the array arguments are the ids of the elements to combine as “subtotals”.
Use an anchor of 0 to indicate an insertion before other results. Any anchor other than 0 that does not match an id in the elements/categories will be included at the end of results.
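These anchoring rules can be sketched as a client-side helper (hypothetical, not part of the API):

```python
def apply_insertions(element_ids, insertions):
    """Place insertion names among element ids per the anchoring rules:
    anchor 0 goes before everything; an anchor matching an element id
    attaches the insertion right after it; any other anchor falls to
    the end of the results."""
    out = []
    # Insertions anchored at 0 come before other results.
    for ins in insertions:
        if ins["anchor"] == 0:
            out.append(ins["name"])
    for eid in element_ids:
        out.append(eid)
        for ins in insertions:
            if ins["anchor"] == eid:
                out.append(ins["name"])
    # Unmatched anchors are included at the end.
    for ins in insertions:
        if ins["anchor"] != 0 and ins["anchor"] not in element_ids:
            out.append(ins["name"])
    return out

order = apply_insertions(
    [1, 2, 3],
    [{"anchor": 0, "name": "Header"},
     {"anchor": 2, "name": "Subtotal"},
     {"anchor": 99, "name": "Other"}],
)
# → ["Header", 1, 2, "Subtotal", 3, "Other"]
```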
Examples
Consider the following example result dimension:
Name | missing | id |
---|---|---|
Element A | | 0 |
Element B | | 1 |
Element C | | 2 |
Don’t know | | 3 |
Not asked | true | 4 |
An element transform can specify a new order of output elements, new names, and in the future, bases for hypothesis testing, result sorting, and aggregation of results. A transform has elements that look generally like the dimension’s extent, with some optional properties:
- id: (required) id of the target element/category
- name: name of the new target element/category
- sort: -1 or 1, indicating to sort results descending or ascending by this element
- compare: neq, leq, or geq, indicating to test other rows/columns against the hypothesis that they are ≠, ≤, or ≥ to the present element
- hide: suppress this element’s row/column from displaying at all. Defaults to false for valid elements, true for missing ones, so that if an element is added, it will be present until a transform with hide: true is added to suppress it.
A transform with object members can do lots of things. Suppose we want to put Element C first, hide the Don’t know, and more compactly represent the result as just C, A, B:
{
"transform": {"categories": [
{"id": 2, "name": "C"},
{"id": 0, "name": "A"},
{"id": 1, "name": "B"},
{"id": 3, "hide": true}
]}
}
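Applied client-side, such a transform amounts to reordering, renaming, and hiding elements; an illustrative sketch using the example dimension above (this helper handles only explicit hide flags, not the missing-element default or sorting):

```python
def apply_transform(elements, transform):
    """Reorder, rename, and hide dimension elements per a categories transform."""
    by_id = {e["id"]: e for e in elements}
    out = []
    for spec in transform["categories"]:
        element = by_id.get(spec["id"])
        if element is None or spec.get("hide"):
            continue  # unknown id or explicitly hidden
        out.append({"id": element["id"], "name": spec.get("name", element["name"])})
    return out

elements = [
    {"id": 0, "name": "Element A"},
    {"id": 1, "name": "Element B"},
    {"id": 2, "name": "Element C"},
    {"id": 3, "name": "Don't know"},
]
transform = {"categories": [
    {"id": 2, "name": "C"},
    {"id": 0, "name": "A"},
    {"id": 1, "name": "B"},
    {"id": 3, "hide": True},
]}
result = apply_transform(elements, transform)
# → [{"id": 2, "name": "C"}, {"id": 0, "name": "A"}, {"id": 1, "name": "B"}]
```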
Example transform in a saved analysis
In a saved analysis the transforms are an array in display_settings with the same extents as the output dimensions (as well as, of course, the query used to generate them). This syntax makes a univariate table of a multiple response variable and re-orders the result.
{
"query": {
"dimensions": [
{
"function": "selections",
"args": [{"variable": "../variables/398620f/"}]
},
{"variable": "../variables/398620f/"}
],
"measures": {
"count": {"function": "cube_count", "args": []}
}
},
"display_settings": {
"transform": {
"categories": [{
"id": "f007",
"value": "My preferred first item"
},
{
"id": "fee7",
"value": "The zeroth response"
},
{
"id": "c001",
"name": "Third response"
}],
"insertions": [
{"anchor": "fee7", "name": "Feet", "function": {"combine": ["f00t", "fee7"]}}
]
}
}
}
Example transform in a multitable template
In a multitable, the transform is part of each dimension definition object in the template array.
{
"template": [
{
"query": [
{"variable": "A"}
],
"transform": [{}, {}]
},
{
"query": [
{
"function": "rollup",
"args": [
{"value": "M"},
{"variable": "B"}
]
}
]
}
]
}
More complex multitable templates
The template may contain, in addition to variable references and their query arguments, an optional transform.
To obtain the multiple output cubes, you GET datasets/{id}/cube?query=<q>, where <q> is a ZCL object in JSON format (which must then be URI-encoded for inclusion in the querystring). Use the “each” function to iterate over the overview variables’ queries, producing one output cube for each one as “variable x”. For example, to cross each of the above 3 variables against another variable “449b421”:
{
"function": "each",
"args": [
{
"value": "x"
},
[
{
"variable": "de85b32"
},
{
"variable": "398620f"
},
{
"variable": "c116a77"
}
]
],
"block": {
"function": "cube",
"args": [
[
{
"variable": "449b421"
},
{
"variable": "x"
}
],
{
"map": {
"count": {
"function": "cube_count",
"args": []
}
}
},
{
"value": null
}
]
}
}
The result will be an array of output cubes:
{
"element": "shoji:view",
"value": [
{
"query": {},
"result": {
"element": "crunch:cube",
"dimensions": [
{
"references": "449b421",
"type": "etc."
},
{
"references": "de85b32",
"type": "etc."
}
],
"measures": {
"count": {
"function": "cube_count",
"args": []
}
}
}
},
{
"query": {},
"result": {
"element": "crunch:cube",
"dimensions": [
{
"references": "449b421",
"type": "etc."
},
{
"references": "398620f",
"type": "etc."
}
],
"measures": {
"count": {
"function": "cube_count",
"args": []
}
}
}
},
{
"query": {},
"result": {
"element": "crunch:cube",
"dimensions": [
{
"references": "449b421",
"type": "etc."
},
{
"references": "c116a77",
"type": "etc."
}
],
"measures": {
"count": {
"function": "cube_count",
"args": []
}
}
}
}
]
}
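Since the query travels in the querystring, it must be JSON-serialized and percent-encoded; a minimal sketch using Python’s standard library (the query object here is a placeholder):

```python
import json
from urllib.parse import quote

# Any ZCL query object; here a trivial cube_count over one variable.
query = {
    "dimensions": [{"variable": "datasets/1/variables/3ffd45/"}],
    "measures": {"count": {"function": "cube_count", "args": []}},
}

# Serialize compactly, then percent-encode for the querystring
# (safe="" also encodes "/" and other reserved characters).
encoded = quote(json.dumps(query, separators=(",", ":")), safe="")
url = "datasets/1/cube?query=" + encoded
```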
Versioning Datasets
All Crunch datasets keep track of the changes you make to them, from the initial import, through name changes and deriving new variables, to appending new rows. You can review the changes to see who did what and when, revert to a previous version, “fork” a dataset to make a copy of it, make changes to the copy, and merge those changes back into the original dataset.
Actions
The list of changes is available in the dataset/{id}/actions/ catalog. GET it and sort/filter by the “datetime” and/or “user” members as desired. Follow the links to an individual action entity to get exact details about what changed.
Viewing Changes Diff
Through the actions catalog it’s possible to retrieve the differences of a “fork” dataset from its “upstream” dataset.
Two endpoints are provided to do so: the dataset/{id}/actions/since_forking and dataset/{id}/actions/upstream_delta endpoints.
The dataset/{id}/actions/since_forking endpoint will return the state of the fork and the upstream, and the list of actions that were performed on the fork since the two diverged.
>>> forkds.actions.since_forking
pycrunch.shoji.View(**{
"self": "https://app.crunch.io/api/datasets/051ebb979db44523822ffe29236a6670/actions/since_forking/",
"value": {
"dataset": {
"modification_time": "2017-02-16T11:01:41.807000+00:00",
"revision": "58a586950183667486130f0c",
"id": "051ebb979db44523822ffe29236a6670",
"name": "My fork"
},
"actions": [
{
"hash": "2a863871-c809-4cad-a20c-9fea86b9e763",
"state": {
"failed": false,
"completed": true,
"played": true
},
"params": {
"variable": "fab0c81d16b442089cc50019cf610961",
"definition": {
"alias": "var1",
"type": "text",
"name": "var1",
"id": "fab0c81d16b442089cc50019cf610961"
},
"dataset": {
"id": "051ebb979db44523822ffe29236a6670",
"branch": "master"
},
"values": [
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence"
],
"owner_id": null
},
"key": "Variable.create"
}
],
"upstream": {
"modification_time": "2017-02-16T11:01:40.131000+00:00",
"revision": "58a586940183667486130efc",
"id": "2730c0744cba4d7c9acc9f3551380e49",
"name": "My Dataset"
}
},
"element": "shoji:view"
})
GET /api/datasets/5de96a/actions/since_forking HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 1769
{
"element": "shoji:view",
"value": {
"dataset": {
"modification_time": "2017-02-16T11:01:41.807000+00:00",
"revision": "58a586950183667486130f0c",
"id": "051ebb979db44523822ffe29236a6670",
"name": "My fork"
},
"actions": [
{
"hash": "2a863871-c809-4cad-a20c-9fea86b9e763",
"state": {
"failed": false,
"completed": true,
"played": true
},
"params": {
"variable": "fab0c81d16b442089cc50019cf610961",
"definition": {
"alias": "var1",
"type": "text",
"name": "var1",
"id": "fab0c81d16b442089cc50019cf610961"
},
"dataset": {
"id": "051ebb979db44523822ffe29236a6670",
"branch": "master"
},
"values": [
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence"
],
"owner_id": null
},
"key": "Variable.create"
}
],
"upstream": {
"modification_time": "2017-02-16T11:01:40.131000+00:00",
"revision": "58a586940183667486130efc",
"id": "2730c0744cba4d7c9acc9f3551380e49",
"name": "My Dataset"
}
}
}
The dataset/{id}/actions/upstream_delta endpoint’s usage and response match those of the other endpoint, but the returned actions are instead the ones that were performed on the upstream since the two diverged.
Savepoints
You can snapshot the current state of the dataset at any time with a POST to datasets/{id}/savepoints/. This marks the current point in the actions history, allowing you to provide a description of your progress.
The response will contain a Location header leading to the new version created. If the new version can be created quickly enough, a 201 response is issued; if it takes too long, a 202 response is issued and creation proceeds in the background. In the 202 case, the body will be a Shoji view containing a progress URL where you may query the progress.
>>> svp = ds.savepoints.create({"body": {"description": "TestSVP"}})
pycrunch.shoji.Entity(**{
"body": {
"creation_time": "2017-05-09T14:18:07.761000+00:00",
"version": "master__000003",
"user_name": "captain-68305620",
"description": "",
"last_update": "2017-05-09T14:18:07.761000+00:00"
},
"self": "http://local.crunch.io:19404/api/datasets/5283e3f4e3d645c0a750c09e854bdcb1/savepoints/6fbe47c97d8e4290a0c09227d6d6b63a/",
"views": {
"revert": "http://local.crunch.io:19404/api/datasets/5283e3f4e3d645c0a750c09e854bdcb1/savepoints/6fbe47c97d8e4290a0c09227d6d6b63a/revert/"
},
"element": "shoji:entity"
})
There is no guarantee that creating a savepoint will produce a savepoint pointing to the exact revision the dataset was at when the POST was issued, because the dataset might have moved forward in the meantime. For this reason, instead of responding with a Location header that points to an exact savepoint, the POST savepoints endpoint responds with a Location header that points to a /progress/{operation_id}/result URL, which when accessed will redirect to the nearest savepoint for that revision.
Reverting savepoints
You can revert to any savepoint version (throwing away any changes since that time) with a POST to /datasets/{dataset_id}/savepoints/{version_id}/revert/.
It will return a 202 response with a Shoji view whose value contains a progress URL where the asynchronous job’s status can be observed.
Forking and Merging
A common pattern when collaborating on a dataset is for one person to make changes on their own and then, when all is ready, share the whole set of changes back to the other collaborators. Crunch implements this with two mechanisms: the ability to “fork” a dataset to make a copy, and then “merge” any changes made to it back to the original dataset.
To fork a dataset, POST a new fork entity to the dataset’s forks catalog.
>>> ds.forks.index
{}
>>> forked_ds = ds.forks.create({"body": {"name": "My fork"}}).refresh()
>>> ds.forks.index.keys() == [forked_ds.self]
True
>>> ds.forks.index[forked_ds.self]["name"]
"My fork"
The response will be a 201 if the fork could happen within the allotted time limit for the request, or a 202 if the fork requires too much time and is going to continue in the background. Both cases will include a Location header with the URL of the new dataset that has been forked from the current one.
POST /api/datasets/{id}/forks/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 231
{
"element": "shoji:entity",
"body": {"name": "My fork"}
}
----
HTTP/1.1 201 Created
Location: https://app.crunch.io/api/datasets/{forked_id}/
In case of a 202, in addition to the Location header with the URL of the fork that is going to be created, the response will contain a Shoji view with the URL of the endpoint that can be polled to track fork completion.
POST /api/datasets/{id}/forks/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 231
{
"element": "shoji:entity",
"body": {"name": "My fork"}
}
----
HTTP/1.1 202 Accepted
Location: https://app.crunch.io/api/datasets/{forked_id}/
...
{
"element": "shoji:view",
"value": "/progress/{progress_id}/"
}
The forked dataset can then be viewed and altered like the original; however, those changes do not alter the original until you merge them back with a POST to datasets/{id}/actions/.
ds.actions.post({
"element": "shoji:entity",
"body": {"dataset": forked_ds.self, "autorollback": True}
})
POST /api/datasets/5de96a/actions/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 231
{
"element": "shoji:entity",
"body": {
"dataset": {forked ds URL},
"autorollback": true
}
}
----
HTTP/1.1 204 No Content
*or*
HTTP/1.1 202 Accepted
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/5de96a/actions/",
"value": "https://app.crunch.io/api/progress/912ab3/"
}
The POST to the actions catalog tells the original dataset to replay a set of actions; since we specify a “dataset” url, we are telling it to replay all actions from the forked dataset. Crunch keeps track of which actions are already common between the two datasets, and won’t try to replay those. You can even make further changes to the forked dataset and merge again and again.
Use the “autorollback” member to tell Crunch how to handle merge conflicts. If an action cannot be replayed on the original dataset (typically because it had conflicting changes or has been rolled back), then if “autorollback” is true (the default), the original dataset will be reverted to the previous state before any of the new changes were applied. If “autorollback” is false, the dataset is left to the last action that it could successfully play, which allows you to investigate the problem, repair it if possible (in either dataset as needed), and then POST again to continue the merge from that point.
Per-user settings (filters, decks and slides, variable permissions etc) are copied to the new dataset when you fork. However, changes to them are not merged back at this time. Please reach out to us as you experiment so we can fine-tune which details to fork and merge as we discover use cases.
Merging actions may take a few seconds, in which case the POST to actions/ will return 204 when finished. Merging many or large actions, however, may take longer, in which case the POST will return 202 with a Location header containing the URL of a progress resource.
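A client therefore needs to branch on the status code; a hypothetical helper for interpreting the merge response (merge_disposition is not part of any Crunch library, and the progress URL lives in the Shoji view’s value):

```python
def merge_disposition(status, body=None):
    """Interpret the response to a merge POST on datasets/{id}/actions/.

    204: merge finished synchronously; nothing to poll.
    202: merge continues in the background; return the progress URL
         from the shoji:view body so the caller can poll it.
    """
    if status == 204:
        return None
    if status == 202:
        return body["value"]
    raise ValueError("unexpected status: %d" % status)

progress_url = merge_disposition(
    202,
    {"element": "shoji:view", "value": "https://app.crunch.io/api/progress/912ab3/"},
)
# progress_url → "https://app.crunch.io/api/progress/912ab3/"
```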
Filtered Merges
When merging actions it is possible to provide a filter to select which actions should be replayed from the other dataset. It is currently possible to filter them by key and by hash.
When filtering by hash, only the provided actions will be merged:
ds.actions.post({
"element": "shoji:entity",
"body": {"dataset": forked_ds.self,
"filter": {"hash": ["000003"]}}
})
When filtering by key, only the actions that are part of that category will be merged:
ds.actions.post({
"element": "shoji:entity",
"body": {"dataset": forked_ds.self,
"filter": {"key": ["Variable.create"]}}
})
Recording the filtered actions
If you know that you are going to merge the same two datasets multiple times, it is possible to tell Crunch to remember the filtered actions, so that a subsequent merge to the same target won’t try to apply them again if they were skipped in a previous merge.
This behaviour is enabled by providing the remember: True option with the filter, which means that the filtered actions will be recorded and a subsequent merge won’t try to apply them to the target unless they are explicitly filtered again.
ds.actions.post({
"element": "shoji:entity",
"body": {"dataset": forked_ds.self,
"remember": True,
"filter": {"key": ["Variable.create"]}}
})
Note that only the actions skipped during this merge are recorded, so the previous example won’t skip all Variable.create actions forever, but will only remember the actions that were skipped at that time.
Endpoint Reference
Public
/
/public/
{
"views": {
"signup_resend": "https://app.crunch.io/api/public/signup_resend/",
"inquire": "https://app.crunch.io/api/public/inquire/",
"password_reset": "https://app.crunch.io/api/public/password_reset/",
"signup": "https://app.crunch.io/api/public/signup/",
"oauth2redirect": "https://app.crunch.io/api/public/oauth2redirect/",
"change_email": "https://app.crunch.io/api/public/change_email/",
"login": "https://app.crunch.io/api/public/login/",
"config": "https://app.crunch.io/api/public/config/",
"password_change": "https://app.crunch.io/api/public/password_change/"
}
}
Application configuration
GET /public/config/
When accessing Crunch from a configured application via its subdomain:
- https://mycompany.crunch.io/api/public/config/
A GET request on /public/config/ returns a Shoji Entity with the subdomain’s available configurations, if any; if none exist, the body will be empty.
{
"element": "shoji:entity",
"body": {
"name": "Your Company",
"logo": {
"small": "https://s.crunch.io/logos/yours_small.png",
"large": "https://s.crunch.io/logos/yours_large.png"
},
"palette": {
"brand": {
"primary": "#FFAABB",
"secondary": "#C4EEBB",
"message": "#BAA5E7"
}
},
"manifest": {}
}
}
CrunchBox
A CrunchBox represents a snapshot of a Crunch dataset. These snapshots are intended for public proliferation, and therefore the endpoints for interacting with this data are housed under the unauthenticated API path.
Share
The share endpoint retrieves the HTML code for rendering the share page, complete with the metadata used by social sharing platforms’ crawlers to construct a share preview. Among this metadata is a URL to a preview image of the rendered CrunchBox.
GET /crunchbox/share/ HTTP/1.1
Required parameters for this endpoint:
Parameter | Type | Description |
---|---|---|
data | string | CrunchBox widget url (URL encoded) e.g. “https%3A%2F%2Fs.crunch.io%2Fwidget%2Findex.html%23%2Fds%2Fa1b2c3d4e5f6g7h8%2Frow%2F000001%2Fcolumn%2F000000” (the encoded string of “https://s.crunch.io/widget/index.html#/ds/a1b2c3d4e5f6g7h8/row/000001/column/000000”) |
Optional parameters for this endpoint:
Parameter | Type | Description |
---|---|---|
ref | string | referring url (URL encoded) to pull content from the referring page for inclusion on the CrunchBox share page and provide a link back to the referrer e.g. “http%3A%2F%2Fcrunch.io%2Fcrunching-the-data-of-politics” (the encoded string of “http://crunch.io/crunching-the-data-of-politics”) |
Preview
The preview endpoint is used to preemptively initiate rendering a given CrunchBox configuration to a raster image. This image will be requested by social network platform crawlers during construction of the post share preview. The preview-rendering process can be time-consuming. Therefore, it is preferable to initiate it as soon as is reasonable before a request for the image data.
This endpoint returns no data.
POST /crunchbox/preview/ HTTP/1.1
Parameter | Type | Description |
---|---|---|
data | string | CrunchBox widget url (URL encoded) e.g. “https%3A%2F%2Fs.crunch.io%2Fwidget%2Findex.html%23%2Fds%2Fa1b2c3d4e5f6g7h8%2Frow%2F000001%2Fcolumn%2F000000” (the encoded string of “https://s.crunch.io/widget/index.html#/ds/a1b2c3d4e5f6g7h8/row/000001/column/000000”) |
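The data parameter is just the widget URL percent-encoded; with Python’s standard library, for example:

```python
from urllib.parse import quote

widget = "https://s.crunch.io/widget/index.html#/ds/a1b2c3d4e5f6g7h8/row/000001/column/000000"
# Encode every reserved character (safe="" also encodes "/" and "#").
data = quote(widget, safe="")
share_url = "/crunchbox/share/?data=" + data
```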
Accounts
Accounts provide an organization-level scope for Crunch.io customers. All Users belong to one and only one Account. Account managers can administer their various users and entities and have visibility on them.
Permissions
A user is an “account manager” if their account_permissions have alter_users set to true.
Account entity
The account entity is available on the API root following the Shoji views.account path, which will point to the authenticated user’s account.
If the account has a name, it will be available here, as well as the path to the account’s users.
If the authenticated user is an account manager, the response will include paths to these additional catalogs:
- Account projects
- Account teams
- Account datasets
GET
GET /account/
{
"element": "shoji:entity",
"body": {
"name": "Account's name",
"id": "abcd",
"oauth_providers": [{
"id": "provider",
"name": "Service auth"
}, {
"id": "provider",
"name": "Service auth"
}]
},
"catalogs": {
"teams": "http://app.crunch.io/api/account/teams/",
"projects": "http://app.crunch.io/api/account/projects/",
"users": "http://app.crunch.io/api/account/users/",
"datasets": "http://app.crunch.io/api/account/datasets/",
"applications": "http://app.crunch.io/api/account/applications/"
}
}
Applications
GET /account/applications/
GET returns a Shoji Catalog with the list of all the configured subdomains an account has.
{
"element":"shoji:catalog",
"index": {
"./mycompany/": {}
}
}
POST a Shoji Entity here to make a new application. The subdomain:
- must be unique system-wide, case insensitive
- can only contain letters, numbers, and - (dash)
- must be between 3 and 32 characters in length
- cannot start with - or a number
If the requested subdomain is unavailable or invalid, the server will return a 400 response.
{
"element": "shoji:entity",
"body": {
"name": "my company",
"subdomain": "mycompany",
"palette": {
"brand": {
"primary": "#FFAABB", // Color of links, interactable things
"secondary": "#C4EEBB", // Titles and such
"message": "#BAA5E7"
}
},
"manifest": {}
}
}
Attributes name and subdomain are required; palette and manifest are optional. Note that you cannot specify logos in the POST request. Use the created entity’s logo/ resource to upload the image files to the app (see below).
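The subdomain rules above can be checked client-side before POSTing (uniqueness can only be verified by the server, which returns a 400 response on conflict); a hypothetical validator:

```python
import re

# 3-32 chars, letters/digits/dashes only, starting with a letter
# (cannot start with a dash or a number).
SUBDOMAIN_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")

def valid_subdomain(name):
    """True if name satisfies the documented subdomain format rules."""
    return bool(SUBDOMAIN_RE.match(name))

valid_subdomain("mycompany")   # valid
valid_subdomain("my")          # invalid: too short
valid_subdomain("-company")    # invalid: cannot start with a dash
valid_subdomain("9company")    # invalid: cannot start with a number
```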
Application entity
GET /account/applications/app_id/
GET this endpoint for a Shoji Entity containing all details about the configured application.
{
"element":"shoji:entity",
"body": {
"name": "Application name",
"subdomain": "mycompany",
"logos": {
"small": "<URL>",
"large": "<URL>",
"favicon": "<URL>"
},
"palette": {
"brand": {
"primary": "#FFAABB", // Color of links, interactable things
"secondary": "#C4EEBB", // Titles and such
"message": "#BAA5E7"
}
},
"manifest": {}
},
"views": {
"logo": "https://app.crunch.io/api/account/applications/mycompany/logo/"
}
}
PATCH this endpoint to change the name, palette, or manifest. Logos are controlled by the logo subresource.
Attribute | Type | Description |
---|---|---|
name | string | Name of the configured application on the given subdomain |
logo | object | Contains three attributes, large , small and favicon , with different resolution company logos |
palette | object | Contains three colors, primary , secondary and message , under the brand attribute to theme the web app |
manifest | object | Optional, contains further client configurations |
Change application logo
POST /account/applications/app_id/logo/
To set or change an application’s logo, the client needs to make a multipart/form-data request containing either or both of the large and small fields with the desired image files. Only account admins are authorized to change this resource.
POST /account/applications/app_id/logo/ HTTP/1.1
Content-Type: multipart/form-data; boundary=----------123456789
Content-Length: 500326
----------123456789
Content-Disposition: form-data; name="large"; filename="newlogo.jpg"
Content-Type: image/jpeg
xxxxxxxxxx
----------123456789
Content-Disposition: form-data; name="small"; filename="newlogo_small.jpg"
Content-Type: image/jpeg
xxxxxxxxxx
----------123456789--
HTTP/1.1 204
The server will update the images accordingly. The only valid file types are GIF, JPEG, and PNG images.
Account users
Provides a catalog of all the users that belong to this account. Any account member can GET, but only account managers can POST/PATCH on it.
GET
GET /account/users/
{
"element": "shoji:catalog",
"index": {
"http://app.crunch.io/api/users/123/": {
"id_method": "pwhash",
"id_provider": null,
"email": "email@example.com",
"name": "Steve Austin",
"dataset_permissions": {
"view": true,
"edit": false
},
"account_permissions": {
"alter_users": false,
"create_datasets": false
}
},
"http://app.crunch.io/api/users/234/": {
"id_method": "pwhash",
"id_provider": null,
"email": "email1@example.com",
"name": "Shawn Michaels",
"dataset_permissions": {
"view": true,
"edit": true
},
"account_permissions": {
"alter_users": true,
"create_datasets": true
}
},
"http://app.crunch.io/api/users/345/": {
"id_method": "oauth",
"id_provider": "google",
"email": "email2@example.com",
"name": "Rocky Maivia",
"dataset_permissions": {
"view": true,
"edit": true
},
"account_permissions": {
"alter_users": false,
"create_datasets": true
}
}
}
}
POST
Account managers can POST to the account’s users catalog to create new users. If a user with the provided email address already exists in the application (on another account), the server will return a 400 response.
POST /account/users/
{
"element": "shoji:entity",
"body": {
"email": "new_email@example.com",
"name": "Initial name",
"account_permissions": {
"alter_users": false,
"create_datasets": true
},
"teams": ["<list of team urls>"],
"projects": ["<list of project urls>"],
"id_method": "pwhash/oauth",
"id_provider": "",
"send_invite": true,
"url_base": "http://app.crunch.io/"
}
}
It is possible to create a user belonging to different teams or projects by including those teams’ or projects’ URLs in the payload, for example:
{
"element": "shoji:entity",
"body": {
"email": "new_email@example.com",
"name": "Initial name",
"account_permissions": {
"alter_users": false,
"create_datasets": true
},
"teams": ["https://app.crunch.io/api/teams/abc/", "https://app.crunch.io/api/teams/123/"],
"projects": ["https://app.crunch.io/api/projects/def/"],
"id_method": "pwhash"
}
}
The teams and projects attributes are optional and can be omitted or passed as empty lists.
PATCH
PATCH to the users’ catalog allows account admins to edit users’ permissions in batch. It is only possible to change the account_permissions attribute. Additionally, it is possible to delete users from the account by sending null as their tuple.
PATCH /account/users/
{
"element": "shoji:catalog",
"index": {
"http://app.crunch.io/api/users/123/": {
"account_permissions": {
"alter_users": false,
"create_datasets": false
}
},
"http://app.crunch.io/api/users/234/": null
}
}
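A batch-edit payload like the one above could be assembled as follows (the helper name and user URLs are illustrative):

```python
def build_users_patch(permission_changes, remove=()):
    """permission_changes: {user_url: {"alter_users": bool, "create_datasets": bool}}
    remove: iterable of user URLs to drop from the account (tuple set to null)."""
    index = {url: {"account_permissions": perms}
             for url, perms in permission_changes.items()}
    for url in remove:
        index[url] = None  # null tuple deletes the user from the account
    return {"element": "shoji:catalog", "index": index}

payload = build_users_patch(
    {"http://app.crunch.io/api/users/123/": {"alter_users": False,
                                             "create_datasets": False}},
    remove=["http://app.crunch.io/api/users/234/"],
)
```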
Account datasets
Only account managers have access to this catalog. It is a read-only Shoji catalog containing all the datasets that users of this account have created (a potentially very large catalog).
Account managers have implicit editor access to all the account datasets.
GET /account/datasets/
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/datasets/cc9161/": {
"owner_name": "James T. Kirk",
"name": "The Voyage Home",
"description": "Stardate 8390",
"archived": false,
"size": {
"rows": 1234,
"columns": 67
},
"is_published": true,
"id": "cc9161",
"owner_id": "https://app.crunch.io/api/users/685722/",
"start_date": "2286",
"end_date": null,
"streaming": "no",
"creation_time": "1986-11-26T12:05:00",
"modification_time": "1986-11-26T12:05:00",
"current_editor": "https://app.crunch.io/api/users/ff9443/",
"current_editor_name": "Leonard Nimoy"
},
"https://app.crunch.io/api/datasets/a598c7/": {
"owner_name": "Spock",
"name": "The Wrath of Khan",
"description": "",
"archived": false,
"size": {
"rows": null,
"columns": null
},
"is_published": true,
"id": "a598c7",
"owner_id": "https://app.crunch.io/api/users/af432c/",
"start_date": "2285-10-03",
"end_date": "2285-10-20",
"streaming": "no",
"creation_time": "1982-06-04T09:16:23.231045",
"modification_time": "1982-06-04T09:16:23.231045",
"current_editor": null,
"current_editor_name": null
}
}
}
Account projects
This catalog is available for account managers and lists all the projects that the users have created. Account managers have implicit edit access on all projects.
GET /account/projects/
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/projects/cc9161/": {
"name": "Project 1",
"id": "cc9161",
"owner": "http://app.crunch.io/api/users/abcdef/"
},
"https://app.crunch.io/api/projects/a598c7/": {
"name": "Project 2",
"id": "a598c7",
"owner": "http://app.crunch.io/api/users/123456/"
}
}
}
Account teams
This catalog is available for account managers and lists all the teams that the users have created. Account managers have implicit edit access on all teams.
GET /account/teams/
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/teams/cc9161/": {
"name": "Team 1",
"id": "cc9161",
"owner": "http://app.crunch.io/api/users/123456/"
},
"https://app.crunch.io/api/teams/a598c7/": {
"name": "Team 2",
"id": "a598c7",
"owner": "http://app.crunch.io/api/users/123456/"
}
}
}
Account Collaborators
An account collaborator is a Crunch.io user who is not a member of your account but has access to one or more of your account’s datasets.
Account admins can visit the account’s collaborators catalog to view the list of all collaborators for all datasets of the account.
GET /account/collaborators/
This catalog lists all the users who are not members of the account but have access to any of the account’s datasets, projects, or teams.
Each element in the catalog index links to the user’s entity endpoint and has the name and email attributes.
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/users/cc9161/": {
"name": "John doe",
"email": "user1@example.com",
"active": true,
},
"https://app.crunch.io/api/users/a598c7/": {
"name": "John notdoe",
"email": "user2@example.com",
"active": true,
}
}
}
Collaborators order
GET /account/collaborators/order/
It is possible to group collaborators using a Shoji order.
It is possible to PATCH the graph attribute with a standard Shoji order payload indicating the groups and collaborators (user URLs) for each group.
Collaborators datasets
The full list of datasets a collaborator has access to is available through the user’s entity endpoint by following the visible_datasets catalog.
Batches
Catalog
/datasets/{id}/batches/
GET
A GET request on this resource returns a Shoji Catalog enumerating the batches present in the Dataset. Each tuple in the index includes a “status” member, which may be one of “analyzing”, “conflict”, “error”, “importing”, “imported”, or “appended”.
{
"element": "shoji:catalog",
"self": "...datasets/837498a/batches/",
"index": {
"0/": {"status": "appended"},
"2/": {"status": "error"},
"3/": {"status": "importing"}
}
}
POST
A POST to this resource adds a new batch. The request payload can contain (1) the URL of another Dataset, (2) the URL of a Source object, or (3) a Crunch Table definition with variable metadata, row data, or both.
A successful request will return either 201 status, if sufficiently fast, or 202, if the task is large enough to require processing outside of the request cycle. In both cases, the newly created batch entity’s URL is returned in the Location header. The 202 response contains a body with a Progress resource in it; poll that URL for updates on the completion of the append. See Progress.
Batches are created in the analyzing state and will be advanced through the importing, imported, and appended states if there are no problems. If there was a problem in processing, the status will be conflict or error. Note that the response status code will always be 202 for asynchronous or 201 for synchronous creation of the batch, whether there were conflicts or not, so you need to GET the new batch’s URL to see if the data is good to go (status appended).
If an append is already in process on the dataset, the POST request will return 409 status.
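The polling workflow described above can be sketched as follows; wait_for_batch and get_status are illustrative names, and the actual GET on the batch URL (taken from the Location header) is left to the caller:

```python
import time

# Terminal states per the description above: success ends at "appended",
# failure at "conflict" or "error".
TERMINAL_STATES = {"appended", "conflict", "error"}

def wait_for_batch(get_status, interval=1.0, max_polls=60):
    """Poll until the batch reaches a terminal state.
    get_status: callable returning the batch's current status string."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError("batch did not reach a terminal state")
```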
Appending a dataset
To append a Dataset, POST a Shoji Entity with a dataset URL. You must have at least view (read) permissions on this dataset. Internally, this action will create a Source entity pointing to that dataset.
{
"element": "shoji:entity",
"body": {
"dataset": "<url>"
}
}
The variables from the incoming dataset to be included by default will depend on the current user’s permissions. Those with edit permissions on the incoming dataset will append all public and hidden (discarded = true) variables. Those with only view permissions will include just the public variables that aren’t hidden.
To append only certain variables from the incoming dataset, include a where attribute in the entity body. See Frame functions for how to compose the where expression.
{
"element": "shoji:entity",
"body": {
"dataset": "<url>",
"where": {
"function":"select",
"args": [
{"map":
{"000001": {"variable": "<url>"},
"000002": {"variable": "<url>"}}
}
]
}
}
}
Users with edit permissions on the incoming dataset can select hidden variables to be included, but viewers cannot. Editors and viewers can, however, both specify their personal variables to be included.
To select a subset of rows to append, include a filter attribute in the entity body, containing a Crunch filter expression.
{
"element": "shoji:entity",
"body": {
"dataset": "<url>",
"where": {
"function":"select",
"args": [
{"map":
{"000001": {"variable": "<url>"},
"000002": {"variable": "<url>"}}
}
]
},
"filter": {
"function":"<",
"args": [
{"variable": "<url>"},
{"value": "<value>"}
]
}
}
}
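A sketch of assembling the append payloads shown above (the helper and its parameters are illustrative):

```python
def build_append_payload(dataset_url, variable_urls=None, filter_expr=None):
    """variable_urls maps target variable ids to variable URLs for the
    optional `where` selection; filter_expr is a Crunch filter expression."""
    body = {"dataset": dataset_url}
    if variable_urls:
        body["where"] = {
            "function": "select",
            "args": [{"map": {vid: {"variable": url}
                              for vid, url in variable_urls.items()}}],
        }
    if filter_expr is not None:
        body["filter"] = filter_expr
    return {"element": "shoji:entity", "body": body}
```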
Appending a source
POST a Shoji Entity with a Source URL. The user must have permission to view the Source entity. Use Source appending to send data in CSV format that matches the schema of the Dataset.
{
"element": "shoji:entity",
"body": {
"source": "<url>"
}
}
Appending a Crunch Table
The variable IDs must match those of the target dataset, since their types will be matched based on ID, and the data is expected to match the target dataset’s variable types. This action will create a new Source entity whose name and description will match those provided in the payload; if not provided, they default to empty strings.
{
"element": "crunch:table",
"name": "<optional string>",
"description": "<optional string>",
"data": {
"var_url_1": [1, 2, 3, ...],
"var_url_2": ["a", "b", ...]
}
}
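Since every column in the data map represents the same rows, a quick client-side check before POSTing can catch mismatched column lengths (this helper is illustrative, not part of the API):

```python
def build_crunch_table(data, name="", description=""):
    """data: {variable_id: list of column values}; all columns must be the
    same length, since each row spans every variable."""
    lengths = {len(col) for col in data.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    return {"element": "crunch:table", "name": name,
            "description": description, "data": data}
```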
Append Failures
For single appends, if a batch fails, the dataset is automatically reverted to the state it was in before the append, and the batch is automatically deleted.
When multiple appends are performed in immediate succession, it is not efficient to checkpoint the state before each one. In this case, only the first append is rolled back on failure.
Checking if an append will cause problems
/datasets/{id}/batches/compare/
An append cannot proceed if there are conditions in the involved datasets that would cause ambiguous situations. If such datasets were appended, the server would return a 409 response.
It is possible to verify these conditions before trying the append using the batches compare endpoint.
GET /datasets/4bc6af/batches/compare/?dataset=http://app.crunch.io/api/datasets/3e2cfb/
The response will contain a conflicts key that can contain either current, incoming, or union depending on the type and location of the problem. The response status will always be 200, with conflicts, described below, or an empty body.
- current refers to issues found on the dataset where the new data would be added
- incoming refers to issues on the far dataset that contains the new data to add
- union expresses problems on the combined variables (metadata) of the final dataset after append
{
"union": {...},
"current": {...},
"incoming": {...}
}
A successful response will not contain any of these keys, returning an empty object.
{}
The possible conflict keys, and the verifications made, are:
- Variables missing alias: All variables should have a valid alias string. This will indicate the IDs of those that don’t.
- Variables missing name: All variables should have a valid name string. This will indicate the IDs of those that don’t.
- Variables with duplicate alias: In the event of two or more variables sharing an alias, they will be reported here. When this occurs as a union conflict, it is likely that names and aliases of a variable or subvariable in current and incoming are swapped (e.g., VariantOne:AliasOne, Variant1:Alias1 in current but VariantOne:Alias1, Variant1:AliasOne in incoming).
- Variables with duplicate name: Variable names should be unique across non-subvariables.
- Subvariable in different arrays per dataset: If a subvariable is used for different arrays that are impossible to match, it will be reported here. User action will be needed to fix this.
For each of these, a list of variable IDs will be made available indicating the conflicting entities. Union conflicting ids generally refer to variables in the current dataset and may be referenced by alias in incoming.
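A small client-side check of the compare response might look like this (the helper is illustrative):

```python
def append_is_safe(compare_response):
    """True if the batches/compare response reports no conflicts in any scope.
    A successful compare returns an empty object (or empty scopes)."""
    return not any(compare_response.get(scope)
                   for scope in ("current", "incoming", "union"))
```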
Lining up datasets for append/combine
/datasets/align/
Given that some datasets may be close to being fit for appending but could need some work before proceeding, the align endpoint provides API expressions that can be used directly in the append steps as the where parameter in order to avoid such conflicts.
Currently, this endpoint will provide an expression that will exclude the troubling variables from the append.
- Exclude different arrays that may share subvariables by alias.
- Exclude variables with matching aliases but different types.
Such conflicts are currently not allowed and would cause the append operation to be rejected.
To use this endpoint, the client needs to provide a list of variables they wish to line up together as a list of lists.
[
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/123/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/234/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/345/"}
],
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/678/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/789/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/890/"}
],
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/1ab/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/ab2/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/b23/"}
]
]
The example above indicates that the client wishes to line up three variables from three datasets as indicated by the groups.
From the input, the endpoint will analyze the groups and return an expression which will include only those variables that can be appended without conflict among all of them. This expression is ready to be used as a where parameter on the append /batches/ endpoint.
The payload needs to be sent as a JSON-encoded variables POST parameter:
POST /datasets/align/
{
"element": "shoji:entity",
"body": {
"variables": [
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/123/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/234/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/345/"}
],
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/678/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/789/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/890/"}
],
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/1ab/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/ab2/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/b23/"}
]
]}
}
The response will be a 202 with a Progress resource in it; poll that URL for updates on completion, and follow Location once it has completed. See Progress.
On completion, the align response will be a shoji:view containing the where expression to use for each dataset:
{
"element": "shoji:view",
"value": {
"abc": {"function": "select", "args": [{"map": {
"678": {"variable": "678"},
"1ab": {"variable": "1ab"}
}}]},
"def": {"function": "select", "args": [{"map": {
"789": {"variable": "789"},
"ab2": {"variable": "ab2"}
}}]},
"hij": {"function": "select", "args": [{"map": {
"890": {"variable": "890"},
"b23": {"variable": "b23"}
}}]}
}
}
Following the example above, if the first group could not be appended because of conflicts between its variables, it will be excluded from the final expressions.
Later, using the expressions obtained, it is possible to append all the datasets to a new one without conflicts.
POST /datasets/abd/batches/
{
"element": "shoji:entity",
"body": {
"dataset": "http://app.crunch.io/api/datasets/abc/",
"where": {"function": "select", "args": [{"map": {
"678": {"variable": "678"},
"1ab": {"variable": "1ab"}
}}]}
}
}
POST /datasets/abd/batches/
{
"element": "shoji:entity",
"body": {
"dataset": "http://app.crunch.io/api/datasets/def/",
"where": {"function": "select", "args": [{"map": {
"789": {"variable": "789"},
"ab2": {"variable": "ab2"}
}}]}
}
}
POST /datasets/abd/batches/
{
"element": "shoji:entity",
"body": {
"dataset": "http://app.crunch.io/api/datasets/hij/",
"where": {"function": "select", "args": [{"map": {
"890": {"variable": "890"},
"b23": {"variable": "b23"}
}}]}
}
}
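The three POSTs above can be generated mechanically from the align response. This sketch assumes you keep a mapping from the dataset ids in the view to their URLs (all names are illustrative):

```python
def batch_payloads_from_align(align_view, dataset_urls):
    """align_view: the shoji:view returned by /datasets/align/.
    dataset_urls: mapping of dataset id -> dataset URL."""
    return [
        {"element": "shoji:entity",
         "body": {"dataset": dataset_urls[ds_id], "where": where}}
        for ds_id, where in align_view["value"].items()
    ]
```

Each payload can then be POSTed to the target dataset’s /batches/ endpoint in turn.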
Entity
/datasets/{id}/batches/{id}/
A GET on this resource returns a Shoji Entity describing the batch, and a link to its Crunch Table (see next).
{
"conflicts": {},
"source_children": {},
"target_children": {},
"source_columns": 3500,
"source_rows": 235490,
"target_columns": 3499,
"target_rows": 120000,
"error": "",
"progress": 100.0,
"source": "<url>",
"status": "appended"
}
The conflicts object
Each batch has a “conflicts” member describing any unresolvable differences found between variables in the two datasets. On a successful append, this object will be empty; if the batch status is “conflict”, the object will contain conflict information keyed by id of the variable in the target dataset. The conflict data for each variable follows this shape:
{
"metadata": {
"name": "<string>",
"alias": "<string>",
"type": "<string>",
"categories": [{}]
},
"source_id": "<id of the matching variable in the source frame",
"source_metadata": {
"name, etc": "as above"
},
"conflicts": [{
"message": "<string>"
}]
}
Each conflict has four attributes: metadata about the variable on the target dataset (unless it is a variable that only exists on the source dataset); source_id and source_metadata, which describe the corresponding variable in the source frame (if any); and a conflicts member, which contains an array of individual conflicts indicating what situations were found during batch preparation.
If there are conflicts in your batch, address the conflicting issues in your datasets, DELETE the batch entity from the failed append attempt, and POST a new one.
Table
/datasets/{id}/batches/{id}/table/{?offset,limit}
A GET returns the rows of data from the Dataset for the identified batch as a Crunch Table.
Boxdata
Boxdata is the data that Crunch provides to the CrunchBox for rendering web components that are made publicly available. This endpoint provides a catalog of data that has been precomputed into cubes of JSON data for visualizations. Metadata associated with this raw computed data is accessed and manipulated through this endpoint.
Catalog
/datasets/{id}/boxdata/
A Shoji Catalog of boxdata for a given dataset.
GET catalog
When authenticated and authorized to view the given dataset, GET returns 200 status with a Shoji Catalog of boxdata associated with the dataset. If authorization is lacking, response will instead be 404.
Catalog tuples contain the following keys:
Name | Type | Description |
---|---|---|
title | string | Human friendly identifier |
notes | string | Other information relevant for this CrunchBox |
header | string | header information for the CrunchBox |
footer | string | footer information for the CrunchBox |
dataset | string | URL of the dataset associated with the CrunchBox |
filters | object | A Crunch expression indicating which filters to include in the CrunchBox |
where | object | A Crunch expression indicating which variables to include in the CrunchBox. An undefined value is equivalent to specifying all dataset variables. |
creation_time | string | A timestamp of the date when this CrunchBox was created |
{
"element": "shoji:catalog",
"self": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/boxdata/",
"index": {
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/boxdata/44a4d477d70c85da4b8298677e527ad8/": {
"user_id": "00002",
"footer": "This is for the footer",
"notes": "just a couple of variables",
"title": "z and str",
"dataset": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/",
"header": "This is for the header",
"creation_time": "2017-03-14T00:13:42.024000+00:00",
"filters": {
"function": "identify",
"args": [
{
"filter": [
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/da9d86e43381443d9d708dc29c0c6308/",
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/80638457c8bd4731990eebdc3baee839/"
]
}
]
},
"where": {
"function": "identify",
"args": [
{
"id": [
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000002/",
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000003/"
]
}
]
},
"id": "44a4d477d70c85da4b8298677e527ad8"
},
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/boxdata/75ff1d67ed698e0986f1c1c3daebf9a2/": {
"user_id": "00002",
"title": "xz",
"dataset": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/",
"filters": null,
"creation_time": "2017-03-14T00:13:42.024000+00:00",
"where": {
"function": "identify",
"args": [
{
"id": [
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000000/"
]
}
]
},
"id": "75ff1d67ed698e0986f1c1c3daebf9a2"
}
}
}
POST catalog
Use POST to create a new datasource for CrunchBox. Note that new boxdata is only created when there is a new combination of where and filter data. If the same variables and filters are indicated by the POST data, the existing combination will result in a modification of the metadata associated with the cube data. This avoids recomputing analyses needlessly.
A POST to this resource must be a Shoji Entity with the following “body” attributes:
Name | Description |
---|---|
title | Human friendly identifier |
notes | Other information relevant for this CrunchBox |
header | header information for the CrunchBox |
footer | footer information for the CrunchBox |
dataset | URL of the dataset associated with the CrunchBox |
filters | A Crunch expression indicating which filters to include |
where | A Crunch expression indicating which variables to include |
display_settings | Options to customize how it looks and behaves |
{
"element": "shoji:entity",
"body": {
"where": {
"function": "select",
"args": [{
"map": {
"000002": {"variable": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000002/"},
"000003": {"variable": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000003/"}
}
}]
},
"filters": [
{"filter": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/da9d86e43381443d9d708dc29c0c6308/"},
{"filter": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/80638457c8bd4731990eebdc3baee839/"}
],
"force": false,
"title": "z and str",
"notes": "just a couple of variables",
"header": "This is for the header",
"footer": "This is for the footer"
}
}
Display Settings
The display_settings member of a CrunchBox payload allows you to customize several aspects of how it will be displayed.
A minBaseSize member will suppress display of values in tables or graphs where the sample size is below a given threshold.
To customize a CrunchBox’s color scheme, you may include an optional palette member in the display_settings of the body of the request to create or edit the boxdata. There are four types of customization available.
{"display_settings": {
"minBaseSize": {"value": 50},
"palette": {
"brand": {
"primary": "#111111",
"secondary": "#222222",
"messages": "#333333"
},
"static_colors": ["#444444", "#555555", "#666666"],
"category_lookup": {
"category name": "#aaaaaa",
"another category:": "bbbbbb"
}
}
}}
Brand
The CrunchBox interface uses three colors, named Primary, Secondary, and Messages. By default, these are Crunch brand colors of green, blue, and purple. These are used, for example, as the background colors at the top of the interface and the color of the filter selector.
Static colors
Include an array of static_colors and every categorical color will be taken from the list in order. If none of your variables have more categories than colors provided here, the generator (below) will never be used, but category lookup will still be performed.
Base
If the number of categories exceeds the number of static colors, or no static colors are specified, “base” colors are used to generate a categorical palette. By default, these are also the Crunch green, blue, and purple, and are not overridden by brand. Each color is interpolated in HCL space from itself to Hue + 100, Lightness + 20; the colors are then ordered to maximize sequential absolute distance in L*a*b* space so adjacent colors can be easily distinguished.
Category Lookup
Finally, you may include an object whose keys are exact category names that should always be assigned a specific color. Using semantically resonant colors in this manner is a boon for interpretation and is highly recommended when possible. For example, to ensure that the Green Party is a verdant shade, include a member such as "Green": "#00dd00". Building a category lookup list requires some attention to the specific categories in a dataset; they must match exactly, not partially. To ensure that “Green Party” is also green, include an additional "Green Party" key with the same value. Lookup values are processed last, replacing static or generated colors.
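The precedence described in this section (static colors taken in order, with the category lookup applied last) can be sketched as follows. This is an illustration of the documented behavior, not Crunch’s actual implementation; the base-color generator is elided and represented by None placeholders:

```python
def resolve_category_colors(categories, static_colors=(), category_lookup=None):
    """Assign a color to each category name.
    Static colors cover the categories in order when there are enough of
    them; otherwise the generated base palette would be used (elided here).
    Exact-match lookup entries replace static or generated colors last."""
    category_lookup = category_lookup or {}
    if static_colors and len(categories) <= len(static_colors):
        colors = dict(zip(categories, static_colors))
    else:
        # Placeholder for the HCL-interpolated base palette
        colors = {name: None for name in categories}
    # Lookup is processed last and only for exact category-name matches
    colors.update({name: color for name, color in category_lookup.items()
                   if name in colors})
    return colors
```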
Entity
/datasets/{id}/boxdata/{id}/
This endpoint represents each of the boxdata entities listed in the catalog.
The body of each entity is the same as the catalog’s tuple:
GET
Returns the body of the boxdata entity
{
"user_id": "00002",
"footer": "This is for the footer",
"notes": "just a couple of variables",
"title": "z and str",
"dataset": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/",
"header": "This is for the header",
"filters": {
"function": "identify",
"args": [
{
"filter": [
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/da9d86e43381443d9d708dc29c0c6308/",
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/80638457c8bd4731990eebdc3baee839/"
]
}
]
},
"where": {
"function": "identify",
"args": [
{
"id": [
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000002/",
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000003/"
]
}
]
},
"id": "44a4d477d70c85da4b8298677e527ad8"
}
DELETE
Deletes the boxdata entity. Returns 204.
Comparisons
Entity
/datasets/{id}/comparisons/{id}/
A Shoji Entity with the following “body” attributes:
Name | Type | Description |
---|---|---|
name | string | |
bases | array of cube input objects | one for each analysis to which the comparison applies |
overlay | cube input object | defines the comparison data |
See the Feature Guide for a discussion of the cube objects. POST one to the catalog (see below) to create a new comparison. GET to retrieve the complete Entity. PUT a new one to replace it. PATCH a subset of the attributes as desired. DELETE to remove the comparison.
The Entity also includes a “cube” link in its “catalogs” object; a GET on this link returns the output of the overlay cube. See “Cube” next.
Cube
/datasets/{id}/comparisons/{id}/cube/
A GET on this endpoint returns the output of the “overlay” cube query for the given comparison. The response will be a Crunch Cube with “dimensions” and “measures” members.
Catalog
/datasets/{id}/comparisons/
A Shoji Catalog of comparison entities associated to the specified dataset.
GET catalog
When authenticated and authorized to view the given dataset, GET returns 200 status with a Shoji Catalog of the dataset’s comparisons. If authorization is lacking, response will instead be 404.
Catalog tuples contain the following keys:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
bases | array of cube input objects | References to the dimensions and measures for which the comparison is valid |
cube | URL | Link to generate the comparison data |
The catalog looks something like this:
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/5ee0a0/comparisons/",
"specification": "https://app.crunch.io/api/specifications/comparisons/",
"description": "List of the comparisons for this dataset",
"index": {
"491fe3/": {
"name": "All actors",
"bases": [{
"dimensions": [{"variable": "../variables/0f7378/"}, {"variable": "../variables/8451cb/"}],
"measures": {"count": {"function": "cube_count", "args": []}}
}],
"cube": "491fe3/cube/"
},
"9942ce/": {
"name": "Awareness: sector average",
"bases": [{
"dimensions": [{"variable": "../variables/bf31fc/"}],
"measures": {"count": {"function": "cube_count", "args": []}}
}],
"cube": "9942ce/cube/"
}
}
}
PATCH catalog
Use PATCH to edit the “name” and/or “bases” of one or more comparisons. A successful request returns a 204 response.
Authorization is required: you must have “edit” privileges on the dataset, as shown in the “permissions” object in the dataset’s catalog tuple. If you try to PATCH and are not authorized, you will receive a 403 response and no changes will be made.
Because this catalog contains its entities rather than collecting them, do not PATCH to add or delete comparisons. POST to the catalog to create new comparisons, and DELETE individual comparison entities.
POST catalog
Use POST to add a new comparison entity to the catalog. A 201 indicates success and includes the URL of the newly-created comparison in the Location header.
Datasets
Datasets are the primary containers of statistical data in Crunch. Datasets contain a collection of variables, with which analyses can be composed, saved, and exported. These analyses may include filters, which users can define and persist. Users can also share datasets with each other.
Datasets are composed of one or more batches of data uploaded to Crunch, and additional batches can be appended to datasets. Similarly, variables from other datasets can be joined onto a dataset.
As with other objects in Crunch, references to the set of dataset entities are exposed in a catalog. This catalog can be organized and ordered.
Catalog
GET
GET /datasets/ HTTP/1.1
library(crunch)
login()
# Upon logging in, a GET /datasets/ is done automatically, to populate:
listDatasets() # Shows the names of all datasets you have
listDatasets(refresh=TRUE) # Refreshes that list (and does GET /datasets/)
# To get the raw Shoji object, should you need it,
crGET("https://app.crunch.io/api/datasets/")
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/",
"catalogs": {
"by_name": "https://app.crunch.io/api/datasets/by_name/{name}/"
},
"views": {
"search": "https://app.crunch.io/api/datasets/search/"
},
"orders": {
"order": "https://app.crunch.io/api/datasets/order/"
},
"specification": "https://app.crunch.io/api/specifications/datasets/",
"description": "Catalog of Datasets that belong to this user. POST a Dataset representation (serialized JSON) here to create a new one; a 201 response indicates success and returns the location of the new object. GET that URL to retrieve the object.",
"index": {
"https://app.crunch.io/api/datasets/cc9161/": {
"owner_name": "James T. Kirk",
"name": "The Voyage Home",
"description": "Stardate 8390",
"archived": false,
"permissions": {
"edit": false,
"change_permissions": false,
"view": true
},
"size": {
"rows": 1234,
"columns": 67
},
"is_published": true,
"id": "cc9161",
"owner_id": "https://app.crunch.io/api/users/685722/",
"start_date": "2286",
"end_date": null,
"streaming": "no",
"creation_time": "1986-11-26T12:05:00",
"modification_time": "1986-11-26T12:05:00",
"current_editor": "https://app.crunch.io/api/users/ff9443/",
"current_editor_name": "Leonard Nimoy"
},
"https://app.crunch.io/api/datasets/a598c7/": {
"owner_name": "Spock",
"name": "The Wrath of Khan",
"description": "",
"archived": false,
"permissions": {
"edit": true,
"change_permissions": true,
"view": true
},
"size": {
"rows": null,
"columns": null
},
"is_published": true,
"id": "a598c7",
"owner_id": "https://app.crunch.io/api/users/af432c/",
"start_date": "2285-10-03",
"end_date": "2285-10-20",
"streaming": "no",
"creation_time": "1982-06-04T09:16:23.231045",
"modification_time": "1982-06-04T09:16:23.231045",
"current_editor": null,
"current_editor_name": null
}
},
"template": "{\"name\": \"Awesome Dataset\", \"description\": \"(optional) This dataset is awesome because I made it, and you can do it too.\"}"
}
GET /datasets/
When authenticated, GET returns 200 status with a Shoji Catalog of datasets to which the authenticated user has access. Catalog tuples contain the following attributes:
Name | Type | Default | Description |
---|---|---|---|
name | string | | Required. The name of the dataset |
description | string | “” | A longer description of the dataset |
id | string | | The dataset’s id |
archived | bool | false | Whether the dataset is “archived” or active |
permissions | object | {"edit": false} | Authorizations on this dataset; see Permissions |
owner_id | URL | | URL of the user entity of the dataset’s owner |
owner_name | string | “” | That user’s name, for display |
size | object | {"rows": 0, "columns": 0, "unfiltered_rows": 0} | Dimensions of the dataset |
creation_time | ISO-8601 string | | Datetime at which the dataset was created in Crunch |
modification_time | ISO-8601 string | | Datetime of the last modification for this dataset globally |
start_date | ISO-8601 string | | Date/time to which the data in the dataset corresponds |
end_date | ISO-8601 string | | End date/time of the dataset’s data, defining a start_date:end_date range |
streaming | string | | Possible values are “no”, “finished”, and “streaming”, to enable/disable streaming |
current_editor | URL or null | | URL of the user entity that is currently editing the dataset, or null if there is no current editor |
current_editor_name | string or null | | That user’s name, for display |
is_published | boolean | true | Whether the dataset is published to viewers |
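API clients typically filter the catalog index client-side using these tuple attributes. A minimal sketch, assuming a parsed shoji:catalog like the GET response above; the `editable_datasets` helper is illustrative, not part of any Crunch library:

```python
def editable_datasets(catalog):
    """Return (url, name) pairs for active datasets the user can edit."""
    return [
        (url, tup["name"])
        for url, tup in catalog["index"].items()
        if tup["permissions"]["edit"] and not tup["archived"]
    ]

# A trimmed-down catalog, shaped like the response shown above
catalog = {
    "element": "shoji:catalog",
    "index": {
        "https://app.crunch.io/api/datasets/a598c7/": {
            "name": "The Wrath of Khan",
            "archived": False,
            "permissions": {"edit": True, "view": True},
        },
        "https://app.crunch.io/api/datasets/cc9161/": {
            "name": "The Voyage Home",
            "archived": False,
            "permissions": {"edit": False, "view": True},
        },
    },
}
print(editable_datasets(catalog))
```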
Drafts
A dataset marked is_published: false can only be accessed by dataset editors. It will still appear in the catalog for all users it is shared with, so API clients should take care to display it only to the appropriate users. Editors can change the is_published flag of a dataset from the catalog or directly on the dataset entity.
PATCH
PATCH /api/datasets/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 231
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/datasets/a598c7/": {
"description": "Stardate 8130.4"
}
}
}
HTTP/1.1 204 No Content
library(crunch)
login()
# Dataset objects contain information from
# the catalog tuple and the dataset entity.
# Editing attributes by <- assignment will
# PATCH or PUT the right payload to the
# right place--you don't have to think about
# catalogs and entities.
ds <- loadDataset("The Wrath of Khan")
description(ds)
## [1] ""
description(ds) <- "Stardate 8130.4"
description(ds)
## [1] "Stardate 8130.4"
# If you needed to touch HTTP more directly,
# you could:
payload <- list(
`https://app.crunch.io/api/datasets/a598c7/`=list(
description="Stardate 8130.4"
)
)
crPATCH("https://app.crunch.io/api/datasets/",
body=toJSON(payload))
PATCH /datasets/
Use PATCH to edit the “name”, “description”, “start_date”, “end_date”, or “archived” state of one or more datasets. A successful request returns a 204 response. The attributes changed will be seen by all users with access to this dataset; i.e., names, descriptions, and archived state are not merely attributes of your view of the data but of the datasets themselves.
Authorization is required: you must have “edit” privileges on the dataset(s) being modified, as shown in the “permissions” object in the catalog tuples. If you try to PATCH and are not authorized, you will receive a 403 response and no changes will be made.
The tuple attributes other than “name”, “description”, “start_date”, “end_date”, and “archived” cannot be modified here by PATCH. Attempting to modify other attributes, or including new attributes, will return a 400 response. Changing permissions is accomplished by PATCH on the permissions catalog, and changing the owner is a PATCH on the dataset entity. The “owner_name” and “current_editor_name” attributes are modifiable, assuming authorization, by PATCH on the associated user entity. Dataset “size” is a cached property of the data, changing only if the number of rows or columns in the dataset change. Dataset “id”, “modification_time” and “creation_time” are immutable/system generated.
When PATCHing, you may include only the keys in each tuple that are being modified, or you may send the complete tuple. As long as the keys that cannot be modified via PATCH here are not modified, the request will succeed.
Note that, unlike other Shoji Catalog resources, you cannot PATCH to add new datasets, nor can you PATCH a null tuple to delete them. Attempting either will return a 400 response. Creating datasets is allowed only by POST to the catalog, while deleting datasets is accomplished via a DELETE on the dataset entity.
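Because only a fixed set of attributes is PATCHable here, clients may want to validate a sparse payload before sending it. A minimal sketch; the `catalog_patch` helper and its whitelist live client-side and are illustrative, not part of any Crunch library:

```python
# Attributes the dataset catalog accepts via PATCH (others cause a 400)
PATCHABLE = {"name", "description", "start_date", "end_date", "archived"}

def catalog_patch(dataset_url, **changes):
    """Build a sparse shoji:catalog PATCH payload, rejecting bad keys."""
    bad = set(changes) - PATCHABLE
    if bad:
        raise ValueError("not PATCHable via the catalog: %s" % sorted(bad))
    return {"element": "shoji:catalog", "index": {dataset_url: changes}}

payload = catalog_patch(
    "https://app.crunch.io/api/datasets/a598c7/",
    description="Stardate 8130.4",
)
print(payload)
```

The resulting dict is what you would serialize and PATCH to /datasets/, as in the HTTP example above.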
Changing ownership
Any changes to the ownership of a dataset need to be done by the current editor.
Only the dataset owner can transfer ownership to another user. To do so, send a PATCH request with the new owner’s email or API URL. The new owner must have advanced permissions on Crunch.
Other editors of the dataset can change its ownership only to a project, and only if both they and the current owner of the dataset are editors on that project.
POST
POST /api/datasets/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 88
{
"element": "shoji:entity",
"body": {
"name": "Trouble with Tribbles",
"description": "Stardate 4523.3"
}
}
HTTP/1.1 201 Created
Location: https://app.crunch.io/api/datasets/223fd4/
library(crunch)
login()
# To create just the dataset entity, you can
ds <- createDataset("Trouble with Tribbles",
description="Stardate 4523.3")
# More likely, you'll have a data.frame or
# similar object in R, and you'll want to send
# it to Crunch. To do that,
df <- read.csv("~/tribbles.csv")
ds <- newDataset(df, name="Trouble with Tribbles",
description="Stardate 4523.3")
POST /datasets/
POST a JSON object to create a new Dataset; a 201 indicates success, and the returned Location header refers to the new Dataset resource.
The body must contain a “name”. You can also include a Crunch Table in a “table” key, as discussed in the Feature Guide. The full set of possible attributes to include when POSTing to create a new dataset entity are:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
description | string | Optional longer string |
archived | boolean | Whether the dataset should be hidden from most views; default: false |
owner | URL | Provide a project URL to set the owner to that project; if omitted, the authenticated user will be the owner |
notes | string | Blank if omitted. Optional notes for the dataset |
start_date | date | ISO-8601 formatted date with day resolution |
end_date | date | ISO-8601 formatted date with day resolution |
streaming | string | Only “streaming”, “finished” and “no” available values to define if a dataset will accept streaming data or not |
is_published | boolean | If false, only project editors will have access to this dataset |
weight_variables | array | Contains aliases of weight variables to start this dataset with; variables must be numeric type. |
table | object | Metadata definition for the variables in the dataset |
maintainer | URL | User URL that will be the maintainer of this dataset in case of system notifications; if omitted, the authenticated user will be the maintainer |
settings | object | Settings object containing weight , viewers_can_export , viewers_can_change_weight , viewers_can_share , dashboard_deck , and/or min_base_size attributes. If a “weight” is specified, it will be automatically added to “weight_variables” if not already specified there. |
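Assembling the POST body is straightforward: only “name” is required, and any of the attributes in the table above may be added. A minimal sketch; the `new_dataset_entity` helper name is illustrative:

```python
def new_dataset_entity(name, **optional):
    """Build the shoji:entity body for POST /datasets/."""
    if not name:
        raise ValueError("a dataset name is required")
    body = {"name": name}
    body.update(optional)  # e.g. description, is_published, start_date
    return {"element": "shoji:entity", "body": body}

entity = new_dataset_entity(
    "Trouble with Tribbles",
    description="Stardate 4523.3",
    is_published=False,
)
print(entity)
```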
Other catalogs
In addition to /datasets/, there are a few other catalogs of datasets in the API:
Team datasets
/teams/{team_id}/datasets/
A Shoji Catalog of datasets that have been shared with this team. These datasets are not included in the primary dataset catalog. See teams for more.
Project datasets
/projects/{project_id}/datasets/
A Shoji Catalog of datasets that belong to this project. These datasets are not included in the primary dataset catalog. See projects for more.
Filter datasets by name
/datasets/by_name/{dataset_name}/
The by_name catalog returns (on GET) a Shoji Catalog that is a subset of /datasets/ where the dataset name matches the “dataset_name” value. Matches are case sensitive.
Verbs other than GET are not supported on this subcatalog; PATCH and POST the primary dataset catalog instead.
Dataset order
The dataset order allows each user to organize the order in which their datasets are presented.
This endpoint returns a shoji:order. Like all shoji orders, it may not contain all available datasets. The catalog should always be the authoritative source of available datasets.
Any dataset not present on the order graph should be considered to be at the bottom of the root list in arbitrary order.
GET
GET /datasets/order/
{
"element": "shoji:order",
"self": "/datasets/order/",
"graph": [
"dataset_url",
{"group": [
"dataset_url"
]}
]
}
PUT
Receives a complete shoji:order payload and replaces the existing graph with the new one. The payload cannot contain dataset references that are not in the dataset catalog; if it does, the API will return a 400 response. Standard shoji:order graph validation applies.
PATCH
Same semantics as PUT
Entity
GET
GET /datasets/{dataset_id}/
URL Parameters
Parameter | Description |
---|---|
dataset_id | The id of the dataset |
Dataset attributes
Name | Type | Default | Description |
---|---|---|---|
name | string | | Required. The name of the dataset |
description | string | “” | A longer description of the dataset |
notes | string | “” | Additional information you want to associate with this dataset |
id | string | | The dataset’s id |
archived | bool | false | Whether the dataset is “archived” or active |
permissions | object | {"edit": false} | Authorizations on this dataset; see Permissions |
owner_id | URL | | URL of the user entity of the dataset’s owner |
owner_name | string | “” | That user’s name, for display |
size | object | {"rows": 0, "unfiltered_rows": 0, "columns": 0} | Dimensions of the dataset |
creation_time | ISO-8601 string | | Datetime at which the dataset was created in Crunch |
start_date | ISO-8601 string | | Date/time to which the data in the dataset corresponds |
end_date | ISO-8601 string | | End date/time of the dataset’s data, defining a start_date:end_date range |
streaming | string | | Possible values are “no”, “finished”, and “streaming”, indicating whether the dataset is streamed |
current_editor | URL or null | | URL of the user entity that is currently editing the dataset, or null if there is no current editor |
current_editor_name | string or null | | That user’s name, for display |
maintainer | URL | | The URL of the dataset maintainer; will always point to a user |
app_settings | object | {} | A place for API clients to store values they need per dataset; it is recommended that clients namespace their keys to avoid collisions |
Dataset catalogs
A dataset contains a number of catalog resources that contain collections of related objects. They are available under the catalogs attribute of the dataset Shoji entity.
{
"batches": "http://app.crunch.io/api/datasets/c5d751/batches/",
"joins": "http://app.crunch.io/api/datasets/c5d751/joins/",
"parent": "http://app.crunch.io/api/datasets/",
"variables": "http://app.crunch.io/api/datasets/c5d751/variables/",
"actions": "http://app.crunch.io/api/datasets/c5d751/actions/",
"savepoints": "http://app.crunch.io/api/datasets/c5d751/savepoints/",
"filters": "http://app.crunch.io/api/datasets/c5d751/filters/",
"multitables": "http://app.crunch.io/api/datasets/c5d751/multitables/",
"comparisons": "http://app.crunch.io/api/datasets/c5d751/comparisons/",
"forks": "http://app.crunch.io/api/datasets/c5d751/forks/",
"decks": "http://app.crunch.io/api/datasets/c5d751/decks/",
"permissions": "http://app.crunch.io/api/datasets/c5d751/permissions/"
}
Catalog name | Resource |
---|---|
batches | Returns all the batches (successful and failed) used for this dataset. See Batches. |
joins | Contains the list of all datasets joined to the current dataset. See Joins. |
parent | Indicates the catalog where this dataset is found (project or main dataset catalog) |
variables | Catalog of all public variables of this dataset. See Variables. |
actions | All actions executed on this dataset |
savepoints | Lists the saved versions for this dataset. See Versions. |
filters | Makes available the public and user-created filters. See Filters. |
multitables | Similar to filters, displays all available multitables. See Multitables |
comparisons | Contains all available comparisons. See Comparisons. |
forks | Returns all the forks created from this dataset |
decks | The list of all decks on this dataset for the authenticated user |
permissions | Returns the list of all users and teams with access to this dataset. See Permissions. |
PATCH
PATCH /datasets/{dataset_id}/
See above about PATCHing the dataset catalog for the attributes duplicated on the entity and the catalog. You may PATCH those attributes on the entity, but you are encouraged to PATCH the catalog instead. Of the attributes that appear on the entity but not the catalog, “notes” is modifiable by PATCH here.
A successful PATCH request returns a 204 response. The attributes changed will be seen by all users with access to this dataset; i.e., names, descriptions, and archived state are not merely attributes of your view of the data but of the datasets themselves.
Authorization is required: you must have “edit” privileges on this dataset. If you try to PATCH and are not authorized, you will receive a 403 response and no changes will be made. If you have edit permissions but are not the current editor of this dataset, PATCH requests of anything other than “current_editor” will respond with 409 status. You will need first to PATCH to make yourself the current editor and then proceed to make the desired changes.
When PATCHing, you may include only the keys that are being modified, or you may send the complete entity. As long as the keys that cannot be modified via PATCH here are not modified, the request will succeed.
Changing dataset ownership
If you are the current editor of a dataset you can change its owner by PATCHing the owner attribute with the URL of the new owner.
Only users, teams, or projects can be set as owners of a dataset.
- Users: the new owner must be an advanced user to own a dataset.
- Teams: the authenticated user must be a member of the team.
- Projects: the authenticated user must have edit permissions on the project.
Copying over from another dataset
To copy work from another dataset into the current one, issue a PATCH request with a copy_from attribute pointing to the URL of the source dataset.
{
"element": "shoji:entity",
"body": {
"copy_from": "https://app.crunch.io/api/datasets/1234/"
}
}
All dataset attributes, permissions, derivations, private variables, etc. will be brought over to the current dataset:
- Decks
- Filters
- Multitables
- Comparisons
- Personal variable order
- Derived variables
- Personal variables
- Permissions
The response will be a shoji:entity whose body is an object with keys for each entity type that has not been copied. For variables, these entries display the name, alias, and owner (if personal). All the URLs refer to entities on the source dataset.
{
"element": "shoji:entity",
"body": {
"variables": {
"https://app.crunch.io/dataset/1234/variables/abc/": {
"name": "Variable name",
"alias": "Variable alias",
"owner_url": "https://app.crunch.io/users/qwe/",
"owner_name": "Angus MacGyver"
},
"https://app.crunch.io/dataset/1234/variables/cde/": {
"name": "Variable name",
"alias": "Variable alias",
"owner_url": null,
"owner_name": null
}
},
"filters": {
"https://app.crunch.io/filters/abcd/": {
"name": "filter name",
"owner_url": "http://app.crunch.io/users/qwe/"
},
"http://app.crunch.io/filters/cdef/": {
"name": "filter name",
"owner_url": "https://app.crunch.io/users/qwe/"
}
}
}
}
It is possible to copy information for a single user from another dataset; the payload will need an extra user key, which can contain either a user URL or a user email:
{
"element": "shoji:entity",
"body": {
"copy_from": "https://app.crunch.io/api/datasets/1234/",
"user": "https://app.crunch.io/api/users/abcd/"
}
}
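Building either form of this payload is a one-liner. A minimal sketch; the `copy_from_payload` helper is illustrative, not part of any Crunch library:

```python
def copy_from_payload(source_dataset_url, user=None):
    """Build the PATCH body for copying work from another dataset.

    `user`, if given, may be a user URL or a user email and scopes the
    copy to that single user's artifacts.
    """
    body = {"copy_from": source_dataset_url}
    if user is not None:
        body["user"] = user
    return {"element": "shoji:entity", "body": body}

print(copy_from_payload(
    "https://app.crunch.io/api/datasets/1234/",
    user="fake.user@example.com",
))
```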
DELETE
DELETE /datasets/{dataset_id}/
With sufficient authorization, a successful DELETE request removes the dataset from the Crunch system and responds with 204 status.
Views
Applied filters
Cube
/datasets/{id}/cube/?q
See Multidimensional Analysis.
Export
GET `/datasets/{id}/export/` HTTP/1.1
Host: app.crunch.io
GET returns a Shoji View of available dataset export formats.
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/223fd4/export/",
"views": {
"spss": "https://app.crunch.io/api/datasets/223fd4/export/spss/",
"csv": "https://app.crunch.io/api/datasets/223fd4/export/csv/"
}
}
A POST request on any of the export views will return 202 status with a Progress response in the body and a Location header pointing to the location of the exported file to be downloaded. Poll the progress URL for status on the completion of the export. When complete, GET the Location URL from the original response to download the file.
POST `/api/datasets/f2364cc66e604d63a3be3e8811fc902f/export/spss/` HTTP/1.1
{
"where": {
"function": "select",
"args":[
{
"map": {
"https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000000/": {"variable": "https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000000/"},
"https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000001/": {"variable": "https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000001/"},
"https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000002/": {"variable": "https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000002/"}
}
}
]
}
}
HTTP/1.1 202 Accepted
Content-Length: 176
Access-Control-Allow-Methods: OPTIONS, AUTH, POST, GET, HEAD, PUT, PATCH, DELETE
Access-Control-Expose-Headers: Allow, Location, Expires
Content-Encoding: gzip
Location: https://crunch-io.s3.amazonaws.com/exports/dataset_exports/f2364cc66e604d63a3be3e8811fc902f/My_Dataset.sav?Signature=sOmeSigNaTurE%3D&Expires=1470265052&AWSAccessKeyId=SOMEKEY
To export a subset of the dataset, instead perform a POST request and include a JSON body with an optional “filter” expression for the rows and a “where” attribute to specify variables to include.
Attribute | Description | Example |
---|---|---|
filter | A Crunch filter expression defining a filter for the given export | {"function": "==", "args": [{"variable": "000000"}, {"value": 1}]} |
where | A Crunch expression defining which variables to export. Refer to Frame functions for the functions available here. | {"function": "select", "args": [{"map": {"000000": {"variable": "000000"}}}]} |
options | An object of extra settings, which may be format-specific. See below. | {"use_category_ids": true} |
See “Expressions” for more on Crunch expressions.
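The “where” map in these requests repeats each variable URL as both key and value, which is easy to generate programmatically. A minimal sketch; the helper and the variable URL are illustrative, not part of any Crunch library:

```python
def export_body(variable_urls, filter_expr=None, options=None):
    """Build an export POST body selecting specific variables."""
    body = {
        "where": {
            "function": "select",
            "args": [
                {"map": {url: {"variable": url} for url in variable_urls}}
            ],
        }
    }
    if filter_expr is not None:
        body["filter"] = filter_expr  # a Crunch filter expression
    if options is not None:
        body["options"] = options  # format-specific settings
    return body

# Hypothetical variable URL, for illustration only
var = "https://app.crunch.io/api/datasets/1234/variables/000000/"
body = export_body([var], options={"use_category_ids": True})
print(body)
```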
The following rules apply for all formats:
- The dataset’s exclusion filter will be applied; however, the user’s personal “applied filters” are not, unless they are explicitly included in the request.
- Hidden/discarded variables are not exported. If editors use a where clause, it will be evaluated over all non-hidden variables.
- Personal (private) variables are not exported unless requested, in which case only the current user’s personal variables will be exported.
- Variables (columns) will be ordered in a flattened version of the dataset’s hierarchical order.
- Derived variables will be exported with their values, without their functional links.
Some format-specific properties and options:
Format | Attribute | Description | Default |
---|---|---|---|
csv | use_category_ids | Export categorical data as its numeric IDs instead of category names? | false |
csv | missing_values | If present, will use the specified string to indicate missing values. If omitted, will use the missing reason strings | omitted |
csv | header_field | Use the variable’s alias/name/description in the CSV header row, or null for no header row | “alias” |
spss | var_label_field | Use the variable’s name/description as SPSS variable label | “description” |
spss | prefix_subvariables | Prefix subvariable labels with the parent array variable’s label? | false |
all | include_personal | Include the user’s personal variables in the exported file? | false |
SPSS
Categorical-array and multiple-response variables will be exported as “mrsets”, as supported by SPSS. If the prefix_subvariables option is set to true, the subvariables’ labels will be prefixed with the parent array variable’s label.
To pick which variable field to use as the label of the SPSS variables, use the var_label_field key in the options attribute of the POST body. The only valid fields are description and name.
CSV
By default, categorical variable values are exported using the category name, and missing values use their corresponding missing reason string for all variables. If the missing_values export option is specified, all missing values in all columns will use that string instead of the reason.
To control the output of the header row, use the header_field option. Valid values are:
- alias (default)
- name
- description
- null (the resulting CSV will have no header row)
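Since invalid option values are easy to send by accident, clients may want to validate the CSV options before POSTing. A minimal sketch; the `csv_options` helper lives client-side and is illustrative, not part of any Crunch library:

```python
# The valid header_field values, per the list above (None maps to null)
HEADER_FIELDS = {"alias", "name", "description", None}

def csv_options(header_field="alias", missing_values=None,
                use_category_ids=False):
    """Assemble and validate the "options" object for a CSV export."""
    if header_field not in HEADER_FIELDS:
        raise ValueError("invalid header_field: %r" % (header_field,))
    opts = {"header_field": header_field,
            "use_category_ids": use_category_ids}
    if missing_values is not None:
        opts["missing_values"] = missing_values
    return opts

print(csv_options(header_field=None, missing_values="NA"))
```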
Match
The match endpoint provides a list of matches indicating which variables match amongst the datasets provided. To use it, send a POST request with an ordered list of the datasets you would like to match. Include the “minimum_matches” parameter in the body if you would like to limit the output to matches spanning at least that many datasets; the default minimum_matches is 2. Currently, only the alias is used to match variables to one another.
The result of a match request can be one of two things. If the same match has been completed previously, the API will return a 201 status code and a Location header pointing to the existing results. Otherwise, the endpoint will return a 202 status code with a Progress result that provides status information while the match is completed. Either way, the Location header will be set to the URI of the statically generated comparison result, which can be accessed when the match is complete.
The results are a Shoji Entity with a matches attribute. The matches are listed in order of the number of variables matched. Each variable inside a match contains the dataset, the variable id, and the confidence that the variable matches the others in the list. The order of the variables inside each match follows the order of the datasets provided. The first variable also contains some additional information to allow previewing a match. To retrieve complete details about all the matching variables, call the endpoints listed in the metadata field, which provide all the matching metadata chunked by groups of matches.
POST /datasets/match/ HTTP/1.1
{
"element": "shoji:entity",
"body": {
"datasets": [
"http://app.crunch.io/api/datasets/8274bf/",
"http://app.crunch.io/api/datasets/699a33/",
"http://app.crunch.io/api/datasets/8274bf/",
"http://app.crunch.io/api/datasets/699a33/"
],
"minimum_matches": 3
}
}
Response:
201 Created
Host: app.crunch.io
Location: http://app.crunch.io/api/datasets/matches/394d9e/
GET /api/datasets/matches/394d9e/
{
"element": "shoji:order",
"self": "http://app.crunch.io:50976/api/datasets/match/3c7df5/",
"body": {
"matches": [
[
{
"alias": "SomeVariable",
"confidence": 1,
"name": "Some Variable",
"variable": "521b5c014e1e474fa5173d95000bd6e9",
"desc": "This is some variable",
"dataset": "8274bfb842d645728a49634414b999c4"
},
{
"variable": "3fa1d3358888474eb949ae586e80f9a4",
"confidence": 1,
"dataset": "699a3315c3f347d4923257380938f9b9"
}
],
[
{
"alias": "AnotherVariableThatHasMatches",
"confidence": 1,
"name": "Another Variable",
"variable": "234e8e76d0e1a32667ab33bc30a9900",
"desc": "This is another variable",
"dataset": "8274bfb842d645728a49634414b999c4"
},
{
"variable": "9373729ac990b009e0a90dca99092789",
"confidence": 1,
"dataset": "699a3315c3f347d4923257380938f9b9"
}
],
...
],
"metadata": [
"http://app.crunch.io/api/datasets/match/3c7df5/0-500/"
]
}
}
Summary
/datasets/{id}/summary/{?filter}
Query Parameters
Parameter | Description |
---|---|
filter | A Crunch filter expression |
GET returns a Shoji View with summary information about this dataset containing its number of rows (weighted and unweighted, with and without your applied filters), as well as the number of variables and columns. The column count will differ from the variable count when derived and array variables are present; these variable types don’t necessarily have their own columns of data behind them. The column count is useful for estimating load time and file size when exporting.
If a filter is included, the “filtered” counts will be with respect to that expression. If omitted, your applied filters will be used.
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/223fd4/summary/",
"value": {
"unweighted": {
"filtered": 2000,
"total": 2000
},
"weighted": {
"filtered": 2000.0,
"total": 2000.0
},
"variables": 529,
"columns": 530
}
}
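The filter query parameter is a JSON-encoded Crunch expression, so it needs URL-encoding. A minimal sketch of building the summary URL; the `summary_url` helper is illustrative, not part of any Crunch library:

```python
import json
import urllib.parse

def summary_url(dataset_url, filter_expr=None):
    """Build /summary/ URL, JSON-encoding an optional filter expression."""
    url = dataset_url.rstrip("/") + "/summary/"
    if filter_expr is not None:
        url += "?" + urllib.parse.urlencode({"filter": json.dumps(filter_expr)})
    return url

expr = {"function": "==", "args": [{"variable": "000000"}, {"value": 1}]}
url = summary_url("https://app.crunch.io/api/datasets/223fd4/", expr)
print(url)
```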
Fragments
Table
State
Exclusion
/datasets/{id}/exclusion/
Exclusion filters allow you to drop rows of data without permanently deleting them.
GET on this resource returns a Shoji Entity with a filter “expression” attribute in its body. Rows that match the filter expression will be excluded from all views of the data.
PATCH the “expression” attribute to modify it. An empty “expression” object, like {"body": {"expression": {}}}, is equivalent to “no exclusion”, i.e. no rows are dropped.
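Setting and clearing the exclusion can be wrapped in small payload builders. A minimal sketch; the helpers and the "000000" variable id are illustrative, not part of any Crunch library:

```python
def exclusion_payload(expression):
    """Build the PATCH body for the exclusion resource."""
    return {"element": "shoji:entity", "body": {"expression": expression}}

def clear_exclusion_payload():
    # An empty expression object means "no exclusion": no rows dropped
    return exclusion_payload({})

# Exclude rows where a hypothetical variable equals 99
drop_flagged = exclusion_payload(
    {"function": "==", "args": [{"variable": "000000"}, {"value": 99}]}
)
print(drop_flagged)
```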
Stream
Stream lock
When a dataset is configured to receive streaming data, the /stream/ endpoint will accept POST requests to append new rows to the streaming queue.
A dataset is able to receive streaming data while its streaming attribute is set to “streaming”.
While a dataset is receiving streams, any other kind of append is disabled and will return a 409 response if attempted; only streaming data is allowed.
The following operations are forbidden on a dataset while it is accepting streaming rows, in order to protect the schema:
- Deleting public non derived variables
- Casting variables (Includes changing resolution on datetime variables)
- Changing variable aliases
- Deleting categories from categorical variables
- Changing category IDs
- Removing subvariables from arrays
- Merging forks
- Reverting to savepoints
- Modifying the Primary Key, once it has been set
To change the streaming configuration of the dataset, PATCH the entity’s streaming attribute to “streaming”, “finished”, or “no”, according to the following table:
Value | Allows schema changes | Accepts streaming rows |
---|---|---|
streaming | No | Yes |
finished | No | No |
no | Yes | No |
Note that only the dataset maintainer is allowed to modify the streaming attribute.
Sending rows
/datasets/{id}/stream/
Stream allows for sending data to a dataset as it is gathered.
GET on this resource returns a Shoji Entity with two attributes in its body:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/stream/",
"description": "A stream for this Dataset. Each stream acts as a write buffer, from which Sources are periodically made and appended as Batches to the owning Dataset.",
"body":{
"pending_messages": 1,
"received_messages": 8
}
}
Attribute | Description |
---|---|
pending_messages | The number of messages the stream has that have yet to be appended to the dataset (note: a message might contain more than one row, each POST that is made to /datasets/{id}/stream/ will result in a single message). |
received_messages | The total number of messages that this stream has received. |
POST to this endpoint to add rows. The payload should be a multi-line string where each line is a JSON object giving the value for each variable, keyed by alias.
{"alias1": 1, "alias2": "value", "alias3": 0}
{"alias1": 99, "alias2": "other", "alias3": 2}
{"alias1": 10, "alias2": "empty", "alias3": 1}
Settings
/datasets/{id}/settings/
The dataset settings allow editors to store dataset-wide permissions and configurations.
GET will always return all the available settings a dataset can have, with their default values.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/settings/",
"body": {
"viewers_can_export": false,
"viewers_can_change_weight": false,
"viewers_can_share": true,
"weight": "https://app.crunch.io/api/datasets/223fd4/variables/123456/"
}
}
To make changes, clients should PATCH the settings they wish to change with new values. Additional settings are not allowed; the server will return a 400 response if they are included.
Setting | Description |
---|---|
viewers_can_export | When false, only editor can export; else, all users with view access can export the data |
viewers_can_change_weight | When true, all users with access can set their own personal weight; else, the editor-configured weight will be applied to all users, without the option to change it |
viewers_can_share | When true, all users with access can share the dataset with other users or teams; defaults to true |
weight | Default initial weight for all new users on this dataset, and when viewers_can_change_weight is false, this variable will be the always-applied weight for all viewers of the dataset. |
dashboard_deck | When set, points to a deck that will become publicly visible and be used as dashboard by the web client |
Preferences
/datasets/{id}/preferences/
The dataset preferences provide API clients with a per-user key/value store for settings or customizations.
By default, dataset preferences start out with only a weight key set to null. Clients can PATCH to add additional attributes.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/preferences/",
"body": {
"weight": null
}
}
To delete attributes from the preferences resource, PATCH them with null.
Preferences are unordered; clients should not assume any ordering.
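The merge semantics of a preferences PATCH (new keys are added, existing keys overwritten, null deletes a key) can be modeled locally. A minimal sketch for illustration, with hypothetical namespaced keys; the server applies the same logic:

```python
def apply_preferences_patch(current, patch):
    """Mimic how a preferences PATCH merges into the stored body:
    None (JSON null) deletes a key, anything else sets it."""
    merged = dict(current)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged

prefs = {"weight": None, "myclient:color": "blue"}
patched = apply_preferences_patch(
    prefs, {"myclient:color": None, "myclient:page": 2}
)
print(patched)
```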
Weight
If the dataset’s viewers_can_change_weight setting is false, all users’ preferred weight will be set to the dataset-wide configured weight, with no option to change it. Attempts to modify it will return a 403 response.
Primary key
/datasets/{dataset_id}/pk/
URL Parameters
Parameter | Description |
---|---|
dataset_id | The id of the dataset |
Setting a primary key on a dataset causes updates (particularly streamed updates) mentioning existing rows to be updated instead of new rows being inserted. A primary key can only be set on a variable that is type “numeric” or “text” and that has no duplicate or missing values, and it can only be set after that variable has been added to the dataset.
GET
GET /api/datasets/{dataset_id}/pk/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type:application/json;charset=utf-8
{
"element": "shoji:entity",
"body": {
"pk": ["https://app.crunch.io/api/datasets/{dataset_id}/variables/000001/"]
}
}
>>> # "ds" is dataset via pycrunch
>>> ds.pk.body.pk
['https://app.crunch.io/api/datasets/{dataset_id}/variables/000001/']
GET /datasets/{dataset_id}/pk/
GET on this resource returns a Shoji Entity. It contains one body key: pk, which is an array. The “pk” member indicates the URLs of the variables in the dataset which comprise the primary key. If there is no primary key for this dataset, the pk value will be [].
POST
POST /api/datasets/{dataset_id}/pk/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 77
{"pk": ["https://app.crunch.io/api/datasets/{dataset_id}/variables/000001/"]}
--------
204 No Content
>>> # "ds" is dataset via pycrunch
>>> ds.pk.post({'pk':['https://app.crunch.io/api/datasets/{dataset_id}/variables/000001/']})
>>> ds.pk.body.pk
['https://app.crunch.io/api/datasets/{dataset_id}/variables/000001/']
POST /datasets/{dataset_id}/pk/
When POSTing, set the body to a JSON object containing the key “pk” to modify the primary key. The “pk” key should be a list containing zero or more variable URLs. The variables referenced must be either text or numeric type and must have no duplicate or missing values. Setting pk to [] is equivalent to deleting the primary key for a dataset.
DELETE
DELETE /api/datasets/{dataset_id}/pk/ HTTP/1.1
Host: app.crunch.io
--------
204 No Content
>>> # "ds" is dataset via pycrunch
>>> ds.pk.delete()
>>> ds.pk.body.pk
[]
DELETE /datasets/{dataset_id}/pk/
DELETE the “pk” resource to delete the primary key for this dataset. Upon success, this method returns no body and a 204 response code.
Catalogs
Users
/datasets/{dataset_id}/users/
This catalog exposes the full list of users that have access to the dataset via the different sources:
- When the dataset belongs to a project, as project members
- Members of teams that are shared with the dataset
- Direct shares to specific users
This endpoint only supports GET. The response will be a catalog with one member per user; each tuple indicates the coalesced permissions and information about the type of access:
Attribute | Description |
---|---|
name | Name of the user |
email | Email of the user |
teams | URLs of teams with dataset access that this user belongs to |
last_accessed | Timestamp of the user’s last access to the dataset via the web app |
project_member | Whether the user has access as a member of the project the dataset belongs to |
coalesced_permissions | Permissions this user has on this dataset, combining all sources |
{
"https://app.crunch.io/api/users/411aa32a075b4b57bf25a4ace1baf920/": {
"name": "Jean-Luc Picard",
"last_accessed": "2017-02-25T00:00:00+00:00",
"teams": [
"https://app.crunch.io/api/teams/c6dbeb7c57e34dd08ab2316f3363e895/",
"https://app.crunch.io/api/teams/d0abf4e933fc44e38190247ae4d593f9/"
],
"project_member": false,
"email": "jeanluc@crunch.io",
"coalesced_permissions": {
"edit": true,
"change_permissions": true,
"view": true
}
},
"https://app.crunch.io/api/users/60f18c51699b4ba992721197743286a4/": {
"name": "William Riker",
"last_accessed": null,
"teams": [
"https://app.crunch.io/api/teams/d0abf4e933fc44e38190247ae4d593f9/"
],
"project_member": false,
"email": "number1@crunch.io",
"coalesced_permissions": {
"edit": false,
"change_permissions": false,
"view": true
}
},
"https://app.crunch.io/api/users/80d89e4e876344ecb46c528a910e3877/": {
"name": "Geordi La Forge",
"last_accessed": "2017-01-31T00:00:00+00:00",
"teams": [
"https://app.crunch.io/api/teams/c6dbeb7c57e34dd08ab2316f3363e895/",
"https://app.crunch.io/api/teams/d0abf4e933fc44e38190247ae4d593f9/"
],
"project_member": true,
"email": "geordilf@crunch.io",
"coalesced_permissions": {
"edit": true,
"change_permissions": true,
"view": true
}
}
}
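The server computes coalesced_permissions for you; a plausible sketch of the same idea is to OR each permission flag across all access sources (project membership, teams, direct shares). This is illustrative only, not the server's actual implementation:

```python
def coalesce_permissions(*sources):
    """OR permission flags across access sources; a flag is granted
    if any source grants it."""
    flags = ("view", "edit", "change_permissions")
    return {f: any(src.get(f, False) for src in sources) for f in flags}

# A team grant and a direct share combine into the effective permissions:
team_grant = {"view": True}
direct_share = {"view": True, "edit": True}
merged = coalesce_permissions(team_grant, direct_share)
```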
Actions
Batches
/datasets/{dataset_id}/batches/
See Batches and the feature guides for importing and appending.
Decks
/datasets/{dataset_id}/decks/
See Decks.
Comparisons
Filters
/datasets/{dataset_id}/filters/
See Filters.
Forks
Joins
Multitables
Permissions
/datasets/{dataset_id}/permissions/
See Permissions.
Savepoints
/datasets/{dataset_id}/savepoints/
See Versions.
Variables
/datasets/{dataset_id}/variables/
See Variables.
Weight variables
Decks
Decks allow you to store analyses for future reference or for export. Decks correspond to a single dataset, and they are personal to each user unless they have been set as “public”. Each deck contains a list of slides, and each slide contains analyses.
Catalog
/datasets/{id}/decks/
GET
A GET request on the catalog endpoint will return all the decks available for this dataset for the current user. This includes decks created by the user, as well as public decks shared with all users of the dataset.
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/",
"index": {
"https://app.crunch.io/api/datasets/cc9161/decks/4fa25/": {
"name": "my new deck",
"creation_time": "1986-11-26T12:05:00",
"id": "4fa25",
"is_public": false,
"owner_id": "https://app.crunch.io/api/users/abcd3/",
"owner_name": "Real Person",
"team": null
},
"https://app.crunch.io/api/datasets/cc9161/decks/2b53e/": {
"name": "Default deck",
"creation_time": "1987-10-15T11:45:00",
"id": "2b53e",
"is_public": true,
"owner_id": "https://app.crunch.io/api/users/4cba5/",
"owner_name": "Other Person",
"team": "https://app.crunch.io/api/teams/58acf7/"
}
},
"order": "https://app.crunch.io/api/datasets/223fd4/decks/order/"
}
The decks catalog tuples contain the following keys:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
creation_time | timestamp | Time when this deck was created |
id | string | Global unique identifier for this deck |
is_public | boolean | Indicates whether this is a public deck or not |
owner_id | url | Points to the owner of this deck |
owner_name | string | Name of the owner of the deck (referenced by owner_id) |
team | url | If the deck is shared through a team, points to that team; null by default |
To determine if a deck belongs to the current user, check the owner_id attribute.
POST
POST a shoji:entity to create a new deck for this dataset. The only required body attribute is “name”; other attributes are optional.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/",
"body": {
"name": "my new deck",
"description": "This deck will contain analyses for a variable",
"is_public": false,
"team": "https://app.crunch.io/api/teams/58acf7/"
}
}
HTTP/1.1 201 Created
Location: https://app.crunch.io/api/datasets/223fd4/decks/2b3c5e/
The shoji:entity POSTed accepts the following keys:
Name | Type | Required | Description |
---|---|---|---|
name | string | Yes | Human-friendly string identifier |
description | string | No | Optional longer string with additional notes |
is_public | boolean | No | If true, all users with view access to this dataset will be able to read and export this deck and its analyses; if false (the default), the deck remains private to the current user |
team | url | No | If set, all members of this team will have read-only access to this deck; otherwise the deck is private to its owner or public to the dataset |
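As a sketch, the POST body above can be assembled with a small helper; only name is required, and the team URL here is the one from the example:

```python
def deck_payload(name, description=None, is_public=False, team=None):
    """Build the shoji:entity body for creating a deck.
    Only "name" is required; other attributes are optional."""
    body = {"name": name, "is_public": is_public}
    if description is not None:
        body["description"] = description
    if team is not None:
        body["team"] = team
    return {"element": "shoji:entity", "body": body}

payload = deck_payload("my new deck",
                       team="https://app.crunch.io/api/teams/58acf7/")
```

POSTing this payload to the decks catalog would yield a 201 response with the new deck's URL in the Location header.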
PATCH
It is possible to bulk-edit many decks at once by PATCHing a shoji:catalog to the decks’ catalog.
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/datasets/cc9161/decks/4fa25/": {
"name": "Renamed deck",
"is_public": true
}
},
"order": "https://app.crunch.io/api/datasets/223fd4/decks/order/"
}
The following attributes are editable via PATCHing this resource:
- name
- description
- is_public
For decks that the current user owns, “name”, “description”, and “is_public” are editable. Only the deck owner can edit these attributes, even if the deck is public. Other deck attributes are not editable, and the server will respond with a 400 status if a request tries to change them.
On success, the server will reply with a 204 response.
Entity
/datasets/{id}/decks/{id}/
GET
GET a deck entity resource to return a shoji:entity with all of its attributes:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/223fd4/",
"body": {
"name": "Presentation deck",
"id": "223fd4",
"creation_time": "1987-10-15T11:45:00",
"description": "Explanation about the deck",
"is_public": false,
"owner_id": "https://app.crunch.io/api/users/abcd3/",
"owner_name": "Real Person",
"team": "https://app.crunch.io/api/teams/58acf7/"
}
}
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
id | string | Global unique identifier for this deck |
creation_time | timestamp | Time when this deck was created |
description | string | Longer annotations for this deck |
is_public | boolean | Indicates whether this is a public deck or not |
owner_id | url | Points to the owner of this deck |
owner_name | string | Name of the owner of the deck (referenced by owner_id) |
team | url | If the deck is shared through a team, points to that team; null by default |
PATCH
To edit a deck, PATCH it with a shoji:entity. The server will return a 204 response on success or 400 if the request is invalid.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/223fd4/",
"body": {
"name": "Presentation deck",
"id": "223fd4",
"creation_time": "1987-10-15T11:45:00",
"description": "Explanation about the deck",
"team": "https://app.crunch.io/api/teams/58acf7/"
}
}
HTTP/1.1 204 No Content
For deck entities that the current user owns, “name”, “description”, “team”, and “is_public” are editable. Other deck attributes are not editable.
DELETE
To delete a deck, DELETE the deck’s entity URL. On success, the server returns a 204 response.
Order
/datasets/{id}/decks/order/
The deck order resource allows the user to arrange how API clients, such as the web application, will present the deck catalog. The deck order contains all decks that are visible to the current user, both personal and public. Unlike many other shoji:order resources, this order does not allow grouping or nesting: it will always be a flat list of deck URLs.
GET
Returns a Shoji Order response.
{
"element": "shoji:order",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/order/",
"graph": [
"https://app.crunch.io/api/datasets/223fd4/decks/1/",
"https://app.crunch.io/api/datasets/223fd4/decks/2/",
"https://app.crunch.io/api/datasets/223fd4/decks/3/"
]
}
PATCH
PATCH the order resource to change the order of the decks. A 204 response indicates success.
If the PATCH payload contains only a subset of available decks, those decks not referenced will be appended at the bottom of the top level graph in arbitrary order.
{
"element": "shoji:order",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/order/",
"graph": [
"https://app.crunch.io/api/datasets/223fd4/decks/1/",
"https://app.crunch.io/api/datasets/223fd4/decks/3/"
]
}
Including invalid URLs, such as URLs of decks that are not present in the catalog, will return a 400 response from the server.
The deck order should always be a flat list of URLs; nesting or grouping is not supported by the web application. The server will return a 400 response if the order supplied in the PATCH request contains nesting.
Slides
Each deck contains a catalog of slides into which analyses are saved.
Catalog
/datasets/{id}/decks/{deck_id}/slides/
GET
Returns a shoji:catalog with the slides for this deck.
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/123/decks/123/slides/",
"orders": {
"flat": "https://app.crunch.io/api/datasets/123/decks/123/slides/flat/"
},
"specification": "https://app.crunch.io/api/specifications/slides/",
"description": "A catalog of the Slides in this Deck",
"index": {
"https://app.crunch.io/api/datasets/123/decks/123/slides/123/": {
"analysis_url": "https://app.crunch.io/api/datasets/123/decks/123/slides/123/analyses/123/",
"subtitle": "z",
"display": {
"value": "table"
},
"title": "slide 1"
},
"https://app.crunch.io/api/datasets/123/decks/123/slides/456/": {
"analysis_url": "https://app.crunch.io/api/datasets/123/decks/123/slides/456/analyses/456/",
"subtitle": "",
"display": {
"value": "table"
},
"title": "slide 2"
}
}
}
Each tuple on the slides catalog contains the following keys:
Name | Type | Description |
---|---|---|
analysis_url | url | Points to the first (and typically only) analysis contained on this slide |
title | string | Optional title for the slide |
subtitle | string | Optional subtitle for the slide |
display | object | Stores settings used to load the analysis |
POST
To create a new slide, POST a slide body, wrapped as a shoji:entity, to the slides catalog. It is necessary to include at least one analysis on the new slide: the body should contain an analyses attribute, an array of one or more analysis bodies as described in the section below.
On success, the server returns a 201 response with a Location header containing the URL of the newly created slide entity with its first analysis.
{
"title": "New slide",
"subtitle": "Variable A and B",
"analyses": [
{
"query": {},
"query_environment": {},
"display_settings": {}
},
{
"query": {},
"query_environment": {},
"display_settings": {}
}
]
}
On each analysis, only the query field is required to create a new slide; other attributes are optional.
Slide attributes:
Name | Type | Description |
---|---|---|
title | string | Optional title for the slide |
subtitle | string | Optional subtitle for the slide |
Analysis attributes:
Name | Type | Description |
---|---|---|
query | object | Contains a valid analysis query; required |
display_settings | object | Contains a set of attributes to be interpreted by the client to render and export the analysis |
query_environment | object | Contains the weight and filter applied during the analysis; they will be applied upon future evaluation/render/export |
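Putting the slide and analysis attributes together, the shape of a new-slide body can be sketched as a small builder. The query contents are left empty here because they depend on the analysis; only the payload shape is shown:

```python
def slide_payload(title, analyses, subtitle=""):
    """Build a slide body with one or more analyses.
    Each analysis must contain at least a "query" member."""
    for analysis in analyses:
        assert "query" in analysis, "query is required on each analysis"
    return {"title": title, "subtitle": subtitle, "analyses": list(analyses)}

slide = slide_payload("New slide",
                      [{"query": {}, "display_settings": {}}],
                      subtitle="Variable A and B")
```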
Old format
It is possible to create slides with one single initial analysis by POSTing an analysis body directly to the slides catalog. It will create a slide automatically with the new analysis on it:
{
"title": "New slide",
"subtitle": "Variable A and B",
"query": {},
"query_environment": {},
"display_settings": {}
}
PATCH
It is possible to bulk-edit several slides at once by PATCHing a shoji:catalog to this endpoint.
The only editable attributes with this method are:
- title
- subtitle
Other attributes should be considered read-only.
Submitting invalid attributes, or references to slides that are not part of this deck, results in a 400 error response.
To edit the query attributes of any of a slide’s analyses, PATCH the individual analysis entity.
Entity
/datasets/{id}/decks/{deck_id}/slides/{slide_id}/
Each slide in the slides catalog contains a reference to its first analysis.
GET
{
"element": "shoji:entity",
"self": "/api/datasets/123/decks/123/slides/123/",
"catalogs": {
"analyses": "/api/datasets/123/decks/123/slides/123/analyses/"
},
"description": "Returns the detail information for a given slide",
"body": {
"deck_id": "123",
"subtitle": "z",
"title": "slide 1",
"analysis_url": "/api/datasets/123/decks/123/slides/123/analyses/123/",
"display": {
"value": "table"
},
"id": "123"
}
}
DELETE
Perform a DELETE request on the Slide entity resource to delete the slide and its analyses.
PATCH
It is possible to edit a slide entity by PATCHing with a shoji:entity.
The editable attributes are:
- title
- subtitle
The other attributes are considered read-only.
Order
/datasets/{id}/decks/{deck_id}/slides/flat/
The owner of the deck can specify the order of its slides. As with deck order, the slide order must be a flat list of slide URLs.
GET
Returns the list of all the slides in the deck.
{
"element": "shoji:order",
"self": "/api/datasets/123/decks/123/slides/flat/",
"description": "Order of the slides on this deck",
"graph": [
"/api/datasets/123/decks/123/slides/123/",
"/api/datasets/123/decks/123/slides/456/"
]
}
PATCH
To change the order, a client should PATCH the full shoji:order resource to the endpoint with the new order in its graph attribute.
Any slide not mentioned in the payload will be appended at the end of the graph in arbitrary order.
{
"element": "shoji:order",
"self": "/api/datasets/123/decks/123/slides/flat/",
"description": "Order of the slides on this deck",
"graph": [
"/api/datasets/123/decks/123/slides/123/",
"/api/datasets/123/decks/123/slides/456/"
]
}
This is a flat order: grouping or nesting is not allowed. PATCHing with a nested order will generate a 400 response.
Analysis
Each slide contains one or more analyses. An analysis is a table or graph with some specific combination of variables defining measures, rows, columns, and tabs, together with settings such as percentage direction and decimal places. An analysis can be saved to a deck, which can then be exported, or it can be reloaded whole in the application or even exported as a standalone embeddable result.
Catalog
/api/datasets/123/decks/123/slides/123/analyses/
POST
To create multiple analyses on a slide, clients should POST analyses to the slide’s analyses catalog.
{
"query": {
"dimensions" : [],
"measures": {}
},
"query_environment": {
"filter": [
{"filter": "<url>"},
{"function": "expression", "args": [], "name": "(Optional)"}
],
"weight": "url"
},
"display_settings": {
"decimalPlaces": {
"value": 0
},
"percentageDirection": {
"value": "colPct"
},
"vizType": {
"value": "table"
},
"countsOrPercents": {
"value": "percent"
},
"uiView": {
"value": "expanded"
}
}
}
The server will return a 201 response with the new analysis created. If the analysis attributes are invalid, a 400 response will be returned indicating the problems.
PATCH
It is possible to delete many analyses at once by PATCHing the catalog with null as their tuples. It is not possible to delete all the analyses from a slide; to do that, delete the slide itself.
{
"/api/datasets/123/decks/123/slides/123/analyses/1/": null,
"/api/datasets/123/decks/123/slides/123/analyses/2/": {}
}
A 204 response will be returned on success.
Order
As analyses are added to a slide, they are stored in a shoji:order resource. Like other order resources, it exposes a graph attribute that contains the list of created analyses, with new ones added at the end.
If an incomplete set of analyses is sent to the graph, the missing analyses will be appended in arbitrary order.
This is a flat order and does not allow nesting.
Entity
An analysis is defined by a query, query environment, and display settings. To save an analysis, POST these to a deck as a new slide.
Display settings can be anything a client may need to reproduce the view of the data returned from the query. The settings the Crunch web client uses are shown here, but other clients are free to store other attributes as they see fit. Display settings should be objects with a value member.
{
"query": {
"dimensions" : [],
"measures": {}
},
"query_environment": {
"filter": [
{"filter": "<url>"},
{"function": "expression", "args": [], "name": "(Optional)"}
],
"weight": "url"
},
"display_settings": {
"decimalPlaces": {
"value": 0
},
"percentageDirection": {
"value": "colPct"
},
"vizType": {
"value": "table"
},
"countsOrPercents": {
"value": "percent"
},
"uiView": {
"value": "expanded"
}
}
}
Name | Description |
---|---|
query | Includes the query body for this analysis |
query_environment | An object with a weight and filters to be used for rendering/evaluating this analysis |
display_settings | An object containing client specific instructions on how to recreate the analysis |
PATCH
To edit an analysis, PATCH its URL with a shoji:entity.
The editable attributes are:
- query
- query_environment
- display_settings
Providing invalid values for those attributes or extra attributes will be rejected with a 400 response from the server.
DELETE
It is possible to delete analyses from a slide as long as there is always one analysis left.
Attempting to delete the last analysis of a slide will cause a 409 response from the server indicating the problem.
Filters
Catalog
/datasets/{id}/filters/
GET on this resource returns a Shoji Catalog with the list of Filters that the current user can use on this Dataset.
This index contains two kinds of filters, public and private, denoted by the is_public tuple attribute. Private filters are those created by the authenticated user and cannot be accessed by other users. Public filters are available to all users who are authorized to view the dataset.
{
"name": "My filter",
"is_public": true,
"id": "1442ea",
"owner_id": "https://app.crunch.io/api/users/4152de/",
"team": "https://app.crunch.io/api/teams/680abc/"
}
The only tuple attribute editable via PATCHing the catalog is the “name”. A 204 response indicates a successful PATCH. Attempting to PATCH any other attribute will return a 400 response.
POST a Shoji Entity to this catalog to create a new filter. Entities must include a name and an expression. If omitted, is_public defaults to false. A successful POST yields a 201 response with a Location header containing the URL of the newly created filter.
All users with access to the dataset can create private filters; however, only the current dataset editor can create public filters (is_public: true). Attempting to create a public filter when not the current dataset editor results in a 403 response.
Entity
/datasets/{id}/filters/{id}/
GET this resource to return a Shoji Entity containing the requested filter.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/ac64ef/filters/1442ea/",
"body": {
"id": "1442ea",
"name": "My filter",
"is_public": true,
"expression": {},
"last_update": "2015-12-31",
"creation_time": "2015-11-12T12:34:56",
"team": "https://app.crunch.io/api/teams/680abc/"
}
}
PATCH an entity to edit its expression, name, team, or is_public attributes. Successful PATCH requests return a 204 status. As with POSTing new entities to the catalog, only the dataset’s current editor can alter a filter.
The expression attribute must contain a valid Crunch filter expression.
The team attribute points to the team this filter is shared with; if the filter isn’t shared with any team, it defaults to null.
See expressions in the Object Reference for more details.
Applied filters
/datasets/{id}/filters/applied/
A Shoji order containing the filters applied by the current user.
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/datasets/ac64ef/filters/applied/",
"graph": [
"http://app.crunch.io/api/datasets/ac64ef/filters/28ef72/",
"http://app.crunch.io/api/datasets/ac64ef/filters/0ac6e1/"
]
}
PUT the applied endpoint to change which filters are applied for other operations. The graph parameter indicates which filters are applied. Successful PUT requests return a 204 status.
Filter Order
GET /datasets/{id}/filters/order/
A Shoji order containing the persisted filter order.
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/datasets/ac64ef/filters/order/",
"graph": [
"http://app.crunch.io/api/datasets/ac64ef/filters/28ef72/",
"http://app.crunch.io/api/datasets/ac64ef/filters/0ac6e1/"
]
}
PATCH the order to change the order of the filters. The graph parameter indicates the order. Private filters are not included in the order. Any filters that are missing are appended to the end of the order. Successful PATCH requests return 204 status.
Filtering endpoints
Some endpoints support filtering. They accept a filter GET parameter: a JSON-encoded object containing either the URL of a filter (available through the Filters catalog) or a filter expression.
To filter using a filter URL wrapped in JSON, pass an object as the filter parameter:
{
"filter": "http://app.crunch.io/api/datasets/ac64ef/filters/28ef72/"
}
GET /datasets/id/summary/?filter=%7B%22filter%22%3A%22http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Ffilters%2F28ef72%2F%22%7D HTTP/1.1
It is also possible to send straight filter URLs without a JSON wrapping:
GET /datasets/id/summary/?filter=http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Ffilters%2F28ef72%2F HTTP/1.1
Or pass multiple filters, which will be ANDed together:
GET /datasets/id/summary/?filter=http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Ffilters%2F28ef72%2F&filter=http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Ffilters%2F28ef72%2F HTTP/1.1
To filter using a filter expression, pass a Crunch filter expression as the filter parameter, like:
{
"function": "==",
"args": [
{"variable": "http://app.crunch.io/api/datasets/ac64ef/variables/aae3c2/"},
{"value": 1}
]
}
GET /datasets/id/summary/?filter=%7B%22function%22%3A%22%3D%3D%22%2C%22args%22%3A%5B%7B%22variable%22%3A%22http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Fvariables%2Faae3c2%2F%22%7D%2C%7B%22value%22%3A1%7D%5D%7D HTTP/1.1
Filter expressions can be combined with filter URLs to make reference to other filters, like so:
{
"function": "and",
"args": [
{
"filter": "http://app.crunch.io/api/datasets/ac64ef/filters/28ef72/"
},
{
"function": "==",
"args": [
{"variable": "http://app.crunch.io/api/datasets/ac64ef/variables/aae3c2/"},
{"value": 1}
]
}
]
}
GET /datasets/id/summary/?filter=%7B%22function%22%3A+%22and%22%2C+%22args%22%3A+%5B%7B%22filter%22%3A+%22http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Ffilters%2F28ef72%2F%22%7D%2C+%7B%22function%22%3A+%22%3D%3D%22%2C+%22args%22%3A+%5B%7B%22variable%22%3A+%22http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Fvariables%2Faae3c2%2F%22%7D%2C+%7B%22value%22%3A+1%7D%5D%7D%5D%7D HTTP/1.1
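The percent-encoded URLs above can be produced by JSON-serializing the filter and URL-quoting the result. A sketch in Python using only the standard library:

```python
import json
from urllib.parse import quote

# The expression from the example above:
expression = {
    "function": "==",
    "args": [
        {"variable": "http://app.crunch.io/api/datasets/ac64ef/variables/aae3c2/"},
        {"value": 1},
    ],
}

# Compact JSON (no spaces), then percent-encode every reserved character:
encoded = quote(json.dumps(expression, separators=(",", ":")), safe="")
url = "/datasets/id/summary/?filter=" + encoded
```

The same quoting applies when passing a bare filter URL as the filter parameter.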
Geodata
Geodata allow you to associate a variable with features in a FeatureCollection of geojson or topojson.
Catalog
/geodata/
GET
Crunch maintains a few geojson/topojson resources and publishes them on CDN. GET the catalog https://app.crunch.io/api/geodata/ for an index of available geographies, each of which then includes a location to download the actual geojson or topojson.
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/geodata/",
"index": {
"https://app.crunch.io/api/geodata/7ae898e210b04a9a8992314452c6677b/": {
"description": "use properties.name or properties.postal-code",
"created": "2016-07-08T16:33:44.601000+00:00",
"name": "US States GeoJSON Name + Postal Code",
"location": "https://s.crunch.io/geodata/leafletjs/us-states.geojson",
"id": "7ae898e210b04a9a8992314452c6677b"
}
}
}
The geodata catalog tuples contain the following keys:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
created | timestamp | Time when the item was created |
id | string | Global unique identifier for this geodatum |
location | uri | Location of the Crunch-curated geojson/topojson file. Users may need to inspect this file to learn the details of the FeatureCollection and individual Features. |
description | string | Any additional information about the geodatum |
metadata | object | Information regarding the actual data provided by the location. For now, the properties in the geodata features are extracted for the purpose of matching geodata to variable categories. |
Entity
GET
GET /geodata/{geodata_id}/
Crunch maintains a few geojson/topojson resources and publishes them on CDN.
Most of their properties, with the exception of metadata, are present on the catalog tuple described above. metadata is an open field, but it may be populated at creation time by a Crunch utility that extracts and aggregates properties across the features of geojson and topojson resources. For other formats, users may supply relevant metadata for the geodatum resource.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/geodata/7ae898e210b04a9a8992314452c6677b/",
"body": {
"description": "use properties.name or properties.postal-code",
"created": "2016-07-08T16:33:44.601000+00:00",
"name": "US States GeoJSON Name + Postal Code",
"location": "https://s.crunch.io/geodata/leafletjs/us-states.geojson",
"id": "7ae898e210b04a9a8992314452c6677b",
"metadata": {
"status": "success",
"properties": {
"postal-code": [
"AL",
"AK",
"AZ", "etc."
],
"name": [
"Alabama",
"Arkansas",
"Alaska", "etcetera"
]
}
}
}
}
DELETE
DELETE /geodata/{geodata_id}/
Deletes the geodata entity. Returns 204.
Geodata for common applications
- https://app.crunch.io/api/geodata/7ae898e210b04a9a8992314452c6677b/ US States – use properties.name or properties.postal-code as your feature_key depending on the variable (state name or abbreviation); id is the FIPS code.
- https://app.crunch.io/api/geodata/8f9f5fed101042c4815d2dd1fd248cec/ World – properties include the ISO 3166 name as well as the ISO 3166-1 alpha-3 abbrev.
- https://app.crunch.io/api/geodata/d878d8471090417fa361536733e5f176/ UK Regions – properties.EER13NM matches a YouGov stylization of United Kingdom region names.
Creating new public Geodatum
Users with permission to create datasets can also create geodata, although in practice Crunch curates and makes available many common geographies, listed in the geodata catalog. Note that geodata created outside of the Crunch domain (i.e. without a .crunch.io domain in the URL) will not be available in whaam due to browser constraints. If you would like to make your geodatum public and have Crunch serve it, please contact us!
Adding a new geodatum is as easy as POSTing it to the geodata catalog, most easily via pycrunch. Crunch will attempt to download the geodata file and analyze the properties present on the features (generally polygons), which can then be associated with Crunch variables. The metadata extraction and summary can help you align variables and select the right property to associate with your Crunch geographic variable by category name.
Include a format member in the payload (on POST or PATCH) to trigger automatic metadata extraction. The server will fetch and aggregate properties from FeatureCollections in order to provide hints for eventual consumers of the Crunch geodatum. The automatic feature extractor supports GeoJSON and TopoJSON formats; you may register a Shapefile (shp) or other resource as a Crunch geodatum, but you will have to supply the metadata hints yourself and are advised to indicate its non-json format.
The lists of properties returned in the metadata are index-aligned across features, such that if a feature in your geodata is missing a given property, its entry in that property’s list will be null.
>>> import pycrunch
>>> site = pycrunch.connect("me@mycompany.com", "yourpassword", "https://app.crunch.io/api/")
>>> geodata = site.geodata.create(as_entity({'name': 'test_geojson',
'location': 'https://s.crunch.io/geodata/leafletjs/us-states.geojson',
'description': '',
'format': 'geojson'}))
>>> geodata.body.metadata
pycrunch.elements.JSONObject(**{
"postal-code": [
"AL",
"AK",
"AZ",
"AK",
"CA", ...],
"name": [
"Alabama",
"Alaska",
"Arizona",
"Arkansas",
"California", ...]})
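The aggregation shown in that metadata can be sketched directly: collect every property name across features, then build one index-aligned list per property, with None (null) where a feature lacks it. This mirrors the described output, not Crunch's actual extractor:

```python
def extract_properties(feature_collection):
    """Aggregate GeoJSON feature properties into index-aligned lists,
    one list per property name, None where a feature lacks it."""
    features = feature_collection["features"]
    names = set()
    for feature in features:
        names.update(feature.get("properties", {}))
    return {name: [f.get("properties", {}).get(name) for f in features]
            for name in sorted(names)}

fc = {"features": [
    {"properties": {"name": "Alabama", "postal-code": "AL"}},
    {"properties": {"name": "Alaska"}},  # missing postal-code -> null
]}
meta = extract_properties(fc)
```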
Modifying your public Geodata
You can modify any Geodatum that you own. Note that you can transfer ownership to another user if you change the owner_id of your geodatum. You may also change the metadata of your geodatum, but keep in mind that if you do this you will override any automated metadata extraction that Crunch provides. If you modify the location of the geodatum and do not provide a metadata parameter in the patch, Crunch will automatically extract metadata as long as the location is publicly accessible.
>>> import pycrunch
>>> site = pycrunch.connect("me@mycompany.com", "yourpassword", "https://app.crunch.io/api/")
>>> entity = site.geodata.index['<geodatum_url>'].entity
>>> entity.patch({'description': 'US States'})
>>> entity.refresh()
>>> entity.body.description
US States
Associating Variables with Geodata
To make maps with variables, update a variable’s view (or include it with metadata at creation) as follows, where feature_key is the key defined for each Feature in the geojson/topojson that matches the relevant field on the variable at hand (generally category names).
{"view": { "geodata": [
{"geodatum": "<uri>",
"feature_key": "properties.name"}
]}
}
Joins
Catalog
/datasets/{id}/joins/
A GET on this resource returns a Shoji Catalog enumerating the joins present in the Dataset. Each tuple in the index includes a “left_key” and a “right_key” member, each of which MUST be a variable URI. The left_key MUST be a variable in the current dataset, and the right_key SHOULD be a variable in another dataset. Both variables MUST be unique, and should be values taken from the same domain. For example, you might have a principal dataset which is a survey, with a respondent_id variable as a unique key. If you join a separate demographic dataset that has a unique column of the same respondent ids, you might see:
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/837498a/joins/",
"index": {
"https://app.crunch.io/api/datasets/837498a/joins/demo/": {
"left_key": "https://app.crunch.io/api/datasets/837498a/variables/1ef71d/",
"right_key": "https://app.crunch.io/api/datasets/de3095/variables/19471d/"
}
}
}
A PATCH to this resource may add joins (by including new index members), alter existing joins (by replacing existing index members), or delete joins (by setting existing members to null). A 204 indicates success. As with any Shoji Catalog, the URI of each entity in the index is the key.
Variables in joined datasets may then be used in analyses as if they were part of the principal dataset, simply by using their URI in this join’s variables catalog (see below). The joined dataset includes one row for each row in the principal dataset, by taking the key in the principal and looking up the corresponding key and row in the subordinate dataset. Rows in the principal which have no corresponding row in the subordinate are filled with the “No Data” missing value.
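The row-matching rule above can be sketched in plain Python, with None standing in for the “No Data” missing value (the function and sample keys are illustrative, not part of any Crunch library):

```python
# Sketch of the join semantics: for each row of the principal dataset,
# look up its key in the subordinate dataset; rows with no corresponding
# key are filled with None (standing in for "No Data").

def left_join_column(left_keys, right_keys, right_values):
    """Return right_values aligned to left_keys, None where unmatched."""
    lookup = dict(zip(right_keys, right_values))
    return [lookup.get(key) for key in left_keys]

# Principal survey has respondent ids 1..4; demographics only cover 1..3.
ages = left_join_column([1, 2, 3, 4], [1, 2, 3], [34, 55, 21])
print(ages)  # [34, 55, 21, None]
```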
In order to create or alter a join, the authenticated user needs read access to the right dataset; otherwise the server will respond with a 400 error.
The variable URL sent for the left key must be a valid URL in the current dataset. It is not allowed to use a different dataset as the left table.
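A minimal sketch of assembling the PATCH body for the joins catalog, following the null-to-delete convention above (the join and variable URLs here are hypothetical, and the helper is ours):

```python
# Sketch: build the PATCH body for /datasets/{id}/joins/. Adding a join
# uses a tuple with left_key/right_key; removing one uses null (None).
# A 204 response indicates success.
import json

def joins_patch(add=None, remove=None):
    """add: {join_url: (left_key_url, right_key_url)}; remove: [join_url]."""
    body = {}
    for url, (left, right) in (add or {}).items():
        body[url] = {"left_key": left, "right_key": right}
    for url in (remove or []):
        body[url] = None
    return json.dumps(body)

payload = joins_patch(
    add={"/datasets/837498a/joins/demo/": (
        "/datasets/837498a/variables/1ef71d/",
        "/datasets/de3095/variables/19471d/",
    )},
    remove=["/datasets/837498a/joins/old/"],
)
print(payload)
```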
Entity
/datasets/{id}/joins/{id}/
A GET on this resource returns a Shoji Entity describing the join, and a link to its Crunch Table (see next). Currently, the Join entity only contains the batch_id for its frame, and therefore isn’t very useful for clients. The entity resource is not editable; PATCH the joins catalog instead.
Joined variables catalog
/datasets/{id}/joins/{id}/variables/
A variables catalog which describes variables in the subordinate dataset. See Variables for more details.
Multitables
Catalog
/datasets/{dataset_id}/multitables/
GET
{
"element": "shoji:catalog",
"self": "/api/datasets/123/multitables/",
"specification": "/api/specifications/multitables/",
"description": "List of multitable definitions for this dataset",
"index": {
"/api/datasets/123/multitables/7ab1e/": {
"is_public": false,
"owner_id": "/api/users/b055/",
"name": "Basic Demographics",
"id": "7ab1e",
"team": "/api/teams/56789/"
}
}
}
GET on this resource returns a Shoji Catalog with the list of Multitables that the current user can use on this Dataset.
This index contains two kinds of multitables: those that belong to the dataset, denoted by the is_public tuple attribute, and those that belong to the current user. Personal multitables are those created by the authenticated user, and they cannot be accessed by other users. Dataset multitables are available to all users who are authorized to view the dataset.
POST
POST a Shoji Entity to this catalog to create a new multitable definition. Entities must include a name and a template; the template must contain a series of objects, each with a query and optionally a transform. If omitted, is_public defaults to false. Similarly, team defaults to null unless a specific team URL is provided.
A successful POST yields a 201 response that will contain a Location header with the URL of the newly created multitable.
All users with access to the dataset can create personal multitable definitions;
however, only the current dataset editor can create public multitables
(is_public: true
) which everyone with access to the dataset can see.
Attempting to create a public multitable when not the current dataset editor
results in a 403 response.
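The entity body described above can be assembled as follows; this is a sketch with hypothetical variable URLs, applying the documented defaults for is_public and team:

```python
# Sketch: assemble a shoji:entity body for POSTing a new multitable.
# is_public defaults to false and team to null, matching the server
# defaults described above.

def multitable_entity(name, variable_urls, is_public=False, team=None):
    # Each template item wraps one variable reference in a query.
    template = [{"query": [{"variable": url}]} for url in variable_urls]
    return {
        "element": "shoji:entity",
        "body": {
            "name": name,
            "template": template,
            "is_public": is_public,
            "team": team,
        },
    }

entity = multitable_entity("Basic Demographics",
                           ["/datasets/123/variables/abc/",
                            "/datasets/123/variables/def/"])
print(entity["body"]["template"][0])
```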
Copying Multitables between datasets
It is possible to copy a multitable between datasets as long as the permissions allow it. Multitable copying requires that all the variables present in the template of the origin multitable exist on the target dataset and that they all have the same type. Copied multitables will be private by default.
POST a Shoji entity to the catalog indicating the URL of the multitable to copy:
{
"element": "shoji:entity",
"body": {
"name": "Name of my copy",
"multitable": "/api/datasets/123/multitables/7ab1e/"
}
}
As shown in the example, it is possible to assign a new name to the copy. By default all copies will be private unless specified in the body.
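A sketch of building that copy request body (the source multitable URL is the example one; the helper is illustrative):

```python
# Sketch: body for copying a multitable into another dataset. Copies
# are private by default unless is_public is set in the body.

def multitable_copy(name, source_url, is_public=False):
    return {
        "element": "shoji:entity",
        "body": {
            "name": name,
            "multitable": source_url,
            "is_public": is_public,
        },
    }

print(multitable_copy("Name of my copy",
                      "/api/datasets/123/multitables/7ab1e/"))
```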
PATCH
There are no elements of the catalog that can be changed via PATCH.
Entity
/datasets/{dataset_id}/multitables/{multitable_id}/
GET
{
"element": "shoji:entity",
"self": "datasets/123/multitables/7ab1e/",
"views": {
"tabbook": "/datasets/123/multitables/7ab1e/tabbook/"
},
"specification": "https://app.crunch.io/api/specifications/multitables/",
"description": "Detail information for one multitable definition",
"body": {
"name": "Basic Demographics",
"user": "/api/users/b055/",
"template": [{
"query": [{
"variable": "/datasets/123/variables/abc/"
}]
}, {
"query": [{
"variable": "/datasets/123/variables/def/"
}]
}],
"is_public": false,
"id": "7ab1e",
"team": "/api/teams/56789/"
}
}
GET on this resource returns a Shoji entity containing the requested multitable definition.
PATCH
PATCH the entity to edit its name, template, team, or is_public attributes. Successful PATCH requests return a 204 status. As with POSTing new entities to the catalog, only the dataset’s current editor can alter is_public.
The template attribute must contain a valid multitable definition.
Views
Multitable entities have a “tabbook” view. See below.
Permissions
Authorization to view, edit, and manage a dataset is controlled by the dataset’s permissions catalog:
/datasets/{id}/permissions/
The permissions catalog is a Shoji Catalog that collects (not contains) Users. There are no permission “entities” to retrieve, create, or delete: all action is achieved directly on the permissions catalog.
GET Catalog
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/1/permissions/",
"description": "Lists all the users that have access to this dataset",
"index": {
"https://app.crunch.io/api/users/42/": {
"dataset_permissions": {
"edit": true,
"change_permissions": true,
"view": true
},
"is_owner": true,
"name": "Lauren Ipsum",
"email": "lipsum@crunch.io"
}
}
}
If authorized to view the dataset, a successful GET returns a Shoji Catalog indicating the users who have access to this dataset and their respective permissions. This includes the current, authorized user making the request. Index tuples are keyed by User URL.
Tuple values include:
Name | Type | Description |
---|---|---|
name | string | Display name of the user |
email | string | Email address of the user |
is_owner | boolean | Whether this user is the dataset’s “owner” |
dataset_permissions | object | Attributes governing the user’s authorization; see below |
Supported dataset_permissions
, all boolean, are:
- view: Whether the user can view the dataset. Note that “viewing” is not limited to just GET requests, for dataset viewers may create filters, private variables, and saved analyses, for example.
- edit: Whether the user can edit the dataset. When editing, users with this permission may modify the common data of a dataset, including things like public filters available to all viewers of the dataset.
- change_permissions: Whether the user may alter other users’ authorization on this dataset, i.e., PATCH tuples for users that already exist on the catalog.
PATCH Catalog
The PATCH verb is used to make all modifications to dataset authorization: modifying existing permissions, revoking permissions for users with access, and granting access to users.
Modify existing
To change the permissions a user has, PATCH new dataset_permissions, like:
{
"https://app.crunch.io/api/users/42/": {
"dataset_permissions": {
"edit": false,
"view": true
}
},
"send_notification": true,
"dataset_url": "https://app.crunch.io/dataset/1"
}
Only the “dataset_permissions” key in the tuple can be modified by PATCHing this catalog. Other keys, such as “name”, are included only for facilitating human-readable display of the catalog. If sent, these other keys will be ignored. To modify users’ names, see users.
If a subset of dataset_permissions are included in the payload, only the specified permissions will have their values updated. Omitted permissions will remain unchanged.
Multiple users’ permissions can be modified in a single request by including multiple tuples keyed by User URL.
The “send_notification” key in the payload is optional; if included and True, the server will send an email invitation to all newly added users (see below), as well as to users who are granted “edit” privileges.
If “send_notification” is included and true, you may also include a “dataset_url”, which is the URL that will be included in the email notifying the users that they now have access to the dataset. The web application will send the “browse” view URL, for example, so that when the user receives the email notification, the link they follow will take them to the relevant dataset. If “send_notification” is true and “dataset_url” is omitted, the email link will default to https://app.crunch.io/.
Add new user from within account
To add a user (i.e. share with them), there are two cases. First, if the user to be added is a member of the current user’s account, PATCH similar to above, using this user’s URL as key:
{
"/users/id/": {
"dataset_permissions": {
"edit": false,
"view": true
},
"profile": {
"weight": null,
"applied_filters": []
}
}
}
This payload may include a “profile” member, which are initial values with which to populate the sharee’s user-dataset-profile.
Valid “profile” members include:
- weight: a URL to one of the dataset’s weight variables; if omitted, the sharer’s current weight variable will be used
- applied_filters: an array of filter URLs which are shared with all dataset viewers. If any of the specified filters are private, the PATCH request will return 400 status. Default value for “applied_filters” is [].
If the “profile” member is not included, the newly shared users will be created with their user dataset preferences matching the sharer’s current weight.
Revoking access
To revoke users’ access to this dataset (aka “unshare” with them), PATCH a null tuple for their user URLs:
{
"/users/id/": null
}
Note that all of these PATCHes for add/edit/remove access to the dataset can be done in a single request that combines them all.
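Such a combined request can be sketched as a payload builder (the user URLs are illustrative, and the helper is ours, not part of any Crunch library):

```python
# Sketch: one PATCH body for the permissions catalog that grants,
# modifies, and revokes access in a single request. Revoked users are
# set to null (None); notification emails are optional.

def permissions_patch(grant=None, revoke=None,
                      send_notification=False, dataset_url=None):
    """grant: {user_url_or_email: dataset_permissions}; revoke: [user_url]."""
    body = {url: {"dataset_permissions": perms}
            for url, perms in (grant or {}).items()}
    body.update({url: None for url in (revoke or [])})
    if send_notification:
        body["send_notification"] = True
        if dataset_url:
            body["dataset_url"] = dataset_url
    return body

patch = permissions_patch(
    grant={"https://app.crunch.io/api/users/42/": {"edit": False, "view": True}},
    revoke=["https://app.crunch.io/api/users/99/"],
    send_notification=True,
    dataset_url="https://app.crunch.io/dataset/1",
)
print(patch)
```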
Validation
The server will insist, and clients should also validate, that
- There is one and only one user with edit: true privileges for a dataset; if not, the PATCH request will return 400.
- The users who are receiving new authorization via PATCH must have corresponding dataset_permissions on their account authorization. For example, the user who is updated to have edit: true has a dataset_permission of edit: true on their account authorization. If not, the PATCH request will return 400.
- The user that is PATCHing this catalog must have share: true for this dataset; if not, the PATCH request will return 403.
Inviting new users
It is possible to share a dataset with people that are not users of Crunch yet. To do so, it is necessary to send in an email address instead of a user URL as a sharing key.
{
"somebody@email.com": {
"dataset_permissions": {
"edit": false,
"view": true
},
"profile": {
"weight": null,
"applied_filters": []
}
},
"send_notification": true,
"url_base": "https://app.crunch.io/password/change/${token}/",
"dataset_url": "https://app.crunch.io/dataset/1/"
}
A new user with that email address will be created and added to the account of the user making the request. The new user will receive an invitation email to Crunch.io with an activation link. If a user with that email already exists on this or another account, no changes to the user will be made.
If “send_notification” was included and true in the request, the user will receive a notification email informing them about the newly shared dataset. New users, unless they have an OAuth provider specified, will need to set a password, and the client application should send a URL template that directs them to a place where they can set that password. To do so, include a “url_base” attribute in the payload: a URL template with a ${token} variable into which the server will insert the password-setting token. For the Crunch web application, this template is https://app.crunch.io/password/change/${token}/.
Progress
Progress resources provide information about the current state of a long-running server process in Crunch. Some requests at certain endpoints may return 202 status containing a progress URL in the body, at which one can monitor the progress of the request that was accepted and not yet completed.
GET
GET /progress/{id}/ HTTP/1.1
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/progress/{id}/",
"value": {
"progress": 22,
"message": "exported 2 variables"
}
}
GET on a Progress view returns a Shoji View containing information about the status of the indicated process. The “progress” attribute contains an integer between -1 and 100. Positive values indicate that the job is being processed, while a negative value indicates that an error occurred in processing. Zero means the job has not been started, while 100 indicates completion. Additionally, if the id from the request URL does not exist, GET will nevertheless return a 200 status and indicate "progress": 100.
Optionally, the View will provide a message regarding current status.
You must be authenticated to GET this resource.
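The value rules above can be captured in a small interpreter, which a polling client might use to decide when to stop (the function is a sketch, not part of pycrunch):

```python
# Sketch: interpret the "progress" value from a Progress view.
# Negative means an error occurred, 0 means not started, 100 means
# complete, and anything in between means the job is still running.

def progress_state(progress):
    if progress < 0:
        return "error"
    if progress == 0:
        return "not started"
    if progress >= 100:
        return "complete"
    return "running"

for value in (-1, 0, 22, 100):
    print(value, progress_state(value))
```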
Projects
Projects represent groups of users or teams that share a common set of datasets. A user can belong to zero or more projects.
They live under /projects/, which will list the projects that the authenticated user is a member or owner of.
Catalog
The projects catalog will list all the projects the authenticated user is a member of. Here you can create new projects via POST.
GET
GET /projects/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/projects/",
"index": {
"http://app.crunch.io/api/projects/4643/": {
"name": "Project 1",
"id": "4643",
"icon": "",
"permissions": {"view": true, "edit": true}
},
"http://app.crunch.io/api/projects/6c01/": {
"name": "Project 2",
"id": "6c01",
"icon": "",
"description": "Description of project 2",
"permissions": {"view": true, "edit": true}
}
}
}
Name | Type | Default | Description |
---|---|---|---|
name | string | Required when creating the project | |
description | string | “” | Longer description of the project |
id | string | autogenerated | The project’s id |
icon | url | “” | Url for the icon file for the project. Empty string if not set |
permissions | object | {} | permissions possessed by querying user against project |
POST
New projects need a name (uniqueness is not enforced) and will make the authenticated user their initial member and editor.
POST /projects/ HTTP/1.1
Payload example:
{
"body": {
"name": "My new project",
"icon_url": "http://cdn.sample.com/project-icon.png"
}
}
Creating a project with an icon
To create a project with a starting icon, you can POST an icon_url attribute indicating a URL from which to fetch that icon (it has to be a publicly accessible URL). If the server cannot read that URL, the request will return a 409 error. On success, a copy of the file will be stored and served as the project’s icon.
If the icon_url attribute is not provided, the API will pick an available icon from the icons catalog.
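A sketch of assembling the creation body, with icon_url included only when provided so the API can otherwise pick a default icon (the helper and sample URL are illustrative):

```python
# Sketch: body for POST /projects/. icon_url is optional; per the docs
# above it must be publicly fetchable, and omitting it lets the API
# choose a default icon from the icons catalog.

def project_entity(name, icon_url=None):
    body = {"name": name}
    if icon_url is not None:
        body["icon_url"] = icon_url
    return {"body": body}

print(project_entity("My new project",
                     "http://cdn.sample.com/project-icon.png"))
```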
Default icon
The API can provide default icons to be used in new projects. Performing a GET request will return a Shoji:catalog with a list of available icons for the client to pick.
GET /icons/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/icons/",
"index": {
"http://app.crunch.io/api/icons/01/": {},
"http://app.crunch.io/api/icons/02/": {},
"http://app.crunch.io/api/icons/03/": {},
"http://app.crunch.io/api/icons/04/": {}
}
}
Entity
GET
GET /projects/6c01/ HTTP/1.1
{
"element": "shoji:entity",
"self": "http://app.crunch.io/api/projects/6c01/",
"catalogs": {
"datasets": "http://app.crunch.io/api/projects/6c01/datasets/",
"members": "http://app.crunch.io/api/projects/6c01/members/"
},
"views": {
"icon": "http://app.crunch.io/api/projects/6c01/icon/"
},
"body": {
"name": "Project 2",
"description": "Long description text",
"icon": "",
"user_icon": false,
"id": ""
}
}
Name | Type | Default | Description |
---|---|---|---|
name | string | Required when creating the project | |
description | string | “” | Longer description of the project |
id | string | autogenerated | The project’s id |
icon | url | “” | Url for the icon file for the project; empty string if not set |
user_icon | boolean | autogenerated | Will indicate false if the icon used on creation is from the provided catalog |
Note that the icon attribute points to the actual image file where the configured icon is stored; this URL does not point to the views.icon Shoji view URL. The views.icon Shoji view endpoint is used to PUT the icon as a file upload for this project.
PATCH
The attributes that are allowed to be edited for a project are:
- name
- description
- icon_url
Only project editors can make these changes.
DELETE
Deleting a project will NOT delete its datasets; it will change their ownership to the authenticated user. Only the project’s current owner can delete a project.
DELETE /projects/6c01/ HTTP/1.1
Projects order
Returns the shoji:order in which the projects should be displayed for the user. This entity is independent for each user. As the user is added to more projects, these will be appended at the end of the shoji:order.
GET
Will return a shoji:order containing a flat list of all the projects to which the current user belongs.
GET /projects/order/ HTTP/1.1
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/projects/order/",
"graph": [
"https://app.crunch.io/api/projects/cc9161/",
"https://app.crunch.io/api/projects/a598c7/"
]
}
PUT
In order to change the order of the projects, the client will need to PUT the full payload back to the server.
The graph attribute must include all of the user’s projects; otherwise the server will return a 400 response.
After a successful PUT request, the server will reply with a 204 response.
PUT /projects/order/ HTTP/1.1
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/projects/order/",
"graph": [
"https://app.crunch.io/api/projects/cc9161/",
"https://app.crunch.io/api/projects/a598c7/"
]
}
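The all-projects-included rule above implies a simple client-side check before PUTting a reordered graph; this sketch validates that a proposed graph is a permutation of the current one (the helper is ours):

```python
# Sketch of the client-side validation implied above: a PUT of the
# projects order must include every project exactly once, or the
# server responds with 400.

def valid_reorder(current_graph, new_graph):
    """True when new_graph is a permutation of current_graph."""
    return sorted(current_graph) == sorted(new_graph)

current = ["https://app.crunch.io/api/projects/cc9161/",
           "https://app.crunch.io/api/projects/a598c7/"]
print(valid_reorder(current, list(reversed(current))))  # True
print(valid_reorder(current, current[:1]))              # False
```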
Members
Use this endpoint to manage the users that have access to this project.
Members permissions
Members of a project can be either viewers or editors. By default all members are viewers; a subset of them (at least one) will be editors.
These permissions are available on the members catalog under the permissions attribute of each member’s tuple.
The possible permissions, both boolean, are:
- edit
- view
Those with edit: true are considered project editors.
Project editors have edit privileges on all of the project’s datasets, as well as permission to make changes to the project itself, such as changing its name or icon, managing members, or changing members’ permissions.
A project can have users or teams as members. Teams represent groups of users to be handled together; when a team gets access to a project, all members of the team inherit those permissions. If a user has access to a project through several teams or through direct access, the effective permissions are the union of all those granted.
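The union rule can be sketched as a logical OR over each grant's boolean flags (the helper is illustrative, not part of any Crunch library):

```python
# Sketch of the rule above: when a user reaches a project both directly
# and via teams, the effective permissions are the logical OR of each
# grant's boolean flags.

def effective_permissions(*grants):
    merged = {}
    for grant in grants:
        for key, value in grant.items():
            merged[key] = merged.get(key, False) or bool(value)
    return merged

direct = {"edit": False, "view": True}
via_team = {"edit": True, "view": True}
print(effective_permissions(direct, via_team))  # {'edit': True, 'view': True}
```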
GET
Returns a catalog with all users and teams that have access to this project and their project permissions in the following format:
GET /projects/abcd/members/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/projects/6c01/members/",
"index": {
"http://app.crunch.io/api/users/00002/": {
"name": "Jean-Luc Picard",
"email": "captain@crunch.io",
"collaborator": false,
"permissions": {
"edit": true,
"view": true
},
"allowed_dataset_permissions": {
"edit": true,
"view": true
}
},
"http://app.crunch.io/api/users/00005/": {
"name": "William Riker",
"email": "firstofficer@crunch.io",
"collaborator": false,
"permissions": {
"edit": false,
"view": true
},
"allowed_dataset_permissions": {
"edit": false,
"view": true
}
},
"http://app.crunch.io/api/teams/000a5/": {
"name": "Viewers teams",
"permissions": {
"edit": false,
"view": true
}
}
}
}
The catalog will be indexed by each entity’s URL, and each tuple will contain basic information (name and email) as well as the permissions each user has on the given project.
All project members have read access to this resource, but allowed_dataset_permissions is only present for project editors. It contains the maximum dataset permissions each user can have; assigning anything more permissive will have no effect.
PATCH
Use this method to add or remove members from the project. Only project editors have this capability; other users will get a 403 response.
To add a new user, PATCH a catalog keyed by the new user’s URL with an empty object for its value, or with a permissions tuple to set specific permissions (only edit is allowed at this point).
To remove users, PATCH a catalog keyed by the user you want to remove with null for its value. Note that you cannot remove yourself from the project; attempting to do so will return a 400 response.
It is possible to perform many additions and removals in one request. The following example adds user /users/001/ and team /teams/00a/, grants edit to user /users/002/, and removes user /users/003/.
It is also allowed to invite/add users to the project by email address. If the email is registered on the system, the user will be invited to the project. If the email is not part of Crunch.io, a new-user invitation will be sent to that email with instructions to set up an account; such users will automatically be part of this project only.
Users can also be removed by email address. If the email does not exist, the server will return a 400 response.
PATCH /projects/abcd/members/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/projects/6c01/members/",
"index": {
"http://app.crunch.io/api/users/001/": {},
"http://app.crunch.io/api/teams/00a/": {},
"http://app.crunch.io/api/users/002/": {
"permissions": {
"edit": true
}
},
"http://app.crunch.io/api/users/003/": null,
"user@email.com": {},
"send_notification": true,
"url_base": "https://app.crunch.io/password/change/${token}/",
"project_url": "https://app.crunch.io/${project_id}/"
}
}
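A sketch of assembling that members PATCH body, mirroring the example above ({} adds with default permissions, a permissions tuple grants edit, null removes; the helper is ours):

```python
# Sketch: build the shoji:catalog PATCH body for the members catalog.
# Adding uses an empty tuple, granting edit uses a permissions tuple,
# and removing uses null (None).

def members_patch(add=None, editors=None, remove=None):
    index = {}
    for url in (add or []):
        index[url] = {}
    for url in (editors or []):
        index[url] = {"permissions": {"edit": True}}
    for url in (remove or []):
        index[url] = None
    return {"element": "shoji:catalog", "index": index}

patch = members_patch(add=["http://app.crunch.io/api/users/001/"],
                      editors=["http://app.crunch.io/api/users/002/"],
                      remove=["http://app.crunch.io/api/users/003/"])
print(patch["index"])
```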
Sending notifications
The users invited to a project can be either existing Crunch.io users or new users without an account associated with the email.
If desired, the API can send automated email notifications to the involved users indicating that they now belong to the project. To request these emails, add the send_notification boolean key to the PATCHed index; otherwise, no notification will be sent.
When sending notifications, the client must also include a url_base key: a string template pointing to a client location where password setting should happen for brand-new users. The server will replace the ${token} part of the string with the generated token, and the result will be included in the notification email as a link for the invited user to configure their account in order to use the app.
Additionally, to indicate the URL of the project, the client can provide a project_url key, formatted as a URL containing a ${project_id} part that the server will replace with the project’s ID.
This behavior is the same as described for inviting new users when sharing a dataset.
Users
A read-only endpoint that lists all the individual users that have access to this project, regardless of their access type (via team or direct project membership).
The payload has a similar shape to the members endpoint, but this catalog contains only users.
GET /projects/abcd/users/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/projects/6c01/users/",
"index": {
"http://app.crunch.io/api/users/00002/": {
"name": "Jean-Luc Picard",
"email": "captain@crunch.io",
"collaborator": false,
"allowed_dataset_permissions": {
"edit": true,
"view": true
},
"teams": []
},
"http://app.crunch.io/api/users/00005/": {
"name": "William Riker",
"email": "firstofficer@crunch.io",
"collaborator": false,
"allowed_dataset_permissions": {
"edit": false,
"view": true
},
"teams": ["http://app.crunch.io/api/teams/000a5/"]
}
}
}
Datasets
Will list all the datasets that have this project as their owner.
Adding datasets to projects
The way to add a dataset to a project is by changing the dataset’s owner to the URL of the project that should take ownership.
You must have edit permissions and be the current editor on a given dataset to change its owner, and you must also have edit permissions on the target project.
PATCH to dataset entity
Send a PATCH request to the dataset entity that you want to make part of the project.
PATCH /datasets/cc9161/ HTTP/1.1
{"owner":"https://app.crunch.io/api/projects/abcd/"}
GET
Will show the list of all datasets where this project is the owner; the shape of each dataset tuple is the same as in other dataset catalogs.
GET /projects/6c01/datasets/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/projects/6c01/datasets/",
"orders": {
"order": "http://app.crunch.io/api/projects/6c01/datasets/order/"
},
"index": {
"https://app.crunch.io/api/datasets/cc9161/": {
"owner_name": "James T. Kirk",
"name": "The Voyage Home",
"description": "Stardate 8390",
"archived": false,
"permissions": {
"edit": false,
"change_permissions": false,
"view": true
},
"size": {
"rows": 1234,
"columns": 67
},
"id": "cc9161",
"owner_id": "https://app.crunch.io/api/users/685722/",
"start_date": "2286",
"end_date": null,
"streaming": "no",
"creation_time": "1986-11-26T12:05:00",
"modification_time": "1986-11-26T12:05:00",
"current_editor": "https://app.crunch.io/api/users/ff9443/",
"current_editor_name": "Leonard Nimoy"
},
"https://app.crunch.io/api/datasets/a598c7/": {
"owner_name": "Spock",
"name": "The Wrath of Khan",
"description": "",
"archived": false,
"permissions": {
"edit": true,
"change_permissions": true,
"view": true
},
"size": {
"rows": null,
"columns": null
},
"id": "a598c7",
"owner_id": "https://app.crunch.io/api/users/af432c/",
"start_date": "2285-10-03",
"end_date": "2285-10-20",
"streaming": "no",
"creation_time": "1982-06-04T09:16:23.231045",
"modification_time": "1982-06-04T09:16:23.231045",
"current_editor": null,
"current_editor_name": null
}
}
}
Icon
The icon endpoint for a project is a Shoji view that allows changing the project’s icon via file upload or URL.
GET
On GET, it will return a shoji:view whose value contains a URL to the icon file, or an empty string if there isn’t an icon for this project yet. By default all new projects have an empty icon URL.
GET /projects/6c01/icon/ HTTP/1.1
{
"element": "shoji:view",
"self": "http://app.crunch.io/api/projects/6c01/icon/",
"value": ""
}
PUT
PUT to this endpoint to change a project’s icon.
There are two ways to change the icon, either via file upload or via icon URL.
Only the project’s editors can change the project’s icon.
Valid image extensions are png, gif, jpg, and jpeg; other extensions will return a 400 response.
File upload
The request should be a standard multipart/form-data file upload with the file field named icon. The file’s contents will be stored and made available under the project’s URL. The API will return a 201 response with the stored icon’s URL in its Location header.
PUT /projects/6c01/icon/ HTTP/1.1
Content-Disposition: form-data; name="icon"; filename="newicon.jpg"
Content-Type: image/jpeg
HTTP/1.1 201 Created
Location: https://app.crunch.io/api/datasets/223fd4/
Icon URL
Expects a shoji:view request with its value pointing to a publicly accessible image resource that will be used as the project’s icon. This image will be copied to an API-local location.
PUT /projects/6c01/icon/ HTTP/1.1
{
"element": "shoji:view",
"self": "http://app.crunch.io/api/projects/6c01/icon/",
"value": "http://public.domain.com/icon.png"
}
HTTP/1.1 201 Created
Location: https://app.crunch.io/api/datasets/223fd4/
POST
Same as PUT
Datasets order
Contains the shoji:order in which the datasets of this project are to be displayed.
This endpoint is available to all project members but can only be updated by the project’s editors.
GET
Will return the shoji:order response containing the datasets that belong to the project.
GET /projects/6c01/datasets/order/ HTTP/1.1
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/projects/6c01/datasets/order/",
"graph": [
"https://app.crunch.io/api/datasets/cc9161/",
"https://app.crunch.io/api/datasets/a598c7/"
]
}
PUT
Allows modifications to the shoji:order of the contained datasets. Only the project’s editors can make these changes.
Trying to include an invalid dataset or an incomplete list will return a 400 response.
PUT /projects/6c01/datasets/order/ HTTP/1.1
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/projects/6c01/datasets/order/",
"graph": [
"https://app.crunch.io/api/datasets/cc9161/",
{
"group": "https://app.crunch.io/api/datasets/a598c7/"
}
]
}
Search
You can perform a cross-dataset search of dataset metadata (including variables) via the search endpoint. This search will return associated variables and dataset metadata. A query string, along with filtering properties, can be provided to the search endpoint in order to refine the results. The query string is used in plain-text form only; non-text and non-numeric characters are ignored at this time.
Results are limited to those datasets the user has access to. Offset and limit parameters are also provided as chunking options for performance; the limit and offset apply to the datasets related to the search. You have reached the end of the available search entries when no more records appear in the dataset field.
Here are the parameters that can be passed to the search endpoint.
Parameter | Type | Description |
---|---|---|
q | String | query string |
f | Json Object | used to filter the output of the search (see below) |
limit | Integer | limit the number of dataset results returned by the api to less than this amount (default: 10) |
offset | Integer | offset into the search index to start gathering results from pre-filter |
max_variables_per_dataset | Integer | limit the number of variables that match to this number (default: 100, max: 100) (deprecated, use variable_limit) |
embedded_variables | Boolean | embed the results within the dataset results (this will become the default in the future) |
projection | Json Object | used to limit the fields that should be returned in the search results. ID is always provided. |
scope | Json Object | used to limit the fields that the search should look at. |
grouping | String | One of datasets or variables; tells whether search results should be grouped by datasets or variables |
variable_limit | Integer | Limit the number of variables returned per dataset to this value, (default: 100, max: 100) |
variable_offset | Integer | Offset into the variables returned per dataset, default 0 |
max_subfield_entries_per_variable | Integer | Number of items in the subfields of a variable (such as categories or subvariables), (default: 10, max: 100) |
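A request using the parameters above can be assembled as a query string; this sketch uses only the documented parameter names, sending the filter object (f) as JSON:

```python
# Sketch: assemble the query string for the search endpoint using the
# documented parameters. The filter object (f) is serialized as JSON.
import json
from urllib.parse import urlencode

def search_query(q, f=None, limit=10, offset=0, grouping="datasets"):
    params = {"q": q, "limit": limit, "offset": offset,
              "grouping": grouping}
    if f is not None:
        params["f"] = json.dumps(f)
    return urlencode(params)

qs = search_query("income", f={"project": "/api/projects/6c01/"}, limit=5)
print(qs)
```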
Providing a Projection:
The projection argument must be a JSON array containing the names of the fields that should be projected for datasets and variables. The fields are specified with the namespace they refer to, like "variables.fieldname" and "datasets.fieldname". The namespace is the same as the key under which the relevant search results are returned.
Performing a search with an invalid field will pinpoint the invalid one and provide the list of accepted values.
Providing a Scope:
The scope parameter must be a JSON array containing the names of the fields that should be used to resolve the query. Much like the projection parameter, it accepts a list of fields with their namespace (datasets or variables). If a scope is provided, the query is looked up only in the specified fields. A special field name * specifies that the default fields for that namespace should be searched. A scope like datasets.name, variables.* will search the query in the default variable fields and in the dataset name.
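For example (field choices illustrative), both arrays are JSON-encoded before being placed in the query string:

```python
import json
from urllib.parse import urlencode

projection = ["datasets.name", "variables.name"]
scope = ["datasets.name", "variables.*"]  # "*" = default fields for the namespace

# JSON-encode the arrays, then URL-encode the whole query string
query = urlencode({
    "q": "blue",
    "projection": json.dumps(projection),
    "scope": json.dumps(scope),
})
```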
Grouping:
The default grouping is datasets, which searches in dataset data and its variables. The entries returned in "datasets" are datasets that match the query or contain a variable that matches it. Search results are limited to 1000 variables per dataset when grouping by dataset.
Switching to variables grouping makes the search look only in variables. Note that a "datasets" field is still returned, but its entries are the datasets the matching variables are part of, not datasets that match the query. This allows dataset details to be provided for a variable without a second call to fetch the dataset info.
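With variables grouping, matches arrive in buckets keyed by an opaque id, and each bucket collects the same variable as it appears across datasets. A sketch of walking that structure, using a trimmed, illustrative slice of a response:

```python
# Trimmed, illustrative slice of a grouping=variables response
group = {
    "buckets": {
        "Qk9XX0FGX05hbWU": [
            "https://app.crunch.io/api/datasets/825b87/variables/000008/",
            "https://app.crunch.io/api/datasets/fcd372/variables/000008/",
        ],
    },
    "variables": {
        "https://app.crunch.io/api/datasets/825b87/variables/000008/": {"alias": "BOW_AF_Name"},
        "https://app.crunch.io/api/datasets/fcd372/variables/000008/": {"alias": "BOW_AF_Name"},
    },
}

# Resolve each bucket's member URLs against the "variables" index
aliases_by_bucket = {
    bucket: sorted({group["variables"][url]["alias"] for url in urls})
    for bucket, urls in group["buckets"].items()
}
```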
Allowable filter parameters:
Parameter | Type | Description |
---|---|---|
dataset_ids | array of strings | limit results to particular dataset_ids or urls (user must have read access to that dataset) |
team | string | url or id of the team to limit results (user must have read access to the team) |
project | string | url or id of the project to limit results (user must have access to the project) |
organization | string | if you are the owner for a given organization, you can filter all of the search results pertaining to the datasets in your organization. |
user | string | url or id of the user that has read access to the datasets to limit results (user must match with the provided one) |
owner | string | url or id of the dataset owner to limit results |
label | string | The dataset must be in a folder or subfolder with the given name. |
start_date | array of strings | array of [begin, end] range of values in ISO8601 format. Provide same for exact matching. |
end_date | array of strings | array of [begin, end] range of values in ISO8601 format. Provide same for exact matching. |
modification_time | array of strings | array of [begin, end] range of values in ISO8601 format. Provide same for exact matching. |
creation_time | array of strings | array of [begin, end] range of values in ISO8601 format. Provide same for exact matching. |
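As a sketch, a filter combining a project restriction with a creation-time range (ids illustrative) would be encoded as follows:

```python
import json
from urllib.parse import urlencode, parse_qs

f = {
    "project": "614a3b",  # illustrative project id
    # [begin, end] in ISO 8601; repeat the same value for an exact match
    "creation_time": ["2017-01-01T00:00:00", "2017-12-31T23:59:59"],
}
query = urlencode({"q": "survey", "f": json.dumps(f)})

# Round trip: the server decodes the same filter object
decoded = json.loads(parse_qs(query)["f"][0])
```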
Fields Searched
Here is a list of the fields that are searched by the Crunch search endpoint
Field | Type | Description |
---|---|---|
category_names | List of Strings | Category names (associated with categorical variables) |
dataset_id | String | ID of the dataset |
description | String | description of the variable |
id | String | ID of the variable |
name | String | name of the variable |
owner | String | owner’s ID of the variable |
subvar_names | List of Strings | Names of the subvariables associated with the variable |
users | List of Strings | User IDs having read-access to the variable |
group_names | List of Strings | group names (from the variable ordering) associated with the variable |
dataset_labels | List of Objects | dataset_labels associated with the user associated with the variable |
dataset_name | String | dataset_name associated with this variable |
dataset_owner | String | ID of the owner of the dataset associated with the variable |
dataset_users | List of Strings | User IDs having read-access to the dataset associated with the variable |
dataset_teams | List of Strings | Team IDs having read-access to the dataset associated with the variable |
dataset_projects | List of Strings | Project IDs having read-access to the dataset associated with the variable |
Grouping by datasets:
GET /search/?q={query}&f={filter}&limit={limit}&offset={offset}&projection={projection}&grouping=datasets HTTP/1.1
import pycrunch
site = pycrunch.connect("me@mycompany.com", "yourpassword", "https://app.crunch.io/api/")
results = site.follow('search', 'q=findme&embedded_variables=True').value
datasets_found = results['groups'][0]['datasets']
# Map each dataset URL to the variables embedded in its entry
variables_by_dataset = {k: v.get('variables', []) for k, v in datasets_found.items()}
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/search/?q=blue&grouping=datasets",
"description": "Returns a view with relevant search information",
"value": {
"groups": [
{
"group": "Search Results",
"datasets": {
"https://app.crunch.io/api/datasets/173b4eec13f542588b9b0a9cbcd764c9/": {
"labels": [],
"name": "econ_few_columns_0",
"description": ""
},
"https://app.crunch.io/api/datasets/4473ab4ee84b40b2a7cd5cab4548d584/": {
"labels": [],
"name": "simple_alltypes",
"description": ""
}
},
"variables": {
"https://app.crunch.io/api/datasets/4473ab4ee84b40b2a7cd5cab4548d584/variables/000000/": {
"dataset_labels": [],
"users": [
"00002"
],
"alias": "x",
"dataset_end_date": null,
"category_names": [
"red",
"green",
"blue",
"4",
"8",
"9",
"No Data"
],
"dataset_start_date": null,
"name": "x",
"dataset_description": "",
"dataset_archived": false,
"group_names": null,
"dataset": "https://app.crunch.io/api/datasets/4473ab4ee84b40b2a7cd5cab4548d584/",
"dataset_id": "bb987b45a5b04caba10dec4dad7b37a8",
"dataset_created_time": null,
"subvar_names": [],
"dataset_name": "export test 94",
"description": "Numeric variable with value labels"
}
},
"variable_count": 14,
"totals": {
"variables": 4,
"datasets": 2
}
}
]
}
}
Search results are limited to 1000 variables per dataset.
Grouping by variables:
GET /search/?q={query}&f={filter}&limit={limit}&offset={offset}&grouping=variables HTTP/1.1
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/search/?q=Atchafalaya&grouping=variables",
"description": "Returns a view with relevant search information",
"value": {
"groups": [{
"group":"Search Results",
"totals":{
"variables":2,
"datasets":2
},
"buckets":{
"Qk9XX0FGX05hbWU":[
"http://app.crunch.io:29668/api/datasets/825b87ff955049128b9d48b614abbe99/variables/000008/",
"http://app.crunch.io:29668/api/datasets/fcd37212fe0d4b8eb8804ffb7ccb933d/variables/000008/"
]
},
"order":[
"http://app.crunch.io:29668/api/datasets/825b87ff955049128b9d48b614abbe99/variables/000008/",
"http://app.crunch.io:29668/api/datasets/fcd37212fe0d4b8eb8804ffb7ccb933d/variables/000008/"
],
"variables":{
"http://app.crunch.io:29668/api/datasets/fcd37212fe0d4b8eb8804ffb7ccb933d/variables/000008/":{
"alias":"BOW_AF_Name",
"category_names":[
"East Cote Blanche Bay",
"Atchafalaya Bay, Delta, Gulf waters",
"Barataria Bay",
"Bayou Grand Caillou",
"Bayou du Large",
"Bays Gardene, Black, American and Crabe",
"Calcasieu Lake",
"Calcasieu River and Ship Channel",
"California Bay and Breton Sound",
"Grid 12",
"..."
],
"bucket":"Qk9XX0FGX05hbWU",
"name":"BOW_AF_Name",
"dataset":"http://app.crunch.io:29668/api/datasets/fcd37212fe0d4b8eb8804ffb7ccb933d/"
},
"http://app.crunch.io:29668/api/datasets/825b87ff955049128b9d48b614abbe99/variables/000008/":{
"alias":"BOW_AF_Name",
"category_names":[
"East Cote Blanche Bay",
"Atchafalaya Bay, Delta, Gulf waters",
"Barataria Bay",
"Bayou Grand Caillou",
"Bayou du Large",
"Bays Gardene, Black, American and Crabe",
"Calcasieu Lake",
"Calcasieu River and Ship Channel",
"California Bay and Breton Sound",
"Grid 12",
"..."
],
"bucket":"Qk9XX0FGX05hbWU",
"name":"BOW_AF_Name",
"dataset":"http://app.crunch.io:29668/api/datasets/825b87ff955049128b9d48b614abbe99/"
}
},
"datasets":{
"http://app.crunch.io:29668/api/datasets/fcd37212fe0d4b8eb8804ffb7ccb933d/":{
"modification_time":"2017-06-22T17:00:36.571000",
"archived":false,
"description":"",
"end_date":null,
"name":"test_variable_search_matching_2",
"labels":null,
"creation_time":"2017-06-22T17:00:37.024000",
"id":"fcd37212fe0d4b8eb8804ffb7ccb933d",
"projects":[
],
"start_date":null
},
"http://app.crunch.io:29668/api/datasets/825b87ff955049128b9d48b614abbe99/":{
"modification_time":"2017-06-22T17:00:34.681000",
"archived":false,
"description":"",
"end_date":null,
"name":"test_variable_search_matching_1",
"labels":null,
"creation_time":"2017-06-22T17:00:35.151000",
"id":"825b87ff955049128b9d48b614abbe99",
"projects":[
],
"start_date":null
}
}
}
]
}
}
Sources
Catalog
/sources/
A Shoji Catalog representing the Sources added by this User. POST a multipart form here, with an “uploaded_file” field containing the file to upload; 201 indicates success, and the returned Location header refers to the new Source resource.
Uploaded sources use the file’s filename as their .name attribute and have a blank description. The catalog includes each source’s .name and .description.
Alternately, you may POST a urlencoded payload with a source_url parameter that points to a publicly accessible URL. Both the “http” and “s3” schemes are supported. The endpoint downloads the file synchronously, verifies that it is a valid source file, and makes it available in the current user’s sources catalog.
Regular Shoji POST payloads are also supported to create new sources from remote source URLs. A location attribute should be included in the shoji:entity body POSTed.
{
"element": "shoji:entity",
"body": {
"location": "<url>",
"name": "Optional name",
"description": "Optional description"
}
}
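A sketch of building that payload in Python; the helper name and the source URL are illustrative, not part of any Crunch library:

```python
import json

def source_entity(location, name=None, description=None):
    """Build a shoji:entity body for creating a Source from a remote URL."""
    body = {"location": location}
    if name is not None:
        body["name"] = name
    if description is not None:
        body["description"] = description
    return {"element": "shoji:entity", "body": body}

# POST this JSON to /sources/ to register an s3-hosted file as a Source
payload = json.dumps(source_entity("s3://my-bucket/wave3.sav", name="Wave 3"))
```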
Entity
/sources/{id}/
A Shoji Entity representing a single Source. Its “body” member contains:
- name: A friendly name for the Source.
- type: a string declaring the media type of the source. One of (“csv”, “spss”).
- user_id: the id of the User who created the Source.
- location: an absolute URI to the data. Currently, the only supported scheme is “crunchfile://”, which indicates a file uploaded to Crunch.io.
- settings: an object containing configuration for translating the source to crunch internals. Its members vary by type:
- csv:
- strict: an integer. If 1, extra columns or undefined category ids in the CSV will raise an error. If 0, they will be added to the dataset.
A PUT must contain a JSON object with members from the Shoji Entity “body” which the client intends to update. 204 indicates success.
A DELETE destroys the Source resource. 204 indicates success.
/sources/{id}/file/
A GET returns the original source file.
Tab books
/datasets/{dataset_id}/multitables/{multitable_id}/tabbook/
The default tabbook view of a multitable generates an Excel (.xlsx) workbook containing each variable in the dataset crosstabbed with the given multitable.
POST
A successful POST request to /datasets/{dataset_id}/multitables/{multitable_id}/tabbook/ starts an export job; the exporter writes the file to a download location when it finishes computing (which may take some time for large datasets). The server returns a 202 response indicating that the export job started, with a Location header giving where the final exported file will be available. The response body contains the progress URL where the state of the export job can be queried. Clients should note the download URL, monitor progress, and, when complete, GET the download location. See Progress for details.
Requesting the same job while it is still in progress returns the same 202 response pointing at the original progress resource. If the export is finished, the server will 302 redirect to the destination for download.
If the dataset’s attributes have changed, a new tab book will be generated regardless of the status of any other pending exports.
POST /api/datasets/a598c7/multitables/7ab1e/tabbook/ HTTP/1.1
HTTP/1.1 202 Accepted
Location: https://s3-url/filename.xlsx
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/a598c7/multitables/{id}/tabbook/",
"value": "https://app.crunch.io/api/progress/5be83a/"
}
Alternatively, you can request a JSON output for your tab book by adding an Accept request header.
POST /api/datasets/a598c7/multitables/7ab1e/tabbook/ HTTP/1.1
Accept: application/json
{
"meta": {
"dataset": {
"name": "weighted_simple_alltypes",
"notes": ""
},
"layout": "many_sheets",
"sheets": [
{
"display_settings": {
"countsOrPercents": {
"value": "percent"
},
"currentTab": {
"value": 0
},
"decimalPlaces": {
"value": 0
},
"percentageDirection": {
"value": "colPct"
},
"showEmpty": {
"value": false
},
"showNotes": {
"value": false
},
"slicesOrGroups": {
"value": "groups"
},
"valuesAreMeans": {
"value": false
},
"vizType": {
"value": "table"
}
},
"filters": null,
"name": "x",
"weight": "z"
},
... (one entry for each sheet)
],
"template": [
{
"query": [
{
"args": [
{
"variable": "000002"
}
],
"function": "bin"
}
]
},
{
"query": [
{
"args": [
{
"variable": "00000a"
},
{
"value": null
}
],
"function": "rollup"
}
]
}
]
},
"sheets": [
{
"result": [
{
"result": {
"counts": [
1,
1,
1,
1,
1,
1,
0
],
"dimensions": [
{
"derived": false,
"references": {
"alias": "x",
"description": "Numeric variable with value labels",
"name": "x"
},
"type": {
"categories": [
{
"id": 1,
"missing": false,
"name": "red",
"numeric_value": 1
},
{
"id": 2,
"missing": false,
"name": "green",
"numeric_value": 2
},
{
"id": 3,
"missing": false,
"name": "blue",
"numeric_value": 3
},
{
"id": 4,
"missing": false,
"name": "4",
"numeric_value": 4
},
{
"id": 8,
"missing": true,
"name": "8",
"numeric_value": 8
},
{
"id": 9,
"missing": false,
"name": "9",
"numeric_value": 9
},
{
"id": -1,
"missing": true,
"name": "No Data",
"numeric_value": null
}
],
"class": "categorical",
"ordinal": false
}
}
],
"measures": {
"count": {
"data": [
0.0,
0.0,
1.234,
0.0,
3.14159,
0.0,
0.0
],
"metadata": {
"derived": true,
"references": {},
"type": {
"class": "numeric",
"integer": false,
"missing_reasons": {
"No Data": -1
},
"missing_rules": {}
}
},
"n_missing": 5
}
},
"n": 6
}
},
{
"result": {
"counts": [
1,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
1,
1,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
],
"dimensions": [
{
"derived": false,
"references": {
"alias": "x",
"description": "Numeric variable with value labels",
"name": "x"
},
"type": {
"categories": [
{
"id": 1,
"missing": false,
"name": "red",
"numeric_value": 1
},
{
"id": 2,
"missing": false,
"name": "green",
"numeric_value": 2
},
{
"id": 3,
"missing": false,
"name": "blue",
"numeric_value": 3
},
{
"id": 4,
"missing": false,
"name": "4",
"numeric_value": 4
},
{
"id": 8,
"missing": true,
"name": "8",
"numeric_value": 8
},
{
"id": 9,
"missing": false,
"name": "9",
"numeric_value": 9
},
{
"id": -1,
"missing": true,
"name": "No Data",
"numeric_value": null
}
],
"class": "categorical",
"ordinal": false
}
},
{
"derived": true,
"references": {
"alias": "z",
"description": "Numeric variable with missing value range",
"name": "z"
},
"type": {
"class": "enum",
"elements": [
{
"id": -1,
"missing": true,
"value": {
"?": -1
}
},
{
"id": 1,
"missing": false,
"value": [
1.0,
1.5
]
},
{
"id": 2,
"missing": false,
"value": [
1.5,
2.0
]
},
{
"id": 3,
"missing": false,
"value": [
2.0,
2.5
]
},
{
"id": 4,
"missing": false,
"value": [
2.5,
3.0
]
},
{
"id": 5,
"missing": false,
"value": [
3.0,
3.5
]
}
],
"subtype": {
"class": "numeric",
"missing_reasons": {
"No Data": -1
},
"missing_rules": {}
}
}
}
],
"measures": {
"count": {
"data": [
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
1.234,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
3.14159,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0
],
"metadata": {
"derived": true,
"references": {},
"type": {
"class": "numeric",
"integer": false,
"missing_reasons": {
"No Data": -1
},
"missing_rules": {}
}
},
"n_missing": 5
}
},
"n": 6
}
},
{
"result": {
"counts": [
1,
0,
0,
1,
0,
0,
0,
1,
0,
0,
1,
0,
0,
0,
1,
0,
0,
1,
0,
0,
0
],
"dimensions": [
{
"derived": false,
"references": {
"alias": "x",
"description": "Numeric variable with value labels",
"name": "x"
},
"type": {
"categories": [
{
"id": 1,
"missing": false,
"name": "red",
"numeric_value": 1
},
{
"id": 2,
"missing": false,
"name": "green",
"numeric_value": 2
},
{
"id": 3,
"missing": false,
"name": "blue",
"numeric_value": 3
},
{
"id": 4,
"missing": false,
"name": "4",
"numeric_value": 4
},
{
"id": 8,
"missing": true,
"name": "8",
"numeric_value": 8
},
{
"id": 9,
"missing": false,
"name": "9",
"numeric_value": 9
},
{
"id": -1,
"missing": true,
"name": "No Data",
"numeric_value": null
}
],
"class": "categorical",
"ordinal": false
}
},
{
"derived": true,
"references": {
"alias": "date",
"description": null,
"name": "date"
},
"type": {
"class": "enum",
"elements": [
{
"id": 0,
"missing": false,
"value": "2014-11"
},
{
"id": 1,
"missing": false,
"value": "2014-12"
},
{
"id": 2,
"missing": false,
"value": "2015-01"
}
],
"subtype": {
"class": "datetime",
"missing_reasons": {
"No Data": -1
},
"missing_rules": {},
"resolution": "M"
}
}
}
],
"measures": {
"count": {
"data": [
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
1.234,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
3.14159,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0
],
"metadata": {
"derived": true,
"references": {},
"type": {
"class": "numeric",
"integer": false,
"missing_reasons": {
"No Data": -1
},
"missing_rules": {}
}
},
"n_missing": 5
}
},
"n": 6
}
}
]
},
... (one entry for each sheet)
]
}
POST body parameters
At the top level, the tab book endpoint can take filtering and variable limiting parameters.
Name | Type | Default | Description | Example |
---|---|---|---|---|
filter | object | null | Filter by Crunch Expression. Variables used in the filter should be referenced by fully qualified URLs. | [{"filter": "https://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/filters/5f14133582f34b8b85b408830f4b4a9b/"}] |
where | object | null | Crunch Expression signifying which variables to use | { "function": "select", "args": [ { "map": { "https://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/variables/000004/": { "variable": "https://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/variables/000004/" }, "https://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/variables/000003/": { "variable": "https://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/variables/000003/" } } } ] } |
options | object | {} | further options defining the tab book output (see below) | |
weight | url | null | Weight to use for the tab book generation. If the weight is omitted from the request, the currently selected weight is used; if null is provided, the tab book is generated unweighted. | "http://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/variables/5f14133582f34b8b85b408830f4b4a9b/" |
Options
Options for generating tab books
Name | Type | Default | Description | Example |
---|---|---|---|---|
display_settings | object | {} | a set of settings to define how the output should be displayed | See Below. |
layout | string | many_sheets | “many_sheets” indicates each variable should have its own Sheet in the xls spreadsheet. “single_sheet” indicates all output should be in the same sheet. | single_sheet |
Display Settings
Further tab book viewing options.
Name | Type | Default | Description | Example |
---|---|---|---|---|
decimalPlaces | object | 0 | number of decimal places to display | {"value": 0} |
vizType | object | table | visualization type | {"value": "table"} |
countsOrPercents | object | percent | use counts or percents | {"value": "percent"} |
percentageDirection | object | colPct | row- or column-based percents | {"value": "colPct"} |
showNotes | object | false | display variable notes in the sheet header | {"value": false} |
slicesOrGroups | object | groups | slices or groups | {"value": "groups"} |
valuesAreMeans | object | false | whether values are means | {"value": false} |
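Putting these pieces together, a complete POST body might be assembled as follows; the filter URL is illustrative, and the settings mirror the tables above:

```python
import json

tabbook_body = {
    # Illustrative filter entity URL
    "filter": [{"filter": "https://app.crunch.io/api/datasets/a598c7/filters/5f1413/"}],
    "weight": None,  # explicit null => generate the tab book unweighted
    "options": {
        "layout": "single_sheet",  # all output in one sheet
        "display_settings": {
            "countsOrPercents": {"value": "percent"},
            "percentageDirection": {"value": "colPct"},
            "decimalPlaces": {"value": 1},
            "vizType": {"value": "table"},
        },
    },
}
payload = json.dumps(tabbook_body)
```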
Table
All datasets contain a /table/ endpoint which allows access to the full data values. It provides granular control over the rows and columns returned for each dataset.
Fetching values
GET
Dataset editors can GET this resource to obtain a Shoji Table of the dataset’s data. It exposes all the variables visible to the authenticated user (public variables, plus their personal variables if requested) with the exclusion filter applied (if any).
To include personal variables in the output table, the client should include the include_personal GET parameter on the request with a True value.
A metadata section contains the definitions of all the variables, matched by variable ID with the corresponding entry under data.
Dataset viewers can only access the metadata portion of the response. This means they cannot use the limit and offset parameters to query data unless the dataset’s viewers_can_export setting is True; otherwise the server will respond with a 403.
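A sketch of constructing a paged request URL for this endpoint; the dataset id is illustrative:

```python
from urllib.parse import urlencode

params = {
    "limit": 100,                # number of rows to return
    "offset": 200,               # skip the first 200 rows
    "include_personal": "True",  # also expose the requester's personal variables
}
url = "https://app.crunch.io/api/datasets/a598c7/table/?" + urlencode(params)
```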
GET /datasets/:id/table/ HTTP/1.1
{
"self": "https:\/\/alpha.crunch.io\/api\/datasets\/:id\/table\/",
"element": "crunch:table",
"data": {
"000007": [ 1, 1, 2 ],
"000004": [ 1, 1, 1 ],
"000005": [ 1, 0, 1 ],
"000003": [ "red", "green", "MORE JUNK" ],
"000000": [ 1, 2, 9 ],
"000001": [ "2000-01-01T00:00:00", "2000-01-02T00:00:00", { "?": -1 } ],
"000008": [ 1, 2, 3 ],
"000009": [ 2, 3, 4 ],
"00000c": [ [ 1, 1, 2 ], [ 1, 2, 3 ], [ 2, 3, 4 ] ]
},
"description": "A Crunch Table of data for this dataset.",
"metadata": {
"000004": {
"alias": "bool1",
"type": "categorical",
"name": "mymrset | Response #1",
"categories": [
{ "numeric_value": 1, "selected": true, "id": 1, "name": "1", "missing": false },
{ "numeric_value": 0, "id": 0, "name": "0", "missing": false },
{ "numeric_value": null, "id": -1, "name": "No Data", "missing": true }
],
"description": "bool1"
},
"000005": {
"alias": "bool2",
"type": "categorical",
"name": "mymrset | Response #2",
"categories": [
{ "numeric_value": 1, "selected": true, "id": 1, "name": "1", "missing": false },
{ "numeric_value": 0, "id": 0, "name": "0", "missing": false },
{ "numeric_value": null, "id": -1, "name": "No Data", "missing": true }
],
"description": "bool2"
},
"000003": {
"alias": "str",
"type": "text",
"name": "str",
"missing_reasons": { "No Data": -1 },
"description": "40 character string"
},
"000000": {
"alias": "x",
"type": "categorical",
"name": "x",
"categories": [
{ "numeric_value": 1, "id": 1, "name": "red", "missing": false },
{ "numeric_value": 2, "id": 2, "name": "green", "missing": false },
{ "numeric_value": 3, "id": 3, "name": "blue", "missing": false },
{ "numeric_value": 4, "id": 4, "name": "4", "missing": false },
{ "numeric_value": 8, "id": 8, "name": "8", "missing": true },
{ "numeric_value": 9, "id": 9, "name": "9", "missing": false },
{ "numeric_value": null, "id": -1, "name": "No Data", "missing": true }
],
"description": "Numeric variable with value labels"
},
"000001": {
"name": "y",
"type": "datetime",
"missing_reasons": { "No Data": -1 },
"alias": "y",
"resolution": "s",
"description": "Date variable"
},
"00000c": {
"alias": "categorical_array",
"type": "categorical_array",
"name": "categorical_array",
"subvariables": ["000007", "000008", "000009"],
"subreferences": {
"000009": {"alias": "ca_subvar_1", "name": "ca_subvar_1", "description": ""},
"000007": {"alias": "ca_subvar_2", "name": "ca_subvar_2", "description": ""},
"000008": {"alias": "ca_subvar_3", "name": "ca_subvar_3", "description": ""}
},
"categories": [
{ "numeric_value": null, "selected": false, "id": 1, "missing": false, "name": "a" },
{ "numeric_value": null, "selected": false, "id": 2, "missing": false, "name": "b" },
{ "numeric_value": null, "selected": false, "id": 3, "missing": false, "name": "c" },
{ "numeric_value": null, "selected": false, "id": 4, "missing": false, "name": "d" },
{ "numeric_value": null, "selected": false, "id": -1, "missing": true, "name": "No Data" }
],
"description": ""
}
}
}
Filtering
This endpoint accepts the same filter parameters described under Filtering Endpoints
Teams
Teams contain references to users and datasets. By sharing a dataset with a team, you can grant access to a set of users at once, and by adding a user to a team, you can grant them access to a set of datasets.
Catalog
/teams/
GET
GET /teams/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type: application/json
// Example team catalog:
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/teams/",
"description": "List of all the teams where the current user is member",
"index": {
"https://app.crunch.io/api/teams/d07edb/": {
"name": "The A-Team",
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/teams/67fe89/": {
"name": "Palo Alto Data Science",
"permissions": {
"team_admin": false
}
}
}
}
teams <- getTeams()
names(teams)
## [1] "The A-Team" "Palo Alto Data Science"
POST
To create a new team, POST a Shoji Entity with a team “name” in the body. No other attributes are required, and you will be automatically assigned as a “team_admin”.
POST /teams/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
...
{
"element": "shoji:entity",
"body": {
"name": "My new team with ytpo"
}
}
--------
201 Created
Location: /teams/03df2a/
# Create a new team by assigning into the teams catalog
teams[["My new team with ytpo"]] <- list()
names(teams) # Let's see that it was created
## [1] "The A-Team" "Palo Alto Data Science"
## [3] "My new team with ytpo"
# You can also assign members to the team when you create it,
# even though the POST /teams/ API itself does not support it
# (the R package handles this for you).
teams[["New team with members"]] <- list(members="fake.user@example.com")
Entity
/teams/{team_id}/
GET
GET /teams/d07edb/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type: application/json
// Example team entity
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/teams/d07edb/",
"description": "Details for a specific team",
"body": {
"creator": "https://app.crunch.io/api/users/41c69d/",
"id": "d07edb",
"name": "The A-Team"
},
"catalogs": {
"datasets": "https://app.crunch.io/api/teams/d07edb/datasets/",
"members": "https://app.crunch.io/api/teams/d07edb/members/"
}
}
# Access a team by name using $ or [[ from the team catalog
a.team <- teams[["The A-Team"]]
name(a.team)
## [1] "The A-Team"
self(a.team)
## [1] "https://app.crunch.io/api/teams/d07edb/"
A GET request on a team entity URL returns the same “name”, “id” and “creator” attributes as shown in the team catalog, as well as references to the “datasets” and “members” catalogs corresponding to the team. Authorization is required: if the requesting user is not a member of the team, a 404 response will result.
PATCH
Team names are editable by PATCHing the team entity. Authorization is required: only team members with “team_admin” permission may edit the team’s name; other team members will receive a 403 response on PATCH.
PATCH /teams/03df2a/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"element": "shoji:entity",
"body": {
"name": "My new team without typo"
}
}
--------
204 No Content
name(teams[["My new team with ytpo"]]) <- "My new team without typo"
names(teams) # Check that it was updated
## [1] "The A-Team" "Palo Alto Data Science"
## [3] "My new team without typo"
Team members catalog
/teams/{team_id}/members/
The team members catalog is a Shoji Catalog similar in nature to the dataset permissions catalog. It collects references to users and defines the authorizations they have with respect to the team. All information about the member relationships is contained in the catalog (there are no “member entities”), and all changes to team membership, whether adding, modifying, or removing users, are made via PATCH.
GET
GET /teams/d07edb/members/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type: application/json
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/teams/d07edb/members/",
"description": "Catalog of users that belong to this team",
"index": {
"https://app.crunch.io/api/users/47193a/": {
"name": "B. A. Baracus",
"permissions": {
"team_admin": false
}
},
"https://app.crunch.io/api/users/41c69d/": {
"name": "Hannibal",
"permissions": {
"team_admin": true
}
}
}
}
members(team)
Tuple values include:
Name | Type | Description |
---|---|---|
name | string | Display name of the user |
permissions | object | Attributes governing the user’s authorization on the team |
Supported permissions, all boolean, include:
- team_admin: allows adding/removing and managing the members and permissions of a team, as well as modifying and deleting the team in question. Defaults to false.
PATCH
Authorization is required: team members who do not have the “team_admin” permission and who attempt to PATCH the member catalog will receive a 403 response. As with the team entity, non-members will receive 404 on attempted PATCH.
PATCH a partial Shoji Catalog to add users to the team, modify permissions of existing members, or remove team members. The examples below illustrate each of these actions separately, but all can in fact be done together in a single PATCH request.
In the “index” attribute of the catalog, object keys must be either (a) URLs of User entities or (b) email addresses; the two can be mixed in a single PATCH request. Using an email address allows you to invite a user to Crunch while adding them to the team if they do not yet have a Crunch account, but it is also a valid reference to an existing User.
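These rules can be captured in a small payload-building helper; this is a sketch, and the member_patch name is hypothetical, not part of any Crunch library:

```python
import json

def member_patch(additions=None, removals=None, send_notification=False,
                 url_base=None):
    """Build a shoji:catalog PATCH body for a team members catalog.

    additions maps user URLs or email addresses to permission objects;
    removals is an iterable of user URLs to drop (encoded as null).
    """
    index = dict(additions or {})
    for url in removals or ():
        index[url] = None
    body = {"element": "shoji:catalog", "index": index}
    if send_notification:
        body["send_notification"] = True
        if url_base:
            body["url_base"] = url_base
    return json.dumps(body)

payload = member_patch(
    additions={"templeton.peck@army.gov": {"permissions": {"team_admin": True}}},
    send_notification=True,
    url_base="https://app.crunch.io/password/change/${token}/",
)
```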
Add and modify members
PATCH /teams/d07edb/members/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/users/47193a/": {
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/users/e3211a/": {},
"templeton.peck@army.gov": {
"permissions": {
"team_admin": true
}
}
},
"send_notification": true,
"url_base": "https://app.crunch.io/password/change/${token}/"
}
--------
204 No Content
If the index object keys correspond to users that already appear in the member catalog, their permissions will be updated with the corresponding value. In this example, user 47193a, B. A. Baracus, has been given the team_admin permission.
If the index object keys do not correspond to users already found in the member catalog, the indicated users will be added to the team. And, if the indicated user, as specified by email address, does not yet exist, they will be invited to Crunch and added to the team. In this example, we added existing user e3211a, implicitly with team_admin set to false, to the team, and we also added “templeton.peck@army.gov”, who did not previously have a Crunch account.
If “send_notification” was included and true in the request, new-to-Crunch users will receive a notification email informing them that they have been invited to Crunch. New users, unless they have an OAuth provider specified, will need to set a password, and the client application should send a URL template that directs them to a place where they can set it. To do so, include a “url_base” attribute in the payload: a URL template with a ${token} variable into which the server will insert the password-setting token. For the Crunch web application, this template is https://app.crunch.io/password/change/${token}/.
A GET on the members catalog shows the updated catalog.
GET /teams/d07edb/members/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type: application/json
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/teams/d07edb/members/",
"description": "Catalog of users that belong to this team",
"index": {
"https://app.crunch.io/api/users/47193a/": {
"name": "B. A. Baracus",
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/users/41c69d/": {
"name": "Hannibal",
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/users/e3211a/": {
"name": "Howling Mad Murdock",
"permissions": {
"team_admin": false
}
},
"https://app.crunch.io/api/users/89eb3a/": {
"name": "templeton.peck@army.gov",
"permissions": {
"team_admin": true
}
}
}
}
Removing members
To remove members from the team, PATCH the catalog with a null value:
PATCH /teams/d07edb/members/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/users/e3211a/": null
}
}
--------
204 No Content
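The removal payload is simple enough to build programmatically. The sketch below uses a hypothetical helper name and the user URL from the example above.

```python
def member_removal_payload(user_urls):
    """Build a shoji:catalog PATCH body that removes each given user
    from a team by mapping their URL to a null (None) tuple."""
    return {
        "element": "shoji:catalog",
        "index": {url: None for url in user_urls},
    }

payload = member_removal_payload(["https://app.crunch.io/api/users/e3211a/"])
```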
GET /teams/d07edb/members/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type: application/json
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/teams/d07edb/members/",
"description": "Catalog of users that belong to this team",
"index": {
"https://app.crunch.io/api/users/47193a/": {
"name": "B. A. Baracus",
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/users/41c69d/": {
"name": "Hannibal",
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/users/89eb3a/": {
"name": "templeton.peck@army.gov",
"permissions": {
"team_admin": false
}
}
}
}
Team datasets catalog
/teams/{team_id}/datasets/
The team datasets catalog only supports the GET verb. To add a dataset to a team, you must PATCH its permissions catalog.
GET
GET returns a Shoji Catalog of datasets that have been shared with this team. See datasets for details.
Users
Catalog
/users/{?email,id}
A successful GET on this resource returns a Shoji Catalog whose “index” URLs refer to User objects. If the “email” or “id” parameters are provided, the result is narrowed to Users matching those parameters.
This endpoint only supports GET requests. New users must be added through each account’s users catalog, which ensures that they belong to an account and receive an invitation accordingly.
Entity
/users/{id}/{?reason_url}
A Shoji Entity with the following body members:
- name
- id
- id_method (optional)
- id_provider (optional, and only if id_method == ‘oauth’)
The id_method member can be one of {"oauth", "pwhash"}. If not present, "pwhash" is assumed.
The authenticated user can only access another user’s entity endpoint if any of the following are true:
- Both belong to the same account
- They are both members of a common team
- The authenticated user is an account admin and the viewed user is a collaborator on that account
Users can PUT new attributes to their own entity, as can any user with the “alter_users” account permission, via a JSON request body. A 200 indicates success.
Send invitation email
/users/{id}/invite/
A POST to this resource sends an invitation from the current user to the identified User. A 204 indicates success. The current user must have “can_alter_users” account permission or 403 is returned instead.
If a “url_base” parameter is included in the request body, it will be used to form links inside the invitation.
Change password
/users/{id}/password/
A POST on this resource must consist of a JSON object with the members “old_pw” and “new_pw”. A 204 indicates success, a 400 indicates failure.
Reset user’s password
/users/{id}/password_reset/
A GET on this resource always returns 204. A POST will send a reset password notification to the identified user. A 204 indicates success.
If a “url_base” parameter is included in the request body, it will be used to form links inside the notification.
Change user’s email
/users/{id}/change_email/
A POST on this resource must consist of a JSON object with the members “pw” and “email”. A 204 indicates that the request to change the user’s email address to the newly provided one was accepted. The user should check their email and verify that they own the address in question.
If the password does not match the user’s current password, a 400 Bad Request is returned. If the user authenticates via OAuth, the email address may not be changed (409 Conflict).
If the user ID does not match the currently signed-in user, a 403 Forbidden is returned.
Expropriate a user
An account admin can expropriate a user from the same account. This will change ownership of all of the affected user’s teams, projects, and datasets to a new owner.
The new owner must also be part of the same account and should have create_datasets permissions set to true.
POST /users/{id}/expropriate/
{
"element": "shoji:entity",
"body": {
"owner": "http://app.crunch.io/api/users/123abc/"
}
}
The new owner provided can be a user URL or a user email.
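The expropriation body can be built with a small helper. This is a hypothetical client-side sketch, not part of the API itself; per the text above, the owner may be a user URL or an email address.

```python
def expropriate_payload(new_owner):
    """Build the shoji:entity body for POST /users/{id}/expropriate/.
    `new_owner` may be a user URL or a user email address."""
    return {"element": "shoji:entity", "body": {"owner": new_owner}}

body = expropriate_payload("http://app.crunch.io/api/users/123abc/")
```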
User Datasets
/account/users/{id}/datasets/
This URL is only accessible and available to account admins.
This Shoji catalog lists all the datasets that are owned by this user.
User Visible datasets
/users/{id}/visible_datasets/
This endpoint is only available and accessible to account admins.
Returns a Shoji catalog listing all the datasets (archived or not) that the given user has access to, whether via direct share, team access, or project membership.
{
"https://app.crunch.io/api/datasets/wsx345/": {
"name": "survey data",
"last_access_time": "2017-02-25",
"access_type": {
"teams": ["https://app.crunch.io/api/teams/abx/"],
"project": "https://app.crunch.io/api/projects/qwe/",
"direct": true
},
"permissions": {
"edit": true,
"view": true,
"change_permissions": true
}
},
"https://app.crunch.io/api/datasets/a2c4b2/": {
"name": "responses dataset",
"last_access_time": "2016-11-09",
"access_type": {
"teams": [],
"project": null,
"direct": true
},
"permissions": {
"edit": false,
"view": true,
"change_permissions": false
}
}
}
The tuples describe the type of access the user has to each dataset via the access_type attribute. It includes:
- The list of teams that provide access to this dataset
- The project that provides access to this dataset, or null
- Whether the user has a direct share to this dataset
The permissions attribute indicates the final coalesced permissions this user enjoys on the given dataset.
Variables
Catalog
/datasets/{id}/variables/{?relative}
A Shoji Catalog of variables.
GET catalog
When authenticated and authorized to view the given dataset, GET returns 200 status with a Shoji Catalog of variables in the dataset. If authorization is lacking, response will instead be 404.
Array subvariables are not included in the index of this catalog. Their metadata are instead accessible in each array variable’s “subvariables_catalog”.
Private variables are not included in the index of this catalog, although
entities may be present at variables/{id}/
. See Private Variables for an
index of those.
Catalog tuples contain the following keys:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
alias | string | More machine-friendly, traditional name for a variable |
description | string | Optional longer string |
id | string | Immutable internal identifier |
notes | string | Optional annotations for a variable |
discarded | boolean | Whether the variable should be hidden from most views; default: false |
derived | boolean | Whether the variable is a function of another; default: false |
type | string | The string type name, one of “numeric”, “text”, “categorical”, “datetime”, “categorical_array”, or “multiple_response” |
subvariables | array of URLs | For arrays, array of (ordered) references to subvariables |
subvariables_catalog | URL | For arrays, link to a Shoji Catalog of subvariables |
resolution | string | Present in datetime variables; current resolution of data |
rollup_resolution | string | Present in datetime variables; resolution used for rolled up summaries |
geodata | URL | Present only in variables that have geodata associated; points to the catalog of geodata related to this variable |
uniform_basis | boolean | Whether each subvariable should be considered the same length as the total array. Only on multiple_response |
The catalog has two optional query parameters:
Name | Type | Description |
---|---|---|
relative | string | If “on”, all URLs in the “index” will be relative to the catalog’s “self” |
With the relative flag enabled, the variable catalog looks something like this:
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/5ee0a0/variables/",
"orders": {
"hier": "https://app.crunch.io/api/datasets/5ee0a0/variables/hier/",
"personal": "https://app.crunch.io/api/datasets/5ee0a0/variables/personal/",
"weights": "https://app.crunch.io/api/datasets/5ee0a0/variables/weights/"
},
"specification": "https://app.crunch.io/api/specifications/variables/",
"description": "List of Variables of this dataset",
"index": {
"a77d9f/": {
"name": "Birth Year",
"derived": false,
"discarded": false,
"alias": "birthyear",
"type": "numeric",
"id": "a77d9f",
"notes": "",
"description": "In what year were you born?"
},
"9e4c84/": {
"name": "Comments",
"derived": false,
"discarded": false,
"alias": "qccomments",
"type": "text",
"id": "9e4c84",
"notes": "Global notes about this variable.",
"description": "Do you have any comments on your experience of taking this survey (optional)?"
},
"aad4ad/": {
"subvariables_catalog": "aad4ad/subvariables/",
"name": "An Array",
"derived": true,
"discarded": false,
"alias": "arrayvar",
"subvariables": [
"439dcf/",
"1c99ea/"
],
"notes": "All variable types can have notes",
"type": "categorical_array",
"id": "aad4ad",
"description": ""
}
}
}
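A client consuming a relative catalog must resolve the index keys against the catalog’s “self” URL; a standard URL join does exactly this. The sketch below uses values from the example catalog above.

```python
from urllib.parse import urljoin

# "self" URL of the variables catalog, and a relative index key from it
self_url = "https://app.crunch.io/api/datasets/5ee0a0/variables/"
relative_key = "a77d9f/"

# Resolve the relative key to the variable entity's absolute URL
absolute = urljoin(self_url, relative_key)
print(absolute)  # https://app.crunch.io/api/datasets/5ee0a0/variables/a77d9f/
```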
PATCH catalog
Use PATCH to edit the “name”, “description”, “alias”, or “discarded” state of one or more variables. A successful request returns a 204 response. The attributes changed will be seen by all users with access to this dataset; i.e., names, descriptions, aliases, and discarded state are not merely attributes of your view of the data but of the datasets themselves.
Authorization is required: you must have “edit” privileges on the dataset being modified, as shown in the “permissions” object in the dataset’s catalog tuple. If you try to PATCH and are not authorized, you will receive a 403 response and no changes will be made.
The tuple attributes other than “name”, “description”, “alias”, and “discarded” cannot be modified here by PATCH. Attempting to modify other attributes, or including new attributes, will return a 400 response. Variable “type” can only be modified by the “cast” method, described below. The “subvariables” can be modified by PATCH on the variable entity. “subvariables_catalog” is a URL to a different variable catalog and is thus not editable, though you can navigate to its location and modify subvariable attributes there. A variable’s “id” and its “derived” state are immutable.
When PATCHing, you may include only the keys in each tuple that are being modified, or you may send the complete tuple. As long as the keys that cannot be modified via PATCH here are not modified, the request will succeed.
Note that, because this catalog contains its entities (rather than collecting them), you cannot PATCH to add new variables, nor can you PATCH a null tuple to delete them. Attempting either will return a 400 response. Creating variables is allowed only by POST to the catalog, while deleting variables is accomplished via a DELETE on the variable entity.
{
"element": "shoji:catalog",
"index": {
"9e4c84/": {
"discarded": true
}
}
}
PATCHing this payload on the above catalog will return a 204 status. A subsequent GET of the catalog returns the following response; note that “discarded” is now true for variable 9e4c84.
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/5ee0a0/variables/",
"orders": {
"hier": "https://app.crunch.io/api/datasets/5ee0a0/variables/hier/",
"personal": "https://app.crunch.io/api/datasets/5ee0a0/variables/personal/",
"weights": "https://app.crunch.io/api/datasets/5ee0a0/variables/weights/"
},
"specification": "https://app.crunch.io/api/specifications/variables/",
"description": "List of Variables of this dataset",
"index": {
"a77d9f/": {
"name": "Birth Year",
"derived": false,
"discarded": false,
"alias": "birthyear",
"type": "numeric",
"id": "a77d9f",
"notes": "",
"description": "In what year were you born?"
},
"9e4c84/": {
"name": "Comments",
"derived": false,
"discarded": true,
"alias": "qccomments",
"type": "text",
"id": "9e4c84",
"notes": "Global notes about this variable.",
"description": "Do you have any comments on your experience of taking this survey (optional)?"
},
"aad4ad/": {
"subvariables_catalog": "aad4ad/subvariables/",
"name": "An Array",
"derived": true,
"discarded": false,
"alias": "arrayvar",
"subvariables": [
"439dcf/",
"1c99ea/"
],
"notes": "All variable types can have notes",
"type": "categorical_array",
"id": "aad4ad",
"description": ""
}
}
}
POST catalog
A POST to this resource must be a Shoji Entity with the following “body” attributes:
- name
- type
- If “type” is “categorical”, “multiple_response”, or “categorical_array”: categories: an array of category definitions
- If “type” is “multiple_response” or “categorical_array”: subvariables: an array of URLs of variables to be “bound” together to form the array variable
- If “type” is “multiple_response” or “categorical_array”: subreferences: an object keyed by each of the subvariable URLs where each value contains partial variable definitions, which will be created as categorical subvariables of the array. If included, the array definition must include “categories”, which are shared among the subvariables.
- If type is “multiple_response”, the definition may include selected_categories: an array of category names present in the subvariables. This will mark the specified category or categories as the “selected” response in the multiple response variable. If no “selected_categories” array is provided, the new variable will use any categories already flagged as “selected”: true. If no such category exists, the response will return a 400 status.
- If “type” is “datetime”: resolution: a string, such as “Y”, “Q”, “M”, “W”, “D”, “h”, “m”, “s”, “ms”, that indicates the unit size of the datetime data.
See Variable Definitions for more details and examples of valid attributes, and Feature Guide: Arrays for more information on the various cases for creating array variables.
It is encouraged, but not required, to include an “alias” in the body. If omitted, one will be generated from the required “name”.
You may also include “values”, which will create the column of data corresponding to this variable definition. See Importing Data: Column-by-column for details and examples.
You may instead include a “derivation” to derive a variable as a function of other variables. In this case, “type” is not required because it depends on the output of the specified derivation function. For details and examples, see Deriving Variables.
A 201 indicates success and includes the URL of the newly-created variable in the Location header.
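A minimal categorical definition can be assembled as follows. This is a sketch with illustrative names; the helper and its tuple format are not part of the API, only the resulting body shape is described above.

```python
def categorical_variable_body(name, categories, alias=None):
    """Sketch of a POST body for the variables catalog.
    `categories` is a list of (id, name, missing) tuples; "numeric_value"
    is left null here. "alias" is optional; the server generates one from
    the required "name" if omitted."""
    body = {
        "name": name,
        "type": "categorical",
        "categories": [
            {"id": cid, "name": cname, "numeric_value": None, "missing": miss}
            for cid, cname, miss in categories
        ],
    }
    if alias is not None:
        body["alias"] = alias
    return {"element": "shoji:entity", "body": body}

definition = categorical_variable_body(
    "Favorite Color",
    [(1, "Red", False), (2, "Blue", False), (-1, "No Data", True)],
    alias="fav_color",
)
```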
Private variables catalog
/datasets/{id}/variables/private/{?relative}
GET
GET returns a Shoji Catalog of variables, as described above, containing those variables that are private to the authenticated user. You may PATCH this catalog to edit names, aliases, descriptions, etc. of the private variables. POST, however, is not supported at this endpoint. To create new private variables, POST to the main variables catalog with a "private": true body attribute.
Hierarchical Order
/datasets/{id}/variables/hier/
Dataset global order containing references to all public variables.
GET
Returns a Shoji Order.
PATCH
Expects a Shoji Order representation containing replacement or new grouped entities. This allows one to create new groups on the fly or overwrite existing groups with new “entities”.
Groups are matched by name; the entities of each matched group are overwritten with the received values.
After a PATCH, any variable not present in the order will always be appended to the root of the graph.
PUT
Receives a Shoji Order representation with a completely new graph. Any previously existing group will be eliminated and any new groups will be added; this overwrites the complete set of current groups.
After a PUT, any variable not present in any of the groups will always be appended to the root of the graph.
Personal Variable Order
/datasets/{id}/variables/personal/
Unlike the hierarchical order, the personal variable order returns different content per user. Each user can add variable references to it, including personal variables, and these changes are not shared with other users.
The personal variable order defaults to an empty Shoji order until each user makes changes to it.
The allowed variables on this order are:
- Any public variable available on the variable catalog
- Any personal variable or subvariable for the authenticated user
- Any subvariable of an array variable on the variable catalog
GET
Returns a Shoji Order for this user.
PATCH
Same as hierarchical order, receives a Shoji Order representation to overwrite the existing order. Personal variables are allowed here.
PUT
Behaves the same as PATCH.
Weights
/datasets/{id}/variables/weights/
GET
GET a shoji:order that contains the URLs of the variables that have been designated as possible weight variables.
PATCH
PATCH the graph with the desired list of weight variables. The list will always be overwritten with the new values. This order can only be a flat list of URLs; any nesting will be rejected with a 400 response.
If the dataset has a default weight variable configured, it will always be present on the response even if it wasn’t included on a PATCH request.
Removing variables from this list will have the side effect of changing any user’s preference that had such variables set as their weight to the current dataset’s default weight.
Only numeric variables are allowed to be used as weights. If a variable of another type is included in the list, the server will abort and return a 409 response.
{
"graph": ["https://app.crunch.io/api/datasets/42d0a3/variables/42229f"]
}
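Because nesting is rejected with a 400, a client can validate the graph before sending it. This is a hypothetical client-side helper, not part of the API.

```python
def weights_graph(urls):
    """Build the PATCH body for the weights order. Raises if any entry
    is not a plain URL string, since the server rejects nested orders
    with a 400 response."""
    if not all(isinstance(u, str) for u in urls):
        raise ValueError("weights graph must be a flat list of variable URLs")
    return {"graph": list(urls)}

body = weights_graph(["https://app.crunch.io/api/datasets/42d0a3/variables/42229f"])
```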
PUT
Behaves the same as PATCH.
Entity
/datasets/{id}/variables/{id}/
A Shoji Entity which exposes most of the metadata about a Variable in the dataset.
GET
Variable entities’ body
attributes contain the following:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
alias | string | More machine-friendly, traditional name for a variable |
description | string | Optional longer string |
id | string | Immutable internal identifier |
notes | string | Optional annotations for the variable |
discarded | boolean | Whether the variable should be hidden from most views; default: false |
private | boolean | If true, the variable is only visible to the owner and is only included in the private variables catalog, not the common catalog |
owner | url | If the variable is private it will point to the url of its owner; null for non private variables |
derived | boolean | Whether the variable is a function of another; default: false |
type | string | The string type name |
categories | array | If “type” is “categorical”, “multiple_response”, or “categorical_array”, an array of category definitions (see below). Other types have an empty array |
subvariables | array of URLs | For array variables, an ordered array of subvariable ids |
subreferences | object of objects | For array variables, an object of {“name”: …, “alias”: …, …} objects keyed by subvariable url |
resolution | string | For datetime variables, a string, such as “Y”, “M”, “D”, “h”, “m”, “s”, “ms”, that indicates the unit size of the datetime data. |
derivation | object | For derived variables, a Crunch expression which was used to derive this variable; or null |
format | object | An object with various members to control the display of Variable data (see below) |
view | object | An object with various members to control the display of Variable data (see below) |
dataset_id | string | The id of the Dataset to which this Variable belongs |
missing_reasons | object | An object whose keys are reason phrases and whose values are missing codes; missing entries in Variable data are represented by a {“?”: code} missing marker; clients may look up the corresponding reason phrase for each code in this one-to-one map |
Category objects have the following members:
Name | Type | Description |
---|---|---|
id | integer | identifier for the category, corresponding to values in the column of data |
name | string | A unique label identifying the category |
numeric_value | numeric | A quantity assigned to this category for numeric aggregation. May be null. |
missing | boolean | If true, the given category is marked as “missing”, and is omitted from most calculations. |
selected | boolean | For categories in multiple response variables, those with "selected": true are the categories whose values correspond to the “response” being selected. If omitted, the category is treated as not selected. Multiple response variables must have at least one category marked as selected and may have more than one. |
Format objects may contain:
Name | Type | Description |
---|---|---|
data | object | An object with an integer “digits” member, stating how many digits to display after the decimal point when showing data values |
summary | object | An object with an integer “digits” member, stating how many digits to display after the decimal point when showing aggregates values |
View objects may contain:
Name | Type | Description |
---|---|---|
show_codes | boolean | For categorical types only; if true, numeric values are shown |
show_counts | boolean | If true, show counts; if false, show percents |
include_missing | boolean | For categorical types only; if true, include missing categories |
include_noneoftheabove | boolean | For multiple response types only; if true, display a “none of the above” category in the requested summary or analysis |
rollup_resolution | string | For datetime variables, a unit to which data should be “rolled up” by default. See “resolution” above. |
PATCH
PATCH variable entities to edit their metadata. Send a Shoji Entity with a “body” member containing the attributes to modify. Omitted body attributes will be unchanged.
Successful requests return 204 status. Among the actions achievable by PATCHing variable entities:
- Editing category attributes and adding categories. Include all categories.
- Remove categories by sending all categories except for the ones you wish to remove. You can only remove categories that don’t have any corresponding data values. Attempting to remove categories that have data associated will fail with a 400 response status.
- Reordering or removing subvariables in an array. Unlike categories, subvariables cannot be added via PATCH here.
- Editing derivation expressions
- Editing format and view settings
- Changing a datetime variable’s resolution
Actions that are best or only achieved elsewhere include:
- changing variable names, aliases, and descriptions, which is best accomplished by PATCHing the variable catalog, as described above;
- changing a variable’s type, which can only be done by POSTing to the variable’s “cast” resource (see Convert type below);
- editing names, aliases, and descriptions of subvariables in an array, which is done by PATCHing the array’s subvariable catalog;
- altering missing rules.
Variable “id” and “dataset_id” are immutable.
Example:
{
"subvariables": [
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0008/",
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0007/",
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0009/"
],
"subreferences": {
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0008/": {
"alias": "subvar_2",
"name": "v2_new_name",
"description": null
},
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0007/": {
"alias": "subvar_1_new_name",
"name": "v1_new_name",
"description": null
},
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0009/": {
"alias": "subvar_3",
"name": "subvar_3",
"description": "new description"
}
}
}
POST
Calling POST on an array resource will “unbind” the variable. On success, POST
returns 200 status with a Shoji View, containing the URLs of the
(formerly sub-)variables, which are promoted to regular variables.
DELETE
Calling DELETE on this resource will delete the variable. On success, DELETE
returns 200 status with an empty Shoji View. Deleting an array deletes all its
subvariable data as well.
Summary
/datasets/{id}/variables/{id}/summary/{?filter}
A collection of summary information describing the variable. A successful GET returns an object containing various scalars and tabular results in various formats. The set of included members varies by variable type. Exclusions, filters, and weights may all alter the output.
For example, given a numeric variable with data [1, 2, 3, 4, 5, 4, {“?”: -1}, 3, 5, {“?”: -1}, 4, 3], a successful GET with no exclusions, filters, or weights returns:
{
"count": 12,
"valid_count": 10,
"fivenum": [
["0", 1.0],
["0.25", 3.0],
["0.5", 3.5],
["0.75", 4.0],
["1", 5.0]
],
"missing_count": 2,
"min": 1.0,
"median": 3.5,
"histogram": [
{"at": 1.5, "bins": [1.0, 2.0], "value": 1},
{"at": 2.5, "bins": [2.0, 3.0], "value": 1},
{"at": 3.5, "bins": [3.0, 4.0], "value": 3},
{"at": 4.5, "bins": [4.0, 5.0], "value": 5}
],
"stddev": 1.2649110640673518,
"max": 5.0,
"mean": 3.4,
"missing_frequencies": [{"count": 2, "value": "No Data"}]
}
numeric
The members include several counts:
- count: The number of entries in the variable.
- valid_count: The number of entries in the variable which are not missing.
- missing_count: The number of entries in the variable which are missing.
- missing_frequencies: An array of row objects. Each row represents a distinct missing reason, and includes the reason phrase as the “value” member and the number of entries which are missing for that reason as the “count” member.
- histogram: An array of row objects. Each row represents a discrete interval in the probability distribution, whose boundaries are given by the “bins” pair. An “at” member is included giving the midpoint between the two boundaries. The “value” member gives a count of entries which fall into the given bin.
The members also include basic summary statistics:
- fivenum: An array of five [quartile, point] pairs, where the “quartile” element is one of the strings “0”, “0.25”, “0.5”, “0.75”, “1”, representing the min, first quartile, median, third quartile, and max boundaries to divide the data values into four equal groups. The “point” is the real number at each boundary, and is estimated using the same algorithm as Excel or R’s “algorithm 7”, where h is: (N - 1)p + 1.
- min, median, max: taken from “fivenum”, above.
- mean: the sum of the values divided by the number of values, or, if weighted, the sum of weight times value divided by the sum of the weights.
- stddev: The standard deviation of the values.
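The fivenum boundaries follow the type-7 quantile described above, where h = (N − 1)p + 1. This sketch (with a hypothetical helper name) reproduces the documented summary from the example data, after excluding the two missing entries.

```python
import math

def quantile_type7(sorted_vals, p):
    """Excel / R "algorithm 7": h = (N - 1) * p + 1, linearly
    interpolating between the two nearest order statistics (1-indexed)."""
    n = len(sorted_vals)
    h = (n - 1) * p + 1
    lo = math.floor(h)
    if lo >= n:
        return float(sorted_vals[-1])
    return float(sorted_vals[lo - 1] + (h - lo) * (sorted_vals[lo] - sorted_vals[lo - 1]))

# Valid values from the example summary above (missing markers excluded)
valid = sorted([1, 2, 3, 4, 5, 4, 3, 5, 4, 3])
fivenum = [quantile_type7(valid, p) for p in (0, 0.25, 0.5, 0.75, 1)]
mean = sum(valid) / len(valid)
stddev = math.sqrt(sum((x - mean) ** 2 for x in valid) / (len(valid) - 1))
print(fivenum)  # [1.0, 3.0, 3.5, 4.0, 5.0]
print(mean)     # 3.4
```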
categorical
The basic counts are included:
- count: The number of entries in the variable.
- valid_count: The number of entries in the variable which are not missing.
- missing_count: The number of entries in the variable which are missing.
- missing_frequencies: An array of row objects. Each row represents a distinct missing reason, and includes the reason phrase as the “value” member. The number of entries which are missing for that reason is included as the “count” member.
And the typical “frequencies” member is expanded into a custom “categories” member:
- categories: An array of row objects. Each row represents a distinct category (whether valid or missing), and includes its id as the _id member (note the leading underscore) and its name as the “name” member. The “missing” member is true or false depending on whether the category is marked missing or not. The number of entries which possess that value is included as the “count” member.
text
The basic counts are included:
- count: The number of entries in the variable.
- valid_count: The number of entries in the variable which are not missing.
- missing_count: The number of entries in the variable which are missing.
- nunique: The number of distinct values in the data.
- sample: A sample of 5 entries of the data.
In addition:
- max_chars: The number of characters of the longest value in the data.
Univariate frequencies
/datasets/{id}/variables/{id}/frequencies/{?filter,exclude_exclusion_filter}
An array of row objects, giving the count of distinct values. The exact members vary by type:
- numeric: Each row represents a distinct valid value, and includes it as the “value” member. The number of entries which possess that value is included as the “count” member.
- categorical: Each row represents a distinct category (whether valid or missing), and includes its id as the _id member (note the leading underscore) and its name as the “name” member. The “missing” member is true or false depending on whether the category is marked missing or not. The number of entries which possess that value is included as the “count” member.
- text: Each row represents a distinct valid value, and includes it as the “value” member. The number of entries which possess that value is included as the “count” member. The length of the array is limited to 10 entries; if more than 10 distinct values are present in the data, an 11th row is added with a “value” member of “(Others)”, summing their counts.
Transforming
Convert type
/datasets/{id}/variables/{id}/cast/
A POST to this resource, with a JSON request body of {“cast_as”: type}, will alter the variable to the given type. If the variable cannot be cast to the given type, 409 is returned. See below for how to obtain a preview summary of such a cast before committing to it.
Casting to datetime
- From Numeric: Need to include an offset key, an ISO-8601 date string, and a resolution key, which is one of the following strings:
  - Y: Year
  - Q: Quarter
  - M: Month
  - W: Week
  - D: Day
  - h: Hour
  - m: Minutes
  - s: Seconds
  - ms: Milliseconds
- From Text: Need to include a format key containing a valid strftime string to format with.
- From Categorical: Need to include a format key containing a valid strftime string to format with.
Casting from datetime
- To Numeric: Not supported
- To Text: Need to include a format key containing a valid strftime string that matches the variable values to parse with.
- To Categorical: Need to include a format key containing a valid strftime string that matches the category names to parse with.
Array variables
- Multiple Response: Not supported
- Categorical Array: Not supported
/datasets/{id}/variables/{id}/cast/?cast_as={type}
A GET on this resource will return the same response as ../summary would if the variable were cast to the given type. If the given type is not valid, 404 is returned.
Attributes
Missing values
/datasets/{id}/variables/{id}/missing_rules/
A Shoji Entity whose “body” member contains an array of missing rule objects. POST a {reason: rule} to this URL to add a new rule. Rules take one of the following forms:
- {"value": v}: Entries which match the given value will be marked as missing for the given reason.
- {"set": [v1, v2, …]}: Entries which are present in the given set will be marked as missing for the given reason.
- {"range": [lower, upper], "inclusive": [true, false]}: Entries which fall between the given boundaries will be marked as missing for the given reason. If either boundary is null, that side of the range is unbounded.
- {"function": "…", "args": […]}: Entries which match the given filter function will be marked as missing for the given reason. This is typically a tree of simple rules logical-OR’d together.
Example:
[
{
"Invalid": {"value": 0},
"Sarai doesn't know how to use a calculator :(": {"range": [1000, null], "inclusive": [true, false]}
}
]
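As a sketch of how these rule forms classify a value, the hypothetical helper below handles the "value", "set", and "range" forms; the "function" form is omitted because it requires the full Crunch expression evaluator.

```python
def matches_rule(value, rule):
    """Return True if `value` would be marked missing under `rule`.
    Supports the 'value', 'set', and 'range' forms described above.
    A null (None) boundary in 'range' means that side is unbounded."""
    if "value" in rule:
        return value == rule["value"]
    if "set" in rule:
        return value in rule["set"]
    if "range" in rule:
        lower, upper = rule["range"]
        inc_lower, inc_upper = rule.get("inclusive", [True, True])
        if lower is not None:
            if value < lower or (value == lower and not inc_lower):
                return False
        if upper is not None:
            if value > upper or (value == upper and not inc_upper):
                return False
        return True
    raise ValueError("unsupported rule form")

# The "range" rule from the example above: [1000, null], inclusive below
rule = {"range": [1000, None], "inclusive": [True, False]}
```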
Subvariables
/datasets/{id}/variables/{id}/subvariables/
GET
This endpoint will return 404 for any variable that is not an array variable (multiple response or categorical array).
For array variables, this endpoint will return a Shoji Catalog containing tuples for the subvariables. The tuples have the same shape as those in the main variables catalog.
PATCH
On PATCH, this endpoint allows modification of the variable attributes exposed on the tuples (name, description, alias, discarded).
It is also possible to add new subvariables to the array variable in question. To do so, include in the payload the URL of another variable (currently existing on the dataset) with an empty tuple; that variable will be converted into a subvariable and appended at the end.
In the case of derived arrays, an attempt to PATCH this catalog will return a 405 response, because the list of subvariables for such an array is a function of its derivation expression. The correct way to modify a derived array's subvariables is to edit its derivation attribute with the desired expressions for each of them.
Values
/datasets/{id}/variables/{id}/values/{?start,total,filter}
A GET on this set of resources will return a JSON array of values from the variable's data. Numeric variables will return numbers, text variables will return strings, and categorical variables will return category names for valid categories and {"?": code} missing markers for missing categories. The "start" and "total" parameters paginate the results. The "filter" is a Crunch filter expression.
Note that this endpoint is only accessible to dataset editors unless the viewers_can_export dataset setting is set to true; otherwise the server will return a 403 response.
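A small helper shows how the pagination parameters combine with a variable URL. This function is purely illustrative (not part of any Crunch client library), and the ids in the example are placeholders:

```python
from urllib.parse import urlencode

def values_url(variable_url, start=None, total=None, filter_expr=None):
    """Build a .../values/ URL with optional start/total/filter parameters."""
    params = {}
    if start is not None:
        params["start"] = start
    if total is not None:
        params["total"] = total
    if filter_expr is not None:
        params["filter"] = filter_expr
    url = variable_url.rstrip("/") + "/values/"
    return url + "?" + urlencode(params) if params else url

url = values_url("https://app.crunch.io/api/datasets/5de8/variables/a1b2",
                 start=100, total=50)
# url ends with "/values/?start=100&total=50"
```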
Private Variables
/datasets/{id}/variables/private/
Private variables are variables that, instead of being shared with everyone, are viewable only by the user that created them. In Crunch, users with view-only permissions on a dataset can still make variables of their own, just as they can make private filters.
Private variables are not shown in the common variable catalog. Instead, they have their own Shoji Catalog of private variables belonging to the specified dataset for the authenticated user. Aside from this separate catalog, private variable entities and the catalog behave just as described above for public variables.
Versions
Datasets have a collection of versions, points in time to which you can roll back.
Catalog
GET
GET /datasets/{dataset_id}/savepoints/{?limit,offset}
When authenticated, GET returns 200 status with a (paginated) Shoji Catalog of versions to which the dataset can be reverted. Catalog tuples contain the following attributes:
Name | Type | Default | Description |
---|---|---|---|
user_display_name | string | "" | The name of the user who saved this version |
description | string | | An informative note about the version, as in a commit message |
version | string | | An internal identifier for the saved version |
creation_time | datetime | | Timestamp for when the version was created |
last_update | datetime | | Timestamp for when the version was last updated |
revert | url | | URL to POST to in order to roll back to this version; see below |
Query parameters:
Name | Type | Default | Description |
---|---|---|---|
limit | integer | 1000 | How many versions to include in the catalog response |
offset | integer | 0 | How many versions to skip before returning limit versions |
For pagination purposes, catalog tuples are sorted from most to least recent. However, since JSON objects are unordered, you cannot rely on the order of the tuples within the payload you receive.
POST
POST /datasets/{dataset_id}/savepoints/
To create a new version, POST a JSON object to the versions catalog. Object attributes may contain:
Name | Type | Required | Description |
---|---|---|---|
description | string | No | An informative note about the version, as in a commit message |
A successful POST will return 201 status with the URL of the newly created version entity in the Location header. If the current user is not an editor of the dataset, POSTing will return a 403 status.
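For example, a minimal sketch of such a request body in Python; the dataset id is a placeholder, and the commented call assumes an authenticated `requests` session:

```python
import json

savepoints_url = "https://app.crunch.io/api/datasets/5de8/savepoints/"  # placeholder id

payload = {"description": "Before recoding brand variables"}
body = json.dumps(payload)

# resp = requests.post(savepoints_url, data=body,
#                      headers={"Content-Type": "application/json"})
# On success: resp.status_code == 201, and resp.headers["Location"]
# holds the URL of the new version entity.
```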
PATCH
No version attributes may be modified by PATCHing the catalog. PATCH will return a 405 status.
Entity
GET
GET /datasets/{dataset_id}/savepoints/{version_id}/
Version entities expose a subset of attributes found in the catalog tuples:
Name | Type | Default | Description |
---|---|---|---|
user_display_name | string | "" | The name of the user who saved this version |
description | string | | An informative note about the version, as in a commit message |
version | string | | An internal identifier for the saved version |
PATCH
PATCH /datasets/{dataset_id}/savepoints/{version_id}/
The version’s “description” may be modified by PATCHing its entity. A successful request returns 204 status. If the current user is not an editor of the dataset, PATCHing will return a 403 status.
Reverting
POST /datasets/{dataset_id}/savepoints/{version_id}/revert/
To roll back to a saved version, POST an empty body to the version’s “revert” URL, found both inside the catalog tuple and in the “views” attribute of the entity. A successful request will return 204 status.
Reverting a dataset will not change its current ownership.
Xlsx
The xlsx endpoint takes as input a prepared table (intended for use with multitables) and returns an xlsx file with some basic formatting conventions.
A POST request to /api/xlsx/ will return an xlsx file directly, with correct Content-Disposition and Content-Type headers.
POST
POST /api/xlsx/ HTTP/1.1
HTTP/1.1 200 OK
Content-Disposition: attachment; filename=Crunch-export.xlsx
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
{
"element": "shoji:entity",
"body": {
"result": [
{
"rows": [],
"etc.": "described below"
}
]
}
}
Endpoint Parameters
At the top level, the xlsx endpoint takes a result array and a display_settings object, which defines some formatting to be used on the values. Multiple tables can be placed on a single sheet.
Result
Name | Type | Typical element | Description |
---|---|---|---|
rows | array | {"value": 30, "class": "formatted"} | Cells are objects with at least a value member and an optional class, where a value of "formatted" prevents the exporter from applying any number format to the result cell |
colLabels | array | {"value": "All"} | Array of objects with a value member |
colTitles | array | "Age" | Array of strings |
spans | array | 4 | Array of integers matching the length of colTitles, indicating the number of cells to be joined for each colTitle after the first one. The first colTitle is assumed to be only one column wide. |
rowTitle | string | "Dog food brands" | A title, formatted bold above the first column of the table (the rowLabels, below) |
rowLabels | array | {"value": "Canine Crunch"} | Labels for the rows of the table |
rowVariableName | string | "Preferred dog food" | Title to display at the very top left of the result sheet |
filter_names | array | "Breed: Dachshund" | Names of any filters to print beneath the table, labeled "Filters". If multiple result objects are included in the payload, the filter names from the first result are used and placed at the bottom of the sheet beneath all results. |
Display Settings
Further customization for the resulting output.
Name | Type | Default | Description | Example |
---|---|---|---|---|
decimalPlaces | object | 0 | Number of decimal places to display | {"value": 0} |
countsOrPercents | object | percent | Use counts or percents | {"value": "percent"} |
percentageDirection | object | {"value": "colPct"} | Row- or column-based percents | {"value": "colPct"} |
valuesAreMeans | object | false | Whether values are means (if so, they will be formatted with decimal places) | {"value": false} |
Quirks
Because the formatted output was designed to display values computed by other clients, it makes some assumptions about the tables it is displaying. Some of these are enumerated below.
- Rows have a 'marginal' column positioned first after the row label.
- If display settings indicate rowPct, rows have an additional marginal column intended to show unconditional N for each row.
- The remaining row labels are all accounted for in the sum of spans.
- Column titles are placed in merged cells above one or more labels.
- The same filter(s) are applied to all tables on a page.
- No "freeze panes" are applied to the result.
- If the table contains percentages, they should be percentages, not proportions (0 to 100, not 0 to 1).
Complete example
{"element":"shoji:entity",
"body":{
"result": [
{
"filter_names": ["Name_of_filter"],
"rows": [
[
{
"value": 50,
"class": "marginal marginal-percentage"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-0"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-1"
}
],
[
{
"value": 50,
"class": "marginal marginal-percentage"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-0"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-1"
}
],
[
{
"value": 0,
"class": "marginal marginal-percentage"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-1"
}
],
[
{
"value": 0,
"class": "marginal marginal-percentage"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-1"
}
]
],
"colLabels": [
{
"value": "All"
},
{
"value": "2014",
"class": "col-0"
},
{
"value": "2015",
"class": "col-1"
}
],
"spans": [
2
],
"rowLabels": [
{
"value": "a",
"class": "row-label"
},
{
"value": "b",
"class": "row-label"
},
{
"value": "c",
"class": "row-label"
},
{
"value": "d",
"class": "row-label"
}
],
"rowTitle": "ca_subvar_1",
"rowVariableName": "categorical_array",
"colTitles": [
"quarter"
]
},
{
"rows": [
[
{
"value": 16.666666666666664,
"class": "marginal marginal-percentage"
},
{
"value": 25,
"pValue": 0.24821309601845032,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": -0.2482130960184501,
"class": "subtable-0 col-1"
}
],
[
{
"value": 50,
"class": "marginal marginal-percentage"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-0"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-1"
}
],
[
{
"value": 33.33333333333333,
"class": "marginal marginal-percentage"
},
{
"value": 25,
"pValue": -0.5464935495198773,
"class": "subtable-0 col-0"
},
{
"value": 50,
"pValue": 0.5464935495198773,
"class": "subtable-0 col-1"
}
],
[
{
"value": 0,
"class": "marginal marginal-percentage"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-1"
}
]
],
"colLabels": [
{
"value": "All"
},
{
"value": "2014",
"class": "col-0"
},
{
"value": "2015",
"class": "col-1"
}
],
"spans": [
2
],
"rowLabels": [
{
"value": "a",
"class": "row-label"
},
{
"value": "b",
"class": "row-label"
},
{
"value": "c",
"class": "row-label"
},
{
"value": "d",
"class": "row-label"
}
],
"rowTitle": "ca_subvar_2",
"rowVariableName": "categorical_array",
"colTitles": [
"quarter"
]
},
{
"rows": [
[
{
"value": 0,
"class": "marginal marginal-percentage"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-1"
}
],
[
{
"value": 33.33333333333333,
"class": "marginal marginal-percentage"
},
{
"value": 50,
"pValue": 0.045500259780248964,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": -0.045500259780248964,
"class": "subtable-0 col-1"
}
],
[
{
"value": 16.666666666666664,
"class": "marginal marginal-percentage"
},
{
"value": 25,
"pValue": 0.24821309601845032,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": -0.2482130960184501,
"class": "subtable-0 col-1"
}
],
[
{
"value": 50,
"class": "marginal marginal-percentage"
},
{
"value": 25,
"pValue": -0.0005320055485602548,
"class": "subtable-0 col-0"
},
{
"value": 100,
"pValue": 0.0005320055485602548,
"class": "subtable-0 col-1"
}
]
],
"colLabels": [
{
"value": "All"
},
{
"value": "2014",
"class": "col-0"
},
{
"value": "2015",
"class": "col-1"
}
],
"spans": [
2
],
"rowLabels": [
{
"value": "a",
"class": "row-label"
},
{
"value": "b",
"class": "row-label"
},
{
"value": "c",
"class": "row-label"
},
{
"value": "d",
"class": "row-label"
}
],
"rowTitle": "ca_subvar_3",
"rowVariableName": "categorical_array",
"colTitles": [
"quarter"
]
}
],
"display_settings":{
"valuesAreMeans": {"value": false},
"countsOrPercents": {"value": "percent"},
"percentageDirection": {"value": "colPct"},
"decimalPlaces": {"value": 1}
}
}
}
Object Reference
version 0.15
The Crunch REST API takes a decidedly column-oriented approach to data. A “column” is simply a sequence of values of the same type. A “variable” binds a name (and other metadata) to the column, and indeed may possess a series of columns over its lifetime as inserts and updates are made to it. A “dataset” is a set of variables. Each variable in the dataset is sorted the same way; the variables together form a relation. Reading the N'th item from each variable produces a row.
Interaction with the Crunch REST API is by variables and columns. When you add data to Crunch, you send a set of columns. When you fetch data from Crunch, you send a set of variable expressions and receive a set of columns. When you update data in Crunch, you send a set of expressions which tells Crunch how to update variables with new column data.
The Crunch API consists of just a few primitive objects, arranged differently for each request and response. Learning the basic components will help you create the most complicated queries.
Response types
Shoji entity
A Shoji entity is identified by its element key having the value shoji:entity. Its principal attribute is the body key: an object containing the attributes that describe the entity.
Shoji catalog
A catalog is identified by its element key having the value shoji:catalog. Its principal attribute is index: an object keyed by the URLs of the entities it contains, where each value is an object (tuple) with attributes from the referenced entity.
Shoji catalogs are not ordered. For ordered representations, they may provide an orders set of Shoji order resources.
Shoji view
A Shoji view is identified by its element key having the value shoji:view. Its principal attribute is value, which can contain any arbitrary JSON object.
Shoji order
Shoji orders are identified by the element key having the value shoji:order. Their principal attribute is the graph key: an array containing the order of the present resources.
A Shoji order may be associated with a catalog, in which case it will contain a subset or the totality of the entities present in the catalog. The catalog remains the authoritative source of available entities.
Any entity not present on the order but present in the catalog may be considered to belong at the bottom of the root of the graph in an arbitrary order, or may be excluded from view.
Statistical data
Identifiers
Datasets, variables, and other resources are always identified by strings. All identifiers are case-sensitive, and may contain any unicode character, including spaces. Examples:
- "q1"
- "My really useful dataset"
- "变量"
Data Values
Individual data values follow the JSON representations where possible. JSON exposes the following types: number, string, array, object, true, false, null. Crunch adds additional types with special syntax (see Types, below). Examples:
- 13
- 45.330495
- "foo"
- [3, 4, 5]
- {"bar": {"a": [12.4, 89.2, 0]}}
- true
- null
- "2014-03-02T14:29:59Z"
Because a single JSON type may be used to represent multiple Crunch types, you should never rely on the JSON type to interpret the class of a datum. Instead, inspect the type object (see below) to interpret the data.
Missing values
Crunch provides a robust “missing entries” system. Anywhere a (valid) data value can appear, a missing value may also appear. Missing values are represented by an object with a single “?” key. The value is a missing integer code (see Missing reasons, below); negative integers are reserved for system-generated reasons, user-defined reasons are automatically assigned positive integers. Examples:
- {"?": -1}
- {"?": 24}
Arrays
A set of data values (and/or missing values) which are of the same type can be ordered in an array. All entries in an array are of the same Crunch type.
Examples:
- [13, 4, 5, {"?": -2}, 7, 2]
- ["foo", "bar"]
Enumerations
Some arrays, rather than repeating a small set of large values, benefit from storing a small integer code instead, moving the larger values they represent into the metadata, and doing lookups when needed to encode/decode. The "categorical" type is the most common example of this: rather than store an array of large string names like ["Internet Explorer", "Internet Explorer", "Firefox", …] it instead stores integer codes like [1, 1, 2], placing the longer strings in the metadata as type.categories = [{"id": 1, "name": "Internet Explorer", …}, …]. We call this encoding process enumeration, and its opposite, where the codes are re-expanded into their original values, elaboration.
Enumeration also provides the opportunity to order the possible values, as well as include potential values which do not yet exist in the data array itself.
Enumeration typically causes the volume of data to shrink dramatically, and can speed up very common operations like filtering, grouping, and almost any data transfer. Because of this, it is common to:
- Enumerate a data array as early as possible. Indeed, when a variable can be enumerated, the fastest way to insert new data is to send the new values as the integer codes.
- Elaborate a data array as late as possible. As long as the metadata is shipped along with the enumerated data, the transfer size and therefore time is much smaller. Many cases do not even call for a complete elaboration of the entire column.
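The encode/decode round trip described above can be sketched in a few lines of Python; the helper names here are ours, not part of the API:

```python
# Category metadata, as in a categorical variable's type.categories.
categories = [
    {"id": 1, "name": "Internet Explorer"},
    {"id": 2, "name": "Firefox"},
]

def enumerate_column(values, categories):
    """Encode string values as small integer category ids (enumeration)."""
    by_name = {c["name"]: c["id"] for c in categories}
    return [by_name[v] for v in values]

def elaborate_column(codes, categories):
    """Expand category ids back into their original names (elaboration)."""
    by_id = {c["id"]: c["name"] for c in categories}
    return [by_id[code] for code in codes]

codes = enumerate_column(
    ["Internet Explorer", "Internet Explorer", "Firefox"], categories)
# codes == [1, 1, 2]
```

Note how the enumerated form is far smaller to store and transfer; the names travel once, in the metadata.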
Variable Definitions
Crunch employs a structural type system rather than a nominative one. The variable definition includes more knowledge than just the type name (numeric, text, categorical, etc); we also learn details about range, precision, missing values and reasons, order, etc. For example:
{
"type": "categorical",
"name": "Party ID",
"description": "Do you consider yourself generally a Democrat, a Republican, or an Independent?",
"categories": [
{
"name": "Republican",
"numeric_value": 1,
"id": 1,
"missing": false
},
{
"name": "Democrat",
"numeric_value": -1,
"id": 2,
"missing": false
},
{
"name": "Independent",
"numeric_value": 0,
"id": 3,
"missing": false
}
]
}
This section describes the metadata of a variable as exposed across HTTP, both expected response values and valid input values.
Variable types
The “type” of a Variable is a string which defines the superset of values from which the variable may draw. The type governs not only the set of values but also their syntax. (See below.)
The following types are defined for public use:
- text
- numeric
- categorical
- datetime
- multiple_response
- categorical_array
Variable names
Variables in Crunch have multiple attributes that provide identifying information: “name”, “alias”, and “description”.
name
Crunch takes a principled stand that variable “names” should be for people, not for computers.
You may be used to domains that have variable “name”, “label”, and “description”. Name is some short, unique, machine-friendlier ID like “Q2”; label is short and human-friendly, something like “Brand awareness”, and description is where you might put question wording if you have survey data. Crunch has “alias”, “name”, and “description”. What you may be used to thinking of as a variable name, we consider as an alias: something for more internal use, not something appropriate for a polished dataset ready to share with people who didn’t create the dataset (See more in the “Alias” section below). In Crunch, the variable’s “name” is what you may be used to thinking of as a label.
All variables must have a name, and these names must be unique across all variables, including “hidden” variables (see below) but excluding subvariables (see “Subvariables” below). Within an array variable, subvariable names must be unique. (You can think of subvariable names within an array as being variable_name.subvariable_name, and with that approach, all “variable names” must be unique.)
Names must be a string of length greater than zero, and any valid unicode string is allowed. See “Identifiers” above.
alias
Alias is a string identifier for variables. It must be unique across all variables, including subvariables, such that it can be used as an identifier. This is what legacy statistical software typically calls a variable name.
Aliases have several uses. Client applications, such as those exposing a scripting interface, may want to use aliases as a more machine-friendly, yet still human-readable, way of referencing variables. Aliases may also be used to help line up variables across different import batches.
When creating variables via the API, alias is not a required field; if omitted, an alias will be generated. If an alias is supplied, it must be unique across all variables, including subvariables, and the new variable request will be rejected if the alias is not unique. When data are imported from file formats that have unique variable names, those names will in many cases be used as the alias in Crunch.
description
Description is an optional string that provides more information about the variable. It is displayed in the web application on variable summary cards and with analyses.
Type-specific attributes
These attributes must be present for the specified variable types when creating a variable, but they are not defined for other types.
categories
Categorical variables must contain an array of Category objects, each of which includes:
- id: a read-only integer identifier for the category. These correspond to the data values.
- name: the string name which applications should use to identify the category.
- numeric_value: the numeric value bound to each name. If no numeric value should be bound, this should be null. numeric_values need not be unique, and they may be null.
- missing: boolean indicating whether the data corresponding to this category should be interpreted as missing.
- selected: (optional) boolean indicating whether this category corresponds to a selected value for a dichotomized variable, i.e. part of a multiple response variable. Not required for regular categorical variables; defaults to false if omitted. There is also no requirement that only one Category in a multiple-response variable be marked "selected".
Categories are valid if:
- Category names are unique within the set
- Category ids are unique within the set
- Category ids for user-defined categories are positive integers no greater than 32767. Negative ids are reserved for system missing reasons. See “missing_reasons” below.
The order of the array defines the order of the categories, and thus the order in which aggregate data will be presented. This order can be changed by saving a reordered set of Categories.
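These validity rules can be expressed as a short check. validate_categories is a hypothetical helper for client-side validation, not part of the Crunch API:

```python
def validate_categories(categories):
    """Apply the validity rules above: unique names, unique ids, and
    user-defined ids as positive integers no greater than 32767."""
    names = [c["name"] for c in categories]
    ids = [c["id"] for c in categories]
    if len(set(names)) != len(names):
        raise ValueError("category names must be unique within the set")
    if len(set(ids)) != len(ids):
        raise ValueError("category ids must be unique within the set")
    for i in ids:
        if not (isinstance(i, int) and 0 < i <= 32767):
            raise ValueError("user-defined ids must be integers in 1..32767")
    return True

# The "Party ID" categories from the example above pass the check.
party_id = [
    {"name": "Republican", "numeric_value": 1, "id": 1, "missing": False},
    {"name": "Democrat", "numeric_value": -1, "id": 2, "missing": False},
    {"name": "Independent", "numeric_value": 0, "id": 3, "missing": False},
]
ok = validate_categories(party_id)
```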
subvariables
Multiple Response and Categorical Array variables contain an array of subvariable references. In the HTTP API, these are presented as URLs. To create a variable of type “multiple_response” or “categorical_array”, you must include a “subvariables” member with an array of subvariable references. These variables will become the subvariables in the new array variable.
Like categories, the order of the subvariables array within an array variable indicates the order in which they are presented; to reorder them, save a modified array of subvariable ids/urls.
subreferences
Multiple Response and Categorical Array variables contain an object of subvariable “references”: names, alias, description, etc. To create a variable of type “multiple_response” or “categorical_array” directly, you must include a “subreferences” member with an object of objects. These label the subvariables in the new array variable.
Each subreferences member must contain a name and optionally an alias. Note that subreferences is an unordered object; the order of the subvariables is read from the "subvariables" attribute.
{
"type": "categorical_array",
"name": "Example array",
"categories": [
{
"name": "Category 1",
"numeric_value": 1,
"id": 1,
"missing": false
},
{
"name": "Category 2",
"numeric_value": 0,
"id": 2,
"missing": false
}
],
"subvariables": [
"/api/datasets/abcdef/variables/abc/subvariables/1/",
"/api/datasets/abcdef/variables/abc/subvariables/2/",
"/api/datasets/abcdef/variables/abc/subvariables/3/"
],
"subreferences": {
"/api/datasets/abcdef/variables/abc/subvariables/2/": {"name": "subvariable 2", "alias": "subvar2_alias"},
"/api/datasets/abcdef/variables/abc/subvariables/1/": {"name": "subvariable 1"},
"/api/datasets/abcdef/variables/abc/subvariables/3/": {"name": "subvariable 3"}
}
}
resolution
Datetime variables must have a resolution string that indicates the unit size of the datetime data. Valid values include “Y”, “M”, “D”, “h”, “m”, “s”, and “ms”. Every datetime variable must have a resolution.
Other definition attributes
These attributes may be supplied on variable creation, and they are included in API responses unless otherwise noted.
format
An object with various members to control the display of Variable data:
- data: An object with a “digits” member, stating how many digits to display after the decimal point.
- summary: An object with a “digits” member, stating how many digits to display after the decimal point.
view
An object with various members to control the display of Variable data:
- show_codes: For categorical types only. If true, numeric values are shown.
- show_counts: If true, show counts; if false, show percents.
- include_missing: For categorical types only. If true, include missing categories.
- include_noneoftheabove: For multiple-response types only. If true, display a “none of the above” category in the requested summary or analysis.
- geodata: A list of associations of a variable to Crunch geodatum entities. PATCH a variable entity, amending view.geodata, in order to create, modify, or remove an association. An association is an object with required keys geodatum and feature_key, and an optional match_field. The geodatum must exist; feature_key is the name of the property of each "feature" in the geojson/topojson that corresponds to the match_field of the variable (perhaps a dotted string for nested properties, e.g. "properties.postal-code"). By default, match_field is "name": a categorical variable will match category names to the feature_key present in the given geodatum.
discarded
Discarded is a boolean value indicating whether the variable should be viewed as part of the dataset. Hiding variables by setting discarded to true is a soft, restorable delete. Default is false.
private
If true, the variable will not show in the common variable catalog; instead, it will be included in the personal variables catalog.
missing_reasons
An object whose keys are reason strings and whose values are the codes used for missing entries.
Crunch allows any entry in a column to be either a valid value or a missing code. Regardless of the class, missing codes are represented in the interface as an object with a single "?" key mapped to a single missing integer code. For example, a segment of [4.56, 9.23, {"?": -1}] includes 2 valid values and 1 missing value.
The missing codes map to a reason phrase via this “missing reasons” type member. Entries which are missing for reasons determined by the system are negative integers. Users may define their own missing reasons, which receive positive integer codes. Zero is a reserved value.
In the above example, the code of -1 would be looked up in a missing reasons map such as:
{
"missing reasons": {
"no data": -1,
"type mismatch": -2,
"my backup was corrupted": 1
}
}
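Decoding such a column segment against its missing_reasons map can be sketched as follows; the helper names are ours, not part of the API:

```python
# The missing_reasons map from the example above, inverted for lookup by code.
missing_reasons = {"no data": -1, "type mismatch": -2, "my backup was corrupted": 1}
code_to_reason = {code: reason for reason, code in missing_reasons.items()}

def describe(entry):
    """Return a valid value unchanged, or the reason phrase for a
    {"?": code} missing marker."""
    if isinstance(entry, dict) and "?" in entry:
        return code_to_reason[entry["?"]]
    return entry

column = [4.56, 9.23, {"?": -1}]
decoded = [describe(e) for e in column]
# decoded == [4.56, 9.23, "no data"]
```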
See the Endpoint Reference for user-defined missing reasons.
Categorical variables do not require a missing_reasons object because the categories array contains the information about missingness.
Values
When creating a new variable, one can also include a “values” member that contains the data column corresponding to the variable metadata. See Importing Data: Column-by-column. This subsection outlines how the various variable types have their values formatted both when one supplies values to add to the dataset and when one requests values from a dataset.
Text
Text values are an array of quoted strings. Missing values are indicated as {"?": <integer>}, as discussed above, and all integer missing value codes must be defined in the "missing_reasons" object of the variable's metadata.
Numeric
A “numeric” value will always be e.g. 500 (a number, without quotes) in the JSON request and response messages, not “500” (a string, with quotes). Missing values are handled as with text variables.
Categorical
Insert an array of integers that correspond to the ids of the variable’s categories. Only integers found in the category ids are allowed. That is, you cannot insert values for which there is no category metadata. It is, however, permitted to have categories defined for which there are no values.
Datetime
Datetime input and output are in ISO-8601 formatted strings.
Arrays
Crunch supports array type variables, which contain an array of subvariables. “Multiple response” and “Categorical array” are both arrays of categorical subvariables. Subvariables do not exist as independent variables; they are exposed as “virtual” variables in some places, and can be analyzed independently, but they do not have their own type or categories.
Arrays are currently always categorical, so they send and receive data in the same format: category ids. The only difference is that regular categorical variables send and receive one id per row, whereas arrays send and receive a list of ids (of equal length to the number of subvariables in the array).
Variables
A complete Variable, then, is simply a Definition combined with its data array.
Expressions
Crunch expressions are used to compute on a dataset, to do nuanced selects, updates, and deletes, and to accomplish many other routine operations. Expressions are JSON objects in which each term is wrapped in an object which declares whether the term is a variable, a value, or a function, etc. While verbose, doing so allows us to be more explicit about the operations we wish to do.
Expressions generally contain references to variables, values, or columns of values, often composed in functions. The output of expressions can be other variables, values, boolean masks, or cube aggregations, depending on the context and expression content. Some endpoints have special semantics, but the general structure of the expressions follows the model described below.
Variable terms
Terms refer to variables when they include a “variable” member. The value is the URL for the desired variable. For example:
{"variable": "../variables/X/"}
{"variable": "https://app.crunch.io/api/datasets/48ffc3/joins/abcd/variables/Y/"}
URLs must either be absolute or relative to the URL of the current request. For example, to refer to a variable in a query at https://app.crunch.io/api/datasets/48ffc3/cube/
, a variable at https://app.crunch.io/api/datasets/48ffc3/variables/9410fc/
may be referenced by its full URL or by “../variables/9410fc/”.
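Standard URL resolution gives the same result; for example, with Python's urljoin:

```python
from urllib.parse import urljoin

# Resolve the relative variable reference against the URL of the
# current request, as in the example above.
request_url = "https://app.crunch.io/api/datasets/48ffc3/cube/"
resolved = urljoin(request_url, "../variables/9410fc/")
# resolved == "https://app.crunch.io/api/datasets/48ffc3/variables/9410fc/"
```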
Value terms
Terms refer to data values when they include a “value” member. Its value is any individual data value; that is, a value that is addressable by a column and row in the dataset. For example:
{"value": 13}
{"value": [3, 4, 5]}
Note that individual values may themselves be complex arrays or objects, depending on their type. You may explicitly include a “type” member in the object, or let Crunch infer the type. One way to do this is to use the “typeof” function to indicate that the value you’re specifying corresponds to the exact type of an existing variable. See “functions” below for more details.
Column terms
Terms refer to columns (construct them, actually) when they include a “column” member. The value is an array of data values. You may include “type” and/or “references” members as well.
{"column": [1, 2, 3, 17]}
{"column": [{"?": -2}, 1, 4, 1], "type": {"class": "categorical", "categories": [...], ...}}
Function terms
Terms refer to functions (and operators) when they include a “function” member. The value is the identifier for the desired function. They parameterize the function with an “args” member, whose value is an array of terms, one for each argument. Examples:
{"function": "==", "args": [{"variable": "../variables/X/"}, {"value": 13}]}
{"function": "contains", "args": [{"variable": "../joins/abcd/variables/Y/"}, {"value": "foo"}]}
You may include a “references” member to provide a name, alias, description, etc to the output of the function.
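Because expressions are plain JSON, they compose well programmatically. The sketch below builds the expression `X == 13 or X == 14` out of nested function terms; the `func` helper and the variable URL are illustrative, not part of the API:

```python
import json

def func(name, *args):
    """Build a Crunch function term: {"function": ..., "args": [...]}."""
    return {"function": name, "args": list(args)}

var_x = {"variable": "../variables/X/"}
expr = func("or",
            func("==", var_x, {"value": 13}),
            func("==", var_x, {"value": 14}))
print(json.dumps(expr, indent=1))
```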
Supported functions
Here is a list of all functions available for Crunch expressions. Note that these functions can be used in conjunction to compose more complex expressions.
Binary functions

| Function | Description |
| --- | --- |
| `+` | add |
| `-` | subtract |
| `*` | multiply |
| `/` | divide |
| `//` | floor division |
| `^` | power |
| `%` | modulus |
| `&` | bitwise and |
| `\|` | bitwise or |
| `~` | invert |
Builtin functions

| Function | Description |
| --- | --- |
| `array` | Return the given Frame materialized as an array. |
| `as_selected` | Return the given variable reduced to the [1, 0, -1] “selections” type. |
| `bin` | Return column’s values broken into equidistant bins. |
| `case` | Evaluate the given conditions in order, selecting the corresponding choice. |
| `cast` | Return a Column of column’s values cast to the given type. |
| `char_length` | Return the length of each string (or missing reason) in the given column. |
| `copy_variable` | Return a copy of the column with a copy of its metadata. |
| `combine_categories` | Return a column of categories combined according to the category_info. |
| `combine_responses` | Combine the given categorical variables into a new one. |
| `current_batch` | Return the batch_id of the current frame. |
| `get` | Return a subvariable from the given column. |
| `lookup` | Map each row of source through its keys index to a corresponding value. |
| `missing` | Return the given column as missing for the given reason. |
| `normalize` | Return a Column with the given values normalized so sum(c) == len(c). |
| `row` | Return a Numeric column with row indices. |
| `selected_array` | Return a bool Array from the given categorical, plus None/none/any. |
| `selected_depth` | Return a numeric column containing the number of selected categories in each row of the given array. |
| `selections` | Return the given array, reduced to the [1, 0, -1] “selections” type, plus an `__any__` magic subvariable. |
| `subvariables` | Return a Frame containing subvariables of the given array. |
| `tiered` | Return a variable formed by collapsing the given array’s subvariables in the given category tiers. |
| `typeof` | Return (a copy of) the Type of the given column. |
| `unmissing` | Return the given column with user missing replaced by valid values. |
Comparisons

| Function | Description |
| --- | --- |
| `==` | equals |
| `!=` | not equals |
| `=><=` | between |
| `between` | between |
| `<` | less than |
| `>` | greater than |
| `<=` | less than or equal |
| `>=` | greater than or equal |
| `in` | in |
| `all` | True for each row where all subvariables in a multiple_response array are selected |
| `any` | True for each row where any subvariable in a multiple_response array is selected |
| `is_none_of_the_above` | True for each row where no subvariables in a multiple_response array are selected, unless all subvariables have missing values |
| `contains` | Return a mask where A is an element of array B, or a key of object B. |
| `icontains` | Case-insensitive version of `contains` |
| `~=` | Compare against a regular expression (regex) |
| `and` | logical and |
| `or` | logical or |
| `not` | logical not |
| `is_valid` | Boolean array of rows which are valid for the given column |
| `is_missing` | Boolean array of rows which are missing for the given column |
| `any_missing` | Boolean array of rows where any of the subvariables are missing |
| `all_valid` | Boolean array of rows where all of the subvariables are valid |
| `all_missing` | Boolean array of rows where all of the subvariables are missing |
Date Functions

| Function | Description |
| --- | --- |
| `default_rollup_resolution` |  |
| `datetime_to_numeric` | Convert the given datetime column to numeric. |
| `format_datetime` | Convert datetime values to strings using the fmt as strftime mask. |
| `numeric_to_datetime` | Convert the given numeric column to datetime with the given resolution. |
| `parse_datetime` | Parse string to datetime using optional format string. |
| `rollup` | Return column’s values (which must be type datetime) into calendrical bins. |
Frame Functions

| Function | Description |
| --- | --- |
| `page` | Return the given frame, limited/offset by the given values. |
| `select` | Return a Frame of results from the given map of variables. |
| `sheet` | Return the given frame, limited/offset in the number of variables. |
| `dependents` | Return the given frame with only dependents of the given variable. |
| `deselect` | Return a frame NOT including the indicated variables. |
| `adapt` | Return the given frame adapted to the given to_key. |
| `join` | Return a JoinedFrame from the given list of subframes. |
| `find` | Return a Frame with those variables which match the given criteria. |
| `flatten` | Return a frame including all variables, plus all subvariables at dotted ids. |
Examples

- select: Receives an argument which is a map expression of the following shape:

{
    "function": "select",
    "args": [{
        "map": {
            <destination id>: {variable: <source frame id>},
            <destination id>: {variable: <source frame id>},
            ...
        }
    }]
}

Here <destination id> is the ID that the mapped variable will have on the resulting frame, and <source frame id> indicates which variables to select from the frame this function is applied to.

- deselect: Same as select, but excludes the mentioned variable IDs from the source frame. In this usage, the <destination id> part of the map argument is disregarded.

{
    "function": "deselect",
    "args": [{
        "map": {
            <destination id>: {variable: <source frame id>},
            <destination id>: {variable: <source frame id>},
            ...
        }
    }]
}
Measures Functions

| Function | Description |
| --- | --- |
| `cube_count` |  |
| `cube_distinct_count` |  |
| `cube_max` | A measure which returns the maximum value in a column. |
| `cube_mean` |  |
| `cube_min` | A measure which returns the minimum value in a column. |
| `cube_missing_frequencies` | Return an object with parallel 'code' and 'count' arrays. |
| `cube_quantile` |  |
| `cube_stddev` | A measure which returns the standard deviation value in a column. |
| `cube_sum` |  |
| `cube_valid_count` |  |
| `cube_weighted_max` |  |
| `cube_weighted_min` |  |
| `top` | Return the given (1D/1M) cube, filtered to its top N members. |
Cube Functions

| Function | Description |
| --- | --- |
| `autocube` | Return a cube crossing A by B (which may be None). |
| `autofreq` | Return a cube of frequencies for A. |
| `cube` | Return a Cube instance from the given arguments. |
| `each` | Yield one expression result per item in the given iterable. |
| `multitable` | Return cubes for each target variable crossed by None + each template variable. |
| `transpose` | Transpose the given cube, rearranging its (0-based) axes to the given order. |
| `stack` | Return a cube of 1 more dimension formed by stacking the given array. |
Filter terms
Terms that refer to filter entities by URL are shorthand for the boolean expression stored in the entity. So, {"filter": "../filters/59fc4d/"}
yields the Crunch expression contained in the Filter entity’s “expression” attribute. Filter terms can be combined together with other expressions as well. For example, {"function": "and", "args": [{"filter": "../filters/59fc4d/"}, {"function": "==", "args": [{"variable": "../variables/X/"}, {"value": 13}]}]}
would “and” together the boolean expression in filter 59fc4d with the X == 13
expression.
Documents
Shoji
Most representations returned from the API are Shoji Documents. Shoji is a media type designed to foster scalable APIs. Shoji is built with JSON, so any JSON parser should be able to at least deserialize Shoji documents. Shoji adds four document types: Entity, Catalog, View, and Order.
Entity
Anything that can be thought of as “a thing by itself” will probably be represented by a Shoji Entity Document. Entities possess a “body” member: a JSON object where each key/value pair is an attribute name and value. For example:
{
"element": "shoji:entity",
"self": "https://.../api/users/1/",
"description": "Details for a User.",
"specification": "https://.../api/specifications/users/",
"fragments": {
"address": "address/"
},
"body": {
"first_name": "Genghis",
"last_name": "Khan"
}
}
In general, an HTTP GET to the “self” URL will return the document, and a PUT of the same will update it. PUT should not be used for partial updates; use PATCH for that instead. In general, each member included in the “body” of a PATCH message will replace the current representation; attributes not included will not be altered. There is no facility to remove an attribute from an Entity.body via PATCH. In some cases, however, even finer-grained control is possible via PATCH; see the Endpoint Reference for details.
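These PATCH semantics amount to a shallow merge: members present in the message replace the stored values, and everything else is left alone. A minimal sketch of that rule (client-side simulation only, not part of the API):

```python
def apply_entity_patch(body, patch):
    """Simulate Shoji Entity PATCH semantics: members in `patch`
    replace current values; members not included are untouched.
    (Attributes cannot be removed this way.)"""
    merged = dict(body)
    merged.update(patch)
    return merged

body = {"first_name": "Genghis", "last_name": "Khan"}
print(apply_entity_patch(body, {"first_name": "Temujin"}))
# {'first_name': 'Temujin', 'last_name': 'Khan'}
```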
Catalog
Catalogs collect or contain entities. They act as an index to a collection, and indeed possess an “index” member for this:
{
"element": "shoji:catalog",
"self": "https://.../api/users/",
"description": "A list of all the users.",
"specification": "https://.../api/specifications/users/",
"orders": {
"default": "default_order/"
},
"index": {
"2/": {"active": true},
"1/": {"active": false},
"4/": {"active": true},
"3/": {"active": true}
}
}
Each key in the index is a URL (possibly relative to “self”) which refers to a different resource. Often, these are Shoji Entity documents, but not always. The index also allows some attributes to be published as a set, rather than in each individual Entity. This allows clients to act on the collection as a whole, such as when rendering a list of references from which the user might select one entity.
In general, an HTTP GET to the “self” URL will return the document, and a PUT of the same will update it. Many catalogs allow POST to add a new entity to the collection. PUT should not be used for partial updates; use PATCH for that instead. In general, each member included in the “index” of a PATCH message will replace the current representation; tuples not included will not be altered. Tuples included in a PATCH which are not present in the server’s current representation of the index may be added; it is up to each resource whether to support (and document!) this approach or prefer POST to add entities to the collection. In general, catalogs that contain entities get new entities created by POST, while catalogs that collect entities that are contained by other catalogs (e.g. a catalog of users who have permissions on a dataset) will have entities added by PATCH.
Similarly, removing entities from catalogs is supported in one of two ways, typically varying by catalog type. For catalogs that contain entities, entities are removed only by DELETE on the entity’s URL (its key in the Catalog.index). In contrast, for catalogs that collect entities, entities are removed by PATCHing the catalog with a null
tuple. This removes the entity from the catalog but does not delete the entity (which is contained by a different catalog).
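The index PATCH rules can be sketched the same way: tuples in the PATCH replace or add entries, and a null tuple drops the entry from the catalog without deleting the entity (client-side simulation only):

```python
def apply_catalog_patch(index, patch):
    """Simulate Shoji Catalog index PATCH semantics: a null (None)
    tuple removes the entry; any other tuple replaces or adds it;
    entries not mentioned are untouched."""
    merged = dict(index)
    for url, tup in patch.items():
        if tup is None:
            merged.pop(url, None)
        else:
            merged[url] = tup
    return merged

index = {"1/": {"active": False}, "2/": {"active": True}}
print(apply_catalog_patch(index, {"1/": None, "3/": {"active": True}}))
# {'2/': {'active': True}, '3/': {'active': True}}
```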
View
Views cut across entities. They can publish nearly any arrangement of data, and are especially good for exposing arrays of arrays and the like. In general, a Shoji View is read-only, and only a GET will be successful.
Order
Orders can arrange any set of strings into an arbitrarily-nested tree; most often, they are used to provide one or more orderings of a Catalog’s index. For example, each user may have their own ordering for an index of variables; the same URLs from the index keys are arranged in the Order. Given the Catalog above, for example, we might produce an Order like:
{
"element": "shoji:order",
"self": "https://.../api/users/order/",
"graph": [
"2/",
{"group A": ["1/", "3/", "2/"]},
{"group B": ["4/"]}
]
}
This represents the tree:

       /   |    \
      2   {A}   {B}
         / | \    \
        1  3  2    4
The Order object itself allows lots of flexibility. Each of the following decisions are up to the API endpoint to constrain or not as it sees fit (see the Endpoint Reference for these details):
- Not every string in the original set has to be present, allowing partial orders.
- Strings from the original set which are not mentioned may be ignored, or default to an “ungrouped” group, or other behaviors as each application sees fit.
- Groups may contain member strings and other groups interleaved (but still ordered).
- Groups may exist without any members.
- Members may appear in more than one group.
- Group names may be repeated at different points within the tree.
- Group member arrays, although represented in a JSON array, may be declared to be non-strict in their order (that is, the array should be treated more like an unordered set).
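A client that only needs a flat ordering can walk the graph depth-first. A sketch (note that members may appear more than once, as in the example above, so duplicates are preserved):

```python
def flatten_order(graph):
    """Depth-first flatten of a shoji:order graph into a list of keys.
    Plain strings are members; single-key objects are named groups."""
    out = []
    for node in graph:
        if isinstance(node, dict):
            # A group: recurse into each group's member array
            for members in node.values():
                out.extend(flatten_order(members))
        else:
            out.append(node)
    return out

graph = ["2/", {"group A": ["1/", "3/", "2/"]}, {"group B": ["4/"]}]
print(flatten_order(graph))
# ['2/', '1/', '3/', '2/', '4/']
```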
Crunch Objects
Most of the other representations returned from the API are Crunch Objects. They are built with JSON, so any JSON parser should be able to at least deserialize Crunch documents. Crunch adds two document types: Table and Cube.
Table
Tables collect columns of data and (optionally) their metadata into two-dimensional relations.
{
"element": "crunch:table",
"self": "https://.../api/datasets/.../table/?limit=7",
"description": "The data belonging to this Dataset.",
"metadata": {
"1ef0455": {"name": "Education", "type": "categorical", "categories": [...], ...},
"588392a": {"name": "Favorite color", "type": "text", ...}
},
"data": {
"1ef0455": [6, 4, 7, 7, 3, 2, 1],
"588392a": ["green", "red", "blue", "Red", "RED", "pink", " red"]
}
}
Each key in the “data” member is a variable identifier, and its corresponding value is a column of Crunch data values. The data values in a given column are homogeneous, but across columns they are heterogeneous. The lengths of all columns MUST be the same. The “metadata” member is optional; if given, it MUST contain matching keys that correspond to variable definitions.
Like any JSON object, the “data” and “metadata” objects are explicitly unordered. When supplying a crunch:table, such as when POSTing to datasets/ to create a new dataset, you must supply an Order if you want an explicit variable order.
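The two invariants above (all columns the same length; metadata keys matching data keys when metadata is present) are easy to check client-side before sending a table. A minimal sketch:

```python
def validate_table(table):
    """Check crunch:table invariants: every column has the same
    length, and metadata keys (if given) match the data keys."""
    data = table["data"]
    lengths = {len(col) for col in data.values()}
    if len(lengths) > 1:
        raise ValueError("columns must all be the same length")
    if "metadata" in table and set(table["metadata"]) != set(data):
        raise ValueError("metadata keys must match data keys")
    return True

table = {
    "element": "crunch:table",
    "metadata": {"1ef0455": {"type": "numeric"}, "588392a": {"type": "text"}},
    "data": {"1ef0455": [6, 4, 7], "588392a": ["green", "red", "blue"]},
}
print(validate_table(table))
# True
```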
Cube
Cubes have both input and output formats. The “crunch:cube” element is used for the output only.
Cube input
The input format may vary slightly according to the API endpoint (since some parameters may be inherent in the particular resource), but involves the same basic ingredients.
Example:
{
"dimensions": [
{"variable": "datasets/ab8832/variables/3ffd45/"},
{"function": "+", "args": [{"variable": "datasets/ab8832/variables/2098f1/"}, {"value": 5}]}
],
"measures": {
"count": {"function": "cube_count", "args": []}
}
}
dimensions
An array of input expressions. Each expression contributes one dimension to the output cube. The only exception is when a dimension results in a boolean (true/false) column, in which case the data are filtered by it as a mask instead of adding a dimension to the output.
When a dimension is added, the resulting axis consists of distinct values rather than all values. Variables which are already “categorical” or “enumerated” will simply use their “categories” or “elements” as the extent. Other variables form their extents from their distinct values.
measures
A set of cube functions to populate each cell of the cube. You can request multiple functions over the same dimensions (such as “cube_mean” and “cube_stddev”) or more commonly just one (like “cube_count”). Each member MUST be a ZZ9 cube function designed for the purpose. See ZZ9 User Guide:Cube Functions for a list of such functions and their arguments.
filters
An array containing references to filters that need to be applied to the dataset before starting the cube calculations. It can be an empty array or null, in which case no filtering will be applied.
weight
A reference to a variable to be used as the weight on all cube operations.
Cube output
Cubes collect columns of measure data in an arbitrary number of dimensions. Multiple measures in the same cube share dimensions, effectively overlaying each other. For example, a cube might contain a “count” measure and a “mean” measure with the same shape:
{
"element": "crunch:cube",
"n": 210,
"missing": 12,
"dimensions": [
{"references": {"name": "A", ...}, "type": {"class": "categorical", "categories": [{"id": 1, ...}, {"id": 2, ...}, {"id": 3, ...}]}},
{"references": {"name": "B", ...}, "type": {"class": "categorical", "categories": [{"id": 11, ...}, {"id": 12, ...}]}}
],
"measures": {
"count": {
"metadata": {"references": {}, "type": {"class": "numeric", "integer": true, ...}},
"data": [10, 20, 30, 40, 50, 60],
"n_missing": 12
},
"mean": {
"metadata": {"references": {}, "type": {"class": "numeric", ...}},
"data": [3.5, 17.8, 9.9, 7.32, 0, 23.4],
"n_missing": 12
}
},
"margins": {
"data": [210],
"0": {"data": [30, 70, 110]},
"1": {"data": [90, 120]}
}
}
dimensions
The “dimensions” member is the most straightforward: an array of variable Definition objects. Each one defines an axis of the cube’s output. This may be different from the input dimensions’ definitions. For example, when counting numeric variables, the input dimension might be an expression involving the bin builtin function. Even though the input variable is of type “numeric”, the output dimension would be of type “enum”.
n
The number of rows considered for all measures.
measures
The “measures” member includes one object for each measure. The “metadata” member of each tells you the name, type and other definitions of the measure. The “data” member of each is a flattened array of values for that measure; the dimensions stride into that array in order, with the last dimension varying the fastest. In the example above, the first dimension (“A”) has 3 categories, while “B” has 2; therefore, the “flat” array [10, 20, 30, 40, 50, 60] for the “count” measure is interpreted as the “unflattened” array [[10, 20], [30, 40], [50, 60]]. Graphically:
|     | B:11 | B:12 |
| --- | ---  | ---  |
| A:1 | 10   | 20   |
| A:2 | 30   | 40   |
| A:3 | 50   | 60   |
This is known in NumPy and other domains as “C order” (versus “Fortran order” which would be interpreted as [[10, 30, 50], [20, 40, 60]] instead).
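Unflattening is just a matter of striding through the flat array in row-major (“C”) order. For the 3×2 example above:

```python
def unflatten(flat, shape):
    """Unflatten a C-ordered (row-major) flat list into nested lists
    with the given dimension sizes."""
    if len(shape) == 1:
        return list(flat)
    stride = len(flat) // shape[0]
    return [unflatten(flat[i * stride:(i + 1) * stride], shape[1:])
            for i in range(shape[0])]

# 3 categories of A by 2 categories of B, as in the example cube
print(unflatten([10, 20, 30, 40, 50, 60], [3, 2]))
# [[10, 20], [30, 40], [50, 60]]
```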
n_missing
The number of rows that are missing for this measure. Because different measures may have different inputs (the column to take the mean of, for example, or weighted versus unweighted), this number may vary from one measure to another even though the total “n” is the same for all.
margins
The “margins” member is optional. When present, it is a tree of nested margins with one level of depth for each dimension. At the top, we always include the “grand total” for all dimensions. Then, we include a branch for each axis we “unroll”. So, for example, for a 3-dimensional cube of X, Y, and Z, the margins member might contain:
{
"margins": {
"data": [4526],
"0": {
"data": [1755, 2771],
"1": {"data": [
[601, 370, 322, 269, 147, 46],
[332, 215, 596, 523, 437, 668]
]},
"2": {"data": [[1198, 557], [1493, 1278]]}
},
"1": {
"data": [933, 585, 918, 792, 584, 714],
"0": {"data": [
[601, 370, 322, 269, 147, 46],
[332, 215, 596, 523, 437, 668]
]},
"2": {"data": [
[825, 108], [560, 25], [325, 593],
[417, 375], [191, 393], [373, 341]
]}
},
"2": {
"data": [2691, 1835],
"0": {"data": [[1198, 557], [1493, 1278]]},
"1": {"data": [
[825, 108], [560, 25], [325, 593],
[417, 375], [191, 393], [373, 341]
]}
}
}
Again, each branch in the tree is an axis we “unroll” from the grand total. So margins[0][2] contains the margin where X (axis 0) and Z (axis 2) are unrolled, and only Y (axis 1) is still “rolled up”.
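Reading a particular margin out of the tree is then a matter of descending one key per unrolled axis. Using an abridged version of the X, Y, Z example above:

```python
# Abridged margins tree from the 3-D example above
margins = {
    "data": [4526],
    "0": {
        "data": [1755, 2771],
        "2": {"data": [[1198, 557], [1493, 1278]]},
    },
    "2": {"data": [2691, 1835]},
}

def margin(margins, *axes):
    """Walk the margins tree, unrolling the given (0-based) axes in order."""
    node = margins
    for axis in axes:
        node = node[str(axis)]
    return node["data"]

print(margin(margins))        # grand total: [4526]
print(margin(margins, 0, 2))  # X and Z unrolled: [[1198, 557], [1493, 1278]]
```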