Introduction
Crunch exposes a REST API for third parties, and indeed its own UI, to manage datasets. This API is also used by the Python and R libraries. This User Guide is for developers who are writing applications on top of the Crunch REST API, with or without those language bindings. It describes the existing interfaces for the current version and attempts to provide context and examples to guide development.
The documents are organized in three overlapping scopes: a feature guide, which provides higher-level vignettes that illustrate key features; an endpoint reference, which describes individual URIs in detail; and an object reference, which defines the building blocks of the Crunch platform, such as values, columns, types, variables, and datasets.
Feature Guide
Authentication
POST /api/public/login/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 67
{
"email": "fake.user@example.com",
"password": "password"
}
HTTP/1.1 204 No Content
Set-Cookie: token=dac20c82c79a514d572b4f5d7e11cb53; Domain=.crunch.io; Max-Age=31536000; Path=/
Vary: Cookie, Accept-Encoding
library(crunch)
login("fake.user@example.com", "password")
# See ?login for options, including how to store your credentials
# in your .Rprofile
import pycrunch
site = pycrunch.connect("fake.user@example.com", "password",
                        "https://app.crunch.io/api/")
curl -c cookie-jar \
  -X POST \
  -d '{"email": "fake.user@example.com", "password": "password"}' \
  -H "Content-Type: application/json" \
  https://app.crunch.io/api/public/login/
# The above command performs a login and saves the session cookie to a file
# called 'cookie-jar'. After this, you can access the endpoints via curl
# commands (POST, GET, PATCH) as long as the '-b cookie-jar' flag is present.
# Note: -b, not -c. -c saves cookies; -b submits cookies from the existing
# file. It is good practice to delete this file when you are done.
Replace “fake.user@example.com” and “password” with your email and password, respectively.
Nearly all interactions with the Crunch API must be authenticated. The standard password authentication method involves POSTing credentials and receiving a session cookie back; the client should store that cookie and pass it along with each subsequent request.
Failure will return 401 Unauthorized.
Crunch also supports OAuth 2.0/OpenID Connect. See the public endpoint reference for more on how to authenticate with OAuth.
If you’d like to add your auth provider to the set of supported providers, contact support@crunch.io.
Password policy
Password policy is as follows:
- Password must be 8 characters or longer
- Password must contain at least 4 unique characters
Examples:
- Secure 8 character password: 4B3a8f4$
- Secure passphrase: correct horse battery staple
We highly recommend long passphrases instead of passwords, both for security and for ease of remembering the credentials for accessing the service.
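The two rules above can be expressed as a short local check. This is a sketch of the stated policy only; the server may enforce additional constraints, and `password_acceptable` is a hypothetical helper, not part of any Crunch library.

```python
def password_acceptable(password: str) -> bool:
    """Check the documented policy: at least 8 characters,
    at least 4 unique characters."""
    return len(password) >= 8 and len(set(password)) >= 4

# The examples from the policy above:
print(password_acceptable("4B3a8f4$"))                      # True
print(password_acceptable("correct horse battery staple"))  # True
print(password_acceptable("aabbaabb"))                      # False: 2 unique chars
```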
Importing Data
There are several ways to build a Crunch dataset. The most appropriate method depends primarily on the format in which your data is currently stored.
Import from a data file
In some cases, you already have a file sitting on your computer which has source data, in CSV or SPSS format (or a Zip file containing a single file in CSV or SPSS format). You can upload these to Crunch and then attach them to datasets by following these steps.
1. Create a Dataset entity
POST /datasets/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 974
...
{
"element": "shoji:entity",
"body": {
"name": "my survey",
...
}
}
--------
201 Created
Location: /datasets/{dataset_id}/
ds <- newDatasetFromFile("my.csv", name="my survey")
# All three steps are handled within newDatasetFromFile
POST a Dataset Entity to the datasets catalog. See the documentation for POST /datasets/ for details on valid attributes to include in the POST.
2. Upload the file
POST /sources/ HTTP/1.1
Content-Length: 8874357
Content-Type: multipart/form-data; boundary=df5b17ff463a4cb3aa61cf02224c7303
--df5b17ff463a4cb3aa61cf02224c7303
Content-Disposition: form-data; name="uploaded_file"; filename="my.csv"
Content-Type: text/csv
"case_id","q1","q2"
234375,3,"sometimes"
234376,2,"always"
...
--------
201 Created
Location: /sources/{source_id}/
POST the file to the sources catalog.
Note that if the file is large (>100 megabytes), you should consider uploading it to a file-sharing service, like Dropbox.
To import from a URL (rather than a local file), use a JSON body with a location property giving the URL.
POST /sources/ HTTP/1.1
Content-Length: 71
Content-Type: application/json
{"location": "https://www.dropbox.com/s/znpoawnhg0rdzhw/iris.csv?dl=1"}
3. Add the Source to the Dataset
POST /datasets/{dataset_id}/batches/ HTTP/1.1
Content-Type: application/json
...
{
"element": "shoji:entity",
"body": {
"source": "/sources/{source_id}/"
}
}
--------
202 Accepted
Location: /datasets/{dataset_id}/batches/{batch_id}/
...
{
"element": "shoji:view",
"value": "/progress/{progress_id}/"
}
POST the URL of the just-created source entity (the Location in the 201 response from the previous step) to the batches catalog of the dataset entity created in step 1.
The POST to the batches catalog will return a 202 Accepted status, and the response body contains a progress URL. Poll that URL to monitor the completion of the batch addition. See “Progress” for more. The 202 response will also contain a Location header with the URL of the newly created batch.
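The 202-plus-progress pattern above lends itself to a simple polling loop. The sketch below is illustrative: `fetch_progress` is a hypothetical stand-in for a GET of the progress URL, and the assumption that the value runs from 0 to 100 (negative on failure) follows the “Progress” section of this guide.

```python
import time

def wait_for_completion(fetch_progress, poll_interval=1.0, max_polls=100):
    """Poll a progress resource until the job completes or fails.

    `fetch_progress` is any callable returning the current progress value:
    0 to 100 while running, 100 on success, negative on failure
    (assumption; see the "Progress" section for the real response shape).
    """
    for _ in range(max_polls):
        progress = fetch_progress()
        if progress >= 100:
            return True
        if progress < 0:
            raise RuntimeError("batch import failed")
        time.sleep(poll_interval)
    raise TimeoutError("gave up waiting for the batch to finish")

# Example with a stub that reports 30%, 60%, then done:
values = iter([30, 60, 100])
print(wait_for_completion(lambda: next(values), poll_interval=0))  # True
```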
Metadata document + CSV
This approach may be most natural for importing data from databases that store data by rows. You can dump or export your database to Crunch’s JSON metadata format, plus a CSV of data, and upload those to Crunch, without requiring much back-and-forth with the API.
1. Create a Dataset entity with variable definitions
POST /datasets/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 974
...
{
"element": "shoji:entity",
"body": {
"name": "my survey",
...,
"table": {
"element": "crunch:table",
"metadata": {
"educ": {"name": "Education", "alias": "educ", "type": "categorical", "categories": [...], ...},
"color": {"name": "Favorite color", "alias": "color", "type": "text", ...},
"state": {"name": "State", "alias": "state", "view": {"geodata": [{"geodatum": <uri>, "feature_key": "properties.postal-code"}]}}
},
"order": ["educ", {"my group": ["color"]}]
},
}
}
--------
201 Created
Location: /datasets/{dataset_id}/
POST a Dataset Entity to the datasets catalog, and in the “body”, include a Crunch Table object with variable definitions and order.
The “metadata” member in the table is an object containing all variable definitions, keyed by variable alias. See the Object Reference: Variable Definitions discussion for specific requirements for defining variables of various types, as well as the example below.
The “order” member is a Shoji Order object specifying the order, potentially hierarchically nested, of the variables in the dataset. The example below illustrates how this can be used. Shoji is JSON, which means the “metadata” object is explicitly unordered. If you wish the variables to have an order, you must supply an order object rather than relying on any order of the “metadata” object.
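As a sketch, the body for this request can be assembled as below; the variable definitions and grouping are illustrative, and `order_aliases` is a hypothetical helper used only to show the relationship between “order” and “metadata”.

```python
import json

metadata = {
    "educ": {"name": "Education", "alias": "educ", "type": "categorical",
             "categories": []},  # categories elided for brevity
    "color": {"name": "Favorite color", "alias": "color", "type": "text"},
}

body = {
    "element": "shoji:entity",
    "body": {
        "name": "my survey",
        "table": {
            "element": "crunch:table",
            "metadata": metadata,
            # JSON objects are unordered, so the display order is explicit:
            "order": ["educ", {"my group": ["color"]}],
        },
    },
}

def order_aliases(order):
    """Yield every alias referenced in a (possibly nested) order object."""
    for entry in order:
        if isinstance(entry, dict):  # a named group of entries
            for group in entry.values():
                yield from order_aliases(group)
        else:
            yield entry

# Every alias referenced in "order" should have a definition in "metadata":
assert all(alias in metadata
           for alias in order_aliases(body["body"]["table"]["order"]))
payload = json.dumps(body)
```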
It is possible to create derived variables using any of the available derivation functions in the same request that creates the dataset and its metadata. The variable references inside the derivation expressions must point to declared aliases of variables or subvariables.
POST /datasets/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 3294
...
{
"element": "shoji:entity",
"body": {
"name": "Dataset with derived arrays",
"settings": {
"viewers_can_export": true,
"viewers_can_change_weight": false,
"min_base_size": 3,
"weight": "weight_variable",
"dashboard_deck": null
},
"table": {
"element": "crunch:table",
"metadata": {
"weight_variable": {
"name": "weight variable",
"alias": "weight_variable",
"type": "numeric"
},
"combined": {
"name": "combined CA",
"derivation": {
"function": "combine_categories",
"args": [
{
"variable": "CA1"
},
{
"value": [
{
"combined_ids": [2],
"numeric_value": 2,
"missing": false,
"name": "even",
"id": 1
},
{
"combined_ids": [1],
"numeric_value": 1,
"missing": false,
"name": "odd",
"id": 2
}
]
}
]
}
},
"numeric": {
"name": "numeric variable",
"type": "numeric"
},
"numeric_copy": {
"name": "Copy of numeric",
"derivation": {
"function": "copy_variable",
"args": [{"variable": "numeric"}]
}
},
"MR1": {
"name": "multiple response",
"derivation": {
"function": "select_categories",
"args": [
{
"variable": "CA3"
},
{
"value": [
1
]
}
]
}
},
"CA3": {
"name": "cat array 3",
"derivation": {
"function": "array",
"args": [
{
"function": "select",
"args": [
{
"map": {
"var1": {
"variable": "ca2-subvar-2",
"references": {
"alias": "subvar2",
"name": "Subvar 2"
}
},
"var0": {
"variable": "ca1-subvar-1",
"references": {
"alias": "subvar1",
"name": "Subvar 1"
}
}
}
},
{
"value": ["var1", "var0"]
}
]
}
]
}
},
"CA2": {
"subvariables": [
{
"alias": "ca2-subvar-1",
"name": "ca2-subvar-1"
},
{
"alias": "ca2-subvar-2",
"name": "ca2-subvar-2"
}
],
"type": "categorical_array",
"name": "cat array 2",
"categories": [
{
"numeric_value": null,
"missing": false,
"id": 1,
"name": "yes"
},
{
"numeric_value": null,
"missing": false,
"id": 2,
"name": "no"
},
{
"numeric_value": null,
"missing": true,
"id": -1,
"name": "No Data"
}
]
},
"CA1": {
"subvariables": [
{
"alias": "ca1-subvar-1",
"name": "ca1-subvar-1"
},
{
"alias": "ca1-subvar-2",
"name": "ca1-subvar-2"
},
{
"alias": "ca1-subvar-3",
"name": "ca1-subvar-3"
}
],
"type": "categorical_array",
"name": "cat array 1",
"categories": [
{
"numeric_value": null,
"missing": false,
"id": 1,
"name": "yes"
},
{
"numeric_value": null,
"missing": false,
"id": 2,
"name": "no"
},
{
"numeric_value": null,
"missing": true,
"id": -1,
"name": "No Data"
}
]
}
}
}
}
}
--------
201 Created
Location: /datasets/{dataset_id}/
The example above does a number of things:
- Creates variables numeric and arrays CA1 and CA2.
- Makes a shallow copy of variable numeric as numeric_copy.
- Makes an ad hoc array CA3 reusing subvariables from CA1 and CA2.
- Makes a multiple response view MR1 selecting category 1 from categorical array CA3.
Validation rules
All variables mentioned in the metadata must contain a valid variable definition with a matching alias.
Array variable definitions should contain valid subvariables or subreferences members.
Any attribute that contains a null value will be ignored and receive the attribute’s default value instead.
An empty order for the dataset will be handled as if no order was passed in.
2. Add row data
By file:
POST /datasets/{dataset_id}/batches/ HTTP/1.1
Content-Type: text/csv
Content-Length: 8874357
Content-Disposition: form-data; name="file"; filename="thedata.csv"
...
"educ","color"
3,"red"
2,"yellow"
...
--------
202 Accepted
Location: /datasets/{dataset_id}/batches/{batch_id}/
...
{
"element": "shoji:view",
"value": "/progress/{progress_id}/"
}
By S3 URL:
POST /datasets/{dataset_id}/batches/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 341
...
{
"element": "shoji:entity",
"body": {
"url": "s3://bucket_name/dir/subdir/?accessKey=ASILC6CBA&secretKey=KdJy7ZRK8fDIBQ&token=AQoDYXdzECAa%3D%3D"
}
}
--------
202 Accepted
Location: /datasets/{dataset_id}/batches/{batch_id}/
...
{
"element": "shoji:view",
"value": "/progress/{progress_id}/"
}
POST a CSV file or URL to the new dataset’s batches catalog. The CSV must include a header row of variable identifiers, which should be the aliases of the variables (and array subvariables) defined in step (1).
The values in the CSV MUST be the same format as the values you get out of Crunch, and it must match the metadata specified in the previous step. This includes:
- Categorical variables should have data identified by the integer category ids, not strings, and all values must be defined in the “categories” metadata for each variable.
- Datetimes must all be valid ISO 8601 strings
- Numeric variables must have only (unquoted) numeric values
- The only special value allowed is an empty “cell” in the CSV, which will be read as the system-missing value “No Data”
Violation of any of these validation criteria will result in a 409 Conflict response status. To resolve, you can either (1) fix your CSV locally and re-POST it, or (2) PATCH the variable metadata that is invalid and then re-POST the CSV.
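To avoid a 409 round trip, you can pre-validate cells locally. This sketch is not the server’s actual validator; the alias “q1” and its category ids are illustrative.

```python
categories = {"q1": {1, 2, 3}}  # alias -> category ids defined in metadata

def check_cell(alias, cell):
    """Return (ok, parsed) for one categorical CSV cell."""
    if cell == "":                  # empty cell -> system-missing "No Data"
        return True, None
    try:
        value = int(cell)
    except ValueError:
        return False, cell          # category names/strings are not allowed
    return value in categories[alias], value

print(check_cell("q1", "2"))    # (True, 2)
print(check_cell("q1", ""))     # (True, None)
print(check_cell("q1", "7"))    # (False, 7): id not defined in metadata
print(check_cell("q1", "red"))  # (False, 'red'): strings would 409
```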
Imports are done in “strict” mode by default. Strict imports are faster, and using strict mode will alert you if there is any mismatch between data and metadata. However, in some cases it may be convenient to be more flexible and silently ignore or resolve inconsistencies. For example, you may have a large CSV dumped out of a database whose data format isn’t exactly Crunch’s, and it would be costly to read-munge-write the whole file for minor changes. In cases like this, you may append ?strict=0 to the URL of the POST request to loosen that strictness.
With non-strict imports:
- The CSV may contain columns not described by the metadata; these columns will be ignored, rather than returning an error response
- The metadata may describe variables not contained in the CSV; these variables will be filled with missing values, rather than returning an error response
- And more things to come
The CSV can be sent in one of two ways:
- Upload a file by POSTing a multipart form
- POST a Shoji entity with a “url” in the body, containing all necessary auth keys as query parameters. If the URL points to a single file, it should be a CSV or gzipped CSV, as described above. If the URL points to a directory, the contents will be assumed to be (potentially zipped) batches of a CSV and will be concatenated for appending. In the latter case, only the first CSV in the directory listing should contain a header row.
A 202 response to the POST request indicates success. All rows added in a single request become part of a new Batch, whose URL is returned in the response’s Location header. You may inspect the new rows in isolation by following the batch’s link.
Example
Here’s an example of dataset metadata and a corresponding CSV.
Several things to note:
- Everything (metadata, order, and data) is keyed by variable “alias”, not “name”, because Crunch believes that names are for people, not computers, to understand. Aliases must be unique across the whole dataset, while variable “names” must only be unique within their group or array variable.
- For categorical variables, all values in the CSV correspond to category ids, not category names, and also not “numeric_values”, which need not be unique or present for all categories in a variable.
- The array variables defined in the metadata (“allpets” and “petloc”) don’t themselves have columns in the CSV, but all of their “subvariables” do, keyed by their aliases.
- With the exception of those array variable definitions, all variables and subvariables defined in the metadata have columns in the CSV, and there are no columns in the CSV that are not defined in the metadata.
- For internal variables, such as a case identifier in this example, that you don’t want to be visible in the UI, you can add them as “hidden” from the beginning by including "discarded": true in their definition, as in the example of “caseid”.
- Missing values
  - Variables with categories (categorical, multiple_response, categorical_array) have missing values defined as categories with "missing": true.
  - Text, numeric, and datetime variables have missing values defined as “missing_rules”, which can be “value”, “set”, or “range”. See, for example, “q3” and “ndogs”.
  - Empty cells in the CSV, if present, will automatically be translated as the “No Data” system missing value in Crunch. See, for example, “ndogs_b”.
- Order
  - All variables should be referenced by alias in the “order” object, inside a group’s “entities” key. Any omitted variables (in this case, the hidden variable “caseid”) will automatically be added to a group named “ungrouped”.
  - Variables may appear in multiple groups.
  - Groups may be nested within each other.
Column-by-column
Crunch stores data by column internally, so if your data are stored in a column-major format as well, importing by column may be the most efficient way to import data.
1. Create a Dataset entity
POST /datasets/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 974
...
{
"element": "shoji:entity",
"body": {
"name": "my survey",
...
}
}
--------
201 Created
Location: /datasets/{dataset_id}/
ds <- createDataset("my survey")
POST a Dataset Entity to the datasets catalog, just as in the first import method.
2. Add Variable definitions and column data
POST /datasets/{dataset_id}/variables/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 38475
...
{
"element": "shoji:entity",
"body": {
"name": "Gender",
"alias": "gender",
"type": "categorical",
"categories": [
{
"name": "Male",
"id": 1,
"numeric_value": null,
"missing": false
},
{
"name": "Female",
"id": 2,
"numeric_value": null,
"missing": false
},
{
"name": "Skipped",
"id": 9,
"numeric_value": null,
"missing": true
}
],
"values": [1, 9, 1, 2, 2, 1, 1, 1, 1, 2, 9, 1]
}
}
--------
201 Created
Location: /datasets/{dataset_id}/variables/{variable_id}/
# Here's a similar example. R's factor type becomes "categorical".
gender.names <- c("Male", "Female", "Skipped")
gen <- factor(gender.names[c(1, 3, 1, 2, 2, 1, 1, 1, 1, 2, 3, 1)],
levels=gender.names)
# Assigning an R vector into a dataset will create a variable entity.
ds$gender <- gen
POST a Variable Entity to the newly created dataset’s variables catalog, and include with that Entity definition a “values” key that contains the column of data. Do this for all columns in your dataset.
If the values attribute is not present, the new column will be filled with “No Data” in all rows.
The data passed in values can be either the full data column for the new variable or a single value, in which case that value will be used to fill the entire column. In the case of arrays, the single value should be a list containing the correct categorical values.
If the type of the values passed in does not correspond with the variable’s type, the server will return a 400 response indicating the error, and the variable will not be created.
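The two accepted shapes for values can be sketched as follows; `variable_entity` is a hypothetical helper, and the variable definitions are illustrative.

```python
def variable_entity(definition, values=None):
    """Build a shoji:entity body for POST /datasets/{id}/variables/.

    `values` may be a full column (a list) or a single value; per the
    docs, a single value fills the whole column, and omitting `values`
    fills the column with "No Data".
    """
    body = dict(definition)
    if values is not None:
        body["values"] = values
    return {"element": "shoji:entity", "body": body}

full = variable_entity({"name": "Age", "alias": "age", "type": "numeric"},
                       values=[33, 41, 28])
filled = variable_entity({"name": "Wave", "alias": "wave", "type": "numeric"},
                         values=1)   # single value: fills every row
print(full["body"]["values"], filled["body"]["values"])  # [33, 41, 28] 1
```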
Appending Data
Appending data to an existing Dataset is not much different from uploading the initial data; both use a “Batch” resource which represents the process of importing the data from the source into the dataset. Once you have created a Source for your data, POST its URL to datasets/{id}/batches/ to start the import process. That process may take some time, depending on the size of the dataset. The returned Location is the URI of the new Batch; GET the batches catalog and look up the Batch URI in the catalog’s index and inspect its status attribute until it moves from “analyzing” to “appended”. User interfaces may choose here to show a progress meter or some other widget.
During the “analyzing” stage, the Crunch system imports the data into a temporary table, and matches its variables with any existing variables. During the “importing” stage, the new rows will move to the target Dataset, and once “appended”, the new rows will be included in all queries against that Dataset.
Adding a subsequent Source
Once you have created a Dataset, you can upload new files and append rows to the same Dataset as often as you like. If the structure of each file is the same as that of the first uploaded file, Crunch should automatically pass your new rows through exactly the same process as the old rows. If there are any derived variables in your Dataset, new data will be derived in the new rows following the same rules as the old data. You can follow the progress as above via the batch’s status attribute.
Let’s look at an example: you had uploaded an initial CSV of 3 columns, A, B and C. Then:
- The Crunch system automatically converted column A from the few strings that were found in it to a Categorical type.
- You derived a new column D that consisted of B * C.
Then you decide to upload another CSV of new rows. What will happen?
When you POST to create the second Batch, the service will: 1) match up the new A with the old A and cast the new strings to existing categories by name, and 2) fill column D for you with B * C for each new row.
However, from time to time, the new source has significant differences: a new variable, a renamed variable, and other changes. When you append the first Source to a Dataset, there is nothing with which to conflict. But a subsequent POST to batches/ may result in a conflict if the new source cannot be confidently reconciled with the existing data. Even though you get a 201 Created response for the new batch resource, it will have a status of “conflict”.
Reporting and Resolving Conflicts
When you append a Source to an existing Dataset, the system attempts to match up the new data with the old. If the source’s schema can be automatically aligned with the target Dataset, the new rows from the Batch are appended. When things go wrong, however, the Batch can be inspected to see what conflicted with the target (or vice-versa, in some cases!).
GET the new Batch:
GET /api/datasets/{dataset_id}/batches/{batch_id}/ HTTP/1.1
...
--------
200 OK
Content-Type: application/shoji
{
"element": "shoji:entity",
"body": {
"conflicts": {
"cdbd11/": {
"metadata": {},
"conflicts": [{
"message": "Types do not match and cannot be converted",
}]
}
}
}
}
If any variable conflicts, it will possess one or more “conflicts” members. For example, if the new variable “cdbd11” had a different type that could not be converted compared to the existing variable “cdbd11”, the Batch resource would contain the above message. Only unresolvable conflicts will be shown; if a variable is not reported in the conflicts object, it appended cleanly.
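Given a batch entity shaped like the response above, a client could summarize conflicts with a sketch like this; only the fields shown in the example are assumed, and `conflict_messages` is a hypothetical helper.

```python
def conflict_messages(batch_entity):
    """Yield (variable_id, message) for every reported conflict."""
    conflicts = batch_entity["body"].get("conflicts", {})
    for var_id, info in conflicts.items():
        for conflict in info.get("conflicts", []):
            yield var_id, conflict["message"]

batch = {
    "element": "shoji:entity",
    "body": {
        "conflicts": {
            "cdbd11/": {
                "metadata": {},
                "conflicts": [
                    {"message": "Types do not match and cannot be converted"}
                ],
            }
        }
    },
}

for var_id, message in conflict_messages(batch):
    print(f"{var_id}: {message}")
# cdbd11/: Types do not match and cannot be converted
```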
See Batches for more details on batch entities and conflicts.
Streaming rows
Existing datasets are best sent to Crunch as a single Source, or a handful of subsequent Sources if gathered monthly or on some other schedule. Sometimes, however, you want to “stream” data to Crunch as it is being gathered, even one row at a time, rather than in a single post-processing phase. You do not want to make each row its own batch (it’s simply not worth the overhead). Instead, you should make a Stream and send rows to it, then periodically create a Source and Batch from it.
Send rows to a stream
To send one or more rows to a dataset stream, simply POST one or more lines of line-delimited JSON to the dataset’s stream endpoint:
{"var_id_1": 1, "var_id_2": "a"}
by_alias = ds.variables.by('alias')
while True:
    row = my_system.read_a_row()
    importing.importer.stream_rows(ds, {
        'gender': row['gender'],
        'age': row['age']
    })
Streamed values must be keyed either by id or by alias. The variable ids/aliases must correspond to existing variables in the dataset. The Python code shows how to efficiently map aliases to ids. The data must match the target variable types so that we can process the row as quickly as possible. We want no casting or other guesswork slowing us down here. Among other things, this means that categorical values must be represented as Crunch’s assigned category ids, not names or numeric values.
You may also send more than one row at a time if you prefer. For example, your data collection system may already post-process row data in, say, 5 minute increments. The more rows you can send together, the less overhead spent processing each one and the more you can send in a given time. Send multiple lines of line-delimited JSON, or if using pycrunch, a list of dicts rather than a single dict.
Each time you send a POST, all of the rows in that POST are assembled into a new message which is added to the stream. Each message can contain one or more rows of data.
As when creating a new source, don’t worry about sending values for derived variables; Crunch will fill these out for you for each row using the data you send.
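If you buffer rows between POSTs, the line-delimited JSON body for one POST can be assembled like this sketch; the variable aliases and values are illustrative, and `to_ldjson` is a hypothetical helper.

```python
import json

def to_ldjson(rows):
    """Serialize buffered rows as line-delimited JSON for one stream POST."""
    return "\n".join(json.dumps(row) for row in rows)

buffered = [
    {"gender": 1, "age": 34},   # categorical "gender" uses category ids
    {"gender": 2, "age": 51},
]
payload = to_ldjson(buffered)
print(payload)
```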
Append the new rows to the dataset
The above added new rows to the Stream resource so that you can be confident that your data is completely safe with Crunch. To append those rows to the dataset requires another step. You could stream rows and then, once they are all assembled, append them all as a single Source to the dataset. However, if you’re streaming rows at intervals it’s likely you want to append them to the dataset at intervals, too. But doing so one row at a time is usually counter-productive; it slows the rate at which you can send rows, balloons metadata, and interrupts users who are analyzing the data.
Instead, you control how often you want the streamed rows to be appended to the dataset. When you’re ready, POST to /datasets/{id}/batches/ and provide the “stream” member, plus any extra metadata the new Source should possess:
{
"stream": null,
"type": "ldjson",
"name": "My streamed rows",
"description": "Yet Another batch from the stream"
}
ds.batches.create({"body": {
"stream": None,
"type": "ldjson",
"name": "My streamed rows",
"description": "Yet Another batch from the stream"
}})
The “stream” member tells Crunch to acquire the data from the stream to form this Batch. The “stream” member must be null; the system will then acquire all currently pending messages (any new messages that arrive during the formation of this Batch will be queued and not fetched). If there are no pending messages, 409 Conflict is returned instead of 201/202 for the new Batch.
Pending rows will be added automatically
Every hour, the Crunch system goes through all datasets, and for each that has pending streamed data, it batches up the pending rows and adds them to the dataset automatically, as long as the dataset is not currently in use by someone. That way, streamed data will magically appear in the dataset for the next time a user loads it, but if a user is actively working with the dataset, the system won’t update their view of the data and disrupt their session.
See Stream for more details on streams.
Combining datasets
Combining datasets consists of creating a new dataset formed by stacking a list of datasets together. It works under the same rules as a normal append.
To create a new dataset combined from others, POST to the datasets catalog with a combine_datasets expression:
POST /api/datasets/
{
"element": "shoji:entity",
"body": {
"name": "My combined dataset",
"description": "Consists of dsA and dsB",
"derivation": {
"function": "combine_datasets",
"args": [
{"dataset": "https://app.crunch.io/api/datasets/dsabc/"},
{"dataset": "https://app.crunch.io/api/datasets/ds123/"}
]
}
}
}
The server will verify that the authenticated user has view permission on all the datasets involved; otherwise it will return a 400 error.
The resulting dataset will consist of the matched union of all included datasets, with the rows in the same order. Private/public variable visibility and exclusion filters will be honored in the result.
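As a sketch, the POST body for combining an arbitrary list of dataset URLs can be assembled like this; `combine_body` is a hypothetical helper, and the names are illustrative.

```python
def combine_body(name, dataset_urls, description=""):
    """Build the shoji:entity body for POST /api/datasets/ that stacks
    the given datasets with combine_datasets."""
    return {
        "element": "shoji:entity",
        "body": {
            "name": name,
            "description": description,
            "derivation": {
                "function": "combine_datasets",
                "args": [{"dataset": url} for url in dataset_urls],
            },
        },
    }

body = combine_body(
    "My combined dataset",
    ["https://app.crunch.io/api/datasets/dsabc/",
     "https://app.crunch.io/api/datasets/ds123/"],
)
print(len(body["body"]["derivation"]["args"]))  # 2
```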
Transformations during combination
Combining follows the normal append matching rules: any mismatch in aliases or types will cause the operation to fail, and the result is limited to the union of variables present across the included datasets.
It is possible to provide transformations on the datasets to ensure that they line up during the combination phase, and to add extra columns of constant per-dataset metadata to the combined result.
Each {"dataset"} argument allows an extra frame key that can contain a function expression describing the desired transformation of that dataset, for example:
{
"dataset": "<dataset_url>",
"frame": {
"function": "select",
"args": [{
"map": {
"*": {"variable": "*"},
"dataset_id": {
"value": "<dataset_id>",
"type": "text",
"references": {
"name": "Dataset ID",
"alias": "dataset_id"
}
}
}
}]
}
}
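When combining several datasets, the frame shown above can be generated per dataset. This sketch (with a hypothetical `tagged_dataset` helper and placeholder URL and id values) keeps all variables and adds the constant dataset_id column:

```python
def tagged_dataset(url, dataset_id):
    """Build a combine_datasets arg whose frame keeps every variable and
    adds a constant text variable identifying the source dataset."""
    return {
        "dataset": url,
        "frame": {
            "function": "select",
            "args": [{
                "map": {
                    "*": {"variable": "*"},   # keep all existing variables
                    "dataset_id": {
                        "value": dataset_id,
                        "type": "text",
                        "references": {"name": "Dataset ID",
                                       "alias": "dataset_id"},
                    },
                }
            }],
        },
    }

arg = tagged_dataset("https://app.crunch.io/api/datasets/dsabc/", "dsabc")
print(arg["frame"]["args"][0]["map"]["dataset_id"]["value"])  # dsabc
```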
Selecting a subset of variables to combine
In the same fashion that it is possible to add extra variables to the dataset transforms, it is possible to select which variables only to include.
Note in the example above, we use the "*": {"variable": "*"}
expressions
which instructs the server to include all variables. Omitting that would cause
to only include the selected variables, for example:
{
"dataset": "<dataset_url>",
"frame": {
"function": "select",
"args": [{
"map": {
"A": {"variable": "A"},
"B": {"variable": "B"},
"C": {"variable": "C"},
"dataset_id": {
"value": "<dataset_id>",
"type": "text",
"references": {
"name": "Dataset ID",
"alias": "dataset_id"
}
}
}
}]
}
}
In this example, the expression indicates to include only the variables with IDs A, B, and C from the referenced dataset, as well as to add the new extra variable dataset_id. This effectively appends only these 4 variables instead of the dataset’s full set of variables.
Merging and Joining Datasets
Crunch supports joining variables from one dataset to another by a key variable that maps rows from one to the other. To add a snapshot of those variables to the dataset, POST an adapt
function expression to its variables catalog.
POST /api/datasets/{dataset_id}/variables/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"function": "adapt",
"args": [{
"dataset": "https://app.crunch.io/api/datasets/{other_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_key_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{dataset_id}/variables/{left_key_id}/"
}]
}
-----
HTTP/1.1 202 Accepted
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/{dataset_id}/variables/",
"value": "https://app.crunch.io/api/progress/5be82a/"
}
A successful request returns a 202 Accepted status with a progress resource in the response body; poll that URL to track the status of the asynchronous job that adds the data to your dataset.
Currently Crunch only supports left joins: all rows of the left (current) dataset will be kept, and only rows from the right (incoming) dataset that have a key value present in the left dataset will be brought in. Rows in the left dataset that do not have a corresponding row in the right dataset will be filled with missing values for the incoming variables.
The join key must be of type “numeric” or “text”, must be the same type in both datasets, and must have unique values within each dataset.
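These key requirements can be checked locally before issuing the join. The sketch below uses plain Python lists standing in for the two key columns; it is illustrative, not the server’s validation, and `valid_join_key` is a hypothetical helper.

```python
def valid_join_key(left_column, right_column):
    """Check the documented join-key requirements: the same type
    ("numeric" or "text") on both sides, and unique values within
    each dataset."""
    def column_type(column):
        if all(isinstance(v, (int, float)) for v in column):
            return "numeric"
        if all(isinstance(v, str) for v in column):
            return "text"
        return None

    left_type, right_type = column_type(left_column), column_type(right_column)
    if left_type is None or left_type != right_type:
        return False
    return (len(set(left_column)) == len(left_column)
            and len(set(right_column)) == len(right_column))

print(valid_join_key([1, 2, 3], [2, 3, 4]))  # True
print(valid_join_key([1, 1, 2], [1, 2, 3]))  # False: duplicate key value
print(valid_join_key(["a", "b"], [1, 2]))    # False: mismatched types
```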
Joining a subset of variables
To select certain variables to bring over from the right dataset, include a select function expression around the adapt function described above:
POST /api/datasets/{dataset_id}/variables/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"function": "select",
"args": [{
"map": {
"{right_var1_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var1_id}/"
},
"{right_var2_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var2_id}/"
},
"{right_var3_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var3_id}/"
}
}
}],
"frame": {
"function": "adapt",
"args": [{
"dataset": "https://app.crunch.io/api/datasets/{other_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_key_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{dataset_id}/variables/{left_key_id}/"
}]
}
}
-----
HTTP/1.1 202 Accepted
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/{dataset_id}/variables/",
"value": "https://app.crunch.io/api/progress/5be82a/"
}
Joining a subset of rows
Rows to consider from the right dataset can also be filtered.
To do so, include a filter
attribute on the payload, containing either a filter expression, wrapped under {"expression": <expr>}
, or
an existing filter entity URL (from the right-side dataset), wrapped as {"filter": <url>}
.
POST /api/datasets/{dataset_id}/variables/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"function": "adapt",
"args": [{
"dataset": "https://app.crunch.io/api/datasets/{other_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_key_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{dataset_id}/variables/{left_key_id}/"
}],
"filter": {
"expression": {
"function": "==",
"args": [
{"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{variable_id}/"},
{"value": "<value>"}
]
}
}
}
You can filter both rows and variables in the same request. Note that the “filter” parameter remains on the top-level function in the expression, which, when specifying a variable subset, is “select” rather than “adapt”:
POST /api/datasets/{dataset_id}/variables/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"function": "select",
"args": [{
"map": {
"{right_var1_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var1_id}/"
},
"{right_var2_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var2_id}/"
},
"{right_var3_id}/": {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_var3_id}/"
}
}
}],
"frame": {
"function": "adapt",
"args": [{
"dataset": "https://app.crunch.io/api/datasets/{other_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{other_id}/variables/{right_key_id}/"
}, {
"variable": "https://app.crunch.io/api/datasets/{dataset_id}/variables/{left_key_id}/"
}]
},
"filter": {
"filter": "https://app.crunch.io/api/datasets/{other_id}/filters/{filter_id}/"
}
}
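Either flavor of the filter member can be attached to a join payload with a small helper. A hedged sketch follows; the stand-in payload and the expression URL are illustrative only:

```python
# Sketch: attach a "filter" member to a join payload, holding either an
# inline expression or an existing filter entity URL (placeholders here).
def with_filter(payload, expression=None, filter_url=None):
    if expression is not None:
        payload["filter"] = {"expression": expression}
    elif filter_url is not None:
        payload["filter"] = {"filter": filter_url}
    return payload

joined = with_filter(
    {"function": "adapt", "args": []},  # minimal stand-in for the adapt payload
    expression={"function": "==",
                "args": [{"variable": ".../variables/abc/"}, {"value": 1}]},
)
```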
Deriving Variables
Derived variables are variables that, instead of having a column of values backing them, are functionally dependent on other variables. In Crunch, users with view-only permissions on a dataset can still make derived variables of their own, just as they can make filters. Dataset editors can also derive other types of variables as permanent additions to the dataset, available for all viewers.
Combining categories
The “combine_categories” function takes two arguments:
- A reference to the categorical or categorical_array variable to be combined
- A definition of the categories of the new variable, including all members found in categories, plus a “combined_ids” key that maps the derived category to one or more categories (by id) in the input variable.
Given a variable such as:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/",
"body": {
"name": "Education",
"alias": "educ",
"type": "categorical",
"categories": [
{
"numeric_value": null,
"missing": true,
"id": -1,
"name": "No Data"
},
{
"numeric_value": 1,
"missing": false,
"id": 1,
"name": "No HS"
},
{
"numeric_value": 2,
"missing": false,
"id": 2,
"name": "High school graduate"
},
{
"numeric_value": 3,
"missing": false,
"id": 3,
"name": "Some college"
},
{
"numeric_value": 4,
"missing": false,
"id": 4,
"name": "2-year"
},
{
"numeric_value": 5,
"missing": false,
"id": 5,
"name": "4-year"
},
{
"numeric_value": 6,
"missing": false,
"id": 6,
"name": "Post-grad"
},
{
"numeric_value": 8,
"missing": true,
"id": 8,
"name": "Skipped"
},
{
"numeric_value": 9,
"missing": true,
"id": 9,
"name": "Not Asked"
}
],
"description": "Education"
}
}
POSTing to the private variables catalog a Shoji Entity containing a ZCL function like:
{
"element": "shoji:entity",
"body": {
"name": "Education (3 category)",
"description": "Combined from six-category education",
"alias": "educ3",
"derivation": {
"function": "combine_categories",
"args": [
{
"variable": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/"
},
{
"value": [
{
"name": "High school or less",
"numeric_value": null,
"id": 1,
"missing": false,
"combined_ids": [1, 2]
},
{
"name": "Some college",
"numeric_value": null,
"id": 2,
"missing": false,
"combined_ids": [3, 4]
},
{
"name": "4-year college or more",
"numeric_value": null,
"id": 3,
"missing": false,
"combined_ids": [5, 6]
},
{
"name": "Missing",
"numeric_value": null,
"id": 4,
"missing": true,
"combined_ids": [8, 9]
},
{
"name": "No data",
"numeric_value": null,
"id": -1,
"missing": true,
"combined_ids": [-1]
}
]
}
]
}
}
}
results in a private categorical variable with three valid categories.
Combining the categories of a categorical array is the same as it is for categorical variables. The resulting variable is also of type “categorical_array”. This variable type also has a “subvariables_catalog”, like the variable from which it is derived, and the subvariables contained in it are derived “combine_categories” categorical variables.
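The combine_categories request lends itself to a small builder function. A sketch follows, assuming nothing beyond the documented payload shape (the URL and IDs are copied from the example above):

```python
# Sketch: build a "combine_categories" derivation entity as a Python dict.
def combine_categories_entity(variable_url, name, alias, description, categories):
    return {
        "element": "shoji:entity",
        "body": {
            "name": name,
            "description": description,
            "alias": alias,
            "derivation": {
                "function": "combine_categories",
                "args": [{"variable": variable_url}, {"value": categories}],
            },
        },
    }

educ3 = combine_categories_entity(
    "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/",
    "Education (3 category)", "educ3", "Combined from six-category education",
    [{"name": "High school or less", "id": 1, "missing": False,
      "numeric_value": None, "combined_ids": [1, 2]},
     {"name": "Some college", "id": 2, "missing": False,
      "numeric_value": None, "combined_ids": [3, 4]},
     {"name": "4-year college or more", "id": 3, "missing": False,
      "numeric_value": None, "combined_ids": [5, 6]}],
)
```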
Combining responses
For multiple response variables, you may combine responses rather than categories.
Given a variable such as:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/455288/variables/3c2e57/",
"body": {
"name": "Aided awareness",
"alias": "aided",
"subvariables": [
"../870a2d/",
"../a8b0eb/",
"../dc444f/",
"../8e6279/",
"../f775ab/",
"../6405c2/"
],
"type": "multiple_response",
"categories": [
{
"numeric_value": 1,
"selected": true,
"id": 1,
"name": "Selected",
"missing": false
},
{
"numeric_value": 2,
"id": 2,
"name": "Not selected",
"missing": false
},
{
"numeric_value": 8,
"id": 3,
"name": "Skipped",
"missing": true
},
{
"numeric_value": 9,
"id": 4,
"name": "Not asked",
"missing": true
},
{
"numeric_value": null,
"id": -1,
"name": "No data",
"missing": true
}
],
"description": "Which of the following coffee brands do you recognize? Check all that apply."
}
}
POSTing to the variables catalog a Shoji Entity containing a ZCL function like:
{
"element": "shoji:entity",
"body": {
"name": "Aided awareness by region",
"description": "Combined from aided brand awareness",
"alias": "aided_region",
"derivation": {
"function": "combine_responses",
"args": [
{
"variable": "https://app.crunch.io/api/datasets/455288/variables/3c2e57/"
},
{
"value": [
{
"name": "San Francisco",
"combined_ids": [
"../870a2d/",
"../a8b0eb/",
"../dc444f/"
]
},
{
"name": "Portland",
"combined_ids": [
"../8e6279/",
"../f775ab/"
]
},
{
"name": "Chicago",
"combined_ids": [
"../6405c2/"
]
}
]
}
]
}
}
}
results in a multiple response variable with three responses. The “selected” state of the responses in the derived variable is an “OR” of the combined subvariables.
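The OR semantics of combined responses can be illustrated in plain Python; the rows and subvariable aliases below are invented for illustration:

```python
# Plain-Python illustration of combine_responses: a combined response is
# "selected" if ANY of its source subvariables is selected. Data is invented.
def combine_rows(rows, groups):
    """rows: list of {subvariable alias: selected?}; groups: {name: aliases}."""
    return [{name: any(row[a] for a in aliases)
             for name, aliases in groups.items()}
            for row in rows]

rows = [{"sf_a": True, "sf_b": False, "pdx": False},
        {"sf_a": False, "sf_b": False, "pdx": True}]
combined = combine_rows(rows, {"San Francisco": ["sf_a", "sf_b"],
                               "Portland": ["pdx"]})
# combined[0] -> {"San Francisco": True, "Portland": False}
```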
Case statements
The “case” function derives a variable using values from the first argument. Each of the remaining arguments contains a boolean expression. These are evaluated in order in an IF, ELSE IF, ELSE IF, …, ELSE fashion; the first one that matches selects the corresponding value from the first argument. For example, if the first two boolean expressions do not match (return False) but the third one matches, then the third value in the first argument is placed into that row in the output. You may include an extra value for the case when none of the boolean expressions match; if not provided, it defaults to the system “No Data” missing value.
{
"element": "shoji:entity",
"body": {
"name": "Market segmentation",
"description": "Super-scientific classification of people",
"alias": "segments",
"derivation": {
"function": "case",
"args": [
{
"column": [1, 2, 3, 4],
"type": {
"value": {
"class": "categorical",
"categories": [
{"id": 3, "name": "Hipsters", "numeric_value": null, "missing": false},
{"id": 1, "name": "Techies", "numeric_value": null, "missing": false},
{"id": 2, "name": "Yuppies", "numeric_value": null, "missing": false},
{"id": 4, "name": "Other", "numeric_value": null, "missing": true}
]
}
}
},
{
"function": "and",
"args": [
{"function": "in", "args": [{"variable": "55fc29/"}, {"value": [5, 6]}]},
{"function": "<=", "args": [{"variable": "673dde/"}, {"value": 30}]}
]
},
{
"function": "and",
"args": [
{"function": "in", "args": [{"variable": "889dc3/"}, {"value": [4, 5, 6]}]},
{"function": ">", "args": [{"variable": "673dde/"}, {"value": 40}]}
]
},
{"function": "==", "args": [{"variable": "13cbf4/"}, {"value": 1}]}
]
}
}
}
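The evaluation order described above behaves like an if/elif/else chain. A plain-Python illustration of the per-row semantics (not an API call):

```python
# Plain-Python illustration of "case" evaluation for one row: the first
# boolean expression that matches selects the corresponding value.
def case_value(conditions, values, default=None):
    for cond, val in zip(conditions, values):
        if cond:
            return val
    return default  # stands in for the system "No Data" value

# The second condition is the first to match, so category id 2 is chosen.
segment = case_value([False, True, False], [1, 2, 3], default=-1)  # -> 2
```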
Making ad hoc arrays
It is possible to create derived arrays that reuse subvariables from other arrays, using the array function and indicating the reference for each of its subvariables.
The subvariables of an array are specified using the select function, with its first map argument indicating the IDs for each of these virtual subvariables. These IDs are user defined and can be any string. They need only be unique within the parent variable, so they can be reused between different arrays.
The second argument of the select function indicates the order of the subvariables in the array; they are referenced by the user-defined IDs. Each of its variables must point to a variable expression, which can take an optional (but recommended) references attribute to specify a particular name and alias for the subvariable. If not specified, the name of the original subvariable will be used, and the alias will be padded to ensure uniqueness.
{
"CA3": {
"name": "cat array 3",
"derivation": {
"function": "array",
"args": [
{
"function": "select",
"args": [
{
"map": {
"var1": {
"variable": "ca2-subvar-2",
"references": {
"alias": "subvar2",
"name": "Subvar 2"
}
},
"var0": {
"variable": "ca1-subvar-1",
"references": {
"alias": "subvar1",
"name": "Subvar 1"
}
}
}
},
{
"value": [
"var1",
"var0"
]
}
]
}
]
}
},
"CA2": {
"subvariables": [
{
"alias": "ca2-subvar-1",
"name": "ca2-subvar-1"
},
{
"alias": "ca2-subvar-2",
"name": "ca2-subvar-2"
}
],
"type": "categorical_array",
"name": "cat array 2",
"categories": [
{
"numeric_value": null,
"missing": false,
"id": 1,
"name": "yes"
},
{
"numeric_value": null,
"missing": false,
"id": 2,
"name": "no"
},
{
"numeric_value": null,
"missing": true,
"id": -1,
"name": "No Data"
}
]
},
"CA1": {
"subvariables": [
{
"alias": "ca1-subvar-1",
"name": "ca1-subvar-1"
},
{
"alias": "ca1-subvar-2",
"name": "ca1-subvar-2"
},
{
"alias": "ca1-subvar-3",
"name": "ca1-subvar-3"
}
],
"type": "categorical_array",
"name": "cat array 1",
"categories": [
{
"numeric_value": null,
"missing": false,
"id": 1,
"name": "yes"
},
{
"numeric_value": null,
"missing": false,
"id": 2,
"name": "no"
},
{
"numeric_value": null,
"missing": true,
"id": -1,
"name": "No Data"
}
]
}
}
In the above example, the array CA3 uses the array function to reuse subvariables ca1-subvar-1 and ca2-subvar-2 from CA1 and CA2 respectively. The references attribute is used to indicate a specific name/alias for each of these subvariables.
Weights
A numeric variable suitable for use as row weights can be constructed from one or more categorical variables and target proportions of their categories. The sample distribution is “raked” iteratively to each categorical marginal target to produce a set of joint values that can be used as weights. Note that available weight variables are shared by all; you may not create private weights. To create a weight variable, POST a JSON variable definition to the variables catalog describing the properties of the weight variable, with a “derivation” member indicating use of the “rake” function, whose arguments contain an array of variable targets:
POST /api/datasets/{datasetid}/variables/ HTTP/1.1
Content-Type: application/shoji
Content-Length: 739
{
"name": "weight",
"description": "my raked weight",
"derivation": {
"function": "rake",
"args": [{
"variable": "{variable_id}",
"targets": [[1, 0.491], [2, 0.509]]
}]
}
}
---------
201 Created
Location: /api/datasets/{datasetid}/variables/{variableid}/
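For intuition about what raking computes, here is a plain-Python sketch of the single-variable case, where raking reduces to target share divided by observed share per category (the data is invented):

```python
# Sketch: what "rake" computes for a single categorical variable. With one
# variable, each row's weight is target proportion / observed proportion.
from collections import Counter

def rake_one(values, targets):
    """values: category id per row; targets: {category id: desired proportion}."""
    n = len(values)
    observed = {cat: count / n for cat, count in Counter(values).items()}
    return [targets[v] / observed[v] for v in values]

weights = rake_one([1, 1, 2, 2, 2, 2], {1: 0.5, 2: 0.5})
# Category 1 rows (observed 1/3, target 1/2) get weight 1.5; category 2 rows 0.75
```

With multiple target variables, the real function iterates this adjustment over each marginal until the weighted proportions converge.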
Multiple Response Views
The “select_categories” function allows you to form a multiple response array from a categorical array, or alter the “selected” categories in an existing multiple response array. It takes two arguments:
- A reference to a categorical or categorical_array variable
- A list of the category ids or category names to mark as “selected”
Given a variable such as:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/",
"body": {
"name": "Cola",
"alias": "cola",
"type": "categorical",
"categories": [
{"id": -1, "name": "No Data", "numeric_value": null, "missing": true},
{"id": 0, "name": "Never", "numeric_value": null, "missing": false},
{"id": 1, "name": "Sometimes", "numeric_value": null, "missing": false},
{"id": 2, "name": "Frequently", "numeric_value": null, "missing": false},
{"id": 3, "name": "Always", "numeric_value": null, "missing": false}
],
"subvariables": ["0001", "0002", "0003"],
"references": {
"subreferences": {
"0003": {"alias": "Coke"},
"0002": {"alias": "Pepsi"},
"0001": {"alias": "RC"}
}
}
}
}
POSTing to the private variables catalog a Shoji Entity containing a ZCL function like:
{
"element": "shoji:entity",
"body": {
"name": "Cola likes",
"description": "Cola preferences",
"alias": "cola_likes",
"derivation": {
"function": "select_categories",
"args": [
{"variable": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/"},
{"value": [2, 3]}
]
}
}
}
…results in a private multiple_response variable where the “Frequently” and “Always” categories are selected.
Text Analysis
Sentiment Analysis
The “sentiment” function allows you to derive a categorical variable from text variable data, which is classified and accumulated in three categories (positive, negative, and neutral). It takes one parameter:
- A reference to a text variable
Given a variable such as:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/",
"body": {
"name": "Zest",
"alias": "zest",
"type": "text",
"values": [
"Zest is best",
"Zest I can take it or leave it",
"Zest is the worst"
]
}
}
POSTing to the private variables catalog a Shoji Entity containing a ZCL function like:
{
"element": "shoji:entity",
"body": {
"name": "Zesty Sentiment",
"description": "Customer sentiment about Zest",
"alias": "zest_sentiment",
"derivation": {
"function": "sentiment",
"args": [
{"variable": "https://app.crunch.io/api/datasets/3ad42c/variables/0000f5/"}
]
}
}
}
…results in a new categorical variable, where for each row the text value is classified as “Negative”, “Neutral”, or “Positive” using the VADER English social-media-tuned lexicon.
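To make the output shape concrete, here is a toy three-way classifier in plain Python. The word lists are invented for illustration; the actual service uses the VADER lexicon, not this logic:

```python
# Toy three-way classifier illustrating the output shape only; the actual
# service uses the VADER lexicon, not this invented word list.
POSITIVE, NEGATIVE = {"best", "great", "love"}, {"worst", "bad", "hate"}

def classify(text):
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"

labels = [classify(t) for t in ["Zest is best",
                                "Zest I can take it or leave it",
                                "Zest is the worst"]]
# -> ["Positive", "Neutral", "Negative"]
```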
Other transformations
Arithmetic operations
It is possible to create new numeric variables out of pairs of other numeric variables. The following arithmetic operations are available; each takes two numeric variables as its arguments.
- “+” adds two numeric variables.
- “-” returns the difference between two numeric variables.
- “*” returns the product of two numeric variables.
- “/” real division.
- “//” floor division; always returns an integer.
- “^” raises the first argument to the power of the second argument.
- “%” modulo operation; accepts floats.
The usage is as follows for all operators:
{
"function": "+",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"},
{"variable": "https://app.crunch.io/api/datasets/123/variables/def/"}
]
}
bin
Receives a numeric variable and returns a categorical one where each category represents a bin of the numeric values.
Each category on the new variable is annotated with a “boundaries” member that contains the lower/upper bound of each bin.
{
"function": "bin",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"}
]
}
Optionally, a second argument may be passed indicating the desired bin size to use, instead of letting the API choose the bins.
{
"function": "bin",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"},
{"value": 100}
]
}
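The effect of a fixed bin size can be approximated in plain Python. This is illustrative only; the real function also annotates each resulting category with its “boundaries”:

```python
# Rough plain-Python analogue of fixed-size binning; the real function also
# annotates each category with its "boundaries" member.
def bin_values(values, size):
    """Map each numeric value to the (lower, upper) bounds of its bin."""
    def bounds(v):
        lower = (v // size) * size
        return (lower, lower + size)
    return [bounds(v) for v in values]

bins = bin_values([5, 104, 230], 100)
# -> [(0, 100), (100, 200), (200, 300)]
```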
case
Returns a categorical variable with its categories following the specified conditions from different variables on the dataset. View Case Statements
cast
Returns a new variable with its type and values cast. Not applicable to arrays or date variables; use Date Functions to work with date variables.
{
"function": "cast",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"},
{"value": "numeric"}
]
}
The allowed output variable types are:
- numeric
- text
- categorical
To cast to the categorical type, the second argument value should not be a type name string (as for numeric and text) but a type definition indicating a class and categories, as follows:
{
"function": "cast",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"},
{"value": {
"class": "categorical",
"categories": [
{"id": 1, "name": "one", "missing": false, "numeric_value": null},
{"id": 2, "name": "two", "missing": false, "numeric_value": null},
{"id": -1, "name": "No Data", "missing": true, "numeric_value": null}
]
}
}
]
}
To change the type of a variable, a client should POST to the /variable/:id/cast/ endpoint. See Convert type for API examples.
char_length
Returns a numeric variable containing the text length of each value. Only applicable on text variables.
{
"function": "char_length",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"}
]
}
copy_variable
Returns a shallow copy of the indicated variable maintaining type and data.
{
"function": "copy_variable",
"args": [
{"variable": "https://app.crunch.io/api/datasets/123/variables/abc/"}
]
}
Changes on the data of the original variable will be reflected on this copy.
combine_categories
Returns a categorical variable with values combined following the specified combination rules. See Combining categories
combine_responses
Given a list of categorical variables, returns the selected value out of them. See Combining responses.
row
Returns a numeric variable with 0-based row indices. It takes no arguments.
{
"function": "row",
"args": []
}
remap_missing
Given a text, numeric, or datetime variable, returns a new variable of the same type with its missing values mapped to new codes:
{
"function": "remap_missing",
"args": [
{"variable": "varid"},
{"value": [
{
"reason": "Combined 1 and 2",
"code": 1,
"mapped_codes": [1, 2]
},
{
"reason": "Only 3",
"code": 2,
"mapped_codes": [3]
},
{
"reason": "No Data",
"code": -1,
"mapped_codes": [-1]
}
]}
]
}
The example above will return a copy of the variable with id varid, with the new missing_reasons grouping and mapping following the original codes.
Integrating variables
“Integrating” a variable means removing its derived properties and turning it into a regular base variable. After integration, the variable stops reflecting its expression: if new data is added to its original parent variable, new rows will be filled with No Data ({"?": -1}).
To integrate a variable, PATCH the variable entity with the derived attribute set to false, as so:
PATCH /api/datasets/abc/variables/123/
{
"element": "shoji:entity",
"body": {
"derived": false
}
}
This effectively integrates the variable; its derivation attribute will contain null from now on. Note that it is only possible to set the derived attribute to false, never to true.
Creating unlinked derivations
It is possible to create a material, one-off copy of a variable or of an expression on it. To do so, create a derived variable as usual with the derivation expression, but also include a derived: false attribute. The variable will be created with the values of the expression but will be unlinked from the original variable.
POST /api/datasets/abc/variables/
{
"element": "shoji:entity",
"body": {
"derivation": {
"function": "copy_variable",
"args": [{"variable": "https://app.crunch.io/api/datasets/abc/variables/123/"}]
},
"derived": false
}
}
Array Variables
Simple variables have only one value per row; sometimes, however, it is convenient to consider multiple values (of the same type) as a single Variable. The Crunch system implements the data as a 2-dimensional array, but the array variable includes two additional attributes: “subvariables”, which is a list of subvariable URLs, and “subreferences”, which is an object of {name, alias, description, …} objects keyed by subvariable URL. There are two types of array variable: categorical array and multiple response.
Categorical arrays
For the “categorical_array” type, a row has multiple values, and may have a different value for each subvariable. For example, you might field a survey where you ask respondents to rate soft drinks by filling in a grid of a set of brands versus a set of ratings:
72. How much do you like each soft drink?
Not at all Not much OK A bit A lot
Coke o o o o o
Pepsi o o o o o
RC o o o o o
The respondent may only select one rating in each row. To represent that answer data in Crunch, you would define an array. For example, you might POST a Variable Entity with the payload:
{
"element": "shoji:entity",
"body": {
"name": "Soft Drinks",
"type": "categorical_array",
"subvariables": [
"./subvariables/001/",
"./subvariables/002/",
"./subvariables/003/"
],
"subreferences": {
"./subvariables/002/": {"name": "Coke", "alias": "coke"},
"./subvariables/003/": {"name": "Pepsi", "alias": "pepsi"},
"./subvariables/001/": {"name": "RC", "alias": "rc"}
},
"categories": [
{"id": -1, "name": "No Data", "numeric_value": null, "missing": true},
{"id": 1, "name": "Not at all", "numeric_value": null, "missing": false},
{"id": 2, "name": "Not much", "numeric_value": null, "missing": false},
{"id": 3, "name": "OK", "numeric_value": null, "missing": false},
{"id": 4, "name": "A bit", "numeric_value": null, "missing": false},
{"id": 5, "name": "A lot", "numeric_value": null, "missing": false},
{"id": 99, "name": "Skipped", "numeric_value": null, "missing": true}
],
"values": [
[1, 2, {"?": 99}],
[{"?": -1}, 4, 3],
[5, 2, {"?": -1}]
]
}
}
The “Soft Drinks” categorical array variable may now be included in analyses like any other variable, but has 2 dimensions instead of the typical 1. For example, a crosstab of a 1-dimensional “Gender” variable with a 1-dimensional “Education” variable yields a 2-D cube. A crosstab of 1-D “Gender” by 2-D “Soft Drinks” yields a 3-D cube.
In rare cases, you may have already added a separate Variable for “Coke”, one for “Pepsi”, and one for “RC”. You may move them to a single array variable by POSTing a Variable Entity for the array that, instead of a “subreferences” attribute, has a “subvariables” attribute: a list of URLs of the variables you’d like to bind together:
{
"body": {
"name": "Soft Drinks",
"type": "categorical_array",
"subvariables": [<URI of the "Coke" variable>, <URI of the "Pepsi" variable>, <URI of the "RC" variable>]
}
}
The existing variables are removed from the normal order and become virtual subvariables of the new array. This approach will cast all subvariables to a common set of categories if they differ. The existing name and alias of each subvariable will be moved to the array’s “subreferences” attribute.
If you wish to analyze a set of categorical variables as an array without moving them, you need to build a derived array instead.
{
"body": {
"name": "Soft Drinks",
"type": "categorical_array",
"derivation": {
"function": "array",
"args": [{
"function": "select",
"args": [{"map": {
"000000": {"variable": <URI of the "Coke" variable>},
"000001": {"variable": <URI of the "Pepsi" variable>},
"000002": {"variable": <URI of the "RC" variable>}
}}]
}]
}
}
}
Your client library may have helper functions to construct the above more easily. This is a bit more advanced, but consequently more powerful: you can grab subvariables from other existing arrays, use more powerful subsetting functions like “deselect” and “subvariables”, cast, combine, what-have-you.
Multiple response
The second type of array is “multiple_response”. These arrays look very similar to categorical_array variables in their data representations, but are usually gathered very differently and behave differently in analyses. For example, you might field a survey where you ask respondents to select countries they have visited:
38. Which countries have you visited?
[] USA
[] Germany
[] Japan
[] None of the above
The respondent may check the box or not for each row. To represent that answer data in Crunch, you would define an array Variable with separate subreferences for “USA”, “Germany”, “Japan”, and “None of the above”:
{
"element": "shoji:entity",
"body": {
"name": "Countries Visited",
"type": "multiple_response",
"subvariables": [
"./subvariables/001/",
"./subvariables/002/",
"./subvariables/003/",
"./subvariables/004/"
],
"subreferences": {
"./subvariables/002/": {"name": "USA", "alias": "visited_usa"},
"./subvariables/004/": {"name": "Germany", "alias": "visited_germany"},
"./subvariables/001/": {"name": "Japan", "alias": "visited_japan"},
"./subvariables/003/": {"name": "None of the above", "alias": "visited_none_of_the_above"}
},
"categories": [
{"id": -1, "name": "No Data", "numeric_value": null, "missing": true},
{"id": 1, "name": "Checked", "numeric_value": null, "missing": false, "selected": true},
{"id": 2, "name": "Not checked", "numeric_value": null, "missing": false},
{"id": 98, "name": "Not shown", "numeric_value": null, "missing": true},
{"id": 99, "name": "Skipped", "numeric_value": null, "missing": true}
]
}
}
Aside from the new type name, the primary difference from the basic categorical array is that one or more categories are marked as “selected”. These are then used to dichotomize the categories such that any subvariable response is treated more as if it were true or false (selected or unselected) than maintaining the difference between each category. If POSTing to create a “multiple_response” variable, you may include a “selected_categories” key in the body, containing an array of category names that indicate the dichotomous selection. If you do not include “selected_categories”, there must be at least one “selected”: true category in the subvariables you are binding into the multiple-response variable to indicate the dichotomous selection; see Object Reference#categories. If neither is present, the request will return a 400 status.
The “Countries Visited” multiple response variable may now be included in analyses like any other variable, but with a noticeable difference. Rather than contributing a dimension of distinct categories, it instead contributes a dimension of distinct subvariables. For example, a crosstab of a 1-dimensional “Gender” variable with a 1-dimensional “Education” variable yields a 2-D cube: one dimension of the categories of Gender and one dimension of the categories of Education. A crosstab of 1-D “Gender” by the multiple response “Countries Visited” also yields a 2-D cube: one dimension of the categories of Gender but the other dimension has one entry for “USA”, one for “Germany”, one for “Japan”, and one for “None of the above”.
A quirk of multiple response variables is that analyses of them often require knowledge across subvariables: which rows had any subvariable selected, which rows had no subvariable selected, and which rows had all subvariables marked as “missing”. The Crunch system calculates these ancillary “subvariables” for you, and includes them in analysis output. Including an explicit “None of the above” subvariable in the example above complicates this, since Crunch has no way of knowing to treat such subvariables specially; it will faithfully consider the “None of the above” subvariable like any other subvariable when calculating the any/none/missing views. Depending on your application, you may wish to 1) not even include that option in your survey, 2) skip adding that variable to your Crunch dataset, 3) add it but do not bind it into the parent array variable, or 4) include it and have it be treated like any other multiple response subvariable in your analyses.
Non-uniform basis
As presented above, multiple response variables assume that subvariables have a consistent, uniform basis, or number of rows in each subvariable. In some cases, the number of valid and missing entries may be wildly different from one subvariable to the next. In a survey example, a new response may be added to a longer-running series, or different responses may be presented to subsets of respondents in the context of an experiment. The boolean field uniform_basis, if false, provides a hint to users that, rather than using the __any__ column (from the selections function output) in an analysis query, they should instead calculate the basis per subvariable by summing the “selected” and “not selected” categories. The field’s default is true.
Adding new subvariables
In the scenario that a variable was left out when creating an array variable, it is possible to modify the array variable so that new subvariables get added (always in the last position).
To do so, the subvariable-to-be should currently be a variable of the dataset and have the same type as the existing subvariables (“categorical”).
Send a PATCH request containing the URL of the new subvariable with an empty object as its tuple:
{
...
"index": {
"http://.../url/new/subvariable/": {}
}
}
A 204 response will indicate that the catalog was updated, and the new subvariable now is part of the array variable.
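The PATCH body is small enough to build by hand. A sketch of just the index member shown above (the subvariable URL is a placeholder, and the elided parts of the payload stay elided):

```python
# Sketch: the "index" member of the PATCH body that adds a subvariable to an
# array variable. The subvariable URL is a placeholder.
def subvariable_patch_body(new_subvar_url):
    # An empty tuple object keyed by the subvariable's URL is all that's needed.
    return {"index": {new_subvar_url: {}}}

body = subvariable_patch_body(
    "https://app.crunch.io/api/datasets/abc/variables/123/")
# PATCH this to the array variable's URL with any HTTP client; expect 204.
```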
Multidimensional analysis
In the Crunch system, any analysis is also referred to as a “cube”. Cubes are the mechanical means of representing analyses to and from the Crunch system; you can think of them as spreadsheets that might have other than two dimensions. A cube consists of two primary parts: “dimensions” which supply the cube axes, and “measures” which populate the cells. Although both the request and response include dimensions and measures, it is important to distinguish between them. The request supplies expressions for each, while the response has data (and metadata) for each. The request declares what variables to use and what to do with them, while the response includes and describes the results. See Object Reference:Cube for complete details.
Dimensions
Each dimension of an analysis can be simply one variable, a function over it, a traversal of its subvariables (for array variables), or even a combination of multiple variables (e.g. A + B). Any expression you can use in a “select” command can be used as a dimension. The big difference is that the system will consider the distinct values rather than all values of the result. Variables which are already “categorical” or “enumerated” will simply use their “categories” or “elements” as the extent. Other variables form their extents from their distinct values.
For example, if “3ffd45” is a categorical variable with three categories (one of which is “No Data”: -1), then the following dimension expressions:
{
"dimensions": [
{"variable": "datasets/ab8832/variables/3ffd45/"},
{"function": "+", "args": [{"variable": "datasets/ab8832/variables/2098f1/"}, {"value": 5}]}
]
}
…would form a result cube with two dimensions: one using the categories of variable “3ffd45”, and one using the distinct values of (variable “2098f1” + 5). If variable “2098f1” has the distinct values [5, 15, 25, 35], then we would obtain a cube with the following extents:
   | 1 | 2 | -1
---|---|---|---
 5 |   |   |
15 |   |   |
25 |   |   |
35 |   |   |
Each dimension used in a cube query needs to be reduced to distinct values. For categorical or enumerated variables, we only need to refer to the variable, and the system will automatically use the “categories” or “elements” metadata to determine the distinct values. For other types, the default is to scan the variable’s data to find the unique values present and use those. Often, however, we want a more sophisticated approach: numeric variables, for example, are usually more useful when binned into a handful of ranges, like “0 to 10, 10 to 20, …90 to 100” rather than 100 distinct points (or many more when dealing with non-integers). The available dimensioning functions vary from type to type; the most common are:
- categorical: {"variable": url}
- text: {"variable": url}
- numeric: group the distinct values into a smaller number of bins via {"function": "bin", "args": [{"variable": url}]}
- datetime: roll up seconds into hours, days into months, or any other grouping via {"function": "rollup", "args": [{"variable": url}, {"value": variable.rollup_resolution}]}
- categorical_array: one dimension for the subvariables ({"each": url}) and one for the categories ({"variable": url})
- multiple response: one dimension for the subvariables ({"each": url}), and one for the selected-ness, which means transforming the array from a set of arbitrary categories to a standard "selected" set of categories (1, 0, -1), via {"function": "selections", "args": [{"variable": url}]}
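These dimension expressions are plain JSON-ready objects and can be assembled client-side; a minimal sketch, in which the helper names are illustrative (not part of any Crunch library) and the variable URLs are placeholders:

```python
# Illustrative builders for ZCL dimension expressions, one per variable type.

def var_dim(url):
    """Categorical/text: the variable reference itself is the dimension."""
    return {"variable": url}

def bin_dim(url):
    """Numeric: ask the server to group the distinct values into bins."""
    return {"function": "bin", "args": [{"variable": url}]}

def rollup_dim(url, resolution):
    """Datetime: roll values up to a coarser resolution, e.g. "M" for month."""
    return {"function": "rollup", "args": [{"variable": url}, {"value": resolution}]}

def array_dims(url):
    """Categorical array: one dimension for subvariables, one for categories."""
    return [{"each": url}, {"variable": url}]

def multiple_response_dims(url):
    """Multiple response: subvariables plus selected-ness (1, 0, -1)."""
    return [{"each": url}, {"function": "selections", "args": [{"variable": url}]}]

# A two-dimensional query like the earlier example:
query = {"dimensions": [var_dim("../variables/3ffd45/"),
                        bin_dim("../variables/2098f1/")]}
```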
Measures
A set of named functions to populate each cell of the cube. You can request multiple functions over the same dimensions (such as “cube_mean” and “cube_stddev”) or more commonly just one (like “cube_count”). For example:
{"measures": {"count": {"function": "cube_count", "args": []}}}
or:
{"measures": {
"mean": {"function": "cube_mean", "args": [{"variable": "datasets/1/variables/3"}]},
"stddev": {"function": "cube_stddev", "args": [{"variable": "datasets/1/variables/3/"}]}
}}
When applied to the dimensions we defined above, this second example might fill the table thusly for the “mean” measure:
mean | 1 | 2 | -1 |
---|---|---|---|
5 | 4.3 | 12.3 | 8.1 |
15 | 13.1 | 0.0 | 9.2 |
25 | 72.4 | 4.2 | 55.5 |
35 | 8.9 | 9.1 | 0.4 |
…and produce a similar one for the “stddev” measure. You can think of multiple measures as producing “overlays” over the same dimensions. However, the actual output format (in JSON) is more compact in that the dimensions are not repeated; see Object Reference:Cube output for details.
ZCL expressions are composable. If you need, for example, to find the mean of a categorical variable’s “numeric_value” attributes, cast the variable to the “numeric” type class before including it as the cube argument:
{"measures": {
"mean": {
"function": "cube_mean",
"args": [{
"function": "cast",
"args": [
{"variable": "datasets/1/variables/3"},
{"class": "numeric"}
]
}]
}
}}
Comparisons
Occasionally, it is useful to compare analyses from different sources. A common example is to define “benchmarks” for a given analysis, so that you can quickly compare an analysis to an established target. These are, in effect, one analysis laid over another in such a way that at least one of their dimensions lines up (and typically, using the same measures). These are also therefore defined in terms of cubes: one set which defines the base analyses, and another which defines the overlay.
For example, if we have an analysis over two categorical variables “88dd88” and “ee4455”:
{
"dimensions": [
{"variable": "../variables/88dd88/"},
{"variable": "../variables/ee4455/"}
],
"measures": {"count": {"function": "cube_count", "args": []}}
}
then we might obtain a cube with the following output:
 | 1 | 2 | -1 |
---|---|---|---|
1 | 15 | 12 | 9 |
2 | 72 | 8 | 3 |
3 | 23 | 4 | 17 |
Let’s say we then want to overlay a comparison showing benchmarks for 88dd88 as follows:
 | 1 | 2 | -1 | benchmarks |
---|---|---|---|---|
1 | 15 | 12 | 9 | 20 |
2 | 72 | 8 | 3 | 70 |
3 | 23 | 4 | 17 | 10 |
Our first pass at this might be to generate the benchmark targets in some other system, and hand-enter them into Crunch. To accomplish this, we need to define a comparison. First, we need to define the “bases”: the cube(s) to which our comparison applies, which in our case is just the above cube:
{
"name": "My benchmark",
"bases": [{
"dimensions": [{"variable": "88dd88"}],
"measures": {"count": {"function": "cube_count", "args": []}}
}]
}
Notice, however, that we’ve left out the second dimension. This means that this comparison will be available for any analysis where “88dd88” is the row dimension. The base cube here is a sort of “supercube”: a superset of the cubes to which we might apply the comparison. We include the measure to indicate that this comparison should apply to a “cube_count” (frequency count) involving variable “88dd88”.
Then, we need to define target data. We are supplying these in a hand-generated way, so our measure is simply a static column instead of a function:
{
"overlay": {
"dimensions": [{"variable": "88dd88"}],
"measures": {
"count": {
"column": [20, 70, 10],
"type": {"function": "typeof", "args": [{"variable": "88dd88"}]}
}
}
}
}
Note that our overlay has to have a dimension, too. In this case, we simply re-use variable “88dd88” as the dimension. This ensures that our target data is interpreted with the same category metadata as our base analysis.
We POST the above to datasets/{id}/comparisons/ and can obtain the overlay output at datasets/{id}/comparisons/{comparison_id}/cube/. See the Endpoint Reference for details.
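Conceptually, the overlay appends its measure as an extra column wherever the shared dimension’s element ids line up; an illustrative client-side sketch (not an API call, and not part of any Crunch library):

```python
def overlay_benchmarks(row_ids, base_rows, overlay_ids, overlay_column):
    """Append the overlay measure as an extra column on the base result,
    matching rows by the shared dimension's element ids."""
    targets = dict(zip(overlay_ids, overlay_column))
    return [row + [targets.get(rid)] for rid, row in zip(row_ids, base_rows)]

# Counts from the two-dimensional cube above, rows keyed by ids 1, 2, 3:
base = [[15, 12, 9], [72, 8, 3], [23, 4, 17]]
merged = overlay_benchmarks([1, 2, 3], base, [1, 2, 3], [20, 70, 10])
# merged reproduces the benchmarks table shown earlier.
```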
Multitables
GET datasets/{id}/multitables/ HTTP/1.1
200 OK
{
"element": "shoji:catalog",
"index": {
"1/": {"name": "Major demographics"},
"2/": {"name": "Political tendencies"}
}
}
POST datasets/{id}/multitables/ HTTP/1.1
{
"element": "shoji:entity",
"body": {
"name": "Geographical indicators",
"template": [
{
"query": [
{
"variable": "../variables/de85b32/"
}
]
},
{
"query": [
{
"variable": "../variables/398620f/"
}
]
},
{
"query": [
{
"function": "bin",
"args": [
{
"variable": "../variables/398620f/"
}
]
}
]
}
],
"is_public": false
}
}
201 Created
Location: datasets/{id}/multitables/3/
Analyses as described above are truly multidimensional; when you add another variable, the resulting cube obtains another dimension. Sometimes, however, you want to compare analyses side by side, typically looking at several (even all) variables against a common set of conditioning variables. For example, you might nominate “Gender”, “Age”, and “Race” as the conditioning variables and cross every other variable with those, in order to quickly discover common correlations.
Multi-table definitions mainly provide a template member that clients can use to construct a valid query with the variable(s) of interest.
Crunch provides a separate catalog where you can define and manage these common sets of variables. Like most catalogs, you can GET it to see which multitables are defined.
Template query
A multitable is a set of queries that form groups of ‘columns’ for ‘row’ variables chosen later. It is defined by a name and a template. At minimum the template must contain a query fragment: this will later be combined with some function of a row variable to form the dimensions of a result. Each template dimension can currently only be a function of one variable.
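To illustrate, a client might pair a chosen row dimension with each template entry like this (a hypothetical helper, not part of the API or any Crunch library):

```python
def build_dimensions(row_dimension, template):
    """Produce one dimensions list per template entry: the chosen row
    variable's dimension followed by that entry's query fragment."""
    return [[row_dimension] + entry["query"] for entry in template]

template = [
    {"query": [{"variable": "../variables/de85b32/"}]},
    {"query": [{"function": "bin", "args": [{"variable": "../variables/398620f/"}]}]},
]
dims = build_dimensions({"variable": "../variables/449b421/"}, template)
# One cube query per template entry, each crossing the row variable
# with one group of 'columns'.
```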
GET datasets/{id}/multitables/3/ HTTP/1.1
{
"element": "shoji:entity",
"body": {
"name": "Geographical indicators",
"template": [
{
"query": [
{
"variable": "../variables/de85b32/"
}
]
},
{
"query": [
{
"variable": "../variables/398620f/"
}
]
},
{
"query": [
{
"function": "bin",
"args": [
{
"variable": "../variables/398620f/"
}
]
}
]
}
]
}
}
Each multi-table template may be a list of variable references and other information used to construct the dimension and transform its output.
Transforming analyses for presentation
The transform member of an analysis specification (or multitable definition) is a declarative definition of what the dimension should look like after computation. The cube result dimension itself will always be derived from the query part of the request ({variable: $variableId}, {function: f, args: [$variableId, …]}, &c.), after which clients should do what is necessary to arrive at the transformed result: changing element names, orders, etc.
Structure
A transform can contain elements or categories, which is an array of target transforms for output-dimension elements. Therefore, to create a valid element/category transform it is generally necessary to make a cube query, inspect the result dimension, and proceed from there. For categorical and multiple response variables, elements may also be obtained from the variable entity.
Transforms are designed for variables that are more stable than not, with element ids that inhere in the underlying elements, such as category or subvariable ids. Dynamic elements, such as the results of binning a numeric variable, may not be transformed.
Transformations stored on a variable’s view are the default transforms for that variable. They may be shorter, alternate versions of category names, or contain insertions, described below.
Insertions
In addition to transforming the categories or elements already defined on
a cube ‘dimension’, it is possible to insert headings and subtotals to the
result. These insertions
are attached after an anchor
element/category id.
Insertions are processed last, after renaming, reordering, or sorting elements according to the elements/categories transform specification. They are “attached” to their anchor, always following it in the result, or else simply appended to the end of the result. If the result is sorted by some column’s value, it may make the most sense to display insertions last rather than inserting them into the result table, because their values are not considered when sorting the non-inserted elements.
An insertion is defined by an anchor and a name, which will be displayed alongside the names of categories/elements. It may also contain "function": {"combine": []}, where the array arguments are the ids of the elements to combine as “subtotals”.
Use an anchor of 0 to indicate an insertion before other results. Any anchor other than 0 that does not match an id in the elements/categories will be included at the end of results.
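These anchoring rules can be sketched as a client-side helper (hypothetical, not part of the API):

```python
def apply_insertions(element_ids, insertions):
    """Place insertion names among element ids per the anchoring rules:
    anchor 0 goes before everything; an anchor matching an element id
    attaches the insertion right after it; any other anchor falls to
    the end of the results."""
    out = []
    # Insertions anchored at 0 come before other results.
    for ins in insertions:
        if ins["anchor"] == 0:
            out.append(ins["name"])
    for eid in element_ids:
        out.append(eid)
        for ins in insertions:
            if ins["anchor"] == eid:
                out.append(ins["name"])
    # Unmatched anchors are included at the end.
    for ins in insertions:
        if ins["anchor"] != 0 and ins["anchor"] not in element_ids:
            out.append(ins["name"])
    return out

order = apply_insertions(
    [1, 2, 3],
    [{"anchor": 0, "name": "Header"},
     {"anchor": 2, "name": "Subtotal"},
     {"anchor": 99, "name": "Other"}],
)
# → ["Header", 1, 2, "Subtotal", 3, "Other"]
```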
Examples
Consider the following example result dimension:
Name | missing | id |
---|---|---|
Element A | | 0 |
Element B | | 1 |
Element C | | 2 |
Don’t know | | 3 |
Not asked | true | 4 |
An element transform can specify a new order of output elements, new names, and in the future, bases for hypothesis testing, result sorting, and aggregation of results. A transform has elements that look generally like the dimension’s extent, with some optional properties:
- id: (required) id of the target element/category
- name: name of the new target element/category
- sort: -1 or 1, indicating to sort results descending or ascending by this element
- compare: neq, leq, or geq, indicating to test other rows/columns against the hypothesis that they are ≠, ≤, or ≥ to the present element
- hide: suppress this element’s row/column from displaying at all. Defaults to false for valid elements, true for missing ones, so that if an element is added, it will be present until a transform with hide: true is added to suppress it.
A transform with object members can do lots of things. Suppose we want to put Element C first, hide the Don’t know, and more compactly represent the result as just C, A, B:
{
"transform": {"categories": [
{"id": 2, "name": "C"},
{"id": 0, "name": "A"},
{"id": 1, "name": "B"},
{"id": 3, "hide": true}
]}
}
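Applied client-side, such a transform amounts to reordering, renaming, and hiding elements; an illustrative sketch using the example dimension above (this helper handles only explicit hide flags, not the missing-element default or sorting):

```python
def apply_transform(elements, transform):
    """Reorder, rename, and hide dimension elements per a categories transform."""
    by_id = {e["id"]: e for e in elements}
    out = []
    for spec in transform["categories"]:
        element = by_id.get(spec["id"])
        if element is None or spec.get("hide"):
            continue  # unknown id or explicitly hidden
        out.append({"id": element["id"], "name": spec.get("name", element["name"])})
    return out

elements = [
    {"id": 0, "name": "Element A"},
    {"id": 1, "name": "Element B"},
    {"id": 2, "name": "Element C"},
    {"id": 3, "name": "Don't know"},
]
transform = {"categories": [
    {"id": 2, "name": "C"},
    {"id": 0, "name": "A"},
    {"id": 1, "name": "B"},
    {"id": 3, "hide": True},
]}
result = apply_transform(elements, transform)
# → [{"id": 2, "name": "C"}, {"id": 0, "name": "A"}, {"id": 1, "name": "B"}]
```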
Example transform in a saved analysis
In a saved analysis the transforms are an array in display_settings with the same extents as the output dimensions (as well as, of course, the query used to generate them). This syntax makes a univariate table of a multiple response variable and re-orders the result.
{
"query": {
"dimensions": [
{
"function": "selections",
"args": [{"variable": "../variables/398620f/"}]
},
{"variable": "../variables/398620f/"}
],
"measures": {
"count": {"function": "cube_count", "args": []}
}
},
"display_settings": {
"transform": {
"categories": [{
"id": "f007",
"value": "My preferred first item"
},
{
"id": "fee7",
"value": "The zeroth response"
},
{
"id": "c001",
"name": "Third response"
}],
"insertions": [
{"anchor": "fee7", "name": "Feet", "function": {"combine": ["f00t", "fee7"]}}
]
}
}
}
Example transform in a multitable template
In a multitable, the transform is part of each dimension definition object in the template array.
{
"template": [
{
"query": [
{"variable": "A"}
],
"transform": [{}, {}]
},
{
"query": [
{
"function": "rollup",
"args": [
{"value": "M"},
{"variable": "B"}
]
}
]
}
]
}
More complex multitable templates
The template may contain, in addition to variable references and their query arguments, an optional transform.
To obtain the multiple output cubes, you GET datasets/{id}/cube?query=<q>, where <q> is a ZCL object in JSON format (which must then be URI-encoded for inclusion in the querystring). Use the “each” function to iterate over the overview variables’ queries, producing one output cube for each one as “variable x”. For example, to cross each of the above 3 variables against another variable “449b421”:
{
"function": "each",
"args": [
{
"value": "x"
},
[
{
"variable": "de85b32"
},
{
"variable": "398620f"
},
{
"variable": "c116a77"
}
]
],
"block": {
"function": "cube",
"args": [
[
{
"variable": "449b421"
},
{
"variable": "x"
}
],
{
"map": {
"count": {
"function": "cube_count",
"args": []
}
}
},
{
"value": null
}
]
}
}
The result will be an array of output cubes:
{
"element": "shoji:view",
"value": [
{
"query": {},
"result": {
"element": "crunch:cube",
"dimensions": [
{
"references": "449b421",
"type": "etc."
},
{
"references": "de85b32",
"type": "etc."
}
],
"measures": {
"count": {
"function": "cube_count",
"args": []
}
}
}
},
{
"query": {},
"result": {
"element": "crunch:cube",
"dimensions": [
{
"references": "449b421",
"type": "etc."
},
{
"references": "398620f",
"type": "etc."
}
],
"measures": {
"count": {
"function": "cube_count",
"args": []
}
}
}
},
{
"query": {},
"result": {
"element": "crunch:cube",
"dimensions": [
{
"references": "449b421",
"type": "etc."
},
{
"references": "c116a77",
"type": "etc."
}
],
"measures": {
"count": {
"function": "cube_count",
"args": []
}
}
}
}
]
}
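Since the query travels in the querystring, it must be JSON-serialized and percent-encoded; a minimal sketch using Python’s standard library (the query object here is a placeholder):

```python
import json
from urllib.parse import quote

# Any ZCL query object; here a trivial cube_count over one variable.
query = {
    "dimensions": [{"variable": "datasets/1/variables/3ffd45/"}],
    "measures": {"count": {"function": "cube_count", "args": []}},
}

# Serialize compactly, then percent-encode for the querystring
# (safe="" also encodes "/" and other reserved characters).
encoded = quote(json.dumps(query, separators=(",", ":")), safe="")
url = "datasets/1/cube?query=" + encoded
```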
Versioning Datasets
All Crunch datasets keep track of the changes you make to them, from the initial import, through name changes and deriving new variables, to appending new rows. You can review the changes to see who did what and when, revert to a previous version, “fork” a dataset to make a copy of it, make changes to the copy, and merge those changes back into the original dataset.
Actions
The list of changes is available in the dataset/{id}/actions/ catalog. GET it and sort/filter by the “datetime” and/or “user” members as desired. Follow the links to an individual action entity to get exact details about what changed.
Viewing Changes Diff
Through the actions catalog it’s possible to retrieve the differences of a “fork” dataset from its “upstream” dataset.
Two endpoints are provided to do so: the dataset/{id}/actions/since_forking and dataset/{id}/actions/upstream_delta endpoints.
The dataset/{id}/actions/since_forking endpoint will return the state of the fork and the upstream, and the list of actions that were performed on the fork since the two diverged.
>>> forkds.actions.since_forking
pycrunch.shoji.View(**{
"self": "https://app.crunch.io/api/datasets/051ebb979db44523822ffe29236a6670/actions/since_forking/",
"value": {
"dataset": {
"modification_time": "2017-02-16T11:01:41.807000+00:00",
"revision": "58a586950183667486130f0c",
"id": "051ebb979db44523822ffe29236a6670",
"name": "My fork"
},
"actions": [
{
"hash": "2a863871-c809-4cad-a20c-9fea86b9e763",
"state": {
"failed": false,
"completed": true,
"played": true
},
"params": {
"variable": "fab0c81d16b442089cc50019cf610961",
"definition": {
"alias": "var1",
"type": "text",
"name": "var1",
"id": "fab0c81d16b442089cc50019cf610961"
},
"dataset": {
"id": "051ebb979db44523822ffe29236a6670",
"branch": "master"
},
"values": [
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence"
],
"owner_id": null
},
"key": "Variable.create"
}
],
"upstream": {
"modification_time": "2017-02-16T11:01:40.131000+00:00",
"revision": "58a586940183667486130efc",
"id": "2730c0744cba4d7c9acc9f3551380e49",
"name": "My Dataset"
}
},
"element": "shoji:view"
})
GET /api/datasets/5de96a/actions/since_forking HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 1769
{
"element": "shoji:view",
"value": {
"dataset": {
"modification_time": "2017-02-16T11:01:41.807000+00:00",
"revision": "58a586950183667486130f0c",
"id": "051ebb979db44523822ffe29236a6670",
"name": "My fork"
},
"actions": [
{
"hash": "2a863871-c809-4cad-a20c-9fea86b9e763",
"state": {
"failed": false,
"completed": true,
"played": true
},
"params": {
"variable": "fab0c81d16b442089cc50019cf610961",
"definition": {
"alias": "var1",
"type": "text",
"name": "var1",
"id": "fab0c81d16b442089cc50019cf610961"
},
"dataset": {
"id": "051ebb979db44523822ffe29236a6670",
"branch": "master"
},
"values": [
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence",
"sample sentence"
],
"owner_id": null
},
"key": "Variable.create"
}
],
"upstream": {
"modification_time": "2017-02-16T11:01:40.131000+00:00",
"revision": "58a586940183667486130efc",
"id": "2730c0744cba4d7c9acc9f3551380e49",
"name": "My Dataset"
}
}
}
The dataset/{id}/actions/upstream_delta endpoint’s usage and response match those of the other endpoint, but the returned actions are instead the ones that were performed on the upstream since the two diverged.
Savepoints
You can snapshot the current state of the dataset at any time with a POST to datasets/{id}/savepoints/. This marks the current point in the actions history, allowing you to provide a description of your progress.
The response will contain a Location header leading to the new version created. If the new version can be created quickly enough, a 201 response is issued; if it takes too long, a 202 response is issued and creation proceeds in the background. In the 202 case, the body will be a Shoji view containing a progress URL where you may query the progress.
>>> svp = ds.savepoints.create({"body": {"description": "TestSVP"}})
pycrunch.shoji.Entity(**{
"body": {
"creation_time": "2017-05-09T14:18:07.761000+00:00",
"version": "master__000003",
"user_name": "captain-68305620",
"description": "",
"last_update": "2017-05-09T14:18:07.761000+00:00"
},
"self": "http://local.crunch.io:19404/api/datasets/5283e3f4e3d645c0a750c09e854bdcb1/savepoints/6fbe47c97d8e4290a0c09227d6d6b63a/",
"views": {
"revert": "http://local.crunch.io:19404/api/datasets/5283e3f4e3d645c0a750c09e854bdcb1/savepoints/6fbe47c97d8e4290a0c09227d6d6b63a/revert/"
},
"element": "shoji:entity"
})
There is no guarantee that creating a savepoint will produce a savepoint pointing to the exact revision the dataset was at when the POST was issued, because the dataset might have moved forward in the meantime. For this reason, instead of responding with a Location header that points to an exact savepoint, the POST savepoints endpoint responds with a Location header that points to a /progress/{operation_id}/result URL, which when accessed will redirect to the nearest savepoint for that revision.
Reverting savepoints
You can revert to any savepoint version (throwing away any changes since that time) with a POST to /datasets/{dataset_id}/savepoints/{version_id}/revert/.
It will return a 202 response with a Shoji view whose value contains a progress URL where the asynchronous job’s status can be observed.
Forking and Merging
A common pattern when collaborating on a dataset is for one person to make changes on their own and then, when all is ready, share the whole set of changes back to the other collaborators. Crunch implements this with two mechanisms: the ability to “fork” a dataset to make a copy, and then “merge” any changes made to it back to the original dataset.
To fork a dataset, POST a new fork entity to the dataset’s forks catalog.
>>> ds.forks.index
{}
>>> forked_ds = ds.forks.create({"body": {"name": "My fork"}}).refresh()
>>> ds.forks.index.keys() == [forked_ds.self]
True
>>> ds.forks.index[forked_ds.self]["name"]
"My fork"
The response will be a 201 if the fork could happen within the allotted time limit for the request, or a 202 if the fork requires too much time and is going to continue in the background. Both cases will include a Location header with the URL of the new dataset that has been forked from the current one.
POST /api/datasets/{id}/forks/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 231
{
"element": "shoji:entity",
"body": {"name": "My fork"}
}
----
HTTP/1.1 201 Created
Location: https://app.crunch.io/api/datasets/{forked_id}/
In case of a 202, in addition to the Location header with the URL of the fork that is going to be created, the response will contain a Shoji view with the URL of the endpoint that can be polled to track fork completion.
POST /api/datasets/{id}/forks/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 231
{
"element": "shoji:entity",
"body": {"name": "My fork"}
}
----
HTTP/1.1 202 Accepted
Location: https://app.crunch.io/api/datasets/{forked_id}/
...
{
"element": "shoji:view",
"value": "/progress/{progress_id}/"
}
The forked dataset can then be viewed and altered like the original; however, those changes do not alter the original until you merge them back with a POST to datasets/{id}/actions/.
ds.actions.post({
"element": "shoji:entity",
"body": {"dataset": forked_ds.self, "autorollback": True}
})
POST /api/datasets/5de96a/actions/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 231
{
"element": "shoji:entity",
"body": {
"dataset": {forked ds URL},
"autorollback": true
}
}
----
HTTP/1.1 204 No Content
*or*
HTTP/1.1 202 Accepted
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/5de96a/actions/",
"value": "https://app.crunch.io/api/progress/912ab3/"
}
The POST to the actions catalog tells the original dataset to replay a set of actions; since we specify a “dataset” url, we are telling it to replay all actions from the forked dataset. Crunch keeps track of which actions are already common between the two datasets, and won’t try to replay those. You can even make further changes to the forked dataset and merge again and again.
Use the “autorollback” member to tell Crunch how to handle merge conflicts. If an action cannot be replayed on the original dataset (typically because it had conflicting changes or has been rolled back), then if “autorollback” is true (the default), the original dataset will be reverted to the previous state before any of the new changes were applied. If “autorollback” is false, the dataset is left to the last action that it could successfully play, which allows you to investigate the problem, repair it if possible (in either dataset as needed), and then POST again to continue the merge from that point.
Per-user settings (filters, decks and slides, variable permissions etc) are copied to the new dataset when you fork. However, changes to them are not merged back at this time. Please reach out to us as you experiment so we can fine-tune which details to fork and merge as we discover use cases.
Merging actions may take a few seconds, in which case the POST to actions/ will return 204 when finished. Merging many or large actions, however, may take longer, in which case the POST will return 202 with a Location header containing the URL of a progress resource.
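A client therefore needs to branch on the status code; a hypothetical helper for interpreting the merge response (merge_disposition is not part of any Crunch library, and the progress URL lives in the Shoji view’s value):

```python
def merge_disposition(status, body=None):
    """Interpret the response to a merge POST on datasets/{id}/actions/.

    204: merge finished synchronously; nothing to poll.
    202: merge continues in the background; return the progress URL
         from the shoji:view body so the caller can poll it.
    """
    if status == 204:
        return None
    if status == 202:
        return body["value"]
    raise ValueError("unexpected status: %d" % status)

progress_url = merge_disposition(
    202,
    {"element": "shoji:view", "value": "https://app.crunch.io/api/progress/912ab3/"},
)
# progress_url → "https://app.crunch.io/api/progress/912ab3/"
```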
Filtered Merges
When merging actions it is possible to provide a filter to select which actions should be replayed from the other dataset. It is currently possible to filter them by key and by hash.
When filtering by hash, only the provided actions will be merged:
ds.actions.post({
"element": "shoji:entity",
"body": {"dataset": forked_ds.self,
"filter": {"hash": ["000003"]}}
})
When filtering by key, only the actions that are part of that category will be merged:
ds.actions.post({
"element": "shoji:entity",
"body": {"dataset": forked_ds.self,
"filter": {"key": ["Variable.create"]}}
})
Recording the filtered actions
If you know that you are going to merge the same two datasets multiple times, it is possible to tell Crunch to remember the filtered actions, so that a subsequent merge to the same target won’t try to apply them again if they were skipped in a previous merge.
This behaviour is enabled by providing the remember: True option with the filter, which means that the filtered actions will be recorded and a subsequent merge won’t try to apply them to the target unless they are explicitly filtered again.
ds.actions.post({
"element": "shoji:entity",
"body": {"dataset": forked_ds.self,
"remember": True,
"filter": {"key": ["Variable.create"]}}
})
Note that only the actions skipped during this merge are recorded, so the previous example won’t skip all Variable.create actions forever, but will only remember the actions that were skipped at that time.
Endpoint Reference
Public
/
/public/
{
"views": {
"signup_resend": "https://app.crunch.io/api/public/signup_resend/",
"inquire": "https://app.crunch.io/api/public/inquire/",
"password_reset": "https://app.crunch.io/api/public/password_reset/",
"signup": "https://app.crunch.io/api/public/signup/",
"oauth2redirect": "https://app.crunch.io/api/public/oauth2redirect/",
"change_email": "https://app.crunch.io/api/public/change_email/",
"login": "https://app.crunch.io/api/public/login/",
"config": "https://app.crunch.io/api/public/config/",
"password_change": "https://app.crunch.io/api/public/password_change/"
}
}
Application configuration
GET /public/config/
When accessing Crunch from a configured application via its subdomain:
- https://mycompany.crunch.io/api/public/config/
A GET request on /public/config/ returns a Shoji Entity with the subdomain’s available configurations, if any; if none exist, the body will be empty.
{
"element": "shoji:entity",
"body": {
"name": "Your Company",
"logo": {
"small": "https://s.crunch.io/logos/yours_small.png",
"large": "https://s.crunch.io/logos/yours_large.png"
},
"palette": {
"brand": {
"primary": "#FFAABB",
"secondary": "#C4EEBB",
"message": "#BAA5E7"
}
},
"manifest": {}
}
}
CrunchBox
A CrunchBox represents a snapshot of a Crunch dataset. These snapshots are intended for public proliferation, and therefore the endpoints for interacting with this data are housed under the unauthenticated API path.
Share
The share endpoint retrieves the HTML code for rendering the share page, complete with the metadata used by social sharing platforms’ crawlers to construct a share preview. Among this metadata is a URL to a preview image of the rendered CrunchBox.
GET /crunchbox/share/ HTTP/1.1
Required parameters for this endpoint:
Parameter | Type | Description |
---|---|---|
data | string | CrunchBox widget url (URL encoded) e.g. “https%3A%2F%2Fs.crunch.io%2Fwidget%2Findex.html%23%2Fds%2Fa1b2c3d4e5f6g7h8%2Frow%2F000001%2Fcolumn%2F000000” (the encoded string of “https://s.crunch.io/widget/index.html#/ds/a1b2c3d4e5f6g7h8/row/000001/column/000000”) |
Optional parameters for this endpoint:
Parameter | Type | Description |
---|---|---|
ref | string | referring url (URL encoded) to pull content from the referring page for inclusion on the CrunchBox share page and provide a link back to the referrer e.g. “http%3A%2F%2Fcrunch.io%2Fcrunching-the-data-of-politics” (the encoded string of “http://crunch.io/crunching-the-data-of-politics”) |
Preview
The preview endpoint is used to preemptively initiate rendering a given CrunchBox configuration to a raster image. This image will be requested by social network platform crawlers during construction of the post share preview. The preview-rendering process can be time-consuming. Therefore, it is preferable to initiate it as soon as is reasonable before a request for the image data.
This endpoint returns no data.
POST /crunchbox/preview/ HTTP/1.1
Parameter | Type | Description |
---|---|---|
data | string | CrunchBox widget url (URL encoded) e.g. “https%3A%2F%2Fs.crunch.io%2Fwidget%2Findex.html%23%2Fds%2Fa1b2c3d4e5f6g7h8%2Frow%2F000001%2Fcolumn%2F000000” (the encoded string of “https://s.crunch.io/widget/index.html#/ds/a1b2c3d4e5f6g7h8/row/000001/column/000000”) |
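The data parameter is just the widget URL percent-encoded; with Python’s standard library, for example:

```python
from urllib.parse import quote

widget = "https://s.crunch.io/widget/index.html#/ds/a1b2c3d4e5f6g7h8/row/000001/column/000000"
# Encode every reserved character (safe="" also encodes "/" and "#").
data = quote(widget, safe="")
share_url = "/crunchbox/share/?data=" + data
```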
Accounts
Accounts provide an organization-level scope for Crunch.io customers. All Users belong to one and only one Account. Account managers can administer their various users and entities and have visibility on them.
Permissions
A user is an “account manager” if their account_permissions have alter_users set to true.
Account entity
The account entity is available on the API root following the Shoji views.account path, which will point to the authenticated user’s account.
If the account has a name, it will be available here, as well as the path to the account’s users.
If the authenticated user is an account manager, the response will include paths to these additional catalogs:
- Account projects
- Account teams
- Account datasets
GET
GET /account/
{
"element": "shoji:entity",
"body": {
"name": "Account's name",
"id": "abcd",
"oauth_providers": [{
"id": "provider",
"name": "Service auth"
}, {
"id": "provider",
"name": "Service auth"
}]
},
"catalogs": {
"teams": "http://app.crunch.io/api/account/teams/",
"projects": "http://app.crunch.io/api/account/projects/",
"users": "http://app.crunch.io/api/account/users/",
"datasets": "http://app.crunch.io/api/account/datasets/",
"applications": "http://app.crunch.io/api/account/applications/"
}
}
Applications
GET /account/applications/
GET returns a Shoji Catalog with the list of all the configured subdomains an account has.
{
"element":"shoji:catalog",
"index": {
"./mycompany/": {}
}
}
POST a Shoji Entity here to make a new application. The subdomain:
- must be unique system-wide, case insensitive
- can only contain letters, numbers, and - (dash)
- must be between 3 and 32 characters in length
- cannot start with - or a number
If the requested subdomain is unavailable or invalid, the server will return a 400 response.
{
"element": "shoji:entity",
"body": {
"name": "my company",
"subdomain": "mycompany",
"palette": {
"brand": {
"primary": "#FFAABB", // Color of links, interactable things
"secondary": "#C4EEBB", // Titles and such
"message": "#BAA5E7"
}
},
"manifest": {}
}
}
Attributes name and subdomain are required; palette and manifest are optional. Note that you cannot specify logos in the POST request. Use the created entity’s logo/ resource to upload the image files to the app (see below).
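The subdomain rules above can be checked client-side before POSTing (uniqueness can only be verified by the server, which returns a 400 response on conflict); a hypothetical validator:

```python
import re

# 3-32 chars, letters/digits/dashes only, starting with a letter
# (cannot start with a dash or a number).
SUBDOMAIN_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")

def valid_subdomain(name):
    """True if name satisfies the documented subdomain format rules."""
    return bool(SUBDOMAIN_RE.match(name))

valid_subdomain("mycompany")   # valid
valid_subdomain("my")          # invalid: too short
valid_subdomain("-company")    # invalid: cannot start with a dash
valid_subdomain("9company")    # invalid: cannot start with a number
```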
Application entity
GET /account/applications/app_id/
GET this endpoint for a Shoji Entity containing all details about the configured application.
{
"element":"shoji:entity",
"body": {
"name": "Application name",
"subdomain": "mycompany",
"logos": {
"small": "<URL>",
"large": "<URL>",
"favicon": "<URL>"
},
"palette": {
"brand": {
"primary": "#FFAABB", // Color of links, interactable things
"secondary": "#C4EEBB", // Titles and such
"message": "#BAA5E7"
}
},
"manifest": {}
},
"views": {
"logo": "https://app.crunch.io/api/account/applications/mycompany/logo/"
}
}
PATCH this endpoint to change the name, palette, or manifest. Logos are controlled by the logo subresource.
Attribute | Type | Description |
---|---|---|
name | string | Name of the configured application on the given subdomain |
logo | object | Contains three attributes, large , small and favicon , with different resolution company logos |
palette | object | Contains three colors, primary , secondary and message , under the brand attribute to theme the web app |
manifest | object | Optional, contains further client configurations |
Change application logo
POST /account/applications/app_id/logo/
To set or change an application’s logo, the client needs to make a multipart/form-data request containing either or both of the large and small fields with the desired image files. Only account admins are authorized to change this resource.
POST /account/applications/app_id/logo/ HTTP/1.1
Content-Type: multipart/form-data; boundary=----------123456789
Content-Length: 500326
----------123456789
Content-Disposition: form-data; name="large"; filename="newlogo.jpg"
Content-Type: image/jpeg
xxxxxxxxxx
----------123456789
Content-Disposition: form-data; name="small"; filename="newlogo_small.jpg"
Content-Type: image/jpeg
xxxxxxxxxx
----------123456789--
HTTP/1.1 204
The server will update the images accordingly. The only valid file types are GIF, JPEG, and PNG images.
Account users
Provides a catalog of all the users that belong to this account. Any account member can GET, but only account managers can POST/PATCH on it.
GET
GET /account/users/
{
"element": "shoji:catalog",
"index": {
"http://app.crunch.io/api/users/123/": {
"id_method": "pwhash",
"id_provider": null,
"email": "email@example.com",
"name": "Steve Austin",
"dataset_permissions": {
"view": true,
"edit": false
},
"account_permissions": {
"alter_users": false,
"create_datasets": false
}
},
"http://app.crunch.io/api/users/234/": {
"id_method": "pwhash",
"id_provider": null,
"email": "email1@example.com",
"name": "Shawn Michaels",
"dataset_permissions": {
"view": true,
"edit": true
},
"account_permissions": {
"alter_users": true,
"create_datasets": true
}
},
"http://app.crunch.io/api/users/345/": {
"id_method": "oauth",
"id_provider": "google",
"email": "email2@example.com",
"name": "Rocky Maivia",
"dataset_permissions": {
"view": true,
"edit": true
},
"account_permissions": {
"alter_users": false,
"create_datasets": true
}
}
}
}
POST
Account managers can POST to the account’s users catalog to create new users. If a user with the provided email address already exists in the application (on another account), the server will return a 400 response.
POST /account/users/
{
"element": "shoji:entity",
"body": {
"email": "new_email@example.com",
"name": "Initial name",
"account_permissions": {
"alter_users": false,
"create_datasets": true
},
"teams": ["<list of team urls>"],
"projects": ["<list of project urls>"],
"id_method": "pwhash/oauth",
"id_provider": "",
"send_invite": true,
"url_base": "http://app.crunch.io/"
}
}
It is possible to create a user belonging to different teams or projects by including those teams’ or projects’ URLs in the payload, for example:
{
"element": "shoji:entity",
"body": {
"email": "new_email@example.com",
"name": "Initial name",
"account_permissions": {
"alter_users": false,
"create_datasets": true
},
"teams": ["https://app.crunch.io/api/teams/abc/", "https://app.crunch.io/api/teams/123/"],
"projects": ["https://app.crunch.io/api/projects/def/"],
"id_method": "pwhash"
}
}
The teams and projects attributes are optional and can be omitted or passed as empty lists.
PATCH
PATCH to the users’ catalog allows account admins to edit users’ permissions in batch. It is only possible to change the account_permissions attribute. Additionally, it is possible to delete users from the account by sending null as their tuple.
PATCH /account/users/
{
"element": "shoji:catalog",
"index": {
"http://app.crunch.io/api/users/123/": {
"account_permissions": {
"alter_users": false,
"create_datasets": false
}
},
"http://app.crunch.io/api/users/234/": null
}
}
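A batch-edit payload like the one above could be assembled as follows (the helper name and user URLs are illustrative):

```python
def build_users_patch(permission_changes, remove=()):
    """permission_changes: {user_url: {"alter_users": bool, "create_datasets": bool}}
    remove: iterable of user URLs to drop from the account (tuple set to null)."""
    index = {url: {"account_permissions": perms}
             for url, perms in permission_changes.items()}
    for url in remove:
        index[url] = None  # null tuple deletes the user from the account
    return {"element": "shoji:catalog", "index": index}

payload = build_users_patch(
    {"http://app.crunch.io/api/users/123/": {"alter_users": False,
                                             "create_datasets": False}},
    remove=["http://app.crunch.io/api/users/234/"],
)
```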
Account datasets
Only account managers have access to this catalog. It is a read-only Shoji catalog containing all the datasets that users of this account have created (a potentially very large catalog).
Account managers have implicit editor access to all the account datasets.
GET /account/datasets/
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/datasets/cc9161/": {
"owner_name": "James T. Kirk",
"name": "The Voyage Home",
"description": "Stardate 8390",
"archived": false,
"size": {
"rows": 1234,
"columns": 67
},
"is_published": true,
"id": "cc9161",
"owner_id": "https://app.crunch.io/api/users/685722/",
"start_date": "2286",
"end_date": null,
"streaming": "no",
"creation_time": "1986-11-26T12:05:00",
"modification_time": "1986-11-26T12:05:00",
"current_editor": "https://app.crunch.io/api/users/ff9443/",
"current_editor_name": "Leonard Nimoy"
},
"https://app.crunch.io/api/datasets/a598c7/": {
"owner_name": "Spock",
"name": "The Wrath of Khan",
"description": "",
"archived": false,
"size": {
"rows": null,
"columns": null
},
"is_published": true,
"id": "a598c7",
"owner_id": "https://app.crunch.io/api/users/af432c/",
"start_date": "2285-10-03",
"end_date": "2285-10-20",
"streaming": "no",
"creation_time": "1982-06-04T09:16:23.231045",
"modification_time": "1982-06-04T09:16:23.231045",
"current_editor": null,
"current_editor_name": null
}
}
}
Account projects
This catalog is available for account managers and lists all the projects that the users have created. Account managers have implicit edit access on all projects.
GET /account/projects/
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/projects/cc9161/": {
"name": "Project 1",
"id": "cc9161",
"owner": "http://app.crunch.io/api/users/abcdef/"
},
"https://app.crunch.io/api/projects/a598c7/": {
"name": "Project 2",
"id": "a598c7",
"owner": "http://app.crunch.io/api/users/123456/"
}
}
}
Account teams
This catalog is available for account managers and lists all the teams that the users have created. Account managers have implicit edit access on all teams.
GET /account/teams/
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/teams/cc9161/": {
"name": "Team 1",
"id": "cc9161",
"owner": "http://app.crunch.io/api/users/123456/"
},
"https://app.crunch.io/api/teams/a598c7/": {
"name": "Team 2",
"id": "a598c7",
"owner": "http://app.crunch.io/api/users/123456/"
}
}
}
Account Collaborators
An account collaborator is a Crunch.io user who is not a member of your account but has access to one or more of your account’s datasets.
Account admins can visit the account’s collaborators catalog to view the list of all collaborators for all datasets of the account.
GET /account/collaborators/
This catalog lists all the users who are not members of the account but have access to any of the account’s datasets, projects, or teams.
Each element in the catalog index links to the user’s entity endpoint and has the name and email attributes.
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/users/cc9161/": {
"name": "John doe",
"email": "user1@example.com",
"active": true,
},
"https://app.crunch.io/api/users/a598c7/": {
"name": "John notdoe",
"email": "user2@example.com",
"active": true,
}
}
}
Collaborators order
GET /account/collaborators/order/
It is possible to group collaborators using a Shoji order.
It is possible to PATCH the graph attribute with a standard Shoji order payload indicating the groups and collaborators (user URLs) for each group.
Collaborators datasets
The full list of datasets a collaborator has access to is available through the user’s entity endpoint by following the visible_datasets catalog.
Batches
Catalog
/datasets/{id}/batches/
GET
A GET request on this resource returns a Shoji Catalog enumerating the batches present in the Dataset. Each tuple in the index includes a “status” member, which may be one of “analyzing”, “conflict”, “error”, “importing”, “imported”, or “appended”.
{
"element": "shoji:catalog",
"self": "...datasets/837498a/batches/",
"index": {
"0/": {"status": "appended"},
"2/": {"status": "error"},
"3/": {"status": "importing"}
}
}
POST
A POST to this resource adds a new batch. The request payload can contain (1) the URL of another Dataset, (2) the URL of a Source object, or (3) a Crunch Table definition with variable metadata, row data, or both.
A successful request will return either 201 status, if sufficiently fast, or 202, if the task is large enough to require processing outside of the request cycle. In both cases, the newly created batch entity’s URL is returned in the Location header. The 202 response contains a body with a Progress resource in it; poll that URL for updates on the completion of the append. See Progress.
Batches are created in the analyzing state and will be advanced through the importing, imported, and appended states if there are no problems. If there was a problem in processing, the status will be conflict or error. Note that the response status code will always be 202 for asynchronous or 201 for synchronous creation of the batch, whether there were conflicts or not, so you need to GET the new batch’s URL to see if the data is good to go (status appended).
If an append is already in process on the dataset, the POST request will return 409 status.
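The polling workflow described above can be sketched as follows; wait_for_batch and get_status are illustrative names, and the actual GET on the batch URL (taken from the Location header) is left to the caller:

```python
import time

# Terminal states per the description above: success ends at "appended",
# failure at "conflict" or "error".
TERMINAL_STATES = {"appended", "conflict", "error"}

def wait_for_batch(get_status, interval=1.0, max_polls=60):
    """Poll until the batch reaches a terminal state.
    get_status: callable returning the batch's current status string."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError("batch did not reach a terminal state")
```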
Appending a dataset
To append a Dataset, POST a Shoji Entity with a dataset URL. You must have at least view (read) permissions on this dataset. Internally, this action will create a Source entity pointing to that dataset.
{
"element": "shoji:entity",
"body": {
"dataset": "<url>"
}
}
The variables from the incoming dataset to be included by default will depend on the current user’s permissions. Those with edit permissions on the incoming dataset will append all public and hidden (discarded = true) variables. Those with only view permissions will include just the public variables that aren’t hidden.
To append only certain variables from the incoming dataset, include a where attribute in the entity body. See Frame functions for how to compose the where expression.
{
"element": "shoji:entity",
"body": {
"dataset": "<url>",
"where": {
"function":"select",
"args": [
{"map":
{"000001": {"variable": "<url>"},
"000002": {"variable": "<url>"}}
}
]
}
}
}
Users with edit permissions on the incoming dataset can select hidden variables to be included, but viewers cannot. Editors and viewers can, however, both specify their personal variables to be included.
To select a subset of rows to append, include a filter attribute in the entity body, containing a Crunch filter expression.
{
"element": "shoji:entity",
"body": {
"dataset": "<url>",
"where": {
"function":"select",
"args": [
{"map":
{"000001": {"variable": "<url>"},
"000002": {"variable": "<url>"}}
}
]
},
"filter": {
"function":"<",
"args": [
{"variable": "<url>"},
{"value": "<value>"}
]
}
}
}
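A sketch of assembling the append payloads shown above (the helper and its parameters are illustrative):

```python
def build_append_payload(dataset_url, variable_urls=None, filter_expr=None):
    """variable_urls maps target variable ids to variable URLs for the
    optional `where` selection; filter_expr is a Crunch filter expression."""
    body = {"dataset": dataset_url}
    if variable_urls:
        body["where"] = {
            "function": "select",
            "args": [{"map": {vid: {"variable": url}
                              for vid, url in variable_urls.items()}}],
        }
    if filter_expr is not None:
        body["filter"] = filter_expr
    return {"element": "shoji:entity", "body": body}
```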
Appending a source
POST a Shoji Entity with a Source URL. The user must have permission to view the Source entity. Use Source appending to send data in CSV format that matches the schema of the Dataset.
{
"element": "shoji:entity",
"body": {
"source": "<url>"
}
}
Appending a Crunch Table
The variable IDs must match those of the target dataset, since their types will be matched based on ID, and the data is expected to match the target dataset’s variable types. This action will create a new Source entity whose name and description will match those provided in the payload; if not provided, they default to empty strings.
{
"element": "crunch:table",
"name": "<optional string>",
"description": "<optional string>",
"data": {
"var_url_1": [1, 2, 3, ...],
"var_url_2": ["a", "b", ...]
}
}
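Since every column in the data map represents the same rows, a quick client-side check before POSTing can catch mismatched column lengths (this helper is illustrative, not part of the API):

```python
def build_crunch_table(data, name="", description=""):
    """data: {variable_id: list of column values}; all columns must be the
    same length, since each row spans every variable."""
    lengths = {len(col) for col in data.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    return {"element": "crunch:table", "name": name,
            "description": description, "data": data}
```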
Append Failures
For single appends, if a batch fails, the dataset is automatically reverted to the state it was in before the append, and the batch is automatically deleted.
When multiple appends are performed in immediate succession, it is not efficient to checkpoint the state before each one. In this case, only the first append is rolled back on failure.
Checking if an append will cause problems
/datasets/{id}/batches/compare/
An append cannot proceed if there are conditions in the involved datasets that would cause ambiguous situations. If such datasets were appended, the server would return a 409 response.
It is possible to verify these conditions before trying the append using the batches compare endpoint.
GET /datasets/4bc6af/batches/compare/?dataset=http://app.crunch.io/api/datasets/3e2cfb/
The response will contain a conflicts key that can contain either current, incoming, or union depending on the type and location of the problem. The response status will always be 200, with conflicts, described below, or an empty body.
- current refers to issues found on the dataset where the new data would be added
- incoming refers to issues on the far dataset that contains the new data to add
- union expresses problems on the combined variables (metadata) of the final dataset after append
{
"union": {...},
"current": {...},
"incoming": {...}
}
A successful response will not contain any of these keys, returning an empty object.
{}
The possible conflict keys, and the verifications made, are:
- Variables missing alias: All variables should have a valid alias string. This will indicate the IDs of those that don’t.
- Variables missing name: All variables should have a valid name string. This will indicate the IDs of those that don’t.
- Variables with duplicate alias: In the event of two or more variables sharing an alias, they will be reported here. When this occurs as a union conflict, it is likely that names and aliases of a variable or subvariable in current and incoming are swapped (e.g., VariantOne:AliasOne, Variant1:Alias1 in current but VariantOne:Alias1, Variant1:AliasOne in incoming).
- Variables with duplicate name: Variable names should be unique across non-subvariables.
- Subvariable in different arrays per dataset: If a subvariable is used for different arrays that are impossible to match, it will be reported here. User action will be needed to fix this.
For each of these, a list of variable IDs will be made available indicating the conflicting entities. Union conflicting ids generally refer to variables in the current dataset and may be referenced by alias in incoming.
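A small client-side check of the compare response might look like this (the helper is illustrative):

```python
def append_is_safe(compare_response):
    """True if the batches/compare response reports no conflicts in any scope.
    A successful compare returns an empty object (or empty scopes)."""
    return not any(compare_response.get(scope)
                   for scope in ("current", "incoming", "union"))
```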
Lining up datasets for append/combine
/datasets/align/
Given that some datasets may be close to being fit for appending but could need some work before proceeding, the align endpoint provides API expressions that can be used directly in the append steps as the where parameter in order to avoid such conflicts.
Currently, this endpoint will provide an expression that will exclude the troubling variables from the append.
- Exclude different arrays that may share subvariables by alias.
- Exclude variables with matching aliases but different types.
Such conflicts are currently not allowed and would cause the append operation to be rejected.
To use this endpoint, the client needs to provide a list of variables they wish to line up together as a list of lists.
[
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/123/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/234/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/345/"}
],
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/678/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/789/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/890/"}
],
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/1ab/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/ab2/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/b23/"}
]
]
The example above indicates that the client wishes to line up three variables from three datasets as indicated by the groups.
From the input, the endpoint will analyze the groups and return an expression which will include only those variables that can be appended without conflict among all of them. This expression is ready to be used as a where parameter on the append /batches/ endpoint.
The payload needs to be sent as a JSON-encoded variables POST parameter:
POST /datasets/align/
{
"element": "shoji:entity",
"body": {
"variables": [
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/123/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/234/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/345/"}
],
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/678/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/789/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/890/"}
],
[
{"variable": "http://app.crunch.io/api/datasets/abc/variables/1ab/"},
{"variable": "http://app.crunch.io/api/datasets/def/variables/ab2/"},
{"variable": "http://app.crunch.io/api/datasets/hij/variables/b23/"}
]
]}
}
The response will be a 202 with a Progress resource in it; poll that URL for updates on completion, and follow Location once it has completed. See Progress.
On completion, the align response will be a shoji:view containing the where expression to use for each dataset:
{
"element": "shoji:view",
"value": {
"abc": {"function": "select", "args": [{"map": {
"678": {"variable": "678"},
"1ab": {"variable": "1ab"}
}}]},
"def": {"function": "select", "args": [{"map": {
"789": {"variable": "789"},
"ab2": {"variable": "ab2"}
}}]},
"hij": {"function": "select", "args": [{"map": {
"890": {"variable": "890"},
"b23": {"variable": "b23"}
}}]}
}
}
Following the example above, if the first group could not be appended because of conflicts between its variables, it will be excluded from the final expressions.
Later, using the expressions obtained, it is possible to append all the datasets to a new one without conflicts.
POST /datasets/abd/batches/
{
"element": "shoji:entity",
"body": {
"dataset": "http://app.crunch.io/api/datasets/abc/",
"where": {"function": "select", "args": [{"map": {
"678": {"variable": "678"},
"1ab": {"variable": "1ab"}
}}]}
}
}
POST /datasets/abd/batches/
{
"element": "shoji:entity",
"body": {
"dataset": "http://app.crunch.io/api/datasets/def/",
"where": {"function": "select", "args": [{"map": {
"789": {"variable": "789"},
"ab2": {"variable": "ab2"}
}}]}
}
}
POST /datasets/abd/batches/
{
"element": "shoji:entity",
"body": {
"dataset": "http://app.crunch.io/api/datasets/hij/",
"where": {"function": "select", "args": [{"map": {
"890": {"variable": "890"},
"b23": {"variable": "b23"}
}}]}
}
}
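The three POSTs above can be generated mechanically from the align response. This sketch assumes you keep a mapping from the dataset ids in the view to their URLs (all names are illustrative):

```python
def batch_payloads_from_align(align_view, dataset_urls):
    """align_view: the shoji:view returned by /datasets/align/.
    dataset_urls: mapping of dataset id -> dataset URL."""
    return [
        {"element": "shoji:entity",
         "body": {"dataset": dataset_urls[ds_id], "where": where}}
        for ds_id, where in align_view["value"].items()
    ]
```

Each payload can then be POSTed to the target dataset’s /batches/ endpoint in turn.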
Entity
/datasets/{id}/batches/{id}/
A GET on this resource returns a Shoji Entity describing the batch, and a link to its Crunch Table (see next).
{
"conflicts": {},
"source_children": {},
"target_children": {},
"source_columns": 3500,
"source_rows": 235490,
"target_columns": 3499,
"target_rows": 120000,
"error": "",
"progress": 100.0,
"source": "<url>",
"status": "appended"
}
The conflicts object
Each batch has a “conflicts” member describing any unresolvable differences found between variables in the two datasets. On a successful append, this object will be empty; if the batch status is “conflict”, the object will contain conflict information keyed by id of the variable in the target dataset. The conflict data for each variable follows this shape:
{
"metadata": {
"name": "<string>",
"alias": "<string>",
"type": "<string>",
"categories": [{}]
},
"source_id": "<id of the matching variable in the source frame",
"source_metadata": {
"name, etc": "as above"
},
"conflicts": [{
"message": "<string>"
}]
}
Each conflict has four attributes: metadata about the variable on the target dataset (unless it is a variable that only exists on the source dataset); source_id and source_metadata, which describe the corresponding variable in the source frame (if any); and a conflicts member, which contains an array of individual conflicts indicating what situations were found during batch preparation.
If there are conflicts in your batch, address the conflicting issues in your datasets, DELETE the batch entity from the failed append attempt, and POST a new one.
Table
/datasets/{id}/batches/{id}/table/{?offset,limit}
A GET returns the rows of data from the Dataset for the identified batch as a Crunch Table.
Boxdata
Boxdata is the data that Crunch provides to the CrunchBox for rendering web components that are made publicly available. This endpoint provides a catalog of data that has been precomputed into cubes of JSON data for visualizations. Metadata associated with this raw computed data is accessed and manipulated through this endpoint.
Catalog
/datasets/{id}/boxdata/
A Shoji Catalog of boxdata for a given dataset.
GET catalog
When authenticated and authorized to view the given dataset, GET returns 200 status with a Shoji Catalog of boxdata associated with the dataset. If authorization is lacking, response will instead be 404.
Catalog tuples contain the following keys:
Name | Type | Description |
---|---|---|
title | string | Human friendly identifier |
notes | string | Other information relevant for this CrunchBox |
header | string | header information for the CrunchBox |
footer | string | footer information for the CrunchBox |
dataset | string | URL of the dataset associated with the CrunchBox |
filters | object | A Crunch expression indicating which filters to include in the CrunchBox |
where | object | A Crunch expression indicating which variables to include in the CrunchBox. An undefined value is equivalent to specifying all dataset variables. |
creation_time | string | A timestamp of the date when this CrunchBox was created |
{
"element": "shoji:catalog",
"self": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/boxdata/",
"index": {
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/boxdata/44a4d477d70c85da4b8298677e527ad8/": {
"user_id": "00002",
"footer": "This is for the footer",
"notes": "just a couple of variables",
"title": "z and str",
"dataset": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/",
"header": "This is for the header",
"creation_time": "2017-03-14T00:13:42.024000+00:00",
"filters": {
"function": "identify",
"args": [
{
"filter": [
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/da9d86e43381443d9d708dc29c0c6308/",
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/80638457c8bd4731990eebdc3baee839/"
]
}
]
},
"where": {
"function": "identify",
"args": [
{
"id": [
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000002/",
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000003/"
]
}
]
},
"id": "44a4d477d70c85da4b8298677e527ad8"
},
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/boxdata/75ff1d67ed698e0986f1c1c3daebf9a2/": {
"user_id": "00002",
"title": "xz",
"dataset": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/",
"filters": null,
"creation_time": "2017-03-14T00:13:42.024000+00:00",
"where": {
"function": "identify",
"args": [
{
"id": [
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000000/"
]
}
]
},
"id": "75ff1d67ed698e0986f1c1c3daebf9a2"
}
}
}
POST catalog
Use POST to create a new datasource for CrunchBox. Note that new boxdata is only created when there is a new combination of where and filter data. If the same variables and filters are indicated by the POST data, the existing combination will result in a modification of the metadata associated with the cube data. This avoids recomputing analyses needlessly.
A POST to this resource must be a Shoji Entity with the following “body” attributes:
Name | Description |
---|---|
title | Human friendly identifier |
notes | Other information relevant for this CrunchBox |
header | header information for the CrunchBox |
footer | footer information for the CrunchBox |
dataset | URL of the dataset associated with the CrunchBox |
filters | A Crunch expression indicating which filters to include |
where | A Crunch expression indicating which variables to include |
display_settings | Options to customize how it looks and behaves |
{
"element": "shoji:entity",
"body": {
"where": {
"function": "select",
"args": [{
"map": {
"000002": {"variable": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000002/"},
"000003": {"variable": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000003/"}
}
}]
},
"filters": [
{"filter": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/da9d86e43381443d9d708dc29c0c6308/"},
{"filter": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/80638457c8bd4731990eebdc3baee839/"}
],
"force": false,
"title": "z and str",
"notes": "just a couple of variables",
"header": "This is for the header",
"footer": "This is for the footer"
}
}
Display Settings
The display_settings member of a CrunchBox payload allows you to customize several aspects of how it will be displayed.
A minBaseSize member will suppress display of values in tables or graphs where the sample size is below a given threshold.
To customize a CrunchBox’s color scheme, you may include an optional palette member in the display_settings of the body of the request to create or edit the boxdata. There are four types of customization available.
{"display_settings": {
"minBaseSize": {"value": 50},
"palette": {
"brand": {
"primary": "#111111",
"secondary": "#222222",
"messages": "#333333"
},
"static_colors": ["#444444", "#555555", "#666666"],
"category_lookup": {
"category name": "#aaaaaa",
"another category:": "bbbbbb"
}
}
}}
Brand
The CrunchBox interface uses three colors, named Primary, Secondary, and Messages. By default, these are Crunch brand colors of green, blue, and purple. These are used, for example, as the background colors at the top of the interface and the color of the filter selector.
Static colors
Include an array of static_colors and every categorical color will be taken from the list in order. If none of your variables have more categories than colors provided here, the generator (below) will never be used, but category lookup will still be performed.
Base
If the number of categories exceeds the number of static colors, or no static colors are specified, “base” colors are used to generate a categorical palette. By default, these are also the Crunch green, blue, and purple, and are not overridden by brand. Each color is interpolated in HCL space from itself to Hue + 100, Lightness + 20; the colors are then ordered to maximize sequential absolute distance in L*a*b* space so adjacent colors can be easily distinguished.
Category Lookup
Finally, you may include an object whose keys are exact category names that should always be assigned a specific color. Using semantically resonant colors in this manner is a boon for interpretation and is highly recommended when possible. For example, to ensure that the Green Party is a verdant shade, include a member such as "Green": "#00dd00". Building a category lookup list requires some attention to the specific categories in a dataset; they must match exactly, not partially. To ensure that “Green Party” is also green, include an additional "Green Party" key with the same value. Lookup values are processed last, replacing static or generated colors.
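The precedence described in this section (static colors taken in order, with the category lookup applied last) can be sketched as follows. This is an illustration of the documented behavior, not Crunch’s actual implementation; the base-color generator is elided and represented by None placeholders:

```python
def resolve_category_colors(categories, static_colors=(), category_lookup=None):
    """Assign a color to each category name.
    Static colors cover the categories in order when there are enough of
    them; otherwise the generated base palette would be used (elided here).
    Exact-match lookup entries replace static or generated colors last."""
    category_lookup = category_lookup or {}
    if static_colors and len(categories) <= len(static_colors):
        colors = dict(zip(categories, static_colors))
    else:
        # Placeholder for the HCL-interpolated base palette
        colors = {name: None for name in categories}
    # Lookup is processed last and only for exact category-name matches
    colors.update({name: color for name, color in category_lookup.items()
                   if name in colors})
    return colors
```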
Entity
/datasets/{id}/boxdata/{id}/
This endpoint represents each of the boxdata entities listed in the catalog.
The body of each entity is the same as the catalog’s tuple:
GET
Returns the body of the boxdata entity
{
"user_id": "00002",
"footer": "This is for the footer",
"notes": "just a couple of variables",
"title": "z and str",
"dataset": "https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/",
"header": "This is for the header",
"filters": {
"function": "identify",
"args": [
{
"filter": [
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/da9d86e43381443d9d708dc29c0c6308/",
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/filters/80638457c8bd4731990eebdc3baee839/"
]
}
]
},
"where": {
"function": "identify",
"args": [
{
"id": [
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000002/",
"https://beta.crunch.io/api/datasets/e7834a8b5aa84c50bcb868fc3b44fd22/variables/000003/"
]
}
]
},
"id": "44a4d477d70c85da4b8298677e527ad8"
}
DELETE
Deletes the boxdata entity. Returns 204.
Comparisons
Entity
/datasets/{id}/comparisons/{id}/
A Shoji Entity with the following “body” attributes:
Name | Type | Description |
---|---|---|
name | string | |
bases | array of cube input objects | one for each analysis to which the comparison applies |
overlay | cube input object | defines the comparison data |
See the Feature Guide for a discussion of the cube objects. POST one to the catalog (see below) to create a new comparison. GET to retrieve the complete Entity. PUT a new one to replace it. PATCH a subset of the attributes as desired. DELETE to remove the comparison.
The Entity also includes a “cube” link in its “catalogs” object; a GET on this link returns the output of the overlay cube. See “Cube” next.
Cube
/datasets/{id}/comparisons/{id}/cube/
A GET on this endpoint returns the output of the “overlay” cube query for the given comparison. The response will be a Crunch Cube with “dimensions” and “measures” members.
Catalog
/datasets/{id}/comparisons/
A Shoji Catalog of comparison entities associated to the specified dataset.
GET catalog
When authenticated and authorized to view the given dataset, GET returns 200 status with a Shoji Catalog of the dataset’s comparisons. If authorization is lacking, response will instead be 404.
Catalog tuples contain the following keys:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
bases | array of cube input objects | References to the dimensions and measures for which the comparison is valid |
cube | URL | Link to generate the comparison data |
The catalog looks something like this:
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/5ee0a0/comparisons/",
"specification": "https://app.crunch.io/api/specifications/comparisons/",
"description": "List of the comparisons for this dataset",
"index": {
"491fe3/": {
"name": "All actors",
"bases": [{
"dimensions": [{"variable": "../variables/0f7378/"}, {"variable": "../variables/8451cb/"}],
"measures": {"count": {"function": "cube_count", "args": []}}
}],
"cube": "491fe3/cube/"
},
"9942ce/": {
"name": "Awareness: sector average",
"bases": [{
"dimensions": [{"variable": "../variables/bf31fc/"}],
"measures": {"count": {"function": "cube_count", "args": []}}
}],
"cube": "9942ce/cube/"
}
}
}
PATCH catalog
Use PATCH to edit the “name” and/or “bases” of one or more comparisons. A successful request returns a 204 response.
Authorization is required: you must have “edit” privileges on the dataset, as shown in the “permissions” object in the dataset’s catalog tuple. If you try to PATCH and are not authorized, you will receive a 403 response and no changes will be made.
Because this catalog contains its entities rather than collecting them, do not PATCH to add or delete comparisons. POST to the catalog to create new comparisons, and DELETE individual comparison entities.
POST catalog
Use POST to add a new comparison entity to the catalog. A 201 indicates success and includes the URL of the newly-created comparison in the Location header.
Datasets
Datasets are the primary containers of statistical data in Crunch. Datasets contain a collection of variables, with which analyses can be composed, saved, and exported. These analyses may include filters, which users can define and persist. Users can also share datasets with each other.
Datasets are composed of one or more batches of data uploaded to Crunch, and additional batches can be appended to datasets. Similarly, variables from other datasets can be joined onto a dataset.
As with other objects in Crunch, references to the set of dataset entities are exposed in a catalog. This catalog can be organized and ordered.
Catalog
GET
GET /datasets/ HTTP/1.1
library(crunch)
login()
# Upon logging in, a GET /datasets/ is done automatically, to populate:
listDatasets() # Shows the names of all datasets you have
listDatasets(refresh=TRUE) # Refreshes that list (and does GET /datasets/)
# To get the raw Shoji object, should you need it,
crGET("https://app.crunch.io/api/datasets/")
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/",
"catalogs": {
"by_name": "https://app.crunch.io/api/datasets/by_name/{name}/"
},
"views": {
"search": "https://app.crunch.io/api/datasets/search/"
},
"orders": {
"order": "https://app.crunch.io/api/datasets/order/"
},
"specification": "https://app.crunch.io/api/specifications/datasets/",
"description": "Catalog of Datasets that belong to this user. POST a Dataset representation (serialized JSON) here to create a new one; a 201 response indicates success and returns the location of the new object. GET that URL to retrieve the object.",
"index": {
"https://app.crunch.io/api/datasets/cc9161/": {
"owner_name": "James T. Kirk",
"name": "The Voyage Home",
"description": "Stardate 8390",
"archived": false,
"permissions": {
"edit": false,
"change_permissions": false,
"view": true
},
"size": {
"rows": 1234,
"columns": 67
},
"is_published": true,
"id": "cc9161",
"owner_id": "https://app.crunch.io/api/users/685722/",
"start_date": "2286",
"end_date": null,
"streaming": "no",
"creation_time": "1986-11-26T12:05:00",
"modification_time": "1986-11-26T12:05:00",
"current_editor": "https://app.crunch.io/api/users/ff9443/",
"current_editor_name": "Leonard Nimoy"
},
"https://app.crunch.io/api/datasets/a598c7/": {
"owner_name": "Spock",
"name": "The Wrath of Khan",
"description": "",
"archived": false,
"permissions": {
"edit": true,
"change_permissions": true,
"view": true
},
"size": {
"rows": null,
"columns": null
},
"is_published": true,
"id": "a598c7",
"owner_id": "https://app.crunch.io/api/users/af432c/",
"start_date": "2285-10-03",
"end_date": "2285-10-20",
"streaming": "no",
"creation_time": "1982-06-04T09:16:23.231045",
"modification_time": "1982-06-04T09:16:23.231045",
"current_editor": null,
"current_editor_name": null
}
},
"template": "{\"name\": \"Awesome Dataset\", \"description\": \"(optional) This dataset is awesome because I made it, and you can do it too.\"}"
}
GET /datasets/
When authenticated, GET returns 200 status with a Shoji Catalog of datasets to which the authenticated user has access. Catalog tuples contain the following attributes:
Name | Type | Default | Description |
---|---|---|---|
name | string | | Required. The name of the dataset |
description | string | “” | A longer description of the dataset |
id | string | | The dataset’s id |
archived | bool | false | Whether the dataset is “archived” or active |
permissions | object | {"edit": false} | Authorizations on this dataset; see Permissions |
owner_id | URL | | URL of the user entity of the dataset’s owner |
owner_name | string | “” | That user’s name, for display |
size | object | {"rows": 0, "columns": 0, "unfiltered_rows": 0} | Dimensions of the dataset |
creation_time | ISO-8601 string | | Datetime at which the dataset was created in Crunch |
modification_time | ISO-8601 string | | Datetime of the last modification for this dataset globally |
start_date | ISO-8601 string | | Date/time to which the data in the dataset corresponds |
end_date | ISO-8601 string | | End date/time of the dataset’s data, defining a start_date:end_date range |
streaming | string | | Possible values are “no”, “finished”, and “streaming”, to enable/disable streaming |
current_editor | URL or null | | URL of the user entity that is currently editing the dataset, or null if there is no current editor |
current_editor_name | string or null | | That user’s name, for display |
is_published | boolean | true | Whether the dataset is published to viewers |
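API clients typically filter the catalog index client-side using these tuple attributes. A minimal sketch, assuming a parsed shoji:catalog like the GET response above; the `editable_datasets` helper is illustrative, not part of any Crunch library:

```python
def editable_datasets(catalog):
    """Return (url, name) pairs for active datasets the user can edit."""
    return [
        (url, tup["name"])
        for url, tup in catalog["index"].items()
        if tup["permissions"]["edit"] and not tup["archived"]
    ]

# A trimmed-down catalog, shaped like the response shown above
catalog = {
    "element": "shoji:catalog",
    "index": {
        "https://app.crunch.io/api/datasets/a598c7/": {
            "name": "The Wrath of Khan",
            "archived": False,
            "permissions": {"edit": True, "view": True},
        },
        "https://app.crunch.io/api/datasets/cc9161/": {
            "name": "The Voyage Home",
            "archived": False,
            "permissions": {"edit": False, "view": True},
        },
    },
}
print(editable_datasets(catalog))
```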
Drafts
A dataset marked is_published: false can only be accessed by dataset editors. It will still appear in the catalog for all users it is shared with, so API clients should take care to display it only to the appropriate users. Editors can change the is_published flag of a dataset from the catalog or directly on the dataset entity.
PATCH
PATCH /api/datasets/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 231
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/datasets/a598c7/": {
"description": "Stardate 8130.4"
}
}
}
HTTP/1.1 204 No Content
library(crunch)
login()
# Dataset objects contain information from
# the catalog tuple and the dataset entity.
# Editing attributes by <- assignment will
# PATCH or PUT the right payload to the
# right place--you don't have to think about
# catalogs and entities.
ds <- loadDataset("The Wrath of Khan")
description(ds)
## [1] ""
description(ds) <- "Stardate 8130.4"
description(ds)
## [1] "Stardate 8130.4"
# If you needed to touch HTTP more directly,
# you could:
payload <- list(
`https://app.crunch.io/api/datasets/a598c7/`=list(
description="Stardate 8130.4"
)
)
crPATCH("https://app.crunch.io/api/datasets/",
body=toJSON(payload))
PATCH /datasets/
Use PATCH to edit the “name”, “description”, “start_date”, “end_date”, or “archived” state of one or more datasets. A successful request returns a 204 response. The attributes changed will be seen by all users with access to this dataset; i.e., names, descriptions, and archived state are not merely attributes of your view of the data but of the datasets themselves.
Authorization is required: you must have “edit” privileges on the dataset(s) being modified, as shown in the “permissions” object in the catalog tuples. If you try to PATCH and are not authorized, you will receive a 403 response and no changes will be made.
The tuple attributes other than “name”, “description”, “start_date”, “end_date”, and “archived” cannot be modified here by PATCH. Attempting to modify other attributes, or including new attributes, will return a 400 response. Changing permissions is accomplished by PATCH on the permissions catalog, and changing the owner is a PATCH on the dataset entity. The “owner_name” and “current_editor_name” attributes are modifiable, assuming authorization, by PATCH on the associated user entity. Dataset “size” is a cached property of the data, changing only if the number of rows or columns in the dataset change. Dataset “id”, “modification_time” and “creation_time” are immutable/system generated.
When PATCHing, you may include only the keys in each tuple that are being modified, or you may send the complete tuple. As long as the keys that cannot be modified via PATCH here are not modified, the request will succeed.
Note that, unlike other Shoji Catalog resources, you cannot PATCH to add new datasets, nor can you PATCH a null tuple to delete them. Attempting either will return a 400 response. Creating datasets is allowed only by POST to the catalog, while deleting datasets is accomplished via a DELETE on the dataset entity.
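Because only a fixed set of attributes is PATCHable here, clients may want to validate a sparse payload before sending it. A minimal sketch; the `catalog_patch` helper and its whitelist live client-side and are illustrative, not part of any Crunch library:

```python
# Attributes the dataset catalog accepts via PATCH (others cause a 400)
PATCHABLE = {"name", "description", "start_date", "end_date", "archived"}

def catalog_patch(dataset_url, **changes):
    """Build a sparse shoji:catalog PATCH payload, rejecting bad keys."""
    bad = set(changes) - PATCHABLE
    if bad:
        raise ValueError("not PATCHable via the catalog: %s" % sorted(bad))
    return {"element": "shoji:catalog", "index": {dataset_url: changes}}

payload = catalog_patch(
    "https://app.crunch.io/api/datasets/a598c7/",
    description="Stardate 8130.4",
)
print(payload)
```

The resulting dict is what you would serialize and PATCH to /datasets/, as in the HTTP example above.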
Changing ownership
Any changes to the ownership of a dataset need to be done by the current editor.
Only the dataset owner can transfer ownership to another user. To do so, send a PATCH request with the new owner’s email or API URL. The new owner must have advanced permissions on Crunch.
Other editors of the dataset can change its ownership only to a project, and only if both they and the current owner of the dataset are editors on that project.
POST
POST /api/datasets/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 88
{
"element": "shoji:entity",
"body": {
"name": "Trouble with Tribbles",
"description": "Stardate 4523.3"
}
}
HTTP/1.1 201 Created
Location: https://app.crunch.io/api/datasets/223fd4/
library(crunch)
login()
# To create just the dataset entity, you can
ds <- createDataset("Trouble with Tribbles",
description="Stardate 4523.3")
# More likely, you'll have a data.frame or
# similar object in R, and you'll want to send
# it to Crunch. To do that,
df <- read.csv("~/tribbles.csv")
ds <- newDataset(df, name="Trouble with Tribbles",
description="Stardate 4523.3")
POST /datasets/
POST a JSON object to create a new Dataset; a 201 indicates success, and the returned Location header refers to the new Dataset resource.
The body must contain a “name”. You can also include a Crunch Table in a “table” key, as discussed in the Feature Guide. The full set of possible attributes to include when POSTing to create a new dataset entity are:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
description | string | Optional longer string |
archived | boolean | Whether the dataset should be hidden from most views; default: false |
owner | URL | Provide a project URL to set the owner to that project; if omitted, the authenticated user will be the owner |
notes | string | Blank if omitted. Optional notes for the dataset |
start_date | date | ISO-8601 formatted date with day resolution |
end_date | date | ISO-8601 formatted date with day resolution |
streaming | string | Only “streaming”, “finished” and “no” available values to define if a dataset will accept streaming data or not |
is_published | boolean | If false, only project editors will have access to this dataset |
weight_variables | array | Contains aliases of weight variables to start this dataset with; variables must be numeric type. |
table | object | Metadata definition for the variables in the dataset |
maintainer | URL | User URL that will be the maintainer of this dataset in case of system notifications; if omitted, the authenticated user will be the maintainer |
settings | object | Settings object containing weight , viewers_can_export , viewers_can_change_weight , viewers_can_share , dashboard_deck , and/or min_base_size attributes. If a “weight” is specified, it will be automatically added to “weight_variables” if not already specified there. |
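Assembling the POST body is straightforward: only “name” is required, and any of the attributes in the table above may be added. A minimal sketch; the `new_dataset_entity` helper name is illustrative:

```python
def new_dataset_entity(name, **optional):
    """Build the shoji:entity body for POST /datasets/."""
    if not name:
        raise ValueError("a dataset name is required")
    body = {"name": name}
    body.update(optional)  # e.g. description, is_published, start_date
    return {"element": "shoji:entity", "body": body}

entity = new_dataset_entity(
    "Trouble with Tribbles",
    description="Stardate 4523.3",
    is_published=False,
)
print(entity)
```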
Other catalogs
In addition to /datasets/, there are a few other catalogs of datasets in the API:
Team datasets
/teams/{team_id}/datasets/
A Shoji Catalog of datasets that have been shared with this team. These datasets are not included in the primary dataset catalog. See teams for more.
Project datasets
/projects/{project_id}/datasets/
A Shoji Catalog of datasets that belong to this project. These datasets are not included in the primary dataset catalog. See projects for more.
Filter datasets by name
/datasets/by_name/{dataset_name}/
The by_name catalog returns (on GET) a Shoji Catalog that is a subset of /datasets/ where the dataset name matches the “dataset_name” value. Matches are case sensitive.
Verbs other than GET are not supported on this subcatalog; PATCH and POST the primary dataset catalog instead.
Dataset order
The dataset order allows each user to organize the order in which their datasets are presented.
This endpoint returns a shoji:order. Like all shoji orders, it may not contain all available datasets. The catalog should always be the authoritative source of available datasets.
Any dataset not present on the order graph should be considered to be at the bottom of the root list in arbitrary order.
GET
GET /datasets/order/
{
"element": "shoji:order",
"self": "/datasets/order/",
"graph": [
"dataset_url",
{"group": [
"dataset_url"
]}
]
}
PUT
Receives a complete shoji:order payload and replaces the existing graph with the new one. The payload cannot contain dataset references that are not in the dataset catalog; if it does, the API will return a 400 response. Standard shoji:order graph validation applies.
PATCH
Same semantics as PUT
Entity
GET
GET /datasets/{dataset_id}/
URL Parameters
Parameter | Description |
---|---|
dataset_id | The id of the dataset |
Dataset attributes
Name | Type | Default | Description |
---|---|---|---|
name | string | | Required. The name of the dataset |
description | string | “” | A longer description of the dataset |
notes | string | “” | Additional information you want to associate with this dataset |
id | string | | The dataset’s id |
archived | bool | false | Whether the dataset is “archived” or active |
permissions | object | {"edit": false} | Authorizations on this dataset; see Permissions |
owner_id | URL | | URL of the user entity of the dataset’s owner |
owner_name | string | “” | That user’s name, for display |
size | object | {"rows": 0, "unfiltered_rows": 0, "columns": 0} | Dimensions of the dataset |
creation_time | ISO-8601 string | | Datetime at which the dataset was created in Crunch |
start_date | ISO-8601 string | | Date/time to which the data in the dataset corresponds |
end_date | ISO-8601 string | | End date/time of the dataset’s data, defining a start_date:end_date range |
streaming | string | | Possible values are “no”, “finished”, and “streaming”, indicating whether the dataset is streamed |
current_editor | URL or null | | URL of the user entity that is currently editing the dataset, or null if there is no current editor |
current_editor_name | string or null | | That user’s name, for display |
maintainer | URL | | The URL of the dataset maintainer; will always point to a user |
app_settings | object | {} | A place for API clients to store values they need per dataset; it is recommended that clients namespace their keys to avoid collisions |
Dataset catalogs
A dataset contains a number of catalog resources that contain collections of related objects. They are available under the catalogs attribute of the dataset Shoji entity.
{
"batches": "http://app.crunch.io/api/datasets/c5d751/batches/",
"joins": "http://app.crunch.io/api/datasets/c5d751/joins/",
"parent": "http://app.crunch.io/api/datasets/",
"variables": "http://app.crunch.io/api/datasets/c5d751/variables/",
"actions": "http://app.crunch.io/api/datasets/c5d751/actions/",
"savepoints": "http://app.crunch.io/api/datasets/c5d751/savepoints/",
"filters": "http://app.crunch.io/api/datasets/c5d751/filters/",
"multitables": "http://app.crunch.io/api/datasets/c5d751/multitables/",
"comparisons": "http://app.crunch.io/api/datasets/c5d751/comparisons/",
"forks": "http://app.crunch.io/api/datasets/c5d751/forks/",
"decks": "http://app.crunch.io/api/datasets/c5d751/decks/",
"permissions": "http://app.crunch.io/api/datasets/c5d751/permissions/"
}
Catalog name | Resource |
---|---|
batches | Returns all the batches (successful and failed) used for this dataset. See Batches. |
joins | Contains the list of all datasets joined to the current dataset. See Joins. |
parent | Indicates the catalog where this dataset is found (project or main dataset catalog) |
variables | Catalog of all public variables of this dataset. See Variables. |
actions | All actions executed on this dataset |
savepoints | Lists the saved versions for this dataset. See Versions. |
filters | Makes available the public and user-created filters. See Filters. |
multitables | Similar to filters, displays all available multitables. See Multitables |
comparisons | Contains all available comparisons. See Comparisons. |
forks | Returns all the forks created from this dataset |
decks | The list of all decks on this dataset for the authenticated user |
permissions | Returns the list of all users and teams with access to this dataset. See Permissions. |
PATCH
PATCH /datasets/{dataset_id}/
See above about PATCHing the dataset catalog for the attributes duplicated on the entity and the catalog. You may PATCH those attributes on the entity, but you are encouraged to PATCH the catalog instead. Of the attributes that appear on the entity but not the catalog, “notes” is modifiable by PATCH here.
A successful PATCH request returns a 204 response. The attributes changed will be seen by all users with access to this dataset; i.e., names, descriptions, and archived state are not merely attributes of your view of the data but of the datasets themselves.
Authorization is required: you must have “edit” privileges on this dataset. If you try to PATCH and are not authorized, you will receive a 403 response and no changes will be made. If you have edit permissions but are not the current editor of this dataset, PATCH requests of anything other than “current_editor” will respond with 409 status. You will need first to PATCH to make yourself the current editor and then proceed to make the desired changes.
When PATCHing, you may include only the keys that are being modified, or you may send the complete entity. As long as the keys that cannot be modified via PATCH here are not modified, the request will succeed.
Changing dataset ownership
If you are the current editor of a dataset you can change its owner by PATCHing the owner attribute with the URL of the new owner.
Only users, teams, or projects can be set as owners of a dataset.
- Users: the new owner must be an advanced user to own a dataset.
- Teams: the authenticated user must be a member of the team.
- Projects: the authenticated user must have edit permissions on the project.
Copying over from another dataset
To copy work from another dataset into the current one, issue a PATCH request with a copy_from attribute pointing to the URL of the source dataset.
{
"element": "shoji:entity",
"body": {
"copy_from": "https://app.crunch.io/api/datasets/1234/"
}
}
All dataset attributes, permissions, derivations, private variables, etc. will be brought over to the current dataset:
- Decks
- Filters
- Multitables
- Comparisons
- Personal variable order
- Derived variables
- Personal variables
- Permissions
The response will be a shoji:entity whose body is an object with keys for each entity type that has not been copied. For variables, these entries display the name, alias, and owner (if personal). All the URLs refer to entities on the source dataset.
{
"element": "shoji:entity",
"body": {
"variables": {
"https://app.crunch.io/dataset/1234/variables/abc/": {
"name": "Variable name",
"alias": "Variable alias",
"owner_url": "https://app.crunch.io/users/qwe/",
"owner_name": "Angus MacGyver"
},
"https://app.crunch.io/dataset/1234/variables/cde/": {
"name": "Variable name",
"alias": "Variable alias",
"owner_url": null,
"owner_name": null
}
},
"filters": {
"https://app.crunch.io/filters/abcd/": {
"name": "filter name",
"owner_url": "http://app.crunch.io/users/qwe/"
},
"http://app.crunch.io/filters/cdef/": {
"name": "filter name",
"owner_url": "https://app.crunch.io/users/qwe/"
}
}
}
}
It is possible to copy information for a single user from another dataset; the payload will need an extra user key, which can contain either a user URL or a user email:
{
"element": "shoji:entity",
"body": {
"copy_from": "https://app.crunch.io/api/datasets/1234/",
"user": "https://app.crunch.io/api/users/abcd/"
}
}
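Building either form of this payload is a one-liner. A minimal sketch; the `copy_from_payload` helper is illustrative, not part of any Crunch library:

```python
def copy_from_payload(source_dataset_url, user=None):
    """Build the PATCH body for copying work from another dataset.

    `user`, if given, may be a user URL or a user email and scopes the
    copy to that single user's artifacts.
    """
    body = {"copy_from": source_dataset_url}
    if user is not None:
        body["user"] = user
    return {"element": "shoji:entity", "body": body}

print(copy_from_payload(
    "https://app.crunch.io/api/datasets/1234/",
    user="fake.user@example.com",
))
```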
DELETE
DELETE /datasets/{dataset_id}/
With sufficient authorization, a successful DELETE request removes the dataset from the Crunch system and responds with 204 status.
Views
Applied filters
Cube
/datasets/{id}/cube/?q
See Multidimensional Analysis.
Export
GET `/datasets/{id}/export/` HTTP/1.1
Host: app.crunch.io
GET returns a Shoji View of available dataset export formats.
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/223fd4/export/",
"views": {
"spss": "https://app.crunch.io/api/datasets/223fd4/export/spss/",
"csv": "https://app.crunch.io/api/datasets/223fd4/export/csv/"
}
}
A POST request on any of the export views will return 202 status with a Progress response in the body and a Location header pointing to the location of the exported file to be downloaded. Poll the progress URL for status on the completion of the export. When complete, GET the Location URL from the original response to download the file.
POST `/api/datasets/f2364cc66e604d63a3be3e8811fc902f/export/spss/` HTTP/1.1
{
"where": {
"function": "select",
"args":[
{
"map": {
"https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000000/": {"variable": "https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000000/"},
"https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000001/": {"variable": "https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000001/"},
"https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000002/": {"variable": "https://app.crunch.io/api/datasets/f2364cc66e604d63a3be3e8811fc902f/variables/000002/"}
}
}
]
}
}
HTTP/1.1 202 Accepted
Content-Length: 176
Access-Control-Allow-Methods: OPTIONS, AUTH, POST, GET, HEAD, PUT, PATCH, DELETE
Access-Control-Expose-Headers: Allow, Location, Expires
Content-Encoding: gzip
Location: https://crunch-io.s3.amazonaws.com/exports/dataset_exports/f2364cc66e604d63a3be3e8811fc902f/My_Dataset.sav?Signature=sOmeSigNaTurE%3D&Expires=1470265052&AWSAccessKeyId=SOMEKEY
To export a subset of the dataset, instead perform a POST request and include a JSON body with an optional “filter” expression for the rows and a “where” attribute to specify variables to include.
Attribute | Description | Example |
---|---|---|
filter | A Crunch filter expression defining a filter for the given export | {"function": "==", "args": [{"variable": "000000"}, {"value": 1}]} |
where | A Crunch expression defining which variables to export. Refer to Frame functions for the functions available here. | {"function": "select", "args": [{"map": {"000000": {"variable": "000000"}}}]} |
options | An object of extra settings, which may be format-specific. See below. | {"use_category_ids": true} |
See “Expressions” for more on Crunch expressions.
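The “where” map in these requests repeats each variable URL as both key and value, which is easy to generate programmatically. A minimal sketch; the helper and the variable URL are illustrative, not part of any Crunch library:

```python
def export_body(variable_urls, filter_expr=None, options=None):
    """Build an export POST body selecting specific variables."""
    body = {
        "where": {
            "function": "select",
            "args": [
                {"map": {url: {"variable": url} for url in variable_urls}}
            ],
        }
    }
    if filter_expr is not None:
        body["filter"] = filter_expr  # a Crunch filter expression
    if options is not None:
        body["options"] = options  # format-specific settings
    return body

# Hypothetical variable URL, for illustration only
var = "https://app.crunch.io/api/datasets/1234/variables/000000/"
body = export_body([var], options={"use_category_ids": True})
print(body)
```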
The following rules apply for all formats:
- The dataset’s exclusion filter will be applied; however, the user’s personal “applied filters” are not, unless they are explicitly included in the request.
- Hidden/discarded variables are not exported. If editors use a where clause, it will be evaluated over all non-hidden variables.
- Personal (private) variables are not exported unless requested, in which case only the current user’s personal variables will be exported.
- Variables (columns) will be ordered in a flattened version of the dataset’s hierarchical order.
- Derived variables will be exported with their values, without their functional links.
Some format-specific properties and options:
Format | Attribute | Description | Default |
---|---|---|---|
csv | use_category_ids | Export categorical data as its numeric IDs instead of category names? | false |
csv | missing_values | If present, will use the specified string to indicate missing values. If omitted, will use the missing reason strings | omitted |
csv | header_field | Use the variable’s alias/name/description in the CSV header row, or null for no header row | “alias” |
spss | var_label_field | Use the variable’s name/description as SPSS variable label | “description” |
spss | prefix_subvariables | Prefix subvariable labels with the parent array variable’s label? | false |
all | include_personal | Include the user’s personal variables in the exported file? | false |
SPSS
Categorical-array and multiple-response variables will be exported as “mrsets”, as supported by SPSS. If the prefix_subvariables option is set to true, the subvariables’ labels will be prefixed with the parent array variable’s label.
To pick which variable field to use as the label of the SPSS variables, use the var_label_field key in the options attribute of the POST body. The only valid fields are description and name.
CSV
By default, categorical variable values are exported using the category name, and missing values use their corresponding missing reason string for all variables. If the missing_values export option is specified, all missing values in all columns will use that string instead of the reason.
To control the output of the header row, use the header_field option. Valid values are:
- alias (default)
- name
- description
- null (the resulting CSV will have no header row)
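Since invalid option values are easy to send by accident, clients may want to validate the CSV options before POSTing. A minimal sketch; the `csv_options` helper lives client-side and is illustrative, not part of any Crunch library:

```python
# The valid header_field values, per the list above (None maps to null)
HEADER_FIELDS = {"alias", "name", "description", None}

def csv_options(header_field="alias", missing_values=None,
                use_category_ids=False):
    """Assemble and validate the "options" object for a CSV export."""
    if header_field not in HEADER_FIELDS:
        raise ValueError("invalid header_field: %r" % (header_field,))
    opts = {"header_field": header_field,
            "use_category_ids": use_category_ids}
    if missing_values is not None:
        opts["missing_values"] = missing_values
    return opts

print(csv_options(header_field=None, missing_values="NA"))
```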
Match
The match endpoint provides a list of matches indicating which variables match amongst the datasets provided. To use it, send a POST request with an ordered list of the datasets you would like to match. Include the “minimum_matches” parameter in the body if you would like to limit the output to matches spanning at least that many datasets; the default minimum_matches is 2. Currently, only the alias is used to match variables to one another.
The result of a match request can be one of two things. If the same match has been completed previously, the API will return a 201 status code and a Location header pointing to the existing results. Otherwise, the endpoint will return a 202 status code with a Progress result that provides status information while the match is completed. Either way, the Location header will be set to the URI of the statically generated comparison result, which can be accessed when the match is complete.
The results are a Shoji Entity with a matches attribute. The matches are listed in order of the number of variables matched. Each variable inside a match contains the dataset, the variable id, and the confidence that the variable matches the others in the list. The order of the variables inside each match follows the order of the datasets provided. The first variable also contains some additional information to allow previewing a match. To retrieve complete details about all the matching variables, call the endpoints listed in the metadata field, which provide all the matching metadata chunked by groups of matches.
POST /datasets/match/ HTTP/1.1
{
"element": "shoji:entity",
"body": {
"datasets": [
"http://app.crunch.io/api/datasets/8274bf/",
"http://app.crunch.io/api/datasets/699a33/",
"http://app.crunch.io/api/datasets/8274bf/",
"http://app.crunch.io/api/datasets/699a33/"
],
"minimum_matches": 3
}
}
Response:
201 Created
Host: app.crunch.io
Location: http://app.crunch.io/api/datasets/matches/394d9e/
GET /api/datasets/matches/394d9e/
{
"element": "shoji:order",
"self": "http://app.crunch.io:50976/api/datasets/match/3c7df5/",
"body": {
"matches": [
[
{
"alias": "SomeVariable",
"confidence": 1,
"name": "Some Variable",
"variable": "521b5c014e1e474fa5173d95000bd6e9",
"desc": "This is some variable",
"dataset": "8274bfb842d645728a49634414b999c4"
},
{
"variable": "3fa1d3358888474eb949ae586e80f9a4",
"confidence": 1,
"dataset": "699a3315c3f347d4923257380938f9b9"
}
],
[
{
"alias": "AnotherVariableThatHasMatches",
"confidence": 1,
"name": "Another Variable",
"variable": "234e8e76d0e1a32667ab33bc30a9900",
"desc": "This is another variable",
"dataset": "8274bfb842d645728a49634414b999c4"
},
{
"variable": "9373729ac990b009e0a90dca99092789",
"confidence": 1,
"dataset": "699a3315c3f347d4923257380938f9b9"
}
],
...
],
"metadata": [
"http://app.crunch.io/api/datasets/match/3c7df5/0-500/"
]
}
}
Summary
/datasets/{id}/summary/{?filter}
Query Parameters
Parameter | Description |
---|---|
filter | A Crunch filter expression |
GET returns a Shoji View with summary information about this dataset containing its number of rows (weighted and unweighted, with and without your applied filters), as well as the number of variables and columns. The column count will differ from the variable count when derived and array variables are present; these variable types don’t necessarily have their own columns of data behind them. The column count is useful for estimating load time and file size when exporting.
If a filter is included, the “filtered” counts will be with respect to that expression. If omitted, your applied filters will be used.
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/223fd4/summary/",
"value": {
"unweighted": {
"filtered": 2000,
"total": 2000
},
"weighted": {
"filtered": 2000.0,
"total": 2000.0
},
"variables": 529,
"columns": 530
}
}
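The filter query parameter is a JSON-encoded Crunch expression, so it needs URL-encoding. A minimal sketch of building the summary URL; the `summary_url` helper is illustrative, not part of any Crunch library:

```python
import json
import urllib.parse

def summary_url(dataset_url, filter_expr=None):
    """Build /summary/ URL, JSON-encoding an optional filter expression."""
    url = dataset_url.rstrip("/") + "/summary/"
    if filter_expr is not None:
        url += "?" + urllib.parse.urlencode({"filter": json.dumps(filter_expr)})
    return url

expr = {"function": "==", "args": [{"variable": "000000"}, {"value": 1}]}
url = summary_url("https://app.crunch.io/api/datasets/223fd4/", expr)
print(url)
```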
Fragments
Table
State
Exclusion
/datasets/{id}/exclusion/
Exclusion filters allow you to drop rows of data without permanently deleting them.
GET on this resource returns a Shoji Entity with a filter “expression” attribute in its body. Rows that match the filter expression will be excluded from all views of the data.
PATCH the “expression” attribute to modify it. An empty “expression” object, like {"body": {"expression": {}}}, is equivalent to “no exclusion”, i.e. no rows are dropped.
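Setting and clearing the exclusion can be wrapped in small payload builders. A minimal sketch; the helpers and the "000000" variable id are illustrative, not part of any Crunch library:

```python
def exclusion_payload(expression):
    """Build the PATCH body for the exclusion resource."""
    return {"element": "shoji:entity", "body": {"expression": expression}}

def clear_exclusion_payload():
    # An empty expression object means "no exclusion": no rows dropped
    return exclusion_payload({})

# Exclude rows where a hypothetical variable equals 99
drop_flagged = exclusion_payload(
    {"function": "==", "args": [{"variable": "000000"}, {"value": 99}]}
)
print(drop_flagged)
```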
Stream
Stream lock
When a dataset is configured to receive streaming data, the /stream/ endpoint will accept POST requests to append new rows to the streaming queue.
A dataset is able to receive streaming data while its streaming attribute is set to “streaming”.
While a dataset is receiving streams, any other kind of append is disabled and will return a 409 response if attempted; only streaming data is allowed.
The following operations are forbidden on a dataset while it is accepting streaming rows, in order to protect the schema:
- Deleting public non derived variables
- Casting variables (Includes changing resolution on datetime variables)
- Changing variable aliases
- Deleting categories from categorical variables
- Changing category IDs
- Removing subvariables from arrays
- Merging forks
- Reverting to savepoints
- Modifying the Primary Key, once it has been set
To change the streaming configuration of the dataset, PATCH the entity’s streaming attribute to “streaming”, “finished”, or “no”, according to the following table:
Value | Allows schema changes | Accepts streaming rows |
---|---|---|
streaming | No | Yes |
finished | No | No |
no | Yes | No |
Note that only the dataset maintainer is allowed to modify the streaming attribute.
Sending rows
/datasets/{id}/stream/
Stream allows for sending data to a dataset as it is gathered.
GET on this resource returns a Shoji Entity with two attributes in its body:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/stream/",
"description": "A stream for this Dataset. Each stream acts as a write buffer, from which Sources are periodically made and appended as Batches to the owning Dataset.",
"body":{
"pending_messages": 1,
"received_messages": 8
}
}
Attribute | Description |
---|---|
pending_messages | The number of messages the stream has that have yet to be appended to the dataset (note: a message might contain more than one row, each POST that is made to /datasets/{id}/stream/ will result in a single message). |
received_messages | The total number of messages that this stream has received. |
POST to this endpoint to add rows. The payload should be a multi-line string where each line is a JSON object giving the value for each variable, keyed by alias.
{"alias1": 1, "alias2": "value", "alias3": 0}
{"alias1": 99, "alias2": "other", "alias3": 2}
{"alias1": 10, "alias2": "empty", "alias3": 1}
Settings
/datasets/{id}/settings/
The dataset settings allow editors to store dataset-wide permissions and configurations.
GET will always return all the available settings a dataset can have, with their default values.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/settings/",
"body": {
"viewers_can_export": false,
"viewers_can_change_weight": false,
"viewers_can_share": true,
"weight": "https://app.crunch.io/api/datasets/223fd4/variables/123456/"
}
}
To make changes, clients should PATCH the settings they wish to change with new values. Additional settings are not allowed; the server will return a 400 response if they are included.
Setting | Description |
---|---|
viewers_can_export | When false, only editor can export; else, all users with view access can export the data |
viewers_can_change_weight | When true, all users with access can set their own personal weight; else, the editor-configured weight will be applied to all users, without the option to change it |
viewers_can_share | When true, all users with access can share the dataset with other users or teams; defaults to true |
weight | Default initial weight for all new users on this dataset, and when viewers_can_change_weight is false, this variable will be the always-applied weight for all viewers of the dataset. |
dashboard_deck | When set, points to a deck that will become publicly visible and be used as dashboard by the web client |
Preferences
/datasets/{id}/preferences/
The dataset preferences provide API clients with a per-user key/value store for settings or customizations.
By default, dataset preferences start out with only a weight key set to null. Clients can PATCH to add additional attributes.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/preferences/",
"body": {
"weight": null
}
}
To delete attributes from the preferences resource, PATCH them with null.
Preferences are unordered; clients should not assume any ordering.
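The merge semantics of a preferences PATCH (new keys are added, existing keys overwritten, null deletes a key) can be modeled locally. A minimal sketch for illustration, with hypothetical namespaced keys; the server applies the same logic:

```python
def apply_preferences_patch(current, patch):
    """Mimic how a preferences PATCH merges into the stored body:
    None (JSON null) deletes a key, anything else sets it."""
    merged = dict(current)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged

prefs = {"weight": None, "myclient:color": "blue"}
patched = apply_preferences_patch(
    prefs, {"myclient:color": None, "myclient:page": 2}
)
print(patched)
```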
Weight
If the dataset’s viewers_can_change_weight setting is false, all users’ preferred weight will be set to the dataset-wide configured weight, with no option to change it. Attempts to modify it will return a 403 response.
Primary key
/datasets/{dataset_id}/pk/
URL Parameters
Parameter | Description |
---|---|
dataset_id | The id of the dataset |
Setting a primary key on a dataset causes updates (particularly streamed updates) mentioning existing rows to be updated instead of new rows being inserted. A primary key can only be set on a variable that is type “numeric” or “text” and that has no duplicate or missing values, and it can only be set after that variable has been added to the dataset.
GET
GET /api/datasets/{dataset_id}/pk/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type:application/json;charset=utf-8
{
"element": "shoji:entity",
"body": {
"pk": ["https://app.crunch.io/api/datasets/{dataset_id}/variables/000001/"]
}
}
>>> # "ds" is dataset via pycrunch
>>> ds.pk.body.pk
['https://app.crunch.io/api/datasets/{dataset_id}/variables/000001/']
GET /datasets/{dataset_id}/pk/
GET on this resource returns a Shoji Entity. It contains one body key: pk, which is an array. The “pk” member indicates the URLs of the variables in the dataset which comprise the primary key. If there is no primary key for this dataset, the pk value will be [].
POST
POST /api/datasets/{dataset_id}/pk/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
Content-Length: 77
{"pk": ["https://app.crunch.io/api/datasets/{dataset_id}/variables/000001/"]}
--------
204 No Content
>>> # "ds" is dataset via pycrunch
>>> ds.pk.post({'pk':['https://app.crunch.io/api/datasets/{dataset_id}/variables/000001/']})
>>> ds.pk.body.pk
['https://app.crunch.io/api/datasets/{dataset_id}/variables/000001/']
POST /datasets/{dataset_id}/pk/
When POSTing, set the body to a JSON object containing the key “pk” to modify the primary key. The “pk” key should be a list containing zero or more variable URLs. The variables referenced must be either text or numeric type and must have no duplicate or missing values. Setting pk to [] is equivalent to deleting the primary key for a dataset.
DELETE
DELETE /api/datasets/{dataset_id}/pk/ HTTP/1.1
Host: app.crunch.io
--------
204 No Content
>>> # "ds" is dataset via pycrunch
>>> ds.pk.delete()
>>> ds.pk.body.pk
[]
DELETE /datasets/{dataset_id}/pk/
DELETE the “pk” resource to delete the primary key for this dataset. Upon success, this method returns no body and a 204 response code.
Catalogs
Users
/datasets/{dataset_id}/users/
This catalog exposes the full list of users that have access to the dataset via the different sources:
- When the dataset belongs to a project, as project members
- Members of teams that are shared with the dataset
- Direct shares to specific users
This endpoint only supports GET. The response will be a catalog with one member per user; each tuple indicates the coalesced permissions and information about the type of access:
Attribute | Description |
---|---|
name | Name of the user |
email | Email of the user |
teams | URLs of teams with dataset access that this user belongs to |
last_accessed | Timestamp of the user’s last access to the dataset via the web app |
project_member | Whether the user has access as a member of the project the dataset belongs to |
coalesced_permissions | Permissions this user has on this dataset, combining all sources |
{
"https://app.crunch.io/api/users/411aa32a075b4b57bf25a4ace1baf920/": {
"name": "Jean-Luc Picard",
"last_accessed": "2017-02-25T00:00:00+00:00",
"teams": [
"https://app.crunch.io/api/teams/c6dbeb7c57e34dd08ab2316f3363e895/",
"https://app.crunch.io/api/teams/d0abf4e933fc44e38190247ae4d593f9/"
],
"project_member": false,
"email": "jeanluc@crunch.io",
"coalesced_permissions": {
"edit": true,
"change_permissions": true,
"view": true
}
},
"https://app.crunch.io/api/users/60f18c51699b4ba992721197743286a4/": {
"name": "William Riker",
"last_accessed": null,
"teams": [
"https://app.crunch.io/api/teams/d0abf4e933fc44e38190247ae4d593f9/"
],
"project_member": false,
"email": "number1@crunch.io",
"coalesced_permissions": {
"edit": false,
"change_permissions": false,
"view": true
}
},
"https://app.crunch.io/api/users/80d89e4e876344ecb46c528a910e3877/": {
"name": "Geordi La Forge",
"last_accessed": "2017-01-31T00:00:00+00:00",
"teams": [
"https://app.crunch.io/api/teams/c6dbeb7c57e34dd08ab2316f3363e895/",
"https://app.crunch.io/api/teams/d0abf4e933fc44e38190247ae4d593f9/"
],
"project_member": true,
"email": "geordilf@crunch.io",
"coalesced_permissions": {
"edit": true,
"change_permissions": true,
"view": true
}
}
}
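The server computes coalesced_permissions for you; a plausible sketch of the same idea is to OR each permission flag across all access sources (project membership, teams, direct shares). This is illustrative only, not the server's actual implementation:

```python
def coalesce_permissions(*sources):
    """OR permission flags across access sources; a flag is granted
    if any source grants it."""
    flags = ("view", "edit", "change_permissions")
    return {f: any(src.get(f, False) for src in sources) for f in flags}

# A team grant and a direct share combine into the effective permissions:
team_grant = {"view": True}
direct_share = {"view": True, "edit": True}
merged = coalesce_permissions(team_grant, direct_share)
```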
Actions
Batches
/datasets/{dataset_id}/batches/
See Batches and the feature guides for importing and appending.
Decks
/datasets/{dataset_id}/decks/
See Decks.
Comparisons
Filters
/datasets/{dataset_id}/filters/
See Filters.
Forks
Joins
Multitables
Permissions
/datasets/{dataset_id}/permissions/
See Permissions.
Savepoints
/datasets/{dataset_id}/savepoints/
See Versions.
Variables
/datasets/{dataset_id}/variables/
See Variables.
Weight variables
Decks
Decks allow you to store analyses for future reference or for export. Decks correspond to a single dataset, and they are personal to each user unless they have been set as “public”. Each deck contains a list of slides, and each slide contains analyses.
Catalog
/datasets/{id}/decks/
GET
A GET request on the catalog endpoint will return all the decks available for this dataset for the current user. This includes decks created by the user, as well as public decks shared with all users of the dataset.
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/",
"index": {
"https://app.crunch.io/api/datasets/cc9161/decks/4fa25/": {
"name": "my new deck",
"creation_time": "1986-11-26T12:05:00",
"id": "4fa25",
"is_public": false,
"owner_id": "https://app.crunch.io/api/users/abcd3/",
"owner_name": "Real Person",
"team": null
},
"https://app.crunch.io/api/datasets/cc9161/decks/2b53e/": {
"name": "Default deck",
"creation_time": "1987-10-15T11:45:00",
"id": "2b53e",
"is_public": true,
"owner_id": "https://app.crunch.io/api/users/4cba5/",
"owner_name": "Other Person",
"team": "https://app.crunch.io/api/teams/58acf7/"
}
},
"order": "https://app.crunch.io/api/datasets/223fd4/decks/order/"
}
The decks catalog tuples contain the following keys:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
creation_time | timestamp | Time when this deck was created |
id | string | Global unique identifier for this deck |
is_public | boolean | Indicates whether this is a public deck or not |
owner_id | url | Points to the owner of this deck |
owner_name | string | Name of the owner of the deck (referenced by owner_id) |
team | url | If the deck is shared through a team, points to that team; null by default |
To determine if a deck belongs to the current user, check the owner_id attribute.
POST
POST a shoji:entity to create a new deck for this dataset. The only required body attribute is “name”; other attributes are optional.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/",
"body": {
"name": "my new deck",
"description": "This deck will contain analyses for a variable",
"is_public": false,
"team": "https://app.crunch.io/api/teams/58acf7/"
}
}
HTTP/1.1 201 Created
Location: https://app.crunch.io/api/datasets/223fd4/decks/2b3c5e/
The shoji:entity POSTed accepts the following keys:
Name | Type | Required | Description |
---|---|---|---|
name | string | Yes | Human-friendly string identifier |
description | string | No | Optional longer string with additional notes |
is_public | boolean | No | If true, all users with view access to this dataset will be able to read and export this deck and its analyses; if false (the default), the deck remains private to the current user |
team | url | No | If set, all members of this team will have read-only access to this deck; otherwise the deck is private to its owner or public to the dataset |
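As a sketch, the POST body above can be assembled with a small helper; only name is required, and the team URL here is the one from the example:

```python
def deck_payload(name, description=None, is_public=False, team=None):
    """Build the shoji:entity body for creating a deck.
    Only "name" is required; other attributes are optional."""
    body = {"name": name, "is_public": is_public}
    if description is not None:
        body["description"] = description
    if team is not None:
        body["team"] = team
    return {"element": "shoji:entity", "body": body}

payload = deck_payload("my new deck",
                       team="https://app.crunch.io/api/teams/58acf7/")
```

POSTing this payload to the decks catalog would yield a 201 response with the new deck's URL in the Location header.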
PATCH
It is possible to bulk-edit many decks at once by PATCHing a shoji:catalog to the decks’ catalog.
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/datasets/cc9161/decks/4fa25/": {
"name": "Renamed deck",
"is_public": true
}
},
"order": "https://app.crunch.io/api/datasets/223fd4/decks/order/"
}
The following attributes are editable via PATCHing this resource:
- name
- description
- is_public
For decks that the current user owns, “name”, “description”, and “is_public” are editable. Only the deck owner can edit these attributes, even if the deck is public. Other deck attributes are not editable, and the server will respond with a 400 status if a request tries to change them.
On success, the server will reply with a 204 response.
Entity
/datasets/{id}/decks/{id}/
GET
GET a deck entity resource to return a shoji:entity with all of its attributes:
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/223fd4/",
"body": {
"name": "Presentation deck",
"id": "223fd4",
"creation_time": "1987-10-15T11:45:00",
"description": "Explanation about the deck",
"is_public": false,
"owner_id": "https://app.crunch.io/api/users/abcd3/",
"owner_name": "Real Person",
"team": "https://app.crunch.io/api/teams/58acf7/"
}
}
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
id | string | Global unique identifier for this deck |
creation_time | timestamp | Time when this deck was created |
description | string | Longer annotations for this deck |
is_public | boolean | Indicates whether this is a public deck or not |
owner_id | url | Points to the owner of this deck |
owner_name | string | Name of the owner of the deck (referenced by owner_id) |
team | url | If the deck is shared through a team, points to that team; null by default |
PATCH
To edit a deck, PATCH it with a shoji:entity. The server will return a 204 response on success or 400 if the request is invalid.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/223fd4/",
"body": {
"name": "Presentation deck",
"id": "223fd4",
"creation_time": "1987-10-15T11:45:00",
"description": "Explanation about the deck",
"team": "https://app.crunch.io/api/teams/58acf7/"
}
}
HTTP/1.1 204 No Content
For deck entities that the current user owns, “name”, “description”, “team”, and “is_public” are editable. Other deck attributes are not editable.
DELETE
To delete a deck, DELETE the deck’s entity URL. On success, the server returns a 204 response.
Order
/datasets/{id}/decks/order/
The deck order resource allows the user to arrange how API clients, such as the web application, will present the deck catalog. The deck order contains all decks that are visible to the current user, both personal and public. Unlike many other shoji:order resources, this order does not allow grouping or nesting: it will always be a flat list of deck URLs.
GET
Returns a Shoji Order response.
{
"element": "shoji:order",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/order/",
"graph": [
"https://app.crunch.io/api/datasets/223fd4/decks/1/",
"https://app.crunch.io/api/datasets/223fd4/decks/2/",
"https://app.crunch.io/api/datasets/223fd4/decks/3/"
]
}
PATCH
PATCH the order resource to change the order of the decks. A 204 response indicates success.
If the PATCH payload contains only a subset of available decks, those decks not referenced will be appended at the bottom of the top level graph in arbitrary order.
{
"element": "shoji:order",
"self": "https://app.crunch.io/api/datasets/223fd4/decks/order/",
"graph": [
"https://app.crunch.io/api/datasets/223fd4/decks/1/",
"https://app.crunch.io/api/datasets/223fd4/decks/3/"
]
}
Including invalid URLs, such as URLs of decks that are not present in the catalog, will return a 400 response from the server.
The deck order should always be a flat list of URLs; nesting or grouping is not supported by the web application. The server will return a 400 response if the order supplied in the PATCH request contains nesting.
Slides
Each deck contains a catalog of slides into which analyses are saved.
Catalog
/datasets/{id}/decks/{deck_id}/slides/
GET
Returns a shoji:catalog with the slides for this deck.
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/123/decks/123/slides/",
"orders": {
"flat": "https://app.crunch.io/api/datasets/123/decks/123/slides/flat/"
},
"specification": "https://app.crunch.io/api/specifications/slides/",
"description": "A catalog of the Slides in this Deck",
"index": {
"https://app.crunch.io/api/datasets/123/decks/123/slides/123/": {
"analysis_url": "https://app.crunch.io/api/datasets/123/decks/123/slides/123/analyses/123/",
"subtitle": "z",
"display": {
"value": "table"
},
"title": "slide 1"
},
"https://app.crunch.io/api/datasets/123/decks/123/slides/456/": {
"analysis_url": "https://app.crunch.io/api/datasets/123/decks/123/slides/456/analyses/456/",
"subtitle": "",
"display": {
"value": "table"
},
"title": "slide 2"
}
}
}
Each tuple on the slides catalog contains the following keys:
Name | Type | Description |
---|---|---|
analysis_url | url | Points to the first (and typically only) analysis contained on this slide |
title | string | Optional title for the slide |
subtitle | string | Optional subtitle for the slide |
display | object | Stores settings used to load the analysis |
POST
To create a new slide, POST a slide body, wrapped as a shoji:entity, to the slides catalog. It is necessary to include at least one analysis on the new slide: the body should contain an analyses attribute, an array of one or more analysis bodies as described in the section below.
On success, the server returns a 201 response with a Location header containing the URL of the newly created slide entity with its first analysis.
{
"title": "New slide",
"subtitle": "Variable A and B",
"analyses": [
{
"query": {},
"query_environment": {},
"display_settings": {}
},
{
"query": {},
"query_environment": {},
"display_settings": {}
}
]
}
On each analysis, only the query field is required to create a new slide; other attributes are optional.
Slide attributes:
Name | Type | Description |
---|---|---|
title | string | Optional title for the slide |
subtitle | string | Optional subtitle for the slide |
Analysis attributes:
Name | Type | Description |
---|---|---|
query | object | Contains a valid analysis query; required |
display_settings | object | Contains a set of attributes to be interpreted by the client to render and export the analysis |
query_environment | object | Contains the weight and filter applied during the analysis; they will be applied upon future evaluation/render/export |
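Putting the slide and analysis attributes together, the shape of a new-slide body can be sketched as a small builder. The query contents are left empty here because they depend on the analysis; only the payload shape is shown:

```python
def slide_payload(title, analyses, subtitle=""):
    """Build a slide body with one or more analyses.
    Each analysis must contain at least a "query" member."""
    for analysis in analyses:
        assert "query" in analysis, "query is required on each analysis"
    return {"title": title, "subtitle": subtitle, "analyses": list(analyses)}

slide = slide_payload("New slide",
                      [{"query": {}, "display_settings": {}}],
                      subtitle="Variable A and B")
```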
Old format
It is possible to create slides with one single initial analysis by POSTing an analysis body directly to the slides catalog. It will create a slide automatically with the new analysis on it:
{
"title": "New slide",
"subtitle": "Variable A and B",
"query": {},
"query_environment": {},
"display_settings": {}
}
PATCH
It is possible to bulk-edit several slides at once by PATCHing a shoji:catalog to this endpoint.
The only editable attributes with this method are:
- title
- subtitle
Other attributes should be considered read-only.
Submitting invalid attributes, or references to slides that are not part of this deck, results in a 400 error response.
To edit the query attributes of any of a slide’s analyses, PATCH the individual analysis entity.
Entity
/datasets/{id}/decks/{deck_id}/slides/{slide_id}/
Each slide in the slides catalog contains a reference to its first analysis.
GET
{
"element": "shoji:entity",
"self": "/api/datasets/123/decks/123/slides/123/",
"catalogs": {
"analyses": "/api/datasets/123/decks/123/slides/123/analyses/"
},
"description": "Returns the detail information for a given slide",
"body": {
"deck_id": "123",
"subtitle": "z",
"title": "slide 1",
"analysis_url": "/api/datasets/123/decks/123/slides/123/analyses/123/",
"display": {
"value": "table"
},
"id": "123"
}
}
DELETE
Perform a DELETE request on the Slide entity resource to delete the slide and its analyses.
PATCH
It is possible to edit a slide entity by PATCHing with a shoji:entity.
The editable attributes are:
- title
- subtitle
The other attributes are considered read-only.
Order
/datasets/{id}/decks/{deck_id}/slides/flat/
The owner of the deck can specify the order of its slides. As with deck order, the slide order must be a flat list of slide URLs.
GET
Returns the list of all the slides in the deck.
{
"element": "shoji:order",
"self": "/api/datasets/123/decks/123/slides/flat/",
"description": "Order of the slides on this deck",
"graph": [
"/api/datasets/123/decks/123/slides/123/",
"/api/datasets/123/decks/123/slides/456/"
]
}
PATCH
To change the order, a client should PATCH the full shoji:order resource to the endpoint with the new order in its graph attribute.
Any slide not mentioned in the payload will be appended at the end of the graph in arbitrary order.
{
"element": "shoji:order",
"self": "/api/datasets/123/decks/123/slides/flat/",
"description": "Order of the slides on this deck",
"graph": [
"/api/datasets/123/decks/123/slides/123/",
"/api/datasets/123/decks/123/slides/456/"
]
}
This is a flat order: grouping or nesting is not allowed. PATCHing with a nested order will generate a 400 response.
Analysis
Each slide contains one or more analyses. An analysis is a table or graph with some specific combination of variables defining measures, rows, columns, and tabs, together with settings such as percentage direction and decimal places. An analysis can be saved to a deck, which can then be exported, or it can be reloaded whole in the application or even exported as a standalone embeddable result.
Catalog
/api/datasets/123/decks/123/slides/123/analyses/
POST
To create multiple analyses on a slide, clients should POST analyses to the slide’s analyses catalog.
{
"query": {
"dimensions" : [],
"measures": {}
},
"query_environment": {
"filter": [
{"filter": "<url>"},
{"function": "expression", "args": [], "name": "(Optional)"}
],
"weight": "url"
},
"display_settings": {
"decimalPlaces": {
"value": 0
},
"percentageDirection": {
"value": "colPct"
},
"vizType": {
"value": "table"
},
"countsOrPercents": {
"value": "percent"
},
"uiView": {
"value": "expanded"
}
}
}
The server will return a 201 response with the new analysis created. If the analysis attributes are invalid, a 400 response will be returned indicating the problems.
PATCH
It is possible to delete many analyses at once by PATCHing the catalog with null as their tuples. It is not possible to delete all the analyses from a slide; to do that, delete the slide itself.
{
"/api/datasets/123/decks/123/slides/123/analyses/1/": null,
"/api/datasets/123/decks/123/slides/123/analyses/2/": {}
}
A 204 response will be returned on success.
Order
As analyses are added to a slide, they are stored in a shoji:order resource. Like other order resources, it exposes a graph attribute that contains the list of created analyses, with new ones added at the end.
If an incomplete set of analyses is sent to the graph, the missing analyses will be appended in arbitrary order.
This is a flat order and does not allow nesting.
Entity
An analysis is defined by a query, query environment, and display settings. To save an analysis, POST these to a deck as a new slide.
Display settings can be anything a client may need to reproduce the view of the data returned from the query. The settings the Crunch web client uses are shown here, but other clients are free to store other attributes as they see fit. Display settings should be objects with a value member.
{
"query": {
"dimensions" : [],
"measures": {}
},
"query_environment": {
"filter": [
{"filter": "<url>"},
{"function": "expression", "args": [], "name": "(Optional)"}
],
"weight": "url"
},
"display_settings": {
"decimalPlaces": {
"value": 0
},
"percentageDirection": {
"value": "colPct"
},
"vizType": {
"value": "table"
},
"countsOrPercents": {
"value": "percent"
},
"uiView": {
"value": "expanded"
}
}
}
Name | Description |
---|---|
query | Includes the query body for this analysis |
query_environment | An object with a weight and filters to be used for rendering/evaluating this analysis |
display_settings | An object containing client specific instructions on how to recreate the analysis |
PATCH
To edit an analysis, PATCH its URL with a shoji:entity.
The editable attributes are:
- query
- query_environment
- display_settings
Providing invalid values for those attributes or extra attributes will be rejected with a 400 response from the server.
DELETE
It is possible to delete analyses from a slide as long as there is always one analysis left.
Attempting to delete the last analysis of a slide will cause a 409 response from the server indicating the problem.
Filters
Catalog
/datasets/{id}/filters/
GET on this resource returns a Shoji Catalog with the list of Filters that the current user can use on this Dataset.
This index contains two kinds of filters, public and private, denoted by the is_public tuple attribute. Private filters are those created by the authenticated user and cannot be accessed by other users. Public filters are available to all users who are authorized to view the dataset.
{
"name": "My filter",
"is_public": true,
"id": "1442ea",
"owner_id": "https://app.crunch.io/api/users/4152de/",
"team": "https://app.crunch.io/api/teams/680abc/"
}
The only tuple attribute editable via PATCHing the catalog is the “name”. A 204 response indicates a successful PATCH. Attempting to PATCH any other attribute will return a 400 response.
POST a Shoji Entity to this catalog to create a new filter. Entities must include a name and an expression. If omitted, is_public defaults to false. A successful POST yields a 201 response with a Location header containing the URL of the newly created filter.
All users with access to the dataset can create private filters; however, only the current dataset editor can create public filters (is_public: true). Attempting to create a public filter when not the current dataset editor results in a 403 response.
Entity
/datasets/{id}/filters/{id}/
GET this resource to return a Shoji Entity containing the requested filter.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/datasets/ac64ef/filters/1442ea/",
"body": {
"id": "1442ea",
"name": "My filter",
"is_public": true,
"expression": {},
"last_update": "2015-12-31",
"creation_time": "2015-11-12T12:34:56",
"team": "https://app.crunch.io/api/teams/680abc/"
}
}
PATCH an entity to edit its expression, name, team, or is_public attributes. Successful PATCH requests return a 204 status. As with POSTing new entities to the catalog, only the dataset’s current editor can alter a filter.
The expression attribute must contain a valid Crunch filter expression.
The team attribute points to the team this filter is shared with; if the filter isn’t shared with any team, it defaults to null.
See expressions in the Object Reference for more details.
Applied filters
/datasets/{id}/filters/applied/
A Shoji order containing the filters applied by the current user.
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/datasets/ac64ef/filters/applied/",
"graph": [
"http://app.crunch.io/api/datasets/ac64ef/filters/28ef72/",
"http://app.crunch.io/api/datasets/ac64ef/filters/0ac6e1/"
]
}
PUT the applied endpoint to change which filters are applied for other operations. The graph parameter indicates which filters are applied. Successful PUT requests return a 204 status.
Filter Order
GET /datasets/{id}/filters/order/
A Shoji order containing the persisted filter order.
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/datasets/ac64ef/filters/order/",
"graph": [
"http://app.crunch.io/api/datasets/ac64ef/filters/28ef72/",
"http://app.crunch.io/api/datasets/ac64ef/filters/0ac6e1/"
]
}
PATCH the order to change the order of the filters. The graph parameter indicates the order. Private filters are not included in the order. Any filters that are missing are appended to the end of the order. Successful PATCH requests return 204 status.
Filtering endpoints
Some endpoints support filtering. They accept a filter GET parameter: a JSON-encoded object containing either the URL of a filter (available through the Filters catalog) or a filter expression.
To filter using a filter URL wrapped in JSON, pass an object as the filter parameter:
{
"filter": "http://app.crunch.io/api/datasets/ac64ef/filters/28ef72/"
}
GET /datasets/id/summary/?filter=%7B%22filter%22%3A%22http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Ffilters%2F28ef72%2F%22%7D HTTP/1.1
It is also possible to send straight filter URLs without a JSON wrapping:
GET /datasets/id/summary/?filter=http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Ffilters%2F28ef72%2F HTTP/1.1
Or pass multiple filters, which will be ANDed together:
GET /datasets/id/summary/?filter=http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Ffilters%2F28ef72%2F&filter=http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Ffilters%2F28ef72%2F HTTP/1.1
To filter using a filter expression, pass a Crunch filter expression as the filter parameter, like:
{
"function": "==",
"args": [
{"variable": "http://app.crunch.io/api/datasets/ac64ef/variables/aae3c2/"},
{"value": 1}
]
}
GET /datasets/id/summary/?filter=%7B%22function%22%3A%22%3D%3D%22%2C%22args%22%3A%5B%7B%22variable%22%3A%22http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Fvariables%2Faae3c2%2F%22%7D%2C%7B%22value%22%3A1%7D%5D%7D HTTP/1.1
Filter expressions can be combined with filter URLs to make reference to other filters, like so:
{
"function": "and",
"args": [
{
"filter": "http://app.crunch.io/api/datasets/ac64ef/filters/28ef72/"
},
{
"function": "==",
"args": [
{"variable": "http://app.crunch.io/api/datasets/ac64ef/variables/aae3c2/"},
{"value": 1}
]
}
]
}
GET /datasets/id/summary/?filter=%7B%22function%22%3A+%22and%22%2C+%22args%22%3A+%5B%7B%22filter%22%3A+%22http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Ffilters%2F28ef72%2F%22%7D%2C+%7B%22function%22%3A+%22%3D%3D%22%2C+%22args%22%3A+%5B%7B%22variable%22%3A+%22http%3A%2F%2Fapp.crunch.io%2Fapi%2Fdatasets%2Fac64ef%2Fvariables%2Faae3c2%2F%22%7D%2C+%7B%22value%22%3A+1%7D%5D%7D%5D%7D HTTP/1.1
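The percent-encoded URLs above can be produced by JSON-serializing the filter and URL-quoting the result. A sketch in Python using only the standard library:

```python
import json
from urllib.parse import quote

# The expression from the example above:
expression = {
    "function": "==",
    "args": [
        {"variable": "http://app.crunch.io/api/datasets/ac64ef/variables/aae3c2/"},
        {"value": 1},
    ],
}

# Compact JSON (no spaces), then percent-encode every reserved character:
encoded = quote(json.dumps(expression, separators=(",", ":")), safe="")
url = "/datasets/id/summary/?filter=" + encoded
```

The same quoting applies when passing a bare filter URL as the filter parameter.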
Geodata
Geodata allow you to associate a variable with features in a FeatureCollection of geojson or topojson.
Catalog
/geodata/
GET
Crunch maintains a few geojson/topojson resources and publishes them on CDN. GET the catalog https://app.crunch.io/api/geodata/ for an index of available geographies, each of which then includes a location to download the actual geojson or topojson.
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/geodata/",
"index": {
"https://app.crunch.io/api/geodata/7ae898e210b04a9a8992314452c6677b/": {
"description": "use properties.name or properties.postal-code",
"created": "2016-07-08T16:33:44.601000+00:00",
"name": "US States GeoJSON Name + Postal Code",
"location": "https://s.crunch.io/geodata/leafletjs/us-states.geojson",
"id": "7ae898e210b04a9a8992314452c6677b"
}
}
}
The geodata catalog tuples contain the following keys:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
created | timestamp | Time when the item was created |
id | string | Global unique identifier for this geodatum |
location | uri | Location of the Crunch-curated geojson/topojson file. Users may need to inspect this file to learn the details of the FeatureCollection and individual Features. |
description | string | Any additional information about the geodatum |
metadata | object | Information regarding the actual data provided by the location. For now, the properties in the geodata features are extracted for the purpose of matching geodata to variable categories. |
Entity
GET
GET /geodata/{geodata_id}/
Crunch maintains a few geojson/topojson resources and publishes them on CDN.
Most of their properties, with the exception of metadata, are present on the catalog tuple described above. metadata is an open field, but it may be populated at creation time by a Crunch utility that extracts and aggregates properties across the features of geojson and topojson resources. For other formats, users may supply relevant metadata for the geodatum resource.
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/geodata/7ae898e210b04a9a8992314452c6677b/",
"body": {
"description": "use properties.name or properties.postal-code",
"created": "2016-07-08T16:33:44.601000+00:00",
"name": "US States GeoJSON Name + Postal Code",
"location": "https://s.crunch.io/geodata/leafletjs/us-states.geojson",
"id": "7ae898e210b04a9a8992314452c6677b",
"metadata": {
"status": "success",
"properties": {
"postal-code": [
"AL",
"AK",
"AZ", "etc."
],
"name": [
"Alabama",
"Arkansas",
"Alaska", "etcetera"
]
}
}
}
}
DELETE
DELETE /geodata/{geodata_id}/
Deletes the geodata entity. Returns 204.
Geodata for common applications
- https://app.crunch.io/api/geodata/7ae898e210b04a9a8992314452c6677b/ US States – use properties.name or properties.postal-code as your feature_key depending on the variable (state name or abbreviation); id is the FIPS code.
- https://app.crunch.io/api/geodata/8f9f5fed101042c4815d2dd1fd248cec/ World – properties include the ISO 3166 name as well as the ISO 3166-1 alpha-3 abbrev.
- https://app.crunch.io/api/geodata/d878d8471090417fa361536733e5f176/ UK Regions – properties.EER13NM matches a YouGov stylization of United Kingdom region names.
Creating new public Geodatum
Users with permission to create datasets can also create geodata, although in practice Crunch curates and makes available many common geographies, listed in the geodata catalog. Note that geodata created outside of the Crunch domain (i.e. without a .crunch.io domain in the URL) will not be available in whaam due to browser constraints. If you would like to make your geodatum public and have Crunch serve it, please contact us!
Adding a new geodatum is as easy as POSTing it to the geodata catalog, most easily via pycrunch. Crunch will attempt to download the geodata file and analyze the properties present on the features (generally polygons), which can then be associated with Crunch variables. The metadata extraction and summary can help you align variables and select the right property to associate with your Crunch geographic variable by category name.
Include a format member in the payload (on POST or PATCH) to trigger automatic metadata extraction. The server will fetch and aggregate properties from FeatureCollections in order to provide hints for eventual consumers of the Crunch geodatum. The automatic feature extractor supports GeoJSON and TopoJSON formats; you may register a Shapefile (shp) or other resource as a Crunch geodatum, but you will have to supply the metadata hints yourself and are advised to indicate its non-json format.
The lists of properties returned in the metadata are index-aligned across features, such that if a feature in your geodata is missing a given property, its entry in that property’s list will be null.
>>> import pycrunch
>>> site = pycrunch.connect("me@mycompany.com", "yourpassword", "https://app.crunch.io/api/")
>>> geodata = site.geodata.create(as_entity({'name': 'test_geojson',
'location': 'https://s.crunch.io/geodata/leafletjs/us-states.geojson',
'description': '',
'format': 'geojson'}))
>>> geodata.body.metadata
pycrunch.elements.JSONObject(**{
"postal-code": [
"AL",
"AK",
"AZ",
"AK",
"CA", ...],
"name": [
"Alabama",
"Alaska",
"Arizona",
"Arkansas",
"California", ...]})
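The aggregation shown in that metadata can be sketched directly: collect every property name across features, then build one index-aligned list per property, with None (null) where a feature lacks it. This mirrors the described output, not Crunch's actual extractor:

```python
def extract_properties(feature_collection):
    """Aggregate GeoJSON feature properties into index-aligned lists,
    one list per property name, None where a feature lacks it."""
    features = feature_collection["features"]
    names = set()
    for feature in features:
        names.update(feature.get("properties", {}))
    return {name: [f.get("properties", {}).get(name) for f in features]
            for name in sorted(names)}

fc = {"features": [
    {"properties": {"name": "Alabama", "postal-code": "AL"}},
    {"properties": {"name": "Alaska"}},  # missing postal-code -> null
]}
meta = extract_properties(fc)
```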
Modifying your public Geodata
You can modify any Geodatum that you own. Note that you can transfer ownership to another user if you change the owner_id of your geodatum. You may also change the metadata of your geodatum, but keep in mind that if you do this you will override any automated metadata extraction that Crunch provides. If you modify the location of the geodatum and do not provide a metadata parameter in the patch, Crunch will automatically extract metadata as long as the location is publicly accessible.
>>> import pycrunch
>>> site = pycrunch.connect("me@mycompany.com", "yourpassword", "https://app.crunch.io/api/")
>>> entity = site.geodata.index['<geodatum_url>'].entity
>>> entity.patch({'description': 'US States'})
>>> entity.refresh()
>>> entity.body.description
US States
Associating Variables with Geodata
To make maps with variables, update a variable’s view (or include it with metadata at creation) as follows, where feature_key is the key defined for each Feature in the geojson/topojson that matches the relevant field on the variable at hand (generally category names).
{"view": { "geodata": [
{"geodatum": "<uri>",
"feature_key": "properties.name"}
]}
}
Joins
Catalog
/datasets/{id}/joins/
A GET on this resource returns a Shoji Catalog enumerating the joins present in the Dataset. Each tuple in the index includes a “left_key” and a “right_key” member, each of which MUST be a variable URI. The left_key MUST be a variable in the current dataset, and the right_key SHOULD be a variable in another dataset. Both variables MUST be unique, and should be values taken from the same domain. For example, you might have a principal dataset which is a survey, with a respondent_id variable as a unique key. If you join a separate demographic dataset that has a unique column of the same respondent ids, you might see:
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/837498a/joins/",
"index": {
"https://app.crunch.io/api/datasets/837498a/joins/demo/": {
"left_key": "https://app.crunch.io/api/datasets/837498a/variables/1ef71d/",
"right_key": "https://app.crunch.io/api/datasets/de3095/variables/19471d/"
}
}
}
A PATCH to this resource may add joins (by including new index members), alter existing joins (by replacing existing index members), or delete joins (by setting existing members to null). A 204 indicates success. As with any Shoji Catalog, the URI of each entity in the index is the key.
Variables in joined datasets may then be used in analyses as if they were part of the principal dataset, simply by using their URI in this join’s variables catalog (see below). The joined dataset includes one row for each row in the principal dataset, by taking the key in the principal and looking up the corresponding key and row in the subordinate dataset. Rows in the principal which have no corresponding row in the subordinate are filled with the “No Data” missing value.
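The row-matching rule above can be sketched in plain Python, with None standing in for the “No Data” missing value (the function and sample keys are illustrative, not part of any Crunch library):

```python
# Sketch of the join semantics: for each row of the principal dataset,
# look up its key in the subordinate dataset; rows with no corresponding
# key are filled with None (standing in for "No Data").

def left_join_column(left_keys, right_keys, right_values):
    """Return right_values aligned to left_keys, None where unmatched."""
    lookup = dict(zip(right_keys, right_values))
    return [lookup.get(key) for key in left_keys]

# Principal survey has respondent ids 1..4; demographics only cover 1..3.
ages = left_join_column([1, 2, 3, 4], [1, 2, 3], [34, 55, 21])
print(ages)  # [34, 55, 21, None]
```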
In order to create or alter a join, the authenticated user needs read access to the right dataset; otherwise the server will respond with a 400 error.
The variable URL sent for the left key must be a valid URL in the current dataset. It is not allowed to use a different dataset as the left table.
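A minimal sketch of assembling the PATCH body for the joins catalog, following the null-to-delete convention above (the join and variable URLs here are hypothetical, and the helper is ours):

```python
# Sketch: build the PATCH body for /datasets/{id}/joins/. Adding a join
# uses a tuple with left_key/right_key; removing one uses null (None).
# A 204 response indicates success.
import json

def joins_patch(add=None, remove=None):
    """add: {join_url: (left_key_url, right_key_url)}; remove: [join_url]."""
    body = {}
    for url, (left, right) in (add or {}).items():
        body[url] = {"left_key": left, "right_key": right}
    for url in (remove or []):
        body[url] = None
    return json.dumps(body)

payload = joins_patch(
    add={"/datasets/837498a/joins/demo/": (
        "/datasets/837498a/variables/1ef71d/",
        "/datasets/de3095/variables/19471d/",
    )},
    remove=["/datasets/837498a/joins/old/"],
)
print(payload)
```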
Entity
/datasets/{id}/joins/{id}/
A GET on this resource returns a Shoji Entity describing the join, and a link to its Crunch Table (see next). Currently, the Join entity only contains the batch_id for its frame, and therefore isn’t very useful for clients. The entity resource is not editable; PATCH the joins catalog instead.
Joined variables catalog
/datasets/{id}/joins/{id}/variables/
A variables catalog which describes variables in the subordinate dataset. See Variables for more details.
Multitables
Catalog
/datasets/{dataset_id}/multitables/
GET
{
"element": "shoji:catalog",
"self": "/api/datasets/123/multitables/",
"specification": "/api/specifications/multitables/",
"description": "List of multitable definitions for this dataset",
"index": {
"/api/datasets/123/multitables/7ab1e/": {
"is_public": false,
"owner_id": "/api/users/b055/",
"name": "Basic Demographics",
"id": "7ab1e",
"team": "/api/teams/56789/"
}
}
}
GET on this resource returns a Shoji Catalog with the list of Multitables that the current user can use on this Dataset.
This index contains two kinds of multitables: those that belong to the dataset, denoted by the is_public tuple attribute, and those that belong to the current user. Personal multitables are those created by the authenticated user, and they cannot be accessed by other users. Dataset multitables are available to all users who are authorized to view the dataset.
POST
POST a Shoji Entity to this catalog to create a new multitable definition. Entities must include a name and a template; the template must contain a series of objects, each with a query and optionally a transform. If omitted, is_public defaults to false. Similarly, team defaults to null unless a specific team URL is provided.
A successful POST yields a 201 response that will contain a Location header with the URL of the newly created multitable.
All users with access to the dataset can create personal multitable definitions;
however, only the current dataset editor can create public multitables
(is_public: true
) which everyone with access to the dataset can see.
Attempting to create a public multitable when not the current dataset editor
results in a 403 response.
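The entity body described above can be assembled as follows; this is a sketch with hypothetical variable URLs, applying the documented defaults for is_public and team:

```python
# Sketch: assemble a shoji:entity body for POSTing a new multitable.
# is_public defaults to false and team to null, matching the server
# defaults described above.

def multitable_entity(name, variable_urls, is_public=False, team=None):
    # Each template item wraps one variable reference in a query.
    template = [{"query": [{"variable": url}]} for url in variable_urls]
    return {
        "element": "shoji:entity",
        "body": {
            "name": name,
            "template": template,
            "is_public": is_public,
            "team": team,
        },
    }

entity = multitable_entity("Basic Demographics",
                           ["/datasets/123/variables/abc/",
                            "/datasets/123/variables/def/"])
print(entity["body"]["template"][0])
```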
Copying Multitables between datasets
It is possible to copy a multitable between datasets as long as the permissions allow it. Multitable copying requires that all the variables present in the template of the origin multitable exist on the target dataset and that they all have the same type. Copied multitables will be private by default.
POST a Shoji entity to the catalog indicating the URL of the multitable to copy:
{
"element": "shoji:entity",
"body": {
"name": "Name of my copy",
"multitable": "/api/datasets/123/multitables/7ab1e/"
}
}
As shown in the example, it is possible to assign a new name to the copy. By default all copies will be private unless specified in the body.
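A sketch of building that copy request body (the source multitable URL is the example one; the helper is illustrative):

```python
# Sketch: body for copying a multitable into another dataset. Copies
# are private by default unless is_public is set in the body.

def multitable_copy(name, source_url, is_public=False):
    return {
        "element": "shoji:entity",
        "body": {
            "name": name,
            "multitable": source_url,
            "is_public": is_public,
        },
    }

print(multitable_copy("Name of my copy",
                      "/api/datasets/123/multitables/7ab1e/"))
```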
PATCH
There are no elements of the catalog that can be changed via PATCH.
Entity
/datasets/{dataset_id}/multitables/{multitable_id}/
GET
{
"element": "shoji:entity",
"self": "datasets/123/multitables/7ab1e/",
"views": {
"tabbook": "/datasets/123/multitables/7ab1e/tabbook/"
},
"specification": "https://app.crunch.io/api/specifications/multitables/",
"description": "Detail information for one multitable definition",
"body": {
"name": "Basic Demographics",
"user": "/api/users/b055/",
"template": [{
"query": [{
"variable": "/datasets/123/variables/abc/"
}]
}, {
"query": [{
"variable": "/datasets/123/variables/def/"
}]
}],
"is_public": false,
"id": "7ab1e",
"team": "/api/teams/56789/"
}
}
GET on this resource returns a Shoji entity containing the requested multitable definition.
PATCH
PATCH the entity to edit its name, template, team, or is_public attributes. Successful PATCH requests return a 204 status. As with POSTing new entities to the catalog, only the dataset’s current editor can alter is_public.
The template attribute must contain a valid multitable definition.
Views
Multitable entities have a “tabbook” view. See below.
Permissions
Authorization to view, edit, and manage a dataset is controlled by the dataset’s permissions catalog:
/datasets/{id}/permissions/
The permissions catalog is a Shoji Catalog that collects (not contains) Users. There are no permission “entities” to retrieve, create, or delete: all action is achieved directly on the permissions catalog.
GET Catalog
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/1/permissions/",
"description": "Lists all the users that have access to this dataset",
"index": {
"https://app.crunch.io/api/users/42/": {
"dataset_permissions": {
"edit": true,
"change_permissions": true,
"view": true
},
"is_owner": true,
"name": "Lauren Ipsum",
"email": "lipsum@crunch.io"
}
}
}
If authorized to view the dataset, a successful GET returns a Shoji Catalog indicating the users who have access to this dataset and their respective permissions. This includes the current, authorized user making the request. Index tuples are keyed by User URL.
Tuple values include:
Name | Type | Description |
---|---|---|
name | string | Display name of the user |
email | string | Email address of the user |
is_owner | boolean | Whether this user is the dataset’s “owner” |
dataset_permissions | object | Attributes governing the user’s authorization; see below |
Supported dataset_permissions
, all boolean, are:
- view: Whether the user can view the dataset. Note that “viewing” is not limited to just GET requests, for dataset viewers may create filters, private variables, and saved analyses, for example.
- edit: Whether the user can edit the dataset. When editing, users with this permission may modify the common data of a dataset, including things like public filters available to all viewers of the dataset.
- change_permissions: Whether the user may alter other users’ authorization on this dataset, i.e., PATCH tuples for users that already exist on the catalog.
PATCH Catalog
The PATCH verb is used to make all modifications to dataset authorization: modifying existing permissions, revoking permissions for users with access, and granting access to users.
Modify existing
To change the permissions a user has, PATCH new dataset_permissions, like:
{
"https://app.crunch.io/api/users/42/": {
"dataset_permissions": {
"edit": false,
"view": true
}
},
"send_notification": true,
"dataset_url": "https://app.crunch.io/dataset/1"
}
Only the “dataset_permissions” key in the tuple can be modified by PATCHing this catalog. Other keys, such as “name”, are included only for facilitating human-readable display of the catalog. If sent, these other keys will be ignored. To modify users’ names, see users.
If a subset of dataset_permissions are included in the payload, only the specified permissions will have their values updated. Omitted permissions will remain unchanged.
Multiple users’ permissions can be modified in a single request by including multiple tuples keyed by User URL.
The “send_notification” key in the payload is optional; if included and True, the server will send an email invitation to all newly added users (see below), as well as to users who are granted “edit” privileges.
If “send_notification” is included and true, you may also include a “dataset_url”, which is the URL that will be included in the email notifying the users that they now have access to the dataset. The web application will send the “browse” view URL, for example, so that when the user receives the email notification, the link they follow will take them to the relevant dataset. If “send_notification” is true and “dataset_url” is omitted, the email link will default to https://app.crunch.io/.
Add new user from within account
To add a user (i.e. share with them), there are two cases. First, if the user to be added is a member of the current user’s account, PATCH similar to above, using this user’s URL as key:
{
"/users/id/": {
"dataset_permissions": {
"edit": false,
"view": true
},
"profile": {
"weight": null,
"applied_filters": []
}
}
}
This payload may include a “profile” member, which are initial values with which to populate the sharee’s user-dataset-profile.
Valid “profile” members include:
- weight: a URL to one of the dataset’s weight variables; if omitted, the sharer’s current weight variable will be used
- applied_filters: an array of filter URLs which are shared with all dataset viewers. If any of the specified filters are private, the PATCH request will return 400 status. Default value for “applied_filters” is [].
If the “profile” member is not included, the newly shared users will be created with their user dataset preferences matching the sharer’s current weight.
Revoking access
To revoke users’ access to this dataset (aka “unshare” with them), PATCH a null tuple for their user URLs:
{
"/users/id/": null
}
Note that all of these PATCHes for add/edit/remove access to the dataset can be done in a single request that combines them all.
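Such a combined request can be sketched as a payload builder (the user URLs are illustrative, and the helper is ours, not part of any Crunch library):

```python
# Sketch: one PATCH body for the permissions catalog that grants,
# modifies, and revokes access in a single request. Revoked users are
# set to null (None); notification emails are optional.

def permissions_patch(grant=None, revoke=None,
                      send_notification=False, dataset_url=None):
    """grant: {user_url_or_email: dataset_permissions}; revoke: [user_url]."""
    body = {url: {"dataset_permissions": perms}
            for url, perms in (grant or {}).items()}
    body.update({url: None for url in (revoke or [])})
    if send_notification:
        body["send_notification"] = True
        if dataset_url:
            body["dataset_url"] = dataset_url
    return body

patch = permissions_patch(
    grant={"https://app.crunch.io/api/users/42/": {"edit": False, "view": True}},
    revoke=["https://app.crunch.io/api/users/99/"],
    send_notification=True,
    dataset_url="https://app.crunch.io/dataset/1",
)
print(patch)
```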
Validation
The server will insist, and clients should also validate, that
- There is one and only one user with edit: true privileges for a dataset; if not, the PATCH request will return 400.
- The users who are receiving new authorization via PATCH must have corresponding dataset_permissions on their account authorization. For example, the user who is updated to have edit: true has a dataset_permission of edit: true on their account authorization. If not, the PATCH request will return 400.
- The user that is PATCHing this catalog must have share: true for this dataset; if not, the PATCH request will return 403.
Inviting new users
It is possible to share a dataset with people that are not users of Crunch yet. To do so, it is necessary to send in an email address instead of a user URL as a sharing key.
{
"somebody@email.com": {
"dataset_permissions": {
"edit": false,
"view": true
},
"profile": {
"weight": null,
"applied_filters": []
}
},
"send_notification": true,
"url_base": "https://app.crunch.io/password/change/${token}/",
"dataset_url": "https://app.crunch.io/dataset/1/"
}
A new user with that email address will be created and added to the account of the user making the request. The new user will receive an invitation email to Crunch.io with an activation link. If a user with that email already exists on this or another account, no changes to the user will be made.
If “send_notification” was included and true in the request, the user will receive a notification email informing them about the newly shared dataset. New users, unless they have an OAuth provider specified, will need to set a password, and the client application should send a URL template that directs them to a place where they can set that password. To do so, include a “url_base” attribute in the payload: a URL template with a ${token} variable into which the server will insert the password-setting token. For the Crunch web application, this template is https://app.crunch.io/password/change/${token}/.
Progress
Progress resources provide information about the current state of a long-running server process in Crunch. Some requests at certain endpoints may return 202 status containing a progress URL in the body, at which one can monitor the progress of the request that was accepted and not yet completed.
GET
GET /progress/{id}/ HTTP/1.1
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/progress/{id}/",
"value": {
"progress": 22,
"message": "exported 2 variables"
}
}
GET on a Progress view returns a Shoji View containing information about the status of the indicated process. The “progress” attribute contains an integer between -1 and 100. Positive values indicate that the job is being processed, while a negative value indicates that an error occurred in processing. Zero means the job has not been started, while 100 indicates completion. Additionally, if the id from the request URL does not exist, GET will nevertheless return a 200 status and indicate "progress": 100.
Optionally, the View will provide a message regarding current status.
You must be authenticated to GET this resource.
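The value rules above can be captured in a small interpreter, which a polling client might use to decide when to stop (the function is a sketch, not part of pycrunch):

```python
# Sketch: interpret the "progress" value from a Progress view.
# Negative means an error occurred, 0 means not started, 100 means
# complete, and anything in between means the job is still running.

def progress_state(progress):
    if progress < 0:
        return "error"
    if progress == 0:
        return "not started"
    if progress >= 100:
        return "complete"
    return "running"

for value in (-1, 0, 22, 100):
    print(value, progress_state(value))
```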
Projects
Projects represent groups of users or teams that share a common set of datasets. A user can belong to zero or more projects.
They live under /projects/, which will list the projects that the authenticated user is a member or owner of.
Catalog
The projects catalog will list all the projects the authenticated user is a member of. Here you can create new projects via POST.
GET
GET /projects/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/projects/",
"index": {
"http://app.crunch.io/api/projects/4643/": {
"name": "Project 1",
"id": "4643",
"icon": "",
"permissions": {"view": true, "edit": true}
},
"http://app.crunch.io/api/projects/6c01/": {
"name": "Project 2",
"id": "6c01",
"icon": "",
"description": "Description of project 2",
"permissions": {"view": true, "edit": true}
}
}
}
Name | Type | Default | Description |
---|---|---|---|
name | string | Required when creating the project | |
description | string | “” | Longer description of the project |
id | string | autogenerated | The project’s id |
icon | url | “” | Url for the icon file for the project. Empty string if not set |
permissions | object | {} | permissions possessed by querying user against project |
POST
New projects need a name (uniqueness is not enforced) and will make the authenticated user their initial member and editor.
POST /projects/ HTTP/1.1
Payload example:
{
"body": {
"name": "My new project",
"icon_url": "http://cdn.sample.com/project-icon.png"
}
}
Creating a project with an icon
To create a project with a starting icon, you can POST an icon_url attribute indicating a URL from which to fetch that icon (it has to be a publicly accessible URL). If the server cannot read that URL, the request will return a 409 error. On success, a copy of the file will be stored and served as the project’s icon.
If the icon_url attribute is not provided, the API will pick an available icon from the icons catalog.
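A sketch of assembling the creation body, with icon_url included only when provided so the API can otherwise pick a default icon (the helper and sample URL are illustrative):

```python
# Sketch: body for POST /projects/. icon_url is optional; per the docs
# above it must be publicly fetchable, and omitting it lets the API
# choose a default icon from the icons catalog.

def project_entity(name, icon_url=None):
    body = {"name": name}
    if icon_url is not None:
        body["icon_url"] = icon_url
    return {"body": body}

print(project_entity("My new project",
                     "http://cdn.sample.com/project-icon.png"))
```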
Default icon
The API can provide default icons to be used in new projects. Performing a GET request will return a Shoji:catalog with a list of available icons for the client to pick.
GET /icons/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/icons/",
"index": {
"http://app.crunch.io/api/icons/01/": {},
"http://app.crunch.io/api/icons/02/": {},
"http://app.crunch.io/api/icons/03/": {},
"http://app.crunch.io/api/icons/04/": {}
}
}
Entity
GET
GET /projects/6c01/ HTTP/1.1
{
"element": "shoji:entity",
"self": "http://app.crunch.io/api/projects/6c01/",
"catalogs": {
"datasets": "http://app.crunch.io/api/projects/6c01/datasets/",
"members": "http://app.crunch.io/api/projects/6c01/members/"
},
"views": {
"icon": "http://app.crunch.io/api/projects/6c01/icon/"
},
"body": {
"name": "Project 2",
"description": "Long description text",
"icon": "",
"user_icon": false,
"id": ""
}
}
Name | Type | Default | Description |
---|---|---|---|
name | string | Required when creating the project | |
description | string | “” | Longer description of the project |
id | string | autogenerated | The project’s id |
icon | url | “” | Url for the icon file for the project; empty string if not set |
user_icon | boolean | autogenerated | Will indicate false if the icon used on creation is from the provided catalog |
Note that the icon attribute points to the actual image file where the configured icon is stored; this URL does not point to the views.icon Shoji view URL. The views.icon Shoji view endpoint is used to PUT the icon as a file upload for this project.
PATCH
The attributes that are allowed to be edited for a project are:
- name
- description
- icon_url
Only project editors can make these changes.
DELETE
Deleting a project will NOT delete its datasets; it will change their ownership to the authenticated user. Only the project’s current owner can delete a project.
DELETE /projects/6c01/ HTTP/1.1
Projects order
Returns the shoji:order in which the projects should be displayed for the user. This entity is independent for each user. As the user is added to more projects, these will be appended at the end of the shoji:order.
GET
Will return a shoji:order containing a flat list of all the projects to which the current user belongs.
GET /projects/order/ HTTP/1.1
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/projects/order/",
"graph": [
"https://app.crunch.io/api/projects/cc9161/",
"https://app.crunch.io/api/projects/a598c7/"
]
}
PUT
In order to change the order of the projects, the client will need to PUT the full payload back to the server.
The graph attribute must include all of the user’s projects; otherwise the server will return a 400 response.
After a successful PUT request, the server will reply with a 204 response.
PUT /projects/order/ HTTP/1.1
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/projects/order/",
"graph": [
"https://app.crunch.io/api/projects/cc9161/",
"https://app.crunch.io/api/projects/a598c7/"
]
}
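The all-projects-included rule above implies a simple client-side check before PUTting a reordered graph; this sketch validates that a proposed graph is a permutation of the current one (the helper is ours):

```python
# Sketch of the client-side validation implied above: a PUT of the
# projects order must include every project exactly once, or the
# server responds with 400.

def valid_reorder(current_graph, new_graph):
    """True when new_graph is a permutation of current_graph."""
    return sorted(current_graph) == sorted(new_graph)

current = ["https://app.crunch.io/api/projects/cc9161/",
           "https://app.crunch.io/api/projects/a598c7/"]
print(valid_reorder(current, list(reversed(current))))  # True
print(valid_reorder(current, current[:1]))              # False
```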
Members
Use this endpoint to manage the users that have access to this project.
Members permissions
Members of a project can be either viewers or editors. By default all members are viewers; a subset of them (at least one) will be editors.
These permissions are available on the members catalog under the permissions attribute of each member’s tuple.
The possible permissions, both boolean, are:
- edit
- view
Those with edit: true are considered project editors.
Project editors have edit privileges on all of the project’s datasets, as well as permission to make changes to the project itself, such as changing its name or icon, managing members, or changing members’ permissions.
A project can have users or teams as members. Teams represent groups of users to be handled together; when a team gets access to a project, all members of the team inherit those permissions. If a user has access to a project through several teams or through direct access, the effective permissions are the union of all those granted.
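The union rule can be sketched as a logical OR over each grant's boolean flags (the helper is illustrative, not part of any Crunch library):

```python
# Sketch of the rule above: when a user reaches a project both directly
# and via teams, the effective permissions are the logical OR of each
# grant's boolean flags.

def effective_permissions(*grants):
    merged = {}
    for grant in grants:
        for key, value in grant.items():
            merged[key] = merged.get(key, False) or bool(value)
    return merged

direct = {"edit": False, "view": True}
via_team = {"edit": True, "view": True}
print(effective_permissions(direct, via_team))  # {'edit': True, 'view': True}
```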
GET
Returns a catalog with all users and teams that have access to this project and their project permissions in the following format:
GET /projects/abcd/members/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/projects/6c01/members/",
"index": {
"http://app.crunch.io/api/users/00002/": {
"name": "Jean-Luc Picard",
"email": "captain@crunch.io",
"collaborator": false,
"permissions": {
"edit": true,
"view": true
},
"allowed_dataset_permissions": {
"edit": true,
"view": true
}
},
"http://app.crunch.io/api/users/00005/": {
"name": "William Riker",
"email": "firstofficer@crunch.io",
"collaborator": false,
"permissions": {
"edit": false,
"view": true
},
"allowed_dataset_permissions": {
"edit": false,
"view": true
}
},
"http://app.crunch.io/api/teams/000a5/": {
"name": "Viewers teams",
"permissions": {
"edit": false,
"view": true
}
}
}
}
The catalog will be indexed by each entity’s URL, and each tuple will contain basic information (name and email) as well as the permissions each user has on the given project.
All project members have read access to this resource, but allowed_dataset_permissions is only present for project editors. It contains the maximum dataset permissions each user can have; assigning anything more permissive will have no effect.
PATCH
Use this method to add or remove members from the project. Only project editors have this capability; other users will get a 403 response.
To add a new user, PATCH a catalog keyed by the new user’s URL with an empty object for its value, or with a permissions tuple to set specific permissions (only edit is allowed at this point).
To remove users, PATCH a catalog keyed by the user you want to remove with null for its value. Note that you cannot remove yourself from the project; attempting to do so will return a 400 response.
It is possible to perform many additions and removals in one request. The following example adds user /users/001/ and team /teams/00a/, grants edit to user /users/002/, and removes user /users/003/.
It is also allowed to invite/add users to the project by email address. If the email is registered on the system, the user will be invited to the project. If the email is not part of Crunch.io, a new-user invitation will be sent to that email with instructions to set up an account; such users will automatically be part of this project only.
Users can also be removed by email address. If the email does not exist, the server will return a 400 response.
PATCH /projects/abcd/members/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/projects/6c01/members/",
"index": {
"http://app.crunch.io/api/users/001/": {},
"http://app.crunch.io/api/teams/00a/": {},
"http://app.crunch.io/api/users/002/": {
"permissions": {
"edit": true
}
},
"http://app.crunch.io/api/users/003/": null,
"user@email.com": {},
"send_notification": true,
"url_base": "https://app.crunch.io/password/change/${token}/",
"project_url": "https://app.crunch.io/${project_id}/"
}
}
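A sketch of assembling that members PATCH body, mirroring the example above ({} adds with default permissions, a permissions tuple grants edit, null removes; the helper is ours):

```python
# Sketch: build the shoji:catalog PATCH body for the members catalog.
# Adding uses an empty tuple, granting edit uses a permissions tuple,
# and removing uses null (None).

def members_patch(add=None, editors=None, remove=None):
    index = {}
    for url in (add or []):
        index[url] = {}
    for url in (editors or []):
        index[url] = {"permissions": {"edit": True}}
    for url in (remove or []):
        index[url] = None
    return {"element": "shoji:catalog", "index": index}

patch = members_patch(add=["http://app.crunch.io/api/users/001/"],
                      editors=["http://app.crunch.io/api/users/002/"],
                      remove=["http://app.crunch.io/api/users/003/"])
print(patch["index"])
```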
Sending notifications
The users invited to a project can be either existing Crunch.io users or new users without an account associated with the email.
If desired, the API can send automated email notifications to the involved users indicating that they now belong to the project. To request these emails, add the send_notification boolean key to the PATCHed index; otherwise, no notification will be sent.
When sending notifications, the client must also include a url_base key: a string template pointing to a client location where password setting should happen for brand-new users. The server will replace the ${token} part of the string with the generated token, and the result will be included in the notification email as a link for the invited user to configure their account in order to use the app.
Additionally, to indicate the URL of the project, the client can provide a project_url key, formatted as a URL containing a ${project_id} part that the server will replace with the project’s ID.
This behavior is the same as described for inviting new users when sharing a dataset.
Users
A read-only endpoint that lists all the individual users that have access to this project, regardless of their access type (via team or direct project membership).
The payload has a similar shape to the members endpoint, but this catalog contains only users.
GET /projects/abcd/users/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/projects/6c01/users/",
"index": {
"http://app.crunch.io/api/users/00002/": {
"name": "Jean-Luc Picard",
"email": "captain@crunch.io",
"collaborator": false,
"allowed_dataset_permissions": {
"edit": true,
"view": true
},
"teams": []
},
"http://app.crunch.io/api/users/00005/": {
"name": "William Riker",
"email": "firstofficer@crunch.io",
"collaborator": false,
"allowed_dataset_permissions": {
"edit": false,
"view": true
},
"teams": ["http://app.crunch.io/api/teams/000a5/"]
}
}
}
Datasets
Will list all the datasets that have this project as their owner.
Adding datasets to projects
The way to add a dataset to a project is by changing the dataset’s owner to the URL of the project that should take ownership.
You must have edit permissions and be the current editor on a given dataset to change its owner, and you must also have edit permissions on the target project.
PATCH to dataset entity
Send a PATCH request to the dataset entity that you want to make part of the project.
PATCH /datasets/cc9161/ HTTP/1.1
{"owner":"https://app.crunch.io/api/projects/abcd/"}
GET
Will show the list of all datasets where this project is the owner; the shape of each dataset tuple is the same as in other dataset catalogs.
GET /projects/6c01/datasets/ HTTP/1.1
{
"element": "shoji:catalog",
"self": "http://app.crunch.io/api/projects/6c01/datasets/",
"orders": {
"order": "http://app.crunch.io/api/projects/6c01/datasets/order/"
},
"index": {
"https://app.crunch.io/api/datasets/cc9161/": {
"owner_name": "James T. Kirk",
"name": "The Voyage Home",
"description": "Stardate 8390",
"archived": false,
"permissions": {
"edit": false,
"change_permissions": false,
"view": true
},
"size": {
"rows": 1234,
"columns": 67
},
"id": "cc9161",
"owner_id": "https://app.crunch.io/api/users/685722/",
"start_date": "2286",
"end_date": null,
"streaming": "no",
"creation_time": "1986-11-26T12:05:00",
"modification_time": "1986-11-26T12:05:00",
"current_editor": "https://app.crunch.io/api/users/ff9443/",
"current_editor_name": "Leonard Nimoy"
},
"https://app.crunch.io/api/datasets/a598c7/": {
"owner_name": "Spock",
"name": "The Wrath of Khan",
"description": "",
"archived": false,
"permissions": {
"edit": true,
"change_permissions": true,
"view": true
},
"size": {
"rows": null,
"columns": null
},
"id": "a598c7",
"owner_id": "https://app.crunch.io/api/users/af432c/",
"start_date": "2285-10-03",
"end_date": "2285-10-20",
"streaming": "no",
"creation_time": "1982-06-04T09:16:23.231045",
"modification_time": "1982-06-04T09:16:23.231045",
"current_editor": null,
"current_editor_name": null
}
}
}
Icon
The icon endpoint for a project is a Shoji view that allows changing the project’s icon via file upload or URL.
GET
On GET, it will return a shoji:view whose value contains a URL to the icon file, or an empty string if there isn’t an icon for this project yet. By default all new projects have an empty icon URL.
GET /projects/6c01/icon/ HTTP/1.1
{
"element": "shoji:view",
"self": "http://app.crunch.io/api/projects/6c01/icon/",
"value": ""
}
PUT
PUT to this endpoint to change a project’s icon.
There are two ways to change the icon, either via file upload or via icon URL.
Only the project’s editors can change the project’s icon.
Valid image extensions are png, gif, jpg, and jpeg; other extensions will return a 400 response.
File upload
The request should be a standard multipart/form-data file upload with the file field named icon. The file’s contents will be stored and made available under the project’s URL. The API will return a 201 response with the stored icon’s URL in its Location header.
PUT /projects/6c01/icon/ HTTP/1.1
Content-Disposition: form-data; name="icon"; filename="newicon.jpg"
Content-Type: image/jpeg
HTTP/1.1 201 Created
Location: https://app.crunch.io/api/datasets/223fd4/
Icon URL
Expects a shoji:view request with its value pointing to a publicly accessible image resource that will be used as the project’s icon. This image will be copied to an API-local location.
PUT /projects/6c01/icon/ HTTP/1.1
{
"element": "shoji:view",
"self": "http://app.crunch.io/api/projects/6c01/icon/",
"value": "http://public.domain.com/icon.png"
}
HTTP/1.1 201 Created
Location: https://app.crunch.io/api/datasets/223fd4/
POST
Same as PUT
Datasets order
Contains the shoji:order in which the datasets of this project are to be displayed.
This endpoint is available to all project members but can only be updated by the project’s editors.
GET
Will return the shoji:order response containing the datasets that belong to the project.
GET /projects/6c01/datasets/order/ HTTP/1.1
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/projects/6c01/datasets/order/",
"graph": [
"https://app.crunch.io/api/datasets/cc9161/",
"https://app.crunch.io/api/datasets/a598c7/"
]
}
PUT
Allows modifications to the shoji:order of the contained datasets. Only the project’s editors can make these changes.
Trying to include an invalid dataset or an incomplete list will return a 400 response.
PUT /projects/6c01/datasets/order/ HTTP/1.1
{
"element": "shoji:order",
"self": "http://app.crunch.io/api/projects/6c01/datasets/order/",
"graph": [
"https://app.crunch.io/api/datasets/cc9161/",
{
"group": "https://app.crunch.io/api/datasets/a598c7/"
}
]
}
Search
You can perform a cross-dataset search of dataset metadata (including variables) via the search endpoint. This search will return associated variables and dataset metadata. A query string, along with filtering properties, can be provided to the search endpoint in order to refine the results. The query string is used in plain-text form only; non-text and non-numeric characters are ignored at this time.
Results are limited to those datasets the user has access to. Offset and limit parameters are also provided as chunking options for performance; the limit and offset apply to the datasets related to the search. You have reached the end of the available search entries when no more records appear in the dataset field.
Here are the parameters that can be passed to the search endpoint.
Parameter | Type | Description |
---|---|---|
q | String | query string |
f | Json Object | used to filter the output of the search (see below) |
limit | Integer | limit the number of dataset results returned by the api to less than this amount (default: 10) |
offset | Integer | offset into the search index to start gathering results from pre-filter |
max_variables_per_dataset | Integer | limit the number of variables that match to this number (default: 100, max: 100) (deprecated, use variable_limit) |
embedded_variables | Boolean | embed the results within the dataset results (this will become the default in the future) |
projection | Json Object | used to limit the fields that should be returned in the search results. ID is always provided. |
scope | Json Object | used to limit the fields that the search should look at. |
grouping | String | One of datasets or variables; tells whether search results should be grouped by datasets or variables |
variable_limit | Integer | Limit the number of variables returned per dataset to this value, (default: 100, max: 100) |
variable_offset | Integer | Offset into the variables returned per dataset, default 0 |
max_subfield_entries_per_variable | Integer | Number of items in the subfields of a variable (such as categories or subvariables), (default: 10, max: 100) |
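A request using the parameters above can be assembled as a query string; this sketch uses only the documented parameter names, sending the filter object (f) as JSON:

```python
# Sketch: assemble the query string for the search endpoint using the
# documented parameters. The filter object (f) is serialized as JSON.
import json
from urllib.parse import urlencode

def search_query(q, f=None, limit=10, offset=0, grouping="datasets"):
    params = {"q": q, "limit": limit, "offset": offset,
              "grouping": grouping}
    if f is not None:
        params["f"] = json.dumps(f)
    return urlencode(params)

qs = search_query("income", f={"project": "/api/projects/6c01/"}, limit=5)
print(qs)
```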
Providing a Projection:
The projection argument must be a JSON array containing the names of the fields that should be projected for datasets and variables. The fields are specified with the namespace they refer to, like "variables.fieldname" and "datasets.fieldname". The namespace is the same as the key under which the relevant search results are returned.
Performing a search with an invalid field will pinpoint the invalid one and provide the list of accepted values.
Providing a Scope:
The scope parameter must be a JSON array containing the names of the fields that should be used to resolve the query. Much like the projection parameter, it accepts a list of fields with their namespace (datasets or variables). If a scope is provided, the query is looked up only in the specified fields. A special field name * specifies that the default fields for that namespace should be searched. A scope like datasets.name, variables.* will search the query in the default variable fields and in the dataset name.
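For example (field choices illustrative), both arrays are JSON-encoded before being placed in the query string:

```python
import json
from urllib.parse import urlencode

projection = ["datasets.name", "variables.name"]
scope = ["datasets.name", "variables.*"]  # "*" = default fields for the namespace

# JSON-encode the arrays, then URL-encode the whole query string
query = urlencode({
    "q": "blue",
    "projection": json.dumps(projection),
    "scope": json.dumps(scope),
})
```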
Grouping:
The default grouping is datasets, which searches in dataset data and its variables. The entries returned in "datasets" are datasets that match the query or contain a variable that matches it. Search results are limited to 1000 variables per dataset when grouping by dataset.
Switching to variables grouping makes the search look only in variables. Note that a "datasets" field is still returned, but its entries are the datasets the matching variables are part of, not datasets that match the query. This allows dataset details to be provided for a variable without a second call to fetch the dataset info.
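With variables grouping, matches arrive in buckets keyed by an opaque id, and each bucket collects the same variable as it appears across datasets. A sketch of walking that structure, using a trimmed, illustrative slice of a response:

```python
# Trimmed, illustrative slice of a grouping=variables response
group = {
    "buckets": {
        "Qk9XX0FGX05hbWU": [
            "https://app.crunch.io/api/datasets/825b87/variables/000008/",
            "https://app.crunch.io/api/datasets/fcd372/variables/000008/",
        ],
    },
    "variables": {
        "https://app.crunch.io/api/datasets/825b87/variables/000008/": {"alias": "BOW_AF_Name"},
        "https://app.crunch.io/api/datasets/fcd372/variables/000008/": {"alias": "BOW_AF_Name"},
    },
}

# Resolve each bucket's member URLs against the "variables" index
aliases_by_bucket = {
    bucket: sorted({group["variables"][url]["alias"] for url in urls})
    for bucket, urls in group["buckets"].items()
}
```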
Allowable filter parameters:
Parameter | Type | Description |
---|---|---|
dataset_ids | array of strings | limit results to particular dataset_ids or urls (user must have read access to that dataset) |
team | string | url or id of the team to limit results (user must have read access to the team) |
project | string | url or id of the project to limit results (user must have access to the project) |
organization | string | if you are the owner for a given organization, you can filter all of the search results pertaining to the datasets in your organization. |
user | string | url or id of the user that has read access to the datasets to limit results (user must match with the provided one) |
owner | string | url or id of the dataset owner to limit results |
label | string | The dataset must be in a folder or subfolder with the given name. |
start_date | array of strings | array of [begin, end] range of values in ISO8601 format. Provide same for exact matching. |
end_date | array of strings | array of [begin, end] range of values in ISO8601 format. Provide same for exact matching. |
modification_time | array of strings | array of [begin, end] range of values in ISO8601 format. Provide same for exact matching. |
creation_time | array of strings | array of [begin, end] range of values in ISO8601 format. Provide same for exact matching. |
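As a sketch, a filter combining a project restriction with a creation-time range (ids illustrative) would be encoded as follows:

```python
import json
from urllib.parse import urlencode, parse_qs

f = {
    "project": "614a3b",  # illustrative project id
    # [begin, end] in ISO 8601; repeat the same value for an exact match
    "creation_time": ["2017-01-01T00:00:00", "2017-12-31T23:59:59"],
}
query = urlencode({"q": "survey", "f": json.dumps(f)})

# Round trip: the server decodes the same filter object
decoded = json.loads(parse_qs(query)["f"][0])
```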
Fields Searched
Here is a list of the fields that are searched by the Crunch search endpoint
Field | Type | Description |
---|---|---|
category_names | List of Strings | Category names (associated with categorical variables) |
dataset_id | String | ID of the dataset |
description | String | description of the variable |
id | String | ID of the variable |
name | String | name of the variable |
owner | String | owner’s ID of the variable |
subvar_names | List of Strings | Names of the subvariables associated with the variable |
users | List of Strings | User IDs having read-access to the variable |
group_names | List of Strings | group names (from the variable ordering) associated with the variable |
dataset_labels | List of Objects | dataset_labels associated with the user associated with the variable |
dataset_name | String | dataset_name associated with this variable |
dataset_owner | String | ID of the owner of the dataset associated with the variable |
dataset_users | List of Strings | User IDs having read-access to the dataset associated with the variable |
dataset_teams | List of Strings | Team IDs having read-access to the dataset associated with the variable |
dataset_projects | List of Strings | Project IDs having read-access to the dataset associated with the variable |
Grouping by datasets:
GET /search/?q={query}&f={filter}&limit={limit}&offset={offset}&projection={projection}&grouping=datasets HTTP/1.1
import pycrunch
site = pycrunch.connect("me@mycompany.com", "yourpassword", "https://app.crunch.io/api/")
results = site.follow('search', 'q=findme&embedded_variables=True').value
datasets_found = results['groups'][0]['datasets']
# Map each dataset URL to the variables embedded in its entry
variables_by_dataset = {k: v.get('variables', []) for k, v in datasets_found.items()}
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/search/?q=blue&grouping=datasets",
"description": "Returns a view with relevant search information",
"value": {
"groups": [
{
"group": "Search Results",
"datasets": {
"https://app.crunch.io/api/datasets/173b4eec13f542588b9b0a9cbcd764c9/": {
"labels": [],
"name": "econ_few_columns_0",
"description": ""
},
"https://app.crunch.io/api/datasets/4473ab4ee84b40b2a7cd5cab4548d584/": {
"labels": [],
"name": "simple_alltypes",
"description": ""
}
},
"variables": {
"https://app.crunch.io/api/datasets/4473ab4ee84b40b2a7cd5cab4548d584/variables/000000/": {
"dataset_labels": [],
"users": [
"00002"
],
"alias": "x",
"dataset_end_date": null,
"category_names": [
"red",
"green",
"blue",
"4",
"8",
"9",
"No Data"
],
"dataset_start_date": null,
"name": "x",
"dataset_description": "",
"dataset_archived": false,
"group_names": null,
"dataset": "https://app.crunch.io/api/datasets/4473ab4ee84b40b2a7cd5cab4548d584/",
"dataset_id": "bb987b45a5b04caba10dec4dad7b37a8",
"dataset_created_time": null,
"subvar_names": [],
"dataset_name": "export test 94",
"description": "Numeric variable with value labels"
}
},
"variable_count": 14,
"totals": {
"variables": 4,
"datasets": 2
}
}
]
}
}
Search results are limited to 1000 variables per dataset.
Grouping by variables:
GET /search/?q={query}&f={filter}&limit={limit}&offset={offset}&grouping=variables HTTP/1.1
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/search/?q=Atchafalaya&grouping=variables",
"description": "Returns a view with relevant search information",
"value": {
"groups": [{
"group":"Search Results",
"totals":{
"variables":2,
"datasets":2
},
"buckets":{
"Qk9XX0FGX05hbWU":[
"http://app.crunch.io:29668/api/datasets/825b87ff955049128b9d48b614abbe99/variables/000008/",
"http://app.crunch.io:29668/api/datasets/fcd37212fe0d4b8eb8804ffb7ccb933d/variables/000008/"
]
},
"order":[
"http://app.crunch.io:29668/api/datasets/825b87ff955049128b9d48b614abbe99/variables/000008/",
"http://app.crunch.io:29668/api/datasets/fcd37212fe0d4b8eb8804ffb7ccb933d/variables/000008/"
],
"variables":{
"http://app.crunch.io:29668/api/datasets/fcd37212fe0d4b8eb8804ffb7ccb933d/variables/000008/":{
"alias":"BOW_AF_Name",
"category_names":[
"East Cote Blanche Bay",
"Atchafalaya Bay, Delta, Gulf waters",
"Barataria Bay",
"Bayou Grand Caillou",
"Bayou du Large",
"Bays Gardene, Black, American and Crabe",
"Calcasieu Lake",
"Calcasieu River and Ship Channel",
"California Bay and Breton Sound",
"Grid 12",
"..."
],
"bucket":"Qk9XX0FGX05hbWU",
"name":"BOW_AF_Name",
"dataset":"http://app.crunch.io:29668/api/datasets/fcd37212fe0d4b8eb8804ffb7ccb933d/"
},
"http://app.crunch.io:29668/api/datasets/825b87ff955049128b9d48b614abbe99/variables/000008/":{
"alias":"BOW_AF_Name",
"category_names":[
"East Cote Blanche Bay",
"Atchafalaya Bay, Delta, Gulf waters",
"Barataria Bay",
"Bayou Grand Caillou",
"Bayou du Large",
"Bays Gardene, Black, American and Crabe",
"Calcasieu Lake",
"Calcasieu River and Ship Channel",
"California Bay and Breton Sound",
"Grid 12",
"..."
],
"bucket":"Qk9XX0FGX05hbWU",
"name":"BOW_AF_Name",
"dataset":"http://app.crunch.io:29668/api/datasets/825b87ff955049128b9d48b614abbe99/"
}
},
"datasets":{
"http://app.crunch.io:29668/api/datasets/fcd37212fe0d4b8eb8804ffb7ccb933d/":{
"modification_time":"2017-06-22T17:00:36.571000",
"archived":false,
"description":"",
"end_date":null,
"name":"test_variable_search_matching_2",
"labels":null,
"creation_time":"2017-06-22T17:00:37.024000",
"id":"fcd37212fe0d4b8eb8804ffb7ccb933d",
"projects":[
],
"start_date":null
},
"http://app.crunch.io:29668/api/datasets/825b87ff955049128b9d48b614abbe99/":{
"modification_time":"2017-06-22T17:00:34.681000",
"archived":false,
"description":"",
"end_date":null,
"name":"test_variable_search_matching_1",
"labels":null,
"creation_time":"2017-06-22T17:00:35.151000",
"id":"825b87ff955049128b9d48b614abbe99",
"projects":[
],
"start_date":null
}
}
}
]
}
}
Sources
Catalog
/sources/
A Shoji Catalog representing the Sources added by this User. POST a multipart form here, with an “uploaded_file” field containing the file to upload; 201 indicates success, and the returned Location header refers to the new Source resource.
Uploaded sources use the file’s filename as their .name attribute and have a blank description. The catalog includes each source’s .name and .description.
Alternately, you may POST a urlencoded payload with a source_url parameter that points to a publicly accessible URL. Both the “http” and “s3” schemes are supported. The endpoint downloads the file synchronously, verifies that it is a valid source file, and makes it available in the current user’s sources catalog.
Regular Shoji POST payloads are also supported to create new sources from remote source URLs. A location attribute should be included in the shoji:entity body POSTed.
{
"element": "shoji:entity",
"body": {
"location": "<url>",
"name": "Optional name",
"description": "Optional description"
}
}
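A sketch of building that payload in Python; the helper name and the source URL are illustrative, not part of any Crunch library:

```python
import json

def source_entity(location, name=None, description=None):
    """Build a shoji:entity body for creating a Source from a remote URL."""
    body = {"location": location}
    if name is not None:
        body["name"] = name
    if description is not None:
        body["description"] = description
    return {"element": "shoji:entity", "body": body}

# POST this JSON to /sources/ to register an s3-hosted file as a Source
payload = json.dumps(source_entity("s3://my-bucket/wave3.sav", name="Wave 3"))
```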
Entity
/sources/{id}/
A Shoji Entity representing a single Source. Its “body” member contains:
- name: A friendly name for the Source.
- type: a string declaring the media type of the source. One of (“csv”, “spss”).
- user_id: the id of the User who created the Source.
- location: an absolute URI to the data. Currently, the only supported scheme is “crunchfile://”, which indicates a file uploaded to Crunch.io.
- settings: an object containing configuration for translating the source to crunch internals. Its members vary by type:
- csv:
- strict: an integer. If 1, extra columns or undefined category ids in the CSV will raise an error. If 0, they will be added to the dataset.
A PUT must contain a JSON object with members from the Shoji Entity “body” which the client intends to update. 204 indicates success.
A DELETE destroys the Source resource. 204 indicates success.
/sources/{id}/file/
A GET returns the original source file.
Tab books
/datasets/{dataset_id}/multitables/{multitable_id}/tabbook/
The default tabbook view of a multitable generates an Excel (.xlsx) workbook containing each variable in the dataset crosstabbed with the given multitable.
POST
A successful POST request to /datasets/{dataset_id}/multitables/{multitable_id}/tabbook/ starts an export job; the exporter writes the file to a download location when it finishes computing (which may take some time for large datasets). The server returns a 202 response indicating that the export job started, with a Location header giving where the final exported file will be available. The response body contains the progress URL where the state of the export job can be queried. Clients should note the download URL, monitor progress, and, when complete, GET the download location. See Progress for details.
Requesting the same job while it is still in progress returns the same 202 response pointing at the original progress resource. If the export is finished, the server will 302 redirect to the destination for download.
If the dataset’s attributes have changed, a new tab book will be generated regardless of the status of any other pending exports.
POST /api/datasets/a598c7/multitables/7ab1e/tabbook/ HTTP/1.1
HTTP/1.1 202 Accepted
Location: https://s3-url/filename.xlsx
{
"element": "shoji:view",
"self": "https://app.crunch.io/api/datasets/a598c7/multitables/{id}/tabbook/",
"value": "https://app.crunch.io/api/progress/5be83a/"
}
Alternatively, you can request a JSON output for your tab book by adding an Accept request header.
POST /api/datasets/a598c7/multitables/7ab1e/tabbook/ HTTP/1.1
Accept: application/json
{
"meta": {
"dataset": {
"name": "weighted_simple_alltypes",
"notes": ""
},
"layout": "many_sheets",
"sheets": [
{
"display_settings": {
"countsOrPercents": {
"value": "percent"
},
"currentTab": {
"value": 0
},
"decimalPlaces": {
"value": 0
},
"percentageDirection": {
"value": "colPct"
},
"showEmpty": {
"value": false
},
"showNotes": {
"value": false
},
"slicesOrGroups": {
"value": "groups"
},
"valuesAreMeans": {
"value": false
},
"vizType": {
"value": "table"
}
},
"filters": null,
"name": "x",
"weight": "z"
},
... (one entry for each sheet)
],
"template": [
{
"query": [
{
"args": [
{
"variable": "000002"
}
],
"function": "bin"
}
]
},
{
"query": [
{
"args": [
{
"variable": "00000a"
},
{
"value": null
}
],
"function": "rollup"
}
]
}
]
},
"sheets": [
{
"result": [
{
"result": {
"counts": [
1,
1,
1,
1,
1,
1,
0
],
"dimensions": [
{
"derived": false,
"references": {
"alias": "x",
"description": "Numeric variable with value labels",
"name": "x"
},
"type": {
"categories": [
{
"id": 1,
"missing": false,
"name": "red",
"numeric_value": 1
},
{
"id": 2,
"missing": false,
"name": "green",
"numeric_value": 2
},
{
"id": 3,
"missing": false,
"name": "blue",
"numeric_value": 3
},
{
"id": 4,
"missing": false,
"name": "4",
"numeric_value": 4
},
{
"id": 8,
"missing": true,
"name": "8",
"numeric_value": 8
},
{
"id": 9,
"missing": false,
"name": "9",
"numeric_value": 9
},
{
"id": -1,
"missing": true,
"name": "No Data",
"numeric_value": null
}
],
"class": "categorical",
"ordinal": false
}
}
],
"measures": {
"count": {
"data": [
0.0,
0.0,
1.234,
0.0,
3.14159,
0.0,
0.0
],
"metadata": {
"derived": true,
"references": {},
"type": {
"class": "numeric",
"integer": false,
"missing_reasons": {
"No Data": -1
},
"missing_rules": {}
}
},
"n_missing": 5
}
},
"n": 6
}
},
{
"result": {
"counts": [
1,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
1,
1,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
],
"dimensions": [
{
"derived": false,
"references": {
"alias": "x",
"description": "Numeric variable with value labels",
"name": "x"
},
"type": {
"categories": [
{
"id": 1,
"missing": false,
"name": "red",
"numeric_value": 1
},
{
"id": 2,
"missing": false,
"name": "green",
"numeric_value": 2
},
{
"id": 3,
"missing": false,
"name": "blue",
"numeric_value": 3
},
{
"id": 4,
"missing": false,
"name": "4",
"numeric_value": 4
},
{
"id": 8,
"missing": true,
"name": "8",
"numeric_value": 8
},
{
"id": 9,
"missing": false,
"name": "9",
"numeric_value": 9
},
{
"id": -1,
"missing": true,
"name": "No Data",
"numeric_value": null
}
],
"class": "categorical",
"ordinal": false
}
},
{
"derived": true,
"references": {
"alias": "z",
"description": "Numeric variable with missing value range",
"name": "z"
},
"type": {
"class": "enum",
"elements": [
{
"id": -1,
"missing": true,
"value": {
"?": -1
}
},
{
"id": 1,
"missing": false,
"value": [
1.0,
1.5
]
},
{
"id": 2,
"missing": false,
"value": [
1.5,
2.0
]
},
{
"id": 3,
"missing": false,
"value": [
2.0,
2.5
]
},
{
"id": 4,
"missing": false,
"value": [
2.5,
3.0
]
},
{
"id": 5,
"missing": false,
"value": [
3.0,
3.5
]
}
],
"subtype": {
"class": "numeric",
"missing_reasons": {
"No Data": -1
},
"missing_rules": {}
}
}
}
],
"measures": {
"count": {
"data": [
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
1.234,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
3.14159,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0
],
"metadata": {
"derived": true,
"references": {},
"type": {
"class": "numeric",
"integer": false,
"missing_reasons": {
"No Data": -1
},
"missing_rules": {}
}
},
"n_missing": 5
}
},
"n": 6
}
},
{
"result": {
"counts": [
1,
0,
0,
1,
0,
0,
0,
1,
0,
0,
1,
0,
0,
0,
1,
0,
0,
1,
0,
0,
0
],
"dimensions": [
{
"derived": false,
"references": {
"alias": "x",
"description": "Numeric variable with value labels",
"name": "x"
},
"type": {
"categories": [
{
"id": 1,
"missing": false,
"name": "red",
"numeric_value": 1
},
{
"id": 2,
"missing": false,
"name": "green",
"numeric_value": 2
},
{
"id": 3,
"missing": false,
"name": "blue",
"numeric_value": 3
},
{
"id": 4,
"missing": false,
"name": "4",
"numeric_value": 4
},
{
"id": 8,
"missing": true,
"name": "8",
"numeric_value": 8
},
{
"id": 9,
"missing": false,
"name": "9",
"numeric_value": 9
},
{
"id": -1,
"missing": true,
"name": "No Data",
"numeric_value": null
}
],
"class": "categorical",
"ordinal": false
}
},
{
"derived": true,
"references": {
"alias": "date",
"description": null,
"name": "date"
},
"type": {
"class": "enum",
"elements": [
{
"id": 0,
"missing": false,
"value": "2014-11"
},
{
"id": 1,
"missing": false,
"value": "2014-12"
},
{
"id": 2,
"missing": false,
"value": "2015-01"
}
],
"subtype": {
"class": "datetime",
"missing_reasons": {
"No Data": -1
},
"missing_rules": {},
"resolution": "M"
}
}
}
],
"measures": {
"count": {
"data": [
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
1.234,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
3.14159,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0
],
"metadata": {
"derived": true,
"references": {},
"type": {
"class": "numeric",
"integer": false,
"missing_reasons": {
"No Data": -1
},
"missing_rules": {}
}
},
"n_missing": 5
}
},
"n": 6
}
}
]
},
... (one entry for each sheet)
]
}
POST body parameters
At the top level, the tab book endpoint can take filtering and variable limiting parameters.
Name | Type | Default | Description | Example |
---|---|---|---|---|
filter | object | null | Filter by Crunch Expression. Variables used in the filter should be referenced by fully qualified URLs. | [{"filter": "https://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/filters/5f14133582f34b8b85b408830f4b4a9b/"}] |
where | object | null | Crunch Expression signifying which variables to use | { "function": "select", "args": [ { "map": { "https://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/variables/000004/": { "variable": "https://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/variables/000004/" }, "https://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/variables/000003/": { "variable": "https://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/variables/000003/" } } } ] } |
options | object | {} | further options defining the tab book output (see below) | |
weight | url | null | Weight to use for the tab book generation. If the weight is omitted from the request, the currently selected weight is used; if null is provided, the tab book is generated unweighted. | "http://app.crunch.io/api/datasets/45fc0d5ca0a945dab7d05444efa3310a/variables/5f14133582f34b8b85b408830f4b4a9b/" |
Options
Options for generating tab books
Name | Type | Default | Description | Example |
---|---|---|---|---|
display_settings | object | {} | a set of settings to define how the output should be displayed | See Below. |
layout | string | many_sheets | “many_sheets” indicates each variable should have its own Sheet in the xls spreadsheet. “single_sheet” indicates all output should be in the same sheet. | single_sheet |
Display Settings
Further tab book viewing options.
Name | Type | Default | Description | Example |
---|---|---|---|---|
decimalPlaces | object | 0 | number of decimal places to display | {"value": 0} |
vizType | object | table | visualization type | {"value": "table"} |
countsOrPercents | object | percent | use counts or percents | {"value": "percent"} |
percentageDirection | object | colPct | row- or column-based percents | {"value": "colPct"} |
showNotes | object | false | display variable notes in the sheet header | {"value": false} |
slicesOrGroups | object | groups | slices or groups | {"value": "groups"} |
valuesAreMeans | object | false | whether values are means | {"value": false} |
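Putting these pieces together, a complete POST body might be assembled as follows; the filter URL is illustrative, and the settings mirror the tables above:

```python
import json

tabbook_body = {
    # Illustrative filter entity URL
    "filter": [{"filter": "https://app.crunch.io/api/datasets/a598c7/filters/5f1413/"}],
    "weight": None,  # explicit null => generate the tab book unweighted
    "options": {
        "layout": "single_sheet",  # all output in one sheet
        "display_settings": {
            "countsOrPercents": {"value": "percent"},
            "percentageDirection": {"value": "colPct"},
            "decimalPlaces": {"value": 1},
            "vizType": {"value": "table"},
        },
    },
}
payload = json.dumps(tabbook_body)
```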
Table
All datasets contain a /table/ endpoint which allows access to the full data values. It provides granular control over the rows and columns returned for each dataset.
Fetching values
GET
Dataset editors can GET this resource to obtain a Shoji Table of the dataset’s data. It exposes all the variables visible to the authenticated user (public variables, plus their personal variables if requested) with the exclusion filter applied (if any).
To include personal variables in the output table, the client should include the include_personal GET parameter on the request with a True value.
A metadata section contains the definitions of all the variables, matched by variable ID with the corresponding entry under data.
Dataset viewers can only access the metadata portion of the response. This means they cannot use the limit and offset parameters to query data unless the dataset’s viewers_can_export setting is True; otherwise the server will respond with a 403.
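A sketch of constructing a paged request URL for this endpoint; the dataset id is illustrative:

```python
from urllib.parse import urlencode

params = {
    "limit": 100,                # number of rows to return
    "offset": 200,               # skip the first 200 rows
    "include_personal": "True",  # also expose the requester's personal variables
}
url = "https://app.crunch.io/api/datasets/a598c7/table/?" + urlencode(params)
```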
GET /datasets/:id/table/ HTTP/1.1
{
"self": "https:\/\/alpha.crunch.io\/api\/datasets\/:id\/table\/",
"element": "crunch:table",
"data": {
"000007": [ 1, 1, 2 ],
"000004": [ 1, 1, 1 ],
"000005": [ 1, 0, 1 ],
"000003": [ "red", "green", "MORE JUNK" ],
"000000": [ 1, 2, 9 ],
"000001": [ "2000-01-01T00:00:00", "2000-01-02T00:00:00", { "?": -1 } ],
"000008": [ 1, 2, 3 ],
"000009": [ 2, 3, 4 ],
"00000c": [ [ 1, 1, 2 ], [ 1, 2, 3 ], [ 2, 3, 4 ] ]
},
"description": "A Crunch Table of data for this dataset.",
"metadata": {
"000004": {
"alias": "bool1",
"type": "categorical",
"name": "mymrset | Response #1",
"categories": [
{ "numeric_value": 1, "selected": true, "id": 1, "name": "1", "missing": false },
{ "numeric_value": 0, "id": 0, "name": "0", "missing": false },
{ "numeric_value": null, "id": -1, "name": "No Data", "missing": true }
],
"description": "bool1"
},
"000005": {
"alias": "bool2",
"type": "categorical",
"name": "mymrset | Response #2",
"categories": [
{ "numeric_value": 1, "selected": true, "id": 1, "name": "1", "missing": false },
{ "numeric_value": 0, "id": 0, "name": "0", "missing": false },
{ "numeric_value": null, "id": -1, "name": "No Data", "missing": true }
],
"description": "bool2"
},
"000003": {
"alias": "str",
"type": "text",
"name": "str",
"missing_reasons": { "No Data": -1 },
"description": "40 character string"
},
"000000": {
"alias": "x",
"type": "categorical",
"name": "x",
"categories": [
{ "numeric_value": 1, "id": 1, "name": "red", "missing": false },
{ "numeric_value": 2, "id": 2, "name": "green", "missing": false },
{ "numeric_value": 3, "id": 3, "name": "blue", "missing": false },
{ "numeric_value": 4, "id": 4, "name": "4", "missing": false },
{ "numeric_value": 8, "id": 8, "name": "8", "missing": true },
{ "numeric_value": 9, "id": 9, "name": "9", "missing": false },
{ "numeric_value": null, "id": -1, "name": "No Data", "missing": true }
],
"description": "Numeric variable with value labels"
},
"000001": {
"name": "y",
"type": "datetime",
"missing_reasons": { "No Data": -1 },
"alias": "y",
"resolution": "s",
"description": "Date variable"
},
"00000c": {
"alias": "categorical_array",
"type": "categorical_array",
"name": "categorical_array",
"subvariables": ["000007", "000008", "000009"],
"subreferences": {
"000009": {"alias": "ca_subvar_1", "name": "ca_subvar_1", "description": ""},
"000007": {"alias": "ca_subvar_2", "name": "ca_subvar_2", "description": ""},
"000008": {"alias": "ca_subvar_3", "name": "ca_subvar_3", "description": ""}
},
"categories": [
{ "numeric_value": null, "selected": false, "id": 1, "missing": false, "name": "a" },
{ "numeric_value": null, "selected": false, "id": 2, "missing": false, "name": "b" },
{ "numeric_value": null, "selected": false, "id": 3, "missing": false, "name": "c" },
{ "numeric_value": null, "selected": false, "id": 4, "missing": false, "name": "d" },
{ "numeric_value": null, "selected": false, "id": -1, "missing": true, "name": "No Data" }
],
"description": ""
}
}
}
Filtering
This endpoint accepts the same filter parameters described under Filtering Endpoints
Teams
Teams contain references to users and datasets. By sharing a dataset with a team, you can grant access to a set of users at once, and by adding a user to a team, you can grant them access to a set of datasets.
Catalog
/teams/
GET
GET /teams/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type: application/json
// Example team catalog:
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/teams/",
"description": "List of all the teams where the current user is member",
"index": {
"https://app.crunch.io/api/teams/d07edb/": {
"name": "The A-Team",
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/teams/67fe89/": {
"name": "Palo Alto Data Science",
"permissions": {
"team_admin": false
}
}
}
}
teams <- getTeams()
names(teams)
## [1] "The A-Team" "Palo Alto Data Science"
POST
To create a new team, POST a Shoji Entity with a team “name” in the body. No other attributes are required, and you will be automatically assigned as a “team_admin”.
POST /teams/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
...
{
"element": "shoji:entity",
"body": {
"name": "My new team with ytpo"
}
}
--------
201 Created
Location: /teams/03df2a/
# Create a new team by assigning into the teams catalog
teams[["My new team with ytpo"]] <- list()
names(teams) # Let's see that it was created
## [1] "The A-Team" "Palo Alto Data Science"
## [3] "My new team with ytpo"
# You can also assign members to the team when you create it,
# even though the POST /teams/ API itself does not support it
# (the R package handles this for you).
teams[["New team with members"]] <- list(members="fake.user@example.com")
Entity
/teams/{team_id}/
GET
GET /teams/d07edb/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type: application/json
// Example team entity
{
"element": "shoji:entity",
"self": "https://app.crunch.io/api/teams/d07edb/",
"description": "Details for a specific team",
"body": {
"creator": "https://app.crunch.io/api/users/41c69d/",
"id": "d07edb",
"name": "The A-Team"
},
"catalogs": {
"datasets": "https://app.crunch.io/api/teams/d07edb/datasets/",
"members": "https://app.crunch.io/api/teams/d07edb/members/"
}
}
# Access a team by name using $ or [[ from the team catalog
a.team <- teams[["The A-Team"]]
name(a.team)
## [1] "The A-Team"
self(a.team)
## [1] "https://app.crunch.io/api/teams/d07edb/"
A GET request on a team entity URL returns the same “name”, “id” and “creator” attributes as shown in the team catalog, as well as references to the “datasets” and “members” catalogs corresponding to the team. Authorization is required: if the requesting user is not a member of the team, a 404 response will result.
PATCH
Team names are editable by PATCHing the team entity. Authorization is required: only team members with “team_admin” permission may edit the team’s name; other team members will receive a 403 response on PATCH.
PATCH /teams/03df2a/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"element": "shoji:entity",
"body": {
"name": "My new team without typo"
}
}
--------
204 No Content
name(teams[["My new team with ytpo"]]) <- "My new team without typo"
names(teams) # Check that it was updated
## [1] "The A-Team" "Palo Alto Data Science"
## [3] "My new team without typo"
Team members catalog
/teams/{team_id}/members/
The team members catalog is a Shoji Catalog similar in nature to the dataset permissions catalog. It collects references to users and defines the authorizations they have with respect to the team. All information about the member relationships is contained in the catalog (there are no “member entities”), and all changes to team membership, whether adding, modifying, or removing users, are made via PATCH.
GET
GET /teams/d07edb/members/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type: application/json
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/teams/d07edb/members/",
"description": "Catalog of users that belong to this team",
"index": {
"https://app.crunch.io/api/users/47193a/": {
"name": "B. A. Baracus",
"permissions": {
"team_admin": false
}
},
"https://app.crunch.io/api/users/41c69d/": {
"name": "Hannibal",
"permissions": {
"team_admin": true
}
}
}
}
members(team)
Tuple values include:
Name | Type | Description |
---|---|---|
name | string | Display name of the user |
permissions | object | Attributes governing the user’s authorization on the team |
Supported permissions, all boolean, include:
- team_admin: allows adding/removing and managing the members and permissions of a team, as well as modifying and deleting the team in question. Defaults to false.
PATCH
Authorization is required: team members who do not have the “team_admin” permission and who attempt to PATCH the member catalog will receive a 403 response. As with the team entity, non-members will receive 404 on attempted PATCH.
PATCH a partial Shoji Catalog to add users to the team, modify permissions of existing members, or remove team members. The examples below illustrate each of these actions separately, but all can in fact be done together in a single PATCH request.
In the “index” attribute of the catalog, object keys must be either (a) URLs of User entities or (b) email addresses; the two can be mixed in a single PATCH request. Using an email address allows you to invite a user to Crunch while adding them to the team if they do not yet have a Crunch account, but it is also a valid reference to an existing User.
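These rules can be captured in a small payload-building helper; this is a sketch, and the member_patch name is hypothetical, not part of any Crunch library:

```python
import json

def member_patch(additions=None, removals=None, send_notification=False,
                 url_base=None):
    """Build a shoji:catalog PATCH body for a team members catalog.

    additions maps user URLs or email addresses to permission objects;
    removals is an iterable of user URLs to drop (encoded as null).
    """
    index = dict(additions or {})
    for url in removals or ():
        index[url] = None
    body = {"element": "shoji:catalog", "index": index}
    if send_notification:
        body["send_notification"] = True
        if url_base:
            body["url_base"] = url_base
    return json.dumps(body)

payload = member_patch(
    additions={"templeton.peck@army.gov": {"permissions": {"team_admin": True}}},
    send_notification=True,
    url_base="https://app.crunch.io/password/change/${token}/",
)
```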
Add and modify members
PATCH /teams/d07edb/members/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/users/47193a/": {
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/users/e3211a/": {},
"templeton.peck@army.gov": {
"permissions": {
"team_admin": true
}
}
},
"send_notification": true,
"url_base": "https://app.crunch.io/password/change/${token}/"
}
--------
204 No Content
If the index object keys correspond to users that already appear in the member catalog, their permissions will be updated with the corresponding value. In this example, user 47193a, B. A. Baracus, has been given the team_admin permission.
If the index object keys do not correspond to users already found in the member catalog, the indicated users will be added to the team. And, if the indicated user, as specified by email address, does not yet exist, they will be invited to Crunch and added to the team. In this example, we added existing user e3211a, implicitly with team_admin set to false, to the team, and we also added “templeton.peck@army.gov”, who did not previously have a Crunch account.
If “send_notification” was included and true in the request, new-to-Crunch users will receive a notification email informing them that they have been invited to Crunch. New users, unless they have an OAuth provider specified, will need to set a password, and the client application should send a URL template that directs them to a place where they can set it. To do so, include a “url_base” attribute in the payload: a URL template with a ${token} variable into which the server will insert the password-setting token. For the Crunch web application, this template is https://app.crunch.io/password/change/${token}/.
A GET on the members catalog shows the updated catalog.
GET /teams/d07edb/members/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type: application/json
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/teams/d07edb/members/",
"description": "Catalog of users that belong to this team",
"index": {
"https://app.crunch.io/api/users/47193a/": {
"name": "B. A. Baracus",
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/users/41c69d/": {
"name": "Hannibal",
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/users/e3211a/": {
"name": "Howling Mad Murdock",
"permissions": {
"team_admin": false
}
},
"https://app.crunch.io/api/users/89eb3a/": {
"name": "templeton.peck@army.gov",
"permissions": {
"team_admin": true
}
}
}
}
Removing members
To remove members from the team, PATCH the catalog with a null value:
PATCH /teams/d07edb/members/ HTTP/1.1
Host: app.crunch.io
Content-Type: application/json
{
"element": "shoji:catalog",
"index": {
"https://app.crunch.io/api/users/e3211a/": null
}
}
--------
204 No Content
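The removal payload is simple enough to build programmatically. The sketch below uses a hypothetical helper name and the user URL from the example above.

```python
def member_removal_payload(user_urls):
    """Build a shoji:catalog PATCH body that removes each given user
    from a team by mapping their URL to a null (None) tuple."""
    return {
        "element": "shoji:catalog",
        "index": {url: None for url in user_urls},
    }

payload = member_removal_payload(["https://app.crunch.io/api/users/e3211a/"])
```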
GET /teams/d07edb/members/ HTTP/1.1
Host: app.crunch.io
--------
200 OK
Content-Type: application/json
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/teams/d07edb/members/",
"description": "Catalog of users that belong to this team",
"index": {
"https://app.crunch.io/api/users/47193a/": {
"name": "B. A. Baracus",
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/users/41c69d/": {
"name": "Hannibal",
"permissions": {
"team_admin": true
}
},
"https://app.crunch.io/api/users/89eb3a/": {
"name": "templeton.peck@army.gov",
"permissions": {
"team_admin": false
}
}
}
}
Team datasets catalog
/teams/{team_id}/datasets/
The team datasets catalog only supports the GET verb. To add a dataset to a team, you must PATCH its permissions catalog.
GET
GET returns a Shoji Catalog of datasets that have been shared with this team. See datasets for details.
Users
Catalog
/users/{?email,id}
A successful GET on this resource returns a Shoji Catalog whose “index” URLs refer to User objects. If the “email” or “id” parameters are provided, the result is narrowed to Users matching those parameters.
This endpoint only supports GET requests. New users must be added through each account’s users catalog, which ensures that they belong to an account and receive an invitation accordingly.
Entity
/users/{id}/{?reason_url}
A Shoji Entity with the following body members:
- name
- id
- id_method (optional)
- id_provider (optional, and only if id_method == ‘oauth’)
The id_method member can be one of {"oauth", "pwhash"}. If not present, "pwhash" is assumed.
The authenticated user can only access another user’s entity endpoint if any of the following are true:
- Both belong to the same account
- They are both members of a common team
- The authenticated user is an account admin and the viewed user is a collaborator on that account
Users can PUT new attributes to their own entity, as can any user with the “alter_users” account permission, via a JSON request body. A 200 indicates success.
Send invitation email
/users/{id}/invite/
A POST to this resource sends an invitation from the current user to the identified User. A 204 indicates success. The current user must have “can_alter_users” account permission or 403 is returned instead.
If a “url_base” parameter is included in the request body, it will be used to form links inside the invitation.
Change password
/users/{id}/password/
A POST on this resource must consist of a JSON object with the members “old_pw” and “new_pw”. A 204 indicates success, a 400 indicates failure.
Reset user’s password
/users/{id}/password_reset/
A GET on this resource always returns 204. A POST will send a reset password notification to the identified user. A 204 indicates success.
If a “url_base” parameter is included in the request body, it will be used to form links inside the notification.
Change user’s email
/users/{id}/change_email/
A POST on this resource must consist of a JSON object with the members “pw” and “email”. A 204 indicates that the request to change the user’s email address to the newly provided one was accepted. The user should check their email and verify that they own the address in question.
If the password does not match the user’s current password, a 400 Bad Request is returned. If the user authenticates via OAuth, the email address may not be changed (409 Conflict).
If the user ID does not match the currently signed-in user, a 403 Forbidden is returned.
Expropriate a user
An account admin can expropriate a user from the same account. This will change ownership of all of the affected user’s teams, projects, and datasets to a new owner.
The new owner must also be part of the same account and should have create_datasets permissions set to true.
POST /users/{id}/expropriate/
{
"element": "shoji:entity",
"body": {
"owner": "http://app.crunch.io/api/users/123abc/"
}
}
The new owner provided can be a user URL or a user email.
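The expropriation body can be built with a small helper. This is a hypothetical client-side sketch, not part of the API itself; per the text above, the owner may be a user URL or an email address.

```python
def expropriate_payload(new_owner):
    """Build the shoji:entity body for POST /users/{id}/expropriate/.
    `new_owner` may be a user URL or a user email address."""
    return {"element": "shoji:entity", "body": {"owner": new_owner}}

body = expropriate_payload("http://app.crunch.io/api/users/123abc/")
```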
User Datasets
/account/users/{id}/datasets/
This URL is only accessible and available to account admins.
This Shoji catalog lists all the datasets that are owned by this user.
User Visible datasets
/users/{id}/visible_datasets/
This endpoint is only available and accessible to account admins.
Returns a Shoji catalog listing all the datasets (archived or not) that the given user has access to, whether via direct share, team access, or project membership.
{
"https://app.crunch.io/api/datasets/wsx345/": {
"name": "survey data",
"last_access_time": "2017-02-25",
"access_type": {
"teams": ["https://app.crunch.io/api/teams/abx/"],
"project": "https://app.crunch.io/api/projects/qwe/",
"direct": true
},
"permissions": {
"edit": true,
"view": true,
"change_permissions": true
}
},
"https://app.crunch.io/api/datasets/a2c4b2/": {
"name": "responses dataset",
"last_access_time": "2016-11-09",
"access_type": {
"teams": [],
"project": null,
"direct": true
},
"permissions": {
"edit": false,
"view": true,
"change_permissions": false
}
}
}
The tuples describe the type of access the user has to each dataset via the access_type attribute. It includes:
- The list of teams that provide access to this dataset
- The project that provides access to this dataset, or null
- Whether the user has a direct share to this dataset
The permissions attribute indicates the final coalesced permissions this user enjoys on the given dataset.
Variables
Catalog
/datasets/{id}/variables/{?relative}
A Shoji Catalog of variables.
GET catalog
When authenticated and authorized to view the given dataset, GET returns 200 status with a Shoji Catalog of variables in the dataset. If authorization is lacking, response will instead be 404.
Array subvariables are not included in the index of this catalog. Their metadata are instead accessible in each array variable’s “subvariables_catalog”.
Private variables are not included in the index of this catalog, although
entities may be present at variables/{id}/
. See Private Variables for an
index of those.
Catalog tuples contain the following keys:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
alias | string | More machine-friendly, traditional name for a variable |
description | string | Optional longer string |
id | string | Immutable internal identifier |
notes | string | Optional annotations for a variable |
discarded | boolean | Whether the variable should be hidden from most views; default: false |
derived | boolean | Whether the variable is a function of another; default: false |
type | string | The string type name, one of “numeric”, “text”, “categorical”, “datetime”, “categorical_array”, or “multiple_response” |
subvariables | array of URLs | For arrays, array of (ordered) references to subvariables |
subvariables_catalog | URL | For arrays, link to a Shoji Catalog of subvariables |
resolution | string | Present in datetime variables; current resolution of data |
rollup_resolution | string | Present in datetime variables; resolution used for rolled up summaries |
geodata | URL | Present only in variables that have geodata associated; points to the catalog of geodata related to this variable |
uniform_basis | boolean | Whether each subvariable should be considered the same length as the total array. Only on multiple_response |
The catalog has two optional query parameters:
Name | Type | Description |
---|---|---|
relative | string | If “on”, all URLs in the “index” will be relative to the catalog’s “self” |
With the relative flag enabled, the variable catalog looks something like this:
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/5ee0a0/variables/",
"orders": {
"hier": "https://app.crunch.io/api/datasets/5ee0a0/variables/hier/",
"personal": "https://app.crunch.io/api/datasets/5ee0a0/variables/personal/",
"weights": "https://app.crunch.io/api/datasets/5ee0a0/variables/weights/"
},
"specification": "https://app.crunch.io/api/specifications/variables/",
"description": "List of Variables of this dataset",
"index": {
"a77d9f/": {
"name": "Birth Year",
"derived": false,
"discarded": false,
"alias": "birthyear",
"type": "numeric",
"id": "a77d9f",
"notes": "",
"description": "In what year were you born?"
},
"9e4c84/": {
"name": "Comments",
"derived": false,
"discarded": false,
"alias": "qccomments",
"type": "text",
"id": "9e4c84",
"notes": "Global notes about this variable.",
"description": "Do you have any comments on your experience of taking this survey (optional)?"
},
"aad4ad/": {
"subvariables_catalog": "aad4ad/subvariables/",
"name": "An Array",
"derived": true,
"discarded": false,
"alias": "arrayvar",
"subvariables": [
"439dcf/",
"1c99ea/"
],
"notes": "All variable types can have notes",
"type": "categorical_array",
"id": "aad4ad",
"description": ""
}
}
}
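A client consuming a relative catalog must resolve the index keys against the catalog’s “self” URL; a standard URL join does exactly this. The sketch below uses values from the example catalog above.

```python
from urllib.parse import urljoin

# "self" URL of the variables catalog, and a relative index key from it
self_url = "https://app.crunch.io/api/datasets/5ee0a0/variables/"
relative_key = "a77d9f/"

# Resolve the relative key to the variable entity's absolute URL
absolute = urljoin(self_url, relative_key)
print(absolute)  # https://app.crunch.io/api/datasets/5ee0a0/variables/a77d9f/
```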
PATCH catalog
Use PATCH to edit the “name”, “description”, “alias”, or “discarded” state of one or more variables. A successful request returns a 204 response. The attributes changed will be seen by all users with access to this dataset; i.e., names, descriptions, aliases, and discarded state are not merely attributes of your view of the data but of the datasets themselves.
Authorization is required: you must have “edit” privileges on the dataset being modified, as shown in the “permissions” object in the dataset’s catalog tuple. If you try to PATCH and are not authorized, you will receive a 403 response and no changes will be made.
The tuple attributes other than “name”, “description”, “alias”, and “discarded” cannot be modified here by PATCH. Attempting to modify other attributes, or including new attributes, will return a 400 response. Variable “type” can only be modified by the “cast” method, described below. The “subvariables” can be modified by PATCH on the variable entity. “subvariables_catalog” is a URL to a different variable catalog and is thus not editable, though you can navigate to its location and modify subvariable attributes there. A variable’s “id” and its “derived” state are immutable.
When PATCHing, you may include only the keys in each tuple that are being modified, or you may send the complete tuple. As long as the keys that cannot be modified via PATCH here are not modified, the request will succeed.
Note that, because this catalog contains its entities (rather than collecting them), you cannot PATCH to add new variables, nor can you PATCH a null tuple to delete them. Attempting either will return a 400 response. Creating variables is allowed only by POST to the catalog, while deleting variables is accomplished via a DELETE on the variable entity.
{
"element": "shoji:catalog",
"index": {
"9e4c84/": {
"discarded": true
}
}
}
PATCHing this payload on the above catalog will return a 204 status. A subsequent GET of the catalog returns the following response; note that “discarded” is now true for variable 9e4c84.
{
"element": "shoji:catalog",
"self": "https://app.crunch.io/api/datasets/5ee0a0/variables/",
"orders": {
"hier": "https://app.crunch.io/api/datasets/5ee0a0/variables/hier/",
"personal": "https://app.crunch.io/api/datasets/5ee0a0/variables/personal/",
"weights": "https://app.crunch.io/api/datasets/5ee0a0/variables/weights/"
},
"specification": "https://app.crunch.io/api/specifications/variables/",
"description": "List of Variables of this dataset",
"index": {
"a77d9f/": {
"name": "Birth Year",
"derived": false,
"discarded": false,
"alias": "birthyear",
"type": "numeric",
"id": "a77d9f",
"notes": "",
"description": "In what year were you born?"
},
"9e4c84/": {
"name": "Comments",
"derived": false,
"discarded": true,
"alias": "qccomments",
"type": "text",
"id": "9e4c84",
"notes": "Global notes about this variable.",
"description": "Do you have any comments on your experience of taking this survey (optional)?"
},
"aad4ad/": {
"subvariables_catalog": "aad4ad/subvariables/",
"name": "An Array",
"derived": true,
"discarded": false,
"alias": "arrayvar",
"subvariables": [
"439dcf/",
"1c99ea/"
],
"notes": "All variable types can have notes",
"type": "categorical_array",
"id": "aad4ad",
"description": ""
}
}
}
POST catalog
A POST to this resource must be a Shoji Entity with the following “body” attributes:
- name
- type
- If “type” is “categorical”, “multiple_response”, or “categorical_array”: categories: an array of category definitions
- If “type” is “multiple_response” or “categorical_array”: subvariables: an array of URLs of variables to be “bound” together to form the array variable
- If “type” is “multiple_response” or “categorical_array”: subreferences: an object keyed by each of the subvariable URLs where each value contains partial variable definitions, which will be created as categorical subvariables of the array. If included, the array definition must include “categories”, which are shared among the subvariables.
- If type is “multiple_response”, the definition may include selected_categories: an array of category names present in the subvariables. This will mark the specified category or categories as the “selected” response in the multiple response variable. If no “selected_categories” array is provided, the new variable will use any categories already flagged as “selected”: true. If no such category exists, the response will return a 400 status.
- If “type” is “datetime”: resolution: a string, such as “Y”, “Q”, “M”, “W”, “D”, “h”, “m”, “s”, “ms”, that indicates the unit size of the datetime data.
See Variable Definitions for more details and examples of valid attributes, and Feature Guide: Arrays for more information on the various cases for creating array variables.
It is encouraged, but not required, to include an “alias” in the body. If omitted, one will be generated from the required “name”.
You may also include “values”, which will create the column of data corresponding to this variable definition. See Importing Data: Column-by-column for details and examples.
You may instead include a “derivation” to derive a variable as a function of other variables. In this case, “type” is not required because it depends on the output of the specified derivation function. For details and examples, see Deriving Variables.
A 201 indicates success and includes the URL of the newly-created variable in the Location header.
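A minimal categorical definition can be assembled as follows. This is a sketch with illustrative names; the helper and its tuple format are not part of the API, only the resulting body shape is described above.

```python
def categorical_variable_body(name, categories, alias=None):
    """Sketch of a POST body for the variables catalog.
    `categories` is a list of (id, name, missing) tuples; "numeric_value"
    is left null here. "alias" is optional; the server generates one from
    the required "name" if omitted."""
    body = {
        "name": name,
        "type": "categorical",
        "categories": [
            {"id": cid, "name": cname, "numeric_value": None, "missing": miss}
            for cid, cname, miss in categories
        ],
    }
    if alias is not None:
        body["alias"] = alias
    return {"element": "shoji:entity", "body": body}

definition = categorical_variable_body(
    "Favorite Color",
    [(1, "Red", False), (2, "Blue", False), (-1, "No Data", True)],
    alias="fav_color",
)
```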
Private variables catalog
/datasets/{id}/variables/private/{?relative}
GET
GET returns a Shoji Catalog of variables, as described above, containing those variables that are private to the authenticated user. You may PATCH this catalog to edit names, aliases, descriptions, etc. of the private variables. POST, however, is not supported at this endpoint. To create new private variables, POST to the main variables catalog with a "private": true body attribute.
Hierarchical Order
/datasets/{id}/variables/hier/
Dataset global order containing references to all public variables.
GET
Returns a Shoji Order.
PATCH
Expects a Shoji Order representation containing replacement or new grouped entities. This allows one to create new groups on the fly or overwrite existing groups with new “entities”.
Groups are matched by name; the entities of each matched group are overwritten with the received values.
After a PATCH, any variable not present in the order will always be appended to the root of the graph.
PUT
Receives a Shoji Order representation with a completely new graph. Any previously existing group will be eliminated and any new groups will be added; this overwrites the complete set of current groups.
After a PUT, any variable not present in any of the groups will always be appended to the root of the graph.
Personal Variable Order
/datasets/{id}/variables/personal/
Unlike the hierarchical order, the personal variable order returns different content per user. Each user can add variable references to it, including personal variables, and these changes are not shared with other users.
The personal variable order defaults to an empty Shoji order until each user makes changes to it.
The allowed variables on this order are:
- Any public variable available on the variable catalog
- Any personal variable or subvariable for the authenticated user
- Any subvariable of an array variable on the variable catalog
GET
Returns a Shoji Order for this user.
PATCH
Same as hierarchical order, receives a Shoji Order representation to overwrite the existing order. Personal variables are allowed here.
PUT
Behaves the same as PATCH.
Weights
/datasets/{id}/variables/weights/
GET
GET a shoji:order that contains the URLs of the variables that have been designated as possible weight variables.
PATCH
PATCH the graph with the desired list of weight variables. The list will always be overwritten with the new values. This order can only be a flat list of URLs; any nesting will be rejected with a 400 response.
If the dataset has a default weight variable configured, it will always be present on the response even if it wasn’t included on a PATCH request.
Removing variables from this list will have the side effect of changing any user’s preference that had such variables set as their weight to the current dataset’s default weight.
Only numeric variables are allowed to be used as weights. If a variable of another type is included in the list, the server will abort and return a 409 response.
{
"graph": ["https://app.crunch.io/api/datasets/42d0a3/variables/42229f"]
}
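Because nesting is rejected with a 400, a client can validate the graph before sending it. This is a hypothetical client-side helper, not part of the API.

```python
def weights_graph(urls):
    """Build the PATCH body for the weights order. Raises if any entry
    is not a plain URL string, since the server rejects nested orders
    with a 400 response."""
    if not all(isinstance(u, str) for u in urls):
        raise ValueError("weights graph must be a flat list of variable URLs")
    return {"graph": list(urls)}

body = weights_graph(["https://app.crunch.io/api/datasets/42d0a3/variables/42229f"])
```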
PUT
Behaves the same as PATCH.
Entity
/datasets/{id}/variables/{id}/
A Shoji Entity which exposes most of the metadata about a Variable in the dataset.
GET
Variable entities’ body
attributes contain the following:
Name | Type | Description |
---|---|---|
name | string | Human-friendly string identifier |
alias | string | More machine-friendly, traditional name for a variable |
description | string | Optional longer string |
id | string | Immutable internal identifier |
notes | string | Optional annotations for the variable |
discarded | boolean | Whether the variable should be hidden from most views; default: false |
private | boolean | If true, the variable is only visible to the owner and is only included in the private variables catalog, not the common catalog |
owner | url | If the variable is private it will point to the url of its owner; null for non private variables |
derived | boolean | Whether the variable is a function of another; default: false |
type | string | The string type name |
categories | array | If “type” is “categorical”, “multiple_response”, or “categorical_array”, an array of category definitions (see below). Other types have an empty array |
subvariables | array of URLs | For array variables, an ordered array of subvariable ids |
subreferences | object of objects | For array variables, an object of {“name”: …, “alias”: …, …} objects keyed by subvariable url |
resolution | string | For datetime variables, a string, such as “Y”, “M”, “D”, “h”, “m”, “s”, “ms”, that indicates the unit size of the datetime data. |
derivation | object | For derived variables, a Crunch expression which was used to derive this variable; or null |
format | object | An object with various members to control the display of Variable data (see below) |
view | object | An object with various members to control the display of Variable data (see below) |
dataset_id | string | The id of the Dataset to which this Variable belongs |
missing_reasons | object | An object whose keys are reason phrases and whose values are missing codes; missing entries in Variable data are represented by a {“?”: code} missing marker; clients may look up the corresponding reason phrase for each code in this one-to-one map |
Category objects have the following members:
Name | Type | Description |
---|---|---|
id | integer | identifier for the category, corresponding to values in the column of data |
name | string | A unique label identifying the category |
numeric_value | numeric | A quantity assigned to this category for numeric aggregation. May be null. |
missing | boolean | If true, the given category is marked as “missing”, and is omitted from most calculations. |
selected | boolean | For categories in multiple response variables, those with "selected": true are the categories whose values correspond to the “response” being selected. If omitted, the category is treated as not selected. Multiple response variables must have at least one category marked as selected and may have more than one. |
Format objects may contain:
Name | Type | Description |
---|---|---|
data | object | An object with an integer “digits” member, stating how many digits to display after the decimal point when showing data values |
summary | object | An object with an integer “digits” member, stating how many digits to display after the decimal point when showing aggregates values |
View objects may contain:
Name | Type | Description |
---|---|---|
show_codes | boolean | For categorical types only; if true, numeric values are shown |
show_counts | boolean | If true, show counts; if false, show percents |
include_missing | boolean | For categorical types only; if true, include missing categories |
include_noneoftheabove | boolean | For multiple response types only; if true, display a “none of the above” category in the requested summary or analysis |
rollup_resolution | string | For datetime variables, a unit to which data should be “rolled up” by default. See “resolution” above. |
PATCH
PATCH variable entities to edit their metadata. Send a Shoji Entity with a “body” member containing the attributes to modify. Omitted body attributes will be unchanged.
Successful requests return 204 status. Among the actions achievable by PATCHing variable entities:
- Editing category attributes and adding categories. Include all categories.
- Remove categories by sending all categories except for the ones you wish to remove. You can only remove categories that don’t have any corresponding data values. Attempting to remove categories that have data associated will fail with a 400 response status.
- Reordering or removing subvariables in an array. Unlike categories, subvariables cannot be added via PATCH here.
- Editing derivation expressions
- Editing format and view settings
- Changing a datetime variable’s resolution
Actions that are best or only achieved elsewhere include:
- changing variable names, aliases, and descriptions, which is best accomplished by PATCHing the variable catalog, as described above;
- changing a variable’s type, which can only be done by POSTing to the variable’s “cast” resource (see Convert type below);
- editing names, aliases, and descriptions of subvariables in an array, which is done by PATCHing the array’s subvariable catalog;
- altering missing rules.
Variable “id” and “dataset_id” are immutable.
Example:
{
"subvariables": [
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0008/",
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0007/",
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0009/"
],
"subreferences": {
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0008/": {
"alias": "subvar_2",
"name": "v2_new_name",
"description": null
},
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0007/": {
"alias": "subvar_1_new_name",
"name": "v1_new_name",
"description": null
},
"http://app.crunch.io/api/datasets/d4db9831e08a4922b054e49b47a0045c/variables/00000c/subvariables/0009/": {
"alias": "subvar_3",
"name": "subvar_3",
"description": "new description"
}
}
}
POST
Calling POST on an array resource will “unbind” the variable. On success, POST
returns 200 status with a Shoji View, containing the URLs of the
(formerly sub-)variables, which are promoted to regular variables.
DELETE
Calling DELETE on this resource will delete the variable. On success, DELETE
returns 200 status with an empty Shoji View. Deleting an array deletes all its
subvariable data as well.
Summary
/datasets/{id}/variables/{id}/summary/{?filter}
A collection of summary information describing the variable. A successful GET returns an object containing various scalars and tabular results in various formats. The set of included members varies by variable type. Exclusions, filters, and weights may all alter the output.
For example, given a numeric variable with data [1, 2, 3, 4, 5, 4, {“?”: -1}, 3, 5, {“?”: -1}, 4, 3], a successful GET with no exclusions, filters, or weights returns:
{
"count": 12,
"valid_count": 10,
"fivenum": [
["0", 1.0],
["0.25", 3.0],
["0.5", 3.5],
["0.75", 4.0],
["1", 5.0]
],
"missing_count": 2,
"min": 1.0,
"median": 3.5,
"histogram": [
{"at": 1.5, "bins": [1.0, 2.0], "value": 1},
{"at": 2.5, "bins": [2.0, 3.0], "value": 1},
{"at": 3.5, "bins": [3.0, 4.0], "value": 3},
{"at": 4.5, "bins": [4.0, 5.0], "value": 5}
],
"stddev": 1.2649110640673518,
"max": 5.0,
"mean": 3.4,
"missing_frequencies": [{"count": 2, "value": "No Data"}]
}
numeric
The members include several counts:
- count: The number of entries in the variable.
- valid_count: The number of entries in the variable which are not missing.
- missing_count: The number of entries in the variable which are missing.
- missing_frequencies: An array of row objects. Each row represents a distinct missing reason, and includes the reason phrase as the “value” member and the number of entries which are missing for that reason as the “count” member.
- histogram: An array of row objects. Each row represents a discrete interval in the probability distribution, whose boundaries are given by the “bins” pair. An “at” member is included giving the midpoint between the two boundaries. The “value” member gives a count of entries which fall into the given bin.
The members also include basic summary statistics:
- fivenum: An array of five [quartile, point] pairs, where the “quartile” element is one of the strings “0”, “0.25”, “0.5”, “0.75”, “1”, representing the min, first quartile, median, third quartile, and max boundaries to divide the data values into four equal groups. The “point” is the real number at each boundary, and is estimated using the same algorithm as Excel or R’s “algorithm 7”, where h is: (N - 1)p + 1.
- min, median, max: taken from “fivenum”, above.
- mean: the sum of the values divided by the number of values, or, if weighted, the sum of weight times value divided by the sum of the weights.
- stddev: The standard deviation of the values.
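The fivenum boundaries follow the type-7 quantile described above, where h = (N − 1)p + 1. This sketch (with a hypothetical helper name) reproduces the documented summary from the example data, after excluding the two missing entries.

```python
import math

def quantile_type7(sorted_vals, p):
    """Excel / R "algorithm 7": h = (N - 1) * p + 1, linearly
    interpolating between the two nearest order statistics (1-indexed)."""
    n = len(sorted_vals)
    h = (n - 1) * p + 1
    lo = math.floor(h)
    if lo >= n:
        return float(sorted_vals[-1])
    return float(sorted_vals[lo - 1] + (h - lo) * (sorted_vals[lo] - sorted_vals[lo - 1]))

# Valid values from the example summary above (missing markers excluded)
valid = sorted([1, 2, 3, 4, 5, 4, 3, 5, 4, 3])
fivenum = [quantile_type7(valid, p) for p in (0, 0.25, 0.5, 0.75, 1)]
mean = sum(valid) / len(valid)
stddev = math.sqrt(sum((x - mean) ** 2 for x in valid) / (len(valid) - 1))
print(fivenum)  # [1.0, 3.0, 3.5, 4.0, 5.0]
print(mean)     # 3.4
```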
categorical
The basic counts are included:
- count: The number of entries in the variable.
- valid_count: The number of entries in the variable which are not missing.
- missing_count: The number of entries in the variable which are missing.
- missing_frequencies: An array of row objects. Each row represents a distinct missing reason, and includes the reason phrase as the “value” member. The number of entries which are missing for that reason is included as the “count” member.
And the typical “frequencies” member is expanded into a custom “categories” member:
- categories: An array of row objects. Each row represents a distinct category (whether valid or missing), and includes its id as the _id member (note the leading underscore) and its name as the “name” member. The “missing” member is true or false depending on whether the category is marked missing or not. The number of entries which possess that value is included as the “count” member.
text
The basic counts are included:
- count: The number of entries in the variable.
- valid_count: The number of entries in the variable which are not missing.
- missing_count: The number of entries in the variable which are missing.
- nunique: The number of distinct values in the data.
- sample: A sample of 5 entries of the data.
In addition:
- max_chars: The number of characters of the longest value in the data.
Univariate frequencies
/datasets/{id}/variables/{id}/frequencies/{?filter,exclude_exclusion_filter}
An array of row objects, giving the count of distinct values. The exact members vary by type:
- numeric: Each row represents a distinct valid value, and includes it as the “value” member. The number of entries which possess that value is included as the “count” member.
- categorical: Each row represents a distinct category (whether valid or missing), and includes its id as the _id member (note the leading underscore) and its name as the “name” member. The “missing” member is true or false depending on whether the category is marked missing or not. The number of entries which possess that value is included as the “count” member.
- text: Each row represents a distinct valid value, and includes it as the “value” member. The number of entries which possess that value is included as the “count” member. The length of the array is limited to 10 entries; if more than 10 distinct values are present in the data, an 11th row is added with a “value” member of “(Others)”, summing their counts.
Transforming
Convert type
/datasets/{id}/variables/{id}/cast/
A POST to this resource, with a JSON request body of {“cast_as”: type}, will alter the variable to the given type. If the variable cannot be cast to the given type, 409 is returned. See below for how to obtain a preview summary of such a cast before committing to it.
Casting to datetime
- From Numeric: Need to include an offset key, an ISO-8601 date string, and a resolution key, which is one of the following strings:
  - Y: Year
  - Q: Quarter
  - M: Month
  - W: Week
  - D: Day
  - h: Hour
  - m: Minutes
  - s: Seconds
  - ms: Milliseconds
- From Text: Need to include a format key containing a valid strftime string to format with.
- From Categorical: Need to include a format key containing a valid strftime string to format with.
Casting from datetime
- To Numeric: Not supported
- To Text: Need to include a format key containing a valid strftime string that matches the variable values to parse with.
- To Categorical: Need to include a format key containing a valid strftime string that matches the category names to parse with.
Array variables
- Multiple Response: Not supported
- Categorical Array: Not supported
/datasets/{id}/variables/{id}/cast/?cast_as={type}
A GET on this resource will return the same response as ../summary would if the variable were cast to the given type. If the given type is not valid, 404 is returned.
Attributes
Missing values
/datasets/{id}/variables/{id}/missing_rules/
A Shoji Entity whose “body” member contains an array of missing rule objects. POST a {reason: rule} to this URL to add a new rule. Rules take one of the following forms:
- {"value": v}: Entries which match the given value will be marked as missing for the given reason.
- {"set": [v1, v2, …]}: Entries which are present in the given set will be marked as missing for the given reason.
- {"range": [lower, upper], "inclusive": [true, false]}: Entries which fall between the given boundaries will be marked as missing for the given reason. If either boundary is null, that side of the range is unbounded.
- {"function": "…", "args": […]}: Entries which match the given filter function will be marked as missing for the given reason. This is typically a tree of simple rules logical-OR’d together.
Example:
[
{
"Invalid": {"value": 0},
"Sarai doesn't know how to use a calculator :(": {"range": [1000, null], "inclusive": [true, false]}
}
]
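As a sketch of how these rule forms classify a value, the hypothetical helper below handles the "value", "set", and "range" forms; the "function" form is omitted because it requires the full Crunch expression evaluator.

```python
def matches_rule(value, rule):
    """Return True if `value` would be marked missing under `rule`.
    Supports the 'value', 'set', and 'range' forms described above.
    A null (None) boundary in 'range' means that side is unbounded."""
    if "value" in rule:
        return value == rule["value"]
    if "set" in rule:
        return value in rule["set"]
    if "range" in rule:
        lower, upper = rule["range"]
        inc_lower, inc_upper = rule.get("inclusive", [True, True])
        if lower is not None:
            if value < lower or (value == lower and not inc_lower):
                return False
        if upper is not None:
            if value > upper or (value == upper and not inc_upper):
                return False
        return True
    raise ValueError("unsupported rule form")

# The "range" rule from the example above: [1000, null], inclusive below
rule = {"range": [1000, None], "inclusive": [True, False]}
```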
Subvariables
/datasets/{id}/variables/{id}/subvariables/
GET
This endpoint will return 404 for any variable that is not an array variable (multiple response or categorical array).
For array variables, this endpoint will return a Shoji Catalog containing tuples for the subvariables. The tuples have the same shape as those in the main variables catalog.
PATCH
On PATCH, this endpoint allows modification of the variable attributes exposed on the tuples (name, description, alias, discarded).
It is also possible to add new subvariables to the array variable in question. To do so, include in the payload the URL of another variable (currently existing on the dataset) with an empty tuple; that variable will be converted into a subvariable and appended at the end.
In the case of derived arrays, an attempt to PATCH this catalog will return a 405 response, because the list of subvariables for such an array is a function of its derivation expression. The correct way to modify a derived array's subvariables is to edit its derivation attribute with the desired expressions for each of them.
Values
/datasets/{id}/variables/{id}/values/{?start,total,filter}
A GET on this set of resources will return a JSON array of values from the variable's data. Numeric variables will return numbers, text variables will return strings, and categorical variables will return category names for valid categories and {"?": code} missing markers for missing categories. The "start" and "total" parameters paginate the results. The "filter" is a Crunch filter expression.
Note that this endpoint is only accessible to dataset editors unless the viewers_can_export dataset setting is set to true; otherwise the server will return a 403 response.
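A small helper shows how the pagination parameters combine with a variable URL. This function is purely illustrative (not part of any Crunch client library), and the ids in the example are placeholders:

```python
from urllib.parse import urlencode

def values_url(variable_url, start=None, total=None, filter_expr=None):
    """Build a .../values/ URL with optional start/total/filter parameters."""
    params = {}
    if start is not None:
        params["start"] = start
    if total is not None:
        params["total"] = total
    if filter_expr is not None:
        params["filter"] = filter_expr
    url = variable_url.rstrip("/") + "/values/"
    return url + "?" + urlencode(params) if params else url

url = values_url("https://app.crunch.io/api/datasets/5de8/variables/a1b2",
                 start=100, total=50)
# url ends with "/values/?start=100&total=50"
```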
Private Variables
/datasets/{id}/variables/private/
Private variables are variables that, instead of being shared with everyone, are viewable only by the user that created them. In Crunch, users with view-only permissions on a dataset can still make variables of their own, just as they can make private filters.
Private variables are not shown in the common variable catalog. Instead, they have their own Shoji Catalog of private variables belonging to the specified dataset for the authenticated user. Aside from this separate catalog, private variable entities and the catalog behave just as described above for public variables.
Versions
Datasets have a collection of versions, points in time to which you can roll back.
Catalog
GET
GET /datasets/{dataset_id}/savepoints/{?limit,offset}
When authenticated, GET returns 200 status with a (paginated) Shoji Catalog of versions to which the dataset can be reverted. Catalog tuples contain the following attributes:
Name | Type | Default | Description |
---|---|---|---|
user_display_name | string | "" | The name of the user who saved this version |
description | string | | An informative note about the version, as in a commit message |
version | string | | An internal identifier for the saved version |
creation_time | datetime | | Timestamp for when the version was created |
last_update | datetime | | Timestamp for when the version was last updated |
revert | url | | URL to POST to in order to roll back to this version; see below |
Query parameters:
Name | Type | Default | Description |
---|---|---|---|
limit | integer | 1000 | How many versions to include in the catalog response |
offset | integer | 0 | How many versions to skip before returning limit versions |
For pagination purposes, catalog tuples are sorted from most to least recent. However, since JSON objects are unordered, you cannot rely on the order of the tuples within the payload you receive.
POST
POST /datasets/{dataset_id}/savepoints/
To create a new version, POST a JSON object to the versions catalog. Object attributes may contain:
Name | Type | Required | Description |
---|---|---|---|
description | string | No | An informative note about the version, as in a commit message |
A successful POST will return 201 status with the URL of the newly created version entity in the Location header. If the current user is not an editor of the dataset, POSTing will return a 403 status.
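For example, a minimal sketch of such a request body in Python; the dataset id is a placeholder, and the commented call assumes an authenticated `requests` session:

```python
import json

savepoints_url = "https://app.crunch.io/api/datasets/5de8/savepoints/"  # placeholder id

payload = {"description": "Before recoding brand variables"}
body = json.dumps(payload)

# resp = requests.post(savepoints_url, data=body,
#                      headers={"Content-Type": "application/json"})
# On success: resp.status_code == 201, and resp.headers["Location"]
# holds the URL of the new version entity.
```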
PATCH
No version attributes may be modified by PATCHing the catalog. PATCH will return a 405 status.
Entity
GET
GET /datasets/{dataset_id}/savepoints/{version_id}/
Version entities expose a subset of attributes found in the catalog tuples:
Name | Type | Default | Description |
---|---|---|---|
user_display_name | string | "" | The name of the user who saved this version |
description | string | | An informative note about the version, as in a commit message |
version | string | | An internal identifier for the saved version |
PATCH
PATCH /datasets/{dataset_id}/savepoints/{version_id}/
The version’s “description” may be modified by PATCHing its entity. A successful request returns 204 status. If the current user is not an editor of the dataset, PATCHing will return a 403 status.
Reverting
POST /datasets/{dataset_id}/savepoints/{version_id}/revert/
To roll back to a saved version, POST an empty body to the version’s “revert” URL, found both inside the catalog tuple and in the “views” attribute of the entity. A successful request will return 204 status.
Reverting a dataset will not change its current ownership.
Xlsx
The xlsx endpoint takes as input a prepared table (intended for use with multitables) and returns an xlsx file with some basic formatting conventions.
A POST request to /api/xlsx/ will return an xlsx file directly, with correct Content-Disposition and Content-Type headers.
POST
POST /api/xlsx/ HTTP/1.1
HTTP/1.1 200 OK
Content-Disposition: attachment; filename=Crunch-export.xlsx
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
{
"element": "shoji:entity",
"body": {
"result": [
{
"rows": [],
"etc.": "described below"
}
]
}
}
Endpoint Parameters
At the top level, the xlsx endpoint takes a result array and a display_settings object, which defines some formatting to be used on the values. Multiple tables can be placed on a single sheet.
Result
Name | Type | Typical element | Description |
---|---|---|---|
rows | array | {"value": 30, "class": "formatted"} | Cells are objects with at least a value member and an optional class, where a value of "formatted" prevents the exporter from applying any number format to the result cell |
colLabels | array | {"value": "All"} | Array of objects with a value member |
colTitles | array | "Age" | Array of strings |
spans | array | 4 | Array of integers matching the length of colTitles, indicating the number of cells to be joined for each colTitle after the first one. The first colTitle is assumed to be only one column wide. |
rowTitle | string | "Dog food brands" | A title, formatted bold above the first column of the table (the rowLabels, below) |
rowLabels | array | {"value": "Canine Crunch"} | Labels for the rows of the table |
rowVariableName | string | "Preferred dog food" | Title to display at the very top left of the result sheet |
filter_names | array | "Breed: Dachshund" | Names of any filters to print beneath the table, labeled "Filters". If multiple result objects are included in the payload, the filter names from the first result are used and placed at the bottom of the sheet beneath all results. |
Display Settings
Further customization for the resulting output.
Name | Type | Default | Description | Example |
---|---|---|---|---|
decimalPlaces | object | 0 | Number of decimal places to display | {"value": 0} |
countsOrPercents | object | percent | Use counts or percents | {"value": "percent"} |
percentageDirection | object | {"value": "colPct"} | Row- or column-based percents | {"value": "colPct"} |
valuesAreMeans | object | false | Whether values are means (if so, they will be formatted with decimal places) | {"value": false} |
Quirks
Because the formatted output was designed to display values computed by other clients, it makes some assumptions about the tables it is displaying. Some of these are enumerated below.
- Rows have a 'marginal' column positioned first after the row label.
- If display settings indicate rowPct, rows have an additional marginal column intended to show unconditional N for each row.
- The remaining row labels are all accounted for in the sum of spans.
- Column titles are placed in merged cells above one or more labels.
- The same filter(s) are applied to all tables on a page.
- No "freeze panes" are applied to the result.
- If the table contains percentages, they should be percentages, not proportions (0 to 100, not 0 to 1).
Complete example
{"element":"shoji:entity",
"body":{
"result": [
{
"filter_names": ["Name_of_filter"],
"rows": [
[
{
"value": 50,
"class": "marginal marginal-percentage"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-0"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-1"
}
],
[
{
"value": 50,
"class": "marginal marginal-percentage"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-0"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-1"
}
],
[
{
"value": 0,
"class": "marginal marginal-percentage"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-1"
}
],
[
{
"value": 0,
"class": "marginal marginal-percentage"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-1"
}
]
],
"colLabels": [
{
"value": "All"
},
{
"value": "2014",
"class": "col-0"
},
{
"value": "2015",
"class": "col-1"
}
],
"spans": [
2
],
"rowLabels": [
{
"value": "a",
"class": "row-label"
},
{
"value": "b",
"class": "row-label"
},
{
"value": "c",
"class": "row-label"
},
{
"value": "d",
"class": "row-label"
}
],
"rowTitle": "ca_subvar_1",
"rowVariableName": "categorical_array",
"colTitles": [
"quarter"
]
},
{
"rows": [
[
{
"value": 16.666666666666664,
"class": "marginal marginal-percentage"
},
{
"value": 25,
"pValue": 0.24821309601845032,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": -0.2482130960184501,
"class": "subtable-0 col-1"
}
],
[
{
"value": 50,
"class": "marginal marginal-percentage"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-0"
},
{
"value": 50,
"pValue": 0,
"class": "subtable-0 col-1"
}
],
[
{
"value": 33.33333333333333,
"class": "marginal marginal-percentage"
},
{
"value": 25,
"pValue": -0.5464935495198773,
"class": "subtable-0 col-0"
},
{
"value": 50,
"pValue": 0.5464935495198773,
"class": "subtable-0 col-1"
}
],
[
{
"value": 0,
"class": "marginal marginal-percentage"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-1"
}
]
],
"colLabels": [
{
"value": "All"
},
{
"value": "2014",
"class": "col-0"
},
{
"value": "2015",
"class": "col-1"
}
],
"spans": [
2
],
"rowLabels": [
{
"value": "a",
"class": "row-label"
},
{
"value": "b",
"class": "row-label"
},
{
"value": "c",
"class": "row-label"
},
{
"value": "d",
"class": "row-label"
}
],
"rowTitle": "ca_subvar_2",
"rowVariableName": "categorical_array",
"colTitles": [
"quarter"
]
},
{
"rows": [
[
{
"value": 0,
"class": "marginal marginal-percentage"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": null,
"class": "subtable-0 col-1"
}
],
[
{
"value": 33.33333333333333,
"class": "marginal marginal-percentage"
},
{
"value": 50,
"pValue": 0.045500259780248964,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": -0.045500259780248964,
"class": "subtable-0 col-1"
}
],
[
{
"value": 16.666666666666664,
"class": "marginal marginal-percentage"
},
{
"value": 25,
"pValue": 0.24821309601845032,
"class": "subtable-0 col-0"
},
{
"value": 0,
"pValue": -0.2482130960184501,
"class": "subtable-0 col-1"
}
],
[
{
"value": 50,
"class": "marginal marginal-percentage"
},
{
"value": 25,
"pValue": -0.0005320055485602548,
"class": "subtable-0 col-0"
},
{
"value": 100,
"pValue": 0.0005320055485602548,
"class": "subtable-0 col-1"
}
]
],
"colLabels": [
{
"value": "All"
},
{
"value": "2014",
"class": "col-0"
},
{
"value": "2015",
"class": "col-1"
}
],
"spans": [
2
],
"rowLabels": [
{
"value": "a",
"class": "row-label"
},
{
"value": "b",
"class": "row-label"
},
{
"value": "c",
"class": "row-label"
},
{
"value": "d",
"class": "row-label"
}
],
"rowTitle": "ca_subvar_3",
"rowVariableName": "categorical_array",
"colTitles": [
"quarter"
]
}
],
"display_settings":{
"valuesAreMeans": {"value": false},
"countsOrPercents": {"value": "percent"},
"percentageDirection": {"value": "colPct"},
"decimalPlaces": {"value": 1}
}
}
}
Object Reference
version 0.15
The Crunch REST API takes a decidedly column-oriented approach to data. A “column” is simply a sequence of values of the same type. A “variable” binds a name (and other metadata) to the column, and indeed may possess a series of columns over its lifetime as inserts and updates are made to it. A “dataset” is a set of variables. Each variable in the dataset is sorted the same way; the variables together form a relation. Reading the N'th item from each variable produces a row.
Interaction with the Crunch REST API is by variables and columns. When you add data to Crunch, you send a set of columns. When you fetch data from Crunch, you send a set of variable expressions and receive a set of columns. When you update data in Crunch, you send a set of expressions which tells Crunch how to update variables with new column data.
The Crunch API consists of just a few primitive objects, arranged differently for each request and response. Learning the basic components will help you create the most complicated queries.
Response types
Shoji entity
A Shoji entity is identified by its element key having the value shoji:entity. Its principal attribute is the body key: an object containing the attributes that describe the entity.
Shoji catalog
A catalog is identified by its element key having the value shoji:catalog. Its principal attribute is index: an object keyed by the URLs of the entities it contains, where each value is an object (tuple) with attributes from the referenced entity.
Shoji catalogs are not ordered. For ordered representations, they may provide an orders set of Shoji order resources.
Shoji view
A Shoji view is identified by its element key having the value shoji:view. Its principal attribute is value, which can contain any arbitrary JSON object.
Shoji order
Shoji orders are identified by the element key having the value shoji:order. Their principal attribute is the graph key: an array containing the order of the present resources.
A Shoji order may be associated with a catalog, in which case it will contain a subset or the totality of the entities present in the catalog. The catalog remains the authoritative source of available entities.
Any entity not present on the order but present in the catalog may be considered to belong at the bottom of the root of the graph in an arbitrary order, or may be excluded from view.
Statistical data
Identifiers
Datasets, variables, and other resources are always identified by strings. All identifiers are case-sensitive, and may contain any unicode character, including spaces. Examples:
- "q1"
- "My really useful dataset"
- "变量"
Data Values
Individual data values follow the JSON representations where possible. JSON exposes the following types: number, string, array, object, true, false, null. Crunch adds additional types with special syntax (see Types, below). Examples:
- 13
- 45.330495
- "foo"
- [3, 4, 5]
- {"bar": {"a": [12.4, 89.2, 0]}}
- true
- null
- "2014-03-02T14:29:59Z"
Because a single JSON type may be used to represent multiple Crunch types, you should never rely on the JSON type to interpret the class of a datum. Instead, inspect the type object (see below) to interpret the data.
Missing values
Crunch provides a robust “missing entries” system. Anywhere a (valid) data value can appear, a missing value may also appear. Missing values are represented by an object with a single “?” key. The value is a missing integer code (see Missing reasons, below); negative integers are reserved for system-generated reasons, user-defined reasons are automatically assigned positive integers. Examples:
- {"?": -1}
- {"?": 24}
Arrays
A set of data values (and/or missing values) which are of the same type can be ordered in an array. All entries in an array are of the same Crunch type.
Examples:
- [13, 4, 5, {"?": -2}, 7, 2]
- ["foo", "bar"]
Enumerations
Some arrays, rather than repeating a small set of large values, benefit from storing a small integer code instead, moving the larger values they represent into the metadata, and doing lookups when needed to encode/decode. The "categorical" type is the most common example of this: rather than store an array of large string names like ["Internet Explorer", "Internet Explorer", "Firefox", …] it instead stores integer codes like [1, 1, 2], placing the longer strings in the metadata as type.categories = [{"id": 1, "name": "Internet Explorer", …}, …]. We call this encoding process enumeration, and its opposite, where the codes are re-expanded into their original values, elaboration.
Enumeration also provides the opportunity to order the possible values, as well as include potential values which do not yet exist in the data array itself.
Enumeration typically causes the volume of data to shrink dramatically, and can speed up very common operations like filtering, grouping, and almost any data transfer. Because of this, it is common to:
- Enumerate a data array as early as possible. Indeed, when a variable can be enumerated, the fastest way to insert new data is to send the new values as the integer codes.
- Elaborate a data array as late as possible. As long as the metadata is shipped along with the enumerated data, the transfer size and therefore time is much smaller. Many cases do not even call for a complete elaboration of the entire column.
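The encode/decode round trip described above can be sketched in a few lines of Python; the helper names here are ours, not part of the API:

```python
# Category metadata, as in a categorical variable's type.categories.
categories = [
    {"id": 1, "name": "Internet Explorer"},
    {"id": 2, "name": "Firefox"},
]

def enumerate_column(values, categories):
    """Encode string values as small integer category ids (enumeration)."""
    by_name = {c["name"]: c["id"] for c in categories}
    return [by_name[v] for v in values]

def elaborate_column(codes, categories):
    """Expand category ids back into their original names (elaboration)."""
    by_id = {c["id"]: c["name"] for c in categories}
    return [by_id[code] for code in codes]

codes = enumerate_column(
    ["Internet Explorer", "Internet Explorer", "Firefox"], categories)
# codes == [1, 1, 2]
```

Note how the enumerated form is far smaller to store and transfer; the names travel once, in the metadata.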
Variable Definitions
Crunch employs a structural type system rather than a nominative one. The variable definition includes more knowledge than just the type name (numeric, text, categorical, etc); we also learn details about range, precision, missing values and reasons, order, etc. For example:
{
"type": "categorical",
"name": "Party ID",
"description": "Do you consider yourself generally a Democrat, a Republican, or an Independent?",
"categories": [
{
"name": "Republican",
"numeric_value": 1,
"id": 1,
"missing": false
},
{
"name": "Democrat",
"numeric_value": -1,
"id": 2,
"missing": false
},
{
"name": "Independent",
"numeric_value": 0,
"id": 3,
"missing": false
}
]
}
This section describes the metadata of a variable as exposed across HTTP, both expected response values and valid input values.
Variable types
The “type” of a Variable is a string which defines the superset of values from which the variable may draw. The type governs not only the set of values but also their syntax. (See below.)
The following types are defined for public use:
- text
- numeric
- categorical
- datetime
- multiple_response
- categorical_array
Variable names
Variables in Crunch have multiple attributes that provide identifying information: “name”, “alias”, and “description”.
name
Crunch takes a principled stand that variable “names” should be for people, not for computers.
You may be used to domains that have variable “name”, “label”, and “description”. Name is some short, unique, machine-friendlier ID like “Q2”; label is short and human-friendly, something like “Brand awareness”, and description is where you might put question wording if you have survey data. Crunch has “alias”, “name”, and “description”. What you may be used to thinking of as a variable name, we consider as an alias: something for more internal use, not something appropriate for a polished dataset ready to share with people who didn’t create the dataset (See more in the “Alias” section below). In Crunch, the variable’s “name” is what you may be used to thinking of as a label.
All variables must have a name, and these names must be unique across all variables, including “hidden” variables (see below) but excluding subvariables (see “Subvariables” below). Within an array variable, subvariable names must be unique. (You can think of subvariable names within an array as being variable_name.subvariable_name, and with that approach, all “variable names” must be unique.)
Names must be a string of length greater than zero, and any valid unicode string is allowed. See “Identifiers” above.
alias
Alias is a string identifier for variables. It must be unique across all variables, including subvariables, such that it can be used as an identifier. This is what legacy statistical software typically calls a variable name.
Aliases have several uses. Client applications, such as those exposing a scripting interface, may want to use aliases as a more machine-friendly, yet still human-readable, way of referencing variables. Aliases may also be used to help line up variables across different import batches.
When creating variables via the API, alias is not a required field; if omitted, an alias will be generated. If an alias is supplied, it must be unique across all variables, including subvariables, and the new variable request will be rejected if the alias is not unique. When data are imported from file formats that have unique variable names, those names will in many cases be used as the alias in Crunch.
description
Description is an optional string that provides more information about the variable. It is displayed in the web application on variable summary cards and with analyses.
Type-specific attributes
These attributes must be present for the specified variable types when creating a variable, but they are not defined for other types.
categories
Categorical variables must contain an array of Category objects, each of which includes:
- id: a read-only integer identifier for the category. These correspond to the data values.
- name: the string name which applications should use to identify the category.
- numeric_value: the numeric value bound to each name. If no numeric value should be bound, this should be null. numeric_values need not be unique, and they may be null.
- missing: boolean indicating whether the data corresponding to this category should be interpreted as missing.
- selected: (optional) boolean indicating whether this category corresponds to a selected value for a dichotomized variable, i.e. part of a multiple response variable. Not required for regular categorical variables; defaults to false if omitted. There is also no requirement that only one Category in a multiple-response variable be marked "selected".
Categories are valid if:
- Category names are unique within the set
- Category ids are unique within the set
- Category ids for user-defined categories are positive integers no greater than 32767. Negative ids are reserved for system missing reasons. See “missing_reasons” below.
The order of the array defines the order of the categories, and thus the order in which aggregate data will be presented. This order can be changed by saving a reordered set of Categories.
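These validity rules can be expressed as a short check. validate_categories is a hypothetical helper for client-side validation, not part of the Crunch API:

```python
def validate_categories(categories):
    """Apply the validity rules above: unique names, unique ids, and
    user-defined ids as positive integers no greater than 32767."""
    names = [c["name"] for c in categories]
    ids = [c["id"] for c in categories]
    if len(set(names)) != len(names):
        raise ValueError("category names must be unique within the set")
    if len(set(ids)) != len(ids):
        raise ValueError("category ids must be unique within the set")
    for i in ids:
        if not (isinstance(i, int) and 0 < i <= 32767):
            raise ValueError("user-defined ids must be integers in 1..32767")
    return True

# The "Party ID" categories from the example above pass the check.
party_id = [
    {"name": "Republican", "numeric_value": 1, "id": 1, "missing": False},
    {"name": "Democrat", "numeric_value": -1, "id": 2, "missing": False},
    {"name": "Independent", "numeric_value": 0, "id": 3, "missing": False},
]
ok = validate_categories(party_id)
```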
subvariables
Multiple Response and Categorical Array variables contain an array of subvariable references. In the HTTP API, these are presented as URLs. To create a variable of type “multiple_response” or “categorical_array”, you must include a “subvariables” member with an array of subvariable references. These variables will become the subvariables in the new array variable.
Like categories, the order of the subvariables array within an array variable indicates the order in which they are presented; to reorder them, save a modified array of subvariable ids/urls.
subreferences
Multiple Response and Categorical Array variables contain an object of subvariable “references”: names, alias, description, etc. To create a variable of type “multiple_response” or “categorical_array” directly, you must include a “subreferences” member with an object of objects. These label the subvariables in the new array variable.
Each subreferences member must contain a name and optionally an alias. Note that subreferences is an unordered object; the order of the subvariables is read from the "subvariables" attribute.
{
"type": "categorical_array",
"name": "Example array",
"categories": [
{
"name": "Category 1",
"numeric_value": 1,
"id": 1,
"missing": false
},
{
"name": "Category 2",
"numeric_value": 0,
"id": 2,
"missing": false
}
],
"subvariables": [
"/api/datasets/abcdef/variables/abc/subvariables/1/",
"/api/datasets/abcdef/variables/abc/subvariables/2/",
"/api/datasets/abcdef/variables/abc/subvariables/3/"
],
"subreferences": {
"/api/datasets/abcdef/variables/abc/subvariables/2/": {"name": "subvariable 2", "alias": "subvar2_alias"},
"/api/datasets/abcdef/variables/abc/subvariables/1/": {"name": "subvariable 1"},
"/api/datasets/abcdef/variables/abc/subvariables/3/": {"name": "subvariable 3"}
}
}
resolution
Datetime variables must have a resolution string that indicates the unit size of the datetime data. Valid values include “Y”, “M”, “D”, “h”, “m”, “s”, and “ms”. Every datetime variable must have a resolution.
Other definition attributes
These attributes may be supplied on variable creation, and they are included in API responses unless otherwise noted.
format
An object with various members to control the display of Variable data:
- data: An object with a “digits” member, stating how many digits to display after the decimal point.
- summary: An object with a “digits” member, stating how many digits to display after the decimal point.
view
An object with various members to control the display of Variable data:
- show_codes: For categorical types only. If true, numeric values are shown.
- show_counts: If true, show counts; if false, show percents.
- include_missing: For categorical types only. If true, include missing categories.
- include_noneoftheabove: For multiple-response types only. If true, display a “none of the above” category in the requested summary or analysis.
- geodata: A list of associations of a variable to Crunch geodatum entities. PATCH a variable entity, amending view.geodata, in order to create, modify, or remove an association. An association is an object with required keys geodatum and feature_key, and an optional match_field. The geodatum must exist; feature_key is the name of the property of each "feature" in the geojson/topojson that corresponds to the match_field of the variable (perhaps a dotted string for nested properties, e.g. "properties.postal-code"). By default, match_field is "name": a categorical variable will match category names to the feature_key present in the given geodatum.
discarded
Discarded is a boolean value indicating whether the variable should be viewed as part of the dataset. Hiding variables by setting discarded to true is a soft, restorable delete. Default is false.
private
If true, the variable will not show in the common variable catalog; instead, it will be included in the personal variables catalog.
missing_reasons
An object whose keys are reason strings and whose values are the codes used for missing entries.
Crunch allows any entry in a column to be either a valid value or a missing code. Regardless of the class, missing codes are represented in the interface as an object with a single "?" key mapped to a single missing integer code. For example, a segment of [4.56, 9.23, {"?": -1}] includes 2 valid values and 1 missing value.
The missing codes map to a reason phrase via this “missing reasons” type member. Entries which are missing for reasons determined by the system are negative integers. Users may define their own missing reasons, which receive positive integer codes. Zero is a reserved value.
In the above example, the code of -1 would be looked up in a missing reasons map such as:
{
"missing reasons": {
"no data": -1,
"type mismatch": -2,
"my backup was corrupted": 1
}
}
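Decoding such a column segment against its missing_reasons map can be sketched as follows; the helper names are ours, not part of the API:

```python
# The missing_reasons map from the example above, inverted for lookup by code.
missing_reasons = {"no data": -1, "type mismatch": -2, "my backup was corrupted": 1}
code_to_reason = {code: reason for reason, code in missing_reasons.items()}

def describe(entry):
    """Return a valid value unchanged, or the reason phrase for a
    {"?": code} missing marker."""
    if isinstance(entry, dict) and "?" in entry:
        return code_to_reason[entry["?"]]
    return entry

column = [4.56, 9.23, {"?": -1}]
decoded = [describe(e) for e in column]
# decoded == [4.56, 9.23, "no data"]
```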
See the Endpoint Reference for user-defined missing reasons.
Categorical variables do not require a missing_reasons object because the categories array contains the information about missingness.
Values
When creating a new variable, one can also include a “values” member that contains the data column corresponding to the variable metadata. See Importing Data: Column-by-column. This subsection outlines how the various variable types have their values formatted both when one supplies values to add to the dataset and when one requests values from a dataset.
Text
Text values are an array of quoted strings. Missing values are indicated as {"?": <integer>}, as discussed above, and all integer missing value codes must be defined in the "missing_reasons" object of the variable's metadata.
Numeric
A “numeric” value will always be e.g. 500 (a number, without quotes) in the JSON request and response messages, not “500” (a string, with quotes). Missing values are handled as with text variables.
Categorical
Insert an array of integers that correspond to the ids of the variable’s categories. Only integers found in the category ids are allowed. That is, you cannot insert values for which there is no category metadata. It is, however, permitted to have categories defined for which there are no values.
Datetime
Datetime input and output are in ISO-8601 formatted strings.
Arrays
Crunch supports array type variables, which contain an array of subvariables. “Multiple response” and “Categorical array” are both arrays of categorical subvariables. Subvariables do not exist as independent variables; they are exposed as “virtual” variables in some places, and can be analyzed independently, but they do not have their own type or categories.
Arrays are currently always categorical, so they send and receive data in the same format: category ids. The only difference is that regular categorical variables send and receive one id per row, whereas arrays send and receive a list of ids (of equal length to the number of subvariables in the array).
Variables
A complete Variable, then, is simply a Definition combined with its data array.
Expressions
Crunch expressions are used to compute on a dataset, to do nuanced selects, updates, and deletes, and to accomplish many other routine operations. Expressions are JSON objects in which each term is wrapped in an object which declares whether the term is a variable, a value, or a function, etc. While verbose, doing so allows us to be more explicit about the operations we wish to do.
Expressions generally contain references to variables, values, or columns of values, often composed in functions. The output of expressions can be other variables, values, boolean masks, or cube aggregations, depending on the context and expression content. Some endpoints have special semantics, but the general structure of the expressions follows the model described below.
Variable terms
Terms refer to variables when they include a “variable” member. The value is the URL for the desired variable. For example:
{"variable": "../variables/X/"}
{"variable": "https://app.crunch.io/api/datasets/48ffc3/joins/abcd/variables/Y/"}
URLs must either be absolute or relative to the URL of the current request. For example, to refer to a variable in a query at https://app.crunch.io/api/datasets/48ffc3/cube/
, a variable at https://app.crunch.io/api/datasets/48ffc3/variables/9410fc/
may be referenced by its full URL or by “../variables/9410fc/”.
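Standard URL resolution gives the same result; for example, with Python's urljoin:

```python
from urllib.parse import urljoin

# Resolve the relative variable reference against the URL of the
# current request, as in the example above.
request_url = "https://app.crunch.io/api/datasets/48ffc3/cube/"
resolved = urljoin(request_url, "../variables/9410fc/")
# resolved == "https://app.crunch.io/api/datasets/48ffc3/variables/9410fc/"
```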
Value terms
Terms refer to data values when they include a “value” member. Its value is any individual data value; that is, a value that is addressable by a column and row in the dataset. For example:
{"value": 13}
{"value": [3, 4, 5]}
Note that individual values may themselves be complex arrays or objects, depending on their type. You may explicitly include a “type” member in the object, or let Crunch infer the type. One way to do this is to use the “typeof” function to indicate that the value you’re specifying corresponds to the exact type of an existing variable. See “functions” below for more details.
Column terms
Terms refer to columns (construct them, actually) when they include a “column” member. The value is an array of data values. You may include “type” and/or “references” members as well.
{"column": [1, 2, 3, 17]}
{"column": [{"?": -2}, 1, 4, 1], "type": {"class": "categorical", "categories": [...], ...}}
Function terms
Terms refer to functions (and operators) when they include a “function” member. The value is the identifier for the desired function. They parameterize the function with an “args” member, whose value is an array of terms, one for each argument. Examples:
{"function": "==", "args": [{"variable": "../variables/X/"}, {"value": 13}]}
{"function": "contains", "args": [{"variable": "../joins/abcd/variables/Y/"}, {"value": "foo"}]}
You may include a “references” member to provide a name, alias, description, etc to the output of the function.
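Because expressions are plain JSON, they compose well programmatically. The sketch below builds the expression `X == 13 or X == 14` out of nested function terms; the `func` helper and the variable URL are illustrative, not part of the API:

```python
import json

def func(name, *args):
    """Build a Crunch function term: {"function": ..., "args": [...]}."""
    return {"function": name, "args": list(args)}

var_x = {"variable": "../variables/X/"}
expr = func("or",
            func("==", var_x, {"value": 13}),
            func("==", var_x, {"value": 14}))
print(json.dumps(expr, indent=1))
```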
Supported functions
Here is a list of all functions available for Crunch expressions. Note that these functions can be used in conjunction to compose more complex expressions.
Binary functions

| Function | Description |
| --- | --- |
| `+` | add |
| `-` | subtract |
| `*` | multiply |
| `/` | divide |
| `//` | floor division |
| `^` | power |
| `%` | modulus |
| `&` | bitwise and |
| `\|` | bitwise or |
| `~` | invert |
Builtin functions

| Function | Description |
| --- | --- |
| `array` | Return the given Frame materialized as an array. |
| `as_selected` | Return the given variable reduced to the [1, 0, -1] “selections” type. |
| `bin` | Return column’s values broken into equidistant bins. |
| `case` | Evaluate the given conditions in order, selecting the corresponding choice. |
| `cast` | Return a Column of column’s values cast to the given type. |
| `char_length` | Return the length of each string (or missing reason) in the given column. |
| `copy_variable` | Return a copy of the column with a copy of its metadata. |
| `combine_categories` | Return a column of categories combined according to the category_info. |
| `combine_responses` | Combine the given categorical variables into a new one. |
| `current_batch` | Return the batch_id of the current frame. |
| `get` | Return a subvariable from the given column. |
| `lookup` | Map each row of source through its keys index to a corresponding value. |
| `missing` | Return the given column as missing for the given reason. |
| `normalize` | Return a Column with the given values normalized so sum(c) == len(c). |
| `row` | Return a Numeric column with row indices. |
| `selected_array` | Return a bool Array from the given categorical, plus None/none/any. |
| `selected_depth` | Return a numeric column containing the number of selected categories in each row of the given array. |
| `selections` | Return the given array, reduced to the [1, 0, -1] “selections” type, plus an `__any__` magic subvariable. |
| `subvariables` | Return a Frame containing subvariables of the given array. |
| `tiered` | Return a variable formed by collapsing the given array’s subvariables in the given category tiers. |
| `typeof` | Return (a copy of) the Type of the given column. |
| `unmissing` | Return the given column with user missing replaced by valid values. |
Comparisons

| Function | Description |
| --- | --- |
| `==` | equals |
| `!=` | not equals |
| `=><=` | between |
| `between` | between |
| `<` | less than |
| `>` | greater than |
| `<=` | less than or equal |
| `>=` | greater than or equal |
| `in` | in |
| `all` | True for each row where all subvariables in a multiple_response array are selected |
| `any` | True for each row where any subvariable in a multiple_response array is selected |
| `is_none_of_the_above` | True for each row where no subvariables in a multiple_response array are selected, unless all subvariables have missing values |
| `contains` | Return a mask where A is an element of array B, or a key of object B. |
| `icontains` | Case-insensitive version of `contains` |
| `~=` | Compare against a regular expression (regex) |
| `and` | logical and |
| `or` | logical or |
| `not` | logical not |
| `is_valid` | Boolean array of rows which are valid for the given column |
| `is_missing` | Boolean array of rows which are missing for the given column |
| `any_missing` | Boolean array of rows where any of the subvariables are missing |
| `all_valid` | Boolean array of rows where all of the subvariables are valid |
| `all_missing` | Boolean array of rows where all of the subvariables are missing |
Date Functions

| Function | Description |
| --- | --- |
| `default_rollup_resolution` |  |
| `datetime_to_numeric` | Convert the given datetime column to numeric. |
| `format_datetime` | Convert datetime values to strings using the fmt as strftime mask. |
| `numeric_to_datetime` | Convert the given numeric column to datetime with the given resolution. |
| `parse_datetime` | Parse string to datetime using optional format string. |
| `rollup` | Return column’s values (which must be type datetime) into calendrical bins. |
Frame Functions

| Function | Description |
| --- | --- |
| `page` | Return the given frame, limited/offset by the given values. |
| `select` | Return a Frame of results from the given map of variables. |
| `sheet` | Return the given frame, limited/offset in the number of variables. |
| `dependents` | Return the given frame with only dependents of the given variable. |
| `deselect` | Return a frame NOT including the indicated variables. |
| `adapt` | Return the given frame adapted to the given to_key. |
| `join` | Return a JoinedFrame from the given list of subframes. |
| `find` | Return a Frame with those variables which match the given criteria. |
| `flatten` | Return a frame including all variables, plus all subvariables at dotted ids. |
Examples

- select: Receives an argument which is a map expression of the following shape:

{
    "function": "select",
    "args": [{
        "map": {
            <destination id>: {variable: <source frame id>},
            <destination id>: {variable: <source frame id>},
            ...
        }
    }]
}

Here <destination id> is the ID that the mapped variable will have on the resulting frame, and <source frame id> indicates which variables to select from the frame this function is applied to.

- deselect: Same as select, but excludes the mentioned variable IDs from the source frame. In this usage, the <destination id> part of the map argument is disregarded.

{
    "function": "deselect",
    "args": [{
        "map": {
            <destination id>: {variable: <source frame id>},
            <destination id>: {variable: <source frame id>},
            ...
        }
    }]
}
Measures Functions

| Function | Description |
| --- | --- |
| `cube_count` |  |
| `cube_distinct_count` |  |
| `cube_max` | A measure which returns the maximum value in a column. |
| `cube_mean` |  |
| `cube_min` | A measure which returns the minimum value in a column. |
| `cube_missing_frequencies` | Return an object with parallel 'code' and 'count' arrays. |
| `cube_quantile` |  |
| `cube_stddev` | A measure which returns the standard deviation value in a column. |
| `cube_sum` |  |
| `cube_valid_count` |  |
| `cube_weighted_max` |  |
| `cube_weighted_min` |  |
| `top` | Return the given (1D/1M) cube, filtered to its top N members. |
Cube Functions

| Function | Description |
| --- | --- |
| `autocube` | Return a cube crossing A by B (which may be None). |
| `autofreq` | Return a cube of frequencies for A. |
| `cube` | Return a Cube instance from the given arguments. |
| `each` | Yield one expression result per item in the given iterable. |
| `multitable` | Return cubes for each target variable crossed by None + each template variable. |
| `transpose` | Transpose the given cube, rearranging its (0-based) axes to the given order. |
| `stack` | Return a cube of 1 more dimension formed by stacking the given array. |
Filter terms
Terms that refer to filter entities by URL are shorthand for the boolean expression stored in the entity. So, {"filter": "../filters/59fc4d/"}
yields the Crunch expression contained in the Filter entity’s “expression” attribute. Filter terms can be combined together with other expressions as well. For example, {"function": "and", "args": [{"filter": "../filters/59fc4d/"}, {"function": "==", "args": [{"variable": "../variables/X/"}, {"value": 13}]}]}
would “and” together the boolean expression in filter 59fc4d with the X == 13
expression.
Documents
Shoji
Most representations returned from the API are Shoji Documents. Shoji is a media type designed to foster scalable APIs. Shoji is built with JSON, so any JSON parser should be able to at least deserialize Shoji documents. Shoji adds four document types: Entity, Catalog, View, and Order.
Entity
Anything that can be thought of as “a thing by itself” will probably be represented by a Shoji Entity Document. Entities possess a “body” member: a JSON object where each key/value pair is an attribute name and value. For example:
{
"element": "shoji:entity",
"self": "https://.../api/users/1/",
"description": "Details for a User.",
"specification": "https://.../api/specifications/users/",
"fragments": {
"address": "address/"
},
"body": {
"first_name": "Genghis",
"last_name": "Khan"
}
}
In general, an HTTP GET to the “self” URL will return the document, and a PUT of the same will update it. PUT should not be used for partial updates; use PATCH for that instead. In general, each member included in the “body” of a PATCH message will replace the current representation; attributes not included will not be altered. There is no facility to remove an attribute from an Entity.body via PATCH. In some cases, however, even finer-grained control is possible via PATCH; see the Endpoint Reference for details.
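These PATCH semantics amount to a shallow merge: members present in the message replace the stored values, and everything else is left alone. A minimal sketch of that rule (client-side simulation only, not part of the API):

```python
def apply_entity_patch(body, patch):
    """Simulate Shoji Entity PATCH semantics: members in `patch`
    replace current values; members not included are untouched.
    (Attributes cannot be removed this way.)"""
    merged = dict(body)
    merged.update(patch)
    return merged

body = {"first_name": "Genghis", "last_name": "Khan"}
print(apply_entity_patch(body, {"first_name": "Temujin"}))
# {'first_name': 'Temujin', 'last_name': 'Khan'}
```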
Catalog
Catalogs collect or contain entities. They act as an index to a collection, and indeed possess an “index” member for this:
{
"element": "shoji:catalog",
"self": "https://.../api/users/",
"description": "A list of all the users.",
"specification": "https://.../api/specifications/users/",
"orders": {
"default": "default_order/"
},
"index": {
"2/": {"active": true},
"1/": {"active": false},
"4/": {"active": true},
"3/": {"active": true}
}
}
Each key in the index is a URL (possibly relative to “self”) which refers to a different resource. Often, these are Shoji Entity documents, but not always. The index also allows some attributes to be published as a set, rather than in each individual Entity. This allows clients to act on the collection as a whole, such as when rendering a list of references from which the user might select one entity.
In general, an HTTP GET to the “self” URL will return the document, and a PUT of the same will update it. Many catalogs allow POST to add a new entity to the collection. PUT should not be used for partial updates; use PATCH for that instead. In general, each member included in the “index” of a PATCH message will replace the current representation; tuples not included will not be altered. Tuples included in a PATCH which are not present in the server’s current representation of the index may be added; it is up to each resource whether to support (and document!) this approach or prefer POST to add entities to the collection. In general, catalogs that contain entities get new entities created by POST, while catalogs that collect entities that are contained by other catalogs (e.g. a catalog of users who have permissions on a dataset) will have entities added by PATCH.
Similarly, removing entities from catalogs is supported in one of two ways, typically varying by catalog type. For catalogs that contain entities, entities are removed only by DELETE on the entity’s URL (its key in the Catalog.index). In contrast, for catalogs that collect entities, entities are removed by PATCHing the catalog with a null
tuple. This removes the entity from the catalog but does not delete the entity (which is contained by a different catalog).
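The index PATCH rules can be sketched the same way: tuples in the PATCH replace or add entries, and a null tuple drops the entry from the catalog without deleting the entity (client-side simulation only):

```python
def apply_catalog_patch(index, patch):
    """Simulate Shoji Catalog index PATCH semantics: a null (None)
    tuple removes the entry; any other tuple replaces or adds it;
    entries not mentioned are untouched."""
    merged = dict(index)
    for url, tup in patch.items():
        if tup is None:
            merged.pop(url, None)
        else:
            merged[url] = tup
    return merged

index = {"1/": {"active": False}, "2/": {"active": True}}
print(apply_catalog_patch(index, {"1/": None, "3/": {"active": True}}))
# {'2/': {'active': True}, '3/': {'active': True}}
```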
View
Views cut across entities. They can publish nearly any arrangement of data, and are especially good for exposing arrays of arrays and the like. In general, a Shoji View is read-only, and only a GET will be successful.
Order
Orders can arrange any set of strings into an arbitrarily-nested tree; most often, they are used to provide one or more orderings of a Catalog’s index. For example, each user may have their own ordering for an index of variables; the same URLs from the index keys are arranged in the Order. Given the Catalog above, for example, we might produce an Order like:
{
"element": "shoji:order",
"self": "https://.../api/users/order/",
"graph": [
"2/",
{"group A": ["1/", "3/", "2/"]},
{"group B": ["4/"]}
]
}
This represents the tree:

       /   |    \
      2   {A}   {B}
         / | \    \
        1  3  2    4
The Order object itself allows lots of flexibility. Each of the following decisions are up to the API endpoint to constrain or not as it sees fit (see the Endpoint Reference for these details):
- Not every string in the original set has to be present, allowing partial orders.
- Strings from the original set which are not mentioned may be ignored, or default to an “ungrouped” group, or other behaviors as each application sees fit.
- Groups may contain member strings and other groups interleaved (but still ordered).
- Groups may exist without any members.
- Members may appear in more than one group.
- Group names may be repeated at different points within the tree.
- Group member arrays, although represented in a JSON array, may be declared to be non-strict in their order (that is, the array should be treated more like an unordered set).
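A client that only needs a flat ordering can walk the graph depth-first. A sketch (note that members may appear more than once, as in the example above, so duplicates are preserved):

```python
def flatten_order(graph):
    """Depth-first flatten of a shoji:order graph into a list of keys.
    Plain strings are members; single-key objects are named groups."""
    out = []
    for node in graph:
        if isinstance(node, dict):
            # A group: recurse into each group's member array
            for members in node.values():
                out.extend(flatten_order(members))
        else:
            out.append(node)
    return out

graph = ["2/", {"group A": ["1/", "3/", "2/"]}, {"group B": ["4/"]}]
print(flatten_order(graph))
# ['2/', '1/', '3/', '2/', '4/']
```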
Crunch Objects
Most of the other representations returned from the API are Crunch Objects. They are built with JSON, so any JSON parser should be able to at least deserialize Crunch documents. Crunch adds two document types: Table and Cube.
Table
Tables collect columns of data and (optionally) their metadata into two-dimensional relations.
{
"element": "crunch:table",
"self": "https://.../api/datasets/.../table/?limit=7",
"description": "The data belonging to this Dataset.",
"metadata": {
"1ef0455": {"name": "Education", "type": "categorical", "categories": [...], ...},
"588392a": {"name": "Favorite color", "type": "text", ...}
},
"data": {
"1ef0455": [6, 4, 7, 7, 3, 2, 1],
"588392a": ["green", "red", "blue", "Red", "RED", "pink", " red"]
}
}
Each key in the “data” member is a variable identifier, and its corresponding value is a column of Crunch data values. The data values in a given column are homogeneous, but across columns they are heterogeneous. The lengths of all columns MUST be the same. The “metadata” member is optional; if given, it MUST contain matching keys that correspond to variable definitions.
Like any JSON object, the “data” and “metadata” objects are explicitly unordered. When supplying a crunch:table, such as when POSTing to datasets/ to create a new dataset, you must supply an Order if you want an explicit variable order.
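The two invariants above (all columns the same length; metadata keys matching data keys when metadata is present) are easy to check client-side before sending a table. A minimal sketch:

```python
def validate_table(table):
    """Check crunch:table invariants: every column has the same
    length, and metadata keys (if given) match the data keys."""
    data = table["data"]
    lengths = {len(col) for col in data.values()}
    if len(lengths) > 1:
        raise ValueError("columns must all be the same length")
    if "metadata" in table and set(table["metadata"]) != set(data):
        raise ValueError("metadata keys must match data keys")
    return True

table = {
    "element": "crunch:table",
    "metadata": {"1ef0455": {"type": "numeric"}, "588392a": {"type": "text"}},
    "data": {"1ef0455": [6, 4, 7], "588392a": ["green", "red", "blue"]},
}
print(validate_table(table))
# True
```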
Cube
Cubes have both input and output formats. The “crunch:cube” element is used for the output only.
Cube input
The input format may vary slightly according to the API endpoint (since some parameters may be inherent in the particular resource), but involves the same basic ingredients.
Example:
{
"dimensions": [
{"variable": "datasets/ab8832/variables/3ffd45/"},
{"function": "+", "args": [{"variable": "datasets/ab8832/variables/2098f1/"}, {"value": 5}]}
],
"measures": {
"count": {"function": "cube_count", "args": []}
}
}
dimensions
An array of input expressions. Each expression contributes one dimension to the output cube. The only exception is when a dimension results in a boolean (true/false) column, in which case the data are filtered by it as a mask instead of adding a dimension to the output.
When a dimension is added, the resulting axis consists of distinct values rather than all values. Variables which are already “categorical” or “enumerated” will simply use their “categories” or “elements” as the extent. Other variables form their extents from their distinct values.
measures
A set of cube functions to populate each cell of the cube. You can request multiple functions over the same dimensions (such as “cube_mean” and “cube_stddev”) or more commonly just one (like “cube_count”). Each member MUST be a ZZ9 cube function designed for the purpose. See ZZ9 User Guide:Cube Functions for a list of such functions and their arguments.
filters
An array containing references to filters that need to be applied to the dataset before starting the cube calculations. It can be an empty array or null, in which case no filtering will be applied.
weight
A reference to a variable to be used as the weight on all cube operations.
Cube output
Cubes collect columns of measure data in an arbitrary number of dimensions. Multiple measures in the same cube share dimensions, effectively overlaying each other. For example, a cube might contain a “count” measure and a “mean” measure with the same shape:
{
"element": "crunch:cube",
"n": 210,
"missing": 12,
"dimensions": [
{"references": {"name": "A", ...}, "type": {"class": "categorical", "categories": [{"id": 1, ...}, {"id": 2, ...}, {"id": 3, ...}]}},
{"references": {"name": "B", ...}, "type": {"class": "categorical", "categories": [{"id": 11, ...}, {"id": 12, ...}]}}
],
"measures": {
"count": {
"metadata": {"references": {}, "type": {"class": "numeric", "integer": true, ...}},
"data": [10, 20, 30, 40, 50, 60],
"n_missing": 12
},
"mean": {
"metadata": {"references": {}, "type": {"class": "numeric", ...}},
"data": [3.5, 17.8, 9.9, 7.32, 0, 23.4],
"n_missing": 12
}
},
"margins": {
"data": [210],
"0": {"data": [30, 70, 110]},
"1": {"data": [90, 120]}
}
}
dimensions
The “dimensions” member is the most straightforward: an array of variable Definition objects. Each one defines an axis of the cube’s output. This may be different from the input dimensions’ definitions. For example, when counting numeric variables, the input dimension might be an expression involving the bin builtin function. Even though the input variable is of type “numeric”, the output dimension would be of type “enum”.
n
The number of rows considered for all measures.
measures
The “measures” member includes one object for each measure. The “metadata” member of each tells you the name, type and other definitions of the measure. The “data” member of each is a flattened array of values for that measure; the dimensions stride into that array in order, with the last dimension varying the fastest. In the example above, the first dimension (“A”) has 3 categories, while “B” has 2; therefore, the “flat” array [10, 20, 30, 40, 50, 60] for the “count” measure is interpreted as the “unflattened” array [[10, 20], [30, 40], [50, 60]]. Graphically:
|     | B:11 | B:12 |
| --- | ---  | ---  |
| A:1 | 10   | 20   |
| A:2 | 30   | 40   |
| A:3 | 50   | 60   |
This is known in NumPy and other domains as “C order” (versus “Fortran order” which would be interpreted as [[10, 30, 50], [20, 40, 60]] instead).
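Unflattening is just a matter of striding through the flat array in row-major (“C”) order. For the 3×2 example above:

```python
def unflatten(flat, shape):
    """Unflatten a C-ordered (row-major) flat list into nested lists
    with the given dimension sizes."""
    if len(shape) == 1:
        return list(flat)
    stride = len(flat) // shape[0]
    return [unflatten(flat[i * stride:(i + 1) * stride], shape[1:])
            for i in range(shape[0])]

# 3 categories of A by 2 categories of B, as in the example cube
print(unflatten([10, 20, 30, 40, 50, 60], [3, 2]))
# [[10, 20], [30, 40], [50, 60]]
```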
n_missing
The number of rows that are missing for this measure. Because different measures may have different inputs (the column to take the mean of, for example, or weighted versus unweighted), this number may vary from one measure to another even though the total “n” is the same for all.
margins
The “margins” member is optional. When present, it is a tree of nested margins with one level of depth for each dimension. At the top, we always include the “grand total” for all dimensions. Then, we include a branch for each axis we “unroll”. So, for example, for a 3-dimensional cube of X, Y, and Z, the margins member might contain:
{
"margins": {
"data": [4526],
"0": {
"data": [1755, 2771],
"1": {"data": [
[601, 370, 322, 269, 147, 46],
[332, 215, 596, 523, 437, 668]
]},
"2": {"data": [[1198, 557], [1493, 1278]]}
},
"1": {
"data": [933, 585, 918, 792, 584, 714],
"0": {"data": [
[601, 370, 322, 269, 147, 46],
[332, 215, 596, 523, 437, 668]
]},
"2": {"data": [
[825, 108], [560, 25], [325, 593],
[417, 375], [191, 393], [373, 341]
]}
},
"2": {
"data": [2691, 1835],
"0": {"data": [[1198, 557], [1493, 1278]]},
"1": {"data": [
[825, 108], [560, 25], [325, 593],
[417, 375], [191, 393], [373, 341]
]}
}
}
Again, each branch in the tree is an axis we “unroll” from the grand total. So margins[0][2] contains the margin where X (axis 0) and Z (axis 2) are unrolled, and only Y (axis 1) is still “rolled up”.
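Reading a particular margin out of the tree is then a matter of descending one key per unrolled axis. Using an abridged version of the X, Y, Z example above:

```python
# Abridged margins tree from the 3-D example above
margins = {
    "data": [4526],
    "0": {
        "data": [1755, 2771],
        "2": {"data": [[1198, 557], [1493, 1278]]},
    },
    "2": {"data": [2691, 1835]},
}

def margin(margins, *axes):
    """Walk the margins tree, unrolling the given (0-based) axes in order."""
    node = margins
    for axis in axes:
        node = node[str(axis)]
    return node["data"]

print(margin(margins))        # grand total: [4526]
print(margin(margins, 0, 2))  # X and Z unrolled: [[1198, 557], [1493, 1278]]
```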