Cloud Collaboration with
Crunch and crunch

![](assets/public-logo.png)
- Instant, visual, collaborative data analysis
- Intuitive GUI for easy exploration: no programming required
- Public REST API and libraries for integration with other domains (R, Python, …), for when you want to code

Why Crunch?

- Interface design limitations of traditional software
- Technical skill required ≫ conceptual complexity of questions
  - E.g. what % of males 18–24 like popcorn at the movies? Is that significantly different from females?

Why Crunch?

Because collaboration with data is hard.

Even harder with…

- Different skill levels
- Different tools/software
- Different objectives

Leads to communication via export/import, copy/paste. Iteration is painful.
[gui application]

Challenge: Interface Design

Single source of truth is great, but
- Different skill levels
- Different concepts
- Different objectives

How do we design interfaces that work for these different audiences?

The crunch package

Idiomatic R interface to cloud service

https://github.com/Crunch-io/rcrunch/

install.packages("crunch")
devtools::install_github("Crunch-io/rcrunch/pkg")

Idiomatic R

- The `data.frame`
- Columnar, heterogeneous types
- Indexing with `$`, `[`, `[[`
- Formulas
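As a reminder of what these idioms look like, here is a small base R sketch. The data frame and its column names are made up for illustration (they echo the popcorn question from earlier) and have nothing to do with any real Crunch dataset.

```r
# Illustrative data only: column names and values are invented.
df <- data.frame(
    age = c(19, 23, 35, 41),
    gender = factor(c("Male", "Female", "Male", "Female")),
    likes_popcorn = c(TRUE, TRUE, FALSE, TRUE)
)

# `$` and `[[` each extract a single column as a vector
ages <- df$age
same_ages <- df[["age"]]

# `[` subsets rows and/or columns and returns a data.frame
young <- df[df$age < 25, c("gender", "likes_popcorn")]

# A formula describes "likes_popcorn by gender" symbolically;
# here it yields the share of each gender that likes popcorn
popcorn_by_gender <- aggregate(likes_popcorn ~ gender, data = df, FUN = mean)
```

An R user expects a remote dataset to respond to these same verbs, which is the expectation the package tries to meet.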

The crunch package

- Uses same public HTTP API as web app
- (Almost) never need to know that
- Presents abstraction that datasets and variables are in local memory
- Design interface around what an R user needs to do, not what HTTP dictates
[using rcrunch]
![](assets/crunch.png)

How?

- Lots of S4 classes and methods
  - `[` sometimes GETs
  - `names<-` calls PATCH
  - `$<-` does POST or PATCH
- … except when I had to use S3
  - `as.data.frame.CrunchDataset`
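To make the verb mapping concrete, here is a toy sketch of the pattern. It is not the crunch package's implementation: the `ToyDataset` class, the `request()` helper, and the in-memory `server` environment are all hypothetical stand-ins, and no real HTTP traffic occurs. It only assumes the `methods` package that ships with R.

```r
library(methods)

# Hypothetical stand-in for the remote service: an environment that stores
# the variables and logs which HTTP verb each operation would have used.
server <- new.env()
server$log <- character(0)
server$variables <- list(v1 = 1:3, v2 = c(2.5, 3.5, 4.5))

request <- function(verb) {
    server$log <- c(server$log, verb)
    invisible(NULL)
}

# Hypothetical class: the R object holds only a label, the data stay "remote"
setClass("ToyDataset", representation(name = "character"))

# Reading a variable is a GET
setMethod("$", "ToyDataset", function(x, name) {
    request("GET")
    server$variables[[name]]
})

# Assigning creates (POST) or updates (PATCH) a variable
setMethod("$<-", "ToyDataset", function(x, name, value) {
    request(if (name %in% names(server$variables)) "PATCH" else "POST")
    server$variables[[name]] <- value
    x
})

# Renaming variables is a PATCH
setMethod("names<-", "ToyDataset", function(x, value) {
    request("PATCH")
    names(server$variables) <- value
    x
})

ds <- new("ToyDataset", name = "example")
ds$v3 <- ds$v1 + 5            # GET for v1, then POST for the new v3
names(ds) <- c("a", "b", "c") # PATCH
server$log
```

The calling code reads like ordinary data frame manipulation; the verbs show up only in the log.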

How?

Focus on interface by test-driving

with(test.authentication, {
   with(test.dataset(df), {
       try(ds$v3a <- ds$v3 + 5)
       test_that("A derived variable is created on the server", {
           expect_true("v3a" %in% names(allVariables(refresh(ds))))
       })
       ...
   })
})

How?

Focus on interface by test-driving

setup.and.teardown <- function (setup, teardown, obj.name=".setup") {
   structure(list(setup=setup, teardown=teardown, obj.name=obj.name),
       class="SUTD")
}

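with.SUTD <- function (data, expr, ...) {
   ## Work in the caller's frame so the test body can see the setup object
   env <- parent.frame()
   ## Register teardown first so it runs even if setup or the body fails
   on.exit(data$teardown())
   assign(data$obj.name, data$setup(), envir=env)
   try(eval(substitute(expr), envir=env))
}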

test.authentication <- setup.and.teardown(
   function () suppressMessages(login()),
   logout)
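A self-contained sketch of how such a context behaves may help. It repeats the two helper definitions so it runs on its own; the `events` log, the `record()` helper, and the `test.fixture` object are hypothetical names introduced only for this illustration.

```r
setup.and.teardown <- function (setup, teardown, obj.name=".setup") {
    structure(list(setup=setup, teardown=teardown, obj.name=obj.name),
        class="SUTD")
}

with.SUTD <- function (data, expr, ...) {
    env <- parent.frame()
    on.exit(data$teardown())
    assign(data$obj.name, data$setup(), envir=env)
    try(eval(substitute(expr), envir=env))
}

# Hypothetical event log so the order of calls is visible
events <- character(0)
record <- function (what) events <<- c(events, what)

test.fixture <- setup.and.teardown(
    setup = function () { record("setup"); data.frame(x = 1:3) },
    teardown = function () record("teardown"),
    obj.name = "fixture"
)

# Normal case: the body sees the object created by setup under "fixture"
with(test.fixture, {
    record("body")
    stopifnot(nrow(fixture) == 3)
})

# Failing case: try() reports the error and teardown still runs
with(test.fixture, stop("boom"))

events
```

Because teardown is registered with `on.exit()` before the body is evaluated, a failing body cannot leave the fixture (or, in the real tests, a login session or a server-side dataset) behind.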

Collaboration: Crunch style

- Make everyone more productive and less frustrated
- Everyone can work at their own level and meet their own needs
- Data analysts spend less time on menial tasks, more time on what they’re good at
- Less latency in communication

Deliver dynamic data, not static reports

Summary

- Collaboration is hard, especially with diverse skills and domains
- Cloud can solve “single source of truth” but interface design problem remains
- Crunch.io provides cloud computing + intelligent interface design for both technical and non-technical audiences

Questions? Comments?