This is a presentation by Peter Sefton, Michael Lynch, Liz Stokes and Gerard Devine, delivered at eResearch Australasia 2018 by Peter Sefton.

Launching DataCrate v1.0: a general purpose data packaging format for research data distribution and web-display
### Notes - Slide 1 In this presentation we will launch version 1.0 of the DataCrate standard. The presentation will cover:
- The motivation for this work, and prior art: why we needed to bring together the standards we did in the way that we did.
- A walk-through of example data crates from a variety of sources: speleology, clinical trials, simulation, social history, environmental science and microbiology.
- An introduction to tools for making data crates, with an appeal to attendees to join us in making more tools for more kinds of data.
- A demonstration of how DataCrates are being used at UTS to move data through the research lifecycle, archiving and publishing data.
### Notes - Slide 2 The following people contributed to this presentation: Peter Sefton, Michael Lynch, Liz Stokes and Gerard Devine.

💻 + 💾 + 📦 = 🙅
### Notes - Slide 3 There were no existing generic data packaging standards with both human- and machine-readable metadata. (🙅 is FACE WITH NO GOOD GESTURE.)

Motivation: package data with maximum useful context
Who …  made it? funded the work? 
What … format are these files? … is the research about?
Where … was it collected? … is it about? 
Why … was it done?  … <link to publication>
How … were these files created? … can I repeat that process?
### Notes - Slide 4 Our motivation was to be able to display and distribute data sets with useful "who, what, where" metadata in a way that is easy for coders to target, and easy for researchers to consume, both as readers and as programmers who might want to run code against a data set.

### Notes - Slide 5 We have a growing list of examples.

### Notes - Slide 6 DataCrate provides human-readable [HTML data about files](,5/6,/14/,j/pg/index.html) including detailed metadata.

Ability to describe file provenance
### Notes - Slide 7 This slide shows a CreateAction, where an `instrument` - a [Lidar scanner]( - was used by an `agent` - the person - to create two files.

Software can be an instrument too
### Notes - Slide 8 [This]( shows a software package (`instrument`) acting on a file (`object`) used to create another file (`result`) - a sepia version of a picture.

All metadata is available in JSON-LD
### Notes - Slide 9 DataCrates [contain metadata in JSON-LD](
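
As an illustration of what such metadata can look like, here is a minimal sketch in Python that builds a JSON-LD graph for the sepia example from the previous slide. The identifiers, file names and the `@vocab` context are assumptions made for this sketch; it is not a copy of a real DataCrate catalog. The `agent`, `instrument`, `object` and `result` properties are the schema.org action properties named in these notes.

```python
import json

# Hypothetical entities; all @id values and names are illustrative only.
person = {"@id": "#person1", "@type": "Person", "name": "Example Person"}
software = {"@id": "#imagetool", "@type": "SoftwareApplication", "name": "Example image tool"}
photo = {"@id": "pics/photo.jpg", "@type": "MediaObject", "name": "Original photo"}
sepia = {"@id": "pics/sepia_photo.jpg", "@type": "MediaObject", "name": "Sepia version"}

# A CreateAction links who did it, with what, to what, and what came out.
action = {
    "@id": "#sepia-conversion",
    "@type": "CreateAction",
    "agent": {"@id": person["@id"]},
    "instrument": {"@id": software["@id"]},
    "object": {"@id": photo["@id"]},
    "result": {"@id": sepia["@id"]},
}

catalog = {
    # Assumption: a bare schema.org vocabulary mapping, not the DataCrate context.
    "@context": {"@vocab": "http://schema.org/"},
    "@graph": [person, software, photo, sepia, action],
}

catalog_json = json.dumps(catalog, indent=2)
print(catalog_json)
```

Because entities refer to each other by `@id`, the graph stays flat and each item is described once, however many actions mention it.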

... so relationships can be visualized
### Notes - Slide 10 Why do we want machine-readable data? One reason would be to generate visualisations that help people understand relationships in the data set. Here’s a demo I coded up in about half an hour before the conference that shows how we might visualise the way files are created. It shows a Person (me) who is the agent in two CreateActions: one where the `instrument` is a camera/lens combination, the `object` is the place being pictured and the result is a file, and one where the `object` is that file, the `instrument` is a software package and the result is a sepia version of the original photo.
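
A visualisation of this kind can be driven directly from the JSON-LD. The following is a minimal sketch, not the demo shown in the talk: it walks a list of graph items and emits Graphviz DOT text with one edge per `agent`, `instrument`, `object` and `result` link. The function name `actions_to_dot` and the sample data are hypothetical, and the sketch assumes names contain no double quotes.

```python
def actions_to_dot(graph):
    """Turn CreateAction entries in a JSON-LD @graph into Graphviz DOT text."""
    # Map each @id to a display name, falling back to the @id itself.
    names = {item["@id"]: item.get("name", item["@id"]) for item in graph}

    def label(ref):
        return names.get(ref["@id"], ref["@id"])

    lines = ["digraph provenance {"]
    for item in graph:
        if item.get("@type") != "CreateAction":
            continue
        action = names[item["@id"]]
        # Inputs point at the action; the action points at its result.
        for prop in ("agent", "instrument", "object"):
            if prop in item:
                lines.append(f'  "{label(item[prop])}" -> "{action}" [label="{prop}"];')
        if "result" in item:
            lines.append(f'  "{action}" -> "{label(item["result"])}" [label="result"];')
    lines.append("}")
    return "\n".join(lines)


# Illustrative data mirroring the two actions described in the notes.
graph = [
    {"@id": "#me", "@type": "Person", "name": "Photographer"},
    {"@id": "#camera", "@type": "IndividualProduct", "name": "Camera and lens"},
    {"@id": "#place", "@type": "Place", "name": "Place pictured"},
    {"@id": "#editor", "@type": "SoftwareApplication", "name": "Image editor"},
    {"@id": "photo.jpg", "@type": "MediaObject", "name": "photo.jpg"},
    {"@id": "sepia.jpg", "@type": "MediaObject", "name": "sepia.jpg"},
    {"@id": "#take", "@type": "CreateAction", "name": "Take photo",
     "agent": {"@id": "#me"}, "instrument": {"@id": "#camera"},
     "object": {"@id": "#place"}, "result": {"@id": "photo.jpg"}},
    {"@id": "#convert", "@type": "CreateAction", "name": "Convert to sepia",
     "agent": {"@id": "#me"}, "instrument": {"@id": "#editor"},
     "object": {"@id": "photo.jpg"}, "result": {"@id": "sepia.jpg"}},
]

dot_text = actions_to_dot(graph)
print(dot_text)
```

The resulting text can be pasted into any Graphviz renderer to draw the provenance chain from place to photo to sepia copy.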

URIs as names for things
### Notes - Slide 11 Each term used has a link to its definition, eg:

(🔧🔨🔩🔪🔬)ing is an issue for JSON-LD
### Notes - Slide 12 Tooling is a problem. JSON-LD is a great format, but there are no utility libraries for things like looking up context keys.
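
To make "looking up context keys" concrete, here is a minimal sketch of the kind of helper we mean: given a simple, already-loaded JSON-LD context (a plain dictionary), return the IRI that a term maps to. The function name `resolve_term` and the sample context are hypothetical, and the sketch covers only plain string mappings, `@id` term definitions, one level of prefix expansion and an `@vocab` fallback. It is not a full implementation of JSON-LD context processing.

```python
def resolve_term(context, term):
    """Return the IRI a term maps to in a simple JSON-LD context, or None."""
    definition = context.get(term)
    # Expanded term definitions keep the IRI under "@id".
    if isinstance(definition, dict):
        definition = definition.get("@id")
    if definition is None:
        # Fall back to the default vocabulary, if the context declares one.
        vocab = context.get("@vocab")
        return vocab + term if vocab else None
    # Expand compact IRIs such as "schema:name" using a prefix in the context.
    prefix, sep, suffix = definition.partition(":")
    prefix_iri = context.get(prefix)
    if sep and isinstance(prefix_iri, str) and not suffix.startswith("//"):
        return prefix_iri + suffix
    return definition


# Illustrative context only; not the DataCrate context.
example_context = {
    "schema": "http://schema.org/",
    "name": "schema:name",
    "Person": {"@id": "schema:Person"},
    "license": "http://schema.org/license",
}

print(resolve_term(example_context, "name"))
print(resolve_term(example_context, "Person"))
print(resolve_term(example_context, "unknownTerm"))
```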

### Notes - Slide 13 [Calcyte]( uses multi-worksheet spreadsheets for data entry, based on an idea of Mike Lake’s.

### Notes - Slide 14 This works, but it’s not an ideal user interface.

### Notes - Slide 15 Gerard Devine is [developing a tool]( which will allow DataCrate export from the Australian National Data Service-funded HIEv system. HIEv DataCrate: at the Hawkesbury Institute for the Environment at Western Sydney University, a bespoke data capture application (HIEv) harvests a wide range of environmental data (and associated file-level metadata) from both automated sensor networks and analysed datasets generated by researchers. Leveraging built-in APIs within HIEv, a new packaging function has been developed that allows selected datasets to be identified and packaged in the DataCrate standard, complete with metadata automatically exported from the HIEv metadata holdings into JSON-LD. Going forward, this will allow datasets within HIEv to be published regularly and in an automated fashion, in a format that will increase their potential for reuse.

### Notes - Slide 16 Christian Evenhuis is developing a tool for [exporting microscope images]( from [Omero](

### Notes - Slide 17 Chris is working to describe the equipment used in the Microbial Imaging Facility (MIF). Here’s a [page for a microscope](; this is part of work in progress to describe as much of the context of research in MIF as possible.

### Notes - Slide 18 Peter Sefton has developed [code]( to export [Omeka Classic]( repositories to DataCrate. [This]( is an example of one from the University of Western Sydney, curated by Katrina Trewin. This uses the [Portland Common Data Model]( for modelling repository structure. We are using these data sets to help develop an Omeka service based on the [Omeka S]( software, along with data from DSpace extracted using another nascent [code project](

### Notes - Slide 19 Provisioner grew out of two basic requirements, which seem to conflict with one another:
- We want to be able to integrate research data management into the tools researchers actually use to do their research, rather than as an add-on to an existing process (like DC)
- Any such system should give the researchers something besides data management: access to facilities and software, easier publication, etc
- We don’t want to build a monolith, and even if we wanted to build a monolith, we wouldn’t be allowed to: the current mood is SaaS, on-premises only if necessary, no single points of failure
- The UNIX philosophy of small parts, loosely joined, and the idea that data has gravity

It’s standards all the way down
Oxford Common File Layout ← Static file-based repositories
DataCrate builds on BagIt ←  Data packages w/ checksums, content by ref ← Main metadata standard / Repo metadata standard → PCDM
JSON-LD ← Linked data in programmer-friendly format
### Notes - Slide 20
- [Static Oxford Common File Layout](
- [DataCrate](
- [Bagit](
- [](
- [Portland Common Data Model](
- [JSON-LD](
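
To show what "data packages with checksums" means at the bottom of this stack, here is a minimal sketch that writes a bare-bones BagIt-style bag using only the Python standard library: a `data/` payload directory, a `bagit.txt` declaration and a `manifest-sha256.txt` listing one checksum per payload file. The function name `write_minimal_bag` and the sample payload are hypothetical. The sketch follows the BagIt 1.0 layout and omits everything DataCrate adds on top, such as the metadata files, as well as content by reference.

```python
import hashlib
import tempfile
from pathlib import Path


def write_minimal_bag(bag_dir, payload):
    """Write payload files under data/ plus bagit.txt and a SHA-256 manifest."""
    bag = Path(bag_dir)
    data_dir = bag / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for relative_name, content in sorted(payload.items()):
        target = data_dir / relative_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # Each manifest line is "<checksum>  <path relative to the bag root>".
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{relative_name}")

    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag


# Build a throwaway bag in a temporary directory and read back its manifest.
with tempfile.TemporaryDirectory() as tmp:
    bag = write_minimal_bag(Path(tmp) / "example_bag", {"readme.txt": b"hello\n"})
    manifest_text = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    bagit_text = (bag / "bagit.txt").read_text(encoding="utf-8")

print(manifest_text)
```

A recipient can recompute each file's checksum and compare it with the manifest to confirm that the package arrived intact.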

### Notes - Slide 21 The next step is to take this to an international meeting to see if we can get some agreement between projects using similar approaches.

### Notes - Slide 22 [Dataspice]( does a similar thing to DataCrate - they could easily be aligned.

### Notes - Slide 23 [Research Object Bundle]( also tries to package data with JSON-LD data, but in a way that is (we think) more complicated to implement, and without the human-readable web-site embedded in the package.

Help wanted! 
We invite you to:

-  Critique the standard

-  Generate some more sample data sets as a spec for people who will ...

   -  ... write a packaging tool

   -  Export from data management system (eg MyTardis :) 

   -  Write a GUI or web tool for people to create DataCrates

   -  Help add viz to our HTML pages
### Notes - Slide 24 Please help.

### Notes - Slide 25 Please contribute to or use the [spec](