<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>UTS eResearch</title><link href="/" rel="alternate"></link><link href="/feed/atom" rel="self"></link><id>/</id><updated>2020-12-17T00:00:00+11:00</updated><entry><title>De-identifying survey data</title><link href="/2020/12/17/deidentifying_surveys.htm" rel="alternate"></link><published>2020-12-17T00:00:00+11:00</published><updated>2020-12-17T00:00:00+11:00</updated><author><name>Fiona Tweedie</name></author><id>tag:None,2020-12-17:/2020/12/17/deidentifying_surveys.htm</id><summary type="html">&lt;p&gt;Researchers collecting data about individuals have various obligations to meet if they wish to share or publish the data. They may be required to seek written consent from participants to share their data, or ensure that it is de-identified to a point where it can’t be re-identified. The following …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Researchers collecting data about individuals have various obligations to meet if they wish to share or publish the data. They may be required to seek written consent from participants to share their data, or ensure that it is de-identified to a point where it can’t be re-identified. The following gives two case studies of occasions where data custodians thought that they had adequately de-identified data, but close examination found that re-identification may be possible.&lt;/p&gt;
&lt;h2&gt;Defining personal information&lt;/h2&gt;
&lt;p&gt;Under the NSW Privacy and Personal Information Protection Act, &lt;a href="https://www.legislation.nsw.gov.au/view/html/inforce/current/act-1998-133#sec.4"&gt;personal information&lt;/a&gt; is&lt;/p&gt;
&lt;p&gt;&lt;em&gt;“any information or opinion (including information or an opinion forming part of a database and whether or not recorded in a material form) about an individual whose identity is apparent or can reasonably be ascertained from the information or opinion”.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Personal information can include:&lt;/p&gt;
&lt;p&gt;Name, address, email address, phone number, date of birth, photographs, voice and video recordings, biometric information, IP address, location information from a mobile device, and government identifiers such as tax file number, Medicare number and driver’s licence number.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.legislation.nsw.gov.au/view/html/inforce/current/act-1998-133#sec.19"&gt;Sensitive personal information&lt;/a&gt; includes information relating to an individual’s ethnic or racial origin, political opinions, religious or philosophical beliefs, trade union membership or sexual activities. The &lt;a href="https://www.legislation.gov.au/Details/C2020C00025"&gt;Australian Privacy Act&lt;/a&gt; recognises sensitive personal information as a special class of personal information.&lt;/p&gt;
&lt;p&gt;The NSW Health Records Information Protection Act defines &lt;a href="https://www.legislation.nsw.gov.au/view/html/inforce/current/act-2002-071#sec.6"&gt;health information&lt;/a&gt; as personal information that is about an individual’s physical or mental health, health services they have received, genetic information, healthcare identifiers, and information collected in the course of providing health care or donating body parts or organs.&lt;/p&gt;
&lt;h2&gt;De-identification&lt;/h2&gt;
&lt;p&gt;De-identifying data is not a straightforward process, and researchers must take a number of factors into consideration before publishing data. As well as direct identifiers within the dataset itself, contextual information may make it possible to identify individuals. Thus, even where participants consented to the sharing of de-identified data, a dataset with its direct identifiers removed may still not be suitable for publication if contextual information could allow re-identification. Refer to the &lt;a href="https://staff.uts.edu.au/topichub/Pages/Doing%20my%20job/UTS%20Governance/Privacy/Privacy%20principles/De-identification/de-identification.aspx"&gt;de-identification pages&lt;/a&gt; from the privacy office for further guidance.&lt;/p&gt;
&lt;h2&gt;Case study: External data sources&lt;/h2&gt;
&lt;p&gt;In 2016 the Australian Department of Health published a dataset of Medicare and PBS data that researchers at the University of Melbourne were able to demonstrate had not been adequately de-identified. They were able to re-identify records by cross-referencing surgery information with media reports of prominent individuals’ surgeries, and by using the dates on which a woman had given birth. The researchers also found that they could use the provider numbers in the dataset to identify multiple patients of the same medical practice. The Australian Privacy Commissioner found that the Department of Health had committed a privacy breach in publishing the dataset. The important lesson is that an individual need not be identifiable to a casual observer: someone with additional contextual information may still be able to identify them.&lt;/p&gt;
&lt;h2&gt;Case study: A survey of cancer patients&lt;/h2&gt;
&lt;p&gt;Researchers from UTS sought to publish responses to a survey of cancer patients about their experiences. All survey respondents provided information about the type of cancer they had had, when it had been diagnosed, how it had been treated, and their current status. In addition, respondents were given a number of opportunities to provide contextual comments throughout the survey. Close reading of these comments identified several instances of information that could, in context with other information, identify respondents. These included:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Contact details:&lt;/strong&gt; One individual included their email address in their response. Email addresses, along with telephone numbers, are generally considered to be personal information.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dates of military service:&lt;/strong&gt; The dates of someone’s tenure of a job, including military service, can be used in combination with other data to identify them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Treatment centre:&lt;/strong&gt; Information about the specific centre that an individual attended, alongside the information about their cancer type, dates, and treatment could allow identification.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Participation in a support program:&lt;/strong&gt; Several members of a sporting team for cancer survivors mentioned joining this team, making them potentially identifiable to each other. Several other respondents mentioned a specific support or exercise program that they had accessed. Depending on the size of the group or program, this may make them identifiable to other participants.&lt;/p&gt;
&lt;h2&gt;Lessons&lt;/h2&gt;
&lt;p&gt;Researchers need to be alert to the possibility that participants in a study may be identifiable from a dataset even if direct identifiers have been removed. Factors that may make participants identifiable include membership of a small population (e.g., having a rare disease, being a member of a small team in an organisation), or being an outlier in a study (e.g., significantly older than most of the cohort).&lt;/p&gt;
&lt;p&gt;Researchers need to take into consideration the possibility that external information may make an individual identifiable. For example, an athlete whose knee reconstruction was reported in the media may be identifiable in a dataset of knee surgery patients. Members of the study may also be able to identify each other from details in the data, for example as having attended the same program.&lt;/p&gt;
&lt;p&gt;Researchers seeking to publish survey data should pay particular attention to free-text responses when preparing their dataset as participants may include information that could identify them, either directly or in context with other data.&lt;/p&gt;
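&lt;p&gt;As a purely illustrative aid (not part of the original study), a first-pass automated scan can flag free-text responses containing direct identifiers such as email addresses or phone numbers before manual review. The patterns below are assumptions made for this sketch and are no substitute for close reading of the responses:&lt;/p&gt;

```python
import re

# Illustrative patterns only; real reviews need locale-aware rules
# and, above all, human judgement.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    # Rough Australian-style phone numbers (an assumption for the sketch).
    "phone": re.compile(r"(?:\+?61|0)[\s-]?\d(?:[\s-]?\d){7,9}"),
}

def flag_responses(responses):
    """Return (index, kind) pairs for free-text answers that may
    contain a direct identifier and so need manual checking."""
    flags = []
    for i, text in enumerate(responses):
        for kind, pattern in PATTERNS.items():
            if pattern.search(text):
                flags.append((i, kind))
    return flags

flags = flag_responses([
    "The nurses were wonderful.",
    "Contact me at jane@example.com for details.",
])
```

A scan like this only catches obviously structured identifiers; indirect details such as treatment centres or dates of service still require a human reader.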
</content><category term="blog"></category></entry><entry><title>An open, composable standards–based research eResearch platform: Arkisto</title><link href="/2020/11/23/Arkisto.htm" rel="alternate"></link><published>2020-11-23T00:00:00+11:00</published><updated>2020-11-23T00:00:00+11:00</updated><author><name>Peter Sefton</name></author><id>tag:None,2020-11-23:/2020/11/23/Arkisto.htm</id><summary type="html">&lt;p&gt;This is a talk delivered in recorded format by Peter Sefton, Nick Thieberger, Marco La Rosa and Mike Lynch at eResearch Australasia 2020.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide01.png' alt='' title='1' border='1'  width='85%'/&gt;
&lt;p&gt;Research data from all disciplines has interest and value that extends beyond funding cycles and must be managed and preserved for the long term. However, much of …&lt;/p&gt;&lt;/section&gt;</summary><content type="html">&lt;p&gt;This is a talk delivered in recorded format by Peter Sefton, Nick Thieberger, Marco La Rosa and Mike Lynch at eResearch Australasia 2020.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide01.png' alt='' title='1' border='1'  width='85%'/&gt;
&lt;p&gt;Research data from all disciplines has interest and value that extends beyond funding cycles and must be managed and preserved for the long term. However, much of the effort in eResearch goes into building systems which provide functionality and services that operate on data but which actually put data at risk. For instance, loading data into a particular tool often means that the data is not easily retrievable if that tool or service cannot be sustained. At worst, the data is lost.&lt;/p&gt;
&lt;p&gt;In this presentation we will introduce the standards-based Arkisto platform and show a number of examples from multiple disciplines of current Arkisto deployments, including an institutional Research Data Portal, a snapshot of the Expert Nation history project, crowd-sourced data from historical criminology, and the Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC).&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide02.png' alt='' title='2' border='1'  width='85%'/&gt;
&lt;p&gt;Across the sector we build services that operate on data but which actually put data at risk.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide03.png' alt='' title='3' border='1'  width='85%'/&gt;
&lt;p&gt;The Arkisto (https://arkisto-platform.github.io/why/) approach is to work with a set of standards which make data available for long-term access.
The closest emoji I could find to represent standards was this “standard poodle”. Previously I used a toothbrush - on the basis that “standards are like toothbrushes, everyone wants to use their own”.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide04.png' alt='🐩#1: Oxford Common File Layout' title='4' border='1' width='85%'/&gt;
&lt;p&gt;🐩#1: Oxford Common File Layout&lt;/p&gt;
&lt;p&gt;The first of the two core standards is the Oxford Common File Layout (OCFL) to organize data in a repository as a set of files. This approach is scalable indefinitely, and reduces the risk that data will be locked up in monolithic systems.&lt;/p&gt;
&lt;p&gt;This diagram by Mike Lynch shows a series of different sized collections of data, each with a label. The labels (manifests)  in this case are purely about data integrity - and contain checksums. The bundles of data are the next level up as we move on to look at Standard number 2.&lt;/p&gt;
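&lt;p&gt;For illustration only, a minimal OCFL storage root might look like the sketch below (the object path and file names are invented). Each version directory is immutable, and the inventory carries the checksums and version history that the labels in the diagram represent:&lt;/p&gt;

```text
ocfl_root/
├── 0=ocfl_1.0                  # storage-root conformance declaration
└── ab1/cd2/ef34/               # one object; path derived from its id
    ├── 0=ocfl_object_1.0
    ├── inventory.json          # checksums + version history (the "label")
    ├── v1/content/data.csv
    └── v2/content/data.csv     # new version; v1 stays intact
```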
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide05.png' alt='🐩#2: Research Object Crate' title='5' border='1' width='85%'/&gt;
&lt;p&gt;🐩#2: Research Object Crate&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.researchobject.org/ro-crate/"&gt;RO-Crate&lt;/a&gt; is the standard Arkisto uses for packaging and describing data sets. It is based on other standards:&lt;/p&gt;
&lt;p&gt;JSON-LD - an encoding scheme for linked data using the universally accepted JSON format (JavaScript Object Notation) to encode information about Data Entities such as files and folders, and Contextual Entities such as people, places, instruments and data licenses. Files can be described in detail.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://schema.org/"&gt;Schema.org&lt;/a&gt; is used as the main ontology for classes and properties - it has coverage for all the basic Who What Where style metadata and is used by Google’s dataset search and a number of other projects. There are a few terms from other ontologies where Schema.org does not have coverage.&lt;/p&gt;
&lt;p&gt;RO-Crates may also have an HTML human readable summary of data. If you find a stray crate in your downloads folder it is easy to click on the HTML file and get a summary of what’s inside - they can also be hosted on the web using a plain-old webserver.&lt;/p&gt;
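&lt;p&gt;A minimal ro-crate-metadata.json sketch (the names below are invented placeholders) shows the shape: a JSON-LD graph containing a metadata descriptor, a root Dataset, and the data and contextual entities it links to:&lt;/p&gt;

```json
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"}
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example dataset (placeholder)",
      "author": {"@id": "#jane"},
      "hasPart": [{"@id": "data.csv"}]
    },
    {"@id": "#jane", "@type": "Person", "name": "Jane Example"},
    {"@id": "data.csv", "@type": "File", "name": "Survey responses"}
  ]
}
```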
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide06.png' alt='' title='6' border='1' width='85%'/&gt;
&lt;p&gt;This is a screenshot of an RO-Crate in the &lt;a href="https://data.research.uts.edu.au/publication/67222dfddbd57f659ded65a0cd5c70e1/"&gt;UTS data portal&lt;/a&gt;. We are looking at its HTML summary.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide07.png' alt='' title='7' border='1' width='85%'/&gt;
&lt;p&gt;With extensive metadata.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide08.png' alt='TOOLS 🧰 ⚒️ and SPECS' title='8' border='1' width='85%'/&gt;
&lt;p&gt;A growing set of Arkisto-compatible software tools allow data ingest into repositories, and the creation of data discovery portals that connect data to analytical, visualisation and computing tools.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;OCFL Spec: &lt;a href="https://ocfl.io/"&gt;https://ocfl.io/&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Research Object Crate (RO-Crate) Spec: &lt;a href="http://www.researchobject.org/ro-crate"&gt;http://www.researchobject.org/ro-crate&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;UTS: &lt;a href="https://github.com/UTS-eResearch/ro-crate-js"&gt;https://github.com/UTS-eResearch/ro-crate-js&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;UTS OCFL JS Implementation: &lt;a href="https://github.com/uts-eresearch/ocfl-js"&gt;https://github.com/uts-eresearch/ocfl-js&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;UTS RO Crate / SOLR portal: &lt;a href="https://github.com/uts-eresearch/oni-express"&gt;https://github.com/uts-eresearch/oni-express&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;UTS Describo: &lt;a href="https://github.com/UTS-eResearch/describo"&gt;https://github.com/UTS-eResearch/describo&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;UTS Describo Data Packs: &lt;a href="https://github.com/UTS-eResearch/describo-data-packs"&gt;https://github.com/UTS-eResearch/describo-data-packs&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CoEDL OCFL JS implementation: &lt;a href="https://github.com/CoEDL/ocfl-js"&gt;https://github.com/CoEDL/ocfl-js&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CoEDL Modern PARADISEC: &lt;a href="https://github.com/CoEDL/modpdsc"&gt;https://github.com/CoEDL/modpdsc&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CoEDL OCFL tools: &lt;a href="https://github.com/CoEDL/ocfl-tools"&gt;https://github.com/CoEDL/ocfl-tools&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide09.png' alt='' title='9' border='1' width='85%'/&gt;
&lt;p&gt;One important tool is Describo, a desktop (and soon to be online) tool for describing data using the RO-Crate standard. It creates linked-data descriptions that can describe a dataset at the top level, and also individual files or variables inside files.&lt;/p&gt;
&lt;p&gt;There are two projects working on the online version of Describo - one at UTS and one led by CERN working with the European National Research Networks.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide10.png' alt='' title='10' border='1'  width='85%'/&gt;
&lt;p&gt;Describo can be configured for use in specific domains, for example in cultural archives like PARADISEC. This slide shows how users can create entities and link them, and select from pre-defined data loaded in as part of the profile.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide11.png' alt='' title='11' border='1' width='85%'/&gt;
&lt;p&gt;Arkisto currently has two data discovery tools that index the contents of an OCFL repository so humans and machines can discover data and connect to analytical, visualisation and computing tools. This is Michael Lynch’s diagram showing, from the left, how data can be “delivered” to a repository via standard tools (such as rsync) over SSH.&lt;/p&gt;
&lt;p&gt;An indexing process uses Solr (or another index such as Elasticsearch) to build an index of RO-Crate metadata that can then be used for search and faceted browsing over data. On the right of the diagram a user requests access to a dataset while a security guard checks her credentials; the user has the rights to see datasets with a license - * - so in this case the system can serve the content to her.&lt;/p&gt;
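&lt;p&gt;That indexing step can be sketched generically in a few lines (this is an assumption-laden outline, not the Oni or PARADISEC code): walk the repository tree for ro-crate-metadata.json files, pull basic Schema.org fields from each root dataset, and hand the resulting documents to Solr or Elasticsearch:&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path

def crate_index_docs(storage_root):
    """Yield one index document per RO-Crate found under storage_root.
    A generic sketch only; the field names are assumptions."""
    for meta_path in sorted(Path(storage_root).rglob("ro-crate-metadata.json")):
        graph = json.loads(meta_path.read_text())["@graph"]
        # The root dataset is whatever the metadata descriptor is "about".
        descriptor = next(e for e in graph if e["@id"] == "ro-crate-metadata.json")
        root = next(e for e in graph if e["@id"] == descriptor["about"]["@id"])
        yield {
            "path": str(meta_path.parent),
            "name": root.get("name", ""),
            "description": root.get("description", ""),
        }
        # A real indexer would POST each document to the Solr or
        # Elasticsearch update endpoint here.

# Tiny demo: one crate inside a temporary storage root.
content = Path(tempfile.mkdtemp()) / "object1" / "v1" / "content"
content.mkdir(parents=True)
(content / "ro-crate-metadata.json").write_text(json.dumps({
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "name": "Demo crate"},
    ]
}))
docs = list(crate_index_docs(content.parent.parent.parent))
```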
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide12.png' alt='' title='12' border='1'  width='85%'/&gt;
&lt;p&gt;Here is an example from the PARADISEC indexer. As per the Arkisto approach, the PARADISEC site is data-driven - objects are stored on disk in OCFL, using RO-Crate to describe each research object. Indexing tools walk the OCFL filesystem looking for RO-Crates and then, using the crate metadata in addition to the OCFL inventory metadata, construct appropriate indexes into the content. In this example we can see version 1 of this item and the metadata we get from the OCFL inventory.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide13.png' alt='Expert Nation Snapshot' title='13' border='1' width='85%'/&gt;
&lt;p&gt;Expert Nation Snapshot&lt;/p&gt;
&lt;p&gt;This is an example of a faceted search interface constructed using the Oni portal tool developed at UTS. This image shows a data export from the Expert Nation &lt;a href="https://expertnation.org/"&gt;https://expertnation.org/&lt;/a&gt; project (tagline “Universities, War and 1920s &amp;amp; 30s Australia”) led by Associate Professor &lt;a href="https://www.uts.edu.au/staff/tamson.pietsch"&gt;Tamson Pietsch&lt;/a&gt;. Professor Pietsch asked us to create an archival snapshot of the state of the dataset to support a book. We are working with Pietsch’s team to configure the portal to be useful in exporting the data. The SectorName facet is particularly important; it shows that the health sector was by far the biggest employer of returned service people.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide14.png' alt='' title='14' border='1' width='85%'/&gt;
&lt;p&gt;Here is another dataset - this time we are looking at an RO-Crate sitting on a plain old web site (not a search portal). This is a screenshot of a map with a time-window function showing where one Laura Adams was convicted of 42 offences between 1918 and 1942. The power of the Arkisto platform, based on Standards, is that adding this kind of functionality to other collections with geographical features in it is a matter of writing a few simple bits of code. The component can be re-used because both the data and metadata use the RO-Crate standard (which in turn is built on other standards). The data in this demonstrator came from Alana Piper’s &lt;a href="https://criminalcharacters.com/"&gt;Criminal Characters&lt;/a&gt; project.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide15.png' alt='' title='15' border='1' width='85%'/&gt;
&lt;p&gt;This is a screenshot of geographical data about a single offender’s sentences that has been exported into the Time Layered Cultural Map.&lt;/p&gt;
&lt;p&gt;We are working on making this an automated service so that any Arkisto portal can be configured to display relevant geo-data and also to be able to export it for analysis to other tools via APIs including at large scale.&lt;/p&gt;
&lt;p&gt;The researcher, Dr Alana Piper says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Analytical possibilities here would be uploading all offenders in bulk and
comparing the 'range' results to determine what types of offences or other
factors are associated with higher/lower levels of mobility.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide16.png' alt='Modern PARADISEC' title='16' border='1' width='85%'/&gt;
&lt;p&gt;Modern PARADISEC&lt;/p&gt;
&lt;p&gt;A modern catalog driven from OCFL and RO-Crate. This is the landing page built from the content indexed into Elasticsearch. We can see the number of collections, items, contributors and universities at a glance. There are controls for jumping to a specific item or collection and a simple auto-complete search for quickly finding known content. The bottom half is a dynamic list of the most recently updated items.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide17.png' alt='' title='17' border='1'  width='85%'/&gt;
&lt;p&gt;PARADISEC has viewers for various content types: video and audio with time-aligned transcriptions, image set viewers, and document viewers (XML, PDF and Microsoft formats). We are working on making these viewers available across Arkisto sites by having a standard set of hooks for adding viewer plugins to a site as needed.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide18.png' alt='' title='18' border='1'  width='85%'/&gt;
&lt;p&gt;PARADISEC has advanced search and deep indexing into transcriptions, with the ability to play segments directly from the search interface.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide19.png' alt='' title='19' border='1' width='85%'/&gt;
&lt;p&gt;This is another Arkisto-based website - it’s a confidential, access-controlled database of successful grant applications built using an OCFL repository and RO-Crate objects and presented by the Oni portal.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide20.png' alt='' title='20' border='1' width='85%'/&gt;
&lt;p&gt;The Arkisto website has a growing list of use cases for different data pipelines - here’s a sketch of the architecture we’re working on for &lt;a href="https://www.uts.edu.au/staff/shauna.murray"&gt;Associate Professor Shauna Murray’s&lt;/a&gt; group at UTS - managing data from a sensor network in estuaries along the NSW coast.&lt;/p&gt;
&lt;p&gt;See the &lt;a href="https://arkisto-platform.github.io/use-cases/"&gt;Use Cases&lt;/a&gt; page for more.&lt;/p&gt;
&lt;/section&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/Arkisto/Slide21.png' alt='https://arkisto-platform.github.io' title='21' border='1' width='85%'/&gt;
&lt;p&gt;https://arkisto-platform.github.io&lt;/p&gt;
&lt;p&gt;To conclude:&lt;/p&gt;
&lt;p&gt;Arkisto is a flexible research platform which can be used to assemble a variety of data pipelines, for a variety of disciplines.&lt;/p&gt;
&lt;p&gt;The emphasis is on FIRST keeping data safe and re-usable by storing and describing it using standards, so that in the absence of budget and resources to maintain complex virtual labs the data are still available for re-use. We THEN use our growing set of interoperable tools to build data hubs with re-usable data viewer plugins and standards-based interoperable analytical services.&lt;/p&gt;
&lt;p&gt;There are active projects underway at the University of Melbourne and University of Technology Sydney across a wide range of disciplines and we are seeking funding to enhance the platform.&lt;/p&gt;
&lt;/section&gt;
</content><category term="Repositories"></category></entry><entry><title>eResearch Australasia 2019 trip report</title><link href="/2019/11/06/eResearch_2019.htm" rel="alternate"></link><published>2019-11-06T00:00:00+11:00</published><updated>2019-11-06T00:00:00+11:00</updated><author><name>Peter Sefton</name></author><id>tag:None,2019-11-06:/2019/11/06/eResearch_2019.htm</id><summary type="html">&lt;p&gt;By Mike Lynch and Peter Sefton&lt;/p&gt;
&lt;p&gt;Mike Lynch and Peter Sefton attended the 2019 eResearch Australasia conference
in Brisbane from 22-24 October 2019, where we presented a few things - and a
pre-conference summit on the 21st held by the Australian Research Data Commons,
where Mike presented our report from our …&lt;/p&gt;</summary><content type="html">&lt;p&gt;By Mike Lynch and Peter Sefton&lt;/p&gt;
&lt;p&gt;Mike Lynch and Peter Sefton attended the 2019 eResearch Australasia conference
in Brisbane from 22-24 October 2019, where we presented a few things - and a
pre-conference summit on the 21st held by the Australian Research Data Commons,
where Mike presented our report from our small discovery project on scalable
repository technology. UTS paid for the trip.&lt;/p&gt;
&lt;h1&gt;What we presented - our work on Simple Scalable Research Data Repositories&lt;/h1&gt;
&lt;p&gt;We've posted fleshed-out versions of our conference papers as usual. Mike
presented a short version of
&lt;a href="/2019/11/05/eResearch2019_lighting_ocfl_nginx.htm"&gt;ARDC funded work on data repositories&lt;/a&gt;
at both the summit and the conference, and Peter had also put in an abstract
for a
&lt;a href="/2019/11/05/FAIR%20Repo%20-%20eResearch%20Presentation.htm"&gt;longer version&lt;/a&gt;
which is less technically focussed and gives more of the context for why this
work is important.&lt;/p&gt;
&lt;p&gt;Peter &lt;a href="/2019/11/05/RO-Crate%20eResearch%20Australasia%202019.htm"&gt;presented an update on our ongoing work on describing and packaging
research data - this time focussing on the new merged standard Research Object
Crate&lt;/a&gt; (RO-Crate) and looking at what's coming next.&lt;/p&gt;
&lt;h2&gt;Diversity breakfast&lt;/h2&gt;
&lt;p&gt;I (Mike) went to the breakfast given in honour of the late Dr Jacky Pallas, a senior
figure in the eResearch community who had given a keynote at last year's
eResearch Australasia.&lt;/p&gt;
&lt;p&gt;The speaker was Dr Toni Collis, a research software engineer and director of
&lt;a href="https://womeninhpc.org/"&gt;Women in High Performance Computing&lt;/a&gt;, who spoke on how a
lack of diversity damages research, making the point that diverse research and
support teams are demonstrably more effective, and stressing the importance of
equity, diversity and inclusivity in attracting and retaining talent.&lt;/p&gt;
&lt;h2&gt;Research as a primary function of Electronic Health Records&lt;/h2&gt;
&lt;p&gt;Prof Nikolajz Zeps' presentation was one of a couple of talks which sold
themselves as being provocative, in that they were arguing for a loosening of
traditional, restrictive health care ethics and consent practices so that data
could be made more readily available for research. He made the point that data
sharing is very difficult in the Australian healthcare system not just because
of ethical restrictions but because the system is so fragmented. He argued for
an integration of research and clinical data consent and management practices,
which would allow the information flows required for medical research also to
help the health care system itself monitor the effectiveness of treatments and
patient outcomes. Moving the consent process from research ethics to clinical
ethics can make administration simpler.&lt;/p&gt;
&lt;p&gt;(I was less impressed by the other provocative keynote in which the speaker said
&amp;quot;no-one ever died as the result of a health-care data breach&amp;quot;, which I thought
was a bit of posturing, even though it got applause from some of the audience.)&lt;/p&gt;
&lt;h1&gt;Notable presentations&lt;/h1&gt;
&lt;h2&gt;Galaxy Australia and the Australian Bioinformatic Commons&lt;/h2&gt;
&lt;p&gt;These two presentations were part of the bioinformatics stream, about the
Australian node of the global Galaxy workflow and computational platform, and
the Australian BioCommons Pathfinder Project.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="http://conference.eresearch.edu.au/wp-content/uploads/2019/08/2019_eResearch_96_Galaxy-Australia-%E2%80%93-the-Bring-Your-Own-Data-platform.pdf"&gt;Gareth Price, Galaxy Australia - the “Bring Your Own Data” platform enabling multi-omics analysis for biology researchers&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://usegalaxy.org.au/"&gt;Galaxy Australia&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="http://conference.eresearch.edu.au/wp-content/uploads/2019/08/2019_eResearch_53_The-Australian-Bioinformatics-Commons-as-an-exemplar.pdf"&gt;Jeff Christiansen, The Australian Bioinformatics Commons&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="http://www.bioplatforms.com/biocommons/"&gt;The Australian BioCommons Pathfinder Project&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;9 Reproducible Research Things&lt;/h2&gt;
&lt;p&gt;Amanda Miotto from Griffith University's eResearch team presented on
&lt;a href="http://conference.eresearch.edu.au/wp-content/uploads/2019/10/2019-eResearch_-29-_8-Reproducible-Research-things.pdf"&gt;running workshops to introduce researchers to good reproducibility practice&lt;/a&gt; which focussed on immediate benefits, like safeguarding a research team against individual members leaving or falling ill, and practical steps.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://guereslib.github.io/Reproducible-Research-Things/"&gt;The workshop materials are available on GitHub&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Physiome Journal&lt;/h2&gt;
&lt;p&gt;An &lt;a href="http://conference.eresearch.edu.au/wp-content/uploads/2019/08/2019_eResearch_18_Peer-Reviewing-Reproducibility-The-Physiome-Journal.pdf"&gt;interesting presentation by Karin Lundengård of the Auckland Bioengineering Institute&lt;/a&gt; about &lt;a href="https://journal.physiomeproject.org/"&gt;Physiome Journal&lt;/a&gt;, which will publish validated and reproducible mathematical models of physiological processes. The models will also be made available as Jupyter notebooks and shared on the &lt;a href="https://gigantum.com/"&gt;Gigantum&lt;/a&gt; platform.&lt;/p&gt;
&lt;h2&gt;HASS BOF&lt;/h2&gt;
&lt;p&gt;A lot of what we spoke about in this BOF, which was chaired by Ingrid Mason,
echoed what I (Mike) had heard the week before at a Big Data for the Digital
Humanities symposium in Canberra which was organised by the ARDC and AARNet (and
which I should write a blog post about).&lt;/p&gt;
&lt;p&gt;The challenges faced by digital humanities researchers and support staff mirror
one another - researchers are unsure of the right way to engage with technical
staff and vice versa, and good collaboration is too labour-intensive to be
sustainable if it's going to be spread beyond a minority of researchers who are
already linked in to support networks.&lt;/p&gt;
&lt;p&gt;This gave me a kind of wistful feeling about an earlier
&lt;a href="http://conference.eresearch.edu.au/wp-content/uploads/2019/10/2019-eResearch-Underwood-and-Carroll-1.pdf"&gt;keynote from  Dell&lt;/a&gt;
about machine learning in medical science, because the speaker was very
enthusiastic about moving HPC tools out of the realm where researchers needed to
become technology experts to use them at all, into something more like commodity
software. There are, though, some areas of the humanities where this sort of
thing is starting to happen - transcription is one which came up in both
Canberra and Brisbane.&lt;/p&gt;
&lt;h1&gt;Data Discovery&lt;/h1&gt;
&lt;p&gt;I (Peter) chaired a session on Data Discovery with a couple of lead-in
talks that outlined what's going on in the world of generic research data
discovery, leading into a discussion.&lt;/p&gt;
&lt;p&gt;From our viewpoint at UTS it was useful to get confirmation that discovery services are converging on
&lt;a href="https://schema.org"&gt;Schema.org&lt;/a&gt; for high-level description of data sets, for
indexing by other services - which is good, because that's the horse we bet on at UTS.
It's being used by both &lt;a href="https://researchdata.ands.org.au/"&gt;Research Data Australia&lt;/a&gt; (RDA) and
the new player
&lt;a href="https://toolbox.google.com/datasetsearch"&gt;Google dataset search&lt;/a&gt; (that's run by
a tiny team apparently, but it will have a huge impact on how everyone has to
structure their metadata).&lt;/p&gt;
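&lt;p&gt;As a sketch of what that convergence looks like in practice, this is the shape of a minimal Schema.org description of a dataset that such services index - the names and values below are made up for the example, not taken from a real record:&lt;/p&gt;

```javascript
// A minimal, illustrative Schema.org description of a dataset, of the kind
// indexed by Research Data Australia and Google dataset search.
// All values here are invented for the example.
const dataset = {
  '@context': 'https://schema.org/',
  '@type': 'Dataset',
  name: 'Example survey data',
  description: 'De-identified survey responses collected in 2019.',
  license: 'https://creativecommons.org/licenses/by/4.0/',
  creator: { '@type': 'Person', name: 'A. Researcher' },
};

// Published as JSON-LD, typically in a <script type="application/ld+json">
// block on the dataset's landing page
console.log(JSON.stringify(dataset, null, 2));
```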
&lt;p&gt;Amir Aryani (Swinburne) and Melroy Almeida (Australian Access Federation)
&lt;a href="https://conference.eresearch.edu.au/wp-content/uploads/2019/08/2019_eResearch_133_Australian-ORCID-Research-Graph.pdf"&gt;presented on ORCID graph&lt;/a&gt;
looking at collaboration networks. This is testament to the power of using
strong, URI-based identifiers, once you start doing that, metadata changes from
an un-reliable soup of differently spelled ambiguous names, to something you can
do real analytics on.&lt;/p&gt;
&lt;p&gt;Adrian Burton (ARDC, ex ANDS) has been dealing with metadata for a long time - he
put it that the Schema.org approach had won, and suggested that this might be a
bit of a loss. The &lt;a href="https://en.wikipedia.org/wiki/RIF-CS"&gt;RIF-CS&lt;/a&gt; standard that ANDS
inherited and built RDA around had an entity-based model, with Collections,
Parties, Activities and Services (based on
&lt;a href="https://en.wikipedia.org/wiki/ISO_2146"&gt;ISO 2146&lt;/a&gt;) rather than simple flat
name-value metadata. I agree that the entity model was a strength of RIF-CS, but
for those who want to convey rich context about data, Schema.org
with linked data can do everything RIF-CS can, more elegantly and in more
detail. See the work we've been doing on &lt;a href="/2019/11/05/RO-Crate%20eResearch%20Australasia%202019.htm"&gt;RO-Crate&lt;/a&gt;, which takes things to a
deeper level with descriptions of files and (soon) even variables inside files,
including provenance chains (what people and equipment did to &lt;em&gt;make&lt;/em&gt; those files
from observations, or other files).&lt;/p&gt;
&lt;p&gt;The leaders of that session have done a follow-up survey, so I think they'll be
putting out more info soon.&lt;/p&gt;
&lt;h1&gt;Preservation&lt;/h1&gt;
&lt;p&gt;The &lt;a href="/2019/11/05/RO-Crate%20eResearch%20Australasia%202019.htm"&gt;RO-Crate talk&lt;/a&gt; I gave (Peter here) was in a stream on Digital Preservation and data packaging.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://twitter.com/ErinGLib"&gt;Erin Gallant&lt;/a&gt; and
&lt;a href="https://twitter.com/DigiGav"&gt;Gavin Kennedy, also from AARNet&lt;/a&gt; about digital
preservation.&lt;/p&gt;
&lt;img alt="picture of the panel" src="/blog/eResearch2019/panel.png"/&gt;
&lt;p&gt;I was talking, not taking notes, but we discussed what the research community
and cultural collections folks can learn from each other. Actually, I think we
made some of the same mistakes: both the eResearch community and the GLAM sector
invested in big silos which ended up not just storing data, but making it
difficult to move and re-use. To labour the metaphor a bit, silos have small holes
in the bottom, so getting data in and out is slow.&lt;/p&gt;
&lt;p&gt;Mike's diagram of an OCFL Repository shows an alternative approach - instead of
putting data in a container with constricted ingress and egress, lay it all out
in the open. I'm not an expert in preservation systems, but I do know that
that's the approach taken by the open source
&lt;a href="https://www.archivematica.org/en/"&gt;Archivematica&lt;/a&gt; preservation system (note:
I've done a bit of work for Artefactual Systems, which looks after it). It works
as an application that sits &lt;em&gt;beside&lt;/em&gt; a set of files on disk - if needed you can
use the grandparent of all APIs, that is, file operations, to fetch data. All of
the talks we gave, linked above, were about this idea in one way or another.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Picture of data layed-out and labelled in rows (like a field not a
silo) - by Mike Lynch" src="/blog/eResearch2019/field.png"/&gt;&lt;/p&gt;
&lt;h1&gt;Trusted Repository certification - one for the UTS Roadmap&lt;/h1&gt;
&lt;p&gt;I (Peter again) attended
&lt;a href="https://conference.eresearch.edu.au/wp-content/uploads/2019/08/2019_eResearch_106_Trusted-Data-Community.pdf"&gt;a session MCed by Richard Ferrers&lt;/a&gt;
from ARDC with contributions from people from a range of institutions and
repositories who are part of an ARDC community of practice.&lt;/p&gt;
&lt;p&gt;They talked about the CoreTrustSeal repository certification program - and the process of getting certified.&lt;/p&gt;
&lt;p&gt;Here's some background on CTS:&lt;/p&gt;
&lt;blockquote&gt;
&lt;h3&gt;Core Certification and its Benefits&lt;/h3&gt;
&lt;p&gt;Nowadays certification standards are available at different levels, from a core level to extended and formal
levels. Even at the core level, certification offers many benefits to a repository and its stakeholders.
Core certification involves a minimally intensive process whereby data repositories supply evidence that they
are sustainable and trustworthy. A repository first conducts an internal self-assessment, which is then
reviewed by community peers. Such assessments help data communities—producers, repositories, and
consumers—to improve the quality and transparency of their processes, and to increase awareness of and
compliance with established standards. This community approach guarantees an inclusive atmosphere in
which the candidate repository and the reviewers closely interact.&lt;/p&gt;
&lt;p&gt;In addition to external benefits, such as building stakeholder confidence, enhancing the reputation of the
repository, and demonstrating that the repository is following good practices, core certification provides a
number of internal benefits to a repository. Specifically, core certification offers a benchmark for comparison
and helps to determine the strengths and weaknesses of a repository.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Right now at UTS we're in the process of making a new Digital Strategy, aligned with the &lt;a href="https://www.uts.edu.au/about/uts-2027-strategy"&gt;UTS 2027 Strategy&lt;/a&gt; - one of the core goals (which are still evolving so we can't link to them just yet) is to have trusted systems. CTS would be a great way for the IT Department (that's us) to demonstrate to the organisation that we have the governance, technology and operational model in place to run a repository.&lt;/p&gt;
&lt;p&gt;We're talking now about getting at least the first step (self certification) on the 2021 Roadmap - but before that, we'll see if we can join the community discussion and start planning.&lt;/p&gt;
&lt;p&gt;&lt;a rel="license" href="http://creativecommons.org/licenses/by/3.0/au/"&gt;&lt;img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/3.0/au/88x31.png" /&gt;&lt;/a&gt;&lt;br /&gt;This work is licensed under a &lt;a rel="license" href="http://creativecommons.org/licenses/by/3.0/au/"&gt;Creative Commons Attribution 3.0 Australia License&lt;/a&gt;.&lt;/p&gt;
</content><category term="DataCrate, Repositories, eResearch"></category></entry><entry><title>FAIR Simple Scalable Static Research Data Repository</title><link href="/2019/11/05/FAIR%20Repo%20-%20eResearch%20Presentation.htm" rel="alternate"></link><published>2019-11-05T00:00:00+11:00</published><updated>2019-11-05T00:00:00+11:00</updated><author><name>Michael Lynch</name></author><id>tag:None,2019-11-05:/2019/11/05/FAIR Repo - eResearch Presentation.htm</id><summary type="html">&lt;p&gt;This presentation was given by Peter Sefton &amp;amp; Michael Lynch at the eResearch Australasia 2019 Conference in Brisbane, on the 24th of October 2019.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide01.png' alt='FAIR Simple Scalable Static Research Data Repository
Dr Peter Sefton and Michael Lynch
University of Technology Sydney
' title='FAIR Simple Scalable Static Research Data Repository
Dr Peter Sefton and Michael Lynch
University of Technology Sydney
' border='1'  width='85%'/&gt;
&lt;p&gt;Welcome - we’re going to share this presentation. Peter/Petie will talk through the two major standards we’re building on, and Mike will talk about the …&lt;/p&gt;&lt;/section&gt;</summary><content type="html">&lt;p&gt;This presentation was given by Peter Sefton &amp;amp; Michael Lynch at the eResearch Australasia 2019 Conference in Brisbane, on the 24th of October 2019.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide01.png' alt='FAIR Simple Scalable Static Research Data Repository
Dr Peter Sefton and Michael Lynch
University of Technology Sydney
' title='FAIR Simple Scalable Static Research Data Repository
Dr Peter Sefton and Michael Lynch
University of Technology Sydney
' border='1'  width='85%'/&gt;
&lt;p&gt;Welcome - we’re going to share this presentation. Peter/Petie will talk through the two major standards we’re building on, and Mike will talk about the software stack we ended up with.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide02.png' alt='The project in a nutshell
A static, file-based research data repository platform using open standards and off-the-shelf web technology
OCFL – versioned file storage
RO-Crate – dataset / object metadata
Solr – index and discovery
nginx – baked in access control
&lt;p&gt;' title='The project in a nutshell
A static, file-based research data repository platform using open standards and off-the-shelf web technology
OCFL – versioned file storage
RO-Crate – dataset / object metadata
Solr – index and discovery
nginx – baked in access control&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This project is about building highly scalable research data repositories quickly, cheaply and above all sustainably by using Standards for organizing and describing data.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide03.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;We had a grant to continue our OCFL work from the &lt;a href="https://ror.org/038sjwq14"&gt;Australian Research Data Commons&lt;/a&gt;. (I’ve used the new Research Organisation Registry (ROR) ID for ARDC, just because it’s new and you should all check out the ROR).&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide04.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;h3&gt;OCFL Specifications&lt;/h3&gt;
&lt;blockquote&gt;
This Oxford Common File Layout (OCFL) specification describes an application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term object management best practices within digital repositories.
Specifically, the benefits of the OCFL include:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Completeness, so that a repository can be rebuilt from the files it stores&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Parsability, both by humans and machines, to ensure content can be understood in the absence of original software&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Robustness against errors, corruption, and migration between storage technologies&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Versioning, so repositories can make changes to objects allowing their history to persist&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Storage diversity, to ensure content can be stored on diverse storage infrastructures including conventional filesystems and cloud object stores&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Source: &lt;a href="https://ocfl.io"&gt;https://ocfl.io&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide05.png' alt='TODO	
Progressively drill down on the architecture until it’s “just”  a partition.
' title='TODO	
Progressively drill down on the architecture until it’s “just”  a partition.
' border='1'  width='85%'/&gt;
&lt;p&gt;Here’s a screenshot of what an OCFL object looks like - it’s a series of versioned directories, each with a detailed inventory.&lt;/p&gt;
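&lt;p&gt;For readers who can't see the slide, a small OCFL object looks roughly like this - a sketch based on the &lt;a href="https://ocfl.io"&gt;ocfl.io&lt;/a&gt; specification, with illustrative file names:&lt;/p&gt;

```
object_root/
├── 0=ocfl_object_1.0          # conformance declaration file
├── inventory.json             # copy of the latest version's inventory
├── inventory.json.sha512      # checksum sidecar for that inventory
├── v1/
│   ├── inventory.json
│   └── content/
│       └── data.csv
└── v2/
    ├── inventory.json
    └── content/
        └── data.csv           # only changed files are stored; unchanged
                               # content is referenced from v1 in the inventory
```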
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide06.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;One of the standards we are using is RO-Crate - for describing research data sets. I &lt;a href="/2019/11/05/RO-Crate%20eResearch%20Australasia%202019.htm"&gt;presented this at eResearch as well&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide07.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This is an example of an RO-Crate showing that each Crate has a human-readable HTML view as well as a machine readable view.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide08.png' alt='
&lt;p&gt;🖥️
👩🏾‍🔬
' title='&lt;/p&gt;
&lt;p&gt;🖥️
👩🏾‍🔬
' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;The two views (human and machine) of the data are equivalent - in fact the HTML version is generated from the JSON-LD version using a tool called &lt;a href="https://code.research.uts.edu.au/eresearch/CalcyteJS"&gt;CalcyteJS&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide09.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This is a screenshot of work very much in progress - it shows an example of the repository system working at the smallest scale: a single collection, “Farms to Freeways”, a social history project from Western Sydney, which we have exported into RO-Crate format as a demonstration. Each of the participants has been indexed for discovery. In a deployment for an institutional repository, datasets would be indexed at the top level only. The point is to show that this software will be highly configurable.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide10.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;OCFL needs some explaining. I’ve had a couple of conversations with developers
where it takes them a little while to get what it’s for.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide11.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;But they DO get it - the standard is well designed.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide12.png' alt='over to Mike ...
&lt;p&gt;' title='over to Mike ...&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide13.png' alt='Off-the-shelf components
&lt;p&gt;' title='Off-the-shelf components&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Solr is an efficient search engine.&lt;/p&gt;
&lt;p&gt;nginx is an industry-standard scalable web server, used by companies like Dropbox and Netflix.&lt;/p&gt;
&lt;p&gt;Both are standard, open-source, easy to deploy and keep patched: unlike dedicated data repositories, which tend to be fussy and make your server team swear.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide14.png' alt='ocfl-nginx
Resolves a URL:
&lt;p&gt;https://my.repo/3eacb986d1.v4/PATH/TO/FILE.html&lt;/p&gt;
&lt;p&gt;To a file in an ocfl repository:&lt;/p&gt;
&lt;p&gt;/mnt/ocfl/3e/ac/b9/86/d1/v4/content/PATH/TO/FILE.html&lt;/p&gt;
&lt;p&gt;' title='ocfl-nginx
Resolves a URL:&lt;/p&gt;
&lt;p&gt;https://my.repo/3eacb986d1.v4/PATH/TO/FILE.html&lt;/p&gt;
&lt;p&gt;To a file in an ocfl repository:&lt;/p&gt;
&lt;p&gt;/mnt/ocfl/3e/ac/b9/86/d1/v4/content/PATH/TO/FILE.html&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Mapping incoming URLs to the right file in the ocfl repository is straightforward and done with an extension in nginx’s minimal flavour of JavaScript. This slide simplifies things a bit: in real life we have URL ids which solr maps to OIDs and then to a pairtree path.&lt;/p&gt;
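&lt;p&gt;As a rough sketch of that mapping - in plain Node JavaScript rather than nginx's njs, skipping the Solr id-resolution step, and with the function name and repository root invented for the example:&lt;/p&gt;

```javascript
// Illustrative sketch of the URL-to-path mapping described above. The real
// version runs inside nginx as an njs extension and resolves URL ids via
// the Solr index first; names here are assumptions for the example.
function ocflPath(url, root) {
  // e.g. https://my.repo/3eacb986d1.v4/PATH/TO/FILE.html
  const m = url.match(/^https?:\/\/[^/]+\/([0-9a-f]+)\.(v\d+)\/(.+)$/);
  if (!m) return null;
  const [, oid, version, file] = m;
  // Split the object id into a pairtree path: 3eacb986d1 -> 3e/ac/b9/86/d1
  const pairtree = oid.match(/.{1,2}/g).join('/');
  return `${root}/${pairtree}/${version}/content/${file}`;
}

console.log(ocflPath('https://my.repo/3eacb986d1.v4/PATH/TO/FILE.html', '/mnt/ocfl'));
// -> /mnt/ocfl/3e/ac/b9/86/d1/v4/content/PATH/TO/FILE.html
```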
&lt;p&gt;Todo: we want to use the &lt;a href="http://mementoweb.org/guide/howto/"&gt;Memento standard&lt;/a&gt; so that clients can request versioned resources.&lt;/p&gt;
&lt;p&gt;We are also looking at versioned DOIs pointing to versioned URLs and resources.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide15.png' alt='Code
&lt;p&gt;' title='Code&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/UTS-eResearch/ocfl-js"&gt;ocfl-js - a Node library for building and updating OCFL repositories&lt;/a&gt;
&lt;a href="https://github.com/UTS-eResearch/ro-crate-js"&gt;ro-crate-js - a Node library for working with RO-Crates&lt;/a&gt;
&lt;a href="https://github.com/UTS-eResearch/ocfl-nginx"&gt;ocfl-nginx - an extension to nginx allowing it to serve versioned OCFL content&lt;/a&gt;
&lt;a href="https://cloud.docker.com/u/mikelynch/repository/docker/mikelynch/nginx-ocfl"&gt;a Docker image for ocfl-nginx&lt;/a&gt;
&lt;a href="https://code.research.uts.edu.au/eresearch/solr-catalog"&gt;solr-catalog - a Node library for indexing an OCFL repository into a Solr index&lt;/a&gt;
&lt;a href="https://code.research.uts.edu.au/eresearch/data-portal"&gt;data-portal - a single-page application for searching a Solr index&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The codebase is spread across a number of repositories, but that's consistent with the approach - they are all just components which we can deploy as we need them.&lt;/p&gt;
&lt;p&gt;The nginx extension is very small and would be easy to reimplement against another web server.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide16.png' alt='Access control
licences on datasets in RO-Crate
nginx authenticates users
local group service map users to licences
nginx enforces access on search results and payloads
' title='Access control
licences on datasets in RO-Crate
nginx authenticates users
local group service map users to licences
nginx enforces access on search results and payloads
' border='1'  width='85%'/&gt;
&lt;p&gt;This is the most prototypical / primitive part of what we’ve got so far.&lt;/p&gt;
&lt;p&gt;Licences on RO-Crate are indexed in the solr index. nginx authenticates web users, looks up which licences they can access, and applies access control to both search results and payloads.&lt;/p&gt;
&lt;p&gt;At the moment, we’ve got a test server which doesn’t authenticate but which only serves datasets with a public licence and denies access to everything else.&lt;/p&gt;
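&lt;p&gt;One way to picture the search side of that access control is a Solr filter query built from the licences a user is allowed to see. This is a hypothetical sketch: the field name and licence values are assumptions for illustration, not our actual index schema:&lt;/p&gt;

```javascript
// Hypothetical sketch: restrict a Solr search to datasets whose licence the
// user may see. The "licence" field name and the licence values are
// assumptions for the example, not the real schema.
function licenceFilteredQuery(solrBase, userLicences) {
  // Build a filter query like: licence:("public" OR "internal")
  const fq = 'licence:(' + userLicences.map(l => '"' + l + '"').join(' OR ') + ')';
  return solrBase + '/select?q=*:*&fq=' + encodeURIComponent(fq);
}

// An unauthenticated guest only gets datasets with a public licence
console.log(licenceFilteredQuery('http://localhost:8983/solr/data', ['public']));
```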
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide17.png' alt='Access control at its most basic
&lt;p&gt;' title='Access control at its most basic&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;The screenshot on the left is a Solr query showing public and internal licences&lt;/p&gt;
&lt;p&gt;The screenshot on the right is a basic web view of what nginx serves to an unauthenticated guest user - datasets with internal licences aren't shown.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide18.png' alt='Development strategy
Agile development around well-designed data standards pays off
Successful collaborations with PARADISEC and State Library of New South Wales, showing the feasibility and ease-of-use of both OCFL and RO-Crate
It’s worth engaging with standards at the new/evolving stage, even if this requires a bit of running around to keep up
&lt;p&gt;' title='Development strategy
Agile development around well-designed data standards pays off
Successful collaborations with PARADISEC and State Library of New South Wales, showing the feasibility and ease-of-use of both OCFL and RO-Crate
It’s worth engaging with standards at the new/evolving stage, even if this requires a bit of running around to keep up&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Good data standards make incremental development much easier.&lt;/p&gt;
&lt;p&gt;We were able to get real results in one- and two-day workshops with teams from PARADISEC and the State Library of New South Wales, both with large, structured digital humanities collections behind APIs.&lt;/p&gt;
&lt;p&gt;Both the OCFL and RO-Crate standards are new and changing, but agile development means that it’s OK and even productive to keep pace with this and feed back into community consultation.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide19.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;In the last couple of months Marco La Rosa, an independent developer working for PARADISEC, has ported 10,000 data and collection items into RO-Crate format, AND built a portal which can display them. This means that ANY repository with a similar structure (Items in Collections) could easily re-use the code and the viewers for various file types.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide20.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;The Mitchell Collection - digitised public domain books with detailed metadata in METS and specialised OCR standards. We spent a day at the State Library and were able to successfully extract books into directories of JPEGs and metadata, package these using RO-Crate and start building an OCFL repository.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/FAIR Repo - eResearch Presentation/Slide21.png' alt='Acknowledgements and links
UTS: Moises Sacal Bonequi
PARADISEC: Marco La Rosa, Nick Thieberger
State Library of New South Wales: Euwe Ermita
&lt;p&gt;https://ocfl.io/
https://researchobject.github.io/ro-crate/
https://code.research.uts.edu.au/eresearch/solr-catalog/
https://github.com/UTS-eResearch/ocfl-js
https://github.com/UTS-eResearch/ocfl-nginx
Docker: mikelynch/nginx-ocfl&lt;/p&gt;
&lt;p&gt;' title='Acknowledgements and links
UTS: Moises Sacal Bonequi
PARADISEC: Marco La Rosa, Nick Thieberger
State Library of New South Wales: Euwe Ermita&lt;/p&gt;
&lt;p&gt;https://ocfl.io/
https://researchobject.github.io/ro-crate/
https://code.research.uts.edu.au/eresearch/solr-catalog/
https://github.com/UTS-eResearch/ocfl-js
https://github.com/UTS-eResearch/ocfl-nginx
Docker: mikelynch/nginx-ocfl&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;p&gt;&lt;a rel="license" href="http://creativecommons.org/licenses/by/3.0/au/"&gt;&lt;img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/3.0/au/88x31.png" /&gt;&lt;/a&gt;&lt;br /&gt;This work is licensed under a &lt;a rel="license" href="http://creativecommons.org/licenses/by/3.0/au/"&gt;Creative Commons Attribution 3.0 Australia License&lt;/a&gt;.&lt;/p&gt;
</content><category term="Repositories"></category></entry><entry><title>Meet RO-Crate</title><link href="/2019/11/05/RO-Crate%20eResearch%20Australasia%202019.htm" rel="alternate"></link><published>2019-11-05T00:00:00+11:00</published><updated>2019-11-05T00:00:00+11:00</updated><author><name>Peter Sefton</name></author><id>tag:None,2019-11-05:/2019/11/05/RO-Crate eResearch Australasia 2019.htm</id><summary type="html">&lt;p&gt;By Peter Sefton&lt;/p&gt;
&lt;p&gt;This presentation was given by Peter Sefton at the eResearch Australasia 2019 Conference in Brisbane, on the 24th of October 2019.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide01.png' alt='Meet RO-Crate
&lt;p&gt;' title='Meet RO-Crate&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This presentation is part of a series of talks delivered here at eResearch Australasia - so it won’t go back over all of the detail already …&lt;/p&gt;&lt;/section&gt;</summary><content type="html">&lt;p&gt;By Peter Sefton&lt;/p&gt;
&lt;p&gt;This presentation was given by Peter Sefton at the eResearch Australasia 2019 Conference in Brisbane, on the 24th of October 2019.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide01.png' alt='Meet RO-Crate
&lt;p&gt;' title='Meet RO-Crate&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This presentation is part of a series of talks delivered here at eResearch Australasia - so it won’t go back over all of the detail already covered - see the &lt;a href="http://ptsefton.com/2017/10/19/datacrate.htm"&gt;introduction of DataCrate in 2017&lt;/a&gt; and the &lt;a href="http://ptsefton.com/2018/10/29/sefton-ro2018.htm"&gt;2018 update&lt;/a&gt;. The standard formerly known as DataCrate has been subsumed into a new standard called Research Object Crate - RO-Crate for short.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide02.png' alt='
&lt;p&gt;Eoghan Ó Carragáin https://orcid.org/0000-0001-8131-2150 (chair)
Peter Sefton https://orcid.org/0000-0002-3545-944X (co-chair)
Stian Soiland-Reyes https://orcid.org/0000-0001-9842-9718 (co-chair)
Oscar Corcho https://orcid.org/0000-0002-9260-0753
Daniel Garijo https://orcid.org/0000-0003-0454-7145
Raul Palma https://orcid.org/0000-0003-4289-4922
Frederik Coppens https://orcid.org/0000-0001-6565-5145
Carole Goble https://orcid.org/0000-0003-1219-2137
José María Fernández https://orcid.org/0000-0002-4806-5140
Kyle Chard https://orcid.org/0000-0002-7370-4805
Jose Manuel Gomez-Perez https://orcid.org/0000-0002-5491-6431
Michael R Crusoe https://orcid.org/0000-0002-2961-9670
Ignacio Eguinoa https://orcid.org/0000-0002-6190-122X
Nick Juty https://orcid.org/0000-0002-2036-8350
Kristi Holmes https://orcid.org/0000-0001-8420-5254
Jason A. Clark https://orcid.org/0000-0002-3588-6257
Salvador Capella-Gutierrez https://orcid.org/0000-0002-0309-604X
Alasdair J. G. Gray https://orcid.org/0000-0002-5711-4872
Stuart Owen https://orcid.org/0000-0003-2130-0865
Alan R Williams https://orcid.org/0000-0003-3156-2105
' title='&lt;/p&gt;
&lt;p&gt;Eoghan Ó Carragáin https://orcid.org/0000-0001-8131-2150 (chair)
Peter Sefton https://orcid.org/0000-0002-3545-944X (co-chair)
Stian Soiland-Reyes https://orcid.org/0000-0001-9842-9718 (co-chair)
Oscar Corcho https://orcid.org/0000-0002-9260-0753
Daniel Garijo https://orcid.org/0000-0003-0454-7145
Raul Palma https://orcid.org/0000-0003-4289-4922
Frederik Coppens https://orcid.org/0000-0001-6565-5145
Carole Goble https://orcid.org/0000-0003-1219-2137
José María Fernández https://orcid.org/0000-0002-4806-5140
Kyle Chard https://orcid.org/0000-0002-7370-4805
Jose Manuel Gomez-Perez https://orcid.org/0000-0002-5491-6431
Michael R Crusoe https://orcid.org/0000-0002-2961-9670
Ignacio Eguinoa https://orcid.org/0000-0002-6190-122X
Nick Juty https://orcid.org/0000-0002-2036-8350
Kristi Holmes https://orcid.org/0000-0001-8420-5254
Jason A. Clark https://orcid.org/0000-0002-3588-6257
Salvador Capella-Gutierrez https://orcid.org/0000-0002-0309-604X
Alasdair J. G. Gray https://orcid.org/0000-0002-5711-4872
Stuart Owen https://orcid.org/0000-0003-2130-0865
Alan R Williams https://orcid.org/0000-0003-3156-2105
' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This is a recent snapshot of the makeup of the current RO-Crate team, compiled by Stian.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide03.png' alt='What is RO-Crate?
&lt;p&gt;RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata. It is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data, to large data-intensive computational research environments.&lt;/p&gt;
&lt;p&gt;' title='What is RO-Crate?&lt;/p&gt;
&lt;p&gt;RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata. It is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data, to large data-intensive computational research environments.&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;The website says:
RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata. It is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data, to large data-intensive computational research environments.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide04.png' alt='2017-06-16 Cry for help! Cameron Neylon: As a researcher... 
2017-07-02 Research Data Crate started
2017-10-12 DataCrate 0.1
2018-03-22 DataCrate 0.2
2018-03-22 RDA BoF: Approaches to Research Data Packaging
2018-08-06 DataCrate 0.3
2018-09-11 Calcyte 0.3.0
2018-09-27 DataCrate 1.0
2018-10-02 npm install calcyte@1.0.0
2018-10-29 Workshop on Research Object RO2018
2019-02-13 RO Lite 0.1
2019-03-28 First RO-Lite community call
2019-05-02 RO-Crate use case gathering
2019-05-30 Google Docs-mode
2019-06-07 Open Repositories workshop: Research Data Packaging
2019-08-23 npm install calcyte@1.0.6
2019-09-24 Workshop on Research Object RO2019
2019-09-12 RO-Crate 0.2
2019-11-?? RO-Crate 1.0
' title='2017-06-16 Cry for help! Cameron Neylon: As a researcher... 
2017-07-02 Research Data Crate started
2017-10-12 DataCrate 0.1
2018-03-22 DataCrate 0.2
2018-03-22 RDA BoF: Approaches to Research Data Packaging
2018-08-06 DataCrate 0.3
2018-09-11 Calcyte 0.3.0
2018-09-27 DataCrate 1.0
2018-10-02 npm install calcyte@1.0.0
2018-10-29 Workshop on Research Object RO2018
2019-02-13 RO Lite 0.1
2019-03-28 First RO-Lite community call
2019-05-02 RO-Crate use case gathering
2019-05-30 Google Docs-mode
2019-06-07 Open Repositories workshop: Research Data Packaging
2019-08-23 npm install calcyte@1.0.6
2019-09-24 Workshop on Research Object RO2019
2019-09-12 RO-Crate 0.2
2019-11-?? RO-Crate 1.0
' border='1'  width='85%'/&gt;
&lt;p&gt;This is a timeline for the merging of the &lt;a href="http://www.researchobject.org/"&gt;Research Object&lt;/a&gt; packaging work with DataCrate - again compiled by Stian. While our DataCrate work was driven by practical concerns and a desire to describe research data with high-quality metadata Research Object shared those concerns but with more of a focus on reproducibility and detailed provenance for research data.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide05.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This is what an RO-Crate looks like if you open the HTML file that’s in the root directory (or if you see one on the web).&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide06.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This is the &lt;a href="https://researchobject.github.io/ro-crate/"&gt;home page for RO-Crate&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide07.png' alt='💒
' title='💒
' border='1'  width='85%'/&gt;
&lt;p&gt;Where did RO-Crate come from?
RO-Crate is the marriage of &lt;a href="https://www.researchobject.org/"&gt;Research Objects&lt;/a&gt; with &lt;a href="https://github.com/UTS-eResearch/datacrate"&gt;DataCrate&lt;/a&gt;. It aims to build on their respective strengths, but also to draw on lessons learned from those projects and similar research data packaging efforts. For more details, see &lt;a href="https://researchobject.github.io/ro-crate/background.html"&gt;background&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide08.png' alt='👨‍⚕️ Man Health Worker 👩‍⚕️ Woman Health Worker 👨‍🎓 Man Student 👩‍🎓 Woman Student 👨‍🏫 Man Teacher 👩‍🏫 Woman Teacher 👨‍⚖️ Man Judge 👩‍⚖️ Woman Judge 👨‍🌾 Man Farmer 👩‍🌾 Woman Farmer 👨‍🍳 Man Cook 👩‍🍳 Woman Cook 👨‍🔧 Man Mechanic 👩‍🔧 Woman Mechanic 👨‍🏭 Man Factory Worker 👩‍🏭 Woman Factory Worker 👨‍💼 Man Office Worker 👩‍💼 Woman Office Worker 👨‍🔬 Man Scientist 👩‍🔬 Woman Scientist 👨‍💻 Man Technologist 👩‍💻 Woman Technologist 👨‍🎤 Man Singer 👩‍🎤 Woman Singer 👨‍🎨 Man Artist 👩‍🎨 Woman Artist 👨‍✈️ Man Pilot 👩‍✈️ Woman Pilot 👨‍🚀 Man Astronaut 👩‍🚀 Woman Astronaut 👨‍🚒 Man Firefighter 👩‍🚒 Woman Firefighter 👮 Police Officer 👮‍♂️ Man Police Officer 👮‍♀️ Woman Police Officer 🕵 Detective 🕵️‍♂️ Man Detective 🕵️‍♀️ Woman Detective 💂 Guard 💂‍♂️ Man Guard 💂‍♀️ Woman Guard 👷 Construction Worker 👷‍♂️ Man Construction Worker 👷‍♀️ Woman Construction Worker 🤴 Prince 👸 Princess 👳 Person Wearing Turban 👳‍♂️ Man Wearing Turban 👳‍♀️ Woman Wearing Turban 👲 Man With Skullcap 🧕 Woman With Headscarf 🤵 Man in Tuxedo 👰 Bride With Veil 🤰 Pregnant Woman 🤱 Breast-Feeding 👼 Baby Angel 🎅 Santa Claus 🤶 Mrs. Claus 
' title='👨‍⚕️ Man Health Worker 👩‍⚕️ Woman Health Worker 👨‍🎓 Man Student 👩‍🎓 Woman Student 👨‍🏫 Man Teacher 👩‍🏫 Woman Teacher 👨‍⚖️ Man Judge 👩‍⚖️ Woman Judge 👨‍🌾 Man Farmer 👩‍🌾 Woman Farmer 👨‍🍳 Man Cook 👩‍🍳 Woman Cook 👨‍🔧 Man Mechanic 👩‍🔧 Woman Mechanic 👨‍🏭 Man Factory Worker 👩‍🏭 Woman Factory Worker 👨‍💼 Man Office Worker 👩‍💼 Woman Office Worker 👨‍🔬 Man Scientist 👩‍🔬 Woman Scientist 👨‍💻 Man Technologist 👩‍💻 Woman Technologist 👨‍🎤 Man Singer 👩‍🎤 Woman Singer 👨‍🎨 Man Artist 👩‍🎨 Woman Artist 👨‍✈️ Man Pilot 👩‍✈️ Woman Pilot 👨‍🚀 Man Astronaut 👩‍🚀 Woman Astronaut 👨‍🚒 Man Firefighter 👩‍🚒 Woman Firefighter 👮 Police Officer 👮‍♂️ Man Police Officer 👮‍♀️ Woman Police Officer 🕵 Detective 🕵️‍♂️ Man Detective 🕵️‍♀️ Woman Detective 💂 Guard 💂‍♂️ Man Guard 💂‍♀️ Woman Guard 👷 Construction Worker 👷‍♂️ Man Construction Worker 👷‍♀️ Woman Construction Worker 🤴 Prince 👸 Princess 👳 Person Wearing Turban 👳‍♂️ Man Wearing Turban 👳‍♀️ Woman Wearing Turban 👲 Man With Skullcap 🧕 Woman With Headscarf 🤵 Man in Tuxedo 👰 Bride With Veil 🤰 Pregnant Woman 🤱 Breast-Feeding 👼 Baby Angel 🎅 Santa Claus 🤶 Mrs. Claus 
' border='1'  width='85%'/&gt;
&lt;h3&gt;Who is it for?&lt;/h3&gt;
&lt;p&gt;The RO-Crate effort brings together practitioners from very different backgrounds, and with different motivations and use-cases. Among our core target users are: a) researchers engaged in computational, data-intensive, workflow-driven analysis; b) digital repository managers and infrastructure providers; c) individual researchers looking for a straightforward tool or how-to guide to “FAIRify” their data; d) data stewards supporting research projects in creating and curating datasets.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide09.png' alt='🌏🌍
' title='🌏🌍
' border='1'  width='85%'/&gt;
&lt;p&gt;RO-Crate is a collaboration between people all over the world, but the Editors are from Cork, Manchester and Katoomba.
Version one of the standard will be out by Summer.
But which summer? Standard reference points are important. Standards are important.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide10.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;p&gt;Which brings us to the benefits of Standards. Without this standardised date format chaos would reign. What if that date had been written 05/08 or 08/05 - someone might end up eating food from May in August, or worse, eating last August’s food in May.&lt;/p&gt;
&lt;p&gt;Anyway, if you find a partner who’ll adopt the ISO 8601 date standard then ...&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide11.png' alt='
' title='
' border='1'  width='85%'/&gt;
&lt;p&gt;… you should marry them.&lt;/p&gt;
&lt;p&gt;Like how we married the Research Object and DataCrate - we bonded over standardisation.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide12.png' alt='title
' title='title
' border='1'  width='85%'/&gt;
&lt;p&gt;Let’s explore standards a bit more. If you see this in metadata - what does it mean?&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide13.png' alt='
' title='
' border='1'  width='85%'/&gt;
&lt;p&gt;Is it &lt;a href="https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/title/"&gt;a name given to the resource&lt;/a&gt;? URI: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/title/&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide14.png' alt='
' title='
' border='1'  width='85%'/&gt;
&lt;p&gt;An &lt;a href="http://xmlns.com/foaf/spec/#term_title"&gt;honorific like Ms, or Dr&lt;/a&gt;? As it would be in the FOAF ontology.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide15.png' alt='
' title='
' border='1'  width='85%'/&gt;
&lt;p&gt;Or &lt;a href="https://schema.org/title"&gt;a very specific meaning relating to job titles&lt;/a&gt;? As in Schema.org.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide16.png' alt='
http://schema.org/name
' title='
http://schema.org/name
' border='1'  width='85%'/&gt;
&lt;p&gt;In RO-Crate, an HTML page ships with each dataset, letting you browse the object in as much detail as the author described it. We are careful to avoid ambiguity by adding a help link to each metadata term, so you can see its definition.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide17.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;A quick shout-out to ResearchGraph, led by Amir Aryani at Swinburne Uni: they are also using schema.org.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide18.png' alt='
&lt;p&gt;🖥️
👩🏾‍🔬
' title='&lt;/p&gt;
&lt;p&gt;🖥️
👩🏾‍🔬
' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;RO-Crates ship with two files: a human-readable HTML file and a machine-readable JSON-LD file. The two views (human and machine) of the data are equivalent - in fact the HTML version is generated from the JSON-LD version, via the DataCrate nodejs library.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide19.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;And here’s an automatically generated diagram extracted from the sample DataCrate showing how two images were created. The first result was an image file taken by me (as an agent) using two instruments (my camera and lens), of a place (the object: Catalina park in Katoomba). A sepia toned version was the result of a CreateAction, with the instrument this time being the ImageMagick software. The DataCrate also &lt;a href="https://data.research.uts.edu.au/examples/v1.0/sample/CATALOG_files/5ddf53df/4d0b61c3/d4172208/d5c24816/bad69891/index.html"&gt;contains information&lt;/a&gt; about that CreateAction such as the command used to do the conversion and the version of the software-as-instrument.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;convert -sepia-tone 80% test_data/sample/pics/2017-06-11\ 12.56.14.jpg test_data/sample/pics/sepia_fence.jpg&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;This way of representing file provenance is Action-centred - the focus is on the action that creates a file, rather than the more usual metadata approach of having the file at the centre with properties for “Author” and the like. The action-based approach is MUCH more flexible as it can model the contribution of multiple agents and instruments separately at the expense of being somewhat counter-intuitive to those of us who are used to a library-card approach to metadata where the work is at the centre and has simple properties.&lt;/p&gt;
&lt;p&gt;There was a question after this presentation about whether I had the arrows in this diagram pointing in the right direction. Yes, I do! The convention here is the standard way of representing a &lt;a href="https://en.wikipedia.org/wiki/Semantic_triple"&gt;subject-predicate-object semantic triple&lt;/a&gt; with the subject as the source of the arrow, the predicate (in this case a Schema.org property) as a label, and the pointy end pointing at the object.&lt;/p&gt;
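&lt;p&gt;Sketched as data, the action-centred pattern looks something like the following (a Python dict standing in for the JSON-LD; the identifiers here are made up for illustration, while agent, instrument, object and result are the Schema.org CreateAction properties):&lt;/p&gt;

```python
# Action-centred provenance: the CreateAction, not the file, carries
# the who/what/how. All @id values here are hypothetical examples.
create_action = {
    "@id": "#sepia-conversion",
    "@type": "CreateAction",
    "agent": {"@id": "#photographer"},          # who performed the action
    "instrument": {"@id": "#ImageMagick"},      # software-as-instrument
    "object": {"@id": "pics/original.jpg"},     # input file
    "result": {"@id": "pics/sepia_fence.jpg"},  # output file
}

def action_that_created(graph, file_id):
    """Walk a flattened graph for the action whose result is the file.
    Simplified: assumes 'result' is a single reference, not a list."""
    for entity in graph:
        result = entity.get("result")
        if isinstance(result, dict) and result.get("@id") == file_id:
            return entity
    return None
```

&lt;p&gt;Asking “which action produced this file?” is then a graph lookup, rather than reading a flat Author property off the file itself.&lt;/p&gt;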
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide20.png' alt='🐥
&lt;p&gt;' title='🐥&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;What’s new / developing at the moment in the RO-Crate world? I will illustrate by looking at recent activity on our Github project.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide21.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;We’re working on ways to &lt;a href="https://github.com/ResearchObject/ro-crate/issues/27"&gt;describe not just files, but the CONTENTS of files&lt;/a&gt; - using properties like &lt;a href="https://schema.org/variableMeasured"&gt;variableMeasured&lt;/a&gt;.&lt;/p&gt;
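&lt;p&gt;For example, a tabular file’s entity might carry variableMeasured entries like this sketch (the file name and variables are invented; PropertyValue, variableMeasured and unitText are the Schema.org terms):&lt;/p&gt;

```python
# Hypothetical file entity describing the variables inside a CSV,
# using schema.org's variableMeasured with PropertyValue entries.
file_entity = {
    "@id": "measurements.csv",
    "@type": "File",
    "encodingFormat": "text/csv",
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "temperature", "unitText": "degC"},
        {"@type": "PropertyValue", "name": "salinity", "unitText": "PSU"},
    ],
}

# A consumer can list what the file contains without opening it.
variable_names = [v["name"] for v in file_entity["variableMeasured"]]
```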
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide22.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;We have a way to &lt;a href="https://github.com/ResearchObject/ro-crate/blob/master/docs/0.3-DRAFT/index.md#workflows-and-scripts"&gt;describe a workflow&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide23.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;and actions that can be performed on data such as firing up a computational environment to re-run the workflow.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide24.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;You too can add Use Cases like &lt;a href="https://github.com/ResearchObject/ro-crate/issues/39"&gt;this one about software containers&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide25.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Breaking news: In the last couple of months Marco La Rosa, an independent developer working for PARADISEC, has ported 10,000 data and collection items into RO-Crate format, AND built a portal which can display them. This means that ANY repository with a similar structure of Items in Collections could easily re-use the code and the viewers for various file types.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide26.png' alt='
&lt;p&gt;http://45.113.232.73/paradisec.org.au/NT1/98007
' title='&lt;/p&gt;
&lt;p&gt;http://45.113.232.73/paradisec.org.au/NT1/98007
' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This shows an interlinear transcription where you can play various segments of a recording and see the transcription.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide27.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;The .eaf files in the previous example are produced using ELAN software. Marco has done the groundwork for a system that could work across multiple repositories and for stand-alone RO-Crates: the crate metadata describes the files and what format they’re in, and the viewer, an HTML page either served by a repository or opened straight off your hard disk, can use that information to load an appropriate viewer.&lt;/p&gt;
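&lt;p&gt;One way to picture that dispatch step (a sketch, with invented format strings and viewer names, not PARADISEC’s actual code): the viewer reads each file entity’s declared encodingFormat from the crate metadata and maps it to a display component.&lt;/p&gt;

```python
# Hypothetical dispatch table: the crate metadata tells us the
# format, the viewer maps the format to a display component.
VIEWERS = {
    "text/x-eaf+xml": "elan-transcription-viewer",
    "audio/x-wav": "audio-player",
    "image/jpeg": "image-viewer",
}

def pick_viewer(file_entity):
    """Choose a viewer from a file entity's declared encodingFormat;
    fall back to a plain download link when no viewer is known."""
    fmt = file_entity.get("encodingFormat")
    return VIEWERS.get(fmt, "download-link")
```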
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RO-Crate eResearch Australasia 2019/Slide28.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;RO-Crate version 1 will be released in November 2019 - we were aiming for
October, but missed that.&lt;/p&gt;
&lt;p&gt;We will publish the parts that are well-tested and stable, and immediately start
on a new version with bleeding-edge cases.&lt;/p&gt;
&lt;p&gt;We want input from potential users and from current and prospective
implementers, and help drafting new parts of the spec is welcome.&lt;/p&gt;
&lt;p&gt;You can &lt;a href="https://github.com/ResearchObject/ro-crate/issues/1"&gt;join the team&lt;/a&gt;&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;p&gt;&lt;a rel="license" href="http://creativecommons.org/licenses/by/3.0/au/"&gt;&lt;img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/3.0/au/88x31.png" /&gt;&lt;/a&gt;&lt;br /&gt;This work is licensed under a &lt;a rel="license" href="http://creativecommons.org/licenses/by/3.0/au/"&gt;Creative Commons Attribution 3.0 Australia License&lt;/a&gt;.&lt;/p&gt;
</content><category term="DataCrate"></category></entry><entry><title>Publishing versioned datasets using OCFL and nginx</title><link href="/2019/11/05/eResearch2019_lighting_ocfl_nginx.htm" rel="alternate"></link><published>2019-11-05T00:00:00+11:00</published><updated>2019-11-05T00:00:00+11:00</updated><author><name>Mike Lynch</name></author><id>tag:None,2019-11-05:/2019/11/05/eResearch2019_lighting_ocfl_nginx.htm</id><summary type="html">&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide1.png' alt='Publishing versioned datasets using OCFL and nginx
Mike Lynch – University of Technology Sydney
' title='Publishing versioned datasets using OCFL and nginx
Mike Lynch – University of Technology Sydney
' border='1'  width='85%'/&gt;
&lt;br /&gt;
ARDC funding - Data and Services Discovery projects - Institutional Role in a Data Commons
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide2.png' alt='Data repositories
' title='Data repositories
' border='1'  width='85%'/&gt;
&lt;br /&gt;
&lt;p&gt;There are a lot of specialized repository applications, from small (Omeka) to large (Hydra, Fedora), all designed as special-purpose homes for datasets and metadata which provide APIs for getting things in and out.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide3.png' alt='Data repositories
' title='Data repositories
' border='1'  width='85%'/&gt;
&lt;br /&gt;
&lt;p&gt;Experience has shown that …&lt;/p&gt;&lt;/section&gt;</summary><content type="html">&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide1.png' alt='Publishing versioned datasets using OCFL and nginx
Mike Lynch – University of Technology Sydney
' title='Publishing versioned datasets using OCFL and nginx
Mike Lynch – University of Technology Sydney
' border='1'  width='85%'/&gt;
&lt;br /&gt;
ARDC funding - Data and Services Discovery projects - Institutional Role in a Data Commons
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide2.png' alt='Data repositories
' title='Data repositories
' border='1'  width='85%'/&gt;
&lt;br /&gt;
&lt;p&gt;There are a lot of specialized repository applications, from small (Omeka) to large (Hydra, Fedora), all designed as special-purpose homes for datasets and metadata which provide APIs for getting things in and out.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide3.png' alt='Data repositories
' title='Data repositories
' border='1'  width='85%'/&gt;
&lt;br /&gt;
&lt;p&gt;Experience has shown that these solutions don’t scale. Eventually, an institution will have to store a dataset that’s too big either to get in or out, or to store, and will have to look at a workaround like putting the data on disk and pointing to it from a record in the repository.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide4.png' alt='Oxford Common File Layout
A standard layout for static, file-based repositories
Human and machine readable (JSON)
Lightweight file-level versioning
' title='Oxford Common File Layout
A standard layout for static, file-based repositories
Human and machine readable (JSON)
Lightweight file-level versioning
' border='1'  width='85%'/&gt;
&lt;br /&gt;
&lt;p&gt;The &lt;a href="https://ocfl.io"&gt;Oxford Common File Layout&lt;/a&gt; (OCFL) is a standard for laying out arbitrary collections of files as structured repositories. It grew out of a push from the repositories community for a repository structure that wasn’t locked in to a particular application, didn’t have scaling problems, and was easy to migrate to and from.&lt;/p&gt;
&lt;p&gt;An OCFL repository is a collection of OCFL Objects, laid out according to a simple standard like &lt;a href="https://confluence.ucop.edu/display/Curation/PairTree"&gt;PairTree&lt;/a&gt;. Objects are immutable and versioned. More details can be found at our FAIR Repositories presentation.&lt;/p&gt;
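&lt;p&gt;As a rough illustration of the PairTree idea, here is a minimal Python sketch that maps an object identifier to a nested directory path by splitting it into two-character segments (simplified: real PairTree also percent-encodes characters that are unsafe in file names).&lt;/p&gt;

```python
def pairtree_path(identifier):
    """Split an identifier into successive two-character segments to
    build a shallow directory tree, the core idea of PairTree.
    Simplified sketch: assumes the identifier is already safe to use
    in file names, so no character encoding is applied."""
    segments = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return "/".join(segments)

# e.g. pairtree_path("abcd1") gives "ab/cd/1"
```

&lt;p&gt;Keeping each directory small like this avoids the performance problems of dropping thousands of objects into a single flat directory.&lt;/p&gt;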
&lt;p&gt;We’re using OCFL for our data publications repository, and will later use it for an internal data repository with access control.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide5.png' alt='RO-Crate
Research Objects + DataCrate
JSON-LD metadata
' title='RO-Crate
Research Objects + DataCrate
JSON-LD metadata
' border='1'  width='85%'/&gt;
&lt;br /&gt;
&lt;p&gt;RO-Crate is a standard for describing individual datasets which evolved from two previous standards, DataCrate and Research Objects. Datasets are described using JSON-LD, a standard for building linked data descriptions in JSON. The ontology is Schema.org, which is widely supported in industry, and used by Google’s new dataset search engine.&lt;/p&gt;
&lt;p&gt;A “crated” dataset is a directory with an arbitrary file hierarchy inside it, and an RO-Crate JSON-LD document with contextual metadata (title, description, contributors, licences) and descriptions of some or all of the contents. An RO-Crate doesn’t have to describe every file inside it, as this would be impractical for some datasets.&lt;/p&gt;
&lt;p&gt;An RO-Crate doesn’t even have to have any files other than the RO-Crate files. This allows the system to support metadata-only publication, where the actual data is only available on request.&lt;/p&gt;
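&lt;p&gt;A minimal sketch of the shape of that JSON-LD, written here as a Python dict (the context URL, property choices and values are illustrative placeholders; consult the RO-Crate specification for the exact form):&lt;/p&gt;

```python
import json

# Illustrative sketch of an RO-Crate-style description: a flattened
# JSON-LD graph whose root entity is a schema.org Dataset. This is a
# metadata-only crate, with no payload files described.
crate = {
    "@context": "https://w3id.org/ro/crate/context",  # placeholder URL
    "@graph": [
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example dataset",
            "description": "Metadata-only publication: data on request.",
            "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
        }
    ],
}

# This serialised document is what would be stored alongside the data.
metadata_json = json.dumps(crate, indent=2)
```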
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide6.png' alt='RO-Crate
Research Objects + DataCrate
JSON-LD metadata
HTML preview
' title='RO-Crate
Research Objects + DataCrate
JSON-LD metadata
HTML preview
' border='1'  width='85%'/&gt;
&lt;br /&gt;
&lt;p&gt;An RO-Crate also contains an HTML document which is generated automatically from the JSON-LD. This provides a human-readable view of the metadata with links to the data payloads. It can be used locally or act as the landing page for an RO-Crate published on a web server.&lt;/p&gt;
&lt;p&gt;The HTML document is rendered through some lightweight JavaScript, but degrades gracefully in browsers that don’t support JavaScript or for users who have disabled it.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide7.png' alt='nginx and Solr
fast, versioned web endpoints
Solr indexes metadata and licences
licences enforced on both search results and payloads 
' title='nginx and Solr
fast, versioned web endpoints
Solr indexes metadata and licences
licences enforced on both search results and payloads 
' border='1'  width='85%'/&gt;
&lt;br /&gt;
&lt;p&gt;OCFL and RO-Crate are the standards we’re using to lay out the repository. To deliver them, we’re using Solr, an efficient search engine, and nginx, a high-performance open-source web server.&lt;/p&gt;
&lt;p&gt;Metadata from the RO-Crates is indexed into Solr, including licences representing lists of users who are allowed to access the dataset. Solr is also used to provide a discovery interface via a lightweight single-page application.&lt;/p&gt;
&lt;p&gt;A simple extension to nginx allows it to resolve incoming URLs to paths in the OCFL repository and serve the appropriate version of the RO-Crate metadata and payload files.&lt;/p&gt;
&lt;p&gt;Before serving a file, the nginx handler looks up the Solr index and checks whether the authenticated user has authority to download it.&lt;/p&gt;
&lt;p&gt;Nginx enforces licences on both the search results and payloads.&lt;/p&gt;
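&lt;p&gt;A simplified sketch of the URL-to-file resolution step, using the structure of an OCFL inventory (each version’s state maps content digests to logical paths, and the manifest maps the same digests to content paths on disk; the inventory here is a toy example, not our production layout):&lt;/p&gt;

```python
# Toy OCFL inventory. Note that v2 renames the file but reuses the
# same digest, so both versions resolve to the same stored bytes.
inventory = {
    "manifest": {"abc123": ["v1/content/data/file.txt"]},
    "versions": {
        "v1": {"state": {"abc123": ["data/file.txt"]}},
        "v2": {"state": {"abc123": ["data/renamed.txt"]}},
    },
}

def resolve(inventory, version, logical_path):
    """Map a requested version and logical path to the content path
    on disk, roughly what our nginx extension does per request."""
    state = inventory["versions"][version]["state"]
    for digest, paths in state.items():
        if logical_path in paths:
            return inventory["manifest"][digest][0]
    return None
```

&lt;p&gt;Because unchanged content is stored once and referenced by digest, serving an old version is just a lookup, not a copy.&lt;/p&gt;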
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide8.png' alt='Examples
In production for UTS data publication repository
PARADISEC – crosswalking to OCFL and RO-Crate
State Library of NSW – Mitchell collection of public domain books
&lt;p&gt;' title='Examples
In production for UTS data publication repository
PARADISEC – crosswalking to OCFL and RO-Crate
State Library of NSW – Mitchell collection of public domain books&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;br /&gt;
&lt;p&gt;The OCFL/RO-Crate and nginx stack is meant to accommodate datasets from a wide range of disciplines and sources, from small web uploads, existing repositories and data capture.&lt;/p&gt;
&lt;p&gt;As part of the ARDC Data and Services Discovery project, we collaborated with Nick Thieberger and Marco La Rosa of PARADISEC and Euwe Ermita’s team at the State Library of NSW. Both of these institutions have rich digital humanities collections stored in custom repositories behind APIs, and in short one- or two-day workshops we were able to make substantial progress in crosswalking their datasets into OCFL.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eResearch2019_lighting_ocfl_nginx/Slide9.png' alt='Links and acknowledgements
https://ocfl.io/
https://researchobject.github.io/ro-crate/
https://github.com/UTS-eResearch/ocfl-nginx
Docker: mikelynch/nginx-ocfl
&lt;p&gt;This research/project is supported by the Australian Research Data Commons (ARDC). The ARDC is enabled by NCRIS.
' title='Links and acknowledgements
https://ocfl.io/
https://researchobject.github.io/ro-crate/
https://github.com/UTS-eResearch/ocfl-nginx
Docker: mikelynch/nginx-ocfl&lt;/p&gt;
&lt;p&gt;This research/project is supported by the Australian Research Data Commons (ARDC). The ARDC is enabled by NCRIS.
' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ocfl.io/"&gt;Oxford Common File Layout&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://researchobject.github.io/ro-crate/"&gt;RO-Crate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/UTS-eResearch/ocfl-nginx"&gt;ocfl-nginx&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.docker.com/u/mikelynch/repository/docker/mikelynch/nginx-ocfl"&gt;a Docker image for ocfl-nginx&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;p&gt;&lt;a rel="license" href="http://creativecommons.org/licenses/by/3.0/au/"&gt;&lt;img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/3.0/au/88x31.png" /&gt;&lt;/a&gt;&lt;br /&gt;This work is licensed under a &lt;a rel="license" href="http://creativecommons.org/licenses/by/3.0/au/"&gt;Creative Commons Attribution 3.0 Australia License&lt;/a&gt;.&lt;/p&gt;
</content><category term="Repositories"></category></entry><entry><title>eResearch Survey Report</title><link href="/2019/10/01/eresearch-survey.htm" rel="alternate"></link><published>2019-10-01T00:00:00+10:00</published><updated>2019-10-01T00:00:00+10:00</updated><author><name>Sharyn Wise, Weisi Chen</name></author><id>tag:None,2019-10-01:/2019/10/01/eresearch-survey.htm</id><summary type="html">&lt;h1&gt;Overview&lt;/h1&gt;
&lt;p&gt;In order to know whether we are providing what researchers need, and thereby to improve our services to researchers, we rolled out a short anonymous survey* back in May 2018 that has been open to all UTS researchers, research assistants and HDR students.&lt;/p&gt;
&lt;p&gt;We have received 41 responses so …&lt;/p&gt;</summary><content type="html">&lt;h1&gt;Overview&lt;/h1&gt;
&lt;p&gt;In order to know whether we are providing what researchers need, and thereby to improve our services to researchers, we rolled out a short anonymous survey* back in May 2018 that has been open to all UTS researchers, research assistants and HDR students.&lt;/p&gt;
&lt;p&gt;We have received 41 responses so far. The two bar charts below show the distribution of their research positions and the distribution of their research fields respectively.&lt;/p&gt;
&lt;p float="left"&gt;
  &lt;img src="/blog/eResearch-Survey/Fig1.png" width="45%" /&gt;
  &lt;img src="/blog/eResearch-Survey/Fig2.png" width="45%" /&gt;
  &lt;figcaption&gt;Left: Distribution of participants' research positions. Right: Distribution of participants' research fields.&lt;/figcaption&gt;
&lt;/p&gt;
&lt;h1&gt;What researchers say&lt;/h1&gt;
&lt;p&gt;In qualitative comments from the survey, we received feedback that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Hacky hour is the best thing about UTS. I would love to have more than one session per week if resources allow: Monday and Thursday for example. Sometimes when dealing with a coding problem what I can't solve it seems like Thursday is so far away. Thank you for your support so far.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;However, we did not meet the needs of all. One suggestion was to:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Provide services at a time not in business hours for part time students that cannot come to workshops/support services in business hours.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There were also a few grumbles about our HPC/iHPC that are beyond our budget to fix, although the staff who run them came in for high praise. Overall the tone was: “You guys do a great job. Keep it up.”&lt;/p&gt;
&lt;p&gt;The most commonly voiced criticism was that eResearch services were difficult to locate:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Information about what services were available, where to go to access these services, regular reviews and updates on these services would be helpful. Currently services like lab archives etc I found out about by lunch room chat.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We have tried a variety of avenues of communication: workshops, Staff Connect, Service Connect, our website, the UTS Library website, faculty research mailing lists, GRS mailing lists, on-campus screens advertising Hacky Hour, Hacky Hour itself, and even this blog. None of these methods will satisfy everyone. In fact, one comment read “PLEASE STOP SPAMMING US WITH YOUR EMAILS”, illustrating the frustration with which researchers face a deluge of emails.&lt;/p&gt;
&lt;p&gt;Another participant noted that a solution would be:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“one pathway to lead to accessing advice rather than multiple lines of inquiry to get to Hacky hour as the place for expert sharing of advice in research design using Lime survey and CloudStor etc.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;eResearch is therefore excited to be involved in &lt;b&gt;ResHub&lt;/b&gt; which, even if it begins simply as a one-stop directory of researcher services, will meet a widely-voiced need.&lt;/p&gt;
&lt;h1&gt;Usefulness and Delivery of eResearch Services&lt;/h1&gt;
&lt;p&gt;In general, participants find many services useful for their research, and think the delivery of those services has been satisfactory.&lt;/p&gt;
&lt;p&gt;Specifically, for each service we asked researchers to provide feedback on, the chart on the left shows to what extent they think the service is useful for their research, and the chart on the right shows how they think the service is delivered (N/A means they have not used the corresponding service yet).&lt;/p&gt;
&lt;h3&gt;Research data management services (e.g. Stash, Cloudstor, REDcap, eResearch Store, Gitlab, Omero, eNotebooks)&lt;/h3&gt;
&lt;p float="left"&gt;
  &lt;img src="/blog/eResearch-Survey/plot-utility_rdm.png" width="45%" /&gt;
  &lt;img src="/blog/eResearch-Survey/plot-delivery_rdm.png" width="45%" /&gt;
  &lt;figcaption&gt;Left: Usefulness of Research data management services. Right: Delivery quality of Research data management services.&lt;/figcaption&gt;
&lt;/p&gt;
&lt;h3&gt;Other data services (e.g. data archiving, data publication, data visualisation, data arena)&lt;/h3&gt;
&lt;p float="left"&gt;
  &lt;img src="/blog/eResearch-Survey/plot-utility_other_data.png" width="45%" /&gt;
  &lt;img src="/blog/eResearch-Survey/plot-delivery_other_data.png" width="45%" /&gt;
  &lt;figcaption&gt;Left: Usefulness of Other data services. Right: Delivery quality of Other data services.&lt;/figcaption&gt;
&lt;/p&gt;
&lt;h3&gt;Survey platform services (LimeSurvey, REDcap)&lt;/h3&gt;
&lt;p float="left"&gt;
  &lt;img src="/blog/eResearch-Survey/plot-utility_survey.png" width="45%" /&gt;
  &lt;img src="/blog/eResearch-Survey/plot-delivery_survey.png" width="45%" /&gt;
  &lt;figcaption&gt;Left: Usefulness of Survey platform services. Right: Delivery quality of Survey platform services.&lt;/figcaption&gt;
&lt;/p&gt;
&lt;h3&gt;UTS HPC services (e.g. UTS-HPCC, FEIT ARClab)&lt;/h3&gt;
&lt;p float="left"&gt;
  &lt;img src="/blog/eResearch-Survey/plot-utility_uts_hpc.png" width="45%" /&gt;
  &lt;img src="/blog/eResearch-Survey/plot-delivery_uts_hpc.png" width="45%" /&gt;
  &lt;figcaption&gt;Left: Usefulness of UTS HPC services. Right: Delivery quality of UTS HPC services.&lt;/figcaption&gt;
&lt;/p&gt;
&lt;h3&gt;External HPC services (AWS, Nectar, NCI, Intersect)&lt;/h3&gt;
&lt;p float="left"&gt;
  &lt;img src="/blog/eResearch-Survey/plot-utility_ext_hpc.png" width="45%" /&gt;
  &lt;img src="/blog/eResearch-Survey/plot-delivery_ext_hpc.png" width="45%" /&gt;
  &lt;figcaption&gt;Left: Usefulness of External HPC services. Right: Delivery quality of External HPC services.&lt;/figcaption&gt;
&lt;/p&gt;
&lt;h3&gt;eResearch consulting services (e.g. grants/ethics advice, collaboration tools, research data consultation, data security)&lt;/h3&gt;
&lt;p float="left"&gt;
  &lt;img src="/blog/eResearch-Survey/plot-utility_eres_consult.png" width="45%" /&gt;
  &lt;img src="/blog/eResearch-Survey/plot-delivery_eres_consult.png" width="45%" /&gt;
  &lt;figcaption&gt;Left: Usefulness of eResearch consulting services. Right: Delivery quality of eResearch consulting services.&lt;/figcaption&gt;
&lt;/p&gt;
&lt;h3&gt;Training services through Intersect (e.g. REDCap, R, Python, Excel)&lt;/h3&gt;
&lt;p float="left"&gt;
  &lt;img src="/blog/eResearch-Survey/plot-utility_intersect_training.png" width="45%" /&gt;
  &lt;img src="/blog/eResearch-Survey/plot-delivery_intersect_training.png" width="45%" /&gt;
  &lt;figcaption&gt;Left: Usefulness of Intersect training services. Right: Delivery quality of Intersect training services.&lt;/figcaption&gt;
&lt;/p&gt;
&lt;h3&gt;Training through UTS library (e.g. Research Data Management, eNotebooks)&lt;/h3&gt;
&lt;p float="left"&gt;
  &lt;img src="/blog/eResearch-Survey/plot-utility_library_training.png" width="45%" /&gt;
  &lt;img src="/blog/eResearch-Survey/plot-delivery_library_training.png" width="45%" /&gt;
  &lt;figcaption&gt;Left: Usefulness of Library training services. Right: Delivery quality of Library training services.&lt;/figcaption&gt;
&lt;/p&gt;
&lt;h3&gt;&amp;quot;Hacky Hour&amp;quot; researcher support&lt;/h3&gt;
&lt;p float="left"&gt;
  &lt;img src="/blog/eResearch-Survey/plot-utility_hacky_hour.png" width="45%" /&gt;
  &lt;img src="/blog/eResearch-Survey/plot-delivery_hacky_hour.png" width="45%" /&gt;
  &lt;figcaption&gt;Left: Usefulness of Hacky Hour. Right: Delivery quality of Hacky Hour.&lt;/figcaption&gt;
&lt;/p&gt;
&lt;h3&gt;ResBaz Sydney 2017&lt;/h3&gt;
&lt;p float="left"&gt;
  &lt;img src="/blog/eResearch-Survey/plot-utility_resbaz.png" width="45%" /&gt;
  &lt;img src="/blog/eResearch-Survey/plot-delivery_resbaz.png" width="45%" /&gt;
  &lt;figcaption&gt;Left: Usefulness of ResBaz Sydney 2017. Right: Delivery quality of ResBaz Sydney 2017.&lt;/figcaption&gt;
&lt;/p&gt;
&lt;p&gt;*: Study data were collected and managed using REDCap, a secure, web-based software platform designed to support data capture for research studies.&lt;/p&gt;
</content><category term="Blog"></category></entry><entry><title>Roadmap for Genomics Computing Workshop</title><link href="/2019/07/02/genomics_workshop.htm" rel="alternate"></link><published>2019-07-02T00:00:00+10:00</published><updated>2019-07-02T00:00:00+10:00</updated><author><name>Mike Lake</name></author><id>tag:None,2019-07-02:/2019/07/02/genomics_workshop.htm</id><summary type="html">&lt;p&gt;This &lt;a href="https://informatics.sydney.edu.au/genomics_workshop/"&gt;Genomics Workshop&lt;/a&gt;
was held at Sydney University on 3rd June 2019.
Pascal Tampubolon and Mike Lake attended the event.
Present were the NSW Chief Scientist and leaders of various Genomics institutes
in Australia.&lt;/p&gt;
&lt;p&gt;The University of Sydney is developing a 1-5 year infrastructure roadmap to
enable excellence in genomics-based …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This &lt;a href="https://informatics.sydney.edu.au/genomics_workshop/"&gt;Genomics Workshop&lt;/a&gt;
was held at Sydney University on 3rd June 2019.
Pascal Tampubolon and Mike Lake attended the event.
Present were the NSW Chief Scientist and leaders of various Genomics institutes
in Australia.&lt;/p&gt;
&lt;p&gt;The University of Sydney is developing a 1-5 year infrastructure roadmap to
enable excellence in genomics-based research, and a strategy to extend this
across other institutions in Australia.
The workshop showcased some outstanding genomics research being done in
Australia. A panel discussion by leaders in genomics and genomics
infrastructure followed.&lt;/p&gt;
&lt;p&gt;Common and major technical issues raised by attendees were: storage
requirements for research data (both compute scratch space, such as /scratch on
HPCs, and archival storage), long HPC compute and queue times, I/O bottlenecks
in sharing data with collaborators, and the responsibilities and costs of
storing genomics data. After an ARC grant has finished, who stores the data and
who pays for it to be stored?&lt;/p&gt;
&lt;p&gt;A number of institutions are &amp;quot;bursting into cloud&amp;quot; for compute, and they
want to move the compute to where the data is, though no one suggested
how this could be done. Some users of the Sydney University HPC are looking to
migrate pipelines to AWS.&lt;/p&gt;
&lt;p&gt;The ongoing problem of data curation was also raised, as well as the
reproducibility of re-running analyses. Using
&lt;a href="http://bioinformatics.mdc-berlin.de/pigx/"&gt;PiGx&lt;/a&gt; for reproducible pipelines
across different infrastructure was mentioned.&lt;/p&gt;
&lt;p&gt;Tony Papenfuss from the Walter and Eliza Hall Institute mentioned that
they were using more GPU cards for processing jobs and that purchasers
of new sequencers are required to specify compute and data storage
requirements prior to purchasing.&lt;/p&gt;
&lt;p&gt;Jon Smillie from NCI told us about the use of Crypt4GH, an encrypted
file format, from the Global Alliance for Genomics and Health (GA4GH).&lt;/p&gt;
&lt;p&gt;Andrew Gilbert, General Manager, Bioplatforms Australia spoke about the
newly formed
&lt;a href="https://www.bioplatforms.com/announcement-pathfinder/"&gt;Australian Bioinformatics Commons&lt;/a&gt;
and specifically drew our attention to the &amp;quot;BioCommons Principles&amp;quot;.&lt;/p&gt;
&lt;p&gt;Attending was valuable: we met other researchers and managers
of genomics institutes, and became aware of shared problems and possible future
strategies.&lt;/p&gt;
&lt;p&gt;Mike Lake &amp;amp; Pascal Tampubolon&lt;br /&gt;
eResearch&lt;/p&gt;
</content><category term="Event"></category></entry><entry><title>DataCrate - a progress report on packaging research data for distribution via your repository</title><link href="/2019/07/01/DataCrate-OR2019.htm" rel="alternate"></link><published>2019-07-01T00:00:00+10:00</published><updated>2019-07-01T00:00:00+10:00</updated><author><name>ptsefton</name></author><id>tag:None,2019-07-01:/2019/07/01/DataCrate-OR2019.htm</id><summary type="html">&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide01.png' alt='DataCrate: a progress report on packaging research data for distribution via your repository
Peter Sefton
University of Technology Sydney
&lt;p&gt;' title='DataCrate: a progress report on packaging research data for distribution via your repository
Peter Sefton
University of Technology Sydney&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This is a talk that I delivered at Open Repositories 2019 in Hamburg Germany, reporting on developments in the DataCrate specification for research data description and packaging. The big news is that &lt;a href="https://github.com/UTS-eResearch/datacrate/blob/master/spec/1.0/data_crate_specification_v1.0.md"&gt;DataCrate&lt;/a&gt; is now part of a broader international effort known as &lt;a href="https://researchobject.github.io/ro-crate/"&gt;RO-Crate&lt;/a&gt;. I spent several hours at the …&lt;/p&gt;&lt;/section&gt;</summary><content type="html">&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide01.png' alt='DataCrate: a progress report on packaging research data for distribution via your repository
Peter Sefton
University of Technology Sydney
&lt;p&gt;' title='DataCrate: a progress report on packaging research data for distribution via your repository
Peter Sefton
University of Technology Sydney&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This is a talk that I delivered at Open Repositories 2019 in Hamburg Germany, reporting on developments in the DataCrate specification for research data description and packaging. The big news is that &lt;a href="https://github.com/UTS-eResearch/datacrate/blob/master/spec/1.0/data_crate_specification_v1.0.md"&gt;DataCrate&lt;/a&gt; is now part of a broader international effort known as &lt;a href="https://researchobject.github.io/ro-crate/"&gt;RO-Crate&lt;/a&gt;. I spent several hours at the conference working with co-conspirators Stian Soiland-Reyes and Eoghan Ó Carragáin on the first draft of the new spec which we hope to unveil at &lt;a href="https://conference.eresearch.edu.au/"&gt;eResearch Australasia&lt;/a&gt; 2019.&lt;/p&gt;
&lt;p&gt;Eoghan, Stian and I ran a workshop at OR2019 for repository people to talk about the state of the art in Research Data Packaging, and collect use cases - we got lots of useful input from the workshop and the broader conference and had a chance to chat with people working on related standards such as the Oxford Common File Layout (&lt;a href="https://ocfl.io/"&gt;OCFL&lt;/a&gt;) and &lt;a href="https://swordapp.github.io/swordv3/technical-outline.html"&gt;Sword 3&lt;/a&gt; - Neil Jeffries, Andrew Woods, Simeon Warner and my old mate Richard Jones amongst others.&lt;/p&gt;
&lt;p&gt;I also &lt;a href="/2019/07/01/OCLF.htm"&gt;presented work that Mike Lynch, Moises Sacal and I have been doing on OCFL&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My travel was funded by the University of Technology Sydney.&lt;/p&gt;
&lt;p&gt;Peter Sefton&lt;/p&gt;
&lt;p&gt;University of Technology Sydney, Australia&lt;/p&gt;
&lt;p&gt;DataCrate is a specification for packaging research data for dissemination and reuse which has been presented at OR before as it developed to its current v1.0 status. This is an update on progress with the specification and tooling. The goals of the specification are: (a) to maximise the utility of the data for researchers (including the original researchers' 'future selves') - given that a researcher has found a DataCrate package they should be able to tell what it is, how the data may be used and what all the files contain; (b) to enable discovery of the data by exposing metadata as widely as possible to both humans and machines; and (c) to enable automated ingest into repositories or catalogues.
DataCrate can express detailed information about which people, instruments and software were involved in capturing or creating data, where they did it and why, as well as how to cite a dataset.
DataCrate draws on other standards (BagIt, JSON-LD, Schema.org) and is designed to be easy to implement.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide02.png' alt='😻
' title='😻
' border='1'  width='85%'/&gt;
&lt;p&gt;When I proposed this update there had been some work going on to merge DataCrate with another standard - Research Object - but we didn’t know what form that would take. This presentation will now let the love-stuck cat out of the bag. I’m going to tell you a love story about two standards.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide03.png' alt='
' title='
' border='1'  width='85%'/&gt;
&lt;p&gt;The story is about DataCrate - which is a Specification for describing and distributing data (any data).&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide04.png' alt='
http://www.researchobject.org/
' title='
http://www.researchobject.org/
' border='1'  width='85%'/&gt;
&lt;p&gt;And &lt;a href="http://www.researchobject.org/"&gt;Research Object&lt;/a&gt; which is also about data packaging but with more emphasis on having a major impact on scholarly communications and practice, including distributing research data with code and workflows in the interests of making it more reproducible.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide05.png' alt='💒
' title='💒
' border='1'  width='85%'/&gt;
&lt;p&gt;There’s going to be a marriage of these two standards.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide06.png' alt='⏪
' title='⏪
' border='1'  width='85%'/&gt;
&lt;p&gt;But before we talk about the upcoming marriage of two specifications, let’s go back over what we’re trying to achieve. I will use examples from DataCrate but the same principles apply with Research Object.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide07.png' alt='Data packaging Functions: S-words 
Strategic: Transform scholarly practice, distribute the stuff that makes research complete, focus on redo-ability and reproducibility (eg Research Object)
Self-documenting: Describe the stuff in a programmer friendly way (eg FD, DataCrate), with a view to making it interoperable (eg DataCrate, RO), try to ensure Reusability (RO) - add human readable HTML (DataCrate, DataSpice)
Self-contained: Bundle stuff (eg Zip, TAR, .dmg, RAR). 
Safe: Make sure the stuff is what it’s supposed to be using checksums (eg BagIt, FD) &amp;amp; Preservation / archival practice
Serialization Syntax: XML (Legacy systems), JSON (FD) Linked Data (RDF, JSON-LD eg RO, DataCrate)
Schemas - which one(s) to use?
' title='Data packaging Functions: S-words 
Strategic: Transform scholarly practice, distribute the stuff that makes research complete, focus on redo-ability and reproducibility (eg Research Object)
Self-documenting: Describe the stuff in a programmer friendly way (eg FD, DataCrate), with a view to making it interoperable (eg DataCrate, RO), try to ensure Reusability (RO) - add human readable HTML (DataCrate, DataSpice)
Self-contained: Bundle stuff (eg Zip, TAR, .dmg, RAR). 
Safe: Make sure the stuff is what it’s supposed to be using checksums (eg BagIt, FD) &amp;amp; Preservation / archival practice
Serialization Syntax: XML (Legacy systems), JSON (FD) Linked Data (RDF, JSON-LD eg RO, DataCrate)
Schemas - which one(s) to use?
' border='1'  width='85%'/&gt;
&lt;p&gt;Here’s a slide from the workshop we ran at the OR conference, setting out some of the functional requirements for a distribution format.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide08.png' alt='
&lt;p&gt;← CLICK HERE TO DOWNLOAD
' title='&lt;/p&gt;
&lt;p&gt;← CLICK HERE TO DOWNLOAD
' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Here’s a screenshot from a sample DataCrate.  It shows the basic metadata for the dataset. This example is online, but the same view is available if you download it as a zip file.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide09.png' alt='
&lt;p&gt;← CLICK HERE
' title='&lt;/p&gt;
&lt;p&gt;← CLICK HERE
' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;If you do download the zip then you’ll see these files - the payload data is/are in the &lt;code&gt;/data&lt;/code&gt; directory and the other files are all metadata and fixity checking information. A human can open CATALOG.html and then click around a little website that describes the data, including information down to the file level and about the contents of files.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide10.png' alt='
&lt;p&gt;← Feed this to your computer
' title='&lt;/p&gt;
&lt;p&gt;← Feed this to your computer
' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;There’s also machine readable data in JSON-LD format.&lt;/p&gt;
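&lt;p&gt;To make that concrete, here is a minimal sketch of the kind of flattened JSON-LD a DataCrate catalogue contains, loaded and queried from Python. The entity names and identifiers are invented for illustration, not taken from the real sample crate.&lt;/p&gt;

```python
import json

# A hypothetical, much-reduced JSON-LD description of a dataset in the
# flattened "@graph" style DataCrate uses. All names and identifiers
# here are illustrative, not copied from the real sample crate.
catalog = json.loads("""
{
  "@context": "https://schema.org/",
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example dataset",
      "creator": {"@id": "https://orcid.org/0000-0000-0000-0000"}
    },
    {
      "@id": "https://orcid.org/0000-0000-0000-0000",
      "@type": "Person",
      "name": "Jane Researcher"
    }
  ]
}
""")

# The root dataset is the entity whose @id is "./"
dataset = next(e for e in catalog["@graph"] if e["@id"] == "./")
print(dataset["name"])
```

&lt;p&gt;Because every entity carries an @id, cross-references such as creator can be resolved by a simple lookup over the graph.&lt;/p&gt;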
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide11.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;The JSON-LD is designed to be easily consumed by programs that can do things with the data - and is compatible with Google’s Dataset Search.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide12.png' alt='
&lt;p&gt;🖥️
👩🏾‍🔬
' title='&lt;/p&gt;
&lt;p&gt;🖥️
👩🏾‍🔬
' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;The two views (human and machine) of the data are equivalent - in fact the HTML version is generated from the JSON-LD version using a tool called &lt;a href="https://code.research.uts.edu.au/eresearch/CalcyteJS"&gt;CalcyteJS&lt;/a&gt;.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide13.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Here’s a screenshot of an HTML page about one of the files in the sample dataset - including detailed EXIF technical metadata extracted from inside the file.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide14.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;And here’s an automatically generated diagram extracted from the sample DataCrate showing how two images were created. The first result was an image file taken by me (as an agent) using two instruments (my camera and lens), of a place (the object: Catalina park in Katoomba). A sepia toned version was the result of a CreateAction, with the instrument this time being the ImageMagick software. The DataCrate also &lt;a href="https://data.research.uts.edu.au/examples/v1.0/sample/CATALOG_files/5ddf53df/4d0b61c3/d4172208/d5c24816/bad69891/index.html"&gt;contains information&lt;/a&gt; about that CreateAction such as the command used to do the conversion and the version of the software-as-instrument.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;convert -sepia-tone 80% test_data/sample/pics/2017-06-11\ 12.56.14.jpg test_data/sample/pics/sepia_fence.jpg&lt;/code&gt;&lt;/p&gt;
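&lt;p&gt;The provenance pattern described above can be sketched as a schema.org CreateAction entity. The @id values below are made up for illustration; the real sample crate uses its own identifiers.&lt;/p&gt;

```python
# A hypothetical schema.org CreateAction following the pattern in this
# slide: the original photo is the "object", the ImageMagick software
# is the "instrument", and the sepia image is the "result". All @id
# values are illustrative, not those used in the real sample crate.
create_action = {
    "@id": "#sepia-conversion",
    "@type": "CreateAction",
    "agent": {"@id": "https://orcid.org/0000-0000-0000-0000"},
    "instrument": {"@id": "#ImageMagick"},
    "object": {"@id": "pics/2017-06-11 12.56.14.jpg"},
    "result": {"@id": "pics/sepia_fence.jpg"},
    "description": "convert -sepia-tone 80% ..."
}

# Provenance questions become simple key lookups:
print("result:", create_action["result"]["@id"])
print("instrument:", create_action["instrument"]["@id"])
```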
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide15.png' alt='URIs as names for things
&lt;p&gt;' title='URIs as names for things&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Because DataCrate is based on JSON-LD, and linked data principles, each term used can have a link to its definition, eg: &lt;a href="https://schema.org/CreateAction"&gt;https://schema.org/CreateAction&lt;/a&gt; so DataCrates are self-documenting.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide16.png' alt='But (🔧🔨🔩🔪🔬)ing is limited:
JSON-LD →  🙅
⛓💾 →  👶 
&lt;p&gt;' title='But (🔧🔨🔩🔪🔬)ing is limited:
JSON-LD →  🙅
⛓💾 →  👶&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;BUT: tools for humans to generate linked data are under-developed.&lt;/p&gt;
&lt;p&gt;JSON-LD tooling is limited to high-level transformations, and there are no easily available libraries for Research Software Engineers to do simple stuff like traversing graphs or looking up context keys.&lt;/p&gt;
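&lt;p&gt;By “simple stuff” we mean helpers like the following sketch: index a flattened @graph by @id and follow references by dictionary lookup, with no RDF toolchain involved. The entities are invented for illustration.&lt;/p&gt;

```python
# A sketch of the kind of small helper that is missing from JSON-LD
# tooling: index a flattened "@graph" by "@id", then follow
# {"@id": ...} references by dictionary lookup instead of using a
# full RDF stack. The entities below are invented for illustration.
graph = [
    {"@id": "./", "@type": "Dataset", "creator": {"@id": "#jane"}},
    {"@id": "#jane", "@type": "Person", "name": "Jane Researcher"},
]

by_id = {entity["@id"]: entity for entity in graph}

def resolve(ref):
    """Follow a {"@id": ...} reference to its full entity, if present."""
    return by_id.get(ref.get("@id"), ref)

creator = resolve(by_id["./"]["creator"])
print(creator["name"])
```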
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide17.png' alt='⛓💾’s still too much like
&lt;p&gt;🚀⚗️
👩🏽‍🚀👷‍🏗️🏢
🛐
' title='⛓💾’s still too much like&lt;/p&gt;
&lt;p&gt;🚀⚗️
👩🏽‍🚀👷‍🏗️🏢
🛐
' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Linked data is still too much like rocket science because of all the architecture astronauts:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.joelonsoftware.com/2001/04/21/dont-let-architecture-astronauts-scare-you/"&gt;Joel Spolsky&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;These are the people I call Architecture Astronauts. It’s very hard to get them to write code or design programs, because they won’t stop thinking about Architecture. They’re astronauts because they are above the oxygen level, I don’t know how they’re breathing. They tend to work for really big companies that can afford to have lots of unproductive people with really advanced degrees that don’t contribute to the bottom line.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Also, as with many technologies, RDF can be a bit of a religious matter.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide18.png' alt='💒
' title='💒
' border='1'  width='85%'/&gt;
&lt;p&gt;Speaking of religion, back to the wedding of the decade ...&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide19.png' alt='
' title='
' border='1'  width='85%'/&gt;
&lt;p&gt;The new entity is called RO-Crate.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide20.png' alt='
' title='
' border='1'  width='85%'/&gt;
&lt;p&gt;We are collecting use cases in the GitHub repo. We collected several at our OR2019 workshop, and the repository is still open for business.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide21.png' alt='
' title='
' border='1'  width='85%'/&gt;
&lt;p&gt;Members of the RO-Crate project (anyone can join by following the directions at the repo) are reviewing the spec - which is based on a set of examples - and we expect to have a simplified, clearer specification draft by the end of July, and to launch an Alpha version in October at &lt;a href="https://conference.eresearch.edu.au/"&gt;eResearch Australasia&lt;/a&gt; (subject to getting a presentation accepted :).&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide22.png' alt='What are we working on
Guidance on how to generate schema.org based metadata for datasets:
Who created it, what is the subject matter? what are these files? where was it made? What format is each file in?  where is about, why was it made? (ie what funded project is it part of?)
How does RO-Crate work with BagIt, OCFL, Zip et al
How can I re-run this analysis? Via a workflow?
How can I do stuff with this dataset?
&lt;p&gt;' title='What are we working on
Guidance on how to generate schema.org based metadata for datasets:
Who created it, what is the subject matter? what are these files? where was it made? What format is each file in?  where is about, why was it made? (ie what funded project is it part of?)
How does RO-Crate work with BagIt, OCFL, Zip et al
How can I re-run this analysis? Via a workflow?
How can I do stuff with this dataset?&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;We’re working on merging the DataCrate simple-to-implement approach with the bigger vision of Research Object.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/DataCrate-OR2019/Slide23.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Join us!&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
</content><category term="DataCrate"></category></entry><entry><title>Implementation of a Research Data Repository using the Oxford Common File Layout standard at the University of Technology Sydney</title><link href="/2019/07/01/OCLF.htm" rel="alternate"></link><published>2019-07-01T00:00:00+10:00</published><updated>2019-07-01T00:00:00+10:00</updated><author><name>ptsefton</name></author><id>tag:None,2019-07-01:/2019/07/01/OCLF.htm</id><summary type="html">&lt;p&gt;This is a presentation by Michael Lynch and Peter Sefton, delivered by Peter Sefton at Open Repositories 2019 in Hamburg. My travel was funded by the University of Technology Sydney.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide01.png' alt='Implementation of a Research Data Repository using the Oxford Common File Layout standard at the University of Technology Sydney
Michael Lynch, Peter Sefton
University of Technology Sydney, Australia
&lt;p&gt;' title='Implementation of a Research Data Repository using the Oxford Common File Layout standard at the University of Technology Sydney
Michael Lynch, Peter Sefton
University of Technology Sydney, Australia&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This presentation will discuss an implementation of the Oxford Common File Layout (OCFL) in an institutional research data repository at …&lt;/p&gt;&lt;/section&gt;</summary><content type="html">&lt;p&gt;This is a presentation by Michael Lynch and Peter Sefton, delivered by Peter Sefton at Open Repositories 2019 in Hamburg. My travel was funded by the University of Technology Sydney.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide01.png' alt='Implementation of a Research Data Repository using the Oxford Common File Layout standard at the University of Technology Sydney
Michael Lynch, Peter Sefton
University of Technology Sydney, Australia
&lt;p&gt;' title='Implementation of a Research Data Repository using the Oxford Common File Layout standard at the University of Technology Sydney
Michael Lynch, Peter Sefton
University of Technology Sydney, Australia&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This presentation will discuss an implementation of the Oxford Common File Layout (OCFL) in an institutional research data repository at the University of Technology Sydney. We will describe our system in terms of the conference themes of Open and Sustainable and with reference to the needs and user experience of data depositors and users (many have large data, and/or large numbers of files). OCFL, which is an approach to repository implementation based on static data, was developed to deal with a number of issues with “traditional” repository design, many of which are particularly acute when dealing with research data. We will cover how this meets our user and institutional needs and is a sustainable approach to managing data.&lt;/p&gt;
&lt;p&gt;This was presented by Peter Sefton - so the “I” throughout is him.&lt;/p&gt;
&lt;p&gt;This presentation was in &lt;a href="https://www.conftool.net/or2019/index.php?page=browseSessions&amp;amp;form_session=380&amp;amp;presentations=show"&gt;a session about OCFL&lt;/a&gt; so we didn’t need to explain the standard in detail.&lt;/p&gt;
&lt;p&gt;Here’s what they say on the &lt;a href="https://ocfl.io/"&gt;OCFL site&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This Oxford Common File Layout (OCFL) specification describes an application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It is designed to promote long-term object management best practices within digital repositories.&lt;/p&gt;
&lt;p&gt;Specifically, the benefits of the OCFL include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Completeness, so that a repository can be rebuilt from the files it stores&lt;/li&gt;
&lt;li&gt;Parsability, both by humans and machines, to ensure content can be understood in the absence of original software&lt;/li&gt;
&lt;li&gt;Robustness against errors, corruption, and migration between storage technologies&lt;/li&gt;
&lt;li&gt;Versioning, so repositories can make changes to objects allowing their history to persist&lt;/li&gt;
&lt;li&gt;Storage diversity, to ensure content can be stored on diverse storage infrastructures including conventional filesystems and cloud object stores&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide02.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Also Moises Sacal worked with us on this.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide03.png' alt="&amp;gt; repo = new Repository()
&lt;p&gt;&amp;quot; title='&amp;gt; repo = new Repository()&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Hello and welcome to this presentation about OCFL at UTS. Don’t be alarmed - yes there’s javascript code on the screen, but there will not be a test!&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide04.png' alt='&amp;gt; repo = new Repository()
&lt;p&gt;Repository { ocflVersion: '1.0', objectIdToPath: [Function] }&lt;/p&gt;
&lt;p&gt;R&lt;/p&gt;
&lt;p&gt;' title='&amp;gt; repo = new Repository()&lt;/p&gt;
&lt;p&gt;Repository { ocflVersion: '1.0', objectIdToPath: [Function] }&lt;/p&gt;
&lt;p&gt;R&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;To me, being able to type &lt;code&gt;npm install ocfl&lt;/code&gt; and then instantiate a repository in an interactive shell is quite amazing. When I first worked with repositories, from about 2006, installing a repository was a big job: there were usually lots of prerequisites, and installation could be very finicky (for example, see this guide to installing UQ’s &lt;a href="https://web.archive.org/web/20130419100148/http://www.rubric.edu.au/techreports/Fez_Full_Installation_Procedure_LatestNov06.htm"&gt;Fez Fedora repository&lt;/a&gt;). But in those days repositories were “full stack” services, whereas OCFL is just about the heart and soul of a repository: the storage.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide05.png' alt='What you’re going to get
An explanation of OCFL ...
... from a UTS perspective.
&lt;p&gt;A confession.&lt;/p&gt;
&lt;p&gt;And some philosophy:
“What is a ‘repository’ anyway”
' title='What you’re going to get
An explanation of OCFL ...
... from a UTS perspective.&lt;/p&gt;
&lt;p&gt;A confession.&lt;/p&gt;
&lt;p&gt;And some philosophy:
“What is a ‘repository’ anyway”
' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This talk is not just about OCFL, it reflects on the role of repositories and how we should view them in our infrastructure.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide06.png' alt='Confession TODO
http://www.rubric.edu.au/techreports/migration_toolkit.htm
' title='Confession TODO
http://www.rubric.edu.au/techreports/migration_toolkit.htm
' border='1'  width='85%'/&gt;
&lt;p&gt;My first “Confession”.&lt;/p&gt;
&lt;p&gt;Many, many years ago I was the technical manager for the RUBRIC project -
Regional Universities Building Research Infrastructure Collaboratively. This
project helped a group of about nine partner unis get their first publications
repositories up and running, and our team morphed into the national
repository support service for Australia. As part of this work we built a
variety of migration tools to assist people in getting data from existing
systems such as endnote, or ingesting MARC (library) metadata into a repository.
To do this we used the DSpace archive format, a simple directory-plus-metadata
file format that was not unlike OCFL.&lt;/p&gt;
&lt;p&gt;It occurred to me at the time that we could build a simple static repository
system that used something like the DSpace archive format, with separate ingest
workflows, and build a portal using something like Apache Solr (though at that
stage there really wasn’t anything else like Apache Solr).&lt;/p&gt;
&lt;p&gt;I didn’t follow up on those thoughts, as I thought they
were a bit heretical - the monolithic repository architecture was the orthodoxy,
though I did flirt with this idea in this article about discovery portals and
persistent identifiers. Anyway, it seems like those &lt;em&gt;wrong&lt;/em&gt; thoughts are
something I can talk about, now that we have OCFL.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide07.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Confession 2&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When I was working on Alveo, the Australian Government funded NeCTAR Virtual Lab project, I suggested we start with a Hydra/Fedora repository as the ‘heart’ of the Virtual Lab, the repository component. That worked well in that we got up and running very fast with a discovery portal for data - but we ran into problems when it came to data access - for example, making Item Lists (data-sets) performed terribly because of the overhead inherent in the Hydra architecture.&lt;/p&gt;
&lt;p&gt;I &lt;a href="http://www.doria.fi/handle/10024/97740"&gt;delivered a presentation at Open Repositories 2014&lt;/a&gt;, written with other Alveo staff, that explored some of the problems with using a “full stack” repository at scale.&lt;/p&gt;
&lt;p&gt;Initially (and throughout the project) &lt;a href="https://orcid.org/0000-0003-2357-9652"&gt;Steve Cassidy&lt;/a&gt; was suggesting a more OCFL-like way of loading data, where it would be arranged on disk by some kind of human- or machine-orchestrated curation process and then indexed. Steve was right. I was wrong. I’d love to come back to the Alveo lab architecture for another go - maybe in a future round of &lt;a href="https://ror.org/038sjwq14"&gt;ARDC&lt;/a&gt; investment?&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide08.png' alt='&amp;gt; repo.create(&amp;quot;/Users/124411/everything&amp;quot;)
&lt;p&gt;' title='&amp;gt; repo.create(&amp;quot;/Users/124411/everything&amp;quot;)&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Like I said, creating a repository is this simple.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide09.png' alt='What is a repository
anyway?
&lt;p&gt;' title='What is a repository
anyway?&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;But what is a repository?&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide10.png' alt='
A lifestyle
An application for managing scholarly comms
A service or collection of services like a Library
A service or collection of services like an Archive
A service or [...] like a Records Management department
A place to put stuff
An software application to store stuff
' title='
A lifestyle
An application for managing scholarly comms
A service or collection of services like a Library
A service or collection of services like an Archive
A service or [...] like a Records Management department
A place to put stuff
An software application to store stuff
' border='1'  width='85%'/&gt;
&lt;p&gt;There are a variety of ways to look at a “repository” - all of these are facets of what a repository can be.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide11.png' alt='🛒 📬 🐿 💽 🗃🗄🔭🔎🔬
&lt;p&gt;' title='🛒 📬 🐿 💽 🗃🗄🔭🔎🔬&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Repositories are no longer just places to put stuff (if they ever were) - they’re positively bristling with services. Over the 13 years of this conference, repositories have become, shall we say, baroque. And in some cases slow. They also tend to have problems when you present them with a terabyte file, or ten million 1 kilobyte files. OCFL is, we think, in part a reaction from our community to those realities.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide12.png' alt='CON
&lt;p&gt;' title='CON&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Using OCFL as the storage layer in a repository is a radical separation of con-&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide13.png' alt='CERNS
&lt;p&gt;' title='CERNS&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;-cerns.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide14.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This is a view of our eResearch Architecture - Stash is our Data Management service, which is an implementation of &lt;a href="https://www.redboxresearchdata.com.au/"&gt;ReDBox&lt;/a&gt;. The point of showing the diagram is not to go through it all in detail, but to show how complex the ecosystem of research data services can be - and where the data archive / repository core sits. The OCFL components (the publication and archive file-systems) are shown in red, while the Agents that access OCFL (the repository Adaptor that writes content from Research Workspace Services into a repository, and the Public and Private data portals) are shown in blue.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide15.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;This is our work-in-progress discovery portal - it is not built in to the repository as it would be with, say, DSpace; it’s a separate service that indexes an OCFL repository (in this case the public data).&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide16.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;Here’s the same diagram stripped back a bit more ...&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide17.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;And even more stripped back, to show only the core repository service (in red) and a couple of services that interact with it (in blue).&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide18.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;p&gt;And here’s the actual repository part - the storage layer.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide19.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;The idea of OCFL is that there are services that all access this core layer, but they’re disposable and/or interchangeable, and you just leave your precious data where it is, on disk (or some virtual disk-like view of whatever the storage solution &lt;em&gt;du jour&lt;/em&gt; happens to be).&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide20.png' alt='TODO	
Progressively drill down on the architecture until it’s “just”  a partition.
' title='TODO	
Progressively drill down on the architecture until it’s “just”  a partition.
' border='1'  width='85%'/&gt;
&lt;p&gt;Here’s a screenshot of what an OCFL object looks like - it’s a series of versioned directories, each with a detailed inventory.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide21.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;No, this slide is not about a zoological repository. It’s about the Elephants in the repository, AKA the risks that come with an OCFL implementation.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The main implementation issue is that you have to THINK about the design of the software that updates the repository at the same time. You don’t want to end up with (a) corrupt files or (b) transactions that didn’t complete.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;OCFL is also very new (it’s not at version 1 yet) so there’s some risk that the services we are hoping for won’t arrive - but given that our data are still safe on our own storage infrastructure, and that it is possible to code up an OCFL-consuming application in a matter of days, this is not a serious risk.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide22.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;OCFL needs some explaining. I’ve had a couple of conversations with developers where it takes them a little while to get what it’s for.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide23.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;But they DO get it.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide24.png' alt='But why do I have to use this?
Every mature language now has a thing that builds a project for you, gives you a skeleton so they’re all in the same spot every time.
Michael Lynch 
' title='But why do I have to use this?
Every mature language now has a thing that builds a project for you, gives you a skeleton so they’re all in the same spot every time.
Michael Lynch 
' border='1'  width='85%'/&gt;
&lt;p&gt;Mike Lynch’s summary - this is a modern take on “how do I organise my stuff”. In this sense OCFL is like a framework for data - and we all use frameworks for code these days.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide25.png' alt='
&lt;p&gt;' title='&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;We have some good news to announce - we have a grant to continue our OCFL work from the &lt;a href="https://ror.org/038sjwq14"&gt;Australian Research Data Commons&lt;/a&gt;. (I’ve used the new Research Organisation Registry (ROR) ID for ARDC, just because it’s new and you should all check out the ROR).&lt;/p&gt;
&lt;p&gt;We’re going to be demonstrating large-scale use of OCFL, with research objects described using &lt;a href="https://researchobject.github.io/ro-crate/"&gt;RO-Crate&lt;/a&gt; metadata. (See also my presentation on &lt;a href="/2019/06/21/DataCrate-OR2019.htm"&gt;DataCrate&lt;/a&gt; which introduces RO-Crate).&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/OCLF-OR-2019/Slide26.png' alt='🤔
&lt;p&gt;' title='🤔&lt;/p&gt;
&lt;p&gt;' border='1'  width='85%'/&amp;gt;&lt;/p&gt;
&lt;p&gt;So what do we think?&lt;/p&gt;
&lt;p&gt;Working with OCFL has been really great for the team at UTS - it’s well designed, and just when you think “hey, how do I do file-locking” you find that there’s a hint in the design (a &lt;code&gt;/deposit&lt;/code&gt; directory in this case) that points towards a solution.&lt;/p&gt;
&lt;p&gt;Over at the &lt;a href="https://researchobject.github.io/ro-crate/"&gt;RO-Crate&lt;/a&gt; project Stian and Eoghan and I really liked the OCFL approach of having a clear spec with implementation notes, and we’re going to try to emulate that as we work on merging Research Object and DataCrate into one general purpose way of describing and packaging research data.&lt;/p&gt;
&lt;p&gt;See also my presentation on &lt;a href="/2019/06/21/DataCrate-OR2019.htm"&gt;DataCrate&lt;/a&gt; which introduces RO-Crate.&lt;/p&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
</content><category term="Repositories"></category></entry><entry><title>Trip Report - Open Repositories 2019 - Peter Sefton</title><link href="/2019/07/01/OR2019.htm" rel="alternate"></link><published>2019-07-01T00:00:00+10:00</published><updated>2019-07-01T00:00:00+10:00</updated><author><name>ptsefton</name></author><id>tag:None,2019-07-01:/2019/07/01/OR2019.htm</id><summary type="html">&lt;p&gt;[Edited: 2019-07-01, 2019-07-2 fixed a few typos]&lt;/p&gt;
&lt;p&gt;This year Open Repositories was in Hamburg, Germany. I was funded by the
University of Technology Sydney to attend. I gave two presentations, one on &lt;a href="/2019/07/01/OCLF.htm"&gt;our work on scalable research data repositories&lt;/a&gt;
and the other on &lt;a href="/2019/07/01/DataCrate-OR2019.htm"&gt;research data packaging&lt;/a&gt;, and ran a workshop, more …&lt;/p&gt;</summary><content type="html">&lt;p&gt;[Edited: 2019-07-01, 2019-07-2 fixed a few typos]&lt;/p&gt;
&lt;p&gt;This year Open Repositories was in Hamburg, Germany. I was funded by the
University of Technology Sydney to attend. I gave two presentations, one on &lt;a href="/2019/07/01/OCLF.htm"&gt;our work on scalable research data repositories&lt;/a&gt;
and the other on &lt;a href="/2019/07/01/DataCrate-OR2019.htm"&gt;research data packaging&lt;/a&gt;, and ran a workshop, more on
which is below.&lt;/p&gt;
&lt;p&gt;This year was an intense, focussed conference for me. &lt;a href="http://ptsefton.com/2018/07/10/or2018.htm"&gt;Last year&lt;/a&gt;
I had a few take-aways from presentations; this year was one of those
conferences where the value was in the conversations and in getting down to
work.&lt;/p&gt;
&lt;h1&gt;Keynotes&lt;/h1&gt;
&lt;p&gt;I didn't get much out of the opening Keynote, from &lt;a href="https://jeffgothelf.com/"&gt;Jeff Gothelf&lt;/a&gt;, who's
apparently a star in the User Experience (UX) world. It was a pretty generic
presentation, with some well timed jokes but not very &lt;em&gt;useful&lt;/em&gt;. Ok, UX is
important, but when your examples are coming from well-resourced projects what
are resource-constrained repository workers supposed to do to improve UX?&lt;/p&gt;
&lt;p&gt;My mate &lt;a href="https://cottagelabs.com/"&gt;Richard Jones&lt;/a&gt; from Cottage Labs asked a great question about how we can improve
the UX for developers using APIs but didn't get much of an answer.&lt;/p&gt;
&lt;p&gt;Even though I would have preferred a more on-topic keynote, it did spark a
number of useful conversations in the breaks, including some I had with Richard
about how we can build developer-friendly services. I always learn lots from Mr
Jones at OR.&lt;/p&gt;
&lt;p&gt;The closing Keynote was much more audience-appropriate. &lt;a href="https://orcid.org/0000-0003-1613-5981"&gt;Heather Piwowar&lt;/a&gt;, who
has, like, direct User Experience of Open Repositories, gave us all a kind of
career-affirming hug, pointing out that the repository people in the room are
helping to build a huge database of open scholarly information.&lt;/p&gt;
&lt;p&gt;We put up and maintain the content, and her &lt;a href="https://unpaywall.org/"&gt;Unpaywall&lt;/a&gt; site finds it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;An open database of 23,565,946 free scholarly articles.
We harvest Open Access content from over 50,000 publishers and repositories, and make it easy to find, track, and use.
&lt;a href="https://unpaywall.org/"&gt;https://unpaywall.org/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;She reckons our
time is about to come - that we're at the take-off point for hockey-stick
growth. Here's her presentation
&lt;a href="https://www.slideshare.net/hpiwowar/open-repositories-2019-closing-keynote-heather-piwowar"&gt;The Breakout Moment for open repositories is now&lt;/a&gt;.
On topic, useful and inspiring.&lt;/p&gt;
&lt;h1&gt;Workshop in data packaging&lt;/h1&gt;
&lt;p&gt;&lt;a href="https://orcid.org/0000-0001-8131-2150"&gt;Eoghan Ó Carragáin&lt;/a&gt;, &lt;a href="https://orcid.org/0000-0001-9842-9718"&gt;Stian Soiland-Reyes&lt;/a&gt; and I ran an all-day workshop on Data Packaging
which was well attended - we ran through the background in the morning, what
data packaging is all about, why it's important and Eoghan gave us a tour of
some of the major efforts going on in the space at the moment. Stian gave more
detail on Research Objects, and &lt;a href="https://orcid.org/0000-0001-9530-627X"&gt;Marina  Soares E Silva&lt;/a&gt; talked about recent developments with Mendeley Data.&lt;/p&gt;
&lt;p&gt;We structured the discussion around these themes or functions of Research Data Packaging:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Strategic: Transform scholarly practice, distribute the stuff that makes
research complete, focus on redo-ability and reproducibility (eg Research
Object) - Self-documenting: Describe the stuff in a programmer friendly way
(eg FD, DataCrate), with a view to making it interoperable (eg DataCrate, RO),
try to ensure Reusability (RO) - add human readable HTML (DataCrate,
DataSpice)&lt;/li&gt;
&lt;li&gt;Self-contained: Bundle stuff (eg Zip, TAR, .dmg, RAR).&lt;/li&gt;
&lt;li&gt;Safe: Make sure the stuff is what it’s supposed to be using checksums (eg BagIt, FD) &amp;amp; Preservation / archival practice&lt;/li&gt;
&lt;li&gt;Serialization Syntax: XML (Legacy systems), JSON (FD) Linked Data (RDF, JSON-LD eg RO, DataCrate)&lt;/li&gt;
&lt;li&gt;Schemas - which one(s) to use?&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;We did a quick pass on these topics in the morning, looped back to the most
popular ones in the afternoon - had some impromptu lightning talks - and finished
up with a session where we solicited Use Cases to go into the &lt;a href="/2019/07/01/DataCrate-OR2019.htm"&gt;RO-Crate&lt;/a&gt; process.
Thanks go to &lt;a href="https://orcid.org/0000-0003-1541-5631"&gt;Paul Walk&lt;/a&gt; and &lt;a href="https://orcid.org/0000-0003-3311-3741"&gt;Neil Jefferies&lt;/a&gt;, both of whom I've known for a
decade or so, for showing up and contributing enthusiastically.&lt;/p&gt;
&lt;h1&gt;My conference&lt;/h1&gt;
&lt;img alt='Hotel converted from an old water tower' src='/blog/or-2019/movenpick.jpg'&gt;
&lt;p&gt;I was lucky in my choice of converted water tower to stay in. I got to have
breakfast with &lt;a href="https://orcid.org/0000-0003-1419-2405"&gt;Martin Fenner&lt;/a&gt; from DataCite (we met back in 2011
at the first Beyond the PDF meeting). He was on the list for our Monday workshop
but couldn't make it. I filled Martin in on what we're doing with &lt;a href="/2019/07/01/DataCrate-OR2019.htm"&gt;RO-Crate&lt;/a&gt; - how we're using Schema.org metadata in JSON-LD. Martin was completely on-board
with using Schema.org, and told me that if we have that, then using the DataCite
XML format would probably be redundant. If that works out then that simplifies
things - we need to talk more about this on the RO-Crate Project.&lt;/p&gt;
&lt;p&gt;Martin also seemed to me to be frustrated that there are still a number of
packaging efforts with no clear leader - simple, I said, just endorse RO-Crate
when it comes out as DataCite approved :)&lt;/p&gt;
&lt;p&gt;Stian and Eoghan and I took the opportunity to get stuck in to editing the
RO-Crate draft, and getting to know each other a bit - forging a strong editorial team for &lt;a href="/2019/07/01/DataCrate-OR2019.htm"&gt;RO-Crate&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I chaired a short-presentation session (7 minutes, 24 slides) - I like to get
the presenters all down on stage at the end, get them to re-introduce
themselves, give them a minute to tell us anything they forgot in the heat of
presentation then get a discussion going.&lt;/p&gt;
&lt;img alt="The session I chaired" src="/blog/or-2019/lecture-hall.jpg"&gt;
&lt;h1&gt;Hamburg&lt;/h1&gt;
&lt;img alt='A pretty shop' src='/blog/or-2019/alsterhaus.jpg'&gt;
&lt;img alt='Stickers on rubbish bins' src='/blog/or-2019/stickers.jpg'&gt;
&lt;p&gt;Hamburg's a great place to wander around - lots of parks and lots of stickers on
garbage bins  that I really liked.&lt;/p&gt;
&lt;img alt='Wedding photo in a park' src='/blog/or-2019/gardens.jpg'&gt;
&lt;img alt='"Antifa Zone" sticker on a road sign' src='/blog/or-2019/antifa.jpg'&gt;
&lt;p&gt;Hamburg harbour, where we went for the conference dinner boat ride, is a giant
metaphor for Data Packaging. Container terminals as far as the eye can see being
loaded and unloaded by giant robot-cranes. Shipping containers are, of course, a
triumph of standardization and here you could see them being moved around at
industrial scale - just like we're going to do using RO-Crate for Research Data.&lt;/p&gt;
&lt;img alt='Shipping containers and a wind turbine, Hamburg harbour' src='/blog/or-2019/containers.jpg'&gt;
</content><category term="Repositories"></category></entry><entry><title>Grant awarded; ARDC Discovery Activities; FAIR Simple Scalable Static Research Data Repository Demonstrator</title><link href="/2019/06/07/ardc_ocfl.htm" rel="alternate"></link><published>2019-06-07T00:00:00+10:00</published><updated>2019-06-07T00:00:00+10:00</updated><author><name>Peter Sefton</name></author><id>tag:None,2019-06-07:/2019/06/07/ardc_ocfl.htm</id><summary type="html">&lt;p&gt;The eResearch team at UTS in collaboration with colleagues at QCIF and AARNet
applied for and received funding under the Australian Research Data Commons &amp;quot;Institutional
role in a data commons&amp;quot; grant scheme.&lt;/p&gt;
&lt;p&gt;We applied for $50,000 but were only awarded $49,999 :(.&lt;/p&gt;
&lt;p&gt;The proposal is reproduced below. We promised …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The eResearch team at UTS in collaboration with colleagues at QCIF and AARNet
applied for and received funding under the Australian Research Data Commons &amp;quot;Institutional
role in a data commons&amp;quot; grant scheme.&lt;/p&gt;
&lt;p&gt;We applied for $50,000 but were only awarded $49,999 :(.&lt;/p&gt;
&lt;p&gt;The proposal is reproduced below. We promised to share what we're doing - so
there will be more coming about what this Oxford Common File Layout (&lt;a href="https://ocfl.io/"&gt;OCFL&lt;/a&gt;)
thing is, why it matters and what we're testing / demoing.&lt;/p&gt;
&lt;p&gt;In short we want to explore how to store, preserve and publish large volumes of
well-described research data using simple open technologies. We were already on
this path and involved in international collaboration, the ARDC grant will help
us get there quicker and test our ideas.&lt;/p&gt;
&lt;h1&gt;Title of the project: FAIR Simple Scalable Static Research Data Repository Demonstrator&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;Lead organisation/contractor:&lt;/em&gt;	University of Technology Sydney&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Project leader:&lt;/em&gt;	Dr Peter Sefton – eResearch Support Manager&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Project leader contact details:&lt;/em&gt;	Email: peter.sefton@uts.edu.au&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Phone:&lt;/em&gt; 0404 096932&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Other organisations involved:&lt;/em&gt;	AARNet – (Adam Bell) Queensland Cyber Infrastructure Foundation (QCIF) – (Andrew White)&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Amount of funding requested (up to a maximum 50K):&lt;/em&gt;	$50K&lt;/p&gt;
&lt;h1&gt;Proposal&lt;/h1&gt;
&lt;p&gt;Which area of fundable activity have you chosen:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Institutional data infrastructure, policies and procedures to support better research	YES&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Management of sensitive data	YES&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Integration of institutionally supported data infrastructure with national, discipline, and international infrastructure	YES&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Question(s) the project will address&lt;/h2&gt;
&lt;p&gt;In the interests of providing an improved procedure for resourcing, disposing and retaining data, can we demonstrate a FAIR research data repository architecture using static files laid out according to the Oxford Common File Layout  (&lt;a href="https://ocfl.io/"&gt;OCFL&lt;/a&gt;), an emerging international standards-effort that can operate sustainably at multiple scales from single-collection data sets to national collection? Can we also make this data Findable via a search portal using standards based metadata and make both open and sensitive data Accessible to the right parties, and improve tracking  of data outputs managed inside and outside the organisation? Can OCFL increase opportunities for interoperability? Can distributing this data using the &lt;a href="https://github.com/UTS-eResearch/datacrate/blob/master/spec/1.0/data_crate_specification_v1.0.md"&gt;DataCrate specification&lt;/a&gt;  increase the supply (and findability) of Interoperable and Reusable Research data objects with improved research data provenance? How might this architecture work with proprietary software such as Figshare and/or national services such as Cloudstor? How viable are these approaches for a range of disciplines? What other developments (e.g. standards for licensing sensitive data, procedures for applying access permissions at the file level for computing facilities, and national group management systems) would be needed to adapt this approach to storing and indexing sensitive data?&lt;/p&gt;
&lt;h2&gt;Proposed approach&lt;/h2&gt;
&lt;p&gt;We propose to build a demonstrator / proof of concept system which tests OCFL for use as a general-purpose data repository, for both open and sensitive data with a discovery portal. OCFL is chosen because it is based on established technology and can be used at computing facilities and on shared infrastructure without requiring server-based repository software or expensive and slow migration of large data collections via APIs.&lt;/p&gt;
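&lt;p&gt;To make the OCFL idea concrete, the sketch below writes a minimal one-version OCFL-style object: a directory containing a conformance declaration, an immutable version folder, and an inventory that maps content digests to logical file paths. This is an illustration only; the helper function and identifiers are invented for this post, and the field details follow the draft OCFL conventions rather than any production code.&lt;/p&gt;

```python
import hashlib
import json
import tempfile
from pathlib import Path

def write_ocfl_object(root, file_name, payload):
    """Write a minimal one-version OCFL-style object. Illustrative only:
    a real implementation would also write the inventory sidecar digest
    and validate the result against the OCFL specification."""
    root = Path(root)
    content_dir = root / "v1" / "content"
    content_dir.mkdir(parents=True, exist_ok=True)
    (content_dir / file_name).write_bytes(payload)
    digest = hashlib.sha512(payload).hexdigest()
    inventory = {
        "id": "urn:example:dataset-001",  # invented identifier
        "digestAlgorithm": "sha512",
        "head": "v1",
        # manifest: digest of each stored file, keyed to its content path
        "manifest": {digest: ["v1/content/" + file_name]},
        # each version records the logical state of the object
        "versions": {"v1": {"state": {digest: [file_name]}}},
    }
    (root / "inventory.json").write_text(json.dumps(inventory, indent=2))
    # NAMASTE-style conformance declaration marking this as an OCFL object
    (root / "0=ocfl_object_1.0").write_text("ocfl_object_1.0\n")
    return inventory

inv = write_ocfl_object(tempfile.mkdtemp(), "readings.csv", b"t,temp\n0,21.3\n")
print(inv["head"])  # prints: v1
```

&lt;p&gt;Because every version directory is immutable and the inventory is a plain JSON file, such an object can sit on any filesystem or object store without server-based repository software, which is the sustainability property the proposal relies on.&lt;/p&gt;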
&lt;p&gt;The demonstrator will be populated with (1) specific datasets from the UTS data repository, drawn from a wide variety of disciplines including microbiology, history, computer science &amp;amp; speleology, and (2) at varying scales, from single collections to an entire university research data repository, building on DataCrate for describing and packaging data. (3) We will test the scalability of our approach by automatically generating a large number of plausibly-linked simulated test datasets and contextual entities (people, organisations, equipment and software describing data provenance) with group-based access permissions, and demonstrate how a search portal can ensure Findability and appropriate Access for sensitive data, using an automated test suite to check the visibility of objects in the portal. We will also (4) demonstrate how individual data collections can be indexed in detail to produce collection-level discovery services, using two projects that were funded under the ANDS Major Open Data Collections program: &lt;a href="http://omeka.uws.edu.au/farmstofreeways/"&gt;Farms to Freeways&lt;/a&gt; and &lt;a href="https://dharmae.research.uts.edu.au/"&gt;Dharmae&lt;/a&gt; (UTS).&lt;/p&gt;
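&lt;p&gt;As a rough illustration of the packaging side, a DataCrate carries its metadata as a JSON-LD graph using schema.org terms, so a dataset, its files and its contextual entities (people, organisations, equipment) are all described as linked entities. The sketch below is not taken from the project; the names and identifiers are invented, and only a few of the fields the specification defines are shown.&lt;/p&gt;

```python
import json

# Minimal DataCrate-style JSON-LD catalogue: one Dataset, one File,
# and one contextual Person entity linked by identifier. Example
# values are invented; see the DataCrate spec for required fields.
catalog = {
    "@context": "https://schema.org/",
    "@graph": [
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example cave survey dataset",
            "creator": {"@id": "#example-researcher"},
            "hasPart": [{"@id": "data/survey.csv"}],
        },
        {
            "@id": "data/survey.csv",
            "@type": "File",
            "encodingFormat": "text/csv",
        },
        {
            "@id": "#example-researcher",
            "@type": "Person",
            "name": "Example Researcher",
        },
    ],
}
print(len(catalog["@graph"]))  # prints: 3
```

&lt;p&gt;Because the same graph can describe provenance entities such as equipment, software and actions, generating the simulated test datasets proposed above amounts to generating plausibly-linked entities of these types at scale.&lt;/p&gt;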
&lt;h2&gt;Who will be consulted/involved in the execution of the project and how will they be involved&lt;/h2&gt;
&lt;p&gt;This project will be led by Dr Peter Sefton at UTS, who will also write the final report. eResearch Analyst Michael Lynch at UTS and staff at QCIF will develop specifications and software, building on preliminary work on a research data portal (for findability) and extending it to ensure accessibility to the right users via a simple permissions system with static metadata that assigns group or individual access rights. The demonstrator repositories will be Interoperable with other software stacks using the same standard, particularly AARNet’s trial project “Adding Archival pathways to CloudStor” (investigating Archivematica as a preservation service) and a project being proposed in Program 1 by the University of Melbourne. The project will consult with UTS stakeholders via the UTS eResearch Community of Practice, which meets quarterly, and via our regular eResearch outreach activities. We will consult with other Australian institutions via the ARDC network (we will offer to run webinars and keep the community up to date via mailing lists) and via our membership in Intersect. We will also consult with the international OCFL community.&lt;/p&gt;
&lt;h2&gt;Outputs/materials that will be shared with all of Australia as a result of the project&lt;/h2&gt;
&lt;p&gt;The project will result in open source code (both standalone and as part of ReDBox), open access documents and specifications, and a report that answers the questions above in light of our findings. A project representative will deliver the report at the National Data Summit.&lt;/p&gt;
&lt;h2&gt;Evidence of ongoing commitment to outputs (if relevant)&lt;/h2&gt;
&lt;p&gt;UTS, along with QCIF, is one of the major contributors to the (originally) ANDS-funded ReDBox Research Data Management Platform, and has demonstrated commitment to the ARDC community of ReDBox user institutions by contributing substantially to ReDBox’s first major upgrade. UTS runs ReDBox to support our strategic commitment to Research Excellence and Research Integrity and to implement our Research Management Policy, and will consult with the UTS Research Integrity Officer, Louise Wheeler. UTS is committed to building an OCFL-based repository and discovery portal, but the work is not scheduled to be completed until 2020; this funding will allow us to fast-track development of a demonstrator that can be presented to the ARDC and the research community as a proof of concept for building a data commons at dataset, organisational, discipline and national scale. ReDBox is sustained by a community of universities that pay a maintenance fee to the Queensland Cyber Infrastructure Foundation. The AARNet pilot project in this space will lead to sustainable investment should the demonstrator be successful.&lt;/p&gt;
&lt;h2&gt;Other information you wish to provide&lt;/h2&gt;
&lt;p&gt;Costing over 4 months is $50K: 15 days of specification and design = $15K; 30 days of coding = $30K; 5 days of test-data development = $5K; report (in kind): 5 days = $5K.
UTS is collaborating with the University of Melbourne on a proposal under Program 1, looking at the PARADISEC collection, and with AARNet &lt;a href="https://conference.eresearch.edu.au/2018/08/adding-archival-pathways-to-cloudstor/"&gt;investigating Archivematica &amp;amp; Cloudstor interoperating with OCFL &amp;amp; DataCrate packages&lt;/a&gt;, which complements this proposal. Together, these projects aim to demonstrate the use of interoperable standards for building a data commons, testing a range of tools that operate over a standardised (OCFL) repository architecture and standardised (DataCrate) metadata.&lt;/p&gt;
</content><category term="Project"></category></entry><entry><title>Hacky Hour - 1st Quarter 2019</title><link href="/2019/03/26/hacky_hour_2019-qtr1.htm" rel="alternate"></link><published>2019-03-26T00:00:00+11:00</published><updated>2019-03-26T00:00:00+11:00</updated><author><name>Kuhan Ramachandra</name></author><id>tag:None,2019-03-26:/2019/03/26/hacky_hour_2019-qtr1.htm</id><summary type="html">&lt;p&gt;This Hacky Hour blog post covers the 14th February to the
21st March. These 6 sessions of Hacky Hour saw a good turnout, with both HDR
students and researchers attending. 
 
Repeat visitors were common in the first few weeks – this was due to the solid
assistance given by our eResearch …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This Hacky Hour blog post covers the 14th February to the
21st March. These 6 sessions of Hacky Hour saw a good turnout, with both HDR
students and researchers attending. 
 
Repeat visitors were common in the first few weeks – this was due to the solid
assistance given by our eResearch team, which encouraged attendees to return week
after week until they had solutions to their queries.&lt;/p&gt;
&lt;div style="float:left;"&gt;
Queries that were addressed included: 
&lt;ul&gt;
&lt;li&gt; HPC installation and use 
&lt;li&gt; Software expertise 
&lt;li&gt; GROMACS 
&lt;li&gt; SSH connection assistance 
&lt;li&gt; REDCap 
&lt;li&gt; Visualisation 
&lt;li&gt; STASH 
&lt;li&gt; Programming 
&lt;li&gt; Machine learning 
&lt;li&gt; Management of data on the supercomputer 
&lt;li&gt; Figures in R 
&lt;li&gt; Database assistance 
&lt;li&gt; GPU queries 
&lt;li&gt; Excel data processing 
&lt;li&gt; Genomic / statistical analysis 
&lt;li&gt; OMERO storage 
&lt;li&gt; General queries. 
&lt;/ul&gt;
&lt;/div&gt;
&lt;div&gt;
&amp;nbsp; &lt;img width="500" src="/blog/hacky_hour/attendance-qtr1.png" /&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;p&gt;If you’re a researcher or higher degree research (HDR) student, then we warmly
welcome you to join us at our Hacky Hour sessions every Thursday @ Penny Lane
Café in Building 11 from 3-4pm.  🙂&lt;/p&gt;
</content><category term="Hacky Hour"></category></entry><entry><title>History of Computational Computing</title><link href="/2019/03/12/trevor_pearcey.htm" rel="alternate"></link><published>2019-03-12T00:00:00+11:00</published><updated>2019-03-12T00:00:00+11:00</updated><author><name>Mike Lake</name></author><id>tag:None,2019-03-12:/2019/03/12/trevor_pearcey.htm</id><summary type="html">&lt;p&gt;Last week I attended the &lt;a href="http://pearceycentenary.org.au"&gt;Dr Trevor Pearcey Centenary Celebration&lt;/a&gt;.
This was an afternoon of talks at Sydney University to celebrate the achievements of Dr Trevor Pearcey,
a British-born Australian scientist, who created &lt;a href="https://en.wikipedia.org/wiki/CSIRAC"&gt;CSIRAC&lt;/a&gt; in 1949.
This was Australia's first digital computer and the 4th or 5th stored program computer …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Last week I attended the &lt;a href="http://pearceycentenary.org.au"&gt;Dr Trevor Pearcey Centenary Celebration&lt;/a&gt;.
This was an afternoon of talks at Sydney University to celebrate the achievements of Dr Trevor Pearcey,
a British-born Australian scientist, who created &lt;a href="https://en.wikipedia.org/wiki/CSIRAC"&gt;CSIRAC&lt;/a&gt; in 1949.
This was Australia's first digital computer and the 4th or 5th stored program computer in the world.
CSIRAC is the oldest surviving first-generation electronic computer in the world.
I saw CSIRAC a few years ago when I was in Melbourne.&lt;/p&gt;
&lt;p&gt;In comparison, the talk by Dr Sarah Pearce, Deputy Director of CSIRO Astronomy &amp;amp; Space Science,
covered the supercomputer at the &lt;a href="https://pawsey.org.au"&gt;Pawsey Centre&lt;/a&gt; that processes the data from the
&lt;a href="https://www.atnf.csiro.au/projects/askap/"&gt;Australian Square Kilometre Array Pathfinder Telescope&lt;/a&gt;.
ASKAP is already producing 5.2 terabytes of data per second, which is processed
on site and then streamed down to Galaxy, where it is stored at a rate of 2.5 gigabytes per second.&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;                    &lt;/th&gt;&lt;th&gt; CSIRAC              &lt;/th&gt;&lt;th&gt; Pawsey's "Galaxy"            &lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt; Number of cores:   &lt;/td&gt;&lt;td&gt; 1                   &lt;/td&gt;&lt;td&gt; 9440                         &lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt; Clock Speed:       &lt;/td&gt;&lt;td&gt; 1 kHz               &lt;/td&gt;&lt;td&gt; 3.00 GHz                     &lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt; Operations/second: &lt;/td&gt;&lt;td&gt; 1000                &lt;/td&gt;&lt;td&gt; 200 million million          &lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt; Memory:            &lt;/td&gt;&lt;td&gt; ~ 1K or 2K          &lt;/td&gt;&lt;td&gt; 31.55 Terabytes              &lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt; Local storage:     &lt;/td&gt;&lt;td&gt; 1024 20-bit words   &lt;/td&gt;&lt;td&gt; 1.3 Petabytes                &lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt; Logic elements:    &lt;/td&gt;&lt;td&gt; 2000 valves         &lt;/td&gt;&lt;td&gt; 7000 billion transistors     &lt;/td&gt;&lt;/tr&gt; 
&lt;tr&gt;&lt;td&gt; Input:             &lt;/td&gt;&lt;td&gt; punched paper tape  &lt;/td&gt;&lt;td&gt; high level computer language &lt;/td&gt;&lt;/tr&gt; 
&lt;/table&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;Prof. Hugh Durrant-Whyte, the New South Wales Chief Scientist &amp;amp; Engineer, also gave a talk on
&amp;quot;Why Science Matters&amp;quot;, and Barbara Ainsworth, Pearcey biographer &amp;amp; Curator of the
Monash Museum of Computing History, spoke about Pearcey's history and his innovations.&lt;/p&gt;
&lt;p&gt;Prof. Andrew Dzurak (Director of the Australian National Fabrication Facility &amp;amp; ARC
Centre of Excellence for Quantum Computation &amp;amp; Communication Technology, UNSW) and
Prof. David Reilly (Director of Microsoft Quantum, Sydney &amp;amp; Chief Investigator in
the ARC Centre of Excellence for Engineered Quantum Systems, School of Physics,
University of Sydney) both explained to us why quantum computers will be important and the current
state of development of this technology.
Dr Baerbel Koribalski (OCE Science Leader, CSIRO) gave us a great talk on how supercomputers have
enabled astronomers to model galaxy formation across the lifetime and physical extent of our universe.&lt;/p&gt;
&lt;p&gt;Prof. Ben Eggleton, Director of the University of Sydney Nano Institute, School
of Physics, University of Sydney also gave some participants a tour of the Nano Labs.&lt;/p&gt;
&lt;p&gt;References: &lt;a href="https://collections.museumvictoria.com.au/articles/1337"&gt;CSIRAC, The First Computer in Australia, 1949-1964&lt;/a&gt;;  
Pawsey Supercomputing Centre's &lt;a href="https://pawsey.org.au/systems/galaxy/"&gt;Galaxy&lt;/a&gt;;  
&lt;a href="https://www.atnf.csiro.au/projects/askap/"&gt;Australian Square Kilometre Array Pathfinder Telescope&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Mike Lake&lt;br /&gt;
eResearch&lt;/p&gt;
</content><category term="Event"></category></entry><entry><title>PBSWeb Released as Open Source</title><link href="/2019/01/23/pbsweb_release.htm" rel="alternate"></link><published>2019-01-23T00:00:00+11:00</published><updated>2019-01-23T00:00:00+11:00</updated><author><name>Mike Lake</name></author><id>tag:None,2019-01-23:/2019/01/23/pbsweb_release.htm</id><summary type="html">&lt;p&gt;I have released as open source the small web application that we use to show the nodes, queues and jobs running on our HPC cluster. This is used by the cluster administrators to quickly see the status of the nodes and how busy they are, to check how full each …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I have released as open source the small web application that we use to show the nodes, queues and jobs running on our HPC cluster. This is used by the cluster administrators to quickly see the status of the nodes and how busy they are, to check how full each queue is and to see what jobs are running and queued. It’s also useful for users to see this information as well.&lt;/p&gt;
&lt;p&gt;You can see this web app in use here: &lt;a href="https://hpc.research.uts.edu.au/status/"&gt;https://hpc.research.uts.edu.au/status/&lt;/a&gt;&lt;br /&gt;
It's been released on GitHub here: &lt;a href="https://github.com/UTS-eResearch/pbsweb"&gt;https://github.com/UTS-eResearch/pbsweb&lt;/a&gt;&lt;br /&gt;
And it's been announced as available on the
&lt;a href="http://community.pbspro.org/t/released-a-small-web-app-to-show-cluster-status/1416"&gt;PBS Pro open source forum&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Mike Lake&lt;br /&gt;
eResearch&lt;/p&gt;
</content><category term="Blog"></category></entry><entry><title>Upcoming Training Courses in 2019 Have been Updated</title><link href="/2019/01/22/training_update_2019-01-22.htm" rel="alternate"></link><published>2019-01-22T00:00:00+11:00</published><updated>2019-01-22T00:00:00+11:00</updated><author><name>Weisi Chen</name></author><id>tag:None,2019-01-22:/2019/01/22/training_update_2019-01-22.htm</id><summary type="html">&lt;h2&gt;Upcoming Training Courses in 2019 Have been Updated&lt;/h2&gt;
&lt;p&gt;UTS eResearch and Intersect offer a wide range of specialised courses for researchers, from beginner through to advanced levels in High-Performance Computing (HPC), Programming with R/Python/Matlab, Excel, data management, data cleaning and visualisation, databases and SQL, and more. Delivered by …&lt;/p&gt;</summary><content type="html">&lt;h2&gt;Upcoming Training Courses in 2019 Have been Updated&lt;/h2&gt;
&lt;p&gt;UTS eResearch and Intersect offer a wide range of specialised courses for researchers, from beginner through to advanced levels in High-Performance Computing (HPC), Programming with R/Python/Matlab, Excel, data management, data cleaning and visualisation, databases and SQL, and more. Delivered by Intersect's team of experts, training courses provide practical and research-relevant hands-on exercises.&lt;/p&gt;
&lt;p&gt;Upcoming training courses are updated regularly; the latest update was made on 22 Jan 2019. Courses from March onwards are not yet open for registration, so please keep an eye on our &lt;a href="https://eresearch.uts.edu.au/training/"&gt;training page&lt;/a&gt;.&lt;/p&gt;
</content><category term="Event"></category></entry><entry><title>Provisioner - A Framework for Integrated Research Data Management</title><link href="/2018/12/03/eres2018-provisioner.htm" rel="alternate"></link><published>2018-12-03T00:00:00+11:00</published><updated>2018-12-03T00:00:00+11:00</updated><author><name>Mike Lynch</name></author><id>tag:None,2018-12-03:/2018/12/03/eres2018-provisioner.htm</id><summary type="html">&lt;p&gt;This is a presentation by Mike Lynch, Peter Sefton and Sharyn Wise, delivered at eResearch Australasia 2018 by Mike Lynch.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide01.png' alt='A Framework for Integrated Research Data Management
With services for planning, provisioning research storage and applications and describing and packaging research data
&lt;p&gt;Mr Michael Lynch Dr Peter Sefton Ms Sharyn Wise
' title='A Framework for Integrated Research Data Management
With services for planning, provisioning research storage and applications and describing and packaging research data&lt;/p&gt;
&lt;p&gt;Mr Michael Lynch Dr Peter Sefton Ms Sharyn Wise
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;/summary&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide02.png' alt='Provisioner
Integrate research data management into research applications
Allow researchers to self-provision research apps
Apply lessons learned from earlier generations (Data Capture)
Give researchers something, get out of the way
We didn’t want to build a monolith
Small parts, loosely joined, data-centric
&lt;p&gt;A Framework for Integrated Research Data Management
' title='Provisioner
Integrate research data management into research applications
Allow researchers to self-provision research apps
Apply lessons learned from earlier generations (Data Capture)
Give researchers something, get out of the way
We didn’t want to build a monolith
Small parts, loosely joined, data-centric&lt;/p&gt;
&lt;p&gt;A Framework for Integrated Research Data Management
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 2&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Provisioner grew out of two conflicting requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We want to be able integrate research data management into the tools researchers actually use to do their research, giving …&lt;/li&gt;&lt;/ul&gt;&lt;/details&gt;&lt;/section&gt;</summary><content type="html">&lt;p&gt;This is a presentation by Mike Lynch, Peter Sefton and Sharyn Wise, delivered at eResearch Australasia 2018 by Mike Lynch.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide01.png' alt='A Framework for Integrated Research Data Management
With services for planning, provisioning research storage and applications and describing and packaging research data
&lt;p&gt;Mr Michael Lynch Dr Peter Sefton Ms Sharyn Wise
' title='A Framework for Integrated Research Data Management
With services for planning, provisioning research storage and applications and describing and packaging research data&lt;/p&gt;
&lt;p&gt;Mr Michael Lynch Dr Peter Sefton Ms Sharyn Wise
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;/summary&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide02.png' alt='Provisioner
Integrate research data management into research applications
Allow researchers to self-provision research apps
Apply lessons learned from earlier generations (Data Capture)
Give researchers something, get out of the way
We didn’t want to build a monolith
Small parts, loosely joined, data-centric
&lt;p&gt;A Framework for Integrated Research Data Management
' title='Provisioner
Integrate research data management into research applications
Allow researchers to self-provision research apps
Apply lessons learned from earlier generations (Data Capture)
Give researchers something, get out of the way
We didn’t want to build a monolith
Small parts, loosely joined, data-centric&lt;/p&gt;
&lt;p&gt;A Framework for Integrated Research Data Management
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 2&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Provisioner grew out of two conflicting requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We want to be able to integrate research data management into the tools researchers actually use to do their research, giving the researchers something besides data management – access to facilities and software, easier publication, etc.&lt;/li&gt;
&lt;li&gt;We didn’t want to build a monolith, and wouldn't be allowed to if we did&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide03.png' alt='Case study: microscope to Data Arena
The research pipeline that started it all:
Microscope video of bacteria Pseudomonas aeruginosa
Image recognition and tracking of bacteria
Simulation of bacteria behavior
3D immersive visualisation of simulated bacteria
Practical research: reduce risk of hospital infection
What if we could automate provenance for this pipeline?
A Framework for Integrated Research Data Management
' title='Case study: microscope to Data Arena
The research pipeline that started it all:
Microscope video of bacteria Pseudomonas aeruginosa
Image recognition and tracking of bacteria
Simulation of bacteria behavior
3D immersive visualisation of simulated bacteria
Practical research: reduce risk of hospital infection
What if we could automate provenance for this pipeline?
A Framework for Integrated Research Data Management
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 3&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;A real case of complex data management.&lt;/p&gt;
&lt;p&gt;Our original idea was to put a data repository in for Data Arena customers, from which the DA team would download datasets.&lt;/p&gt;
&lt;p&gt;But the DA team just wanted NFS mounts, and repositories aren’t a good fit for large datasets - high-end visualization and data science both use filesystems.&lt;/p&gt;
&lt;p&gt;What they could use: a git repository to manage their code and pipelines (GitLab)&lt;/p&gt;
&lt;p&gt;On the other end, the MIF have long wanted to use OMERO for microscopy.&lt;/p&gt;
&lt;p&gt;Our original plan was for a point-to-point integration which would only service two flagship installations.&lt;/p&gt;
&lt;p&gt;The key insight out of which Provisioner grew was: what if we could connect every research app to every other one? (eventually)&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide04.png' alt='Apps and workspaces
App  = a research application – can be specific or general
OMERO – dedicated microscopy app
GitLab – for research software development
Coming next: file shares, ELNs
Workspace = a research team’s thing within an app
A workspace is linked to an RDMP and an owner
Once it’s created, the researcher goes directly to the app
&lt;p&gt;A Framework for Integrated Research Data Management
' title='Apps and workspaces
App  = a research application – can be specific or general
OMERO – dedicated microscopy app
GitLab – for research software development
Coming next: file shares, ELNs
Workspace = a research team’s thing within an app
A workspace is linked to an RDMP and an owner
Once it’s created, the researcher goes directly to the app&lt;/p&gt;
&lt;p&gt;A Framework for Integrated Research Data Management
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 4&lt;/h3&gt;
&lt;/summary&gt;
Note that the term "workspace" is already being used by another UTS project – we were gazumped – but we still think it’s the best terminology for the abstraction we're trying to describe.
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide05.png' alt='Workspace API
The workspace abstraction lets us capture common high-level operations:
&lt;p&gt;Create a new workspace for an RDMP
Share a workspace with colleagues
Export data (to a data record or another workspace)
Import data&lt;/p&gt;
&lt;p&gt;We are building this out incrementally: create and export are first&lt;/p&gt;
&lt;p&gt;A Framework for Integrated Research Data Management
' title='Workspace API
The workspace abstraction lets us capture common high-level operations:&lt;/p&gt;
&lt;p&gt;Create a new workspace for an RDMP
Share a workspace with colleagues
Export data (to a data record or another workspace)
Import data&lt;/p&gt;
&lt;p&gt;We are building this out incrementally: create and export are first&lt;/p&gt;
&lt;p&gt;A Framework for Integrated Research Data Management
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;/summary&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide06.png' alt='DataCrate – integrated metadata
Builds on widely-supported standards – BagIt and JSON-LD
Provide linked metadata in human- and machine-readable forms
Metadata is still useful outside Provisioner
Targeting schema.org as shared vocabulary
Capture technical metadata with instrument
Capture provenance metadata with createAction and updateAction
Data by reference with tools to fetch files as needed
A Framework for Integrated Research Data Management
' title='DataCrate – integrated metadata
Builds on widely-supported standards – BagIt and JSON-LD
Provide linked metadata in human- and machine-readable forms
Metadata is still useful outside Provisioner
Targeting schema.org as shared vocabulary
Capture technical metadata with instrument
Capture provenance metadata with createAction and updateAction
Data by reference with tools to fetch files as needed
A Framework for Integrated Research Data Management
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;/summary&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide07.png' alt='ReDBox 2.0
RDMP and dataset description tool
Improved metadata collection and integration
Service catalogue for provisioning workspaces in apps
Data publication workflow
Modern re-implementation (Node.js, Angular, Mongo)
More maintainable
No more curation – web rather than db principles
&lt;p&gt;A Framework for Integrated Research Data Management
' title='ReDBox 2.0
RDMP and dataset description tool
Improved metadata collection and integration
Service catalogue for provisioning workspaces in apps
Data publication workflow
Modern re-implementation (Node.js, Angular, Mongo)
More maintainable
No more curation – web rather than db principles&lt;/p&gt;
&lt;p&gt;A Framework for Integrated Research Data Management
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 7&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Metadata integration takes a &amp;quot;just-in-time&amp;quot; approach, where it's collected only when it's needed.&lt;/p&gt;
&lt;p&gt;Each metadata record is derived from the previous one:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Research data management plan&lt;/li&gt;
&lt;li&gt;Data record&lt;/li&gt;
&lt;li&gt;Data publication&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide08.png' alt='Agile development!
A Framework for Integrated Research Data Management
Shared API / Orchestrator
ReDBox
ReDBox
Original design
End product
' title='Agile development!
A Framework for Integrated Research Data Management
Shared API / Orchestrator
ReDBox
ReDBox
Original design
End product
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 8&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;On the left is the original design, which had a separate Provisioner component (in blue) which orchestrated api requests and talked to the research apps&lt;/p&gt;
&lt;p&gt;On the right is the product we built, with no separate orchestrator, and modules within ReDBox which drive the apps&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What happened to that orchestration layer?&lt;/li&gt;
&lt;li&gt;Needed to break the abstraction to authenticate to GitLab&lt;/li&gt;
&lt;li&gt;This is what happens when concepts for APIs hit real life&lt;/li&gt;
&lt;li&gt;The minimum viable product didn’t end up including it&lt;/li&gt;
&lt;li&gt;This is a better design: simpler, and we can implement orchestration/queueing if and when it’s needed&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide09.png' alt='Public data repository
A Framework for Integrated Research Data Management
ReDBox 2.0
Solr index
Angular
Staging server
Public server
' title='Public data repository
A Framework for Integrated Research Data Management
ReDBox 2.0
Solr index
Angular
Staging server
Public server
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 9&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;At present, we have public datasets in our ReDBox 1.9 instance being fed to RDA.&lt;/p&gt;
&lt;p&gt;Data publication is manual: once the metadata is ready we (actually, I) put it on an nginx web server&lt;/p&gt;
&lt;p&gt;The new model isn’t that different from this: the repository is a filesystem with static DataCrates. The HTML catalogs in these act as landing pages.&lt;/p&gt;
&lt;p&gt;We build a solr index from the machine-readable metadata in the DataCrates, and the user interface to this is a single-page Angular app.&lt;/p&gt;
&lt;p&gt;Much better for security: we can host the website on a public-facing VM with just nginx and solr and keep ReDBox inside the firewall&lt;/p&gt;
&lt;p&gt;Immutable datasets – we’re in discussion with a research group who want to publish their live data – large networking datasets which they’re in the process of refining.&lt;/p&gt;
&lt;p&gt;We will try to let them do this transparently while still preserving integrity, so that a given DOI points to a determinate snapshot of their data.&lt;/p&gt;
&lt;p&gt;Oxford Common File Layout for the static DataCrate repository&lt;/p&gt;
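&lt;p&gt;The indexing step described above can be sketched in a few lines: walk the static repository, read the machine-readable catalogue in each DataCrate, and emit one search document per dataset. The field names and the minimal CATALOG.json layout here are simplified placeholders, not the production Solr schema.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path

def build_index(repo_root):
    """Collect one search document per DataCrate-style CATALOG.json
    found under repo_root. Simplified sketch: the field names are
    invented, and a real portal would map them onto its Solr schema."""
    docs = []
    for catalog_path in sorted(Path(repo_root).rglob("CATALOG.json")):
        graph = json.loads(catalog_path.read_text()).get("@graph", [])
        for entity in graph:
            if entity.get("@type") == "Dataset":
                docs.append({
                    "id": str(catalog_path.parent),
                    "name": entity.get("name", ""),
                })
    return docs

# demo: index a tiny one-crate repository
root = Path(tempfile.mkdtemp())
crate = root / "crate1"
crate.mkdir()
(crate / "CATALOG.json").write_text(json.dumps(
    {"@graph": [{"@id": "./", "@type": "Dataset", "name": "Demo"}]}))
print(build_index(root)[0]["name"])  # prints: Demo
```

&lt;p&gt;Deriving the index entirely from the static files is what lets the public portal run with just the web server and the index, while ReDBox stays behind the firewall.&lt;/p&gt;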
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide10.png' alt='The original case study
How much have we got?
Export from OMERO project to a DataCrate, with linked data recording technical metadata
Link the DataCrate to a ReDBox data record
Create GitLab workspace with the DataCrate, data available by reference / as needed
Export GitLab workspace to a new data record
Publish data
A Framework for Integrated Research Data Management
' title='The original case study
How much have we got?
Export from OMERO project to a DataCrate, with linked data recording technical metadata
Link the DataCrate to a ReDBox data record
Create GitLab workspace with the DataCrate, data available by reference / as needed
Export GitLab workspace to a new data record
Publish data
A Framework for Integrated Research Data Management
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;/summary&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide11.png' alt='What’s next?
A Framework for Integrated Research Data Management
' title='What’s next?
A Framework for Integrated Research Data Management
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 11&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;This is a diagram drawn by Gerrad Barthelot, head of our architecture team.&lt;/p&gt;
&lt;p&gt;It shows where we’d like to be in a few years – with an expanded range of services and research apps with Provisioner adaptors to Stash 3 (ReDBox 2).&lt;/p&gt;
&lt;p&gt;Note that this diagram shows the original Provisioner design with an orchestration layer between ReDBox and each research app.&lt;/p&gt;
&lt;p&gt;For now, we’re continuing with the simple model of ReDBox hooks acting as adapters directly talking to apps, and we’ll use middleware between these and the apps if and when we need it, on an app-by-app basis.&lt;/p&gt;
&lt;p&gt;Not all of the connections will be fully automated – some may put a request into ServiceConnect, or even just send an email.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/eresearch2018-provisioner/Slide12.png' alt='Thank you
' title='Thank you
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;/summary&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
</content><category term="Repositories"></category></entry><entry><title>Launching DataCrate v1.0 a general purpose data packaging format for research data distribution and web-display</title><link href="/2018/11/01/launch_datacrate_1.htm" rel="alternate"></link><published>2018-11-01T00:00:00+11:00</published><updated>2018-11-01T00:00:00+11:00</updated><author><name>Peter Sefton</name></author><id>tag:None,2018-11-01:/2018/11/01/launch_datacrate_1.htm</id><summary type="html">&lt;p&gt;This is a presentation by Peter Sefton, Michael Lynch, Liz Stokes and Gerard Devine, delivered at eResearch Australasia 2018 by Peter Sefton.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide01.png' alt='Launching DataCrate v1.0: a general purpose data packaging format for research data distribution and web-display
' title='Launching DataCrate v1.0: a general purpose data packaging format for research data distribution and web-display
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 1&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;In this presentation we will launch version 1.0 of the DataCrate standard. The presentation will cover:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The motivation for this work, and prior art …&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/details&gt;&lt;/section&gt;</summary><content type="html">&lt;p&gt;This is a presentation by Peter Sefton, Michael Lynch, Liz Stokes and Gerard Devine, delivered at eResearch Australasia 2018 by Peter Sefton.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide01.png' alt='Launching DataCrate v1.0: a general purpose data packaging format for research data distribution and web-display
' title='Launching DataCrate v1.0: a general purpose data packaging format for research data distribution and web-display
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 1&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;In this presentation we will launch version 1.0 of the DataCrate standard. The presentation will cover:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The motivation for this work, and prior art - why we needed to bring together the standards we did in the way that we did.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A walk-through of example data crates from a variety of sources: speleology, clinical trials, simulation, social history, environmental science and microbiology.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An introduction to tools for making data crates with an appeal to attendees to join us in making more tools, for more new kinds of data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A demonstration of how DataCrates are being used at UTS to move data through the research lifecycle - archiving and publishing data.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide02.png' alt='
peter.sefton@uts.edu.au
michael.lynch@uts.edu.au
elizabeth.stokes@uts.edu.au
g.devine@westernsydney.edu.au
' title='
peter.sefton@uts.edu.au
michael.lynch@uts.edu.au
elizabeth.stokes@uts.edu.au
g.devine@westernsydney.edu.au
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 2&lt;/h3&gt;
&lt;p&gt;The following people contributed to this presentation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;peter.sefton@uts.edu.au&lt;/li&gt;
&lt;li&gt;michael.lynch@uts.edu.au&lt;/li&gt;
&lt;li&gt;elizabeth.stokes@uts.edu.au&lt;/li&gt;
&lt;li&gt;g.devine@westernsydney.edu.au&lt;/li&gt;
&lt;/ul&gt;
&lt;/summary&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide03.png' alt='Motivation
💻+ 💾 + 📦 = 🙅
' title='Motivation
💻+ 💾 + 📦 = 🙅
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 3&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;There were no existing generic data packaging standards with both human- and machine-readable metadata.&lt;/p&gt;
&lt;p&gt;🙅 === FACE WITH NO GOOD GESTURE&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide04.png' alt='Motivation: package data with maximum useful context
Who …  made it? funded the work? 
What … format are these files? … is the research about?
Where … was it collected? … is it about? 
Why … was it done?  … &amp;lt;link to publication&amp;gt;
How … were these files created? … can I repeat that process? 
' title='Motivation: package data with maximum useful context
Who …  made it? funded the work? 
What … format are these files? … is the research about?
Where … was it collected? … is it about? 
Why … was it done?  … &amp;lt;link to publication&amp;gt;
How … were these files created? … can I repeat that process? 
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 4&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Our motivation was to be able to display and distribute data sets with useful
&amp;quot;who, what, where&amp;quot; metadata in a way that is easy for coders to target, and
easy for researchers to consume, both as readers and as programmers who might want
to run code against a data set.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide05.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 5&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;We have a growing list of examples.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide06.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 6&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;DataCrate provides human-readable &lt;a href="https://data.research.uts.edu.au/examples/v1.0/sample/CATALOG_files/pairtree_root/pi/cs/=2/01/7-/06/-1/1%5E/20/12/,5/6,/14/,j/pg/index.html"&gt;HTML data about files&lt;/a&gt; including detailed metadata.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide07.png' alt='Ability to describe file provenance
' title='Ability to describe file provenance
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 7&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;This slide shows a CreateAction, where an &lt;code&gt;instrument&lt;/code&gt; - a &lt;a href="https://data.research.uts.edu.au/examples/v1.0/Victoria_Arch_pub/CATALOG_files/pairtree_root/Da/ta/Ca/pt/ur/e_/wc/r0/3/index.html"&gt;Lidar scanner&lt;/a&gt; - was used by an &lt;code&gt;agent&lt;/code&gt; - the person - to create two files.&lt;/p&gt;
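&lt;p&gt;Schematically, a CreateAction ties those pieces together as linked-data properties. Here is a minimal sketch in Python – the identifiers and file names are hypothetical, not the metadata from the linked example:&lt;/p&gt;

```python
# A minimal sketch of a DataCrate-style CreateAction entity in JSON-LD.
# All @id values and file names here are hypothetical.
create_action = {
    "@id": "#scan-action-1",
    "@type": "CreateAction",
    "agent": {"@id": "#person-1"},            # the person who did the work
    "instrument": {"@id": "#lidar-scanner"},  # the device (or software) used
    "result": [                               # the files that were produced
        {"@id": "scan_part1.las"},
        {"@id": "scan_part2.las"},
    ],
}

# The files attributed to this action:
produced = [f["@id"] for f in create_action["result"]]
print(produced)
```

&lt;p&gt;The same properties – &lt;code&gt;agent&lt;/code&gt;, &lt;code&gt;instrument&lt;/code&gt; and &lt;code&gt;result&lt;/code&gt; – carry the provenance whether the instrument is a physical device or a software package.&lt;/p&gt;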
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide08.png' alt='Software can be an instrument too
' title='Software can be an instrument too
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 8&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;&lt;a href="https://data.research.uts.edu.au/examples/v1.0/sample/CATALOG_files/pairtree_root/Se/pi/aC/on/ve/rs/io/n/index.html"&gt;This&lt;/a&gt; shows a software package (&lt;code&gt;instrument&lt;/code&gt;) acting on a file (&lt;code&gt;object&lt;/code&gt;)  used to create another file (&lt;code&gt;result&lt;/code&gt;) - a sepia version of a picture.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide09.png' alt='All metadata is available in JSON-LD
' title='All metadata is available in JSON-LD
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 9&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;DataCrates &lt;a href="https://data.research.uts.edu.au/examples/v1.0/sample/CATALOG.json"&gt;contain metadata in JSON-LD&lt;/a&gt;.&lt;/p&gt;
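&lt;p&gt;The shape of that file is a JSON-LD object: an &lt;code&gt;@context&lt;/code&gt; plus a flattened graph of entities. A rough sketch of reading one in Python follows – the entities shown are illustrative, not taken from the linked example:&lt;/p&gt;

```python
import json

# A hypothetical, minimal CATALOG.json: an @context plus a flat @graph.
catalog_text = """
{
  "@context": {"@vocab": "http://schema.org/"},
  "@graph": [
    {"@id": "./", "@type": "Dataset", "name": "Sample dataset"},
    {"@id": "pics/photo.jpg", "@type": "ImageObject", "contentSize": "3.5 MB"}
  ]
}
"""

catalog = json.loads(catalog_text)
# Index entities by @id - the usual first step when working with a flat graph.
by_id = {entity["@id"]: entity for entity in catalog["@graph"]}
print(by_id["./"]["name"])
```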
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide10.png' alt='... so relationships can be visualized
' title='... so relationships can be visualized
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 10&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Why do we want machine readable data? One reason would be to generate visualisations that help people understand relationships in the data set. Here’s a demo I coded up in about half an hour before the conference that shows how we might visualise the way files are created. It shows a Person (me) who is the agent in two CreateActions: one where the &lt;code&gt;instrument&lt;/code&gt; is a camera/lens combination, the &lt;code&gt;object&lt;/code&gt; is the place being pictured, and the result is a file; and one where the &lt;code&gt;object&lt;/code&gt; is said file, the &lt;code&gt;instrument&lt;/code&gt; is a software package, and the result is a sepia version of the original photo.&lt;/p&gt;
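&lt;p&gt;The edges for a visualisation like this can be derived straight from the JSON-LD: each CreateAction contributes an edge from its &lt;code&gt;agent&lt;/code&gt;, &lt;code&gt;instrument&lt;/code&gt; and &lt;code&gt;object&lt;/code&gt; to its &lt;code&gt;result&lt;/code&gt;. A rough sketch, with hypothetical entity ids standing in for the real metadata:&lt;/p&gt;

```python
# Sketch: turn CreateAction entities from a flattened JSON-LD graph into
# (source, relation, target) edges for a visualisation. Ids are hypothetical.
graph = [
    {"@id": "#photo-action", "@type": "CreateAction",
     "agent": {"@id": "#pt"}, "instrument": {"@id": "#camera"},
     "result": {"@id": "photo.jpg"}},
    {"@id": "#sepia-action", "@type": "CreateAction",
     "agent": {"@id": "#pt"}, "instrument": {"@id": "#image-software"},
     "object": {"@id": "photo.jpg"}, "result": {"@id": "photo-sepia.jpg"}},
]

def edges(entities):
    """Collect (source, relation, target) triples from CreateActions."""
    out = []
    for entity in entities:
        if entity.get("@type") != "CreateAction":
            continue
        target = entity["result"]["@id"]
        for relation in ("agent", "instrument", "object"):
            if relation in entity:
                out.append((entity[relation]["@id"], relation, target))
    return out

for source, relation, target in edges(graph):
    print(f"{source} --{relation}--> {target}")
```

&lt;p&gt;Feeding triples like these to any graph-drawing library gives the picture on the slide.&lt;/p&gt;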
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide11.png' alt='URIs as names for things
' title='URIs as names for things
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 11&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Each term used has a link to its definition, eg: &lt;a href="https://schema.org/CreateAction"&gt;https://schema.org/CreateAction&lt;/a&gt;&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide12.png' alt='(🔧🔨🔩🔪🔬)ing is an issue for JSON-LD
' title='(🔧🔨🔩🔪🔬)ing is an issue for JSON-LD
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 12&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Tooling is a problem. JSON-LD is a great format, but there are no utility libraries for things like looking up context keys.&lt;/p&gt;
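&lt;p&gt;For example, expanding a shorthand term to its full IRI through the &lt;code&gt;@context&lt;/code&gt; is the kind of helper you end up writing yourself. A rough sketch of such a lookup, with an illustrative context rather than a real DataCrate one:&lt;/p&gt;

```python
# Sketch of the kind of context-lookup helper that off-the-shelf JSON-LD
# tooling doesn't give you: expand a term to its full IRI.
# The context below is illustrative, not a real DataCrate context.
context = {
    "schema": "http://schema.org/",
    "name": "schema:name",
    "CreateAction": "schema:CreateAction",
}

def expand(term, ctx):
    """Resolve a term through the context, expanding prefix:suffix forms."""
    value = ctx.get(term, term)
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in ctx:
        return ctx[prefix] + suffix
    return value

print(expand("CreateAction", context))
```

&lt;p&gt;A full JSON-LD processor handles many more cases than this – the sketch only illustrates why a utility library would help.&lt;/p&gt;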
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide13.png' alt='Calcytejs
' title='Calcytejs
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 13&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;&lt;a href="https://code.research.uts.edu.au/eresearch/CalcyteJS"&gt;Calcyte&lt;/a&gt; uses multi-worksheet spreadsheets for data entry, based on an idea of Mike Lake’s.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide14.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 14&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;This works, but it’s not an ideal user interface.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide15.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 15&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Gerard Devine is &lt;a href="https://github.com/gdevine/hiev_datacrate"&gt;developing a tool&lt;/a&gt; which will allow DataCrate export from the Australian National Data Service funded HIEv system.&lt;/p&gt;
&lt;p&gt;HIEv DataCrate - At the Hawkesbury Institute for the Environment at Western Sydney University, a bespoke data capture application (HIEv) harvests a wide range of environmental data (and associated file-level metadata) from both automated sensor networks and analysed datasets generated by researchers. Leveraging built-in APIs within the HIEv, a new packaging function has been developed, allowing for selected datasets to be identified and packaged in the DataCrate standard, complete with metadata automatically exported from the HIEv metadata holdings into the JSON-LD format. Going forward, this will allow datasets within HIEv to be published regularly and in an automated fashion, in a format that will increase their potential for reuse.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide16.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 16&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Christian Evenhuis is developing a tool for &lt;a href="https://code.research.uts.edu.au/MIF/Workflows/omero-datacrate"&gt;exporting microscope images&lt;/a&gt; from &lt;a href="https://www.openmicroscopy.org/omero/"&gt;Omero&lt;/a&gt;.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide17.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 17&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Chris is working to describe the equipment used in the Microbial Imaging Facility (MIF). Here’s a &lt;a href="https://code.research.uts.edu.au/MIF/microscope-instructions/wikis/Nikon-Ti/Nikon-Ti-inverted-epifluorescent-microscope"&gt;page for a microscope&lt;/a&gt;; this is part of work in progress to describe as much of the context of research in MIF as possible.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide18.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 18&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Peter Sefton has developed &lt;a href="https://github.com/UTS-eResearch/omeka-datacrate-tools"&gt;code&lt;/a&gt; to export &lt;a href="https://omeka.org/classic/"&gt;Omeka Classic&lt;/a&gt; repositories to DataCrate. &lt;a href="https://data.research.uts.edu.au/examples/v1.0/farms_to_freeways/"&gt;This&lt;/a&gt; is an example of one from the University of Western Sydney curated by Katrina Trewin. This uses the &lt;a href="https://pcdm.org"&gt;Portland Common Data Model&lt;/a&gt; for modelling repository structure. We are using these data sets to help develop an Omeka service based on the &lt;a href="https://omeka.org/s/"&gt;Omeka S&lt;/a&gt; software, along with data from Dspace extracted using another nascent &lt;a href="https://github.com/UTS-eResearch/datacrate-dspace-tools"&gt;code project&lt;/a&gt;.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide19.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 19&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Provisioner grew out of two basic requirements, which seem to conflict with one another:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We want to be able to integrate research data management into the tools researchers actually use to do their research, rather than as an add-on to an existing process (like DC)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Any such system should give the researchers something besides data management – access to facilities and software, easier publication, etc&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We don’t want to build a monolith, and even if we wanted to build a monolith, we wouldn’t be allowed to – the current mood is SaaS, on-premises only if necessary, no single points of failure&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The UNIX philosophy of small parts, loosely joined, and the idea that data has gravity&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide20.png' alt='It’s standards all the way down
Oxford Common File Layout ← Static file-based repositories
THIS TALK → DataCrate ← THIS TALK
Data Crate builds on Bagit ←  Data packages w/ checksums, content by ref
Schema.org ← Main metadata standard / Repo metadata standard → PCDM
JSON-LD ← Linked data in programmer-friendly format
' title='It’s standards all the way down
Oxford Common File Layout ← Static file-based repositories
THIS TALK → DataCrate ← THIS TALK
Data Crate builds on Bagit ←  Data packages w/ checksums, content by ref
Schema.org ← Main metadata standard / Repo metadata standard → PCDM
JSON-LD ← Linked data in programmer-friendly format
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 20&lt;/h3&gt;
&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ocfl.io/"&gt;Static Oxford Common File Layout&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/UTS-eResearch/datacrate/blob/master/spec/1.0/data_crate_specification_v1.0.md"&gt;DataCrate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/BagIt"&gt;Bagit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://schema.org"&gt;Schema.org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pcdm.org"&gt;Portland Common Data Model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://json-ld.org/"&gt;JSON-LD&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide21.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 21&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;The next step is to take this to an international meeting to see if we can get some agreement between projects using similar approaches.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide22.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 22&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;&lt;a href="https://github.com/ropenscilabs/dataspice/blob/master/README.md"&gt;Dataspice&lt;/a&gt; does a similar thing to DataCrate - they could easily be aligned.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide23.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 23&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;&lt;a href="https://researchobject.github.io/specifications/bundle/"&gt;Research Object Bundle&lt;/a&gt; also tries to package data with JSON-LD data, but in a way that is (we think) more complicated to implement, and without the human-readable web-site embedded in the package.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide24.png' alt='Help wanted!
We invite you to:
Critique the standard
Generate some more sample data sets as a spec for people who will ...
... write a packaging tool
Export from data management system (eg MyTardis :)
Write a GUI or web tool for people to create DataCrates
Help add viz to our HTML pages
' title='Help wanted!
We invite you to:
Critique the standard
Generate some more sample data sets as a spec for people who will ...
... write a packaging tool
Export from data management system (eg MyTardis :)
Write a GUI or web tool for people to create DataCrates
Help add viz to our HTML pages
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 24&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Please help.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/launch_datacrate/Slide25.png' alt='' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 25&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Please contribute to or use the &lt;a href="https://github.com/UTS-eResearch/datacrate/blob/master/spec/1.0/data_crate_specification_v1.0.md"&gt;spec&lt;/a&gt;.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
</content><category term="DataCrate"></category></entry><entry><title>LinuxConf 2018</title><link href="/2018/09/30/index.htm" rel="alternate"></link><published>2018-09-30T00:00:00+10:00</published><updated>2018-09-30T00:00:00+10:00</updated><author><name>Mike Lake</name></author><id>tag:None,2018-09-30:/2018/09/30/index.htm</id><summary type="html">&lt;p&gt;The 2018 Linux Conference Australia was held at the University of Technology
Sydney from 22-26 January 2018. I attended courtesy of eResearch.&lt;/p&gt;
&lt;p&gt;Linux totally dominates supercomputers. As of November 2017, all 500 of the
world's fastest supercomputers were running Linux. This is because most of the
world's scientific software for generating …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The 2018 Linux Conference Australia was held at the University of Technology
Sydney from 22-26 January 2018. I attended courtesy of eResearch.&lt;/p&gt;
&lt;p&gt;Linux totally dominates supercomputers. As of November 2017, all 500 of the
world's fastest supercomputers were running Linux. This is because most of the
world's scientific software for generating or crunching research data is
written to run on Linux systems, thanks to their openness, customisability,
speed and robustness. The eResearch HPC runs the CentOS 6.9 Linux distribution, and
some of our users run a Linux &amp;quot;distro&amp;quot; themselves.&lt;/p&gt;
&lt;p&gt;Like most large conferences there were many parallel sessions, so deciding what
to attend was sometimes a difficult choice. The sessions I
attended mostly reflected my interest in bioinformatics and system
administration, as those topics have relevance to the HPC cluster
administration and user support.&lt;/p&gt;
&lt;p&gt;Of particular interest in the bioinformatics stream were: James Ferguson and Dr
Martin Smith of the Garvan Institute and their nanopore sequencing pipeline;
the data and code that
&lt;a href="https://portal.stemformatics.org"&gt;Stemformatics&lt;/a&gt; and
&lt;a href="https://bioinformatics.csiro.au"&gt;CSIRO's bioinformatics&lt;/a&gt; provide;
and Matt Todd's keynote on the &lt;a href="http://opensourcemalaria.org"&gt;Open Source Malaria&lt;/a&gt; project
and the key requirements of lab notebooks
(read some of their open &lt;a href="http://malaria.ourexperiment.org"&gt;Lab Notebooks&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;In the system administration stream, of particular interest were
&lt;a href="https://www.terraform.io"&gt;Terraform&lt;/a&gt; for building and versioning
infrastructure safely, James Shubin's new configuration management tool
&lt;a href="https://github.com/purpleidea/mgmt"&gt;Mgmt&lt;/a&gt;,
using mod-security and fail2ban, and Thomas Schöbel-Theuer's
&lt;a href="http://schoebel.github.io/mars/"&gt;MARS&lt;/a&gt; Long Distance
Replication, which replicates huge amounts of data across continents.&lt;/p&gt;
&lt;p&gt;The talk on high performance science covered data formats, provenance and reproducible research.
If you're using Git and don't mind a good command line tool for reproducible
research then have a look at &lt;a href="https://www.datalad.org"&gt;DataLad&lt;/a&gt;. It
is built on top of git-annex and extends it with an intuitive command-line interface.
&lt;a href="https://www.reprozip.org"&gt;ReproZip&lt;/a&gt; allows you to pack your research along with all necessary data files, libraries, environment variables and options. It's a Python package similar to
&lt;a href="https://github.com/recipy/recipy"&gt;recipy&lt;/a&gt;, which also attempts to help with the problem
of reproducible research.&lt;/p&gt;
&lt;p&gt;Also mentioned was an interesting site, &lt;a href="https://mybinder.org"&gt;Binder&lt;/a&gt;, which
turns your GitHub repo into a collection of interactive notebooks (consider security though).
If you use pipelines for your data flow then here is a
&lt;a href="https://github.com/pditommaso/awesome-pipeline"&gt;curated list of awesome pipeline toolkits&lt;/a&gt;,
apparently inspired by
&lt;a href="https://github.com/kahun/awesome-sysadmin"&gt;a curated list of amazingly awesome open source sysadmin resources&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Other talks of personal interest were the MUON detector talk,
stories from The Register's &lt;a href="https://www.theregister.co.uk/Tag/on-call"&gt;&amp;quot;On Call&amp;quot;&lt;/a&gt; column and,
if you are an open source developer, an excellent talk on the problem of
contributor overload.&lt;/p&gt;
&lt;p&gt;LinuxConf 2018 Stats: 200 presentations selected from 400 submissions,
700 attendees from 20 countries, 140 MB/sec sustained bandwidth for 1.8 TB
during the conference, 1321 unique network devices detected, and more than 2000 coffees served.&lt;/p&gt;
&lt;p&gt;The next Linux Conf will be held from 21-25 January 2019 at the University of
Canterbury, Christchurch, New Zealand. See
&lt;a href="https://linux.conf.au"&gt;LCA 2019, Canterbury NZ&lt;/a&gt;.
Registration costs for students and hobbyists are always kept low, and Linux
Confs are always very inclusive, with diverse groups of people made welcome.&lt;/p&gt;
&lt;p&gt;Mike Lake&lt;br /&gt;
eResearch&lt;/p&gt;
</content><category term="Event"></category></entry><entry><title>Domesday Preppers! Prepping for preservation at University of Technology Sydney eResearch, ITD</title><link href="/2018/08/02/UTSSLNSWpreservationpresentation.htm" rel="alternate"></link><published>2018-08-02T00:00:00+10:00</published><updated>2018-08-02T00:00:00+10:00</updated><author><name>sharyn</name></author><id>tag:None,2018-08-02:/2018/08/02/UTSSLNSWpreservationpresentation.htm</id><summary type="html">&lt;p&gt;This presentation was written and delivered by Sharyn Wise (with a couple of slides from Peter
Sefton) for the &lt;a href="https://www.eventbrite.com.au/e/australasia-preserves-digital-preservation-community-of-practice-tickets-45225818641"&gt;Australasia
Preserves&lt;/a&gt;
meeting at the NSW State Library. This was one of a number of short talks from
various organisations and was the only one to focus on research data. UTS has …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This presentation was written and delivered by Sharyn Wise (with a couple of slides from Peter
Sefton) for the &lt;a href="https://www.eventbrite.com.au/e/australasia-preserves-digital-preservation-community-of-practice-tickets-45225818641"&gt;Australasia
Preserves&lt;/a&gt;
meeting at the NSW State Library. This was one of a number of short talks from
various organisations and was the only one to focus on research data. UTS has
not started our Digital Preservation journey yet - we're still packing our
things and looking at the map, or 'prepping'.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/UTSSLNSWpreservationpresentation/Slide1.png' alt='Domesday Preppers! Prepping for preservation at University of Technology Sydney eResearch, ITD
Sharyn Wise, Research Data Manager
e:  Sharyn.wise@uts.edu.au
' title='Domesday Preppers!Prepping for preservation at University of Technology SydneyeResearch, ITD
Sharyn Wise, Research Data Manager
e:  Sharyn.wise@uts.edu.au
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 1&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Today I’ll be presenting on preparing for preservation at UTS, because we are not there yet, and I suspect it is more of a journey than a destination anyway. Preservation is on our eResearch Roadmap for next year, so I am grateful to learn from the experience of others today. I named this talk in a rather feisty moment, thinking that the story of the Domesday book was the archetypal cautionary tale and would be well-known in preservation circles. If you don't know the story you can google it, but the title in the Guardian says it all – digital data is incredibly fragile in comparison to earlier technologies. Instead I will begin with what I think of as a Domesday tale of research data.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/UTSSLNSWpreservationpresentation/Slide2.png' alt='Why preserve? A research data ”Domesday” tale' title='Why preserve? A research data ”Domesday” tale' border='1' width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 2&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;I still remember seeing media headlines about the potential dangers of unbalanced Omega 6 oil consumption in 2013 – like this one here in Time Magazine. But as so often happens with media headlines, the back story of the actual research paper in the BMJ turned out to be more complex and interesting. A US cardiovascular health researcher conducting a meta-analysis had come across a paper from a 1973 clinical trial, the Sydney Diet Heart Study, which, in line with the lipid hypothesis of the time, had replaced saturated fat with polyunsaturated fat in the diet. The trial had been discontinued because the participants had shown a higher rate of mortality. However, when the Sydney researchers wrote up the results, they had not reported on specific causes of death. The data confusingly showed the opposite of what they had hypothesised, so they published a brief paper and left it at that.&lt;/p&gt;
&lt;p&gt;It just so happened that the unsaturated fat used was safflower oil - which we now know uniquely contains only Omega-6 fatty acids. So the US researcher began hunting for the dataset. He eventually tracked down the last surviving member of the research team and asked if the data was available. Luckily the Sydney researcher had kept it among piles of boxes in his garage. After a bit of rummaging, he produced an obsolete 9-track magnetic computer tape. Data recovery was expensive and difficult, but worth it, because of what it added to knowledge of lipid metabolism: the Omega 6 group had a six percentage point higher risk of death from cardiovascular and coronary heart disease than the control group.&lt;/p&gt;
&lt;p&gt;Now let's remember that this study could not be replicated without risk to human life, so the data would have been gone forever if not recovered. I like this cautionary tale because it highlights some extra challenges we face. How do we know what to preserve, when knowledge grows in ways that cannot always be anticipated? Of course there is the Records Act, which requires that research data from human clinical trials is preserved for up to 25 years, but that wouldn’t have been long enough to save this data. Our Records Managers suggest that it should be up to the researchers, as the experts, to recommend their data for longer term preservation. However, would the Sydney Diet Heart Study researchers have selected this data for preservation? Probably not. So do we preserve everything? Well, that isn't really financially viable – as we know, it's one thing to keep data and quite another to manage it over years. And that brings me to the next challenge.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/UTSSLNSWpreservationpresentation/Slide3.png' alt='
What is the data?
It can be just about anything…Here: software code, microscopy imaging; geospatial
' title='
What is the data?
It can be just about anything…Here: software code, microscopy imaging; geospatial
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 3&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Which is getting research data under management in the first place. Researchers are suspicious of giving their data to anyone – after all, it is their competitive advantage. So let's step back: what is research data?&lt;/p&gt;
&lt;p&gt;It can be just about anything – which in itself is daunting. A lot of research data is not human readable, like the last two of these three examples – geospatial data and microscopy data. So obviously we need to preserve more than just the outputs if the data is to remain accessible and usable. Broadly speaking, the usual approach here is to convert to open formats, ideally without losing metadata, and to prefer open source software where possible, since we can retain the software itself. And this is where the concept of preparing for preservation comes in.&lt;/p&gt;
&lt;p&gt;We need to start a lot earlier in the data lifecycle than the archiving stage, to ensure that metadata and provenance data are not lost. And because this may involve changes to researcher practices, it is also a huge cultural change challenge. One problem we face is that most scientific instrument data comes off instruments in proprietary formats. In optical microscopy this problem is being addressed by the OME (Open Microscopy Environment) consortium, whose Bio-Formats software busts open proprietary files from optical microscopes and extracts the metadata as well as the images as TIFFs. The microscopy slide shown here is inside OMERO, their open source imaging repository. Similarly, the map data is in open standardised formats inside GeoServer, an open source geospatial platform. In both cases, these platforms offer significant enough benefits to researchers that they are willing to use them as repositories for their data. So does GitLab, the code repository environment shown here. So far so good.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/UTSSLNSWpreservationpresentation/Slide4.png' alt='Provisioner: Research Data Mgmt Plan; Research Data Catalogue. Picture credit: Gerrad Barthelot, Technical Architect, IT Infrastructure UTS' title='Provisioner: Research Data Mgmt Plan; Research Data Catalogue. Picture credit: Gerrad Barthelot, Technical Architect, IT Infrastructure UTS' border='1' width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 4&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;But how do we bring together these discipline-specific repositories into one managed solution? By loosely coupling them together in an architecture we are developing called the Provisioner. A researcher (top centre) accesses the data management catalogue “Stash”, which provisions managed workspaces at the beginning of their projects and, at the end, creates data records where they can upload or link to their data in place. Stash keeps all the various metadata packages we need to manage, describe, contextualise and ultimately preserve the data, and as we find suitable research platforms to add, we will write an adaptor for each one.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/UTSSLNSWpreservationpresentation/Slide5.png' alt='Where is the data? 1. Persuade researchers that &amp;#x27;keeping&amp;#x27; is not &amp;#x27;managing&amp;#x27;. 2. Make archiving easier with DataCrate. Images sourced from Wikimedia Commons' title='Where is the data? 1. Persuade researchers that &amp;#x27;keeping&amp;#x27; is not &amp;#x27;managing&amp;#x27;. 2. Make archiving easier with DataCrate. Images sourced from Wikimedia Commons' border='1' width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 5&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;So what about smaller, human-readable datasets – like surveys, or heterogeneous humanities data collections? It can be hard to persuade these researchers of the value of managing their data too, rather than throwing it on some media and into a drawer. Worse, even for those who want to manage their data properly, there has been a dearth of simple tools to capture the necessary metadata and bundle their data for archiving, as Cameron Neylon complains on his blog.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/UTSSLNSWpreservationpresentation/Slide6.png' alt='Slide 6' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 6&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Enter DataCrate, a new standardisation effort led by UTS, specifically for research data. The specification is &lt;a href="https://github.com/UTS-eResearch/datacrate/blob/master/README.md"&gt;available on GitHub&lt;/a&gt; and we’d welcome your input: feel free to raise an issue with suggestions or disagreements, or to send us a pull request with changes.&lt;/p&gt;
&lt;p&gt;The spec is designed for implementers: it explains how DataCrate builds on the existing &lt;a href="https://en.wikipedia.org/wiki/BagIt"&gt;BagIt&lt;/a&gt; packaging standard, and how to add Linked Data metadata in JSON-LD format to describe the package, its files, and its context, such as the people who created it, the organisations which funded it, the equipment used in creating the data, and so on.&lt;/p&gt;
&lt;p&gt;The plan is that DataCrate will allow us to move data between systems (including by reference, using BagIt’s fetch features) with rich file-level and package-level metadata, and with a human-readable manifest travelling with the data to help contextualise it for future users.&lt;/p&gt;
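&lt;p&gt;As a rough illustration (this fragment is my own, not taken from the spec, so the context URL, property names and identifiers are illustrative only and may differ from the real thing), the JSON-LD describing one file in a crate and the person who created it might look something like this:&lt;/p&gt;
&lt;pre&gt;
{
  "@context": "https://w3id.org/datacrate/context",
  "@graph": [
    {
      "@id": "data/survey-results.csv",
      "@type": "File",
      "description": "Tabulated survey responses",
      "creator": { "@id": "https://orcid.org/0000-0001-2345-6789" }
    },
    {
      "@id": "https://orcid.org/0000-0001-2345-6789",
      "@type": "Person",
      "name": "A. Researcher"
    }
  ]
}
&lt;/pre&gt;
&lt;p&gt;The point is that every entity – files, people, organisations, equipment – gets its own identifier, so the context of the data travels with the package.&lt;/p&gt;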
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/UTSSLNSWpreservationpresentation/Slide7.png' alt='Slide 7' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 7&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;This is a screenshot of a data set available &lt;a href="https://data.research.uts.edu.au/examples/v0.2/farms_to_freeways/"&gt;here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If enough of this metadata is present, the spec explains how to construct a &lt;a href="https://www.datacite.org/"&gt;DataCite&lt;/a&gt; citation, as we can see here.&lt;/p&gt;
&lt;p&gt;Essentially, DataCrates are data packages that have a content manifest with checksums to help ensure data integrity; metadata in a linked-data format (using JSON-LD and metadata terms from schema.org and other ontologies); and an index.html file which displays the metadata in a human-readable summary format, both for the package (crate) and …&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/UTSSLNSWpreservationpresentation/Slide8.png' alt='Slide 8' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 8&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;... they have file-level descriptions of the content.
So although the original use case I just mentioned was packaging data for retention, we are also prepping for preservation here, by employing Richard Lehane's siegfried tool to run through several of the steps required to identify preservation metadata.&lt;/p&gt;
&lt;p&gt;(The screenshots here are from a &lt;a href="https://data.research.uts.edu.au/examples/v0.2/Victoria_Arch_pub/"&gt;data set containing cave survey data&lt;/a&gt; and the UK National Archives’ &lt;a href="http://www.nationalarchives.gov.uk/PRONOM/fmt/370"&gt;PRONOM format registry&lt;/a&gt;)&lt;/p&gt;
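&lt;p&gt;Pulling the last two slides together, the anatomy of a crate on disk can be sketched roughly as follows (my own sketch, based on the BagIt conventions DataCrate builds on; exact file names may differ between DataCrate versions):&lt;/p&gt;
&lt;pre&gt;
my-crate/
    bagit.txt               (BagIt declaration)
    bag-info.txt            (bag-level metadata)
    manifest-sha256.txt     (checksums for every payload file)
    data/
        CATALOG.json        (JSON-LD metadata for the crate and its files)
        index.html          (human-readable summary of that metadata)
        survey-results.csv  (the data itself - any files and folders)
&lt;/pre&gt;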
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/UTSSLNSWpreservationpresentation/Slide9.png' alt='Acknowledgements.
STASH and Provisioner development is supported by the Australian National Data Service (ANDS) through the Australian Government’s National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.
STASH is based on ReDBox, developed by Queensland Cyber Infrastructure Foundation Ltd (QCIF).
Sharyn Wise – eResearch Analyst, sharyn.wise@uts.edu.au
Peter Sefton – Manager, eResearch Support, peter.sefton@uts.edu.au
Data Crate - https://github.com/UTS-eResearch/datacrate/' title='Acknowledgements' border='1' width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes - Slide 9&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;So, in sum, at UTS we may be early on the journey to preservation, but I've tried to illustrate some ways that we are building preparation into our systems and thinking, so that when we come to implement preservation workflows, the means to do so (metadata and tooling, for example) will be there. Here are our contact details; we will be happy to answer questions at any time. Thank you.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
</content><category term="Preservation"></category></entry><entry><title>Research Bazaar Sydney 2018 Report</title><link href="/2018/07/24/resbaz_sydney_2018.htm" rel="alternate"></link><published>2018-07-24T00:00:00+10:00</published><updated>2018-07-24T00:00:00+10:00</updated><author><name>Weisi Chen</name></author><id>tag:None,2018-07-24:/2018/07/24/resbaz_sydney_2018.htm</id><summary type="html">&lt;p&gt;A couple of weeks ago, I attended Research Bazaar (ResBaz) Sydney 2018 as an instructor. This is a wrap-up report on this successful and enjoyable event.&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/resbaz_sydney_2018/logo.png" alt="ResBaz Logo" /&gt;&lt;/p&gt;
&lt;h1&gt;Overview&lt;/h1&gt;
&lt;p&gt;Research Bazaar was initiated in Melbourne, and now it is a worldwide festival promoting the digital literacy emerging at the centre of modern …&lt;/p&gt;</summary><content type="html">&lt;p&gt;A couple of weeks ago, I attended Research Bazaar (ResBaz) Sydney 2018 as an instructor. This is a wrap-up report on this successful and enjoyable event.&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/resbaz_sydney_2018/logo.png" alt="ResBaz Logo" /&gt;&lt;/p&gt;
&lt;h1&gt;Overview&lt;/h1&gt;
&lt;p&gt;Research Bazaar was initiated in Melbourne, and it is now a worldwide festival promoting the digital literacy emerging at the centre of modern research, where researchers can learn from training courses, share knowledge and skills, network with peer researchers and have fun. In Australia, ResBaz has been held in Melbourne, Sydney, Brisbane, Perth and Hobart. Sydney’s first ResBaz was held at the University of Sydney in February 2016, aimed mainly at researchers from that university. Luckily, I was invited to be one of the trainers at that event, which has allowed me to witness the growth of ResBaz in Sydney. Last year, with the support of our UTS eResearch team and the UTS Library, ResBaz was held at UTS in July 2017. This was essentially the first multi-institution ResBaz in Sydney, involving the University of Technology Sydney, the University of New South Wales, the University of Sydney, Macquarie University, and Intersect. With 300+ registrations and 13 workshops delivered, the event was quite a success. I acted as an instructor again.&lt;/p&gt;
&lt;p&gt;Earlier this year, five Intersect colleagues and I became Software Carpentry instructors. Intersect now has eight Software Carpentry accredited trainers, plus experienced eResearch analysts and trainers, to support ResBaz moving forward. ResBaz Sydney 2018 was held at Macquarie University on 3-5 July 2018. This community-building event keeps growing - over 600 researchers and research technologists from 9 universities (Macquarie University, University of Sydney, University of New South Wales, University of Technology Sydney, Western Sydney University, Australian Catholic University, University of New England, University of Newcastle, University of Wollongong) registered for this year’s sessions (compared with 300 registrations from 4 universities last year), including &lt;strong&gt;78 registered researchers from UTS&lt;/strong&gt;.&lt;/p&gt;
&lt;h1&gt;Training &amp;amp; Talks&lt;/h1&gt;
&lt;p&gt;Training was the key part of ResBaz that attendees enjoyed so much. Together with Aidan Wilson (Intersect) and Robert Woodward (Macquarie University), I co-instructed in one of the two Python streams over two days that covered Unix, Python and Git. There were other sessions including Beginner R Programming, Advanced R and Python Programming, Humanities and Social Sciences, SQL &amp;amp; Data Manipulation (including web scraping). The “themed” lightning talks included the Humanities and Technology Camp Unconference, Machine Learning, Lean Innovation for Researchers, etc., and there were sponsor talks on the third day.&lt;/p&gt;
&lt;div id="images"&gt;
    &lt;figure style="float:left;"&gt;
        &lt;img style="margin-right:1ex;" src="/blog/resbaz_sydney_2018/weisi_training.jpg" width="350" alt="Teaching Python using Jupyter Notebook"&gt;&lt;br&gt;
        &lt;figcaption&gt;Me Teaching Python using Jupyter Notebook :)&lt;/figcaption&gt;
    &lt;/figure&gt;
    &lt;figure style="float:left;"&gt;
        &lt;img src="/blog/resbaz_sydney_2018/training.jpg" width="430" alt="Researchers Coding!"&gt;&lt;br&gt;
        &lt;figcaption&gt;Researchers Coding!&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/div&gt;
&lt;div style="clear:both;"&gt;
&lt;br&gt;
&lt;p&gt;Having witnessed and instructed at all three ResBaz Sydney events to date, I noted the following highlights of this year’s training:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Macquarie University used an “overbooking” strategy for all training streams. It worked quite well to get a better turnout. In our Python stream, 33 researchers turned out, which was better than last year.&lt;/li&gt;
&lt;li&gt;More experienced instructors and helpers than last year shared the teaching load in each session. There were three Software Carpentry accredited instructors and three helpers in our Python session with 33 attendees, so the trainer-trainee ratio was quite high.&lt;/li&gt;
&lt;li&gt;We received overwhelmingly positive feedback. The only “negative” feedback was that the researchers wanted shorter breaks and more time for learning!&lt;/li&gt;
&lt;li&gt;Some lightning talks were quite inspiring and were well-received.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Bazaar Spirit&lt;/h1&gt;
&lt;p&gt;The grand hall was used for networking, session-break catering, posters and sponsor booths. Although the catering was not stellar, we all had a great time talking with researchers about their research and posters, and about where and how they can get support, and reuniting with old friends and colleagues.&lt;/p&gt;
&lt;p&gt;One highlight was that Intersect provided popcorn and Intersect@10 anniversary cupcakes! They were hugely popular with attendees.&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/resbaz_sydney_2018/hall.jpg" width="70%" alt="Coffee break"&gt;&lt;br&gt;
Coffee break
&lt;br&gt;&lt;/p&gt;
&lt;div id="images"&gt;
    &lt;figure style="float:left;"&gt;
        &lt;img style="margin-right:1ex;" src="/blog/resbaz_sydney_2018/popcorn.jpg" width="420" alt="Popcorn Served by Intersect"&gt;&lt;br&gt;
        &lt;figcaption&gt;Popcorn Served by Intersect&lt;/figcaption&gt;
    &lt;/figure&gt;
    &lt;figure style="float:left;"&gt;
        &lt;img src="/blog/resbaz_sydney_2018/git_cupcake.jpg" width="380" alt="Learning Git with Intersect@10 Cupcakes"&gt;&lt;br&gt;
        &lt;figcaption&gt;Learning Git with Intersect@10 Cupcakes&lt;/figcaption&gt;
    &lt;/figure&gt;
&lt;/div&gt;
&lt;div style="clear:both;"&gt;
&lt;br&gt;
&lt;p&gt;&lt;img src="/blog/resbaz_sydney_2018/reunion.jpg" alt="Reunion - Sharyn, Piy and Carmi" title="Reunion!" /&gt;&lt;br&gt;
Reunion - Sharyn (UTS), Piy (Usyd) and Carmi (CSIRO)
&lt;br&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/resbaz_sydney_2018/intersect.jpg" width="80%" alt="Reunion - the Intersect Team"&gt;&lt;br&gt;
Reunion - the Intersect Team
&lt;br&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;One thing that could be improved in future is that the main hall was not close to one of the two training buildings, making it hard for attendees to move back and forth to the catering/booth area - this could be one of the reasons they didn’t like the session breaks. It would be best to have all training rooms in the same building, so participants can step out during breaks, grab a coffee and a muffin, and get back to their rooms easily.&lt;/p&gt;
&lt;h1&gt;Support On!&lt;/h1&gt;
&lt;p&gt;ResBaz is about eResearch, research community building, and research technology support. This year, the participation of regional universities was a highlight of the event - a sign of ResBaz expanding and its spirit spreading!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Not only did I represent Intersect, I also represented UTS as part of the UTS eResearch team&lt;/strong&gt;. I was delighted to see many UTS researchers attend this event and benefit from the training and networking. This is in addition to the training courses and Hacky Hour eResearch support that the UTS eResearch team already provides to UTS researchers.&lt;/p&gt;
&lt;p&gt;Last but not least, Intersect was one of the biggest supporters of ResBaz Sydney 2018. On top of being a major sponsor, one Intersect eResearch Analyst sat on the organising committee; 11 people were sent to ResBaz as instructors and helpers (some of them came to Sydney from Melbourne, Geelong, Canberra, Armidale and Newcastle), and six training workshops were led or co-instructed by Intersect.&lt;/p&gt;
&lt;p&gt;I am looking forward to the next ResBaz Sydney in 2019! Which university will volunteer to host it? Stay tuned!&lt;/p&gt;
&lt;h1&gt;Need Training and Support @UTS?&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Training&lt;/strong&gt; courses are updated regularly. Please keep an eye on our &lt;a href="https://eresearch.uts.edu.au/training/"&gt;training page&lt;/a&gt;, last updated on 19 July 2018.&lt;/p&gt;
&lt;p&gt;Weekly &lt;strong&gt;Hacky Hour&lt;/strong&gt; tech support: 3-4pm every Thursday @Penny Lane (Building 11).&lt;/p&gt;
</content><category term="Event"></category></entry><entry><title>Trip Report (with bonus opinions) - Open Repositories 2018, Bozeman Montana, USA</title><link href="/2018/07/10/or2018.htm" rel="alternate"></link><published>2018-07-10T00:00:00+10:00</published><updated>2018-07-10T00:00:00+10:00</updated><author><name>ptsefton</name></author><id>tag:None,2018-07-10:/2018/07/10/or2018.htm</id><summary type="html">&lt;p&gt;I (Peter Sefton) recently attended &lt;a href="http://www.or2018.net/"&gt;OR2018&lt;/a&gt;, the Open
Repositories conference from June 4-7, 2018 in Bozeman Montana.&lt;/p&gt;
&lt;p&gt;This post is being posted on the &lt;a href="https://eresearch.uts.edu.au"&gt;UTS eResearch site&lt;/a&gt; and on &lt;a href="http://ptsefton.com"&gt;my site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My trip was funded by the University of Technology Sydney (UTS).&lt;/p&gt;
&lt;img alt="The sign at the Lewis and Clark Motel" src="/blog/or-2018/lewis_and_clark_sign.JPG"&gt;
&lt;h1&gt;Mission&lt;/h1&gt;
&lt;p&gt;Gavin Kennedy from QCIF was also in …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I (Peter Sefton) recently attended &lt;a href="http://www.or2018.net/"&gt;OR2018&lt;/a&gt;, the Open
Repositories conference from June 4-7, 2018 in Bozeman Montana.&lt;/p&gt;
&lt;p&gt;This post is being posted on the &lt;a href="https://eresearch.uts.edu.au"&gt;UTS eResearch site&lt;/a&gt; and on &lt;a href="http://ptsefton.com"&gt;my site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My trip was funded by the University of Technology Sydney (UTS).&lt;/p&gt;
&lt;img alt="The sign at the Lewis and Clark Motel" src="/blog/or-2018/lewis_and_clark_sign.JPG"&gt;
&lt;h1&gt;Mission&lt;/h1&gt;
&lt;p&gt;Gavin Kennedy from QCIF was also in attendance, and we were on something of a
mission - to promote and get feedback on the recent work we've been doing on the
&lt;a href="https://www.redboxresearchdata.com.au/"&gt;ReDBox research data management
platform&lt;/a&gt;. We ran a ReDBox intro
workshop on the Monday of the conference. Gavin and I also presented a general
introduction to ReDBox and the Provisioner, and I went into more detail about
&lt;a href="http://ptsefton.com/2018/06/29/DataCrate_2018.htm"&gt;the DataCrate standard&lt;/a&gt; for
shipping and showing-off research data that I have been leading, with help from
a growing community of supporters. I also did a presentation in the technical
session which included a live demo of using ReDBox to ingest DataCrates -
showing how it could 'sniff out' metadata from DataCrate packages.&lt;/p&gt;
&lt;h1&gt;Research Data Management&lt;/h1&gt;
&lt;p&gt;Open Repositories now has enough going on about Research Data Management that I
was able to spend most of my time in those sessions when I wasn't meeting with people
directly.&lt;/p&gt;
&lt;p&gt;I heard people talk about a few things that helped confirm some of our design
choices at UTS, and a few things to challenge my world view as well.&lt;/p&gt;
&lt;p&gt;Vladimir Bubalo from Macquarie, Gavin and I chased up some more details about the &lt;a href="https://dataverse.org/"&gt;Dataverse
repository software&lt;/a&gt;. There is some interest from
Macquarie in what's available in the way of open source research data
repositories.&lt;/p&gt;
&lt;p&gt;As attested by the session &lt;a href="https://www.conftool.net/or2018/index.php?page=browseSessions&amp;amp;presentations=show&amp;amp;search=australian+data+archive"&gt;Reaching out with Data: Dataverse Creating a Global
Community&lt;/a&gt;
Dataverse looks like a thriving product and would be a good integration
target for ReDBox: it powers the &lt;a href="https://ada.edu.au/"&gt;Australian Data Archive&lt;/a&gt;,
for one thing, and can be used as an institutional data repository. One problem
they're still grappling with is large file support, which is the same issue with
&lt;em&gt;any&lt;/em&gt; repository software when you try to put large volumes of, or large numbers
of data streams through the API or web interface.&lt;/p&gt;
&lt;p&gt;There was a really interesting talk from the UK Jisc Research Data Shared
Services (RDSS) project (on which I have done some very part time consulting via
Artefactual in Canada) about how they failed to get a Samvera repository working
as part of their offering.&lt;/p&gt;
&lt;p&gt;They have published &lt;a href="https://docs.google.com/document/d/1CQ_Oc9rRjub-e964_6PKEB1N0H03UEOUb52thDNA3p4/edit#heading=h.y6gyy9rpmdvk"&gt;their report&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It seems the project to adapt Samvera failed not because of large files or
large volumes of files, although that problem may have come up later; as &lt;a href="http://www.doria.fi/handle/10024/97641"&gt;we
reported at OR2014&lt;/a&gt;, Hydra (since
rebranded as Samvera) had severe performance problems on the Alveo virtual lab
related to processing large numbers of files in a single transaction. Two issues Jisc
called out in their report are, &lt;em&gt;one&lt;/em&gt;, a failure to implement their complex
domain model in Samvera. They say:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In order to capture and store the range of metadata required by the RDSS CDM
the internal storage model of the work type within RDSS Samvera needed to map
closely to the CDM’s conceptual data model. However, it did not prove to be
straightforward to translate the conceptual data model of the RDSS CDM, which
leverages entity-relationships, to a programming/storage model that is largely
intended to be flat.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And &lt;em&gt;two&lt;/em&gt;, they had problems implementing a message passing model.&lt;/p&gt;
&lt;p&gt;To avoid the kind of problems reported by Jisc, and to deal with the
reality that many researchers just want to gulp down files, object-store
blobs, or database tables, and most emphatically do not want to sip data
through a tiny API straw, we are building research data management
systems that are distributed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Avoiding complex customisation of applications wherever possible and instead
linking Research Data Management Plans (RDMPs) to  workspaces in research software
such as lab notebooks or git repositories, and archival data stores using
standard linked data techniques.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Not&lt;/em&gt; using message-queues that might block progress on workloads (lesson
learned from ReDBOX 1 development and its ill-conceived &amp;quot;curation&amp;quot;
workflows). We plan to run &amp;quot;bots&amp;quot;, stand alone agents instead which generate
reports or make links or auto-provision spaces.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Keeping archival data, and data that needs rapid access, in large scalable
file storage systems, described and linked to RDMPs, data
descriptions, and data publication records. For us, 'the repository' is a
collection of services which includes the central index and a variety of
stores, rather than a single application: It's a lifestyle; it's not just
for Christmas; and most importantly it's a governance thing.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Given the above, one of the things I was keen to do at OR2018 was to find out more
about the &lt;em&gt;Oxford Common File Layout&lt;/em&gt; (OCFL), which is being driven by people from
Oxford, Emory, Stanford, Cornell and Duraspace - it's about how you organise
digital assets in the kind of deconstructed architecture I described above for
UTS. The files are kept on a file system (&lt;em&gt;yes&lt;/em&gt; cloud-first people, that might
actually be backed by an object-store) so that you can run services against
them: index them for discovery, check their integrity, generate dissemination
versions for distribution, report on items due for disposal and so on.&lt;/p&gt;
&lt;p&gt;I couldn't get to the OCFL presentation but lead author Neil Jeffries from Oxford
talked me through the emerging standard and the process being used to develop
it. Neil assured me we can go ahead and implement against the draft spec (and
challenged me with the statement that at Oxford they don't like to talk about
data vs metadata; it's all data. I'm still thinking that over, Neil, but I think I
still believe in metadata).&lt;/p&gt;
&lt;p&gt;This is from the (still rather sparse) draft dated 2018-06-22, a couple of weeks
after the conference:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A general observation is that the contents of a digital repository -- that is,
the digital files and metadata that an institution might wish to manage -- are
largely stable. Once content has been accessioned, it is unlikely to change
significantly over its lifetime. This is in contrast to the software
applications that manage these contents, which are ephemeral, requiring
constant updating and replacement. Thus, transitions between
application-specific methods of file management to support software upgrades
and replacement cycles can be seen as unnecessary and risky change, changing
the long-term stable objects to support the short-term, ephemeral software.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;By providing a specification for the file-and-folder layout on disk, the OCFL
is an attempt at reducing, or even eliminating, the need for these
transitions. As an application-independent specification, conforming
applications will natively 'understand' the underlying file structure without
needing to first transition these contents to their own format.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://ocfl.io/"&gt;https://ocfl.io/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From my notes of the conversation with Neil:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Multiple things accessing the file system writing changes to LOGS. Eg fixity
check, or create new.&lt;/p&gt;
&lt;p&gt;There is no state. File system is the state.  Digital pres services are
workers / crawler or message queue driven.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The incomplete &lt;a href="https://ocfl.io/is"&gt;spec&lt;/a&gt; is online - it's not enough to do a
complete implementation yet, but we will track it.&lt;/p&gt;
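&lt;p&gt;To make the idea concrete, here's a minimal sketch of that kind of layout - immutable version directories on a plain file system, plus an inventory of checksums that a stateless worker can re-verify. The names (&lt;code&gt;inventory.json&lt;/code&gt;, the &lt;code&gt;v1/content&lt;/code&gt; scheme) follow the draft only loosely; this is my illustration, not the spec:&lt;/p&gt;

```python
import hashlib, json, os, tempfile

def write_version(root, version, files):
    # Each version is an immutable directory; a new version never rewrites
    # an old one. This mirrors the OCFL draft's approach, loosely.
    vdir = os.path.join(root, version, "content")
    os.makedirs(vdir)
    inv_path = os.path.join(root, "inventory.json")
    inventory = {"versions": {}}
    if os.path.exists(inv_path):
        with open(inv_path) as f:
            inventory = json.load(f)
    state = {}
    for name, data in files.items():
        with open(os.path.join(vdir, name), "wb") as f:
            f.write(data)
        state[name] = hashlib.sha512(data).hexdigest()
    inventory["versions"][version] = {"state": state}
    with open(inv_path, "w") as f:
        json.dump(inventory, f, indent=2)

def check_fixity(root):
    # A stateless worker: the file system is the state, so the check just
    # recomputes digests from the files themselves.
    with open(os.path.join(root, "inventory.json")) as f:
        inventory = json.load(f)
    for version, info in inventory["versions"].items():
        for name, digest in info["state"].items():
            with open(os.path.join(root, version, "content", name), "rb") as f:
                assert hashlib.sha512(f.read()).hexdigest() == digest, name
    return True

root = tempfile.mkdtemp()
write_version(root, "v1", {"data.csv": b"a,b\n1,2\n"})
write_version(root, "v2", {"data.csv": b"a,b\n1,2\n3,4\n"})
print(check_fixity(root))
```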
&lt;p&gt;The host university, Montana State, had a nice take on this too. Sara Mannheimer,
Jason A. Clark and James Espeland presented &lt;a href="https://www.conftool.net/or2018/index.php?page=browseSessions&amp;amp;presentations=show&amp;amp;search=A+Prototype+for+the+Institutional+Research+Data+Index"&gt;A Prototype for the Institutional
Research Data
Index&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Most out-of-the-box institutional repository systems don’t provide the
workflows and metadata features required for research data. Consequently, many
libraries now support two institutional repository systems—one for
publications, and one for research data—even when there are nearly a thousand
data repositories in the United States, many of which provide services and
policies that ensure their trustworthiness and suitability for institutional
research data. Libraries are either increasing spending by purchasing data
repository solutions from vendors, or replicating work by building,
customizing, and managing individual instances of data repository software.
This presentation suggests a potential solution to this issue: a prototype for
an open source Institutional Research Data Index (IRDI) that promotes
discovery and reuse of institutional datasets through automatic metadata
harvesting and search engine optimization. IRDI could lead to a single,
unified index for academic institutional research data. A unified data index
would lead to three key impacts: increasing discovery, reuse, and citation of
open research data; reinforcing the idea that research data is a legitimate
scholarly product; and promoting community-wide systems that require less
resource expenditure.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They noted that getting a research data repository up and running is hard and
expensive.&lt;/p&gt;
&lt;p&gt;Their solution:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Deposit data elsewhere in discipline repositories&lt;/li&gt;
&lt;li&gt;Have a local IRDI - Institutional Research Data Index&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is similar to the ReDBox approach in that it is highly distributed and it
contains an IRDI. The Montana team have soon-to-be-released code that helps find
data that's residing &lt;em&gt;out there&lt;/em&gt; on the web, eg in Figshare, which we need to
look into.&lt;/p&gt;
&lt;p&gt;Another thing we should explore for ReDBox is how data is stored and
secured. We are working with a product, the Dell EMC Isilon, which has a lot of
features in this area, but I plan to look at the Edinburgh &amp;amp; Manchester
DataVault project as well: &lt;a href="https://www.conftool.net/or2018/index.php?page=browseSessions&amp;amp;form_session=328#paperID165"&gt;Sustaining the momentum, moving the DataVault
project to a
service&lt;/a&gt;,
which Claire Knowles presented with Mary McDerby, Robin Rice and Thomas Higgins.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/DataVault/datavault"&gt;DataVault&lt;/a&gt; does encrypted multi-site
storage, keeping three copies: one on site, one outside Edinburgh and one in the UK cloud. They use
chunking to reduce the risk around encryption - you lose less of a file if there's a
problem.&lt;/p&gt;
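&lt;p&gt;The chunking idea is easy to sketch - this is a toy illustration of mine, not DataVault code, with a made-up chunk size:&lt;/p&gt;

```python
import hashlib

def chunk_digests(data, size=4):  # tiny chunks for illustration only
    # Split into fixed-size chunks and digest each, so damage to one chunk
    # is detected (and can be repaired) without re-fetching the whole file.
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return chunks, [hashlib.sha256(c).hexdigest() for c in chunks]

def damaged_chunks(chunks, digests):
    return [i for i, (c, d) in enumerate(zip(chunks, digests))
            if hashlib.sha256(c).hexdigest() != d]

chunks, digests = chunk_digests(b"the quick brown fox")
chunks[1] = b"XXXX"                      # simulate corruption of one chunk
print(damaged_chunks(chunks, digests))   # only chunk 1 needs replacing
```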
&lt;h1&gt;Some highlights&lt;/h1&gt;
&lt;p&gt;An insight from Esme Cowles in &lt;a href="https://www.conftool.net/or2018/index.php?page=browseSessions&amp;amp;form_session=231#paperID161"&gt;Valkyrie: Reimagining the Samvera Community&lt;/a&gt; which is looking at adding swappable back ends to the Samvera platform (so you can use something other than Fedora to store your stuff).&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Lesson from Islandora - don't fight your host platform.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Speaking of Islandora, in &lt;a href="https://www.conftool.net/or2018/index.php?page=browseSessions&amp;amp;presentations=show&amp;amp;search=Relational+Databases+as+Repository+Objects"&gt;Relational Databases as Repository Objects&lt;/a&gt; Alexander
Garnett showed off a plugin that gives live access to a SQL database &lt;em&gt;in&lt;/em&gt; your
Islandora repository; it spins up a Docker container on demand. I asked Alex on
The Twitter if he'd seen
&lt;a href="https://datasette.readthedocs.io/en/stable/"&gt;Datasette&lt;/a&gt; but he said that SQLite
does not scale well in his experience.&lt;/p&gt;
&lt;p&gt;Thomas Morrell from Caltech mentioned a few interesting things in &lt;a href="https://www.conftool.net/or2018/index.php?page=browseSessions&amp;amp;form_session=236#paperID171"&gt;Positioning a repository as campus research infrastructure&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Live visualization from a data repository&lt;/li&gt;
&lt;li&gt;Demoed fetching data from a &lt;a href="https://tccondata.org/"&gt;live repository&lt;/a&gt; using a Jupyter notebook.&lt;/li&gt;
&lt;li&gt;Harvesting metadata out of git repositories using the &lt;a href="https://codemeta.github.io/user-guide/"&gt;codemeta&lt;/a&gt; standard.
Codemeta metadata is very close in design to our DataCrate; this could be straight out of DataCrate:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;    {&amp;quot;author&amp;quot;:
      {
      &amp;quot;@id&amp;quot;:&amp;quot;http://orcid.org/0000-0003-0077-4738&amp;quot;,
      &amp;quot;@type&amp;quot;:&amp;quot;Person&amp;quot;,
      &amp;quot;email&amp;quot;:&amp;quot;slaughter@nceas.ucsb.edu&amp;quot;,
      &amp;quot;givenName&amp;quot;:&amp;quot;Peter&amp;quot;,
      &amp;quot;familyName&amp;quot;: &amp;quot;Slaughter&amp;quot;
      }
    }
&lt;/code&gt;&lt;/pre&gt;
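&lt;p&gt;Because codemeta records are plain JSON-LD, any generic JSON tooling can consume them. A quick sketch, using the author record above (written out as valid JSON):&lt;/p&gt;

```python
import json

# A codemeta-style author record, as on the Caltech slide.
record = json.loads("""
{"author": {
    "@id": "http://orcid.org/0000-0003-0077-4738",
    "@type": "Person",
    "email": "slaughter@nceas.ucsb.edu",
    "givenName": "Peter",
    "familyName": "Slaughter"
}}
""")

author = record["author"]
# The ORCID URI makes the identity unambiguous, unlike a bare name string.
print(author["familyName"] + ", " + author["givenName"], "-", author["@id"])
```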
&lt;p&gt;Joshua A. Westgard from the University of Maryland Libraries presented &lt;a href="https://www.conftool.net/or2018/index.php?page=browseSessions&amp;amp;form_session=234#paperID263"&gt;a Python
Library for the Fedora
API&lt;/a&gt;,
which would have been of interest to us at UTS if we'd gone ahead with our
planned Fedora 4 design for data management instead of joining forces with QCIF
in the new MongoDB based ReDBox.&lt;/p&gt;
&lt;p&gt;Chris Diaz from Northwestern University talked about making static websites over
the top of a repo in &lt;a href="https://www.conftool.net/or2018/index.php?page=browseSessions&amp;amp;form_session=234#paperID170"&gt;Jekyll and Institutional
Repositories&lt;/a&gt;;
it looks like a great, sustainable alternative to Omeka exhibitions, worth
considering in some situations.&lt;/p&gt;
&lt;h1&gt;And the winners are...&lt;/h1&gt;
&lt;p&gt;I thought I'd pick out a few of the best quotes and one-liners from my notes.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Best quote: Robin Dean&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.conftool.net/or2018/index.php?page=browseSessions&amp;amp;form_session=234#paperID222"&gt;Fractional agile: how to do iterative development when it’s only part of your job&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;h3&gt;Abstract&lt;/h3&gt;
&lt;p&gt;Agile development methods such as scrum strongly recommend a dedicated team
for software development, meaning that all team members work solely on one
project at a time. This is not feasible in many of the academic and
open-source community environments where open repositories are developed. This
presentation will encourage developers, managers, and open-source contributors
to find sustainable ways to participate in agile development even if they
can’t dedicate themselves full time to a project. Drawing from my experience
as a scrum master on a digital repository team, I will share practical advice
on setting expectations, communicating about availability, and maintaining a
sustainable pace on a scrum team composed entirely of part-time members.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And the quote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;h1&gt;THERE IS NO SUCH THING AS AN EXPECTATION THAT IS TOO CLEAR&lt;/h1&gt;
&lt;/blockquote&gt;
&lt;p&gt;Ta Robin, I've been using this at work.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Best catchphrase: Robert S. Doiel, Thomas Morrell&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.conftool.net/or2018/index.php?page=browseSessions&amp;amp;form_session=248#paperID173"&gt;Building software at the edges of heterogeneous repositories&lt;/a&gt;
Caltech Library, United States of America&lt;/p&gt;
&lt;blockquote&gt;
&lt;h3&gt;Abstract&lt;/h3&gt;
&lt;p&gt;Caltech Library has a heterogeneous mix of repository systems (e.g. EPrints
hosts CaltechAUTHORS and CaltechTHESIS, while CaltechDATA is based on
Invenio). Caltech Library has changed its focus from developing in the
specific repository system to one of development at the edges leveraging web
APIs. This has allowed us to not only repurpose content but start working at
collection level curation by integrating external data sources like ORCID,
CrossRef, FundRef and DataCite. The philosophy we have evolved is to work from
copies of the data in JSON form using an Open Source tool Caltech Library
created called &lt;a href="http://caltechlibrary.github.io/dataset"&gt;dataset&lt;/a&gt; as well as
additional Open Source tools in a project called
&lt;a href="http://caltechlibrary.github.io/datatools"&gt;datatools&lt;/a&gt;. These command line
tools are written in Go but can be easily used from more popular languages
like Python. This talk will introducing these tools and demonstrate their
usage via Python.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And the phrase:&lt;/p&gt;
&lt;blockquote&gt;
&lt;h1&gt;CONTINUOUS MIGRATION&lt;/h1&gt;
&lt;/blockquote&gt;
&lt;p&gt;I liked this phrase because we're migrating data at the moment between ReDBox
1 and ReDBox 2 and it's causing my colleague Michael Lynch visible physical
and psychic pain. Robert's articulation of the idea that data migration
should be normal, and done constantly so it's easy and considered business as
usual resonated. But now that I think about it - what we're really aiming for
is to get the core data to stay put and not to move - as per the aims of the
Oxford Common Filesystem Layout I quoted above. The systems that index,
access and analyse the data, and trade metadata are the ones involved in
continuous migration.&lt;/p&gt;
&lt;p&gt;I am pleased to report that Robert has recently been in contact with me to
talk about aligning their dataset project with DataCrate; this is a step
towards DataCrate being able to look inside data files and do stuff like
describe column-headers in tabular data, rather than just describing the
files.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Best &amp;quot;Lesson Learned&amp;quot;: Lars Holm Nielsen, Alexandros Ioannidis, Krzysztof
Nowak, Jose Benito Gonzalez Lopez from CERN&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.conftool.net/or2018/index.php?page=browseSessions&amp;amp;form_session=249#paperID304"&gt;File loss: hits and near misses&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;h3&gt;Abstract&lt;/h3&gt;
&lt;p&gt;Repositories increasingly depend on external cloud storage or other complex
distributed systems in order to satisfy ever-growing needs for storing larger
data volumes. The cloud system helps repository manages store terabytes and
petabytes of data, and often simplifies the file management in the underlying
repository software. We trust these systems to store our files, yet, often we
lack understanding of the operation and internals of these systems and how
they can fail. This talk will present two file loss incidents on Zenodo,
uncovering some ways these distributed systems can fail. One incident was
caused by a coincidence of two software bugs in independent systems (the hit),
and a second incident was caused by a human operational mistake in the cloud
storage system (the near miss).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Great talk about when things go wrong. We don't hear enough of these stories.&lt;/p&gt;
&lt;p&gt;The lesson?&lt;/p&gt;
&lt;blockquote&gt;
&lt;h1&gt;LOG EVERYTHING&lt;/h1&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;The venue etc&lt;/h1&gt;
&lt;p&gt;When it's in North America the OR conference is often at a conference centre, but this
one was at the uni, in a lovely part of Bozeman, walkable from Downtown through
a leafy suburb of early twentieth-century houses. The venue was comfy, the food
was fine, and there was a cash bar at the dinner, though with a distinct lack of
options for people who didn't want alcohol, and a decent band, &lt;a href="https://www.littlejaneandthepistolwhips.com/"&gt;Little Jane and
the pistol whips&lt;/a&gt;. Gotta love that
gun culture. Downtown Bozeman is great, plenty of good food options including
Bison from Ted Turner's ranch. Ted not only revolutionised TV news by inventing
CNN, he is also the largest private owner of Bison in the world, according to our
server. He's even involved in research...&lt;/p&gt;
&lt;p&gt;Q. What did Ted Turner discover when he let 200 Bison loose in the top paddock
for 1 year?&lt;/p&gt;
&lt;img alt="The neon sign at Ted's" src="/blog/or-2018/teds.JPG"&gt;
&lt;p&gt;A. The bisontennial.&lt;/p&gt;
</content><category term="Repositories"></category></entry><entry><title>Open Repositories 2018 Presentation: ReDBox 2.0 / Provisioner</title><link href="/2018/07/06/RedBoX-Provisioner-OR2018.htm" rel="alternate"></link><published>2018-07-06T00:00:00+10:00</published><updated>2018-07-06T00:00:00+10:00</updated><author><name>ptsefton</name></author><id>tag:None,2018-07-06:/2018/07/06/RedBoX-Provisioner-OR2018.htm</id><summary type="html">&lt;p&gt;This is a presentation that Gavin Kennedy and I gave at &lt;a href="http://www.or2018.net/"&gt;Open Repositories
2018&lt;/a&gt; in Bozeman Montana.&lt;/p&gt;
&lt;p&gt;I am posting this on the &lt;a href="https://eresearch.uts.edu.au/"&gt;UTS eResearch website&lt;/a&gt; and on &lt;a href="http://ptsefton.com"&gt;my own site&lt;/a&gt;.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide01.png' alt='ReDBox 2.0 / Provisioner
Gavin Kennedy, QCIF
Peter Sefton, University of Technology Sydney
' title='ReDBox 2.0 / Provisioner
Gavin Kennedy, QCIF
Peter Sefton, University of Technology Sydney
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 1&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;The first part of this presentation was narrated by Gavin.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide02.png' alt='ReDBox
ReDBox is a Research Data Management Platform that assists researchers and institutions to plan, create and publish their research data assets.
Open source (github.com/redbox-mint)
Managed and supported by QCIF
Fully customisable
Integration and interoperation
Cloudified
Community Driven
' title='ReDBox
ReDBox is a Research Data Management Platform that assists researchers and institutions to plan, create and publish their research data assets.
Open source (github.com/redbox-mint)
Managed and supported by QCIF
Fully customisable
Integration and interoperation
Cloudified
Community Driven
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 2&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;ReDBox is a …&lt;/p&gt;&lt;/details&gt;&lt;/section&gt;</summary><content type="html">&lt;p&gt;This is a presentation that Gavin Kennedy and I gave at &lt;a href="http://www.or2018.net/"&gt;Open Repositories
2018&lt;/a&gt; in Bozeman Montana.&lt;/p&gt;
&lt;p&gt;I am posting this on the &lt;a href="https://eresearch.uts.edu.au/"&gt;UTS eResearch website&lt;/a&gt; and on &lt;a href="http://ptsefton.com"&gt;my own site&lt;/a&gt;.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide01.png' alt='ReDBox 2.0 / Provisioner
Gavin Kennedy, QCIF
Peter Sefton, University of Technology Sydney
' title='ReDBox 2.0 / Provisioner
Gavin Kennedy, QCIF
Peter Sefton, University of Technology Sydney
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 1&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;The first part of this presentation was narrated by Gavin.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide02.png' alt='ReDBox
ReDBox is a Research Data Management Platform that assists researchers and institutions to plan, create and publish their research data assets.
Open source (github.com/redbox-mint)
Managed and supported by QCIF
Fully customisable
Integration and interoperation
Cloudified
Community Driven
' title='ReDBox
ReDBox is a Research Data Management Platform that assists researchers and institutions to plan, create and publish their research data assets.
Open source (github.com/redbox-mint)
Managed and supported by QCIF
Fully customisable
Integration and interoperation
Cloudified
Community Driven
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 2&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;ReDBox is a Research Data Management Platform. It assists researchers and institutions to plan, create and publish their research data assets.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It is proudly open source.&lt;/li&gt;
&lt;li&gt;It is managed and supported by the QCIF Engineering Services team.&lt;/li&gt;
&lt;li&gt;It is fully customisable.&lt;/li&gt;
&lt;li&gt;It is an integration platform, it will join your platforms together. It is interoperable, your platforms will happily exchange information with ReDBox.&lt;/li&gt;
&lt;li&gt;It is in the cloud: mostly, we are progressively increasing our cloud options.&lt;/li&gt;
&lt;li&gt;But most important is that it is community driven, the work we do to continuously improve ReDBox is determined, prioritised and often contributed by our community.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide03.png' alt='ReDBox History' title='ReDBox History' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 3&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;ReDBox has a rich history, with ongoing support from the Australian eResearch sector. Originally ReDBox was created on top of the Fascinator platform by USQ back in 2010 and taken up by successive institutions under the ANDS Metadata Stores funding program. In 2012 QCIF took over the management of ReDBox, giving it a stable non-partisan home. 2015 saw our first cloud version in ReDBox Lite; last year we produced our first SaaS version with the RAPortal; and this year we are releasing ReDBox 2.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide04.png' alt='ReDBox Community
' title='ReDBox Community
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 4&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;This slide shows the institutions we know are using ReDBox. The ones above the
line have support contracts with QCIF, which help to pay for maintenance and
enhancement of the software.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide05.png' alt='ReDBox 1.x
Institutional Data Catalogue ++
Metadata Store
Mint Name Space Authority
DMPt
Curator
Harvester
Transformers
API
OAI/PMH Publisher
' title='ReDBox 1.x
Institutional Data Catalogue ++
Metadata Store
Mint Name Space Authority
DMPt
Curator
Harvester
Transformers
API
OAI/PMH Publisher
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 5&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;At its core ReDBox is a customisable research data registry, providing forms, workflows and system
integrations to assist institutions in creating and managing the metadata
describing research data collections. It includes the innovative Mint platform
to provide a name-authority lookup service for researcher information and
project details, as well as commonly used research classifications, such as FOR
codes. Mint helps to create high quality linked-data metadata, by supplying URIs
as metadata values rather than strings which can be error-prone and ambiguous.&lt;/p&gt;
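&lt;p&gt;To illustrate the point about URIs versus strings (the identifiers below are invented for the example):&lt;/p&gt;

```python
# Hypothetical records: the same creator as an ambiguous string and as a
# Mint-style URI value (the ORCID below is a made-up placeholder).
string_style = {"creator": "P. Sefton"}
linked_data_style = {
    "creator": {
        "@id": "https://orcid.org/0000-0000-0000-0000",  # placeholder, not real
        "name": "Peter Sefton",
    }
}
# Two records that share a URI are trivially about the same person;
# two string-valued records need error-prone fuzzy matching.
print(linked_data_style["creator"]["@id"])
```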
&lt;p&gt;It incorporates a Data Management Planning tool, curation functions and a
flexible harvester tool to pull data in from external platforms. It is a
schemaless forms driven platform, incorporating configurable transformers to
generate metadata from a standard schema, like RIF/CS or DataCite Citation
format.&lt;/p&gt;
&lt;p&gt;ReDBox includes a full set of APIs and provides an OAI-PMH interface for
metadata harvesting by repositories including Research Data Australia.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide06.png' alt='ReDBox 2.0
Technology refresh
Single web application framework (Sails.js)
Easy to configure forms
New object model (JSON-LD in MongoDB)
Modular/plug-in architecture
Multi-institutional RBaaS (e.g. RAPortal.org.au)
DLC (Data LifeCycle) support via RAiDs
maDMPs
Workspace concept to manage resources and datasets
Flexible storage provisioning with the UTS Provisioner Framework
Dataset harvesting, archiving (OCFL) and publishing
Comprehensive search/discovery tool
' title='ReDBox 2.0
Technology refresh
Single web application framework (Sails.js)
Easy to configure forms
New object model (JSON-LD in MongoDB)
Modular/plug-in architecture
Multi-institutional RBaaS (e.g. RAPortal.org.au)
DLC (Data LifeCycle) support via RAiDs
maDMPs
Workspace concept to manage resources and datasets
Flexible storage provisioning with the UTS Provisioner Framework
Dataset harvesting, archiving (OCFL) and publishing
Comprehensive search/discovery tool
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 6&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;In late 2017 QCIF embarked upon a complete refresh of the ReDBox technology
stack, assisted by RDS and UTS. We threw out the clunky front end tech stack and
used Angular.js, a javascript framework that hasn’t gone off overnight. We have
created a form configuration process to let you create your own forms and style
them using bootstrap. It is easy to extend functionality using a plugin
architecture. We can deploy it so that multiple institutions run off a hosted
version of ReDBox, our ReDBox as a Service offering.  The DMP tool now supports
Machine Actionable DMPs. We have introduced a Workspace concept for managing
resources within ReDBox. And with UTS we are implementing provisioning, data
harvesting and rewriting the search/discovery tool.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide07.png' alt='Data Life Cycle
' title='Data Life Cycle
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 7&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;And this is where the Data Life Cycle is supported. It allows users to plan for
their data, acquire the resources for storing their data, harvest file level
data and metadata back into ReDBox, curate it, archive it, publish it and make
it discoverable.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide08.png' alt='ReDBox
maDMPs
' title='ReDBox
maDMPs
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 8&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;Mercifully for you, this is not a demo. This is just a screen shot of a full
blown DMP, with the DMP components organised in a tabular form. At this point I
will remind everyone that it is fully configurable and that the forms can be as
minimal or extensive as you require.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide09.png' alt='RAP
www.raportal.org.au
Configurable SaaS Platform
Proto-DMP
RAiD Support in ReDBox
Institutional Views
' title='RAP
www.raportal.org.au
Configurable SaaS Platform
Proto-DMP
RAiD Support in ReDBox
Institutional Views
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 9&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;An example of the minimum form is the Research Activity Portal (RAP). The RAP is
our first configurable SaaS platform. It is a proto-DMP designed for users to
register their research activity in order to get a RAiD, which is a Research
Activity ID. It is available nationally in Australia and supports institutional
views, so an individual institution can have a RAP themed to their institution
with customised forms.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide10.png' alt='Provisioner - principles
' title='Provisioner - principles
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 10&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;At this point Petie took over the talking.&lt;/p&gt;
&lt;p&gt;The next few slides go through some of the principles behind the service-provisioning aspect of ReDBOX.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide11.png' alt='Give the people what they (should) want
Improved ability to do
high-integrity high-impact research
' title='Give the people what they (should) want
Improved ability to do
high-integrity high-impact research
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 11&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;We try to appeal to Researcher’s better selves: who doesn’t want to do high
integrity research that has high impact?&lt;/p&gt;
&lt;p&gt;Knowing where your data are is obviously key to Research Integrity, and is
mandated by research norms, codes of practice and by funders. The provisioner
helps with this; because provisioning can be invoked from a Research Data
Management Plan (RDMP), the provisioned research space - we call them
&lt;em&gt;workspaces&lt;/em&gt; - has a bit of metadata that points back to the RDMP, which carries
details about authorship, ownership of rights, retention requirements etc.&lt;/p&gt;
&lt;p&gt;We also try to improve research impact - by making data available with as much
provenance as possible to encourage re-use.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide12.png' alt='An army of bots
To automate obvious things, like provisioning lab notebooks to everyone in a lab, providing a git repository for every software PhD
' title='An army of bots
To automate obvious things, like provisioning lab notebooks to everyone in a lab, providing a git repository for every software PhD
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 12&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;“An army of bots” is not really a principle, the principle is that everything
possible should be done to automatically capture research data where it is and
link it back to its custodian, and to metadata about the context in which the
data were collected, curated or created. In the eResearch team at UTS we have
long experience with customer requests such as “can we make sure that every
student has a file-share to which the supervising panel also has read access
even as the supervisors change” or “can I have an eNotebook for every PhD
candidate in my department”. The provisioner will help with this kind of
automation and more; we want to do things like report on how many electronic
notebooks or git repositories are owned by the staff and students in a cohort
that are NOT linked to an RDMP.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide13.png' alt='Small Pieces, loosely joined
Avoid point to point integration, use standards for integration
' title='Small Pieces, loosely joined
Avoid point to point integration, use standards for integration
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 13&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;When we started work on what became the Provisioner module of ReDBox we were looking at how to integrate data applications that house research workspaces. For example, how might we assist researchers in moving - maybe by reference rather than copying - data from a project in a microscope-image database to the storage service in a visualization facility, so the developers there can work on analytics and visualizations, and then archive the original data, code and outputs of the viz process? When we considered the scope of this, and that we might be supporting dozens of applications that provide workspaces over the next few years, and working more loosely with hundreds more, we realised that we didn’t want to be doing &lt;code&gt;n * n&lt;/code&gt; integrations, where &lt;code&gt;n&lt;/code&gt; is the number of research apps we support. We looked to &lt;em&gt;standards&lt;/em&gt; for managing data.&lt;/p&gt;
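&lt;p&gt;The arithmetic behind that &lt;code&gt;n * n&lt;/code&gt; worry, just to make it concrete:&lt;/p&gt;

```python
# Point-to-point integration grows quadratically with the number of apps;
# integrating each app against one shared standard grows linearly.
def point_to_point(n):
    return n * (n - 1)   # ordered pairs of apps, one adapter each

def via_standard(n):
    return n             # one adapter per app, to and from the standard

for n in (3, 10, 50):
    print(n, point_to_point(n), via_standard(n))
```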
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide14.png' alt='DataCrate Packaging
' title='DataCrate Packaging
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 14&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;One of the key standards was a way of moving “packages” of research data, where a package might mean a zip file, a manifest of data-by-reference or a directory of well-described data on a file share. From this search was born DataCrate - a specification which links together several existing standards into what we think is a best-practice generic way to ship and display research data.&lt;/p&gt;
&lt;p&gt;This screenshot is of the index.html file from a DataCrate which you can take a look at &lt;a href="https://data.research.uts.edu.au/examples/v0.2/farms_to_freeways/"&gt;online&lt;/a&gt; or &lt;a href="https://data.research.uts.edu.au/examples/v0.2/farms_to_freeways.zip"&gt;download as a zip file&lt;/a&gt;.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide15.png' alt='DataCrate Packaging
' title='DataCrate Packaging
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 15&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;You can read more about the &lt;a href="https://github.com/UTS-eResearch/datacrate/tree/master/spec"&gt;spec at github&lt;/a&gt; and in the &lt;a href="http://ptsefton.com/2018/06/29/DataCrate_2018.htm"&gt;presentation&lt;/a&gt; I (Petie) gave at OR2018.&lt;/p&gt;
&lt;p&gt;Summary: it uses BagIt for organizing files with checksums, JSON-LD for high-fidelity extensible metadata (linked data in JSON format), and the schema.org vocabulary where possible for general metadata (dates, places, people), and it specifies where to look for terms that are not covered by schema.org.&lt;/p&gt;
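&lt;p&gt;A minimal sketch of what such a JSON-LD description might look like, using schema.org terms; all names and values below are illustrative, not taken from a real DataCrate.&lt;/p&gt;

```python
import json

# A minimal JSON-LD graph in the style described above: schema.org
# terms for general metadata (names, dates, people). Illustrative only.
catalog = {
    "@context": "http://schema.org/",
    "@graph": [
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example dataset",
            "datePublished": "2018-06-29",
            "creator": {"@id": "#jane"},
        },
        {"@id": "#jane", "@type": "Person", "name": "Jane Researcher"},
    ],
}
print(json.dumps(catalog, indent=2))
```

&lt;p&gt;Because it is plain JSON, the same metadata can be read by generic tools and still resolved as linked data via the @context.&lt;/p&gt;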
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide16.png' alt='Slide 16' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 16&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;This diagram is a highly abstracted view of the UTS architecture for research-data management. Researchers are shown in many places in this diagram to emphasise that, while they &lt;em&gt;can&lt;/em&gt; use the Stash (ReDBox) portal to edit Research Data Management Plans and describe data sets, they can ALSO continue to use research &lt;em&gt;workspaces&lt;/em&gt; independently. We aim to have an army of ‘provisioner bots’ working to help keep track of this.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide17.png' alt='Slide 17' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 17&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;This is a mock-up of the ReDBox 2.0 functionality showing the way researchers will be able to manage the research process, creating data management plans and requesting workspaces, as well as identifying existing workspaces and linking to them. The right-hand screenshot shows how GitLab projects can be linked to a Research Data Management Plan as &lt;em&gt;workspaces&lt;/em&gt;.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide18.gif' alt='Slide 18' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 18&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;This animation shows how a researcher will create a new workspace - in this case a gitlab repository. The Provisioner uses the gitlab API to connect as the user, create a new project/workspace, and leaves behind a calling card which links the project back to the RDMP. Later, the researcher will be able to archive and maybe publish this data via Provisioner, and it will know where to find the data.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide19.png' alt='Slide 19' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 19&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;This diagram shows how Research apps will be loosely coupled with Stash (the UTS name for our data management system that runs ReDBOX).&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide20.png' alt='Slide 20' title='' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 20&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;This diagram shows the technical architecture of the system. The “Repository” is actually several loosely coupled subsystems. The UTS implementation of ReDBox 2 will be a kind of “deconstructed” repository, with the functions we see in more monolithic software such as DSpace or Samvera residing in different places. For example, ReDBox is about data and workspace management, but the data resides in research apps and in a static file-based archive, and the public-facing discovery service will be a separate application (which we’re working on at the moment).&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide21.png' alt='Future possibilities
Generic long tail data repositories
Integration with preservation systems
More provisioning plugins (eg Github, OSF, Figshare)
National data catalogue
Actionable Service Catalogue
(inter)National DMP
Open data workflows
Generate project portals
' title='Future possibilities
Generic long tail data repositories
Integration with preservation systems
More provisioning plugins (eg Github, OSF, Figshare)
National data catalogue
Actionable Service Catalogue
(inter)National DMP
Open data workflows
Generate project portals
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 21&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;This slide lists some of the future possibilities for ReDBox - we’re particularly interested in its potential at a national or consortial level.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide22.png' alt='Sustainability
Yes, it is FREE OPEN SOURCE SOFTWARE, but...
QCIF is Not For Profit
Competitive, commercial environment
Sandstone mentality
Small team, permanent staff
Financial sustainability
Support subscriptions
Projects - implementations &amp;amp; customisations
Managed hosting
SaaS model
Community and Communication
Community driven development
' title='Sustainability
Yes, it is FREE OPEN SOURCE SOFTWARE, but...
QCIF is Not For Profit
Competitive, commercial environment
Sandstone mentality
Small team, permanent staff
Financial sustainability
Support subscriptions
Projects - implementations &amp;amp; customisations
Managed hosting
SaaS model
Community and Communication
Community driven development
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 22&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;Finally (from Gavin) some words on sustainability.&lt;/p&gt;
&lt;p&gt;ReDBox is free open source software and you can download and run it today. But&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;QCIF is a not-for-profit organisation that can’t afford to cover the costs of ReDBox, and the NCRIS grants are a distant memory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;So we have multiple models for subscribing to a support service. This covers our maintenance costs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We fund development through projects like our collaboration with UTS.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We offer increasingly commercial services like managed hosting of ReDBox.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We are rolling out the Software as a Service (SaaS) version with a transactional charging model, subject to feedback, next year.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We depend on our community, and on free and open communication with it, so we each know what the others are doing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;And we depend on our community to drive the direction of ReDBox.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/RedBoX-Provisioner-OR2018/Slide23.png' alt='More www.redboxresearchdata.com.au
' title='More www.redboxresearchdata.com.au
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;p&gt;&lt;em&gt;Notes&lt;/em&gt; - Slide 23&lt;/p&gt;
&lt;/summary&gt;
&lt;p&gt;If you would like to know more please check out our &lt;a href="https://www.redboxresearchdata.com.au"&gt;website&lt;/a&gt;.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
</content><category term="Repositories"></category></entry><entry><title>End-to-End Research Data Management for the Responsible Conduct of Research at the University of Technology Sydney</title><link href="/2018/07/04/APRI_2018_provisioner.htm" rel="alternate"></link><published>2018-07-04T00:00:00+10:00</published><updated>2018-07-04T00:00:00+10:00</updated><author><name>ptsefton</name></author><id>tag:None,2018-07-04:/2018/07/04/APRI_2018_provisioner.htm</id><summary type="html">&lt;p&gt;This presentation was written by Louise Wheeler, Sharyn Wise and me for the
&lt;a href="https://www.apri2018.org/"&gt;Asia Pacific Research Integrity 2018&lt;/a&gt; meeting in
Taiwan, Feb 2018. It was scripted and delivered by Louise, who is the UTS &lt;em&gt;Manager,
Research Integrity and Research Program&lt;/em&gt;; Sharyn works in the eResearch team
with me. This …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This presentation was written by Louise Wheeler, Sharyn Wise and me for the
&lt;a href="https://www.apri2018.org/"&gt;Asia Pacific Research Integrity 2018&lt;/a&gt; meeting in
Taiwan, Feb 2018. It was scripted and delivered by Louise, who is the UTS &lt;em&gt;Manager,
Research Integrity and Research Program&lt;/em&gt;; Sharyn works in the eResearch team
with me. This is a good introduction to the work we've been doing on the UTS
provisioner project from Louise's &lt;a href="http://www.arc.gov.au/research-integrity"&gt;Research
Integrity&lt;/a&gt; (RI) perspective. There's
not much technical detail in this talk about the open source
&lt;a href="https://www.redboxresearchdata.com.au/"&gt;ReDBox&lt;/a&gt; platform on which our data
management system, Stash, is built. I'll post more soon about that.&lt;/p&gt;
&lt;p&gt;Thanks also to Chris Evenhuis for some slide design.&lt;/p&gt;
&lt;p&gt;I'm posting this both on the &lt;a href="https://eresearch.uts.edu.au/"&gt;eResearch blog&lt;/a&gt; and
at &lt;a href="http://ptsefton.com"&gt;my site&lt;/a&gt;.&lt;/p&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide01.png' alt='End-to-EndResearch Data Managementfor theResponsible Conduct of Researchat the University of Technology Sydney
APRI Network Meeting 2018
February 26, 2018
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Today I’ll be presenting the UTS approach to Research Data Management (RDM), and
highlighting how the development of a practical and comprehensive solution can
enable researchers to meet their RI obligations.&lt;/p&gt;
&lt;p&gt;I’ll run through a brief history that led to our approach, and then I will
present an overview of the solution we are implementing.&lt;/p&gt;
&lt;p&gt;Formal RI training may occur early in a researcher’s career, but otherwise
becomes assumed knowledge. Awareness of responsible research practices is not
often actively supported or promoted throughout an academic's career, at least
not consistently at the institutional level. At UTS we have pockets of
opportunities for promoting research integrity principles, and one such pocket
is the eResearch team – who have regular face-time with researchers when
supporting them with tools and practices. We are using these opportunities to
build a comprehensive framework that will enable researchers to meet RI
principles, through improved systems and practices.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide02.png' alt='
Evolution
Image Source: www.arc.gov.au, http://www.dijifi.com/service/data-management/
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Australia’s current code has been in place since 2007. It is a relatively
prescriptive document that guides institutions and researchers as to their
responsibilities in meeting integrity principles, including management of
research data.&lt;/p&gt;
&lt;p&gt;Despite these guidelines, at that time there was no defined or coordinated
approach to research data management at UTS. Any management relied on the best
endeavours of individual researchers to protect their data. They continued to
store their data on file-servers; in drawers, in unlabelled hard drives, in
their garages etc.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide03.png' alt='
Evolution
Image Source: www.arc.gov.au, http://www.dijifi.com/service/data-management/
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;It was not until 2013 that national infrastructure funding allowed Australian
Universities to develop Research Data catalogues. For the first time researchers
were encouraged to think of their data as an asset to be managed, as opposed to
simply kept. At the same time, national requirements to include RDM plans in
proposals to our major funding bodies, gave universities a bigger driver for
changing culture.&lt;/p&gt;
&lt;p&gt;At UTS we began to govern research data at the Policy level, which required
researchers to plan their data management. At the systems level, because of the
new focus on RDM requirements, we had the necessary senior-level investment to
develop our research data catalogue (which we call Stash).&lt;/p&gt;
&lt;p&gt;Take up of Stash over the next two years was very slow. With minimal resources,
the strategy for promoting Stash focused on educating graduate research students
and thereby indirectly, their supervisors. It was difficult to persuade
researchers that sharing their data was a good idea, and that storage should be
on university owned and controlled systems.&lt;/p&gt;
&lt;p&gt;And, since the ANDS funding rules didn’t allow for development of data
repositories, our catalogues weren’t integrated with our storage solutions. It
was clear at that point that researchers needed an integrated data management
solution.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide04.png' alt='
Evolution
Image Source: www.arc.gov.au, http://www.dijifi.com/service/data-management/
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;In 2016 efforts around RI and RDM began to align. We rolled out a new Research
Integrity Framework while the eResearch team produced a &lt;a href="https://eresearch.uts.edu.au/eresearch/strategy/"&gt;Strategy and
roadmap&lt;/a&gt; that explicitly
linked data management to Integrity requirements and during 2017, our research
policy environment was redrafted in the same vein.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide05.png' alt='
Evolution
Image Source: https://las.inf.ethz.ch/research/large-scale-machine-learning
Image Source: www.arc.gov.au, http://www.dijifi.com/service/data-management/
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Collectively, these efforts enabled us to put the pieces in place for an
end-to-end data management solution and the &lt;a href="https://eresearch.uts.edu.au/2018/04/05/provisioner_1.htm"&gt;Provisioner project&lt;/a&gt; was born.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide06.png' alt='6
Provisioner
Picture credit: Gerrad Barthelot, Technical Architect, IT Infrastructure UTS
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;This is Provisioner. It looks complex, but I will walk you through it using a
couple of examples that demonstrate how it supports both RDM and research
integrity.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide07.png' alt='6
Provisioner
Research Data Mgmt Plan
Research Data Catalogue
Picture credit: Gerrad Barthelot, Technical Architect, IT Infrastructure UTS
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;A researcher (top centre) accesses the data management catalogue “Stash”, which
has three parts: (1) create a data management plan (as required by our policy
environment) to describe how their data will be collected, analysed, stored and
accessed; (2) access the research data catalogue, which lists where archived data
sets can be found;&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide08.png' alt='6
Provisioner
Research Data Mgmt Plan
Research Data Catalogue
Picture credit: Gerrad Barthelot, Technical Architect, IT Infrastructure UTS
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;And from the plan, they (will, by the end of 2018) have access to (3) the
innovative provisioning tool, which allows researchers to provision workspaces
such as file-shares, electronic notebooks, and repositories for programming
code.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide09.png' alt='6
Provisioner
Research Data Mgmt Plan
Research Data Catalogue
Picture credit: Gerrad Barthelot, Technical Architect, IT Infrastructure UTS
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;They can request and link to data storage, and at the end of the project
archive and publish their datasets.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide10.png' alt='6
Provisioner
Picture credit: Gerrad Barthelot, Technical Architect, IT Infrastructure UTS
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;The Provisioner will also have a range of reporting tools and automated ‘bots’
that generate useful reports for the Research Integrity team – identifying
‘orphaned’ workspaces, or auto-creating data management plans for researchers or
research students.&lt;/p&gt;
&lt;p&gt;Information is pulled in from our existing databases to populate the RDMPs,
again saving researchers time.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide11.png' alt='6
Provisioner
Picture credit: Gerrad Barthelot, Technical Architect, IT Infrastructure UTS
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;We had a recent example in a science lab where some code from one project was
modified for use in a new project. Most researchers are not aware that code is
considered research data that must be retained, particularly in the sciences
where they are not trained in IT. So no copies of the code were retained, and
that data is now lost. In the Provisioner model, &lt;a href="https://about.gitlab.com/tools"&gt;Gitlab&lt;/a&gt; solves this problem
without the researcher having to stop and think about their research integrity
obligations, by saving each version of the code and creating a revision history.&lt;/p&gt;
&lt;p&gt;By introducing Provisioner, we are not implying that researchers can remain
blissfully unaware of their responsibilities, but we are supporting them by
providing a positive and streamlined experience, with the added comfort of
understanding that it comprehensively addresses responsible research practices.
We will continue to raise awareness by linking integrity training to our
practical training around data management, and other elements of the project
lifecycle.&lt;/p&gt;
&lt;p&gt;We’ve found that bringing researchers into contact with our eResearch experts
is an effective way of supporting and building an integrity culture. Our
researchers tend to prefer practice-based over principles-based education, so we
are aiming to build the principles into our workshops/clinics (e.g. research
ethics practices, research data management practices, and publications workshops
that focus on open access and reproducibility).&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide12.png' alt='6
Provisioner
Picture credit: Gerrad Barthelot, Technical Architect, IT Infrastructure UTS
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Another example of how Provisioner addresses RCR is the incorporation of the
&lt;a href="http://www.labarchives.com/"&gt;LabArchives&lt;/a&gt; ELN (Electronic Lab Notebook) (top left). As this model is being implemented university-wide, it
gives us an institutional advantage of oversight. eNotebooks developed at the
lab or faculty level lead to inconsistent practices and mean that data is
out of the university’s control.&lt;/p&gt;
&lt;p&gt;Our policy dictates that rights in data are owned by the institution, which means we should
have some control over how it is stored, accessed and reported, and also have
the ability to interrogate the system, should any disputes arise. The
LabArchives tool ensures that data cannot be manipulated or lost, can help to
clarify IP and ownership issues, and also enables us to manage data access when
researchers move institutions.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide13.png' alt='6
Provisioner
Picture credit: Gerrad Barthelot, Technical Architect, IT Infrastructure UTS
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;And finally, researchers doing live imaging experiments can
generate terabytes of data in one session, so Provisioner can provide access to
large online repositories such as
&lt;a href="https://www.openmicroscopy.org/omero/"&gt;OMERO&lt;/a&gt;. This gives researchers comfort
that all their experimental data can be kept, not just selected files.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide14.png' alt='
Picture credit: Screen shot of Stash 2.0, UTS Research Data Catalogue
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Provisioner is still being developed throughout 2018. What we’ve learned so far is that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Simply mandating new RDM requirements doesn’t work. Researchers are
incentivised by integrated and discipline-relevant tools that save them time
and give them reason to uphold good data management practices for more than just
policy compliance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Successful implementation requires stakeholder engagement and championship at
senior levels - we are fortunate that our Associate Dean, Research (ADR) in
Health has invested heavily in both our RI and RDM responsibilities and has
implemented a cultural change in that Faculty, meaning that every project now
has an RDMP. This is reflected in uptake of RDMPs across the institution in
2017, supported by the extensive cross-promotion between Ethics, RI/RCR
and RDM representatives at UTS.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide15.png' alt='Slide 15' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;p&gt;Early feedback has been positive with researchers keen to take up the components
we have rolled out because they can see immediate value in adopting them, and we
can already begin to see the effectiveness of the model by the levels of uptake
of &lt;a href="http://www.labarchives.com/"&gt;LabArchives&lt;/a&gt; and &lt;a href="https://about.gitlab.com/tools"&gt;GitLab&lt;/a&gt; since their
implementation in 2017.&lt;/p&gt;
&lt;p&gt;However, there is still a way to go. Data archiving and preservation to cater
for longer retention periods such as those demanded by clinical trials are still
ahead on the eResearch roadmap. Data management practice is improving but is by
no means universal. We accept that culture change concerning RDM is a long term
proposition; but it can also be a quick win for RCR by offering an opportunity
to promote research integrity in practical ways. Meanwhile, we take a risk
management approach, focusing on sensitive and personal data and grants where
the funding bodies require RCR and research data management.&lt;/p&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
&lt;section typeof='http://purl.org/ontology/bibo/Slide'&gt;
&lt;img src='/blog/apri_2018_provisioner/Slide16.png' alt='
Acknowledgements
STASH and Provisioner development is supported by the Australian National Data Service (ANDS) through the Australian Government’s National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.
STASH is based on ReDBox, developed by Queensland Cyber Infrastructure Foundation Ltd (QCIF)
Louise Wheeler – Manager, Research Integrity	louise.wheeler@uts.edu.au
Sharyn Wise – eResearch Analyst	sharyn.wise@uts.edu.au
Peter Sefton – Manager, eResearch Support	peter.sefton@uts.edu.au
Authors
' border='1'  width='85%'/&gt;
&lt;details open="open"&gt;
&lt;summary&gt;
&lt;h3&gt;Notes&lt;/h3&gt;
&lt;/summary&gt;
&lt;/details&gt;
&lt;/section&gt;
&lt;br/&gt;&lt;br/&gt;&lt;hr/&gt;
</content><category term="Repositories"></category></entry><entry><title>Introducing Provisioner</title><link href="/2018/04/05/provisioner_1.htm" rel="alternate"></link><published>2018-04-05T00:00:00+10:00</published><updated>2018-04-05T00:00:00+10:00</updated><author><name>Mike Lynch</name></author><id>tag:None,2018-04-05:/2018/04/05/provisioner_1.htm</id><summary type="html">&lt;p&gt;This post is an introduction to the Provisioner, an open framework for
research data management which we're developing in collaboration with
&lt;a href="https://www.qcif.edu.au/"&gt;the Queensland Cyber Infrastructure Foundation, QCIF&lt;/a&gt;
and
&lt;a href="http://www.ands.org.au/"&gt;the Australian National Data Service, ANDS&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Provisioner grew out of a project funded by the UTS IT Capital
Management Program, which is …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This post is an introduction to the Provisioner, an open framework for
research data management which we're developing in collaboration with
&lt;a href="https://www.qcif.edu.au/"&gt;the Queensland Cyber Infrastructure Foundation, QCIF&lt;/a&gt;
and
&lt;a href="http://www.ands.org.au/"&gt;the Australian National Data Service, ANDS&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Provisioner grew out of a project funded by the UTS IT Capital
Management Program, which is, confusingly, also called
Provisioner. In this post, I'll use &amp;quot;Provisioner&amp;quot; to refer to the
framework and software, not the project as a whole.&lt;/p&gt;
&lt;p&gt;The original goal of the project was to provide a data pipeline which
would manage the transfer of microscopy data from
the
&lt;a href="https://www.uts.edu.au/about/faculty-science/microbial-imaging-facility/"&gt;Microbial Imaging Facility, MIF&lt;/a&gt;
to the visualisation environment at the
&lt;a href="https://www.uts.edu.au/partners-and-community/data-arena/overview"&gt;Data Arena&lt;/a&gt;,
with some analysis and processing on the way.&lt;/p&gt;
&lt;p&gt;During the initial scoping and design work for the project, the
middleware system to support the transfer of data from MIF to the Data
Arena started to look like a desirable technology in itself. This
middleware is what became the Provisioner. The final design for
Provisioner is an attempt to address two problems -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;how do we support end-to-end data management without taking
researchers away from the tools they use for research or requiring
them to manually enter administrative data?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;how do we integrate data management into research tools without
having to customise a growing list of software platforms?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The best introduction to Provisioner is to describe how most
researchers will interact with it.&lt;/p&gt;
&lt;h2&gt;Service catalogue&lt;/h2&gt;
&lt;p&gt;Stash is the UTS research data catalogue, allowing researchers to
create RDMPs (research data management plans) and describe
datasets. Stash is currently implemented in &lt;a href="http://www.redboxresearchdata.com.au/"&gt;ReDBox 1.9&lt;/a&gt;, an open source
platform maintained by &lt;a href="https://www.qcif.edu.au/"&gt;QCIF&lt;/a&gt;. As part of
this project, QCIF are reimplementing ReDBox using modern web
frameworks.&lt;/p&gt;
&lt;p&gt;The new version of ReDBox will include a service catalogue, which
will allow researchers to request a range of IT services for use with
a project and its RDMP.&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/provisioner/provisioner.png" alt="Provisioner architecture diagram" /&gt;&lt;/p&gt;
&lt;p&gt;This diagram shows where we'd like to be in a few years with
Provisioner managing research data across&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;file storage on local or cloud services&lt;/li&gt;
&lt;li&gt;research code repositories in &lt;a href="https://about.gitlab.com/"&gt;GitLab&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;dedicated microscopy management from
&lt;a href="https://www.openmicroscopy.org/omero/"&gt;OMERO&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;electronic lab notebooks from &lt;a href="http://www.labarchives.com/"&gt;labarchives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;secure data collection surveys from &lt;a href="https://www.project-redcap.org/"&gt;REDCap&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;and other services that will be requested by our researchers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The current project is aiming to provide connectors to
GitLab, OMERO and labarchives.&lt;/p&gt;
&lt;p&gt;The Provisioner terminology for these services is &lt;em&gt;research apps&lt;/em&gt; (or
just &lt;em&gt;apps&lt;/em&gt;), and a directory, site or project belonging to a
researcher in a particular app is referred to as a &lt;em&gt;workspace&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;When a researcher requests a workspace via the service catalogue,
Stash creates a workspace record in its own database which is linked
to the user and RDMP. It then calls the Provisioner API to request a
new workspace. The Provisioner API translates this request into the
actual operations needed to provision the workspace in that particular
research app, and writes a metadata file into the workspace which
links it to the new workspace record, the researcher and the RDMP.&lt;/p&gt;
&lt;p&gt;The Stash interface then reports back that the workspace has been
created, and gives the researcher a URI and instructions for how to
access it.&lt;/p&gt;
&lt;p&gt;This has given the researcher a resource which they can start using,
and has ensured via the metadata file that the resource can be linked
back to the RDMP and project which govern it.&lt;/p&gt;
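&lt;p&gt;The provisioning flow can be sketched in a few lines of Python. Everything here (the function name, the request shape and the metadata fields) is a hypothetical illustration, not the actual Provisioner API:&lt;/p&gt;

```python
import json
from datetime import datetime, timezone

def provision_workspace(app, researcher, rdmp_id, workspace_id):
    """Hypothetical sketch: translate a service-catalogue request into
    a per-app provisioning call, then build the metadata that links
    the new workspace back to the researcher and their RDMP."""
    # In the real system an adaptor would call the research app's
    # native API here; we just record what would be requested.
    request = {"app": app, "owner": researcher, "workspace": workspace_id}
    # Metadata file dropped into the workspace so the resource can
    # always be traced back to the plan that governs it.
    metadata = {
        "workspaceId": workspace_id,
        "researcher": researcher,
        "rdmp": rdmp_id,
        "created": datetime.now(timezone.utc).isoformat(),
    }
    return request, json.dumps(metadata)

req, meta = provision_workspace("gitlab", "j.bloggs", "RDMP-0042", "ws-001")
```

&lt;p&gt;The real system would also persist the workspace record in Stash and call the research app's native API via an adaptor.&lt;/p&gt;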
&lt;h2&gt;Metadata&lt;/h2&gt;
&lt;p&gt;Once a researcher has used it to get access to a research app,
Provisioner will get out of their way and let them use the app's
native interface. Even this minimal use-case will be a big step
forward in terms of research data management: we'll have a system
which can make sure that new research IT storage and workspaces
contain documentation, in the form of the Provisioner metadata, which
describes the data and links back to the RDMP and researcher who are
responsible for it.&lt;/p&gt;
&lt;p&gt;The exact spec of the Provisioner metadata is evolving as we develop
the system. We're going to use
&lt;a href="https://github.com/UTS-eResearch/datacrate/"&gt;DataCrate&lt;/a&gt;,
an evolving standard for packaging and documenting
research data. DataCrate provides an HTML version which can be used as
a landing page or manifest when a dataset is published - here's
&lt;a href="https://data.research.uts.edu.au/public/Victoria_Arch/"&gt;a sample dataset of LIDAR data from the Wombeyan caves&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The same data is also provided as a &lt;a href="https://json-ld.org/"&gt;JSON-LD&lt;/a&gt;
document which is easily machine-readable and follows linked open data
principles.&lt;/p&gt;
&lt;p&gt;DataCrate is flexible about the level of detail it can provide: it's
possible but not necessary to go down to the level of individual
files. This means that early in the research life-cycle, a workspace
can have broad, high-level descriptions, which can be filled in as
required when a dataset is moved into the publication workflow.&lt;/p&gt;
&lt;h2&gt;Managing workspaces&lt;/h2&gt;
&lt;p&gt;The workspace interface in Stash will allow researchers to carry out
high-level management tasks. By &amp;quot;high-level&amp;quot;, we mean operations which
make sense when applied to a workspace in any research app, rather
than those which only apply to, for example, microscopy, or file
storage.&lt;/p&gt;
&lt;p&gt;The range of high-level operations will include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;sharing a workspace with other researchers&lt;/li&gt;
&lt;li&gt;setting a workspace to be immutable (read-only)&lt;/li&gt;
&lt;li&gt;making a workspace from one app available in a different research
app&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Provisioner API is being designed to provide a common interface
for these operations. Each research app will have an adaptor which
translates these API calls into the native API for that app.&lt;/p&gt;
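&lt;p&gt;As a sketch of the adaptor idea, the common interface might look like this in Python; the class and method names are assumptions for illustration, not the real Provisioner code:&lt;/p&gt;

```python
from abc import ABC, abstractmethod

class WorkspaceAdaptor(ABC):
    """Hypothetical common interface: each research app implements the
    same high-level operations in terms of its own native API."""

    @abstractmethod
    def share(self, workspace_id, user):
        """Grant another researcher access to the workspace."""

    @abstractmethod
    def set_read_only(self, workspace_id):
        """Make the workspace immutable."""

class GitLabAdaptor(WorkspaceAdaptor):
    # Illustrative only: a real adaptor would call the GitLab REST API.
    def share(self, workspace_id, user):
        return f"would add {user} as a member of project {workspace_id}"

    def set_read_only(self, workspace_id):
        return f"would archive project {workspace_id}"

adaptor = GitLabAdaptor()
```

&lt;p&gt;Stash then only needs to speak the common interface; swapping in an OMERO or labarchives adaptor changes the translation, not the caller.&lt;/p&gt;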
&lt;p&gt;The third operation listed above, making a workspace from one app
available in another, is how Provisioner will satisfy the initial
requirement: a data pipeline from MIF to the Data Arena. It also has
the potential to support many more useful processes, such as data
archiving and publication. I'll go into more detail about this in the
next blog post.&lt;/p&gt;
</content><category term="Project"></category></entry><entry><title>Upcoming Training Courses in 2018 Have been Updated</title><link href="/2018/02/28/training_update_2018-02-28.htm" rel="alternate"></link><published>2018-02-28T00:00:00+11:00</published><updated>2018-02-28T00:00:00+11:00</updated><author><name>Weisi Chen</name></author><id>tag:None,2018-02-28:/2018/02/28/training_update_2018-02-28.htm</id><summary type="html">&lt;h2&gt;Upcoming Training Courses in 2018 Have been Updated&lt;/h2&gt;
&lt;p&gt;UTS eResearch and Intersect offer a wide range of specialised courses for researchers, from beginner through to advanced levels in High-Performance Computing (HPC), Programming with R/Python/Matlab, Excel, data management, data cleaning and visualisation, databases and SQL, and more. Delivered by …&lt;/p&gt;</summary><content type="html">&lt;h2&gt;Upcoming Training Courses in 2018 Have been Updated&lt;/h2&gt;
&lt;p&gt;UTS eResearch and Intersect offer a wide range of specialised courses for researchers, from beginner through to advanced levels in High-Performance Computing (HPC), Programming with R/Python/Matlab, Excel, data management, data cleaning and visualisation, databases and SQL, and more. Delivered by Intersect's team of experts, training courses provide practical and research-relevant hands-on exercises.&lt;/p&gt;
&lt;p&gt;Upcoming training courses are updated regularly; the latest update was made on 28 February 2018. Courses after April are not yet open for registration, so please keep an eye on our &lt;a href="https://eresearch.uts.edu.au/training/"&gt;training page&lt;/a&gt;.&lt;/p&gt;
</content><category term="Event"></category></entry><entry><title>DataCrate: Formalising ways of packaging research data for re-use and dissemination</title><link href="/2018/01/30/datacrate.htm" rel="alternate"></link><published>2018-01-30T00:00:00+11:00</published><updated>2018-01-30T00:00:00+11:00</updated><author><name>ptsefton</name></author><id>tag:None,2018-01-30:/2018/01/30/datacrate.htm</id><summary type="html">&lt;p&gt;By Peter Sefton&lt;/p&gt;
&lt;p&gt;&lt;a href="http://ptsefton.com/2017/10/19/datacrate.htm"&gt;A version of this post is also available at my website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is a presentation I gave at eResearch Australasia 2017-10-18 about the new &lt;a href="https://github.com/UTS-eResearch/datacrate/blob/master/spec/0.1/data_crate_specification_v0.1.md"&gt;Draft (v0.1) Data Crate Specification&lt;/a&gt; for data packaging I've just completed, with lots of help from others (credits at the end).&lt;/p&gt;
&lt;h3&gt;BACKGROUND …&lt;/h3&gt;</summary><content type="html">&lt;p&gt;By Peter Sefton&lt;/p&gt;
&lt;p&gt;&lt;a href="http://ptsefton.com/2017/10/19/datacrate.htm"&gt;A version of this post is also available at my website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is a presentation I gave at eResearch Australasia 2017-10-18 about the new &lt;a href="https://github.com/UTS-eResearch/datacrate/blob/master/spec/0.1/data_crate_specification_v0.1.md"&gt;Draft (v0.1) Data Crate Specification&lt;/a&gt; for data packaging I've just completed, with lots of help from others (credits at the end).&lt;/p&gt;
&lt;h3&gt;BACKGROUND&lt;/h3&gt;
&lt;p&gt;In 2013 Peter Sefton and Peter Bugeia presented at eResearch Australasia on a format for packaging research data(1), using standards based metadata, with one innovative feature – instead of including metadata in a machine readable format only, each data package came with an HTML file that contained both human and machine readable metadata, via RDFa, which allows semantic assertions to be embedded in a web page.&lt;/p&gt;
&lt;p&gt;Variations of this technique have been included in various software products over the last few years, but there was no agreed standard on which vocabularies to use for metadata, or specification of how the files fitted together.&lt;/p&gt;
&lt;h3&gt;THE PRESENTATION&lt;/h3&gt;
&lt;p&gt;This presentation will describe work in progress on the DataCrate specification(2), illustrated with examples, including a tool to create DataCrate. We will also discuss other work in this area, including Research Object Bundles(3) and Data Conservancy(4) packaging.&lt;/p&gt;
&lt;p&gt;We will be seeking feedback from the community on this work: should it continue? Is it useful? Who can help out?&lt;/p&gt;
&lt;p&gt;The DataCrate spec:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Has both human and machine readable metadata at a package (data set/collection) level as well as at a file level&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Allows for and encourages inclusion of contextual metadata such as descriptions of organisations, facilities, experiments and people linked to files with meaningful relationships (eg to say a file was created by a particular machine, as part of a particular experiment, at an organisation).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Is a BagIt profile(5). BagIt(6) is a simple packaging standard for file-based data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Has a README.html tag file at the root with bagit-style metadata about the distribution (contact details etc) with a link to;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;a CATALOG.html file in RDFa, using schema.org metadata inside the payload (data) dir with detailed information about the files in the package, and a redundant CATALOG.json in JSON-LD format&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Is extensible easily as it is based on RDF.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;REFERENCES&lt;/h3&gt;
&lt;p&gt;Sefton P, Bugeia P. Introducing next year’s model, the data-crate; applied standards for data-set packaging. In: eResearch Australasia 2013 [Internet]. Brisbane, Australia; 2013. Available from: http://eresearchau.files.wordpress.com/2013/08/eresau2013_submission_57.pdf&lt;/p&gt;
&lt;p&gt;datacrate: Bagit-based data packaging specification for dissemination of research data with useful human and machine readable metadata: “Make Data Crate Again!” [Internet]. UTS-eResearch; 2017 [cited 2017 Jun 29]. Available from: https://github.com/UTS-eResearch/datacrate&lt;/p&gt;
&lt;p&gt;Research Object Bundle [Internet]. [cited 2017 Jun 16]. Available from: https://researchobject.github.io/specifications/bundle/&lt;/p&gt;
&lt;p&gt;Data Conservancy Packaging Specification Home [Internet]. [cited 2017 Jun 29]. Available from: http://dataconservancy.github.io/dc-packaging-spec/dc-packaging-spec-1.0.html&lt;/p&gt;
&lt;p&gt;Ruest N. BagIt Profiles Specification [Internet]. 2017 Jun. Available from: https://github.com/ruebot/bagit-profiles&lt;/p&gt;
&lt;p&gt;Kunze J, Boyko A, Vargas B, Madden L, Littman J. The BagIt File Packaging Format (V0.97) [Internet]. [cited 2013 Mar 1]. Available from: http://tools.ietf.org/html/draft-kunze-bagit-06&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-0.png' alt='DataCrate: Formalising || ways of packaging research ||    data for re-use and ||       dissemination ||  Peter Sefton, University of Technology Sydney'&gt;
&lt;p&gt;This is a presentation I gave at eResearch Australasia 2017-10-18.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-1.png' alt='test-1'&gt;
&lt;p&gt;Peter Bugeia and I talked about this 4 years ago. This year I got around to leading the
effort to standardise what we did back then.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-2.png' alt='test-2'&gt;
&lt;p&gt;This presentation is structured as a story.&lt;/p&gt;
&lt;p&gt;Back in June Cameron Neylon was &lt;a href="http://cameronneylon.net/blog/as-a-researcher-im-a-bit-bloody-fed-up-with-data-management/"&gt;annoyed&lt;/a&gt;&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-3.png' alt='&amp;quot;More concretely I specifically have data from a set of interviews. I have audio and || I have notes/transcripts. I have the interview prompt. I have decided this set of || around 40 files is a good package to combine into one dataset on Zenodo. So my || next step is to search for some guidance on how to organise and document || that data. Interviews, notes, must be a common form of data package right? || So a quick search for a tutorial, or guidance or best practice? ||  || Nope. Give it a go. You either get a deep dive into metadata schema (and || remember I&amp;#x27;m one of the 2% who even know what those words mean) or you get || very high level generic advice about data management in general. Maybe you get || a few pages giving (inconsistent) advice on what audio file formats to use.&amp;quot;'&gt;
&lt;p&gt;When I saw this cry for help I contacted Cameron and offered to work with him.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-4.png' alt='&amp;quot;As a researcher trying to do a good job || of data deposition, I want an example of || my kind of data being done well, so I can ||   copy it and get on with my research&amp;quot;'&gt;
&lt;p&gt;More from Cameron.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-5.png' alt='There were no examples'&gt;
&lt;p&gt;But actually, there are no simple examples of how to organise &amp;quot;long-tail&amp;quot; data sets for
publication. Research data management books will tell you about various metadata
standards, but how do you enter the metadata and associate it with your data?&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-6.png' alt='So we made one'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-7.png' alt='Fast forward to this week ...'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-8.png' alt='Cameron Professor Neylon has ||    published his dataset'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-9.png' alt='https://doi.org/10.13039/501100000193'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;p&gt;The dataset is &lt;a href="https://doi.org/10.5281/zenodo.844394"&gt;available from Zenodo&lt;/a&gt;, an open data repository hosted by CERN.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-11.png' alt='It&amp;#x27;s a zipped-up BagIt bag'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-12.png' alt='test-12'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-13.png' alt='There&amp;#x27;s a catalog inside'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-14.png' alt='test-14'&gt;
&lt;p&gt;This is a human-readable catalog that lists all the files in the data set.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-15.png' alt='With information about people, places, || licenses and their relationships to the ||                 files ||            in the DataCrate'&gt;
&lt;p&gt;And has information about their context and the relationships between them.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-16.png' alt='test-16'&gt;
&lt;p&gt;For example it shows that Cameron is the creator of the dataset. Note that Cameron is
identified by his ORCID ID: &lt;a href="http://orcid.org/0000-0002-0068-716X"&gt;http://orcid.org/0000-0002-0068-716X&lt;/a&gt;. Using URLs to identify things such as
people is one of the key principles of
&lt;a href="https://en.wikipedia.org/wiki/Linked_data"&gt;Linked Data&lt;/a&gt;.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-17.png' alt='With lots of useful info about || relationships between the files'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-18.png' alt='Like this one is || a translation of ||  this other one'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-19.png' alt='test-19'&gt;
&lt;p&gt;Here's an example of a relationship between two of the files - one is a translation of
another.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-20.png' alt='And it&amp;#x27;s not just nice tables either'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-21.png' alt='&amp;lt;div ||   resource=&amp;quot;./data/.../WorkshopBookletParticipants.docx&amp;quot; ||   property=&amp;quot;http://schema.org/translationOf&amp;quot;&amp;gt; ||   ... || &amp;lt;/div&amp;gt;'&gt;
&lt;p&gt;The HTML contains RDFa embedded metadata.
&lt;a href="https://en.wikipedia.org/wiki/RDFa"&gt;RDFa&lt;/a&gt; is a standard way of embedding sematics in a
web page.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-22.png' alt='That&amp;#x27;s standard semantic web metadata ||        as used by search engines'&gt;
&lt;p&gt;RDFa, using the &lt;a href="http://schema.org"&gt;schema.org&lt;/a&gt; metadata vocabulary, is widely used by
search engines.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-23.png' alt='test-23'&gt;
&lt;p&gt;Movie times, opening times, recipes - these are all some of the things that search
engines understand.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-24.png' alt='But that&amp;#x27;s not all.'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-25.png' alt='There&amp;#x27;s programmer-friendly JSON || metadata: easy to look up Contact'&gt;
&lt;p&gt;This package also has JSON metadata.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-26.png' alt='test-26'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-27.png' alt='&amp;quot;@graph&amp;quot;: [ ||   { ||     &amp;quot;@id&amp;quot;: &amp;quot;data&amp;quot;, ||     &amp;quot;@type&amp;quot;: &amp;quot;Dataset&amp;quot;, ||     &amp;quot;Contact&amp;quot;: { ||       &amp;quot;@id&amp;quot;: &amp;quot;http://orcid.org/0000-0002-0068-716X&amp;quot;, ||       &amp;quot;@type&amp;quot;: &amp;quot;Person&amp;quot;, ||       &amp;quot;Email&amp;quot;: &amp;quot;cn@cameronneylon.net&amp;quot;, ||       &amp;quot;ID&amp;quot;: &amp;quot;http://orcid.org/0000-0002-0068-716X&amp;quot;, ||       &amp;quot;Name&amp;quot;: &amp;quot;Cameron Neylon&amp;quot; ||     },'&gt;
&lt;p&gt;The JSON is easily usable by programmers: getting the contact for this dataset, for
example, is a simple operation.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-28.png' alt='And use the context to expand that to a ||         full unambiguous URI'&gt;
&lt;p&gt;But if needed, the simple &amp;quot;Contact&amp;quot; can be turned into a URI, as per Linked Data
principles.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-29.png' alt='&amp;quot;@context&amp;quot;: { ||  ... ||   &amp;quot;Description&amp;quot;: &amp;quot;schema:description&amp;quot;, ||   &amp;quot;License&amp;quot;: &amp;quot;schema:license&amp;quot;, ||   &amp;quot;Title&amp;quot;: &amp;quot;schema:name&amp;quot;, ||   &amp;quot;Name&amp;quot;: &amp;quot;schema:name&amp;quot;, ||   &amp;quot;Creator&amp;quot;: &amp;quot;schema:creator&amp;quot;, ||   ... ||   &amp;quot;TranslationOf&amp;quot;: &amp;quot;schema:translationOf&amp;quot;, ||   &amp;quot;Funder&amp;quot;: &amp;quot;schema:Funder&amp;quot;, ||   &amp;quot;Person&amp;quot;: &amp;quot;schema:Person&amp;quot;, ||   &amp;quot;Contact&amp;quot;: &amp;quot;schema:accountablePerson&amp;quot;, ||    ... ||    &amp;quot;schema&amp;quot;: &amp;quot;http://schema.org/&amp;quot;,'&gt;
&lt;p&gt;You can look up Contact in the DataCrate JSON-LD context and see that it maps to
schema:accountablePerson&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-30.png' alt='Contact -&amp;gt; schema:accountablePerson || schema:accountablePerson -&amp;gt; || http://schema.org/accountablePerson'&gt;
&lt;p&gt;Then you can map schema:accountablePerson to http://schema.org/accountablePerson.&lt;/p&gt;
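&lt;p&gt;The two-step lookup described above can be sketched with a trimmed-down copy of the context; this hand-rolled helper stands in for a full JSON-LD processor:&lt;/p&gt;

```python
# A minimal sketch of JSON-LD context expansion, using a trimmed-down
# copy of the DataCrate context rather than a real JSON-LD library.
context = {
    "Contact": "schema:accountablePerson",
    "Name": "schema:name",
    "schema": "http://schema.org/",
}

def expand(term):
    """Expand a friendly key to a full URI via the context."""
    compact = context.get(term, term)          # Contact -> schema:accountablePerson
    prefix, _, local = compact.partition(":")  # split the compact URI (CURIE)
    base = context.get(prefix)                 # schema -> http://schema.org/
    return base + local if base else compact

uri = expand("Contact")  # http://schema.org/accountablePerson
```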
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-31.png' alt='And machine-readable BagIt checksums ||           to check integrity'&gt;
&lt;p&gt;There are also checksums for all the data files.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-32.png' alt='test-32'&gt;
&lt;p&gt;There's a Bagit manifest file.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-33.png' alt='test-33'&gt;
&lt;p&gt;Which lists all the files and their checksums, so the validity of the bag can be checked.&lt;/p&gt;
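&lt;p&gt;Checking a bag can be sketched in a few lines of Python: recompute each payload file's checksum and compare it with the value recorded in the manifest. The helper below is illustrative, not part of any BagIt library:&lt;/p&gt;

```python
import hashlib
import os
import tempfile

def verify_manifest(bag_dir, manifest_name="manifest-md5.txt"):
    """Recompute each payload file's MD5 and compare it with the
    checksum recorded in the BagIt manifest (one 'digest path' per line)."""
    ok = True
    with open(os.path.join(bag_dir, manifest_name)) as manifest:
        for line in manifest:
            expected, path = line.strip().split(None, 1)
            with open(os.path.join(bag_dir, path), "rb") as payload:
                actual = hashlib.md5(payload.read()).hexdigest()
            ok = ok and (actual == expected)
    return ok

# Build a tiny one-file bag to demonstrate.
bag = tempfile.mkdtemp()
os.makedirs(os.path.join(bag, "data"))
with open(os.path.join(bag, "data", "readme.txt"), "wb") as f:
    f.write(b"hello")
digest = hashlib.md5(b"hello").hexdigest()
with open(os.path.join(bag, "manifest-md5.txt"), "w") as f:
    f.write(digest + "  data/readme.txt\n")
```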
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-34.png' alt='It&amp;#x27;s not so much a package as a'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-35.png' alt='test-35'&gt;
&lt;p&gt;This package is like a gift from Cameron to his collaborators, to other researchers and to his future self.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-36.png' alt='How did you do it?'&gt;
&lt;p&gt;.. to do this work ...&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-37.png' alt='We used an experimental tool called ||            Calcyte'&gt;
&lt;p&gt;We used an experimental tool called Calcyte&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-38.png' alt='I ran Calcyte on Cameron&amp;#x27;s Google Drive ||     share to create CATALOG.xlsx files'&gt;
&lt;p&gt;... I ran Calcyte on Cameron's Google Drive share to create CATALOG.xlsx files ...&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-39.png' alt='test-39'&gt;
&lt;p&gt;&lt;a href="https://codeine.research.uts.edu.au/eresearch/calcyte"&gt;Calcyte&lt;/a&gt; is experimental early-
stage open source software written by my group (mainly me) at UTS.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-40.png' alt='test-40'&gt;
&lt;p&gt;Calcyte created spreadsheets which functioned as metadata forms that Cameron could
fill out.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-41.png' alt='test-41'&gt;
&lt;p&gt;The spreadsheets are multi-sheet workbooks, giving us scope to describe not only data
entities like files, but metadata entities such as people, licenses and organisations.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-42.png' alt='Cameron filled out the metadata'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-43.png' alt='I ran Calcyte to create the human and ||       machine readable metadata'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-44.png' alt='Rinse, repeat || (took a few goes)'&gt;
&lt;p&gt;We spent a couple of months working on this intermittently; it will be quicker next time,
but this level of data description will always involve a fair bit of care and work, at
least a few hours for a project of this scale. It's also important to proofread the result,
just as with publishing articles.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-45.png' alt='So what&amp;#x27;s special about this packaging ||              approach?'&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-46.png' alt='Human AND machine readable web- ||                 native ||             linked-data ||              metadata, ||    not just string-values in XML'&gt;
&lt;p&gt;The advantages of this approach are that the package has: Human AND machine
readable web-native linked-data metadata,
not just string-values in XML&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-47.png' alt='test-47'&gt;
&lt;p&gt;This slide is a reminder of what the CATALOG.html file looks like, complete with its
DataCite citation, which, when people start citing this, will add to Cameron's
academic capital.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-48.png' alt='This work is based on previous efforts || l Cr8it - now being looked after by Newcastle.edu.au (via Western Sydney and ||    Intersect) https://github.com/digitalbridge/crateit/tree/develop || l HIEv https://github.com/IntersectAustralia/dc21 || l Mike Lake&amp;#x27;s CAVE repository. https://suss.caves.org.au/cave/ || Both of these are covered in our 2013 presentation at eResearch Australasia || It builds on other standards: || BagIt: https://tools.ietf.org/html/draft-kunze-bagit-14 || Schema.org http://schema.org'&gt;
&lt;p&gt;This work is based on previous efforts&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Cr8it - now being looked after by Newcastle.edu.au (via Western Sydney and
Intersect) &lt;a href="https://github.com/digitalbridge/crateit/tree/develop"&gt;https://github.com/digitalbridge/crateit/tree/develop&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;HIEv &lt;a href="https://github.com/IntersectAustralia/dc21"&gt;https://github.com/IntersectAustralia/dc21&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mike Lake's CAVE repository. &lt;a href="https://suss.caves.org.au/cave/"&gt;https://suss.caves.org.au/cave/&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cr8it and HIEv are covered in our 2013 presentation at eResearch Australasia&lt;/p&gt;
&lt;p&gt;It builds on other standards:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;BagIt: &lt;a href="https://tools.ietf.org/html/draft-kunze-bagit-14"&gt;https://tools.ietf.org/html/draft-kunze-bagit-14&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Schema.org &lt;a href="http://schema.org"&gt;http://schema.org&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-49.png' alt='test-49'&gt;
&lt;p&gt;The format used in this demo is described in a
&lt;a href="https://github.com/UTS-eResearch/datacrate/tree/master/spec/0.1"&gt;draft specification&lt;/a&gt;.&lt;/p&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-50.png' alt='TODO || (assuming people see the value in DateCrate) || 1. Use at UTS for our data repository, and for export from various services || 2. Lobby to get support integrated into Zenodo, Figshare et al || 3. Improve capture/packaging tools (Cra8it, Cloudstor Collections &amp;lt;your-system- ||    here&amp;gt; || 4. Work with others on aligning this work with other standards, [here&amp;#x27;s a list ||    someone else put together https://docs.google.com/document/d/155lA2BcixTl- ||    zwJHGfLkxsmg7WmQbBK00QWyP8QggkE || 5. Work with RDA on their repository interchange format. || 6. https://www.rd-alliance.org/groups/research-data-repository-interoperability- ||    wg.html'&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use at UTS for our data repository, and for export from various services&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Lobby to get support integrated into Zenodo, Figshare et al&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Improve capture/packaging tools (Cr8it, Cloudstor Collections, &amp;lt;your-system-here&amp;gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Work with others on aligning this work with other standards; &lt;a href="https://docs.google.com/document/d/155lA2BcixTl-zwJHGfLkxsmg7WmQbBK00QWyP8QggkE/edit"&gt;here's a list someone else put together&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Work with RDA on their repository interchange format.
&lt;a href="https://www.rd-alliance.org/groups/research-data-repository-interoperability-wg.html"&gt;https://www.rd-alliance.org/groups/research-data-repository-interoperability-wg.html&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br/&gt;
&lt;hr/&gt;
&lt;img src='/blog/datacrate/test-51.png' alt='&amp;quot;Make data crate again&amp;quot; ||    Liz Stokes 2017'&gt;
&lt;p&gt;I'll leave it with this slogan from our UTS data librarian and friend of eResearch, Liz
Stokes.&lt;/p&gt;
&lt;p&gt;Thanks to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Cameron Neylon for being customer zero&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Liz Stokes for working on metadata crosswalking/mapping&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mike Lake for coding and ideas&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Conal Tuohy and Duncan Loxton for commenting on the draft spec&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Amir Aryani for discussions about metadata&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And the mainly Sydney-based metadata group who met in the leadup to this work
Piyachat Ratana, Sharyn Wise, Michael Lynch, Craig Hamilton, Vicki Picasso, Gerry
Devine, Katrin Trewin, Ingrid Mason, Peter Bugeia.&lt;/p&gt;
</content><category term="Repositories"></category></entry><entry><title>2017 eResearch Strategy update</title><link href="/2018/01/04/2017-strategy-update.htm" rel="alternate"></link><published>2018-01-04T00:00:00+11:00</published><updated>2018-01-04T00:00:00+11:00</updated><author><name>UTS eResearch Staff</name></author><id>tag:None,2018-01-04:/2018/01/04/2017-strategy-update.htm</id><content type="html">&lt;p&gt;We have just completed our &lt;a href="/eresearch/strategy-2017/"&gt;first annual review of progress&lt;/a&gt; on the 2016-2020 eResearch &lt;a href="/eresearch/strategy/"&gt;strategy&lt;/a&gt;.&lt;/p&gt;
</content><category term="blog"></category></entry><entry><title>Upcoming Training Courses Have been Updated</title><link href="/2017/09/15/training_update_2017-09-15.htm" rel="alternate"></link><published>2017-09-15T00:00:00+10:00</published><updated>2017-09-15T00:00:00+10:00</updated><author><name>Weisi Chen</name></author><id>tag:None,2017-09-15:/2017/09/15/training_update_2017-09-15.htm</id><summary type="html">&lt;h2&gt;Upcoming Training Courses Have been Updated&lt;/h2&gt;
&lt;p&gt;UTS eResearch and Intersect offer a wide range of specialised courses for researchers, from beginner through to advanced levels in High-Performance Computing (HPC), Programming with R/Python/Matlab, Excel, data management, data cleaning and visualisation, databases and SQL, and more. Delivered by Intersect's team …&lt;/p&gt;</summary><content type="html">&lt;h2&gt;Upcoming Training Courses Have been Updated&lt;/h2&gt;
&lt;p&gt;UTS eResearch and Intersect offer a wide range of specialised courses for researchers, from beginner through to advanced levels in High-Performance Computing (HPC), Programming with R/Python/Matlab, Excel, data management, data cleaning and visualisation, databases and SQL, and more. Delivered by Intersect's team of experts, training courses provide practical and research-relevant hands-on exercises.&lt;/p&gt;
&lt;p&gt;Upcoming training courses are updated regularly; the latest update was made on 15 September 2017. Please keep an eye on our &lt;a href="https://eresearch.uts.edu.au/training/"&gt;training page&lt;/a&gt;.&lt;/p&gt;
</content><category term="Event"></category></entry><entry><title>Announcing Ozmeka - Linked Data enhancements for Omeka</title><link href="/2016/03/01/ozmeka.htm" rel="alternate"></link><published>2016-03-01T00:00:00+11:00</published><updated>2016-03-01T00:00:00+11:00</updated><author><name>Peter Sefton</name></author><id>tag:None,2016-03-01:/2016/03/01/ozmeka.htm</id><summary type="html">&lt;p&gt;In collaboration with Western Sydney University (formerly UWS), the
eResearch team here at UTS has been working on some enhancements to
the &lt;a href="http://omeka.org"&gt;Omeka&lt;/a&gt; repository tool. We presented a &lt;a href="https://eresearch.uts.edu.au/2015/06/19/ozmeka-or2015.htm"&gt;general paper&lt;/a&gt; about this
at Open Repositories 2015, but in this (very late!) post we'd like to
talk a bit about the …&lt;/p&gt;</summary><content type="html">&lt;p&gt;In collaboration with Western Sydney University (formerly UWS), the
eResearch team here at UTS has been working on some enhancements to
the &lt;a href="http://omeka.org"&gt;Omeka&lt;/a&gt; repository tool. We presented a &lt;a href="https://eresearch.uts.edu.au/2015/06/19/ozmeka-or2015.htm"&gt;general paper&lt;/a&gt; about this
at Open Repositories 2015, but in this (very late!) post we'd like to
talk a bit about the software.&lt;/p&gt;
&lt;p&gt;There are a couple of changes we made that we think might be good
additions to the Omeka v2 core. I called in to visit the Omeka team at
George Mason University in June 2015 to demo this work, but we have yet to reach
out to the broader Omeka community.&lt;/p&gt;
&lt;p&gt;Please let us know what you think. We'll try to have that discussion
over in the Omeka forums or on Twitter, rather than here, though.&lt;/p&gt;
&lt;h1&gt;What is Ozmeka?&lt;/h1&gt;
&lt;p&gt;Ozmeka is a set of plugins, or forks of plugins, to make Omeka into
more of a linked data platform. That is, to enable Omeka repositories
to embrace &lt;a href="http://www.w3.org/DesignIssues/LinkedData.html"&gt;linked data principles&lt;/a&gt;, the first two of which are:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;Use URIs as names for things.&lt;/li&gt;
&lt;li&gt;Use HTTP URIs so that people can look up those names.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;p&gt;The new version of Omeka, known as Omeka S, promises to fix a lot
of the issues we were targeting with our work on Ozmeka, but it is
not ready for production use yet.&lt;/p&gt;
&lt;h1&gt;Where can I get it?&lt;/h1&gt;
&lt;p&gt;Everything we're talking about here is available on Github, under the
&lt;a href="https://github.com/ozmeka/ozmeka"&gt;Ozmeka&lt;/a&gt; project. There is a Github &lt;a href="https://github.com/ozmeka/"&gt;Ozmeka Repository&lt;/a&gt; where we did some
lightweight project management, with milestones and issues, but the
plugins are all in individual repositories. We'll talk about them below.&lt;/p&gt;
&lt;h1&gt;What specifically did you produce?&lt;/h1&gt;
&lt;p&gt;We made changes that we hope improve a bunch of Omeka plugins, and
produced some scripts for pushing data into Omeka.&lt;/p&gt;
&lt;h2&gt;Item Relations&lt;/h2&gt;
&lt;p&gt;The Omeka Item Relations plugin already existed, and it was one of the main
reasons we chose Omeka, as it allowed not only repository items to
have URIs but for items to be related to each other using terms from
standard vocabularies. This meant that instead of having metadata like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;This item &amp;lt;dc:Creator&amp;gt; &amp;quot;Some name&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can have:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;This item &amp;lt;dc:Creator&amp;gt; #21
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Where #21 is a reference to another item in Omeka. This is a huge
step-up in metadata quality over using strings as values.&lt;/p&gt;
&lt;p&gt;What did we do?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Added a lookup to item relations, so you can find the thing you're
trying to relate to. Previously you had to know the ID. Yes, the
user interface could be improved, but it works efficiently.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Added Item Relations to the API, so that external code can relate
items. This is used in some of the &lt;a href="#utilities"&gt;utilities&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Added English glosses to relations so they make sense when
displayed on the page. For example, the creator relation from the
Dublin Core metadata standard is ambiguous.
Does &lt;code&gt;This item Creator Somebody&lt;/code&gt; mean that somebody created
the item, or that somebody was created by the item? We added the ability
for relations to be displayed like: &lt;code&gt;This item was created by Somebody&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Added the ability to create new Item Relations vocabularies (if only
via the API).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These all seem like good candidates to roll into the ItemRelations
plugin.&lt;/p&gt;
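&lt;p&gt;For illustration only, here is a rough sketch of what relating two items over the API could look like from external code. The endpoint path, payload field names and API-key handling below are assumptions modelled on Omeka's REST conventions, not the documented Ozmeka interface; check the plugin source for the real names:&lt;/p&gt;

```python
import json
from urllib import request

# Hypothetical values: both the base URL and the key are placeholders.
OMEKA_URL = "http://localhost/omeka"
API_KEY = "my-secret-key"  # assumption: Omeka-style ?key= authentication

def relate_items(subject_id, relation_property_id, object_id):
    """Build a POST request relating one Omeka item to another.

    The field names here are guesses at what an item-relations
    endpoint would accept, for illustration only.
    """
    payload = {
        "subject_item_id": subject_id,        # e.g. item #20
        "property_id": relation_property_id,  # e.g. a dc:Creator relation
        "object_item_id": object_id,          # e.g. item #21
    }
    body = json.dumps(payload).encode("utf-8")
    # Caller would pass this Request to request.urlopen() to send it.
    return request.Request(
        OMEKA_URL + "/api/item_relations?key=" + API_KEY,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = relate_items(20, 1, 21)
print(req.get_full_url())
```

The point is simply that once relations are exposed over the API, an import script can create them in the same request cycle as the items themselves.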
&lt;h2&gt;Added URIs to the core data model / worked on lookups&lt;/h2&gt;
&lt;p&gt;One of the limitations of Omeka v2.x is that it only accepts simple
text values for metadata. Omeka S fixes this, and metadata values can
be text, a URI or a reference to another item. After much soul
searching, Thom added a new database field to allow any metadata item
to have a URI, and linked this into some new work on external lookups
(presently &lt;a href="http://scot.curriculum.edu.au/"&gt;SCOT&lt;/a&gt; (the (Australian) Schools Online Thesaurus),
&lt;a href="http://www.geonames.org/"&gt;Geonames&lt;/a&gt; and the preexisting US Library of Congress lookup),
so that we can let people select metadata values from external
services, or input a URI as a value. Given that this is the approach
taken in Omeka S, this looks like an obvious thing to retrofit to
Omeka v2.x.&lt;/p&gt;
&lt;h2&gt;Sundry&lt;/h2&gt;
&lt;p&gt;We also have a simple modal dialogue to ensure visitors accept Terms
and Conditions, an extra option (filter by Subject) in Search, and a
lightweight plugin that renders attached HTML files inline in an item.&lt;br /&gt;
The Seasons theme has also been slightly modified to present linked
images on a page with their metadata, rather than as raw files.&lt;/p&gt;
&lt;h2&gt;Auto-ozmeka: making it easy to deploy a new server&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/ozmeka/auto-ozmeka/"&gt;Auto-ozmeka&lt;/a&gt; is a
collection of scripts that uses &lt;a href="http://git-scm.com"&gt;Git&lt;/a&gt;,  &lt;a href="http://vagrantup.com/"&gt;Vagrant&lt;/a&gt;,
&lt;a href="https://docs.ansible.com/ansible/intro_installation.html"&gt;Ansible&lt;/a&gt; and &lt;a href="http://virtualbox.org"&gt;Virtualbox&lt;/a&gt;
to create a virtual machine.  The VM will have a fresh installation
of Omeka and the suite of Ozmeka plugins ready to use.  &lt;a href="https://lucene.apache.org/solr/"&gt;Solr&lt;/a&gt; search is
preconfigured.&lt;/p&gt;
&lt;p&gt;Enable the Ozmeka plugins you wish to use in the web GUI, add any
others you may need, and you're ready to start developing your
linked-data website.&lt;/p&gt;
&lt;p&gt;In the near future we expect to be extending this script to
provision production environments as well.&lt;/p&gt;
&lt;h2&gt;Utilities&lt;/h2&gt;
&lt;p&gt;In addition to the plugins we developed a &lt;a href="https://github.com/ptsefton/omeka-python-utils"&gt;few tools&lt;/a&gt; for using Omeka
over the API. Work started on this at Western Sydney, and continued at
UTS. The main tool, &lt;a href="https://github.com/ptsefton/omeka-python-utils/blob/master/xlsx2omeka.md"&gt;xlsx2omeka.py&lt;/a&gt;, takes a spreadsheet of data and imports it into an
Omeka repository, including the ability to create collections and
upload multiple different types of item and multiple attachments per item.&lt;/p&gt;
&lt;p&gt;Our spreadsheet uploader works, but it's clumsy, and I (Peter) have started
working on a &lt;a href="https://github.com/ptsefton/omeka-python-utils/blob/master/csv2omeka.py"&gt;new approach&lt;/a&gt; that uses CSV files rather than
spreadsheets, and does not require as much fiddling around, but this
one is not yet very mature and is undocumented. The idea with the new approach is to allow
the &lt;em&gt;same&lt;/em&gt; data to be uploaded to multiple types of repository; we're also
working on an
&lt;a href="https://github.com/ptsefton/spreadsheet-to-fedora-commons-4"&gt;uploader for Fedora 4&lt;/a&gt;,
which is very much a work in progress.&lt;/p&gt;
</content><category term="Repositories"></category></entry><entry><title>Hacky Hour, HPC focus with special guest Dr Joachim Mai</title><link href="/2015/09/01/hacky_hour_2015.htm" rel="alternate"></link><published>2015-09-01T00:00:00+10:00</published><updated>2015-09-01T00:00:00+10:00</updated><author><name>Mike Lake</name></author><id>tag:None,2015-09-01:/2015/09/01/hacky_hour_2015.htm</id><summary type="html">&lt;p&gt;Attention HPC users! Are you ready to play in the big league?&lt;/p&gt;
&lt;p&gt;We’ve noticed some sophisticated research compute happening on the UTS HPCCs, so this week
Hacky Hour has a special guest for you.&lt;/p&gt;
&lt;p&gt;Meet Raijin, the National Supercomputing facility (and star of ABC TV’s thriller “The Code …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Attention HPC users! Are you ready to play in the big league?&lt;/p&gt;
&lt;p&gt;We’ve noticed some sophisticated research compute happening on the UTS HPCCs, so this week
Hacky Hour has a special guest for you.&lt;/p&gt;
&lt;p&gt;Meet Raijin, the National Supercomputing facility (and star of ABC TV’s thriller “The Code”)!&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/hacky_hour/raijin.jpg" alt="Picture of Raijin at NCI" /&gt;&lt;/p&gt;
&lt;p&gt;Ok, so Raijin isn’t coming to us, but our guest,
&lt;a href="http://www.intersect.org.au/content/intersect-team#mai_joachim"&gt;Dr Joachim Mai&lt;/a&gt;,
can help bring you to Raijin.&lt;/p&gt;
&lt;p&gt;If you are getting frustrated with capacity issues or queues it might be time
to migrate to the big time.&lt;/p&gt;
&lt;p&gt;Joachim will answer questions on HPC, parallel programming and provide advice to help you put
together a &lt;a href="http://www.intersect.org.au/time/merit"&gt;merit application&lt;/a&gt; for NCI compute resources.&lt;/p&gt;
&lt;p&gt;Not an HPC user? Not a problem ... come anyway!&lt;/p&gt;
&lt;p&gt;The usual crew will be there. Hacky Hour is a weekly meetup where researchers
can congregate to work on their research problems related to code, data, or
digital tools in a social environment.&lt;/p&gt;
&lt;p&gt;Bring along your challenges, digital tools or code to share. Have your
technical research problems solved or find someone who can help you.&lt;/p&gt;
&lt;p&gt;When &amp;amp; Where? Thursdays, 3pm-4pm at &lt;a href="http://pennylane.com.au"&gt;Penny Lane&lt;/a&gt; (the Cafe/bar in building 11).&lt;/p&gt;
&lt;p&gt;Look for the group of people with laptops at the large table and join us for a
drink.&lt;/p&gt;
&lt;p&gt;HACKY HOUR : Technology support for researchers, by researchers (&amp;amp; eResearchers)&lt;/p&gt;
</content><category term="Hacky Hour"></category></entry><entry><title>Hacky Hour 1.0</title><link href="/2015/08/20/hacky_hour_201508.htm" rel="alternate"></link><published>2015-08-20T00:00:00+10:00</published><updated>2015-08-20T00:00:00+10:00</updated><author><name>Jared Berghold</name></author><id>tag:None,2015-08-20:/2015/08/20/hacky_hour_201508.htm</id><summary type="html">&lt;p&gt;&lt;img src="/blog/hacky_hour/HackyHourLogoWithText_1200x240.png" alt="" /&gt;&lt;/p&gt;
&lt;h1&gt;Hacky Hour 1.0&lt;/h1&gt;
&lt;p&gt;Thank you to everyone who came to our first ever Hacky Hour at UTS. It was great to see so many people come along to get help with their problems, lend a hand to others or just catch up with other researchers or people on the …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;img src="/blog/hacky_hour/HackyHourLogoWithText_1200x240.png" alt="" /&gt;&lt;/p&gt;
&lt;h1&gt;Hacky Hour 1.0&lt;/h1&gt;
&lt;p&gt;Thank you to everyone who came to our first ever Hacky Hour at UTS. It was great to see so many people come along to get help with their problems, lend a hand to others or just catch up with other researchers or people on the eResearch team.&lt;/p&gt;
&lt;p&gt;Here are just a few of the things people talked about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Peter Sefton talked to people from the &lt;a href="http://www.uts.edu.au/about/faculty-science/microbial-imaging-facility/about-us"&gt;Microbial Imaging Facility&lt;/a&gt; about managing microscope image data using an application such as &lt;a href="https://www.openmicroscopy.org/site"&gt;Omero&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Michael Lake talked to a researcher from the &lt;a href="http://www.uts.edu.au/research-and-teaching/our-research/climate-change-cluster/research-programs/remote-sensing"&gt;C3 Remote Sensing&lt;/a&gt; group about code execution performance (R vs Python), using profiling tools to optimise code, writing test suites and &lt;a href="http://software-carpentry.org/v4/vc/"&gt;the benefits of using a version control system&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Sharyn Wise talked to A/Prof Cynthia Whitchurch and Dr Lynne Turnbull about &lt;a href="http://www.gelifesciences.com/webapp/wcs/stores/servlet/ProductDisplay?categoryId=10982&amp;amp;catalogId=10101&amp;amp;productId=17614&amp;amp;storeId=11752&amp;amp;langId=-1"&gt;IN Cell Miner&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Chris Bell, our current Server Administrator, caught up with researchers to go over outstanding ServiceConnect issues.&lt;/li&gt;
&lt;li&gt;Michael Lynch helped a researcher from &lt;a href="http://www.uts.edu.au/about/faculty-engineering-and-information-technology"&gt;FEIT&lt;/a&gt; who's looking at measuring social media engagement. He wanted some help finding datasets and understanding the IP issues around collecting posts.&lt;/li&gt;
&lt;li&gt;Darren Lee and Jared Berghold helped researcher Mingming Cheng with his data wrangling problem (see below). In the process, they introduced him to &lt;a href="https://www.python.org/"&gt;Python&lt;/a&gt;, the popular Python distribution called &lt;a href="https://store.continuum.io/cshop/anaconda/"&gt;Anaconda&lt;/a&gt; and the excellent interactive programming tool called &lt;a href="https://jupyter.org/"&gt;Jupyter&lt;/a&gt; (formerly known as &lt;a href="http://ipython.org/"&gt;IPython&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Problem Solved!&lt;/h1&gt;
&lt;p&gt;Mingming Cheng, a PhD researcher from the UTS Business School, came along with a problem he encountered while assisting his supervisor, Dr Deborah Edwards, on her project that examines the spatial behaviour of tourists in NSW.&lt;/p&gt;
&lt;p&gt;Mingming and Deborah are analysing public conversations that have been extracted from TripAdvisor by the &lt;a href="http://www.uts.edu.au/research-and-teaching/our-research/advanced-analytics-institute"&gt;Advanced Analytics Institute&lt;/a&gt; at UTS and generating visualisations showing the interactions between forum participants.&lt;/p&gt;
&lt;p&gt;Mingming needed the spreadsheet in the appropriate format for importing into the visualisation tool, &lt;a href="http://gephi.github.io/"&gt;Gephi&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The data was in this format:&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/hacky_hour_2/ExcelDataBefore-ds.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;But it needed to be in this format :&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/hacky_hour_2/ExcelDataAfter-ds.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;As you can see, the usernames in the &lt;em&gt;author&lt;/em&gt; column had to be transposed onto a single row for each conversation thread. Conversation threads were identified using a unique &lt;em&gt;index&lt;/em&gt; number. Mingming had over 65,000 rows in his spreadsheet, so doing this by hand was not a viable option.&lt;/p&gt;
&lt;p&gt;Darren Lee from the &lt;a href="http://www.uts.edu.au/future-students/design-architecture-and-building/facilities/research-facilities/data-arena-prototype"&gt;UTS Data Arena&lt;/a&gt; came to the rescue, hacking up a &lt;a href="https://www.python.org/"&gt;Python&lt;/a&gt; script in the time it takes to drink a beer (or maybe two).&lt;/p&gt;
&lt;p&gt;Here's what the script looked like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import csv

with open('/Users/jared/Desktop/trial_mingming.csv') as csvfile:
    reader = csv.reader(csvfile)
    prevThreadNum = None

    with open('/Users/jared/Desktop/trial_mingming_out.csv', 'w') as outfile:
        rowBuffer = []

        for row in reader:
            threadNum = row[0]  # column 0 holds the conversation thread index
            if prevThreadNum == threadNum:
                # Same thread: collect another author name (column 5)
                rowBuffer.append(row[5])
            else:
                # New thread: write out the previous one first
                if len(rowBuffer):
                    outfile.write(rowBuffer[0] + &amp;quot;,&amp;quot; + &amp;quot;;&amp;quot;.join(rowBuffer[1:]) + &amp;quot;\n&amp;quot;)
                rowBuffer = [row[0], row[5]]
                prevThreadNum = threadNum

        # Flush the final thread, which the loop never writes out itself
        if len(rowBuffer):
            outfile.write(rowBuffer[0] + &amp;quot;,&amp;quot; + &amp;quot;;&amp;quot;.join(rowBuffer[1:]) + &amp;quot;\n&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The script takes a CSV (comma separated values) file as its input. The &lt;em&gt;.csv&lt;/em&gt; file was generated by performing a &lt;em&gt;File &amp;gt; Save As...&lt;/em&gt; in Excel and choosing &lt;em&gt;.csv&lt;/em&gt; as the file format.&lt;/p&gt;
&lt;p&gt;After reading in the CSV file, the Python script steps through the data, one row at a time. The first thing it does is store the value from the first column (i.e. column '0') in a variable called &lt;em&gt;threadNum&lt;/em&gt;: &lt;code&gt;threadNum = row[0]&lt;/code&gt;.
As you can see in the screenshot above, this first column is the &lt;em&gt;index&lt;/em&gt; or conversation thread number.&lt;/p&gt;
&lt;p&gt;The script then checks the &lt;em&gt;index&lt;/em&gt; number to see if it has encountered that number before: &lt;code&gt;prevThreadNum == threadNum&lt;/code&gt;.
If it has, it continues to build up the list of usernames from the &lt;em&gt;author&lt;/em&gt; column (i.e. the 6th column or column number '5' if you start counting from 0 like computers do).&lt;/p&gt;
&lt;p&gt;If the script hasn't encountered that number before, then the script has come across a new conversation thread. It writes out the &lt;em&gt;index&lt;/em&gt; number and a semi-colon separated list of all the usernames it has collected for that  thread to the output &lt;em&gt;.csv&lt;/em&gt; file: &lt;code&gt;outfile.write(rowBuffer[0] + &amp;quot;,&amp;quot; + &amp;quot;;&amp;quot;.join(rowBuffer[1:]) + &amp;quot;\n&amp;quot;)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;As you can see in the screenshot above, the row in the new spreadsheet now has the &lt;em&gt;index&lt;/em&gt; number in the first column, followed by a list of usernames separated by semi-colons (';') in the second column. It's this semi-colon separated list that is &lt;a href="http://gephi.github.io/users/supported-graph-formats/csv-format/"&gt;understood by Gephi&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We ran the script over Mingming's entire dataset and in less than a minute, he had the &lt;em&gt;.csv&lt;/em&gt; file he needed to load into Gephi for visualising the data. Mingming was very grateful to have a solution to a problem he'd been grappling with for the last two months.&lt;/p&gt;
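&lt;p&gt;As a footnote, the same reshaping can be sketched in a few lines with the standard library's &lt;code&gt;itertools.groupby&lt;/code&gt;, assuming (as above) that rows are already sorted by thread index, with the index in column 0 and the author in column 5. The function name and sample usernames here are our own invention for illustration:&lt;/p&gt;

```python
import csv
import io
from itertools import groupby

def transpose_threads(in_file, out_file):
    """Collapse one-row-per-post CSV data into one row per thread.

    Assumes the thread index is in column 0 and the author name in
    column 5, and that rows for the same thread are adjacent.
    """
    reader = csv.reader(in_file)
    for thread_num, rows in groupby(reader, key=lambda row: row[0]):
        # Join every author in this thread with semicolons, as Gephi expects
        authors = ";".join(row[5] for row in rows)
        out_file.write(thread_num + "," + authors + "\n")

# Tiny in-memory example with made-up usernames
src = io.StringIO(
    "1,a,b,c,d,alice\n"
    "1,a,b,c,d,bob\n"
    "2,a,b,c,d,carol\n"
)
dst = io.StringIO()
transpose_threads(src, dst)
print(dst.getvalue())  # prints "1,alice;bob" then "2,carol"
```

Because &lt;code&gt;groupby&lt;/code&gt; yields the last group like any other, there is no leftover buffer to flush at the end of the loop.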
&lt;p&gt;If you would like to know more about the script above, any of the tools mentioned (&lt;a href="https://www.python.org/"&gt;Python&lt;/a&gt;, &lt;a href="https://jupyter.org/"&gt;Jupyter&lt;/a&gt;, &lt;a href="http://gephi.github.io/"&gt;Gephi&lt;/a&gt;) or any other digital research tool, then come along next week.&lt;/p&gt;
&lt;p&gt;Thanks for reading and we look forward to seeing you at the next Hacky Hour!&lt;/p&gt;
&lt;p&gt;&lt;img src="/blog/hacky_hour_2/HackyHourPhoto-20150813-1_1000x563.jpg" alt="Hacking away: Mingming Cheng, Darren Lee, Jared Berghold and Claire Hardgrove" /&gt;&lt;/p&gt;
</content><category term="Hacky Hour"></category></entry><entry><title>Hacky Hour at UTS, Thursday arvo eResearch meetup / helpdesk starting August 13, 2015</title><link href="/2015/08/03/hacky_hour_2015_announce.htm" rel="alternate"></link><published>2015-08-03T00:00:00+10:00</published><updated>2015-08-03T00:00:00+10:00</updated><author><name>Jared Berghold</name></author><id>tag:None,2015-08-03:/2015/08/03/hacky_hour_2015_announce.htm</id><summary type="html">&lt;p&gt;&lt;img src="/blog/hacky_hour/HackyHourLogo_240x240.png" alt="" /&gt;&lt;/p&gt;
&lt;h1&gt;What is Hacky Hour?&lt;/h1&gt;
&lt;p&gt;Inspired by the &lt;a href="http://melbourne.resbaz.edu.au/hackyhour"&gt;ResBaz&lt;/a&gt; people at The University of Melbourne, the &lt;a href="https://eresearch.uts.edu.au/"&gt;UTS eResearch&lt;/a&gt; team are launching Hacky Hour at UTS. Hacky Hour has been successfully replicated at other universities such as &lt;a href="https://www.fmhs.auckland.ac.nz/en/faculty/about/news-and-events/events/2015/3/26/hacky-hour-.html"&gt;The University of Auckland&lt;/a&gt; and &lt;a href="http://minisciencegirl.github.io/studyGroup/"&gt;The University of British Columbia&lt;/a&gt;, so we’re excited to …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;img src="/blog/hacky_hour/HackyHourLogo_240x240.png" alt="" /&gt;&lt;/p&gt;
&lt;h1&gt;What is Hacky Hour?&lt;/h1&gt;
&lt;p&gt;Inspired by the &lt;a href="http://melbourne.resbaz.edu.au/hackyhour"&gt;ResBaz&lt;/a&gt; people at The University of Melbourne, the &lt;a href="https://eresearch.uts.edu.au/"&gt;UTS eResearch&lt;/a&gt; team are launching Hacky Hour at UTS. Hacky Hour has been successfully replicated at other universities such as &lt;a href="https://www.fmhs.auckland.ac.nz/en/faculty/about/news-and-events/events/2015/3/26/hacky-hour-.html"&gt;The University of Auckland&lt;/a&gt; and &lt;a href="http://minisciencegirl.github.io/studyGroup/"&gt;The University of British Columbia&lt;/a&gt;, so we’re excited to be bringing it to UTS!&lt;/p&gt;
&lt;p&gt;Hacky Hour is a weekly meetup where researchers can congregate to work on their research problems related to code, data, or digital tools in a social environment.&lt;/p&gt;
&lt;p&gt;There will be knowledgeable folk from the eResearch team there who can help you with your questions.&lt;/p&gt;
&lt;h1&gt;Why should I come to Hacky Hour?&lt;/h1&gt;
&lt;p&gt;Here are some reasons you might want to come to Hacky
Hour:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You want someone to show you the basics in &lt;a href="https://www.r-project.org/"&gt;R&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;You want to know the best way to show some spatial data on a map.&lt;/li&gt;
&lt;li&gt;You recently attended an &lt;a href="https://www.intersect.org.au/content/training"&gt;Intersect training course&lt;/a&gt; and want to get some more help using tool X.&lt;/li&gt;
&lt;li&gt;Your honours student is pestering you to use &lt;a href="http://software-carpentry.org/v4/vc/"&gt;version control&lt;/a&gt; and you want some help getting started.&lt;/li&gt;
&lt;li&gt;You need to get a huge dataset from a colleague overseas and want to know the fastest way to do it.&lt;/li&gt;
&lt;li&gt;You need a quick drink while you wait for your scripts to finish running...&lt;/li&gt;
&lt;li&gt;Or you just want to hack away at your data-crunching scripts in some good company.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Even if you don’t have any problems you want solved, come along to help solve someone else’s. Help a colleague overcome their fears of using Git or show off a really useful tool you just discovered.&lt;/p&gt;
&lt;p&gt;Over the coming weeks we’ll also be running some mini-workshops in things like &lt;a href="https://www.raspberrypi.org/"&gt;Raspberry Pi&lt;/a&gt; and &lt;a href="https://jupyter.org/"&gt;Jupyter&lt;/a&gt;. Stay tuned for more details.&lt;/p&gt;
&lt;h1&gt;I’m in. How do I get involved?&lt;/h1&gt;
&lt;p&gt;Just show up. If you have a laptop or tablet, you might want to bring it along.&lt;/p&gt;
&lt;p&gt;Hacky Hour is a weekly get together on Thursdays at 3pm-4pm, at &lt;a href="http://www.pennylane.com.au/"&gt;Penny Lane&lt;/a&gt; (the cafe/bar on the ground floor of building 11). The first Hacky Hour will be on Thursday 13 August. Come along and look for all the people with laptops.&lt;/p&gt;
&lt;p&gt;We can’t wait to see you there!&lt;/p&gt;
</content><category term="Hacky Hour"></category></entry><entry><title>Getting Started with Machine Learning in MATLAB</title><link href="/2015/07/29/getting_started_with_matlab_2015-08-25.htm" rel="alternate"></link><published>2015-07-29T00:00:00+10:00</published><updated>2015-07-29T00:00:00+10:00</updated><author><name>UTS eResearch Staff</name></author><id>tag:None,2015-07-29:/2015/07/29/getting_started_with_matlab_2015-08-25.htm</id><summary type="html">&lt;p&gt;&lt;em&gt;Date:&lt;/em&gt; 25 August 2015&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Time:&lt;/em&gt; 2.30pm – 3.30pm&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Session Format:&lt;/em&gt; Presentation&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Venue:&lt;/em&gt;  CB11.06.408&lt;/p&gt;
&lt;p&gt;Machine learning is being used to solve interesting problems in many fields: autonomous navigation, fraud detection, animal conservation, health care, energy forecasting. But many machine learning algorithms are complicated to implement, so how can …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;em&gt;Date:&lt;/em&gt; 25 August 2015&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Time:&lt;/em&gt; 2.30pm – 3.30pm&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Session Format:&lt;/em&gt; Presentation&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Venue:&lt;/em&gt;  CB11.06.408&lt;/p&gt;
&lt;p&gt;Machine learning is being used to solve interesting problems in many fields: autonomous navigation, fraud detection, animal conservation, health care, energy forecasting. But many machine learning algorithms are complicated to implement, so how can you start using these algorithms in your applications?&lt;/p&gt;
&lt;p&gt;In this session, Matt will show how the MATLAB® environment makes it easy to apply machine learning algorithms to your data. Attend this session to learn how to use functionality in Statistics and Machine Learning Toolbox™ and Neural Network Toolbox™ to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Find natural patterns in your data&lt;/li&gt;
&lt;li&gt;Build predictive models&lt;/li&gt;
&lt;li&gt;Evaluate, interpret, and improve your models&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Speaker Bio:&lt;/p&gt;
&lt;p&gt;Matt Tearle is a training content developer for MathWorks USA. He develops instructional material on MATLAB for technical computing and data analysis. Prior to joining MathWorks, Matt taught applied mathematics and computing at the University of Colorado, Boulder, where he received his PhD in computational fluid dynamics in 2004. Matt has been using MATLAB since 1992, when he was an undergraduate at the University of Auckland. A true-born Cantabrian, he still supports the Crusaders.&lt;/p&gt;
&lt;p&gt;RSVP: 5PM Tuesday August 18, 2015  to Matthew.Gaston@uts.edu.au&lt;/p&gt;
</content><category term="Event"></category></entry></feed>