UTS eResearch Strategy Update 2018

About this document

This is the second annual update on the eResearch program at UTS. It tracks our progress towards the strategic goals in the 2016-2020 eResearch strategy, which was approved in late 2016. The format has been simplified from the one used in 2017.

eResearch Strategy

The eResearch Strategy outlines how UTS will organise, prioritise and invest in eResearch infrastructure services and support to raise researcher capability and productivity. These are the success criteria for the UTS eResearch Strategy:

  • Researchers and their students have access to best in class research infrastructure consistent with a world-leading university of technology.

  • UTS is seen as a leader in research data management, with mature practices and systems for capturing, managing, sharing and reusing research data.

  • IT systems support researchers, not vice versa.

Summary: How are we going against the success criteria?

Our strategic goals are discussed in more detail below, but here’s a summary of how we’re going against the 2020 vision. We have made solid progress since 2016 as measured by the success criteria.

Researchers and their students have access to best in class research infrastructure …

  • We have consolidated storage from more than five separate in-house storage solutions, managed by the eResearch team before 2016, into a single, scalable storage system which is managed by ITD Technical Services and eResearch. This has significantly reduced risk, as there are now many more technical staff managing the system and the same system is used for research and learning storage.

  • The successful and innovative legacy interactive computing platform ARCLab is now being systematically replaced with a new facility that is (a) using the same commodity hardware as our HPC platform, (b) housed in the enterprise data centre, and (c) virtualised so it can be re-partitioned as UTS researcher skills and needs evolve.

UTS is seen as a leader in research data management …

The new version of Stash, our research data management platform, was launched at the start of 2019. UTS Integrity Officer Louise Wheeler presented the UTS vision at an international conference and at the Australian Research Managers (ARMS) meeting this year, and feedback was positive.

The provisioner project delivered three fully supported research workspace services:

  1. Omero: for the Microbial Imaging facility to manage and collaborate over optical microscope images. This will be made available to all UTS researchers over time.

  2. Two GitLab repositories for research software code (one behind the firewall and one for public data or collaborations with non-UTS researchers).

  3. eNotebooks via the LabArchives electronic notebook system.

The UTS eResearch team has led the development of DataCrate, a new specification for packaging, interchanging and publishing research data. This fulfills a requirement which is increasingly important globally. The work has been recognized as valuable by our colleagues at UTS: there is active collaboration with the Microbial Imaging Facility, where Dr Christian Evenhuis is improving the integrity of UTS research with detailed provenance metadata, and with the Faculty of Arts and Social Sciences on Collections as Data, led by Prof Deb Verhoeven.

National and global interest has also been strong, based on feedback from conferences attended (eResearch NZ, eResearch Australasia, Digital Pathways, Open Repositories and the IEEE eScience conference).

IT systems support researchers, not vice versa.

This is harder to measure than the other goals, but as we consolidate infrastructure and put in place core data management systems we continue to reach out to our researcher colleagues.

For example, in 2018 there has been good progress on a new procurement model for computing with the Faculty of Engineering and IT (FEIT). Currently eResearch deals with a constant stream of requests for new workstation computers, usually with GPUs for machine learning, and there is still a significant amount of "shadow IT" being purchased. These computers, managed by researchers, can cost valuable research time, add to power and cooling costs in buildings, and may sit idle for long periods.

To improve this, the faculty is developing a plan to invest up front in server-grade machines to be added to the new iHPC facility for the faculty's exclusive use. As requests come in, allocations from the central facility will be ‘sold’ to researchers, and those funds used for further upgrades. The model is still under development, but there are promising signs that it will help drive higher-impact research on fast computing resources, with a lot less paperwork than at present, and reduce the proliferation of desktop machines.

This model has potential to be replicated across other faculties and facilities, and we'll work with the office of the DVCR to identify opportunities and priorities.

New! UTS 2027 Strategy

The eResearch team were actively involved in the development of the 2027 strategy via the Crowdicity platform: three of the six ideas chosen in the research area had eResearch input.

The new UTS 2027 strategy says:

Our research will be exemplified by excellence, engagement with global partners, and innovative collaborations that transcend disciplinary and professional boundaries.

We will create the next generation approach of integrated learning and research, encouraging the exchange of ideas between learners and teachers. This form of knowledge transfer will not only define our new learning experience, it will increase the impact of our research. It will be fundamental to the differentiation and value of a future university education. Uniquely, we will integrate our strengths in science, technology, engineering and mathematics with our excellence in humanities, social sciences and design to create new ways of tackling the challenges of our time.

The passages in this statement about integrating learning and research, and about integrating STEM with the humanities and social sciences, signal that the current eResearch strategy needs to be updated as a new UTS Research Strategy emerges. In the past the eResearch team has not had the resources to assist with student-facing systems, and usage of our systems and consulting services by HASS researchers has been low. So, in anticipation of the 2027 strategy, we started tackling these gaps.

How did we respond to the emerging integration challenges in 2018?

In anticipation of the 2027 strategy, in the second half of 2018 we undertook some work commissioned by Prof Deb Verhoeven from the Faculty of Arts and Social Sciences (FASS) to explore a new eResearch service that focusses on “Collections as Data”. A full report will be prepared early in 2019, but in essence this service will allow HASS activities to become more sustainable and make data available for cross-disciplinary collaboration. Often HASS (and STEM, for that matter) research produces data collections organised as one-off repositories, exhibitions or websites, which are difficult to re-use and a challenge to sustain. Prof Verhoeven has described this as "Littering the Internet".

The Collections as Data project has produced a preliminary design for a sustainable service in which flexible collection software (we're starting with the repository application Omeka S) can be commissioned for a set period, e.g. for the duration of a PhD or an ARC project, with an archiving and data re-use strategy already in place. This approach was tested in microcosm with a cohort of students on three UTS collections using Omeka S. The results of their work will be archived and made available to future cohorts, and to UTS researchers. On this project, working with students allows us to collapse the 5-year cycle of research down to the semester level, to refine our systems and produce researcher development programmes that support the integration goals of the 2027 strategy.

Deb says:

I believe DataCrate is a genius solution for the humanities and social sciences. As more and more HASS researchers work with “collections as data”, initiatives like DataCrate ensure our work is sustainable and interchangeable. Australia’s national Virtual Laboratory for the humanities housed at UTS – the HuNI (Humanities Networked Infrastructure) – also needs DataCrate in order to engage and extend researchers working with national data collections. By enabling researchers to scale from data on their desktop to data held by peak organisations, DataCrate opens up innovative opportunities for discovery. Prof Deb Verhoeven (via email)

The work on Collections as Data will feed into the eResearch program for 2019, through the “Provisioning and Sustaining Research Applications” IT Capital Management Project. Development work on Omeka S will be prioritised as part of an agile process where we will be asking stakeholders: “what shall we do next to support your strategic goals?”.

We will work with Pro Vice-Chancellor (Education) Peter Scott on scaling-up the integration of eResearch services with education.

More detail on progress

Each theme below is reported under four headings: 2016 State, 2020 Goal, 2018 Progress and 2019 Plans.

Infrastructure

2016 State

Compute & storage established & supported locally.

2020 Goal

Nationally linked compute & storage with integrated support.

2018 Progress

Computing

Basic infrastructure requests for storage and virtual computing are now included in the UTS helpdesk/workflow system, Service Connect.

The first phase of the interactive HPC (iHPC) project is now live. This is a re-configurable virtualized facility which allows us to be flexible about how resources are configured, and prepares us for a time when computing resources may come from a national or commercial cloud provider (which is not possible at the moment).

This project, which was planned to start in 2017, was not funded until 2018, then ran later than planned for a variety of reasons including delays with our partner, Intersect. Our first attempt at linking to the national infrastructure by making the facility part of the NeCTAR cloud did not go smoothly.

Lessons learned:

  1. When building a facility like this, where there are no offerings in the market and it is not clear how to meet emerging requirements, a more agile approach would be safer, starting with a complete proof of concept that does not leave unanswered questions about how the facility will work. Had we done this, we would likely have moderated our ambition for the facility to be part of the national infrastructure, as this integration, via Intersect, turned out to come at the cost of high management overhead and less than ideal usability.

  2. Cloud computing works at scale, where vendors have effectively infinite capacity to offer. In a resource-constrained environment there is a risk that allocated resources go unused, and the high-performance GPU-based computers we need are not available (at any price) from cloud providers, so we need to manage resource allocations closely.
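The second lesson, managing allocations closely so that scarce resources do not sit idle, could be supported by a simple utilisation check. The sketch below is illustrative only: the `Allocation` record, its field names, the example figures and the 25% threshold are all assumptions made for this example, not a description of how iHPC is actually monitored.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical record of one group's share of a shared facility."""
    owner: str
    gpu_hours_allocated: float
    gpu_hours_used: float

    @property
    def utilisation(self) -> float:
        # Treat an empty allocation as 0% used rather than dividing by zero.
        if self.gpu_hours_allocated == 0:
            return 0.0
        return self.gpu_hours_used / self.gpu_hours_allocated


def underused(allocations, threshold=0.25):
    """Return the allocations whose utilisation is below the threshold."""
    return [a for a in allocations if a.utilisation < threshold]


# Made-up figures, for illustration only.
allocations = [
    Allocation("group-a", 1000, 900),
    Allocation("group-b", 1000, 100),
    Allocation("group-c", 0, 0),
]

for a in underused(allocations):
    print(f"{a.owner}: {a.utilisation:.0%} of allocation used")
```

A report like this would only be a starting point for a conversation with the allocation holder; low utilisation can have legitimate causes, such as a project that has not yet reached its compute-heavy phase.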

Storage

In 2017 we consolidated all on-site research storage into one scalable facility (Isilon) which is also used for other UTS storage, including learning data. In 2018 overall UTS growth exceeded expectations, and the technical services team worked hard and responded quickly to bring forward 2019’s storage upgrade. Our vendor, DELL/EMC, was very responsive. The new storage is lower-cost and slower than the current equipment, and will be used to house less frequently accessed data. The data can be tiered automatically.

The public cloud compute trials for bioinformatics workflows in Assoc. Prof. Aaron Darling's group, which started in 2017, are continuing. They are demonstrating that research groups who can adapt their workloads to run on any available computing, at any time and for any amount of time, have a significant advantage in research efficiency. Lessons learned from this work were shared with the eResearch Steering Committee, and will inform researcher development in 2019 and beyond.

2019 Plans

In 2019 we plan to:

  • Continue development of iHPC (budget permitting):

    • Look for opportunities to integrate with national services by approaching vendors again in 2019 about the possibility of joining the NeCTAR cloud.

    • Continue to merge batch and interactive HPC onto a common platform.

    • Prepare for greater integration with national and public cloud services.

    • Maximise computing use by introducing scheduling systems in addition to batch queuing.

    • Explore opportunities to integrate local, national and public cloud computing.

  • Begin to add archiving and preservation services to the Isilon storage:

    • Explore opportunities to link CloudStor and local (Isilon) storage.

    • Begin to put "use-by" dates on all provisioned workspaces, via the use of Research Data Management plans, with Stash reminders to archive data from expiring workspaces.

  • Take part in an AARNet trial of preservation services.

    The project will investigate the preservation actions available in Archivematica and explore the addition of these capabilities to AARNet's cloud storage service (CloudStor) to enable the service to provide a truly sustainable long-term solution for research and cultural heritage data preservation, storage and re-use. AARNet has engaged Artefactual Systems, the maintainers of Archivematica, as consulting partners on this project.

    Your [UTS] work to date with Artefactual Systems and AARNet on developing data packaging tools is of great value to this initiative, notably your work with DataCrate and the Oxford Common File Layout. We are inviting you to participate in the project as an advisor and to collaborate with AARNet on architecture and design outcomes.

    ... An instance of Archivematica will be available by March. Workshops and engagement with participating institutions will take place from March through to June.

  • Work with ITD Technical Services to design 'cold' storage architectures for long-term data archiving.
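The plan above to put "use-by" dates on provisioned workspaces, with reminders to archive data from expiring ones, amounts to a periodic date check. The following is a minimal sketch under stated assumptions: the `Workspace` record, the 30-day notice period and the example workspaces are hypothetical, and nothing here reflects how Stash stores or reports this information.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Workspace:
    """Hypothetical provisioned workspace with a use-by date."""
    name: str
    use_by: date


def expiring(workspaces, today, notice_days=30):
    """Return workspaces that have expired or will expire within the notice period."""
    cutoff = today + timedelta(days=notice_days)
    return [w for w in workspaces if w.use_by <= cutoff]


# Made-up workspaces and a fixed "today" so the example is reproducible.
today = date(2019, 6, 1)
workspaces = [
    Workspace("microscopy-2017", date(2019, 6, 15)),  # expires within 30 days
    Workspace("thesis-code", date(2020, 1, 1)),       # not due yet
    Workspace("legacy-survey", date(2019, 5, 1)),     # already expired
]

for w in expiring(workspaces, today):
    print(f"Reminder: archive data from '{w.name}' (use-by {w.use_by.isoformat()})")
```

Already-expired workspaces are deliberately included, on the assumption that they should keep generating reminders until their data has been archived or disposed of.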

Research Data Management

2016 State

Catalogue live but fragmented RDM support

2020 Goal

End-to-end RDM support including data repository

2018 Progress

UTS-wide support for best-practice Research Data Management is steadily improving. The adoption of the 2017 Research Management Policy and improved awareness has seen an increase in requests for advice on RDM, particularly at the ethics approval stage of the research lifecycle, resulting in 140 new data management plans. In the fourth quarter this increase, together with growth in the use of REDCap for clinical trials, was significant enough to impact work on the provisioner project (which was already using BAU resources without backfill), meaning it was not released in Q4.

REDCap, introduced in 2017 to securely manage clinical trial data as well as general surveys, has been well used; 130 new data collectors were added.

Stash / provisioner 3.0, UTS’s end-to-end research data management application, rolled out in February 2019. It provides:

  • A streamlined replacement for managing RDMPs and Dataset descriptions.

  • Integrated research service catalogue (provisioning) linked to RDMPs.

  • A new, highly scalable approach to managing research data for the long term using emerging standards for research data storage.

As part of the provisioner (Stash 3) project, UTS led an effort to develop a much-needed new specification for research data packaging so that datasets can be published with human- and machine-readable metadata (where the machines include Google's robots, helping get data into Google Dataset Search). This work has generated international interest, contributing significantly to our success criterion of being seen as a leader in data management.
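To make "human and machine readable metadata" concrete, here is a minimal sketch that writes one dataset description in two forms: JSON-LD using the schema.org `Dataset` vocabulary, which dataset search crawlers can read, and a short plain-text summary for people. This is not the DataCrate specification itself; the field selection, the helper function names and all values are made up for illustration.

```python
import json

# Illustrative dataset description using schema.org terms; values are invented.
metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example microscopy image set",
    "description": "Sample images used to illustrate dataset packaging.",
    "creator": {"@type": "Person", "name": "A. Researcher"},
    "datePublished": "2018-11-01",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}


def to_json_ld(record):
    """Serialise the record as JSON-LD text for machine consumption."""
    return json.dumps(record, indent=2)


def to_summary(record):
    """Render the same record as a short human-readable summary."""
    return (
        f"{record['name']}\n"
        f"By {record['creator']['name']}, published {record['datePublished']}\n"
        f"{record['description']}\n"
        f"Licence: {record['license']}"
    )


print(to_json_ld(metadata))
print(to_summary(metadata))
```

Generating both views from a single record is the design point: the human-readable page and the machine-readable description cannot drift apart if they share one source.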

Collections as Data: As mentioned above, eResearch has been working with the Faculty of Arts and Social Sciences (FASS) on establishing platforms for managing heterogeneous data-sets. This engagement has established the groundwork for a new Omeka S data management service to be made available in the provisioner framework.

eNotebook (LabArchives) growth has been slow – while usage has nearly doubled from 221 per month in Dec 2017 to 408 in November 2018, this is much lower than expected when we started the project. The reason for the slow growth is that the full-time equivalent staff member to promote the system was not funded in the initial project, and the 2018 project was cut significantly in scope to meet budget and resource constraints.

2019 Plans

Note: as per the eResearch Principle "storage is not data management" – adding a physical infrastructure tier as we have done with the Isilon storage does not replace ongoing work to catalogue research data via Stash. We need to develop processes to keep important data, discard what can be discarded and dispose of data when required by ethics and funding agreements; without learning to do these things we will have a much higher ongoing bill as more will need to be stored.

Work on Research Data Management will be done under the continuing agile Provisioning and Sustaining Research Applications project. The goal is to move closer to the success criteria “Researchers and their students have access to best in class research infrastructure consistent with a world-leading university of technology” and “UTS is seen as a leader in research data management, with mature practices and systems for capturing, managing, sharing and reusing research data”. Work on the menu for 2019 includes:

  • Launching a data-discovery portal which will use DataCrate data packages showing UTS data sets in context, with rich information ("who, what, where") about the data to enable potential re-users to find and evaluate data sets, and to cite them if used. NOTE: this portal has the potential to be merged with the "Find an expert" project being sponsored by RIO; discussions are under way.

  • With the library and faculties, roll out Stash 3. With the new system in place we will be able to significantly increase evangelism around the project and expand our program of recruitment.

Currently all major new storage requests also result in data management plans. In 2019 we will expand this to other consultations; for example, during Hacky Hour consultations which involve code, we'll assist researchers in provisioning a git repository for their code, via an RDMP if one doesn't exist.

  • Roll out new provisioner enhancements, and develop support plans for new services:

    • Plugins for Storage, eNotebooks and REDCap (clinical trials software). These are all started, in various states of completion.

    • Allow researchers to move data between systems, and publish data from any of the linked systems, coordinated from Stash 3.

    • Add new services proposed during the year after triage by the project team and approval by the project board.
  • Set up infrastructure for Collections as Data using Omeka-S, including linking to the HuNI virtual laboratory.

Research Facilities

2016 State

Data Arena open for business but all facilities need better Research Data Management.

2020 Goal

Data Arena and other facilities integral to world-leading research.

2018 Progress

The OMERO repository for microscopy has been installed in the Microbial Imaging Facility (MIF) and work has started on creating facility-specific workflows and metadata customisations.

eResearch is collaborating with other facilities on data management planning as part of our business-as-usual work.

Data Arena: work on data management with the Data Arena has been hampered by resource constraints – there have been no staff available in the facility to do the groundwork needed to describe existing data, so this remains on the backlog.

There has been some encouraging progress with FEIT on improving access to computing facilities for the faculty as reported above.

2019 Plans

Assuming the Data Arena is better resourced in 2019, we will continue our work to describe and archive all data from the existing projects and publish as much data and code as possible for re-use.

Researcher Capability

2016 State

Unevenly distributed, some training and advice available

2020 Goal

All HDRs and researchers have access to training & support

2018 Progress

This is an area which still needs more investment: eResearch does not have the resources to make major improvements in researcher capability, although we do try to embed mentoring and training in as many activities as possible.

Continuing with best-effort training and mentoring:

  • Intersect-led training courses continue to run: 325 researchers were trained across 18 courses, with overall positive feedback.

  • 78 UTS researchers registered for the Research Bazaar training event at Macquarie University.

  • Hacky Hour, a weekly tech meetup at Penny Lane, is ongoing and numbers continue to increase (although our record-keeping so far has not been great).

  • Six-monthly outreach sessions for HPC and interactive HPC (ARCLab) have continued.

  • Planned work on Visualization training (in addition to Data Arena activities) did not eventuate, due to resource constraints.

2019 Plans

For 2019:

  • Continue involvement in UTS researcher development planning, DVCR-office workshops and RES-Hub planning.

  • Hacky Hour:

    • Improve record keeping and reporting; one of our new junior staff has some marketing experience which we intend to use.

    • Integrate Stash 3 provisioning further into consultations, making sure people have data management plans and teaching them to provision their own services.

    • Insist on the use of GitLab (or other git services) so that eResearch contributions can be tracked and, of course, code and analytical scripts are managed according to best practice.

  • Establish a new drop-in session: a monthly Collections Clinic to further the development of HASS and STEM data collections and collaboration.

Support Model

2016 State

Local (non-integrated) support

2020 Goal

Research driven, & support integrated, across central units

2018 Progress

The eResearch team worked with ITD Client Services to add a service catalog of eResearch services to Service Connect, the UTS workflow management / issue tracking system.

The eResearch and Research Data Management Steering Committee was established in 2017 and met twice in 2018. Peter Sefton, eResearch Support Manager, and Louise Wheeler, Manager, Research Programs and Research Integrity, are working with the office of the Deputy Vice Chancellor, Research to review the structure of eResearch governance in the light of improvements to the governance of all ITD projects in 2018. The committee will be replaced with an eResearch community of practice which will promote eResearch at UTS and advise committees and boards as needed.

A survey on eResearch closed at the end of 2018.

In addition to current activities (see above), work with the new Research Development Committee on a whole-of-UTS approach to researcher development.

2019 Plans

  • Engage with the new Australian Research Data Commons and represent the interests of UTS researchers.

  • Work with Pro Vice-Chancellor (Education) Peter Scott on scaling-up the integration of eResearch services with education.

  • Align eResearch roadmap with the new Research strategy.

  • Interpret and report on the eResearch service survey.