The mdwRepository is the central institutional repository of the mdw - University of Music and Performing Arts Vienna. It is built around the concept of Digital Asset Management, and consists of a content-agnostic internal backend and public touchpoints.
At the mdw, we define a 'digital asset' as anything that is stored in a digital format, is valuable to and legally usable by its owner, and thus helps to advance the owner's goals. Technically, a digital asset is the combination of a content object and/or file with its metadata, which describes, for example, the file and its contents, its creation context, and its rights and permissions.
These contents can be research data, archival documents, audio and video recordings of concerts held at the mdw or images used by the communications department.
Digital assets are managed in a 'Digital Asset Management System' (DAMS), which is a combination of software, hardware and professional services that enable storing, managing and accessing digital assets throughout their lifecycle. This lifecycle typically consists of four phases:
3. publishing and/or distribution, and
4. preservation or archival of digital assets.
Thus, the core of the mdwRepository is the digital asset management system (DAMS). On the software side, we use the open-source edition of nuxeo (https://www.nuxeo.com/products/digital-asset-management/) as the internal backend system; it is operated in-house, with the university's own IT department acting as service provider. We do not perform data-level curation aside from technical checks (e.g. to ensure data accuracy) and the creation of new formats (e.g. renditions for preview).
On the services side, we provide training related to digital asset management, including data creation support, data management planning, curation and legal support, and guidance on metadata and media formats.
The designated community of the backend system consists of mdw-internal researchers and their project partners, teachers and students.
The public touchpoints are used to publish and distribute the digital assets. They include technical services (e.g. an OAI-PMH endpoint or a SPARQL endpoint) and public websites. These services are intended to actively promote the use of digital assets among the designated communities, which may include other researchers or the general public, also via additional channels such as WorldCat or BASE (Bielefeld Academic Search Engine).
The mission of the mdwRepository is to ensure and promote sustainable services for the ingest, storage and access of the mdw's digital assets. It preserves the original master data and renditions that can be previewed online when access rights are granted, together with their metadata records (based on metadata standards such as Dublin Core, disciplinary standards, or custom schemes that are published via the mdwRepository scheme section).
The main functions of the institutional repository are:
- acquiring digital assets of the mdw,
- providing a management environment for the digital assets,
- digital preservation,
- providing access to the digital assets for various purposes,
- training users involved in the digital asset management process, and
- actively promoting the use of the digital assets.
Employees and students at the mdw have access to nuxeo via LDAP authentication. External users must be registered by the mdwRepository administrators.
Access levels can be defined in the management backend depending on the requirements of each digital asset. Information on rights and conditions of access is recorded in the administrative metadata (rights metadata and ACL). We differentiate between access to the metadata and access to the content object and/or file: granting access to the metadata does not automatically imply access to the content object and/or file.
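The separation between metadata access and content access can be illustrated with a minimal sketch. The class and function names below are hypothetical and do not reflect the actual nuxeo ACL implementation; the sketch only shows that the two permissions are checked independently.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRules:
    """Hypothetical access rules for a single digital asset."""
    metadata_readers: set = field(default_factory=set)  # may see the metadata record
    content_readers: set = field(default_factory=set)   # may see the content object/file


def can_read_metadata(rules: AccessRules, user: str) -> bool:
    """Return True if the user may read the metadata record."""
    return user in rules.metadata_readers


def can_read_content(rules: AccessRules, user: str) -> bool:
    """Return True if the user may read the content object and/or file.

    Content access is checked on its own: being allowed to read the
    metadata does not grant access to the content.
    """
    return user in rules.content_readers


# Example: "bob" may read the metadata but not the file itself.
rules = AccessRules(metadata_readers={"alice", "bob"}, content_readers={"alice"})
print(can_read_metadata(rules, "bob"))
print(can_read_content(rules, "bob"))
```

Keeping the two checks separate makes it possible to expose a descriptive record while the underlying file stays restricted.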
Once persistently published (with URN or DOI), the digital assets are available to the public.
Users assign the licenses required for their digital assets themselves, choosing from a pre-defined license list that covers license statements such as "All Rights Reserved" as well as open licenses, e.g. Creative Commons. Together with the access rules, the licenses control the publishing and distribution options available for the digital assets.
The mdwRepository is operated in-house at the mdw - University of Music and Performing Arts Vienna. We mainly use Virtual Machines to host the different services provided by the mdwRepository:
- nuxeo backend system (with ElasticSearch Cluster for search functionality, Kibana for analysis, Oracle database, and network storage that can be extended dynamically as the data amount grows),
- Triple Store and Graph Database (Apache Jena Fuseki instances and Blazegraph),
- OAI-PMH endpoint provided via an Apache Webserver,
- mdwCMS for managed content on the mdwRepository website,
- Python Flask server for public touchpoints (restricted public APIs, custom frontends), and
- Custom frontends (accessing the nuxeo backend system via its REST API).
All services undergo daily backups / snapshots.
The staff of the IT department ensures the technical availability and functionality of the mdwRepository infrastructure.
The Digital Asset Management team provides knowledge on digital preservation in general, expertise in data management planning, metadata management, and implementing custom application functionalities.
Data Integrity and Authenticity
Authenticity is preserved by creating an audit record for each digital asset ingested and managed within the mdwRepository (including the creating user and timestamp as well as modification, status and publication information). The audit trail is stored in the database backend.
Both metadata and files are versioned at the application layer by the repository software (using versions and revisions).
Data Discovery and Identification
The nuxeo backend system creates a unique ID for every asset, so each record can be uniquely identified within the system. The files in the Java Content Repository are stored in a checksum-based directory structure.
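As an illustration of what a checksum-based directory structure can look like, the following sketch derives a storage path from a file's digest. The digest algorithm (SHA-256 here), the root directory name and the nesting depth are assumptions made for this example; the actual layout used by the backend is not specified in this description.

```python
import hashlib
from pathlib import PurePosixPath


def checksum_path(data: bytes, root: str = "binaries/data", depth: int = 2) -> PurePosixPath:
    """Derive a storage path from the checksum of the file content.

    The first `depth` byte pairs of the hex digest become nested
    directories, and the full digest becomes the file name.
    """
    digest = hashlib.sha256(data).hexdigest()  # assumed algorithm
    parts = [digest[2 * i: 2 * i + 2] for i in range(depth)]
    return PurePosixPath(root, *parts, digest)


# Identical content always maps to the same path.
print(checksum_path(b"hello"))
```

One consequence of such a layout is that the stored file name doubles as a fixity value: recomputing the digest and comparing it with the path reveals whether the content has changed.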
For external publication the mdwRepository provides persistent identifiers:
- URNs for all externally available resources, and
- DOIs on request.
Data can be published externally via the OAI-PMH endpoint, for third parties to harvest and re-use, and via our SPARQL endpoints, which are served by an Apache Jena Fuseki triple store and a Blazegraph server. Additionally, data can be presented on websites that interact with the backend system via a RESTful API.
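To show how a third party might harvest records over OAI-PMH, here is a minimal sketch that builds a ListRecords request and reads titles from a response. The base URL and the sample response are placeholders invented for this example, not the real mdwRepository endpoint or data; only the verb, the oai_dc metadata prefix and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response, used instead of a live request.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Concert recording (sample)</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the URL of a ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text: str) -> list:
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        results.append((identifier, title))
    return results


print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE))
```

A real harvester would additionally follow the resumptionToken element to page through large result sets.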
All metadata is indexed in an ElasticSearch index, together with the full text extracted from documents, for easy retrieval. All search strategies supported by ElasticSearch can be used within the mdwRepository. Searches can be bookmarked for later access.
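A search across metadata fields and extracted full text could be expressed as an ElasticSearch multi_match query, roughly as sketched below. The field names are hypothetical; the actual index mapping of the mdwRepository is not described here, and the sketch only builds the request body without contacting a server.

```python
import json


def build_search_body(text: str, fields: list, size: int = 10) -> dict:
    """Build a query body that matches `text` against several fields."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
            }
        },
    }


# Hypothetical field names for title, description and extracted full text.
body = build_search_body("Schubert", ["dc:title", "dc:description", "fulltext"])
print(json.dumps(body, indent=2))
```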
mdwRepository is listed in the Registry of Research Data Repositories, re3data.org.
The OAI-PMH endpoint is registered in the list of OAI-PMH Registered Data Providers, which serves as a directory of OAI-PMH endpoints available for harvesting. Additionally, OAI-PMH based data is available via BASE (Bielefeld Academic Search Engine).
For greater access security, all mdwRepository components are accessed through a firewall, by internal and external users alike (only the IT department has direct access to the servers).
Access to the search servers (ElasticSearch) is restricted by IP address, so users do not have direct access to these servers.
Only a limited number of members of the IT department and external service providers with a valid contract and non-disclosure agreement have access to the servers, databases and storage.