Hiberlink: Time Travel for the Scholarly Web

Duration

2013 Q1 to 2015 Q1 (24 months)

Website

hiberlink.org

Summary

Citation of sources is a fundamental part of scholarly discourse. Traditionally, the citations in journal articles and books were to other published articles or books, whether online or in print on the shelf; in today’s web-based scholarly communication, the range of scholarly statements and resources that are published and referenced has greatly expanded. Citations are now commonly made to all types of resource (software, datasets, websites, ontologies, presentations, blogs, videos, etc.) that are made available through a variety of publication venues on the Web. The highly dynamic nature of the Web introduces a significant challenge: the content at the end of any given referenced HTTP URI is liable to change over time. As a result, what was online at the time of citation may no longer be there when scholars wish to look up the citation. Sometimes called ‘citation rot’, this issue is two-fold: the HTTP URI may no longer work (so-called ‘link rot’), and the content at the end of the HTTP URI may have evolved and may even have become dramatically different from what was originally cited.
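The two failure modes described above can be told apart mechanically. The following is a minimal sketch under stated assumptions, not the project's method: it assumes that a snapshot (HTTP status and response body) was recorded at the time of citation and compares it with a later observation. The names Snapshot and classify_citation are hypothetical, and an exact hash comparison is a deliberately crude stand-in for detecting changed content.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """What was observed when a URI was dereferenced (hypothetical structure)."""
    status: int   # HTTP status code, e.g. 200 or 404
    body: bytes   # response body; empty if nothing was returned


def classify_citation(cited: Snapshot, current: Snapshot) -> str:
    """Return 'link rot', 'content changed' or 'intact'.

    Assumptions: any non-2xx status now counts as link rot, and any
    byte-level difference counts as changed content. Real pages often
    change in trivial ways (adverts, timestamps), so a real study would
    need a more tolerant comparison.
    """
    if not 200 <= current.status < 300:
        return "link rot"
    cited_hash = hashlib.sha256(cited.body).digest()
    current_hash = hashlib.sha256(current.body).digest()
    if cited_hash != current_hash:
        return "content changed"
    return "intact"
```

A URI that still resolves but returns different bytes is reported as 'content changed', which is the case that a simple check for broken links would miss.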

The Hiberlink project aims to provide empirical evidence that will characterise the full extent of the problem, recognising that this goes beyond the preservation of e-journals. The intention is to provide actionable solutions that can form part of the web infrastructure for web-published content, so that scholarly communication and the sources upon which it depends may be preserved, understood and reused by future generations of researchers.

The technical activities include three strands:

  1. Problem Quantification: A vast corpus of scholarly work will be collected and assessed for citation rot using text mining and information extraction tools, together with the Memento protocol.
  2. Archival Solution Infrastructure: Pro-active approaches for archiving citations at the point of use will be investigated.
  3. Temporal Reference Solutions: New methods of citation will be researched in order to provide time stamps in citations to enable access to the correct archived version of a reference.
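To illustrate how a time stamp in a citation could lead to the correct archived version, here is a minimal sketch under stated assumptions. The Memento protocol (RFC 7089) lets a client request a resource as it existed at a given time by sending an Accept-Datetime header in RFC 1123 format, expressed in GMT. The function names below are hypothetical, and picking the archived version closest in time to the citation is an assumed selection rule rather than something the project prescribes.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(cited_at: datetime) -> dict:
    """Build the Memento Accept-Datetime request header for a citation time.

    cited_at must be timezone-aware; it is converted to UTC and rendered
    in RFC 1123 form with a literal 'GMT' suffix.
    """
    if cited_at.tzinfo is None:
        raise ValueError("cited_at must be timezone-aware")
    as_utc = cited_at.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


def closest_version(cited_at: datetime, versions: list) -> datetime:
    """Pick the archived capture time nearest to the citation time.

    Assumption: 'nearest in time' is the right notion of 'correct version'.
    Raises ValueError if no archived versions are supplied.
    """
    if not versions:
        raise ValueError("no archived versions available")
    return min(versions, key=lambda captured: abs(captured - cited_at))
```

In practice the selection is normally performed by the archive's TimeGate in response to the header; closest_version only shows the kind of choice that is being made on the client's behalf.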

We will work with publication venues to evaluate the proposed prototype solutions for broad deployment. The results and outcomes of this project will be shared openly with the research community.

Project Deliverables

  • A report on the extent to which the context that surrounded a scholarly paper at the time of its publication can be recreated at a later time, and the extent to which the ability to do so is dependent upon properties of the publication venue, the publication, and the cited resources
  • A report on pro-active archiving
  • A prototype tool to demonstrate how pro-active archiving could be embedded in existing publication systems
  • A report on temporal referencing
  • A prototype tool to demonstrate seamless navigation of cited HTTP URIs toward the appropriate temporal version in a web archive

Partners

EDINA Contacts

Project Manager: Muriel Mewissen, EDINA
Tel: +44 (0) 131 651 7283

Funders

The Andrew W. Mellon Foundation

Contact us at: edina@ed.ac.uk
EDINA, Causewayside House
160 Causewayside, Edinburgh
United Kingdom EH9 1PR

EDINA is the Jisc-designated national data centre at the University of Edinburgh.
