Text is, and will remain, our main method of knowledge transfer. Over 500 million tweets were sent per day in 2014 and 269 billion emails were sent per day in 2017. The total volume of published academic literature is doubling at least every nine years. A lack of technical skills should not preclude anyone from analysing, at scale, the wealth of textual data we create every day. Yet many people working in the arts and humanities, in national and local government, in small businesses and in local communities are excluded from experimenting with data-driven innovation by poor data literacy, barriers to using more complex tools and inadequate infrastructure.
We’ve won seed funding from the Edinburgh and South East Scotland City Region Deal to address this challenge and serve a national demand for accessible tools in data science and analysis.
The ‘answer’ to data literacy starts with user research. We have collaborated extensively with Data-Driven Innovation (DDI) Programme Sector Leads, academic and professional services colleagues across The University of Edinburgh, the National Library of Scotland, The Data Lab, The Scottish Government, Project Jupyter, SMEs and other communities to understand user needs and to assess how well current text and data mining solutions support their research and skills.
Drawing on insights from this user research phase, we’re developing an easy-to-use text and data mining pilot service built on the Defoe tool created by Professor Melissa Terras and Dr Rosa Filgueira Vicente from EPCC, taking forward the Research Data Spring work funded by JISC at UCL and the British Library. We will build an intuitive visual interface and develop a service for students, researchers and regional SMEs, supported by comprehensive training in computational and data literacy skills. The service will receive and securely host collections of documents, process them automatically, and support novel analyses that would otherwise be beyond the scale of human comprehension or too time-consuming and costly to undertake.