One of our first steps on this project has been the creation of an online bibliographic resource, aggregating holdings information from a number of significant collections of the County Surveys. Our aim has been quite simple really: to identify where print and digital copies can be found, in order to assess the accessibility of this set of publications. We want to get a grasp of what is out there already before we consider which individual surveys we could most usefully digitise as a part of this project, and we also want to make this resource available to researchers, enabling them to more easily find and consult specific volumes and editions. It turns out that this is rather easier said than done. In this post, the first of two, I’ll describe the steps we have taken to build our database and some of the challenges we have encountered.
The primary difficulties in gathering data stem from the number of surveys and the ways in which they were produced and published. The County Surveys were undertaken in two phases: the first (known as ‘the original reports’) totalled 91 surveys, and the second (the ‘revised’ or ‘corrected’ reports) another 85. There was also a series of Irish surveys corresponding to the second phase, adding another 24. As the disparity in numbers between the phases indicates, the revisions were not merely of the texts: the names and areas covered also changed significantly. To give a couple of examples, there are four surveys covering Perthshire in the first phase, but only one in the second; there is no survey for Bute in the first series, but there is one in the second. To add to the confusion, some areas were surveyed twice by different surveyors, and many of the ‘revised’ surveys were reissued with changes in the first few years. In total, we think that we are dealing with around 200 reports for around 135 different named geographic entities. We can’t be entirely sure, because we do not have a list of the complete set. It is quite possible that some surveys were commissioned but never published. It is also possible that others were published but aren’t held in the collections on which we have been drawing. In short, we are building up our database and our knowledge of the Surveys by bringing together and comparing different collections, each of which may be partial and incomplete.
We were lucky, however, to have some solid foundations on which to build. In 2012/13, an authoritative bibliography of the County Surveys by Heather Holmes was published, detailing the collections of Edinburgh University Library (EUL), the National Library of Scotland (NLS), ECCO, and the library of the Royal Highland and Agricultural Society of Scotland, among others. In order to extract this information from Holmes’ prose article, we created a spreadsheet indicating the titles listed and the collections in which they were held. This forms the basis of our bibliographic resource, to which we are adding data gathered from other known collections. We requested and were given a spreadsheet listing the holdings of a major collection held by the Perkins Agricultural Library at the University of Southampton. Using a simple title search, we manually extracted the holdings data of the Surveys from HathiTrust, a laborious process, and then brought the various spreadsheets together to create a master list. We then requested holdings information from the NLS and EUL, asking them to extract lists from their catalogues so that we could compare those lists with our master list, cross-checking to ensure that we have all the data available from these sources. We also hope to get holdings data from the British Library, and we are currently exploring methods for extracting data from Google Books. Some libraries and collections have been able to extract data from their catalogues for us, and in a few cases we have been able to harvest or scrape it ourselves, but most of the time there has been quite a bit of manual work involved.
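To give a sense of what bringing the spreadsheets together and cross-checking them involves, here is a minimal sketch in Python. It is an illustration under stated assumptions rather than a description of our actual workflow: the function names, the rule used to normalise titles, and the sample titles are all hypothetical, and real catalogue data would need far more careful title matching.

```python
import re
from collections import defaultdict


def normalise_title(title):
    """Lower-case a title, drop punctuation and collapse whitespace.

    Assumption: this crude rule is enough to match the same title
    as recorded slightly differently by two collections.
    """
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def build_master_list(sources):
    """Merge {collection: [titles]} into {normalised title: set of collections}."""
    master = defaultdict(set)
    for collection, titles in sources.items():
        for title in titles:
            master[normalise_title(title)].add(collection)
    return dict(master)


def cross_check(master, collection, catalogue_titles):
    """Compare a library's own catalogue extract with what the master list records for it."""
    catalogue = {normalise_title(t) for t in catalogue_titles}
    recorded = {t for t, holders in master.items() if collection in holders}
    return {
        "missing_from_master": sorted(catalogue - recorded),
        "not_in_catalogue": sorted(recorded - catalogue),
    }


# Placeholder titles, invented for illustration only.
sources = {
    "EUL": ["Survey of County A", "Survey of County B"],
    "Perkins": ["Survey of county A.", "Survey of County C"],
}
master = build_master_list(sources)

# A hypothetical catalogue extract supplied later by one of the libraries.
report = cross_check(master, "Perkins", ["Survey of County A", "Survey of County D"])
print(report)
```

Keying the master list on a normalised title means that each new spreadsheet simply adds collection names to existing entries or creates new ones, while the cross-check reports discrepancies in both directions so that each can be investigated by hand.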
At the same time, the software engineers on the team have been building a database and interface that allow us to look at the information through different filters and to edit individual holdings information. At present we can filter by author, country, county, and phase. This has been invaluable in helping to bring gaps and anomalies to light, making cross-checking much more manageable. Conversely, the process of cross-checking has stimulated our thinking about how the structure of the database might be developed and the kinds of filtering and comparison mechanisms researchers might need or want as part of the interface. Through this iterative method, as new data sets arrive, our online collection is gradually taking shape and the requirements for the interface are becoming more clearly defined.
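The kind of filtering described above can be sketched in a few lines of Python. This is a hedged illustration only, not the team's database or interface: the `SurveyRecord` fields, the function names, and the placeholder titles and authors are assumptions. The sample counts follow the Perthshire and Bute examples given earlier (four first-phase surveys and one second-phase survey for Perthshire; a second-phase survey only for Bute).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyRecord:
    title: str
    author: str
    country: str
    county: str
    phase: int  # 1 = original reports, 2 = revised reports


def filter_records(records, *, author=None, country=None, county=None, phase=None):
    """Return the records matching every filter that has been set."""
    criteria = {"author": author, "country": country, "county": county, "phase": phase}
    active = {field: value for field, value in criteria.items() if value is not None}
    return [
        record
        for record in records
        if all(getattr(record, field) == value for field, value in active.items())
    ]


def counties_missing_phase(records, phase):
    """List counties that appear in the data but have no record for the given phase."""
    all_counties = {record.county for record in records}
    covered = {record.county for record in records if record.phase == phase}
    return sorted(all_counties - covered)


# Placeholder titles and authors; only the counts per county and phase come from the post.
records = [
    SurveyRecord(f"Perthshire survey {n}", f"Author {n}", "Scotland", "Perthshire", 1)
    for n in range(1, 5)
]
records.append(SurveyRecord("Perthshire revised survey", "Author 5", "Scotland", "Perthshire", 2))
records.append(SurveyRecord("Bute revised survey", "Author 6", "Scotland", "Bute", 2))

print(len(filter_records(records, county="Perthshire", phase=1)))
print(counties_missing_phase(records, 1))
```

Combining filters in this way is what makes gaps visible: a county that returns nothing for one phase is either a genuine absence, as with Bute in the first series, or a sign that holdings data is missing.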
In a future post, I’ll discuss another way that we are checking the completeness of the holdings data: mapping the Surveys against the geographical coverage they represent.