Jul 16, 2015
 

One of our key aims in building the interface for our collection was to allow people to explore and “play with” the data. It’s hard to get a sense of the extent of the series and the relationships between the surveys without some kind of overview: once you can see the surveys all together and look at them in different ways, it’s much easier to grasp their logic. So we wanted a tool that would aggregate all of the information we have gathered and then allow people to look at that information in flexible ways, to filter and explore it according to their interests.

Flexibility was also a priority in technical terms: we’re making this data available in this format for the first time, so we are aware that we don’t really know what people will want to do with it. We see what we have done with the demonstrator not as the last word but as the first. Building on it, we can start to understand the data better and to understand how people might want to access it. We expect to have to adapt the data, and the ways of accessing it, as we go along and learn what we can most usefully provide to the community.

The Data

The process of gathering data has been described in another post, but from the demonstrator’s point of view what mattered was keeping things as general and adaptable as possible. Nevertheless, this kind of historical data presents certain peculiarities and challenges. One of the most obvious is how to present the survey data. The surveys are arranged by county, but the counties used are not the counties as they are today. Indeed, the counties used in the first and second phases of the County Surveys are not the same. So we needed a mechanism that would allow people to make sense of the data without being restrictive. We’ve achieved this by providing a canonical list of counties taken from Ordnance Survey data from the early 19th century, which we then map to the actual counties as surveyed. The match is not perfect, but we take a “permissive” view of the data: we’d rather show you slightly too much than too little. The user is presented with the canonical list in the search facility, and we then map that to the county data to decide what to show. The same holds for the author data: we hold a canonical list of authors and map these to the actual author entries. This allows us to adjust the data in future as we discover more about it.
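To make the “permissive” lookup concrete, here is a minimal Python sketch (the demonstrator itself is written in Perl). It assumes the canonical list is held as a dictionary from each canonical county name to the set of county labels used in the surveys; the survey ids, the sample mapping and the `surveys_for` function are hypothetical illustrations, not the project’s data or code. Selecting a canonical county returns every survey whose county overlaps it, even when that survey also covers other areas.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from canonical (search-list) county names to the
# county labels that appear in the surveys. Illustrative data only.
CANONICAL_TO_SURVEY: Dict[str, Set[str]] = {
    "Zetland or Shetland": {"Northern Counties and Islands", "Shetland"},
    "Orkney": {"Northern Counties and Islands", "Orkney"},
}

# Hypothetical survey records: (survey id, county label as surveyed, phase).
SURVEYS: List[Tuple[str, str, int]] = [
    ("s1", "Northern Counties and Islands", 1),
    ("s2", "Shetland", 2),
    ("s3", "Orkney", 2),
]


def surveys_for(canonical_names: List[str]) -> List[str]:
    """Return ids of every survey whose county label is mapped to any of the
    selected canonical names. Matching is deliberately permissive: a survey
    is shown if it overlaps the selection at all."""
    wanted: Set[str] = set()
    for name in canonical_names:
        wanted |= CANONICAL_TO_SURVEY.get(name, set())
    return [sid for sid, label, _phase in SURVEYS if label in wanted]
```

For example, `surveys_for(["Zetland or Shetland"])` returns both the first-phase regional survey and the second-phase Shetland survey; the same approach would apply to a canonical author list.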

The Data Model

This mapping then gives rise to the data model. We have surveys, each of which has a county associated with it. Then we have a list of counties presented to the user, each of which may map to more than one of the underlying counties. That can get a bit confusing, but an example makes it clear. If we want to look at the surveys for Shetland, the filter list offers “Zetland or Shetland”, which is how it is listed by the Ordnance Survey. In the first phase of the surveys, Shetland was included under “Northern Counties and Islands”, but in the second phase it has a survey of its own. So each entry in the search list may map to several entries in the surveys. Moreover, the same county survey might appear under more than one search term: the first-phase “Northern Counties and Islands” survey needs to appear under Shetland, Orkney, Caithness and Sutherland. We therefore need a many-to-many mapping between the search counties in the interface and the counties as specified in the surveys themselves. To do this we adopt the standard database approach of a mapping table, i.e. ccounty_county.

So ccounty is the list of counties as they appear in the search list, county is the list as they appear in the surveys, and the mapping table allows us to relate the two in any way we want. Each survey can have many publications, and each publication can be held in multiple places. This is why the data model separates surveys, publications and holdings.

Database Schema

This model might seem a little complex but it gives us a great deal of flexibility in how we handle counties and authors and makes it fairly easy to add new information about publications and holdings as it becomes available to us.
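As a rough illustration of this schema, the sketch below builds the county mapping and the survey/publication/holding chain in an in-memory SQLite database using Python’s standard `sqlite3` module, standing in for PostgreSQL. Only the table names `ccounty`, `county` and `ccounty_county` come from the text; the column names, the other table names, the sample rows and the helper functions are assumptions made for illustration.

```python
import sqlite3
from typing import List

# Table names ccounty, county and ccounty_county follow the text; all column
# names and the survey/publication/holding tables are assumptions.
SCHEMA = """
-- counties as listed in the search interface
CREATE TABLE ccounty (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- counties as named in the surveys themselves
CREATE TABLE county (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- many-to-many mapping between the two
CREATE TABLE ccounty_county (
    ccounty_id INTEGER NOT NULL REFERENCES ccounty(id),
    county_id  INTEGER NOT NULL REFERENCES county(id),
    PRIMARY KEY (ccounty_id, county_id)
);
CREATE TABLE survey (
    id INTEGER PRIMARY KEY, title TEXT NOT NULL, phase INTEGER,
    county_id INTEGER NOT NULL REFERENCES county(id)
);
-- one survey, many publications
CREATE TABLE publication (
    id INTEGER PRIMARY KEY, year INTEGER,
    survey_id INTEGER NOT NULL REFERENCES survey(id)
);
-- one publication, many holdings
CREATE TABLE holding (
    id INTEGER PRIMARY KEY, institution TEXT NOT NULL,
    publication_id INTEGER NOT NULL REFERENCES publication(id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create the schema and load a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO ccounty VALUES (?, ?)",
                     [(1, "Zetland or Shetland"), (2, "Orkney")])
    conn.executemany("INSERT INTO county VALUES (?, ?)",
                     [(1, "Northern Counties and Islands"), (2, "Shetland")])
    # Shetland maps to two survey counties; Orkney shares the regional one.
    conn.executemany("INSERT INTO ccounty_county VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1)])
    conn.executemany("INSERT INTO survey VALUES (?, ?, ?, ?)",
                     [(1, "Northern Counties and Islands (phase 1)", 1, 1),
                      (2, "Shetland (phase 2)", 2, 2)])
    conn.execute("INSERT INTO publication VALUES (1, NULL, 1)")
    conn.executemany("INSERT INTO holding VALUES (?, ?, ?)",
                     [(1, "Library A", 1), (2, "Library B", 1)])
    return conn


def surveys_for_ccounty(conn: sqlite3.Connection, name: str) -> List[str]:
    """Follow the mapping table from a search-list county to survey titles."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.title
        FROM ccounty cc
        JOIN ccounty_county m ON m.ccounty_id = cc.id
        JOIN survey s ON s.county_id = m.county_id
        WHERE cc.name = ?
        ORDER BY s.title
        """,
        (name,),
    )
    return [r[0] for r in rows]


def holding_count(conn: sqlite3.Connection, survey_id: int) -> int:
    """Count holdings across all publications of one survey."""
    (n,) = conn.execute(
        """
        SELECT COUNT(h.id) FROM publication p
        JOIN holding h ON h.publication_id = p.id
        WHERE p.survey_id = ?
        """,
        (survey_id,),
    ).fetchone()
    return n
```

Because the mapping lives in its own table, reassigning a survey county to a different search county is a row change rather than a schema change.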

The Technology Chosen

In line with the ethos of flexibility, we decided to work with standard technology components. At the back end is a relational database. Sitting on top of that is a web application built using a standard MVC framework. This approach has advantages both for flexibility and for getting up and running quickly. The MVC (Model-View-Controller) approach separates the storage of the data (the Model) from the logic of the application (the Controller) and from how the data is displayed (the View). Because each part is isolated from the others, changing one of them has less impact on the rest. A good example of this flexibility is the change we made to the interface, which was covered in a previous post.

MVC is one of the standard techniques for developing web applications these days, and there is a wide choice of languages and MVC frameworks with which to implement it. In our case, the application is written in Perl, using PostgreSQL for the database with a Catalyst application on top. It takes the standard Catalyst approach of using DBIx::Class to implement the Model and the interface to the database, and Template Toolkit for the front end. The choice of a specific MVC implementation doesn’t matter so much, as there are plenty to choose from; the main thing is the flexibility the approach gives. Using standard technologies gives us the adaptability to make the data available quickly and to adapt to whatever changes come out of that down the line.
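The separation of concerns can be illustrated with a toy example. The sketch below is in Python rather than Perl and does not use Catalyst; all class and function names are hypothetical. The model answers queries (the role DBIx::Class plays in the real application), the views only format results (Template Toolkit’s role), and the controller turns request parameters into a model query and hands the results to whichever view it was given.

```python
import html
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Survey:
    title: str
    county: str
    phase: int


class SurveyModel:
    """Model: owns the data and how it is queried."""

    def __init__(self, surveys: List[Survey]) -> None:
        self._surveys = list(surveys)

    def by_phase(self, phase: int) -> List[Survey]:
        return [s for s in self._surveys if s.phase == phase]


View = Callable[[List[Survey]], str]


def text_view(surveys: List[Survey]) -> str:
    """A plain-text view: one line per survey."""
    return "\n".join(f"{s.title} [{s.county}]" for s in surveys)


def html_view(surveys: List[Survey]) -> str:
    """An HTML view over the same data, with titles escaped."""
    items = "".join(f"<li>{html.escape(s.title)}</li>" for s in surveys)
    return f"<ul>{items}</ul>"


class SearchController:
    """Controller: turns request parameters into a model query and passes
    the results to the configured view."""

    def __init__(self, model: SurveyModel, view: View) -> None:
        self.model = model
        self.view = view

    def handle(self, params: Dict[str, str]) -> str:
        phase = int(params.get("phase", "1"))
        return self.view(self.model.by_phase(phase))
```

Swapping `text_view` for `html_view` changes the presentation without touching the model or the controller, which is the property that made the interface redesign comparatively cheap.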

Evolution by Use

So this demonstrator gives people access to the data. We’re hoping people will find it helpful in “playing with” the data. But it’s very much a first draft. We expect it to evolve over time as we, and anyone else interested in the Surveys, get to know the data better and start to understand more about how to make this data available to people.

Jun 24, 2015
 

We are delighted to announce that our bibliographic search tool is now live and accessible from the ‘Search’ tab in the menu above.

Our demonstrator includes bibliographic data from some of the best collections of the surveys and, where possible, provides links to library catalogue entries and digital editions. Researchers can search by modern county name, by series, by county and by author. Results are presented in a new tab after each search, so that you can compare multiple search results by toggling between pages. There are also detailed analyses of collections, revealing the extent of holdings and coverage, and indicating which surveys would be needed to complete each collection.
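At their core, the collection analyses compare each collection’s holdings with the list of all known surveys. Here is a minimal Python sketch, assuming surveys are identified by hypothetical ids and each collection’s holdings are a set of those ids; `collection_analysis` and the sample data are illustrative, not the tool’s code. Since the master list may itself be incomplete, “coverage” here means coverage of the known surveys only.

```python
from typing import Dict, Set


def collection_analysis(known: Set[str],
                        holdings: Dict[str, Set[str]]) -> Dict[str, dict]:
    """For each collection, report how many known surveys it holds, the
    fraction of the known set this represents, and which surveys it would
    need to be complete."""
    report: Dict[str, dict] = {}
    for collection, held in holdings.items():
        # Ignore any ids not on the master list.
        held_known = held & known
        report[collection] = {
            "held": len(held_known),
            "coverage": len(held_known) / len(known) if known else 0.0,
            "missing": sorted(known - held_known),
        }
    return report
```

With four known surveys and a collection holding two of them, the report gives a coverage of 0.5 and lists the other two as missing.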


We hope that the demonstrator will be a useful finding aid and discovery tool for those interested in the County Surveys, the history of statistical reporting and British history more broadly. We would welcome any feedback on the tool, and would be very keen to hear about how it is used or whether it could usefully offer other features and information. If you have ideas, please get in touch with us at edina@ed.ac.uk.

Apr 10, 2015
 

After several months of preparing and curating data for our online bibliographic tool, last week we were excited to see the new design for the GUI produced by EDINA’s resident designer Jackie Clark.

Throughout the development process we’ve been working with a functional but deliberately sparse interface, which shows results in a simple table alongside filter terms. While this has been entirely fit for purpose during the checking and testing phases, it shows its database origins very clearly: it’s quite text-heavy and requires lots of clicking through lists and tables to get to publication entries. As we became aware during development, this has implications for navigation, as you need to retrace your steps by going back in your browser, and it has meant that comparing different searches is not easy. We’ve discussed various options to resolve these issues, considering for example whether to introduce a ‘shopping basket’ feature that enables you to collect records together for comparison.

the development/working interface

These usability concerns were on our minds when considering potential designs. The new design was selected because it solves some of these problems for us simply by separating the search terms from the results. As well as skinning the tool in the same design as our blog, Jackie has moved the long filter lists into a more visually appealing table design, which allows scrolling within boxes rather than within the page.

the first design for the new interface

This means that you can always see the chosen filters in the same view, making your search parameters very clear, which wasn’t the case in the development version. The new tab system also means that each search opens a separate results tab, so your search page stays open with your parameters in view and you can toggle back and forth between the search and the results if you want to check them. If you want to compare the results of a number of searches, you can run the search several times, keeping open the results tab produced each time, and then toggle between them. An added benefit of this design is that it thins out the content, giving more space on the results page. We’re now considering how best to use this space and what information we might add to enhance the records. It’s a great example of how good design can create elegant solutions to functional problems.

We’re looking forward to launching the new interface via this website in a few weeks’ time, so watch this space!

Mar 31, 2015
 
Folded Map from the ‘revised’ survey of North Wales published in 1810

As I described in an earlier post, cross-checking between bibliographies and catalogues has formed a substantial part of the work that has so far gone into building our bibliographic database of the holdings of the County Surveys. In order to make sure that we are identifying and locating as many of the surveys as we can, we have also been checking our holdings information against county maps. If we can say that we have found surveys that document all the counties, then we can be confident that we have the significant majority, if not a complete set.

The task is complicated by the history of the counties in Britain. Firstly, the borders of regions, districts and other administrative areas have changed frequently over the last two hundred years, meaning that the areas referred to by a county name in 1800 can be different to the areas referred to by the same name now. Secondly, as these changes have taken place, the names used to refer to areas have also changed, in some cases quite dramatically. What we now call Dumfries and Galloway, for example, was historically three counties, Dumfriesshire, Wigtownshire and ‘the Stewartry of Kirkcudbright’. We decided that for the twenty-first century researchers who would be consulting our online collection, the ability to search by modern names would be important.

To map our list of survey holdings to the geography of the British Isles and to the areas defined by modern names, we turned to the Ordnance Survey. Ordnance Survey have been responsible for mapping and surveying the UK since the 1790s, the same decade in which the County Surveys were commissioned. Their most recent county maps represent Britain in the mid-twentieth century, when counties were still the primary administrative entities across the nation (since the 1970s, regional authorities have replaced counties), so the list of their maps represents as authoritative a list of modern counties as we are likely to find. This has become both our ‘canonical county name list’ and the list of geographical areas against which we can map the areas covered by the surveys.

We hope, in time, that this work will enable us to create an intuitive map interface for our collection. In itself, though, the process of mapping the surveys to the areas has been very valuable, in that it has turned up quite a few queries about our holdings data and about the surveys themselves. For example, it has enabled us to identify no fewer than four surveys that deal with the same geographical area under different titles. King’s County in Ireland, for instance, is also surveyed as Offaly, and the ‘Central Highlands’ survey of series one describes itself as dealing with Perthshire, although three other surveys with Perth in the title were published in the same series. The process has also allowed us to check geographically neighbouring surveys in order to establish the boundaries of the areas they deal with. In the first series, for instance, there appears to be no survey for Bute. Checking Clydesdale, Argyle and Ayr, the areas which might conceivably contain or neighbour Bute, reveals no trace of the Isle, suggesting that it was probably completely neglected during the first phase rather than incorporated in another survey.
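The two checks described above, finding surveys that overlap in the same area and finding areas with no survey in a series, can be sketched in Python by inverting the survey-to-area mapping for each series. The function `coverage_report` and the sample mapping are hypothetical; in particular, which canonical areas each survey covers is an assumption made for the example, not our recorded data.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple


def coverage_report(areas: List[str],
                    survey_areas: Dict[Tuple[str, int], Set[str]]) -> Dict[int, dict]:
    """survey_areas maps (survey title, series) -> canonical areas covered.
    For each series, return areas covered by more than one survey
    (possible duplicates under different titles) and areas with no survey."""
    by_series: DefaultDict[int, DefaultDict[str, List[str]]] = defaultdict(
        lambda: defaultdict(list))
    for (title, series), covered in survey_areas.items():
        for area in covered:
            by_series[series][area].append(title)

    report: Dict[int, dict] = {}
    for series, area_map in by_series.items():
        overlaps = {a: sorted(t) for a, t in area_map.items() if len(t) > 1}
        gaps = sorted(a for a in areas if a not in area_map)
        report[series] = {"overlaps": overlaps, "gaps": gaps}
    return report


# Hypothetical, simplified first-series mapping for illustration.
AREAS = ["Argyll", "Ayr", "Bute", "Lanark", "Perth"]
SERIES_ONE = {
    ("Central Highlands", 1): {"Perth"},
    ("Perth", 1): {"Perth"},
    ("Clydesdale", 1): {"Lanark"},
    ("Argyle", 1): {"Argyll"},
    ("Ayr", 1): {"Ayr"},
}
```

On this sample, `coverage_report(AREAS, SERIES_ONE)[1]` flags Perth as covered by two surveys and Bute as a gap; each flag is a prompt for manual checking rather than a conclusion.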

This kind of cross-checking and mapping is time-consuming but necessary to ensure the integrity of the data and of our mode of presentation. In a sense it echoes the work done by the original surveyors, who also sought to compile information about areas and fields for which there was no guide in place. “In obtaining an account of the present state of husbandry in North Wales several difficulties occurred”, wrote George Kay, the first surveyor of the area, in 1794. “Among others, no distinct map of it could be procured, although I enquired at all the shops in the principle towns from Edinburgh to Chester.” (p.1) Charting previously uncharted territories, the surveyors laid the foundations of Sir John’s pyramid of enquiries, enabling new questions to be posed and answered. On a much more modest scale, we hope our online collection will help researchers chart the bibliographic landscape of the Surveys and facilitate new research on this fascinating material.

Mar 03, 2015
 

One of our first steps on this project has been the creation of an online bibliographic resource, aggregating holdings information from a number of significant collections of the County Surveys. Our aim has been quite simple really: to identify where print and digital copies can be found, in order to assess the accessibility of this set of publications. We want to get a grasp of what is out there already before we consider which individual surveys we could most usefully digitise as part of this project, and we also want to make this resource available to researchers, enabling them to more easily find and consult specific volumes and editions. It turns out that this is rather easier said than done. In this post, the first of two, I’ll describe the steps we have taken to build our database and some of the challenges we have encountered.

The primary difficulties in gathering data stem from the number of surveys and the ways in which they were produced and published. The County Surveys were undertaken in two phases: the first (known as ‘the original reports’) totalled 91 surveys, and the second (the ‘revised’ or ‘corrected’ reports) another 85. There was also a series of Irish surveys corresponding to the second phase, adding another 24. As the disparity in numbers between the phases indicates, the revisions were not merely of the texts: the names and areas covered also changed significantly. To give a couple of examples, there are four surveys covering Perthshire in the first phase but only one in the second, and there is no survey for Bute in the first series but there is one in the second. To add to this confusion, some areas were surveyed twice by different surveyors, and many of the ‘revised’ surveys were reissued with changes in the first few years. In total, we think that we are dealing with around 200 reports for around 135 different named geographic entities. We can’t be entirely sure, because we do not have a list of the complete set. It is quite possible that some surveys were commissioned but never published. It is also possible that others were published but aren’t held in the collections on which we have been drawing. In short, we are building up our database and our knowledge about the Surveys by bringing together and comparing different collections, each of which may be partial and incomplete.

We were lucky, however, to have some solid foundations on which to build. In 2012/13, an authoritative bibliography of the County Surveys by Heather Holmes was published, detailing the collections of Edinburgh University Library, the National Library of Scotland, ECCO and the library of the Royal Highland and Agricultural Society of Scotland, among other collections. In order to extract this information from Holmes’ prose article, we created a spreadsheet indicating the titles listed and the collections in which they were held. This forms the basis of our bibliographic resource, which we are adding to by gathering additional data from other known collections. We requested and were given a spreadsheet listing the holdings of a major collection held by the Perkins Agricultural Library at the University of Southampton. Using a simple title search, we manually extracted the holdings data of the Surveys from HathiTrust, a laborious process, and then brought the various spreadsheets together to create a master list. We then requested holdings information from the NLS and EUL, asking them to extract lists from their catalogues so that we could compare these with our master list, cross-checking to ensure that we have all the data available from these sources. We also hope to get holdings data from the British Library, and we are currently exploring methods for extracting data from Google Books. Although some libraries and collections have been able to extract data from their catalogues, and in some cases we have been able to harvest or scrape it, in most cases there has been quite a bit of manual work involved.
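Merging the spreadsheets into a master list and cross-checking a library’s catalogue extract against it can be sketched in Python with only the standard library. The sketch assumes each source is a CSV export with a `title` column and that titles can be matched after crude normalisation; the column name, collection names and helper functions are hypothetical, and, as noted above, much of this matching has in practice required manual work.

```python
import csv
import io
import re
from typing import Dict, Iterable, Set, Tuple


def normalise(title: str) -> str:
    """Crude match key: lower-case, replace punctuation with spaces and
    collapse whitespace. Real catalogue titles vary far more than this."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def merge(sources: Dict[str, str]) -> Dict[str, Set[str]]:
    """sources maps collection name -> CSV text with a 'title' column.
    Returns a master list: normalised title -> collections holding it."""
    master: Dict[str, Set[str]] = {}
    for collection, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            master.setdefault(normalise(row["title"]), set()).add(collection)
    return master


def cross_check(master: Dict[str, Set[str]], collection: str,
                catalogue_titles: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare a catalogue extract with what the master list attributes to
    that collection. Returns (listed but not in master, in master but not listed)."""
    listed = {normalise(t) for t in catalogue_titles}
    expected = {k for k, cols in master.items() if collection in cols}
    return listed - expected, expected - listed
```

Both returned sets are candidates for follow-up rather than errors: a title may be missing from the master list, or the same survey may simply be catalogued under a different title.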

At the same time, the software engineers on the team have been building a database and interface that allows us to look at the information through different filters and to edit individual holdings information. At present we can filter by author, country, county and phase. This has been invaluable in helping to bring gaps and anomalies to light, making cross-checking much more manageable. Conversely, the process of cross-checking has stimulated our thinking about how the structure of the database might be developed and the kinds of filtering and comparison mechanisms researchers might need or want as part of the interface. Through this iterative method, as new data sets arrive, our online collection is gradually taking shape and the requirements for the interface are becoming more clearly defined.
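Here is a minimal Python sketch of this kind of filtering, assuming each holding record carries an author, country, county and phase; the `Holding` class, the field values and the helper names are hypothetical. Unset filters match everything, and a simple tally per county and phase is one way such a view can make gaps and anomalies stand out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Holding:
    title: str
    author: str
    country: str
    county: str
    phase: int


def filter_holdings(records: Iterable[Holding],
                    author: Optional[str] = None,
                    country: Optional[str] = None,
                    county: Optional[str] = None,
                    phase: Optional[int] = None) -> List[Holding]:
    """Keep records matching every filter that is set; None means 'any'."""
    return [
        r for r in records
        if (author is None or r.author == author)
        and (country is None or r.country == country)
        and (county is None or r.county == county)
        and (phase is None or r.phase == phase)
    ]


def counts_by_county_and_phase(records: Iterable[Holding]) -> Counter:
    """Tally holdings per (county, phase). A Counter returns 0 for pairs it
    has never seen, so zero or unexpectedly low counts are easy to spot."""
    return Counter((r.county, r.phase) for r in records)
```

Combining filters (for example, country and phase together) narrows the view in the way the interface does, while the tally gives a quick overview for spotting holes.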

In a future post, I’ll discuss another way that we are checking the completeness of the holdings data: mapping the Surveys against the geographical coverage they represent.