One of our key aims in building the interface for our collection was to allow people to explore and “play with” the data. It’s hard to get a sense of the extent of the series and the relationships between the surveys without some kind of overview: once you can see the surveys all together and look at them in different ways, it’s much easier to grasp their logic. So we wanted a tool that would aggregate all of the information we have gathered and then allow people to look at that information in flexible ways, to filter and explore it according to their interests.
Flexibility was also a priority in technical terms: we’re making this data available for the first time in this format, so we are aware that we don’t really know what people will want to do with it. We don’t see what we have done with the demonstrator as being the last word but rather the first. Based on this, we can start to understand the data better and start to understand how people might want to access it. We expect to have to adapt the data and the ways of accessing it as we go along and we learn what we can most usefully provide to the community.
The process of gathering data has been described in another post, but from the demonstrator’s point of view what was important was to try to keep things as general and adaptable as possible. Nevertheless, this kind of historical data presents certain peculiarities and challenges. One of the most obvious is how to present the survey data. The surveys are arranged by county, but the counties that were used are not the counties as they are today. Indeed, the counties used in the first and second phases of the county surveys are not the same as each other. So we needed a mechanism which would allow people to make sense of the data without being restrictive. We’ve achieved this by providing a canonical list of counties taken from Ordnance Survey data from the early 19th century. We then map this list to the actual counties as surveyed. There’s not a perfect match here but we take a “permissive” view of the data – we’d rather show you slightly too much than too little. So the user is presented with the canonical list in the search facility and we then map that to the county data to decide what to show. The same holds for the author data. We hold a canonical list of authors and map these to the authors as recorded. This allows us to adjust the data in future as we discover more about it.
The Data Model
This mapping then gives rise to the data model. We have surveys, each of which has a county associated with it. Then we have a list of counties which we present to the user, each of which may map to more than one of the underlying counties. That can get a bit confusing, but an example makes it clear. If we want to look at the surveys for Shetland, then in the filter list we have “Zetland or Shetland”, which is how it is listed in the Ordnance Survey data. In the first phase of the surveys, Shetland was included under “Northern Counties and Islands” but in the second phase it has a survey of its own. The implication of this for the data model is that we have to have a one-to-many mapping from entries in the search list to the entries in the surveys. In fact, the same county survey might appear under more than one search term, e.g. the first phase “Northern Counties and Islands” needs to appear under Shetland, Orkney, Caithness and Sutherland. So we have to have a many-to-many mapping between the search counties in the interface and the counties as specified in the surveys themselves. To do this we adopt the standard database approach of having a mapping table between the two.
In the data model, ccounty is the list of counties as it appears in the search list and county is the list of counties as they appear in the surveys. The mapping table allows us to relate these two to each other in any way we want. Each survey can have many publications and each publication can be held in multiple places. This is why we have separated out surveys, publications and holdings in the data model.
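To make the mapping concrete, here is a minimal sketch of the county part of the model, using Python and an in-memory SQLite database purely for illustration. It is not the demonstrator's actual schema or code: only the ccounty and county table names come from the description above, while the ccounty_county mapping table, the column names, the survey table layout and the sample rows (including the second-phase county name "Shetland") are assumptions. Publications and holdings are left out.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ccounty (id INTEGER PRIMARY KEY, name TEXT NOT NULL);  -- search list
CREATE TABLE county  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);  -- as surveyed
CREATE TABLE ccounty_county (            -- hypothetical name for the mapping table
    ccounty_id INTEGER NOT NULL REFERENCES ccounty(id),
    county_id  INTEGER NOT NULL REFERENCES county(id),
    PRIMARY KEY (ccounty_id, county_id)
);
CREATE TABLE survey (                    -- hypothetical, simplified survey table
    id        INTEGER PRIMARY KEY,
    county_id INTEGER NOT NULL REFERENCES county(id),
    phase     INTEGER NOT NULL
);
""")

# Sample rows based on the Shetland example; names are illustrative.
conn.executemany("INSERT INTO ccounty VALUES (?, ?)", [
    (1, "Zetland or Shetland"), (2, "Orkney"), (3, "Caithness"), (4, "Sutherland"),
])
conn.executemany("INSERT INTO county VALUES (?, ?)", [
    (1, "Northern Counties and Islands"), (2, "Shetland"),
])
# Many-to-many: one surveyed county sits under four search terms,
# and one search term points at two surveyed counties.
conn.executemany("INSERT INTO ccounty_county VALUES (?, ?)", [
    (1, 1), (2, 1), (3, 1), (4, 1), (1, 2),
])
conn.executemany("INSERT INTO survey VALUES (?, ?, ?)", [
    (1, 1, 1),  # first phase: Northern Counties and Islands
    (2, 2, 2),  # second phase: Shetland on its own
])


def surveys_for(search_name):
    """Return (surveyed county, phase) pairs for one entry in the search list."""
    rows = conn.execute(
        """
        SELECT county.name, survey.phase
        FROM ccounty
        JOIN ccounty_county ON ccounty_county.ccounty_id = ccounty.id
        JOIN county ON county.id = ccounty_county.county_id
        JOIN survey ON survey.county_id = county.id
        WHERE ccounty.name = ?
        ORDER BY survey.phase, county.name
        """,
        (search_name,),
    )
    return rows.fetchall()


print(surveys_for("Zetland or Shetland"))
print(surveys_for("Orkney"))
```

In this sketch, searching for “Zetland or Shetland” returns both the first-phase “Northern Counties and Islands” survey and the second-phase Shetland survey, while searching for Orkney returns only the first-phase one. Adjusting what a search term shows is then a matter of adding or removing rows in the mapping table rather than changing the surveys themselves.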
This model might seem a little complex but it gives us a great deal of flexibility in how we handle counties and authors and makes it fairly easy to add new information about publications and holdings as it becomes available to us.
The Technology Chosen
In line with the ethos of flexibility, we decided to work with standard technology components. At the back end is a relational database. Sitting on top of that is a web application built using a standard MVC framework. This approach has advantages in terms of flexibility but also in terms of getting up and running quickly. The MVC (Model-View-Controller) approach separates out the storage of the data (the Model) from the logic of the application (the Controller) and from how the data is displayed (the View). This means that changing one part has less impact, because each part is isolated from the others. A good example of this flexibility is the change we made to the interface, which was covered in a previous post.
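The separation can be sketched in a few lines. The example below is a generic illustration in Python, not the demonstrator's code; every class and function name in it (Survey, SurveyModel, SurveyController, text_view, html_view) is hypothetical, and the sample data is made up.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Survey:
    county: str
    phase: int


class SurveyModel:
    """Model: owns the storage and retrieval of the data."""

    def __init__(self, surveys):
        self._surveys = list(surveys)

    def by_phase(self, phase):
        return [s for s in self._surveys if s.phase == phase]


def text_view(surveys):
    """View: decides only how the data is displayed (plain text)."""
    return "\n".join(f"{s.county} (phase {s.phase})" for s in surveys)


def html_view(surveys):
    """A second view over the same data (an HTML list)."""
    items = "".join(
        f"<li>{escape(s.county)} (phase {s.phase})</li>" for s in surveys
    )
    return f"<ul>{items}</ul>"


class SurveyController:
    """Controller: application logic that connects a request to model and view."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def list_phase(self, phase):
        return self.view(self.model.by_phase(phase))


model = SurveyModel([
    Survey("Northern Counties and Islands", 1),
    Survey("Shetland", 2),
])

# Swapping the view changes the presentation without touching model or controller.
print(SurveyController(model, text_view).list_phase(1))
print(SurveyController(model, html_view).list_phase(2))
```

The point of the sketch is the last two lines: the same model and controller serve two different presentations, which is the kind of isolation that makes an interface change cheap.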
The MVC approach is one of the standard development techniques for web applications these days, and when it comes to implementing it you have a wide choice of languages and MVC systems. In our case, it’s all written in Perl, using Postgres for the database with a Catalyst application on top. The application takes the standard Catalyst approach of using DBIx::Class to implement the Model and interface to the database, and Template Toolkit for the front end. The choice of specific MVC implementation doesn’t matter so much – there are plenty to choose from! It’s really the flexibility this approach gives which is the main thing. Using standard technologies lets us make the data available quickly and then adapt to whatever changes come out of that down the line.
Evolution by Use
So this demonstrator gives people a way to look at the data. We’re hoping people will find it helpful in “playing with” the data. But it’s very much the first draft. We expect it to evolve over time as we, and anyone else interested in the surveys, get to know the data better and start to understand more about how to make it available to people.