Digitisation of information contained in old books

Andrew N

Active member
Hello all,

I don't want to step on any toes here and this is purely a thought exercise for now.

Some old guidebooks that have not yet been superseded, such as Northern Caves 1, are incredibly useful but have now been out of print for a number of years. Obviously it's not difficult to obtain a copy of these on eBay and most clubs have a copy or three so thankfully the information is not lost. As the years go on, however, these books will get harder and harder to obtain.

What would be the copyright and ethical considerations with digitising the information contained within such books and making it freely available online, such as on a searchable database/website? Would the original authors of these books, as well as the publishing company, support this behaviour? Would it benefit the caving community at large?

I'm not talking about simply providing a PDF scan of the books, but rather lifting the information out of them, such as the cave locations, descriptions and length/depth information and allowing it to be viewed online on a dedicated website, and providing updated references/information/surveys where applicable if a cave has been extended since.

It is a project I am interested in doing but it seems it may be fraught with political implications and I totally respect that the information is the intellectual property of the people who produced it.

Thoughts welcome.
 

Wayland Smith

Active member
IMHO.
If you just digitize the books (without permission from the copyright owner),
that would be illegal.

If you take the information to use (after checking the facts) on a website.
Quoting as references.
That is research. (y)
 

Pitlamp

Well-known member
I applaud your well meant suggestion. But copyright law is a bit of a minefield I'm afraid (probably rightly so). You can get into quite serious trouble if you get it wrong and the fact that you're not making any money from the project isn't relevant. It might be better to keep life simple and just wait until the next edition of Northern Caves Volume 1 comes out?

Your question was general in nature though and you may have been thinking of other books for which new editions aren't planned. If that's the case you definitely need to get the blessing of the holder(s) of the copyright first, who will normally be listed near the front. If they have passed away, the copyright remains part of their estate, so you'd need to ask permission from the beneficiaries of the will of the deceased copyright holder(s). They may not be cavers and may have no idea of the high regard in which an out of print book and / or its authors are held by the caving community. So, if you ask in the right way, there's a good chance they'd be happy to give the permission you need. Finding them may be a minor challenge but asking the publisher is a good start (as they may still be sending a trickle of royalties). This forum may also be a good way of tracking down families of deceased cavers, who now hold the copyright.

(I'm not a legal type but I've had to look into copyright issues in the past and ask permission myself. I'm pretty sure the above is right but it's only a very brief summary of a complex subject. Hope it helps, if nothing else.)
 

Andrew N

Active member
Thanks for the thoughtful replies.

I think it's clear that copying written descriptions of caves contained within books would require permission if done at all. However, the legality and ethics of copying other information within the books - such as cave names, locations, lengths, and depths - is less clear to me.

If a book lists a cave named Imaginary Cave, located at A Grid Reference, with a length of 1 kilometre and a depth of 100 metres, are these datapoints the intellectual property of the person who collated and published them, or are they simply facts about the features of the land in a particular location? My gut feeling is that it would be both ethical and legal to copy this information and make it available, but I've been surprised by making assumptions about the law before!

I realise some websites, such as the excellent one that the CNCC provide, already possess some of this functionality - however the CNCC do not include minor caves/sinks and it wouldn't make much sense for them to do so.

I also know that the Northern Caves website, for the 'new' book, has a database similar to the one I am proposing except exclusively for the area covered by the book. Presumably such a website would also be produced for new editions of the book. I suppose the main questions as to whether it would be worth digitising this information (sans descriptions) would be:
  • Is anyone already doing/has done it? In which case can I assist? (Badlad alluded to this, I think)
  • Would the time between completing this project and the publication of new editions of guidebooks for an area (such as the new Northern Caves series) be long enough that it would actually be useful to anyone?
I'm acutely aware that I am very new in the caving community and that my ideas are surely not original. Upon reflection, it seems likely that such databases are already in progress. Nonetheless, it's often difficult to actually find the time to get out caving and an armchair project such as this to 'keep me interested' is appealing.
 

mikem

Well-known member
So, what is already available? You need to be aware that many of the grid refs in books aren't correct, although most are close enough, a few may be way out! & I'm sure I've missed a few that I've heard of & others that I haven't, who can suggest some more...

From CHECC website, map based on NC1:
NC2:
NC3:









So the project that suggests itself is joining all of these sources together - on typing in the cave name you get all the relevant links. The thing to be aware of is that this sort of project is a nightmare to keep current, as web addresses are changed over time, so best to go for the most general page that's useful.
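A minimal sketch of that "type in a cave name, get all the relevant links" idea, in Python. The source names and example.org URLs are placeholders rather than real registries, and the name matching is deliberately simple (lower-casing and collapsing whitespace):

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Lower-case a cave name and collapse whitespace so lookups are forgiving."""
    return " ".join(name.lower().split())


def build_index(sources):
    """Merge several {cave name: url} mappings into one name -> [urls] index."""
    index = defaultdict(list)
    for links in sources.values():
        for name, url in links.items():
            index[normalise(name)].append(url)
    return index


# Hypothetical data: placeholder source names and URLs, not real registries.
SOURCES = {
    "registry_a": {"Imaginary Cave": "https://example.org/a/caves"},
    "registry_b": {"imaginary  cave": "https://example.org/b/sites"},
}

index = build_index(SOURCES)
links = index.get(normalise("Imaginary Cave"), [])
print(links)
```

Storing one general page per source, rather than deep links, keeps the upkeep down when web addresses change.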
 

mikem

Well-known member
Almost forgot (although more S.Wales based):

But don't know of anything along the lines of what I suggested, probably because it's such a big job.
 

RobinGriffiths

Well-known member
All the regional bodies have online registries apart from the CNCC. Not sure how up to date they are kept though. I suspect both Wales and Derbyshire are kept up to date.

http://www.cambriancavingcouncil.org.uk/registry/ccr_registry.php (probably the most sophisticated)
https://registry.gsg.org.uk/sr/registrysearch.php (hosted by the Grampian)

It would be nice if they all had an open, standard API though so that caver/developers could produce additional applications and features using live data without recourse to scraping.

Robin
 

mikem

Well-known member
The registries are not all run by regional councils though. They also mostly include bibliographies etc that would be a massive job for the Dales - if publishing the rest of Northern Caves happens then it should cover (at least partially) this requirement.

On further reflection I decided a cave specific search engine would be an easier project, where it checks all mentions of a particular site within a curated list of webpages, with the option also to Google them (& extra filters for popular locations). People could then suggest other appropriate websites (& you'd only have to change one entry when domains are moved) - Google is great but it brings up far too many inappropriate suggestions for some names!
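One rough way to sketch the curated-list search in Python is to restrict an ordinary web search to the listed hosts with the `site:` operator. The two sites below are placeholders, and the function names are made up for illustration; only the list needs changing when a domain moves:

```python
from urllib.parse import urlencode, urlparse

# Hypothetical curated list; edit one entry here when a site changes domain.
CURATED_SITES = [
    "https://example.org/registry",
    "https://caves.example.net/",
]


def site_query(cave_name, sites):
    """Build a query for the exact cave name, limited to the curated hosts."""
    hosts = [urlparse(site).netloc for site in sites]
    restriction = " OR ".join(f"site:{host}" for host in hosts)
    return f'"{cave_name}" ({restriction})'


def google_url(cave_name, sites):
    """Turn the restricted query into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": site_query(cave_name, sites)})


query = site_query("Imaginary Cave", CURATED_SITES)
print(query)
print(google_url("Imaginary Cave", CURATED_SITES))
```

Quoting the cave name asks for an exact phrase match, which should cut down the unrelated results that common names bring up.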
 

aricooperdavis

Moderator
I did a bit of work with Bob Mehew looking at cave entrances on SSSIs, and for this we used a number of publicly accessible data sources. Where possible I accessed the data directly, but I did have to resort to web-scraping for a couple of sources. I did this overnight, used careful rate limiting, and requested only the bare minimum to minimise any negative impacts on the web hosts, but it still could be considered antisocial.

The problem with this approach is that it takes a long time so can't be done in real time as searches come in, it isn't very robust (minor changes to site layouts can mean the webscraper has to be redesigned), and it's not how the data-sources were intended to be used - in fact many have terms of use that explicitly prohibit web scraping, which I abided by.
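For sources whose terms do allow automated access, the rate limiting mentioned above can be sketched roughly like this in Python. It is not the code used for the SSSI work; `fetch_politely` and the five second default are assumptions, and `fetch` is whatever function actually retrieves a page:

```python
import time


def fetch_politely(urls, fetch, min_interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(url) for each URL, leaving at least min_interval seconds between calls."""
    results = {}
    last_call = None
    for url in urls:
        if last_call is not None:
            # Only wait for whatever is left of the interval since the last request.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results[url] = fetch(url)
    return results


# Usage with a stand-in fetch function, so no real site is contacted.
pages = fetch_politely(["page-1", "page-2"], fetch=lambda url: url.upper(), min_interval=0.1)
print(pages)
```

Passing `sleep` and `clock` in as parameters makes the delay logic testable without waiting in real time.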

Many of the regional councils use cave registry software that is based on a common design by Matt Voysey, which makes it easy to work directly with the databases that drive them as they all have a common layout. It would be interesting to see whether the BCA's Cave Registry & Archive working group could support a centralised access system for these distributed registries (with permission of the owners of each, of course!).
 

RobinGriffiths

Well-known member
So a nice simple REST API on each registry? Just 3 methods returning JSON.

GetAll() return list [ID, Lon, Lat, Name]
Search(params) return list [ID, Lon, Lat, Name]
GetDetails(ID) return {details for single location}
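A rough Python sketch of those three methods, backed by an in-memory dictionary instead of a real registry database. The records, coordinates and field names such as `length_m` are invented for illustration, and the web layer that would expose these as REST endpoints is left out:

```python
import json

# Hypothetical records; a real registry would read these from its own database.
SITES = {
    1: {"name": "Imaginary Cave", "lon": -2.0, "lat": 54.0, "length_m": 1000, "depth_m": 100},
    2: {"name": "Another Pot", "lon": -2.1, "lat": 54.1, "length_m": 50, "depth_m": 20},
}


def _summary(site_id):
    """Return the short [ID, Lon, Lat, Name] form used by the list methods."""
    site = SITES[site_id]
    return [site_id, site["lon"], site["lat"], site["name"]]


def get_all():
    """GetAll(): every site in summary form, as a JSON string."""
    return json.dumps([_summary(i) for i in sorted(SITES)])


def search(name_contains=""):
    """Search(params): sites whose name contains the text, case-insensitively."""
    needle = name_contains.lower()
    matches = [i for i in sorted(SITES) if needle in SITES[i]["name"].lower()]
    return json.dumps([_summary(i) for i in matches])


def get_details(site_id):
    """GetDetails(ID): the full record for a single site, as a JSON string."""
    return json.dumps({"id": site_id, **SITES[site_id]})


print(get_all())
print(search("pot"))
print(get_details(1))
```

Returning the same summary shape from both list methods means a client can draw map markers from either call and only request full details when a site is selected.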
 