Making survey data available

TheBitterEnd · Oct 20, 2011

I've been mulling this one over and was going to post something long-winded about some ideas I had for a standard for cave data interchange. But on reflection, Rhys, I think your Wiki approach is as good as it will get. Do make sure that you have paper copies; you can guarantee they will outlive your current computer, this forum and the Wiki format.

Rhys · Oct 21, 2011

The real challenge is actually just getting people to use any repository and convincing them that it really is a good thing to share, and therefore save, all their hard work for future generations of cavers.

Rhys

andrew · Oct 21, 2011

I am very doubtful about using a wiki. Cave data is made up of thousands of files, and as projects get big they get very complex.

In a repository you get a replica of all the data you want, with its structure intact.

To quote Julian Todd

" Add that the real issue is the layout, organization and commit access of the data -- not how it is hosted. The hosting can be migrated much more easily than changes in the data organization."

Here you can use a browser to look at and download the data (it is on KnowledgeForge, but that is only the host). With an SVN client you can get updates without having to download it all and, with the correct permissions, update parts. This example shows one of the two main structures that are used.

http://knowledgeforge.net/sesame/club/mmmmc/

This has notes in a parallel structure to the data files. The most important thing is to capture the original data: scanned notes etc. The electronic files are really derived data, as they come from the notes, but they do contain lots of important things, like corrected mistakes. The centreline and outputted surveys are not really needed; they use space and change often, but are sometimes included so that less technical users can help with the project. These last things are personal choices.

The alternative structure (notes with the survey data) is available at:

http://www.ubss.org.uk/cave_survey_archive.php#Cheddar

This is also stored in a repository, which currently has access for about 8 people. It is not visible publicly due to restrictions on the server, so we occasionally update the zipped file.

There are lots of groups using this sort of system nowadays. If you want to set one up, PM me and I will help you or put you in contact with someone who can.

The registry at http://www.cave-registry.org.uk/ is now up and I have added some data (now deleted). I just need to do a bit of work on permissions and some instructions. Are there any Windows TortoiseSVN users out there? I have a set of instructions I wrote for the Mulu repository we used in 2005, but they may be a bit dated and I do not have Windows any more, so someone updating them would be helpful.

TheBitterEnd · Oct 21, 2011

Rhys, I quite agree and that's what stopped me in my tracks with posting my "long winded" proposal - caving is a hobby and people will do the bits that interest them. I think most diggers and surveyors do want to be remembered but want to dig and cave, not fiddle around with flippin' computers.

Andrew, I really do understand where you're coming from. I use SVN daily at work and sometimes CVS and SourceSafe. But in the past we had SCCS and CMS, now Git is the new SVN ... on it goes.

The hosting can be migrated much more easily than changes in the data organization

That's the one to crack - data organisation. Looking through the misty mountain stuff there is a variety of file formats, some "readme.txt" files to indicate what the data is, etc. I'm sure the people currently using this data today know what it is, but how does someone get a handle on it in 20 years' time? This thread gives an idea of the potential problems. Also it only seems to be survey data. How do we tie it in to other information: stal dating, archaeology, dye traces, etc.?

What I was going to propose was a more structured "wrapper" around whatever data is available, but why would anyone bother?

graham · Oct 21, 2011

TheBitterEnd said:
Rhys, I quite agree and that's what stopped me in my tracks with posting my "long winded" proposal - caving is a hobby and people will do the bits that interest them. I think most diggers and surveyors do want to be remembered but want to dig and cave, not fiddle around with flippin' computers.

A hobby!!??!! Bloody Hell!

TheBitterEnd said:
Andrew, I really do understand where you're coming from. I use SVN daily at work and sometimes CVS and SourceSafe. But in the past we had SCCS and CMS, now Git is the new SVN ... on it goes.

They are all just tools. As Andrew says and you agreed.

The hosting can be migrated much more easily than changes in the data organization

TheBitterEnd said:
That's the one to crack - data organisation. Looking through the misty mountain stuff there is a variety of file formats, some "readme.txt" files to indicate what the data is, etc. I'm sure the people currently using this data today know what it is, but how does someone get a handle on it in 20 years' time? This thread gives an idea of the potential problems.

Haven't yet gone back to that thread, but the "twenty year" problem will be there whatever format is chosen. Have you ever tried to deal with ancient written data? Over the last ten years or so, we've dug out and computerised a lot of data going back to the 1940s and none of it was exactly in a format that would be used now. We solved most of the problems, but it could have been much easier if the recorders then had understood the problem - as we now do - and clearly documented what they had done.

TheBitterEnd said:
Also it only seems to be survey data. How do we tie it in to other information: stal dating, archaeology, dye traces, etc.?

You gotta start somewhere.

TheBitterEnd said:
What I was going to propose was a more structured "wrapper" around whatever data is available, but why would anyone bother?

Because "bother" is what some people do. I suggest you look at such wonderful resources as the Mendip Cave Registry and Archive at www.mcra.org.uk and then ask those responsible (not blowing my own trumpet here, I've barely contributed to this work) why they "bothered".

dave_the_cave · Nov 14, 2011

graham said:
TheBitterEnd said:

What I was going to propose was a more structured "wrapper" around whatever data is available, but why would anyone bother?

Because "bother" is what some people do. I suggest you look at such wonderful resources as the Mendip Cave Registry and Archive at www.mcra.org.uk and then ask those responsible (not blowing my own trumpet here, I've barely contributed to this work) why they "bothered".

If you want a structured open format for the data then you should consider the semantic web, linked open data and RDF. A collection of caving-related ontologies would be great, as would a standard URL for every cave and standard locations.

If you look at the MCRA, they provide a URL for every one of their sites, but it's just an arbitrary number:

http://www.mcra.org.uk/registry/sitedetails.php?id=1714

My version is not any better

http://www.bristolunderground.com/dpexplorer/dpexplorer.html?entrance=19

Both urls are clickable

Something similar for specific points inside a cave could be used to organise and present cave photo collections. This way survey data can be used for purposes other than surveys.

The linked open data initiative has made tools, ontologies and data available, such as Ordnance Survey data.

A collection of ontologies provides a flexible way of structuring data.

TheBitterEnd · Nov 15, 2011

I agree and I was going to post a proposal, but as you can tell by the tumbleweed blowing through this thread there would seem to be very little interest in such a thing. I also feel that most data producers would not see the benefit of the extra work involved and so would not bother implementing it.

graham · Nov 15, 2011

In 10 years, some of my data has been through three different data formats.

Surely what is important here is just to ensure that the data so painstakingly acquired survives; an end user can work out later what he may want to do with it. That cannot happen if it is lost.

TheBitterEnd · Nov 15, 2011

But in a way that is the point of an open standard. HTML pages from 1995 will still display in a browser today, ASCII files from the early 1960s are still viewable, and DXF has been around since 1982.

It's not about choosing one software package or another it's about saying, for example, all published data will be ASCII, survey legs will be of the form "From, To, Length, Bearing, Clino" etc. Then in 50 years time we will at least have a spec for what it is we are trying to read.
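As a rough illustration of what such a spec buys a future reader, here is a minimal Python sketch that reads plain-text legs of the form "From, To, Length, Bearing, Clino". The `Leg` type, the function names, the `#` comment convention and the units are all assumptions made for this example; none of them come from an agreed standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    from_station: str
    to_station: str
    length: float   # metres (assumed unit)
    bearing: float  # degrees (assumed unit)
    clino: float    # degrees, positive upwards (assumed convention)


def parse_leg(line: str) -> Leg:
    """Parse one 'From, To, Length, Bearing, Clino' line."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 comma-separated fields, got {len(fields)}: {line!r}")
    from_station, to_station, length, bearing, clino = fields
    return Leg(from_station, to_station, float(length), float(bearing), float(clino))


def parse_legs(text: str) -> list[Leg]:
    """Parse a block of legs, skipping blank lines and '#' comment lines."""
    legs = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            legs.append(parse_leg(line))
    return legs


# Made-up readings, purely for illustration.
sample = """\
# From, To, Length, Bearing, Clino
1, 2, 12.40, 045.0, -5.0
2, 3, 8.15, 110.5, 12.0
"""
legs = parse_legs(sample)
```

The point is less the code than the fact that a one-line description of the columns is enough to write a reader like this decades later.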

Another issue that things like the Mendip Registry seek to address is "What and Where", and here again an open standard for published/exported data would mean that in several decades' time we would know what the data was intended to mean.

JohnMCooper · Nov 15, 2011

Even an open standard changes.

I was brought up on the EBCDIC character set before moving on to ASCII and am currently watching the development of Unicode and ISO/IEC 10646 Universal Character Set (UCS) codes which permit 8 bit, 16 bit and 32 bit variants. The idea that an old fashioned 7 bit code can last much longer seems strange to me. Why should non-English characters be banned from survey data just to preserve an American instigated code?

As pointed out in another post, sumps are usually surveyed without a clinometer reading, using depth at station instead.

I agree with Graham that it's most important to get survey data recorded somehow in a way that will be understood in the future.

graham · Nov 15, 2011

John, shall I tell Andrew that you want to start giving passages Chinese names, or will you ...

JohnMCooper · Nov 15, 2011

As a member of Chelsea Spelæological Society caving on Llangattock with Pete's Café in Agen Allwedd and the Hard Rock Café in Ogof-y-Darren-Cilau I've only got as far as the extended ASCII character set so far!

TheBitterEnd · Nov 15, 2011

I don't really think EBCDIC is an open standard and it was developed alongside ASCII, not before it. It was/is an IBM Mainframe/AIX standard and even IBM didn't choose to carry it forward to the IBM PC.

Unicode has been around since 1989 and embodies ASCII (in that the first 128 code points are identical to ASCII), so it could be argued that it is an extension to ASCII, and therein lies the trick - an extensible standard. HTML 5 is the current version but HTML 1.0 will render on a HTML 5 compliant browser - it has been extended.
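The extensibility argument can be checked directly. Unicode keeps the 128 ASCII code points unchanged, and UTF-8 encodes each of them as the same single byte, so an old ASCII file is already a valid UTF-8 file. A small Python sketch (the survey line and the variable names are made up for illustration):

```python
# Every 7-bit ASCII byte decodes to the same character under ASCII and UTF-8.
ascii_bytes = bytes(range(128))
same_text = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# An ASCII-only survey line reads unchanged when treated as UTF-8.
line = b"1, 2, 12.40, 045.0, -5.0"
as_utf8 = line.decode("utf-8")

# Names with non-English characters are outside ASCII but fine in UTF-8;
# the accented letter simply takes two bytes instead of one.
name = "Pete's Caf\u00e9"
encoded = name.encode("utf-8")

try:
    name.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
```

So a standard that says "UTF-8 text" would accept every existing ASCII data file as-is while still allowing accented passage names.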

But this is just idle chat, I doubt there is any real interest in moving this forward

dave_the_cave · Nov 15, 2011

TheBitterEnd said:
I agree and I was going to post a proposal, but as you can tell by the tumbleweed blowing through this thread there would seem to be very little interest in such a thing. I also feel that most data producers would not see the benefit of the extra work involved and so would not bother implementing it.

Actually there is quite a lot of data out there already (for cave locations). There are a lot of Google Maps based web pages showing cave locations. Recently there were GPX versions of the locations. It is quite easy to web scrape or perform the format conversions to get some data. So we could just do it for cave locations.
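To give an idea of how small such a format conversion can be, here is a hedged Python sketch that pulls waypoint names and coordinates out of a GPX 1.1 file using only the standard library. The sample file, its cave names and its coordinates are invented for the example, and `read_waypoints` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by GPX 1.1 documents.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Hypothetical GPX export; names and coordinates are made up for illustration.
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="54.2000" lon="-2.5000"><name>Example Pot</name></wpt>
  <wpt lat="54.2100" lon="-2.5100"><name>Example Hole</name></wpt>
</gpx>
"""


def read_waypoints(gpx_text: str) -> list[dict]:
    """Return one dict per <wpt> with its name, latitude and longitude."""
    root = ET.fromstring(gpx_text)
    waypoints = []
    for wpt in root.findall("gpx:wpt", GPX_NS):
        waypoints.append({
            "name": wpt.findtext("gpx:name", default="", namespaces=GPX_NS),
            "latitude": float(wpt.get("lat")),
            "longitude": float(wpt.get("lon")),
        })
    return waypoints


locations = read_waypoints(SAMPLE_GPX)
```

What this gives you is a list of named points, nothing more; it says nothing about what each point means.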

TheBitterEnd · Nov 16, 2011

But there is a world of difference between an ontology and something scraped off a website.

TheBitterEnd · Nov 16, 2011

FWIW my thinking is to define a common standard (similar to HTML, DXF, KML) which at its simplest wraps some lightweight structure around available "raw" data, with the option to extend in terms of both breadth and depth - particularly to make the data more structured over time and add new data sets to existing information. It would be possible (I believe) to export/convert data from standard surveying software into this format. However, ideally tools like Survex/Therion would embrace this model at a deeper level and add functionality to better decompose survey data into this format.

The fundamental object that this proposal seeks to model is a "void". At its simplest the data required is a CaveSystem (with a name) and one void (with a name). The primary assumption is that having discovered and named a void it forms the basis off which all other data can be hung. Consistent naming is critical and could be driven by an ontology.

Even unsurveyed holes in the ground can use this data model. A void can have "referenceMaterial" which may be anything - a text description, a link to a web site, a reference to a paper survey book. A void can have survey data, which could be either in the format defined in this specification or it could be a block of data from a survey application (with application name and version).

Just to be clear, a void is whatever the data author defines it as (within the constraints of it being a space in the earth). So "West Kingsdale" could be a void but so could Rowten Sump. However, the more granular the data the more useful it is, i.e. a "good" definition of a void might be that it is a space of single character (chamber, shaft, aven, rift, bedding, phreatic tube) and that a new void definition is used for each change of character.

This proposal aims to model a cave system as a series of connected voids. Again these need not have survey data, but if they do it would be expected that there would be a block of survey data per void. The connection between voids can have a description, for example the connection of Lancaster Hole to Bridge Hall might be "south west through slot or along ledge above slot" and Bridge to Lancaster "north east at top of large boulder slope".

Voids can be grouped for a particular purpose, so for example if you wanted to describe a route through a cave, you could create a void group that indicates the set of voids that make up the route. Similarly it would be possible to group voids for conservation status or ecological purposes. Voids could also be individually tagged by these categories and other data such as exploration history, diggers etc. could be added.
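The model described so far (named voids, described connections between them, and purpose-specific groups) can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the proposal: the class and method names are invented here, and the id "GB.TC.EG.BridgeHall" and the group name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    to_void: str           # uniqueId of the connected void
    description: str = ""  # free-text route description


@dataclass
class Void:
    unique_id: str
    name: str
    character: str = ""    # e.g. "chamber", "shaft", "aven"
    connections: list[Connection] = field(default_factory=list)


@dataclass
class CaveSystem:
    name: str
    voids: dict[str, Void] = field(default_factory=dict)
    groups: dict[str, list[str]] = field(default_factory=dict)  # group name -> void ids

    def add_void(self, void: Void) -> None:
        self.voids[void.unique_id] = void

    def connect(self, from_id: str, to_id: str, description: str = "") -> None:
        """Record a one-way connection; call twice for both directions."""
        if from_id not in self.voids or to_id not in self.voids:
            raise KeyError("both voids must be added before connecting them")
        self.voids[from_id].connections.append(Connection(to_id, description))

    def neighbours(self, void_id: str) -> list[str]:
        return [c.to_void for c in self.voids[void_id].connections]


system = CaveSystem("Easegill")
system.add_void(Void("GB.TC.EG.Lancaster", "Lancaster Hole", "shaft"))
system.add_void(Void("GB.TC.EG.BridgeHall", "Bridge Hall", "chamber"))
system.connect("GB.TC.EG.Lancaster", "GB.TC.EG.BridgeHall",
               "south west through slot or along ledge above slot")
system.connect("GB.TC.EG.BridgeHall", "GB.TC.EG.Lancaster",
               "north east at top of large boulder slope")
system.groups["example route"] = ["GB.TC.EG.Lancaster", "GB.TC.EG.BridgeHall"]
```

Keeping connections one-way lets each direction carry its own description, which is how the Lancaster Hole / Bridge Hall example is worded.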

Survey data and reference material have an IPR element which defines the rights holder, date, license and a release date. This could be applied to other elements as necessary.

I know this sounds like a lot of work, but it starts with adding a bit of a standard "header" to whatever data we have now and then develops over time. I know Mendip has a cave registry and, from what I have seen of it, it stores the kind of data suggested here, so it could be output to this format.

The real "upshot" of this (and to answer Rhys's original question), if we can agree a common format that is flexible enough to accommodate future needs, then there can be any number of distributed repositories and if/when people move on, the data will be easy to integrate into other repositories and so will not be lost. Obviously a central repository at the BCRA or some such organisation would be a good back-up.

Here is a Relax NG Schema for this (not yet complete) ...

Code:

start = CaveSystem
CaveSystem = element caveSystem { 
    attribute name { text },
            
    element void {
        attribute uniqueId { text },             ## not necessarily the standard "IT" guid, 
                                                 ## possibly built up - GB.TC.EG.Lancaster
        attribute name { text },
        attribute character{
            string "rift"    | 
            string "bedding" | 
            string "tube"    | 
            string "chamber" | 
            string "aven"    | 
            string "shaft"  
        },
        VoidRef*,                                ## connected voids
        element terminus {                       ## some kind of system end point  
            attribute type{
                string "entrance" | 
                string "sump"     | 
                string "choke"  
            },
                                                 ##if type is entrance then details can be supplied
            Entrance* 
        }*,
        SurveyStation*,
        Graphics*,
        ReferenceMaterial*
    }+,    
    
    element survey {
        attribute surveyors   { text },
        attribute date       { text },
        attribute instrument { text },
        attribute correction { text },
        
        ((element station {
            attribute name  { text },
            attribute left  { xsd:double },
            attribute right { xsd:double },
            attribute up    { xsd:double },
            attribute down  { xsd:double }
        }+,
        
        element traverse {            
            element leg {
                attribute fromStation { text },
                attribute toStation   { text },
                attribute bearing     { xsd:double },
                attribute distance    { xsd:double },
                attribute clino       { xsd:double }
            }+
        }+) | SurveyAppData),
        
        ReferenceMaterial*,
        IPR?
    }*,
    
    element voidGroup {
        attribute name         { text },
        attribute groupPurpose { text },
        VoidRef*
    }*
                
}

Entrance = element entrance {
        attribute name { text }, 
        
        element locale {
            attribute country { text },
            attribute region { text },           ## a system could have entrances in more than one country/region
            ( (attribute easting { xsd:int }, attribute northing { xsd:int })
            | (attribute latitude { xsd:double }, attribute longitude { xsd:double }) )
        }        
    }
                
                
VoidRef = element voidRef {                      ## a reference to another void
    attribute uniqueId { text },
    text
 }

 
ReferenceMaterial = element referenceMaterial {  ## a reference to external information
    attribute author { text },
    attribute date { text },
    IPR?,
    text
    
 }

SurveyStation = element surveyStation { attribute station { text } }

SurveyAppData = element surveyAppData{
    attribute application { text },
    attribute version { text },
    text                                         # the raw data block exported by the application
}


Graphics = element graphics { 
    element line      {Point, Point }*,
    element polyline  { Point+ }*,
    element polygon   { Point+ }*
}

IPR = element ipr {
    attribute rightsHolder { text },
    attribute date { text },
    attribute releaseDate { text },              ## the date at which the author has 
                                                 ## granted rights for release to the public domain        
    attribute license { text }

}

Point = element point { attribute x {xsd:double}, attribute y {xsd:double} }
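For readers who want to see what a document in this format might look like, the following Python sketch builds about the smallest instance the schema appears to allow: one caveSystem containing one void with a piece of reference material. It has not been validated against the schema, and the author, date and reference text are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# Root element: the schema requires a name attribute and at least one void.
cave = ET.Element("caveSystem", {"name": "Easegill"})

# A void needs uniqueId, name and character; everything else is optional.
void = ET.SubElement(cave, "void", {
    "uniqueId": "GB.TC.EG.Lancaster",
    "name": "Lancaster Hole",
    "character": "shaft",
})

# Reference material can be anything, e.g. a pointer to a paper survey book.
# The author, date and text below are placeholders, not real records.
ref = ET.SubElement(void, "referenceMaterial", {
    "author": "A. Surveyor",
    "date": "2011-11-16",
})
ref.text = "Paper survey book, page 12"

# Serialise to a string that could be saved alongside the raw data.
xml_text = ET.tostring(cave, encoding="unicode")
```

Even an unsurveyed hole in the ground could be recorded this way, with survey blocks, graphics and groups added later.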

dave_the_cave · Nov 22, 2011

Sorry for the rather belated response (I have been down with flu and unsure how to answer)

An interesting format with the voids idea and admirably concrete.

I want some means of naming a cave chamber or passage - presumably this is catered for by the unique id. In different surveys this could be shown in a different location. If there was a standard means of identifying parts of a cave I could tag my photos with the identifiers.

I'll try and put a semantic web hat on (it does not really fit me)

The semantic web approach is somewhat more subtle - although it is yet another format, it is not really. It is designed for the situation where there are different silos of (caving) data, which will happen if cave data is kept (and silos cannot be avoided on the web). It is easy-ish to get syntactic data integration by simple transformations into the semantic web from different formats (I ought to transform the void-format XML to Turtle).

It gets further advantage in the integration task by allocating unique ids (urls) and also provides some tools for asserting the sameness on different entities and relations.

More subtleness follows because of the emphasis on establishing common ontologies - in the linked open data area respected authorities are publishing ontologies for spatial relations and location, enabling resource data to be expressed using them (e.g. Southampton), but allowing others to build their own from these common ontologies.
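A transformation of void data into Turtle could be as simple as the Python sketch below. Everything about the vocabulary is an assumption: the `cave:` namespace, the `cave:Void`, `cave:name` and `cave:connectedTo` terms and the example.org URLs are placeholders, since no caving ontology is named in this thread, and the id "GB.TC.EG.BridgeHall" is hypothetical.

```python
CAVE_NS = "http://example.org/cave#"    # hypothetical vocabulary
VOID_BASE = "http://example.org/void/"  # hypothetical URL scheme for voids


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def void_to_turtle(unique_id: str, name: str, connected_to: list[str]) -> str:
    """Describe one void and its connections as a block of Turtle triples."""
    lines = [
        f"<{VOID_BASE}{unique_id}> a cave:Void ;",
        f"    cave:name {turtle_literal(name)}",
    ]
    for other in connected_to:
        lines[-1] += " ;"
        lines.append(f"    cave:connectedTo <{VOID_BASE}{other}>")
    lines[-1] += " ."
    return "\n".join(lines)


document = "\n".join([
    f"@prefix cave: <{CAVE_NS}> .",
    "",
    void_to_turtle("GB.TC.EG.Lancaster", "Lancaster Hole", ["GB.TC.EG.BridgeHall"]),
])
```

Once every void has a URL like this, a photo collection or another survey can point at the same identifier.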

TheBitterEnd · Nov 23, 2011

Thanks for taking the time to read what I posted, it would be fantastic if someone picked up the baton and produced a semantic web implementation and ontology for caves.

You are correct in that I saw the Unique ID as the primary way of identifying a chamber or passage and thought it would be neat to have codes for country, region, cave system so that something like GB.TC.EG.BillTaylors would identify Bill Taylor's passage in the Easegill cave system in the Three Counties cave region in Great Britain. Or, as you suggest it could be a link into an ontology.
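A built-up identifier of this kind is easy to take apart and put back together. The Python sketch below assumes exactly four dot-separated parts (country, region, cave system, void), which is only one reading of the GB.TC.EG.BillTaylors example; the `VoidId` type and function names are invented for illustration.

```python
from typing import NamedTuple


class VoidId(NamedTuple):
    country: str
    region: str
    system: str
    void: str


def parse_void_id(unique_id: str) -> VoidId:
    """Split 'COUNTRY.REGION.SYSTEM.Void' into its four parts."""
    parts = unique_id.split(".")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected four non-empty dot-separated parts: {unique_id!r}")
    return VoidId(*parts)


def format_void_id(void_id: VoidId) -> str:
    """Join the four parts back into the dotted form."""
    return ".".join(void_id)


bill_taylors = parse_void_id("GB.TC.EG.BillTaylors")
```

One consequence of this layout is that the void name itself cannot contain a dot, which a real specification would need to state.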

I believe Rhys's original point was about addressing the problem of "I've created some cave data - Survey data, Graphics, Photos, digging logs, etc. how do I make this data available". I was looking at the problem from the other side - "I've found a hole in the ground, what is known about it". And I believe if Rhys's data was wrapped in something like the data structure I suggested (in whatever form) I could search for the information geographically, but equally well by digger, surveyor, or passage name or whatever (find me all unclimbed avens?).
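Searches like these become simple filters once the data is wrapped in a common structure. The Python sketch below runs two such queries over a small document written in the style of the schema posted earlier. The sample system, names and surveyors are invented, and since the schema has no "climbed" flag, the aven search only goes as far as finding all avens.

```python
import xml.etree.ElementTree as ET

# Invented sample data in the style of the proposed format.
SAMPLE = """
<caveSystem name="Example System">
  <void uniqueId="XX.YY.ZZ.Aven1" name="First Aven" character="aven"/>
  <void uniqueId="XX.YY.ZZ.Main" name="Main Chamber" character="chamber"/>
  <void uniqueId="XX.YY.ZZ.Aven2" name="Second Aven" character="aven"/>
  <survey surveyors="A. Caver; B. Digger" date="2011-11-01" instrument="compass" correction="0">
    <surveyAppData application="Survex" version="1.2"/>
  </survey>
</caveSystem>
"""


def voids_by_character(root: ET.Element, character: str) -> list[str]:
    """Names of all voids with the given character, in document order."""
    return [v.get("name") for v in root.iter("void") if v.get("character") == character]


def surveys_by_surveyor(root: ET.Element, surveyor: str) -> list[str]:
    """Dates of all surveys whose surveyors attribute mentions the given name."""
    return [s.get("date") for s in root.iter("survey") if surveyor in s.get("surveyors", "")]


root = ET.fromstring(SAMPLE)
avens = voids_by_character(root, "aven")
dates = surveys_by_surveyor(root, "B. Digger")
```

The same pattern would extend to geographic searches once entrances carry coordinates.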

cavermark · Nov 23, 2011

Rhys said:
The real challenge is actually just getting people to use any repository and convincing them that it really is a good thing to share, and therefore save, all their hard work for future generations of cavers.

Rhys

The kind of jargon in the previous few posts has already scared me off.

graham · Nov 23, 2011

cavermark said:
Rhys said:

The real challenge is actually just getting people to use any repository and convincing them that it really is a good thing to share, and therefore save, all their hard work for future generations of cavers.

Rhys

The kind of jargon in the previous few posts has already scared me off.

Yes, quite! I work with a repository of cave survey data which is shared, edited, added to and used by a number of people & I haven't got a clue what they are on about.

It's all very well having these airy fairy ideas but unless you are prepared to put the work in and produce a system that not only does what you want it to do but is also usable by the average caver and cave surveyor who, frankly doesn't give a shit about computing except as a tool they can sometimes use (few of them make their own compasses, either) then you are just wasting your time.
