Extracting data from the Web

Bob Mehew

Well-known member
I am after help from someone who can write code to extract data from the web. Simply put, I am interested in several web pages which offer access to a large number of files (thousands), each named using a simple coding system. What I would like is a program (in Python or other free software) which can go to the specified site(s), download a file and search it for certain words. If the words are present, the file is kept and its web address recorded in a file. If the words are not present, the file is deleted. The program should then make up the next file name and repeat the task. Obviously the program will have to cope with a null return for a file name, plus no doubt other error messages. If anyone thinks they can help, would they please PM me for further details? It is of course caving related but I do not wish to broadcast what I have in mind.

Many thanks in anticipation.


Hi Bob,

Do you want to send me an email with the details? This sort of web scraping is relatively straightforward in Python. Searching for words in the document may be possible, depending on the file format.
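To give a rough idea of the shape of it, here is a minimal sketch of the loop described above, using only the Python standard library. It assumes the files are plain text (or at least text-like) and that the file names differ only by a number; the function names, the output folder `kept`, the log file `matches.txt` and the one-second pause are all made up for illustration and not taken from any real site. Rather than downloading to disk and deleting afterwards, it holds each file in memory and only saves it if a word matches, which amounts to the same thing.

```python
import time
import urllib.request
from pathlib import Path


def fetch_bytes(url, timeout=30):
    """Return the body of url as bytes, or None if the file is missing or the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, ValueError):
        # URLError/HTTPError (e.g. 404) and socket timeouts are OSError subclasses;
        # ValueError covers a malformed URL. Treat all of them as "no file here".
        return None


def contains_any(data, keywords):
    """Case-insensitive check for any keyword; only meaningful for text-like files."""
    text = data.decode("utf-8", errors="ignore").lower()
    return any(word.lower() in text for word in keywords)


def scan(url_template, numbers, keywords, out_dir="kept",
         log_name="matches.txt", delay=1.0, fetch=fetch_bytes):
    """Try each numbered file, keep those containing a keyword, and log their URLs."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    matches = []
    for n in numbers:
        url = url_template.format(n=n)  # make up the next file name
        data = fetch(url)
        if data is not None and contains_any(data, keywords):
            # Keep the file under its original name and remember where it came from.
            (out / url.rsplit("/", 1)[-1]).write_bytes(data)
            matches.append(url)
        if delay:
            time.sleep(delay)  # be polite to the server between requests
    log_text = "".join(url + "\n" for url in matches)
    (out / log_name).write_text(log_text, encoding="utf-8")
    return matches
```

A call would look something like `scan("https://example.org/files/report_{n:04d}.txt", range(1, 5000), ["sump", "survey"])`, where the address and search words are placeholders. For formats such as PDF or Word the `contains_any` step would need a library that can extract the text first, so that part depends on what the files turn out to be.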

All the best,

Bob Mehew

Well-known member
Many thanks to Ari, who has done one part of the task, and also to Benfool, who is now working on the other part.