Extracting data from the Web

Bob Mehew

Active member
I am after help from someone who can write code to extract data from the web. Put simply, I am interested in several web pages that give access to data in a large number of files (thousands), each named using a simple coding system. What I would like is a program (in Python or other free software) that can go to the specified site(s), download a file and search it for certain words. If the words are present, the file is kept and its web address recorded in a file; if not, the file is deleted. The program should then make up the next file name and repeat the task. Obviously it will have to cope with a null return for a file name, plus no doubt other error messages. If anyone thinks they can help, would they PM me for further details? It is of course caving related, but I do not wish to broadcast what I have in mind.

Many thanks in anticipation.
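A minimal sketch of the loop described above, in Python using only the standard library. The URL pattern (`BASE_URL`), the five-digit zero-padded naming scheme in `file_names`, the one-second delay and the rule that *any* listed word counts as a match are all assumptions, since the real site and coding system were not given. A missing file (HTTP 404) is treated as the "null return" and skipped quietly; other HTTP or network errors are reported and skipped.

```python
import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, Optional

# Hypothetical URL pattern: the real site and file-naming scheme were not given.
BASE_URL = "https://example.org/archive/{name}.txt"


def file_names(start: int, stop: int, width: int = 5) -> Iterable[str]:
    """Yield sequential names '00001', '00002', ... (an assumed coding system)."""
    for n in range(start, stop):
        yield str(n).zfill(width)


def fetch(url: str, timeout: float = 30.0) -> Optional[bytes]:
    """Download one file; return None if it is missing or the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # A 404 is the expected "null return" for a name with no file behind it.
        if err.code != 404:
            print(f"HTTP {err.code} for {url}")
        return None
    except OSError as err:
        # URLError, timeouts and connection resets are all subclasses of OSError.
        print(f"Failed to fetch {url}: {err}")
        return None


def contains_words(data: bytes, words: Iterable[str]) -> bool:
    """Case-insensitive test; assumes ANY one of the words counts as a match."""
    text = data.decode("utf-8", errors="replace").lower()
    return any(word.lower() in text for word in words)


def scan(
    names: Iterable[str],
    words: Iterable[str],
    out_dir: Path,
    log_path: Path,
    fetcher: Callable[[str], Optional[bytes]] = fetch,
    delay: float = 1.0,
) -> list[str]:
    """Fetch each named file, keep the matches and append their URLs to a log."""
    words = list(words)  # iterated once per file, so materialise it
    out_dir.mkdir(parents=True, exist_ok=True)
    hits = []
    with log_path.open("a", encoding="utf-8") as log:
        for name in names:
            url = BASE_URL.format(name=name)
            data = fetcher(url)
            time.sleep(delay)  # be polite to the server between requests
            if data is None or not contains_words(data, words):
                continue  # nothing was written, so nothing needs deleting
            # Assumes text files; the .txt suffix is part of the hypothetical scheme.
            (out_dir / f"{name}.txt").write_bytes(data)
            log.write(url + "\n")
            hits.append(url)
    return hits
```

For example, `scan(file_names(1, 1000), ["sump", "survey"], Path("matches"), Path("found_urls.txt"))` would check names 00001 to 00999. Rather than saving every file and deleting the non-matches, the sketch holds each download in memory and only writes the ones that match, which has the same end result. The `fetcher` parameter is there so the search logic can be tried out without touching the network.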
 

aricooperdavis

Moderator
Hi Bob,

Do you want to send me an email with the details? This sort of web scraping is relatively straightforward in Python. Whether you can search for words in the documents depends on the file format.

All the best,
Ari
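Whether the word search works depends, as Ari says, on the file format. Plain text can be searched directly, but in an HTML page the words may sit among tags or scripts. A sketch like the one below, assuming the files are HTML and using only the standard library `html.parser` and `re` modules, pulls out the visible text first and then looks for whole words. The names `TextExtractor`, `html_to_text` and `find_words` are illustrative. Binary formats such as PDF or Word documents would need a separate text-extraction step, which is not shown.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the page's visible text with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def find_words(text: str, words: list[str]) -> set[str]:
    """Return the words that occur as whole words, ignoring case."""
    return {
        w for w in words
        if re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE)
    }
```

Matching whole words means that searching for "survey" will not fire on "surveyed"; drop the `\b` anchors if partial matches are wanted.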
 

Bob Mehew

Active member
Many thanks to Ari, who has done one part of the task, and also to Benfool, who is now working on the other part.
 