Wanted programming help in sorting out data

Bob Mehew

Well-known member
I am looking for help from someone to write a program, preferably in Python, to sort two sets of time-based data into order.  There are (at least) two complicating factors.  The first is that the date and time format is likely to differ between the two sets of data (e.g. 22/08/2019 09:32:23 and 22/08/19 9:33, and no doubt other variants).  The second is that the data is usually in txt or csv format but uses different column delimiters (e.g. tab, comma, space).  One data set also contains the occasional null value for CO2, plus a host of program progress lines which need ignoring.
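As a rough illustration of the parsing side, here is a minimal sketch using only the standard library. The list of date formats is a guess based on the two examples above, and the assumption that the CO2 value sits in the first column after the timestamp is purely hypothetical; `parse_timestamp`, `split_fields` and `parse_row` are illustrative names, not anything from the loggers' own software. Lines that do not start with a recognisable timestamp (progress messages) or that carry a null CO2 value are simply skipped.

```python
from datetime import datetime

# Candidate formats: an assumption based on the two examples in the post.
# Extend this tuple as other variants turn up.
DATE_FORMATS = (
    "%d/%m/%Y %H:%M:%S",  # 22/08/2019 09:32:23
    "%d/%m/%y %H:%M",     # 22/08/19 9:33
    "%d/%m/%Y %H:%M",
    "%d/%m/%y %H:%M:%S",
)


def parse_timestamp(text):
    """Return a datetime for the first format that matches, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def split_fields(line):
    """Split on tab if present, else comma, else any run of whitespace."""
    line = line.strip("\r\n")
    if "\t" in line:
        return [f.strip() for f in line.split("\t")]
    if "," in line:
        return [f.strip() for f in line.split(",")]
    return line.split()


def parse_row(line):
    """Return (datetime, co2) or None for lines that should be ignored.

    Assumes (hypothetically) that CO2 is the first column after the timestamp.
    """
    fields = split_fields(line)
    if not fields:
        return None
    when, rest = parse_timestamp(fields[0]), fields[1:]
    # Date and time may have been split into two fields by the delimiter.
    if when is None and len(fields) >= 2:
        when, rest = parse_timestamp(fields[0] + " " + fields[1]), fields[2:]
    if when is None or not rest:
        return None  # progress line or timestamp with no data
    try:
        return when, float(rest[0])
    except ValueError:
        return None  # null or non-numeric CO2 value
```

Running every line of a file through `parse_row` and keeping the non-`None` results gives a list of (time, CO2) pairs that can then be sorted with `sorted(...)`.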

The purpose of the program is to take two sets of in-cave CO2 and other data (pressure, temperature and relative humidity) obtained at different time intervals over a common time period and, for each sample in the less frequent data set, extract the CO2 reading from the more frequent data set that is closest in time.  I then want to do some statistical checking of how well the two sets of CO2 readings correlate.  That might get complicated, as a trial stab at it has indicated there could be a time lag between the two data sets.
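The nearest-in-time extraction could be sketched roughly as follows, assuming both data sets have already been reduced to lists of (datetime, CO2) pairs. The function names are hypothetical, and a binary search via the standard `bisect` module is just one way of doing the lookup.

```python
from bisect import bisect_left
from datetime import datetime


def nearest_reading(times, values, target):
    """Return the (time, value) pair whose time is closest to target.

    times must be sorted ascending and be the same length as values.
    """
    if not times:
        raise ValueError("no readings to search")
    i = bisect_left(times, target)
    # Only the neighbours either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
    best = min(candidates, key=lambda j: abs(times[j] - target))
    return times[best], values[best]


def match_nearest(frequent, sparse):
    """Pair each sparse sample with the closest frequent sample.

    Both arguments are lists of (datetime, co2).
    Returns a list of (sparse_time, sparse_co2, frequent_time, frequent_co2).
    """
    frequent = sorted(frequent, key=lambda row: row[0])
    times = [t for t, _ in frequent]
    values = [v for _, v in frequent]
    matched = []
    for when, co2 in sorted(sparse, key=lambda row: row[0]):
        f_time, f_co2 = nearest_reading(times, values, when)
        matched.append((when, co2, f_time, f_co2))
    return matched
```

Keeping the matched time alongside the value makes it easy to check afterwards how far apart the paired readings really were, and to drop any pair where the gap is too large to be meaningful.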

If you are interested in helping develop a CO2 logger for use in caves, then please PM me.

Many thanks in anticipation.
 

andrewmcleod

Well-known member
This sort of thing is easy in Python (although everything is great in Python  :ang: ); send me some data and I can knock something up. Is use of numpy (the standard package for doing proper number work in Python) OK? (It can easily be done without it.)

It's also fairly easy to interpolate a time sequence on one set of intervals (e.g. every 4 mins past the hour) onto a different time sequence (e.g. on the hour) for comparison (I have done this for logger data before for someone) and you can either average to longer intervals or interpolate to shorter intervals (although obviously you don't get new data by doing that!).
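A minimal sketch of the interpolation idea, assuming numpy is available and that each series is a sorted list of datetimes with a matching list of readings. `to_seconds` and `resample` are illustrative names only. Note that `np.interp` does plain linear interpolation and holds the end values constant outside the original time range, so points beyond the ends should be treated with suspicion.

```python
from datetime import datetime

import numpy as np


def to_seconds(times, origin):
    """Convert datetimes to float seconds elapsed since origin."""
    return np.array([(t - origin).total_seconds() for t in times])


def resample(times, values, new_times):
    """Linearly interpolate readings onto a different set of times.

    times must be sorted ascending; returns a numpy array, one value
    per entry in new_times.
    """
    origin = times[0]
    return np.interp(
        to_seconds(new_times, origin),
        to_seconds(times, origin),
        np.asarray(values, dtype=float),
    )
```

For example, readings of 600 at 09:00 and 640 at 09:04 interpolated to 09:01 give 610.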

Having done the mapping onto equivalent sampling intervals you could get the cross-correlation as a function of time delay to estimate any lag.
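The lag estimate could look something like the sketch below, assuming both series have already been put on the same regular time grid and contain no gaps. `estimate_lag` is a hypothetical helper, and the sign convention (positive means the second series trails the first) is a choice made here for illustration.

```python
import numpy as np


def estimate_lag(a, b, step_seconds):
    """Estimate by how many seconds series b trails series a.

    a and b are readings on the same regular grid with spacing
    step_seconds. A positive result means b is delayed relative to a.
    """
    # Remove the means so the peak reflects shape, not the overall level.
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    # Full cross-correlation: one value for every possible shift.
    xcorr = np.correlate(b, a, mode="full")
    lags = np.arange(-(len(a) - 1), len(b))
    return float(lags[np.argmax(xcorr)] * step_seconds)
```

The resolution of the answer is limited to the grid spacing, so a lag of a few minutes will only show up clearly if the common sampling interval is shorter than that.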

(Although by Python I assume you mean Python 3 these days.)
 

Bob Mehew

Well-known member
Lexik has already started on the task so thanks for the offer.  I have PMed you.

Yes, Python 3 would be preferable and numpy is OK.  (I run Spyder under Anaconda3, though that is causing me some irritation these days by failing to update.)

Thanks for the thoughts about interpolation; I might come back to you.  But it looks like the lag which has been seen is a mixture of the BST v GMT setting plus a mismatch of a number of minutes between the clocks in each logger  :-[
 