Wanted Computer Program help

Bob Mehew

Well-known member
I have a five column by 1,000,000 row csv file data set which I would like to trim using Excel. (The columns represent 4 different measurements with respect to time.) Whilst I can see how to manually read in the data, create a chart of measurements with respect to time and hence identify the start and finish time of the sub set of data I want, I have no experience of VBA to write a code which will do that, then extract a sub set of data given the start and finish times and save it as another excel file. Can some one help me? (It would be a nice addition to also be able to cull the sub set of data to a specified one in some many, as well as multiply individual columns by a specified constant to transform the units of the data.) And yes it is connected with rope testing. A typical csv file in zip form can be downloaded from here .
 

alanw

Well-known member
I'm afraid I can't help with VBA/Excel, but it might be really easy using SQLite.

If you don't want to try installing it under Windows, if you give me the selection parameters and scaling factor I can do it for you.

SQL:
$ sqlite3 13_BPC_46.sqlite
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.

sqlite> CREATE table BPC ("time" REAL, "A" REAL, "B" REAL, "C" REAL, "D" REAL);

sqlite> .import --csv 13_BPC_46.csv BPC

sqlite> .mode csv
sqlite> .output foo.csv

sqlite> SELECT time, A * 10, V2, V3, V4 FROM BPC WHERE time > 1.0 AND time < 1.00002;

sqlite> .q

$ cat foo.csv
1.00000198,-15.30704,2.73654,4.25528,0.0
1.00000398,-15.32902,2.73654,4.251617,0.00305213
1.00000598,-15.31681,2.740203,4.261995,0.0
1.00000798,-15.29484,2.743865,4.251617,-0.00366256
1.00000998,-15.32902,2.747528,4.244292,-0.00366256
1.00001198,-15.32902,2.757905,4.244292,0.0
1.00001398,-15.33879,2.757905,4.251617,0.0
1.00001598,-15.32902,2.768893,4.247955,0.00305213
1.00001798,-15.32902,2.76523,4.251617,0.00671469
1.00001998,-15.32902,2.77927,4.244292,-0.00732511
$
 

royfellows

Well-known member
Hi Bob, I have just read this so this is a quick response. I think you will find more muscle and flexibility in MS Access, VBA is the language of Access from V2000. Prior to that is was Access Basic which is similar but not quite the same. Access has inbuilt wizards that can inport data into Access tables from a lot of formats including .xls, .csv, .txt etc. Once into Access tables, recorsets can be parsed in VBA doing just abourt anything you want. Its a very powerfuul high level language. However, a warning. As I have said many times in the past Access is deceptively easy to get into. Its like a beach on a nice sunny day and sea looks tempting. So you go in for a paddle, another step and its up to your knees, another and up to your waist, and then the bottom takes a sudden plunge.
Thing is, once into Access and processed, you can then output as you wish using the method of DoCmd - DoCmd.OutputTo acOutputReport, "PageIndex", acFormatRTF, strFilePath, True

I am currently back on my high efficiency 4 mode driver project, and I have a bench example running in the next room. So the big thing here is time, so I dont know what to say.. I will down load your file and have a play, but no promises!
 

royfellows

Well-known member
I just walked away from my machine then came back with another thought. Once in Access data can be filtered, sorted and manipulated in different ways using SQL queries. SQL can be concatenated in VBA calling in whatever variables. Just a thought
 

royfellows

Well-known member
I have the file downloaded and will have a play tomorrow. Looking at it, you only have 5 columns of data, so it could be very simple. Basically it would be a straight forward import with each column being a field in a single table, but each row would be a record, so a lot of records. It may be possible to simply use an SQL Query with called in VBA procedures that are inbuilt. Example is the Left function, Left(3, 0,0) off the top of my head, could be wrong in the arguments but basically calling the first 3 characters on the left into a new query record. These could be outputted to a new table using an action query. Childs play compared to some of the stuff I have wrote such as APR in credit agreemenst and early settlement discounts.

I am training tonight but will have a play with it tomorrow. If you can capture my interest you have won half the battle.Will come back on it.
Reading you post again, I can see that you want it in Excell so as to create charts, looks interesting. By the way, maths works in SQL and VBA just as it would on a calculator, no magic involved, just so you know.
 

alanw

Well-known member
P.S. apologies for the above copy/paste error, the line should be
Code:
sqlite> SELECT time, A * 10, B, C, D FROM BPC WHERE time > 1.0 AND time < 1.00002;
 

lexik

New member
bob - I have some code that is specialized for reading/writing Excel/csv based data - it would probably be a simple job to do what you have asked - I'll have a bit of time this evening to do a demo extract on your sample data - I've got your email somewhere. I'll do a simple extract of the middle bit of the data and a bit of manipulation of the channel data as a proof of concept and drop you an email
 

Bob Mehew

Well-known member
Many thanks for your replies AlanW & Roy. I am a bit reluctant to go down the SQL / SQLite route as it ensnares me into yet another program which I was trying to avoid. Viewing the data is an important component as what I am trying to do is extract that part of the data which covers the large peak (and a bit more) in the attached image.

Thanks for the offer lexik, I will await your effort with interest.
 

Attachments

  • 13 BPC 46.jpg
    13 BPC 46.jpg
    84.3 KB · Views: 82

Edwardov

Member
I’d recommend Python for solving this, worth learning. You could even run someone’s code in Google Colab in your web browser to work on the file.
 

ChrisB

Active member
Bob, whatever software you use, you'll need to tell it the characteristics of the data you want to extract. For example, "find the max value in Channel C and extract 100 rows before and after" or " find the max value in channel C and extract the rows before and after where the value of Channel C is more than 25% of that" - there are many more complex patterns you could look for, such as involving multiple columns or multiple maxima, but until you can define them, there's no point trying automate the process. Looking at the zipped sample, I'm guessing you want something like
0.75- 0.85s, maybe 1.08-1.12s, and some more around 1.3 and just below 1.5, but mathematically I don't know what criteria would pick those out.

Unless you have a huge number of files, trimming manually is going to be quicker than defining what you want and writing code.

PS - you may not recognise my login, we had some detailed discussions about 15years ago on P anchor specs, and I've now retired from the industry we worked in. I've recently taken up caving again.
 

RobinGriffiths

Well-known member
If you want a quick and dirty thing in Excel just based on start and end time, apply a between Filter (Data toobar) to your time column. Copy results (unhidden rows) to a new worksheet and repeat n times. Then save each worksheet as separate csv.

If you really want to use VBA, investigate the Range and Value object properties.
 

petecaves

New member
Might also have a fiddle - be a useful project trying to get my head round more dotnet. A quick thing in Python could be done too. Might be interesting to see how the various solutions offered compare :)
 

Bob Mehew

Well-known member
I have done it in python but I wanted a simple approach for others who don't use python. Robin's comment about using the filter option in Data is almost spot on. I have also got several offers off thread as well as AlanW's helpful contribution. Many thanks to all for their offers and contributions.
 

Edwardov

Member
I have done it in python but I wanted a simple approach for others who don't use python. Robin's comment about using the filter option in Data is almost spot on. I have also got several offers off thread as well as AlanW's helpful contribution. Many thanks to all for their offers and contributions.
Look into Google Colab then. Anyone can run your code in their web browser without having to install anything. You can operate on files they they upload and you can download any outputs.
 

royfellows

Well-known member
I have the data into an Access table in 2 minutes and can do whatever you want with it using SQL or VBA, all I need are instructions.
 

royfellows

Well-known member
Me again. Am I correct in thinking that all you want to do is extract data that exceeds a certain value?
This would be 2 minutes!
 

bills

New member
Why don't you import it into R or Python, I think you will find the coding much easier than VBA and there is lots of internet support
 

Bob Mehew

Well-known member
Why don't you import it into R or Python,
The reason I chose Excel and VBA is that whilst I run Python, my colleagues do not. I was after a program which ran in Excel (and freeware like Libre Office), could do the job with the same amount of interface and did not frighten people off by being too complicated (how was your first experience of adding a module to python?). Lexik has almost finished answering my request whilst Roy Fellows, alanw and another person were most kind in offering alternative approaches but they did not meet the fundamental point of running in Excel. Oh and thanks to Robin Griffiths for pointing out the filter function which along with the indirect function, broke the back of the task. Scrolling down 1,000,000 rows was painful but now is automated.
 

bills

New member
When you mentioned the size of the columns I immediately thought of the two alternatives. I use R personally more than Python, though a colleague uses python. Compared to VBA I find R far easier to use and there are some very helpful web resources. If you are not in academia chatGPT is very helpful and will even write (and explain) R and python scripts for you. I use Conda and Jupyter Labs to run either R or Python, the Juptyter Lab notebook makes it a bit less frightening than runnng from a terminal
 
Top