Browser Caching

andrewmcleod

Well-known member
[gmod]Discussion on browser caching split from Cambrian Covid update[/gmod]


Stuart France said:
I'd just like to mention the subject of browser caching.  Technically, this is a major design flaw (I'm being polite here) which dates back to the days of dial-in internet connections which were very slow, but it remains the 21st Century default.  Simply turning caching off globally in your browser may simply replace one problem with another.  On a Windows browser you generally press F5 or click on Reload to fetch the latest page version, otherwise your browser will display any old cached page it still has, and you would not know whether the page you are then seeing has been superseded or not.

This leads to people complaining that 'your website never changes' or 'your information is obsolete' when that's not the case if only their browser presented the latest page version by default.  So news items may not get through to people unless they know about F5 etc.

We?ve just disabled browser caching across the entire Cambrian website (i.e. explicitly for every page) so when you next reload one of our pages (having pressed F5 etc for a final time to get it) then our pages will not cache again, so you?ll always see the latest version on future visits to our site.  If anyone notices any problems with this change, then please let our webmaster know via an email.

Stuart France
CCC C&A Officer




Caching in general is not an antiquated thing from the days of dial-up; it is an important part of the web. If no pages or content were cached, either locally or on servers, vastly more content would be downloaded than required. For example, various popular JavaScript libraries will be cached by your browser, so once you've seen a particular version you won't load it again even on a different website - let alone reloading every image, JavaScript library and CSS file every time you look at a new web page :p

Then on the server side caching is critical for major websites as part of load-balancing.

The trick is to make sure your page caching policies are correct so that dynamic content isn't cached :p
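That trick can be sketched in a few lines of Python (a toy dispatcher, not any real site's code; the file suffixes and header values are illustrative): long-lived static assets get a far-future max-age, while dynamic pages are marked no-cache so the browser revalidates them on every visit.

```python
def cache_headers(path: str) -> dict:
    """Choose a Cache-Control header based on the kind of resource."""
    static_suffixes = (".js", ".css", ".png", ".jpg", ".woff2")
    if path.endswith(static_suffixes):
        # Static assets: safe to cache for a year; change the URL to
        # force a re-download (cache busting).
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # Dynamic HTML: the browser may keep a copy, but must always
    # revalidate it with the server before showing it.
    return {"Cache-Control": "no-cache"}

print(cache_headers("/lib/jquery-3.6.0.min.js"))
print(cache_headers("/news/latest.html"))
```

Note that `no-cache` means "revalidate before use", not "never store" - that would be `no-store`.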

It's not a flaw, it's a (misused) feature :)
 

Stuart France

Active member
Anyone who possessed common sense would have implemented browser page caching in the following manner on every occasion by default:

1) compare the saved date of the browser's cached copy with the create date of the latest server version of the asset - which has very little overhead as only a few bytes move in either direction
2) if the latter is more recent than the former then automatically replace the cached page with a fresh download.

In this way a new download occurs only if the local copy is proven to be obsolete, and the computation and transmission penalty for obtaining that proof is negligible.  Unfortunately that is not the way it actually works.  Yes, I know you can optionally set a max-age on an HTML file so that it expires after an elapsed time measured in seconds from the moment the page is rendered, and you can set an expires date as an absolute date-time.  But this does not solve the fundamental problem: how to climb out of the lobster pot of seeing obsolete material for ever and ever, once your browser has cached a page that does not explicitly ask not to be cached.
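Incidentally, the two-step check described above is close to what HTTP's conditional GET mechanism does (an If-Modified-Since request header answered with 304 Not Modified when nothing has changed). A minimal Python sketch of the server-side decision - function and variable names are mine, purely illustrative:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def respond(asset_mtime, if_modified_since=None):
    """Decide between 304 (client's copy is still fresh) and 200 (send a
    fresh copy).  asset_mtime is when the server's copy last changed
    (a timezone-aware datetime); if_modified_since is the date header
    sent by the client, or None on a first visit."""
    if if_modified_since is not None:
        client_time = parsedate_to_datetime(if_modified_since)
        if asset_mtime <= client_time:
            return 304  # Not Modified - only headers travel, no body
    return 200          # modified (or no cached copy) - send the asset

mtime = datetime(2020, 11, 1, 12, 0, tzinfo=timezone.utc)
print(respond(mtime, "Sun, 01 Nov 2020 12:00:00 GMT"))  # 304
print(respond(mtime, "Sat, 31 Oct 2020 09:00:00 GMT"))  # 200
print(respond(mtime))                                   # 200 - first visit
```

Whether a given browser and server actually use this handshake by default is a separate question, which is where the complaints above come in.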

The default is that digital lobster pots get created, and that is just not a sensible design approach. Not everyone by any means is aware that webpage refreshing is an issue, let alone how to do it, and content providers who do not author webpages for a living may also be unaware of the need to insert extra 'meta' code into all their pages that are subject to potential future change (which is surely all of their pages?) to prevent them ever being cached which prevents their audience from receiving updated content automatically!

A couple of technical support queries have just come my way about how to refresh webpages on Apple computers running Safari.  I'm in the Windows world myself, but I understand that on a Mac you press the Command key (just left of the space bar) plus R at the same time, and on an iPad you can either tap the document image so that a popup menu appears where one of the options is to refresh the page, or you may tap on the refresh icon just right of the browser's URL textbox, which looks like an arrow in a broken circle.  Pressing F5 on a Mac is used for adjusting the screen brightness!
 

NewStuff

New member
Stuart France said:
On a Windows browser you generally press F5 or click on Reload to fetch the latest page version, otherwise your browser will display any old cached page it still has, and you would not know whether the page you are then seeing has been superseded or not.
F5 refreshes the page; Ctrl+F5 forces a full reload that bypasses the cache.
 

andrewmcleod

Well-known member
Stuart France said:
Anyone who possessed common sense would have implemented browser page caching in the following manner on every occasion by default:

1) compare the saved date of the browser's cached copy with the  create date of the latest server version of the asset - which has very little overhead as only a few bytes move in either direction
2) if latter is more recent than former then automatically replace the cached page with a fresh download.

Ah yes, the simple solution... I wonder why that hasn't been implemented in the constantly changing HTML and de-facto browser standards? :p

Because it's _not_ the best solution.

Checking a date might seem like it's easy - it's just a few bytes, right? :) Except that to make that, you may need to establish a TCP connection, which will need at least three TCP packets, which are all rather more than a few bytes, then make a HTTP or HTTPS request, which for HTTPS will involve negotiating a secure connection first (more bytes). And you have to do this for _every item of content on the page_ - every JS file and CSS file you link to and every image.

Some of this will be mitigated by persistent connections to a webserver you are connected to, but each individual element of the page (image, text, CSS, JS) may all be served by different servers, even if they are on the same domain name. In many cases, static files (images etc) will be served by a different thing (process on a server, different server) than the dynamic content. For example, many web servers run behind a reverse proxy, so users on the Internet only ever see the cache and the reverse proxy only queries the main web server for dynamic content as required. This is because the web server part is usually slower or not optimised for delivering static content to a large number of users simultaneously.
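That reverse-proxy split looks roughly like this in nginx terms (a sketch with hypothetical names throughout, not any particular site's config): the proxy serves static files straight from disk, and only dynamic requests reach the slower application server behind it.

```nginx
server {
    listen 80;
    server_name example.org;

    # Static assets: served directly by the proxy, never the app server.
    location /static/ {
        root /var/www;
        expires 1y;    # long-lived browser caching for fixed assets
        add_header Cache-Control "public, immutable";
    }

    # Everything else is forwarded to the dynamic back end.
    location / {
        proxy_pass http://127.0.0.1:8080;
        add_header Cache-Control "no-cache";   # always revalidate HTML
    }
}
```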

It would probably massively increase server load on the hosting for many popular JS files such as JQuery/Google stuff (reCaptcha and tracking) because instead of each browser downloading the file once, they will get hammered on _every page_ using that library version (probably hundreds of times as much traffic).

In many cases, it's probably more efficient to just download the file than to do the check, then if necessary download the data.

Fundamentally, if your website has been telling people 'this page is OK to cache' (which is the default), and then changing the page, you aren't playing by the right rules :)
 

Stuart France

Active member
You're doing a great job of convincing me the overall design is terribly flawed, if all that's the case.  How anyone who is just developing the odd few webpages once in a blue moon is supposed to understand the implications - let alone the people viewing the pages, who believe what they're seeing is current content because that's what common sense would imply - is unimaginable.

I wonder how the Internet of Things will ever work?  If you've got some sensor or device that pings a webpage all day long for instance to supply some data, or obtain some data, and most people and organisations on the planet embrace the IoT eventually, and they are all downloading reams and reams of Javascript because it cannot be cached because it is subject to perpetual change in a rapidly moving software development cycle.  And it's all supposed to run on nanopower hardware with a slice of lemon and a couple of pennies.  Then what?  Go on. Tell me.  I dare you.

You know, at least Microsoft got it right when they based shared code (on Windows) on the DLL concept which goes back a lot lot further than Windows.  Where is the DLL in web technology?  Why should more than one page in the same browser instance have to download the same JS library?

The very idea that JS is all mixed up intimately with HTML inside some kind of IT food blender, and PHP kinda looks like JS but isn't (different committees involved no doubt) and PHP is in the client side HTML food processor as well as on the server side... it just beggars belief that it's been allowed to evolve like that.

And don't get me started on Python.  Whose great idea was it that which column stuff is written in matters?  This is a throw-back to Fortran 66.
 

RobinGriffiths

Well-known member
IoT uses its own lightweight protocol, MQTT, which uses persistent sessions, by which I assume it means that the server holds some key corresponding to the client device, meaning the client doesn't have to create a new connection and authenticate whenever it's calling home. Although I could be totally wrong.
 

aricooperdavis

Moderator
Stuart France said:
And don't get me started on Python.  Whose great idea was it that which column stuff is written in matters?

Python, coming over here and offering a free, open source, powerful, user friendly, and very readable language and putting good hard working semi-colons and curly brackets out of a job. Thank you Brussels!

:tease:
 

andrewmcleod

Well-known member
I am greatly enjoying this newly-split thread :) so many not-quite-right-about-computers things in one post :)

Stuart France said:
You're doing a great job of convincing me the overall design is terribly flawed if all that's the case, and how anyone who is just developing the odd few webpages once in a blue moon is supposed to understand the implications, let alone the people viewing the pages who believe what they're seeing is current content because that's what common sense would imply, is unimaginable.

Easy - if you can't work out what bits of your website need to be cached or not, crack that nut with a sledgehammer and declare all pages 'no-cache' via HTTP headers or page info.
The person developing a few odd webpages once in a blue moon isn't really the target audience, but they also don't run the levels of traffic where it really matters.
Alternatively, just change the URL - add an unnecessary query parameter or whatever (e.g. http://my-url.com/my-page.html?version=234 ) at which point the URL has changed, so the browser reloads (look - an actually helpful suggestion in this post!).
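That query-parameter trick can be automated by deriving the version from the content itself, so the URL (and hence the browser's cache key) changes exactly when the content does. A sketch - the function name and URLs are illustrative:

```python
import hashlib

def busted_url(url: str, content: bytes) -> str:
    """Append a version query parameter derived from a hash of the
    content, so any change to the content produces a new URL and the
    browser is forced to re-download it."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    sep = "&" if "?" in url else "?"   # don't clobber existing params
    return f"{url}{sep}v={digest}"

print(busted_url("http://my-url.com/my-page.html", b"old content"))
print(busted_url("http://my-url.com/my-page.html", b"new content"))
```

Build tools commonly do the same thing by putting the hash in the filename itself (e.g. `app.3f2a9c.js`), which plays better with some proxies that ignore query strings.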

I wonder how the Internet of Things will ever work?  If you've got some sensor or device that pings a webpage all day long for instance to supply some data, or obtain some data, and most people and organisations on the planet embrace the IoT eventually, and they are all downloading reams and reams of Javascript because it cannot be cached because it is subject to perpetual change in a rapidly moving software development cycle.  And it's all supposed to run on nanopower hardware with a slice of lemon and a couple of pennies.  Then what?  Go on. Tell me.  I dare you.

a) Generally any link to JS on a web page will pick a specific version, since that's what the page is tested and developed with, so the JS *is* cached.
b) Secondly, an IoT device wouldn't need to download JS anyway? If you need it to run on nanopower hardware, then it wouldn't be running a browser to run JS at all (it might be running JS through Node or something, but that's not the same). It just needs to make an HTTP or (as someone suggested) more lightweight protocol request. It doesn't need to do anything to do with the web at all. Just POST some data to a URL...
c) Browser caching is client-side - an IoT device could just not cache (and in fact, as already stated the overhead of a browser is probably unnecessary anyway in simpler devices so by default most libraries _won't_ cache).

You know, at least Microsoft got it right when they based shared code (on Windows) on the DLL concept which goes back a lot lot further than Windows.  Where is the DLL in web technology?  Why should more than one page in the same browser instance have to download the same JS library?

Ah yes, DLL Hell. As a Linux user, I am very aware that software has been using shared libraries for a long time...

Why should more than one page in the same browser instance have to download the same JS library? Well, they shouldn't - and thanks to caching, they don't (in the case of JQuery again, as long as both pages point to the same URL on the JQuery site). Given the browser doesn't know what exists until it reads the link tags in the web page, how is it supposed to know what JS libraries it needs?

Unlike on a PC, where users don't expect to be able to run software before you have installed it (and all relevant dependencies such as shared libraries), users expect to be able to run software (effectively) on the web immediately.

So the browser gets what it needs when it needs it. For example, it might download JQuery version 3.1.2 or whatever, then cache that (referenced against the URL it got it from). If another web page asks for the same URL, it can get it from the cache instead of re-downloading it. If a webpage hosts its own version of the same file, it will be re-downloaded - but only once for that website (assuming it uses the same URL throughout). That web page might have changed the file...
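That URL-keyed behaviour can be modelled in a few lines (a toy model, not how any real browser is implemented; the URLs are just examples):

```python
class BrowserCache:
    """Toy model of a URL-keyed cache: each resource is stored against
    the exact URL it was fetched from, so any number of pages linking
    to that URL trigger only one download."""
    def __init__(self, fetch):
        self.fetch = fetch   # function: url -> content
        self.store = {}      # url -> cached content

    def get(self, url):
        if url not in self.store:
            self.store[url] = self.fetch(url)  # download exactly once
        return self.store[url]                 # then serve from cache

downloads = []
def fake_fetch(url):
    """Stand-in for a network fetch that records every real download."""
    downloads.append(url)
    return f"contents of {url}"

cache = BrowserCache(fake_fetch)
cache.get("https://code.jquery.com/jquery-3.1.2.min.js")  # downloaded
cache.get("https://code.jquery.com/jquery-3.1.2.min.js")  # served from cache
print(len(downloads))  # 1
```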

Remember that security is important here. That JS should only be run on web pages that have specifically linked to that URL. If I just let JS libraries add themselves to some browser catalog, then I could create a malicious version of a library, and it would run on a legitimate site when it tried to use that library as well - at which point I steal all your money from your bank.

The very idea that JS is all mixed up intimately with HTML inside some kind of IT food blender, and PHP kinda looks like JS but isn't (different committees involved no doubt) and PHP is in the client side HTML food processor as well as on the server side... it just beggars belief that it's been allowed to evolve like that.

PHP is an abomination that should be purged from the web with great fury, designed by idiots with no plan... but you'll never download it to your browser as it's purely server-side, so irrelevant to a discussion on caching.

And don't get me started on Python.  Whose great idea was it that which column stuff is written in matters?  This is a throw-back to Fortran 66.

Python is fantastic - designed _with_ a plan with syntax that actually uses words. There are only two kinds of programmers: those who insist on indenting their code correctly (at which point the Python indentation rules are irrelevant), and crap programmers :p

Incidentally, my serious programming career started with Fortran 90/95, which was actually a fantastic language for doing heavy lifting with arrays very, very quickly (and, via Fortran 2003/2008, still is) with good multiprocessing and MPI support.


Essentially, some of the architecture is a little ropy (and PHP is the devil's work - by far the worst programming language I have ever had the misfortune to use), but most of the things about the web are the way they are because a lot of clever people have found good ways of doing it, even if to a budding amateur it doesn't look sensible at the time...
 

TheBitterEnd

Well-known member
Stuart France said:
This is a throw-back to Fortran 66.

Required in Fortran 77 - I'm still supporting some of it.

The compiler has to know when a line ends and when a block ends, so you either use braces, END statements or indentation; modern IDEs are generally very helpful in this regard.
 

aardgoose

Member
In defence of PHP, it is slowly getting better, but it is a long slog/impossible task to undo the damage from its 'design' flaws.

And if that is the worst thing you have met, have you ever met MUMPS, or DSM in its latter incarnation? A language where two spaces are used to indicate a missing/null parameter!
 

Stuart France

Active member
What I meant by the 'food blender' approach to webpage programming is being able to write HTML, CSS, PHP and JS all in the same source code file, such as:
<body>
<div id="demo">
<p>hello world from html</p>
<?php
echo <<<_END
Hello world from php
I nearly died of the giggles when I discovered heredoc
and that its _END must be in column 1
_END;
?>
<script> document.write('hello world from js') </script>
</div> </body>
Yes I know I've not worked CSS into this example.  It is laughable enough as it is.
 

aardgoose

Member
What's wrong with heredocs :) a good old Bourne shell feature. Just be thankful Perl never took off as a server-side language - an explosion in an O'Reilly bookshelf.

 

andrewmcleod

Well-known member
Stuart France said:
What I meant by the 'food blender' approach to webpage programming is being able to write HTML, CSS, PHP and JS all in the same source code file, such as: [...]

Any decent programming language will let you write complete crap in it. The mixing of content and coding is another reason why PHP is crap; I believe any 'decent' PHP code (an oxymoron) uses separate files for logic and templating or actually uses a formal templating system. The end result of this is still a HTML file that may contain inline CSS and JS, but that's all that will be sent to the end user (plus SVG I guess). In your example, only HTML and JS actually make it to the user once they've passed through the horror that is PHP.

Generally, as you say, it is 'bad form' to include significant amounts of CSS or JS in an HTML file, but it is still useful and necessary to support it. You can change inline CSS either with server-side programming or client-side JS as required, for example. Plus, for efficiency, it can get better performance to inline stuff as it reduces the number of connections/requests needed. You can take this to extremes and inline your JS and your CSS, and encode images either as inline Base64-encoded JPEGs/PNGs or as SVG vector images.
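Inlining an image as Base64, for example, means wrapping its bytes in a `data:` URI that goes straight into the HTML. A sketch - the MIME type and the image bytes here are placeholders:

```python
import base64

def data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode an image as an inline data: URI so it ships inside the
    HTML itself - one fewer HTTP request, at the cost of roughly a
    33% size increase from the Base64 encoding."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# Placeholder bytes; real use would read actual PNG data from disk.
uri = data_uri(b"\x89PNG...")
print(uri.startswith("data:image/png;base64,"))  # True
```

The trade-off is that an inlined image can no longer be cached separately from the page that contains it, which is the same tension discussed above.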

 

aardgoose

Member
The mixing of content and coding is another reason why PHP is crap;
That is true of many server-side applications, for example JSP (Java Server Pages), and wasn't the invention of PHP; PHP has many faults but it isn't responsible for that one. As you say, it's more programmer discipline. More at fault is the weak typing, much like early MySQL failing to distinguish between NULL and "".

Plus for efficiency, it can get better performance to inline stuff as it reduces the number of connections/requests needed.
There is a trend not to do this as much, to hit more modern performance metrics like time to first interaction etc, and to delay loading resources until required. Much of the connection overhead is removed with HTTP/2 when that is available.
 

Stuart France

Active member
Bourne Shell?  Wozzat?

Ah, yes, something I heard of while a postgrad in the 1970s.  It was in a book about Unix that I read back then, which I probably still have somewhere along with the one on Fortran 66.  All that 'mv' stuff instead of something obvious like RENAME.  And confusingly 'rm' instead of DELETE.  Designed by maths or physics types for themselves.  Just like the C programming language (from the same stable), where some sort of intellectual status obtains from writing down, with the smallest number of letters and the highest symbols-to-alphanumerics ratio, something that can otherwise be stated far more clearly.

Anyway, happy with PHP.  It's awful but I like it.  I'm starting a new project using PHP during this new lockdown and I'm sure the end product will have a long and happy life and be immune to browser caching because it is server-side.

 

ChrisJC

Well-known member
Stuart France said:
Just like the C programming language (from the same stable) where some sort of intellectual status obtains from writing down with the smallest number of letters and the highest symbols-to-alphanumerics ratio something that can otherwise be stated far more clearly.

Clearly you've never attempted a proper program.

Chris.
 

Stuart France

Active member
Your comment was succinctly ad hominem, Chris.

I've made my living out of software for the past 40 years and created a business that employs other people doing the same.  I must have got something right somewhere along the line.

The sad thing is that people are doomed to repeat history unless they appreciate it, and this applies to programming as much as to anything else, including the BCA's present management.  Software development history professors are in short supply.

Universities don't even teach programming now across the sciences.  One of my colleagues from the 1980s has gone back to the university where we both used to work (he is now mid-70s) to teach programming part-time to the undergraduates.  I find this hard to believe but it has happened.

A lot of what I do is embedded code, and recruiting people now who can bring together hardware and software coupled to originality is a real struggle I can tell you.


 

aardgoose

Member
Intercal - now there is a language.  With the wonderful 'COME FROM' statement, for those that consider GOTO harmful.

RENAME/DELETE - sounds like VMS.

Then if you want nightmares, try using Data General AOS/VS, where the copy command's source and destination parameters are in the opposite order to any other OS I have met.  Dangerous.




 

ChrisJC

Well-known member
Stuart France said:
Your comment was succinctly ad hominem, Chris.
Well indeed! It was your remark that required remarking on!

Stuart France said:
I made my living out of software for the past 40 years and created a business that employs other people doing the same.  I must have got something right somewhere along the line.

The sad thing is that people are doomed to repeat history unless they appreciate it, and this applies to programming as much as to anything else, including the BCA's present management.  Software development history professors are in short supply.

Universities don't even teach programming now across the sciences.  One of my colleagues from the 1980s has gone back to the university where we both used to work (he is now mid-70s) to teach programming part-time to the undergraduates.  I find this hard to believe but it has happened.

A lot of what I do is embedded code, and recruiting people now who can bring together hardware and software coupled to originality is a real struggle I can tell you.

I can agree with all of that though. As a practicing firmware / hardware engineer (although only for 25 years) I too know a little about the travails you mention.

It is interesting to note that 'C' has been around a very long time now, yet it is still the de-facto language for embedded systems. All these fancy higher level things just turn out to be too inefficient / unreliable / programmed by self-taught hackers who just 'fiddle around until it works'.

Chris.
 