Discussion:
Cache Module that Caches EVERYTHING
Eli Marmor
2003-11-12 22:55:46 UTC
Hi,

I'm curious to know if there is any module that does the following:

Caches EVERYTHING, including dynamic pages and GET/POST requests with
parameters (i.e. if http://domain.com/cgi.exe?key=valA returns fooA and
http://domain.com/cgi.exe?key=valB returns fooB, then the next call to
http://domain.com/cgi.exe?key=valA will return fooA without even
accessing the backend server and http://domain.com/cgi.exe?key=valB
will return fooB without accessing the backend server).

In other words, I'm looking for a special version of mod_cache that
handles situations of off-line browsing.

Is there anything?

Nick? Graham? Anybody?

Thanks,
--
Eli Marmor
***@netmask.it
CTO, Founder
Netmask (El-Mar) Internet Technologies Ltd.
__________________________________________________________
Tel.: +972-9-766-1020 8 Yad-Harutzim St.
Fax.: +972-9-766-1314 P.O.B. 7004
Mobile: +972-50-23-7338 Kfar-Saba 44641, Israel
Ian Holsman
2003-11-12 23:20:55 UTC
I *believe* that the disk and memory caches (mod_disk_cache/mod_mem_cache)
in Apache 2 could do this.
You would need to override the key generation (via the optional hook)
to make the query args part of the key.
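
To make the idea concrete, here is a minimal sketch in Python rather than Apache C. It is not mod_cache's key function; `make_cache_key`, `fetch`, and the in-memory `cache` dict are hypothetical names used only to show what "query args as part of the key" means for the example URLs in the question.

```python
from urllib.parse import urlsplit

def make_cache_key(url: str) -> str:
    """Build a cache key that keeps the query string (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""   # urlsplit already lowercases the hostname
    path = parts.path or "/"
    key = host + path
    if parts.query:
        # Keeping the query string is what separates ?key=valA from ?key=valB.
        key += "?" + parts.query
    return key

# A plain dict stands in for the disk/memory cache in this sketch.
cache = {}

def fetch(url, backend):
    """Return the cached body if present; otherwise ask the backend once."""
    key = make_cache_key(url)
    if key not in cache:
        cache[key] = backend(url)
    return cache[key]
```

With this key, a second request for `http://domain.com/cgi.exe?key=valA` is answered from `cache` without calling `backend` again, while `?key=valB` gets its own entry.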
--
Ian Holsman
Director
Network Management Systems
CNET Networks
PH: (61) 3-9857-3742 (Australia)/ 415-344-2608 (USA)
Eli Marmor
2003-11-12 23:46:03 UTC
Post by Ian Holsman
I *believe* that the cache-disk/memcache in apache 2
could do this.
you would need to override the key generation (via the optional hook)
to make the queryargs part of the name.
Wow.

My plan was that if there is not such a module, I would patch mod_cache
and/or its sub parts (mem/disk/etc.) to do exactly what you wrote.

But I haven't thought about this hook. Overriding it looks simpler and
more elegant.

Your answer is so simple but so brilliant...

And it's the third SUCCESSIVE time that one of my questions has been
answered by you (maybe it's finally time to take a look at your wish
list ;-)


In any case, if anybody else knows of a module that does EXACTLY what I
need, they are still welcome to mention it.

Thanks,
--
Eli Marmor
Ian Holsman
2003-11-13 00:03:08 UTC
This is the hook in question:
http://lxr.webperf.org/source.cgi/modules/experimental/mod_cache.h#338

There may be some other things you need to do to the cache to make it
actually want to cache the content.
This just makes it differentiate based on query args.
--
Ian Holsman
Eli Marmor
2003-11-13 00:08:49 UTC
Post by Ian Holsman
this is the hook in question.
http://lxr.webperf.org/source.cgi/modules/experimental/mod_cache.h#338
there may be some other things you need to do to the cache to make it
actually want to cache the content.
this just makes it differentiate based on query args.
Ha...
I just found the cache_generate_key hook a minute before reading your
message... (next time I should wait for your messages before digging
into Apache's source? ;-)

It seems that there is already partial support for arguments in the
default function.

The main fix is going to be, as you noted, "convincing" mod_cache to
cache EVERYTHING, no matter if it's static, dynamic, uncacheable, etc.

Maybe I'll submit something (if it turns out to be useful) to the
developers' list.

Thanks again,
--
Eli Marmor
Nick Kew
2003-11-13 03:29:25 UTC
Ian seems to have written off-list; neither my inbox nor the web
archive at marc.theaimsgroup.com has it. I am doing some work
that involves smart cacheing, but I haven't figured out how I
could use mod_cache for it, so this discussion is something I'd
like to see!

(Actually I can see a basic approach to using mod_cache for my app:
an apr_http_client implementation that could hook in things like
mod_cache is on the wishlist, but I've not tried to tackle it).
--
Nick Kew
Eli Marmor
2003-11-14 09:03:55 UTC
Post by Nick Kew
Ian seems to have written off-list; neither my inbox nor the web
archive at marc.theaimsgroup.com has it.
Thanks for this note.

Ian wrote TO the lists.
I just replied without re-adding the lists to the "To:" field.
The point is that he is subscribed to only one of the two lists that
were included in the "To:" field, so the readers of that list
(including the archives) received his message immediately, while the
other list received it after a delay (or has not received it at all).

Anybody who missed messages is welcome to read them in the archive
of the other list. In my messages, I'll continue to quote everything.
Post by Nick Kew
I am doing some work
that involves smart cacheing, but I haven't figured out how I
could use mod_cache for it, so this discussion is something I'd
like to see!
an apr_http_client implementation that could hook in things like
mod_cache is on the wishlist, but I've not tried to tackle it).
It's simple:

First, set the mod_cache directives to their most liberal values:

CacheIgnoreCacheControl On
CacheIgnoreNoLastMod On
CacheExpiryCheck Off
CacheMaxStreamingBuffer 1000000

(the last one may become unnecessary in later versions of Apache).

Next, patch the function "cache_in_filter()" so that it caches more
things. This mainly involves commenting out "if" conditions such as the
checks for "no-store", "private", etc. There is also a check that
refuses to cache a request that has "args" (a query string) but no
"Expires" header; it should be commented out too (by the way: most of
the things that I mention should already be handled by the current
directives, at least according to the DOCs. So either the code should
be fixed, or the DOCs should be "fixed"...).

After everything is working, new directives should be added to control
which of those checks are performed (at least for the "private" check;
the rest should be controlled by existing directives, as I mentioned
above). I used the words "comment out" above only as a quick and dirty
way to make mod_cache behave differently. Once it works, the behavior
should be configured using directives, so that the original behavior
remains the default and the offline/mirror behavior is activated only
by changing the default values of the directives.
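
As a rough illustration of "checks controlled by directives", here is a small Python sketch. It is not mod_cache code: `CachePolicy`, its three flags, and `is_cacheable` are hypothetical names, and the sketch assumes response headers arrive as a plain dict with canonical capitalization. The defaults keep the conservative behavior; the offline/mirror mode is opt-in.

```python
from dataclasses import dataclass

@dataclass
class CachePolicy:
    # Hypothetical switches; these are NOT real mod_cache directives.
    ignore_no_store: bool = False
    ignore_private: bool = False
    cache_query_without_expiry: bool = False

def cache_control_tokens(headers):
    """Return the lowercased Cache-Control directive names as a set."""
    raw = headers.get("Cache-Control", "")
    return {t.strip().split("=", 1)[0].lower() for t in raw.split(",") if t.strip()}

def is_cacheable(headers, has_query, policy):
    """Apply each check only when the policy has not switched it off."""
    tokens = cache_control_tokens(headers)
    if "no-store" in tokens and not policy.ignore_no_store:
        return False
    if "private" in tokens and not policy.ignore_private:
        return False
    # Query string present but no explicit expiry: refused by default.
    if has_query and "Expires" not in headers and not policy.cache_query_without_expiry:
        return False
    return True
```

With `CachePolicy()` a response marked `private` is rejected as before; with all three flags set to `True` the same response is accepted, which is the mirror-style behavior described above.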

Then you reach the most demanding task: POST requests. Unlike GET
requests, which are cached correctly (after making the patches
mentioned above), the arguments of a POST are contained in the body,
which is not accessible when caching begins. The only way to resolve
this is to write an input filter (which will parse the arguments, or
just save them for later concatenation to the key after the "?").
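
The key-building half of that idea can be sketched in Python (this is not an Apache input filter). `post_cache_key` is a hypothetical helper, and it assumes the body is ASCII, application/x-www-form-urlencoded, and already fully read.

```python
from urllib.parse import parse_qsl, urlencode

def post_cache_key(base_key: str, body: bytes) -> str:
    """Append the POST arguments to the key after a "?" (hypothetical helper).

    Assumes an ASCII, form-encoded body; other content types would need a
    different strategy, such as hashing the raw bytes.
    """
    pairs = parse_qsl(body.decode("ascii"), keep_blank_values=True)
    if not pairs:
        return base_key                      # empty body: nothing to append
    # Re-encode the parsed pairs so equivalent bodies normalize to one form.
    return base_key + "?" + urlencode(pairs)
```

Under this scheme a POST to `domain.com/cgi.exe` with body `key=valA` yields the key `domain.com/cgi.exe?key=valA`, the same key a GET with that query string would produce, so the two methods share one cache entry unless the method is also added to the key.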

Alternatively, it is possible to use apreq-2 for this purpose (but it
will require you to include apreq-2 in your future builds of Apache. I
would love to see it included as a standard APR module...).

Support for POST requests will not be complete without changing the
check for M_GET method in cache_url_handler(), which should check for
M_POST too (again, this should be controlled by directives and not done
automatically for everybody).

The last thing is to force subsequent accesses to the cached resource
to use it, even though mod_cache believes that it is not cacheable. The
same hack for POST should be applied here too (I'm not sure whether it
is necessary to distinguish between GET and POST; we may cache both in
the same syntax/format, and ignore the rare cases where the same
request with the same parameters but a different method is used. Even
in such rare cases, I'm not sure it would hurt anything to treat both
requests as the same one).


Now comes the biggest question:

Isn't it better to find an existing module that already does this?
I know there are such modules for squid; why aren't there any for
Apache 2? Or are there some that I am not aware of?
--
Eli Marmor
Mike Collins
2003-11-13 14:01:07 UTC
Check out oscache at www.opensymphony.com

It has a servlet filter that does what you are requesting.


----- Original Message -----
From: "Eli Marmor" <***@netmask.it>
To: <apache-***@covalent.net>; <modproxy-***@apache.org>
Sent: Wednesday, November 12, 2003 5:55 PM
Subject: Cache Module that Caches EVERYTHING

