mod_proxy reverse proxy optimization/performance question
Roman Gavrilov
2004-10-20 13:26:19 UTC
I am using a reverse proxy to cache a remote site. The files are mostly
rpms, with varying sizes: 3-30M or more.
Now if you have a number of requests for the same file which is not yet
cached locally, all of these requests will download the requested file
from the remote site. It will slow down the speed of each download as
the throughput of the line will be split among all processes.
So if there are lots of processes to download the same rpm from a remote
site, this can take lots of time to complete a request.
This can bring Apache to a state where it cannot serve other requests,
as all available processes are already busy.
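
To put rough numbers on this, here is a back-of-the-envelope sketch. The
30 MB file size is taken from the range above; the 10 Mbit/s line and the
20 simultaneous requests are assumed figures for illustration only, and
the model assumes the line is shared equally with no protocol overhead.

```python
def download_seconds(file_mb: float, line_mbit_s: float, concurrent: int) -> float:
    """Seconds until each of `concurrent` equal-share downloads of one file finishes."""
    per_request_mbit_s = line_mbit_s / concurrent  # the line is split evenly
    return file_mb * 8 / per_request_mbit_s        # 1 MB = 8 Mbit


FILE_MB = 30       # upper end of the size range mentioned above
LINE_MBIT_S = 10   # assumed line throughput
REQUESTS = 20      # assumed number of simultaneous requests for the same rpm

alone = download_seconds(FILE_MB, LINE_MBIT_S, 1)
shared = download_seconds(FILE_MB, LINE_MBIT_S, REQUESTS)

print(f"one download alone: {alone:.0f} s")
print(f"{REQUESTS} at once: {shared:.0f} s each, "
      f"{FILE_MB * REQUESTS} MB fetched from the remote site in total")
```

Under these assumptions every client waits as long as it takes to pull
the file 20 times over the line, and all 20 processes stay busy for that
whole period.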

In my opinion it would be more efficient to let one process complete the
request (using maximum line throughput) and return some busy code to
other identical, simultaneous requests until the file is cached locally.
Has anyone run into a similar situation? What solution did you find?

I have created a solution, as I could not find an existing one. I would
like to discuss it here and get your opinions.
1. When a request for a file that is not yet in the local cache is
accepted by the proxy, a temporary lock file is created (based on the
proxy's pathname of the file, changed from directory slashes to
2. Other processes requesting the same file will check first for the
lock file. If found, they will return a busy code (e.g. 408 Request
Timeout), and the request should be sent repeatedly until successful.
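
The two steps above can be sketched roughly as follows. This is only an
illustration in Python, not the actual proxy code. The helper names
(lock_path, try_acquire, release) are hypothetical, and because the
lock-file naming scheme is not fully spelled out above, the sketch uses a
SHA-256 digest of the cache path as a stand-in. The lock is created with
O_CREAT | O_EXCL so that the check and the creation happen in one atomic
step and only one process can win.

```python
import hashlib
import os
import tempfile


def lock_path(lock_dir: str, cache_key: str) -> str:
    """Map a cache path to a lock file name (digest naming is an assumption)."""
    digest = hashlib.sha256(cache_key.encode("utf-8")).hexdigest()
    return os.path.join(lock_dir, digest + ".lock")


def try_acquire(lock_dir: str, cache_key: str) -> bool:
    """Return True if this process created the lock and should fetch the file.

    False means another process already holds the lock, so the caller
    would answer with the busy code and let the client retry.
    """
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY  # fails if the file exists
    try:
        fd = os.open(lock_path(lock_dir, cache_key), flags, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(lock_dir: str, cache_key: str) -> None:
    """Remove the lock once the file is fully cached (or the fetch failed)."""
    try:
        os.remove(lock_path(lock_dir, cache_key))
    except FileNotFoundError:
        pass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        key = "/pub/example/foo.rpm"  # hypothetical cache path
        print(try_acquire(lock_dir, key))  # first request wins the lock
        print(try_acquire(lock_dir, key))  # identical request is told "busy"
        release(lock_dir, key)
        print(try_acquire(lock_dir, key))  # lock is free again
```

One thing a real implementation would have to deal with is a stale lock:
if the fetching process dies before calling release, the lock file stays
behind and every later request is turned away, so some kind of maximum
lock age would be needed.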

Please let me know what you think of this approach, especially if you
have done or seen something similar.
Graham Leggett
2004-10-20 13:46:49 UTC
This is a mod_cache issue rather than a proxy issue; the best place to
discuss something like this is ***@httpd.apache.org. (mod_cache was
separated from mod_proxy in httpd v2.0. This fix never went into httpd
v1.3 mod_proxy because it was a serious architecture change.)

When mod_cache was separated from mod_proxy in httpd v2.0, one of the
problems the new cache code was supposed to solve was exactly this one.
Whether it has stayed solved through all the development done on
mod_cache since then is a good question.

