Roman Gavrilov
2004-10-20 13:26:19 UTC
I am using a reverse proxy to cache a remote site. The files are mostly
rpms, with varying sizes: 3-30M or more.
Now if a number of requests arrive for the same file before it is
cached locally, each of them downloads the file from the remote site.
This slows down every download, since the line's throughput is split
among all the processes.
So if many processes are downloading the same rpm from the remote
site, a single request can take a long time to complete. This can
bring apache to a state where it cannot serve other requests, because
all available processes are already busy.
In my opinion it would be more efficient to let one process complete the
request (using maximum line throughput) and return some busy code to
other identical, simultaneous requests until the file is cached locally.
Has anyone run into a similar situation? What solution did you find?
I have created a solution, as I did not find anything else already
existing. I would like to discuss it here and get your opinions.
1. When a request for a file that is not yet in the local cache is
accepted by the proxy, a temporary lock file is created (based on the
proxy's pathname of the file, changed from directory slashes to
underscores).
2. Other processes requesting the same file first check for the
lock file. If it exists, they return a busy code (e.g. 408 Request
Timeout), and the client should retry the request until it succeeds.
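To make the idea concrete, here is a minimal sketch of the locking
scheme in Python (not the actual implementation, which would live
inside the proxy itself). The lock directory and function names are
hypothetical; the key point is that creating the lock file with
O_CREAT | O_EXCL is atomic, so exactly one process wins the race for
a given file:

```python
import os
import tempfile

# Hypothetical location for the temporary lock files.
LOCK_DIR = tempfile.gettempdir()

def lock_name(cache_path):
    # Step 1: derive the lock file name from the proxy's pathname
    # of the file, changing directory slashes to underscores.
    return os.path.join(LOCK_DIR,
                        cache_path.strip("/").replace("/", "_") + ".lock")

def try_acquire(cache_path):
    """Return True if this process should fetch the file from the
    remote site, False if another process already holds the lock
    (in which case the caller answers with a busy code)."""
    try:
        # O_CREAT | O_EXCL makes creation atomic: it fails if the
        # lock file already exists, so only one downloader proceeds.
        fd = os.open(lock_name(cache_path),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

def release(cache_path):
    # Step 2 cleanup: remove the lock once the file is cached, so
    # retried requests are served from the local cache.
    os.unlink(lock_name(cache_path))
```

A real implementation would also need to remove stale locks if the
downloading process dies before finishing.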
Please let me know what you think of this approach, especially if you
have done or seen something similar.
--
-------------------------------------------------------------
I am root. If you see me laughing... You better have a backup!
-------------------------------------------------------------