Discussion:
modproxy load balancer
Bill Stoddard
2003-06-13 14:29:21 UTC
Ping to all list citizens (listizens?) ...
Who would be interested in seeing some load balancing functionality put
into mod_proxy? Has anyone given any thought to what you would like to
see, or do you maybe even have a design proposal you'd like to discuss?

My short requirements list:
- selectable load balancing algorithm: Round robin, LRU, response time,
url driven, session affinity, ?
- automatic detection of backend server failure and removal of the
failed server from the load balancing routing tables (forever? for a
period of time? other?)
- connection pooling using HTTP keep-alive (this is a no brainer since
it is a simple extension of what browsers already do, but it needs to be
designed in from the start)
- must be effective with multiple child processes, each child must make
routing decisions globally based on stats maintained in a shared memory
segment
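
A minimal sketch of the "selectable algorithm" item, assuming each backend carries a few statistics that every child process can see. The names `Backend`, `ALGORITHMS` and `pick` are hypothetical, and a plain Python list stands in for the shared memory segment.

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Backend:
    addr: str
    last_used: int = 0     # logical clock value of the last request sent
    avg_ms: float = 0.0    # smoothed response time in milliseconds

_cursor = count()          # round-robin position; would live in shared memory

def round_robin(backends):
    return backends[next(_cursor) % len(backends)]

def lru(backends):
    # least recently used: the backend that has been idle longest
    return min(backends, key=lambda b: b.last_used)

def response_time(backends):
    return min(backends, key=lambda b: b.avg_ms)

# Hypothetical algorithm names; a config directive could select one of them.
ALGORITHMS = {"roundrobin": round_robin, "lru": lru, "responsetime": response_time}

def pick(name, backends):
    """Choose one backend with the algorithm selected by name."""
    return ALGORITHMS[name](backends)
```

Keeping each algorithm a plain function over the same statistics is one way to make the choice a configuration matter rather than a code change.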

To do this properly, I would think we need some new config directives.
Perhaps a new container directive to define a group of backend servers,
another container directive to define URLs served by a particular group
of backend servers. Need some way to bind a url group to a server group.
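
One way to picture the binding between a URL group and a server group is a longest-prefix lookup. This is only a sketch: the group names, prefixes and addresses are made up, and no directive syntax is implied.

```python
# Hypothetical server groups and the URL prefixes bound to them.
SERVER_GROUPS = {
    "static": ["10.2.3.4:80", "10.2.3.8:80"],
    "app": ["10.2.3.9:8080"],
}
URL_GROUPS = [("/images/", "static"), ("/", "app")]

def servers_for(path):
    """Return the server group bound to the longest matching URL prefix."""
    matches = [(prefix, group) for prefix, group in URL_GROUPS
               if path.startswith(prefix)]
    if not matches:
        return []
    _, group = max(matches, key=lambda m: len(m[0]))
    return SERVER_GROUPS[group]
```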

Bill
Juan Rivera
2003-06-13 14:32:06 UTC
Bill,

I would definitely be interested in these features, and in helping out.

Juan

Eli Marmor
2003-06-13 14:48:25 UTC
Post by Bill Stoddard
Who would be interested in seeing some load balancing function being put
into mod_proxy?
Several million users?
Many thousands of webmasters?
Post by Bill Stoddard
- selectable load balancing algorithm: Round robin, LRU, response time,
url driven, session affinity, ?
Before offering multiple choices, one basic algorithm (maybe based on one
of the above) would be great too.
Post by Bill Stoddard
- automatic detection of backend server failure and removal of the
failed server from the load balancing routing tables (forever? for a
period of time? other?)
Hmmm...
You continue to sample it at a fixed interval (1 minute? more?)
You can also add a directive to define the time resolution, or the
action to be taken.
But why not just "steal" ideas from the LVS project?
They have already been through all these steps (at another layer) and
chose or developed solutions.
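
The "take it out, then sample it again after a while" idea might look roughly like this. It is a sketch under assumptions: `FailureTable` and its `retry_after` setting are hypothetical names, and this is not a description of how LVS does it.

```python
import time

class FailureTable:
    """Remembers failed backends and when each may be tried again."""

    def __init__(self, retry_after=60.0, clock=time.monotonic):
        self.retry_after = retry_after   # hypothetical "time resolution" setting
        self.clock = clock
        self.failed_at = {}              # addr -> time of the last failure

    def mark_failed(self, addr):
        self.failed_at[addr] = self.clock()

    def mark_ok(self, addr):
        self.failed_at.pop(addr, None)

    def usable(self, addr):
        t = self.failed_at.get(addr)
        # Usable if it never failed, or if the retry interval has elapsed.
        return t is None or self.clock() - t >= self.retry_after

    def candidates(self, addrs):
        return [a for a in addrs if self.usable(a)]
```

A backend that comes back after the interval is simply offered again; the next real request acts as the probe, and a second failure restarts the timer.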
Post by Bill Stoddard
Perhaps a new container directive to define a group of backend servers,
another container directive to define URLs served by a particular group
of backend servers. Need some way to bind a url group to a server group.
I would re-use the syntax of mod_rewrite, including its parser and
other stuff from that module.
It gives you a bunch of other goodies, such as choosing exactly how the
content is fetched (proxy/redirect), etc.
--
Eli Marmor
***@netmask.it
CTO, Founder
Netmask (El-Mar) Internet Technologies Ltd.
__________________________________________________________
Tel.: +972-9-766-1020 8 Yad-Harutzim St.
Fax.: +972-9-766-1314 P.O.B. 7004
Mobile: +972-50-23-7338 Kfar-Saba 44641, Israel
George Schlossnagle
2003-06-13 15:03:11 UTC
Post by Eli Marmor
Post by Bill Stoddard
Who would be interested in seeing some load balancing function being put
into mod_proxy?
Several million users?
Many thousands of webmasters?
Isn't there ongoing discussion about incorporating mod_backhand into
mod_proxy for this?


George
Bill Stoddard
2003-06-13 15:39:04 UTC
Post by George Schlossnagle
Isn't there ongoing discussion about incorporating mod_backhand into
mod_proxy for this?
George
Hi George,
Thanks for the reminder. I can't say that I have been paying much
attention lately, but this discussion goes back quite some time.

I am digging into the doc on mod_backhand but still don't quite grok
how backhand works (I am still reading). It appears to rely on all the
servers in the cluster being 'backhand aware' (for lack of a better
term) and that the servers communicate their status to each other via
UDP. Requiring all the servers in the cluster to be backhand aware
severely limits the usefulness of mod_backhand. There would also be
potential security issues with the ports backhand aware servers use to
communicate with each other (none that could not be fixed if they even
exist at all). Am I missing something important here?

Bill
Chuck Murcko
2003-06-13 17:11:15 UTC
On Friday, Jun 13, 2003, at 08:03 America/Phoenix, George Schlossnagle
Post by George Schlossnagle
Post by Eli Marmor
Post by Bill Stoddard
Who would be interested in seeing some load balancing function being put
into mod_proxy?
Several million users?
Many thousands of webmasters?
Isn't there ongoing discussion about incorporating mod_backhand into
mod_proxy for this?
I presumed there was some discussion between Theo and Graham at
ApacheCon about this. The copyrights prevent us from just dropping
mod_backhand into httpd. However, I have a patch that adds persistent
connections to mod_proxy that I am packaging and plan to submit over
the weekend, after I finish some current day job work.

Chuck
Graham Leggett
2003-06-17 16:00:36 UTC
Post by George Schlossnagle
Isn't there ongoing discussion about incorporating mod_backhand into
mod_proxy for this?
This I think would be the quickest solution.

Proxy contains a placeholder (which should be replaced with a hook) that
says "I have a list of IP addresses, decide in what order I should try
these addresses here".

My understanding of backhand is that it answers the above question - in
theory we could pull the code in by hooking it in.

Regards,
Graham
--
-----------------------------------------
***@sharp.fm "There's a moon
over Bourbon Street
tonight..."
Theo E. Schlossnagle
2003-06-17 16:07:54 UTC
Post by Graham Leggett
Post by George Schlossnagle
Isn't there ongoing discussion about incorporating mod_backhand into
mod_proxy for this?
This I think would be the quickest solution.
Proxy contains a placeholder (which should be replaced with a hook) that
says "I have a list of IP addresses, decide in what order I should try
these addresses here".
My understanding of backhand is that it answers the above question - in
theory we could pull the code in by hooking it in.
mod_backhand can also provide the list of IPs -- in fact, it would be best
that way.
--
Theo Schlossnagle
Principal Consultant
OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
Phone: +1 410 872 4910 x201 Fax: +1 410 872 4911
1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
Graham Leggett
2003-06-17 19:06:56 UTC
Post by Theo E. Schlossnagle
Post by Graham Leggett
Proxy contains a placeholder (which should be replaced with a hook)
that says "I have a list of IP addresses, decide in what order I
should try these addresses here".
My understanding of backhand is that it answers the above question -
in theory we could pull the code in by hooking it in.
mod_backhand can also provide the list of IPs -- in fact, it would be
best that way.
To rephrase it, mod_proxy should give an URL to one or more backend
modules (most likely backhand), which should return a list of IP
addresses saying "try these in this order".

The backend module might do simple DNS round robin in its simplest form,
going all the way up to all the functionality of backhand.
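
In its simplest form, such a backend module could resolve the host and rotate the address list on every call. The sketch below assumes hypothetical names (`dns_round_robin`, `rotate`, `resolve`); only `socket.getaddrinfo` and `urllib.parse.urlsplit` are real library calls.

```python
import socket
from itertools import count
from urllib.parse import urlsplit

_cursor = count()   # how many requests have been routed so far

def rotate(addresses, n):
    """Return the address list rotated left by n positions."""
    if not addresses:
        return []
    k = n % len(addresses)
    return addresses[k:] + addresses[:k]

def resolve(host, port):
    """All IPv4 addresses for host, de-duplicated, in resolver order."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    seen = []
    for *_, sockaddr in infos:
        if sockaddr[0] not in seen:
            seen.append(sockaddr[0])
    return seen

def dns_round_robin(url, resolver=resolve):
    """Given a URL, answer "try these addresses in this order"."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return rotate(resolver(parts.hostname, port), next(_cursor))
```

The `resolver` argument is there so the lookup can be replaced, for example by a fixed list in tests.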

Regards,
Graham
--
-----------------------------------------
***@sharp.fm "There's a moon
over Bourbon Street
tonight..."
Bill Stoddard
2003-06-18 02:25:06 UTC
Post by Graham Leggett
Post by Theo E. Schlossnagle
Post by Graham Leggett
Proxy contains a placeholder (which should be replaced with a hook)
that says "I have a list of IP addresses, decide in what order I
should try these addresses here".
My understanding of backhand is that it answers the above question -
in theory we could pull the code in by hooking it in.
mod_backhand can also provide the list of IPs -- in fact, it would be
best that way.
To rephrase it, mod_proxy should give an URL to one or more backend
modules (most likely backhand), which should return a list of IP
addresses saying "try these in this order".
The backend module might do simple DNS
round robin in its simplest form, going all the way up to all the
functionality of backhand.
Thinking out loud..... Should this be a hook or an optional function? A
hook could be useful for iterating across multiple load balancing
modules, routing requests for different urls using different algorithms;
would this be a common configuration? The load balance module would
also need to be told when the request was complete (it needs to keep
track of how many active connections there are to each backend machine)
and when an ip address was unsuccessfully tried (so that ip address can
be taken out of the list of candidates). The former can be done by
registering a cleanup against the request pool. The latter could be done
with a callback function, optional function or hook back into the load
balance module.
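
The two notifications described above can be sketched as follows. A context manager stands in for the cleanup registered against the request pool, and `report_failure` for the callback; all names here are hypothetical.

```python
from collections import Counter
from contextlib import contextmanager

active = Counter()   # addr -> requests in flight (shared memory in httpd)
excluded = set()     # addrs for which a connection attempt failed

@contextmanager
def request_to(addr):
    """Count a request as active until it completes, however it ends."""
    active[addr] += 1
    try:
        yield addr
    finally:
        active[addr] -= 1   # plays the role of the request pool cleanup

def report_failure(addr):
    """Callback from the proxy: this address could not be reached."""
    excluded.add(addr)

def least_loaded(addrs):
    usable = [a for a in addrs if a not in excluded]
    return min(usable, key=lambda a: active[a]) if usable else None
```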

Bill
Graham Leggett
2003-06-18 08:25:49 UTC
Post by Bill Stoddard
Thinking out loud..... Should this be a hook or an optional function? A
hook could be useful for iterating across multiple load balancing
modules, routing requests for different urls using different algorithms;
would this be a common configuration?
If given many options, I would want the ability to select more than one.
Even though in 90% of the cases the default round robin may suffice, I
would probably be annoyed if the last 10% of the time I needed the
ability and it was not available to me.
Post by Bill Stoddard
The load balance module would
also need to be told when the request was complete (it needs to keep
track of how many active connections there are to each backend machine)
and when an ip address was unsuccessfully tried (so that ip address can
be taken out of the list of candidates). The former can be done by
registering a cleanup against the request pool. The latter could be done
with a callback function, optional function or hook back into the load
balance module.
All of these can be achieved by registering hooks.

For example, a simple DNS round robin module would hook into the "give
me an URL I'll give you some IP addresses" bit, but would leave the
other hooks alone.

A more advanced backhand module might do the URL to IP translation, then
would hook into the end of the request to gather stats about that
request for its own purposes.

I would also like to specify the order in which the modules are tried
somehow, in the same way that mod_cache chooses either memory or disk
for its cache.

Regards,
Graham
--
-----------------------------------------
***@sharp.fm "There's a moon
over Bourbon Street
tonight..."
Bill Stoddard
2003-06-25 17:02:05 UTC
Post by Theo E. Schlossnagle
Post by Graham Leggett
Post by George Schlossnagle
Isn't there ongoing discussion about incorporating mod_backhand into
mod_proxy for this?
This I think would be the quickest solution.
Proxy contains a placeholder (which should be replaced with a hook)
that says "I have a list of IP addresses, decide in what order I
should try these addresses here".
My understanding of backhand is that it answers the above question -
in theory we could pull the code in by hooking it in.
mod_backhand can also provide the list of IPs -- in fact, it would be
best that way.
I'll start implementing some of the hook calls in mod_proxy. Once I get
the hooks in place, you should be able to write a backhand load balancer
module that declares interest in these hooks. Hope to start working on
the mod_proxy mods within the next few days.

Bill
Graham Leggett
2003-06-26 09:10:25 UTC
Post by Bill Stoddard
I'll start implementing some of the hook calls in mod_proxy. Once I get
the hooks in place, you should be able to write a backhand load balancer
module that declares interest in these hooks. Hope to start working on
the mod_proxy mods within the next few days.
In line 433 of mod_proxy.c, there is the code that (IMHO) needs to be
changed.

First, if we are configured to connect to one or more further downstream
proxies, we try to connect to each one in the order they are specified
in the config file. If we are configured to connect direct (the usual
case), then we try that direct connection. The result is a connection to
some remote server.

In order to complete the request, a function:

proxy_run_scheme_handler(r, conf, url, ents[i].hostname, ents[i].port);

is run. This either connects to hostname and port, and asks for URL
(forward proxy), or if hostname and port are NULL, it connects to the
host in URL (reverse proxy).

I think the hook should go inside the proxy_run_scheme_handler()
function, and the hooked code should accept an URL (or a hostname and
port) and convert it into a connection, which is passed back to the rest
of the code path.

The hooked module can then do what it likes with connection failure:
retry with a round robin connection, etc until it is happy (or unhappy).

The existing code can be pulled out of what's there now, and moved into
a simple module called "proxy_dns" (or something).

One other thing that must be looked at is module ordering:

Take for example the case where you want to support "sticky"
connections. You would probably want to watch either a cookie or a
request variable called JSESSIONID, and make sure that all requests with
that session id go to that server.

But what happens if the sticky server is down? The module would say
DECLINED and hand it on to the next module, which might be proxy_dns,
whatever.

We need some way though of telling proxy that proxy_sticky comes before
proxy_dns.

Perhaps we can have a directive the same as in mod_cache, which
specifies the order in which the backend modules are tried.
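
A rough model of the ordering problem, assuming each backend module either returns a list of servers or declines. `DECLINED` is a stand-in value here, and the session table, module table and `ORDER` list are hypothetical.

```python
DECLINED = None   # stand-in for a module saying "not mine, ask the next one"

sessions = {"abc123": "10.2.3.4"}   # session id -> server that owns it

def proxy_sticky(request, alive):
    server = sessions.get(request.get("JSESSIONID"))
    if server in alive:
        return [server]
    return DECLINED           # unknown session, or the sticky server is down

def proxy_dns(request, alive):
    return sorted(alive)      # trivial fallback: every live server

MODULES = {"proxy_sticky": proxy_sticky, "proxy_dns": proxy_dns}
ORDER = ["proxy_sticky", "proxy_dns"]   # what an ordering directive would supply

def run_chain(order, modules, request, alive):
    """Ask each module in the configured order until one does not decline."""
    for name in order:
        result = modules[name](request, alive)
        if result is not DECLINED:
            return result
    return []
```

With `ORDER` reversed, proxy_dns would always answer first and the sticky module would never be consulted, which is why the order has to be configurable.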

Thoughts?

Regards,
Graham
--
-----------------------------------------
***@sharp.fm "There's a moon
over Bourbon Street
tonight..."
Theo E. Schlossnagle
2003-06-26 14:30:25 UTC
Post by Graham Leggett
proxy_run_scheme_handler(r, conf, url, ents[i].hostname, ents[i].port);
is run. This either connects to hostname and port, and asks for URL
(forward proxy), or if hostname and port are NULL, it connects to the
host in URL (reverse proxy).
I am not sure how you envision the hooks being loaded at runtime. If they are
their own modules, and just place themselves in the mod_proxy chain, then I
can piggyback on the builtin module initialization functions. Otherwise,
I need some way to initialize my module.
Post by Graham Leggett
I think the hook should go inside the proxy_run_scheme_handler()
function, and the hooked code should accept an URL (or a hostname and
port) and convert it into a connection, which is passed back to the rest
of the code path.
I don't think the hook should be responsible for making the connection. I
think the hook should be solely responsible for listing, in order of
preference, where connections should be established. In Perl syntax:

[
  { 'protocol' => 'http',
    'IP'       => '10.2.3.4',
    'port'     => '80' },
  { 'protocol' => 'http',
    'IP'       => '10.2.3.8',
    'port'     => '8080' },
]

then mod_proxy should be responsible for taking that list and making real and
usable connections out of them.
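
The division of labour could be sketched like this: the hook only produces the ordered list, and the proxy side turns it into a connection. `connect_first` is a hypothetical name; ports are integers here rather than strings, and `socket.create_connection` is the only real library call.

```python
import socket

# The ordered candidate list a hook might hand back to the proxy.
candidates = [
    {"protocol": "http", "IP": "10.2.3.4", "port": 80},
    {"protocol": "http", "IP": "10.2.3.8", "port": 8080},
]

def connect_first(candidates, connect=socket.create_connection, timeout=2.0):
    """Try each candidate in order; return (connection, candidate, failed)."""
    failed = []
    for c in candidates:
        try:
            return connect((c["IP"], c["port"]), timeout), c, failed
        except OSError:
            failed.append(c)   # remember it so the hook can be told later
    return None, None, failed
```

The `failed` list is what would be reported back to the balancing module, which keeps the hook itself free of any socket code.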
Post by Graham Leggett
retry with a round robin connection, etc until it is happy (or unhappy).
The existing code can be pulled out of what's there now, and moved into
a simple module called "proxy_dns" (or something).
Take for example the case where you want to support "sticky"
connections. You would probably want to watch either a cookie or a
request variable called JSESSIONID, and make sure that all requests with
that session id go to that server.
mod_backhand can do that now because it has access to the whole request_rec
structures in 1.3.x. So, similar access would be very useful.

My approach to mod_backhand 2.0 was to:
(a) take all systems code, shared segments, etc. and place them in a
standalone process
(b) throw out 80% of the code and use mod_proxy :-)
(c) rewrite the candidacy functions for the new API.
Post by Graham Leggett
But what happens if the sticky server is down? The module would say
DECLINED and hand it on to the next module, which might be proxy_dns,
whatever.
Perfect. Also, it is important for each "module" or hook to be able to see
the complete list that resulted from the previous hook. See this for a clear
idea of what I mean:

http://www.backhand.org/ApacheCon2000/EU/img24.htm
http://www.backhand.org/ApacheCon2000/EU/img25.htm
http://www.backhand.org/ApacheCon2000/EU/img26.htm

Obviously the API here would need to change a tad, but if the ServerSlot
structure contained all the information (IP:port) that mod_proxy needed to
establish a connection, the API actually would pop right in.

Having this ability will allow someone to mix and match modules to
achieve complex proxy decision making that matches their needs. And if it
doesn't do _exactly_ what they want, they can write a very small link to put
in the chain instead of writing a big hook that reinvents a lot of working,
tested code.
Post by Graham Leggett
We need some way though of telling proxy that proxy_sticky comes before
proxy_dns.
Perhaps we can have a directive the same as in mod_cache, which
specifies the order in which the backend modules are tried.
This ordering is important for mod_backhand integration. Most of the
candidacy functions (that reorder and augment the "list of hosts") are very
simple, and complicated balancing logic is achieved by cascading them -- and
order matters :-)
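
A small illustration of cascading, where each function sees the complete list left by the previous one. The function names and the `load`/`up` fields are invented for this sketch and are not mod_backhand's actual API.

```python
from functools import reduce

def remove_down(servers):
    return [s for s in servers if s["up"]]

def by_load(servers):
    return sorted(servers, key=lambda s: s["load"])

def first_n(n):
    def take(servers):
        return servers[:n]
    return take

def cascade(functions, servers):
    """Feed the list through each candidacy function in turn."""
    return reduce(lambda current, f: f(current), functions, servers)

servers = [
    {"addr": "a", "load": 0.9, "up": True},
    {"addr": "b", "load": 0.1, "up": False},
    {"addr": "c", "load": 0.5, "up": True},
]
```

Running `[remove_down, by_load, first_n(1)]` over this list leaves server c, while `[by_load, first_n(1), remove_down]` picks the idle but dead server b first and ends with nothing.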
--
Theo Schlossnagle
Principal Consultant
OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
Phone: +1 410 872 4910 x201 Fax: +1 410 872 4911
1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
Graham Leggett
2003-06-26 15:30:56 UTC
Post by Theo E. Schlossnagle
I am not sure how you envision the hooks being loaded at runtime. If
they are their own modules, and just place themselves in the mod_proxy
chain, then I can piggyback on the builtin module initialization
functions. Otherwise, I need some way to initialize my module.
Use the exact same model as is used now for proxy_http, proxy_ftp and
proxy_connect. All three of these modules depend on hooks defined inside
mod_proxy.

mod_dns, mod_sticky, mod_backhand, etc. would simply be the 4th, 5th and
6th modules dependent on mod_proxy.
Post by Theo E. Schlossnagle
I don't think the hook should be responsible for making the connection.
I think the hook should be solely responsible for listing, in order of
preference, where connections should be established. In Perl syntax:
[
  { 'protocol' => 'http',
    'IP'       => '10.2.3.4',
    'port'     => '80' },
  { 'protocol' => 'http',
    'IP'       => '10.2.3.8',
    'port'     => '8080' },
]
then mod_proxy should be responsible for taking that list and making
real and usable connections out of them.
Ok... my thinking was that it would simplify the notification to the
backend of connection success or failure, but then doing it your way
simplifies the backend module.

What we could do is have two hooks - the first gets given an
URL/hostname/port, and returns a list of IPs to try.

Proxy then tries those IPs in turn.

Then a second hook is run saying "oh by the way, that IP address you
gave me is down with status whatever, or it worked fine thanks".

If a connection failed, all the backend modules get to find out and can
blacklist that server, whatever. If the connection succeeded, the time
difference between the first and second hook would be the total time of
connection, which could be used for loading stats.
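
The two-hook idea might be modelled like this, with the first hook stamping the time and the second receiving the outcome. `StatsModule` and its method names are hypothetical, and the clock is injectable so the timing can be checked.

```python
import time

class StatsModule:
    """A backend module that listens on both hooks."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.started = {}       # request id -> time the first hook ran
        self.blacklist = set()  # addresses reported as down
        self.timings = {}       # addr -> list of elapsed seconds

    def get_candidates(self, req_id, addrs):
        # First hook: given the request, return the IPs to try.
        self.started[req_id] = self.clock()
        return [a for a in addrs if a not in self.blacklist]

    def report(self, req_id, addr, ok):
        # Second hook: "that address failed" or "it worked fine, thanks".
        elapsed = self.clock() - self.started[req_id]
        if ok:
            self.timings.setdefault(addr, []).append(elapsed)
            del self.started[req_id]   # the request is finished
        else:
            self.blacklist.add(addr)
```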

Regards,
Graham
--
-----------------------------------------
***@sharp.fm "There's a moon
over Bourbon Street
tonight..."
Bill Stoddard
2003-06-27 19:08:44 UTC
Post by Graham Leggett
Post by Theo E. Schlossnagle
I am not sure how you envision the hooks being loaded at runtime. If
they are their own modules, and just place themselves in the
mod_proxy chain, then I can piggyback on the builtin module
initialization functions. Otherwise, I need some way to initialize
my module.
Use the exact same model as is used now for proxy_http, proxy_ftp and
proxy_connect. All three of these modules depend on hooks defined
inside mod_proxy.
mod_dns, mod_sticky, mod_backhand, etc. would simply be the 4th, 5th
and 6th modules dependent on mod_proxy.
Post by Theo E. Schlossnagle
I don't think the hook should be responsible for making the
connection. I think the hook should be solely responsible for
listing, in order of preference, where connections should be
established. In Perl syntax:
[
  { 'protocol' => 'http',
    'IP'       => '10.2.3.4',
    'port'     => '80' },
  { 'protocol' => 'http',
    'IP'       => '10.2.3.8',
    'port'     => '8080' },
]
then mod_proxy should be responsible for taking that list and making
real and usable connections out of them.
Ok... my thinking was that it would simplify the notification to the
backend of connection success or failure, but then doing it your way
simplifies the backend module.
What we could do is have two hooks - the first gets given an
URL/hostname/port, and returns a list of IPs to try.
Proxy then tries those IPs in turn.
Then a second hook is run saying "oh by the way, that IP address you
gave me is down with status whatever, or it worked fine thanks".
If a connection failed, all the backend modules get to find out and
can blacklist that server, whatever. If the connection succeeded, the
time difference between the first and second hook would be the total
time of connection, which could be used for loading stats.
Yes, this is exactly what I was thinking. Thanks for explaining your
reason for making the hook do the connection.

Bill
Bill Stoddard
2003-07-08 14:39:39 UTC
Perhaps I am being silly, but do we need to standardize on a definition
of 'downstream' and 'upstream'? Here is a comment from proxy_http.c:

/* Note: Memory pool allocation.
* A downstream keepalive connection is always connected to the existence
* (or not) of an upstream keepalive connection. If this is not done then
* load balancing against multiple backend servers breaks (one backend
* server ends up taking 100% of the load), and the risk is run of
* downstream keepalive connections being kept open unnecessarily. This
* keeps webservers busy and ties up resources.
*
* As a result, we allocate all sockets out of the upstream connection
* pool, and when we want to reuse a socket, we check first whether the
* connection ID of the current upstream connection is the same as that
* of the connection when the socket was opened.
*/

If I am reading this correctly, my polarity must be different from that of
the author of this comment. The way I look at it, most bytes flow from the
webserver to the web client. Analogous to water, the bytes flow
'downstream' from the server to the client. Now a proxy maintains two
connections, one to the client and one to the webserver. I would call
the connection from the client to the proxy the 'downstream' connection
and the connection from the proxy to the server the 'upstream'
connection. What say you?

Bill
Graham Leggett
2003-07-10 08:36:37 UTC
Post by Bill Stoddard
If I am reading this correctly, my polarity must be different from that of
the author of this comment. The way I look at it, most bytes flow from the
webserver to the web client.
But clients make connections to webservers; webservers do not make
connections to clients. The "stream" represents the sense of the data
connection (client down to webserver, down to backend), not the byte flow.

Regards,
Graham
--
-----------------------------------------
***@sharp.fm "There's a moon
over Bourbon Street
tonight..."
Mathias Herberts
2003-06-13 15:02:06 UTC
How about mod_backhand ?

www.backhand.org
--
-- Informatique du Credit Mutuel ---- Reseaux et Systemes Distribues
-- 32 rue Mirabeau -- Le Relecq-Kerhuon -- 29808 Brest Cedex 9, FRANCE
-- Tel +33298004653 - Fax +33298284005 - Mail ***@gicm.fr
-- Key Fingerprint: 8778 D2FD 3B4A 6B33 10AB F503 63D0 ADAE 9112 03E4
Federico Mennite
2003-06-13 15:42:31 UTC
Hi Bill,
Post by Bill Stoddard
Ping to all list citizens (listizens?) ...
Who would be interested in seeing some load balancing function being
put into mod_proxy? Anyone given any thought on what you would like to
see or maybe even have a design proposal you'd like to discuss?
I'm definitely interested.
I managed to configure mod_proxy in combination with mod_rewrite
(internal rewrite to mod_proxy) to do some load balancing.
Using mod_rewrite's map feature, I was able to feed a home-made program
(a script) with data gathered from the incoming connections. The program
returns to mod_rewrite the IP addresses which mod_proxy should use for
the backend connections.
The only thing I'm missing before putting this into a production
environment is the ability to feed the external program through a unix
and/or network socket instead of its standard input and output.
This can probably be done without too much rocket science, but I haven't
had time to try to implement something yet.
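
For readers who have not used the map feature: a `prg:` RewriteMap program reads one lookup key per line on standard input and writes one answer per line on standard output, unbuffered, with the string NULL meaning "no value". The sketch below is not the program described above; the backend addresses and the plain rotation are assumptions for illustration.

```python
import sys
from itertools import cycle

# Hypothetical backends; a real program would consult load or health data.
BACKENDS = cycle(["10.2.3.4:80", "10.2.3.8:8080"])

def choose_backend(key):
    """Pick a backend for one lookup key; plain rotation in this sketch."""
    return next(BACKENDS)

def serve(lines, out):
    """Answer each lookup key with exactly one line of output."""
    for line in lines:
        key = line.strip()
        out.write((choose_backend(key) if key else "NULL") + "\n")
        out.flush()   # mod_rewrite waits for each answer, so never buffer
```

Run as a map program, the script would end with `serve(sys.stdin, sys.stdout)`. Talking to it over a socket instead would mean replacing only that last call, since `serve` accepts any line iterator and any writable object.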
Post by Bill Stoddard
- selectable load balancing algorithm: Round robin, LRU, response
time, url driven, session affinity, ?
- automatic detection of backend server failure and removal of the
failed server from the load balancing routing tables (forever? for a
period of time? other?)
- connection pooling using HTTP keep-alive (this is a no brainer since
it is a simple extension of what browsers already do, but it needs to
be designed in from the start)
- must be effective with multiple child processes, each child must
make routing decisions globally based on stats maintained in a shared
memory segment
This can probably all be handled by an external program as described
above (which for my requirements would be enough).
However, having your points implemented might have performance advantages
(and others that I'm missing) over the mod_rewrite solution...

Regards.

--
Federico Mennite.
Ian Holsman
2003-06-13 18:53:40 UTC
I guess my only 'wish' would be the ability to have sticky sessions, so
that a user could return to the same app-server that served his initial
request. The cherry on top would be if this method worked across
multiple web-servers as well, so regardless of which web-server the guy
went to he would always return to the same app-server in the pool.

We do something like this now using Alteons. I think they are around
$2-5k on eBay... might be another option as well.
--
Ian Holsman / 415-344-2608
Performance Measurement & Analysis @ CNET Networks

If everything seems under control, you're just not going fast enough.
—Mario Andretti
George Schlossnagle
2003-06-13 19:03:53 UTC
Post by Ian Holsman
I guess my only 'wish' would be a ability of sticky sessions, so that
a user could return to the same app-server his initial request was
served.
and would be cherry on the top would be if this method would work
across multiple web-servers as well, so regardless of which web-server
the guy went to he would always return to the same app-server in the
pool.
You can do all of this with mod_backhand, fwiw.

George
Graham Leggett
2003-06-17 15:58:15 UTC
Post by Bill Stoddard
Ping to all list citizens (listizens?) ...
Who would be interested in seeing some load balancing function being put
into mod_proxy?
Very interested: There is a placeholder in the existing code for this,
along the lines of "order the list of IPs I should try and connect to here".

A better approach would be to turn this into a hook - then we can have
proxy_balancer in addition to proxy_http, proxy_ftp, etc.

During ApacheCon 2002, there were some discussions on bringing
mod_backhand in to do this - backhand could handle the load balancing,
and proxy would handle the protocol.
Post by Bill Stoddard
- selectable load balancing algorithm: Round robin, LRU, response time,
url driven, session affinity, ?
Each load balancer in its own module.
Post by Bill Stoddard
- automatic detection of backend server failure and removal of the
failed server from the load balancing routing tables (forever? for a
period of time? other?)
And the concept of URL retry - example: if the first server returns a
4xx or a 5xx, then try the next one transparently.
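
URL retry as described might be sketched as below. `fetch_with_retry` and the `fetch` callable are hypothetical; `fetch` is assumed to return the HTTP status code for one attempt against one server.

```python
def fetch_with_retry(servers, fetch):
    """Return (server, status) from the first server answering below 400.

    If every server answers with a 4xx or 5xx, the last answer is returned
    so the client still sees a real response.
    """
    last = (None, None)
    for server in servers:
        status = fetch(server)
        last = (server, status)
        if status < 400:
            break          # a good answer; stop trying further servers
    return last
```

Transparent retry is safest for idempotent requests such as GET; replaying a POST on a second server may repeat its side effects.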
Post by Bill Stoddard
- connection pooling using HTTP keep-alive (this is a no brainer since
it is a simple extension of what browsers already do, but it needs to be
designed in from the start)
Connection pooling was given a lot of thought, and I don't think that
the performance advantage is worth the effort.

In a reverse proxy situation, the network between the proxy and the
backend is likely to be fast enough that pooling gives virtually no
advantage.

In a forward proxy situation, the large spread of URLs being accessed
means that the vast majority of pooled connections will simply hang
around unused, eating up server resources.
Post by Bill Stoddard
- must be effective with multiple child processes, each child must make
routing decisions globally based on stats maintained in a shared memory
segment
To do this properly, I would think we need some new config directives.
Perhaps a new container directive to define a group of backend servers,
another container directive to define URLs served by a particular group
of backend servers. Need some way to bind a url group to a server group.
We should just define some sane namespaces for directives, and then do
them on a per-module basis.

Regards,
Graham
--
-----------------------------------------
***@sharp.fm "There's a moon
over Bourbon Street
tonight..."
Hildenbrand, Patrick
2003-06-18 07:04:30 UTC
Hi,

I would also be more than interested, and I guess so would many more people not reading this list, especially those using mod_rewrite as a reverse proxy.

Some more input

- load balancing algorithm: please add priority driven by an external function (this could be file input, ...; the refresh timeout should be configurable). We have some functions to determine the internal system load that we could query. If mod_backhand were used, this should already work (at least as far as I understood the docs).
- automatic detection: keep the server out until a server refresh, or until an external function signals in some other way that the server is up again. (Some systems will accept connections even though they are malfunctioning; how can we remove them?)
- making routing decisions on a URL basis would be great, as this would best support Apache-based reverse proxy setups.
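
A rough sketch of how the first two points could fit together, under assumptions of my own: the class name `BackendTable`, the `retry_after` period, and the `load_of` callable (standing in for the external load function) are all hypothetical and not part of mod_proxy or mod_backhand.

```python
import time
from typing import Callable, Dict, List, Optional


class BackendTable:
    """Tracks which backends are usable and picks the least loaded one."""

    def __init__(
        self,
        backends: List[str],
        retry_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Time until which each backend is considered down (0.0 = up).
        self._down_until: Dict[str, float] = {b: 0.0 for b in backends}
        self._retry_after = retry_after
        self._clock = clock

    def mark_failed(self, backend: str) -> None:
        """Take a backend out of rotation for retry_after seconds."""
        self._down_until[backend] = self._clock() + self._retry_after

    def mark_up(self, backend: str) -> None:
        """An external signal says the backend is healthy again."""
        self._down_until[backend] = 0.0

    def available(self) -> List[str]:
        now = self._clock()
        return [b for b, until in self._down_until.items() if until <= now]

    def pick(self, load_of: Callable[[str], float]) -> Optional[str]:
        """Choose the available backend with the lowest externally reported load."""
        candidates = self.available()
        if not candidates:
            return None
        return min(candidates, key=load_of)
```

The external function can also call `mark_failed` itself, which would cover the case of a system that still accepts connections but is malfunctioning.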

Session persistency would be a must though, as some others have stated already. Session persistency could be based on:
- a cookie that mod_proxy sets (cookie injection)
- rewriting the URL sent to the browser (mod_proxy would need to remove the
additional stuff prior to connecting to the backend)
- a cookie or other content in the data stream (something which got set
by the backend)
Persistency based on IP only is not the best solution, as for large companies a lot of their users will come from their proxies' IP addresses.
With these approaches, the session state can be kept based on information given to mod_proxy by the browser itself (this is a common setup for commercial load balancers). Cookie injection and URL rewriting would be of particular interest, I guess, as they would not require internal session tables for persistency. You could use a per-server token for the mapping, thus immediately forwarding the request to the right server. It would only be tricky to support URL-based balancing with this setup.
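
To make the cookie injection variant concrete, here is a small sketch. The cookie name `LBROUTE`, the tokens, and the host names are invented for the example; the point is only that the token in the cookie identifies the server directly, so no session table is needed.

```python
import itertools
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "LBROUTE"  # hypothetical cookie name

# Hypothetical per-server tokens mapped to backend addresses.
TOKENS = {
    "s1": "app1.example.internal:8080",
    "s2": "app2.example.internal:8080",
}

# Round robin over the tokens for requests without a valid cookie.
_next_token = itertools.cycle(sorted(TOKENS))


def route(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (backend, Set-Cookie value or None) for one request.

    A request carrying a known token goes straight to that server.
    Any other request gets a backend by round robin, and the proxy
    injects a cookie so later requests stick to the same server.
    """
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    if morsel is not None and morsel.value in TOKENS:
        return TOKENS[morsel.value], None
    token = next(_next_token)
    return TOKENS[token], f"{COOKIE_NAME}={token}; Path=/"
```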

At a later point it would be great if session persistency could also be limited to certain URLs only. Stuff like graphics can often be retrieved from any backend server.

I'm not quite sure whether mixing load balancing into mod_rewrite will help, as Eli Manor suggested. We use mod_rewrite to limit access to backend systems; mixing these config directives with the LB setup would probably be even more confusing. I'm in favor of a separate set of directives, as you suggested.

BTW: to my knowledge mod_backhand does not currently support Apache 2.0, only 1.x, at least based on the FAQ section http://www.backhand.org/mod_backhand/FAQ.shtml#question0. Theo Schlossnagel should be the expert on these matters; maybe he can shed some light on this.

Kind regards,

Patrick Hildenbrand

Any views or opinions presented in this email are solely mine and do not necessarily represent those of my company.

Bill O'Donnell
2003-07-09 15:07:18 UTC
FWIW, I tend to think of a server (web server or otherwise) as being
upstream also.

Though I guess water flow is a poor analogy in general, since the
bytes do "flow" both ways. Maybe "upstream/downstream" can only be
meaningful with respect to a single packet of bytes: they always flow
downstream: so the sender (client or server) is upstream and the
recipient is downstream.

Of course, this is all probably non sequitur to your actual question,
and it doesn't appear that the author of the comment was thinking this
way.

:-)

-billo



From: Bill Stoddard <***@wstoddard.com>
Date: Tue, 08 Jul 2003 10:39:39 -0400

Perhaps I am being silly, but do we need to standardize on a definition
of 'downstream' and 'upstream'? Here is a comment from proxy_http.c:

/* Note: Memory pool allocation.
 * A downstream keepalive connection is always connected to the existence
 * (or not) of an upstream keepalive connection. If this is not done then
 * load balancing against multiple backend servers breaks (one backend
 * server ends up taking 100% of the load), and the risk is run of
 * downstream keepalive connections being kept open unnecessarily. This
 * keeps webservers busy and ties up resources.
 *
 * As a result, we allocate all sockets out of the upstream connection
 * pool, and when we want to reuse a socket, we check first whether the
 * connection ID of the current upstream connection is the same as that
 * of the connection when the socket was opened.
 */

If I am reading this correctly, my polarity must be different from that of the
author of this comment. The way I look at it, most bytes flow from the
webserver to the web client. Analogous to water, the bytes flow
'downstream' from the server to the client. Now a proxy maintains two
connections, one to the client and one to the webserver. I would call
the connection from the client to the proxy the 'downstream' connection
and the connection from the proxy to the server the 'upstream'
connection. What say you?

Bill
