Lighttpd

Selectors

Selectors are the feature that makes the configuration of Lighttpd very powerful, yet keeps it concise. A selector is a criterion followed by a curly-braced region of the configuration that applies only if the criterion is met. After the optional else keyword, another curly-braced region can be added that applies when the criterion is not met. So the basic form of a selector is one of the following:

criteria { configuration }

or

criteria { configuration } else { configuration }

Suppose that we want to serve .html files from the subdirectory /xhtml of our document root as application/xhtml+xml and from any other directory as text/html:

mimetype.assign = (...[our list of mime types, omitting .html]...)
$HTTP["url"] =~ "^/xhtml" {
    mimetype.assign += (".html" => "application/xhtml+xml")
} else {
    mimetype.assign += (".html" => "text/html")
}

As we can see in the example, each criterion consists of a value, an operator, and a pattern. The value to compare is either $SERVER["socket"], which matches the IP plus port (or just the port, if the IP is omitted in the pattern), or $HTTP["x"], where "x" names a part of the request such as host, url, referer, useragent, cookie, or remoteip.

As an example, suppose a user with the IP address 12.34.56.78, using the English version of Mozilla Firefox 2.0.0.1, clicks on a link on the page at http://example.com/some.html that leads to http://example.com/other.html. For this request, host is example.com, url is /other.html, referer is http://example.com/some.html, and remoteip is 12.34.56.78.

There are two pairs of operators. == and != compare the value with the verbatim text and check for equality and inequality, respectively. =~ and !~ match the value against a pattern using Perl-Compatible Regular Expressions (PCRE), and will only work if your Lighttpd is compiled with PCRE support. =~ applies if the pattern matches and !~ applies if the pattern does not match.
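
The four operators can be modeled in a few lines of Python. This is only a sketch of the semantics described above, not Lighttpd's implementation: the function name selector_applies is hypothetical, and Python's re module stands in for PCRE, with which it agrees on the simple patterns used here.

```python
import re


def selector_applies(value: str, op: str, pattern: str) -> bool:
    """Model of the four selector operators (hypothetical helper)."""
    if op == "==":
        return value == pattern  # verbatim comparison
    if op == "!=":
        return value != pattern
    if op == "=~":
        # Unanchored search: the pattern may match anywhere in the value.
        return re.search(pattern, value) is not None
    if op == "!~":
        return re.search(pattern, value) is None
    raise ValueError(f"unknown operator: {op}")


print(selector_applies("/xhtml/index.html", "=~", "^/xhtml"))  # True
print(selector_applies("/other.html", "=~", "^/xhtml"))        # False
print(selector_applies("example.com", "==", "example.com"))    # True
```

Note that =~ is a search, not a whole-string comparison, which is why the pattern in the earlier example starts with ^.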

To become even more powerful, selectors can be nested, though not in arbitrary order: the value $HTTP["url"] always needs to be in the innermost selector. This is not a problem in practice. Suppose you have two selectors:

$HTTP["url"] == "..." { $HTTP["cookie"] == "..." { ... } }

you can easily turn them inside out:

$HTTP["cookie"] == "..." { $HTTP["url"] == "..." { ... } }

Now, if the selector has an else clause, turning it inside out will not work directly, but remember that an else clause can be simulated by a second selector with the inverted operator:

$HTTP["referer"] =~ "..." {
    server.document-root = "/www/referred"
} else {
    server.document-root = "/www/no_ref"
}

is equivalent to:

$HTTP["referer"] =~ "..." {
    server.document-root = "/www/referred"
}
$HTTP["referer"] !~ "..." {
    server.document-root = "/www/no_ref"
}

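This way, we can split the else clauses apart and invert their ordering, if necessary. The else keyword may also be followed directly by another criterion, forming a chain in which the first matching branch applies. The following example shows a document-root based browser switch: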

$HTTP["useragent"] =~ "MSIE" {
    server.document-root = "/www/msie"
} else $HTTP["useragent"] =~ "Opera" {
    server.document-root = "/www/opera"
} else {
    server.document-root = "/www/default"
}
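
To see how such a chain is evaluated, here is a minimal Python sketch of the browser switch. It only models the "first matching branch wins" behavior; the names RULES, DEFAULT_ROOT, and document_root are hypothetical, the user agent strings are shortened stand-ins, and Python's re module is used in place of PCRE.

```python
import re

# (pattern, document root) pairs, in the same order as the configuration.
RULES = [
    ("MSIE", "/www/msie"),
    ("Opera", "/www/opera"),
]
DEFAULT_ROOT = "/www/default"


def document_root(user_agent: str) -> str:
    """Return the root of the first branch whose pattern matches."""
    for pattern, root in RULES:
        if re.search(pattern, user_agent):
            return root
    return DEFAULT_ROOT  # the final else branch


print(document_root("Mozilla/4.0 (compatible; MSIE 6.0)"))  # /www/msie
print(document_root("Opera/9.10"))                          # /www/opera
print(document_root("Mozilla/5.0 Firefox/2.0.0.1"))         # /www/default
```

Because the branches are tried in order, a user agent that matches more than one pattern gets the document root of the earliest branch.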

Alternatively, we can also set up a kind of virtual hosting by looking at the hostname and changing the document root:

$HTTP["host"] == "some.ourdomain.net" {
    server.document-root = "/www/some"
} else $HTTP["host"] == "other.ourdomain.net" {
    server.document-root = "/www/other"
}

... add as many subdomains as you like.

Excursion: Regular Expressions

Note

If you already know regular expressions, feel free to skip this section.

Regular expressions, popularly known as regexes, regexen, or regexps, come from formal language theory. Stephen Kleene introduced them in the 1950s, and Noam Chomsky, searching for ways to formalize languages (without necessarily giving them meaning), placed the languages they describe in the simplest class of his hierarchy. This class consists of the languages that can be recognized by a finite automaton: a machine that decides whether an input string is part of the language by looking at each symbol only once.

These regular languages are exactly the languages that regular expressions can describe. Every regular expression defines a regular language, and deciding whether an input is part of that language is what we mean when we ask whether the input "matches". The PCRE library used by Lighttpd compiles the pattern into an internal form and runs it against the input. It should be noted that PCRE extends the classical regexes in ways that enable them to define some non-regular languages, which gives them even more power, but also means that it matches by backtracking rather than as a pure finite state machine.

[Diagram: how a regular expression matches an input text]

This diagram shows how regular expressions match an input text. It also goes to show that learning Lighttpd is not all fun and games (well, it's fun, but no games). The pattern needs to match only part of the input text. Most characters in the pattern match themselves. If you want to match only from the beginning of the text, you can match the beginning itself with "^". Similarly, the end of the text is matched by a dollar sign "$".
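
The effect of the two anchors can be tried out in Python. This is a small sketch that uses Python's re module as a stand-in for PCRE (the two behave the same for these patterns); re.search mirrors the "match only part of the input" behavior.

```python
import re

url = "/xhtml/index.html"

print(bool(re.search("html", url)))      # True: matches anywhere in the text
print(bool(re.search("^html", url)))     # False: the text does not start with "html"
print(bool(re.search("^/xhtml", url)))   # True: anchored at the beginning
print(bool(re.search("html$", url)))     # True: anchored at the end
print(bool(re.search("^/xhtml$", url)))  # False: would have to cover the whole text
```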

Play it Again, Sam

One of the powerful features of regexps is repetition. The pattern a+ matches a, aa, aaa, and so on: one or more as. The pattern a* matches everything a+ matches, plus the empty string. The pattern a{2,4} matches aa, aaa, and aaaa. You can of course use any values instead of 2 and 4, or omit the comma and the second value, as in a{3}, to match an exact number of as.

Note that the *, +, and {} operators are "greedy". This means they match as much as possible, starting at the first position where a match is found.
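
The following sketch checks the repetition operators against a few inputs. It uses Python's re module rather than PCRE itself, which makes no difference for these patterns; re.fullmatch is used so that each pattern has to cover the whole input.

```python
import re

# Columns: input, a+, a*, a{2,4}, a{3}
for text in ["", "a", "aa", "aaa", "aaaa", "aaaaa"]:
    print(
        repr(text),
        bool(re.fullmatch("a+", text)),
        bool(re.fullmatch("a*", text)),
        bool(re.fullmatch("a{2,4}", text)),
        bool(re.fullmatch("a{3}", text)),
    )

# Greedy: a+ takes the longest run it can at the first place it matches,
# not the longest run in the whole text.
greedy = re.search("a+", "baaad aaaaa").group(0)
print(greedy)  # aaa
```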

Are You There?

If you want to match something if it is there, but not lose your match if it is not, you can put a question mark "?" after it. The question mark also turns the greedy operators + and * into meek (non-greedy) operators that match only what is absolutely needed. A period in the pattern will match one character, regardless of what it is (except, by default, a newline), so .* will match everything.
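
A short Python sketch makes the difference between greedy and meek matching visible. Python's re module is used here in place of PCRE, and the sample texts are made up for illustration.

```python
import re

# "?" makes the preceding item optional.
print(bool(re.fullmatch("https?", "http")))   # True
print(bool(re.fullmatch("https?", "https")))  # True

text = "<b>bold</b>"
print(re.search("<.+>", text).group(0))   # <b>bold</b>  (greedy)
print(re.search("<.+?>", text).group(0))  # <b>          (meek)

# "." matches any single character except a newline,
# so ".*" matches any text on one line.
print(bool(re.fullmatch(".*", "anything at all")))  # True
```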

Decisions, Decisions

Sometimes we need to match either of two values, for example www and web. The vertical bar does just that, so our pattern would read www|web. If we also want to match net, we just extend our pattern with another vertical bar and get www|web|net. Note that the vertical bar binds weaker than the other operators: ba*|cd+ means either ba* or cd+, so it would not match bdd as a whole. A common case is to match a single digit, letter, or other character, so a shortcut was invented: [abc] is equal to a|b|c, [0-9] will match one digit, and [a-z] will match one lowercase letter. To invert the character group, use ^ at its beginning; for example, [^0-9] will match one character that is not a digit. We can match a literal hyphen (-) if we put it at the beginning of the group.
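
Alternation and character groups can be checked with a few lines of Python. As before, Python's re module stands in for PCRE, and re.fullmatch requires the pattern to cover the whole input.

```python
import re

print(bool(re.fullmatch("www|web|net", "web")))  # True

# The bar binds weakest: ba*|cd+ means (ba*) or (cd+).
print(bool(re.fullmatch("ba*|cd+", "bdd")))      # False
print(re.search("ba*|cd+", "bdd").group(0))      # b (only the first letter)

print(bool(re.fullmatch("[abc]", "b")))          # True
print(bool(re.fullmatch("[0-9]", "7")))          # True
print(bool(re.fullmatch("[^0-9]", "7")))         # False

# A hyphen at the beginning of the group is taken literally.
print(bool(re.fullmatch("[-0-9]+", "12-34")))    # True
```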

Group and Capture

Putting parentheses around a pattern will group and capture this pattern. First, to go with the example above, b(a*|c)d+ will match bdd. It will also "capture" an empty string (matched by a*) into $1. This is not very interesting for selectors but will be very useful when it comes to rewriting and redirecting. The captures are numbered by the position of their opening parenthesis.

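It is also possible to create a non-capturing group using (?:pattern), which groups the pattern without capturing it. A lookahead, written (?=pattern), only checks that the pattern follows at the current position without making it part of the match, and a negative lookahead, written (?!pattern), will only match if the pattern is not there.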
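
Grouping, capture numbering, and lookaheads can be sketched in Python as follows. Python's re module is used in place of PCRE; it exposes captures through group(n) instead of $n, and the patterns other than b(a*|c)d+ are made up for illustration, including (?:...), the standard syntax for a group that does not capture.

```python
import re

m = re.fullmatch(r"b(a*|c)d+", "bdd")
print(m is not None)     # True
print(repr(m.group(1)))  # '' (the empty string matched by a*)

# Captures are numbered by the position of their opening parenthesis.
m = re.fullmatch(r"((a)(b))(c)", "abc")
print(m.groups())  # ('ab', 'a', 'b', 'c')

# (?:...) groups without capturing, so the domain ends up in group 1.
m = re.fullmatch(r"(?:www|web)\.(.+)", "web.example.com")
print(m.groups())  # ('example.com',)

# Lookaheads check what follows without making it part of the match.
print(re.search(r"foo(?=bar)", "foobar").group(0))  # foo
print(re.search(r"foo(?!bar)", "foobar"))           # None
```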

Lucky Escape

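Now you might ask how to match the characters that make up the operators. We can "escape" them by prefixing them with a backslash in the pattern, so they will match themselves verbatim; for example, \. matches a period and \\ matches a backslash. The usual C-string-like escapes also work, such as \n for a newline, \r for a carriage return, and \t for a tab.

Additionally, there are some abbreviations for commonly used character classes: \d matches a digit, \w matches a word character (a letter, a digit, or an underscore), and \s matches a whitespace character. The uppercase forms \D, \W, and \S match the opposite, and \b matches a word boundary.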

The character classes can be mixed and matched with single characters and ranges, for example [\w\-] or [a-z\d:\.]. The word boundary \b matches the position between a \w character and a \W character, or vice versa.
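
Here is a brief Python sketch of escapes, character classes, and the word boundary. Python's re module stands in for PCRE and agrees with it on these ASCII examples; the input strings are invented for illustration.

```python
import re

# Classes mixed with single characters and ranges.
print(bool(re.fullmatch(r"[\w\-]+", "my-file_01")))        # True
print(bool(re.fullmatch(r"[a-z\d:\.]+", "host.name:80")))  # True

# An escaped period matches only a period; a bare one matches any character.
print(bool(re.search(r"index\.html", "indexAhtml")))  # False
print(bool(re.search(r"index.html", "indexAhtml")))   # True

# \b matches between a word character and a non-word character.
print(bool(re.search(r"\bcat\b", "a cat sat")))    # True
print(bool(re.search(r"\bcat\b", "concatenate")))  # False
```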

There are more tricks you can do with regular expressions—in fact there are entire books written about them. The basics presented here should be enough to understand the examples that will follow.

Note

Testing Regular Expressions

The best way to learn regular expressions is to test them with some input. There are some programs that do this. The PCRE library comes with a "pcretest" utility that lets you enter a regular expression (note that it requires you to delimit the regex, for example with slashes) and then multiple input texts.

I also use the jEdit regexp tester sometimes. To get jEdit and the regexp tester plugin, visit http://jedit.org. You can also search for "regexp test" at http://freshmeat.net to find other regular expression testing programs.
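
If none of these tools is at hand, a few lines of Python can serve as a makeshift tester. This is only a sketch: the helper name try_regex is hypothetical, and Python's re module is not PCRE, so patterns that rely on PCRE-specific features may behave differently.

```python
import re
from typing import Dict, List, Optional


def try_regex(pattern: str, inputs: List[str]) -> Dict[str, Optional[str]]:
    """Map each input to the text the pattern matched, or None."""
    compiled = re.compile(pattern)
    results: Dict[str, Optional[str]] = {}
    for text in inputs:
        match = compiled.search(text)  # unanchored, like =~ in a selector
        results[text] = match.group(0) if match else None
    return results


for text, matched in try_regex("^/xhtml", ["/xhtml/a.html", "/other.html"]).items():
    print(f"{text!r}: {matched!r}")
```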

Now, we can structure our configuration with selectors and carve out regions of our server landscape using regex matching. Let us put this ability to use to rewrite and redirect requests.