Apache mod_rewrite module guide - part 1

1. What is mod_rewrite?

mod_rewrite is an Apache module that allows server-side manipulation of requested URLs. Incoming URLs are checked against a series of rules. Each rule contains a regular expression that detects a particular pattern. If the pattern is found in the URL and the rule's conditions are met, the URL is rewritten using the provided substitution string, or an action is taken. This process continues until there are no more rules left or processing is explicitly told to stop.

This is summarized in these three points:

  • There is a list of rules that are processed in order.
  • If a rule's pattern matches, the conditions for that rule are checked.
  • If everything is a go, a substitution is made or an action is taken.

2. Basic rules

The most important term to keep in mind is the URL-PATH: the part of the URL after the hostname and port and before the query string.

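  • As always, anything that you can put in a .htaccess file can also be placed inside the global configuration file. With mod_rewrite, there is a small difference depending on which one you put a rule in. Most notably, in a .htaccess file the URL-PATH is matched without its leading slash, while in the global configuration it is matched with it.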

This is something to keep in mind if you see examples online or if you’re trying an example yourself: beware of the leading slash. I will attempt to clarify this below when we work through some examples together.
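
As a rough sketch of the leading-slash difference, here is the same hypothetical rule written for both places. The file names old.html and new.html are placeholders, and the .htaccess variant is assumed to live in the document root.

```apache
# Variant 1: global configuration (e.g. inside a <VirtualHost> block).
# The pattern sees the URL-PATH with its leading slash.
RewriteEngine on
RewriteRule ^/old\.html$ /new.html

# Variant 2: a .htaccess file placed in the document root.
# The leading slash has already been stripped before matching.
RewriteEngine on
RewriteRule ^old\.html$ new.html
```

Only one of the two variants would be used at a time; copying variant 1 into a .htaccess file would leave a rule that never matches.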

3. Enabling mod_rewrite on the Server

On Debian-style layouts, a module is enabled by linking it from mods-available into mods-enabled. So, let's create a symlink:

~] cd /etc/apache2/mods-enabled/
~] ln -s ../mods-available/rewrite.load .
~] ls -alFh
...
lrwxrwxrwx 1 root root   30 Jan  8 13:54 rewrite.load -> ../mods-available/rewrite.load
...

Reload the Apache configuration with /etc/init.d/apache2 reload or, on systemd, with systemctl reload apache2.service. We can check that the mod_rewrite module is loaded into the Apache server configuration with this command:

~] apachectl -t -D DUMP_MODULES
 rewrite_module (shared)
 setenvif_module (shared)
 ssl_module (shared)
 status_module (shared)

After loading the mod_rewrite module, we can enable mod_rewrite directives in a .htaccess file or in the configuration file of our (sub)domain with this directive:

# Enable Rewriting
RewriteEngine on
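
For orientation, here is a minimal sketch of where this directive could sit in a (sub)domain's configuration file, and what lets .htaccess files use it. It assumes Apache 2.4; the server name and the paths are placeholders, not values from this guide.

```apache
<VirtualHost *:80>
    # Hypothetical host name and document root
    ServerName example.com
    DocumentRoot /var/www/example

    # Rules placed here run in server (virtual host) context.
    RewriteEngine on

    <Directory /var/www/example>
        # FileInfo allows .htaccess files in this tree to use
        # mod_rewrite directives such as RewriteEngine and RewriteRule.
        AllowOverride FileInfo
        # Per-directory rewriting requires FollowSymLinks.
        Options FollowSymLinks
        Require all granted
    </Directory>
</VirtualHost>
```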

4. Regular Expressions

This tutorial does not intend to teach you regular expressions. Since Apache 2.0, mod_rewrite has used Perl Compatible Regular Expressions (PCRE).

5. General form of RewriteRule

RewriteRule has the following syntax:

RewriteEngine on
RewriteRule   what-client-ask   what-client-really-get   [optional-parameters]

  1. The first line, RewriteEngine on, turns mod_rewrite on.

  2. RewriteRule:

  • What the client asks for (the first parameter). It is a regular expression that is matched against the URL-PATH, e.g. /index.html or /archive/my_file.php.
    When the server sees that the client wants a page that matches the first parameter, it starts doing something.

  • The page address that the client actually receives (the second parameter) is either an absolute address (starting with http:// or https://) or a relative one. A relative address is resolved either from the current directory or, if the what-client-really-get entry begins with a slash, from the root of the site. For example, https://myredlinux.com/file.html is an absolute address and /file.html is a relative address.

    Unlike the first parameter, this second parameter is not written as a regular expression; for example, it is not necessary to escape dots with a backslash (although escaping them does no harm).

  • As for the [optional-parameters], I will introduce them as they come up in the following examples and at the end of this text.
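
To make the three parts concrete, here is a small sketch with one rule for each kind of second parameter. It assumes a .htaccess file in the document root; home.html and elsewhere.html are made-up names, the other paths are the ones used above.

```apache
RewriteEngine on

# Second parameter relative to the directory that holds this .htaccess file
RewriteRule ^index\.html$ home.html [L]

# Second parameter relative to the root of the site (leading slash)
RewriteRule ^archive/my_file\.php$ /file.html [L]

# Second parameter as an absolute address
RewriteRule ^elsewhere\.html$ https://myredlinux.com/file.html [L]
```

The [L] in the third column is one of the optional parameters: it tells mod_rewrite to stop processing further rules once this one has been applied.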

5.1 What is matched - WHAT-CLIENT-ASK

  • In VirtualHost context, the Pattern will initially be matched against the part of the URL after the hostname and port, and before the query string (e.g. "/app1/index.html"). This is the (%-decoded) URL-PATH.
  • In per-directory context (Directory and .htaccess), the Pattern is matched against only a partial path, for example a request of "/app1/index.html" may result in comparison against "app1/index.html" or "index.html" depending on where the RewriteRule is defined.
  • The directory path where the rule is defined is stripped from the currently mapped filesystem path before comparison (up to and including a trailing slash). The net result of this per-directory prefix stripping is that rules in this context only match against the portion of the currently mapped filesystem path "below" where the rule is defined.
  • Directives such as DocumentRoot and Alias, or even the result of previous RewriteRule substitutions, determine the currently mapped filesystem path.
  • If you wish to match against the hostname, port, or query string, use a RewriteCond with the %{HTTP_HOST}, %{SERVER_PORT}, or %{QUERY_STRING} variables respectively.
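
The last point can be sketched as follows. The host names and the lang parameter are invented for illustration; the optional slash (^/?) is only there so the second pattern works in both VirtualHost and per-directory context.

```apache
RewriteEngine on

# Match on the hostname: the RewriteRule pattern itself never sees it.
RewriteCond %{HTTP_HOST} ^old\.example\.com$ [NC]
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]

# Match on the query string: only /app1/index.html?lang=de is remapped.
RewriteCond %{QUERY_STRING} (^|&)lang=de(&|$)
RewriteRule ^/?app1/index\.html$ /app1/index.de.html [L]
```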

5.1.1 Per-directory Rewrites (Directory directive or .htaccess file inside directory)

  • The rewrite engine may be used in .htaccess files and in <Directory> sections, with some additional complexity.
  • To enable the rewrite engine in this context, you need to set RewriteEngine On and Options FollowSymLinks must be enabled. If your administrator has disabled override of FollowSymLinks for a user's directory, then you cannot use the rewrite engine. This restriction is required for security reasons.
  • See the RewriteBase directive for more information regarding what prefix will be added back to relative substitutions.
  • If you wish to match against the full URL-path in a per-directory (htaccess) RewriteRule, use the %{REQUEST_URI} variable in a RewriteCond.
  • The removed prefix always ends with a slash, meaning the matching occurs against a string which never has a leading slash. Therefore, a Pattern with ^/ never matches in per-directory context.
  • Although rewrite rules are syntactically permitted in <Location> and <Files> sections (including their regular expression counterparts), this should never be necessary and is unsupported. A likely feature to break in these contexts is relative substitutions.

Figure: Example: RequestURI in the .htaccess file
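
A sketch of what such a rule might look like, assuming the .htaccess file sits in a hypothetical /var/www/example/app1/ directory that is served as /app1/ (start.html is a made-up target):

```apache
# Assumed location: /var/www/example/app1/.htaccess
RewriteEngine on

# The RewriteRule pattern only sees "index.html" here, so the full
# URL-path is tested in a condition instead.
RewriteCond %{REQUEST_URI} ^/app1/index\.html$
RewriteRule ^index\.html$ start.html [L]
```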

5.2 WHAT-CLIENT-REALLY-GET - SUBSTITUTIONS

The what-client-really-get of a rewrite rule is the string that replaces the original URL-PATH that was matched by what-client-ask. The what-client-really-get may be a:

  • file-system path
    Designates the location on the file-system of the resource to be delivered to the client. What-client-really-get strings are only treated as a file-system path when the rule is configured in server (virtualhost) context and the first component of the path in the substitution exists in the file-system.

  • URL-PATH
    A DocumentRoot-relative path to the resource to be served. Note that mod_rewrite tries to guess whether you have specified a file-system path or a URL-path by checking to see if the first segment of the path exists at the root of the file-system. For example, if you specify a what-client-really-get string of /www/file.html, then this will be treated as a URL-path unless a directory named www exists at the root of your file-system (or, in the case of using rewrites in a .htaccess file, relative to your document root), in which case it will be treated as a file-system path. If you wish other URL-mapping directives (such as Alias) to be applied to the resulting URL-path, use the [PT] flag as described below.

  • Absolute URL
    If an absolute URL is specified, mod_rewrite checks to see whether the hostname matches the current host. If it does, the scheme and hostname are stripped out and the resulting path is treated as a URL-path. Otherwise, an external redirect is performed for the given URL. To force an external redirect back to the current host, see the [R] flag below.

  • - (dash)
    A dash indicates that no substitution should be performed (the existing path is passed through untouched). This is used when a flag (see below) needs to be applied without changing the path.
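
A small sketch of the dash in use: the path is left alone and only a flag is applied. The private/ directory name is an assumption made for this example.

```apache
RewriteEngine on

# Leave the path untouched, but answer 403 Forbidden
# for anything under private/.
RewriteRule ^/?private/ - [F]
```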

In addition to plain text, the what-client-really-get string can include

  1. back-references ($N) to the RewriteRule pattern
  2. back-references (%N) to the last matched RewriteCond pattern
  3. server-variables as in rule condition test-strings (%{VARNAME})
  4. mapping-function calls (${mapname:key|default})
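
The four kinds of expansion could look roughly like this. The sketch assumes server (virtual host) context, because RewriteMap cannot be declared in a .htaccess file; the map file, the host names and the paths are all hypothetical.

```apache
# Assumed context: inside a <VirtualHost> block.
RewriteEngine on

# 4. Mapping function: look up $1 in a hypothetical text map,
#    falling back to /not-found.html when the key is missing.
RewriteMap shortlinks "txt:/etc/apache2/shortlinks.txt"
RewriteRule ^/go/([a-z0-9]+)$ ${shortlinks:$1|/not-found.html} [R=302,L]

# 3. Server variable: reuse the requested hostname in the target.
RewriteRule ^/insecure/(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]

# 1. and 2. $1 comes from the rule pattern, %1 from the RewriteCond:
#    http://alice.example.com/photos is remapped to /sites/alice/photos
RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.example\.com$ [NC]
RewriteRule ^/(.*)$ /sites/%1/$1 [L]
```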

5.3 Notes

In mod_rewrite, the NOT character ('!') is also available as a possible pattern prefix. This enables you to negate a pattern; to say, for instance: if the current URL does NOT match this pattern. This can be used for exceptional cases, where it is easier to match the negative pattern, or as a last default rule.
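
As a sketch of a negated pattern used as a catch-all, the following rule sends every request except the maintenance page itself to that page. The file name maintenance.html is an assumption.

```apache
RewriteEngine on

# Everything that is NOT maintenance.html is redirected to it.
RewriteRule !^/?maintenance\.html$ /maintenance.html [R=302,L]
```

Keep in mind that a negated pattern captures nothing, so $N back-references cannot be used in the substitution of such a rule.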

6. How mod_rewrite rules are processed

The rules in the mod_rewrite Apache module are processed in the order in which they appear. Note that each RewriteRule acts on the URL-PATH. When a rule makes a substitution, the modified URL-PATH is handed to the next rule. This means that the URL a rule is processing may already have been edited by a previous rule! The URL is continually updated by each rule that it matches. This is important to remember!
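
A sketch of two rules where the second one only ever sees the output of the first. It assumes a .htaccess file in the document root; the blog/, articles/ and article.php names are invented for the example.

```apache
# Assumed location: a .htaccess file in the document root.
RewriteEngine on

# Request: /blog/2020/hello
# Rule 1 rewrites the path to "articles/2020/hello" ...
RewriteRule ^blog/(.*)$ articles/$1

# ... and rule 2 is matched against that rewritten path,
# not against the original "blog/2020/hello".
RewriteRule ^articles/([0-9]{4})/(.*)$ article.php?year=$1&slug=$2 [L]
```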

6.1 Flow Chart

Here is a flow chart that tries to visualize the generic flow of execution across multiple rules in an Apache config file or .htaccess file. Note that, at the top of the flow chart, the value going into the rewrite rules is the “URL Part”, and if a substitution is made, the modified part proceeds into the next rule.

Figure: Apache2 mod_rewrite flow chart

I referred to rewriting conditions earlier, but didn’t go into detail. One or more RewriteCond directives can be associated with a single RewriteRule. The conditions appear before the rule they are associated with, but they are only evaluated if the rule’s pattern matched. As the flow chart illustrates, if a rewrite rule’s pattern matches, Apache checks whether there are any conditions for that rule. If there aren't, it makes the substitution and continues. If there are conditions, on the other hand, it only makes the substitution if all of the conditions are true. Let's visualize this in a concrete example.

Figure: Example: RewriteRules and Rewrite Conditions
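
In text form, a comparable example might look like the sketch below: one rule guarded by two conditions, and one rule without any. The downloads/ path and example.com are assumptions.

```apache
RewriteEngine on

# Both conditions belong to the rule directly below them. They are only
# evaluated once the pattern ^/?downloads/ has matched, and both must hold:
# the Referer header is not empty and does not point to our own site.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
RewriteRule ^/?downloads/ - [F]

# No conditions: this rule is applied whenever its pattern matches.
RewriteRule ^/?about$ /about.html [L]
```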

7. Redirect vs Remapping

The crucial thing with the mod_rewrite Apache module is to understand what redirection is and what remapping is.

7.1 Redirect

A redirect happens when I add the flag [R=301] in square brackets to the end of the RewriteRule line, as follows:

RewriteEngine on
RewriteRule   (.*)   /result.html   [R=301]

  • RewriteEngine on - turns mod_rewrite on
  • (.*) - a regular expression that matches all characters in the URL-PATH - it is what-client-ask
  • When I enter a full URL, e.g. http://example.com/directory/question.html
  • then I'll see a different address in the browser's address bar: http://example.com/result.html (the target starts with a slash, so it is resolved from the root of the site), because the server redirects me (and the browser follows the redirect)
  • [R=301] - redirect with status code 301.
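
One caveat about the rule above: (.*) also matches the redirect target itself, so wherever the rule is active for /result.html as well (for example in the server configuration or in a .htaccess file in the document root), the browser would be redirected again and again. A narrower variant, sketched here under the assumption that both pages live in /directory/ and the rule sits in the document root's .htaccess file or in the server configuration, avoids that:

```apache
RewriteEngine on

# Only the question page is redirected; the result page no longer matches.
RewriteRule ^/?directory/question\.html$ /directory/result.html [R=301,L]
```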

7.2 Remapping

I will explain remapping in the following example:

RewriteEngine on
RewriteRule   question-url-path\.html   remapping-url-path.html

In this example, no redirection is performed, only remapping (there is no [R] flag). This means that the user still sees the address they entered (or clicked on), but the server answers the request for question-url-path.html with the content of remapping-url-path.html. Note that this time there are no square brackets: remapping is the default behavior of mod_rewrite.

  • I enter the URL http://example.com/directory/question-url-path.html in the web browser
  • I get the content of http://example.com/directory/remapping-url-path.html
  • but I still see the original address in the browser: http://example.com/directory/question-url-path.html

7.3 When a redirect happens and when remapping happens

The default behavior of the mod_rewrite module is remapping. In which cases does a redirect happen?

  • there is a clear instruction to redirect (e.g. [R])
  • or the request cannot be remapped: this is the case when the new address starts with http:// or https:// and points to a different host. The server cannot remap to a page on another server, so it redirects. (If the hostname matches the current host, the scheme and hostname are stripped and the rest is remapped as a URL-path, as described in 5.2.) The following rule redirects even though it has no [R]:

RewriteRule   (.*)   https://www.mybluelinux.com
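
Side by side, the cases could be sketched like this. The file names a.html to e.html are placeholders, and the last rule assumes the site itself is not served as www.mybluelinux.com.

```apache
RewriteEngine on

# Remapping (default): the browser keeps showing a.html.
RewriteRule ^/?a\.html$ /b.html

# Redirect on explicit instruction: [R] without a code sends a 302.
RewriteRule ^/?c\.html$ /d.html [R,L]

# Redirect because the target is an absolute URL on another host.
RewriteRule ^/?e\.html$ https://www.mybluelinux.com/ [L]
```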

8. Variables from regular expressions

Redirecting or remapping one file to another is not very useful. It's much better to find something in the requested URL and use it to call something else. That "something" will be a variable.

Maybe I can find an article number and use it to call a hidden URL. The following example assumes that the pages are written in PHP and that their addresses normally contain a question mark (?). An article needs a URL like this:

example.com/script.php?id=234

But I would like to refer to this page with an address that has no question mark, for example:

example.com/page-234

I will do this by writing a mod_rewrite rule that captures the article number as a variable (it will be named $1) and uses this variable to define the replacement. The rule looks like this:

RewriteRule   ^page-(.*)   script.php?id=$1

Explanation

  • the user asks for the url-path page-543
  • mod_rewrite sees it and notices that it matches the regular expression ^page-(.*). The group (.*) matches any number of characters, and therefore matches the string 543. The captured text is stored in the variable $1 (the first, because it is the first pair of parentheses).
  • mod_rewrite then remaps the request to script.php?id=$1
  • which now stands for script.php?id=543, because $1 equals 543
  • the Apache web server sends the output of script.php?id=543 as the response
  • the user never sees this intricate address with a question mark and parameters; it is a hidden URL, even though it is functional
  • the user still sees page-543 at the end of the address in the web browser (this is remapping)
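
Since (.*) accepts any characters, a slightly stricter variant is often preferable. This is only a sketch under the same assumptions as the rule above (a .htaccess file next to script.php):

```apache
RewriteEngine on

# Only digits are accepted as the article number, and nothing may follow them.
RewriteRule ^page-([0-9]+)$ script.php?id=$1 [L]
```

Because the substitution carries its own query string, any query string on the original request is dropped; adding the QSA flag ([L,QSA]) would append it instead.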
