Talk:Filter (internet)

From Simple English Wikipedia, the free encyclopedia

I've reverted my revert. Here are the points we should rethink:

(I'm sorry, I over-estimated the complexity of the article, but these points do need to be reviewed.) Yotcmdr (talk) 12:17, 26 October 2008 (UTC)

Filters and Input Formats

The pillars of Drupal's text handling are filters and input formats. A filter is a set of rules that can be applied to transform text in some way. Some filters strip certain HTML tags or security hazards from text. Other filters look for special patterns and expand the text in a meaningful way. Still others, such as the fun-oriented Pirate Filter, rewrite the text altogether (in this case, to make it "talk like a pirate"). Filters know how to do one thing and do it well: text in, filtered text out.

Some filters have extra configuration options. The HTML filter, for example, strips all but an allowed set of HTML tags from text. The set of allowed tags can be determined by the administrator.

An input format is an ordered collection of filters. Any text that is being displayed to the browser should be run through the filters in an input format first. The input format applies all of its filters in order, so that one filter feeds its output to the next, forming a chain. This chaining of filters can be the source of great flexibility as well as great confusion. The flexibility comes from the fact that filters can be made to work together. The confusion comes from cases where filters inadvertently work against each other, with one filter undoing the work of the previous one. I'll show examples of both.

Input versus Output

Drupal captures input in its raw form, saving whatever gets submitted straight to the database without alteration. Then, before displaying any such content in the browser, Drupal processes the text by choosing an input format to apply. Why doesn't Drupal apply the filters in an input format before saving input into the database? The answer is simple: flexibility. If you were to change the text that a user has input before saving it in the database, you could never get back to the original state. You could never change your mind about the configuration of the filters. By filtering on output, not on input, Drupal gives the site administrator the option of changing how content is displayed at any time.

As an example, imagine that you notice the users on your site using character patterns to represent smiley faces. I know, that stuff is so 1998 :P But just for fun, let's say they're doing it ;-) You look around, find the Smiley Filter on Drupal.org, and install it. Now all of the keystroke patterns that your users had been using can be displayed as images. This ability to change is only available if the input is saved verbatim and filtering is done on output.

Meet Drupal's Core Filters
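
Going back to the "Input versus Output" point for a moment: here is a minimal Python sketch of storing raw text and filtering only when it is displayed. It is not Drupal's code. The `database` dict, `save_post`, `render_post`, `smiley_filter` and the image file names are all made up for illustration.

```python
# Hypothetical in-memory stand-in for the database.
database = {}

def save_post(post_id, raw_text):
    """Store the submitted text verbatim: no filtering on input."""
    database[post_id] = raw_text

def smiley_filter(text):
    """Hypothetical smiley filter: swap two keystroke patterns for images."""
    text = text.replace(":P", '<img src="tongue.png" alt="tongue">')
    text = text.replace(";-)", '<img src="wink.png" alt="wink">')
    return text

def render_post(post_id, filters):
    """Apply the given filters, in order, to the stored text on output."""
    text = database[post_id]
    for apply_filter in filters:
        text = apply_filter(text)
    return text

save_post(1, "That stuff is so 1998 :P")

# Before the smiley filter is installed, the post is shown as typed.
before = render_post(1, [])

# After installing it, the same stored text is shown with an image.
after = render_post(1, [smiley_filter])
```

Because `database[1]` never changes, removing the filter later would bring back the original display. That is the flexibility the paragraph describes.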

Here is a rundown of the filters that Drupal ships with:

* HTML Filter: The HTML filter is primarily responsible for removing HTML tags from text. It can be configured to allow any number of tags (a whitelist), and it will remove the rest. It removes them either by stripping them or by escaping them into entities like this: &lt;div&gt;. If tags are escaped, they show up in the output as visible tags: <div>Some text</div>. The set of tags allowed by default is: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>

     The final task of the HTML filter is to add a spam link deterrent to anchor tags. The deterrent, proposed by Google, tells search engines which links not to follow when crawling the web. If this option is enabled, rel="nofollow" will be added as an attribute of all anchor tags.
   * Line Break Converter: This filter converts line breaks into <br> or <p> tags depending on whether a single or double line break is found. This preserves the paragraph formatting in the text that is input.
   * URL Filter: Any web or email addresses that are found in the text will be converted to clickable links, thus saving the user the hassle of having to type <a href="....">
   * PHP Evaluator: The PHP Evaluator is the most radical of all Drupal's core filters. It looks for text enclosed in <?php ... ?> and evaluates it as PHP code. This effectively allows you to program and extend Drupal just by submitting content to the site! In 99% of cases, this is a bad idea, and the initial attraction of harnessing such power should be weighed against a healthy sense of fear. If you really need to write PHP code to accomplish what you're trying to do, writing a module is usually a better idea (and not that hard in most cases). Furthermore, in the wrong hands, the PHP Evaluator is an enormous security risk. An attacker with the PHP Evaluator at their disposal could wipe out your database and take control of your web server.
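
To make the first three filters in the list above more concrete, here is a rough Python sketch of what each one does. This is an illustration only, not Drupal's implementation (Drupal is written in PHP), and the regular expressions are simplified assumptions that are not safe enough for real security filtering. All function names are made up. The PHP Evaluator is left out on purpose.

```python
import html
import re

# Default allowed tags, as listed in the HTML Filter item above.
ALLOWED_TAGS = {"a", "em", "strong", "cite", "code",
                "ul", "ol", "li", "dl", "dt", "dd"}

TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")
URL_RE = re.compile(r"\bhttps?://[^\s<>\"]+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def html_filter(text, allowed=ALLOWED_TAGS, escape=False):
    """Keep whitelisted tags; strip (or escape) every other tag."""
    def handle(match):
        if match.group(1).lower() in allowed:
            return match.group(0)
        return html.escape(match.group(0)) if escape else ""
    return TAG_RE.sub(handle, text)

def add_nofollow(text):
    """Add rel="nofollow" to anchor tags that have no rel attribute."""
    return re.sub(r"<a\b(?![^>]*\brel=)([^>]*)>", r'<a\1 rel="nofollow">', text)

def line_break_converter(text):
    """Double line breaks split <p> paragraphs; single ones become <br>."""
    paragraphs = [p for p in re.split(r"\n{2,}", text.strip()) if p]
    return "\n".join("<p>" + p.replace("\n", "<br>\n") + "</p>"
                     for p in paragraphs)

def url_filter(text):
    """Turn bare email and web addresses into clickable links."""
    text = EMAIL_RE.sub(
        lambda m: '<a href="mailto:{0}">{0}</a>'.format(m.group(0)), text)
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), text)

stripped = html_filter("<div>Some <em>text</em></div>")
escaped = html_filter("<div>Some <em>text</em></div>", escape=True)
```

With the default whitelist, `stripped` keeps the `<em>` tags and loses the `<div>`, while `escaped` turns the `<div>` tags into visible `&lt;div&gt;` text instead of removing them.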

Drupal's Core Input Formats

Drupal also comes with three input formats pre-defined.

   * Filtered HTML: This is the workhorse input format that is used most of the time for displaying posts such as blogs, pages, forum topics and so forth. It combines the URL Filter, the HTML Filter and the Line Break Converter in a way that allows users a small set of HTML tags for formatting while taking care of paragraphs and URLs behind the scenes. This is also the default input format for new Drupal installations. More on default input formats later.
   * PHP Code: This input format consists of only one filter, the PHP Evaluator filter. This input format is to be used when the goal is embedding PHP code in a post.
   * Full HTML: The Full HTML input format applies only the Line Break Converter filter. No HTML tags are stripped and no weblinks are converted to anchor tags.
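
Since an input format is described above as an ordered collection of filters, here is a small Python sketch of that chaining, including a case where the order makes one filter undo another. The filters are simplified stand-ins and the two format names are hypothetical. They are not Drupal's core formats or its actual filter order.

```python
import html
import re

def url_filter(text):
    """Turn bare web addresses into anchor tags."""
    return re.sub(r"\bhttps?://[^\s<>\"]+",
                  lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)),
                  text)

def escape_all_html(text):
    """Escape every tag into entities (stand-in for a strict HTML filter)."""
    return html.escape(text, quote=False)

def line_break_converter(text):
    """Simplified: every line break becomes a <br>."""
    return text.replace("\n", "<br>\n")

def apply_format(filters, text):
    """Run text through each filter in order, feeding output to the next."""
    for apply_filter in filters:
        text = apply_filter(text)
    return text

# Hypothetical formats: same filters, different order.
working_together = [escape_all_html, url_filter, line_break_converter]
working_against = [url_filter, escape_all_html, line_break_converter]

raw = "<b>Hi</b>\nhttp://example.com"

good = apply_format(working_together, raw)
bad = apply_format(working_against, raw)
```

In `good`, the user's `<b>` tag is escaped and the address still becomes a clickable link. In `bad`, the link is created first and then escaped along with everything else, so the reader sees the anchor markup as plain text.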