Tidying PHP and HTML Code?

Situation one I can live with, and there are ways around it. However, I would be grateful if anyone could offer a solution for situation two.

Tools I use: Eclipse 3.6, Aptana 2.05, PDT 2.2

1 Answer

You could use HTML Tidy from within PHP to clean up your output. Use ob_start() and friends to get the whole HTML output as a string, then send it through Tidy. You might want to use some sort of caching if you do this, though.

<?php
function callback($buffer)
{
    $config = array('indent' => true, 'output-xhtml' => true, 'wrap' => 200);
    return tidy_repair_string($buffer, $config, 'utf8');
}

// Do some output.
ob_start("callback");
?>

Outputting stuff here

Testing a broken tag: This span should be closed by Tidy.
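
The caching suggested in the answer could look something like the following. This is only a minimal sketch and not part of the original answer: the function name cached_repair, the cache directory and the file naming are all assumptions, and the repair step is passed in as a callable so that tidy_repair_string() or anything else can be plugged in.

```php
<?php
/**
 * Return repaired HTML, reusing a cached copy when the same input was seen before.
 * Hypothetical helper: the name, cache layout and hashing choice are illustrative only.
 */
function cached_repair(string $html, callable $repair, string $cacheDir): string
{
    // Key the cache on the content itself, so changed output is repaired again.
    $file = $cacheDir . DIRECTORY_SEPARATOR . 'tidy_' . md5($html) . '.html';

    if (is_file($file)) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Cache miss: run the (comparatively expensive) repair step and store the result.
    $clean = $repair($html);
    file_put_contents($file, $clean, LOCK_EX);

    return $clean;
}
```

Inside an output-buffer callback, the callable would typically wrap tidy_repair_string() with the chosen configuration. Keying on a hash of the content means a page whose output changes is simply repaired and cached again.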

Thanks that’s a really nice solution for dynamic output. But what about embedded PHP and HTML in source? for example.

This is an example.

HTML tidy fails to indent PHP in this situation correctly, I believe.

That’s the point of the output buffering with callback. You buffer up the entirety of the script’s output and then run it through Tidy as the LAST thing done before sending it to the client. At that point the entire page has been generated, so you get the whole thing nicely formatted.

Sorry to be a bit slow. So do you mean: run the code with the HTML Tidy callback, grab the HTML output, and then paste the formatted HTML back into the formatted PHP code? That way both HTML and PHP appear formatted correctly in the source and the output.

tidy_parse_string

The config parameter can be passed either as an array or as a string. If a string is passed, it is interpreted as the name of the configuration file; otherwise, it is interpreted as the options themselves.

The encoding parameter sets the encoding for input/output documents. The possible values for encoding are: ascii, latin0, latin1, raw, utf8, iso2022, mac, win1252, ibm858, utf16, utf16le, utf16be, big5, and shiftjis.
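
The two ways of passing the configuration can be compared side by side. The snippet below is a minimal sketch under a few assumptions: the tidy extension is loaded, the sample markup and the temporary file are made up for illustration, and the configuration file uses Tidy's usual one "option: value" pair per line.

```php
<?php
$html = '<p>Hello';

// 1) Options passed directly as an array.
$fromArray = tidy_parse_string($html, ['indent' => true, 'show-body-only' => true], 'utf8');

// 2) The same options read from a configuration file (file name is illustrative).
$configFile = tempnam(sys_get_temp_dir(), 'tidy');
file_put_contents($configFile, "indent: yes\nshow-body-only: yes\n");
$fromFile = tidy_parse_string($html, $configFile, 'utf8');
unlink($configFile); // the options were already read while parsing

foreach ([$fromArray, $fromFile] as $tidy) {
    $tidy->cleanRepair();
    echo tidy_get_output($tidy), PHP_EOL;
}
```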

Return Values

tidy::parseString() returns true on success. tidy_parse_string() returns a new tidy instance on success. Both the method and the function return false on failure.

Examples

Example #1 tidy::parseString() example

<?php
// Assumes ob_start() was called earlier and the page's HTML was echoed into the buffer.
$buffer = ob_get_clean();
$config = array('indent' => TRUE,
                'output-xhtml' => TRUE,
                'wrap' => 200);

$tidy = tidy_parse_string($buffer, $config, 'UTF8');

$tidy->cleanRepair();
echo $tidy;
?>

The above example will output:

       

error
another line

See Also

  • tidy::parseFile() — Parse markup in file or URI
  • tidy::repairFile() — Repair a file and return it as a string
  • tidy::repairString() — Repair a string using an optionally provided configuration file

User Contributed Notes 2 notes

/**
 * Simpler version without pretty print config options.
 */
function tidy_html5($html, array $config = [], $encoding = 'utf8') {
    $config += [
        'doctype' => '<!DOCTYPE html>',
        'drop-empty-elements' => 0,
        'new-blocklevel-tags' => 'article aside audio bdi canvas details dialog figcaption figure footer header hgroup main menu menuitem nav section source summary template track video',
        'new-empty-tags' => 'command embed keygen source track wbr',
        'new-inline-tags' => 'audio command datalist embed keygen mark menuitem meter output progress source time video wbr',
        'tidy-mark' => 0,
    ];
    $html = tidy_parse_string($html, $config, $encoding); // doctype not inserted
    tidy_clean_repair($html); // doctype inserted
    return $html;
}

$html = '

Link

Seçond para

';

echo tidy_html5($html, ['indent' => 2, 'indent-spaces' => 4]);

echo tidy_html5($html, ['indent' => 1], 'ascii');

echo tidy_html5($html, ['show-body-only' => 1]);

/**
 * UTF-8 HTML5-compatible Tidy
 *
 * @param string $html
 * @param array $config
 * @param string $encoding
 * @link http://tidy.sourceforge.net/docs/quickref.html
 */
function tidy_html5($html, array $config = [], $encoding = 'utf8') {
    $config += [
        'clean' => TRUE,
        'doctype' => 'omit',
        'indent' => 2, // auto
        'output-html' => TRUE,
        'tidy-mark' => FALSE,
        'wrap' => 0,
        // HTML5 tags
        'new-blocklevel-tags' => 'article aside audio bdi canvas details dialog figcaption figure footer header hgroup main menu menuitem nav section source summary template track video',
        'new-empty-tags' => 'command embed keygen source track wbr',
        'new-inline-tags' => 'audio command datalist embed keygen mark menuitem meter output progress source time video wbr',
    ];
    $html = tidy_parse_string($html, $config, $encoding);
    tidy_clean_repair($html);
    return '<!DOCTYPE html>' . PHP_EOL . $html;
}

Validate HTML5 Document in PHP using Tidy

I am trying to clean up an HTML string and create an HTML5 document using Tidy and PHP; however, I end up with an HTML 3.2 document. As shown below, I get a "Config: missing or malformed argument for option: doctype" error. I am running PHP 5.5.35 on CentOS 6 with Apache 2.2, and phpinfo() shows the following:

tidy

Tidy support         enabled
libTidy Release      14 June 2007
Extension Version    2.0 ($Id: e066a98a414c7f79f89f697c19c4336c61bc617b $)

Directive            Local Value  Master Value
tidy.clean_output    no value     no value
tidy.default_config  no value     no value

Hello

bla

bla

Hi there!

Opps, a mistake

EOD;
$html=" $html";
echo($html."\n\n");

$config = array(
    'indent' => true,
    'indent-spaces' => 4,
    'doctype' => '',
);
$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();
print_r($tidy);
 

Hello

bla

bla

Hi there!

Opps, a mistake

tidy Object
(
    [errorBuffer] => Config: missing or malformed argument for option: doctype
line 9 column 21 - Warning: discarding unexpected
line 3 column 2 - Warning: proprietary attribute "data-customattribute"
    [value] =>

Hello

bla

bla

Hi there!

Opps, a mistake

)

Tidying up HTML Code with Tidy PHP Extension

Tidy is a powerful program whose main purpose is to fix errors in HTML documents. TidyLib is a library version of Tidy written in C, and because C linkage is easy, it can be used from within nearly any programming language, including PHP.

The common way to invoke Tidy functions from PHP is to use the Tidy extension, which can be easily enabled. The Tidy extension has a dual (both procedural and object-oriented) nature, and from now on we’ll focus on the latter. To start working with Tidy, we simply need to create a new object:

We can provide the Tidy object with a string containing either a file name or an HTML document:

<?php
$tidy = new tidy();
$tidy->parseFile('myfile.html');
// or
$tidy->parseString('syntax error my text');

In order to fix HTML code errors, we should invoke the cleanRepair() method. All in all, an example Tidy usage looks like this:

<?php
$tidy = new tidy();
$tidy->parseString('syntax error my text');
$tidy->cleanRepair();
echo $tidy;

When we look at the script output in a web browser, we should see something similar to this:

   syntax error my text  

The output differs noticeably from the input. At first glance we can see that the DOCTYPE and the html, head, title and body elements have been added. But let’s take a closer look. In our input string, an opening tag was paired with the wrong closing tag. Moreover, we used a made-up tag, which is definitely not valid HTML. As we can see, Tidy got through it all without a hitch.

Admittedly, the output code is valid, but it is not easily readable. Fortunately, Tidy can make the code more readable by indenting it, which is often called “beautifying”. We can change Tidy’s behavior by passing the $options array:

<?php
$tidy = new tidy();
$options = array('indent' => true);
$tidy->parseString('syntax error my text', $options);
$tidy->cleanRepair();
echo $tidy;

   syntax error my text  

Looks better, doesn’t it? And Tidy has many more options to play with. If you are about to build XML or XHTML documents, you might be interested in the output-xml and output-xhtml options: just put them in the $options array and set their value to true:

<?php
$tidy = new tidy();
$options = array('indent' => true, 'output-xhtml' => true);
$tidy->parseString('syntax error my text', $options);
$tidy->cleanRepair();
echo $tidy;

There are also some options that are useful for reducing bandwidth usage. You may take a look at the hide-comments, join-classes and join-styles options. It is advisable to read the whole Tidy options list.
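
As an illustration of these options, here is a minimal sketch that strips comments and joins duplicated class attributes. It assumes the tidy extension is loaded; the sample markup is made up, and show-body-only is added only to keep the output short.

```php
<?php
$html = '<!-- internal note --><p class="a" class="b">text</p>';

$options = array(
    'hide-comments'  => true,  // drop HTML comments from the output
    'join-classes'   => true,  // merge repeated class attributes into one
    'show-body-only' => true,  // print just the body content, for brevity
);

$output = tidy_repair_string($html, $options, 'utf8');
echo $output;
```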

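Tidy can be added to your application painlessly. Let’s say we have a template rendering mechanism, which outputs HTML code to the user. It would be a good idea to write a decorator:

<?php
// Opening lines reconstructed from the description below; the subclass name is an assumption.
class TidyViewRenderer extends ViewRenderer
{
    public function render()
    {
        ob_start();
        parent::render();

        $tidy = new tidy();
        $tidy->parseString(ob_get_clean());
        $tidy->cleanRepair();
        echo $tidy;
    }
}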

We have just used two simple tricks. Firstly, our new render() method is a standard decorator pattern example: we’ve added some functionality (HTML errors fixing) to the inheriting class and the base class ( ViewRenderer ) doesn’t have to know anything about it and thus it doesn’t need to be changed. Secondly, we made a good use of PHP output buffering functions. They made our add-on transparent.

There is another thing you might want to know about Tidy. It can be used as an HTML validator thanks to its errorBuffer property, which we can easily iterate through:

<?php
$tidy = new tidy();
$tidy->parseString('syntax error my text');
$tidy->cleanRepair();

if ($tidy->errorBuffer) {
    echo "There are some errors!\n";
    $errors = explode("\n", $tidy->errorBuffer);
    foreach ($errors as $error) {
        echo $error."\n";
    }
} else {
    echo "No errors found.\n"; // message assumed; this branch is cut off in the original listing
}

This script displays a series of HTML warnings and errors:

There are some errors!
line 1 column 1 - Warning: missing declaration
line 1 column 1 - Warning: plain text isn't allowed in elements
line 1 column 8 - Warning: replacing unexpected small by
line 1 column 30 - Error: is not recognized!
line 1 column 30 - Warning: discarding unexpected
line 1 column 47 - Warning: discarding unexpected
line 1 column 1 - Warning: inserting missing 'title' element
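
Since each message follows a "line N column M - Severity: message" pattern, the buffer can also be turned into structured data, for example to count errors separately from warnings. The helper below is a hypothetical sketch (parse_tidy_errors is not part of the Tidy extension) and works on plain strings, so it does not need Tidy to run.

```php
<?php
/**
 * Split a Tidy-style error buffer into structured entries.
 * Hypothetical helper; lines that do not match the pattern are skipped.
 */
function parse_tidy_errors(string $errorBuffer): array
{
    $entries = [];
    foreach (explode("\n", $errorBuffer) as $line) {
        if (preg_match('/^line (\d+) column (\d+) - (\w+): (.*)$/', trim($line), $m)) {
            $entries[] = [
                'line'     => (int) $m[1],
                'column'   => (int) $m[2],
                'severity' => $m[3],
                'message'  => $m[4],
            ];
        }
    }
    return $entries;
}

// Two sample lines taken from the output shown above.
$sample = "line 1 column 1 - Warning: inserting missing 'title' element\n"
        . "line 1 column 30 - Error: is not recognized!";

foreach (parse_tidy_errors($sample) as $entry) {
    printf("%s at %d:%d: %s\n", $entry['severity'], $entry['line'], $entry['column'], $entry['message']);
}
```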

It seems that we know much about Tidy library capabilities. Remember that the knowledge we gained can be used while writing applications in other languages. Good luck with tidying up the web!

Html tidy in php

This simple example shows basic Tidy usage.

Example #1 Basic Tidy usage

<?php
// $html is assumed to hold the markup to clean (for example, captured with ob_get_clean()).

// Specify configuration
$config = array(
    'indent' => true,
    'output-xhtml' => true,
    'wrap' => 200);

// Tidy
$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();

User Contributed Notes 3 notes

If you are looking for an HTML beautifier (a tool to indent HTML output produced by your script), the Tidy extension might not be the right tool for the job.

First and foremost, you should not be using either Tidy or alternatives (e.g. HTML Purifier) in production code. HTML post-processing is a relatively resource-demanding task, especially if the underlying implementation relies on the DOM API. However, beyond performance, HTML beautification in production might hide far more serious output issues that will be hard to trace back, because the output will not align with the input.

If you intend to use indentation (consistent, readable formatting of the output) for development purposes only, then you might consider an implementation that relies on regular expressions. I have written https://github.com/gajus/dindent for this purpose. The difference between the earlier mentioned implementations and the latter is that a regular-expression-based implementation does not attempt to sanitise, validate or otherwise manipulate your output beyond ensuring proper indentation.

<?php
$tidy_options = array('indent' => 'auto'); // WILL NOT WORK
$tidy_options = array('indent' => 2); // equivalent of auto

$tidy = new Tidy();
$tidy->parseString($html, $tidy_options);
?>

If you’re using Tidy to clean up your HTML but only want your string formatted, without the surrounding html and head tags, you can use the following configuration array:

$config = [
    'indent' => true,
    'output-xhtml' => false,
    'show-body-only' => true
];

$tidy = new tidy;
$tidy->parseString($your_html_code, $config, 'utf8');
$tidy->cleanRepair();
