PDF to HTML converter with PHP using poppler-utils
wbrframe/pdf-to-html
PDF to HTML PHP Class (only for Linux)
A PDF to HTML converter for PHP that relies on external tools such as poppler-utils. Currently only poppler-utils is supported.
pdftohtml from the poppler-utils package is always executed with the following flags:
From your application's root directory, run this command to add the package to your app:
composer req wbrframe/pdf-to-html
sudo apt-get install poppler-utils
In this example, the HTML file is created with a random name in the output subfolder of the system temporary folder. Example: /tmp/output/5e8671ec8e0283.34152860.html

    <?php
    use Wbrframe\PdfToHtml\Converter\ConverterFactory;

    // if you are using composer, just use this
    include 'vendor/autoload.php';

    // initiate
    $converterFactory = new ConverterFactory('test.pdf');
    $converter = $converterFactory->createPdfToHtml();
    $html = $converter->createHtml();

    // Get the absolute path of the created HTML file
    $htmlFilePath = $html->getFilePath();

    // or get Crawler (symfony/dom-crawler)
    $crawler = $html->createCrawler();
    ?>

You can change the options outputFolder, outputFilePath and binPath: outputFolder is the folder where the HTML file will be created, outputFilePath is the absolute path of the HTML file you want to create, and binPath is the path to pdftohtml.
NOTE: If outputFilePath is specified, the outputFolder option is ignored.

    <?php
    use Wbrframe\PdfToHtml\Converter\ConverterFactory;
    use Wbrframe\PdfToHtml\Converter\PopplerUtils\PdfToHtmlOptions;

    // if you are using composer, just use this
    include 'vendor/autoload.php';

    $converterFactory = new ConverterFactory('test.pdf');

    $options = (new PdfToHtmlOptions())
        ->setBinPath('/path/pdftohtml')
        ->setOutputFolder('/app/output')
        ->setOutputFilePath('/app/output/file.html')
    ;

    $converter = $converterFactory->createPdfToHtml($options);
    $html = $converter->createHtml();
    ?>

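The precedence rule from the note (outputFilePath wins over outputFolder) can be sketched as a small standalone function. This only illustrates the rule and is not the package's code: `resolveOutputPath()` is a hypothetical helper, and both the `output` subfolder default and the `uniqid()`-based file name are assumptions inferred from the example path `/tmp/output/5e8671ec8e0283.34152860.html`.

```php
<?php
// Hypothetical helper, not part of wbrframe/pdf-to-html: illustrates the
// documented rule that outputFilePath takes precedence over outputFolder.
function resolveOutputPath(?string $outputFolder = null, ?string $outputFilePath = null): string
{
    // An explicit file path wins; the folder option is ignored.
    if ($outputFilePath !== null && $outputFilePath !== '') {
        return $outputFilePath;
    }

    // Otherwise use the given folder, or <system temp>/output (assumed default).
    $folder = $outputFolder ?? sys_get_temp_dir() . '/output';

    // Random file name in the style of the example path (assumption).
    return rtrim($folder, '/') . '/' . uniqid('', true) . '.html';
}

echo resolveOutputPath('/app/output', '/app/output/file.html'), PHP_EOL; // /app/output/file.html
echo resolveOutputPath('/app/output'), PHP_EOL; // random .html name under /app/output
```

Setting only one of the two options is usually enough: the folder option when a random file name is acceptable, the file path option when the caller needs a known location.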
tonchik-tm / pdf-to-html Public archive
This PHP class can convert your pdf files to html using poppler-utils.
Big thanks to Mochamad Gufron (mgufrone)! This package is based on his package (https://github.com/mgufrone/pdf-to-html).
See the usage instructions below.
From your application's root directory, run this command to add the package to your app:
composer require tonchik-tm/pdf-to-html:~1
Or add this package to your composer.json
1. Install poppler-utils
Debian/Ubuntu

    sudo apt-get install poppler-utils

Windows
For those who need this package on Windows, there is a way. First download the latest poppler-utils binary for Windows from http://blog.alivate.com.au/poppler-windows/.
After downloading it, extract it.
2. We need to know where the utilities are
Debian/Ubuntu

    $ whereis pdftohtml
    pdftohtml: /usr/bin/pdftohtml
    $ whereis pdfinfo
    pdfinfo: /usr/bin/pdfinfo

OS X

    $ which pdfinfo
    /usr/local/bin/pdfinfo
    $ which pdftohtml
    /usr/local/bin/pdftohtml

Windows
Go to the extracted directory. There will be a directory called bin. We will need this one.
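If you would rather not hard-code the paths, the lookup can also be done from PHP. The sketch below searches the directories in PATH for an executable, roughly what `which` does. `findBinary()` is a hypothetical helper and not part of this package; the `/usr/bin/...` fallbacks are just the Debian/Ubuntu locations shown above, and on Windows you would pass the name with its `.exe` suffix.

```php
<?php
// Hypothetical helper: look for an executable in the directories listed in PATH.
function findBinary(string $name, ?string $path = null): ?string
{
    // getenv() returns false when PATH is unset; treat that as an empty list.
    $path = $path ?? (string) getenv('PATH');

    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue;
        }
        $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null; // not found in any PATH directory
}

// Settings keys as used by this package; values fall back to the usual Debian paths.
$settings = [
    'pdftohtml_path' => findBinary('pdftohtml') ?? '/usr/bin/pdftohtml',
    'pdfinfo_path'   => findBinary('pdfinfo') ?? '/usr/bin/pdfinfo',
];
print_r($settings);
```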
3. PHP Configuration with shell access enabled
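A quick way to check this requirement is to look at PHP's `disable_functions` setting. The sketch below is only a diagnostic aid: `isFunctionEnabled()` is a hypothetical helper, and the assumption that the package shells out through `exec` or `shell_exec` is mine, not something this README states.

```php
<?php
// Hypothetical helper: report whether a function exists and is not listed in
// disable_functions. Pass $disabled explicitly to test a specific value.
function isFunctionEnabled(string $function, ?string $disabled = null): bool
{
    $disabled = $disabled ?? (string) ini_get('disable_functions');

    // disable_functions is a comma-separated list, possibly with spaces.
    $list = array_filter(array_map('trim', explode(',', $disabled)));

    return function_exists($function) && !in_array($function, $list, true);
}

// Assumed candidates for running pdftohtml and pdfinfo.
foreach (['exec', 'shell_exec'] as $fn) {
    echo $fn, ': ', isFunctionEnabled($fn) ? 'enabled' : 'disabled', PHP_EOL;
}
```

If a function is reported as disabled, the setting has to be changed in php.ini; `disable_functions` cannot be changed at runtime.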

    <?php
    // if you are using composer, just use this
    include 'vendor/autoload.php';

    // initiate
    $pdf = new \TonchikTm\PdfToHtml\Pdf('test.pdf', [
        'pdftohtml_path' => '/usr/bin/pdftohtml',
        'pdfinfo_path' => '/usr/bin/pdfinfo'
    ]);

    // example for windows
    // $pdf = new \TonchikTm\PdfToHtml\Pdf('test.pdf', [
    //     'pdftohtml_path' => '/path/to/poppler/bin/pdftohtml.exe',
    //     'pdfinfo_path' => '/path/to/poppler/bin/pdfinfo.exe'
    // ]);

    // get pdf info
    $pdfInfo = $pdf->getInfo();

    // get the number of pages
    $countPages = $pdf->countPages();

    // get content from one page
    $contentFirstPage = $pdf->getHtml()->getPage(1);

    // get content from all pages and loop over them
    foreach ($pdf->getHtml()->getAllPages() as $page) {
        echo $page . '<br/>';
    }

Full list of settings:

    $full_settings = [
        'pdftohtml_path' => '/usr/bin/pdftohtml', // path to pdftohtml
        'pdfinfo_path' => '/usr/bin/pdfinfo', // path to pdfinfo
        'generate' => [ // settings for generating html
            'singlePage' => false, // we want separate pages
            'imageJpeg' => false, // we want png image
            'ignoreImages' => false, // we need images
            'zoom' => 1.5, // scale pdf
            'noFrames' => false, // we want separate pages
        ],
        'clearAfter' => true, // auto clear output dir (if removeOutputDir==false then output dir will remain)
        'removeOutputDir' => true, // remove output dir
        'outputDir' => '/tmp/'.uniqid(), // output dir
        'html' => [ // settings for processing html
            'inlineCss' => true, // replaces css classes with inline css rules
            'inlineImages' => true, // looks for images in html and replaces the src attribute with base64 data
            'onlyContent' => true, // takes only the html body content
        ]
    ];

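When only a few values differ from the list above, one way to avoid repeating the whole array is to overlay your changes on a copy of the defaults. The sketch below uses PHP's built-in `array_replace_recursive()`. Whether the package merges partial settings on its own is not stated here, so treat this as a caller-side pattern; `$defaults` and `$custom` are illustrative names, and the paths and outputDir are left out for brevity.

```php
<?php
// Defaults copied from the settings list above (paths and outputDir omitted).
$defaults = [
    'generate' => [
        'singlePage' => false,
        'imageJpeg' => false,
        'ignoreImages' => false,
        'zoom' => 1.5,
        'noFrames' => false,
    ],
    'clearAfter' => true,
    'removeOutputDir' => true,
    'html' => [
        'inlineCss' => true,
        'inlineImages' => true,
        'onlyContent' => true,
    ],
];

// Only the values that should differ from the defaults.
$custom = [
    'generate' => ['ignoreImages' => true, 'zoom' => 2],
    'html' => ['inlineImages' => false],
];

// Nested keys present in $custom override the defaults; all others are kept.
$settings = array_replace_recursive($defaults, $custom);

var_export($settings['generate']);
```

A plain `array_merge()` would not work here, because it replaces the whole nested `generate` and `html` arrays instead of merging them key by key.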
Open an issue for improvements or bugs. I love to help and solve other people's problems. Thanks 👍
mgufrone / pdf-to-html Public archive
PDF to HTML PHP Class using Poppler-Utils
This class lets you use PHP and poppler-utils to convert your PDF files to HTML files.
Please see the usage below: the package has been upgraded and a number of things have changed.
From your application's root directory, run this command to add the package to your app:
composer require gufy/pdftohtml-php:~2
Or add this package to your composer.json
- Poppler-Utils (if you are using an Ubuntu distro, just install it from apt): sudo apt-get install poppler-utils
- PHP Configuration with shell access enabled

    <?php
    // if you are using composer, just use this
    include 'vendor/autoload.php';

    // initiate
    $pdf = new Gufy\PdfToHtml\Pdf('file.pdf');

    // convert to html string
    $html = $pdf->html();

    // convert a specific page to html string
    $page = $pdf->html(3);

    // convert to html and return it as [Dom Object](https://github.com/paquettg/php-html-parser)
    $dom = $pdf->getDom();

    // check if your pdf has more than one page
    $total_pages = $pdf->getPages();

    // Your pdf has more than one page and you want to go to another page? Got it. Use this command to change the current page to page 3
    $dom->goToPage(3);

    // and then you can do as you please with that dom, you can find any element you want
    $paragraphs = $dom->find('body > p');

    // change pdftohtml bin location
    \Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');

    // change pdfinfo bin location
    \Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');
    ?>

### Passing options to getDom
By default, getDom() extracts all images and creates an HTML file per page. You can pass options when extracting HTML:
$pdfDom = $pdf->getDom(['ignoreImages' => true]);
- singlePage, default: false
- imageJpeg, default: false
- ignoreImages, default: false
- zoom, default: 1.5
- noFrames, default: true
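To make the options above more concrete, here is a sketch of how they could translate into pdftohtml command-line arguments. The flags themselves (`-s`, `-fmt`, `-i`, `-noframes`, `-zoom`) are real poppler pdftohtml flags, but the mapping from option names to flags is my assumption and `buildPdfToHtmlArgs()` is a hypothetical function, not the package's implementation.

```php
<?php
// Hypothetical mapping from the option names above to pdftohtml flags.
// The flags exist in poppler's pdftohtml; how the package maps them is assumed.
function buildPdfToHtmlArgs(array $options = []): array
{
    // Fill in the documented defaults for any option the caller left out.
    $options += [
        'singlePage' => false,
        'imageJpeg' => false,
        'ignoreImages' => false,
        'zoom' => 1.5,
        'noFrames' => true,
    ];

    $args = [];
    if ($options['singlePage']) {
        $args[] = '-s';            // one document containing all pages
    }
    if ($options['imageJpeg']) {
        $args[] = '-fmt';          // image format for extracted images
        $args[] = 'jpg';
    }
    if ($options['ignoreImages']) {
        $args[] = '-i';            // do not extract images
    }
    if ($options['noFrames']) {
        $args[] = '-noframes';     // no frameset output
    }
    $args[] = '-zoom';
    $args[] = (string) (float) $options['zoom'];

    return $args;
}

// Same options as in the getDom() call above; escape each argument for the shell.
$args = buildPdfToHtmlArgs(['ignoreImages' => true]);
echo implode(' ', array_map('escapeshellarg', $args)), PHP_EOL;
```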
Usage note for Windows Users
For those who need this package on Windows, there is a way. First download the latest poppler-utils binary for Windows from http://blog.alivate.com.au/poppler-windows/.
After downloading it, extract it. There will be a directory called bin. We will need this one. Then change your code like this:

    <?php
    // if you are using composer, just use this
    include 'vendor/autoload.php';

    use Gufy\PdfToHtml\Config;

    // change pdftohtml bin location
    Config::set('pdftohtml.bin', 'C:/poppler-0.37/bin/pdftohtml.exe');

    // change pdfinfo bin location
    Config::set('pdfinfo.bin', 'C:/poppler-0.37/bin/pdfinfo.exe');

    // initiate
    $pdf = new Gufy\PdfToHtml\Pdf('file.pdf');

    // convert to html and return it as [Dom Object](https://github.com/paquettg/php-html-parser)
    // (getDom() returns the Dom object; html() returns a string)
    $dom = $pdf->getDom();

    // check if your pdf has more than one page
    $total_pages = $pdf->getPages();

    // Your pdf has more than one page and you want to go to another page? Got it. Use this command to change the current page to page 3
    $dom->goToPage(3);

    // and then you can do as you please with that dom, you can find any element you want
    $paragraphs = $dom->find('body > p');
    ?>

Usage note for OS/X Users
Thanks to @kaleidoscopique for trying this package and making it run on OS/X.
1. Install brew
Brew is a famous package manager on OS/X : http://brew.sh/ (aptitude style).
2. Install poppler
3. Verify the path of pdfinfo and pdftohtml

    $ which pdfinfo
    /usr/local/bin/pdfinfo
    $ which pdftohtml
    /usr/local/bin/pdftohtml

4. Whatever the paths are, use Gufy\PdfToHtml\Config::set to set them in your PHP code. Use the same paths as the ones reported by the which command:

    <?php
    // if you are using composer, just use this
    include 'vendor/autoload.php';

    // change pdftohtml bin location
    \Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');

    // change pdfinfo bin location
    \Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');

    // initiate
    $pdf = new Gufy\PdfToHtml\Pdf('file.pdf');

    // convert to html string
    $html = $pdf->html();
    ?>

Open an issue for improvements or bugs. I love to help and solve other people's problems. Thanks 👍