`pdftotext -raw -eol dos corrupted.pdf output.txt`

Librarians and archivists use `pdfimages` (with `-png`) to extract figures from scientific papers stored on a 32-bit NAS.
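A minimal sketch of that kind of figure extraction, assuming `pdfimages` is on the `PATH`. The helper name `extract_figures` and the file names in the usage note are hypothetical and serve only as illustration:

```shell
#!/bin/sh
# Sketch: save the images embedded in one PDF as PNG files.
# extract_figures is a hypothetical helper name, not part of Poppler.
extract_figures() {
    pdf=$1
    outdir=$2

    # Refuse to run on a path that is not a regular file.
    if [ ! -f "$pdf" ]; then
        echo "no such file: $pdf" >&2
        return 1
    fi

    mkdir -p "$outdir" || return 1

    # pdfimages takes an "image root" prefix; here it is the output
    # directory plus the PDF's base name without the .pdf suffix.
    root="$outdir/$(basename "$pdf" .pdf)"

    # -png switches the default output format to PNG.
    pdfimages -png "$pdf" "$root"
}
```

Calling `extract_figures paper.pdf figures` would write files such as `figures/paper-000.png`, one per embedded image. Images stored in the PDF as JPEG are re-encoded to PNG rather than copied out unchanged.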
| Test (100 MB PDF, 500 pages) | Poppler 0.68.0-x86 (i686) | Poppler 0.68.0-x86_64 |
|------------------------------|---------------------------|-----------------------|
| Text extraction (`pdftotext`) | 12.4 seconds | 8.2 seconds |
| Image extraction (`pdfimages`) | 45 images in 6.1 s | 45 images in 4.3 s |
| Memory peak (resident) | 312 MB | 298 MB |
| Binary size (`pdftotext`) | 892 KB | 1.1 MB |

poppler-0.68.0-x86
| Utility | Function |
|---------|----------|
| `pdftotext` | Extracts plain text from PDFs |
| `pdfimages` | Saves embedded images as separate files |
| `pdftohtml` | Converts PDF to HTML/XML with layout retention |
| `pdfinfo` | Displays document metadata (author, creation date, page count) |
| `pdffonts` | Lists all fonts used in a PDF |
| `pdfseparate` | Splits a multi-page PDF into single-page files |
| `pdfunite` | Merges multiple PDFs |
| `pdftocairo` | Converts PDF to PNG, JPEG, PDF, PS, or SVG using Cairo |

pdftotext -raw -eol dos corrupted