Class PdfReader
java.lang.Object
org.apache.pdfbox.contentstream.PDFStreamEngine
org.apache.pdfbox.text.PDFTextStripper
dev.botcity.botcity_document_processing.pdf.PdfReader
public class PdfReader
extends org.apache.pdfbox.text.PDFTextStripper
Extends PDFBox's PDFTextStripper and converts each text element into an Entry object,
 which is used to parse the document based on its visual structure.
Author: Gabriel Archanjo

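As an illustration of what parsing "based on the visual structure" can mean, the following is a minimal sketch that groups positioned text elements into visual lines. It does not use this library or PDFBox: `TextBox`, `toLines`, and the half-height tolerance are hypothetical stand-ins chosen for the example, and the library's actual Entry class may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class VisualLineSketch {

    /** Hypothetical stand-in for a positioned text element; NOT the library's Entry class. */
    static final class TextBox {
        final String text;
        final float x;      // left edge, in page units
        final float y;      // top edge, in page units (assumed to grow downward)
        final float height; // box height, in page units

        TextBox(String text, float x, float y, float height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.height = height;
        }
    }

    /**
     * Groups boxes into visual lines. A box joins the current line when its top
     * edge is within half the height of that line's first box (an assumed
     * tolerance); each line is then ordered left to right and joined with spaces.
     */
    static List<String> toLines(List<TextBox> boxes) {
        List<TextBox> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingDouble((TextBox b) -> b.y));

        List<List<TextBox>> lines = new ArrayList<>();
        for (TextBox box : sorted) {
            List<TextBox> current = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (current != null
                    && Math.abs(box.y - current.get(0).y) < current.get(0).height / 2) {
                current.add(box);
            } else {
                List<TextBox> line = new ArrayList<>();
                line.add(box);
                lines.add(line);
            }
        }

        List<String> result = new ArrayList<>();
        for (List<TextBox> line : lines) {
            line.sort(Comparator.comparingDouble((TextBox b) -> b.x));
            StringBuilder sb = new StringBuilder();
            for (TextBox b : line) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(b.text);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<TextBox> boxes = Arrays.asList(
                new TextBox("Total:", 50f, 200f, 10f),
                new TextBox("Invoice", 50f, 100f, 12f),
                new TextBox("42.00", 300f, 201f, 10f),
                new TextBox("#1001", 120f, 101f, 12f));
        System.out.println(toLines(boxes)); // prints [Invoice #1001, Total: 42.00]
    }
}
```

The point of the sketch is that reading order is recovered from coordinates rather than from the order in which text appears in the content stream, which is why each element needs its position attached.
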
Field Summary
Fields inherited from class org.apache.pdfbox.text.PDFTextStripper:
charactersByArticle, document, LINE_SEPARATOR, output

Constructor Summary

Method Summary
- protected float computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0)
- float getPageHeight()
- float getPageWidth()
- void loadEntries(Object entries)
- protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, String arg3, org.apache.pdfbox.util.Vector arg4)
- protected void writeString(String text, List<org.apache.pdfbox.text.TextPosition> textPositions)

Methods inherited from class org.apache.pdfbox.text.PDFTextStripper:
endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, processTextPosition, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeText, writeWordSeparator

Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine:
addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showFontGlyph, showForm, showGlyph, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator

Constructor Details

PdfReader
Throws:
- IOException

Method Details

readFile
Throws:
- FileNotFoundException
- IOException

getPageWidth
public float getPageWidth()

getPageHeight
public float getPageHeight()

writeString
protected void writeString(String text, List<org.apache.pdfbox.text.TextPosition> textPositions) throws IOException
Overrides:
- writeString in class org.apache.pdfbox.text.PDFTextStripper
Throws:
- IOException

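A writeString override receives a run of text together with one position per glyph, so a natural step before building a single positioned element is to merge those positions into one bounding box. The sketch below shows that merge in isolation. It is an assumption about the general technique, not a description of what PdfReader does internally. `Glyph` and `bounds` are hypothetical; the fields are loosely modeled on PDFBox's TextPosition accessors `getXDirAdj()`, `getYDirAdj()`, `getWidthDirAdj()`, and `getHeightDir()`, with y assumed to be the baseline measured downward from the top of the page.

```java
import java.util.Arrays;
import java.util.List;

public class RunBoundsSketch {

    /** Hypothetical stand-in for one glyph position; not a PDFBox or library type. */
    static final class Glyph {
        final float x;      // left edge
        final float y;      // baseline, measured downward from the page top (assumed)
        final float width;
        final float height; // extent above the baseline

        Glyph(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Returns {left, top, width, height} of the smallest box covering all glyphs. */
    static float[] bounds(List<Glyph> glyphs) {
        if (glyphs.isEmpty()) {
            throw new IllegalArgumentException("at least one glyph is required");
        }
        float left = Float.MAX_VALUE;
        float top = Float.MAX_VALUE;
        float right = -Float.MAX_VALUE;
        float bottom = -Float.MAX_VALUE;
        for (Glyph g : glyphs) {
            left = Math.min(left, g.x);
            top = Math.min(top, g.y - g.height); // glyph rises above its baseline
            right = Math.max(right, g.x + g.width);
            bottom = Math.max(bottom, g.y);
        }
        return new float[] {left, top, right - left, bottom - top};
    }

    public static void main(String[] args) {
        List<Glyph> run = Arrays.asList(
                new Glyph(10f, 100f, 5f, 8f),
                new Glyph(15f, 100f, 6f, 10f),
                new Glyph(21f, 100f, 4f, 8f));
        System.out.println(Arrays.toString(bounds(run))); // prints [10.0, 90.0, 15.0, 10.0]
    }
}
```
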
loadEntries
void loadEntries(Object entries)

showGlyph
protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, String arg3, org.apache.pdfbox.util.Vector arg4) throws IOException
Overrides:
- showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
Throws:
- IOException

computeFontHeight
Throws:
- IOException