How to convert a PDF file to TXT?

Question:

Is there any way in Java to convert a PDF file to a TXT file?


Answer:

You can try the iText library, which has built-in functionality for extracting text from PDF files. One way to do this would be:

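// iText 5 classes: com.itextpdf.text.pdf.PdfReader and com.itextpdf.text.pdf.parser.*
public void parsePdf(String pdf, String txt) throws IOException {
    PdfReader reader = new PdfReader(pdf);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    // try-with-resources closes the output file even if extraction fails
    try (PrintWriter out = new PrintWriter(new FileOutputStream(txt))) {
        for (int i = 1; i <= reader.getNumberOfPages(); i++) { // pages are numbered from 1
            TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
            out.println(strategy.getResultantText()); // write the text of page i
        }
    } finally {
        reader.close();
    }
}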

Here the pdf parameter is the path of the PDF file to extract text from, and the txt parameter is the path of the destination TXT file.
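If the TXT file should simply sit next to the PDF under the same name, the destination path can be derived from the source path. The sketch below is only an illustration: the TxtPaths class and the toTxtPath method are hypothetical names, not part of iText, and it assumes the input is a plain file path string.

```java
final class TxtPaths {
    private TxtPaths() {}

    // Hypothetical helper (not part of iText): replaces a trailing ".pdf"
    // (case-insensitive) with ".txt"; otherwise appends ".txt".
    static String toTxtPath(String pdf) {
        int start = pdf.length() - 4;
        // regionMatches returns false when start is negative, so short names are safe
        if (pdf.regionMatches(true, start, ".pdf", 0, 4)) {
            return pdf.substring(0, start) + ".txt";
        }
        return pdf + ".txt";
    }
}
```

The result can then be passed as the second argument, for example parsePdf(pdf, TxtPaths.toTxtPath(pdf)).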

This code snippet was taken from a ready-made example written by the iText developer. The example, as well as the resulting TXT file, can be found at this link.

