



Converting PDF to Text:

PDF-to-text conversion is often a required step in integration workflows: it automates data extraction and makes PDF content available for further processing.

Common Questions:

  1. My PDF file has two pages, but only the first page is read.

  2. How do I find the number of pages in a PDF document?

  3. How do I split a PDF file?


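Implementation:

  1. First, download the following JAR files. The iText and PDFBox artifacts are linked here:

  https://mvnrepository.com/artifact/com.itextpdf/itextpdf/5.5.13
  https://javadoc.io/doc/org.apache.pdfbox/pdfbox/2.0.26/index.html

  • fontbox-2.0.16-javadoc.jar

  • itextpdf-5.5.13.jar

  • pdfbox-2.0.16-javadoc.jar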



  2. Once the JAR files are downloaded, upload them to Account Libraries: go to Account > Setup > Account Libraries.

  3. Once the files are uploaded, go to the Build tab and create a Custom Library.

  4. Set the Custom Library type to Scripting, add the three JARs listed above, and deploy the library to the Atom.

Boomi Process for Converting PDF to Text:

  1. In the Boomi process, once you receive the PDF file, add a Data Process shape to count the pages and split them using the code provided below. 

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfCopy;
import java.io.InputStream;
import java.io.ByteArrayOutputStream;
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.Properties;

// Function to split a PDF into one single-page PDF per page
ArrayList splitPdfPages(byte[] pdfBytes) {
    PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfBytes));
    int numPages = reader.getNumberOfPages();
    ArrayList pdfPages = new ArrayList();
    for (int i = 1; i <= numPages; i++) {
        Document document = new Document();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        PdfCopy copy = new PdfCopy(document, outputStream);
        document.open();   // the document must be open before pages are added
        copy.addPage(copy.getImportedPage(reader, i));
        document.close();  // writes the finished page to outputStream
        pdfPages.add(outputStream);
    }
    reader.close();
    return pdfPages;
}

// Loop through each document in the data context
for (int i = 0; i < dataContext.getDataCount(); i++) {
    InputStream pdfInputStream = dataContext.getStream(i);

    // Convert the InputStream to a byte array so it can be read more than once
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    byte[] buffer = new byte[1024];
    int bytesRead;
    while ((bytesRead = pdfInputStream.read(buffer)) != -1) {
        byteArrayOutputStream.write(buffer, 0, bytesRead);
    }
    byte[] pdfBytes = byteArrayOutputStream.toByteArray();

    // Get the total page count of the PDF
    PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfBytes));
    int totalPageCount = reader.getNumberOfPages();
    reader.close();

    // Split the PDF into individual pages
    ArrayList pdfPages = splitPdfPages(pdfBytes);

    // Store each split page as a separate document in the data context
    for (int j = 0; j < pdfPages.size(); j++) {
        ByteArrayOutputStream pdfPage = pdfPages.get(j);

        // Set the total page count of the original document as a Dynamic Document Property
        Properties props = dataContext.getProperties(i);
        props.setProperty("document.dynamic.userdefined.PageCount", String.valueOf(totalPageCount));

        // Store the split page in the data context
        dataContext.storeStream(new ByteArrayInputStream(pdfPage.toByteArray()), props);
    }
}

  2. The page count is assigned to the Dynamic Document Property (DDP) PageCount.

  3. Use the code below to convert the data from PDF to text.

import java.util.Properties;
import java.io.InputStream;
import java.io.ByteArrayInputStream;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

for (int i = 0; i < dataContext.getDataCount(); i++) {
    InputStream is = dataContext.getStream(i);
    Properties props = dataContext.getProperties(i);

    // Convert the InputStream to a PdfReader
    PdfReader reader = new PdfReader(is);

    // Extract the text with PdfTextExtractor. Each incoming document is a
    // single page after the split step, so only page 1 is read here.
    String textFromPage = PdfTextExtractor.getTextFromPage(reader, 1);
    reader.close();

    // Convert the text back to an InputStream (explicit charset, so the
    // result does not depend on the platform default) and store it
    is = new ByteArrayInputStream(textFromPage.getBytes("UTF-8"));
    dataContext.storeStream(is, props);
}
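
Common question 1 (only the first page is read) can also be approached without splitting, by looping over every page number and extracting each page in turn. The following is a minimal Java sketch of that loop, not the method used in this post. The names AllPagesExtractor and extractAll are hypothetical, and the page-text function is a stand-in for the real iText call so that the example runs without the iText JAR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical helper for illustration; not part of Boomi or iText. */
final class AllPagesExtractor {

    private AllPagesExtractor() {}

    /**
     * Collects the text of pages 1..pageCount. Page numbers are 1-based,
     * matching the way iText numbers pages.
     *
     * @param pageCount total number of pages in the document
     * @param pageText  stand-in for the real extractor: maps a page number to its text
     */
    static List<String> extractAll(int pageCount, IntFunction<String> pageText) {
        List<String> texts = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            texts.add(pageText.apply(page));
        }
        return texts;
    }
}
```

In an actual script, pageCount would come from reader.getNumberOfPages() and the function would call PdfTextExtractor.getTextFromPage(reader, page). In plain Java that call declares IOException, so it would need to be wrapped before being passed as a lambda.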


In this example, I read the PDF files from disk and used a Decision shape to separate them: PDF files with more than one page went down the false path. I then used a Data Process shape to convert each split page to text, and once all the pages were converted, the resulting documents had to be merged.
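
The merge step itself is not shown in this post. As a rough illustration of what it has to do, the Java sketch below joins per-page text back into a single string in page order. PageTextMerger, merge, and the separator parameter are hypothetical names chosen for this example, and the sketch assumes the page texts arrive already sorted by page number.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of Boomi or iText. */
final class PageTextMerger {

    private PageTextMerger() {}

    /**
     * Joins per-page text into one string, in list order.
     * A null entry (a page with no extracted text) is treated as empty.
     *
     * @param pageTexts text of each page, assumed to be in page order
     * @param separator text placed between consecutive pages
     */
    static String merge(List<String> pageTexts, String separator) {
        StringBuilder merged = new StringBuilder();
        for (int i = 0; i < pageTexts.size(); i++) {
            if (i > 0) {
                merged.append(separator);
            }
            String page = pageTexts.get(i);
            merged.append(page == null ? "" : page);
        }
        return merged.toString();
    }
}
```

For example, merging the pages "page one" and "page two" with a newline separator yields a two-line string. A separator is kept between pages even when one of them is empty, so page boundaries stay visible in the merged output.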
