Converting PDF to Text:
PDF-to-text conversion is often a required step in integration workflows: it automates data extraction and makes PDF content available for further processing.
Setup:
1. First, download the jar files from the links below.
2. Once you have downloaded the jar files, upload them to Account Libraries.
Go to Account > Setup > Account Libraries.
3. Once the files are uploaded, go to the Build tab and create a custom library.
4. Set the custom library type to Scripting, add the 3 jars mentioned above, and deploy the library to the Atom.
Boomi Process for Converting PDF to Text:
1. Use the below code to split each PDF into single pages and record its total page count:
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfCopy;
import java.util.ArrayList;
// Function to split a PDF into single-page PDFs, one ByteArrayOutputStream per page
ArrayList splitPdfPages(byte[] pdfBytes) {
    PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfBytes));
    int numPages = reader.getNumberOfPages();
    ArrayList pdfPages = new ArrayList();
    for (int i = 1; i <= numPages; i++) {
        Document document = new Document();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        PdfCopy copy = new PdfCopy(document, outputStream);
        // The document must be opened before a page is added and closed to flush the output
        document.open();
        copy.addPage(copy.getImportedPage(reader, i));
        document.close();
        // Keep the finished single-page PDF
        pdfPages.add(outputStream);
    }
    reader.close();
    return pdfPages;
}
// Loop through each document in the data context
for (int i = 0; i < dataContext.getDataCount(); i++) {
    InputStream pdfInputStream = dataContext.getStream(i);
    // Convert InputStream to ByteArray for reuse
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    byte[] buffer = new byte[1024];
    int bytesRead;
    while ((bytesRead = pdfInputStream.read(buffer)) != -1) {
        byteArrayOutputStream.write(buffer, 0, bytesRead);
    }
    byte[] pdfBytes = byteArrayOutputStream.toByteArray();
    // Get the total page count of the PDF
    PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfBytes));
    int totalPageCount = reader.getNumberOfPages();
    reader.close();
    // Split the PDF into individual pages
    ArrayList pdfPages = splitPdfPages(pdfBytes);
    // Store each split page as a separate document in the data context
    for (int j = 0; j < pdfPages.size(); j++) {
        ByteArrayOutputStream pdfPage = pdfPages.get(j);
        // Set the total page count of the original document as a Dynamic Document Property
        Properties props = dataContext.getProperties(i);
        props.setProperty("document.dynamic.userdefined.PageCount", String.valueOf(totalPageCount));
        // Store the split page in the data context
        dataContext.storeStream(new ByteArrayInputStream(pdfPage.toByteArray()), props);
    }
}
2. The page count is assigned to the Dynamic Document Property (DDP) PageCount.
3. Use the below code to convert the data from PDF to text:
import java.util.Properties;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
for (int i = 0; i < dataContext.getDataCount(); i++) {
    InputStream is = dataContext.getStream(i);
    Properties props = dataContext.getProperties(i);
    // Convert the input stream to a PdfReader
    PdfReader reader = new PdfReader(is);
    // Extract the text of page 1 (each incoming document is a single split page)
    String textFromPage = PdfTextExtractor.getTextFromPage(reader, 1);
    reader.close();
    // Convert the text back to an input stream, naming the charset explicitly
    is = new ByteArrayInputStream(textFromPage.getBytes("UTF-8"));
    dataContext.storeStream(is, props);
}
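Both scripts move data between streams, byte arrays, and strings. The following is a minimal sketch of those two conversions using only the JDK, so it can be tried outside Boomi. The class name StreamBytes and its methods are hypothetical helpers introduced for illustration and are not part of Boomi or iText. Naming the charset explicitly when turning text into bytes avoids depending on the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and methods are illustrative only.
final class StreamBytes {

    // Read a stream to the end and return everything it contained as a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int bytesRead;
        // read() returns -1 once the end of the stream is reached
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
        }
        return out.toByteArray();
    }

    // Wrap text in a stream, encoding it as UTF-8 instead of the platform default.
    static InputStream toUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String text = "Invoice total: 42 \u20ac";
        byte[] roundTrip = readAll(toUtf8Stream(text));
        // prints true
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8).equals(text));
    }
}
```

Holding the PDF as a byte array is what lets the split script open it twice, once to count pages and once to split them, because an input stream can normally be consumed only once.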
In this example, I read the PDF files from disk and used a Decision shape to separate them by page count. PDF files with more than one page went down the false path. There, I used a Data Process shape to convert each split page to text, and once all the pages were converted, the resulting text documents had to be merged back into a single document.
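The routing and merging logic described above can be sketched in plain Java. This is only an illustration under stated assumptions: the source does not show how the Decision was configured or how the merge was done, so the class PageTextMerger, the comparison against a page count of 1, and the line-break separator are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names, the "== 1" rule, and the separator are assumptions.
final class PageTextMerger {

    // Mirrors the Decision step: true for a single-page PDF, false for a multi-page one.
    static boolean isSinglePage(String pageCountProperty) {
        return Integer.parseInt(pageCountProperty.trim()) == 1;
    }

    // Join the page texts in the order given, separated by a line break.
    static String merge(List<String> pageTexts) {
        return String.join("\n", pageTexts);
    }

    public static void main(String[] args) {
        List<String> pages = new ArrayList<>();
        pages.add("Page one text");
        pages.add("Page two text");
        System.out.println(isSinglePage("2")); // prints false
        System.out.println(merge(pages));
    }
}
```

The merge assumes the page texts arrive in page order. If they might not, each page would need to carry its page number so the texts can be sorted before joining.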