Help needed with formatting text from PDF

Yes, you do have to write some Python code as a bridge between Robot Framework and this library.

The following Python code can be used as a simple library to convert a PDF to text:

import pymupdf


def pdf_to_text(pdf_file):
    """
    Extracts text from the PDF file and returns the result.
    """
    text = ""
    with pymupdf.open(pdf_file) as doc:  # open the document, close it when done
        for page in doc:  # iterate over the document pages
            text += page.get_text()  # append the plain text of each page
    return text  # return the result

Just create a file named, for instance, pdf_to_text.py containing the above code. Then include this library in your Robot Framework file like this:

*** Settings ***
Library         "..path_to_your_libraries.."/pdf_to_text.py

After this, you can use the keyword Pdf To Text, passing the location of the PDF document as its argument.

${pdf_text}=    Pdf To Text    ${my_pdf_file}

Also, don't forget to install the PyMuPDF library first (for example with pip install pymupdf).
The documentation, and the examples linked here, contain example code showing how to deal with some specific situations.
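
Since the question is about formatting the extracted text, a small clean-up keyword could live in the same library file. The following is only a sketch under my own assumptions, not something PyMuPDF provides: the name clean_pdf_text and its rules (strip trailing spaces, rejoin words hyphenated across a line break, collapse runs of blank lines) are hypothetical and will probably need adjusting to your PDFs.

```python
import re


def clean_pdf_text(text):
    """
    Tidies raw text extracted from a PDF.
    Hypothetical helper for illustration, not part of PyMuPDF.
    """
    # Strip trailing whitespace from every line
    text = "\n".join(line.rstrip() for line in text.splitlines())
    # Rejoin words hyphenated across a line break ("exam-\nple" -> "example").
    # Note: this also merges genuine compounds such as "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of three or more newlines into a single blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Drop leading and trailing whitespace of the whole text
    return text.strip()
```

Robot Framework would expose this function as the keyword Clean Pdf Text, so it could be called on the result of Pdf To Text, for example: ${clean_text}=    Clean Pdf Text    ${pdf_text}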
