In this tutorial, we’ll see how to convert a PDF to XML with the PDF.co API. The sample here is a Node.js implementation, and it depends on the request module, so let’s install that module first, for example with npm install request.
Step 1: Install the Module
While the module installs, let’s look at what the program contains. It has a placeholder for the API key, where we provide our own key, and it points to the source file. Let’s look at the contents of that file: it is a standard invoice.
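As a rough illustration of that setup, here is a minimal sketch of the configuration with a small sanity check. The names (config, PLACEHOLDER_KEY, findConfigProblems) and the sample path are hypothetical and are not taken from the original program.

```javascript
const fs = require("fs");

// Hypothetical placeholder text; replace it with your own PDF.co API key.
const PLACEHOLDER_KEY = "YOUR_API_KEY";

// Hypothetical settings: the key placeholder and an assumed path to the invoice PDF.
const config = {
  apiKey: PLACEHOLDER_KEY,
  sourceFile: "./sample.pdf",
};

// Returns a list of problems; an empty list means the settings look usable.
function findConfigProblems(cfg) {
  const problems = [];
  if (!cfg.apiKey || cfg.apiKey === PLACEHOLDER_KEY) {
    problems.push("API key placeholder has not been replaced");
  }
  if (!fs.existsSync(cfg.sourceFile)) {
    problems.push("Source file not found: " + cfg.sourceFile);
  }
  return problems;
}

// Print any problems before attempting a conversion.
console.log(findConfigProblems(config));
```

Running a check like this before calling the API catches the two most common slips: forgetting to replace the key and pointing at a file that is not there.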
Step 2: Enter API Key and Open PDF Pages to be Converted to XML
We provide the pages for which we want to generate XML. If the input PDF is password protected, we also need to provide the password, but that is not the case here. We provide the result file name and build the request URL, which points to the PDF to XML converter. Then we set up the request options: the API key goes in the header, and the form data carries the name, password, pages, and the file itself. For the file, we create a read stream, so internally the file’s bytes are read and passed along as form data.
Once we get the response from the request module’s post call, we parse the JSON body. If all goes well, it gives us the location of the result file, and we download it to the destination file.
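The flow described in this step can be sketched roughly as follows. This is a sketch under stated assumptions, not the original program: the endpoint URL, the x-api-key header name, and the { error, url, message } shape of the response are assumptions to verify against the PDF.co documentation, and the helper names are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");
const https = require("https");

// Assumed PDF to XML endpoint; confirm it in the PDF.co API documentation.
const CONVERT_URL = "https://api.pdf.co/v1/pdf/convert/to/xml";

// Builds the options for request.post(): API key in the header,
// and name, password, pages, and the file stream as form data.
function buildRequestOptions({ apiKey, sourceFile, destinationFile, pages = "", password = "" }) {
  return {
    uri: CONVERT_URL,
    headers: { "x-api-key": apiKey }, // assumed header name
    formData: {
      name: path.basename(destinationFile),
      password: password,
      pages: pages,
      file: fs.createReadStream(sourceFile),
    },
  };
}

// Parses the JSON body and returns the URL of the result file.
// Assumes a body shaped like { error: false, url: "..." } on success.
function extractResultUrl(body) {
  const data = JSON.parse(body);
  if (data.error !== false || !data.url) {
    throw new Error(data.message || "Conversion failed");
  }
  return data.url;
}

// Downloads the result file and calls done() once it has been written.
function downloadResult(url, destinationFile, done) {
  https
    .get(url, (res) => {
      res.pipe(fs.createWriteStream(destinationFile)).on("close", () => done(null));
    })
    .on("error", done);
}

// Ties the steps together. The request module is loaded here, so the
// helpers above can be used without it being installed.
function convertPdfToXml(settings, done) {
  const request = require("request");
  request.post(buildRequestOptions(settings), (error, response, body) => {
    if (error) return done(error);
    let url;
    try {
      url = extractResultUrl(body);
    } catch (e) {
      return done(e);
    }
    downloadResult(url, settings.destinationFile, done);
  });
}
```

A call might look like convertPdfToXml({ apiKey: "...", sourceFile: "./sample.pdf", destinationFile: "./result.xml" }, callback), where the callback receives an error or null. Keeping the option building and the response parsing in separate functions makes each piece easy to check without calling the API.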
Step 3: Execute the Conversion and Check the Result
Now that I have replaced the API key, let’s see this in action. I’m going to fire up the terminal and execute the program. It has created the result XML file, with all the output in XML format. If you look at it, we have text elements, with the text as the element content and a number of informative attributes: the font name, the font size, the coordinates, the width, the height, and so on.
So that’s how easy it is to convert a PDF to XML with the PDF.co Web API.
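To show how the text content and its attributes could be read back, here is a minimal sketch that pulls them out of a single text element. The sample element and its attribute names (fontName, fontSize, x, y, width, height) are hypothetical stand-ins for the kinds of attributes mentioned above, and parseTextElement is an illustrative helper, not part of the PDF.co API.

```javascript
// Parses the first <text ...>content</text> element found in the input
// into { attributes, content }. Returns null when there is no such element.
// Note: this simple approach does not decode XML entities such as &amp;.
function parseTextElement(xml) {
  const match = /<text\b([^>]*)>([\s\S]*?)<\/text>/.exec(xml);
  if (!match) return null;

  const attributes = {};
  const attrPattern = /([\w:.-]+)\s*=\s*"([^"]*)"/g;
  let attr;
  // Collect every name="value" pair from the opening tag.
  while ((attr = attrPattern.exec(match[1])) !== null) {
    attributes[attr[1]] = attr[2];
  }
  return { attributes, content: match[2] };
}

// Hypothetical sample; the real attribute names and values may differ.
const sample =
  '<text fontName="Arial" fontSize="11.0" x="36" y="50" width="80" height="12">Invoice</text>';

console.log(parseTextElement(sample));
```

For anything beyond a quick look at the output, a proper XML parser is the safer choice, since a regular expression will not handle nesting or escaped characters.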