How to Extract Text, Images, and Vector Line Drawings from a PDF Using PDF.co and Zapier
Extracting text, images, and vector line drawings from PDFs can be time-consuming and challenging, especially when dealing with large volumes of documents.
PDF.co is a powerful PDF processing platform that allows users to extract text, images, and other information from PDFs quickly and easily. The platform supports a wide range of input formats, including scanned PDFs and even images, and can output data in a variety of formats, such as CSV, JSON, and XML.
In this tutorial, we will show how to extract text, images, and vector line drawings from a PDF document using PDF.co and Zapier.
We will use a sample PDF document to demonstrate how PDF.co extracts text, images, and vector line drawings in JSON format.
Step 1: Create a Zap
- Start by logging in to your Zapier account and clicking the Create Zap button.
Step 2: Add Google Drive App
- Next, select the Google Drive app and choose New File in Folder as the trigger event. This starts the Zap whenever a new file is added to the specified folder.
Step 3: Connect Google Drive Account
- Connect your Google Drive account to Zapier and grant the requested permissions to authorize the connection between the two services.
Step 4: Set Up Trigger
Let’s set up the trigger.
- First, select My Google Drive as the drive to use.
- Next, specify the folder that contains the source file.
Step 5: Test Trigger
- Now, let’s test the trigger to make sure it is set up correctly.
Step 6: Test Trigger Result
- Awesome! The trigger test successfully retrieved the file from Google Drive. Now, let’s add another app to extract text, images, and vector line drawings from the PDF document.
Step 7: Add PDF.co App
- In this step, we will add the PDF.co app and choose the Custom API Call option.
Step 8: Connect PDF.co Account
- Now, connect your PDF.co account to Zapier by entering your API key. You can find the API key in your PDF.co dashboard; if you don’t have an account yet, sign up to get one.
Step 9: Set Up Action
Let’s set up the action.
- First, enter the PDF to JSON2 endpoint, which extracts text, images, and vector line drawings from a PDF document.
- Next, select the Web Content Link from Google Drive as the input file.
- Finally, enter the following JSON, which sets the profiles parameter with the SaveImages option set to Embed so that the extracted images are embedded in the output:
{
"profiles": "{ 'SaveImages': 'Embed' }"
}
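To make it clearer what the Custom API Call sends, here is a minimal Python sketch of an equivalent request, using only the standard library. It assumes that the PDF to JSON2 endpoint is https://api.pdf.co/v1/pdf/convert/to/json2, that the input file is passed in a url field, and that the API key goes in an x-api-key header. Verify all three against the current PDF.co API documentation. The helper names build_json2_request and send_request and the example file URL are hypothetical. The sketch only builds the request. Sending it requires network access and a valid key.

```python
import json
import urllib.request

# Assumed endpoint for PDF.co's PDF to JSON2 conversion; verify against the PDF.co docs.
JSON2_ENDPOINT = "https://api.pdf.co/v1/pdf/convert/to/json2"


def build_json2_request(api_key: str, file_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the Zap's Custom API Call.

    Hypothetical helper for illustration only.
    """
    payload = {
        # Link to the source PDF that PDF.co can download
        # (the Google Drive Web Content Link in the Zap). Field name "url" is assumed.
        "url": file_url,
        # "profiles" is a *string* holding a JSON-like object, not a nested object,
        # exactly as in the body used in the Zap.
        "profiles": "{ 'SaveImages': 'Embed' }",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        JSON2_ENDPOINT,
        data=body,
        headers={
            "x-api-key": api_key,  # assumed header name for PDF.co API keys
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs network access and a valid key)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the request instead of sending it.
    req = build_json2_request("YOUR_API_KEY", "https://example.com/sample.pdf")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Keeping request construction separate from sending lets you inspect the exact body before spending API credits.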
Step 10: Test Action
- Now, let’s test the action to make sure the PDF.co Custom API Call is set up correctly. This sends a request to PDF.co to extract text, images, and vector line drawings from the PDF document.
Step 11: Test Result
- Congratulations! The test was successful, and PDF.co returned a temporary URL to a JSON file containing the extracted text, images, and vector line drawings from the PDF document. To view the output, copy the URL and paste it into your browser.
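If you want to fetch the result in code rather than in a browser, the following Python sketch shows one way to do it. It assumes the response is JSON with an error flag, a url field on success, and a message field on failure. This matches the general shape of PDF.co responses as we understand it, but check it against the actual test result in Zapier. The function names are hypothetical.

```python
import json
import urllib.request


def result_url_from_response(response_text: str) -> str:
    """Return the temporary result URL from a PDF.co-style JSON response.

    Assumes an "error" flag, a "url" field on success, and a "message" field on failure.
    """
    data = json.loads(response_text)
    if data.get("error"):
        raise RuntimeError(f"PDF.co reported an error: {data.get('message', 'unknown error')}")
    url = data.get("url")
    if not url:
        raise ValueError("Response does not contain a result URL")
    return url


def download_json(url: str) -> object:
    """Download the extracted JSON from the temporary URL and parse it.

    The URL is temporary, so fetch it soon after the action runs.
    """
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

Checking the error flag before reading the URL gives a clear message (for example, an invalid API key) instead of a confusing failure further down the pipeline.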
Step 12: Extracted JSON Output
- Below is the JSON output containing the extracted text, images, and vector line drawings from the PDF document.
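The JSON2 output can be large, so an outline of its structure makes it easier to find the text, image, and line-drawing data. The following generic Python sketch deliberately makes no assumptions about the JSON2 schema. It walks any parsed JSON value, such as the result of json.loads on the downloaded file, and lists each key path with its value type. For arrays, it inspects only the first element. The outline function is hypothetical and not part of any PDF.co library.

```python
from typing import Any, Iterator


def outline(node: Any, path: str = "$", max_depth: int = 3) -> Iterator[str]:
    """Yield "path: type" lines describing the shape of parsed JSON, up to max_depth levels.

    For arrays, only the first element is inspected to keep the outline short.
    """
    if isinstance(node, dict):
        yield f"{path}: object, {len(node)} key(s)"
        if max_depth > 0:
            for key, value in node.items():
                yield from outline(value, f"{path}.{key}", max_depth - 1)
    elif isinstance(node, list):
        yield f"{path}: array, {len(node)} item(s)"
        if node and max_depth > 0:
            yield from outline(node[0], f"{path}[0]", max_depth - 1)
    else:
        # Leaf value: str, int, float, bool, or NoneType
        yield f"{path}: {type(node).__name__}"
```

For example, after loading the file into data, run for line in outline(data): print(line), then raise max_depth to drill into the parts you care about.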
In this tutorial, you learned how to extract text, images, and vector line drawings from a PDF document using PDF.co and Zapier. You also learned how to call the PDF.co PDF to JSON2 endpoint through a Custom API Call and use the profiles parameter to embed the extracted images in the output.