If you need to extract data from many PDF documents that come from different sources, the best approach is to first sort them by vendor. We’ve created a PDF Classifier tool that is available in both the cloud and on-premise versions of our PDF Extractor API.
- How it Works
- Create and Test Classification Rules
- Test Classification Rules on Folders With PDF Documents
- Test Classification Rules on Scanned Documents
- Copy Rules as JSON
How it Works
- Create rules as CSV (comma-separated values), where every row has the following columns: classname, logic (OR or AND; OR is used by default), keyword1 or phrase1, keyword2 or phrase2, …
- Test these rules on your sample PDF files
- Generate a JSON request for use with PDF.co, or just save the rules as CSV and pass the link along with all requests
- Use the pdf/classifier endpoint in PDF.co (cloud) or API Server (on-prem)
- The pdf/classifier endpoint returns the detected class for the input PDF, JPG, PNG, or TIFF document
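To make the rule format above concrete, here is a minimal Python sketch of how such CSV rules could be parsed and applied to text that has already been extracted from a document. It is an illustration only, not the actual classifier: the function names, the sample class names and keywords, the case-insensitive substring matching, the first-match-wins order, and the reading of an empty logic column as OR are all assumptions.

```python
import csv
import io
from typing import List, Optional, Tuple

# (class name, "AND" or "OR", keywords or phrases)
Rule = Tuple[str, str, List[str]]


def parse_rules(rules_csv: str) -> List[Rule]:
    """Parse rows shaped as: classname, logic, keyword1, keyword2, ..."""
    rules: List[Rule] = []
    for row in csv.reader(io.StringIO(rules_csv), skipinitialspace=True):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # skip blank rows and rows without a class name
        # Assumption: column 2 is always the logic column; empty means OR.
        logic = cells[1].upper() if len(cells) > 1 and cells[1] else "OR"
        if logic not in ("AND", "OR"):
            raise ValueError(f"Unknown logic {logic!r} for class {cells[0]!r}")
        keywords = [cell for cell in cells[2:] if cell]
        rules.append((cells[0], logic, keywords))
    return rules


def classify(text: str, rules: List[Rule]) -> Optional[str]:
    """Return the first class whose rule matches the text, or None."""
    haystack = text.lower()
    for class_name, logic, keywords in rules:
        if not keywords:
            continue  # a rule without keywords can never match
        hits = [keyword.lower() in haystack for keyword in keywords]
        if all(hits) if logic == "AND" else any(hits):
            return class_name
    return None


if __name__ == "__main__":
    # Hypothetical sample rules, not taken from a real rule set.
    sample_rules = parse_rules(
        "Amazon,OR,Amazon Web Services,AWS Invoice\n"
        "Acme,AND,ACME Inc.,Invoice\n"
    )
    print(classify("ACME Inc. Invoice #1001", sample_rules))
```

With AND logic every keyword must appear in the text, while with OR a single hit is enough, so AND rules suit vendors whose documents share common words with other vendors.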
To make it easy to test, maintain, and update your classification rules, we’ve created a classification rules testing tool that is available as part of the PDF Multitool desktop app (available from its download page).
Create and Test Classification Rules
Use the spreadsheet-like interface to define new classes with rules written as plain text or regular expressions, and quickly test the rules to see how they work on your PDF documents.
Test Classification Rules on Folders With PDF Documents
Since the ultimate goal is to sort PDF files in a batch, you can test classification rules on folders of PDF files to see which class each file is assigned.
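The same kind of batch check can be sketched in a few lines of Python. This is a hedged illustration rather than how the desktop tool works internally: `group_by_class` and the `classify_file` callback are hypothetical names, and the callback stands in for whatever actually detects the class of a single file (for example, a call to the classifier).

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, List


def group_by_class(
    folder: Path, classify_file: Callable[[Path], str]
) -> Dict[str, List[str]]:
    """Map each detected class name to the PDF file names assigned to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    # Visit files in sorted order so the report is stable between runs.
    for pdf_path in sorted(folder.glob("*.pdf")):
        groups[classify_file(pdf_path)].append(pdf_path.name)
    return dict(groups)
```

Reviewing the resulting groups makes it easy to spot files that landed in an unexpected class before running the rules on a full batch.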
Test Classification Rules on Scanned Documents
You can also test classification rules on scanned documents.
Copy Rules as JSON
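As a rough illustration of what a JSON request carrying inline rules might look like, the Python sketch below serializes a document URL together with CSV rules. The field names `url` and `rulescsv`, the helper name, and the sample values are assumptions made for this example; the JSON copied from the tool and the API documentation remain the authoritative format.

```python
import json


def build_classifier_request(document_url: str, rules_csv: str) -> str:
    """Serialize a request body with the document URL and inline CSV rules."""
    payload = {
        "url": document_url,    # assumed field name for the input document
        "rulescsv": rules_csv,  # assumed field name for the inline rules
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # Hypothetical document URL and rules, for illustration only.
    print(
        build_classifier_request(
            "https://example.com/sample-invoice.pdf",
            "Acme,AND,ACME Inc.,Invoice",
        )
    )
```

Letting `json.dumps` handle the escaping keeps multi-line rules valid JSON, since line breaks between rows are encoded as `\n` inside the string.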
Get the PDF classification testing tool now by downloading the PDF Multitool app.