Why use PDF to Text API?

Extract PDF to Text
Retains Original Format of Source Text

The PDF.co API retains the original formatting of the source text during conversion, so the extracted text keeps the layout of the source document more faithfully than PDF to Text tools that discard it.

Damaged and Scanned Text Support

The PDF.co engine supports damaged and scanned text with the help of our built-in OCR (Optical Character Recognition).


Web API Integrations and Multiple Programming Languages

PDF.co Web API can be called from multiple programming languages and frameworks, such as C#, Java, JavaScript, PHP, Visual Basic, .NET and ASP.NET, and more.

You may also use automation platforms if you’re not a programmer. Here’s our list of available integrations: Zapier, Integromat, UiPath, BluePrism, Automation Anywhere, and others.

On-Premise Available for Enterprise Customers

PDF.co Web API uses certified and secure cloud infrastructure. Enterprise customers who need to process very sensitive data in-house can also try our on-premise version, which can work completely offline when needed.


PDF to Text API Sample & Demo

We’ll be using the sample PDF file below for this demo.

Screenshot of Source PDF

Using the code snippets below, which are available in different programming languages, we’ll convert the sample PDF file above into a text file.

The final result will look like this.

Screenshot of Output Text File

Before we proceed with the code, let us first check the /v1/pdf/convert/to/text parameters and their uses.


URL: https://api.pdf.co/v1/pdf/convert/to/text
Method: POST
Parameter: Description
url: required. Link to the source file.
lang: optional. English by default. Sets the OCR (image-to-text extraction) language used for a scanned PDF when a scanned document is detected or when the input is a PNG or JPG image. Supported values include eng, spa, deu, fra, jpn, chi_sim, chi_tra, kor. You can also specify two languages to be used on the same page, for example eng+deu, jpn+kor, or other combinations.
inline: optional. Set to true to return the data inline, or false (default) to return a link to an output file.
unwrap: optional. Unwraps lines into a single line within table cells when lineGrouping is enabled. Must be true or false.
pages: optional. Comma-separated list of page indices (or ranges) to process. IMPORTANT: the very first page is 0 (zero). To set a range, use a dash (-), for example: 0, 2-5, 7-.
rect: optional. Defines the coordinates for extraction, e.g. 51.8, 114.8, 235.5, 204.0. Must be a string.
encrypt: optional. Enables encryption for the output file. Must be true or false.
async: optional. Runs processing asynchronously and returns a jobId to use with job/check. Must be true or false.
name: optional. Output file name.
profiles: optional. Sets a custom configuration. Must be a string. See profiles examples here.
lineGrouping: optional. Enables line grouping within table cells. Set to 1 to enable the grouping. Must be a string.
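To make the parameters above more concrete, here is a minimal Python sketch that assembles a request body from them. The helper names (zero_based_pages, build_text_request) are hypothetical and are not part of any PDF.co SDK; the sketch only builds the JSON payload and does not call the API.

```python
import json


def zero_based_pages(one_based_pages):
    """Convert human page numbers (1, 2, 3...) to the zero-based,
    comma-separated string used by the `pages` parameter."""
    return ",".join(str(page - 1) for page in one_based_pages)


def build_text_request(url, pages=None, languages=None, inline=False):
    """Build the JSON body for /v1/pdf/convert/to/text.

    Hypothetical helper: only `url`, `pages`, `lang`, and `inline`
    from the parameter list are covered here.
    """
    payload = {"url": url, "inline": inline}
    if pages is not None:
        payload["pages"] = pages  # e.g. "0,2-5,7-"
    if languages:
        # Two OCR languages on the same page are joined with "+".
        payload["lang"] = "+".join(languages)
    return payload


if __name__ == "__main__":
    body = build_text_request(
        "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
        pages=zero_based_pages([1, 3]),  # first and third page
        languages=["eng", "deu"],
        inline=True,
    )
    print(json.dumps(body, indent=2))
```

Converting page numbers in one place is a small guard against the off-by-one mistake that the zero-based pages parameter invites.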

Now we are ready to write some code.

cURL Code Snippet

curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/to/text' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-to-text/sample.pdf",
    "inline": false
}'

This sample code and other cURL source code samples are available here.

Now let’s see this program in action.

Output Text using cURL

The sample code for PDF to Text in JavaScript is available here.

The sample code for PDF to Text in PHP is available here.

The sample code for PDF to Text in Python is available here.

The sample code for PDF to Text in Java is available here.

The sample code for PDF to Text in C# is available here.
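For readers who want to see the shape of the call without leaving this page, the cURL request above can be sketched in Python using only the standard library. This is an illustrative sketch, not the official PDF.co sample: the function names make_request and convert_pdf_to_text are hypothetical, and the structure of the JSON response is not described in this article, so the code simply returns whatever JSON the service sends back.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/convert/to/text"


def make_request(api_key, source_url, inline=False):
    """Prepare (but do not send) the same POST request as the cURL snippet."""
    body = json.dumps({"url": source_url, "inline": inline}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
    )


def convert_pdf_to_text(api_key, source_url, inline=False, timeout=60):
    """Send the request and return the decoded JSON response as a dict.

    The response fields are not documented in this article, so no
    particular keys are assumed here.
    """
    request = make_request(api_key, source_url, inline)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling convert_pdf_to_text("YOUR_API_KEY", "<link to your PDF>") performs a real network request, so it needs a valid API key. Keeping make_request separate makes it possible to inspect the headers and body before anything is sent.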


NOTE: Use the PDF.co Document Classifier to identify the source of a document. You can easily create and maintain classification rules with the desktop-based Classifier Testing Tool (see the details here).
