Extract Text Content from PDF or XPS in C#

In this tutorial, we will show you how to extract text content from a PDF in C# using the PDF.co PDF to Text Web API. Below is the PDF invoice that we will convert to text.

PDF Source File

Step 1: Create New Project

To begin, let’s create a new console project in a folder named app by running the command dotnet new console -o app.

Create New Project

Step 2: Open VSCode

Type cd app to move into the project folder, then enter code . to open it in VSCode.

Open VSCode

Step 3: Add Source Code

Let’s copy the C# source code from the PDF to Text from URL documentation sample and paste it into Program.cs, the file that dotnet new console created.
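To give a feel for what such a program does, here is a minimal sketch of the request flow. It is not the documentation sample itself: it uses HttpClient, and the endpoint URL, the x-api-key header, the url and name request fields, and the error, message, and url response fields are assumptions based on the PDF.co API as we understand it, so please check them against the current documentation. The class and method names are made up for this illustration.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public static class PdfToTextSketch
{
    // Assumed endpoint; confirm it in the PDF.co API documentation.
    public const string Endpoint = "https://api.pdf.co/v1/pdf/convert/to/text";

    // Builds the JSON request body. The field names "url" and "name" are assumptions.
    public static string BuildRequestBody(string sourceFileUrl, string outputName)
    {
        var body = new JObject
        {
            ["url"] = sourceFileUrl,
            ["name"] = outputName
        };
        return body.ToString();
    }

    // Returns the link to the converted file, or throws if the API reported an error.
    // Assumed response shape: { "error": false, "url": "..." }
    // or { "error": true, "message": "..." }.
    public static string ReadResultUrl(string responseJson)
    {
        JObject json = JObject.Parse(responseJson);
        if (json.Value<bool?>("error") == true)
        {
            throw new InvalidOperationException(
                json.Value<string>("message") ?? "Unknown API error");
        }
        return json.Value<string>("url")
            ?? throw new InvalidOperationException("Response did not contain a result URL");
    }

    // Sends the conversion request, then downloads the extracted text to a local file.
    public static async Task ConvertAsync(string apiKey, string sourceFileUrl, string destinationFile)
    {
        using var client = new HttpClient();

        var content = new StringContent(
            BuildRequestBody(sourceFileUrl, Path.GetFileName(destinationFile)),
            Encoding.UTF8,
            "application/json");

        // Attach the API key to this request only (assumed header name).
        using var request = new HttpRequestMessage(HttpMethod.Post, Endpoint) { Content = content };
        request.Headers.Add("x-api-key", apiKey);

        HttpResponseMessage response = await client.SendAsync(request);
        string responseJson = await response.Content.ReadAsStringAsync();

        // Download the generated text file and save it locally.
        string resultUrl = ReadResultUrl(responseJson);
        string text = await client.GetStringAsync(resultUrl);
        await File.WriteAllTextAsync(destinationFile, text);
    }
}
```

Building the request body and reading the response are kept in separate methods so they can be checked without making a network call.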

Step 4: Add Package

Next, let’s add the Newtonsoft.Json package using the command dotnet add package Newtonsoft.Json.

Add Package

Step 5: Add API Key

Now, let’s add our PDF.co API key in line 14 of the source code. You can get your API key from the PDF.co dashboard.

API Key
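If you would rather not paste the key into the source file, one option is to read it from an environment variable. The sketch below assumes a variable named PDFCO_API_KEY, which is a name chosen for this example and not something PDF.co or the sample defines.

```csharp
using System;

public static class ApiKeyConfig
{
    // Hypothetical variable name chosen for this sketch.
    public const string VariableName = "PDFCO_API_KEY";

    // Returns the API key from the environment, or throws a descriptive error if it is missing.
    public static string LoadApiKey()
    {
        var key = Environment.GetEnvironmentVariable(VariableName);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException(
                $"Set the {VariableName} environment variable to your PDF.co API key.");
        }
        // Trim stray whitespace that can sneak in when copying the key.
        return key.Trim();
    }
}
```

With this in place, the hard-coded key can be replaced by a call to ApiKeyConfig.LoadApiKey(), which keeps the key out of source control.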

Step 6: Add Source File

In line 18, you can find the URL of the sample PDF invoice. If you’d like to try your own file, replace the sample link with its URL.

Add Source File
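Because the sample takes the source file as a URL, a local path such as C:\invoices\sample.pdf is unlikely to work in its place. A small check like the following can catch that mistake before any request is sent. This is only an illustrative sketch; the helper name is hypothetical and not part of the sample.

```csharp
using System;

public static class SourceUrlCheck
{
    // Returns true only for absolute http or https URLs.
    public static bool IsUsableSourceUrl(string candidate)
    {
        return Uri.TryCreate(candidate, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

For example, SourceUrlCheck.IsUsableSourceUrl("sample.pdf") returns false because the value is a relative path, not a web address.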

Step 7: Add Destination File

In line 24, enter your desired output filename.

Add Destination File
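If you convert several files, typing an output name each time gets tedious. As a rough sketch, the output name could be derived from the source URL instead. The helper and its result.txt fallback are hypothetical and only meant to illustrate the idea.

```csharp
using System;
using System.IO;

public static class DestinationName
{
    // Derives a .txt file name from the last path segment of the source URL.
    // Falls back to the given name when the URL has no file name in its path.
    public static string FromSourceUrl(string sourceFileUrl, string fallback = "result.txt")
    {
        var uri = new Uri(sourceFileUrl);
        string baseName = Path.GetFileNameWithoutExtension(uri.LocalPath);
        return string.IsNullOrEmpty(baseName) ? fallback : baseName + ".txt";
    }
}
```

Given https://example.com/files/invoice.pdf, this returns invoice.txt, which is written to the current working directory if you use it as the destination file.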

Step 8: Run Project

We are now ready to run the project. In the terminal, type the command dotnet run.

Run Project

Step 9: Extracted Text Output

Here’s our extracted text output.

Extracted Text Output

In this tutorial, you learned how to extract text from a PDF in C# using the PDF.co Web API. Along the way, you created a new C# console project and added the Newtonsoft.Json package.