Extract Text Content from PDF or XPS in C#

In this tutorial, we will show you how to extract text content from PDF in C# using the PDF.co PDF to Text Web API. Below is the PDF Invoice that we will convert to Text.

  1. Create New Project
  2. Open VSCode
  3. Add Source Code
  4. Add Package
  5. Add API Key
  6. Add Source File
  7. Add Destination File
  8. Run Project
  9. Extracted Text Output
PDF Invoice Source File
PDF Source File

Step 1 – Create New Project

To begin, let’s create a new project inside app folder using this command dotnet new console -o app

Create New Project

Step 2 – Open VSCode

Type cd app to go to the folder and enter code . to open VSCode.

Open Visual Studio Code

Step 3 – Add Source Code

Let’s copy the PDF to Text from URL in C# source code here.

Step 4 – Add Package

Then, let’s add a Newtonsoft.Json package using this command dotnet add package Newtonsoft.Json.

Add Newtonsoft.Json Package

Step 5 – Add API Key

Now, let’s add our PDF.co API Key in line 14. You can get your API Key in the dashboard here.

Add PDF.co API Key

Step 6 – Add Source File

In line 18, you can find the PDF Invoice URL. If you’d like to try your file, please replace the sample link.

PDF Source File URL

Step 7 – Add Destination File

In line 24, enter your desired output filename.

Text Output Filename

Step 8 – Run Project

We are now ready to run the project. In the terminal type the command dotnet run.

Run Project

Step 9 – Extracted Text Output

Here’s our extracted text output.

Extracted Text Output
Extracted Text Output

In this tutorial, you learned how to extract text from PDF in C# using PDF.co Web API. You learned how to create a new project in C#. You also learned how to add a Newtonsoft.Json package.