Tesseract OCR: Read (Extract) Text from Image in ASP.Net MVC

In this article I will explain, with an example, how to read or extract text from an image (OCR) in ASP.Net MVC.

This process of reading or extracting text from images is termed Optical Character Recognition (OCR), and it will be done with the help of the Tesseract OCR library.

Installing and configuring Tesseract Library

Installing Tesseract Library

You will need to install the Tesseract package by running the following command in the Package Manager Console.

Install-Package Tesseract -Version 5.2.0

Note: For more details on how to install a package from NuGet, please refer to my article, Install Nuget package in Visual Studio 2017, 2019, 2022.

Downloading and configuring Tesseract Data Files

You will need to download the Tesseract Data files from the following link.

https://github.com/tesseract-ocr/tessdata

Once downloaded, unzip the archive. Then copy the extracted folder to the project root folder and rename it to tessdata.
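If the language file is missing from the tessdata folder, the engine cannot load that language, so it can help to check the folder before initializing Tesseract. The following is a minimal sketch, assuming the English data file is named eng.traineddata as in the tessdata repository. The TessdataCheck class and its HasLanguageData method are hypothetical names used for illustration and are not part of the Tesseract package.

```csharp
using System.IO;

// Hypothetical helper for illustration; not part of the Tesseract package.
public static class TessdataCheck
{
    // Returns true when <tessdataDir>/<language>.traineddata exists on disk.
    public static bool HasLanguageData(string tessdataDir, string language)
    {
        string dataFile = Path.Combine(tessdataDir, language + ".traineddata");
        return File.Exists(dataFile);
    }
}
```

Inside a Controller it could be called as TessdataCheck.HasLanguageData(Server.MapPath("~/tessdata"), "eng") before the engine is created.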

Namespaces

You will need to import the following namespaces.

using System.IO;

using Tesseract;

Controller

The Controller consists of two Action methods.

Action method for handling GET operation

This Action method simply returns the View.

Action method for handling POST operation

This Action method gets called when the Form is submitted.

Inside this Action Method, the posted file is saved inside the Uploads Folder (Directory) and then the file path is passed to the ExtractTextFromImage method.

Note: For more details about how to upload a file in ASP.Net MVC, please refer to my article, ASP.Net MVC: Simple File Upload Tutorial with example.
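The posted file name comes from the client, so it can be worth checking it before the file is saved. The following is a minimal sketch, assuming only a handful of common image extensions should be accepted. The UploadGuard class, its IsAllowedImage method and the extension list are hypothetical choices for illustration, not requirements of Tesseract or ASP.Net MVC.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; the allowed extension list is an assumption.
public static class UploadGuard
{
    private static readonly HashSet<string> AllowedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"
        };

    // Returns true when the file name ends with one of the allowed image extensions.
    public static bool IsAllowedImage(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName))
        {
            return false;
        }

        // Take everything from the last dot onwards as the extension.
        int dot = fileName.LastIndexOf('.');
        if (dot < 0)
        {
            return false;
        }

        return AllowedExtensions.Contains(fileName.Substring(dot));
    }
}
```

In the POST Action method, the null check could then be extended to postedFile != null && UploadGuard.IsAllowedImage(postedFile.FileName).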

Inside the ExtractTextFromImage method, first the Tesseract Engine is initialized by setting the tessdata folder path and the Language.

Then, the file is read from the saved path using the Tesseract Pix object, and the text is extracted from the image using the Tesseract Page object.

Finally, the extracted text is set into a ViewBag object.

public class HomeController : Controller
{
    // GET: Home
    public ActionResult Index()
    {
        return View();
    }

    [HttpPost]
    public ActionResult Index(HttpPostedFileBase postedFile)
    {
        if (postedFile != null)
        {
            // Save the uploaded file inside the Uploads folder.
            string filePath = Server.MapPath("~/Uploads/" + Path.GetFileName(postedFile.FileName));
            postedFile.SaveAs(filePath);

            string extractText = this.ExtractTextFromImage(filePath);

            // HTML-encode the recognized text, then convert line breaks to <br /> tags.
            // HttpUtility and HtmlString belong to the System.Web namespace.
            // Wrapping in HtmlString prevents Razor from encoding the <br /> tags again.
            string html = HttpUtility.HtmlEncode(extractText)
                .Replace("\r\n", "\n")
                .Replace("\n", "<br />");
            ViewBag.Message = new HtmlString(html);
        }

        return View();
    }

    private string ExtractTextFromImage(string filePath)
    {
        // Physical path of the tessdata folder in the project root.
        string path = Server.MapPath("~/tessdata");

        using (TesseractEngine engine = new TesseractEngine(path, "eng", EngineMode.Default))
        using (Pix pix = Pix.LoadFromFile(filePath))
        using (Tesseract.Page page = engine.Process(pix))
        {
            return page.GetText();
        }
    }
}

View

The View consists of an HTML Form which has been created using the Html.BeginForm method with the following parameters.

ActionName – Name of the Action. In this case the name is Index.

ControllerName – Name of the Controller. In this case the name is Home.

FormMethod – It specifies the Form Method i.e. GET or POST. In this case it will be set to POST.

Inside the Form, there is an INPUT FileUpload element and a Submit Button.

Submitting the Form

When the Upload button is clicked, the Form is submitted and the text extracted from the image is displayed using the ViewBag object.

@{
    Layout = null;
}

<!DOCTYPE html>
<html>
<head>
    <title>Index</title>
</head>
<body>
    @using (Html.BeginForm("Index", "Home", FormMethod.Post, new { enctype = "multipart/form-data" }))
    {
        <span>Select File:</span>
        <input type="file" name="postedFile" />
        <input type="submit" value="Upload" />
        <hr />
        <span>@ViewBag.Message</span>
    }
</body>
</html>

Screenshots

Image with some text

The extracted Text
