In this article I will explain, with an example, how to read or extract text from an image in ASP.Net MVC.
The process of reading or extracting text from images is known as Optical Character Recognition (OCR), and here it is done with the help of the Tesseract OCR library.
Installing and configuring Tesseract Library
Installing Tesseract Library
You will need to install the Tesseract package by running the following command in the NuGet Package Manager Console.
Install-Package Tesseract -Version 5.2.0
Downloading and configuring Tesseract Data Files
You will need to download the Tesseract Data files from the following link.
Once downloaded, unzip it.
Then copy the unzipped folder to the project root folder and rename it to tessdata.
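A missing or misnamed data folder is a common reason for the engine failing to start, so it can help to check for the language file before initializing Tesseract. The following is a minimal sketch, assuming the language data is stored as a file named after the language code with the .traineddata extension (for example eng.traineddata). The TessdataCheck class and HasLanguageData method are hypothetical names and are not part of the Tesseract library.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Tesseract package.
public static class TessdataCheck
{
    // Returns true when <tessdataPath>\<language>.traineddata exists.
    public static bool HasLanguageData(string tessdataPath, string language)
    {
        if (string.IsNullOrEmpty(tessdataPath) || string.IsNullOrEmpty(language))
        {
            return false;
        }

        string trainedData = Path.Combine(tessdataPath, language + ".traineddata");
        return File.Exists(trainedData);
    }
}
```

In the Controller this could be called as TessdataCheck.HasLanguageData(Server.MapPath("~/tessdata"), "eng") before the engine is created.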
Namespaces
You will need to import the following namespaces.
using System.IO;
using Tesseract;
Controller
The Controller consists of two Action methods.
Action method for handling GET operation
This Action method simply returns the View.
Action method for handling POST operation
This Action method gets called when the Form is submitted.
Inside this Action Method, the posted file is saved inside the Uploads Folder (Directory) and then the file path is passed to the ExtractTextFromImage method.
Inside the ExtractTextFromImage method, first the Tesseract Engine is initialized by setting the tessdata folder path and the Language.
Then, the file is read from the saved path using a Tesseract Pix object and the text is extracted from the image using a Tesseract Page object.
Finally, the extracted text is set into a ViewBag object.
public class HomeController : Controller
{
    // GET: Home
    public ActionResult Index()
    {
        return View();
    }

    // POST: Home
    [HttpPost]
    public ActionResult Index(HttpPostedFileBase postedFile)
    {
        if (postedFile != null)
        {
            // Make sure the Uploads folder exists before saving the file.
            string folderPath = Server.MapPath("~/Uploads/");
            Directory.CreateDirectory(folderPath);

            string filePath = Path.Combine(folderPath, Path.GetFileName(postedFile.FileName));
            postedFile.SaveAs(filePath);

            // Razor HTML-encodes this value when the View renders it,
            // so no markup is added to the text here.
            ViewBag.Message = this.ExtractTextFromImage(filePath);
        }

        return View();
    }

    private string ExtractTextFromImage(string filePath)
    {
        // Folder containing the Tesseract language data files.
        string path = Server.MapPath("~/tessdata");
        using (TesseractEngine engine = new TesseractEngine(path, "eng", EngineMode.Default))
        using (Pix pix = Pix.LoadFromFile(filePath))
        using (Tesseract.Page page = engine.Process(pix))
        {
            return page.GetText();
        }
    }
}
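The POST Action method saves whatever file is posted, using the name supplied by the browser. A hedged sketch of two optional safeguards follows: checking the file extension before saving and generating a unique name so that uploads do not overwrite each other. The UploadHelper class, its method names, and the extension list are assumptions made for illustration; the list should be adjusted to the image formats your Tesseract setup can actually read.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of ASP.Net MVC or Tesseract.
public static class UploadHelper
{
    // Assumed whitelist of image extensions.
    private static readonly string[] AllowedExtensions =
        { ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff" };

    // Returns true when the file name ends with one of the allowed extensions.
    public static bool IsAllowedImage(string fileName)
    {
        string extension = Path.GetExtension(fileName ?? string.Empty);
        return AllowedExtensions.Contains(extension, StringComparer.OrdinalIgnoreCase);
    }

    // Builds a path inside uploadsFolder with a random name and the original extension.
    public static string BuildUniquePath(string uploadsFolder, string fileName)
    {
        string extension = Path.GetExtension(Path.GetFileName(fileName));
        return Path.Combine(uploadsFolder, Guid.NewGuid().ToString("N") + extension);
    }
}
```

Inside the POST Action method, a file that fails IsAllowedImage could be skipped before SaveAs is called, and BuildUniquePath could supply the path passed to SaveAs.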
View
The View consists of an HTML Form which has been created using the Html.BeginForm method with the following parameters.
ActionName – Name of the Action. In this case the name is Index.
ControllerName – Name of the Controller. In this case the name is Home.
FormMethod – It specifies the Form Method i.e. GET or POST. In this case it will be set to POST.
Inside the Form, there is an HTML file input element and a Submit Button.
Submitting the Form
When the Upload button is clicked, the Form is submitted and the text extracted from the image is displayed using the ViewBag object.
@{
    Layout = null;
}
<!DOCTYPE html>
<html>
<head>
    <meta name="viewport" content="width=device-width" />
    <title>Index</title>
</head>
<body>
    @using (Html.BeginForm("Index", "Home", FormMethod.Post, new { enctype = "multipart/form-data" }))
    {
        <span>Select File:</span>
        <input type="file" name="postedFile" />
        <input type="submit" value="Upload" />
        <hr />
        <span style="white-space: pre-wrap">@ViewBag.Message</span>
    }
</body>
</html>
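Razor HTML-encodes the output of @ViewBag.Message, so a <br /> tag placed inside the string is shown as literal text rather than as a line break. If the line breaks need to be emitted as <br /> tags, one possible approach is to encode the OCR text first and only then insert the tags. The sketch below assumes a hypothetical OcrTextFormatter helper; WebUtility.HtmlEncode comes from the System.Net namespace.

```csharp
using System;
using System.Net;

// Hypothetical helper for turning plain OCR text into safe HTML.
public static class OcrTextFormatter
{
    // Encodes the text, then converts line breaks into <br /> tags.
    public static string ToHtml(string text)
    {
        string encoded = WebUtility.HtmlEncode(text ?? string.Empty);

        // Normalize Windows line endings before inserting the tags.
        return encoded.Replace("\r\n", "\n").Replace("\n", "<br />");
    }
}
```

In the View the result would then be written with @Html.Raw(OcrTextFormatter.ToHtml((string)ViewBag.Message)). Because the text is encoded before the tags are added, any markup recognized in the image is still displayed as plain text.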
Screenshots
Image with some text
The extracted Text