This article explains, with an example, how to read (extract) text from an image using the Tesseract OCR library in a Windows Forms (WinForms) Application using C# and VB.Net.
The process of reading or extracting text from images is known as Optical Character Recognition (OCR).
Installing and configuring Tesseract Library
Installing Tesseract Library
Install the Tesseract NuGet package by running the following command in the Package Manager Console.
Install-Package Tesseract -Version 5.2.0
Downloading and configuring Tesseract Data Files
You will need to download the Tesseract Data files from the following link.
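Once downloaded, unzip the archive.
Then copy the extracted folder to the project root folder and rename it to tessdata as shown below. The code in this article uses the English language ("eng"), so the folder must contain the eng.traineddata file.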
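A wrong tessdata path is a common reason for the Tesseract Engine failing to initialize, so it can help to check the folder before using it. The following is a minimal sketch that uses only System.IO. The class name TessdataLocator and the method Find are hypothetical names introduced for illustration and are not part of the Tesseract library. It walks up from a starting folder (for example bin\Debug) until it finds a tessdata folder containing the requested language file.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Tesseract library).
public static class TessdataLocator
{
    // Walks up from startDirectory until a "tessdata" folder containing
    // <language>.traineddata is found. Returns null when none is found.
    public static string Find(string startDirectory, string language)
    {
        DirectoryInfo current = new DirectoryInfo(startDirectory);
        while (current != null)
        {
            string candidate = Path.Combine(current.FullName, "tessdata");
            if (File.Exists(Path.Combine(candidate, language + ".traineddata")))
            {
                return candidate;
            }

            // Move one level up, e.g. bin\Debug -> bin -> project root.
            current = current.Parent;
        }

        return null;
    }
}
```

Inside the Form, it could be called as TessdataLocator.Find(Application.StartupPath, "eng"). A null result means that the tessdata folder or the eng.traineddata file is missing.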
Form Design
The following Windows Form consists of a Button, a Label and an OpenFileDialog control.
Namespaces
You will need to import the following namespaces.
C#
using System.IO;
using Tesseract;
VB.Net
Imports System.IO
Imports Tesseract
Reading Text from Image File using C# and VB.Net
Inside the Button Click event handler, the path of the selected file is read from the FileName property of the OpenFileDialog and passed to the ExtractTextFromImage method.
Inside the ExtractTextFromImage method, first the Tesseract Engine is initialized by setting the tessdata folder path and the Language (eng, i.e. English).
Then, the file is loaded from the path into a Tesseract Pix object and the text is extracted from the image using a Tesseract Page object.
Finally, the extracted text is assigned to the Label control.
C#
private void btnSelect_Click(object sender, EventArgs e)
{
    if (openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        // Path of the image file selected by the user.
        string filePath = openFileDialog1.FileName;
        string extractText = this.ExtractTextFromImage(filePath);
        lblText.Text = extractText;
    }
}

private string ExtractTextFromImage(string filePath)
{
    // The tessdata folder is in the project root folder.
    // Note: this assumes the application is running from the bin\Debug folder.
    string tessdataPath = Path.Combine(Application.StartupPath.Replace("\\bin\\Debug", ""), "tessdata");

    // Initialize the Tesseract Engine with the English language data.
    using (TesseractEngine engine = new TesseractEngine(tessdataPath, "eng", EngineMode.Default))
    {
        // Load the image file.
        using (Pix pix = Pix.LoadFromFile(filePath))
        {
            // Perform OCR and return the recognized text.
            using (Tesseract.Page page = engine.Process(pix))
            {
                return page.GetText();
            }
        }
    }
}
VB.Net
Private Sub btnSelect_Click(ByVal sender As Object, ByVal e As EventArgs) Handles btnSelect.Click
    If openFileDialog1.ShowDialog() = DialogResult.OK Then
        ' Path of the image file selected by the user.
        Dim filePath As String = openFileDialog1.FileName
        Dim extractText As String = Me.ExtractTextFromImage(filePath)
        lblText.Text = extractText
    End If
End Sub

Private Function ExtractTextFromImage(ByVal filePath As String) As String
    ' The tessdata folder is in the project root folder.
    ' Note: this assumes the application is running from the bin\Debug folder.
    Dim tessdataPath As String = Path.Combine(Application.StartupPath.Replace("\bin\Debug", ""), "tessdata")

    ' Initialize the Tesseract Engine with the English language data.
    Using engine As TesseractEngine = New TesseractEngine(tessdataPath, "eng", EngineMode.Default)
        ' Load the image file.
        Using pix As Pix = Pix.LoadFromFile(filePath)
            ' Perform OCR and return the recognized text.
            Using page As Tesseract.Page = engine.Process(pix)
                Return page.GetText()
            End Using
        End Using
    End Using
End Function
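OCR results vary with image quality, so it can be useful to know how reliable the extracted text is. The following is a minimal sketch, assuming the same Tesseract package and tessdata folder as above. The class name OcrHelper and the method ExtractTextWithConfidence are hypothetical names introduced for illustration. It additionally reads the mean confidence of the recognized text using the GetMeanConfidence method of the Tesseract Page object, which returns a value between 0 and 1.

```csharp
using System;
using Tesseract;

// Hypothetical helper built on the same calls used in ExtractTextFromImage.
public static class OcrHelper
{
    // Returns the recognized text and outputs the mean confidence (0 to 1).
    public static string ExtractTextWithConfidence(string tessdataPath, string filePath, out float confidence)
    {
        using (TesseractEngine engine = new TesseractEngine(tessdataPath, "eng", EngineMode.Default))
        using (Pix pix = Pix.LoadFromFile(filePath))
        using (Tesseract.Page page = engine.Process(pix))
        {
            // Mean confidence of the recognized text for the whole page.
            confidence = page.GetMeanConfidence();
            return page.GetText();
        }
    }
}
```

Inside the Button Click event handler, the returned confidence could be shown next to the text, or used to warn the user when it is low. Any threshold for what counts as low is an assumption that needs to be tuned for the images being read.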
Screenshots
Image with some text
The extracted Text