In this article I will explain with an example how to convert an Image to Text using Microsoft Office Document Imaging (MODI) in ASP.Net with C# and VB.Net.
This process of reading or extracting text from images is also known as Optical Character Recognition (OCR).
An Image containing some text will be uploaded, the text will be read from the Image using OCR and then displayed in an ASP.Net Label control.
Downloading and installing the Microsoft Office Document Imaging (MODI) library
Adding Reference of Microsoft Office Document Imaging (MODI) library to your project
To add a reference to Microsoft Office Document Imaging (MODI) to your project, right-click the project in Solution Explorer and select Add, then Add Reference…
In the Reference Manager dialog, expand the COM tab, find Microsoft Office Document Imaging 12.0 Type Library in the list, check (select) its CheckBox and click OK.
Once the reference has been added successfully, you will see Interop.MODI.dll in the Bin Folder (Directory).
HTML Markup
The following HTML Markup consists of an ASP.Net FileUpload control, a Button and a Label control.
Select File:
<asp:FileUpload ID="fuUpload" runat="server" />
<asp:Button Text="Upload" runat="server" OnClick="OnUpload" />
<hr />
<asp:Label ID="lblText" runat="server" />
Namespaces
You will need to import the following namespaces.
C#
using MODI;
using System.IO;
VB.Net
Imports MODI
Imports System.IO
Reading or extracting text from image using Microsoft Office Document Imaging (MODI)
When the Upload Button is clicked, the selected file is saved inside the Uploads Folder (Directory) and then the file path is passed to the ExtractTextFromImage method.
Inside the ExtractTextFromImage method, the file is loaded from the saved path into a MODI Document object and OCR is performed on it. The text is then read from the first MODI Image object of the Document and returned.
Finally, the extracted text is assigned to the Label control.
Note: Before the text is assigned to the Label control, each new line character is replaced with “<br />” so that line breaks are displayed on the web page. This step is not needed in Windows Forms or Console applications.
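The replacement described in the note above only matches Environment.NewLine (\r\n on Windows), and the text is written to the Label as raw HTML, so a recognized character such as < or & would be interpreted as markup. Below is a minimal sketch of a more defensive formatter, assuming the OCR output may contain \r\n, \n or \r line breaks; OcrTextFormatter and ToHtml are hypothetical names, not part of MODI or ASP.Net. It uses WebUtility.HtmlEncode from System.Net.

```csharp
using System;
using System.Net;

// Hypothetical helper (not part of the original article): HTML-encodes OCR text
// and converts every line break style to <br /> for display in a Label control.
public static class OcrTextFormatter
{
    public static string ToHtml(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        // Encode first so that characters such as < and & recognized by OCR
        // are displayed literally instead of being interpreted as markup.
        string encoded = WebUtility.HtmlEncode(text);

        // Normalize \r\n and bare \r to \n, then turn each \n into a <br /> tag.
        return encoded.Replace("\r\n", "\n")
                      .Replace("\r", "\n")
                      .Replace("\n", "<br />");
    }
}
```

In the OnUpload handler the helper could be used as lblText.Text = OcrTextFormatter.ToHtml(extractText); in place of the direct Replace call. Encoding is done before inserting the <br /> tags; doing it afterwards would encode the tags themselves.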
C#
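protected void OnUpload(object sender, EventArgs e)
{
    // Do nothing if no file was selected.
    if (!fuUpload.HasFile)
    {
        return;
    }
    // Save the uploaded image inside the Uploads folder.
    string filePath = Server.MapPath("~/Uploads/" + Path.GetFileName(fuUpload.PostedFile.FileName));
    fuUpload.SaveAs(filePath);
    // Extract the text and HTML-encode it so recognized characters such as < are not rendered as markup.
    string extractText = this.ExtractTextFromImage(filePath);
    lblText.Text = Server.HtmlEncode(extractText).Replace(Environment.NewLine, "<br />");
}
private string ExtractTextFromImage(string filePath)
{
    // Load the saved image file into a MODI Document.
    Document modiDocument = new Document();
    modiDocument.Create(filePath);
    try
    {
        // Run OCR using English as the recognition language.
        modiDocument.OCR(MiLANGUAGES.miLANG_ENGLISH);
        // Read the recognized text from the first Image (page) of the Document.
        MODI.Image modiImage = (MODI.Image)modiDocument.Images[0];
        return modiImage.Layout.Text;
    }
    finally
    {
        // Close the Document to release the image file, even if OCR fails.
        modiDocument.Close();
    }
}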
VB.Net
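Protected Sub OnUpload(sender As Object, e As EventArgs)
    ' Do nothing if no file was selected.
    If Not fuUpload.HasFile Then
        Return
    End If
    ' Save the uploaded image inside the Uploads folder.
    Dim filePath As String = Server.MapPath("~/Uploads/" & Path.GetFileName(fuUpload.PostedFile.FileName))
    fuUpload.SaveAs(filePath)
    ' Extract the text and HTML-encode it so recognized characters such as < are not rendered as markup.
    Dim extractText As String = Me.ExtractTextFromImage(filePath)
    lblText.Text = Server.HtmlEncode(extractText).Replace(Environment.NewLine, "<br />")
End Sub
Private Function ExtractTextFromImage(filePath As String) As String
    ' Load the saved image file into a MODI Document.
    Dim modiDocument As New Document()
    modiDocument.Create(filePath)
    Try
        ' Run OCR using English as the recognition language.
        modiDocument.OCR(MiLANGUAGES.miLANG_ENGLISH)
        ' Read the recognized text from the first Image (page) of the Document.
        Dim modiImage As MODI.Image = CType(modiDocument.Images(0), MODI.Image)
        Return modiImage.Layout.Text
    Finally
        ' Close the Document to release the image file, even if OCR fails.
        modiDocument.Close()
    End Try
End Function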
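The OnUpload handler saves the file under its original name, so the Uploads folder must already exist and two uploads with the same name overwrite each other. Below is a minimal sketch of a helper that addresses both, assuming a GUID prefix on saved file names is acceptable; UploadPathHelper and BuildUploadPath are hypothetical names, not part of the original article or of ASP.Net.

```csharp
using System;
using System.IO;

// Hypothetical helper (not from the original article): returns a unique path
// inside the uploads folder for a posted file, creating the folder if needed.
public static class UploadPathHelper
{
    public static string BuildUploadPath(string uploadsFolder, string postedFileName)
    {
        // Some browsers send the full client path; keep only the file name part.
        string fileName = Path.GetFileName(postedFileName);
        if (string.IsNullOrWhiteSpace(fileName))
        {
            throw new ArgumentException("The posted file name is empty.", nameof(postedFileName));
        }

        // Create the uploads folder on first use (does nothing if it already exists).
        Directory.CreateDirectory(uploadsFolder);

        // Prefix with a GUID so that two uploads with the same name do not
        // overwrite each other while one of them is still being processed.
        return Path.Combine(uploadsFolder, Guid.NewGuid().ToString("N") + "_" + fileName);
    }
}
```

In the OnUpload handler it could replace the Server.MapPath line as string filePath = UploadPathHelper.BuildUploadPath(Server.MapPath("~/Uploads/"), fuUpload.PostedFile.FileName); before calling SaveAs.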
Screenshots
Image with some text
The extracted Text