Hi there,
I am currently working on a project that imports a test from a "standard" Word format into a MySQL database using ASP.NET and C#.
I have been able to parse the file and gather the information that I want.
Unfortunately there is a problem with the formatting: only the plain text from Word gets inserted into the database.
I mean that bold or underlined text is imported as plain text, and list numbering (apart from the • and - symbols), paragraph indentation, etc. are lost as well.
Please let me know what you all come up with!
Here is the code I have so far:
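using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Open read-only (second argument false): the file is only parsed, not modified.
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(file, false))
{
    Body body = wordDoc.MainDocumentPart.Document.Body;
    var contents = new StringBuilder();
    // Keep paragraphs that start with whitespace, a letter, a bullet or a hyphen.
    var reg = new Regex(@"^[\s\p{L}•-]");
    foreach (Paragraph co in body.Descendants<Paragraph>().Where(p => reg.IsMatch(p.InnerText)))
    {
        // InnerText returns only the text, so bold, underline, numbering and indentation are lost here.
        contents.Append(co.InnerText).Append("<br />");
        //insert on db;
    }
}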
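
A possible direction, as a rough sketch only: instead of reading InnerText for the whole paragraph, walk the runs of each paragraph and wrap their text in HTML tags based on the run properties. This assumes the Open XML SDK (DocumentFormat.OpenXml). The class and method names (WordHtmlSketch, ParagraphToHtml) are made up for illustration and are not part of the SDK. It only looks at bold and underline set directly on a run, so formatting inherited from styles, list numbering and indentation are not handled.

```csharp
using System.Net;
using System.Text;
using DocumentFormat.OpenXml.Wordprocessing;

public static class WordHtmlSketch
{
    // Hypothetical helper: converts one paragraph to an HTML fragment,
    // keeping bold and underline that are set directly on each run.
    public static string ParagraphToHtml(Paragraph paragraph)
    {
        var html = new StringBuilder();
        foreach (Run run in paragraph.Descendants<Run>())
        {
            // Encode the text so characters such as < and & stay valid HTML.
            string text = WebUtility.HtmlEncode(run.InnerText);
            RunProperties props = run.RunProperties;

            // <w:b/> without a val attribute means "on".
            Bold bold = props?.Bold;
            bool isBold = bold != null && (bold.Val?.Value ?? true);

            // Assumption: an underline element with no val is treated as not underlined.
            var style = props?.Underline?.Val?.Value;
            bool isUnderlined = style.HasValue && !style.Value.Equals(UnderlineValues.None);

            if (isUnderlined) text = "<u>" + text + "</u>";
            if (isBold) text = "<b>" + text + "</b>";
            html.Append(text);
        }
        return html.Append("<br />").ToString();
    }
}
```

Inside the foreach loop, the result of WordHtmlSketch.ParagraphToHtml(co) would take the place of co.InnerText + "<br />". List numbering and indentation live in the paragraph properties (co.ParagraphProperties) and the numbering part of the document, so they would need separate handling.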