
Apply Word style based on p class

Dec 9, 2011 at 3:11 PM

Very happy to have found this project. Thank you, onizet!

In the case of p tags (and preferably some others) I would like to look at the class attribute in the HTML and have the resulting Word paragraph use the Word style with the same name as the class, if it exists. E.g., if the tag is <p class="blue"> then apply the Word style "blue". Where should I start attacking that task?

Jan 25, 2012 at 4:14 PM

I second this request. After looking through the documentation, I can't tell whether this is an option in the current build. Are we missing something?

Thanks for this project! It has saved me many hours.

Jan 25, 2012 at 4:37 PM
Edited Jan 25, 2012 at 4:38 PM

I was able to do this by making a small addition to ProcessParagraph in HtmlConverter.ProcessTag.cs

Here is what it looks like now:

 

private void ProcessParagraph(HtmlEnumerator en)
{
    CompleteCurrentParagraph();
    AddParagraph(currentParagraph = htmlStyles.Paragraph.NewParagraph());

    // Respect this order: this is the way the browsers apply them
    String attrValue = en.StyleAttributes["text-align"];
    if (attrValue == null) attrValue = en.Attributes["align"];

    if (attrValue != null)
    {
        JustificationValues? align = ConverterUtility.FormatParagraphAlign(attrValue);
        if (align.HasValue)
        {
            currentParagraph.InsertInProperties(new Justification { Val = align });
        }
    }

    // start of my addition
    string classValue = en.Attributes["class"];
    if (classValue != null)
    {
        currentParagraph.InsertInProperties(new ParagraphStyleId { Val = classValue });
    }
    // end of my addition

    List<OpenXmlElement> styleAttributes = new List<OpenXmlElement>();
    bool newParagraph = ProcessContainerAttributes(en, styleAttributes);

    if (styleAttributes.Count > 0)
        htmlStyles.Runs.BeginTag(en.CurrentTag, styleAttributes.ToArray());

    if (newParagraph)
    {
        AlternateProcessHtmlChunks(en, "</p>");

        CompleteCurrentParagraph();
        AddParagraph(currentParagraph = htmlStyles.Paragraph.NewParagraph());
    }
}

 

You need to make sure your base Word document defines styles with exactly the same names as the classes. E.g., if something in your HTML has a class of "blue", the base doc must contain a Word style called "blue" (case sensitive). Otherwise no style is applied, or the default style is applied.

I have not tested this extensively but it seems to do the trick for me.

I don't know how to check code back in to CodePlex, but anyone else can feel free to do so if it is useful.
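A note on the addition above: an HTML class attribute may hold several space-separated names (for example class="blue big"), and the code passes the whole string as the style id. Below is a minimal sketch of one way to handle that, assuming the set of style ids defined in the base document has already been collected. ClassToStyle and PickStyleId are hypothetical names, not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the library.
public static class ClassToStyle
{
    // Returns the first class token that matches a known style id,
    // or null when the attribute is empty or nothing matches.
    // Matching follows the comparer of the collection passed in
    // (case sensitive for a default HashSet<string>).
    public static string PickStyleId(string classValue, ICollection<string> knownStyleIds)
    {
        if (string.IsNullOrWhiteSpace(classValue)) return null;

        // A null separator array splits on any whitespace.
        string[] tokens = classValue.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            if (knownStyleIds.Contains(token)) return token;
        }
        return null;
    }
}
```

The result could then be used in place of classValue in the addition, skipping the ParagraphStyleId when it is null.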

Jan 25, 2012 at 7:05 PM

fwkb,

You totally rock! Thanks for the code. Works like a charm.

Matthew

Jan 25, 2015 at 2:49 PM
fwkb wrote:
I was able to do this by making a small addition to ProcessParagraph in HtmlConverter.ProcessTag.cs [...] You need to make sure your base Word document has styles defined that use the exact same names as the classes. E.g. If you have something in your html with a class of "blue" you need to make sure you have a Word style in your base doc called "blue" (case sensitive). Otherwise no style is applied or the default style is applied. [...]
What do you mean by adding styles to the base Word document? I have all my styles in the HTML, but how do I make them work in the doc?
I can find out which styles are missing by using:

converter.HtmlStyles.StyleMissing += (s, e) =>
{
    missingStyles.Add(e.Name);
};

I want to understand how to add the missing styles to the Word doc.
Jan 25, 2015 at 2:50 PM
Can you please share how that worked for you?
Coordinator
Jan 27, 2015 at 1:03 PM
It's strange that I never replied to this post, because I committed this change back into the source.
So syyad, there are 2 ways to do that:

1) Use a predefined template in which the styles have the same names as the class attribute values. It works for any "container" tag: span, div, p, pre, etc.
2) Handle the StyleMissing event as you did and create the style on the fly. Here is an example

You can refer to the documentation for more explanation.
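For the second option, a minimal sketch of creating a missing paragraph style with the Open XML SDK. MissingStyles and EnsureParagraphStyle are hypothetical names, and basing the new style on "Normal" is an assumption about the target document. The sketch only covers adding the style to the document's styles part; how the converter expects to be told about a newly created style is not shown here, so check the project documentation for that.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; not part of the library.
public static class MissingStyles
{
    // Adds an empty custom paragraph style whose id and name are both styleName,
    // unless a style with that id already exists. Returns true when a style was added.
    public static bool EnsureParagraphStyle(MainDocumentPart mainPart, string styleName)
    {
        StyleDefinitionsPart stylesPart = mainPart.StyleDefinitionsPart
            ?? mainPart.AddNewPart<StyleDefinitionsPart>();
        if (stylesPart.Styles == null) stylesPart.Styles = new Styles();

        bool exists = stylesPart.Styles.Elements<Style>()
            .Any(s => s.StyleId != null && s.StyleId.Value == styleName);
        if (exists) return false;

        var style = new Style
        {
            Type = StyleValues.Paragraph,
            StyleId = styleName,
            CustomStyle = true
        };
        style.Append(new StyleName { Val = styleName });
        style.Append(new BasedOn { Val = "Normal" }); // assumption: the document defines "Normal"
        stylesPart.Styles.Append(style);
        return true;
    }
}
```

Inside a StyleMissing handler like the one quoted earlier in the thread, this could be called with the document's main part and e.Name.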
Nov 9, 2015 at 8:03 PM
Looking at the source code, this appears to work for Tables only? Am I wrong? I have HTML with a class, the class is defined in the Doc style sheet, and the classes are not being converted into doc styles.
Nov 19, 2015 at 9:05 AM
Thank you, onizet! You have done great work.
I've encountered this problem:
I have a .docx template generated from Word 2013.
In this template I have added 3 new styles:
  • style_doc_1
  • style_doc_2
  • style_doc_3
This is the HTML I've tried to insert:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type" />
</head>
<body>
    <p class="style_doc_2">Hello world!!!</p>
</body>
</html>
Then I used this line of code:
var pars = converter.Parse(html);
with a delegate attached to HtmlStyles.StyleMissing.

Result: the style "style_doc_2" is missing!

I discovered that the problem is in TryGetValueIgnoreCase in the OpenXmlDocumentStyleCollection class, at this line:
int rc = String.Compare(name, keys[mid], StringComparison.OrdinalIgnoreCase);
Microsoft says at
https://msdn.microsoft.com/en-us/library/system.stringcomparison%28v=vs.110%29.aspx:
An operation that uses word sort rules performs a culture-sensitive comparison wherein certain nonalphanumeric Unicode characters might have special weights assigned to them. Using word sort rules and the conventions of a specific culture, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list.
That passage describes culture-sensitive word sort rules; an ordinal comparison instead compares raw character values. The underscore in "style_doc_2" is therefore ordered differently by the two, which is presumably why the lookup misses the style.

I changed that line to:
int rc = String.Compare(name, keys[mid],System.Globalization.CultureInfo.CurrentCulture, System.Globalization.CompareOptions.IgnoreCase);
and everything is fine.
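The general failure mode here can be reproduced in isolation: a binary search only works when its comparer agrees with the order the keys were sorted in. The sketch below is a deterministic stand-in, not the library's code. It sorts with StringComparer.Ordinal (where "_" comes before lowercase letters) and searches with StringComparer.OrdinalIgnoreCase (which compares as if uppercased, so "_" comes after every letter). ComparerMismatch and SortThenSearch are hypothetical names, and whether the library's keys are really kept in a different order from the one its search uses is an assumption.

```csharp
using System;

// Hypothetical demo class for illustration; not part of the library.
public static class ComparerMismatch
{
    // Sorts a copy of keys with sortComparer, then binary-searches it with searchComparer.
    // Returns the index found, or a negative number when the search misses.
    public static int SortThenSearch(string[] keys, string name,
                                     StringComparer sortComparer,
                                     StringComparer searchComparer)
    {
        string[] sorted = (string[])keys.Clone();
        Array.Sort<string>(sorted, sortComparer);
        return Array.BinarySearch<string>(sorted, name, searchComparer);
    }
}
```

With the keys "stylez", "stylea" and "style_doc_2", searching for "style_doc_2" returns index 0 when both comparers are StringComparer.Ordinal, and a negative value when the search uses StringComparer.OrdinalIgnoreCase, even though the key is present.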
Coordinator
Nov 19, 2015 at 12:07 PM
Edited Nov 19, 2015 at 12:18 PM
Many thanks for providing such a fix. I really appreciate it.
I have created a work item and committed your changes.