Nested numbered lists

Feb 6, 2015 at 6:04 PM
Edited Feb 6, 2015 at 9:33 PM
When the imported HTML contains a nested <ol>, and I would like the 1st level to be numbered 1, 2, 3 but the 2nd level i, ii, iii, etc., should that be handled in the HTML, in the document template I am importing into, or some other way?

I load the template from .dotx, change the document type to document, and then add whatever in the HTML to the body in order to leverage the static items in the template, such as corporate letterhead etc.

Thank you!

Edit: after some playing, I created a new style inside the .dotx called MultilevelList and defined it as above. Then I imported the HTML below:
<ol class="MultilevelList">
    <li>List item 1</li>
    <li>Another List Item</li>
    <li>This has nested list
        <ol class="MultilevelList">
            <li>Nested list item
            </li>
            <li>Another nested list item </li>
        </ol>
    </li>
</ol>
But I am still getting this:
1.  List item 1
2.  Another List Item
3.  This has nested list
    1.  Nested list item
    2.  Another nested list item 
What am I doing wrong?

Thanks again!

Another edit: as I am subscribed to the missing style event, I know that my MultilevelList is not missing. To test if class specification should work I added a line to the HTML above:
<p class="Heading1">Paragraph, styled as Heading 1</p>
and that line appeared as Heading 1 just fine.
Feb 9, 2015 at 7:19 PM
Debugging the above, I noticed that even though the <ol> specifies class="MultilevelList", the paragraph's style is ListParagraph.
What am I doing wrong that my MultilevelList is being replaced with ListParagraph, even though no missing style event fires?
Feb 10, 2015 at 1:21 PM
The problem is that <ol> and <li> ignore the class attribute.

I borrowed some code from the <p> class handling and inserted it into ProcessLi:
private void ProcessLi(HtmlEnumerator en)
{
    CompleteCurrentParagraph(false);
    currentParagraph = htmlStyles.Paragraph.NewParagraph();

    int numberingId = htmlStyles.NumberingList.ProcessItem(en);
    int level = htmlStyles.NumberingList.LevelIndex;

    // Save the new paragraph reference to support nested numbering list.
    Paragraph p = currentParagraph;

    // Adapt some code from Paragraph styles
    // implemented by ddforge
    String[] classes = en.Attributes.GetAsClass();
    string className = null;
    if (classes != null)
    {
        for (int i = 0; i < classes.Length; i++)
        {
            className = htmlStyles.GetStyle(classes[i], StyleValues.Paragraph, ignoreCase: true);
            if (className != null) // only one Style can be applied in OpenXml and dealing with inheritance is out of scope
            {
                // Insert the <li> class as the new paragraph Style
                currentParagraph.InsertInProperties(prop => prop.ParagraphStyleId = new ParagraphStyleId() { Val = className });
                break;
            }
        }
    }

    // Only do the original "ListParagraph" if we can't find a class in the <li>
    if (className == null)
    {
        currentParagraph.InsertInProperties(prop =>
        {
            prop.ParagraphStyleId = new ParagraphStyleId() { Val = htmlStyles.GetStyle("ListParagraph", StyleValues.Paragraph) };
            prop.SpacingBetweenLines = new SpacingBetweenLines() { After = "0" };
            prop.Indentation = new Indentation() { Hanging = "357", Left = (level * 357).ToString(CultureInfo.InvariantCulture) };
            prop.NumberingProperties = new NumberingProperties
            {
                NumberingLevelReference = new NumberingLevelReference() { Val = level - 1 },
                NumberingId = new NumberingId() { Val = numberingId }
            };
        });
    }

    // Restore the original elements list
    AddParagraph(currentParagraph);

    // Continue to process the html until we found </li>
    HtmlStyles.Paragraph.ApplyTags(currentParagraph);
    AlternateProcessHtmlChunks(en, "</li>");
    p.Append(elements);
    this.elements.Clear();
}
But again, it ignores the class on the <ol> and only uses the class on the <li>.
So your HTML would be:
<ol>
    <li class="MultilevelList">List item 1</li>
    <li class="MultilevelList">Another List Item</li>
    <li class="MultilevelList">This has nested list
        <ol>
            <li class="MultilevelList">Nested list item</li>
            <li class="MultilevelList">Another nested list item </li>
        </ol>
    </li>
</ol>
You might have to use a different class for your second-level lists, depending on your list setup.
It could still do with a little work, but I'm sure you could adapt it to your needs.
Feb 10, 2015 at 1:47 PM
What would passing the class down from <ol> to <li> entail?
Feb 10, 2015 at 1:53 PM
The code above only looks for class="" inside an <li> element. The <ol> is ignored, so the class needs to be on the <li>.
I'm about to submit a patch so you can download my HtmlConverter.ProcessTag.cs file to implement the code above.
Feb 10, 2015 at 1:56 PM
Yes, I realize that, but I am wondering what would it entail if the class was passed from <ol> processor to <li> processor.
The problem with the above approach is that <li> may already be styled differently for whatever reasons.
That styling will have to be dropped just to accommodate multilevel numbering which should be global to the entire set of nested lists.
Feb 10, 2015 at 2:40 PM
First of all, without the change above the class on the <li> is ignored anyway.

I realise now that my code skips the numbering levels whenever a class is present.
It was more of a hack that suited my use case.
I'm not going to submit it as a patch for now, as it does not implement numbered lists.

To get it right, you would need to check for a class on the <ol> tag in ProcessNumberingList(), pass it to ProcessLi(), apply the class as the style, and also keep track of the indenting and numbering.
It's all in HtmlConverter.ProcessTag.cs.
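To make the idea of passing the class down from <ol> to <li> more concrete, here is a minimal sketch. ListClassScope and its members are hypothetical names and are not part of html2openxml; the assumption is that something like EnterList would be called where the <ol> is processed, and Resolve where the <li> is processed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of html2openxml): remembers the class of each
// open <ol>/<ul> so that <li> handling can fall back to it.
public sealed class ListClassScope
{
    private readonly Stack<string> listClasses = new Stack<string>();

    // Call when an <ol> or <ul> opens; pass null if it has no class attribute.
    public void EnterList(string listClass)
    {
        listClasses.Push(listClass);
    }

    // Call when the matching </ol> or </ul> is reached.
    public void LeaveList()
    {
        if (listClasses.Count > 0) listClasses.Pop();
    }

    // Current nesting depth: 1 inside a top-level list, 2 inside a nested one.
    public int Level
    {
        get { return listClasses.Count; }
    }

    // A class on the <li> wins; otherwise use the nearest enclosing list
    // that declared one. Returns null when nothing applies.
    public string Resolve(string liClass)
    {
        if (!string.IsNullOrEmpty(liClass)) return liClass;
        foreach (string c in listClasses) // Stack<T> enumerates innermost first
        {
            if (!string.IsNullOrEmpty(c)) return c;
        }
        return null;
    }
}
```

With this shape an <li> that is already styled for other reasons keeps its own class, and only unstyled items inherit the list's class. Indentation and numbering would still have to be handled separately.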
Coordinator
Feb 10, 2015 at 3:39 PM
Hello, if you submit a patch, I will merge it with great pleasure.
But I know that lists are a very complicated subject and not well designed in OpenXml.

And sorry, I don't really have much time to debug/improve it right now.
Feb 10, 2015 at 3:41 PM
Edited Feb 10, 2015 at 4:49 PM
No need to be sorry, this is much appreciated, as it will take at least some users at least half-way!

Actually I meant to add, that even when using Word GUI, I am still having trouble applying that MultilevelList style - it seems to be applied, but the list items remain ListParagraph. Visually the numbering format changes, just the style name does not stick.

Just to add that I tried the code above, and it did assign MultilevelList class. The missing class event did not fire, so the class should have been applied all right. But it was not! The paragraphs simply became Normal style.

I am growing skeptical of the whole concept of using OpenXML to create the Word documents and here is why:
  1. The styles applied programmatically do not behave the same as when assigned through the Word GUI.
  2. When Styles panel is configured to show all styles alphabetically, MultilevelList does not appear there.
  3. MultilevelList when applied, does not stick, being replaced by ListParagraph.
Word is sounding too buggy and unstable for driving it through OpenXML now, might just abandon the whole idea.
Feb 10, 2015 at 5:37 PM
Edited Feb 10, 2015 at 5:39 PM
After checking the source code, I found a dirty workaround for the immediate document I am working with: using
 style="list-style-type:lower-roman"
in the 2nd level <ol>.

That took care of the numbering formats, but there is still an issue for which I blame Word - lists continuing previous numbering globally in the document, instead of starting a new list.

Just thought I'd mention.
Feb 10, 2015 at 9:16 PM
I think one of the critical things in Word is making sure the styles you use are defined in the template.
You need to either program, import or manually create your styles in a doc or an XML file FIRST.

html2openxml will apply the conversion from CSS class to Word style, but it does not format the style. You need to do that yourself before you convert from HTML.

I use a predefined document and then populate all of its content controls with snippets of HTML.
But if I were creating new documents, I would either open a blank doc containing my styles or import the styles.xml & numbering.xml.

Use this to check that your styles are defined first:
converter.HtmlStyles.DefaultStyle = converter.HtmlStyles.GetStyle("MultilevelList");
Feb 10, 2015 at 9:27 PM
This is why I said that MultilevelList is defined in the template that the new document is based on. My code loads the template, which already contains the MultilevelList definition, converts the template into a document, and then adds all the paragraphs imported from the HTML.

I added the line in your reply before doing actual conversion, but nothing changed. Was that the right place?
HtmlConverter converter = new HtmlConverter(document.MainDocumentPart);
converter.HtmlStyles.StyleMissing += delegate(object sender1, StyleEventArgs e1)
{
    lblResult.Text += String.Concat("Missing style: ", e1.Name, "<br/>");
};

converter.HtmlStyles.DefaultStyle = converter.HtmlStyles.GetStyle("MultilevelList");

foreach (var p in converter.Parse(String.Concat("<html><head></head><body>", String.Join("\n\r", File.ReadAllLines(htmlPath)), "</body></html>")))
{
    body.Append(p);
}
Feb 10, 2015 at 9:32 PM
One thing to watch: make sure the CSS class name converts to the Word style name. Camel case is converted to include spaces.

converter.HtmlStyles.DefaultStyle = converter.HtmlStyles.GetStyle("Multilevel List");
<li class="MultilevelList">
Feb 10, 2015 at 9:34 PM
The style is actually called "MultilevelList" in the template. If I was using a wrong name, the missing style event should have fired, I guess.
Feb 10, 2015 at 9:43 PM
Yes, I see. But I found that some class & style names don't work, and I could not explain it. The only reliable way I found was:

class:"MbmBulletList" === style:"Mbm Bullet List"

and table styles needed to start with "table". class:"TableMyTable" = style:"Table My Table"

I don't understand why, and I have not investigated it, but I discovered this by trial and error before I got the source.
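The mapping described above (camel-case class name on one side, spaced style name on the other) can be reproduced with a small helper when predicting which Word style name a class should correspond to. This is only a sketch of the observed pattern. StyleNameGuess and SplitCamelCase are hypothetical names, and this is not claimed to be how html2openxml resolves names internally.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: turns a camel-case CSS class into the spaced form
// that the posts above report as the matching Word style name.
public static class StyleNameGuess
{
    // Inserts a space at every position where a lowercase letter or digit
    // is immediately followed by an uppercase letter.
    public static string SplitCamelCase(string cssClass)
    {
        return Regex.Replace(cssClass ?? "", "(?<=[a-z0-9])(?=[A-Z])", " ");
    }
}
```

For example, SplitCamelCase("MbmBulletList") gives "Mbm Bullet List" and SplitCamelCase("TableMyTable") gives "Table My Table".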
Coordinator
Feb 10, 2015 at 9:59 PM
Edited Feb 10, 2015 at 9:59 PM
mart1c, you are perfectly right about predefining your styles first in Word, then importing them.
This is the correct way to do it.


Beware that a style in OpenXml can be applied at 4 different scopes: Paragraph, Character (= Run), Numbering and Table.
So while the naming is important (case-insensitive, but yes, beware of spaces), the scope is important too.
You can give styles in 2 different scopes the same name, though.

Hope this helps your understanding.
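Since the scope matters as much as the name, it can help to check which scope a template style actually has. The sketch below reads the w:type attribute of a style from the template's styles part (word/styles.xml inside the .dotx package), using only System.Xml.Linq. StyleScopeCheck and GetStyleType are hypothetical names, and the sketch assumes the styles XML has already been extracted or loaded into an XDocument.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reports the scope (w:type) of a style in styles.xml.
public static class StyleScopeCheck
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns "paragraph", "character", "table" or "numbering" for the style
    // with the given id, or null when no style with that id is found.
    // The id comparison ignores case, which is a choice made for this sketch.
    public static string GetStyleType(XDocument stylesXml, string styleId)
    {
        XElement style = stylesXml
            .Descendants(W + "style")
            .FirstOrDefault(s => string.Equals(
                (string)s.Attribute(W + "styleId"),
                styleId,
                StringComparison.OrdinalIgnoreCase));

        return style == null ? null : (string)style.Attribute(W + "type");
    }
}
```

If a style such as MultilevelList comes back as "numbering" rather than "paragraph", a lookup that asks for a paragraph style with that name would not be expected to find it, which may be worth ruling out here.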
Feb 10, 2015 at 10:03 PM
Wow, have folks at MS sunk so low?

Deleted the style, added it again under "List Multi Level" name, changed the code, and still no difference - Decimal numbers at all levels.

It still does not stick in Word either. Mind you - that is a List type style in Word, not Paragraph, so I am not sure it should be visible when a list item is selected. All I see is ListParagraph.
Feb 10, 2015 at 10:11 PM
I'm interested to see your template doc with the styles set up. Are you able to post it somewhere?
Feb 10, 2015 at 10:17 PM
Let me describe this scenario and ask if you think this is an issue with the library, or I am doing something wrong.
When I tried using "list-style-type:lower-roman" in order to get the roman numbering for the 2nd level listings, they stopped restarting numbering after each new 1st level list.

So if I imported this, the numbers of the 2nd level list in the 1st and 2nd listings start from 1.
<!doctype html>
<html><head><title>a</title></head><body>

<ol>
    <li>List item
    <li>List item
        <ol>
            <li>Nested item</li>
            <li>Nested item</li>
        </ol>
    </li>
</ol>
<ol>
    <li>List item
    <li>List item
        <ol>
            <li>Nested item</li>
            <li>Nested item</li>
        </ol>
    </li>
</ol>

</body></html>
I see the following in the imported document:
1. List item
2. List item
  1. Nested item
  2. Nested item
1. List item
2. List item
  1. Nested item
  2. Nested item
But this HTML results in a different import:
<!doctype html>
<html><head><title>a</title></head><body>

<ol>
    <li>List item
    <li>List item
        <ol style="list-style-type:lower-roman">
            <li>Nested item</li>
            <li>Nested item</li>
        </ol>
    </li>
</ol>
<ol>
    <li>List item
    <li>List item
        <ol style="list-style-type:lower-roman">
            <li>Nested item</li>
            <li>Nested item</li>
        </ol>
    </li>
</ol>

</body></html>
This is what I see in the document:
1. List item
2. List item
  i. Nested item
  ii. Nested item
1. List item
2. List item
  iii. Nested item
  iv. Nested item
Shouldn't the last 2nd-level list start from i, not from iii?
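For background on the continued numbering: in WordprocessingML, paragraphs that share the same numbering instance (w:num) keep counting, and one way to restart a level is to give the second list its own w:num containing a w:lvlOverride with a w:startOverride. The sketch below only builds that element with System.Xml.Linq. It makes no claim about what html2openxml currently emits for list-style-type, and NumberingRestart and CreateRestartedNum are hypothetical names.

```csharp
using System.Xml.Linq;

// Hypothetical helper: builds a <w:num> element that restarts one level at 1.
public static class NumberingRestart
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // numId: id of the new numbering instance.
    // abstractNumId: id of the existing abstract definition to reuse.
    // levelIndex: zero-based level to restart (1 for a 2nd-level list).
    public static XElement CreateRestartedNum(int numId, int abstractNumId, int levelIndex)
    {
        return new XElement(W + "num",
            new XAttribute(W + "numId", numId),
            new XElement(W + "abstractNumId",
                new XAttribute(W + "val", abstractNumId)),
            new XElement(W + "lvlOverride",
                new XAttribute(W + "ilvl", levelIndex),
                new XElement(W + "startOverride",
                    new XAttribute(W + "val", 1))));
    }
}
```

The nested items of the second list would then reference the new numId in their numbering properties instead of the one used by the first list.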
Feb 10, 2015 at 11:25 PM
Edited Feb 10, 2015 at 11:28 PM
I think if you use CSS styles you are bound to fail.
html2openxml tries to create Word styles on the fly, e.g. (just some random lines of code):
                        StartNumberingValue = new StartNumberingValue() { Val = 1 },
                        NumberingFormat = new NumberingFormat() { Val = NumberFormatValues.LowerLetter },
                        LevelIndex = 0,
                        LevelText = new LevelText() { Val = "%1." },
                        PreviousParagraphProperties = new PreviousParagraphProperties {Indentation = new Indentation() { Left = "420", Hanging = "360" }
                    prop.SpacingBetweenLines = new SpacingBetweenLines() { After = "0" };
                    prop.Indentation = new Indentation() { Hanging = "357", Left = (level * 357).ToString(CultureInfo.InvariantCulture) };
and therefore ends up fighting with the predefined Word styles and messing with the number styles and indentation, which are complicated enough to define correctly in Word.

As with any HTML document, ditch the inline styles and use a predefined class from your stylesheet.css.
Word documents are much cleaner if you stop using the font and paragraph buttons and only use the predefined styles.

To solve this, you need to use a predefined Word list style (CSS class) and only manipulate the level in the paragraph's numbering properties.
I think this should work: add the commented section to my code above.
currentParagraph.InsertInProperties(prop => prop.ParagraphStyleId = new ParagraphStyleId() { Val = className });
/// Handle Numbering Levels
currentParagraph.InsertInProperties(prop =>
    prop.NumberingProperties = new NumberingProperties
    {
        NumberingLevelReference = new NumberingLevelReference() { Val = level - 1 },
        NumberingId = new NumberingId() { Val = numberingId }
    });
///
break;
I think this should work, but you will still need to put the class on each item: <li class="MultilevelList">
I made a regex function to replace all my <li> tags:
cvHtml = RegexReplace(@"((?:<li.*?>))", @"<li class=""MultilevelList"" >", cvHtml);

private static string RegexReplace(string strRegex, string strReplace, string withInString)
        {
            Regex myRegex = new Regex(strRegex, RegexOptions.None);
            string withLiClass = myRegex.Replace(withInString ?? "", strReplace);
            return withLiClass;
        }
I've got to run now, but I think that will be enough to get around the levels and submit a useful patch.
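One caveat with the <li.*?> pattern above is that it also matches tags that merely start with "li", such as <link>, and it discards any attributes the <li> already had. A slightly more defensive variant, as a sketch: LiClassInjector and AddClass are hypothetical names, and only System.Text.RegularExpressions is used.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: adds a class to <li> tags that do not have one yet.
public static class LiClassInjector
{
    // Matches an opening <li> tag and captures its attributes.
    // The \b stops <link ...> from matching.
    private static readonly Regex LiTag =
        new Regex(@"<li\b([^>]*)>", RegexOptions.IgnoreCase);

    public static string AddClass(string html, string className)
    {
        return LiTag.Replace(html ?? "", m =>
        {
            string attrs = m.Groups[1].Value;

            // Leave tags that already carry a class attribute untouched.
            if (Regex.IsMatch(attrs, @"\bclass\s*=", RegexOptions.IgnoreCase))
                return m.Value;

            // Keep the existing attributes and put the class in front.
            return "<li class=\"" + className + "\"" + attrs + ">";
        });
    }
}
```

Usage would be along the lines of cvHtml = LiClassInjector.AddClass(cvHtml, "MultilevelList"); items that were already styled keep their own class.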
Feb 11, 2015 at 5:56 PM
Totally agree - I want to do as little formatting in HTML as possible and rely completely on the template used to create the document.
Using roman style was just a dirty workaround that backfired of course.
I am still trying to find a place and time to upload the template to show to you, but I am not set up on any free file hosting and also getting pulled in all directions at work so it may take me some time.