Html parsing

Jan 12, 2016 at 12:03 PM
Edited Jan 12, 2016 at 2:04 PM
Hi there,

I just updated my .dll from 1.5 to 1.6 and I get a null exception when I try to parse something like this:
<html>
  <head>
    <title />
  </head>
  <body>
    <table width="100%" border="1" class="Normal">
      <tr style="font-weight:bold;font-size:12pt;">
        <td></td>
    </table>
    <table width="100%" border="1" class="Normal">
      <tr style="font-family:Arial;font-size:12pt;">
        <td width="50%">
          <table width="100%" border="0">
            <tr>
              <td width="5%">1</td>
              <td><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce et ipsum egestas, scelerisque nisi ac, fermentum purus.&nbsp;</p></td>
            </tr>
          </table>
          <p class="SubtleReference">&nbsp;
          </p>
        </td>
      </tr>
      <tr>
        <td colspan="2" style="background-color:#DAEEF3;">
          <br /><p>AD - fsadf dsaf dsa fsdafsa f safsa fsa fsaf sa</p><p style="height:1px;">&nbsp;
            </p></td>
      </tr>
    </table>
  </body>
</html>
I call your converter as follows:
HtmlConverter converter = new HtmlConverter(mainPart);
var paragraphs = converter.Parse(html);
Before the update I was getting a list with my tables and paragraphs, but with the new release I get a null exception.

I only updated the .dll to fix a bullet point problem in my HTML lists, but the result is that it breaks my code.

Do you have any idea what could be causing that null exception?

I had a quick look at the Parse() code and it seems that you modified the way you parse the HTML; it looks like it does the same thing but raises an error.

Thanks for your time and great project anyway.

Romain
Coordinator
Jan 12, 2016 at 2:49 PM
Hello,


Yes, v1.5 is far older than this release, so I can imagine how big the gap is for users who didn't grab the latest version from the trunk.
I'm sorry about that problem; I found the bug and fixed it.
By the way, this is related to the library being tolerant of badly formatted HTML: you forgot to close the first <TR>, and this was causing the error.
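
For anyone who wants to check this against their own build, here is a minimal reproduction sketch. It is not the library's test code. It assumes the Open XML SDK is referenced and that the converter lives in the NotesFor.HtmlToOpenXml namespace (an assumption about the 1.x releases; adjust the using directive if your build differs). The reduced HTML is a hypothetical trimming of the sample above that keeps only the table with the unclosed <tr>; whether it triggers the same exception on 1.6 has not been verified here.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using NotesFor.HtmlToOpenXml; // assumption: namespace used by the 1.x releases

class UnclosedRowRepro
{
    static void Main()
    {
        // Hypothetical reduction of the posted sample: the <tr> is never closed.
        const string html =
            "<html><body>" +
            "<table width=\"100%\" border=\"1\" class=\"Normal\">" +
            "<tr style=\"font-weight:bold;font-size:12pt;\"><td></td>" +
            "</table>" +
            "</body></html>";

        using (var stream = new MemoryStream())
        using (var package = WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document))
        {
            // Create an empty document body for the converter to work against.
            MainDocumentPart mainPart = package.AddMainDocumentPart();
            mainPart.Document = new Document(new Body());

            // Same two calls as in the original post.
            HtmlConverter converter = new HtmlConverter(mainPart);
            try
            {
                var paragraphs = converter.Parse(html);
                Console.WriteLine("Parsed {0} top-level element(s).", paragraphs.Count);
            }
            catch (Exception ex)
            {
                // Print the full stack trace so it can be attached to the work item.
                Console.WriteLine(ex);
            }
        }
    }
}
```

Running the same harness with the full HTML instead of the reduced string should show whether the unclosed <tr> is really the trigger or whether something else in the document is involved.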

I'm already preparing a 1.6.1 release covering whitespace in block tags and flow tags.
Coordinator
Jan 12, 2016 at 2:51 PM
This discussion has been copied to a work item.
Jan 12, 2016 at 5:13 PM
Thanks for your quick answer. I replied to your work item but it seems to be closed. Please have a look at it, because the <tr> was closed in my full HTML.

Cheers

Romain
Coordinator
Jan 12, 2016 at 8:26 PM
I replied to you in the work item. If you confirm that it's not solved, I will reopen the issue.