[tex4ht] [bug #618] Incomplete XML Document, domfilter error, truncated build on large file.

Nasser M. Abbasi nma at 12000.org
Tue Dec 12 04:19:48 CET 2023


On 12/11/2023 9:06 PM, William F Hammond wrote:
> Hello Nasser,
> 
> You don't give us much to go on.  But it does provoke my curiosity.
> 

Sorry, but I did send Michal detailed information on this.
I just added the bug for tracking and did not think anyone else would
be interested in all the boring details of my build.

> I assume that you are able to build the 57,000 page pdf from the tex source
> that you want to process with tex4ht.
> 

Oh yes, of course. The file builds OK with lualatex. Here is the link:

<https://12000.org/my_notes/CAS_integration_tests/reports/summer_2023_Rubi_4_17_3/test_cases/210_Hebisch/report.htm>

There are over 10,000 subsections, and tex4ht breaks down at
reportsubsection1100,

which is this one:

<https://12000.org/my_notes/CAS_integration_tests/reports/summer_2023_Rubi_4_17_3/test_cases/210_Hebisch/reportsubsection1100.htm#x1117-109610003.10.84>

If you click <NEXT> at the top of that page, you get a "link not found"
error, since no subsections after it were processed. Almost 9,000
subsections should follow it, and none of them were generated.


> Is html output the final tex4ht target?  I'm assuming it is.
> 

Yes, only HTML (MathJax mode).

> You say:
> 
> [INFO]    make4ht-lib: parse_lg process file: reportsubsection1100.htm
> [WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:
> [WARNING] domfilter:
> ...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete
> XML Document [char=33675]
> From this I deduce that the 57,000 page document is being written in HTML
> pieces by tex4ht, "reportsubsection1100.htm" is one of those pieces, and
> perhaps not all expected pieces have been generated.
> 
> Have you checked whether "reportsubsection1100.htm" is well-formed XML
> using, say, the tool "xmlwf" found in the expat distribution?
> 

I built the code in reportsubsection1100.htm on its own, and it builds OK
with make4ht. The problem only appears when that code is part of
report.tex (the full LaTeX file that includes everything).

I do not know xmlwf. I just see these domfilter and XML messages show
up, and that is all; I really know very little about them.
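For readers who, like me, have not used xmlwf: a similar well-formedness check can be scripted. The sketch below is a hypothetical helper, not part of tex4ht or make4ht. It uses Python's `xml.parsers.expat` module (expat is the parser library xmlwf comes from) to report the first error in each generated `.htm` file. The default pattern `report*.htm` is an assumption based on the file names in this thread, and files that use named HTML entities such as `&nbsp;` without a DOCTYPE may be reported as errors even if browsers display them fine.

```python
import glob
import sys
import xml.parsers.expat


def check_well_formed(path):
    """Return None if `path` parses as well-formed XML, else an error message.

    Hypothetical helper: uses Python's expat binding, the same parser
    library that the xmlwf tool is built on.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as f:
            # ParseFile reads to EOF, so a truncated document raises ExpatError
            parser.ParseFile(f)
    except xml.parsers.expat.ExpatError as err:
        reason = xml.parsers.expat.ErrorString(err.code)
        return f"{path}: line {err.lineno}, column {err.offset}: {reason}"
    return None


def main(pattern):
    """Check every file matching `pattern`; return 1 if any file is bad, else 0."""
    bad = 0
    for path in sorted(glob.glob(pattern)):
        msg = check_well_formed(path)
        if msg:
            print(msg)
            bad += 1
    print(f"{bad} file(s) not well-formed")
    return 1 if bad else 0


if __name__ == "__main__":
    # Assumed default pattern; pass a different glob as the first argument
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "report*.htm"))
```

Saved under a hypothetical name such as `check_wf.py`, it could be run in the output directory as `python3 check_wf.py 'report*.htm'`.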

But to help Michal, I sent him a ZIP file with everything in it, so he can
reproduce this on his computer as well.

It seems related to the use of tables, since that is where it fails.

--Nasser

>              -- Bill
> 
> 
> William F Hammond
> Email: gellmu at gmail.com
> https://www.facebook.com/william.f.hammond
> http://www.albany.edu/~hammond/
> 
> 𝑻𝒉𝒆 𝒕𝒊𝒎𝒆 𝒕𝒐 𝒔𝒂𝒗𝒆 𝒂 𝒅𝒆𝒎𝒐𝒄𝒓𝒂𝒄𝒚 𝒊𝒔 𝒃𝒆𝒇𝒐𝒓𝒆 𝒊𝒕
> 𝒊𝒔 𝒍𝒐𝒔𝒕.   -- 𝐊𝐞𝐧 𝐁𝐮𝐫𝐧𝐬
> 
> 
> 
> 
> On Mon, Dec 11, 2023 at 5:04 PM Nasser M. Abbasi <puszcza-hackers at gnu.org.ua>
> wrote:
> 
>> URL:
>>    <http://puszcza.gnu.org.ua/bugs/?618>
>>
>>                   Summary: Incomplete XML Document, domfilter error,
>> truncated build on large file.
>>                   Project: tex4ht
>>              Submitted by: nma123
>>              Submitted on: Tue Dec 12 01:04:12 2023
>>                  Category: None
>>                  Priority: 5 - Normal
>>                  Severity: 7 - Important
>>                    Status: None
>>                   Privacy: Public
>>               Assigned to: None
>>          Originator Email:
>>               Open/Closed: Open
>>           Discussion Lock: Any
>>
>>      _______________________________________________________
>>
>> Details:
>>
>> I have been working with Michal on this via private email, but thought I
>> would enter a bug report for tracking and documentation.
>>
>> I have one large file (57,000 PDF pages) that takes 14 hours to compile
>> with tex4ht. At about 10% of the way through generating the final HTML
>> pages, it hits an XML error and stops.
>>
>> That is, the remaining 90% of the sections are missing from the final
>> web pages.
>>
>> -------------------------------------------------------
>>
>> [INFO]    make4ht-lib: parse_lg process file: reportsubsection1100.htm
>> [WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:
>> [WARNING] domfilter:
>> ...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete
>> XML Document [char=33675]
>>
>> [INFO]    make4ht-lib: parse_lg process file: reportsubsection1100.htm
>> [WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:
>> [WARNING] domfilter:
>> ...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete
>> XML Document [char=33675]
>>
>> [INFO]    make4ht-lib: parse_lg process file: reportsubsection1100.htm
>>
>> ----------------------------------
>>
>> I've just sent Michal a link to a complete, self-contained ZIP file
>> (450 MB) with instructions on how to run it standalone, in order to see
>> these errors on his end.
>>
>>
>> I tried this on the latest TeX Live 2023 on a new Linux installation.
>>
>> I will work with Michal to provide any additional information he needs from
>> me, to hopefully find the cause of this problem.
>>
>> This happens only on this file. I think it may be due to the large size,
>> since the LaTeX code is all generated by the same program and only this
>> file gives this error.
>>
>> --Nasser
>>
>>
>>
>>
>>
>>      _______________________________________________________
>>
>> Reply to this item at:
>>
>>    <http://puszcza.gnu.org.ua/bugs/?618>
>>
>> _______________________________________________
>>    Message sent via/by Puszcza
>>    http://puszcza.gnu.org.ua/
>>
>>
> 
