[tex4ht] [bug #620] tex4ht breaks URL text leading to empty spaces between words generated in final HTML when \urldef

Nasser M. Abbasi puszcza-hackers at gnu.org.ua
Tue Jan 16 00:27:58 CET 2024


URL:
  <http://puszcza.gnu.org.ua/bugs/?620>

                 Summary: tex4ht breaks URL text leading to empty spaces
between words generated in final HTML when \urldef
                 Project: tex4ht
            Submitted by: nma123
            Submitted on: Mon Jan 15 23:27:58 2024
                Category: None
                Priority: 5 - Normal
                Severity: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any

    _______________________________________________________

Details:

Reference and screenshots are at

https://tex.stackexchange.com/questions/707149/tex4ht-breaks-url-text-leading-to-empty-spaces-between-words-generated-in-final

I use \urldef to build the link text for \href, because the names are
folder/file paths that can contain many unusual characters.

This works fine in the PDF produced by lualatex. In the HTML generated by
tex4ht, however, the name is split across two lines, which shows up as a blank
space inside the name when the page is viewed in a browser. This can make the
name hard to read. Here is an MWE:

------------------------
\documentclass[12pt,oneside]{book}
\usepackage{hyperref} 
\usepackage{url}

\begin{document}

\section{Tests completed}
\begin{enumerate}
\item
\urldef\mytarget\nolinkurl{test_cases/rubi_tests/0_Independent_test_suites/1_Apostol_Problems}
\href{test_cases/rubi_tests/0_Independent_test_suites/1_Apostol_Problems/output/report.htm}{\mytarget}
\hspace{5pt}  [175]
\item
\urldef\mytarget\nolinkurl{test_cases/rubi_tests/0_Independent_test_suites/2_Bondarenko_Problems}
\href{test_cases/rubi_tests/0_Independent_test_suites/2_Bondarenko_Problems/output/report.htm}{\mytarget}
\hspace{5pt}  [35]
\end{enumerate}
\end{document}
--------------------------

When compiled using

make4ht -ulm default -a debug  index.tex "mathjax,htm,nostyle"

It gives

[Screenshot: the rendered list, with a blank space inside each link name; see
the Stack Exchange link above.]

This happens because tex4ht breaks the name at a _ character. Here is the raw
HTML:

--------------------------
<!DOCTYPE html> 
<html lang='en-US' xml:lang='en-US'> 
<head><title></title> 
<meta charset='utf-8' /> 
<meta content='TeX4ht (https://tug.org/tex4ht/)' name='generator' /> 
<meta content='width=device-width,initial-scale=1' name='viewport' /> 
<link href='index.css' rel='stylesheet' type='text/css' /> 
<meta content='index.tex' name='src' /> 
<script>window.MathJax = { tex: { tags: "ams", }, }; </script> 
 <script async='async' id='MathJax-script'
src='https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js'
type='text/javascript'></script>  
</head><body> 
<h3 class='sectionHead' id='tests-completed'><span class='titlemark'>0.1  
</span> <a id='x1-10000.1'></a>Tests completed</h3>
<!-- l. 9 --><p class='noindent'>      
</p>
<ol class='enumerate1'>
<li class='enumerate' id='x1-1002x1'>             <a
href='test_cases/rubi_tests/0_Independent_test_suites/1_Apostol_Problems/output/report.htm'><span
class='ec-lmtt-12'>test_cases/rubi_tests/0_Independent_test_suites/1_
Apostol_Problems</span></a>  [175]
</li>
<li class='enumerate' id='x1-1004x2'>             <a
href='test_cases/rubi_tests/0_Independent_test_suites/2_Bondarenko_Problems/output/report.htm'><span
class='ec-lmtt-12'>test_cases/rubi_tests/0_Independent_test_suites/2_
Bondarenko_Problems</span></a>  [35]</li></ol>
 
</body> 
</html>
--------------------
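
For context, HTML treats a newline inside inline text as ordinary white space
and displays it as a single space, which is why the break in the output above
shows up as a gap. A minimal Python sketch of that collapsing rule, applied to
the link text of the first item; the helper name rendered_text is made up for
illustration, and it only approximates what a browser does under default CSS:

```python
import re


def rendered_text(raw: str) -> str:
    """Approximate how a browser displays inline text under default CSS:
    every run of spaces, tabs and newlines is shown as one space."""
    return re.sub(r"[ \t\r\n]+", " ", raw).strip()


# Link text as tex4ht wrote it, with the line break after "1_".
raw = "test_cases/rubi_tests/0_Independent_test_suites/1_\nApostol_Problems"

print(rendered_text(raw))
```

The printed name contains a space between "1_" and "Apostol_Problems", which
matches the gap seen on screen.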

If I edit index.htm by hand and join the name into one long line by removing
the extra line break that was added, the HTML becomes

----------------------------------

<!DOCTYPE html> 
<html lang='en-US' xml:lang='en-US'> 
<head><title></title> 
<meta charset='utf-8' /> 
<meta content='TeX4ht (https://tug.org/tex4ht/)' name='generator' /> 
<meta content='width=device-width,initial-scale=1' name='viewport' /> 
<link href='index.css' rel='stylesheet' type='text/css' /> 
<meta content='index.tex' name='src' /> 
<script>window.MathJax = { tex: { tags: "ams", }, }; </script> 
 <script async='async' id='MathJax-script'
src='https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js'
type='text/javascript'></script>  
</head><body> 
<h3 class='sectionHead' id='tests-completed'><span class='titlemark'>0.1  
</span> <a id='x1-10000.1'></a>Tests completed</h3>
<!-- l. 9 --><p class='noindent'>      
</p>
<ol class='enumerate1'>
<li class='enumerate' id='x1-1002x1'>             <a
href='test_cases/rubi_tests/0_Independent_test_suites/1_Apostol_Problems/output/report.htm'><span
class='ec-lmtt-12'>test_cases/rubi_tests/0_Independent_test_suites/1_Apostol_Problems</span></a>
 [175]
</li>
<li class='enumerate' id='x1-1004x2'>             <a
href='test_cases/rubi_tests/0_Independent_test_suites/2_Bondarenko_Problems/output/report.htm'><span
class='ec-lmtt-12'>test_cases/rubi_tests/0_Independent_test_suites/2_Bondarenko_Problems</span></a>
 [35]</li></ol>
 
</body> 
</html>

--------------

and on screen it now looks like this

[Screenshot: the same list after the hand edit, with each link name shown
without a gap; see the Stack Exchange link above.]
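
Until this is fixed in tex4ht itself, the hand edit described above could be
automated as a post-processing step. The following is only a sketch of a
workaround, not a fix for tex4ht. It assumes that the affected names always sit
in a span whose class starts with ec-lmtt (as in the output shown here) and that
such spans contain no nested tags; the function name join_broken_names is
hypothetical.

```python
import re

# One tex4ht typewriter-font span, e.g. <span class='ec-lmtt-12'>...</span>.
# Assumption: class names start with "ec-lmtt" and the span holds plain text.
# \s+ is used because tex4ht may put a newline between "<span" and "class".
TT_SPAN = re.compile(r"(<span\s+class='ec-lmtt[^']*'>)([^<]*)(</span>)")


def join_broken_names(html: str) -> str:
    """Delete line breaks (and blanks around them) inside typewriter spans."""

    def fix(match: re.Match) -> str:
        # Remove each newline together with any spaces or tabs next to it.
        text = re.sub(r"[ \t]*\r?\n[ \t]*", "", match.group(2))
        return match.group(1) + text + match.group(3)

    return TT_SPAN.sub(fix, html)


# Small demonstration on a fragment shaped like the generated HTML.
sample = (
    "<a\nhref='a/1_Apostol/report.htm'><span\n"
    "class='ec-lmtt-12'>a/1_\nApostol</span></a>  [175]"
)
print(join_broken_names(sample))
```

Only the text inside the span is changed; the line breaks between tags and
attributes are left alone. Note that this would also join typewriter text that
was legitimately broken at a space, so it is only suitable when the spans hold
paths like the ones in this report.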

How can tex4ht be fixed so that it does not break long names in href and keeps
the name on one line?

TeX Live 2023, installed a few days ago on Linux.
