[tldoc] Script to check the links in the documentation

Uwe Ziegenhagen ziegenhagen at gmail.com
Sun May 12 21:27:50 CEST 2013


Hi everyone,

I just wrote a small Python script to check if the links inside the
documentation still work:

# http://www.noah.org/wiki/RegEx_Python
import re
import urllib2

#filehandle = open("C:/Users/Uwe/Desktop/texlive/texlive-de/texlive-de-new.tex")
filehandle = open("C:/Users/Uwe/Desktop/texlive/texlive-en/texlive-en.tex")

text = filehandle.read()
filehandle.close()

m = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+',
               text)

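i = 0
for item in m:
    i = i + 1
    # number, URL, then the error (if any) on the same line
    print i, '\t', item, '\t',
    try:
        response = urllib2.urlopen(item)
        response.close()
    except urllib2.HTTPError, e:
        # the server answered with an error status such as 404
        print e.code
    except urllib2.URLError, u:
        # the server could not be reached at all
        print u.args
    print "\n"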

Maybe you will find it helpful. In the German version I found 11 broken
links, in the English version three:

http://groups.google.com/group/comp.text.tex/topics
http://ctan.example.org/tex-archive/systems/texlive/tlnet/
http://mirror.ctan.org/tex-archive/fonts/greek/cb
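
For anyone running this under Python 3, where urllib2 has been split into
urllib.request and urllib.error, the same check might look roughly like the
sketch below. The function names find_urls, check_url and report are made up
for illustration, and the 10-second timeout, the extra OSError fallback and
the encoding handling are my assumptions; the regular expression is the one
the script uses, with a literal @ in the third character class.

```python
import re
import urllib.error
import urllib.request

# Same URL pattern as in the original script.
URL_PATTERN = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)


def find_urls(text):
    """Return every http/https URL found in text, in order of appearance."""
    return URL_PATTERN.findall(text)


def check_url(url, timeout=10):
    """Return None if the URL can be opened, else a short error string.

    The timeout value is an assumption, not part of the original script.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return None
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404.
        return "HTTP %d" % e.code
    except urllib.error.URLError as e:
        # The server could not be reached (unknown host, refused, ...).
        return str(e.reason)
    except OSError as e:
        # Assumed fallback, e.g. for a timeout while waiting for the reply.
        return str(e)


def report(path):
    """Print one tab-separated line per link found in the file at path."""
    # errors="replace" is an assumption so that odd bytes do not stop the run.
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    for i, url in enumerate(find_urls(text), start=1):
        error = check_url(url)
        print(i, url, "ok" if error is None else error, sep="\t")
```

Calling report("texlive-en.tex") should then print one numbered line per
link, with "ok" or the error in the last column. Note that the pattern also
accepts commas and parentheses, so a link followed directly by punctuation
may be reported with that character attached.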

Uwe


-- 
Uwe Ziegenhagen
<http://www.uweziegenhagen.de>