Security of PDFs, NumPy deprecating PDF docs

Jonathan Fine jfine2358 at gmail.com
Tue May 24 12:57:41 CEST 2022


Hi

I think that from time to time we should discuss problems in the world of
PDF on this list. Below are three news items, with details and links in the
Appendix.

1. NumPy produces HTML and PDF documentation. The project is considering
dropping the PDF version because of difficulties in maintaining the PDF
build and fixing the problems that arise in it.

2. Automation of discovery of Word and Acrobat bugs.

3. PDF being used as a vector for malicious Word documents.

Happy TeXing

Jonathan

APPENDIX.

1. NumPy dropping PDF

https://mail.python.org/archives/list/numpy-discussion@python.org/thread/CCTBM3GONFQWM3DUZRBBT3YYGKXGGPLT/

I think this is the most important issue for us. First, it affects
technical documentation, and so affects many in the core community of LaTeX
users. Second, it is something we have responsibility for. Third, it's
something we can do something about. And finally, it will make TeX/LaTeX
more relevant to early career developers.

The problem seems to be that the PDF build isn't working well in Continuous
Integration, and the output isn't that useful. I intend to open a
discussion of this on the texlive list. Here are a couple of further related
URLs.

https://github.com/numpy/numpy/issues/21557#issuecomment-1133920412
https://github.com/scipy/scipy/issues/15635

It's also important because at TUG 2022 Carlos Evia will give a
keynote on the future of technical documentation.

2. Automated discovery of Word and Acrobat bugs

https://www.theregister.com/2022/05/13/cooperative_mutation_flaw_finder/

Researchers have automated the discovery of Word and Acrobat bugs, netting
$22,000 of bug bounties and 32 CVE entries, two of which are rated 8.8/10
for severity. The discovery tool is written in Python.

> Speaking at the Black Hat Asia conference in Singapore, PhD student Xu Peng
> of the Chinese Academy of Sciences – one of the tool's co-authors –
> explained that the likes of Word and Acrobat accept input from scripting
> languages. Acrobat, for example, allows JavaScript to manipulate PDF files.
>
> Making that happen requires the PDF both to define native PDF objects and
> to parse JavaScript code. The native objects are processed by Acrobat
> modules, and an embedded JavaScript engine handles the scripts. A "binding
> layer" does the translation.


3. PDF being used as vector for malicious Word document

https://www.theregister.com/2022/05/24/hp-pdf-phishing-malware/

HP cybersecurity have discovered that PDFs are being used as a vector for a
malicious Word document.

> A perfect example is a PDF document. The … PDF is a document type that
> people trust. That's because the public's perception is that it is a secure
> document that can't be manipulated. After all, that's why you issue an
> invoice as a PDF file and not a Word document. Unfortunately, the trust
> that users have in PDFs as a 'safe' document is false.


For what it's worth, the attack ultimately rests on "a code-execution
vulnerability in Microsoft Equation Editor".
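
Items 2 and 3 both turn on active or embedded content inside a PDF. As a
rough illustration only (it is not taken from either article), the Python
sketch below counts a few PDF name tokens that commonly accompany such
content. The token list and the function name count_suspect_names are
assumptions made for illustration. A raw byte scan like this misses names
inside compressed object streams or written with #xx escapes, so a zero
count proves nothing about a file being safe.

```python
import re

# Name tokens that often accompany scripts, auto-run actions, or attachments.
# Illustrative assumption: this list is neither exhaustive nor authoritative.
SUSPECT_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/EmbeddedFile", b"/Launch")


def count_suspect_names(data: bytes) -> dict:
    """Count raw occurrences of each suspect name token in PDF bytes.

    Hypothetical helper: it does not parse the PDF, it only scans the bytes.
    """
    counts = {}
    for name in SUSPECT_NAMES:
        # The lookahead stops a short name matching the start of a longer one,
        # for example /EmbeddedFile inside /EmbeddedFiles.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts
```

To try it on a file, read the file in binary mode and pass the bytes in,
for example count_suspect_names(open("some.pdf", "rb").read()), where
some.pdf stands for whatever document is being checked.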

[END]


More information about the texhax mailing list.