Broken Internal Links in XHTML/PDF Export


We have a self-hosted instance of Pressbooks, currently on 6.7 (we don’t have access to PHP 8.1 yet, so we can’t update to the latest version). I’ve been looking into a couple of linking issues with our digital PDFs. After delving into the code, I believe I’ve located the problems, both of which are in the “fixInternalLinks” function in pressbooks/inc/modules/export/xhtml/class-xhtml11.php.

In the first case, “interactive elements” in the book are omitted from the export and, instead, a link is provided to the web version of the book in Pressbooks. That’s great and works perfectly for h5p elements, but the links for YouTube videos were broken. Looking at the XHTML export that the PDF is based on, I noticed that those links were just href=“#oembed-1” (as an example), which does not work in the PDF. When I found the “fixInternalLinks” function, I noticed an exception for h5p in there, but nothing for oembed. Once I added an oembed exception, it worked fine (Line 743):

// Added: treat fragments starting with 'oembed' the same way as 'h5p' ones.
if ( in_array( "#{$fragment}", $external_anchors, true ) || str_starts_with( $fragment, 'h5p' ) || str_starts_with( $fragment, 'oembed' ) ) {
} else {
    $link->setAttribute( 'href', "#{$fragment}" );
}

It seems to me that there may be other types of media associated with media.blade.php that may also need exceptions here… but I’m not at all sure about that.
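
If more prefixes do turn out to be needed, one option is to move the check into a small helper so the list lives in one place. This is only a sketch: the function name is_interactive_fragment is hypothetical and not part of Pressbooks, and the default prefix list contains just the two prefixes discussed above. I don’t know which other prefixes, if any, media.blade.php produces.

```php
<?php
// Hypothetical helper (not part of Pressbooks): returns true when a fragment
// starts with one of the prefixes used for interactive-element placeholders.
// The default list only contains the two prefixes mentioned in this post.
function is_interactive_fragment( string $fragment, array $prefixes = [ 'h5p', 'oembed' ] ): bool {
    foreach ( $prefixes as $prefix ) {
        // strncmp() compares only the first strlen( $prefix ) bytes, which is
        // equivalent to str_starts_with() but also runs on PHP 7.
        if ( strncmp( $fragment, $prefix, strlen( $prefix ) ) === 0 ) {
            return true;
        }
    }
    return false;
}

var_dump( is_interactive_fragment( 'oembed-1' ) );      // bool(true)
var_dump( is_interactive_fragment( 'chapter-intro' ) ); // bool(false)
```

The condition on Line 743 could then call this helper in place of the two str_starts_with() checks, assuming the rest of that branch stays as it is.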

The second error I was looking into involved a link to a video transcript included as a “Back Matter” section of the book. This link looked like a link in the PDF, but did nothing. The link was entered in the book as an absolute path, but when I examined the XHTML export, I saw that it had been transformed into a local link. This was entirely appropriate in this case, but on further investigation it appeared that the internal link was wrong. The link was pointing at “#back-matter-1-2-transactional-communication-transcript”, but when I examined the section it was meant to point to, the id there was instead “back-matter-slug-1-2-transactional-communication-transcript” with no sign of another element with a non-slug id. So I altered another part of the function (Line 736):

} elseif ( preg_match( '%(front\-matter|chapter|back\-matter|part)/([a-z0-9\-]*)([/]?)%', $href, $matches ) ) {
    // Convert type + slug to #fragment
    $fragment = "{$matches[1]}-slug-{$matches[2]}";
} else {

It’s quite possible this is some strange side effect of this particular book and editing class-xhtml11.php is entirely incorrect in this case. But this is working for us so far.
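
For anyone who wants to check the pattern outside of Pressbooks, here is a minimal standalone sketch of what the modified branch does to an absolute link. The function name href_to_slug_fragment and the example.org URL are placeholders I made up for illustration; only the regular expression and the "-slug-" format come from the code above.

```php
<?php
// Hypothetical helper mirroring the modified branch; not Pressbooks code.
// Returns the fragment (without the leading "#"), or null when the href
// does not look like a link to a front matter, chapter, back matter, or part.
function href_to_slug_fragment( string $href ): ?string {
    $pattern = '%(front\-matter|chapter|back\-matter|part)/([a-z0-9\-]*)([/]?)%';
    if ( preg_match( $pattern, $href, $matches ) ) {
        // Type + "-slug-" + post slug, matching the id seen in our XHTML export.
        return "{$matches[1]}-slug-{$matches[2]}";
    }
    return null;
}

// The domain and book path are placeholders.
$href = 'https://example.org/mybook/back-matter/1-2-transactional-communication-transcript/';
echo href_to_slug_fragment( $href ), "\n";
// back-matter-slug-1-2-transactional-communication-transcript
```

With the unmodified code the same link would produce “back-matter-1-2-transactional-communication-transcript”, which is the id that had no matching element in our export.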

I’m creating this topic here and not on GitHub because I’m unable to fully test these issues in the latest version of Pressbooks, as I don’t have access to PHP 8.1 in our environment yet. I did check the code for 6.10 and it doesn’t appear that either of these issues has been addressed yet, if indeed they are general issues and not just quirks of our book or our instance. There don’t appear to be any open issues on this topic here or on GitHub, so I felt I should report them in case these are bugs that need addressing. Or if my edits here are missing something or are just plain wrong, I’m hoping you can point me in the right direction.

Thank you!

Good morning. We are also self-hosted. What are you using for PDF exports? We have tried mPDF and a couple of others with no luck. Can you at least point us towards something that will work? HTML exports are too small to print correctly. Do the PDF exports retain accessibility updates?