Language tags missing from PDF exports

We are using the Prince PDF export routine. While producing a series of books that mix languages, our digital publishing specialist discovered that language tags from the web book do not appear in the PDF exports. These tags are a required accessibility feature, and because there are thousands of them in these texts, we cannot remediate this by hand.

I do not have a ton of experience configuring Prince, but I was wondering whether others have run into this same issue or could point me towards a potential fix. I also wanted to raise this as an accessibility concern in case there is currently no pathway in the system to push these tags into the PDF.

I’ll open an issue in the GitHub repository, as I believe this is a bug. Here is a bit of my testing:

I notice that, oddly, footnotes with <span lang="fr"> are added in the PDF correctly (i.e., the span shows up as a tag and that tag has the language as a property).

Pressbooks uses the XHTML export as the source document for Prince. I notice that in the XHTML document, footnotes have <span lang="fr"> while other places in the document have <span xml:lang="fr"> or <p xml:lang="fr">. For some reason, Prince is not recognizing the xml:lang attribute but is recognizing the lang attribute.
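For anyone who wants to check their own export for the same pattern, here is a minimal sketch in Python (not part of Pressbooks or Prince) that lists elements carrying xml:lang without a matching lang attribute. The function name and the sample markup are hypothetical, and it assumes the XHTML is well-formed XML with no undeclared entities such as &nbsp;, which the standard-library parser would reject.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def find_xml_lang_only(xhtml_text):
    """Return (tag, language) pairs for elements with xml:lang but no lang."""
    root = ET.fromstring(xhtml_text)
    hits = []
    for el in root.iter():
        if XML_LANG in el.attrib and "lang" not in el.attrib:
            # Drop the "{namespace}" prefix so only the local tag name is kept.
            local_name = el.tag.rsplit("}", 1)[-1]
            hits.append((local_name, el.attrib[XML_LANG]))
    return hits


# Hypothetical sample imitating the mix of attributes described above.
sample = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p xml:lang="fr">Bonjour</p>
<p>See <span lang="fr">note</span> and <span xml:lang="de">Hallo</span></p>
</body></html>"""

print(find_xml_lang_only(sample))
```

Running this against a real export (read the file and pass its text in) should give a rough count of how many tags depend on xml:lang alone.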

After consulting the Prince documentation, I get a few leads. To cover my bases, I first add @namespace xml "http://www.w3.org/XML/1998/namespace"; to the PDF CSS, with no effect. I then tell Prince that the source of the digital PDF is XML by adding $prince->setInputType('xml'); to the Prince wrapper for digital PDFs. This does cause the PDF to generate with the appropriate language property in all of the places where I had <span lang="fr">, as well as in other places like <p xml:lang="fr"> that were rendering as xml:lang in the XHTML.
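To reproduce this outside Pressbooks, the same setting can be tried from the command line. The sketch below only builds the command; it assumes the prince executable is on the PATH and that its --input=xml option is the command-line counterpart of setInputType('xml'). The function name and file names are hypothetical.

```python
import shlex


def build_prince_command(xhtml_path, pdf_path, force_xml=True):
    """Build a Prince command line, optionally forcing XML input parsing."""
    cmd = ["prince"]
    if force_xml:
        # Assumed equivalent of $prince->setInputType('xml') in the PHP wrapper.
        cmd.append("--input=xml")
    cmd += [xhtml_path, "-o", pdf_path]
    return cmd


# Hypothetical file names; print the command instead of running it.
print(shlex.join(build_prince_command("book.xhtml", "book.pdf")))
```

Generating one PDF with force_xml=True and one with force_xml=False from the same XHTML file, then comparing the tag trees, would show whether the input type alone explains the difference.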

It would be nice if someone with a bit more experience with this toolchain could review this issue and provide feedback.