Word formatting not retained; parser errors

2 issues. Possibly related.

When importing text from Word, some formatting does not carry over, and &nbsp; entities are inserted in place of spaces. Lists, some bold text, and some headings come through, but others do not. For instance, I imported a chapter with H4s and H2s, but only the H4s carried over. I do not have Paste As Text selected.

This also results in parser errors in my HTMLBook exports until I manually remove all the &nbsp; entities. I also get the following error when exporting EPUBs: “SXWN9000: The parent axis starting at a document node will never select anything”.
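For anyone doing the same manual cleanup, here is a minimal sketch of how the non-breaking spaces could be stripped from a chunk of chapter HTML in one pass. It is not part of Pressbooks; the function name `strip_nbsp` is hypothetical, and it assumes every non-breaking space in the text is unwanted.

```python
import re

# Matches the named entity, the decimal and hex numeric entities,
# and the literal U+00A0 character.
NBSP_PATTERN = re.compile(r"&nbsp;|&#160;|&#[xX]0*[aA]0;|\u00a0")


def strip_nbsp(html: str) -> str:
    """Replace every non-breaking space (entity or character) with a plain space."""
    return NBSP_PATTERN.sub(" ", html)


if __name__ == "__main__":
    sample = "<p>Imported&nbsp;from&#160;Word</p>"
    print(strip_nbsp(sample))
```

This replaces all of them, so any non-breaking space that was placed deliberately (for example between a number and its unit) would be lost too.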

See attached screenshots of logs.

Pressbooks 5.8.1
McLuhan 2.8.1

#1

Not all formatting in Word Documents is preserved.

Everything other than basic styling is stripped out by design … so fancy fonts and such won’t get imported. Italics, bold etc. should.

https://guide.pressbooks.com/chapter/import-from-word-docx/

#2

HTMLBook was an attempt to migrate away from our own homegrown XHTML format. We hoped it would allow us to use HTML5 more loosely. Unfortunately, our assumption was wrong. HTMLBook is more strict than XHTML when it comes to nagging about XML problems. I wouldn’t worry about it. Unless you’re a developer and you absolutely need HTMLBook to do some experimental hacking… simply don’t use that format. It’s not for public consumption, yet.

https://oreillymedia.github.io/HTMLBook/

#3

Known bug? Your version of Java, or epubcheck, needs to be updated…

Regards,

Thanks @dac.chartrand.

#1 I had forgotten about the import from Word option. I will do this from now on with Word files. I also was having issues with spacing and found this chapter of the PB userguide that helped: [https://guide.pressbooks.com/chapter/section-breaks-page-breaks-and-blank-pages/]

#2 No problem. I don’t have an exact use case for HTMLBook. It was just annoying me that it wasn’t working, but I won’t worry about it.

#3 In response to your suggestion that our version of Java or epubcheck needs to be updated, my systems guy says:

Messages like this come from the formatting of the document. The parent axis message comes from the message above it: Error while parsing file 'attribute “value” not allowed here; expected attribute “class”, “dir”, “id”, “lang”, “style”, “title” or “xml:lang”'. In this specific case, it comes from having a numbered list that is interrupted by text outside the list and then resumes, which EPUB does not support. https://publishdrive.zendesk.com/hc/en-us/articles/115004351854-Error-while-parsing-file-attribute-value-not-allowed-here-expected-attribute-class-dir-id-lang-style-title-or-xml-lang-. Other errors are also coming from badly formatted documents. We cannot install older versions of software that may leave us open to issues. The messages are not generated directly by Pressbooks, but by an EPUB XSLT, because the EPUB format does not support some types of text formatting.
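To track down which chapters trigger that message, something like the sketch below could flag list items that carry a `value` attribute. This is an assumption on my part about where the attribute sits: the error only names “value”, and a resumed numbered list is the likely source. The class and function names are hypothetical, and it uses only Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class LiValueFinder(HTMLParser):
    """Collects (line number, value) for every <li> that has a value attribute."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        for name, value in attrs:
            if name == "value":
                # getpos() returns (line, column) of the tag being parsed.
                self.hits.append((self.getpos()[0], value))


def find_li_values(html: str):
    """Return a list of (line, value) pairs for <li value="..."> items in html."""
    finder = LiValueFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


if __name__ == "__main__":
    chapter = (
        "<ol>\n<li>First step</li>\n</ol>\n"
        "<p>A note between the steps.</p>\n"
        '<ol>\n<li value="2">Second step</li>\n</ol>'
    )
    for line, value in find_li_values(chapter):
        print(f"line {line}: <li value={value!r}>")
```

Any hits would be the places to rework the list in the chapter, for instance by keeping the note inside the preceding list item so the numbering never has to resume.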

So I suspect this may be resolved by following the procedures in #1?
