Best way to track down unicode

beckej · February 24, 2021, 10:01pm

So I cloned a book that a faculty member would like to use.

Somewhere on 7 different pages of this 600-page volume are Unicode characters that caused the PDF export with Prince to fail. Since the PDF export failed, I can’t see which chapter or page they were on. How would a master detective find the characters that need to be changed?

The full error log is below.

Array
(
[time] => Wed Feb 24 21:47:32 2021
[user] => beckej
[site_url] => https://excelfordecisionmaking.pressbooks.sunycreate.cloud
[blog_id] => 15
[theme] => McLuhan
[warning] => 1
[url] => SUNY Create
)
Wed Feb 24 21:47:27 2021: ---- begin
Wed Feb 24 21:47:31 2021: page 58: warning: no font for Geometric Shapes character U+25E6, fallback to ‘?’
Wed Feb 24 21:47:31 2021: page 68: warning: no font for Geometric Shapes character U+25AA, fallback to ‘?’
Wed Feb 24 21:47:31 2021: page 306: warning: no font for Arrows character U+2193, fallback to ‘?’
Wed Feb 24 21:47:31 2021: page 342: warning: no font for Japanese character U+30C4, fallback to ‘?’
Wed Feb 24 21:47:32 2021: page 480: warning: no font for General Punctuation character U+2033, fallback to ‘?’
Wed Feb 24 21:47:32 2021: page 551: warning: no font for General Punctuation character U+2033, fallback to ‘?’
Wed Feb 24 21:47:32 2021: page 575: warning: no font for Latin character U+0043, fallback to ‘?’
Wed Feb 24 21:47:32 2021: internal error: no available fonts
Wed Feb 24 21:47:32 2021: ---- end
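The log gives page numbers and code points but not the characters themselves, so one first step is to turn each warning into a readable character name. A minimal Python sketch, assuming only the warning-line format seen in this log (`page N: warning: no font for <block> character U+XXXX`); it uses the standard `unicodedata` module to name each code point.

```python
import re
import unicodedata

# Matches warning lines in the format shown in the log above, e.g.
# "... page 58: warning: no font for Geometric Shapes character U+25E6, fallback to '?'"
WARNING_RE = re.compile(
    r"page (?P<page>\d+): warning: no font for (?P<block>.+?) character U\+(?P<hex>[0-9A-Fa-f]{4,6})"
)


def parse_missing_glyphs(log_text):
    """Return (page, block, code point, character name) for each missing-glyph warning."""
    results = []
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match is None:
            continue  # skips "begin", "end" and "internal error" lines
        code_point = int(match.group("hex"), 16)
        name = unicodedata.name(chr(code_point), "<unnamed>")
        results.append((int(match.group("page")), match.group("block"), f"U+{code_point:04X}", name))
    return results


if __name__ == "__main__":
    sample = """\
Wed Feb 24 21:47:31 2021: page 58: warning: no font for Geometric Shapes character U+25E6, fallback to ‘?’
Wed Feb 24 21:47:31 2021: page 342: warning: no font for Japanese character U+30C4, fallback to ‘?’
Wed Feb 24 21:47:32 2021: page 575: warning: no font for Latin character U+0043, fallback to ‘?’
Wed Feb 24 21:47:32 2021: internal error: no available fonts"""
    for page, block, code, name in parse_missing_glyphs(sample):
        print(f"page {page:>4}  {code}  {name}  ({block})")
```

On the sample lines it reports WHITE BULLET (U+25E6), KATAKANA LETTER TU (U+30C4) and LATIN CAPITAL LETTER C (U+0043).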

SteelWagstaff · February 24, 2021, 10:48pm

@beckej This is a tricky one! Are you able to view the XHTML preview for the book at https://excelfordecisionmaking.pressbooks.sunycreate.cloud/format/xhtml?debug=prince? That would be the first place I’d look, because that preview will try to use the pagedjs module to give you a PDF preview of the book in your browser. If that doesn’t work, perhaps changing the book’s body font in PDF theme options to one of the Noto fonts or GNU FreeFont Serif might help?
Those fonts were designed to have more complete glyph sets covering the range of Unicode characters. This ‘Shapeshifter’ feature is available in the McLuhan theme as of our 2.10.5 release.

beckej · February 25, 2021, 2:40am

Pretty frustrated with this…

But it’s nice that Shapeshifter is part of McLuhan now! I must have missed it when it was released. Switching fonts took care of all but two problem characters, and I was able to search for the Japanese character (the author had used it as part of an emoji).

The last character that I just can’t find is https://www.htmlsymbols.xyz/unicode/U+0043, which is basically a capital C. It’s so close to C that I can’t search for it, because when I copy and paste the Unicode capital C, it just turns into a normal C.

I’m stuck.
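For reference, U+0043 is not a look-alike: it is the code point of the ordinary ASCII capital C, which is why copying and pasting it always gives back a normal C. A quick check with Python’s standard `unicodedata` module confirms this. Read that way, the U+0043 warning suggests Prince could not find a font even for basic Latin text, which would fit the `internal error: no available fonts` line at the end of the same log (an interpretation of the log, not something Prince states).

```python
import unicodedata

c = "\u0043"
print(c == "C")                  # True: U+0043 is the ASCII letter C itself
print(ord("C") == 0x43)          # True
print(unicodedata.name(c))       # LATIN CAPITAL LETTER C
print(c.encode("utf-8"))         # b'C' (one byte, identical to ASCII)

# A genuine look-alike has a different code point, e.g. Cyrillic U+0421
print(unicodedata.name("\u0421"))  # CYRILLIC CAPITAL LETTER ES
print("\u0421" == "C")             # False
```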

SteelWagstaff · February 25, 2021, 2:45am

Can you look in the text editor or try the search and replace function? Can you generate an XHTML export and search in it? Do you at least have a rough idea of the chapter that included the preceding character (24 pages earlier)? Just brainstorming here …
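One way to follow up on searching an XHTML export is to list everything in it that falls outside plain ASCII. A minimal Python sketch, assuming the export has been saved locally as a UTF-8 file (the file and script names in the usage comment are hypothetical): it prints each non-ASCII code point with its Unicode name, a count, and its first location. Ordinary typographic characters such as curly quotes will show up too, characters written as numeric entities (e.g. `&#x25E6;`) are not decoded by this plain-text scan, and because U+0043 is plain ASCII it cannot be flagged this way.

```python
import sys
import unicodedata
from collections import Counter


def find_non_ascii(text):
    """Return a list of (line, column, character) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                hits.append((line_no, col, ch))
    return hits


def summarize(hits):
    """Count occurrences per character, most frequent first."""
    return Counter(ch for _, _, ch in hits).most_common()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical names): python find_unicode.py book.xhtml
    with open(sys.argv[1], encoding="utf-8") as f:
        hits = find_non_ascii(f.read())
    for ch, count in summarize(hits):
        line_no, col = next((ln, cl) for ln, cl, c in hits if c == ch)
        name = unicodedata.name(ch, "<unnamed>")
        print(f"U+{ord(ch):04X}  {name:40} x{count}  first at line {line_no}, col {col}")
```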

beckej · March 1, 2021, 4:50pm

Search and Replace worked for the Japanese character. But the problem with the “Latin C” character is that my browser keeps replacing whatever the Latin capital C is with a more standard character, even when I carefully copy and paste.

I think I’m just going to give up for now. Thank you for your help, and for your suggestion to try different fonts. At this point, 1 character on some page is holding up the entire book.

I wish it would still export the PDF with a ? in place of the character, so I could just go to the page Prince told me about and find it. I tried looking through the XHTML source created by the export and searching for every ? in the book, but I didn’t find it. I’m guessing that my browser substituted a standard C for it.
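Because Prince’s page numbers don’t exist in the XHTML source, the nearest preceding heading is a reasonable stand-in for “which chapter is it in?”. The sketch below uses Python’s standard `html.parser` to walk an XHTML export and report each target character together with the last heading seen before it. It assumes chapter titles are marked up as `h1` or `h2` in the export (worth checking in your own file); the `TARGETS` set is simply the code points from the log in this thread, and the file and script names are hypothetical.

```python
import sys
from html.parser import HTMLParser

# Code points from the Prince log in this thread. U+0043 is left out on purpose:
# it is the ordinary letter "C" and would match every C in the book.
TARGETS = {"\u25e6", "\u25aa", "\u2193", "\u30c4", "\u2033"}

HEADING_TAGS = ("h1", "h2")  # assumption: chapter titles use h1/h2 in the export


class HeadingLocator(HTMLParser):
    """Record the most recent h1/h2 text seen before each target character."""

    def __init__(self, targets):
        # convert_charrefs defaults to True, so "&#x2193;" reaches handle_data decoded
        super().__init__()
        self.targets = targets
        self.in_heading = False
        self.heading_parts = []
        self.current_heading = "(before first heading)"
        self.found = []  # list of (code point, heading, snippet)

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.in_heading = True
            self.heading_parts = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self.in_heading:
            self.in_heading = False
            # Collapse whitespace so multi-line headings print on one line
            self.current_heading = " ".join("".join(self.heading_parts).split())

    def handle_data(self, data):
        if self.in_heading:
            self.heading_parts.append(data)
        for i, ch in enumerate(data):
            if ch in self.targets:
                snippet = " ".join(data[max(0, i - 30):i + 30].split())
                self.found.append((f"U+{ord(ch):04X}", self.current_heading, snippet))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical names): python locate_chars.py book.xhtml
    locator = HeadingLocator(TARGETS)
    with open(sys.argv[1], encoding="utf-8") as f:
        locator.feed(f.read())
    locator.close()
    for code, heading, snippet in locator.found:
        print(f"{code}  under '{heading}':  ...{snippet}...")
```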

SteelWagstaff · March 1, 2021, 5:07pm

@beckej Don’t know if this is helpful, but I just cloned the book to one of our staging networks and produced a PDF export, which completed without errors(!). See Excel For Decision Making – Simple Book Publishing (the print PDF is exportable there). Not sure what’s different between your setup and mine? Maybe try cloning the original book and re-exporting? Maybe it’s a system font difference? Maybe something else with your server/Prince/DocRaptor setup? If you share the diagnostics page info, I can do a quick comparison to see if I see anything else odd.
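Comparing two diagnostics pages by eye is error-prone, partly because keys such as `Version` and `Memory Limit` appear under more than one section. A rough Python sketch, assuming the pasted format used in this thread (section titles on their own lines, followed by `Key: Value` lines): it keys each value by section and lists the entries that differ between two pastes. Note that the diagnostics info pasted in this thread does not list installed system fonts, so a comparison like this would not by itself reveal a missing-font problem.

```python
def parse_diagnostics(text):
    """Turn a pasted diagnostics page into {(section, key): value}."""
    section = ""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first ": " only, so URLs like "http://..." stay intact
        key, sep, value = line.partition(": ")
        if sep:
            values[(section, key)] = value
        else:
            section = line  # a line without a "Key: Value" pair is treated as a section title
    return values


def diff_diagnostics(a, b):
    """Return sorted (section, key, value_a, value_b) rows where the two setups differ."""
    left, right = parse_diagnostics(a), parse_diagnostics(b)
    rows = []
    for section, key in sorted(left.keys() | right.keys()):
        va = left.get((section, key), "<missing>")
        vb = right.get((section, key), "<missing>")
        if va != vb:
            rows.append((section, key, va, vb))
    return rows
```

Calling `diff_diagnostics(mine, yours)` on the two pasted pages returns only the rows whose values differ.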

SteelWagstaff · March 1, 2021, 5:10pm

Not sure if it helps, but page 575 in the exports I made appeared to be coming from the Data Files appendix …

beckej · March 1, 2021, 5:15pm

I’m using McLuhan for the export. I will share two error logs: one with “Theme Default” as the font, and one with one of your suggested fonts.

For paper size I am using 8.5 x 11.

System Information

Book Info

Book ID: 15
Book URL: http://excelfordecisionmaking.pressbooks.sunycreate.cloud/
Book Privacy: Public

Browser

Platform: OS X
Browser Name: Chrome
Browser Version: 88.0.4324.192
User Agent String: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.192 Safari/537.36

WordPress Configuration

Network URL: http://pressbooks.sunycreate.cloud/
Network Type: Subdomain
Version: 5.6.2
Language: en_US
WP_ENV: Not set
WP_DEBUG: Enabled
Memory Limit: 64M

Pressbooks Configuration

Version: 5.18.1
Book Theme: McLuhan
Book Theme Version: 2.11.0
Root Theme: Aldine
Root Theme Version: 1.9.0

Pressbooks Dependencies

Epubcheck: Installed
Kindlegen: Installed
xmllint: Installed
PrinceXML: Installed
Saxon-HE: Installed

Must-Use Plugins

hm-autoloader.php: n/a

Network Active Plugins

GitHub Updater: 9.9.10
Pressbooks: 5.18.1

Book Active Plugins

GitHub Updater: 9.9.10
Pressbooks: 5.18.1

Inactive Plugins

Akismet Anti-Spam: 4.1.8
Candela Citation: 0.2.3
Cookies for Comments: 0.5.5
H5P: 1.15.0
Hypothesis: 0.6.0
Limit Login Attempts Reloaded: 2.19.2
Pressbooks Shortcode Handler: 0.1
Subscribe To Comments: 2.3
TablePress: 1.12
WordPress Importer: 0.7
WP QuickLaTeX: 3.8.6

Server Configuration

PHP Version: 7.3.20
MySQL Version: 5.7.30
Webserver Info: Apache

PHP Configuration

Safe Mode: Disabled
Memory Limit: 256M
Upload Max Size: 128M
Post Max Size: 256M
Upload Max Filesize: 128M
Time Limit: 30
Max Input Vars: 1000
URL-aware fopen: N/A
Display Errors: N/A

PHP Extensions

OPcache: Zend
XDebug: Disabled
cURL: Supported
cURL Version: 7.71.0
imagick: Not Installed
xsl: Installed

beckej · March 1, 2021, 5:15pm

Theme default font error log:

Array
(
[time] => Mon Mar 1 17:13:56 2021
[user] => beckej
[site_url] => https://excelfordecisionmaking.pressbooks.sunycreate.cloud
[blog_id] => 15
[theme] => McLuhan
[warning] => 1
[url] => http://excelfordecisionmaking.pressbooks.sunycreate.cloud/format/xhtml?timestamp=1614618826&hashkey=a36caf940498585198442c583160d599&optimize-for-print=1
)
Mon Mar 1 17:13:48 2021: ---- begin
Mon Mar 1 17:13:55 2021: page 46: warning: no font for Geometric Shapes character U+25E6, fallback to ‘?’
Mon Mar 1 17:13:55 2021: page 54: warning: no font for Geometric Shapes character U+25AA, fallback to ‘?’
Mon Mar 1 17:13:55 2021: page 228: warning: no font for Arrows character U+2193, fallback to ‘?’
Mon Mar 1 17:13:56 2021: page 354: warning: no font for General Punctuation character U+2033, fallback to ‘?’
Mon Mar 1 17:13:56 2021: page 406: warning: no font for General Punctuation character U+2033, fallback to ‘?’
Mon Mar 1 17:13:56 2021: page 425: warning: no font for Latin character U+0043, fallback to ‘?’
Mon Mar 1 17:13:56 2021: internal error: no available fonts
Mon Mar 1 17:13:56 2021: ---- end

beckej · March 1, 2021, 5:17pm

This is set to GNU FreeFont Serif:

Array
(
[time] => Mon Mar 1 17:16:44 2021
[user] => beckej
[site_url] => https://excelfordecisionmaking.pressbooks.sunycreate.cloud
[blog_id] => 15
[theme] => McLuhan
[warning] => 1
[url] => http://excelfordecisionmaking.pressbooks.sunycreate.cloud/format/xhtml?timestamp=1614618995&hashkey=f3648be2328d88b55faa5121afa8c809&optimize-for-print=1
)
Mon Mar 1 17:16:37 2021: ---- begin
Mon Mar 1 17:16:44 2021: internal error: no available fonts
Mon Mar 1 17:16:44 2021: ---- end

beckej · March 1, 2021, 5:18pm

@SteelWagstaff The error message “internal error: no available fonts” showed up even without any specific characters that it didn’t understand: in my last export there were none, but the export still failed.

Is this possibly a bug in the new Shapeshifter feature? Do I need to install different fonts on my server or do an update of some kind?

Ed

SteelWagstaff · March 1, 2021, 5:33pm

@beckej I’m seeing this error even in your first message. I think it’s a Prince error – here’s a relevant post I found from their forums: internal error - no available fonts - Prince forum with an answer from Mike Day, the lead developer on that project.

It looks like you have recent, compatible versions of Pressbooks and McLuhan installed, so you should be all set to use Shapeshifter. We don’t have this error/problem on any of our production/staging instances, so I suspect that your issue is with your server config/Prince setup. Good luck!

beckej · March 1, 2021, 5:57pm

Thanks Steel.

I think that link is going to help me fix the problem. We will work on getting the Microsoft Fonts installed on the server.

beckej · March 3, 2021, 1:25am

@SteelWagstaff

Thanks for all your help. The last link was what helped me. Once the fonts were installed, all of the other issues went away. I put in a suggestion for the documentation page to add an extra bullet noting that Linux servers need to have Microsoft fonts installed.
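To confirm that fonts are actually visible on a Linux server before re-running an export, one option is to ask fontconfig. This assumes the server’s Prince install picks up system fonts through fontconfig (typical on Linux, but check your own setup). The Python sketch below runs `fc-list : family` and checks for a few Microsoft core font families; the families checked are only examples.

```python
import shutil
import subprocess


def parse_fc_list(output):
    """Collect family names from `fc-list : family` output (comma-separated names per line)."""
    families = set()
    for line in output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return families


def installed_families():
    """Ask fontconfig which font families it can see; empty set if fc-list is not installed."""
    if shutil.which("fc-list") is None:
        return set()
    result = subprocess.run(["fc-list", ":", "family"], capture_output=True, text=True, check=False)
    return parse_fc_list(result.stdout)


if __name__ == "__main__":
    families = installed_families()
    print(f"{len(families)} font families visible to fontconfig")
    # Examples of Microsoft core fonts; adjust to whatever your theme expects
    for wanted in ("Arial", "Times New Roman", "Verdana"):
        print(f"{wanted}: {'found' if wanted in families else 'MISSING'}")
```

An empty result (zero families) on the server would be consistent with the “no available fonts” error above.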

SteelWagstaff · March 4, 2021, 2:54pm

Thanks @beckej – can you drop a link to the suggestion you made so I can make sure it gets seen and addressed soon?

beckej · March 4, 2021, 3:14pm

SteelWagstaff · March 4, 2021, 3:58pm

Change made: Linux dependency- Microsoft Fonts · Issue #41 · pressbooks/docs · GitHub. Thanks Ed!
