Importing Content into Pressbooks

paradisojr · January 24, 2018, 1:13pm

I have an import concern, as I need to load a large chunk of content from two OER eTextbook titles (International Business and Principles of Finance) into Pressbooks ASAP for two separate faculty members.

The first title has PDF and Online options for viewing, yet it doesn’t seem like there is a way to import PDFs into Pressbooks directly. Would it be best to convert that PDF to .docx and upload the .docx pages individually? (Painful, but doable.) As far as formatting is concerned, I’ve had the best luck simply copying and pasting content from the web page, but again, that is quite time-consuming when trying to convert 100+ pages from an eTextbook.

Any words of wisdom would be much appreciated, as I am on a short timeline.

Thank you!

Jim

hughmcguire · January 24, 2018, 2:09pm

Converting PDF to reasonable web formats (as Pressbooks wants) is an unsolved problem in the universe, let alone Pressbooks.

The web version appears to be hosted on github, which is promising. I recommend you contact Saylor & see if they can/do make the source available… which would make the path to getting it into Pressbooks easier.

SteelWagstaff · January 24, 2018, 2:59pm

Hi Jim, there’s an HTML original of the International Business text here: https://2012books.lardbucket.org/books/challenges-and-opportunities-in-international-business/index.html. I went ahead and used Pressbooks’ HTML importer to grab and import two chapters from the URLs given there, and they both imported successfully. I don’t know if there’s a way to modify that import routine quickly/easily to make a bulk HTML importer, but that might be an option? Here’s the sample content after a quick manual import of two chapters (introduction and what is international business?): https://wisc.pb.unizin.org/internationalbusiness/. Sorry I don’t have more time to devote to this right now (semester just started yesterday). Incidentally, one of the authors of the original text was a beloved UW business professor who died of cancer a few years ago: https://news.wisc.edu/business-professor-carpenter-passes-away/
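One way to take some of the tedium out of a chapter-by-chapter import is to gather the chapter URLs from the book's table-of-contents page first, then feed them to the HTML importer one at a time. The sketch below is only an illustration of that first step, not part of Pressbooks: the function and class names are hypothetical, the sample markup and `example.org` URLs are made up, and it assumes the chapters are plain `.html` pages linked from the index.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def chapter_urls(index_html, base_url):
    """Return absolute, de-duplicated URLs of the .html pages linked from a TOC page."""
    collector = LinkCollector()
    collector.feed(index_html)
    seen, urls = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop in-page anchors such as "#top".
        absolute = urljoin(base_url, href).split("#", 1)[0]
        if absolute.endswith(".html") and absolute != base_url and absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


# Hypothetical table-of-contents markup, for illustration only.
SAMPLE_INDEX = """
<ul>
  <li><a href="s01-introduction.html">Introduction</a></li>
  <li><a href="s02-what-is-ib.html#top">Chapter 1</a></li>
  <li><a href="s02-what-is-ib.html">Chapter 1 (again)</a></li>
  <li><a href="https://example.org/license">License</a></li>
</ul>
"""
BASE = "https://example.org/books/sample/index.html"

for url in chapter_urls(SAMPLE_INDEX, BASE):
    print(url)
```

In practice the index HTML would be downloaded from the real book's index page, and each resulting URL would still have to be pasted into (or scripted against) the importer separately.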

As for the second book, it looks like it makes heavy use of the Boundless Finance text. Boundless went out of business a couple of years ago, but Lumen Learning maintains their published texts. You should be able to find (and quickly import) that text here: https://courses.lumenlearning.com/boundless-finance/ with the new PB cloning tool. You’ll have to rearrange to get it in the Saylor order, but that should be a major time saver, I hope.

paradisojr · January 24, 2018, 6:16pm

Thank you for your prompt response, Hugh. I’ll work through Steel’s advice below and try to make the import process as seamless as possible!

paradisojr · January 24, 2018, 10:03pm

@SteelWagstaff and/or @hughmcguire – I received an alert (cf. screenshot) when attempting to clone the Boundless Finance title. Is there a workaround for this? I imported the ePub version as an alternative, but there is considerably more work involved in getting the title ready for launch under that structure. Thanks!

SteelWagstaff · January 25, 2018, 12:24am

I think that repo is maintained by Lumen, so perhaps @bryan could let you know if they plan to upgrade the PB version? That might be the simplest way forward.

ned · January 25, 2018, 12:30am

There’s no workaround other than Lumen’s instance being updated to Pressbooks 4.1 or later. We introduced the cloning feature and API components to support it in Pressbooks 4.1, so both the source book and target book have to be running that version in order to handle the clone operation.
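The rule described above (both ends of a clone must run Pressbooks 4.1 or later) can be written as a small check. This is a sketch under assumptions: the function names are hypothetical, it expects plain dotted version strings such as "4.5.1", and it does not show how a remote instance's version would be discovered.

```python
MIN_CLONE_VERSION = (4, 1)  # cloning was introduced in Pressbooks 4.1, per the post above


def parse_version(text):
    """Turn '4.5.1' into (4, 5, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def can_clone(source_version, target_version):
    """True only if BOTH the source and the target instance meet the minimum version."""
    return all(
        parse_version(version) >= MIN_CLONE_VERSION
        for version in (source_version, target_version)
    )


print(can_clone("4.0.2", "4.5.1"))
print(can_clone("4.5.1", "4.10"))
```

Comparing tuples of integers avoids the string-comparison trap where "4.10" would sort before "4.2". Pre-release labels such as "4.1-RC1" are not handled by this sketch.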

paradisojr · January 25, 2018, 2:30am

10-4. Thanks for the comment, @ned!

paradisojr · January 25, 2018, 2:33am

Thanks, @SteelWagstaff. I’ll try to follow up with @bryan or another member of the Lumen team to see what their plan (if any) might be in this regard.

bracken · January 25, 2018, 4:37pm

Hi, Bracken from Lumen here. We generally try to stay fairly current with PB updates, but this particular update took a lot of work on our side to make our themes compatible. We’re also still working on some scripts that will let us roll back after we deploy it in case there are major problems, and that’s not straightforward.
Once we get over this hurdle, we’ll be able to be more consistent again.

hughmcguire · January 25, 2018, 4:43pm

Thanks for the update, Bracken! Happy to hear Lumen is planning to update. I know this was a BIG release, and there’s another decent-sized one in the pipes that might impact themes. Once that’s out though, things should be stabilized, with more feature improvements than wholesale changes … though we’re keeping our eye on the Gutenberg Editor, which threatens to be a bit complicated for all of us.

paradisojr · January 25, 2018, 4:45pm

Thanks for your quick, concise response, Bracken. If possible, I’d love to hear back from the Lumen team after the rollout is completed.

SteelWagstaff · January 25, 2018, 6:15pm

In the meantime, @paradisojr, Lumen might be willing to generate and share a WXR export for the book in question, and you could do an “old-fashioned” import from that file instead of using the cloning tool?

paradisojr · January 25, 2018, 7:31pm

Thank you for that, Steel.
@bracken or @bryan Would Lumen be willing to facilitate a WXR export for the following 2 titles: Boundless Finance and International Business? (If the professors didn’t need to make major modifications to the text, we’d use the Lumen version, of course.)
What’s at stake (on my end) is ~$200,000–$300,000 total in textbook savings (per semester) for students who take these two course sections, so however I can most seamlessly get this content into Pressbooks, the better.
I appreciate your consideration.

dac.chartrand · February 19, 2018, 3:37pm

So…First and foremost, we’ve updated to PB 4.5.1! Yay!

I just tested locally. I can clone the linked books. Is this resolved?

paradisojr · February 19, 2018, 7:52pm

Thanks for checking in, @dac.chartrand! Yeah! When I saw @bryan’s post a few days ago, my first order of business was to clone those titles and everything went smoothly! Thanks to everyone on the Lumen and Pressbooks teams (and otherwise) for your continued support!
(P.S. While I have you here, I had a separate concern with cloning. When I clone a title using Google Chrome, my browser ALWAYS gives me an HTTP ERROR 504, which requires me to wait about 5 minutes and then reload the tab. Upon refresh, the properly cloned text appears–great! I’m not quite sure why the error persists, though. Any ideas?)

dac.chartrand · February 19, 2018, 8:23pm

PHP Timeout. The webserver will stop responding after a certain amount of time, configured by the administrator.

We do:

set_time_limit( 300 ); // five minutes

in class-cloner.php to try to prevent this from happening. On our servers, because we are using NGINX and FPM, we also have to raise the timeout in their configuration; the PHP setting alone is not enough.

The same thing happened to me (localhost) because Boundless Finance had 339 images and cloning took a long time downloading and creating thumbnails for everything. The web browser showed a timeout, but the process kept going until it was done.

Better explained here: “NGINX 502 Bad Gateway: PHP-FPM” (Datadog)

In a more sophisticated system we could handle cloning, exports, and imports by passing these jobs to background tasks using tools such as Phive, Gearman, Cavalcade […] but this increases the complexity of running Pressbooks.
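The background-task idea can be sketched in a few lines. This is a generic illustration in Python, not Pressbooks code and not any of the tools named above: a thread pool stands in for a real job queue, `slow_clone` is a hypothetical placeholder for the long-running work, and the `example.org` URL is made up. The point is only that the request returns a job id immediately and the client polls for the result, so no single request stays open long enough to hit a timeout.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=1)  # stand-in for a real job queue
_jobs = {}  # job id -> Future


def slow_clone(book_url):
    """Placeholder for a long-running clone (downloading content, images, etc.)."""
    time.sleep(0.2)
    return f"cloned {book_url}"


def start_job(book_url):
    """Queue the work and return immediately with an id the client can poll."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(slow_clone, book_url)
    return job_id


def job_status(job_id):
    """Report 'running', 'failed', or 'done' (with the result) for a queued job."""
    future = _jobs[job_id]
    if not future.done():
        return {"state": "running"}
    if future.exception() is not None:
        return {"state": "failed", "error": str(future.exception())}
    return {"state": "done", "result": future.result()}


job = start_job("https://example.org/sample-book/")
print(job_status(job)["state"])  # the caller gets an answer without waiting for the clone
while job_status(job)["state"] == "running":
    time.sleep(0.05)  # a browser would poll a status endpoint instead
print(job_status(job))
```

The trade-off is the one already noted: the application then needs a queue, workers, and a status endpoint, which makes it more complex to run.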

Needs more research. Ideas welcome.

paradisojr · February 19, 2018, 9:07pm

Thanks for the prompt reply, Dac! That makes total sense now and would explain why I can’t get back into Pressbooks for that short time-slot (as it’s finalizing the cloning routine).

As long as the content isn’t affected due to the error, I can certainly live with it. (I don’t have any fresh ideas to add at this juncture.)

I’ll poke around with some guys on my dev team, though, and see if they have any thoughts. (We were recently trying to determine why I kept getting a 502 Bad Gateway error [in Chrome only] from another [in-house] application at my university, too–probably just a coincidence, but proving quite pesky! It fixed itself, and then recently started happening again. Just a quick off-topic rant! Thanks again!)

bryan · February 19, 2018, 10:08pm

@dac.chartrand, I wonder if it would be worth sending an alert of some kind to the user letting them know of the PHP timeout issue. It doesn’t solve the problem, which sucks, but at least it gives the user some insight into what’s happening, so it’s not so bewildering when they all of a sudden get hit with the 504.

¯\_(ツ)_/¯ Just an idea.

paradisojr · February 19, 2018, 10:12pm

I, too, think that’d be helpful, if it’s not too much of a bother. Something along the lines of: “Yes, you got this error, but if you refresh the page within the next few minutes, you will see the cloned content (intact) in its entirety.” (Maybe that’s a bit too presumptuous; just a thought.)