Pressbooks 6.30 Background Export System

Hi Everyone,

Our self-hosted SUNY Pressbooks Network just upgraded to 6.30, which includes the new background export system, and we have a few questions about this new feature and best practices.

Our experience is that when we hit export, the export seems to hang for a long time before it gets started. That made us suspect that the background export system is tied to WordPress Cron, and that the job wouldn’t start until someone visited the book’s front page.

We tried that a couple of times, and we think we understand what’s happening now. If you have a private book (so no traffic) and start an export, then just sit on the export page waiting for it, the job may never actually start, because it’s waiting for WordPress Cron to run. @SteelWagstaff ← Is this assumption accurate?

Can you share what your settings are for Cron? Do you have a server script running instead of defaulting to the WordPress behavior of running when there is a page visit? Are you running it every minute? Every 2 minutes?

Can a recommendation be documented in the installation guide or the upgrade notes?

Hi Ed, I don’t know the answer, but I will ask the members of our team who do. Hopefully we can have good, clear documentation for you soon.

Hi Ed,

Thank you for bringing this up. We do plan to release a script to manage the cron specifically for the exports. What we’re currently using is designed to run in a multi-tenant environment and handle hundreds of installs, all of which use Roots/Bedrock. I’m working on a version for a single Pressbooks install, without any assumptions about using Bedrock, and hope to have it for you later today.

NOTE: These scripts were tested by me, but I’m also the author. Please review them before you decide to put them on a production network. Our team will also give them a review and we’ll make that reviewed version publicly available.

Here are two bash scripts that can manage your background exports. The one named ‘manage_export_jobs.sh’ is a long-running script: it checks for pending jobs and spawns worker processes that actually perform the export. You can run it in a detached screen session or, better still, have a service like monit ensure that it’s running.
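
If monit is not available, a small watchdog called from the system crontab is one possible stopgap. The sketch below is only an illustration and has not been reviewed by the Pressbooks team: the `is_running` and `ensure_manager` helper names are hypothetical, and the pid file and script paths are assumptions that match the default values in manage_export_jobs.sh.

```shell
#!/bin/bash
# Hypothetical watchdog sketch. Both paths are assumptions; adjust them to
# match PIDDIR, PIDFILE and PATHTOSCRIPTS in manage_export_jobs.sh.
PIDPATH="${PIDPATH:-/var/run/exports/parent-process.pid}"
MANAGER="${MANAGER:-/usr/local/bin/manage_export_jobs.sh}"

# Return 0 if the pid file exists and names a live process.
# Note: kill -0 only succeeds for processes this user is allowed to signal,
# so run the watchdog as the same user that runs the manager.
is_running() {
  local pidfile=$1 pid
  [ -f "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Start the manager detached from the terminal unless it is already running.
ensure_manager() {
  if ! is_running "$PIDPATH"; then
    nohup "$MANAGER" >/dev/null 2>&1 &
  fi
}

# Only act when explicitly asked, e.g. "watchdog.sh start" from cron.
if [ "${1:-}" = "start" ]; then
  ensure_manager
fi
```

Saved under a name of your choosing and run every minute from cron with the `start` argument, this would relaunch the manager if it has died. The manager's own pid file check still guards against duplicates.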

manage_export_jobs.sh

#!/bin/bash

#######################################################
# Set these variables to correct values for your network
#
PIDDIR='/var/run/exports'  # process management dir
PIDFILE='parent-process'   # the pidfile for this script
PATHTOPENDING='/var/run/exports/pending'
PATHTOSCRIPTS='/usr/local/bin/'
NETWORK='pressbooks.example.com'
MAXPROC=8  # set this according to the capacity of your server
DB_HOST='localhost'
DB_NAME='pressbooks_db'
DB_USER='pressbooksuser'
DB_PASSWORD='theactualpassword'
#
#######################################################

USER=$(whoami)
STARTTIME=$(date)
THISSCRIPT=$(basename "$0")

if [ ! -d $PIDDIR ]; then
  sudo mkdir $PIDDIR
fi

if [ ! -d $PATHTOPENDING ]; then
  sudo mkdir $PATHTOPENDING
fi

# make sure this script can write to these directories
sudo chown $USER $PIDDIR
sudo chown $USER $PATHTOPENDING

# Check to see if this script is already running
if [ -f $PIDDIR/$PIDFILE.pid ] ; then

    # pid file already exists
    if [ "$(ps -p `cat $PIDDIR/$PIDFILE.pid` | wc -l)" -gt 1 ]; then
        echo "$(date): $0: Refusing to run: lingering process $(cat "$PIDDIR/$PIDFILE.pid")"
        exit 1
    else
        echo " $0: Process orphaned. Lock file deleted."
        rm $PIDDIR/$PIDFILE.pid
    fi
fi

#Going to run, write pid file with current process ID
echo $$ > $PIDDIR/$PIDFILE.pid

while true; do

  # Fetch the pending jobs from the database, write them to the filesystem because it's easier to control concurrency there
  PENDINGJOB=$(echo "SELECT CONCAT('$NETWORK|', b.path, '|', j.id) FROM wp_pressbooks_export_jobs AS j, wp_blogs AS b WHERE j.book_id = b.blog_id AND status = 'pending';" | mysql -N -B -h $DB_HOST -u $DB_USER --password=$DB_PASSWORD $DB_NAME -A 2>/dev/null)

  # If there are jobs in the DB, put them in the pending directory
  if [ ! -z "$PENDINGJOB" ]; then

    while IFS= read -r EXPORTTASK; do

      EXPORTTASK="${EXPORTTASK//\//}"
      PENDINGFILE="/$PATHTOPENDING/$EXPORTTASK"

      # only add it if it's not already pending or being processed
      if [ ! -e "$PENDINGFILE" ] && [ ! -e "$PENDINGFILE-processing" ]; then
        touch "$PENDINGFILE"
      fi

    done <<< "$PENDINGJOB"

  fi

  # flush stale lock files (crashed or incomplete processes) to free up queues that might be stuck
  for PROCESSPIDFILE in $(ls -1 "$PIDDIR"/child-process* 2>/dev/null); do

    if [ ! "$(ps -p `cat $PROCESSPIDFILE` | wc -l)" -gt 1 ]; then
      rm $PROCESSPIDFILE
    fi
  done

  # grab the oldest pending job (ls -tr sorts oldest first)
  EXPORTPROC=$(ls -1tr /$PATHTOPENDING/*[0-9] 2>/dev/null | head -1)

  if [ ! -z "$EXPORTPROC" ]; then

    #check to see how many processes are running, wait until there's a free slot
    while [ "$(ls -1 $PIDDIR/ | grep -c child-process)" -ge "$MAXPROC" ]; do
      # check again in a second
      sleep 1
    done

    # Get just the file name
    FILENAME=$(basename "$EXPORTPROC")

    # Split filename into variables using IFS
    IFS='|' read -r NETWORK BOOK ID <<< "$FILENAME"

    # Move the pending file into a processing state
    mv $EXPORTPROC "$EXPORTPROC"-processing

    # Spawn a background export process
    $PATHTOSCRIPTS/run_single_export.sh "$NETWORK" "$BOOK" "$ID" &

  fi
  sleep 1
done

# Don't die until all the forked processes are done, we should never get here unless true becomes false
wait

rm -f $PIDDIR/$PIDFILE.pid
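
The pending-file naming convention is easier to follow in isolation. In the short sketch below, the sample row and the `task_to_filename` helper are made up for illustration. The slash stripping and the `IFS='|'` split mirror what the script above does.

```shell
#!/bin/bash
# Hypothetical helper: turn a query row such as "network|/book/|id" into a
# flat file name by removing every slash, as the manager script does.
task_to_filename() {
  local task=$1
  printf '%s\n' "${task//\//}"
}

ROW='pressbooks.example.com|/mybook/|42'   # made-up sample row
FILENAME=$(task_to_filename "$ROW")

# Split the file name back into its three fields.
IFS='|' read -r NETWORK BOOK ID <<< "$FILENAME"

echo "file: $FILENAME"
echo "network=$NETWORK book=$BOOK id=$ID"
```

Because the file name ends in the numeric job ID, the `*[0-9]` glob in the manager only matches files that have not yet been renamed with the `-processing` suffix.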

This script is the one that is called by the script above. You will need to have WP-CLI installed for it to work. The script above can be named anything you want, but this one needs to be named run_single_export.sh (unless you change the code in the script above).

run_single_export.sh

#!/bin/bash

#######################################################
# Set these variables to correct values for your network
#
PIDDIR="/var/run/exports"
PIDFILE="child-process-$1-$3"  # $1 = network, $3 = job ID; NETWORK and JOBID are not assigned until further down
PATHTOPENDING='/var/run/exports/pending'
PATHTOSCRIPTS='/usr/local/bin/'
PATHTOINSTALL='/var/www/pressbooksdocroot'
PATHTOLOGS='/var/www/exportslogs'
NETWORK='pressbooks.example.com'
MAXPROC=8  # set this according to the capacity of your server
DB_HOST='localhost'
DB_NAME='pressbooks_db'
DB_USER='pressbooksuser'
DB_PASSWORD='theactualpassword'
#
#######################################################

NETWORK=$1
BOOK=$2
JOBID=$3
STARTTIME=$(date '+%Y-%m-%d %H:%M:%S')
USER=$(whoami)
THISSCRIPT=$(basename "$0")

# you can remove these if your script doesn't run with sudo access, but you will need to create these directories
if [ ! -d $PIDDIR ]; then
  sudo mkdir $PIDDIR
fi

if [ ! -d $PATHTOPENDING ]; then
  sudo mkdir $PATHTOPENDING
fi

if [ ! -d $PATHTOLOGS ]; then
  sudo mkdir $PATHTOLOGS
fi

sudo chown $USER $PIDDIR

if [ -z "$NETWORK" ]; then
  echo "$THISSCRIPT: You must provide a network. Exiting"
  exit 1
fi

if [ -z "$BOOK" ]; then
  echo "$THISSCRIPT: You must provide a book path. Exiting"
  exit 1
fi

if [ -z "$JOBID" ]; then
  echo "$THISSCRIPT: You must provide a JOBID. Exiting"
  exit 1
fi

if [ ! -d "$PATHTOINSTALL" ]; then
  echo "$THISSCRIPT: The install path $PATHTOINSTALL for network $NETWORK does not seem to exist. Exiting"
  exit 1
fi

# Check to see if this script is already running
if [ -f $PIDDIR/$PIDFILE.pid ] ; then

  # pid file already exists
  if [ "$(ps -p `cat $PIDDIR/$PIDFILE.pid` | wc -l)" -gt 1 ]; then
    #echo "$DATE: $0: Refusing to run: lingering process `cat $PIDDIR/$PIDFILE.pid`"
    exit 1
  else
    #echo " $0: Process orphaned. Lock file deleted."
    rm $PIDDIR/$PIDFILE.pid
  fi
fi

#Going to run, write pid file with current process ID
echo $$ > $PIDDIR/$PIDFILE.pid

# check to see if the job is still pending
JOB=$(echo "SELECT j.id FROM wp_pressbooks_export_jobs as j, wp_blogs as b WHERE j.book_id = b.blog_id AND j.id = $JOBID  AND b.path='/$BOOK/' and status = 'pending';" | mysql -N -B -h $DB_HOST -u $DB_USER --password=$DB_PASSWORD $DB_NAME -A 2>/dev/null)

# if it's still pending, run the cron event
if [ ! -z "$JOB" ]; then

  cd $PATHTOINSTALL
  # capture stderr to debug log, --quiet will suppress warnings and info
  wp cron event run pressbooks_process_export_job --url=https://$NETWORK/$BOOK/ --quiet 2> >(while read -r line; do echo "$(date '+%Y-%m-%d %H:%M:%S') [$NETWORK/$BOOK/ - $JOBID] $line"; done >> $PATHTOLOGS/debug.log)

fi

echo "$(date '+%Y-%m-%d %H:%M:%S') $NETWORK/$BOOK/ - $JOBID Completed. Started at $STARTTIME" >> $PATHTOLOGS/exports.log

# remove the processing file
rm -f "$PATHTOPENDING/$NETWORK|$BOOK|$JOBID-processing"
rm -f $PIDDIR/$PIDFILE.pid
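
The stderr redirection on the `wp cron event run` line is dense, so here is the same idiom on its own. This is a sketch under assumptions: `stamp_lines` and `noisy` are hypothetical names, `noisy` stands in for the WP-CLI call, and the log is a temporary file rather than debug.log.

```shell
#!/bin/bash
# Prefix every line read from stdin with a timestamp and a label.
stamp_lines() {
  local label=$1 line
  while IFS= read -r line; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') [$label] $line"
  done
}

# Stand-in for the real command: one line on stdout, one on stderr.
noisy() {
  echo "normal output"
  echo "something went wrong" >&2
}

LOG=$(mktemp)

# Only stderr is sent through the process substitution; stdout is untouched.
# The substitution runs asynchronously, so the log line may land slightly
# after the command itself returns.
noisy 2> >(stamp_lines "example.com/mybook/ - 42" >> "$LOG")

echo "stderr lines are collected in $LOG"
```

Only the stderr line ends up in the log, prefixed with the timestamp and the label in brackets, which is the layout the export script uses for debug.log.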

Please let me know if you have any issues or any suggestions for improving them! Also, please upgrade to 6.30.1, which addresses an issue we found with PDF theme options.