PB Media Cloner causing weird issue on multi app server infrastructure?

So, we’ve updated to PB 5.5.6, and we noticed that the WXR import routine stopped working as expected, but only in our environments that have multiple app servers spun up.

Behavior:

After selecting the WXR import routine and uploading a valid XML file (generated from the Pressbooks XML export routine), and after selecting the pages and parts we want to import, when we click the “Import Selection” button, the page will just refresh and will fail to import anything. Again, this behavior does NOT occur when there is only one app server.

As a workaround, I reverted the change from using buildListOfKnownMedia( $xml ) to using the old buildListOfKnownImages( $xml ) function, and that seems to have solved our problem (still testing, but initial signs point to this being a fix for us).

This is where I landed in Github that brought me to our current workaround:

Additionally, though we haven’t tried it yet, I’m wondering if the Pressbooks cloning feature plays nicely with infrastructure that utilizes multiple app servers to handle jobs (more on our findings when we get there).

Thanks!

Bryan

UPDATE:

The “fix” only sort of works sometimes. I’m still digging into the issue described above.

We’ve come up with a potential hacky fix for this. Here’s the situation:

The full path to where the file was being uploaded was being set at the start of the import, and sometimes a new app server would be spun up by the load balancer, and the new app server wouldn’t have that exact path at the time of actually doing the import.

My work around checks to see if the path is what we think it should be, and if it’s not, it’s a new app server instance so set it to the new path.

class-wxr.php

...
function import( array $current_import ) {

	try {
		$parser = new Parser();

		/**
		 * LUMEN HACK
		 *
		 * Because our service provider uses load balancers to spin up new app server
		 * instances during processing-heavy tasks, we need to make sure each app
		 * server knows where to find the current import file, because the path
		 * root is different for each app server instance.
		 */
		if ( $current_import && ! isset( $current_import['error'] ) ) {
    		$import_path = trailingslashit( $_SERVER['DOCUMENT_ROOT'] ) . implode( "/", array_slice( explode( "/", $current_import['file'] ), 5 ) );

    		if ( $current_import['file'] !== $import_path ) {
        		$current_import['file'] = $import_path;
    		}
		}

It would be awesome if we could add a hook here that would allow plugin devs to set a different path if they need to for the various import types.

I’m also totally open to explore other, better options, if there’s something I’m missing. :slight_smile:

Cheers!

Bryan

If the files are the same, but the paths are different, then isn’t this a problem with your load balancer?

The $upload['file'] is the array returned from wp_handle_upload:

The filter you are looking for is here:

Cheers.

Or, you could go one deeper and figure out what’s going on in wp_upload_dir

Thanks @dac.chartrand. I don’t think it’s a problem with the load balancer, but rather a problem with wp_handle_upload, which sets the full path to the file uploaded on whatever app server did the job in the array it returns. When a new application server spins up after the file is uploaded, it looks in $current_import['file'] and, because it’s a full path, the import stops because the newly spun up app server doesn’t have that same full path.

Ex:

App Server 1: /srv/123/code/
App Server 2: /srv/456/code/

App server 1 starts the import and runs setImportOptions() (which uses wp_handle_upload that sets the file’s path in the array it returns)

App server 2 spins up and runs doImport, cases off the import type and passes the array down to, in our case, class-wxr's import function. $currrent_upload['file']'s value is /srv/123/code/path/to/file.xml, but app server 2’s base file path is /srv/456/code/..., not /srv/123/code/..., so it just fails.

The work around I came up with just checks to make sure the path in the array matches the base path of the app server that’s going to run doImport.

$currrent_upload['file'] 's value is /srv/123/code/path/to/file.xml , but app server 2’s base file path is /srv/456/code/... , not /srv/123/code/... , so it just fails.

I understand, but this answer doesn’t change my question: isn’t this a problem with your load balancer?

If /srv/456/code/.../myfile.gif
and /srv/123/code/.../myfile.gif
are the same file, then you should configure your upload path to be
/shared/path/all/the/servers/are/sharing/.../myfile.gif
And set the UPLOADS constant accordingly?

The code that builds the upload path in wp_upload_dir, linked above, is:

	$upload_path = trim( get_option( 'upload_path' ) );
	if ( empty( $upload_path ) || 'wp-content/uploads' == $upload_path ) {
		$dir = WP_CONTENT_DIR . '/uploads';
	} elseif ( 0 !== strpos( $upload_path, ABSPATH ) ) {
		// $dir is absolute, $upload_path is (maybe) relative to ABSPATH
		$dir = path_join( ABSPATH, $upload_path );
	} else {
		$dir = $upload_path;
	}

You don’t think you should normalize WP_CONTENT_DIR in this case?

(Edit: Typos, bad copy/pasta…)

For reasons beyond me at this point, setting the UPLOADS constant doesn’t work. I’ll keep tinkering here, because it’s probably something small that I’m doing wrong. Thanks for chatting this through with me.

1 Like

I think the UPLOADS constant is just some sort of creative writing in the documentation. The code I am (now) looking at is doing:

$dir = ABSPATH . UPLOADS;

So maybe UPLOADS is only to control the suffix. I think you’ll have better luck with this filter:

	/**
	 * Filters the uploads directory data.
	 *
	 * @since 2.0.0
	 *
	 * @param array $uploads Array of upload directory data with keys of 'path',
	 *                       'url', 'subdir, 'basedir', and 'error'.
	 */
	$uploads = apply_filters( 'upload_dir', $cache[ $key ] );

It will pass an array, and you would want to try to change $uploads['path'], I think.

I’ve never looked into this before. You might want to get a second opinion. My point is the uploads path is coming from wp_handle_upload (and wp_upload_dir) so I’m not sure it’s a Pressbooks problem, that’s all.

New Idea

Maybe we shouldn’t be storing the fullpath of a file, just the relative path, between steps.

1 Like

I’m sorry but I’ve reconsidered "new idea"

Above is a screenshot that shows the output of WordPress’ very own wp_handle_upload function.

The returned value is an array that has: $file, $type, and $url.

When I was brainstorming, I was expecting the output to be more like this, that I would have access to subdir and basedir. I don’t.

Because of this my original question stands, and I’m nudging towards WordPress’ own filters here because this doesn’t seem to be a Pressbooks specific issue.

1 Like

Awesome, that seems like a reasonable place to land. I’ll take a stab at working with WP’s filters here. Again, thank you for your time and thoughtful replies.