Import::tidy is overridden by at least 4 child classes that we know of.
If we apply your change consistently then sometimes you would get:
    $config = apply_filters( 'pb_htmLawed_config', [
        'safe' => 1,
    ]);
But other times it might look like (import\ooxml\class-docx.php):
    $config = apply_filters( 'pb_htmLawed_config', [
        'safe' => 1,
        'valid_xhtml' => 1,
        'xml:lang' => 1, // keep xml:lang *and* lang
        'no_deprecated_attr' => 2,
        'deny_attribute' => 'div -id',
        'hook' => '\Pressbooks\Sanitize\html5_to_xhtml11',
    ]);
If you were to tweak that second one incorrectly, it would break DOCX imports.
I think the solution should focus on the modules you are interested in and not the tidy routine itself.
What we are proposing is that, since it's safe (LOUD OBNOXIOUS AIR HORN: IT IS NOT) to assume that a WXR is probably WordPress content, we could change (import/wordpress/class-wxr.php) from:
    $doc = new HTML5();
    $html = $this->tidy( wpautop( $p['post_content'] ) );
    $dom = $doc->loadHtml( $html );
    // snip ...
To:
    $doc = new HTML5();
    $html = wpautop( $p['post_content'] );
    if ( ! current_user_can( 'unfiltered_html' ) ) {
        $html = $this->tidy( $html );
    }
    $dom = $doc->loadHtml( $html );
    // snip ...
This would skip sanitization for trusted users, the same way WordPress's kses skips filtering for users who have the unfiltered_html capability.
"contractors who we don’t want to give super admin access"
Unfortunately, the WXR import module uses wp_insert_post() to create the new posts. As far as we can tell that function results in kses sanitization being applied. Here are some Trac tickets dating back six years about this:
So even if you modify tidy, you will get your iframes taken away by WordPress later. (As seen in my first reply…)
Still up for discussion though. Not entirely against the filter idea. But we need a consistent approach for child classes.
Let’s try to find a compromise?
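To make the "consistent approach for child classes" point concrete, here is one rough sketch of what it might look like. This is not existing Pressbooks code: the class names, the htmLawedConfig() helper, and the third argument passed to the filter are all hypothetical, and the apply_filters() stub is only there so the snippet runs outside WordPress. The idea is that the parent owns the safe default and the single filter call, while each child only declares the extra options it needs.

```php
<?php
// Stub so this sketch runs outside WordPress. Inside WordPress the real
// apply_filters() is already defined and this block is skipped.
if ( ! function_exists( 'apply_filters' ) ) {
    function apply_filters( $hook_name, $value, ...$args ) {
        return $value;
    }
}

// Hypothetical stand-in for the Import parent class.
abstract class ImportSketch {
    // Hypothetical helper: one place that builds the htmLawed config.
    protected function htmLawedConfig( array $overrides = [] ) {
        // 'safe' => 1 is the shared default; a child can add to it.
        $config = array_merge( [ 'safe' => 1 ], $overrides );
        // Passing the calling class is an assumption, so that a filter
        // callback could target one module (e.g. WXR) and leave DOCX alone.
        return apply_filters( 'pb_htmLawed_config', $config, static::class );
    }
}

// Hypothetical stand-in for the WXR importer: defaults only.
class WxrSketch extends ImportSketch {
    public function config() {
        return $this->htmLawedConfig();
    }
}

// Hypothetical stand-in for the DOCX importer: defaults plus its extras.
class DocxSketch extends ImportSketch {
    public function config() {
        return $this->htmLawedConfig( [
            'valid_xhtml'        => 1,
            'xml:lang'           => 1, // keep xml:lang *and* lang
            'no_deprecated_attr' => 2,
            'deny_attribute'     => 'div -id',
            'hook'               => '\Pressbooks\Sanitize\html5_to_xhtml11',
        ] );
    }
}
```

With something like this, a filter callback would have a way to check which module is asking before it changes anything, so a tweak meant for WXR would be less likely to break DOCX imports. It does nothing about the wp_insert_post() / kses issue, which would still strip the iframes later.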