Porting scanner module to D8 - Part 2

Submitted by Christian Crawford on Mon, 05/06/2019 - 17:02

This is a continuation of a previous post. If you haven't read it I suggest that you do so before reading further, otherwise you won't understand the context.

Plugins

Drupal 8 introduced the idea of plugins. In this article I will briefly explain how you can create your own custom plugin type. The original D7 scanner module provided support for node entities and that was it. The site that I needed the module for relied heavily on the paragraphs module, so I knew that I also had to include support for paragraphs. Given that there are feature requests for supporting other entity types (taxonomy terms, users, etc.), I decided to build it in such a way that this kind of functionality could easily be added as needed. This is why I decided to implement them as plugins.

In order to create a brand new plugin you must write the following pieces:

  1. Plugin manager
  2. Plugin interface
  3. Plugin base
  4. Some sort of discoverability mechanism (an annotation in our case).

Plugin Manager

The plugin manager is responsible for discovering, instantiating, and altering any instances of the matching plugin type. In order to simplify things for yourself you will often extend the Drupal\Core\Plugin\DefaultPluginManager class. If you do that the only method that you will need to implement is the constructor.

class ScannerPluginManager extends DefaultPluginManager {

  public function __construct(\Traversable $namespaces, CacheBackendInterface $cache_backend, ModuleHandlerInterface $module_handler) {
    parent::__construct('Plugin/Scanner', $namespaces, $module_handler, 'Drupal\scanner\Plugin\ScannerPluginInterface', 'Drupal\scanner\Annotation\Scanner');
    $this->alterInfo('scanner_info');
    $this->setCacheBackend($cache_backend, 'scanner');
  }

}

In the constructor you will call the parent's construct method. In that call there are three arguments that I'll draw your attention to.

  1. Plugin/Scanner
    • This tells Drupal to look for any plugins that are in that namespace.
  2. Drupal\scanner\Plugin\ScannerPluginInterface 
    • This is the interface for your plugin type
    • Drupal will use this to enforce the method definitions of any of your plugins.
  3. Drupal\scanner\Annotation\Scanner
    • This is the Annotation plugin that we have written to allow Drupal to discover our plugins.

After the construct method we allow for other modules to alter the plugins and finally we tell Drupal to use the default cache backend for storing any data that it needs to cache.
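
To make the manager's role more concrete, here is a minimal plain-PHP sketch of the same idea outside of Drupal. Everything in it (ToyPluginInterface, ToyNodePlugin, ToyPluginManager, and the hard-coded definitions array) is hypothetical and only stands in for what DefaultPluginManager does through annotation discovery; it is not Drupal's API.

```php
<?php

// Hypothetical stand-in for ScannerPluginInterface.
interface ToyPluginInterface {
  public function search(string $field, array $values): array;
}

// Hypothetical plugin. In Drupal the id and type would come from the annotation.
class ToyNodePlugin implements ToyPluginInterface {
  public function search(string $field, array $values): array {
    return ["searched $field for {$values['search']}"];
  }
}

// Hypothetical, heavily simplified manager: it maps plugin ids to classes
// and enforces the plugin interface when an instance is created.
class ToyPluginManager {
  private $definitions;
  private $interface;

  public function __construct(array $definitions, string $interface) {
    $this->definitions = $definitions;
    $this->interface = $interface;
  }

  public function getDefinitions(): array {
    return $this->definitions;
  }

  public function createInstance(string $id): ToyPluginInterface {
    if (!isset($this->definitions[$id])) {
      throw new InvalidArgumentException("The \"$id\" plugin does not exist.");
    }
    $class = $this->definitions[$id]['class'];
    if (!is_subclass_of($class, $this->interface)) {
      throw new LogicException("$class must implement {$this->interface}.");
    }
    return new $class();
  }
}

$manager = new ToyPluginManager(
  ['scanner_node' => ['class' => ToyNodePlugin::class, 'type' => 'node']],
  ToyPluginInterface::class
);
$plugin = $manager->createInstance('scanner_node');
print_r($plugin->search('node:article:body', ['search' => 'foo']));
```

The real manager adds discovery, alter hooks, and caching on top of this, but the contract is the same: ask for a plugin id and get back an object that is guaranteed to implement the interface.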

The plugin manager is a service, so you will need to create a mymod.services.yml file (scanner.services.yml in our case) like the one below.

services:
  plugin.manager.scanner:
    class: Drupal\scanner\Plugin\ScannerPluginManager
    parent: default_plugin_manager

Plugin Interface

If you're familiar with object-oriented programming (OOP) then you know that an interface defines a standardized list of method signatures that every implementing class must provide. The interface consists of nothing but those signatures. In my case I have three methods that each plugin must implement.

interface ScannerPluginInterface extends PluginInspectionInterface, ContainerFactoryPluginInterface {

  public function search($field,$values);

  public function replace($field,$values,$undo_data);

  public function undo($data);

}

Plugin Base

This will be the base class that all of your plugins will extend. Note that this is an abstract class: every abstract method it declares must be implemented by the non-abstract classes that extend it.

ScannerPluginBase.php  
abstract class ScannerPluginBase extends PluginBase implements ScannerPluginInterface {

  protected $tempStore;
  protected $scannerManager;

  public function __construct(array $configuration, $plugin_id, $plugin_definition, PrivateTempStoreFactory $tempStore, ScannerPluginManager $scannerManager) {
    parent::__construct($configuration, $plugin_id, $plugin_definition);
    $this->tempStore = $tempStore;
    $this->scannerManager = $scannerManager;
  }

  public static function create(ContainerInterface $container,array $configuration,$plugin_id,$plugin_definition) {
    return new static(
      $configuration,
      $plugin_id,
      $plugin_definition,
      $container->get('user.private_tempstore'),
      $container->get('plugin.manager.scanner')
    );
  }

  abstract public function search($field,$values);

  abstract public function replace($field,$values,$undo_data);

  abstract public function undo($data);

}

We implement our ScannerPluginInterface and mark each of the three key methods as abstract, since I want each plugin to implement those methods itself. We use dependency injection to add the tempstore and our plugin manager to all Scanner plugins. You could define any number of other non-abstract methods here if you wanted; I simply didn't need any.

Annotation

The last thing that you need for a plugin is the annotation plugin. This will allow you to "tag" your plugin instances and allow Drupal to discover them. The code for an annotation is very simple. There are no methods, only public variables. These variables can be of any type and are set when you use the annotation. At the very least you must have an id variable so the system can uniquely identify the plugin.

class Scanner extends Plugin {

  public $id;
  public $type;

}

I chose to have the id variable and one additional variable called type, which should be the entity type. Below is an example of the annotation in the node plugin:

/**
 * Class Node.
 *
 * @Scanner(
 *   id = "scanner_node",
 *   type = "node",
 * )
 */

And so finally we get to the plugins themselves. I chose to create a parent Entity class that all other plugins extend, but that was optional. I could have added the logic in the ScannerPluginBase class instead.

Entity.php  
/**
 * Class Entity.
 *
 * @Scanner(
 *   id = "scanner_entity",
 *   type = "entity",
 * )
 */
class Entity extends ScannerPluginBase {

  protected $scannerRegexChars = '.\/+*?[^]$() {}=!<>|:';

  public function search($field,$values) {
    $data = [];
    list($entityType,$bundle,$fieldname) = explode(':', $field);

    // Attempt to load the matching plugin for the matching entity.
    try {
      $plugin = $this->scannerManager->createInstance("scanner_$entityType");
    } catch(PluginException $e) {
      // The instance could not be found so fail gracefully and let the user know.
      \Drupal::logger('scanner')->error($e->getMessage());
      drupal_set_message(t('An error occurred: @message', ['@message' => $e->getMessage()]), 'error');
      // Without a plugin there is nothing to search, so return the empty result.
      return $data;
    }
    // Perform the search on the current field.
    $results = $plugin->search($field, $values);
    if (!empty($results)) {
      $data = $results;
    }
    return $data;
  }

  public function replace($field,$values,$undo_data) {
    $data = [];
    list($entityType,$bundle,$fieldname) = explode(':', $field);

    try {
      $plugin = $this->scannerManager->createInstance("scanner_$entityType");
    } catch(PluginException $e) {
      // The instance could not be found so fail gracefully and let the user know.
      \Drupal::logger('scanner')->error($e->getMessage());
      drupal_set_message(t('An error occurred: @message', ['@message' => $e->getMessage()]), 'error');
      // Without a plugin there is nothing to replace, so return the empty result.
      return $data;
    }

    // Perform the replace on the current field and save results.
    $results = $plugin->replace($field, $values, $undo_data);
    if (!empty($results)) {
      $data = $results;
    }

    return $data;
  }

  public function undo($data) {
    foreach($data as $key => $value) {
      list($entityType,$id) = explode(':',$key);
      // Attempt to load the matching plugin for the matching entity.
      try {
        $plugin = $this->scannerManager->createInstance("scanner_$entityType");
        $plugin->undo($value);
      } catch(PluginException $e) {
        \Drupal::logger('scanner')->error($e->getMessage());
        drupal_set_message(t('An error occurred: @message', ['@message' => $e->getMessage()]), 'error');
      }
    }
  }

  protected function buildCondition($search,$mode,$wholeword,$regex,$preceded,$followed) {
    $preceded_php = '';
    if (!empty($preceded)) {
      if (!$regex) {
        $preceded = addcslashes($preceded, $this->scannerRegexChars);
      }
      $preceded_php = '(?<=' . $preceded . ')';
    }
    $followed_php = '';
    if (!empty($followed)) {
      if (!$regex) {
        $followed = addcslashes($followed, $this->scannerRegexChars);
      }
      $followed_php = '(?=' . $followed . ')';
    }

    // Case 1.
    if ($wholeword && $regex) {
      $value = "[[:<:]]" . $preceded . $search . $followed ."[[:>:]]";
      $operator = 'REGEXP';
      $phpRegex = '/\b' . $preceded_php . $search . $followed_php . '\b/';
    }
    // Case 2.
    else if ($wholeword && !$regex) {
      $value = '[[:<:]]' . $preceded . addcslashes($search, $this->scannerRegexChars) . $followed . '[[:>:]]';
      $operator = 'REGEXP';
      $phpRegex = '/\b' . $preceded_php . addcslashes($search, $this->scannerRegexChars) . $followed_php . '\b/';
    }
    // Case 3.
    else if (!$wholeword && $regex) {
      $value = $preceded . $search . $followed;
      $operator = 'REGEXP';
      $phpRegex = '/' . $preceded_php . $search . $followed_php . '/';
    }
    // Case 4.
    else {
      $value = '%' . $preceded . addcslashes($search, $this->scannerRegexChars) . $followed . '%';
      $operator = 'LIKE';
      // Use the lookaround versions so the surrounding text is not part of the match.
      $phpRegex = '/' . $preceded_php . addcslashes($search, $this->scannerRegexChars) . $followed_php . '/';
    }

    if($mode) {
      return [
        'condition' => $value,
        'operator' => $operator . ' BINARY',
        'phpRegex' => $phpRegex
      ];
    } else {
      return [
        'condition' => $value,
        'operator' => $operator,
        'phpRegex' => $phpRegex . 'i'
      ];
    }
  }

}

The Entity class extends our ScannerPluginBase. Each of its three interface methods creates an instance of the specific plugin (node, paragraph, user, etc.), calls the same method on that plugin with the required arguments, and returns the results. The buildCondition method simply builds an array containing the MySQL WHERE condition, the operator, and the PHP regex pattern.
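
To see what the PHP regex part of that array does, here is a small standalone sketch. The build_php_regex() helper is hypothetical and only mirrors the regex half of buildCondition; it also swaps in preg_quote() where the module uses addcslashes() with its own character list. The point it illustrates is that the "preceded" and "followed" text go into lookbehind and lookahead assertions, so they constrain the match without becoming part of it.

```php
<?php

// Hypothetical helper: builds only the PHP regex. It uses preg_quote() as a
// stand-in for the module's addcslashes() call.
function build_php_regex(string $search, bool $caseSensitive, bool $wholeword, bool $regex, string $preceded = '', string $followed = ''): string {
  if (!$regex) {
    // Plain-text searches must not be interpreted as regex syntax.
    $search = preg_quote($search, '/');
    $preceded = preg_quote($preceded, '/');
    $followed = preg_quote($followed, '/');
  }
  // Lookarounds assert the surrounding text without consuming it.
  $before = $preceded !== '' ? '(?<=' . $preceded . ')' : '';
  $after = $followed !== '' ? '(?=' . $followed . ')' : '';
  $boundary = $wholeword ? '\b' : '';
  $pattern = '/' . $boundary . $before . $search . $after . $boundary . '/';
  // Case-insensitive searches get the "i" modifier.
  return $caseSensitive ? $pattern : $pattern . 'i';
}

$pattern = build_php_regex('bar', false, false, false, 'foo ', ' baz');
echo $pattern, PHP_EOL;
// Only the "bar" between "foo " and " baz" is replaced.
echo preg_replace($pattern, 'qux', 'foo bar baz, bar alone'), PHP_EOL;
```

One thing to keep in mind with this approach is that PCRE lookbehinds must have a fixed length, so a user-supplied regex in the "preceded" field cannot contain quantifiers such as + or *.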

Next we'll go over the node plugin.

Node.php  
class Node extends Entity {

  public function search($field,$values) {
    $title_collect = []; 
    // $field will be string composed of entity type, bundle name, and field name delimited by ':' characters.
    list($entityType,$bundle,$fieldname) = explode(':', $field);

    $query = \Drupal::entityQuery($entityType);
    $query->condition('type', $bundle, '=');
    if ($values['published']) {
      $query->condition('status', 1);
    }
    $conditionVals = parent::buildCondition($values['search'], $values['mode'], $values['wholeword'], $values['regex'], $values['preceded'], $values['followed']);
    if ($values['language'] !== 'all') {
      $query->condition('langcode', $values['language'], '=');
      $query->condition($fieldname, $conditionVals['condition'], $conditionVals['operator'], $values['language']);
    } else {
      $query->condition($fieldname, $conditionVals['condition'], $conditionVals['operator']);
    }
    
    $entities = $query->execute();
    // Iterate over matched entities (nodes) to extract information that will be rendered in the results.
    foreach($entities as $key => $id) {
      $node = \Drupal\node\Entity\Node::load($id);
      $type = $node->getType();
      $nodeField = $node->get($fieldname);
      $fieldType = $nodeField->getFieldDefinition()->getType();
      if (in_array($fieldType, ['text_with_summary','text','text_long'])) {
        $fieldValue = $nodeField->getValue()[0];
        $title_collect[$id]['title'] = $node->getTitle();
        // Find all instances of the term we're looking for.
        preg_match_all($conditionVals['phpRegex'], $fieldValue['value'], $matches,PREG_OFFSET_CAPTURE);
        $newValues = [];
        // Build an array of strings which are displayed in the results.
        foreach($matches[0] as $k => $v) {
          // The offset of the matched term(s) in the field's text.
          $start = $v[1];
          if ($values['preceded'] !== '') {
            // Bolding won't work if the starting position is in the middle of a word (non-word bounded searches), therefore
            // we move the start position back as many characters as there are in the 'preceded' text.
            $start -= strlen($values['preceded']);
          }
          // Extract part of the text which include the search term plus six "words" following it.
          // After we found our string we want to bold the search term.
          $replaced = preg_replace($conditionVals['phpRegex'], "<strong>$v[0]</strong>", preg_split("/\s+/", substr($fieldValue['value'], $start), 6));
          if (count($replaced) > 1) {
            // The final index contains the remainder of the text, which we don't care about so we discard it.
            array_pop($replaced);
          }
          $newValues[] = implode(' ', $replaced);
        }
        $title_collect[$id]['field'] = $newValues;
      } else if ($fieldType == 'string') {
        $title_collect[$id]['title'] = $node->getTitle();
        preg_match($conditionVals['phpRegex'], $nodeField->getString(), $matches, PREG_OFFSET_CAPTURE);
        $match = $matches[0][0];
        $replaced = preg_replace($conditionVals['phpRegex'], "<strong>$match</strong>", $nodeField->getString());
        $title_collect[$id]['field'] = [$replaced];
      }   
    }
    return $title_collect;
  }

  public function replace($field,$values,$undo_data){
    $data = $undo_data;
    list($entityType,$bundle,$fieldname) = explode(':', $field);

    $query = \Drupal::entityQuery($entityType);
    $query->condition('type', $bundle);
    if ($values['published']) {
      $query->condition('status', 1);
    }
    $conditionVals = parent::buildCondition($values['search'], $values['mode'], $values['wholeword'], $values['regex'], $values['preceded'], $values['followed']);
    if ($values['language'] !== 'all') {
      $query->condition($fieldname, $conditionVals['condition'], $conditionVals['operator'], $values['language']);
    } else {
      $query->condition($fieldname, $conditionVals['condition'], $conditionVals['operator']);
    }
    $entities = $query->execute();

    foreach($entities as $key => $id) {
      $node = \Drupal\node\Entity\Node::load($id);
      $nodeField = $node->get($fieldname);
      $fieldType = $nodeField->getFieldDefinition()->getType();
      if (in_array($fieldType, ['text_with_summary','text','text_long'])) {
        $fieldValue = $nodeField->getValue()[0];
        // Replace the search term with the replace term.
        $fieldValue['value'] = preg_replace($conditionVals['phpRegex'], $values['replace'], $fieldValue['value']);
        $node->$fieldname = $fieldValue;
        // This check prevents the creation of multiple revisions if more than one field of the same node has been modified.
        if (!isset($data["node:$id"]['new_vid'])) {
          $data["node:$id"]['old_vid'] = $node->vid->getString();
          // Create a new revision so that we can have the option of undoing it later on.
          $node->setNewRevision(true);
          $node->revision_log = t('Replaced %search with %replace via Scanner Search and Replace module.', ['%search' => $values['search'], '%replace' => $values['replace']]);
        }
        // Save the updated node.
        $node->save();
        // Fetch the new revision id.
        $data["node:$id"]['new_vid'] = $node->vid->getString();
      } else if ($fieldType == 'string') {
        $fieldValue = preg_replace($conditionVals['phpRegex'], $values['replace'], $nodeField->getString());
        $node->$fieldname = $fieldValue;
        if (!isset($data["node:$id"]['new_vid'])) {
          $data["node:$id"]['old_vid'] = $node->vid->getString();
          $node->setNewRevision(true);
          $node->revision_log = t('Replaced %search with %replace via Scanner Search and Replace module.', ['%search' => $values['search'], '%replace' => $values['replace']]);
        }
        $node->save();
        $data["node:$id"]['new_vid'] = $node->vid->getString();
      }
    }

    return $data;

  }

  public function undo($data) {
    $revision = \Drupal::entityTypeManager()->getStorage('node')->loadRevision($data['old_vid']);
    $revision->setNewRevision(true);
    $revision->revision_log = t('Copy of the revision from %date via Search and Replace Undo', ['%date' => format_date($revision->getRevisionCreationTime())]);
    $revision->isDefaultRevision(true);
    $revision->save(); 
  }

}

The class implements the three required methods; however, unlike the parent class, the code here actually performs the actions. In the D7 version of the module the SQL queries were built by concatenating table and field names to form the appropriate field join statements. The person who started the D8 port took a different (and in my opinion, correct) approach by using entityQuery to build the queries and fetch the results. entityQuery provides a level of abstraction, which frees us from having to hard code table names and instead lets the API look up and provide the names we need.

We call our parent's buildCondition method and place the results into the $conditionVals variable. We then use those values when we build the condition clause for each of the fields. Inside the foreach loop we need slightly different logic for string fields (simple text) vs. the formatted text fields (text, text_long, and text_with_summary) because the way to get the value of the field varies based on the type.
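
The excerpt-building part of the search method is easier to follow in isolation. The sketch below uses hypothetical sample data ($text and $pattern); in the plugin these come from the node field and from buildCondition. It shows how PREG_OFFSET_CAPTURE gives the position of each match, and how preg_split with a limit keeps only the first few words after it.

```php
<?php

// Hypothetical sample data standing in for the field value and the phpRegex.
$text = 'The quick brown fox jumps over the lazy dog near the river bank';
$pattern = '/fox/i';

// Each entry of $matches[0] is [matched text, byte offset in $text].
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

$excerpts = [];
foreach ($matches[0] as $match) {
  list($term, $start) = $match;
  // Split the text from the match onwards into at most six pieces. The last
  // piece holds the unsplit remainder of the text.
  $pieces = preg_split('/\s+/', substr($text, $start), 6);
  // Bold the search term in every piece.
  $pieces = preg_replace($pattern, "<strong>$term</strong>", $pieces);
  if (count($pieces) > 1) {
    // Discard the remainder so only a short excerpt is left.
    array_pop($pieces);
  }
  $excerpts[] = implode(' ', $pieces);
}

print_r($excerpts);
```

With this sample text the single excerpt is "<strong>fox</strong> jumps over the lazy", i.e. the matched term plus the four words that follow it.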

The replace method starts off similarly to the search method, but there is some variation when we go to edit and save the node:

if (!isset($data["node:$id"]['new_vid'])) {
  $data["node:$id"]['old_vid'] = $node->vid->getString();
  // Create a new revision so that we can have the option of undoing it later on.
  $node->setNewRevision(true);
  $node->revision_log = t('Replaced %search with %replace via Scanner Search and Replace module.', ['%search' => $values['search'], '%replace' => $values['replace']]);
}
// Save the updated node.
$node->save();
// Fetch the new revision id.
$data["node:$id"]['new_vid'] = $node->vid->getString();

The replace method has a third argument, $undo_data, which contains the revision ids for the entities we're modifying. Since we only want to create a single revision for all of the changes to a given node, we only create a new revision the first time we "see" the node. We can save the node any number of times, but we want all of the changes to be in a single revision so we can revert them as easily as possible.
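
The bookkeeping can be simulated without Drupal. In the sketch below ToyNode is a hypothetical stand-in for a node whose save() only bumps the revision id when a new revision was requested; the loop plays the role of two replace calls that touch two fields of the same node.

```php
<?php

// Hypothetical stand-in for a node. It only imitates the revision behaviour
// that matters here and is not Drupal's entity API.
class ToyNode {
  public $vid = 10;
  private $newRevision = false;

  public function setNewRevision(bool $value = true) {
    $this->newRevision = $value;
  }

  public function save() {
    if ($this->newRevision) {
      $this->vid++;
      $this->newRevision = false;
    }
  }
}

$node = new ToyNode();
$id = 5;
$undo_data = [];

// Two fields of the same node are modified, one pass per field.
foreach (['title', 'body'] as $fieldname) {
  if (!isset($undo_data["node:$id"]['new_vid'])) {
    // First time we see this node: remember the starting revision and ask
    // for exactly one new revision.
    $undo_data["node:$id"]['old_vid'] = $node->vid;
    $node->setNewRevision(true);
  }
  $node->save();
  $undo_data["node:$id"]['new_vid'] = $node->vid;
}

print_r($undo_data);
```

After both passes the array holds one old_vid (10) and one new_vid (11) for the node, which is all the undo step needs in order to find the revision to restore.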

The undo code is pretty straightforward.

  1. Load the old revision.
  2. Create a new revision based on that old revision.
  3. Set it as the default revision and re-save the entity.

The paragraph plugin is quite similar to the node plugin. The major difference is that because paragraphs are entity references we need to handle the relationship hierarchy. For that I wrote a method which can handle up to three levels of depth (ex: paragraph inside another paragraph inside of a node). The code could likely be made to allow for an arbitrary level of depth, but three was the deepest that existed in my test cases. You can see the entire plugin code yourself by going here.

Batch API

The Batch API was introduced in Drupal 6 and has remained largely unchanged since then. Batch jobs are most commonly initiated in the submit handler of a form and are composed of several parts, including the input data, a processing function, and a function to call when the job is finished. These three components are placed into an array and then the batch is kicked off. I have incorporated the Batch API into each of the three functions that the module provides (search, replace, and undo). At the end of part one of this series I showed the code for the undo confirmation form. The code related to the batches was left out in order to simplify things, but I will now include the full code below.

Batch Example  
public function submitForm(array &$form, FormStateInterface $form_state) {
    $pluginManager = \Drupal::service('plugin.manager.scanner');
    $connection = \Drupal::service('database');
    $undo_id = $form_state->getValue('undo_id',0);
    if (!empty($undo_id) && $undo_id > 0) {
      // Query the database in order to find the specific record we're trying to undo.
      $query = $connection->query('SELECT undo_data from scanner WHERE undone = :undone and undo_id = :id',[':undone' => 0, ':id' => $undo_id]);
      $results = $query->fetchCol()[0];
      $data = unserialize($results);

      $operations = [['\Drupal\scanner\Form\ScannerConfirmUndoForm::batchUndo', [$data,$undo_id]]];

      $batch = [
        'title' => t('Scanner Replace Batch'),
        'operations' => $operations,
        'finished' => '\Drupal\scanner\Form\ScannerConfirmUndoForm::batchFinished',
        'progress_message' => t('Processed @current out of @total'),
      ];
      batch_set($batch);
    }
    $form_state->setRedirect('scanner.undo');
  }

  public static function batchUndo($data,$undo_id,&$context) {
    $pluginManager = \Drupal::service('plugin.manager.scanner');

    try {
      $plugin = $pluginManager->createInstance('scanner_entity');
      // This process can take a while so we want to extend the execution time
      // if it's less then 300 (5 minutes).
      if (ini_get('max_execution_time') < 300) {
        ini_set('max_execution_time','300');
      }
    } catch(PluginException $e) {
      // The instance could not be found so fail gracefully and let the user know.
      \Drupal::logger('scanner')->error($e->getMessage());
      drupal_set_message(t('An error occurred: @message', ['@message' => $e->getMessage()]),'error');
      // Without a plugin there is nothing to undo.
      return;
    }
    $plugin->undo($data);
    $context['results']['undo_id'] = $undo_id;
    $context['message'] = 'Undoing...';
  }

  public static function batchFinished($success, $results, $operations) {
    if ($success) {
      $connection = \Drupal::service('database');
      // Set the status of the record to '1', denoting being done.
      $updateQuery = $connection->update('scanner')
        ->fields(['undone' => 1])
        ->condition('undo_id',$results['undo_id'],'=')
        ->execute();
    } else {
      // Show the failure to the user instead of assigning an unused variable.
      drupal_set_message(t('There were some errors.'), 'error');
    }
  }

The batch job is created in the submit handler and consists of a title, the operations, the method to call upon completion, and a progress message. The operations value is itself an array in which each entry consists of a method and an array of the data to pass to it. In the case of the undo batch everything happens in a single operation; however, in the replace form there are multiple "operations".

$fields = \Drupal::config('scanner.admin_settings')->get('fields_of_selected_content_type');
foreach($fields as $key => $field) {
  $operations[] = ['\Drupal\scanner\Form\ScannerConfirmForm::batchReplace', [$field, $values]];
}

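Calling the batch_set function registers the batch. Because we are in a form submit handler, the Form API then starts processing it automatically, going through the operations array and calling your specified method (batchUndo in our case) for each entry.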

The $context variable is added by the system and persists across jobs. It has several array keys that the Batch API looks for data in:

  • results
    • After all of the jobs are completed the values stored here will be handed off to the 'finished' method defined in your $batch variable
    • We used it for storing the revision ids
  • message
    • You can add a message which will be presented to the user as each job executes
    • We used it to show the user which field is currently being operated on
  • sandbox
    • Scratch space for an operation that needs more than one pass; values stored here persist between repeated calls of the same operation
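
The flow of $context through a batch can be sketched in plain PHP. The run_toy_batch() function and the ToyReplaceForm class below are hypothetical: Drupal spreads the operations over several HTTP requests and does much more bookkeeping, whereas this loop runs everything in one go and only shows how the data travels from the operations to the 'finished' callback.

```php
<?php

// Hypothetical form class with static batch callbacks, referenced as
// 'Class::method' strings in the same way as in the article.
class ToyReplaceForm {
  public static function batchReplace($field, $values, &$context) {
    // Anything stored in 'results' survives until the batch is finished.
    $context['results']['fields'][] = $field;
    $context['message'] = "Replacing in $field...";
  }

  public static function batchFinished($success, $results, $operations) {
    echo 'Processed ', count($results['fields']), ' field(s)', PHP_EOL;
  }
}

// Hypothetical, synchronous stand-in for the batch runner.
function run_toy_batch(array $batch): array {
  $context = ['results' => [], 'message' => '', 'sandbox' => []];
  $remaining = $batch['operations'];
  while ($remaining) {
    list($callback, $args) = array_shift($remaining);
    // Each operation receives its own arguments plus $context by reference.
    $args[] = &$context;
    call_user_func_array($callback, $args);
    // The sandbox belongs to a single operation, so clear it afterwards.
    $context['sandbox'] = [];
  }
  // Success flag, collected results, and any unprocessed operations.
  call_user_func($batch['finished'], true, $context['results'], $remaining);
  return $context['results'];
}

$operations = [];
foreach (['node:article:title', 'node:article:body'] as $field) {
  $operations[] = ['ToyReplaceForm::batchReplace', [$field, ['search' => 'foo', 'replace' => 'bar']]];
}

$results = run_toy_batch([
  'operations' => $operations,
  'finished' => 'ToyReplaceForm::batchFinished',
]);
```

Because the callbacks are referenced as strings, they have to be callable without an object instance, which is why they are declared static here as well.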

The batch functions are static methods, which means that any dependencies need to be fetched statically (with \Drupal::service(), for example). Batch methods don't have to be static; however, due to the way I'm referencing them, PHP 7 requires them to be static.

In the search and replace jobs we create a job for each field, whereas in the undo job we simply process the entire thing in one go. If you look at the batchReplace method in the replace form, you'll see that as we finish each job we append the returned values to the $context['results'] array. We do this so that once we've finished all of the jobs we can serialize the data and insert it into our database table, where it can later be used for the undo operation.

public static function batchFinished($success, $results, $operations)

The method that you specified as the 'finished' value in the batch definition has three arguments. The first is a boolean which is true if all jobs have completed without a fatal error, the second is whatever you've placed into $context['results'], and the final argument contains any operations which didn't get processed for whatever reason.

Hopefully these two articles have been helpful. If you think of any additional features that should be included in the module, head over to the issue queue and open a feature request ticket. I gave this presentation at our local Drupal meeting and the video presentation can be viewed here.
