In the previous article, we started working on a node to user migration. Today, we expand on that example to accommodate more content model changes. First, we will learn how to migrate Drupal 7 field collections into Drupal 10 paragraphs. Then, we’ll populate an entity reference revision field attached to the user entity to add relationships to the newly migrated paragraph entities. Finally, we will learn how to create a custom process plugin to combine three separate Drupal 7 fields into a single multi-value Drupal 10 field.
Entity ID and high water mark considerations
Even though today's migration will create paragraph entities in Drupal 10, the source data is Drupal 7 field collections. Note that while the Paragraphs module existed in Drupal 7, it was not common for a site to use field collections and paragraphs in the same installation. If your project uses both and you need to migrate data from the two modules, you will have to review more Drupal 7 tables to determine a suitable AUTO_INCREMENT value for Drupal 10's paragraph entity.
Our example only makes use of field collections. From the drupal7 folder, execute ddev mysql to open an interactive SQL shell. Then execute the following queries to help you determine which values to use for the AUTO_INCREMENT value:
-- Get the highest field collection id value.
SELECT item_id FROM field_collection_item ORDER BY item_id DESC LIMIT 1;
-- Get the highest field collection revision value.
SELECT revision_id FROM field_collection_item_revision ORDER BY revision_id DESC LIMIT 1;
For brevity, we are going to show how to configure the auto_increment_alter_content_entities setting of the AUTO_INCREMENT Alter module to apply the new AUTO_INCREMENT value. Refer to article 23 for more information on how this module works or how to perform the operation in custom code.
$settings['auto_increment_alter_content_entities'] = [
  'paragraph' => [500, 1500], // Alter the tables for the paragraph content entity.
];
Now, execute the command provided by the AUTO_INCREMENT Alter module to trigger the alter operation in the Drupal 10 project.
ddev drush auto-increment-alter:content-entities
As for the high water property, we need to identify one field returned by the source plugin whose value always increases. Paragraphs are revisionable entities. The automated upgrade path for paragraphs generates two content migrations for each field collection in the source site. One migrates the current revision and another migrates all past revisions. They use the d7_field_collection_item and d7_field_collection_item_revision source plugins respectively. Both source plugins retrieve data from the field_collection_item and field_collection_item_revision tables in Drupal 7.
Both migrations can use the revision_id column in the revision table as the high water mark as follows:
source:
  plugin: d7_field_collection_item
  high_water_property:
    name: revision_id
    alias: fr
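To make the idea of a high water mark more concrete, here is a minimal plain PHP sketch of the concept. It is not how the Migrate API implements it, since the real source plugins apply the comparison while reading from the database. The filter_by_high_water function and the sample rows are hypothetical and only illustrate why the chosen column must always increase.

```php
<?php

/**
 * Hypothetical illustration of high water filtering (not Migrate API code).
 *
 * @param array $rows
 *   Source rows, each with a 'revision_id' key.
 * @param int $high_water
 *   Highest 'revision_id' processed in previous runs.
 *
 * @return array
 *   A list with the rows to process and the new high water value.
 */
function filter_by_high_water(array $rows, int $high_water): array {
  $to_process = [];
  $new_high_water = $high_water;
  foreach ($rows as $row) {
    // Only rows above the stored mark are considered new or updated.
    if ($row['revision_id'] > $high_water) {
      $to_process[] = $row;
      $new_high_water = max($new_high_water, $row['revision_id']);
    }
  }
  return [$to_process, $new_high_water];
}

// Made-up sample data: item 1 received a new revision after the last run.
$rows = [
  ['item_id' => 1, 'revision_id' => 1],
  ['item_id' => 2, 'revision_id' => 2],
  ['item_id' => 1, 'revision_id' => 3],
];
[$pending, $mark] = filter_by_high_water($rows, 1);
// $pending holds the rows with revision_id 2 and 3; $mark is 3.
```

Because every edit to a field collection item creates a higher revision_id, comparing against the last stored value is enough to detect both new and updated items.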
Migrating field collections as paragraphs
The Paragraphs module offers an upgrade path for Drupal 7's Field collection module. Back in article 15, we explained how to migrate paragraph types out of field collections. As the name suggests, Drupal 7 field collections have fields attached to them. Adding those fields to the paragraphs entities was accounted for in the multiple field-related migrations: storage, instance, widget, and formatters. With the migration of configuration taken care of, we can now focus on migrating content.
Paragraphs are attached to entity bundles (like content types) via entity reference revision fields. Because of this, migrating paragraph content is a two step process. First, you need to migrate the paragraph entity data. Second, you need to migrate the entity reference revision data that connects the paragraph with its host entity. Before we start writing migrations, let's review our Drupal 7 configuration and devise an approach to follow based on our migration plan.
In Drupal 7, we have the field_favorite_quote field collection which has two fields attached to it: field_quote_name and field_quote_message. The field_favorite_quote field collection is used in the speaker content type. In Drupal 10, the field_favorite_quote field collection was migrated as the favorite_quote paragraph type with the field_quote_name and field_quote_message fields still attached to it. Our migration plan says that speaker nodes should be migrated as Drupal 10 user entities. Back in article 22, we attached a field_favorite_quote entity reference revisions field to users, which allows referencing favorite_quote paragraph entities.
There are two tasks we still need to complete: migrate the paragraph entities and update the user migration to populate the field_favorite_quote reference field.
Paragraphs are revisionable entities. As part of the upgrade path from field collections, the paragraph module provides two content migrations:
- d7_field_collection.yml migrates the current paragraph revision.
- d7_field_collection_revisions.yml migrates all past paragraph revisions.
Each of these migrations uses a deriver that generates a migration file for each Drupal 7 field collection. This is similar to how node migrations work, where a separate migration is generated for each content type. In our example, after running the automated migration, we ended up with the upgrade_d7_field_collection_favorite_quote.yml and upgrade_d7_field_collection_revisions_favorite_quote.yml migrations in the ref_migrations folder. Our migration plan says we do not need to migrate revisions for any content entity. Therefore, we will only migrate the current paragraph revisions.
We use upgrade_d7_field_collection_favorite_quote to migrate the current revision for the one field collection that existed in Drupal 7. Copy it from the reference folder into our tag1_migration custom module and rebuild caches for the migration to be detected.
cd drupal10
cp ref_migrations/migrate_plus.migration.upgrade_d7_field_collection_favorite_quote.yml web/modules/custom/tag1_migration/migrations/upgrade_d7_field_collection_favorite_quote.yml
ddev drush cache:rebuild
Note that while copying the file, we also changed its name and placed it in a migrations folder inside our tag1_migration custom module. After copying the file, make the following changes:
- Remove the following keys: uuid, langcode, status, dependencies, field_plugin_method, cck_plugin_method, and migration_group.
- Add two migration tags: paragraph and tag1_content.
- Add key: migrate under the source section.
- Add the high_water_property property as demonstrated above.
- Remove the migration dependencies. They currently list configuration migrations. Early on, we decided that our content migrations would not depend on configuration migrations.
After the modifications, the upgrade_d7_field_collection_favorite_quote.yml file should look like this:
id: upgrade_d7_field_collection_favorite_quote
class: Drupal\migrate\Plugin\Migration
migration_tags:
  - 'Drupal 7'
  - Content
  - 'Field Collection Content'
  - paragraph
  - tag1_content
label: 'Field Collections (Favorite quote)'
source:
  key: migrate
  plugin: d7_field_collection_item
  field_name: field_favorite_quote
  high_water_property:
    name: revision_id
    alias: fr
process:
  type:
    -
      plugin: get
      source: bundle
  parent_id:
    -
      plugin: get
      source: parent_id
  parent_type:
    -
      plugin: get
      source: parent_type
  parent_field_name:
    -
      plugin: get
      source: field_name
  field_quote_name:
    -
      plugin: get
      source: field_quote_name
  field_quote_message:
    -
      plugin: get
      source: field_quote_message
destination:
  plugin: 'entity_reference_revisions:paragraph'
  default_bundle: favorite_quote
migration_dependencies:
  required: { }
  optional: { }
The generated migration has the Field Collection Content tag. This is important and will be explained in the next section. For now, make sure to preserve it.
Now, rebuild caches for our changes to be detected and execute the migration. Run migrate:status to make sure we can connect to Drupal 7. Then, run migrate:import to perform the import operations.
ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_field_collection_favorite_quote
ddev drush migrate:import upgrade_d7_field_collection_favorite_quote
If things are properly configured, you should not get any errors. But where can you see the migrated paragraph entities? Out of the box, the Paragraphs module does not provide a way to review its content entities from the user interface. Paragraphs are meant to be attached to a host entity via a reference field. It is by viewing the host entity that you can see the content of the paragraphs. The upgrade_d7_field_collection_favorite_quote migration generated content entities of the favorite_quote paragraph type. In our example, we will update the user migration to be able to see the migrated favorite_quote paragraphs referenced by the field_favorite_quote field.
In the meantime, there are other ways to make sure our paragraph migration created Drupal 10 content entities. One is to create a view that lists paragraph entities. This will be left as an exercise for the curious reader. Another is to run SQL queries against the Drupal 10 tables that store paragraph data. This is what we will do using a MySQL client.
From the drupal10 folder, execute ddev mysql to open an interactive SQL shell. Then execute the following queries:
-- Query paragraph entity data.
SELECT * FROM paragraphs_item;
SELECT * FROM paragraphs_item_field_data;
-- Query field data attached to paragraph entities.
SELECT * FROM paragraph__field_quote_name;
SELECT * FROM paragraph__field_quote_message;
Technical note: Even though we are not migrating past revisions, the entity API still generates data in the revisions table for the paragraph entity and the fields attached to it. This is how Drupal works out of the box. That is why it is important to provide AUTO_INCREMENT values for revisions when the content entity supports it, even if you have no plans to migrate revisions.
Connecting paragraphs to their host entities
After confirming the field collection to paragraph migration worked, we need to update the migration of the host entity. In particular, we need to populate the references in the user entity to the newly migrated paragraph entities. To accomplish this, we need to update the upgrade_d7_node_speaker_to_user migration created in the previous article.
Before showing how to migrate the relationship to the paragraph entities, I would like to acknowledge that our example is a rather simple one. Paragraph migrations can get quite complex when you need to migrate revisions, translations, and nested paragraph relationships. When coming from field collections in Drupal 7, the automated upgrade path provided by the Paragraphs module is quite flexible and offers tools to account for many different scenarios.
That said, it is valid to extend migration plugins provided by the paragraphs module or write custom ones altogether. It is our hope that throughout the series you have gained a deeper understanding of the Migrate API to plan and execute custom migrations.
The one strategy I would advise against is to create paragraph entities on the fly when migrating their host entities. That violates the principles of ETL and the created entities might end up in the site even after clean up operations, like migration rollbacks. Instead, create separate migrations for each entity/bundle combination and add the relationships among them following an approach similar to what we are about to describe.
Back to our example, the generated migration already contains a process pipeline we can use to establish the relationship between paragraphs and their host entities. But where? In Drupal 7, the speaker content type uses the field_favorite_quote field collection. So, take a look at the upgrade_d7_node_speaker migration in the ref_migrations folder. The relevant part is the mapping of the field_favorite_quote field in the process section. Copy the snippet below from the generated upgrade_d7_node_speaker migration into the upgrade_d7_node_speaker_to_user migration we created in the previous article:
process:
  field_favorite_quote:
    -
      plugin: sub_process
      source: field_favorite_quote
      process:
        target_id:
          -
            plugin: paragraphs_lookup
            tags: 'Field Collection Content'
            source: value
          -
            plugin: extract
            index:
              - id
        target_revision_id:
          -
            plugin: paragraphs_lookup
            tags:
              - 'Field Collection Revisions Content'
              - 'Field Collection Content'
            tag_ids:
              'Field Collection Revisions Content':
                - revision_id
              'Field Collection Content':
                - value
          -
            plugin: extract
            index:
              - revision_id
Now, rebuild caches for our changes to be detected. Then roll back the upgrade_d7_node_speaker_to_user migration and import it again.
ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_speaker_to_user
ddev drush migrate:rollback upgrade_d7_node_speaker_to_user
ddev drush migrate:import upgrade_d7_node_speaker_to_user
If things are properly configured, you should not get any errors. Go to https://migration-drupal10.ddev.site/admin/people?role=speaker and view or edit any of the migrated users with the Speaker role assigned to them. Opening the user page for Frank will reveal a quote by Albert Einstein.
Wait, what just happened? Paragraph relationships use entity reference revisions fields. To properly connect a paragraph field from the host entity to the paragraph entity, you need to set values for both the target_id and target_revision_id sub-fields.
In the snippet above, we are letting the automated upgrade path provided by the Paragraphs module do its job. It offers the paragraphs_lookup process plugin, which extends Drupal core's migration_lookup with extra functionality tailored to migrating data for entity reference revisions fields. When paragraph entities are found, the process pipeline extracts their id and revision_id to populate the target_id and target_revision_id sub-fields of the field_favorite_quote field in the user entity. That is how the relationship between users and paragraphs is established.
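To visualize the outcome of that pipeline, the plain PHP sketch below mimics what the lookup and extract steps produce for a single field item. The build_paragraph_reference function and the sample IDs are hypothetical. The lookup result is assumed to be an array keyed by id and revision_id, which is what the extract steps in the snippet imply.

```php
<?php

/**
 * Hypothetical helper: builds one entity reference revisions field item.
 *
 * @param array $lookup_result
 *   Assumed shape of a lookup result: ['id' => ..., 'revision_id' => ...].
 *
 * @return array
 *   The two sub-fields needed to reference a specific paragraph revision.
 */
function build_paragraph_reference(array $lookup_result): array {
  return [
    'target_id' => $lookup_result['id'],
    'target_revision_id' => $lookup_result['revision_id'],
  ];
}

// Made-up IDs for illustration. A multi-value field holds one item per delta.
$field_favorite_quote = [
  build_paragraph_reference(['id' => 7, 'revision_id' => 7]),
];
```

If either sub-field is missing, the host entity cannot point to a specific revision of the paragraph, which is why the pipeline populates both.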
Note the use of extra migration tags in the derived d7_field_collection and d7_field_collection_revisions migrations: Field Collection Content and Field Collection Revisions Content respectively. This limits the migrations that will be used in the lookup operation to those containing the listed tags. Feel free to review the ParagraphsLookup process plugin's code to better understand how the process pipeline is set up.
Migrate process plugin
Migrate process plugins are responsible for transforming source data into the format expected by the destination system. In our case, we are converting Drupal 7 data into a Drupal 10 suitable format. From a technical point of view, they leverage Drupal's Plugins API.
Below are some highlights regarding their implementation:
- Migrate process plugins are classes in the Drupal\[module]\Plugin\migrate\process namespace and implement the MigrateProcessInterface interface.
- The ProcessPluginBase base class is available for convenience. It implements common methods invoked for process plugins.
- Most process plugins implement a transform method that takes care of the data manipulation operation. If such a method does not exist in the class, the process pipeline invoking the process plugin should specify a method key indicating which method in the process plugin's class should be executed. An example of this is the skip_on_empty plugin, which can be called using the string row or process as the value of the method configuration option.
- If you need to inject a service, the plugin needs to implement the ContainerFactoryPluginInterface interface and its create method. Examples of this are the migration_lookup and machine_name process plugins.
- For discovery, they use PHP attributes. Before version 10.2, Drupal core used annotations for discovery. Conversions to PHP attributes have started and the plan is to eventually deprecate annotations altogether.
- A process plugin can signal that it supports handling multiple values by setting handle_multiples in its plugin definition (attribute or annotation). When set to TRUE, the plugin will expect an array as input and iterate over it, potentially changing the whole array. Examples of this are the sub_process and flatten process plugins. Conversely, a process plugin can signal that its return value requires multiple handling by returning TRUE from the multiple method of the plugin class. Examples of this are the sub_process and explode process plugins.
- The plugin.manager.migrate.process service is the plugin manager responsible for process plugins. You can use it to get a list of available plugins based on enabled modules and obtain more details about their definition.
# List of migrate process plugin ids.
ddev drush php:eval "print_r(array_keys(\Drupal::service('plugin.manager.migrate.process')->getDefinitions()));"
# Details on a specific migrate process plugin id.
ddev drush php:eval "print_r(\Drupal::service('plugin.manager.migrate.process')->getDefinitions()['PROCESS_PLUGIN_ID']);"
Creating a custom process plugin
We are almost done with the speaker to user migration. The missing piece is migrating the field_drupal_org_profile, field_linkedin_profile, and field_x_twitter_profile URL fields in Drupal 7's speaker content type into a single Drupal 10 field_social_media_links social link field that accepts multiple values.
Back in article 16, we had a deep dive into how Drupal fields work — both from the perspective of PHP code and database tables. With regards to our task today, I explain how to combine multiple Drupal 7 URL fields into a single Drupal 10 social links field in this presentation. It covers a lot of ground from the technical side of things. I highly recommend revisiting article 16 and watching the video recording to have a better understanding of what we will do today.
In Drupal 7, the URL field has 3 sub-fields: value, title, and attributes. In our example, the URL fields were configured to only store the URL value. As such, only the value sub-field is populated. An example value is: https://www.drupal.org/u/baltowen
In Drupal 10, the social links field has 2 sub-fields: social and link. A single entry pointing to the same Drupal.org profile would store the values as drupal and baltowen, respectively.
We need to come up with a process pipeline that takes multiple URLs as stored in Drupal 7, breaks them into the platform/handle format used in Drupal 10, and returns the data to populate a multi-value social link field. A custom process plugin is ideal for this scenario.
Create a PHP file named Tag1SocialLinks.php inside our tag1_migration custom module's /src/Plugin/migrate/process folder. The path relative to the Drupal 10 project root is web/modules/custom/tag1_migration/src/Plugin/migrate/process/Tag1SocialLinks.php. The content of the file should be:
<?php

namespace Drupal\tag1_migration\Plugin\migrate\process;

use Drupal\migrate\Attribute\MigrateProcess;
use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

#[MigrateProcess(
  id: "tag1_social_links",
  handle_multiples: TRUE,
)]
class Tag1SocialLinks extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    if (!is_array($value)) {
      $value = [$value];
    }
    $url_values = array_filter($value);
    $social_link_patterns = [
      'drupal' => '/^(http(s)?\:\/\/)?(www\.)?drupal\.org\/u\//',
      'twitter' => '/^(http(s)?\:\/\/)?(www\.)?(twitter|x)\.com\//',
      'linkedin' => '/^(http(s)?\:\/\/)?(www\.)?linkedin\.com\//',
    ];
    $result = [];
    foreach ($url_values as $url_value) {
      foreach ($social_link_patterns as $social_link_platform => $social_link_regex) {
        if (preg_match($social_link_regex, $url_value)) {
          $result[] = [
            'social' => $social_link_platform,
            'link' => preg_replace($social_link_regex, '', $url_value),
          ];
        }
      }
    }
    return $result;
  }

}
Then, update the upgrade_d7_node_speaker_to_user migration with the following snippet in the process section:
process:
  field_social_media_links:
    -
      plugin: tag1_social_links
      source:
        - field_drupal_org_profile/0/value
        - field_linkedin_profile/0/value
        - field_x_twitter_profile/0/value
Note that in our upgrade_d7_node_speaker_to_user migration, we are calling the tag1_social_links process plugin. This should match the id of the MigrateProcess PHP attribute used in our custom process plugin.
Now, rebuild caches for our new process plugin and the changes to the migration to take effect. Then roll back the upgrade_d7_node_speaker_to_user migration and import it again.
ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_speaker_to_user
ddev drush migrate:rollback upgrade_d7_node_speaker_to_user
ddev drush migrate:import upgrade_d7_node_speaker_to_user
If things are properly configured, you should not get any errors. Go to https://migration-drupal10.ddev.site/admin/people?role=speaker and view or edit any of the migrated users with the Speaker role assigned to them. Opening the user page for Mia will reveal links to her Drupal.org profile, LinkedIn, and X (Twitter).
A detailed review of the PHP code for the custom process plugin will be left as an exercise to the curious reader. What we want to highlight is the relationship between the data transformation logic and the data that is passed to the plugin itself. From our migration, we pass an array of 3 values: field_drupal_org_profile/0/value, field_linkedin_profile/0/value, and field_x_twitter_profile/0/value. Each of them attempts to retrieve the URL we are interested in processing by retrieving the value sub-field from the first delta. Do we need to extract the URL before sending data to the process plugin? Not necessarily. We could have extracted the URLs within the process plugin itself. This serves as a good reminder that when implementing custom migration logic, we are the ones who decide what each component is responsible for doing.
The above code will send an array with the following structure to our tag1_social_links custom process plugin:
[
  'https://drupal.org/u/baltowen',
  'https://www.linkedin.com/in/wendybaltodano',
  'https://x.com/baltowen',
]
After calling the transform method, our tag1_social_links custom process plugin will return an array with the following structure:
[
  0 => [
    'social' => 'drupal',
    'link' => 'baltowen',
  ],
  1 => [
    'social' => 'linkedin',
    'link' => 'in/wendybaltodano',
  ],
  2 => [
    'social' => 'twitter',
    'link' => 'baltowen',
  ],
]
This array will be stored in the field_social_media_links destination property. When the entity:user destination plugin calls the entity save operation, the field_social_media_links field on the user entity will be populated with profile information for the three social media platforms.
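As noted earlier, the URL extraction could also happen inside the plugin. The standalone sketch below explores that variation outside of Drupal, which also makes it easy to check the regular expressions from the command line. It is not the plugin used in this series: the tag1_social_links_from_fields function is hypothetical, and it assumes each field arrives as a list of deltas with a value key, which is what the migration would pass if it listed field_drupal_org_profile instead of field_drupal_org_profile/0/value as a source.

```php
<?php

/**
 * Hypothetical helper, not part of the tag1_migration module.
 *
 * @param array $fields
 *   Whole Drupal 7 URL fields, each assumed to be a list of deltas such as
 *   [['value' => 'https://www.drupal.org/u/baltowen']].
 *
 * @return array
 *   Items with the 'social' and 'link' sub-fields.
 */
function tag1_social_links_from_fields(array $fields): array {
  // Same patterns as the process plugin shown earlier.
  $patterns = [
    'drupal' => '/^(http(s)?\:\/\/)?(www\.)?drupal\.org\/u\//',
    'twitter' => '/^(http(s)?\:\/\/)?(www\.)?(twitter|x)\.com\//',
    'linkedin' => '/^(http(s)?\:\/\/)?(www\.)?linkedin\.com\//',
  ];
  $result = [];
  foreach ($fields as $field) {
    // Read the URL from the first delta; skip empty or missing fields.
    $url = $field[0]['value'] ?? NULL;
    if (empty($url)) {
      continue;
    }
    foreach ($patterns as $platform => $regex) {
      if (preg_match($regex, $url)) {
        $result[] = [
          'social' => $platform,
          'link' => preg_replace($regex, '', $url),
        ];
        // A URL belongs to a single platform, so stop at the first match.
        break;
      }
    }
  }
  return $result;
}

// Made-up input: the second field is empty for this speaker.
$links = tag1_social_links_from_fields([
  [['value' => 'https://drupal.org/u/baltowen']],
  [],
  [['value' => 'https://x.com/baltowen']],
]);
print_r($links);
```

Which approach to prefer is a design choice. Extracting in the migration keeps the plugin generic, while extracting in the plugin keeps the migration shorter.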
Writing custom process plugins can make migrations easier to read and maintain over time. That said, make sure you are familiar with plugins that are available to avoid reinventing the wheel. This documentation page includes a list of process plugins in Drupal core and some contributed modules that are commonly used in custom migrations. The list is not exhaustive as it would be impractical to list all plugins across all contributed modules. When in doubt about what is available, ask the Drupal API with the snippet we shared in the Migrate process plugin section above.
Next time, we’ll learn to write custom source plugins. We will also walk you through populating regular entity reference fields; that is, fields that unlike paragraphs do not need to point to a specific revision of the referenced entity. All of this will be explained in the context of migrating a new entity type: media. Let's go!
Image by Manuel de la Fuente from Pixabay