In this article, we start implementing content model changes. Namely, we’ll migrate Drupal 7 nodes as Drupal 10 user and taxonomy term entities. After covering entity ID and high water mark considerations, we will explain how to map data between two different entity types. We’ll also show how to introspect the Drupal 10 site to determine which properties and fields are available in the target entity types. Then, we’ll write the two migrations by hand to practice what we have learned so far in the series. Finally, we’ll share all properties and fields for users and taxonomy terms in our example to serve as a reference for similar projects.

Entity ID and high water mark considerations

Let’s begin with migrating nodes into users and taxonomy terms. In the previous article, we discussed the need to account for node IDs and revision IDs when deciding on the new AUTO_INCREMENT values for the two entities. No need to repeat that today. Just remember that when incorporating content model changes, multiple Drupal 7 entities and tables should be considered.

Also note that when changing entity types, it will not be possible to reuse the same identifier from the original entity in most cases. Imagine that a node with nid 3 in Drupal 7 belongs to a content type that will be converted to the user entity in Drupal 10, and that Drupal 7 already has a user with uid 3. If we preserve user IDs, importing nid 3 as uid 3 would overwrite the already migrated user and cause data loss. This is especially problematic when there are entity references to preserve. This was discussed in article 4 and later expanded in article 5.

Two possible ways to avoid problems in this type of situation are:

  1. Do not migrate the identifier of the Drupal 7 entity that will be migrated into a different Drupal 10 entity. In the scenario described above, we would not migrate the nid property of the nodes that will be converted to users. This is the approach we will follow in our example project.
  2. Apply an offset to the identifier of the Drupal 7 entity that will be migrated into a different Drupal 10 entity. The offset should be high enough to avoid conflicts between the two entities at play. In the scenario described above, we could apply an offset of 1,000,000 so that nid 3 will be imported as uid 1,000,003. This would work under the assumption that, by the time we run the final migration prior to launch, there are fewer than 1,000,000 users in Drupal 7. When migrating revisionable entities, a different offset value might be required for the revision identifier. In the case of nodes, that is the revision ID (vid) property.
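
As a rough illustration of the second option, the fragment below adds a fixed offset to the nid using only Drupal core pieces: a source constant and the callback process plugin calling PHP's array_sum function. The uid_offset constant name and its value are assumptions made for this sketch. The fragment is not part of the example project, which follows the first option, so treat it as untested:

```yaml
source:
  key: migrate
  plugin: d7_node
  node_type: speaker
  constants:
    # Assumed value. Pick one above the highest uid expected in Drupal 7.
    uid_offset: 1000000
process:
  uid:
    # With an array source, the callback plugin receives both values as a
    # single array, which array_sum() adds together.
    plugin: callback
    callable: array_sum
    source:
      - nid
      - constants/uid_offset
```

Any migration that looks up these users by their original nid would still go through the migrate map as usual. The offset only changes the ID assigned in Drupal 10.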

When we have a migration that involves an entity type change, the high_water_property property should be selected based on the original entity type from Drupal 7, not the new one in Drupal 10. This is because the high water mark is a configuration option of the source plugin. Our source is Drupal 7 nodes; therefore, we need to choose a node property to use as the high water mark. We will use the node revision ID (vid) as follows:

source:
 key: migrate
 plugin: d7_node
 high_water_property:
   name: vid
   alias: nr

Technical note: Per our migration plan, we do not need to migrate past node revisions. Because of this, we used the node classic migration when generating the migrations. This is being highlighted because the node classic migration and the node complete migration use different source plugins: d7_node and d7_node_complete, respectively. Different source plugins will expose different sets of fields. Not all of them are listed by the drush migrate:fields-source command. Debugging the migration is the best way to find all available fields.

Destination entity properties and fields

When implementing content model changes, it is important to have a good understanding of Drupal entities. As part of the process pipeline, we need to set properties and fields in the Drupal 10 destination entity using data from the Drupal 7 entity retrieved by the source plugin. In today's example, we will set properties and fields for the Drupal 10 user and taxonomy_term entities using data from the Drupal 7 node entities.

But, how do you know what properties and fields need to be set? There are multiple ways:

  1. Use the generated migrations (in the ref_migrations folder) to get a sense of what properties are expected based on the destination plugin of each migration.
  2. Consult online references. This article has a list of properties for various Drupal core content entities. This other article has a list of properties for various Drupal Commerce content entities.
  3. Ask the Drupal API. The commands below will extract data for your current Drupal 10 installation based on the modules that are enabled:

# Get all entity types in the current Drupal 10 installation.
ddev drush php:eval "print_r(array_keys(\Drupal::entityTypeManager()->getDefinitions()));"

# List only content entities.
# The use of single quotes to pass the code snippet to php:eval is important.
ddev drush php:eval 'print_r(array_keys(array_filter(\Drupal::entityTypeManager()->getDefinitions(), fn($entity) => $entity->getGroup() === "content")));'

# List all properties and fields for an entity. Replace ENTITY_ID with the machine name of the entity. Example: 'node'.
# Only content entities are supported.
ddev drush php:eval "print_r(array_keys(\Drupal::service('entity_field.manager')->getFieldStorageDefinitions('ENTITY_ID')));"

# List all properties and fields for a bundle of an entity. Replace ENTITY_ID and BUNDLE with the machine name of the entity and bundle, respectively. Example: 'node' and 'article'.
# Only entity types that implement \Drupal\Core\Entity\FieldableEntityInterface are supported.
ddev drush php:eval "print_r(array_keys(\Drupal::service('entity_field.manager')->getFieldDefinitions('ENTITY_ID', 'BUNDLE')));"

# List fields across bundles of an entity. Replace ENTITY_ID with the machine name of the entity. Example: 'node'.
ddev drush php:eval "print_r(\Drupal::service('entity_field.manager')->getFieldMap()['ENTITY_ID']);"

Knowing the names of an entity's properties and fields is the first step. You also need to know what type of data is expected. Is it a scalar value like an integer, a string, or a boolean? Is it an array? If so, what is the structure of the array? When assigning values to fields, it is important to know their cardinality and what sub-fields are available. This article includes a list of sub-fields per field type. It is also useful to know which sub-field is the default, if one exists, for the field type.

Again, looking at the generated migrations and consulting online references can help with this. When in doubt, ask the Drupal API:


# Get a list of sub-fields. Replace ENTITY_ID and PROPERTY_OR_FIELD with the machine name of the entity and machine name of the property/field, respectively. Example: 'node' and 'nid'/'body'.
ddev drush php:eval "print_r(array_keys(\Drupal::service('entity_field.manager')->getFieldStorageDefinitions('ENTITY_ID')['PROPERTY_OR_FIELD']->getColumns()));"

# Get details about the sub-fields, including database schema information.
ddev drush php:eval "print_r(\Drupal::service('entity_field.manager')->getFieldStorageDefinitions('ENTITY_ID')['PROPERTY_OR_FIELD']->getColumns());"

# Get the default sub-field, if one exists for the field type.
ddev drush php:eval "var_dump(\Drupal::service('entity_field.manager')->getFieldStorageDefinitions('ENTITY_ID')['PROPERTY_OR_FIELD']->getMainPropertyName());"
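
To show why sub-field names matter, here is a minimal sketch of a process section that assigns the value and format sub-fields of a single-value text field one at a time. It uses the FIELD/SUBFIELD syntax for destination properties and the FIELD/DELTA/SUBFIELD syntax for source values. The field_description source field and the restricted_html format come from our example project, but the fragment itself is only an illustration, not one of the migrations in this article:

```yaml
process:
  # Read delta 0 of the Drupal 7 field and write it to the value sub-field.
  description/value: field_description/0/value
  # Set the format sub-field to a fixed text format.
  description/format:
    plugin: default_value
    default_value: restricted_html
```

This style is convenient for single-value fields. For multi-value fields, the sub_process plugin is usually a better fit because it iterates over every delta.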

Properties and fields for user and taxonomy term entities

When creating Drupal 10 entities, it is not necessary to provide a value for every property and field. In fact, in some cases we will not have suitable Drupal 7 data to populate some Drupal 10 properties and fields. In other cases, we will intentionally not set a destination property. In our example, we will not set the primary identifiers of the user and taxonomy term entities (uid and tid) to avoid potential ID conflicts as explained above.

You can get a list of available destination properties and fields for the user and taxonomy_term entities in your current installation by executing the following commands inside the drupal10 folder:


ddev drush php:eval "print_r(array_keys(\Drupal::service('entity_field.manager')->getFieldStorageDefinitions('user')));"
ddev drush php:eval "print_r(array_keys(\Drupal::service('entity_field.manager')->getFieldStorageDefinitions('taxonomy_term')));"

If you want to obtain even more details, open an interactive PHP shell by executing ddev drush php:cli in the drupal10 folder. Then, run the following code:


// Get all property and field definitions for an entity. Pick one of the following.
$field_storage_definitions = \Drupal::service('entity_field.manager')->getFieldStorageDefinitions('user');
$field_storage_definitions = \Drupal::service('entity_field.manager')->getFieldStorageDefinitions('taxonomy_term');

// Find more details about each property and field.
$field_storage_data = array_map(function ($field_definition) {
 $module = ($field_definition instanceof \Drupal\Core\Field\BaseFieldDefinition) ? $field_definition->getProvider() : $field_definition->get('module');
 $label = $field_definition->getLabel();
 $description = $field_definition->getDescription();
 
 if ($field_definition instanceof \Drupal\field\Entity\FieldStorageConfig) {
   $label = 'Field ' . $field_definition->getName();
   $description = 'Attached to bundle(s): ' . implode(', ', $field_definition->getBundles()) . '.';
 }
 
 return [
   'module' => $module,
   'type' => $field_definition->getType(),
   'label' => ($label instanceof \Drupal\Core\StringTranslation\TranslatableMarkup) ? $label->render() : $label,
   'description' => ($description instanceof \Drupal\Core\StringTranslation\TranslatableMarkup) ? $description->render() : $description,
   'cardinality' => $field_definition->getCardinality(),
   'default_subfield' => $field_definition->getMainPropertyName(),
   // Fall back to the property names when the field type defines no columns.
   'subfields' => array_keys($field_definition->getColumns()) ?: $field_definition->getPropertyNames(),
 ];
}, $field_storage_definitions);

// Print the names of the entity's properties and fields.
array_keys($field_storage_definitions)

// Print details about the entity's properties and fields.
$field_storage_data

// Print details for a single property or field in the entity.
$field_storage_data['user_picture']

// The output of the $field_storage_data['user_picture'] is:
 [
   "module" => "image",
   "type" => "image",
   "label" => "Field user_picture",
   "description" => "Attached to bundle(s): user.",
   "cardinality" => 1,
   "default_subfield" => "target_id",
   "subfields" => [
     "target_id",
     "alt",
     "title",
     "width",
     "height",
   ],
 ]

Migrating nodes as taxonomy terms

We could use the migrate_plus.migration.upgrade_d7_node_sponsor.yml and migrate_plus.migration.upgrade_d7_taxonomy_term_tags.yml files in the ref_migrations folder as a reference to migrate Drupal 7 nodes into Drupal 10 taxonomy terms. We could also review the upgrade_d7_taxonomy_term migration we created in the previous article.

Instead, we will create the migration file from scratch. We have seen and customized many migrations already in the series. At this point, you should have gained familiarity with the structure of migration files.

As a reminder, we want to migrate nodes of type sponsor as taxonomy term entities. A corresponding sponsor Drupal 10 vocabulary was created back in article 15. Below is a summary of how Drupal 7 data will be migrated into Drupal 10:

  • The node title will be migrated as the taxonomy term name.
  • The description field will be migrated as the taxonomy term description.
  • The logo field will be migrated into a newly created image field attached to the sponsor taxonomy vocabulary.

Create an upgrade_d7_node_sponsor_to_taxonomy_term.yml file in the web/modules/custom/tag1_migration/migrations folder of our Drupal 10 project. We will use the file name, without the .yml extension, as the migration ID. You can come up with any name as long as it is unique. When creating migrations that involve entity type conversions, I recommend including the source and destination entity machine names in the migration ID.

Below is the content of the file:

id: upgrade_d7_node_sponsor_to_taxonomy_term
class: Drupal\migrate\Plugin\Migration
migration_tags:
 - 'Drupal 7'
 - Content
 - taxonomy_term
 - tag1_content
label: 'Nodes (Sponsor) to taxonomy terms'
source:
 key: migrate
 plugin: d7_node
 node_type: sponsor
 high_water_property:
   name: vid
   alias: nr
process:
 name:
   -
     plugin: get
     source: title
 description:
   -
     plugin: sub_process
     source: field_description
     process:
       value: value
       format:
         -
           plugin: static_map
           source: format
           map:
             filtered_html: restricted_html
           bypass: TRUE
 changed:
   -
     plugin: get
     source: changed
 langcode:
   -
     plugin: default_value
     source: language
     default_value: und
 field_logo:
   -
     plugin: sub_process
     source: field_logo
     process:
       target_id: fid
       alt: alt
       title: title
       width: width
       height: height
destination:
 plugin: 'entity:taxonomy_term'
 default_bundle: 'sponsors'
migration_dependencies:
 required:
   - upgrade_d7_file
   - upgrade_d7_taxonomy_term
 optional: { }

Note that the source plugin fetches nodes and the destination plugin is configured to create taxonomy terms. Then, in the process section, we map available Drupal 7 node data to Drupal 10 taxonomy term data.

Drupal 7 uses a rich text field for the description. This means that field_description stores information about the text format used. As noted in article 22, in our example project the text formats in Drupal 7 and Drupal 10 do not match one to one. In the migration above, if the Drupal 7 filtered_html format is used, it will be migrated as restricted_html in Drupal 10.

Text formats play an important role in the security of a Drupal website. They can filter out malicious markup that could be used to breach a website and compromise its users. Security hardening is outside the scope of this series. Yet, we want to make sure you consider how text formats have an impact on security and incorporate that in your migration plan.
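
As one possible way to act on this, the sketch below shows a stricter variation of the description mapping. Instead of letting unmapped formats pass through with bypass: TRUE, it uses the default_value option of the static_map plugin so that any format missing from the map falls back to restricted_html. Whether that fallback is appropriate is an assumption that depends on the text formats in your own project:

```yaml
process:
  description:
    plugin: sub_process
    source: field_description
    process:
      value: value
      format:
        plugin: static_map
        source: format
        map:
          filtered_html: restricted_html
        # Any Drupal 7 format not listed in the map gets this value.
        default_value: restricted_html
```

The trade-off is that content written with a more permissive format in Drupal 7 may render differently in Drupal 10 until its format is reviewed.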

To specify the vocabulary a term belongs to, you can either set the vid property in the process section, as we did in the upgrade_d7_taxonomy_term migration, or specify the default_bundle property in the entity:taxonomy_term destination plugin. If both are set, the vid property will take precedence.
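
For comparison, a minimal sketch of the first alternative would assign the vocabulary in the process section with the default_value plugin. In this variation, default_bundle can be left out of the destination configuration:

```yaml
process:
  # Assign every migrated term to the sponsors vocabulary.
  vid:
    plugin: default_value
    default_value: sponsors
destination:
  plugin: 'entity:taxonomy_term'
```

Both approaches produce terms in the same vocabulary. Setting vid in the process section is more flexible when the vocabulary needs to be derived from source data.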

Before executing the upgrade_d7_node_sponsor_to_taxonomy_term migration, make sure to account for potential entity ID conflicts. This is particularly important when a migration performs content model changes. We covered this in great detail in article 23. You can use the AUTO_INCREMENT Alter module with the configuration from the start of this article.

Now, rebuild caches for our new migration to be detected and execute it. Run migrate:status to make sure we can connect to Drupal 7. Then, run migrate:import to perform the import operations.

ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_sponsor_to_taxonomy_term
ddev drush migrate:import upgrade_d7_node_sponsor_to_taxonomy_term

If things are properly configured, you should not get any errors. Go to https://migration-drupal10.ddev.site/admin/structure/taxonomy/manage/sponsors/overview and look at the list of migrated taxonomy terms.

Migrated Taxonomy Terms

Migrating nodes as users

We could use the migrate_plus.migration.upgrade_d7_node_speaker.yml and migrate_plus.migration.upgrade_d7_user.yml files in the ref_migrations folder as a reference to migrate Drupal 7 nodes into Drupal 10 users. We could also review the upgrade_d7_user migration we created in the previous article. However, for this example, we will create the migration file from scratch again. It is good practice and will help us get a better understanding of the system.

As a reminder, we want to migrate nodes of type speaker as user entities. In Drupal 7, this content type has many fields attached to it. The corresponding Drupal 10 fields were added to the user entity in article 22 by applying recipes and manually creating some. Below is a summary of how Drupal 7 data will be migrated into Drupal 10:

  • The node title will be migrated as the username (name property).
  • The field_email field will be migrated as the user email (mail property).
  • The field_profile_picture field will be migrated into the user_picture image field.
  • The field_website and field_biography fields will be migrated into corresponding Drupal 10 fields with the same name.
  • The field_drupal_org_profile, field_linkedin_profile, and field_x_twitter_profile will be combined into a single field_social_media_links field of type social links.
  • The field_favorite_quote field collection will be migrated into the field_favorite_quote paragraph.

The last two elements of the list above will be covered in the next article. The rest will be addressed below. Additionally, all users migrated from speaker nodes will get the Speaker user role we created in article 22.

Unpublished nodes of type speaker will not be migrated. Also note that even though we are creating user entities, we do not have suitable password information in Drupal 7. After the migration, individual users can trigger a password reset operation or administrators can force this operation in bulk.

Create an upgrade_d7_node_speaker_to_user.yml file in the web/modules/custom/tag1_migration/migrations folder of our Drupal 10 project. We will use the file name, without the .yml extension, as the migration ID. Below is the content of the file:

id: upgrade_d7_node_speaker_to_user
class: Drupal\migrate\Plugin\Migration
migration_tags:
 - 'Drupal 7'
 - Content
 - user
 - tag1_content
label: 'Nodes (Speaker) to user accounts'
source:
 key: migrate
 plugin: d7_node
 node_type: speaker
 high_water_property:
   name: vid
   alias: nr
process:
 name:
   -
     plugin: get
     source: title
 mail:
   -
     plugin: sub_process
     source: field_email
     process:
       value: email
 created:
   -
     plugin: get
     source: created
 changed:
   -
     plugin: get
     source: changed
 status:
   -
     plugin: skip_on_empty
     source: status
     method: row
     message: 'Node was not migrated because it is unpublished.'
 init:
   -
     plugin: get
     source: '@mail'
 roles:
   -
     plugin: default_value
     default_value: [speaker]
 user_picture:
   -
     plugin: sub_process
     source: field_profile_picture
     process:
       target_id: fid
       alt: alt
       title: title
       width: width
       height: height
 field_biography:
   -
     plugin: get
     source: field_biography
 field_website:
   -
     plugin: sub_process
     source: field_website
     process:
       uri: value
       title: title
       options: attributes
destination:
 plugin: 'entity:user'
migration_dependencies:
 required:
   - upgrade_d7_file
 optional: {  }

Note that the source plugin fetches nodes and the destination plugin is configured to create users. Then, in the process section, we map available Drupal 7 node data to Drupal 10 user data.

This example uses different concepts and techniques we have covered throughout the series. We do not want to repeat ourselves too much, but here is a list of highlights from our example:

  • Some Drupal 7 fields use different property names to hold data compared to their Drupal 10 counterparts. In such cases, the sub_process process plugin can be used to account for the changes in property names. This was needed to migrate Drupal 7's Email and URL fields into Drupal 10's mail property and Link field, respectively.
  • The source and destination entity might share a property with the same name but different meanings. In this example, the status property exists both in Drupal 7 nodes and in Drupal 10 users. For nodes, status indicates the publication status: published or unpublished. For users, status indicates whether the user account is active or blocked. In our case, we use the status as retrieved from the node to determine if the record should be migrated or not using the skip_on_empty process plugin. When the node is published, it stores a value of 1, which is sent to the destination status property, meaning the account is active. Reusing a property with the same name will not always be possible. It will depend on the meaning of the property in the two entities and the process plugin chain used to assign the destination property.
  • The init user property stores the email address used for initial account creation. We are reusing the value already assigned in the mail destination property to set the init property.
  • The roles user property expects a flat array structure with the machine names of the roles the user will be assigned. We use the default_value process plugin to assign the Speaker role.
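
To make the point about the status property concrete, here is a sketch of a variation in which migrated accounts should start out blocked instead of active. This is not what our example project does. The pseudo_skip_unpublished name is hypothetical: it is a pseudo-field used only to skip unpublished nodes, so the node status no longer flows into the user status:

```yaml
process:
  # Pseudo-field (hypothetical name). It does not exist on the user entity
  # and is only used to skip rows for unpublished nodes.
  pseudo_skip_unpublished:
    plugin: skip_on_empty
    source: status
    method: row
    message: 'Node was not migrated because it is unpublished.'
  # Every migrated account starts as blocked, regardless of the node status.
  status:
    plugin: default_value
    default_value: 0
```

Separating the two concerns keeps the skip logic tied to the node's meaning of status while the user's status is set explicitly.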

Before executing the upgrade_d7_node_speaker_to_user migration, make sure to account for potential entity ID conflicts as mentioned above. Now, rebuild caches for our new migration to be detected and execute it. Run migrate:status to make sure we can connect to Drupal 7. Then, run migrate:import to perform the import operations.

ddev drush cache:rebuild
ddev drush migrate:status upgrade_d7_node_speaker_to_user
ddev drush migrate:import upgrade_d7_node_speaker_to_user

If things are properly configured, you should not get any errors. Go to https://migration-drupal10.ddev.site/admin/people?role=speaker and look at the list of migrated users.

Migrated Users

Entity properties and fields for user and taxonomy term entities

For reference, below is a list of the entity properties and fields attached to user and taxonomy term entities based on the modules enabled in our example project.

The following are properties and fields in the user entity:

  1. uid (type: integer): User ID. The user ID.
  2. uuid (type: uuid): UUID. The user UUID.
  3. langcode (type: language): Language code. The user language code.
  4. preferred_langcode (type: language): Preferred language code. The user's preferred language code for receiving emails and viewing the site.
  5. preferred_admin_langcode (type: language): Preferred admin language code. The user's preferred language code for viewing administration pages.
  6. name (type: string): Name. The name of this user.
  7. pass (type: password): Password. The password of this user (hashed).
  8. mail (type: email): Email. The email of this user.
  9. timezone (type: string): Timezone. The timezone of this user.
  10. status (type: boolean): User status. Whether the user is active or blocked.
  11. created (type: created): Created. The time that the user was created.
  12. changed (type: changed): Changed. The time that the user was last edited.
  13. access (type: timestamp): Last access. The time that the user last accessed the site.
  14. login (type: timestamp): Last login. The time that the user last logged in.
  15. init (type: email): Initial email. The email address used for initial account creation.
  16. roles (type: entity_reference): Roles. The roles the user has.
  17. default_langcode (type: boolean): Default translation. A flag indicating whether this is the default translation.
  18. field_biography (type: string_long): Field field_biography. Attached to bundle(s): user.
  19. field_favorite_quote (type: entity_reference_revisions): Field field_favorite_quote. Attached to bundle(s): user.
  20. field_social_media_links (type: social_links): Field field_social_media_links. Attached to bundle(s): user.
  21. field_website (type: link): Field field_website. Attached to bundle(s): user.
  22. user_picture (type: image): Field user_picture. Attached to bundle(s): user.

The following are properties and fields in the taxonomy_term entity:

  1. tid (type: integer): Term ID. The term ID.
  2. uuid (type: uuid): UUID. The term UUID.
  3. revision_id (type: integer): Revision ID.
  4. langcode (type: language): Language. The term language code.
  5. vid (type: entity_reference): Vocabulary. The vocabulary to which the term is assigned.
  6. revision_created (type: created): Revision create time. The time that the current revision was created.
  7. revision_user (type: entity_reference): Revision user. The user ID of the author of the current revision.
  8. revision_log_message (type: string_long): Revision log message. Briefly describe the changes you have made.
  9. status (type: boolean): Published.
  10. name (type: string): Name.
  11. description (type: text_long): Description.
  12. weight (type: integer): Weight. The weight of this term in relation to other terms.
  13. parent (type: entity_reference): Term Parents. The parents of this term.
  14. changed (type: changed): Changed. The time that the term was last edited.
  15. default_langcode (type: boolean): Default translation. A flag indicating whether this is the default translation.
  16. revision_default (type: boolean): Default revision. A flag indicating whether this was a default revision when it was saved.
  17. revision_translation_affected (type: boolean): Revision translation affected. Indicates if the last edit of a translation belongs to current revision.
  18. field_logo (type: image): Field field_logo. Attached to bundle(s): sponsors.

Next time, we’ll update this node to user migration to populate the field_favorite_quote and field_social_media_links fields. This will require migrating paragraphs and creating a custom process plugin. Stay tuned.


Image by Anne and Saturnino Miranda from Pixabay