Drupal 8 Migration Survival Strategies

We received word that KWLUG needed to move hosting providers this year. Like an idiot, I took this opportunity to migrate the KWLUG website from Drupal 6 to Drupal 8. This is a giant dump of what I learned. It is so long and so boring I can barely proofread it.

  1. Drupal 8 Migration Survival Strategies
    1. Migration Overview
      1. Local Modules
      2. What's missing
      3. Other Notes
    2. About the KWLUG website
    3. Getting Started
      1. Setting up databases
      2. Migration module and migration group
    4. Setting System UUID
    5. Reducing Content Types
      1. Classifying content types
    6. Enabling Display Fields in Migrated Content
    7. Textfields and Textareas
    8. Deleting Spam Accounts
    9. Simplifying User Roles
    10. Fixing Text Formats
    11. Merging Presentations and Agendas
      1. Migrating Agendas
      2. Migrating Orphaned Presentations
      3. Creating Redirects
    12. Linking Nodes via Entity References
      1. FLOSS Fund Nominees
      2. Locations
      3. Podcasts and Vidcasts
    13. Fixing Dates and Timezones
    14. Converting Flexinodes
    15. Migrating Attachments
      1. File migration location
      2. Image locations
    16. Migrating Comments
      1. RDF Module Breaks Rollbacks
    17. Filtering Taxonomies
    18. Setting Redirects Using a CSV Source
    19. Where to Do What
      1. Migration Patterns
      2. Source Plugins
      3. YAML Mapping Files
      4. Process Plugins
      5. Destination Plugins
      6. Migration IDs and Maps
      7. Troubleshooting
    20. Failures and Improvements
  2. Sidebar!

I budgeted 2-3 weeks for the data migration; it took almost two months, and is barely "good enough" to get by. As with everything else in Drupal, the learning curve was steep, and I spent hours and hours struggling to understand how the Drupal migration system wanted me to approach problems. Things that seemed simple on the surface took days of frustration and effort to get working.

The lack of documentation around this process was particularly difficult. I found myself reading the same dozen blog posts again and again, trying to figure out how to generalize their examples to my situation. My hope is that writing out these conceptual difficulties will save you time in figuring out your issues. All of the code I wrote for this will be on my GitHub account: https://github.com/pnijjar/kwlug-drupal8-migration. I can also produce a tarball on request.

On the plus side, the migration team has put a lot of work into doing Drupal 6 to Drupal 8 upgrades, and this effort provided good scaffolding upon which I built my migration. In addition, configuration management saved my tofu again and again. It is probably the best thing about Drupal 8. With it, I can use the GUI to configure the site and then preserve that configuration for future migrations. This made migrations far more repeatable, and thus easier to develop.

This blog post will focus on migration, as opposed to site building or setting up a development environment. I cover those topics in a companion post.

I refer to Drupal 8 as "D8" and Drupal 6 as "D6" a lot.

Believe it or not, this gigantic blog post does not document every single migration I did on the site. I tried to include only things that were interesting, and/or which other people might benefit from seeing.

UPDATE: I delivered a talk for the Waterloo Region Drupal User Group about D8 migrations. Here are the slides and here are the slide sources.

Migration Overview

Here is an outline of my migration journey:

Throughout the process I kept track of the following things (in a computer file. Not in my head!):

Keeping track of these things was enormously helpful, because it served as a checklist of things to remember when deploying the site.

Local Modules

I created several custom modules for this migration. I could probably have consolidated them if I was wiser, but oh wells.

kwlug_migrate is the main migration module.

To run a migration I first enabled this module, and then ran

drush migrate-import --verbose --execute-dependencies --group=kwlug_migrate --yes

kwlug_content_types contains configuration information for the site that was not directly related to migration. This included the following:

kwlug_dependencies is a stupid little module consisting of a single file (kwlug_dependencies.info.yml). Its purpose is to list dependencies for the project that were to be enabled. As such it was a poor-man's install profile. (Unlike a real install profile, you cannot install themes this way.)

What's missing

Despite the Migrate team's work, there are a bunch of things that do not upgrade cleanly. The best place to start is here: https://www.drupal.org/docs/8/upgrade/known-issues-when-upgrading-from-drupal-6-or-7-to-drupal-8

Here are some of the things that burned me:

Other Notes

Drupal 8 is an I/O hog. Use an SSD on your development machine if you possibly can. I do not know why Drupal 8 in general and migrations in particular hit the database so hard, but they do. (I am not the only one who finds Drupal 8 slow: https://deekayen.net/drupal8-xdebug-installer-timeout .)

I disabled Drupal cron on my development sites because running cron slowed everything to a crawl for over half an hour. (See "Drupal 8 is an I/O hog" above.)

I wrote a bunch of local scripts to make migrations more repeatable. I put those in the bin/ folder of the code for this project.

About the KWLUG website

Before proceeding to specific examples I will waste some time talking about the structure of the data I was migrating, and some design decisions I made.

The KWLUG website has been around since Drupal 4. The current iteration has been around since 2005. Originally we had planned to use KWLUG as a content generation hub: members would contribute reviews and forum posts and blogs to the site. This never took off, and KWLUG morphed into an information site focused on meetings and meeting announcements.

In migrating http://kwlug.org, I had a number of concrete objectives in mind:

Getting Started

Setting up databases

The following guide is pretty good for getting the database set up: http://affinitybridge.com/blog/migrating-from-drupal-6-to-drupal-8

There are parts in that blog post that use Drupal Console, but I was not able to get Drupal Console working on my setup, so I just made my YML files manually.

There is a migration GUI, but don't bother with it. It times out for even small migrations. Use Drush instead.

For some reason I believed that the settings.local.php wanted both a $databases['migrate']['default'] and $databases['upgrade']['default'] entry pointing to the D6 database. So I did the following to set them both to be equal:

// Database entry for `drush migrate-upgrade --configure-only`
$databases['upgrade']['default'] = array (
  'database' => 'd6_db_name',
  'username' => 'd6_db_user',
  'password' => 'd6_db_password',
  'prefix' => '',
  'host' => 'localhost',
  'port' => '3306',
  'namespace' => 'Drupal\\Core\\Database\\Driver\\mysql',
  'driver' => 'mysql',
);

$databases['migrate']['default'] = $databases['upgrade']['default'];

To generate the initial set of migration settings, I then ran:

drush migrate-upgrade --configure-only 

This generated configurations which I could then export:

drush config-export --destination=/tmp/migrate01

Then I copied the migration .yml files that began with migrate_plus.migration. to a new folder. These would be the basis files for my migration.

Migration module and migration group

To set up the kwlug_migrate module, I did the following:

The kwlug_migrate.info.yml file looked like this:

name: kwlug_migrate
type: module
description: Migrate content from Drupal 6 to Drupal 8
core: 8.x
package: Custom
dependencies:
  - migrate_plus
  - migrate_drupal
  - migrate_tools
  - migrate_upgrade
  - kwlug_content_types

This was actually enough to try a migration:

drush migrate-import --verbose --execute-dependencies --yes

but the migration took a long time and did not do what I wanted. The next step was to set up a migration group. I called mine kwlug_migrate, because I name things creatively.

To set the migration group I added a file to the config/install folder called migrate_plus.migration_group.kwlug_migrate.yml. It defined the migration group as follows:

id: kwlug_migrate
label: D6 imports
description: Content to import to the new site
source_type: Drupal 6
shared_configuration:
  source:
    key: upgrade

This file might not even be necessary. What is necessary is selecting a target YAML file (say migrate_plus.migration.upgrade_d6_node_blog.yml) and changing the following line from:

migration_group: migrate_drupal_6

to

migration_group: kwlug_migrate

Then I reran the migration as:

drush migrate-import --group=kwlug_migrate --verbose --execute-dependencies --yes

and Drupal attempted to migrate everything in the migration group (in my case upgrade_d6_node_blog) and all the associated dependencies (regardless of which migration group they are in). It is nice to track down those dependencies and put them in the kwlug_migrate group as well. Then you will have a set of YAML files you can keep (because they are in the migration group) and a set you can discard.

Setting System UUID

If you install the initial system with drush site-install then Drupal sets a UUID. Then when you try to override certain configurations (in my case system.site.yml to change the front page display) you may get messages like Site UUID in source storage does not match the target storage. This problem is documented here: https://github.com/drush-ops/drush/issues/1625 .

The quick fix is to explicitly set the UUID of the site after it is installed, so it matches the UUID in system.site.yml:

drush cset system.site uuid 3112d604-7bb2-4dba-b418-f4f542f2682c --yes

Reducing Content Types

I discovered that I had a number of content types (blogs, pages, locations, book) that were all effectively the same, in the sense that they had the same sets of fields. I guess using different content types to semantically differentiate content is okay, but I decided to consolidate these types and differentiate them in a different way.

Take the example of locations. The migration for these is specified in migrate_plus.migration.upgrade_d6_node_location.yml. The source and destination sections of this YAML file originally looked like this (with all other sections omitted):

source:
  plugin: d6_upload_node
  node_type: location
  constants:
    bundle_type: location

destination:
  plugin: 'entity:node'
  default_bundle: location

I wanted all location nodes to be turned into pages. To do this, I modified the destination bundle as follows:

source:
  plugin: d6_upload_node
  node_type: location
  constants:
    bundle_type: location

destination:
  plugin: 'entity:node'
  default_bundle: page

Of course, I needed to ensure that all the target fields for pages were specified in the YAML file as well.

Classifying content types

I wanted to maintain distinctions between locations and other page types. My original thinking was to use a taxonomy term for each page type, and assign that taxonomy term during migration. But this article (which is well worth reading) convinced me otherwise: http://blog.dcycle.com/blog/83/what-content-what-configuration/ . This article argues that taxonomy terms are data that can be changed at any time. Furthermore taxonomy terms are kept in the database, not in Drupal configuration (which could be exported into YAML files). The suggested solution was to add a select field to my page content type. This field would have a fixed set of values -- one for each content type.

To create this I used the (Drupal 8) GUI:

The field.storage.node.field_page_category.yml is almost editable by hand, in case you want to add other content types to the list later on.

The next step was to assign the content type in the migration YAML file. To do this for location was fairly easy, since every single location would have the same value. I started by adding a constant to the source section of the YAML file:

source:
  plugin: d6_upload_node
  node_type: location
  constants:
    category: 'Meeting Location'
    bundle_type: page

and then assigning that category to the field:

process:
  [stuff snipped]

  field_page_category: constants/category

Enabling Display Fields in Migrated Content

At some point I was convinced that my custom fields were being migrated properly, but they were not showing up when I displayed nodes. When I navigated to the associated content types, the fields were listed as "disabled" in the "Manage Form Display" tab. Enabling these fields in the "Manage Form Display" and "Manage Display" tabs makes the (populated!) fields display properly.

The known issues page (https://www.drupal.org/docs/8/upgrade/known-issues-when-upgrading-from-drupal-6-or-7-to-drupal-8) acknowledges that this is a problem, but the listed solution is unsatisfactory: after each migration you are supposed to manually re-enable the fields. That is awful, so here is a better way:

The reason you import the configuration after the main migration is that these .yml files have a bunch of dependencies, and including all of these dependencies is messy and fragile.

Of course, every time you update the content type with new fields (or new orderings of the fields, or new widgets for field display...) then you have to update these files.

Textfields and Textareas

Say your Drupal 6 site has a content-type with a string field. That string field is set as follows:

When you migrate this field it will migrate, but will be displayed as a Textarea (with multiple lines of text). This is due to ambiguity in migrating the field: https://www.drupal.org/node/1117028 .

I tried a bunch of automated ways to set this information during the migration, but gave up. The easy way to deal with this is to alter the Drupal 6 database: set each affected text field to have a maximum length of 255. Then the migration will assign the right type, and the forms will have textfield widgets.

Deleting Spam Accounts

Instead of attempting to delete spam accounts in the D6 site directly, I got rid of them during the migration. To do this, I wrote a custom source plugin for users (ContributingUser.php). I defined a "contributing user" as a user that had authored a node. Then in the plugin I had the following query() method:

/**
 * {@inheritdoc}
 */
public function query() {
  // Make a subquery of all the UIDs who have authored nodes.
  $node_authors = $this->select('node','n')
    ->fields('n', array('uid'));

  return $this->select('users','u')
    ->fields('u', array_keys($this->baseFields()))
    ->condition('u.uid', 0, '>')
    ->condition('u.uid', $node_authors, 'IN');

} // end query

The first query finds all authors of a node, and the second picks only users that are in that list of authors. This filters out any account that has not authored a node, which includes all spam accounts (and some legitimate lurker accounts, unfortunately).

This technique can be used to filter out all kinds of input, so long as you can distinguish legitimate from illegitimate data using a query.
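To make the effect of the two-part query concrete, here is a plain-PHP model of which rows survive. It is only an illustration: the sample rows and the function name contributing_users() are made up, and in the real plugin the filtering happens in SQL on the D6 database.

```php
<?php
// Plain-PHP model of the filter. Roughly equivalent SQL (an approximation,
// not captured from Drupal):
//   SELECT u.* FROM users u
//   WHERE u.uid > 0 AND u.uid IN (SELECT n.uid FROM node n)

/**
 * Keep only users who have authored at least one node.
 *
 * @param array $users Rows shaped like the D6 users table (sample data).
 * @param array $nodes Rows shaped like the D6 node table (sample data).
 */
function contributing_users(array $users, array $nodes) {
  // The subquery: every uid that appears as a node author.
  $author_uids = array_unique(array_column($nodes, 'uid'));

  // The outer query: drop uid 0 (anonymous) and anybody who is not an author.
  return array_values(array_filter($users, function ($u) use ($author_uids) {
    return $u['uid'] > 0 && in_array($u['uid'], $author_uids, true);
  }));
}

$users = [
  ['uid' => 0, 'name' => ''],
  ['uid' => 1, 'name' => 'admin'],
  ['uid' => 7, 'name' => 'spammer'],
  ['uid' => 9, 'name' => 'lurker'],
];
$nodes = [
  ['nid' => 10, 'uid' => 1],
  ['nid' => 11, 'uid' => 1],
  ['nid' => 12, 'uid' => 0],
];

print_r(array_column(contributing_users($users, $nodes), 'name'));
```

In this sample only "admin" is kept. The lurker is dropped along with the spammer, which is the trade-off described above.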

I guess I should point out a couple of other elements of the plugin. Firstly, I reused most of the existing User plugin by extending it:

use Drupal\migrate\Row;
use Drupal\user\Plugin\migrate\source\d6\User as D6User;

class ContributingUser extends D6User {

I also had to define an ID for this plugin, which is done in a comment:

/**
 * @MigrateSource(
 *   id = "d6_contributing_user"
 * )
 */

Then in the migrate_plus.migration.upgrade_d6_user.yml I had to specify the use of this plugin:

source:
  plugin: d6_contributing_user

I made one other change of note: I disabled all user accounts, with the idea that active users could have their accounts re-enabled later. This required setting a default value for the status field in the YAML file:

status:
  plugin: default_value
  default_value: 0

(Yes, being allowed to use 0 as a constant when you have to define strings feels inconsistent to me as well.)

Simplifying User Roles

In addition to having too many spam users, the old D6 site had accumulated a lot of spurious user roles over the years ("librarian", "speaker") that were no longer needed. I decided to start fresh by including only the built-in "administrators", "authenticated users" and "anonymous users", then adding other roles in the new website as required.

This meant I had to filter out roles somehow. To do this I changed the migrate_plus.migration.upgrade_d6_user_role.yml file to migrate only the three built in roles and ignore the rest. In the process section, I changed the id stanza from:

id:
  -
    plugin: machine_name
    source: name
  -
    plugin: user_update_8002

to

id:
  -
    plugin: machine_name
    source: name
  -
    plugin: static_map
    source: name
    bypass: false
    map:
      'administrator': 'administrator'
      'authenticated_user': 'authenticated'
      'anonymous_user': 'anonymous'

# plugin: user_update_8002

(Once again, I am mystified why I was allowed to use straight strings on the right hand sides of the map. YAML is weird.)

The map itself is the easy part of this migration: some role names changed between D6 and D8. The secret of this static map is the bypass: false part, which states that the migration should ignore any entry that is not in the static map.

I am sure the plugin: user_update_8002 does something very important, but I didn't know what it was and the migration seemed okay without it, so I commented it out.
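The effect of bypass: false can be modelled in a few lines of plain PHP. This is a simplified sketch, not Drupal's implementation: the function name map_role() is made up, and returning NULL stands in for the way the real static_map plugin skips a row when a value is not in the map and no default_value is configured.

```php
<?php
/**
 * Rough model of a static_map lookup with bypass: false and no default_value.
 *
 * Returns the mapped value, or NULL to signal "skip this row".
 */
function map_role($name, array $map) {
  return array_key_exists($name, $map) ? $map[$name] : null;
}

// Same map as in the YAML stanza above.
$map = [
  'administrator' => 'administrator',
  'authenticated_user' => 'authenticated',
  'anonymous_user' => 'anonymous',
];

foreach (['administrator', 'librarian', 'speaker', 'authenticated_user'] as $role) {
  $id = map_role($role, $map);
  echo $role . ' => ' . ($id === null ? '(skipped)' : $id) . "\n";
}
```

With bypass: true the unmapped names would instead pass through unchanged, so "librarian" and "speaker" would be migrated as roles.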

Fixing Text Formats

This is also acknowledged in the "Known Issues" page, but again the solution was not obvious. Some input filters (notably the PHP input filter) are no longer supported in Drupal 8, and others are missing. These are replaced by something called filter_null, which messes up the site.

Symptoms that you are affected include:

There is a pretty good description of the problem here: https://www.hywel.me/drupal/2016/02/11/a-website-upgrade-from-drupal-6-to-drupal-8-part-4.html

The issue is that some filter or setting in the text filter is missing. PHP filter is one culprit, but in my migration there was some other problem that affected a lot of my content.

My solution was to migrate filter formats early in the migration process. Drupal 8 provides some default text formats (in the standard installation profile?) and I mapped my old formats to those.

The process plugin was called MapKWLUGFormatFilter, and it lived in the src/Plugin/migrate/process folder of the kwlug_migrate custom module.

The heart of the function was very easy. Here is an excerpt from the transform() method:

public function transform($value, 
    MigrateExecutableInterface $migrate_executable, 
    Row $row, 
    $destination_property) {

  $filter_mapping = array(
    0 => 'restricted_html', // unknown but it exists
    1 => 'restricted_html', // filtered_html
    2 => 'full_html',       // php_code
    3 => 'full_html',       // full_html
    4 => 'restricted_html', // unknown. Some image format that has been lost.
    5 => 'plain_text',      // messaging plain text. Unused.
  );

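  // Fall back to restricted_html for any format ID that is not listed
  // above. Checking with isset() avoids an undefined-index notice.
  $retval = isset($filter_mapping[$value])
    ? $filter_mapping[$value]
    : 'restricted_html';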

  return $retval;
} // end transform

(One big difference between "Basic HTML" and "Restricted HTML" is the use of CKEditor, I think.)

I found the filter mappings that existed for my Drupal 6 site by looking in the filters and filter_formats tables in the database.

I used this process plugin in content types that had a node body. Basically, I would change:

body/format:
  plugin: migration
  migration: upgrade_d6_filter_format
  source: format

to

body/format:
  plugin: map_kwlug_format_filter
  source: format

I think I do not need the migration plugin in this stanza because the migration plugin looks up ID maps of migrated filters, and in this case I am setting the filters with a static map. (This explanation might be wrong.)

I also found that I wanted to customize these filter formats. I used the usual trick: make the changes in the GUI, run drush config-export, and copy the relevant filter.format.*.yml files to premigrate_settings of kwlug_content_types.

Merging Presentations and Agendas

Wow this took a long time. The basic idea was to merge two content types: "Meeting Agendas", which mostly had a meeting date and location, and "Presentations" which listed topics for the meetings.

Most meeting agendas were associated with exactly one presentation, but some early meeting agendas were associated with two. A few presentations were not associated with any meeting agendas.

As mentioned above, a module called Node Relativity associated agendas with presentations, but it also associated agendas with a different content type called 'FLOSS Fund Nominees'. So I had to be careful about picking proper associations.

With this in mind, here was my strategy:

I also had to migrate auxiliary content such as attachments (which were typically attached to presentations) and images, but I will document these later.

Migrating Agendas

I did the bulk of the work in a source plugin for agendas, called AgendaNode (which extended Drupal\node\Plugin\migrate\source\d6\Node). Surprisingly, I did not need to override the query() method. Instead I did the bulk of the work in the prepareRow() method.

To find Presentation nodes associated with a particular agenda, I had to write a query that looked through the Node Relativity tables for matches:

$nid = $row->getSourceProperty("nid");

// Look for associated presentation topics in relativity table
$query = $this->select('node', 'p')
  ->fields('p', ['nid','title'])
  ->condition('p.type', 'presentation');
$query->join('relativity', 'r', 'r.nid = p.nid');
$query->condition('r.parent_nid', $nid, '=');

$query->join('node_revisions', 'nr', 'nr.nid = p.nid');
$query->addField('nr', 'body');

$presentation_info = $query->execute()
  ->fetchAll();

Now $presentation_info contained zero or more presentations. I looped through this array, grabbed each presentation's data, and populated variables for the YAML file. For example, here is an extract where I took the body texts of each presentation and appended them to the Agenda body (this is not identical to the actual code, but it is close). I also collected the NIDs:

if ($presentation_info) { 

  foreach ($presentation_info as $p) { 

    $body_so_far = $row->getSourceProperty('body');
    $pbody = $p['body'];

    if ($body_so_far) { 
      $body_so_far = $body_so_far . "\n\n* * *\n" . $pbody;
    } else { 
      $body_so_far = $pbody;
    } // end if body

    $row->setSourceProperty('body', $body_so_far);
    $row->setDestinationProperty('body', $body_so_far);

  } // end foreach

} // end if presentation_info exists

You can see my confusion about source and destination properties here:

$row->setSourceProperty('body', $body_so_far);
$row->setDestinationProperty('body', $body_so_far);

I now believe you should only be setting source properties in source plugins. Setting the destination did not harm anything, but it was not effective.

(You can also see my confusion in getting the existing body at the beginning of each loop iteration and setting it at the end of each iteration, instead of pulling that functionality out of the loop. Oops. I am not changing it now, though.)
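For comparison, here is a sketch of the same merge with the get and set pulled out of the loop. It is written as a standalone function so it can run outside Drupal; the name merge_bodies() is made up, and it assumes each fetched row is an array with a 'body' key, as in the excerpt above.

```php
<?php
/**
 * Append presentation bodies to an agenda body, separated by a "* * *" rule.
 *
 * @param string|null $agenda_body Existing agenda body (may be empty).
 * @param array $presentations Rows with a 'body' key.
 *
 * @return string The merged body.
 */
function merge_bodies($agenda_body, array $presentations) {
  $body = (string) $agenda_body;
  foreach ($presentations as $p) {
    // Only add the separator when there is already some text.
    $body = ($body !== '') ? $body . "\n\n* * *\n" . $p['body'] : $p['body'];
  }
  return $body;
}

// Inside prepareRow() the call would then be a single read and a single write:
//   $row->setSourceProperty('body',
//     merge_bodies($row->getSourceProperty('body'), $presentation_info));

echo merge_bodies('Agenda text', [['body' => 'Talk one'], ['body' => 'Talk two']]), "\n";
```

One small difference from the original loop: this version treats only an empty string as "no body yet", whereas the original truthiness test would also treat a body of "0" as empty.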

This example is cheating because instead of collecting an array of associated presentation bodies, I am just concatenating them into one big body. There are other examples where I did have to collect arrays of data, but I will cover them below.

In addition to grabbing presentation info, I had to get data from custom fields that were already associated with the agenda (meeting MCs, meeting dates and locations):

// I do not know why this stuff doesn't migrate itself, 
// but whatever.
$query_agenda = $this->select('content_type_agenda','c')
  ->fields('c', ['field_emcee_uid', 'field_date_value', 
      'field_location_nid'])
  ->condition('c.nid', $nid, '=');
$agenda_info = $query_agenda->execute()
  ->fetchAll();

if ($agenda_info) { 

  // There SHOULD be only one row. I guess we are taking the last 
  // value if there are multiple. 
  // Lots of these will be NULL, though. 
  // Also we just want to append to the body if there is an emcee.
  foreach ($agenda_info as $a) { 
    $row->setSourceProperty('meeting_date', $a['field_date_value']);
    $row->setSourceProperty('emcee_uid', $a['field_emcee_uid']);
    $row->setSourceProperty('meeting_location_nid', $a['field_location_nid']);

   } // end foreach agenda_info
} // end if agenda_info

Migrating Orphaned Presentations

The key to this migration was to filter out all nodes associated with Agendas, since those presentations are migrated in the Agenda migration. Thus I created a source plugin (PresentationNode.php) that overrode the query() method:

public function query() {
  // Make a subquery of all the NIDs in the relativity table.
  // Return presentation nodes not in this set.
  $linked_presentations = $this->select('relativity', 'r')
    ->fields('r', array('nid'));

  $parent_q = parent::query();
  $parent_q->condition('n.type', 'presentation')
    ->condition('n.nid', $linked_presentations, 'NOT IN');

  return $parent_q;

} // end query

The rest of this migration was fairly standard. The hard part was in figuring out that I never want to touch presentation nodes that are associated with agendas.

Creating Redirects

In the new website I wanted linked presentation nodes to disappear, but I wanted the old URLs to be preserved. Thus I wanted redirects from merged presentation nodes to the agendas that digested them.

Drupal migrations have "ID maps" that map NIDs from the D6 site to entity IDs in the D8 one. I kept thinking that I could read these identity maps to create the redirect, but this was stupid. The right way to do this was to go through the presentation nodes a second time, this time selecting those nodes that HAD been merged (MergedPresentationNodes.php). Then I needed to fill in the YAML file, but in my initial migration I could not find an appropriate YAML file.

After installing the redirect D8 module, I found a template: modules/contrib/redirect/migration_templates/d6_path_redirect.yml. This specified the fields I needed to fill in.

There were a few tricks in this YAML file, so I will reproduce big chunks of it here:

source:
  plugin: d6_merged_presentation_node
  node_type: presentation

Even though my target was to create a redirect, I could use a node content type as the source. I found this interesting.

constants:
  nodelist: node/
  internal: internal:/

In the D8 database I saw that source redirects took the form node/<entity-id> but that targets were of the form internal:/node/<entity-id>. These constants help create those strings.

process:
  # If you omit this will it auto-generate?
  # rid: rid

This part confused me a lot. This was not a migration from redirects to redirects, so populating the rid (redirect ID?) did not make sense. I tried using things like the presentation node ID, but that did not work well either. I found that omitting the rid entirely made it autogenerate, which is a neat trick that can be used elsewhere.

  redirect_source:
    plugin: concat
    source:
      - constants/nodelist
      - nid

  # This is broken broken broken for multiple presentations.
  # There is no easy way to fix this without an iterator, though.
  redirect_redirect:
    plugin: concat
    source:
      - constants/internal
      - constants/nodelist
      - agenda_nid

The redirect_source and redirect_redirect fields came straight out of the template. The concat plugin allowed me to build (simple) strings for the redirections.
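As a worked example of what the two concat stanzas produce, here is the same string-building in plain PHP. The function name redirect_pair() and the node IDs are hypothetical; the two constants are the ones defined in the YAML above.

```php
<?php
/**
 * Build the source and target strings for one redirect, mimicking the
 * two concat stanzas.
 */
function redirect_pair($presentation_nid, $agenda_nid) {
  $nodelist = 'node/';      // constants/nodelist
  $internal = 'internal:/'; // constants/internal

  return [
    'redirect_source' => $nodelist . $presentation_nid,
    'redirect_redirect' => $internal . $nodelist . $agenda_nid,
  ];
}

// Hypothetical IDs: presentation 1234 was merged into agenda 5678.
print_r(redirect_pair(1234, 5678));
```

For these made-up IDs the source is node/1234 and the target is internal:/node/5678, matching the two forms seen in the D8 database.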

destination:
  plugin: 'entity:redirect'

This was the magic that made a redirect and not a node.

In this case it did not matter whether the agenda nodes had been created be

Linking Nodes via Entity References

I wanted the Agenda content type to be the centre of the new website. Agendas needed to refer to locations, podcasts, video recordings, and other nodes associated with particular meetings. Pre-migration the set of links was a mishmash:

FLOSS Fund Nominees

There is not much that is new to say. Since Node Relativity already related nominees to agendas, I took the same approach that I did when merging presentations into agendas above. But instead of merging strings I pulled out nominee NIDs and used them to populate the YAML file:

field_floss_fund_nominee_link/target_id: floss_fund_nominee

(Truth be told, figuring out HOW to populate these entity references caused me a lot of grief. But that was my own fault.)

Locations

Since the NIDs of locations were already included with the Agenda as nodereferences, populating the entity reference links was not hard. More challenging was getting the Agenda form view to select only locations, as opposed to every possible node of type page. To do this I went into the GUI.

I navigated to the Agenda content type, found the location field, and changed the Reference method from "Default" to "Views: Filter by an entity reference view". In order to do this I had to make a view (duh). The view had the following properties:

After doing this (and using configuration export to save the view and field settings in my config) the Agenda form restricted possible meeting locations to nodes of type "Meeting Location".

Podcasts and Vidcasts

Creating entity reference fields for podcasts and vidcasts was not that difficult: I just used the GUI to add them, and then used the magic of configuration export to retain those settings. Populating the fields was a different matter, because in the D6 database these fields were not formally linked in any way.

Fortunately, most podcasts and vidcasts followed a standard naming scheme: "YYYY-MM: ". In the Agendas I had a meeting date stored. So in the source plugin AgendaNode.php I correlated the two. Here is some of the code I used to correlate podcasts with meeting agendas:

$meeting_date_raw = $row->getSourceProperty('meeting_date');
// Looks like: 2016-03-07T00:00:00

$is_match = preg_match('/^\d\d\d\d-\d\d/', $meeting_date_raw,
  $substr_array);

// There is NO WAY that there should not be a match, because
// all (post-flexinode) agendas have a date.
// HOWEVER, some podcasts should not be associated with some
// agendas (laptop rescue missions). Unfortunately this ruins
// SFD podcasts, which need to be added manually.
if ($is_match && $row->getSourceProperty('presentation_nid')) {

  $meeting_YYmm = $substr_array[0] . ":%";

  // Look for podcasts
  $query = $this->select('node', 'n')
    ->fields('n', array('nid'));
  $query->condition('n.title', $meeting_YYmm, 'LIKE');
  $query->condition('n.type', 'podcast');
  $row->setSourceProperty('podcast_nid', $query->execute()->fetchAll());

} // end if is_match

As the comments indicate, my first attempt had unintended consequences: some agendas (Laptop Rescue Missions) had no associated presentations, but were getting populated with podcasts from presentations held in the same month. Thus I needed to filter out these agendas, which messed up a handful of other Agendas that DID have podcasts but did NOT have presentation nodes (Software Freedom Day celebrations). I opted to fix those manually afterwards.

Overall I am unreasonably happy with this hack, because it will save me hours of tediously associating podcasts and vidcasts with meeting nodes.

Note that this database query is gross and unsafe. I opted to trust the user input because I know who generated it, but if you are working with untrusted data you should not be dumb like me.

(I feel this code is fragile to the problem of multiple podcasts being associated with a single Agenda, but I do not think that happened in our D6 site. Sorry, future me.)
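The date-to-title matching can be isolated into a small helper that is easy to test on its own. This is a sketch under assumptions: the function name podcast_title_pattern() is made up, and it only builds the LIKE pattern; running the query stays in the source plugin.

```php
<?php
/**
 * Build a SQL LIKE pattern such as "2016-03:%" from a raw meeting date.
 *
 * Returns NULL when the date does not start with YYYY-MM. Because the
 * regular expression only admits digits and a hyphen, the matched prefix
 * cannot contain LIKE wildcards (% or _).
 */
function podcast_title_pattern($meeting_date_raw) {
  if (!preg_match('/^\d{4}-\d{2}/', (string) $meeting_date_raw, $m)) {
    return null;
  }
  return $m[0] . ':%';
}

echo podcast_title_pattern('2016-03-07T00:00:00'), "\n";
var_dump(podcast_title_pattern('not a date'));
```

If more than one podcast title matches the pattern, the query returns several rows, so the multiple-podcast worry above still applies; this helper does not address it.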

Fixing Dates and Timezones

Ugh. Dates and times. Ugh.

The Agenda content type came with a "Meeting Date" field. The Drupal 6 site regarded this field as being of type Date. As a plain date it had no hour or minute fields, and I am not sure it was timezone-aware.

Timezones mess everything up. The dates get migrated, but for some reason that I have forgotten the hours and minutes become significant, and the dates of the meetings sometimes switch. To fix this, I had to manually set timezones on date fields. In the source plugin AgendaNode.php, I wrote code to set the timezones properly. In the library imports of the module I had:

use \DateTime;
use \DateTimeZone;

and then in prepareRow() I put:

// This should not be hardcoded?
$LOCAL_TIMEZONE = 'America/Toronto';
$DEFAULT_MEETING_TIME = "19:00:00";
$EMPTY_MEETING_TIME = "00:00:00";

// Ugh. Times get stored at 00:00:00, then Drupal does 
// timezone magic to make the time incorrect. So munge the 
// dates. 
if ($a['field_date_value']) { 

  // This should look like 2016-12-26T00:00:00, but pad the result
  // in case the time part is missing.
  list($date, $time) = array_pad(
    explode('T', $a['field_date_value'], 2), 2, NULL);

  if ((!$time) || ($time === $EMPTY_MEETING_TIME)) { 
    $target_time = $DEFAULT_MEETING_TIME;
  } else { 
    $target_time = $time;
  } // end set time

  $localdate = new DateTime( $date . "T" . $target_time,
      new DateTimeZone($LOCAL_TIMEZONE));

  $localdate->setTimeZone(new DateTimeZone('UTC'));

  $munged_date = $localdate->format('Y-m-d\TH:i:s');
  $row->setSourceProperty('meeting_date', $munged_date);

} // end if field_date_value exists

This code sets a meeting time in local time, and then converts the meeting time to UTC for storage in the database.

Unfortunately this sets all meeting times to 7:00pm, which is in fact incorrect for some of our meetings. I thought about being more clever, but in the end opted to fix the incorrect time fields manually.

I am still not certain why I could not use plain Date fields, which appear to migrate properly: https://www.drupal.org/node/2566779#comment-11783277.
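Here is a standalone sketch of what I think is going on, assuming Drupal treats the stored value as UTC and renders it in the site's timezone. The function name meeting_date_to_utc() and its default arguments are my own illustration and are not part of AgendaNode.php.

```php
<?php
// Sketch only: hypothetical helper that mirrors the prepareRow() logic above.

function meeting_date_to_utc($raw, $local_tz = 'America/Toronto', $default_time = '19:00:00') {
  // Tolerate values with or without a "T00:00:00" suffix.
  $parts = explode('T', $raw, 2);
  $date = $parts[0];
  $time = isset($parts[1]) ? $parts[1] : '';
  if ($time === '' || $time === '00:00:00') {
    $time = $default_time;
  }
  // Interpret the date and time as local, then convert to UTC for storage.
  $local = new DateTime($date . 'T' . $time, new DateTimeZone($local_tz));
  $local->setTimezone(new DateTimeZone('UTC'));
  return $local->format('Y-m-d\TH:i:s');
}

// The failure mode: midnight is read as UTC and then shown in local time,
// which lands on the previous evening.
$naive = new DateTime('2016-03-07T00:00:00', new DateTimeZone('UTC'));
$naive->setTimezone(new DateTimeZone('America/Toronto'));
echo $naive->format('Y-m-d H:i'), "\n";                 // 2016-03-06 19:00

// The workaround: 7pm local time, expressed in UTC.
echo meeting_date_to_utc('2016-03-07T00:00:00'), "\n";  // 2016-03-08T00:00:00
echo meeting_date_to_utc('2016-07-04T00:00:00'), "\n";  // 2016-07-04T23:00:00
```

The stored UTC value can fall on the next calendar day (as in the March example), but it displays as 7:00pm on the correct date once it is converted back to local time.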

Converting Flexinodes

As mentioned above, flexinodes were an early competitor to CCK. I guess they had been migrated from the Drupal 5 site to Drupal 6. Flexinode agendas were still displayed in the D6 site, but were not editable (I think because they were missing node_revision entries in the database).

The key to migrating flexinodes was in understanding the database structure. In the D6 database, the flexinode_type table provided a list of the "content types" created using flexinodes:

mysql> select ctype_id,name from flexinode_type;
+----------+--------------------+
| ctype_id | name               |
+----------+--------------------+
|        1 | Presentation topic |
|        2 | Meeting Agenda     |
+----------+--------------------+
2 rows in set (0.00 sec)

Associated with these types are fields, which are defined in the flexinode_field table:

mysql> select field_id,ctype_id,label,field_type from flexinode_field;
+----------+----------+------------------------+--------------+
| field_id | ctype_id | label                  | field_type   |
+----------+----------+------------------------+--------------+
|        2 |        1 | Abstract               | textarea     |
|        3 |        1 | Presentation Material  | textarea     |
|        4 |        1 | Reference material     | url          |
|        5 |        1 | Attachment             | file         |
|       11 |        2 | Pre-meeting Topic      | presentation |
|       12 |        2 | Location               | textarea     |
|       10 |        2 | Presentation Topic     | presentation |
|       13 |        2 | Meeting host / emcee   | usergroup    |
|       14 |        2 | Pre-meeting activities | textfield    |
|       15 |        2 | Introduction           | textarea     |
+----------+----------+------------------------+--------------+

Thus the source plugin for flexinode data had to pick out data by field_id and associate them with the proper fields in the D8 content types. Fortunately I was again merging "Presentation topic" and "Meeting Agenda" nodes.

Data for these fields was stored in the flexinode_data table, which has a definitely weird schema:

mysql> desc flexinode_data;
+-----------------+------------------+------+-----+---------+-------+
| Field           | Type             | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| nid             | int(10) unsigned | NO   | PRI | 0       |       |
| field_id        | int(10) unsigned | NO   | PRI | 0       |       |
| textual_data    | mediumtext       | NO   |     | NULL    |       |
| numeric_data    | int(10) unsigned | NO   |     | 0       |       |
| serialized_data | mediumtext       | NO   |     | NULL    |       |
+-----------------+------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

Different fields put data in different places. For example, here is field ID 10, which is like Node Relativity for Flexinodes:

mysql> select * from flexinode_data where field_id=10;
+-----+----------+------------------------------------+--------------+-----------------+
| nid | field_id | textual_data                       | numeric_data | serialized_data |
+-----+----------+------------------------------------+--------------+-----------------+
|  58 |       10 | a:1:{i:0;s:2:"50";}                |            0 |                 |
| 432 |       10 | a:1:{i:0;s:3:"375";}               |            0 |                 |
| 450 |       10 | a:1:{i:0;s:3:"452";}               |            0 |                 |
|  62 |       10 | a:1:{i:0;s:2:"61";}                |            0 |                 |
|  65 |       10 | a:2:{i:0;s:2:"63";i:1;s:2:"64";}   |            0 |                 |
| 311 |       10 | a:1:{i:0;s:3:"321";}               |            0 |                 |
| 318 |       10 | a:2:{i:0;s:3:"313";i:1;s:3:"356";} |            0 |                 |
| 320 |       10 | a:1:{i:0;s:3:"319";}               |            0 |                 |
| 324 |       10 | a:1:{i:0;s:3:"323";}               |            0 |                 |
| 374 |       10 | N;                                 |            0 |                 |
| 376 |       10 | a:1:{i:0;s:3:"391";}               |            0 |                 |
| 383 |       10 | N;                                 |            0 |                 |
| 386 |       10 | a:2:{i:0;s:3:"384";i:1;s:3:"385";} |            0 |                 |
| 395 |       10 | a:1:{i:0;s:3:"394";}               |            0 |                 |
| 397 |       10 | a:1:{i:0;s:3:"396";}               |            0 |                 |
| 400 |       10 | N;                                 |            0 |                 |
| 431 |       10 | a:1:{i:0;s:3:"434";}               |            0 |                 |
+-----+----------+------------------------------------+--------------+-----------------+
17 rows in set (0.00 sec)

Note that the serialized data is in the textual_data column. Oy.
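Unpacking these values is a job for PHP's unserialize(). Here is a minimal sketch, assuming the values always look like the rows above. The helper name flexinode_referenced_nids() is hypothetical and does not come from my source plugin.

```php
<?php
// Sketch only: hypothetical helper for decoding flexinode reference fields.

/**
 * Decode a textual_data value such as a:1:{i:0;s:2:"50";} into a list of
 * integer NIDs. 'N;' is a serialized NULL and means "no references".
 */
function flexinode_referenced_nids($textual_data) {
  // Refuse to instantiate objects, and silence the notice on malformed input.
  $value = @unserialize($textual_data, array('allowed_classes' => FALSE));
  if (!is_array($value)) {
    return array();
  }
  return array_map('intval', array_values($value));
}

print_r(flexinode_referenced_nids('a:2:{i:0;s:2:"63";i:1;s:2:"64";}'));
print_r(flexinode_referenced_nids('N;'));
```

The first call yields the two referenced presentation NIDs for node 65; the second yields an empty array.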

Therefore putting the pieces together required the following:

I did this work in the prepareRow() method. This sounds like a bad idea since I want to iterate over agenda nodes. Fortunately the node table has a type column, and the source plugin lets you select on it with the node_type key in the YAML file:

source:
  plugin: d6_flexinode_agenda_node
  node_type: flexinode-2

and then do the usual trick for turning these nodes into Agendas:

destination:
  plugin: 'entity:node'
  default_bundle: agenda

Once I figured out that I had to query all fields associated with a NID and then classify on field_id, the actual code of the source plugin FlexinodeAgendaNode.php is fairly straightforward (if tedious). Check the source if it would give you joy.
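For the curious, the classify-on-field_id step looks roughly like this. It is a simplified sketch rather than the real FlexinodeAgendaNode.php: the property names are invented, the sample rows are made up, and it assumes every interesting value lives in textual_data.

```php
<?php
// Sketch only: field IDs come from the flexinode_field table above, but
// the property names on the right are hypothetical.

function classify_flexinode_rows(array $rows) {
  $field_map = array(
    10 => 'presentation_topic',
    11 => 'premeeting_topic',
    12 => 'location',
    13 => 'emcee',
    14 => 'premeeting_activities',
    15 => 'introduction',
  );

  $properties = array();
  foreach ($rows as $data_row) {
    $field_id = (int) $data_row['field_id'];
    // Ignore fields that belong to other flexinode types.
    if (isset($field_map[$field_id])) {
      $properties[$field_map[$field_id]] = $data_row['textual_data'];
    }
  }
  return $properties;
}

// Made-up rows standing in for "select * from flexinode_data where nid=...".
$rows = array(
  array('field_id' => 10, 'textual_data' => 'a:1:{i:0;s:2:"50";}'),
  array('field_id' => 12, 'textual_data' => 'Example location text'),
  array('field_id' => 99, 'textual_data' => 'ignored'),
);
print_r(classify_flexinode_rows($rows));
```

In a source plugin each of these values would then be handed to $row->setSourceProperty() under its name.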

Astute readers might wonder whether there were any Flexinode presentations not associated with Flexinode agendas. There were a couple, but I decided against writing a second source plugin for two nodes. Even I have limits. Instead I opted to migrate the content from these orphaned presentations post-migration. I also opted not to deal with Flexinode attachments.

Migrating Attachments

Man, I don't even know how this works.

Here is what I do know: The Drupal 6 database has two relevant tables: files and upload. I guess files stores filename information (filename, path, etc) for all the files Drupal knows about, and upload records information about files uploaded (ie attached) to specific nodes. The upload table relates files to nodes by the vid column of the node table, NOT the nid.

Thus migration of file attachments proceeds in several phases:

This workflow really confused me, because there is a d6_upload plugin and an associated YAML file called migrate_plus.migration.upgrade_d6_upload.yml . I found that I did not want to use this. Explaining why is tricky, so bear with me. Say that we enabled this migration, and say that I am concerned about blog posts.

I do not know whether the file migration works if you are not consolidating content types.

My solution to this ended up copying code from the migration template at core/modules/file/src/Plugin/migrate/source/d6/Upload.php. One example can be found in UploadNode.php:

use Drupal\migrate\Row;
use Drupal\node\Plugin\migrate\source\d6\Node as D6Node;

/**
 * @MigrateSource(
 *   id = "d6_upload_node"
 * )
 * Find uploaded files.
 */
class UploadNode extends D6Node {

  /**
   * {@inheritdoc}
   */
  public function prepareRow(Row $row) {

    $nid = $row->getSourceProperty('nid');


    // This is copied from Upload.php
    $query = $this->select('upload', 'u')
      ->distinct()
      ->fields('u', array('fid', 'description', 'list'))
      ->condition('u.nid', $nid, '=');
    $row->setSourceProperty('upload', $query->execute()->fetchAll());


    // print_r($row);

    return parent::prepareRow($row);

  } // end prepareRow

  /**
   * {@inheritdoc}
   */
  public function fields() {
    /* Add an upload field. 
     */

    $orig_fields = parent::fields();
    $new_fields = array(
      'upload' => $this->t('Uploaded Files'),
    );

    $fields = array_merge($orig_fields, $new_fields);

    return $fields;

  } // end fields
} // end class. 

The prepareRow() method queries the database for uploads related to this node. The fields() method adds a field called upload which can be used in the YAML mapping. Note that these methods explicitly reference methods and variables from their parent class (namely, Node).

Some content types need source plugins. For these content types, you can extend UploadNode directly, which will add the upload field:

class BlogNode extends UploadNode {
    // stuff goes here
} // end class

Some content types do not need me to write separate source plugins. For these content types I modified the YAML files directly. For example, in migrate_plus.migration.upgrade_d6_node_location.yml I changed the source plugin from:

source:
  plugin: d6_node
  node_type: location

to

source:
  plugin: d6_upload_node
  node_type: location

For these content types I could then add a stanza to the YAML file to set the file attachments:

field_file_attachments:
  plugin: iterator
  source: upload
  process:
    target_id:
      plugin: migration
      migration: upgrade_d6_file
      source: fid
    display:
      plugin: default_value
      default_value: 1
    description: description

The iterator was there because there can be many attachments. I hardcoded the default_value for the display field to 1 so that all attachments would be visible. (This may have been a mistake. It would have been possible to propagate this setting as well.)

In one case (namely Agenda nodes) I needed to add code to the prepareRow() method directly. As usual, merging presentation nodes and adding their attachments to the associated Agenda caused issues, but the concept was the same.

File migration location

One way to specify the location of the Drupal 6 files is to pass a legacy-root parameter to the drush migrate-upgrade command. But if you forget this, it looks like you can set this manually in the migrate_plus.migration.upgrade_d6_file.yml YAML file. In my installation the key constant is source_base_path:

source:
  plugin: d6_file
  constants:
    source_base_path: /home/linuxuser/drupal/files
process:
  fid: fid
  filename: filename
  source_full_path:
    -
      plugin: concat
      delimiter: /
      source:
        - constants/source_base_path
        - filepath
    -
      plugin: urlencode

I believe that changing this constant was enough to point the file migration at the right source directory.

Image locations

I found that several image migrations were not working. The migrations were failing because the files were not found. I found that there were a bunch of subfolders in the D6 sites/default/files folder, and that these subfolders were not being searched:

My solution was to flatten the hierarchy. To do this I used a program called Meld, because some of the images in the subfolder had identical names to other files.

I do not know what these subfolders are for or why they were created, although I can guess.

Migrating Comments

Comment migration is similar to file migration in that you first have to migrate nodes which have comments, and then migrate the comments for those nodes later.

One quirk is that D8 wants to split comments into two subtypes: comment and comment_no_subject. I foolishly rebelled against this and decided to turn all comments into comment_no_subject types. That made life much more difficult. You probably want to conform to whatever Drupal decides to do.

I had a lot of problems actually getting comments to display, even after the comment nodes were migrated. Here is what I learned:

First: similarly to files, you need to add a field to each content type that will host comments. This field is called comment_no_subject and is specified in YML files with names like field.field.node.page.comment_no_subject.yml. (These live in config/install of kwlug_content_types.)

Inside each of these field definitions there is a default status value:

default_value:
  -
    status: 1
    cid: 0
    last_comment_timestamp: 0
    last_comment_name: null
    last_comment_uid: 0
    comment_count: 0

It is very important that the status be set to 1 if you want comments to display. 0 means comments are hidden. 2 means comments are open, ie read/write (which might be good for your site, but not for mine -- comments were read only, and kept for historical purposes). These values correspond to the HIDDEN, CLOSED and OPEN constants in core's CommentItemInterface.

In addition to setting the default value for this field, I explicitly set the field in my node migration template. For example, in migrate_plus.migration.upgrade_d6_node_page.yml there is a stanza that reads:

process:

  # other stuff skipped..

  comment_no_subject/status:
    plugin: default_value
    default_value: 1

However, I am pretty sure that the default_value in the field definition field.field.node.page.comment_no_subject.yml takes precedence.

There were some database tables in the D8 database that were useful in figuring out these settings:

In the configuration migration there are a bunch of different YML files you could incorporate. I incorporated the following:

To re-merge comment and comment_no_subject I needed to modify a bunch of YAML files. In migrate_plus.migration.upgrade_d6_comment_type.yml I needed to map the id to a default value:

process:
  # id: comment_type
  # Make all comment types the same
  id:
    plugin: default_value
    default_value: comment_no_subject

I then had to use this migrated value in TWO places in migrate_plus.migration.upgrade_d6_comment.yml :

process: 
  # stuff omitted 

  field_name:
    plugin: migration
    migration: upgrade_d6_comment_type
    source: comment_type
  comment_type:
    plugin: migration
    migration: upgrade_d6_comment_type
    source: comment_type

I had missed field_name for a long time and eventually discovered that it was causing comment_entity_statistics to break.

Finally, there was the issue of merged presentations and agendas. Picking the right nodes to migrate comments from presentation nodes to agenda ones required me to specify a bunch of mappings manually:

process: 
  # stuff skipped

  # If this is a merged presentation node then
  # use the agenda, not the presentation node
  entity_id:
    plugin: migration
    migration:
      - kwlug_migrate_dummy_merged_presentations
      - upgrade_d6_node_agenda
      - upgrade_d6_node_blog
      - upgrade_d6_node_book
      - upgrade_d6_node_location
      - upgrade_d6_node_nominee
      - upgrade_d6_node_page
      - upgrade_d6_node_podcast
      - upgrade_d6_node_presentation
      - kwlug_migrate_forum
      - kwlug_migrate_library
    source: nid
    no_stub: true

The key was the kwlug_migrate_dummy_merged_presentations, which mapped presentation nodes to the agenda NIDs that absorbed them. Unfortunately I ended up having to specify all the content types with comments to migrate as well, which was messy and irritating.
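The ordering of that list matters. As far as I understand it, the migration plugin tries each listed migration's map in turn and uses the first one that knows the source NID. Here is a standalone sketch of that lookup, with arrays standing in for the migrate_map_ tables. The function name and the NIDs are made up for illustration.

```php
<?php
// Sketch only: arrays stand in for the migrate_map_* tables, and
// lookup_destination_nid() is a hypothetical helper, not a Drupal API.

function lookup_destination_nid($source_nid, array $migration_order, array $maps) {
  foreach ($migration_order as $migration_id) {
    if (isset($maps[$migration_id][$source_nid])) {
      // First map that knows this source NID wins.
      return $maps[$migration_id][$source_nid];
    }
  }
  // With no_stub: true nothing gets created on a miss.
  return NULL;
}

// Made-up maps: D6 presentation 321 was absorbed into agenda 311.
$maps = array(
  'kwlug_migrate_dummy_merged_presentations' => array(321 => 311),
  'upgrade_d6_node_agenda' => array(311 => 311),
  'upgrade_d6_node_presentation' => array(321 => 321, 500 => 500),
);
$order = array(
  'kwlug_migrate_dummy_merged_presentations',
  'upgrade_d6_node_agenda',
  'upgrade_d6_node_presentation',
);

var_dump(lookup_destination_nid(321, $order, $maps));
var_dump(lookup_destination_nid(500, $order, $maps));
var_dump(lookup_destination_nid(999, $order, $maps));
```

Because the dummy migration is listed first, a comment on merged presentation 321 lands on agenda 311, while a comment on unmerged presentation 500 falls through to the ordinary presentation map.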

RDF Module Breaks Rollbacks

As of this writing, trying to roll back comment migrations failed when I had the rdf module installed:

PHP Fatal error:  Call to a member function url() on null in
/home/linuxuser/kwlug-drupal-v05/web/core/modules/rdf/rdf.module on
line 252

It looks like the RDF module has caused problems in the past: https://www.drupal.org/node/2340401 .

Instead of being a good citizen and fixing the problem I just uninstalled the RDF module.

Filtering Taxonomies

There were a number of taxonomy vocabularies defined in the old site, and I wanted to migrate exactly one. I could not find a clean way to do this, so I resorted to a dirty hack.

First, I found the name and vocabulary ID of the vocabulary I wanted to keep. Then I wrote a process plugin with the following transform() method:

public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {

  $allowed_taxonomies = array(
    'blogtags' => 9,
  );

  if (in_array($value, $allowed_taxonomies)) {
    return $value;
  } // end if

  return FALSE;
} // end transform

Then I did something sneaky. In the YAML file migrate_plus.migration.upgrade_d6_taxonomy_term.yml I added the following stanza to the process section:

process:

  # Only allow terms from taxonomies we care aboot
  dummy_test:
    - plugin: select_taxonomy
      source: vid
    -
      plugin: skip_on_empty
      method: row

This dummy_test works as follows: it calls my custom select_taxonomy process plugin and passes the vocabulary ID as a parameter. This is the vocabulary associated with this term. If the term has a vocabulary ID that is in the whitelist, the migration continues and the term is migrated. Otherwise the select_taxonomy plugin returns FALSE (ie nothing) and the skip_on_empty prevents the migration of this term.
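The two-step pipeline can be simulated outside Drupal. This sketch is plain PHP with hypothetical function names and made-up terms; the only thing taken from the real plugin is the in_array() check, which matches on the array's values (the vocabulary IDs), not on its keys.

```php
<?php
// Sketch only: simulates select_taxonomy followed by skip_on_empty.

$allowed_taxonomies = array(
  'blogtags' => 9,
);

// Same check as the transform() method: pass the VID through or return FALSE.
function select_taxonomy($vid, array $allowed) {
  return in_array($vid, $allowed) ? $vid : FALSE;
}

// skip_on_empty with method: row drops the row when the value is empty.
function row_is_kept($vid, array $allowed) {
  $value = select_taxonomy($vid, $allowed);
  return !empty($value);
}

// Made-up terms. VIDs arrive from the database as strings.
$terms = array(
  array('name' => 'example-kept', 'vid' => '9'),
  array('name' => 'example-dropped', 'vid' => '3'),
);
foreach ($terms as $term) {
  $verdict = row_is_kept($term['vid'], $allowed_taxonomies) ? 'migrated' : 'skipped';
  echo $term['name'], ': ', $verdict, "\n";
}
```

The loose comparison in in_array() is what lets the string '9' match the integer 9 here.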

This hack was not my preferred approach. Ordinarily I would have overridden the query() method in a source plugin. There was a reason I avoided this, but I do not remember what it was. Maybe it was because hardcoding VID values is ugly, and it would have been easier to see the hardcoding in a process plugin.

My actual preferred approach would have been to filter out unwanted taxonomy terms right in the YAML file with no plugins, but there was a reason that failed too.

I played the same sneaky trick in the migrate_plus.migration.upgrade_d6_taxonomy_vocabulary.yml to migrate only vocabulary names I had whitelisted.

Setting Redirects Using a CSV Source

RSS feeds and views had some URLs that would be different in the D8 site than the D6 one. So that old feeds would not break, I wanted to create redirects from the old feed locations to the new ones.

One option would have been to create these redirects manually after the site had been migrated. But I am forgetful, so I decided to automate this with a migration, using a CSV file as the source.

The key to doing this was to install and enable the migrate_source_csv module. This defines a CSV source plugin, which I used in my migration kwlug_migrate_rss_redirect:

source:
  plugin: csv
  path: /home/linuxuser/drupal/rss_redirect.csv
  header_row_count: 1
  keys:
    - sourcepath
  column_names:
    0:
      sourcepath: Source
    1:
      destpath: Destination
    2:
      statuscode: Status Code

The code is fairly self-explanatory. The sourcepath, destpath, and statuscode entries are used in the migration, but Source, Destination and Status Code are not (as far as I can tell).

I did not want to specify the status code 301 if I did not need to, so I added a default_value plugin to my process section:

process:
  status_code:
    plugin: default_value
    source: statuscode
    default_value: 301

Where to Do What

I struggled a lot with figuring out how the Drupal migration process wanted me to think. A few blog posts helped get me started:

Despite these resources, it took me a long time to understand the overall structure migrations want. This section documents some of these lessons.

The high-level view of migration components is as follows:

For low level examples, read on.

Migration Patterns

Drupal migrate really wants to transform exactly one node/entity from the D6 database into exactly one entity in the D8 database. If you want to merge two D6 nodes into one D8 entity (for example, when I merged Meeting Agenda and Presentation content types) then you have to write custom code in a source plugin to pull associated nodes from the database.

Similarly, if you want to migrate one node from the D6 database into two distinct entities in D8 (for example, in generating a redirect entity from each merged Presentation in addition to migrating the node itself) then you may need TWO migrations (effectively reading the D6 database twice). Trying to output two nodes from one YAML migration file does not seem to work. Trying to reuse "migration map" database entries tends not to work unless you can specify a YAML file that uses the migration process plugin cleanly.

If you want to associate nodes with each other (eg associating file attachments to their nodes) then you should migrate the linkee first, and then the linker (ie files should be migrated first, and then nodes that have those files as attachments migrated later). To do this you have to declare a dependency in the migration_dependencies section of the linker. (I think Drupal can handle circular dependencies by rerunning migrations again and again.)

Your ability to incorporate additional information into migrated nodes is pretty limited (eg adding a new field to a content type and populating it). If you want to do it, you have the following options:

Source Plugins

Source plugins have two interesting methods: query() and prepareRow(). (There are also two less-interesting ones: getIds() and fields(), which are pretty straightforward.) Both query() and prepareRow() pull from the D6 database, but there are some conceptual differences between them:

To assign fields in the prepareRow() method, use $row->setSourceProperty(). I was confused because there is also a $row->setDestinationProperty() but I think this is not relevant in prepareRow(). You want to set the source properties in source plugins.

Drupal 7 apparently had more methods to override. For example, https://www.drupal.org/node/1132582 documents prepare() and complete() methods, but these no longer exist in Drupal 8.

Drupal has some elaborate query builder syntax. Fortunately the syntax appears to be similar to Drupal 7, so there are cheatsheets available: http://www.eilyin.name/note/database-queries-drupal-8-7 helped get me started with Drupal 8 syntax, and the "Drupal 7 database Cheat Sheet" from https://wizzlern.nl/drupal/cheat-sheets got me most of the rest of the way.

YAML Mapping Files

Mapping source fields to destination entities in the YAML mapping files is supposed to be the easy part, but I found that it was difficult to set mappings unless the source plugin output exactly the information I needed.

Sources and destinations

One thing I struggled with in YAML mapping files is where the different components came from.

Consider the following fragment of a mapping file:

process:
  field_presentation_title: presentation_title
  field_floss_fund_nominee_link/target_id: floss_fund_nominee

This means that field_presentation_title and field_floss_fund_nominee_link are fields in the destination content type. The target_id is confusing, and I do not remember how exactly I found it (maybe here: http://drupal.stackexchange.com/questions/223715/migrate-multi-value-paragraph-field), but I do see that there is a clue about the name in the database schema:

mysql> desc node__field_floss_fund_nominee_link;
+-----------------------------------------+------------------+------+-----+---------+-------+
| Field                                   | Type             | Null | Key | Default | Extra |
+-----------------------------------------+------------------+------+-----+---------+-------+
| bundle                                  | varchar(128)     | NO   | MUL |         |       |
| deleted                                 | tinyint(4)       | NO   | PRI | 0       |       |
| entity_id                               | int(10) unsigned | NO   | PRI | NULL    |       |
| revision_id                             | int(10) unsigned | NO   | MUL | NULL    |       |
| langcode                                | varchar(32)      | NO   | PRI |         |       |
| delta                                   | int(10) unsigned | NO   | PRI | NULL    |       |
| field_floss_fund_nominee_link_target_id | int(10) unsigned | NO   | MUL | NULL    |       |
+-----------------------------------------+------------------+------+-----+---------+-------+
7 rows in set (0.00 sec)

The presentation_title and floss_fund_nominee are field names from the source plugin (ie defined by the fields() method in the source plugin.) If you are adding extra information (for example, more fields) to a content type then you must define these names.

Updating imported YAML files

Another quirk about YAML files is that changing them (as you do repeatedly when troubleshooting them) is a pain. If you change a YAML file and resume a migration (perhaps with drush migrate-rollback followed by drush migrate-import) then the migration will continue to use the version of the YAML file it imported when you installed the associated module (in my case, kwlug_migrate). You somehow need to get rid of this configuration object and replace it with your updated version in order to test your changes.

If you try to naively uninstall and enable the module you will get stuck because the configuration objects are already registered with Drupal: exception 'Drupal\Core\Config\PreExistingConfigException' with message 'Configuration objects (migrate_plus.migration.kwlug_migrate_agenda_redirect.yml) provided by kwlug_migrate already exist in active configuration'

To get around this problem I just reinstalled Drupal again and again, but in writing this entry I found a better way: use config-import to reread the .yml files. Say my configurations are in a folder called test. Then you might do something like this:

#!/bin/bash

element="$1"
srcdir=/path/to/drupal/source
testdir="$srcdir/test"

# Run drush from the Drupal source directory.
pushd "$srcdir"

time drush migrate-reset-status "$element" --yes
time drush migrate-rollback "$element" --yes
time drush config-import --partial --source="$testdir" --yes
time drush migrate-import --execute-dependencies "$element" --yes --notify

popd

The argument $1 should be the name of a migration (eg upgrade_d6_node_location), and the corresponding YAML file should be in the $srcdir/test folder.

Once you are happy with the YAML file you can then move it back to the config/install/ folder of kwlug_migrate.

If this does not work for you and you are not a dumb-dumb who reruns the entire migration every time, there are some other possible approaches documented here: http://drupal.stackexchange.com/questions/164612/how-do-i-remove-a-configuration-object-from-the-active-configuration.

Assigning constant strings

Sometimes you want to assign a constant to a field in a YAML file:

process:
  title: 'Every page should have the same title'

This does not work. The right-hand side of a process mapping always seems to be interpreted as the name of a source property, even if it is in quotes, so the migration goes looking for a source field with that name.

The solution is to define a constant in the source section of the YAML file, and assign that instead:

source:
  constants:
    static_title: 'Every page should have the same title'

process:
  title: constants/static_title

Process Plugins

If you are writing source plugins then process plugins tend to be simple or unnecessary, because you can probably massage data in the source plugin's query() or prepareRow() methods. However, I found process plugins useful for the following things:

Process plugins can be chained together. This is good for specifying default values, or for inserting a debug plugin. The official documentation is pretty good here, but it took me a long time to find: https://www.drupal.org/docs/8/api/migrate-api/migrate-process/migrate-process-overview

Destination Plugins

Destination plugins are black magic. I know nothing about them except that you can specify the destination entity type in the YAML migration file, in the destination section:

destination: 
  plugin: 'entity:redirect'

will specify the target is a redirect, even if the source is a node.

Migration IDs and Maps

Doing a Drupal migration creates a bunch of tables with names prefixed with migrate_map_ and migrate_message_. I am not clear what migrate_message_ is for (although I can guess). The migrate_map_ tables store mappings of NIDs (or entity IDs) on the old site to the new one.

Why do you need a map? The primary reason is to use the migration process plugin. For example, in migrate_plus.migration.upgrade_d6_comment.yml I have:

comment_type:
  plugin: migration
  migration: upgrade_d6_comment_type
  source: comment_type

When migrating comments I squished comment and comment_no_subject together, so I used this migration to indicate that the comment_type value from the source (source: comment_type) should be transformed in the same way for this comment.

A secondary reason for maps is incremental and live updates. If users continue to update the D6 site while you are migrating the site to D8, you do not want NIDs to clash. You might also want to refer to migration maps from one entity type when modifying another (see https://www.drupal.org/docs/8/api/migrate-api/migrate-process/process-plugin-migration for examples of this).

I have a feeling that in a real site migration with lots of users and lots of nodes I would have had to be more careful around using migration maps correctly, but I was sloppy with them when writing my own YAML files.

Troubleshooting

Keep drush sqlc running on both your D6 and D8 databases. I found I was digging through database structure all the time to figure out field names and how tables related to each other. show tables like '%field%' and desc tablename were good friends.

Sometimes migration failures are logged in the Drupal logs. Use drush wd-show to see a brief summary, and the web interface to see a lot more detail.

I found that running script when running migrations was super useful, because the migrations could get too verbose for my terminal's scrollback buffer. Unfortunately the colour output drush produces makes the script output gross, but the Internet has a solution here: http://unix.stackexchange.com/questions/14684/removing-control-chars-including-console-codes-colours-from-script-output . I have put this code into a clean-typescript.pl helper script.
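If you would rather stay in PHP, the same cleanup can be sketched with one regular expression. This is not the Perl code from that answer, just my approximation of the idea: the function name is hypothetical, and it only handles the common ESC [ ... colour and cursor sequences plus stray carriage returns.

```php
<?php
// Sketch only: hypothetical helper, not the clean-typescript.pl script.

function strip_console_codes($text) {
  // CSI sequences: ESC [ then parameter bytes, intermediate bytes and one
  // final byte. This covers colour codes such as ESC[1;32m and ESC[0m.
  $text = preg_replace('/\x1b\[[0-?]*[ -\/]*[@-~]/', '', $text);
  // script also records carriage returns; drop them.
  return str_replace("\r", '', $text);
}

$sample = "\033[1;32m[ok]\033[0m Processed 10 items\r\n";
echo strip_console_codes($sample);
```

To clean a whole typescript file you could pass the result of file_get_contents() through the same function.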

I used print_r a lot in my source and process plugins to figure out what data structures I was trying to query/populate. In complicated source plugins I had code like this at the end of my prepareRow() method:

if ($nid >= $this->DEBUG_NID_START  && $nid <= $this->DEBUG_NID_END ) {

  print_r("\n row is\n");
  print_r($row);

} // end if debug

By setting DEBUG_NID_START and DEBUG_NID_END to appropriate windows, I could see what was going on for a few target nodes without getting overwhelmed.

Similarly, I created a debug_contents process plugin to see the contents of fields I was trying to map. Here is example usage from a YAML file. Say I was having trouble understanding what presentation_title was. I could then change:

process:
  field_presentation_title: presentation_title

to

process:
  field_presentation_title: 
    - 
      plugin: debug_contents
      source: presentation_title

    - 
      plugin: get

This would print out the data structure to the console as each row was processed during the migration.

Failures and Improvements

The first failure was that this migration took so long (maybe 2.5 months of sporadic work). Many of the techniques I documented in this post took days of experimentation and reading to figure out.

When I started the migration I did not know about Drupal Console or Composer. I am still not sure why Drupal Console is important, but apparently Composer is quickly becoming the standard for Drupal packaging.

I deliberately did not worry about incremental migrations (resyncing the database by migrating only new content) or rolling back migrations. These are important considerations for larger sites.

I did not manage to get a bunch of header images migrated properly. I believe most of the data exists, but I am not clear how to associate it with nodes properly.

I wish I had been able to find better sources of help than I did. I was too scared to post threads on drupal.org directly, which was a mistake. Instead I relied on DuckDuckGo searches, and when I got really desperate I attempted to ask questions on the #drupal-migrate IRC channel and on https://drupal.stackexchange.com . Neither of these support channels worked well. The IRC channel was basically dead, and few people seem active on the Stack Exchange group. (Then again, it isn't as if I am answering other people's questions on those forums, so...)

If I had been more conscientious I would have started with a minimal (or custom) install profile rather than the standard one. The standard one created some content types and menu items I did not like.