Drupal 8 Migration Survival Strategies
We received word that KWLUG needed to move hosting providers this year. Like an idiot, I took this opportunity to migrate the KWLUG website from Drupal 6 to Drupal 8. This is a giant dump of what I learned. It is so long and so boring I can barely proofread it.
- Drupal 8 Migration Survival Strategies
- Migration Overview
- About the KWLUG website
- Getting Started
- Setting System UUID
- Reducing Content Types
- Enabling Display Fields in Migrated Content
- Textfields and Textareas
- Deleting Spam Accounts
- Simplifying User Roles
- Fixing Text Formats
- Merging Presentations and Agendas
- Linking Nodes via Entity References
- Fixing Dates and Timezones
- Converting Flexinodes
- Migrating Attachments
- Migrating Comments
- Filtering Taxonomies
- Setting Redirects Using a CSV Source
- Where to Do What
- Failures and Improvements
- Sidebar!
I budgeted 2-3 weeks for the data migration; it took almost two months, and is barely "good enough" to get by. As with everything else in Drupal, the learning curve was steep, and I spent hours and hours struggling to understand how the Drupal migration system wanted me to approach problems. Things that seemed simple on the surface took days of frustration and effort to get working.
The lack of documentation around this process was particularly difficult. I found myself reading the same dozen blog posts again and again, trying to figure out how to generalize their examples to my situation. My hope is that writing out these conceptual difficulties will save you time in figuring out your issues. All of the code I wrote for this will be on my Github account: https://github.com/pnijjar/kwlug-drupal8-migration. I can also produce a tarball on request.
On the plus side, the migration team has put a lot of work into doing Drupal 6 to Drupal 8 upgrades, and this effort provided good scaffolding upon which I built my migration. In addition, configuration management saved my tofu again and again. It is probably the best thing about Drupal 8. With it, I can use the GUI to configure the site and then preserve that configuration for future migrations. This made migrations far more repeatable, and thus easier to develop.
This blog post will focus on migration, as opposed to site building or setting up a development environment. I cover those topics in a companion post.
I refer to Drupal 8 as "D8" and Drupal 6 as "D6" a lot.
Believe it or not, this gigantic blog post does not document every single migration I did on the site. I tried to include only things that were interesting, and/or which other people might benefit from seeing.
UPDATE: I delivered a talk for the Waterloo Region Drupal User Group about D8 migrations. Here are the slides and here are the slide sources.
Migration Overview
Here is an outline of my migration journey:
- I grabbed a copy of the D6 website and set it up on a local machine (including the database and the files directory, which are the most important parts).
- I looked through the D6 site and mapped out what I wanted to migrate. A spreadsheet helped here. (local-docs/content-types.ods in the sources.)
- I downloaded D8 and did a dummy install.
- I used drush site-install to automatically install a dummy site.
- I registered the D6 database credentials on the D8 site.
- I used drush migrate-upgrade to generate a bunch of migration .yml files (YAML migration files), which I exported using drush config-export.
- I created a custom module for migrations and dumped the exported migrations there.
- I set up scripts to reinstall Drupal and run the migrations automatically.
- I created a new migration group so I could run only the migrations I had checked and fixed. This also allowed me to avoid migrating things I did not care about (such as blocks).
- One by one, I enabled the migrations I wanted for the new site by adding them to my custom migration group, then developing and troubleshooting the migration until that migration worked. This often involved writing source and process plugins, and modifying the given YAML migration files.
- When I had most of the migrations working, I did the rest of the site development (views, themes, etc), which I won't document here.
- When I was ready to deploy the site, I pulled a fresh copy of the D6 database and reran the migration on this fresh copy.
Throughout the process I kept track of the following things (in a computer file. Not in my head!):
- What got migrated and what was too much trouble to migrate.
- Steps I would have to do before the migration.
- Steps I would have to do to fix up the site post-migration.
Keeping track of these things was enormously helpful, because it served as a checklist of things to remember when deploying the site.
Local Modules
I created several custom modules for this migration. I could probably have consolidated them if I was wiser, but oh wells.
kwlug_migrate is the main migration module.
- Inside its config/install folder I put the YAML migration files for my migrations.
- Inside the src/Plugin/migrate/source folder I put custom source plugin PHP files.
- Inside the src/Plugin/migrate/process folder I put custom process plugin PHP files.
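Laid out on disk, the module therefore looked roughly like this (the file names are examples from my site):

kwlug_migrate/
  kwlug_migrate.info.yml
  config/
    install/
      migrate_plus.migration_group.kwlug_migrate.yml
      migrate_plus.migration.upgrade_d6_node_agenda.yml
      (... one YAML file per migration ...)
  src/
    Plugin/
      migrate/
        source/
          AgendaNode.php
          (... other source plugins ...)
        process/
          MapKWLUGFormatFilter.php
          (... other process plugins ...)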
To run a migration I first enabled this module, and then ran
drush migrate-import --verbose --execute-dependencies --group=kwlug_migrate --yes
kwlug_content_types contains configuration information for the site that was not directly related to migration. This included the following:
- A premigrate_settings folder with configuration that needed to be run before the main migration, because otherwise the migration would break. This included:
  - system.date.yml to set the timezone
  - user.settings.yml to disable account creation by users
  - filter.format.full_html.yml and filter.format.restricted_html.yml, which defined text filters.
To import these before the migration I ran
drush config-import --partial \
  --source=modules/custom/kwlug_content_types/premigrate_settings \
  --yes
- A bunch of configurations in the config/install folder. This mainly consisted of custom field definitions (eg field.storage.node.field_page_category.yml) that did not have many dependencies, but were needed for the migration to work well.
- A postmigrate_settings folder, which was run after the migration. Including these configurations in the config/install folder would have been a bad idea, because it would have meant adding a lot of spurious dependencies to that folder and making the site more fragile overall. In practice these ended up being the form displays and view displays for my content types, which were needed so that I could see migrated content after the migration was finished.
I activated these similarly to the premigration settings.
kwlug_dependencies is a stupid little module consisting of a
single file (kwlug_dependencies.info.yml
). Its purpose is to list
dependencies for the project that were to be enabled. As such it was a
poor-man's install profile. (Unlike a real install profile, you cannot
install themes this way.)
What's missing
Despite the Migrate team's work, there are a bunch of things that do not upgrade cleanly. The best place to start is here: https://www.drupal.org/docs/8/upgrade/known-issues-when-upgrading-from-drupal-6-or-7-to-drupal-8
Here are some of the things that burned me:
- "Fields missing on the edit form" was irritating. The fix was to add
configuration to the
postmigrate_settings
folder that would make these fields display. - "Text/Input formats": the
filter_null
filter messes everything up. Any node assigned a filter offilter_null
will not display. I fixed this by migrating my input filters explicitly to excludefilter_null
. - "Views": not migrated but I did not care that much. I wanted to redo views anyways.
- "Aggregator Categories": this hurt me for a feature I wanted to develop later, but did not hurt me during the migration.
- "Image and file attachments" : This did not work well for me. I found myself writing my own migrations to deal with file attachments.
Other Notes
Drupal 8 is an I/O hog. Use an SSD on your development machine if you possibly can. I do not know why Drupal 8 in general and migrations in particular hit the database so hard, but they do. (I am not the only one who finds Drupal 8 slow: https://deekayen.net/drupal8-xdebug-installer-timeout .)
I disabled Drupal cron on my development sites because running cron slowed everything to a crawl for over half an hour. (See "Drupal 8 is an I/O hog" above.)
I wrote a bunch of local scripts to make migrations more repeatable.
I put those in the bin/
folder of the code for this project.
About the KWLUG website
Before proceeding to specific examples I will waste some time talking about the structure of the data I was migrating, and some design decisions I made.
The KWLUG website has been around since Drupal 4. The current iteration has been around since 2005. Originally we had planned to use KWLUG as a content generation hub: members would contribute reviews and forum posts and blogs to the site. This never took off, and KWLUG morphed into an information site focused on meetings and meeting announcements.
In migrating http://kwlug.org, I had a number of concrete objectives in mind:
I wanted to clean up and simplify content types:
- There were a bunch of content types that essentially had the same field sets, but used names to distinguish the types of content. I wanted to make these all Page nodes.
- There were content types with fields that were rarely used and no longer needed.
- There were content types that needed additional fields.
In the Drupal 6 site, Presentation Topic and Meeting Agenda were two separate content types, linked using an old module called Node Relativity. The intentions behind having two distinct content types were noble (having a queue of upcoming presentation topics to be scheduled) but in practice we just created presentations when scheduling meetings. Also, Node Relativity was a dead-end module, superseded by entity references. There was also a longstanding bug that made attaching presentations to agendas difficult. Thus I wanted to do the following:
- Get rid of Node Relativity.
- Merge the Presentation Topic and Meeting Agenda content types so there would only be Meeting Agendas.
- Preserve "orphaned" Presentation Topic nodes that were not associated with agendas.
- Ensure that links to merged presentation topics continued to work (because such links sometimes exist in podcast show notes).
Some ancient version of the Drupal site used Flexinode for the earliest meeting agendas. Flexinode was a competitor to CCK in Drupal 6, but CCK won and nobody had bothered to migrate this old content. I wanted to fix that.
There were a bunch of podcasts and video recordings of meetings, but they were not linked to respective meeting agendas. I wanted to fix this, in an automated way if possible.
I wanted to reorganize the site to make it easier to find information people cared about.
I wanted to retheme the site to make it mobile-friendly but still usable on desktops. This was less about migration than site building.
I wanted to preserve historical content as much as possible. It is likely that nobody will ever look back at old posts, but sometimes it is enlightening to explore how KWLUG did things in the past.
I wanted to preserve URLs as much as possible.
I wanted to delete hundreds of spam accounts in the migration.
Over the years we had built up a lot of cruft in permissions and roles. I wanted to simplify and start fresh.
I wanted to pretend that spending two months automating the migration was a better use of time than copying a few hundred nodes by hand. My thinking was that building a migration would scale to tens of thousands of nodes, and thus make me more employable.
I wanted to automate the process as much as possible and make it as repeatable as possible. I thought this would make the cutover process easier.
Getting Started
Setting up databases
The following guide is pretty good for getting the database set up: http://affinitybridge.com/blog/migrating-from-drupal-6-to-drupal-8
There are parts in that blog post that use Drupal Console, but I was not able to get Drupal Console working on my setup, so I just made my YML files manually.
There is a migration GUI, but don't bother with it. It times out for even small migrations. Use Drush instead.
For some reason I believed that settings.local.php wanted both a $databases['migrate']['default'] entry and a $databases['upgrade']['default'] entry pointing to the D6 database. So I did the following to set them both to be equal:
// Database entry for `drush migrate-upgrade --configure-only`
$databases['upgrade']['default'] = array (
'database' => 'd6_db_name',
'username' => 'd6_db_user',
'password' => 'd6_db_password',
'prefix' => '',
'host' => 'localhost',
'port' => '3306',
'namespace' => 'Drupal\\Core\\Database\\Driver\\mysql',
'driver' => 'mysql',
);
$databases['migrate']['default'] = $databases['upgrade']['default'];
To generate the initial set of migration settings, I then ran:
drush migrate-upgrade --configure-only
This generated configurations which I could then export:
drush config-export --destination=/tmp/migrate01
Then I copied the migration .yml
files that began with
migrate_plus.migration.
to a new folder. These would be the basis
files for my migration.
Migration module and migration group
To set up the kwlug_migrate
module, I did the following:
- I made a folder called kwlug_migrate in the modules/custom folder of my Drupal install.
- I made a file called kwlug_migrate.info.yml in this folder.
- I made a nested config/install folder inside the kwlug_migrate folder.
- I added the migrate_plus.migration.* YAML files to the config/install folder.
The kwlug_migrate.info.yml
file looked like this:
name: kwlug_migrate
type: module
description: Migrate content from Drupal 6 to Drupal 8
core: 8.x
package: Custom
dependencies:
- migrate_plus
- migrate_drupal
- migrate_tools
- migrate_upgrade
- kwlug_content_types
This was actually enough to try a migration:
drush migrate-import --verbose --execute-dependencies --yes
but the migration took a long time and did not do what I wanted. The
next step was to set up a migration group.
I called mine kwlug_migrate
, because I name things
creatively.
To set the migration group I added a file to the config/install
folder called migrate_plus.migration_group.kwlug_migrate.yml
. It
defined the migration group as follows:
id: kwlug_migrate
label: D6 imports
description: Content to import to the new site
source_type: Drupal 6
shared_configuration:
  source:
    key: upgrade
This file might not even be necessary. What is necessary is selecting
a target YAML file (say
migrate_plus.migration.upgrade_d6_node_blog.yml
) and changing the
following line from:
migration_group: migrate_drupal_6
to
migration_group: kwlug_migrate
Then I reran the migration as:
drush migrate-import --group=kwlug_migrate --verbose --execute-dependencies --yes
and Drupal attempted to migrate everything in the migration group (in my case upgrade_d6_node_blog) and all the associated dependencies (regardless of which migration group they were in).
It is nice to track down those dependencies and put them in the
kwlug_migrate
group as well. Then you will have a set of YAML files
you can keep (because they are in the migration group) and a set you
can discard.
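To see which migrations exist, which group each belongs to, and how many rows each has processed, drush migrate-status (provided by migrate_tools) is handy:

drush migrate-status --group=kwlug_migrate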
Setting System UUID
If you install the initial system with drush site-install
then
Drupal sets a UUID. Then when you try to override certain
configurations (in my case system.site.yml
to change the front page
display) you
may get messages like Site UUID in source storage does not match the
target storage
.
This problem is documented here:
https://github.com/drush-ops/drush/issues/1625 .
The quick fix is to explicitly set the UUID of the site after it is
installed, so it matches the UUID in system.site.yml
:
drush cset system.site uuid 3112d604-7bb2-4dba-b418-f4f542f2682c --yes
Reducing Content Types
I discovered that I had a number of content types (blogs, pages, locations, book) that were all effectively the same, in the sense that they had the same sets of fields. I guess using different content types to semantically differentiate content is okay, but I decided to consolidate these types and differentiate them in a different way.
Take the example of locations. The migration for these is specified in
migrate_plus.migration.upgrade_d6_node_location.yml
. The source
and destination
sections of this YAML file originally looked like
this (with all other sections omitted):
source:
  plugin: d6_upload_node
  node_type: location
  constants:
    bundle_type: location
destination:
  plugin: 'entity:node'
  default_bundle: location
I wanted all location nodes to be turned into pages. To do this, I modified the destination bundle as follows:
source:
  plugin: d6_upload_node
  node_type: location
  constants:
    bundle_type: location
destination:
  plugin: 'entity:node'
  default_bundle: page
Of course, I needed to ensure that all the target fields for pages were specified in the YAML file as well.
Classifying content types
I wanted to maintain distinctions between locations and other page types. My original thinking was to use a taxonomy term for each page type, and assign that taxonomy term during migration. But this article (which is well worth reading) convinced me otherwise: http://blog.dcycle.com/blog/83/what-content-what-configuration/ . This article argues that taxonomy terms are data that can be changed at any time. Furthermore taxonomy terms are kept in the database, not in Drupal configuration (which could be exported into YAML files). The suggested solution was to add a select field to my page content type. This field would have a fixed set of values -- one for each content type.
To create this I used the (Drupal 8) GUI:
- First I made sure that the Page content type was migrated into Drupal 8.
- In the Page content type, I added a new field called page_category of type List (text).
- For Allowed values I made one entry per content type (whether they were merged content types or not). I kept the key values easy to parse ('meeting_agenda' instead of 'Meeting Agenda').
- I reused this field and added it to my other content types as well.
- I used drush config-export to export the configuration and pick out field.storage.node.field_page_category.yml and each of the field.field.node.*.field_page_category.yml files. These went into the config/install folder of the kwlug_content_types module.
The field.storage.node.field_page_category.yml
is almost editable by
hand, in case you want to add other content types to the list later
on.
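For reference, here is roughly what that exported file looks like for a List (text) field. Treat it as a sketch: the boilerplate keys in your export may differ slightly, and only two allowed values are shown as examples. The value entries are the machine keys I assign during migration; the labels are what editors see in the GUI.

langcode: en
status: true
dependencies:
  module:
    - node
    - options
id: node.field_page_category
field_name: field_page_category
entity_type: node
type: list_string
settings:
  allowed_values:
    -
      value: meeting_agenda
      label: 'Meeting Agenda'
    -
      value: meeting_location
      label: 'Meeting Location'
    # ... one entry per page category ...
  allowed_values_function: ''
module: options
locked: false
cardinality: 1
translatable: true
indexes: {  }
persist_with_no_new_fields: false
custom_storage: false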
The next step was to assign the content type in the migration YAML
file. To do this for location was fairly easy, since every single
location would have the same value. I started by adding a constant to
the source
section of the YAML file:
source:
  plugin: d6_upload_node
  node_type: location
  constants:
    category: 'Meeting Location'
    bundle_type: page
and then assigning that category to the field:
process:
  # [stuff snipped]
  field_page_category: constants/category
Enabling Display Fields in Migrated Content
At some point I was convinced that my custom fields were being migrated properly, but they were not showing up when I displayed nodes. When I navigated to the associated content types, the fields were listed as "disabled" in the "Manage Form Display" tab. Enabling these fields in the "Manage Form Display" and "Manage Display" tabs makes the (populated!) fields display properly.
The known issues page (https://www.drupal.org/docs/8/upgrade/known-issues-when-upgrading-from-drupal-6-or-7-to-drupal-8) acknowledges that this is a problem, but the listed solution is unsatisfactory: after each migration you are supposed to manually re-enable the fields. That is awful, so here is a better way:
- Do the migration.
- Go to the GUI and manually fix the displays and form displays once.
- Use drush config-export to export the configuration.
- Add the relevant config entries to the postmigrate_settings folder of the kwlug_content_types custom module. For example, for the Page content type I had to add:
  - core.entity_view_display.node.page.default.yml
  - core.entity_view_display.node.page.teaser.yml
  - core.entity_form_display.node.page.default.yml
- Enable these configurations after the main migration, with an invocation like this:

drush config-import --partial \
  --source=modules/custom/kwlug_content_types/postmigrate_settings \
  --yes
The reason you import the configuration after the main migration is
that these .yml
files have a bunch of dependencies, and including
all of these dependencies is messy and fragile.
Of course, every time you update the content type with new fields (or new orderings of the fields, or new widgets for field display...) then you have to update these files.
Textfields and Textareas
Say your Drupal 6 site has a content-type with a string field. That string field is set as follows:
- It has no maximum length
- Its form field is configured to be a Textfield (ie one line of text)
When you migrate this field it will migrate, but will be displayed as a Textarea (with multiple lines of text). This is due to ambiguity in migrating the field: https://www.drupal.org/node/1117028 .
I tried a bunch of automated ways to set this information during the migration, but gave up. The easy way to deal with this is to alter the Drupal 6 database: set each affected text field to have a maximum length of 255. Then the migration will assign the right type, and the forms will have textfield widgets.
Deleting Spam Accounts
Instead of attempting to delete spam accounts in the D6 site directly,
I got rid of them during the migration. To do this, I wrote a custom
source plugin for users (ContributingUser.php
). I defined a
"contributing user" as a user that had authored a node. Then in the
plugin I had the following query()
method:
/**
* {@inheritdoc}
*/
public function query() {
// Make a subquery of all the UIDs who have authored nodes.
$node_authors = $this->select('node','n')
->fields('n', array('uid'));
return $this->select('users','u')
->fields('u', array_keys($this->baseFields()))
->condition('u.uid', 0, '>')
->condition('u.uid', $node_authors, 'IN');
} // end query
The first query finds all authors of a node, and the second picks only users that are in that list of authors. This filters out any account that has not authored a node, which includes all spam accounts (and some legitimate lurker accounts, unfortunately).
This technique can be used to filter out all kinds of input, so long as you can distinguish legitimate from illegitimate data using a query.
I guess I should point out a couple of other elements of the plugin. Firstly, I reused most of the existing User plugin by extending it:
use Drupal\migrate\Row;
use Drupal\user\Plugin\migrate\source\d6\User as D6User;
class ContributingUser extends D6User {
I also had to define an ID for this plugin, which is done in an annotation inside the class's docblock comment:
/**
 * @MigrateSource(
 *   id = "d6_contributing_user"
 * )
 */
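Putting those pieces together, the whole plugin file ends up looking something like this. The namespace just follows from the module's folder layout, and this is a sketch rather than a paste-ready file:

<?php

namespace Drupal\kwlug_migrate\Plugin\migrate\source;

use Drupal\user\Plugin\migrate\source\d6\User as D6User;

/**
 * D6 user source that only returns users who have authored a node.
 *
 * @MigrateSource(
 *   id = "d6_contributing_user"
 * )
 */
class ContributingUser extends D6User {

  /**
   * {@inheritdoc}
   */
  public function query() {
    // Subquery: UIDs of everybody who has authored a node.
    $node_authors = $this->select('node', 'n')
      ->fields('n', array('uid'));

    // Keep only real users (uid > 0) whose UID appears in that subquery.
    return $this->select('users', 'u')
      ->fields('u', array_keys($this->baseFields()))
      ->condition('u.uid', 0, '>')
      ->condition('u.uid', $node_authors, 'IN');
  }

}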
Then in the migrate_plus.migration.upgrade_d6_user.yml
I had to
specify the use of this plugin:
source:
  plugin: d6_contributing_user
I made one other change of note: I disabled all user accounts, with the idea that active users could have their accounts re-enabled later. This required setting a default value for the status field in the YAML file:
status:
  plugin: default_value
  default_value: 0
(Yes, being allowed to use 0
as a constant when you have to define
strings feels inconsistent to me as well.)
Simplifying User Roles
In addition to having too many spam users the old D6 site had accumulated a lot of spurious user roles over the years ("librarian", "speaker") that were no longer needed. I decided to start fresh by including only the built-in "administrators", "authenticated users" and "anonymous users", then adding other roles in the new website as required.
This meant I had to filter out roles somehow. To do this I changed the
migrate_plus.migration.upgrade_d6_user_role.yml
file to migrate only
the three built in roles and ignore the rest. In the process
section,
I changed the id
stanza from:
id:
  -
    plugin: machine_name
    source: name
  -
    plugin: user_update_8002
to
id:
  -
    plugin: machine_name
    source: name
  -
    plugin: static_map
    source: name
    bypass: false
    map:
      'administrator': 'administrator'
      'authenticated_user': 'authenticated'
      'anonymous_user': 'anonymous'
#    plugin: user_update_8002
(Once again, I am mystified why I was allowed to use straight strings on the right hand sides of the map. YAML is weird.)
The map part is the easy part of this migration: some role names in the D6 database had changed for D8. The secret of this static map is the bypass: false part, which states that the migration should ignore any entry that is not in the static map.
I am sure the plugin: user_update_8002
does something very
important, but I didn't know what it was and the migration seemed okay
without it, so I commented it out.
Fixing Text Formats
This is also acknowledged in the "Known Issues" page, but again the
solution was not obvious. Some input filters (notably the PHP input
filter) are no longer supported in Drupal 8, and others are missing.
These are replaced by something called filter_null
, which messes up
the site.
Symptoms you are affected include:
- During the migration you see the message
Missing filter plugin: filter_null.
- Content pages are blank when you view them, even though you believe
they have been migrated. Editing such a node displays a message like
Missing filter. All text is removed.
- In the GUI, going to "Configuration" -> "Content authoring" -> "Text formats and editors", editing the affected text format and immediately saving makes the problem go away. You may see a message like The filter_null filter is missing, and will be removed once this format is saved.
There is a pretty good description of the problem here: https://www.hywel.me/drupal/2016/02/11/a-website-upgrade-from-drupal-6-to-drupal-8-part-4.html
The issue is that some filter or setting in the text filter is missing. PHP filter is one culprit, but in my migration there was some other problem that affected a lot of my content.
My solution was to migrate filter formats early in the migration process. Drupal 8 provides some default text formats (in the standard installation profile?) and I mapped my old formats to those.
The process plugin was called MapKWLUGFormatFilter
, and it lived in
the src/Plugin/migrate/process
folder of the kwlug_migrate
custom
module.
The heart of the function was very easy. Here is an excerpt from the
transform()
method:
public function transform($value,
MigrateExecutableInterface $migrate_executable,
Row $row,
$destination_property) {
$filter_mapping = array(
0 => 'restricted_html', // unknown but it exists
1 => 'restricted_html', // filtered_html
2 => 'full_html', // php_code
3 => 'full_html', // full_html
4 => 'restricted_html', // unknown. Some image format that has been lost.
5 => 'plain_text', // messaging plain text. Unused.
);
$retval = $filter_mapping[$value];
if (!$retval) {
$retval = 'restricted_html';
}
return $retval;
} // end transform
(One big difference between "Basic HTML" and "Restricted HTML" is the use of CKEditor, I think.)
I found the filter mappings that existed for my Drupal 6 site by
looking in the filters
and filter_formats
tables in the database.
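The transform() method above lives inside an ordinary process plugin class. In case it helps, here is a skeleton of the whole file; again, the namespace is just the standard one implied by the module layout, so treat this as a sketch:

<?php

namespace Drupal\kwlug_migrate\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Maps D6 input format IDs onto D8 text format machine names.
 *
 * @MigrateProcessPlugin(
 *   id = "map_kwlug_format_filter"
 * )
 */
class MapKWLUGFormatFilter extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // The static map shown above, with restricted_html as the fallback.
    $filter_mapping = array(
      0 => 'restricted_html',
      1 => 'restricted_html',
      2 => 'full_html',
      3 => 'full_html',
      4 => 'restricted_html',
      5 => 'plain_text',
    );
    return isset($filter_mapping[$value]) ? $filter_mapping[$value] : 'restricted_html';
  }

}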
I used this process plugin in content types that had a node body. Basically, I would change:
body/format:
  plugin: migration
  migration: upgrade_d6_filter_format
  source: format
to
body/format:
  plugin: map_kwlug_format_filter
  source: format
I think I do not need the migration
plugin in this stanza because
the migration
plugin looks up ID maps of migrated filters, and in
this case I am setting the filters with a static map. (This
explanation might be wrong.)
I also found that I wanted to customize these filter formats. The usual trick applied: make the changes in the GUI, run drush config-export, and copy the relevant filter.format.*.yml files to the premigrate_settings folder of kwlug_content_types.
Merging Presentations and Agendas
Wow this took a long time. The basic idea was to merge two content types: "Meeting Agendas", which mostly had a meeting date and location, and "Presentations" which listed topics for the meetings.
Most meeting agendas were associated with exactly one presentation, but some early meeting agendas were associated with two. A few presentations were not associated with any meeting agendas.
As mentioned above, a module called Node Relativity associated agendas with presentations, but it also associated agendas with a different content type called 'FLOSS Fund Nominees'. So I had to be careful about picking proper associations.
With this in mind, here was my strategy:
- Run one migration for the Agenda content type. This migration would merge in all associated Presentation nodes for that agenda.
- Run one migration for the Presentation content, that would only pick out "orphaned" presentations not associated with any Agenda.
- Run one migration to redirect deleted presentation nodes to the relevant Agenda.
I also had to migrate auxiliary content such as attachments (which were typically attached to presentations) and images, but I will document these later.
Migrating Agendas
I did the bulk of the work in a source plugin for agendas, called
AgendaNode
(which extended
Drupal\node\Plugin\migrate\source\d6\Node
). Surprisingly, I did not
need to override the query()
method. Instead I did the bulk of the
work in the prepareRow()
method.
To find Presentation nodes associated with a particular agenda, I had to write a query that looked through the Node Relativity tables for matches:
$nid = $row->getSourceProperty("nid");
// Look for associated presentation topics in relativity table
$query = $this->select('node', 'p')
->fields('p', ['nid','title'])
->condition('p.type', 'presentation');
$query->join('relativity', 'r', 'r.nid = p.nid');
$query->condition('r.parent_nid', $nid, '=');
$query->join('node_revisions', 'nr', 'nr.nid = p.nid');
$query->addField('nr', 'body');
$presentation_info = $query->execute()
->fetchAll();
Now $presentation_info
contained zero or more presentations. I
looped through this array, grabbed each presentation's data, and
populated variables for the YAML file. For example, here is an extract
where I took the body texts of each presentation and appended them
to the Agenda body (this is not identical to the actual code, but it
is close). I also collected the NIDs:
if ($presentation_info) {
foreach ($presentation_info as $p) {
$body_so_far = $row->getSourceProperty('body');
$pbody = $p['body'];
if ($body_so_far) {
$body_so_far = $body_so_far . "\n\n* * *\n" . $pbody;
} else {
$body_so_far = $pbody;
} // end if body
$row->setSourceProperty('body', $body_so_far);
$row->setDestinationProperty('body', $body_so_far);
} // end foreach
} // end if presentation_info exists
You can see my confusion about source and destination properties here:
$row->setSourceProperty('body', $body_so_far);
$row->setDestinationProperty('body', $body_so_far);
I now believe you should only be setting source properties in source plugins. Setting the destination did not harm anything, but it was not effective.
(You can also see my confusion in getting the existing body at the beginning of each loop iteration and setting it at the end of each iteration, instead of pulling that functionality out of the loop. Oops. I am not changing it now, though.)
This example is cheating because instead of collecting an array of associated presentation bodies, I am just concatenating them into one big body. There are other examples where I did have to collect arrays of data, but I will cover them below.
In addition to grabbing presentation info, I had to get data from custom fields that were already associated with the agenda (meeting MCs, meeting dates and locations):
// I do not know why this stuff doesn't migrate itself,
// but whatever.
$query_agenda = $this->select('content_type_agenda','c')
->fields('c', ['field_emcee_uid', 'field_date_value',
'field_location_nid'])
->condition('c.nid', $nid, '=');
$agenda_info = $query_agenda->execute()
->fetchAll();
if ($agenda_info) {
// There SHOULD be only one row. I guess we are taking the last
// value if there are multiple.
// Lots of these will be NULL, though.
// Also we just want to append to the body if there is an emcee.
foreach ($agenda_info as $a) {
$row->setSourceProperty('meeting_date', $a['field_date_value']);
$row->setSourceProperty('emcee_uid', $a['field_emcee_uid']);
$row->setSourceProperty('meeting_location_nid', $a['field_location_nid']);
} // end foreach agenda_info
} // end if agenda_info
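Those source properties then get consumed in the process section of the Agenda migration's YAML file, roughly like this. The destination field names below are illustrative -- use whatever you called the fields on your D8 Agenda content type:

process:
  # (other mappings snipped)
  field_meeting_date/value: meeting_date
  field_location/target_id:
    plugin: migration
    migration: upgrade_d6_node_location
    source: meeting_location_nid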
Migrating Orphaned Presentations
The key to this migration was to filter out all nodes associated with
Agendas, since those presentations are migrated in the Agenda
migration. Thus I created a source plugin (PresentationNode.php
)
that overrode the query()
method:
public function query() {
// Make a subquery of all the NIDs in the relativity table.
// Return presentation nodes not in this set.
$linked_presentations = $this->select('relativity', 'r')
->fields('r', array('nid'));
$parent_q = parent::query();
$parent_q->condition('n.type', 'presentation')
->condition('n.nid', $linked_presentations, 'NOT IN');
return $parent_q;
} // end query
The rest of this migration was fairly standard. The hard part was in figuring out that I never want to touch presentation nodes that are associated with agendas.
Creating Redirects
In the new website I wanted linked presentation nodes to disappear, but I wanted the old URLs to be preserved. Thus I wanted redirects from merged presentation nodes to the agendas that digested them.
Drupal migrations have "ID maps" that map NIDs from the D6 site to
entity IDs in the D8 one. I kept thinking that I could read these identity
maps to create the redirect, but this was stupid. The right way to do this was to go through the presentation nodes a second time, this time selecting those nodes that HAD been merged
(MergedPresentationNodes.php
). Then I needed to fill in the YAML
file, but in my initial migration I could not find an appropriate YAML
file.
After installing the redirect
D8 module, I found a template:
modules/contrib/redirect/migration_templates/d6_path_redirect.yml
.
This specified the fields I needed to fill in.
There were a few tricks in this YAML file, so I will reproduce big chunks of it here:
source:
  plugin: d6_merged_presentation_node
  node_type: presentation
Even though my target was to create a redirect, I could use a node content type as the source. I found this interesting.
constants:
  nodelist: node/
  internal: internal:/
In the D8 database I saw that source redirects took the form
node/<entity-id>
but that targets were of the form
internal:/node/<entity-id>
. These constants help create those
strings.
process:
  # If you omit this will it auto-generate?
  # rid: rid
This part confused me a lot. This was not a migration from redirects
to redirects, so populating the rid
(redirect ID?) did not make
sense. I tried using things like the presentation node ID, but that
did not work well either. I found that omitting the rid
entirely
made it autogenerate, which is a neat trick that can be used
elsewhere.
redirect_source:
  plugin: concat
  source:
    - constants/nodelist
    - nid
# This is broken broken broken for multiple presentations.
# There is no easy way to fix this without an iterator, though.
redirect_redirect:
  plugin: concat
  source:
    - constants/internal
    - constants/nodelist
    - agenda_nid
The redirect_source
and redirect_redirect
fields came straight out
of the template. The concat
plugin allowed me to build (simple)
strings for the redirections.
destination:
  plugin: 'entity:redirect'
This was the magic that made a redirect and not a node.
In this case it did not matter whether the agenda nodes had been created before the redirects were or not.
Linking Nodes via Entity References
I wanted the Agenda content type to be the centre of the new website. Agendas needed to refer to locations, podcasts, video recordings, and other nodes associated with particular meetings. Pre-migration the set of links were a mishmash:
- Nodes of a content type called "FLOSS Fund Nominees" were linked to Agendas in the same way presentation nodes were: via the Node Relativity module.
- The D6 site Agendas had a custom field for Location, which pointed to nodes of content-type Location. These nodes were selectable via a dropdown box. In D8, Location nodes were to be converted to Pages using the page_category field to distinguish them, as described in the "Classifying content types" section above.
FLOSS Fund Nominees
There is not much that is new to say. Since Node Relativity already related nominees to agendas, I took the same approach that I did when merging presentations into agendas above. But instead of merging strings I pulled out nominee NIDs and used them to populate the YAML file:
field_floss_fund_nominee_link/target_id: floss_fund_nominee
(Truth be told, figuring out HOW to populate these entity references caused me a lot of grief. But that was my own fault.)
Locations
Since the NIDs of locations were already included with the Agenda as nodereferences, populating the entity reference links was not hard. More challenging
was getting the Agenda form view to select only locations, as opposed
to every possible node of type page. To do this I went into the GUI.
I navigated to the Agenda content type, found the location field, and
changed the Reference method
from "Default" to "Views: Filter by an
entity reference view". In order to do this I had to make a view
(duh). The view had the following properties:
- Format: Entity reference list
- Show: Entity Reference inline fields
- Field: Content: Title
- Filter Criteria:
- Content: Publishing status (= yes)
- Content: Page Category (= Meeting Location)
After doing this (and using configuration export to save the view and field settings in my config) the Agenda form restricted possible meeting locations to nodes of type "Meeting Location".
Podcasts and Vidcasts
Creating entity reference fields for podcasts and vidcasts was not that difficult: I just used the GUI to add them, and then used the magic of configuration export to retain those settings. Populating the fields was a different matter, because in the D6 database these fields were not formally linked in any way.
Fortunately, most podcasts and vidcasts followed a standard naming scheme: their titles began with "YYYY-MM:". In the Agendas I had a meeting date stored. So in the source plugin AgendaNode.php I correlated the two.
Here is some of the code I used to correlate podcasts with meeting
agendas:
$meeting_date_raw = $row->getSourceProperty('meeting_date');
// Looks like: 2016-03-07T00:00:00
$is_match = preg_match('/^\d\d\d\d-\d\d/', $meeting_date_raw,
$substr_array);
// There is NO WAY that there should not be a match, because
// all (post-flexinode) agendas have a date.
// HOWEVER, some podcasts should not be associated with some
// agendas (laptop rescue missions). Unfortunately this ruins
// SFD podcasts, which need to be added manually.
if ($is_match && $row->getSourceProperty('presentation_nid')) {
$meeting_YYmm = $substr_array[0] . ":%";
// Look for podcasts
$query = $this->select('node', 'n')
->fields('n', array('nid'));
$query->condition('n.title', $meeting_YYmm, 'LIKE');
$query->condition('n.type', 'podcast');
$row->setSourceProperty('podcast_nid', $query->execute()->fetchAll());
} // end if is_match
As the comments indicate, my first attempt had unintended consequences: some agendas (Laptop Rescue Missions) had no associated presentations, but were getting populated with podcasts from presentations held in the same month. Thus I needed to filter out these agendas, which messed up a handful of other Agendas that DID have podcasts but did NOT have presentation nodes (Software Freedom Day celebrations). I opted to fix those manually afterwards.
Overall I am unreasonably happy with this hack, because it will save me hours of tediously associating podcasts and vidcasts with meeting nodes.
Note that this database query is gross and unsafe. I opted to trust the user input because I know who generated it, but if you are working with untrusted data you should not be dumb like me.
(I feel this code is fragile to the problem of multiple podcasts being associated with a single Agenda, but I do not think that happened in our D6 site. Sorry, future me.)
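The podcast_nid property collected above then becomes an entity reference in the YAML, using the same iterator-plus-migration pattern I describe for file attachments below. Again, the destination field name here is illustrative:

process:
  # (other mappings snipped)
  field_podcast:
    plugin: iterator
    source: podcast_nid
    process:
      target_id:
        plugin: migration
        migration: upgrade_d6_node_podcast
        source: nid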
Fixing Dates and Timezones
Ugh. Dates and times. Ugh.
The Agenda content type came with a "Meeting Date" field. The Drupal 6
site regarded this field as being of type Date
. As a plain date it
had no hour or minute fields, and I am not sure it was timezone-aware.
Timezones mess everything up. The dates get migrated, but for some
reason that I have forgotten the hours and minutes become significant,
and the dates of the meetings sometimes switch. To fix this, I had to
manually set timezones on date fields. In the source plugin AgendaNode.php
, I
wrote code to set the timezones properly. In the library
imports of the module I had:
use \DateTime;
use \DateTimeZone;
and then in prepareRow()
I put:
// This should not be hardcoded?
$LOCAL_TIMEZONE = 'America/Toronto';
$DEFAULT_MEETING_TIME = "19:00:00";
$EMPTY_MEETING_TIME = "00:00:00";
// Ugh. Times get stored at 00:00:00, then Drupal does
// timezone magic to make the time incorrect. So munge the
// dates.
if ($a['field_date_value']) {
list($date, $time) = explode('T', $a['field_date_value']);
// This should look like 2016-12-26T00:00:00
if ((!$time) || ($time === $EMPTY_MEETING_TIME)) {
$target_time = $DEFAULT_MEETING_TIME;
} else {
$target_time = $time;
} // end set time
$localdate = new DateTime( $date . "T" . $target_time,
new DateTimeZone($LOCAL_TIMEZONE));
$localdate->setTimeZone(new DateTimeZone('UTC'));
$munged_date = $localdate->format('Y-m-d\TH:i:s');
$row->setSourceProperty('meeting_date', $munged_date);
} // end if field_date_value exists
This code sets a meeting time in local time, and then converts the meeting time to UTC for storage in the database.
Unfortunately this sets all meeting times to 7:00pm, which is in fact incorrect for some of our meetings. I thought about being more clever, but in the end opted to fix the incorrect time fields manually.
I am still not certain why I could not use plain Date fields, which appear to migrate properly: https://www.drupal.org/node/2566779#comment-11783277.
Converting Flexinodes
As mentioned above, flexinodes were an early competitor to CCK. I
guess they had been migrated from the Drupal 5 site to Drupal 6.
Flexinode agendas were still displayed in the D6 site,
but were not editable (I think because they
were missing node_revision
entries in the database).
The key to migrating flexinodes was in understanding the database
structure. In the D6 database, the flexinode_type
table provided a
list of the "content types" created using flexinodes:
mysql> select ctype_id,name from flexinode_type;
+----------+--------------------+
| ctype_id | name |
+----------+--------------------+
| 1 | Presentation topic |
| 2 | Meeting Agenda |
+----------+--------------------+
2 rows in set (0.00 sec)
Associated with these types are fields, which are defined in the
flexinode_field
table:
mysql> select field_id,ctype_id,label,field_type from flexinode_field;
+----------+----------+------------------------+--------------+
| field_id | ctype_id | label | field_type |
+----------+----------+------------------------+--------------+
| 2 | 1 | Abstract | textarea |
| 3 | 1 | Presentation Material | textarea |
| 4 | 1 | Reference material | url |
| 5 | 1 | Attachment | file |
| 11 | 2 | Pre-meeting Topic | presentation |
| 12 | 2 | Location | textarea |
| 10 | 2 | Presentation Topic | presentation |
| 13 | 2 | Meeting host / emcee | usergroup |
| 14 | 2 | Pre-meeting activities | textfield |
| 15 | 2 | Introduction | textarea |
+----------+----------+------------------------+--------------+
Thus the source plugin for flexinode data had to pick out data by
field_id
and associate them with the proper fields in the D8 content
types. Fortunately I was again merging "Presentation topic" and
"Meeting Agenda" nodes.
Data for these fields was stored in the flexinode_data
table, which
has a definitively weird schema:
mysql> desc flexinode_data;
+-----------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| nid | int(10) unsigned | NO | PRI | 0 | |
| field_id | int(10) unsigned | NO | PRI | 0 | |
| textual_data | mediumtext | NO | | NULL | |
| numeric_data | int(10) unsigned | NO | | 0 | |
| serialized_data | mediumtext | NO | | NULL | |
+-----------------+------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
Different fields put data in different places. For example, here is field ID 10, which is like Node Relativity for Flexinodes:
mysql> select * from flexinode_data where field_id=10;
+-----+----------+------------------------------------+--------------+-----------------+
| nid | field_id | textual_data | numeric_data | serialized_data |
+-----+----------+------------------------------------+--------------+-----------------+
| 58 | 10 | a:1:{i:0;s:2:"50";} | 0 | |
| 432 | 10 | a:1:{i:0;s:3:"375";} | 0 | |
| 450 | 10 | a:1:{i:0;s:3:"452";} | 0 | |
| 62 | 10 | a:1:{i:0;s:2:"61";} | 0 | |
| 65 | 10 | a:2:{i:0;s:2:"63";i:1;s:2:"64";} | 0 | |
| 311 | 10 | a:1:{i:0;s:3:"321";} | 0 | |
| 318 | 10 | a:2:{i:0;s:3:"313";i:1;s:3:"356";} | 0 | |
| 320 | 10 | a:1:{i:0;s:3:"319";} | 0 | |
| 324 | 10 | a:1:{i:0;s:3:"323";} | 0 | |
| 374 | 10 | N; | 0 | |
| 376 | 10 | a:1:{i:0;s:3:"391";} | 0 | |
| 383 | 10 | N; | 0 | |
| 386 | 10 | a:2:{i:0;s:3:"384";i:1;s:3:"385";} | 0 | |
| 395 | 10 | a:1:{i:0;s:3:"394";} | 0 | |
| 397 | 10 | a:1:{i:0;s:3:"396";} | 0 | |
| 400 | 10 | N; | 0 | |
| 431 | 10 | a:1:{i:0;s:3:"434";} | 0 | |
+-----+----------+------------------------------------+--------------+-----------------+
17 rows in set (0.00 sec)
Note that the serialized data is in the textual_data
column. Oy.
Therefore putting the pieces together required the following:
- Finding each meeting agenda
- Using the relativity data to find associated presentations (and decoding these using PHP's unserialize() function)
- Picking out the titles, bodies, etc and merging them into a single node
I did this work in the prepareRow()
method. This sounds like a bad
idea since I want to iterate over agenda nodes. Fortunately the node table records each node's type (flexinode content types show up as flexinode-<ctype_id>), and you can select that type via node_type in the YAML file:
source:
  plugin: d6_flexinode_agenda_node
  node_type: flexinode-2
and then do the usual trick for turning these nodes into Agendas:
destination:
  plugin: 'entity:node'
  default_bundle: agenda
Once I figured out that I had to query all fields associated with a NID and then
classify on field_id
, the actual code of the source plugin
FlexinodeAgendaNode.php
is fairly straightforward (if tedious). Check the
source if it would give you joy.
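If you just want the flavour without opening the source, the core of prepareRow() is a query on flexinode_data followed by a switch on field_id, along these lines (condensed, and the property names are mine):

// Pull every flexinode field value for this agenda, then branch on field_id.
$nid = $row->getSourceProperty('nid');
$flex = $this->select('flexinode_data', 'f')
  ->fields('f', array('field_id', 'textual_data'))
  ->condition('f.nid', $nid, '=')
  ->execute()
  ->fetchAll();

foreach ($flex as $f) {
  switch ($f['field_id']) {
    case 10:
      // "Presentation Topic": a serialized array of presentation NIDs.
      $presentation_nids = unserialize($f['textual_data']);
      // ... look up each presentation and merge its title/body into this
      // row, much like the Node Relativity agendas above ...
      break;

    case 15:
      // "Introduction": plain text destined for the body.
      $row->setSourceProperty('flexinode_introduction', $f['textual_data']);
      break;
  }
}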
Astute readers might wonder whether there were any Flexinode presentations not associated with Flexinode agendas. There were a couple, but I decided against writing a second source plugin for two nodes. Even I have limits. Instead I opted to migrate the content from these orphaned presentations post-migration. I also opted not to deal with Flexinode attachments.
Migrating Attachments
Man, I don't even know how this works.
Here is what I do know: The Drupal 6 database has two relevant tables:
files
and upload
. I guess files
stores filename information
(filename, path, etc) for all the files Drupal knows about, and
upload
records information about files uploaded (ie attached) to
specific nodes. The upload
table relates files to nodes by the vid
column of the node
table, NOT the nid
.
Thus migration of file attachments proceeds in several phases:
- The files table has to be migrated. The destination table in Drupal 8 is file_managed. The migration is upgrade_d6_file.
- The upload table has to be migrated, to the table node__field_file_attachments. As the table name suggests, attachments are now fields of existing entities, so those entities need to migrate the attachments along with the other fields.
This workflow really confused me, because there is a d6_upload
plugin and an associated YAML file called
migrate_plus.migration.upgrade_d6_upload.yml
. I found that I did
not want to use this. Explaining why is tricky, so bear with me. Say
that we enabled this migration, and say that I am concerned about blog
posts.
- Because I am simplifying content types, I have a migration that transforms blog posts into pages.
- Every upload is associated with a node. If that node is a blog post, and the upload migration runs first, then it will CREATE a blog node, fill in the upload field, and stub out the rest of the migration.
- Later, my custom blog migration will run. It will discover that there is already an entity with the same NID (entity ID). So it does not run my migration, and does not create a Page.
- As a result blog posts without attachments become Page nodes, and blog posts with attachments become Blog nodes.
I do not know whether the file migration works properly if you are not consolidating content types.
My solution to this ended up copying code from the migration template
at core/modules/file/src/Plugin/migrate/source/d6/Upload.php
. One
example can be found in UploadNode.php
:
use Drupal\migrate\Row;
use Drupal\node\Plugin\migrate\source\d6\Node as D6Node;
/**
* @MigrateSource(
* id = "d6_upload_node"
* )
* Find uploaded files.
*/
class UploadNode extends D6Node {
/**
* {@inheritdoc}
*/
public function prepareRow(Row $row) {
$nid = $row->getSourceProperty('nid');
// This is copied from Upload.php
$query = $this->select('upload', 'u')
->distinct()
->fields('u', array('fid', 'description', 'list'))
->condition('u.nid', $nid, '=');
$row->setSourceProperty('upload', $query->execute()->fetchAll());
// print_r($row);
return parent::prepareRow($row);
} // end prepareRow
/**
* {@inheritdoc}
*/
public function fields() {
/* Add an upload field.
*/
$orig_fields = parent::fields();
$new_fields = array(
'upload' => $this->t('Uploaded Files'),
);
$fields = array_merge($orig_fields, $new_fields);
return $fields;
} // end fields
} // end class.
The prepareRow()
method queries the database for uploads related to this
node. The fields() method adds a field called upload
which can be used in the YAML
mapping. Note that these methods explicitly reference methods and
variables from their parent class (namely, Node).
Some content types need source plugins. For these content types, you
can extend UploadNode
directly, which will add the upload
field:
class BlogNode extends UploadNode {
// stuff goes here
} // end class
Some content types do not need me to write separate source plugins.
For these content types I modified the YAML files directly. For
example, in migrate_plus.migration.upgrade_d6_node_location.yml
I
changed the source plugin from:
source:
  plugin: d6_node
  node_type: location
to
source:
  plugin: d6_upload_node
  node_type: location
For these content types I could then add a stanza to the YAML file to set the file attachments:
field_file_attachments:
  plugin: iterator
  source: upload
  process:
    target_id:
      plugin: migration
      migration: upgrade_d6_file
      source: fid
    display:
      plugin: default_value
      default_value: 1
    description: description
The iterator
was there because there can be many attachments. I
hardcoded the default_value
for the display
field to 1 so that all
attachments would be visible. (This may have been a mistake. It would
have been possible to propagate this setting as well.)
In one case (namely Agenda nodes) I needed to add code to the prepareRow()
method directly. As usual, merging presentation nodes and adding their attachments
to the associated Agenda caused issues, but the concept was the same.
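Conceptually it was just the Upload.php query again, run over the agenda's own NID plus the NIDs of the presentations merged into it. A condensed sketch (the variable names are mine; $nid and the list of merged presentation NIDs are gathered earlier in prepareRow()):

// Gather uploads for the agenda and for its merged presentations.
$all_nids = array_merge(array($nid), $presentation_nids);
$query = $this->select('upload', 'u')
  ->distinct()
  ->fields('u', array('fid', 'description', 'list'))
  ->condition('u.nid', $all_nids, 'IN');
$row->setSourceProperty('upload', $query->execute()->fetchAll());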
File migration location
One way to specify the location of the Drupal 6 files is to pass a
legacy-root
parameter to the drush migrate-upgrade
command. But if
you forget this, it looks like you can set this manually in the
migrate_plus.migration.upgrade_d6_file.yml
YAML file. In my installation the
key constant is source_base_path
:
source:
  plugin: d6_file
  constants:
    source_base_path: /home/linuxuser/drupal/files
process:
  fid: fid
  filename: filename
  source_full_path:
    -
      plugin: concat
      delimiter: /
      source:
        - constants/source_base_path
        - filepath
    -
      plugin: urlencode
I believe I changed this constant and successfully pointed the migration at the correct file source.
Image locations
I found that several image migrations were not working. The migrations were
failing because the files were not found. I found that there were a bunch of
subfolders in the D6 sites/default/files
folder, and that these subfolders
were not being searched:
teaser/files
thumbnail/files/images
imagefield_thumbs/images
pictures
My solution was to flatten the hierarchy. To do this I used a program called Meld, because some of the images in the subfolder had identical names to other files.
I do not know what these subfolders are for or why they were created, although I can guess.
Migrating Comments
Comment migration is similar to file migration in that you first have to migrate nodes which have comments, and then migrate the comments for those nodes later.
One quirk is that D8 wants to split comments into two subtypes:
comment
and comment_no_subject
. I foolishly rebelled against this and
decided to turn all comments into comment_no_subject
types. That
made life much more difficult. You probably want to conform to
whatever Drupal decides to do.
I had a lot of problems actually getting comments to display, even after the comment nodes were migrated. Here is what I learned:
First: similarly to files, you need to add a field to each content
type that will host comments. This field is called
comment_no_subject
and is specified in YML files with names like
field.field.node.page.comment_no_subject.yml. (These live in the config/install folder of kwlug_content_types.)
Inside each of these field definitions, the default value includes a status key:
default_value:
  -
    status: 1
    cid: 0
    last_comment_timestamp: 0
    last_comment_name: null
    last_comment_uid: 0
    comment_count: 0
It is very important that the status
be set to 1 if you want
comments to display. 0
means comments are hidden. 2
probably means
comments are read/write (which might be good for your site, but not
for mine -- comments were read only, and kept for historical
purposes).
In addition to setting the default value for this field, I explicitly
set the field in my node migration template. For example, in
migrate_plus.migration.upgrade_d6_node_page.yml
there is a stanza
that reads:
process:
  # other stuff skipped..
  comment_no_subject/status:
    plugin: default_value
    default_value: 1
However, I am pretty sure that the default_value
in the field
definition field.field.node.page.comment_no_subject.yml
takes precedence.
There were some database tables in the D8 database that were useful in figuring out these settings:
- node__comment_no_subject, which shows statuses for each node that has a comment with no subject.
- comment_field_data, which shows which comments have been linked to which node (entity_id).
- comment_entity_statistics, which tallies the number of comments associated with each node. It specifies the comment trees, and splits by type (comment vs comment_no_subject).
In the configuration migration there are a bunch of different YML files you could incorporate. I incorporated the following:
migrate_plus.migration.upgrade_d6_comment_field.yml
migrate_plus.migration.upgrade_d6_comment_type.yml
migrate_plus.migration.upgrade_d6_comment.yml
To re-merge comment
and comment_no_subject
I needed to modify a
bunch of YAML files. In
migrate_plus.migration.upgrade_d6_comment_type.yml
I needed to map
the id
to a default value:
process:
  # id: comment_type
  # Make all comment types the same
  id:
    plugin: default_value
    default_value: comment_no_subject
I then had to use this migrated value in TWO places in
migrate_plus.migration.upgrade_d6_comment.yml
:
process:
  # stuff omitted
  field_name:
    plugin: migration
    migration: upgrade_d6_comment_type
    source: comment_type
  comment_type:
    plugin: migration
    migration: upgrade_d6_comment_type
    source: comment_type
I had missed field_name
for a long time and eventually discovered that
it was causing comment_entity_statistics
to break.
Finally, there was the issue of merged presentations and agendas.
Picking the right nodes to migrate comments from presentation
nodes
to agenda
ones required me to specify a bunch of mappings manually:
process:
  # stuff skipped
  # If this is a merged presentation node then
  # use the agenda, not the presentation node
  entity_id:
    plugin: migration
    migration:
      - kwlug_migrate_dummy_merged_presentations
      - upgrade_d6_node_agenda
      - upgrade_d6_node_blog
      - upgrade_d6_node_book
      - upgrade_d6_node_location
      - upgrade_d6_node_nominee
      - upgrade_d6_node_page
      - upgrade_d6_node_podcast
      - upgrade_d6_node_presentation
      - kwlug_migrate_forum
      - kwlug_migrate_library
    source: nid
    no_stub: true
The key was kwlug_migrate_dummy_merged_presentations, which mapped presentation nodes to the agenda NIDs that absorbed them. Unfortunately I ended up having to list every other content-type migration with comments as well, which was messy and irritating.
RDF Module Breaks Rollbacks
As of this writing, trying to roll back comment migrations failed when I had the rdf module installed:
PHP Fatal error: Call to a member function url() on null in
/home/linuxuser/kwlug-drupal-v05/web/core/modules/rdf/rdf.module on
line 252
It looks like the RDF module has caused problems in the past: https://www.drupal.org/node/2340401 .
Instead of being a good citizen and fixing the problem I just uninstalled the RDF module.
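With drush 8 that is a one-liner:

drush pm-uninstall rdf --yes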
Filtering Taxonomies
There were a number of taxonomy vocabularies defined in the old site, and I wanted to migrate exactly one. I could not find a clean way to do this, so I resorted to a dirty hack.
First, I found the name and vocabulary ID of the vocabulary I wanted to keep. Then I wrote a process plugin with the following transform() method:
public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
  $allowed_taxonomies = array(
    'blogtags' => 9,
  );

  if (in_array($value, $allowed_taxonomies)) {
    return $value;
  } // end if

  return FALSE;
} // end transform
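That transform() method lives inside an ordinary process plugin class. The real one is in the Github repository; the surrounding boilerplate looks roughly like this, assuming the plugin sits in the kwlug_migrate module:

<?php

namespace Drupal\kwlug_migrate\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Passes through whitelisted vocabulary IDs, and nothing else.
 *
 * @MigrateProcessPlugin(
 *   id = "select_taxonomy"
 * )
 */
class SelectTaxonomy extends ProcessPluginBase {

  // The transform() method shown above goes here.

}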
Then I did something sneaky. In the YAML file migrate_plus.migration.upgrade_d6_taxonomy_term.yml I added the following stanza to the process section:
process:
  # Only allow terms from taxonomies we care aboot
  dummy_test:
    -
      plugin: select_taxonomy
      source: vid
    -
      plugin: skip_on_empty
      method: row
This dummy_test works as follows: it calls my custom select_taxonomy process plugin and passes the vocabulary ID as a parameter. This is the vocabulary associated with this term. If the term has a vocabulary ID that is in the whitelist, the migration continues and the term is migrated. Otherwise the select_taxonomy plugin returns FALSE (ie nothing) and skip_on_empty prevents the migration of this term.
This hack was not my preferred approach. Ordinarily I would have overridden the query() method in a source plugin. There was a reason I avoided this, but I do not remember what it was. Maybe it was because hardcoding VID values is ugly, and it would have been easier to see the hardcoding in a process plugin.
My actual preferred approach would have been to filter out unwanted taxonomy terms right in the YAML file with no plugins, but there was a reason that failed too.
I played the same sneaky trick in the migrate_plus.migration.upgrade_d6_taxonomy_vocabulary.yml file to migrate only the vocabulary names I had whitelisted.
Setting Redirects Using a CSV Source
RSS feeds and views had some URLs that would be different in the D8 site than the D6 one. So that old feeds would not break, I wanted to create redirects from the old feed locations to the new ones.
One option would have been to create these redirects manually after the site had been migrated. But I am forgetful, so I decided to automate this with a migration, using a CSV file as the source.
The key to doing this was to install and enable the migrate_source_csv module. This defines a CSV source plugin, which I used in my migration kwlug_migrate_rss_redirect:
source:
  plugin: csv
  path: /home/linuxuser/drupal/rss_redirect.csv
  header_row_count: 1
  keys:
    - sourcepath
  column_names:
    0:
      sourcepath: Source
    1:
      destpath: Destination
    2:
      statuscode: Status Code
The code is fairly self-explanatory. The sourcepath, destpath, and statuscode entries are used in the migration, but Source, Destination and Status Code are not (as far as I can tell).
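For reference, the CSV file itself is nothing special: one header row, then one redirect per line. The rows below are invented for illustration (the real paths are in the repository):

Source,Destination,Status Code
blog/feed,/blog/rss.xml,301
node/feed,/rss.xml,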
I did not want to have to specify the 301 status code in every CSV row, so I added a default_value plugin to my process section:
process:
  status_code:
    plugin: default_value
    source: statuscode
    default_value: 301
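The remaining pieces map the CSV columns onto the redirect entity's own fields. My exact YAML is in the repository; the general shape is roughly this, assuming the redirect module's usual field names (redirect_source and redirect_redirect):

process:
  # The old path, without a leading slash.
  redirect_source/path: sourcepath
  # The new destination. Depending on what you store in the CSV you may
  # need to turn this into a URI such as internal:/some/path.
  redirect_redirect/uri: destpath

destination:
  plugin: 'entity:redirect'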
Where to Do What
I struggled a lot with figuring out how the Drupal migration process wanted me to think. A few blog posts helped get me started:
- https://cheppers.com/blog/migrate-d8-pt2 : an overview of which components to modify. This was the first blog post I found really helpful.
- http://webikon.com/cases/migrating-to-drupal-8 : I found this more abstract, but it is more thorough than the above link.
- https://www.slideshare.net/isholgueras/migrating-data-to-drupal-8 : This is dated, but was useful in understanding where different files go.
- https://www.drupaleasy.com/blogs/ultimike/2016/04/drupal-6-drupal-81x-custom-content-migration : This is a simplified example, but it includes some of the modifications one might make to a YAML file.
Despite these resources, it took me a long time to understand the overall structure the migration system expects. This section documents some of those lessons.
The high-level view of migration components is as follows:
- Source plugins read from the D6 database (or a CSV file, or a different database, or...) and populate variables ("fields") that can be specified in YAML files. There will typically be one source plugin per entity type. In addition to the standard source plugins I wrote a bunch of custom ones for content types with extra fields or weird structures.
- YAML files map fields from the source plugins to fields in the destination node/entity.
- Process plugins manipulate the output of source plugin fields into formats suitable for the destination fields.
- Destination plugins actually create the node or entity in question, populated as directed by the YAML files.
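To see how these pieces fit together, here is the rough shape of a single migration YAML file. The IDs and field names here are illustrative rather than copied from my migrations:

id: example_d6_page
label: 'Example D6 page migration'

source:
  # Core source plugin that reads D6 nodes of one type.
  plugin: d6_node
  node_type: page

process:
  # destination field: source field
  title: title
  body/value: body
  body/format:
    plugin: default_value
    default_value: basic_html

destination:
  plugin: 'entity:node'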
For low level examples, read on.
Migration Patterns
Drupal migrate really wants to transform exactly one node/entity from the D6 database into exactly one entity in the D8 database. If you want to merge two D6 nodes into one D8 entity (for example, when I merged Meeting Agenda and Presentation content types) then you have to write custom code in a source plugin to pull the associated nodes from the database.
Similarly, if you want to migrate one node from the D6 database into two distinct entities in D8 (for example, in generating a redirect entity from each merged Presentation in addition to migrating the node itself) then you may need TWO migrations (effectively reading the D6 database twice). Trying to output two nodes from one YAML migration file does not seem to work. Trying to reuse "migration map" database entries tends not to work unless you can specify a YAML file that uses the migration process plugin cleanly.
If you want to associate nodes with each other (eg associating file attachments to their nodes) then you should migrate the linkee first, and then the linker (ie files should be migrated first, and then nodes that have those files as attachments migrated later). To do this you have to declare a dependency in the migration_dependencies section of the linker. (I think Drupal can handle circular dependencies by rerunning migrations again and again.)
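The dependency itself is just another stanza in the linker's YAML file. The exact migration ID depends on what migrate-upgrade generated for you; for files it is along these lines:

migration_dependencies:
  required:
    - upgrade_d6_file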
Your ability to incorporate additional information into migrated nodes is pretty limited (eg adding a new field to a content type and populating it). If you want to do it, you have the following options:
- If there is some indication in the source node about what information needs to be added, AND the number of destination values is limited, then a static_map in your YAML migration file might do the trick (see the sketch after this list).
- You can define a new data source (from a CSV file or something) and use a migration to make entities of that, and then link those entities to your target nodes. I have never done this, but based on the file attachment example I think this should work.
- You can define your data source, incorporate it into the D6 database, and then pull that data with prepareRow().
- Maybe you can use the prepareRow() method in the node's source plugin to pull the new generated data, and offer that data to the destination via the YAML migration file.
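Here is what a static_map stanza for the first option looks like. The field and source names are made up for illustration:

process:
  # Hypothetical destination field, filled in from a hypothetical
  # source field called old_type.
  field_meeting_kind:
    plugin: static_map
    source: old_type
    map:
      presentation: talk
      agenda: meeting
    default_value: other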
Source Plugins
Source plugins have two interesting methods: query() and prepareRow(). (There are also two less-interesting ones: getIds() and fields(), which are pretty straightforward.)
Both query() and prepareRow() pull from the D6 database, but there are some conceptual differences between them:
- query() is used to pull the SET of database rows that will be migrated to D8 entities. It returns a query that represents the set of rows to transform. You can do database joins here if you want, but the key point is that each row that is returned should correspond to one entity in the migration. If you need to filter out nodes/entities then do it here. For example, when migrating users I wanted to filter out all users that had never created content, so I wrote an appropriate query here.
- prepareRow() gets a SINGLE row from the database. It can then manipulate and massage this data to make it suitable for the D8 target. It does this by populating fields defined in the fields() method. There is a mechanism for prepareRow() to reject rows (and thus refrain from migrating that row into a D8 entity) but you don't want to do this, because it messes up your migration status (rows that you did not intend to migrate show up as incomplete migrations). Instead, be more specific in the query() method.
- When doing weird migrations I wrote a lot of code in prepareRow() methods.
To assign fields in the prepareRow() method, use the $row->setSourceProperty() method. I was confused because there is also a $row->setDestinationProperty(), but I think that is not relevant in prepareRow(). You want to set the source properties in source plugins.
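Putting that together, a custom source plugin that exposes an extra source field looks roughly like this. The plugin ID, class name and field name are made up; my real plugins are in the kwlug_migrate module on Github:

<?php

namespace Drupal\kwlug_migrate\Plugin\migrate\source;

use Drupal\migrate\Row;
use Drupal\node\Plugin\migrate\source\d6\Node;

/**
 * Example D6 node source that exposes an extra source field.
 *
 * @MigrateSource(
 *   id = "kwlug_example_node"
 * )
 */
class ExampleNode extends Node {

  /**
   * {@inheritdoc}
   */
  public function fields() {
    // Declare the extra field so YAML mapping files can refer to it.
    return parent::fields() + [
      'd6_teaser' => $this->t('Teaser text pulled from the D6 database'),
    ];
  }

  /**
   * {@inheritdoc}
   */
  public function prepareRow(Row $row) {
    // Pull extra data for this one row and expose it as a source property.
    $teaser = $this->select('node_revisions', 'nr')
      ->fields('nr', ['teaser'])
      ->condition('nr.vid', $row->getSourceProperty('vid'))
      ->execute()
      ->fetchField();

    $row->setSourceProperty('d6_teaser', $teaser);

    return parent::prepareRow($row);
  }

}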
Drupal 7 apparently had more methods to override. For example, https://www.drupal.org/node/1132582 documents prepare() and complete() methods, but these no longer exist in Drupal 8.
Drupal has some elaborate query builder syntax. Fortunately the syntax appears to be similar to Drupal 7, so there are cheatsheets available: http://www.eilyin.name/note/database-queries-drupal-8-7 helped get me started with Drupal 8 syntax, and the "Drupal 7 database Cheat Sheet" from https://wizzlern.nl/drupal/cheat-sheets got me most of the rest of the way.
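For example, a query() method in such a source plugin that restricts the migration to published nodes of one type, with a join thrown in, looks something like this (a sketch against the standard D6 schema):

  /**
   * {@inheritdoc}
   */
  public function query() {
    // Each row this query returns becomes one candidate D8 entity.
    $query = $this->select('node', 'n')
      ->fields('n', ['nid', 'vid', 'type', 'title', 'created', 'changed'])
      ->condition('n.type', 'agenda')
      ->condition('n.status', 1);

    // Joins are fine too: grab the revision body along with the node row.
    $query->innerJoin('node_revisions', 'nr', 'n.vid = nr.vid');
    $query->fields('nr', ['body', 'teaser', 'format']);

    return $query;
  }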
YAML Mapping Files
Writing the YAML mapping files from source to destination entities is supposed to be the easy part, but I found it difficult to set mappings unless the source plugin output exactly the information I needed.
Sources and destinations
One thing I struggled with in YAML mapping files was figuring out where the different components came from.
Consider the following fragment of a mapping file:
process:
  field_presentation_title: presentation_title
  field_floss_fund_nominee_link/target_id: floss_fund_nominee
This means that field_presentation_title and field_floss_fund_nominee_link are fields in the destination content type. The target_id is confusing, and I do not remember how exactly I found it (maybe here: http://drupal.stackexchange.com/questions/223715/migrate-multi-value-paragraph-field), but I do see that there is a clue about the name in the database schema:
mysql> desc node__field_floss_fund_nominee_link;
+-----------------------------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------------------+------------------+------+-----+---------+-------+
| bundle | varchar(128) | NO | MUL | | |
| deleted | tinyint(4) | NO | PRI | 0 | |
| entity_id | int(10) unsigned | NO | PRI | NULL | |
| revision_id | int(10) unsigned | NO | MUL | NULL | |
| langcode | varchar(32) | NO | PRI | | |
| delta | int(10) unsigned | NO | PRI | NULL | |
| field_floss_fund_nominee_link_target_id | int(10) unsigned | NO | MUL | NULL | |
+-----------------------------------------+------------------+------+-----+---------+-------+
7 rows in set (0.00 sec)
The presentation_title and floss_fund_nominee are field names from the source plugin (ie defined by the fields() method in the source plugin). If you are adding extra information (for example, more fields) to a content type then you must define these names.
Updating imported YAML files
Another quirk about YAML files is that changing them (as you do repeatedly when troubleshooting) is a pain. If you change a YAML file and resume a migration (perhaps with drush migrate-rollback followed by drush migrate-import) then the migration will continue to use the version of the YAML file it imported when you installed the associated module (in my case, kwlug_migrate). You somehow need to get rid of this configuration object and replace it with your updated version in order to test your changes.
If you try to naively uninstall and re-enable the module you will get stuck, because the configuration objects are already registered with Drupal:

exception 'Drupal\Core\Config\PreExistingConfigException' with message
'Configuration objects
(migrate_plus.migration.kwlug_migrate_agenda_redirect.yml)
provided by kwlug_migrate already exist in active configuration'
To get around this problem I just reinstalled Drupal again and again, but in writing this entry I found a better way: use config-import to reread the .yml files. Say my configurations are in a folder called test. Then you might do something like this:
#!/bin/bash
element=$1
srcdir=/path/to/drupal/source
testdir=$srcdir/test
pushd .
cd $srcdir
time drush migrate-reset-status $element --yes
time drush migrate-rollback $element --yes
time drush config-import --partial --source=$testdir --yes
time drush migrate-import --execute-dependencies $element --yes --notify
popd
The argument $1 should be the name of a migration (eg upgrade_d6_node_location), and the corresponding YAML file should be in the $srcdir/test folder.
- migrate-reset-status stops the migration if it got stuck on the last run.
- migrate-rollback undoes the migration for this element so far.
- config-import imports your changed YAML file.
- migrate-import reruns the migration.
Once you are happy with the YAML file you can then move it back to the config/install/ folder of kwlug_migrate.
If this does not work for you and you are not a dumb-dumb who reruns the entire migration every time, there are some other possible approaches documented here: http://drupal.stackexchange.com/questions/164612/how-do-i-remove-a-configuration-object-from-the-active-configuration.
Assigning constant strings
Sometimes you want to assign a constant to a field in a YAML file:
process:
  title: 'Every page should have the same title'
This does not work. The value on the right-hand side of a process statement is always treated as the name of a source field, even if it is in quotes.
The solution is to define a constant in the source section of the YAML file, and assign that instead:
source:
  constants:
    static_title: 'Every page should have the same title'

process:
  title: constants/static_title
Process Plugins
If you are writing source plugins then process plugins tend to be simple or unnecessary, because you can probably massage data in the source plugin's query() or prepareRow() methods. However, I found process plugins useful for the following things:
- Printing debug information, as documented in the "Troubleshooting" section below.
- Picking out specific fields to migrate. For example, I decided to migrate taxonomy terms from selected taxonomies only, and found it easiest to write a process plugin for this.
- Extracting values out of complicated data structures, because YML files break my brain.
- Mapping values to a static map (which probably could have been done in the YAML mapping directly).
Process plugins can be chained together. This is good for specifying default values, or for inserting a debug plugin. The official documentation is pretty good here, but it took me a long time to find: https://www.drupal.org/docs/8/api/migrate-api/migrate-process/migrate-process-overview
Destination Plugins
Destination plugins are black magic. I know nothing about them except that you can specify the destination entity type in the YAML migration file, in the destination section:
destination:
  plugin: 'entity:redirect'
This specifies that the target is a redirect, even if the source is a node.
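For nodes the stanza is similar. As far as I can tell you can also give the destination a default bundle, which is handy when the source does not supply one:

destination:
  plugin: 'entity:node'
  default_bundle: page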
Migration IDs and Maps
Doing a Drupal migration creates a bunch of tables with names prefixed with migrate_map_ and migrate_message_. I am not clear what migrate_message_ is for (although I can guess).
The migrate_map_ tables store mappings of NIDs (or entity IDs) on the old site to the new one.
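You can inspect these mappings directly. As far as I can tell the IDs live in columns named sourceid1 and destid1 (more if the entity has a multi-part ID), so a query like the following, using the page node migration as an example, shows how old NIDs map to new ones:

mysql> SELECT sourceid1, destid1 FROM migrate_map_upgrade_d6_node_page LIMIT 5;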
Why do you need a map? The primary reason is to use the migration process plugin. For example, in migrate_plus.migration.upgrade_d6_comment.yml I have:
comment_type:
  plugin: migration
  migration: upgrade_d6_comment_type
  source: comment_type
When migrating comments I squished comment and comment_no_subject together, so I used this migration to indicate that the comment_type value from the source (source: comment_type) should be transformed in the same way for this comment.
A secondary reason for maps are incremental and live updates. If users continue to update the D6 site while you are migrating the site to D8, you do not want NIDs to clash. You might also want to refer to migration maps from one entity type when modifying another (see https://www.drupal.org/docs/8/api/migrate-api/migrate-process/process-plugin-migration for examples of this).
I have a feeling that in a real site migration with lots of users and lots of nodes I would have had to be more careful around using migration maps correctly, but I was sloppy with them when writing my own YAML files.
Troubleshooting
Keep drush sqlc running on both your D6 and D8 databases. I found I was digging through database structure all the time to figure out field names and how tables related to each other. show tables like '%field%' and desc tablename were good friends.
Sometimes migration failures are logged in the Drupal logs. Use drush wd-show to see a brief summary, and the web interface to see a lot more detail.
I found that running script when running migrations was super useful, because the migrations could get too verbose for my terminal's scrollback buffer. Unfortunately the colour output drush produces makes the script output gross, but the Internet has a solution here: http://unix.stackexchange.com/questions/14684/removing-control-chars-including-console-codes-colours-from-script-output . I put this code into a clean-typescript.pl helper script.
I used print_r a lot in my source and process plugins to figure out what data structures I was trying to query/populate. In complicated source plugins I had code like this at the end of my prepareRow() method:
if ($nid >= $this->DEBUG_NID_START && $nid <= $this->DEBUG_NID_END) {
  print_r("\n row is\n");
  print_r($row);
} // end if debug
By setting DEBUG_NID_START and DEBUG_NID_END to appropriate windows, I could see what was going on for a few target nodes without getting overwhelmed.
Similarly, I created a debug_contents process plugin to see the contents of fields I was trying to map. Here is example usage from a YAML file. Say I was having trouble understanding what presentation_title was. I could then change:
process:
  field_presentation_title: presentation_title
to
process:
  field_presentation_title:
    -
      plugin: debug_contents
      source: presentation_title
    -
      plugin: get
This prints the data structure to the console as each row is processed during the migration.
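The debug_contents plugin itself is tiny. The real one is in the Github repository; a reconstruction looks more or less like this (the namespace assumes it lives in kwlug_migrate):

<?php

namespace Drupal\kwlug_migrate\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Prints the incoming value and passes it through unchanged.
 *
 * @MigrateProcessPlugin(
 *   id = "debug_contents"
 * )
 */
class DebugContents extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // Dump the value being mapped into $destination_property, then hand
    // it to the next plugin in the chain untouched.
    print_r("\n" . $destination_property . " is:\n");
    print_r($value);
    return $value;
  }

}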
Failures and Improvements
The first failure was that this migration took so long (maybe 2.5 months of sporadic work). Many of the techniques I documented in this post took days of experimentation and reading to figure out.
When I started the migration I did not know about Drupal Console or Composer. I am still not sure why Drupal Console is important, but apparently Composer is quickly becoming the standard for Drupal packaging.
I deliberately did not worry about incremental migrations (resyncing the database by migrating only new content) or rolling back migrations. These are important considerations for larger sites.
I did not manage to get a bunch of header images migrated properly. I believe most of the data exists, but I am not clear how to associate it with nodes properly.
I wish I had been able to find better sources of help than I did. I was too scared to post threads on drupal.org directly, which was a mistake. Instead I relied on DuckDuckGo searches, and when I got really desperate I attempted to ask questions on the #drupal-migrate IRC channel and on https://drupal.stackexchange.com . Neither of these support channels worked well. The IRC channel was basically dead, and few people seem active on the Stack Exchange group. (Then again, it isn't as if I am answering other people's questions on those forums, so...)
If I had been more conscientious I would have started with a minimal (or custom) install profile rather than the standard one. The standard one created some content types and menu items I did not like.