Drupal 8 Migration Survival Strategies
We received word that KWLUG needed to move hosting providers this year. Like an idiot, I took this opportunity to migrate the KWLUG website from Drupal 6 to Drupal 8. This is a giant dump of what I learned. It is so long and so boring I can barely proofread it.
- Drupal 8 Migration Survival Strategies
- Migration Overview
- About the KWLUG website
- Getting Started
- Setting System UUID
- Reducing Content Types
- Enabling Display Fields in Migrated Content
- Textfields and Textareas
- Deleting Spam Accounts
- Simplifying User Roles
- Fixing Text Formats
- Merging Presentations and Agendas
- Linking Nodes via Entity References
- Fixing Dates and Timezones
- Converting Flexinodes
- Migrating Attachments
- Migrating Comments
- Filtering Taxonomies
- Setting Redirects Using a CSV Source
- Where to Do What
- Failures and Improvements
- Sidebar!
I budgeted 2-3 weeks for the data migration; it took almost two months, and is barely "good enough" to get by. As with everything else in Drupal, the learning curve was steep, and I spent hours and hours struggling to understand how the Drupal migration system wanted me to approach problems. Things that seemed simple on the surface took days of frustration and effort to get working.
The lack of documentation around this process was particularly difficult. I found myself reading the same dozen blog posts again and again, trying to figure out how to generalize their examples to my situation. My hope is that writing out these conceptual difficulties will save you time in figuring out your issues. All of the code I wrote for this will be on my Github account: https://github.com/pnijjar/kwlug-drupal8-migration. I can also produce a tarball on request.
On the plus side, the migration team has put a lot of work into doing Drupal 6 to Drupal 8 upgrades, and this effort provided good scaffolding upon which I built my migration. In addition, configuration management saved my tofu again and again. It is probably the best thing about Drupal 8. With it, I can use the GUI to configure the site and then preserve that configuration for future migrations. This made migrations far more repeatable, and thus easier to develop.
This blog post will focus on migration, as opposed to site building or setting up a development environment. I cover those topics in a companion post.
I refer to Drupal 8 as "D8" and Drupal 6 as "D6" a lot.
Believe it or not, this gigantic blog post does not document every single migration I did on the site. I tried to include only things that were interesting, and/or which other people might benefit from seeing.
UPDATE: I delivered a talk for the Waterloo Region Drupal User Group about D8 migrations. Here are the slides and here are the slide sources.
Migration Overview
Here is an outline of my migration journey:
- I grabbed a copy of the D6 website and set it up on a local machine (including the database and the files directory, which are the most important parts).
- I looked through the D6 site and mapped out what I wanted to migrate. A spreadsheet helped here. (local-docs/content-types.ods in the sources.)
- I downloaded D8 and did a dummy install.
- I used drush site-install to automatically install a dummy site.
- I registered the D6 database credentials on the D8 site.
- I used drush migrate-upgrade to generate a bunch of migration .yml files (YAML migration files), which I exported using drush config-export.
- I created a custom module for migrations and dumped the exported migrations there.
- I set up scripts to reinstall Drupal and run the migrations automatically.
- I created a new migration group so I could run only the migrations I had checked and fixed. This also allowed me to avoid migrating things I did not care about (such as blocks).
- One by one, I enabled the migrations I wanted for the new site by adding them to my custom migration group, then developing and troubleshooting the migration until that migration worked. This often involved writing source and process plugins, and modifying the given YAML migration files.
- When I had most of the migrations working, I did the rest of the site development (views, themes, etc), which I won't document here.
- When I was ready to deploy the site, I pulled a fresh copy of the D6 database and reran the migration on this fresh copy.
Throughout the process I kept track of the following things (in a computer file. Not in my head!):
- What got migrated and what was too much trouble to migrate.
- Steps I would have to do before the migration.
- Steps I would have to do to fix up the site post-migration.
Keeping track of these things was enormously helpful, because it served as a checklist of things to remember when deploying the site.
Local Modules
I created several custom modules for this migration. I could probably have consolidated them if I was wiser, but oh wells.
kwlug_migrate is the main migration module.
- Inside its config/install folder I put the YAML migration files for my migrations.
- Inside the src/Plugin/migrate/source folder I put custom source plugin PHP files.
- Inside the src/Plugin/migrate/process folder I put custom process plugin PHP files.
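Laid out on disk, the module therefore looked roughly like this (the file names are examples from my site):

kwlug_migrate/
  kwlug_migrate.info.yml
  config/
    install/
      migrate_plus.migration_group.kwlug_migrate.yml
      migrate_plus.migration.upgrade_d6_node_agenda.yml
      (... one YAML file per migration ...)
  src/
    Plugin/
      migrate/
        source/
          AgendaNode.php
          (... other source plugins ...)
        process/
          MapKWLUGFormatFilter.php
          (... other process plugins ...)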
To run a migration I first enabled this module, and then ran
drush migrate-import --verbose --execute-dependencies --group=kwlug_migrate --yes
kwlug_content_types contains configuration information for the site that was not directly related to migration. This included the following:
- A premigrate_settings folder with configuration that needed to be run before the main migration, because otherwise the migration would break. This included:
  - system.date.yml to set the timezone
  - user.settings.yml to disable account creation by users
  - filter.format.full_html.yml and filter.format.restricted_html.yml, which defined text filters.
To import these before the migration I ran
drush config-import --partial \
  --source=modules/custom/kwlug_content_types/premigrate_settings \
  --yes
- A bunch of configurations in the config/install folder. This mainly consisted of custom field definitions (eg field.storage.node.field_page_category.yml) that did not have many dependencies, but were needed for the migration to work well.
- A postmigrate_settings folder, which was run after the migration. Including these configurations in the config/install folder would have been a bad idea, because it would have meant adding a lot of spurious dependencies to that folder and making the site more fragile overall. In practice these ended up being the form displays and view displays for my content types, which were needed so that I could see migrated content after the migration was finished.
I activated these similarly to the premigration settings.
kwlug_dependencies is a stupid little module consisting of a
single file (kwlug_dependencies.info.yml
). Its purpose is to list
dependencies for the project that were to be enabled. As such it was a
poor-man's install profile. (Unlike a real install profile, you cannot
install themes this way.)
What's missing
Despite the Migrate team's work, there are a bunch of things that do not upgrade cleanly. The best place to start is here: https://www.drupal.org/docs/8/upgrade/known-issues-when-upgrading-from-drupal-6-or-7-to-drupal-8
Here are some of the things that burned me:
- "Fields missing on the edit form" was irritating. The fix was to add
configuration to the
postmigrate_settings
folder that would make these fields display. - "Text/Input formats": the
filter_null
filter messes everything up. Any node assigned a filter offilter_null
will not display. I fixed this by migrating my input filters explicitly to excludefilter_null
. - "Views": not migrated but I did not care that much. I wanted to redo views anyways.
- "Aggregator Categories": this hurt me for a feature I wanted to develop later, but did not hurt me during the migration.
- "Image and file attachments" : This did not work well for me. I found myself writing my own migrations to deal with file attachments.
Other Notes
Drupal 8 is an I/O hog. Use an SSD on your development machine if you possibly can. I do not know why Drupal 8 in general and migrations in particular hit the database so hard, but they do. (I am not the only one who finds Drupal 8 slow: https://deekayen.net/drupal8-xdebug-installer-timeout .)
I disabled Drupal cron on my development sites because running cron slowed everything to a crawl for over half an hour. (See "Drupal 8 is an I/O hog" above.)
I wrote a bunch of local scripts to make migrations more repeatable.
I put those in the bin/
folder of the code for this project.
About the KWLUG website
Before proceeding to specific examples I will waste some time talking about the structure of the data I was migrating, and some design decisions I made.
The KWLUG website has been around since Drupal 4. The current iteration has been around since 2005. Originally we had planned to use KWLUG as a content generation hub: members would contribute reviews and forum posts and blogs to the site. This never took off, and KWLUG morphed into an information site focused on meetings and meeting announcements.
In migrating http://kwlug.org, I had a number of concrete objectives in mind:
I wanted to clean up and simplify content types:
- There were a bunch of content types that essentially had the same field sets, but used names to distinguish the types of content. I wanted to make these all Page nodes.
- There were content types with fields that were rarely used and no longer needed.
- There were content types that needed additional fields.
In the Drupal 6 site, Presentation Topic and Meeting Agenda were two separate content types, linked using an old module called Node Relativity. The intentions behind having two distinct content types were noble (having a queue of upcoming presentation topics to be scheduled) but in practice we just created presentations when scheduling meetings. Also, Node Relativity was a dead-end module, superseded by entity references. There was also a longstanding bug that made attaching presentations to agendas difficult. Thus I wanted to do the following:
- Get rid of Node Relativity.
- Merge the Presentation Topic and Meeting Agenda content types so there would only be Meeting Agendas.
- Preserve "orphaned" Presentation Topic nodes that were not associated with agendas.
- Ensure that links to merged presentation topics continued to work (because such links sometimes exist in podcast show notes).
Some ancient version of the Drupal site used Flexinode for the earliest meeting agendas. Flexinode was a competitor to CCK in Drupal 6, but CCK won and nobody had bothered to migrate this old content. I wanted to fix that.
There were a bunch of podcasts and video recordings of meetings, but they were not linked to respective meeting agendas. I wanted to fix this, in an automated way if possible.
I wanted to reorganize the site to make it easier to find information people cared about.
I wanted to retheme the site to make it mobile-friendly but still usable on desktops. This was less about migration than site building.
I wanted to preserve historical content as much as possible. It is likely that nobody will ever look back at old posts, but sometimes it is enlightening to explore how KWLUG did things in the past.
I wanted to preserve URLs as much as possible.
I wanted to delete hundreds of spam accounts in the migration.
Over the years we had built up a lot of cruft in permissions and roles. I wanted to simplify and start fresh.
I wanted to pretend that spending two months automating the migration was a better use of time than copying a few hundred nodes by hand. My thinking was that building a migration would scale to tens of thousands of nodes, and thus make me more employable.
I wanted to automate the process as much as possible and make it as repeatable as possible. I thought this would make the cutover process easier.
Getting Started
Setting up databases
The following guide is pretty good for getting the database set up: http://affinitybridge.com/blog/migrating-from-drupal-6-to-drupal-8
There are parts in that blog post that use Drupal Console, but I was not able to get Drupal Console working on my setup, so I just made my YML files manually.
There is a migration GUI, but don't bother with it. It times out for even small migrations. Use Drush instead.
For some reason I believed that settings.local.php wanted both a $databases['migrate']['default'] entry and a $databases['upgrade']['default'] entry pointing to the D6 database. So I did the following to set them both to be equal:
// Database entry for `drush migrate-upgrade --configure-only`
$databases['upgrade']['default'] = array (
'database' => 'd6_db_name',
'username' => 'd6_db_user',
'password' => 'd6_db_password',
'prefix' => '',
'host' => 'localhost',
'port' => '3306',
'namespace' => 'Drupal\\Core\\Database\\Driver\\mysql',
'driver' => 'mysql',
);
$databases['migrate']['default'] = $databases['upgrade']['default'];
To generate the initial set of migration settings, I then ran:
drush migrate-upgrade --configure-only
This generated configurations which I could then export:
drush config-export --destination=/tmp/migrate01
Then I copied the migration .yml
files that began with
migrate_plus.migration.
to a new folder. These would be the basis
files for my migration.
Migration module and migration group
To set up the kwlug_migrate
module, I did the following:
- I made a folder called kwlug_migrate in the modules/custom folder of my Drupal install.
- I made a file called kwlug_migrate.info.yml in this folder.
- I made a nested config/install folder inside the kwlug_migrate folder.
- I added the migrate_plus.migration.* YAML files to the config/install folder.
The kwlug_migrate.info.yml
file looked like this:
name: kwlug_migrate
type: module
description: Migrate content from Drupal 6 to Drupal 8
core: 8.x
package: Custom
dependencies:
- migrate_plus
- migrate_drupal
- migrate_tools
- migrate_upgrade
- kwlug_content_types
This was actually enough to try a migration:
drush migrate-import --verbose --execute-dependencies --yes
but the migration took a long time and did not do what I wanted. The
next step was to set up a migration group.
I called mine kwlug_migrate
, because I name things
creatively.
To set the migration group I added a file to the config/install
folder called migrate_plus.migration_group.kwlug_migrate.yml
. It
defined the migration group as follows:
id: kwlug_migrate
label: D6 imports
description: Content to import to the new site
source_type: Drupal 6
shared_configuration:
  source:
    key: upgrade
This file might not even be necessary. What is necessary is selecting
a target YAML file (say
migrate_plus.migration.upgrade_d6_node_blog.yml
) and changing the
following line from:
migration_group: migrate_drupal_6
to
migration_group: kwlug_migrate
Then I reran the migration as:
drush migrate-import --group=kwlug_migrate --verbose --execute-dependencies --yes
and Drupal attempted to migrate everything in the migration group (in my case upgrade_d6_node_blog) and all the associated dependencies (regardless of which migration group they were in).
It is nice to track down those dependencies and put them in the
kwlug_migrate
group as well. Then you will have a set of YAML files
you can keep (because they are in the migration group) and a set you
can discard.
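To see which migrations exist, which group each belongs to, and how many rows each has processed, drush migrate-status (provided by migrate_tools) is handy:

drush migrate-status --group=kwlug_migrate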
Setting System UUID
If you install the initial system with drush site-install
then
Drupal sets a UUID. Then when you try to override certain
configurations (in my case system.site.yml
to change the front page
display) you
may get messages like Site UUID in source storage does not match the
target storage
.
This problem is documented here:
https://github.com/drush-ops/drush/issues/1625 .
The quick fix is to explicitly set the UUID of the site after it is
installed, so it matches the UUID in system.site.yml
:
drush cset system.site uuid 3112d604-7bb2-4dba-b418-f4f542f2682c --yes
Reducing Content Types
I discovered that I had a number of content types (blogs, pages, locations, book) that were all effectively the same, in the sense that they had the same sets of fields. I guess using different content types to semantically differentiate content is okay, but I decided to consolidate these types and differentiate them in a different way.
Take the example of locations. The migration for these is specified in
migrate_plus.migration.upgrade_d6_node_location.yml
. The source
and destination
sections of this YAML file originally looked like
this (with all other sections omitted):
source:
  plugin: d6_upload_node
  node_type: location
  constants:
    bundle_type: location
destination:
  plugin: 'entity:node'
  default_bundle: location
I wanted all location nodes to be turned into pages. To do this, I modified the destination bundle as follows:
source:
  plugin: d6_upload_node
  node_type: location
  constants:
    bundle_type: location
destination:
  plugin: 'entity:node'
  default_bundle: page
Of course, I needed to ensure that all the target fields for pages were specified in the YAML file as well.
Classifying content types
I wanted to maintain distinctions between locations and other page types. My original thinking was to use a taxonomy term for each page type, and assign that taxonomy term during migration. But this article (which is well worth reading) convinced me otherwise: http://blog.dcycle.com/blog/83/what-content-what-configuration/ . This article argues that taxonomy terms are data that can be changed at any time. Furthermore taxonomy terms are kept in the database, not in Drupal configuration (which could be exported into YAML files). The suggested solution was to add a select field to my page content type. This field would have a fixed set of values -- one for each content type.
To create this I used the (Drupal 8) GUI:
- First I made sure that the Page content type was migrated into Drupal 8.
- In the Page content type, I added a new field called page_category of type List (text).
- For Allowed values I made one entry per content type (whether they were merged content types or not). I kept the key values easy to parse ('meeting_agenda' instead of 'Meeting Agenda').
- I reused this field and added it to my other content types as well.
- I used drush config-export to export the configuration and pick out field.storage.node.field_page_category.yml and each of the field.field.node.*.field_page_category.yml files. These went into the config/install folder of the kwlug_content_types module.
The field.storage.node.field_page_category.yml
is almost editable by
hand, in case you want to add other content types to the list later
on.
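For reference, here is roughly what that exported file looks like for a List (text) field. Treat it as a sketch: the boilerplate keys in your export may differ slightly, and only two allowed values are shown as examples. The value entries are the machine keys I assign during migration; the labels are what editors see in the GUI.

langcode: en
status: true
dependencies:
  module:
    - node
    - options
id: node.field_page_category
field_name: field_page_category
entity_type: node
type: list_string
settings:
  allowed_values:
    -
      value: meeting_agenda
      label: 'Meeting Agenda'
    -
      value: meeting_location
      label: 'Meeting Location'
    # ... one entry per page category ...
  allowed_values_function: ''
module: options
locked: false
cardinality: 1
translatable: true
indexes: {  }
persist_with_no_new_fields: false
custom_storage: false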
The next step was to assign the content type in the migration YAML
file. To do this for location was fairly easy, since every single
location would have the same value. I started by adding a constant to
the source
section of the YAML file:
source:
  plugin: d6_upload_node
  node_type: location
  constants:
    category: 'Meeting Location'
    bundle_type: page
and then assigning that category to the field:
process:
  # [stuff snipped]
  field_page_category: constants/category
Enabling Display Fields in Migrated Content
At some point I was convinced that my custom fields were being migrated properly, but they were not showing up when I displayed nodes. When I navigated to the associated content types, the fields were listed as "disabled" in the "Manage Form Display" tab. Enabling these fields in the "Manage Form Display" and "Manage Display" tabs makes the (populated!) fields display properly.
The known issues page (https://www.drupal.org/docs/8/upgrade/known-issues-when-upgrading-from-drupal-6-or-7-to-drupal-8) acknowledges that this is a problem, but the listed solution is unsatisfactory: after each migration you are supposed to manually re-enable the fields. That is awful, so here is a better way:
- Do the migration.
- Go to the GUI and manually fix the displays and form displays once.
- Use drush config-export to export the configuration.
- Add the relevant config entries to the postmigrate_settings folder of the kwlug_content_types custom module. For example, for the Page content type I had to add:
  - core.entity_view_display.node.page.default.yml
  - core.entity_view_display.node.page.teaser.yml
  - core.entity_form_display.node.page.default.yml
- Enable these configurations after the main migration, with an invocation like this:

drush config-import --partial \
  --source=modules/custom/kwlug_content_types/postmigrate_settings \
  --yes
The reason you import the configuration after the main migration is
that these .yml
files have a bunch of dependencies, and including
all of these dependencies is messy and fragile.
Of course, every time you update the content type with new fields (or new orderings of the fields, or new widgets for field display...) then you have to update these files.
Textfields and Textareas
Say your Drupal 6 site has a content-type with a string field. That string field is set as follows:
- It has no maximum length
- Its form field is configured to be a Textfield (ie one line of text)
When you migrate this field it will migrate, but will be displayed as a Textarea (with multiple lines of text). This is due to ambiguity in migrating the field: https://www.drupal.org/node/1117028 .
I tried a bunch of automated ways to set this information during the migration, but gave up. The easy way to deal with this is to alter the Drupal 6 database: set each affected text field to have a maximum length of 255. Then the migration will assign the right type, and the forms will have textfield widgets.
Deleting Spam Accounts
Instead of attempting to delete spam accounts in the D6 site directly,
I got rid of them during the migration. To do this, I wrote a custom
source plugin for users (ContributingUser.php
). I defined a
"contributing user" as a user that had authored a node. Then in the
plugin I had the following query()
method:
/**
* {@inheritdoc}
*/
public function query() {
// Make a subquery of all the UIDs who have authored nodes.
$node_authors = $this->select('node','n')
->fields('n', array('uid'));
return $this->select('users','u')
->fields('u', array_keys($this->baseFields()))
->condition('u.uid', 0, '>')
->condition('u.uid', $node_authors, 'IN');
} // end query
The first query finds all authors of a node, and the second picks only users that are in that list of authors. This filters out any account that has not authored a node, which includes all spam accounts (and some legitimate lurker accounts, unfortunately).
This technique can be used to filter out all kinds of input, so long as you can distinguish legitimate from illegitimate data using a query.
I guess I should point out a couple of other elements of the plugin. Firstly, I reused most of the existing User plugin by extending it:
use Drupal\migrate\Row;
use Drupal\user\Plugin\migrate\source\d6\User as D6User;
class ContributingUser extends D6User {
I also had to define an ID for this plugin, which is done in an annotation inside the class's docblock comment:
/**
 * @MigrateSource(
 *   id = "d6_contributing_user"
 * )
 */
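Putting those pieces together, the whole plugin file ends up looking something like this. The namespace just follows from the module's folder layout, and this is a sketch rather than a paste-ready file:

<?php

namespace Drupal\kwlug_migrate\Plugin\migrate\source;

use Drupal\user\Plugin\migrate\source\d6\User as D6User;

/**
 * D6 user source that only returns users who have authored a node.
 *
 * @MigrateSource(
 *   id = "d6_contributing_user"
 * )
 */
class ContributingUser extends D6User {

  /**
   * {@inheritdoc}
   */
  public function query() {
    // Subquery: UIDs of everybody who has authored a node.
    $node_authors = $this->select('node', 'n')
      ->fields('n', array('uid'));

    // Keep only real users (uid > 0) whose UID appears in that subquery.
    return $this->select('users', 'u')
      ->fields('u', array_keys($this->baseFields()))
      ->condition('u.uid', 0, '>')
      ->condition('u.uid', $node_authors, 'IN');
  }

}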
Then in the migrate_plus.migration.upgrade_d6_user.yml
I had to
specify the use of this plugin:
source:
  plugin: d6_contributing_user
I made one other change of note: I disabled all user accounts, with the idea that active users could have their accounts re-enabled later. This required setting a default value for the status field in the YAML file:
status:
  plugin: default_value
  default_value: 0
(Yes, being allowed to use 0
as a constant when you have to define
strings feels inconsistent to me as well.)
Simplifying User Roles
In addition to having too many spam users the old D6 site had accumulated a lot of spurious user roles over the years ("librarian", "speaker") that were no longer needed. I decided to start fresh by including only the built-in "administrators", "authenticated users" and "anonymous users", then adding other roles in the new website as required.
This meant I had to filter out roles somehow. To do this I changed the
migrate_plus.migration.upgrade_d6_user_role.yml
file to migrate only
the three built in roles and ignore the rest. In the process
section,
I changed the id
stanza from:
id:
  -
    plugin: machine_name
    source: name
  -
    plugin: user_update_8002
to
id:
  -
    plugin: machine_name
    source: name
  -
    plugin: static_map
    source: name
    bypass: false
    map:
      'administrator': 'administrator'
      'authenticated_user': 'authenticated'
      'anonymous_user': 'anonymous'
#    plugin: user_update_8002
(Once again, I am mystified why I was allowed to use straight strings on the right hand sides of the map. YAML is weird.)
The map part is the easy part of this migration: some role names in the D6 database had changed for D8. The secret of this static map is the bypass: false part, which states that the migration should ignore any entry that is not in the static map.
I am sure the plugin: user_update_8002
does something very
important, but I didn't know what it was and the migration seemed okay
without it, so I commented it out.
Fixing Text Formats
This is also acknowledged in the "Known Issues" page, but again the
solution was not obvious. Some input filters (notably the PHP input
filter) are no longer supported in Drupal 8, and others are missing.
These are replaced by something called filter_null
, which messes up
the site.
Symptoms you are affected include:
- During the migration you see the message
Missing filter plugin: filter_null.
- Content pages are blank when you view them, even though you believe
they have been migrated. Editing such a node displays a message like
Missing filter. All text is removed.
- In the GUI, going to "Configuration" -> "Content authoring" -> "Text formats and editors", editing the affected text format and immediately saving makes the problem go away. You may see a message like The filter_null filter is missing, and will be removed once this format is saved.
There is a pretty good description of the problem here: https://www.hywel.me/drupal/2016/02/11/a-website-upgrade-from-drupal-6-to-drupal-8-part-4.html
The issue is that some filter or setting in the text filter is missing. PHP filter is one culprit, but in my migration there was some other problem that affected a lot of my content.
My solution was to migrate filter formats early in the migration process. Drupal 8 provides some default text formats (in the standard installation profile?) and I mapped my old formats to those.
The process plugin was called MapKWLUGFormatFilter
, and it lived in
the src/Plugin/migrate/process
folder of the kwlug_migrate
custom
module.
The heart of the function was very easy. Here is an excerpt from the
transform()
method:
public function transform($value,
MigrateExecutableInterface $migrate_executable,
Row $row,
$destination_property) {
$filter_mapping = array(
0 => 'restricted_html', // unknown but it exists
1 => 'restricted_html', // filtered_html
2 => 'full_html', // php_code
3 => 'full_html', // full_html
4 => 'restricted_html', // unknown. Some image format that has been lost.
5 => 'plain_text', // messaging plain text. Unused.
);
$retval = $filter_mapping[$value];
if (!$retval) {
$retval = 'restricted_html';
}
return $retval;
} // end transform
(One big difference between "Basic HTML" and "Restricted HTML" is the use of CKEditor, I think.)
I found the filter mappings that existed for my Drupal 6 site by
looking in the filters
and filter_formats
tables in the database.
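The transform() method above lives inside an ordinary process plugin class. In case it helps, here is a skeleton of the whole file; again, the namespace is just the standard one implied by the module layout, so treat this as a sketch:

<?php

namespace Drupal\kwlug_migrate\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Maps D6 input format IDs onto D8 text format machine names.
 *
 * @MigrateProcessPlugin(
 *   id = "map_kwlug_format_filter"
 * )
 */
class MapKWLUGFormatFilter extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // The static map shown above, with restricted_html as the fallback.
    $filter_mapping = array(
      0 => 'restricted_html',
      1 => 'restricted_html',
      2 => 'full_html',
      3 => 'full_html',
      4 => 'restricted_html',
      5 => 'plain_text',
    );
    return isset($filter_mapping[$value]) ? $filter_mapping[$value] : 'restricted_html';
  }

}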
I used this process plugin in content types that had a node body. Basically, I would change:
body/format:
  plugin: migration
  migration: upgrade_d6_filter_format
  source: format
to
body/format:
  plugin: map_kwlug_format_filter
  source: format
I think I do not need the migration
plugin in this stanza because
the migration
plugin looks up ID maps of migrated filters, and in
this case I am setting the filters with a static map. (This
explanation might be wrong.)
I also found that I wanted to customize these filter formats. The usual trick applied: make the changes in the GUI, run drush config-export, and copy the relevant filter.format.*.yml files to the premigrate_settings folder of kwlug_content_types.
Merging Presentations and Agendas
Wow this took a long time. The basic idea was to merge two content types: "Meeting Agendas", which mostly had a meeting date and location, and "Presentations" which listed topics for the meetings.
Most meeting agendas were associated with exactly one presentation, but some early meeting agendas were associated with two. A few presentations were not associated with any meeting agendas.
As mentioned above, a module called Node Relativity associated agendas with presentations, but it also associated agendas with a different content type called 'FLOSS Fund Nominees'. So I had to be careful about picking proper associations.
With this in mind, here was my strategy:
- Run one migration for the Agenda content type. This migration would merge in all associated Presentation nodes for that agenda.
- Run one migration for the Presentation content, that would only pick out "orphaned" presentations not associated with any Agenda.
- Run one migration to redirect deleted presentation nodes to the relevant Agenda.
I also had to migrate auxiliary content such as attachments (which were typically attached to presentations) and images, but I will document these later.
Migrating Agendas
I did the bulk of the work in a source plugin for agendas, called
AgendaNode
(which extended
Drupal\node\Plugin\migrate\source\d6\Node
). Surprisingly, I did not
need to override the query()
method. Instead I did the bulk of the
work in the prepareRow()
method.
To find Presentation nodes associated with a particular agenda, I had to write a query that looked through the Node Relativity tables for matches:
$nid = $row->getSourceProperty("nid");
// Look for associated presentation topics in relativity table
$query = $this->select('node', 'p')
->fields('p', ['nid','title'])
->condition('p.type', 'presentation');
$query->join('relativity', 'r', 'r.nid = p.nid');
$query->condition('r.parent_nid', $nid, '=');
$query->join('node_revisions', 'nr', 'nr.nid = p.nid');
$query->addField('nr', 'body');
$presentation_info = $query->execute()
->fetchAll();
Now $presentation_info
contained zero or more presentations. I
looped through this array, grabbed each presentation's data, and
populated variables for the YAML file. For example, here is an extract
where I took the body texts of each presentation and appended them
to the Agenda body (this is not identical to the actual code, but it
is close). I also collected the NIDs:
if ($presentation_info) {
foreach ($presentation_info as $p) {
$body_so_far = $row->getSourceProperty('body');
$pbody = $p['body'];
if ($body_so_far) {
$body_so_far = $body_so_far . "\n\n* * *\n" . $pbody;
} else {
$body_so_far = $pbody;
} // end if body
$row->setSourceProperty('body', $body_so_far);
$row->setDestinationProperty('body', $body_so_far);
} // end foreach
} // end if presentation_info exists
You can see my confusion about source and destination properties here:
$row->setSourceProperty('body', $body_so_far);
$row->setDestinationProperty('body', $body_so_far);
I now believe you should only be setting source properties in source plugins. Setting the destination did not harm anything, but it was not effective.
(You can also see my confusion in getting the existing body at the beginning of each loop iteration and setting it at the end of each iteration, instead of pulling that functionality out of the loop. Oops. I am not changing it now, though.)
This example is cheating because instead of collecting an array of associated presentation bodies, I am just concatenating them into one big body. There are other examples where I did have to collect arrays of data, but I will cover them below.
In addition to grabbing presentation info, I had to get data from custom fields that were already associated with the agenda (meeting MCs, meeting dates and locations):
// I do not know why this stuff doesn't migrate itself,
// but whatever.
$query_agenda = $this->select('content_type_agenda','c')
->fields('c', ['field_emcee_uid', 'field_date_value',
'field_location_nid'])
->condition('c.nid', $nid, '=');
$agenda_info = $query_agenda->execute()
->fetchAll();
if ($agenda_info) {
// There SHOULD be only one row. I guess we are taking the last
// value if there are multiple.
// Lots of these will be NULL, though.
// Also we just want to append to the body if there is an emcee.
foreach ($agenda_info as $a) {
$row->setSourceProperty('meeting_date', $a['field_date_value']);
$row->setSourceProperty('emcee_uid', $a['field_emcee_uid']);
$row->setSourceProperty('meeting_location_nid', $a['field_location_nid']);
} // end foreach agenda_info
} // end if agenda_info
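Those source properties then get consumed in the process section of the Agenda migration's YAML file, roughly like this. The destination field names below are illustrative -- use whatever you called the fields on your D8 Agenda content type:

process:
  # (other mappings snipped)
  field_meeting_date/value: meeting_date
  field_location/target_id:
    plugin: migration
    migration: upgrade_d6_node_location
    source: meeting_location_nid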
Migrating Orphaned Presentations
The key to this migration was to filter out all nodes associated with
Agendas, since those presentations are migrated in the Agenda
migration. Thus I created a source plugin (PresentationNode.php
)
that overrode the query()
method:
public function query() {
// Make a subquery of all the NIDs in the relativity table.
// Return presentation nodes not in this set.
$linked_presentations = $this->select('relativity', 'r')
->fields('r', array('nid'));
$parent_q = parent::query();
$parent_q->condition('n.type', 'presentation')
->condition('n.nid', $linked_presentations, 'NOT IN');
return $parent_q;
} // end query
The rest of this migration was fairly standard. The hard part was in figuring out that I never want to touch presentation nodes that are associated with agendas.
Creating Redirects
In the new website I wanted linked presentation nodes to disappear, but I wanted the old URLs to be preserved. Thus I wanted redirects from merged presentation nodes to the agendas that digested them.
Drupal migrations have "ID maps" that map NIDs from the D6 site to
entity IDs in the D8 one. I kept thinking that I could read these identity
maps to create the redirect, but this was stupid. The right way to do this was to go through the presentation nodes a second time, this time selecting those nodes that HAD been merged
(MergedPresentationNodes.php
). Then I needed to fill in the YAML
file, but in my initial migration I could not find an appropriate YAML
file.
After installing the redirect
D8 module, I found a template:
modules/contrib/redirect/migration_templates/d6_path_redirect.yml
.
This specified the fields I needed to fill in.
There were a few tricks in this YAML file, so I will reproduce big chunks of it here:
source:
  plugin: d6_merged_presentation_node
  node_type: presentation
Even though my target was to create a redirect, I could use a node content type as the source. I found this interesting.
constants:
  nodelist: node/
  internal: internal:/
In the D8 database I saw that source redirects took the form
node/<entity-id>
but that targets were of the form
internal:/node/<entity-id>
. These constants help create those
strings.
process:
  # If you omit this will it auto-generate?
  # rid: rid
This part confused me a lot. This was not a migration from redirects
to redirects, so populating the rid
(redirect ID?) did not make
sense. I tried using things like the presentation node ID, but that
did not work well either. I found that omitting the rid
entirely
made it autogenerate, which is a neat trick that can be used
elsewhere.
redirect_source:
  plugin: concat
  source:
    - constants/nodelist
    - nid
# This is broken broken broken for multiple presentations.
# There is no easy way to fix this without an iterator, though.
redirect_redirect:
  plugin: concat
  source:
    - constants/internal
    - constants/nodelist
    - agenda_nid
The redirect_source
and redirect_redirect
fields came straight out
of the template. The concat
plugin allowed me to build (simple)
strings for the redirections.
destination:
  plugin: 'entity:redirect'
This was the magic that made a redirect and not a node.
In this case it did not matter whether the agenda nodes had been created before the redirects were or not.
Linking Nodes via Entity References
I wanted the Agenda content type to be the centre of the new website. Agendas needed to refer to locations, podcasts, video recordings, and other nodes associated with particular meetings. Pre-migration the set of links were a mishmash:
- Nodes of a content type called "FLOSS Fund Nominees" were linked to Agendas in the same way presentation nodes were: via the Node Relativity module.
- The D6 site Agendas had a custom field for Location, which pointed to nodes of content-type Location. These nodes were selectable via a dropdown box. In D8, Location nodes were to be converted to Pages using the page_category field to distinguish them, as described in the "Classifying content types" section above.
FLOSS Fund Nominees
There is not much that is new to say. Since Node Relativity already related nominees to agendas, I took the same approach that I did when merging presentations into agendas above. But instead of merging strings I pulled out nominee NIDs and used them to populate the YAML file:
field_floss_fund_nominee_link/target_id: floss_fund_nominee
(Truth be told, figuring out HOW to populate these entity references caused me a lot of grief. But that was my own fault.)
Locations
Since the NIDs of locations were already included with the Agenda as nodereferences, populating the entity reference links was not hard. More challenging
was getting the Agenda form view to select only locations, as opposed
to every possible node of type page. To do this I went into the GUI.
I navigated to the Agenda content type, found the location field, and
changed the Reference method
from "Default" to "Views: Filter by an
entity reference view". In order to do this I had to make a view
(duh). The view had the following properties:
- Format: Entity reference list
- Show: Entity Reference inline fields
- Field: Content: Title
- Filter Criteria:
- Content: Publishing status (= yes)
- Content: Page Category (= Meeting Location)
After doing this (and using configuration export to save the view and field settings in my config) the Agenda form restricted possible meeting locations to nodes of type "Meeting Location".
Podcasts and Vidcasts
Creating entity reference fields for podcasts and vidcasts was not that difficult: I just used the GUI to add them, and then used the magic of configuration export to retain those settings. Populating the fields was a different matter, because in the D6 database these fields were not formally linked in any way.
Fortunately, most podcasts and vidcasts followed a standard naming scheme: their titles began with "YYYY-MM:". In the Agendas I had a meeting date stored. So in the source plugin AgendaNode.php I correlated the two.
Here is some of the code I used to correlate podcasts with meeting
agendas:
$meeting_date_raw = $row->getSourceProperty('meeting_date');
// Looks like: 2016-03-07T00:00:00
$is_match = preg_match('/^\d\d\d\d-\d\d/', $meeting_date_raw,
$substr_array);
// There is NO WAY that there should not be a match, because
// all (post-flexinode) agendas have a date.
// HOWEVER, some podcasts should not be associated with some
// agendas (laptop rescue missions). Unfortunately this ruins
// SFD podcasts, which need to be added manually.
if ($is_match && $row->getSourceProperty('presentation_nid')) {
$meeting_YYmm = $substr_array[0] . ":%";
// Look for podcasts
$query = $this->select('node', 'n')
->fields('n', array('nid'));
$query->condition('n.title', $meeting_YYmm, 'LIKE');
$query->condition('n.type', 'podcast');
$row->setSourceProperty('podcast_nid', $query->execute()->fetchAll());
} // end if is_match
As the comments indicate, my first attempt had unintended consequences: some agendas (Laptop Rescue Missions) had no associated presentations, but were getting populated with podcasts from presentations held in the same month. Thus I needed to filter out these agendas, which messed up a handful of other Agendas that DID have podcasts but did NOT have presentation nodes (Software Freedom Day celebrations). I opted to fix those manually afterwards.
Overall I am unreasonably happy with this hack, because it will save me hours of tediously associating podcasts and vidcasts with meeting nodes.
Note that this database query is gross and unsafe. I opted to trust the user input because I know who generated it, but if you are working with untrusted data you should not be dumb like me.
(I feel this code is fragile to the problem of multiple podcasts being associated with a single Agenda, but I do not think that happened in our D6 site. Sorry, future me.)
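The podcast_nid property collected above then becomes an entity reference in the YAML, using the same iterator-plus-migration pattern I describe for file attachments below. Again, the destination field name here is illustrative:

process:
  # (other mappings snipped)
  field_podcast:
    plugin: iterator
    source: podcast_nid
    process:
      target_id:
        plugin: migration
        migration: upgrade_d6_node_podcast
        source: nid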
Fixing Dates and Timezones
Ugh. Dates and times. Ugh.
The Agenda content type came with a "Meeting Date" field. The Drupal 6
site regarded this field as being of type Date
. As a plain date it
had no hour or minute fields, and I am not sure it was timezone-aware.
Timezones mess everything up. The dates get migrated, but for some
reason that I have forgotten the hours and minutes become significant,
and the dates of the meetings sometimes switch. To fix this, I had to
manually set timezones on date fields. In the source plugin AgendaNode.php
, I
wrote code to set the timezones properly. In the library
imports of the module I had:
use \DateTime;
use \DateTimeZone;
and then in prepareRow()
I put:
// This should not be hardcoded?
$LOCAL_TIMEZONE = 'America/Toronto';
$DEFAULT_MEETING_TIME = "19:00:00";
$EMPTY_MEETING_TIME = "00:00:00";
// Ugh. Times get stored at 00:00:00, then Drupal does
// timezone magic to make the time incorrect. So munge the
// dates.
if ($a['field_date_value']) {
list($date, $time) = explode('T', $a['field_date_value']);
// This should look like 2016-12-26T00:00:00
if ((!$time) || ($time === $EMPTY_MEETING_TIME)) {
$target_time = $DEFAULT_MEETING_TIME;
} else {
$target_time = $time;
} // end set time
$localdate = new DateTime( $date . "T" . $target_time,
new DateTimeZone($LOCAL_TIMEZONE));
$localdate->setTimeZone(new DateTimeZone('UTC'));
$munged_date = $localdate->format('Y-m-d\TH:i:s');
$row->setSourceProperty('meeting_date', $munged_date);
} // end if field_date_value exists
This code sets a meeting time in local time, and then converts the meeting time to UTC for storage in the database.
Unfortunately this sets all meeting times to 7:00pm, which is in fact incorrect for some of our meetings. I thought about being more clever, but in the end opted to fix the incorrect time fields manually.
I am still not certain why I could not use plain Date fields, which appear to migrate properly: https://www.drupal.org/node/2566779#comment-11783277.
Converting Flexinodes
As mentioned above, flexinodes were an early competitor to CCK. I
guess they had been migrated from the Drupal 5 site to Drupal 6.
Flexinode agendas were still displayed in the D6 site,
but were not editable (I think because they
were missing node_revision
entries in the database).
The key to migrating flexinodes was in understanding the database
structure. In the D6 database, the flexinode_type
table provided a
list of the "content types" created using flexinodes:
mysql> select ctype_id,name from flexinode_type;
+----------+--------------------+
| ctype_id | name |
+----------+--------------------+
| 1 | Presentation topic |
| 2 | Meeting Agenda |
+----------+--------------------+
2 rows in set (0.00 sec)
Associated with these types are fields, which are defined in the
flexinode_field
table:
mysql> select field_id,ctype_id,label,field_type from flexinode_field;
+----------+----------+------------------------+--------------+
| field_id | ctype_id | label | field_type |
+----------+----------+------------------------+--------------+
| 2 | 1 | Abstract | textarea |
| 3 | 1 | Presentation Material | textarea |
| 4 | 1 | Reference material | url |
| 5 | 1 | Attachment | file |
| 11 | 2 | Pre-meeting Topic | presentation |
| 12 | 2 | Location | textarea |
| 10 | 2 | Presentation Topic | presentation |
| 13 | 2 | Meeting host / emcee | usergroup |
| 14 | 2 | Pre-meeting activities | textfield |
| 15 | 2 | Introduction | textarea |
+----------+----------+------------------------+--------------+
Thus the source plugin for flexinode data had to pick out data by
field_id
and associate them with the proper fields in the D8 content
types. Fortunately I was again merging "Presentation topic" and
"Meeting Agenda" nodes.
Data for these fields was stored in the flexinode_data
table, which
has a definitively weird schema:
mysql> desc flexinode_data;
+-----------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| nid | int(10) unsigned | NO | PRI | 0 | |
| field_id | int(10) unsigned | NO | PRI | 0 | |
| textual_data | mediumtext | NO | | NULL | |
| numeric_data | int(10) unsigned | NO | | 0 | |
| serialized_data | mediumtext | NO | | NULL | |
+-----------------+------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
Different fields put data in different places. For example, here is field ID 10, which is like Node Relativity for Flexinodes:
mysql> select * from flexinode_data where field_id=10;
+-----+----------+------------------------------------+--------------+-----------------+
| nid | field_id | textual_data | numeric_data | serialized_data |
+-----+----------+------------------------------------+--------------+-----------------+
| 58 | 10 | a:1:{i:0;s:2:"50";} | 0 | |
| 432 | 10 | a:1:{i:0;s:3:"375";} | 0 | |
| 450 | 10 | a:1:{i:0;s:3:"452";} | 0 | |
| 62 | 10 | a:1:{i:0;s:2:"61";} | 0 | |
| 65 | 10 | a:2:{i:0;s:2:"63";i:1;s:2:"64";} | 0 | |
| 311 | 10 | a:1:{i:0;s:3:"321";} | 0 | |
| 318 | 10 | a:2:{i:0;s:3:"313";i:1;s:3:"356";} | 0 | |
| 320 | 10 | a:1:{i:0;s:3:"319";} | 0 | |
| 324 | 10 | a:1:{i:0;s:3:"323";} | 0 | |
| 374 | 10 | N; | 0 | |
| 376 | 10 | a:1:{i:0;s:3:"391";} | 0 | |
| 383 | 10 | N; | 0 | |
| 386 | 10 | a:2:{i:0;s:3:"384";i:1;s:3:"385";} | 0 | |
| 395 | 10 | a:1:{i:0;s:3:"394";} | 0 | |
| 397 | 10 | a:1:{i:0;s:3:"396";} | 0 | |
| 400 | 10 | N; | 0 | |
| 431 | 10 | a:1:{i:0;s:3:"434";} | 0 | |
+-----+----------+------------------------------------+--------------+-----------------+
17 rows in set (0.00 sec)
Note that the serialized data is in the textual_data
column. Oy.
Therefore putting the pieces together required the following:
- Finding each meeting agenda
- Using the relativity data to find associated presentations (and decoding these using PHP's unserialize() function)
- Picking out the titles, bodies, etc and merging them into a single node
I did this work in the prepareRow()
method. This sounds like a bad
idea since I want to iterate over agenda nodes. Fortunately the node table records each node's type (flexinode content types show up as flexinode-<ctype_id>), and you can select that type via node_type in the YAML file:
source:
  plugin: d6_flexinode_agenda_node
  node_type: flexinode-2
and then do the usual trick for turning these nodes into Agendas:
destination:
  plugin: 'entity:node'
  default_bundle: agenda
Once I figured out that I had to query all fields associated with a NID and then
classify on field_id
, the actual code of the source plugin
FlexinodeAgendaNode.php
is fairly straightforward (if tedious). Check the
source if it would give you joy.
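If you just want the flavour without opening the source, the core of prepareRow() is a query on flexinode_data followed by a switch on field_id, along these lines (condensed, and the property names are mine):

// Pull every flexinode field value for this agenda, then branch on field_id.
$nid = $row->getSourceProperty('nid');
$flex = $this->select('flexinode_data', 'f')
  ->fields('f', array('field_id', 'textual_data'))
  ->condition('f.nid', $nid, '=')
  ->execute()
  ->fetchAll();

foreach ($flex as $f) {
  switch ($f['field_id']) {
    case 10:
      // "Presentation Topic": a serialized array of presentation NIDs.
      $presentation_nids = unserialize($f['textual_data']);
      // ... look up each presentation and merge its title/body into this
      // row, much like the Node Relativity agendas above ...
      break;

    case 15:
      // "Introduction": plain text destined for the body.
      $row->setSourceProperty('flexinode_introduction', $f['textual_data']);
      break;
  }
}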
Astute readers might wonder whether there were any Flexinode presentations not associated with Flexinode agendas. There were a couple, but I decided against writing a second source plugin for two nodes. Even I have limits. Instead I opted to migrate the content from these orphaned presentations post-migration. I also opted not to deal with Flexinode attachments.
Migrating Attachments
Man, I don't even know how this works.
Here is what I do know: The Drupal 6 database has two relevant tables:
files
and upload
. I guess files
stores filename information
(filename, path, etc) for all the files Drupal knows about, and
upload
records information about files uploaded (ie attached) to
specific nodes. The upload
table relates files to nodes by the vid
column of the node
table, NOT the nid
.
Thus migration of file attachments proceeds in several phases:
- The files table has to be migrated. The destination table in Drupal 8 is file_managed. The migration is upgrade_d6_file.
- The upload table has to be migrated, to the table node__field_file_attachments. As the table name suggests, attachments are now fields of existing entities, so those entities need to migrate the attachments along with the other fields.
This workflow really confused me, because there is a d6_upload
plugin and an associated YAML file called
migrate_plus.migration.upgrade_d6_upload.yml
. I found that I did
not want to use this. Explaining why is tricky, so bear with me. Say
that we enabled this migration, and say that I am concerned about blog
posts.
- Because I am simplifying content types, I have a migration that transforms blog posts into pages.
- Every upload is associated with a node. If that node is a blog post, and the upload migration runs first, then it will CREATE a blog node, fill in the upload field, and stub out the rest of the migration.
- Later, my custom blog migration will run. It will discover that there is already an entity with the same NID (entity ID). So it does not run my migration, and does not create a Page.
- As a result blog posts without attachments become Page nodes, and blog posts with attachments become Blog nodes.
I do not know whether the file migration works properly if you are not consolidating content types.
My solution to this ended up copying code from the migration template
at core/modules/file/src/Plugin/migrate/source/d6/Upload.php
. One
example can be found in UploadNode.php
:
use Drupal\migrate\Row;
use Drupal\node\Plugin\migrate\source\d6\Node as D6Node;
/**
* @MigrateSource(
* id = "d6_upload_node"
* )
* Find uploaded files.
*/
class UploadNode extends D6Node {
/**
* {@inheritdoc}
*/
public function prepareRow(Row $row) {
$nid = $row->getSourceProperty('nid');
// This is copied from Upload.php
$query = $this->select('upload', 'u')
->distinct()
->fields('u', array('fid', 'description', 'list'))
->condition('u.nid', $nid, '=');
$row->setSourceProperty('upload', $query->execute()->fetchAll());
// print_r($row);
return parent::prepareRow($row);
} // end prepareRow
/**
* {@inheritdoc}
*/
public function fields() {
/* Add an upload field.
*/
$orig_fields = parent::fields();
$new_fields = array(
'upload' => $this->t('Uploaded Files'),
);
$fields = array_merge($orig_fields, $new_fields);
return $fields;
} // end fields
} // end class.
The prepareRow()
method queries the database for uploads related to this
node. The fields() method adds a field called upload
which can be used in the YAML
mapping. Note that these methods explicitly reference methods and
variables from their parent class (namely, Node).
Some content types need source plugins. For these content types, you
can extend UploadNode
directly, which will add the upload
field:
class BlogNode extends UploadNode {
// stuff goes here
} // end class
Some content types do not need me to write separate source plugins.
For these content types I modified the YAML files directly. For
example, in migrate_plus.migration.upgrade_d6_node_location.yml
I
changed the source plugin from:
source:
  plugin: d6_node
  node_type: location
to
source:
  plugin: d6_upload_node
  node_type: location
For these content types I could then add a stanza to the YAML file to set the file attachments:
field_file_attachments:
  plugin: iterator
  source: upload
  process:
    target_id:
      plugin: migration
      migration: upgrade_d6_file
      source: fid
    display:
      plugin: default_value
      default_value: 1
    description: description
The iterator
was there because there can be many attachments. I
hardcoded the default_value
for the display
field to 1 so that all
attachments would be visible. (This may have been a mistake. It would
have been possible to propagate this setting as well.)
In one case (namely Agenda nodes) I needed to add code to the prepareRow()
method directly. As usual, merging presentation nodes and adding their attachments
to the associated Agenda caused issues, but the concept was the same.
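Conceptually it was just the Upload.php query again, run over the agenda's own NID plus the NIDs of the presentations merged into it. A condensed sketch (the variable names are mine; $nid and the list of merged presentation NIDs are gathered earlier in prepareRow()):

// Gather uploads for the agenda and for its merged presentations.
$all_nids = array_merge(array($nid), $presentation_nids);
$query = $this->select('upload', 'u')
  ->distinct()
  ->fields('u', array('fid', 'description', 'list'))
  ->condition('u.nid', $all_nids, 'IN');
$row->setSourceProperty('upload', $query->execute()->fetchAll());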
File migration location
One way to specify the location of the Drupal 6 files is to pass a
legacy-root
parameter to the drush migrate-upgrade
command. But if
you forget this, it looks like you can set this manually in the
migrate_plus.migration.upgrade_d6_file.yml
YAML file. In my installation the
key constant is source_base_path
:
source:
  plugin: d6_file
  constants:
    source_base_path: /home/linuxuser/drupal/files
process:
  fid: fid
  filename: filename
  source_full_path:
    -
      plugin: concat
      delimiter: /
      source:
        - constants/source_base_path
        - filepath
    -
      plugin: urlencode
I believe I changed this constant and successfully pointed the migration at the correct file source.
Image locations
I found that several image migrations were not working. The migrations were
failing because the files were not found. I found that there were a bunch of
subfolders in the D6 sites/default/files
folder, and that these subfolders
were not being searched:
teaser/files
thumbnail/files/images
imagefield_thumbs/images
pictures
My solution was to flatten the hierarchy. To do this I used a program called Meld, because some of the images in the subfolder had identical names to other files.
I do not know what these subfolders are for or why they were created, although I can guess.
Migrating Comments
Comment migration is similar to file migration in that you first have to migrate nodes which have comments, and then migrate the comments for those nodes later.
One quirk is that D8 wants to split comments into two subtypes:
comment
and comment_no_subject
. I foolishly rebelled against this and
decided to turn all comments into comment_no_subject
types. That
made life much more difficult. You probably want to conform to
whatever Drupal decides to do.
I had a lot of problems actually getting comments to display, even after the comment nodes were migrated. Here is what I learned:
First: similarly to files, you need to add a field to each content
type that will host comments. This field is called
comment_no_subject
and is specified in YML files with names like
field.field.node.page.comment_no_subject.yml. (These live in the config/install folder of kwlug_content_types.)
Inside each of these field definitions, the default value includes a status key:
default_value:
  -
    status: 1
    cid: 0
    last_comment_timestamp: 0
    last_comment_name: null
    last_comment_uid: 0
    comment_count: 0
It is very important that the status
be set to 1 if you want
comments to display. 0
means comments are hidden. 2
probably means
comments are read/write (which might be good for your site, but not
for mine -- comments were read only, and kept for historical
purposes).
In addition to setting the default value for this field, I explicitly
set the field in my node migration template. For example, in
migrate_plus.migration.upgrade_d6_node_page.yml
there is a stanza
that reads:
process:
  # other stuff skipped..
  comment_no_subject/status:
    plugin: default_value
    default_value: 1
However, I am pretty sure that the default_value
in the field
definition field.field.node.page.comment_no_subject.yml
takes precedence.
There were some database tables in the D8 database that were useful in figuring out these settings:
- node__comment_no_subject, which shows statuses for each node that has a comment with no subject.
- comment_field_data, which shows which comments have been linked to which node (entity_id).
- comment_entity_statistics, which tallies the number of comments associated with each node. It specifies the comment trees, and splits by type (comment vs comment_no_subject).
In the configuration migration there are a bunch of different YML files you could incorporate. I incorporated the following:
migrate_plus.migration.upgrade_d6_comment_field.yml
migrate_plus.migration.upgrade_d6_comment_type.yml
migrate_plus.migration.upgrade_d6_comment.yml
To re-merge comment
and comment_no_subject
I needed to modify a
bunch of YAML files. In
migrate_plus.migration.upgrade_d6_comment_type.yml
I needed to map
the id
to a default value:
process:
  # id: comment_type
  # Make all comment types the same
  id:
    plugin: default_value
    default_value: comment_no_subject
I then had to use this migrated value in TWO places in
migrate_plus.migration.upgrade_d6_comment.yml
:
process:
  # stuff omitted
  field_name:
    plugin: migration
    migration: upgrade_d6_comment_type
    source: comment_type
  comment_type:
    plugin: migration
    migration: upgrade_d6_comment_type
    source: comment_type
I had missed field_name
for a long time and eventually discovered that
it was causing comment_entity_statistics
to break.
Finally, there was the issue of merged presentations and agendas.
Picking the right nodes to migrate comments from presentation
nodes
to agenda
ones required me to specify a bunch of mappings manually:
process:
  # stuff skipped
  # If this is a merged presentation node then
  # use the agenda, not the presentation node
  entity_id:
    plugin: migration
    migration:
      - kwlug_migrate_dummy_merged_presentations
      - upgrade_d6_node_agenda
      - upgrade_d6_node_blog
      - upgrade_d6_node_book
      - upgrade_d6_node_location
      - upgrade_d6_node_nominee
      - upgrade_d6_node_page
      - upgrade_d6_node_podcast
      - upgrade_d6_node_presentation
      - kwlug_migrate_forum
      - kwlug_migrate_library
    source: nid
    no_stub: true
The key was kwlug_migrate_dummy_merged_presentations, which mapped presentation nodes to the agenda NIDs that absorbed them. Unfortunately I ended up having to list every other content-type migration with comments as well, which was messy and irritating.
RDF Module Breaks Rollbacks
As of this writing, trying to roll back comment migrations failed when I had the rdf module installed:
PHP Fatal error: Call to a member function url() on null in
/home/linuxuser/kwlug-drupal-v05/web/core/modules/rdf/rdf.module on
line 252
It looks like the RDF module has caused problems in the past: https://www.drupal.org/node/2340401 .
Instead of being a good citizen and fixing the problem I just uninstalled the RDF module.
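With drush 8 that is a one-liner:

drush pm-uninstall rdf --yes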
Filtering Taxonomies
There were a number of taxonomy vocabularies defined in the old site, and I wanted to migrate exactly one. I could not find a clean way to do this, so I resorted to a dirty hack.
First, I found the name and vocabulary ID of the vocabulary I wanted to keep. Then I wrote a process plugin with the following transform() method:
public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
  $allowed_taxonomies = array(
    'blogtags' => 9,
  );

  if (in_array($value, $allowed_taxonomies)) {
    return $value;
  } // end if

  return FALSE;
} // end transform
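That transform() method lives inside an ordinary process plugin class. The real one is in the Github repository; the surrounding boilerplate looks roughly like this, assuming the plugin sits in the kwlug_migrate module:

<?php

namespace Drupal\kwlug_migrate\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Passes through whitelisted vocabulary IDs, and nothing else.
 *
 * @MigrateProcessPlugin(
 *   id = "select_taxonomy"
 * )
 */
class SelectTaxonomy extends ProcessPluginBase {

  // The transform() method shown above goes here.

}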
Then I did something sneaky. In the YAML file migrate_plus.migration.upgrade_d6_taxonomy_term.yml I added the following stanza to the process section:
process:
  # Only allow terms from taxonomies we care aboot
  dummy_test:
    -
      plugin: select_taxonomy
      source: vid
    -
      plugin: skip_on_empty
      method: row
This dummy_test works as follows: it calls my custom select_taxonomy process plugin and passes the vocabulary ID as a parameter. This is the vocabulary associated with this term. If the term has a vocabulary ID that is in the whitelist, the migration continues and the term is migrated. Otherwise the select_taxonomy plugin returns FALSE (ie nothing) and skip_on_empty prevents the migration of this term.
This hack was not my preferred approach. Ordinarily I would have overridden the query() method in a source plugin. There was a reason I avoided this, but I do not remember what it was. Maybe it was because hardcoding VID values is ugly, and it would have been easier to see the hardcoding in a process plugin.
My actual preferred approach would have been to filter out unwanted taxonomy terms right in the YAML file with no plugins, but there was a reason that failed too.
I played the same sneaky trick in the migrate_plus.migration.upgrade_d6_taxonomy_vocabulary.yml file to migrate only the vocabulary names I had whitelisted.
Setting Redirects Using a CSV Source
RSS feeds and views had some URLs that would be different in the D8 site than the D6 one. So that old feeds would not break, I wanted to create redirects from the old feed locations to the new ones.
One option would have been to create these redirects manually after the site had been migrated. But I am forgetful, so I decided to automate this with a migration, using a CSV file as the source.
The key to doing this was to install and enable the migrate_source_csv module. This defines a CSV source plugin, which I used in my migration kwlug_migrate_rss_redirect:
source:
  plugin: csv
  path: /home/linuxuser/drupal/rss_redirect.csv
  header_row_count: 1
  keys:
    - sourcepath
  column_names:
    0:
      sourcepath: Source
    1:
      destpath: Destination
    2:
      statuscode: Status Code
The code is fairly self-explanatory. The sourcepath, destpath, and statuscode entries are used in the migration, but Source, Destination and Status Code are not (as far as I can tell).
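For reference, the CSV file itself is nothing special: one header row, then one redirect per line. The rows below are invented for illustration (the real paths are in the repository):

Source,Destination,Status Code
blog/feed,/blog/rss.xml,301
node/feed,/rss.xml,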
I did not want to have to specify the 301 status code in every CSV row, so I added a default_value plugin to my process section:
process:
  status_code:
    plugin: default_value
    source: statuscode
    default_value: 301
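The remaining pieces map the CSV columns onto the redirect entity's own fields. My exact YAML is in the repository; the general shape is roughly this, assuming the redirect module's usual field names (redirect_source and redirect_redirect):

process:
  # The old path, without a leading slash.
  redirect_source/path: sourcepath
  # The new destination. Depending on what you store in the CSV you may
  # need to turn this into a URI such as internal:/some/path.
  redirect_redirect/uri: destpath

destination:
  plugin: 'entity:redirect'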
Where to Do What
I struggled a lot with figuring out how the Drupal migration process wanted me to think. A few blog posts helped get me started:
- https://cheppers.com/blog/migrate-d8-pt2 : an overview of which components to modify. This was the first blog post I found really helpful.
- http://webikon.com/cases/migrating-to-drupal-8 : I found this more abstract, but it is more thorough than the above link.
- https://www.slideshare.net/isholgueras/migrating-data-to-drupal-8 : This is dated, but was useful in understanding where different files go.
- https://www.drupaleasy.com/blogs/ultimike/2016/04/drupal-6-drupal-81x-custom-content-migration : This is a simplified example, but it includes some of the modifications one might make to a YAML file.
Despite these resources, it took me a long time to understand the overall structure the migration system expects. This section documents some of those lessons.
The high-level view of migration components is as follows:
- Source plugins read from the D6 database (or a CSV file, or a different database, or...) and populate variables ("fields") that can be specified in YAML files. There will typically be one source plugin per entity type. In addition to the standard source plugins I wrote a bunch of custom ones for content types with extra fields or weird structures.
- YAML files map fields from the source plugins to fields in the destination node/entity.
- Process plugins manipulate the output of source plugin fields into formats suitable for the destination fields.
- Destination plugins actually create the node or entity in question, populated as directed by the YAML files.
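To see how these pieces fit together, here is the rough shape of a single migration YAML file. The IDs and field names here are illustrative rather than copied from my migrations:

id: example_d6_page
label: 'Example D6 page migration'

source:
  # Core source plugin that reads D6 nodes of one type.
  plugin: d6_node
  node_type: page

process:
  # destination field: source field
  title: title
  body/value: body
  body/format:
    plugin: default_value
    default_value: basic_html

destination:
  plugin: 'entity:node'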
For low level examples, read on.
Migration Patterns
Drupal migrate really wants to transform exactly one node/entity from the D6 database into exactly one entity in the D8 database. If you want to merge two D6 nodes into one D8 entity (for example, when I merged Meeting Agenda and Presentation content types) then you have to write custom code in a source plugin to pull the associated nodes from the database.
Similarly, if you want to migrate one node from the D6 database into two distinct entities in D8 (for example, in generating a redirect entity from each merged Presentation in addition to migrating the node itself) then you may need TWO migrations (effectively reading the D6 database twice). Trying to output two nodes from one YAML migration file does not seem to work. Trying to reuse "migration map" database entries tends not to work unless you can specify a YAML file that uses the migration process plugin cleanly.
If you want to associate nodes with each other (eg associating file attachments to their nodes) then you should migrate the linkee first, and then the linker (ie files should be migrated first, and then nodes that have those files as attachments migrated later). To do this you have to declare a dependency in the migration_dependencies section of the linker. (I think Drupal can handle circular dependencies by rerunning migrations again and again.)
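The dependency itself is just another stanza in the linker's YAML file. The exact migration ID depends on what migrate-upgrade generated for you; for files it is along these lines:

migration_dependencies:
  required:
    - upgrade_d6_file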
Your ability to incorporate additional information into migrated nodes is pretty limited (eg adding a new field to a content type and populating it). If you want to do it, you have the following options:
- If there is some indication in the source node about what information needs to be added, AND the number of destination values is limited, then a static_map in your YAML migration file might do the trick (see the sketch after this list).
- You can define a new data source (from a CSV file or something) and use a migration to make entities of that, and then link those entities to your target nodes. I have never done this, but based on the file attachment example I think this should work.
- You can define your data source, incorporate it into the D6 database, and then pull that data with prepareRow().
- Maybe you can use the prepareRow() method in the node's source plugin to pull the new generated data, and offer that data to the destination via the YAML migration file.
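Here is what a static_map stanza for the first option looks like. The field and source names are made up for illustration:

process:
  # Hypothetical destination field, filled in from a hypothetical
  # source field called old_type.
  field_meeting_kind:
    plugin: static_map
    source: old_type
    map:
      presentation: talk
      agenda: meeting
    default_value: other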
Source Plugins
Source plugins have two interesting methods: query() and prepareRow(). (There are also two less-interesting ones: getIds() and fields(), which are pretty straightforward.)
Both query() and prepareRow() pull from the D6 database, but there are some conceptual differences between them:
- query() is used to pull the SET of database rows that will be migrated to D8 entities. It returns a query that represents the set of rows to transform. You can do database joins here if you want, but the key point is that each row that is returned should correspond to one entity in the migration. If you need to filter out nodes/entities then do it here. For example, when migrating users I wanted to filter out all users that had never created content, so I wrote an appropriate query here.
- prepareRow() gets a SINGLE row from the database. It can then manipulate and massage this data to make it suitable for the D8 target. It does this by populating fields defined in the fields() method. There is a mechanism for prepareRow() to reject rows (and thus refrain from migrating that row into a D8 entity) but you don't want to do this, because it messes up your migration status (rows that you did not intend to migrate show up as incomplete migrations). Instead, be more specific in the query() method.
- When doing weird migrations I wrote a lot of code in prepareRow() methods.
To assign fields in the prepareRow() method, use the $row->setSourceProperty() method. I was confused because there is also a $row->setDestinationProperty(), but I think that is not relevant in prepareRow(). You want to set the source properties in source plugins.
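Putting that together, a custom source plugin that exposes an extra source field looks roughly like this. The plugin ID, class name and field name are made up; my real plugins are in the kwlug_migrate module on Github:

<?php

namespace Drupal\kwlug_migrate\Plugin\migrate\source;

use Drupal\migrate\Row;
use Drupal\node\Plugin\migrate\source\d6\Node;

/**
 * Example D6 node source that exposes an extra source field.
 *
 * @MigrateSource(
 *   id = "kwlug_example_node"
 * )
 */
class ExampleNode extends Node {

  /**
   * {@inheritdoc}
   */
  public function fields() {
    // Declare the extra field so YAML mapping files can refer to it.
    return parent::fields() + [
      'd6_teaser' => $this->t('Teaser text pulled from the D6 database'),
    ];
  }

  /**
   * {@inheritdoc}
   */
  public function prepareRow(Row $row) {
    // Pull extra data for this one row and expose it as a source property.
    $teaser = $this->select('node_revisions', 'nr')
      ->fields('nr', ['teaser'])
      ->condition('nr.vid', $row->getSourceProperty('vid'))
      ->execute()
      ->fetchField();

    $row->setSourceProperty('d6_teaser', $teaser);

    return parent::prepareRow($row);
  }

}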
Drupal 7 apparently had more methods to override. For example, https://www.drupal.org/node/1132582 documents prepare() and complete() methods, but these no longer exist in Drupal 8.
Drupal has some elaborate query builder syntax. Fortunately the syntax appears to be similar to Drupal 7, so there are cheatsheets available: http://www.eilyin.name/note/database-queries-drupal-8-7 helped get me started with Drupal 8 syntax, and the "Drupal 7 database Cheat Sheet" from https://wizzlern.nl/drupal/cheat-sheets got me most of the rest of the way.
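For example, a query() method in such a source plugin that restricts the migration to published nodes of one type, with a join thrown in, looks something like this (a sketch against the standard D6 schema):

  /**
   * {@inheritdoc}
   */
  public function query() {
    // Each row this query returns becomes one candidate D8 entity.
    $query = $this->select('node', 'n')
      ->fields('n', ['nid', 'vid', 'type', 'title', 'created', 'changed'])
      ->condition('n.type', 'agenda')
      ->condition('n.status', 1);

    // Joins are fine too: grab the revision body along with the node row.
    $query->innerJoin('node_revisions', 'nr', 'n.vid = nr.vid');
    $query->fields('nr', ['body', 'teaser', 'format']);

    return $query;
  }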
YAML Mapping Files
Writing the YAML mapping files from source to destination entities is supposed to be the easy part, but I found it difficult to set mappings unless the source plugin output exactly the information I needed.
Sources and destinations
One thing I struggled with in YAML mapping files was figuring out where the different components came from.
Consider the following fragment of a mapping file:
process:
  field_presentation_title: presentation_title
  field_floss_fund_nominee_link/target_id: floss_fund_nominee
This means that field_presentation_title and field_floss_fund_nominee_link are fields in the destination content type. The target_id is confusing, and I do not remember how exactly I found it (maybe here: http://drupal.stackexchange.com/questions/223715/migrate-multi-value-paragraph-field), but I do see that there is a clue about the name in the database schema:
mysql> desc node__field_floss_fund_nominee_link;
+-----------------------------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------------------+------------------+------+-----+---------+-------+
| bundle | varchar(128) | NO | MUL | | |
| deleted | tinyint(4) | NO | PRI | 0 | |
| entity_id | int(10) unsigned | NO | PRI | NULL | |
| revision_id | int(10) unsigned | NO | MUL | NULL | |
| langcode | varchar(32) | NO | PRI | | |
| delta | int(10) unsigned | NO | PRI | NULL | |
| field_floss_fund_nominee_link_target_id | int(10) unsigned | NO | MUL | NULL | |
+-----------------------------------------+------------------+------+-----+---------+-------+
7 rows in set (0.00 sec)
The presentation_title and floss_fund_nominee are field names from the source plugin (ie defined by the fields() method in the source plugin). If you are adding extra information (for example, more fields) to a content type then you must define these names.
Updating imported YAML files
Another quirk about YAML files is that changing them (as you do repeatedly when troubleshooting) is a pain. If you change a YAML file and resume a migration (perhaps with drush migrate-rollback followed by drush migrate-import) then the migration will continue to use the version of the YAML file it imported when you installed the associated module (in my case, kwlug_migrate). You somehow need to get rid of this configuration object and replace it with your updated version in order to test your changes.
If you try to naively uninstall and re-enable the module you will get stuck, because the configuration objects are already registered with Drupal:

exception 'Drupal\Core\Config\PreExistingConfigException' with message
'Configuration objects
(migrate_plus.migration.kwlug_migrate_agenda_redirect.yml)
provided by kwlug_migrate already exist in active configuration'
To get around this problem I just reinstalled Drupal again and again, but in writing this entry I found a better way: use config-import to reread the .yml files. Say my configurations are in a folder called test. Then you might do something like this:
#!/bin/bash
element=$1
srcdir=/path/to/drupal/source
testdir=$srcdir/test
pushd .
cd $srcdir
time drush migrate-reset-status $element --yes
time drush migrate-rollback $element --yes
time drush config-import --partial --source=$testdir --yes
time drush migrate-import --execute-dependencies $element --yes --notify
popd
The argument $1 should be the name of a migration (eg upgrade_d6_node_location), and the corresponding YAML file should be in the $srcdir/test folder.
- migrate-reset-status stops the migration if it got stuck on the last run.
- migrate-rollback undoes the migration for this element so far.
- config-import imports your changed YAML file.
- migrate-import reruns the migration.
Once you are happy with the YAML file you can then move it back to the config/install/ folder of kwlug_migrate.
If this does not work for you and you are not a dumb-dumb who reruns the entire migration every time, there are some other possible approaches documented here: http://drupal.stackexchange.com/questions/164612/how-do-i-remove-a-configuration-object-from-the-active-configuration.
Assigning constant strings
Sometimes you want to assign a constant to a field in a YAML file:
process:
  title: 'Every page should have the same title'
This does not work. The value on the right-hand side of a process statement is always treated as the name of a source field, even if it is in quotes.
The solution is to define a constant in the source section of the YAML file, and assign that instead:
source:
  constants:
    static_title: 'Every page should have the same title'

process:
  title: constants/static_title
Process Plugins
If you are writing source plugins then process plugins tend to be simple or unnecessary, because you can probably massage data in the source plugin's query() or prepareRow() methods. However, I found process plugins useful for the following things:
- Printing debug information, as documented in the "Troubleshooting" section below.
- Picking out specific fields to migrate. For example, I decided to migrate taxonomy terms from selected taxonomies only, and found it easiest to write a process plugin for this.
- Extracting values out of complicated data structures, because YML files break my brain.
- Mapping values to a static map (which probably could have been done in the YAML mapping directly).
Process plugins can be chained together. This is good for specifying default values, or for inserting a debug plugin. The official documentation is pretty good here, but it took me a long time to find: https://www.drupal.org/docs/8/api/migrate-api/migrate-process/migrate-process-overview
Destination Plugins
Destination plugins are black magic. I know nothing about them except that you can specify the destination entity type in the YAML migration file, in the destination section:
destination:
  plugin: 'entity:redirect'
This specifies that the target is a redirect, even if the source is a node.
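For nodes the stanza is similar. As far as I can tell you can also give the destination a default bundle, which is handy when the source does not supply one:

destination:
  plugin: 'entity:node'
  default_bundle: page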
Migration IDs and Maps
Doing a Drupal migration creates a bunch of tables with names prefixed with migrate_map_ and migrate_message_. I am not clear what migrate_message_ is for (although I can guess).
The migrate_map_ tables store mappings of NIDs (or entity IDs) on the old site to the new one.
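You can inspect these mappings directly. As far as I can tell the IDs live in columns named sourceid1 and destid1 (more if the entity has a multi-part ID), so a query like the following, using the page node migration as an example, shows how old NIDs map to new ones:

mysql> SELECT sourceid1, destid1 FROM migrate_map_upgrade_d6_node_page LIMIT 5;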
Why do you need a map? The primary reason is to use the migration process plugin. For example, in migrate_plus.migration.upgrade_d6_comment.yml I have:
comment_type:
  plugin: migration
  migration: upgrade_d6_comment_type
  source: comment_type
When migrating comments I squished comment and comment_no_subject together, so I used this migration to indicate that the comment_type value from the source (source: comment_type) should be transformed in the same way for this comment.
A secondary reason for maps are incremental and live updates. If users continue to update the D6 site while you are migrating the site to D8, you do not want NIDs to clash. You might also want to refer to migration maps from one entity type when modifying another (see https://www.drupal.org/docs/8/api/migrate-api/migrate-process/process-plugin-migration for examples of this).
I have a feeling that in a real site migration with lots of users and lots of nodes I would have had to be more careful around using migration maps correctly, but I was sloppy with them when writing my own YAML files.
Troubleshooting
Keep drush sqlc running on both your D6 and D8 databases. I found I was digging through database structure all the time to figure out field names and how tables related to each other. show tables like '%field%' and desc tablename were good friends.
Sometimes migration failures are logged in the Drupal logs. Use drush wd-show to see a brief summary, and the web interface to see a lot more detail.
I found that running script when running migrations was super useful, because the migrations could get too verbose for my terminal's scrollback buffer. Unfortunately the colour output drush produces makes the script output gross, but the Internet has a solution here: http://unix.stackexchange.com/questions/14684/removing-control-chars-including-console-codes-colours-from-script-output . I put this code into a clean-typescript.pl helper script.
I used print_r a lot in my source and process plugins to figure out what data structures I was trying to query/populate. In complicated source plugins I had code like this at the end of my prepareRow() method:
if ($nid >= $this->DEBUG_NID_START && $nid <= $this->DEBUG_NID_END) {
  print_r("\n row is\n");
  print_r($row);
} // end if debug
By setting DEBUG_NID_START and DEBUG_NID_END to appropriate windows, I could see what was going on for a few target nodes without getting overwhelmed.
Similarly, I created a debug_contents process plugin to see the contents of fields I was trying to map. Here is example usage from a YAML file. Say I was having trouble understanding what presentation_title was. I could then change:
process:
  field_presentation_title: presentation_title
to
process:
  field_presentation_title:
    -
      plugin: debug_contents
      source: presentation_title
    -
      plugin: get
This prints the data structure to the console as each row is processed during the migration.
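The debug_contents plugin itself is tiny. The real one is in the Github repository; a reconstruction looks more or less like this (the namespace assumes it lives in kwlug_migrate):

<?php

namespace Drupal\kwlug_migrate\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Prints the incoming value and passes it through unchanged.
 *
 * @MigrateProcessPlugin(
 *   id = "debug_contents"
 * )
 */
class DebugContents extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // Dump the value being mapped into $destination_property, then hand
    // it to the next plugin in the chain untouched.
    print_r("\n" . $destination_property . " is:\n");
    print_r($value);
    return $value;
  }

}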
Failures and Improvements
The first failure was that this migration took so long (maybe 2.5 months of sporadic work). Many of the techniques I documented in this post took days of experimentation and reading to figure out.
When I started the migration I did not know about Drupal Console or Composer. I am still not sure why Drupal Console is important, but apparently Composer is quickly becoming the standard for Drupal packaging.
I deliberately did not worry about incremental migrations (resyncing the database by migrating only new content) or rolling back migrations. These are important considerations for larger sites.
I did not manage to get a bunch of header images migrated properly. I believe most of the data exists, but I am not clear how to associate it with nodes properly.
I wish I had been able to find better sources of help than I did. I was too scared to post threads on drupal.org directly, which was a mistake. Instead I relied on DuckDuckGo searches, and when I got really desperate I attempted to ask questions on the #drupal-migrate IRC channel and on https://drupal.stackexchange.com . Neither of these support channels worked well. The IRC channel was basically dead, and few people seem active on the Stack Exchange group. (Then again, it isn't as if I am answering other people's questions on those forums, so...)
If I had been more conscientious I would have started with a minimal (or custom) install profile rather than the standard one. The standard one created some content types and menu items I did not like.