This is an edited transcript. For the blog post and video, see: Moving from Drupal 7 to Drupal 10: Managing Complex File and Media Migrations
[00:00:06] Janez Urevc: Hello and welcome to Tag1 Team Talks, brought to you by Tag1 Consulting. With Drupal 7 rapidly approaching end of life and Drupal 9 already there, we are hearing people talk about migrating and updating more than ever before. Anyone who's ever been involved with a large scale migration, migrating a large site or application from one technology stack to another will tell you that it's complex, time consuming, and it demands expertise.
[00:00:35] That's why we are bringing you this series of talks diving deep into the world of Drupal migrations. And who's better to guide us than Tag1's very own Drupal migration experts? From the masterminds and maintainers of Drupal's migration tooling, to the individuals behind the most groundbreaking Drupal migrations, we've got an all-star lineup
[00:00:57] who will cover everything you need to know about every aspect of migrating large-scale applications. In today's episode, we are going to talk through migrating media and files. We'll discuss how the media landscape changed between Drupal 7 and 10, touch on how to migrate locally stored files versus remote media, talk about inline embedded media, and much more.
[00:01:23] Let's dive in. I'm Janez Urevc, Strategic Growth and Innovation Manager here at Tag1 and a long time contributor to Drupal. And I'm joined today by the two well known top contributors to Drupal, Lucas Hedding, one of the five current Drupal Migrate Core subsystem maintainers, and Mauricio Dinarte, Drupal Migrations expert and author of the 31 Days of Migration series.
[00:01:49] Welcome both and thank you for joining me.
[00:01:54] Lucas Hedding: You're welcome.
[00:01:54] Janez Urevc: At the beginning, I wanted to open the discussion with a little bit of background on how things changed between Drupal 7 and Drupal 10, because this is obviously something that will affect how we migrate files and media. Lucas, do you want to
[00:02:17] tell us about that?
[00:02:19] Lucas Hedding: Sure. So in Drupal 7, there was a thing called a file entity, and then there's a contrib module called Media that kind of nested itself on top of it. And a lot of times it worked pretty well. And if you weren't deep into the code or in the database, it just seemed like you were dealing with just files, right?
[00:02:45] Except it wasn't, it was a lot of things. And that model is a bit different in Drupal 10. It got pulled into Drupal core. It matured and has a different setup. The main part is that we can still migrate between the two. You just have to do it right. And there are a lot of contrib modules that will help with this, and we'll talk about those through the course of our time today.
[00:03:16] Janez Urevc: And in Drupal 10, we have this new entity type called media, but we still have managed files, which is basically what used to be the file entity in Drupal 7.
[00:03:32] Lucas Hedding: Yeah. The file_managed table in the database, with files on the local hard drive.
[00:03:37] Janez Urevc: They are entities, but they are not fieldable in core.
[00:03:40] When you migrate, you generally need to migrate twice, quote unquote: first into managed files and then into media entities, to be able to use media in a media library and whatnot. But sometimes people decide not to even use media entities. [00:04:00] And this is also something to consider before we start a migration. And I'm pretty sure, Mauricio, that you have experience with that. So what are your thoughts on this decision?
[00:04:13] Mauricio Dinarte: Generally speaking, it depends on whether you will leverage the functionality that you have around media in Drupal 10. One part of that would be the media browser, if the UI is compelling to you, or if you have needs around access controls, because it's an entity in itself and you can build a lot around that. Generally speaking, media entities are kind of the default that you will want to go with, to take advantage of the progress of Drupal itself. That being said, there are scenarios in which you might not want to go that route, at least not everywhere.
It is not a choice that you need to make for the whole site. You can port some of the fields to media entities and other fields into regular file fields as before. One thing that you might want to consider is whether you depend, for any reason, on per-field configuration. Then you might go with regular files, because at the moment, if you have one single media type for images, for example, everything is going to go into that bucket, and however that field itself is configured, the same rules would apply in terms of where files are located.
[00:05:34] For example, I have worked with organizations that, for different reasons, want to have a specific file structure. From Drupal's perspective, they are just files in the file system and it is not relevant where they live, but they want to keep it as they had it before.
[00:05:53] Another example was, uh, uh, a break that had, uh, publication workflow, and they specifically didn't want to use media library because they could expose, um, You know, files and documents that should not be seen by people, uh, uploading files, uh, as part of the process. Again, like, there are considerations around permissions, around, uh, files locations, if that is important for you.
[00:06:21] But generally speaking, going to media is the default. And as you say, it's usually a two-step process. You migrate into the file entity itself, and then you migrate into the media entity. And in most cases, you can even use the same source plugins for that. What you will be tweaking is the destination, because they will be different entities.
[00:06:44] And you will be tweaking the process pipeline, because, being different entities, there will be different properties that you need to map. And one last thing about this is that if you enable validation, which is generally a good idea in a migration, sometimes things are not obvious in the context of a single migration.
[00:07:10] If you have a file migration and you don't specify that the status of the file is permanent (that's a property of the file entity itself, not of the media entity), the file migration is going to complete and validation is going to pass. When you go to the second step of migrating media, if the file is not permanent, you will not be able to reference it.
[00:07:40] So you are going to get an error in the media migration for something that you kind of overlooked in the file migration. So, especially when working with validation, you need to be mindful about configuration in dependent migrations.
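The dependency between the two steps can be illustrated with a toy model in Python. This is only a sketch and not Drupal code: the `FileEntity` and `MediaEntity` classes and both `migrate_*` functions are hypothetical stand-ins, and the only detail taken from Drupal is that a file is either temporary or permanent.

```python
from dataclasses import dataclass

# Toy status values standing in for a file's temporary/permanent flag.
STATUS_TEMPORARY = 0
STATUS_PERMANENT = 1


@dataclass
class FileEntity:
    fid: int
    uri: str
    status: int


@dataclass
class MediaEntity:
    mid: int
    bundle: str
    fid: int


def migrate_files(rows, permanent):
    """Step 1: create file entities. This step succeeds either way."""
    status = STATUS_PERMANENT if permanent else STATUS_TEMPORARY
    return {row["fid"]: FileEntity(row["fid"], row["uri"], status) for row in rows}


def migrate_media(rows, files, bundle="image"):
    """Step 2: create media entities that reference the migrated files.

    Validation here rejects references to missing or non-permanent files,
    so a mistake made in step 1 only surfaces in step 2.
    """
    created, errors = [], []
    for row in rows:
        file_entity = files.get(row["fid"])
        if file_entity is None or file_entity.status != STATUS_PERMANENT:
            errors.append(f"fid {row['fid']}: file cannot be referenced")
            continue
        created.append(MediaEntity(mid=len(created) + 1, bundle=bundle, fid=row["fid"]))
    return created, errors


source_rows = [
    {"fid": 1, "uri": "public://a.jpg"},
    {"fid": 2, "uri": "public://b.jpg"},
]

# Forgetting the permanent status: the file step passes, the media step fails.
files = migrate_files(source_rows, permanent=False)
created, errors = migrate_media(source_rows, files)
print(len(created), len(errors))
```

The same source rows feed both steps, which mirrors the point that the source can be shared while the destination and the process mapping differ.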
[00:07:59] Janez Urevc: That's a really good point.
[00:08:00] I never thought about it, but it makes total sense. When you were talking about per-field configuration, I had this thought. If you have different fields in Drupal 7 and you still want to keep the distinction, you could create separate media types. Let's assume that they are all the same file type.
[00:08:31] Let's say that they are all images, but some are profile images and some are images that belong to a slideshow that belongs to an event, or something like that. And semantically you want to separate them. I guess you could have separate media types that all use the same image source plugin. And then on those media types you could even have different fields.
[00:08:55] And I think that then you could also configure them to be stored in a separate location. Maybe that would be one way of solving that. This was exactly the reason why we came up with this idea of media source plugins and media types: to cover cases like this, when you have the same type of asset but you want to have a different media type for different items of that asset.
[00:09:34] Um, because they might all be images, but semantically, they might not be the same thing.
[00:09:41] Mauricio Dinarte: And one more comment on this is that we might be familiar with the media types that are provided by the standard installation profile. There are five out of the box, but two things. One, you are not bound to only those five.
[00:09:56] And also, as is often the case with migrations, sometimes you do not start with the standard installation profile. You start with minimal, for example, or even with a custom profile. When you do that, you don't get those media types, because the five that come with core are actually part of configuration that only gets installed with the standard installation profile.
[00:10:18] So if you're using something else as your starting point for a migration into Drupal 10, you might either have to recreate them manually or copy the same configuration that comes with the standard installation profile. I just want to highlight that because the very first time that I noticed that
[00:10:40] those configurations actually come from the profile and not from a module, it was kind of puzzling why they were not there.
[00:10:48] Janez Urevc: That's a really good point. And just to mention here the source plugins that we have available in core: we have file (a generic file), image, audio [00:11:00] file, and video file. And with video file, we're not talking about things that are remote and embeddable, like YouTube or Vimeo, but a literal video file,
[00:11:12] WebM or MP4. In addition to that, and we will talk about remote media in the next section, core also provides an oEmbed source, which covers the majority of remote use cases. But if we need something specific, there are also contributed modules that provide more source plugins.
[00:11:44] But as core evolved, I think that it's safe to say that nowadays we can cover a lot just by using functionality provided in
[00:11:56] core. Lucas, welcome back.
[00:12:01] Lucas Hedding: Thanks.
[00:12:04] Janez Urevc: We touched on the Media module in D7 before. But how does the fact that a Drupal 7 site uses the Media module, compared to only what was in Drupal 7 core,
[00:12:19] affect the migration?
[00:12:22] Lucas Hedding: In some ways it's a little bit easier, because you've already got two entities in Drupal 7 to migrate. You've got the media entity and you've got the file entity, which is kind of what you have in Drupal 10 too. And so it makes it a little easier to link the two together and then pull them over into your new site.
[00:12:50] The source of the data is a SQL query that has to look in different tables, in different places in the database, and there are source plugins that pull together both of those things: one for the media and one for the files. They then allow you to map them to the new destinations in Drupal 10.
[00:13:22] Janez Urevc: Yeah, and I guess Media in D7 handled remote media and also handled CKEditor embeds. And during our migration, we will potentially need to cover all of those. But I guess we will talk about those later on when we get there. Are there any contributed modules that can help us approach the migration
[00:13:52] of media from Drupal 7 into Drupal 10?
[00:13:58] Mauricio Dinarte: I have worked with at least three different ones. Media Migration is probably the most complete, or the one that is trying to achieve the most. For one, it will try to automatically convert what you had in Drupal 7 to media entities in Drupal 10. And when I say what you had in Drupal 7: if you were not using Media in Drupal 7, it will still convert regular file fields, and it will build a connection so that they become media entities.
[00:14:29] And if you were using media entities before, it will just copy the configuration automatically and move over the data. It will also take care of things within the WYSIWYG, as you said before. If you had embeds, or even links to other entities, it will try to detect them and make the connections automatically under the hood.
[00:14:51] That module depends on another module that is called Migrate Magician. And that is for a very good reason, because this module is very powerful. It is almost like magic, everything that it does. That being said, the approach that the module takes is more or less what Drupal core does: it is going to try to make a one-to-one copy of the content model that you had before.
[00:15:15] And if, for different reasons, you have a new content model, a new structure, or you need to make some changes to how the entities are configured, you might have to go a custom route. In that case, what we normally do is install the module and not use it for the automatic migration of configuration, but instead just use the process plugins that it provides. That way, in my custom migration, I am able to do these transformations, like the embeds in the WYSIWYG, but adapted to the new content model that is being migrated into with the custom migrations. So that's probably the one that you might look at first.
[00:16:01] And the second one that I have also used is Migrate Media Handler. This is narrower in scope. I will share a story later in another section, but just to point out that it is there, and I actually used it for another project that was migrating content from a WordPress site. So the process plugins provided by that module actually helped me migrate from WordPress to Drupal in a different context. And the last one is Migrate Plus itself. It has process plugins to manipulate the DOM. So depending on what type of changes you need to make to rich text fields, you could also use it for manipulating bits of media within the body or rich text fields in general.
[00:16:51] Janez Urevc: Great. Another thing that I have experience with is handling large libraries. Please correct me if I'm wrong, but I think that generally, when you're migrating files, Migrate will check if a file exists at the expected destination. And if it doesn't, it will try to copy it from the source where it should be. And if you have a large library, that could take quite a long time.
[00:17:31] So it's much better, if you can, to mount your files into the correct place, so they're already there. And if that's not possible, rather Rsync them before running the migration than copy them through the migration, because that will slow it down significantly.
[00:17:54] Lucas Hedding: It's not default.
[00:17:57] Janez Urevc: It's not default?
Lucas Hedding: It is a flag that you can alter. It's the file_copy plugin, and there's a file_exists flag on the plugin. You just say "use existing" and it will start using existing files if it finds a file there already.
[00:18:21] Mauricio Dinarte: Yeah, the strategy that you mentioned is actually very helpful, because it's going to speed up the migration.
[00:18:30] If you don't copy over the files first, you will basically have to fetch them, and bringing over the files is usually going to take a long time, especially for big libraries. So probably the best approach is to copy the files manually via Rsync or some other way, and then do what Lucas just said: in the file_copy plugin, add the configuration to use the existing files.
[00:18:59] If not, the default is that if the file exists, it is going to rename it, I think. So it just makes a copy with a suffix, like underscore zero, underscore one, underscore two. So you need to use that setting. Another thing that I want to point out is that it is common in a migration to be importing and rolling back the migrations, because you are testing things.
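The two behaviors described here can be sketched in Python. This is an illustrative model only, not the plugin's implementation: the function name `resolve_destination` is hypothetical, and the mode names are simply the two the speakers mention ("use existing" and a rename with `_0`, `_1`, `_2` suffixes). Which mode is the default in a given Drupal version is best checked against the plugin documentation.

```python
import posixpath


def resolve_destination(uri, existing_uris, file_exists):
    """Decide where a file lands and whether any bytes need to be copied.

    Returns a (final_uri, should_copy) tuple.
    """
    if uri not in existing_uris:
        # Nothing is there yet, so the file has to be copied.
        return uri, True
    if file_exists == "use existing":
        # The file was put in place beforehand (for example with Rsync).
        return uri, False
    if file_exists == "rename":
        # Append _0, _1, _2 ... before the extension until the name is free.
        stem, extension = posixpath.splitext(uri)
        counter = 0
        while f"{stem}_{counter}{extension}" in existing_uris:
            counter += 1
        return f"{stem}_{counter}{extension}", True
    raise ValueError(f"Mode not covered by this sketch: {file_exists!r}")


already_synced = {"public://photo.jpg", "public://photo_0.jpg"}
print(resolve_destination("public://photo.jpg", already_synced, "use existing"))
print(resolve_destination("public://photo.jpg", already_synced, "rename"))
```

With pre-synced files, the rename behavior silently produces duplicates on every run, which is why the "use existing" setting matters for large libraries.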
[00:19:27] You are polishing along the way. With the file migration in particular, once you have the files in the file system of the new site, whether that was because you manually copied them or because it was done as part of the migration itself, there is one important thing to know about rolling it back. Ultimately the Migrate API works with other APIs in Drupal.
[00:19:54] And in particular with the Entity API, rolling back an entity means deleting the entity, and deleting a file entity not only deletes the record from the database, it also deletes the file from the file system. So if you already copied 10 gigabytes of data and you roll back the migration, you're going to lose all the files.
[00:20:18] So be mindful about that. There is something that I usually do, but haven't told anyone until now: you can hack or delete the one line that actually performs the file system operation. Then your files will still be there the next time that you run the migration. But you know, in these sessions we are giving good advice and sometimes not so wise advice.
[00:20:40] So
[00:20:42] Lucas Hedding: That's not actually bad advice. No, I hack a lot on my local machine.
[00:20:47] Janez Urevc: Thank you for sharing your secret
[00:20:51] Mauricio Dinarte: And one more piece of bad advice that I can provide, for when you're just testing locally. For the destination, the real production migration, what we just said before about mounting the files or Rsyncing the files is the best approach.
[00:21:06] But if you are just working locally and you want to speed up the process, another trick that I use is that in the file migration I do not use the file_copy plugin. I just literally map the URI from the source to the destination. And then I use Stage File Proxy to only fetch files as needed. Again, you need to be mindful about these things, because in the final migration you need to do it properly.
[00:21:31] But when you just want to speed things up, because 10 gigabytes of downloads is going to slow you down in general, that's another thing that I do, but I usually don't talk about it.
[00:21:46] Janez Urevc: And that's why we have these Team Talks: to share the little dirty tricks that we use.
[00:21:55] Are there any special considerations with private files when migrating private files or it's just business as usual?
[00:22:03] Lucas Hedding: Uh, No, I mean, they're private for a reason, so you're not going to be able to get access to them, but you still want to move them over. Right. Right. Um, yeah. So, just like Mauricio said with the, not actually doing the file copy.
[00:22:20] You'd want to do that here. And you'd want to move those files using Rsync. Either that or you create a secure dev environment with a firewall or something and take off the private for a few minutes while you copy things over. Like there are ways around it. I have heard of people doing it, but give yourself a break.
[00:22:41] Rsync the files over, otherwise it's death by a thousand paper cuts. And there's really no good way to know that they actually got over there. I mean, HTTP is a terrible way to sync things. If you try to do a request using HTTP, which is what happens when we're doing the file copy, um, one out of a thousand requests will fail.
[00:23:03] But Rsync has built into it, into its protocols, a lot of protections to make sure that the things on left and right, destination and source are exactly the same. And after you've run it once, if you run it a second time, it's just going to pick up the differences. So don't, don't try to do it other ways.
[00:23:27] I don't know of any large file migration where someone hasn't used Rsync. If you're using Pantheon, Pantheon even has an Rsync Terminus plugin to make this really easy. So make your life a lot easier and do it the right way.
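Rsync does this comparison itself, but the idea of confirming that source and destination really match can be sketched in Python by comparing checksum manifests of two directory trees. The function names `build_manifest` and `compare_manifests` are hypothetical, and this is a verification aid under those assumptions, not a replacement for Rsync.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large media files do not fill memory.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_manifests(source, destination):
    """Return (missing, changed): paths absent from or different in destination."""
    missing = sorted(set(source) - set(destination))
    changed = sorted(
        path for path in source
        if path in destination and source[path] != destination[path]
    )
    return missing, changed
```

A typical use would be to call `build_manifest` once on the old files directory and once on the new one, then transfer again only when `compare_manifests` reports missing or changed paths.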
[00:23:47] Janez Urevc: I have another anecdote related to Rsync, but I will save that for the end, for the anecdote section. Let's talk about remote media, and just to define what we mean by remote media here: we're talking about things that are on the Internet and that we want to use in our websites, but that in the context of our website are not
[00:24:14] files. Things like YouTube videos, Flickr images, Instagram photos, TED talks, Vimeo videos, and general oEmbeds. You could have SlideShare slides or something like that. In Drupal 7, Media stored these as managed files. In Drupal 10, we don't do that anymore. We store them as media entities using the correct source plugins.
[00:24:49] So I guess it's not very different, but how is migration of this kind of media different from migrating files local to the site?
[00:25:01] Lucas Hedding: In some ways it's identical. It really is. However, because it's a new type of thing, a lot of times you get more requests. Can we filter out certain ones? Can we divide them? Can we chunk them up? Or: hey, we've got, for various reasons, these iframes directly embedded in CKEditor, so we need to extract them, create a media entity for each using this new media thing, and then replace each one with a media embed right in CKEditor.
[00:25:48] And so those are the types of things that come up with remote media that wouldn't come up with local files. Using all the same modules that we've already talked about, the Media Migration module has a really great tool for converting some of those things in the CKEditor body field. However, Migrate Plus, as Mauricio hit on, has some DOM manipulation utilities.
[00:26:22] And when you're dealing with iframes, you can then do direct DOM queries using real XPath-style queries. You're not even doing regex then, you're doing DOM manipulation. So you find the iframes that have the YouTube videos, and then you can create the media entity and insert it in the destination site.
[00:26:52] Go create the embed code and drop it and replace that in the body field as you're migrating over. And that's the power of, of using all of these different things. It's not the, you know, like that's gonna be custom, you're gonna have to talk, talk through what what's actually needed, what's actually wanted, but it's totally possible.
[00:27:15] And it's actually not all that complicated to do and makes the customer so much more happy when everything's converted over to media.
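The iframe-to-embed replacement described above can be sketched with Python's standard library. This is not the Migrate Plus implementation: the function name and the `media_uuid_by_video_id` lookup are hypothetical, the body is assumed to be well-formed markup (no unescaped ampersands or named HTML entities), and the `drupal-media` tag with `data-entity-type` and `data-entity-uuid` attributes reflects the embed format used by Drupal core's media embed as far as I know it.

```python
import re
import xml.etree.ElementTree as ET

# Matches the video ID in a YouTube embed URL.
YOUTUBE_EMBED = re.compile(r"youtube(?:-nocookie)?\.com/embed/([\w-]+)")


def replace_youtube_iframes(body_html, media_uuid_by_video_id):
    """Swap YouTube iframes for media embed tags by walking the DOM."""
    root = ET.fromstring(f"<root>{body_html}</root>")
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "iframe":
                continue
            match = YOUTUBE_EMBED.search(child.get("src", ""))
            if match is None or match.group(1) not in media_uuid_by_video_id:
                # Not a YouTube embed, or no media entity was created for it.
                continue
            embed = ET.Element("drupal-media", {
                "data-entity-type": "media",
                "data-entity-uuid": media_uuid_by_video_id[match.group(1)],
            })
            embed.tail = child.tail  # Keep any text that followed the iframe.
            parent[index] = embed
    serialized = (ET.tostring(node, encoding="unicode", method="html") for node in root)
    return (root.text or "") + "".join(serialized)


body = (
    '<p>Intro</p>'
    '<iframe src="https://www.youtube.com/embed/abc123" width="560"></iframe>'
    '<p>Outro</p>'
)
print(replace_youtube_iframes(body, {"abc123": "uuid-of-new-media-entity"}))
```

In a real migration the lookup table would come from the media migration that created the remote video entities, so the body field migration has to run after it.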
[00:27:26] Mauricio Dinarte: One small tip on performance when dealing with remote media: by default, Drupal is going to try to generate a thumbnail for every media item that gets created, on entity save.
[00:27:42] Again, the Migrate API is just interacting with other APIs of Drupal itself, in this case the Entity API. So if you have a large migration, whether it uses remote media or not, you can disable the generation of [00:28:00] thumbnails and defer that to a later stage that can happen on cron. The point being that the generation doesn't happen during the migration, and that is especially important for remote media, because you will be pinging an external service to get the thumbnail itself.
[00:28:18] If you are getting videos from YouTube, you're not only getting the reference to the YouTube videos, the link itself, you will also ping their servers to get the thumbnail. So that is something that you can disable. It will not affect the migration. You just defer that to a later stage, which can happen on cron.
[00:28:39] Lucas Hedding: That brings up a memory about local files too. So with local files, you might say, ah, we don't need to pull over the file sizes. Well, see, this was the issue, and this is why I remember it. We were pulling over the file sizes, just because that's what we dutifully did. Except Drupal's Entity API for files was ignoring the fact that the file sizes were already provided to it during the migration.
[00:29:12] And it was going out and rechecking the size of the file, which is not. So bad if the files are local, but if they are on s3 on a slow remote s3 instance, it can be really painful after moving over a large file to then wait another two or three seconds for it to do the calculation of sizes. So now that's fixed.
[00:29:40] But you still have to make sure that you pass the file sizes over. Otherwise, it's still going to, uh, try to figure out what the size of the file is and store that in the database.
[00:29:50] Janez Urevc: All those small details.
[00:29:55] Lucas Hedding: Yes. All of the details.
[00:29:57] Janez Urevc: As you said, death by a thousand paper [00:30:00] cuts, painful and slow. Do you want to briefly talk about what is currently provided in terms of remote media? What is provided by core, and in which situations might we need to rely on contrib?
[00:30:20] Mauricio Dinarte: YouTube and Vimeo are supported out of the box. In general, Drupal supports oEmbed, but if you want something other than YouTube and Vimeo from a provider that offers oEmbed, you can enable more. And there are other modules. One is called Video Embed Field, which will give you a very long list of additional providers that you can pick from.
[00:30:51] Janez Urevc: Yes, exactly. And if we are not working with videos, if we are working with something else, there is usually a module that provides the integration for media entities. I'm not sure if the module still exists, but I remember seeing a module for things like Google Drive documents, so you could upload a document to Google Drive and use that.
[00:31:20] What about the inline or embedded media in CKEditor? We touched on that before, but how does that affect migration? As far as I know, the embed tag is definitely different, so we need to handle that. What else might we consider while doing that?
[00:31:51] Mauricio Dinarte: The Media Migration module actually supports two types of embeds.
[00:31:57] And there is a configuration that you can toggle. But again, Media Migration is trying to automatically convert everything that's needed. If you're not doing a one-to-one migration, you can still use the module without using the source plugins for generating the configuration.
[00:32:20] In particular, there is a media WYSIWYG filter. It's a process plugin that does its magic to detect when there is a media embed, look for the corresponding media entity, and put the proper embed code into the body field, into the WYSIWYG. Another option, and this is where the story about the WordPress migration comes along.
[00:32:50] This project was a Drupal project already. It was a Drupal 7 to Drupal 10 upgrade at that point. But they also wanted to consolidate other properties that they had, and some of those were in WordPress. So we were migrating WordPress posts and their associated images and files. What we did was install a plugin on the WordPress sites to be able to export the configuration.
[00:33:19] Similar to the token system that we have in Drupal, WordPress uses something called shortcodes, which are token-like strings, and that is what is stored in the database. So the plugin allowed us to run a function in WordPress (they're called filters) called do_shortcode, which basically transforms the tokens into HTML.
[00:33:43] And from there, we generated CSV files that we migrated into Drupal. And in this case, we were actually using the Migrate Media Handler module, because it provided very good reference process plugins. So again, I really recommend also looking at that one, especially when you are migrating from things outside of Drupal.
[00:34:11] Janez Urevc: Great.
[00:34:17] Lucas, maybe we can briefly talk about the architecture of embedding in D10, because it's different than in D7, where the Media module did everything. What do we have in Drupal 10? I know this is not strictly migration related, but it's still useful knowledge for the context.
[00:34:39] Lucas Hedding: There are kind of two things. There's media embedding, which has been around since the Media module in Drupal 7 and is still a thing in Drupal 10. That's in core, even. But then there's Entity Embed, and that was not part of Media in Drupal 7. You still want to install that module now (the Embed module is what it's called in Drupal 10) to be able to do this entity embed functionality. The module got Drupal 10 support just within the last two or three months, so you're good to go straight to Drupal 10.
[00:35:40] The short code, or the token, that is embedded is slightly different for entities versus media, though there are a lot of similarities. You just have to get the right short code inserted into your CKEditor, so that it can go out and find the right thing to load and embed it with the right view mode and all the other options that get passed along to the thing that you're embedding.
[00:36:14] Janez Urevc: Yeah. You obviously can embed media, but since it can embed any entity, you can embed views, for example. So if you want your view embedded in the WYSIWYG for whatever reason, there is a submodule for Embed that can do that. And I believe that it even provides a nice button for CKEditor for you to be able to do it.
[00:36:40] Lucas Hedding: It does,
[00:36:41] yeah. Entity Embed is where I've mostly used it, and that's really useful for migration. Views are the interesting case: they don't work so well unless the view has the exact same name on the old site and the new site and nothing has changed in that view, which is a big if.
[00:37:00] But entities, entities, you can find that, you know, it was called node 1, 2, 3 on the old site, and now it's called maybe node 2, 3, 4 on the new site. And let's just connect the dots here and replace it with the right embed code.
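Connecting the dots between old and new identifiers can be sketched in Python. The example assumes the Drupal 7 Media WYSIWYG token format as I understand it, a JSON object wrapped in double square brackets with a `fid` key, and a hypothetical `uuid_by_old_fid` lookup built from the earlier media migration. The function name is also hypothetical, and the output tag follows the `drupal-media` embed format of Drupal core as far as I know it.

```python
import html
import json
import re

# A Drupal 7 Media WYSIWYG token: [[{"fid":"12","view_mode":"default",...}]]
D7_MEDIA_TOKEN = re.compile(r"\[\[(\{.*?\})\]\]", re.DOTALL)


def convert_media_tokens(body, uuid_by_old_fid):
    """Replace old media tokens with embeds that point at the new entities."""

    def replace(match):
        try:
            token = json.loads(match.group(1))
            old_fid = int(token["fid"])
        except (ValueError, KeyError, TypeError):
            return match.group(0)  # Not a token we understand: leave as is.
        new_uuid = uuid_by_old_fid.get(old_fid)
        if new_uuid is None:
            return match.group(0)  # No migrated entity found: leave as is.
        return (
            '<drupal-media data-entity-type="media" '
            f'data-entity-uuid="{html.escape(new_uuid)}"></drupal-media>'
        )

    return D7_MEDIA_TOKEN.sub(replace, body)


old_body = (
    'Before [[{"fid":"12","view_mode":"default","type":"media",'
    '"attributes":{"alt":"A photo"}}]] after'
)
print(convert_media_tokens(old_body, {12: "uuid-of-media-12"}))
```

Leaving unmatched tokens untouched makes them easy to find afterwards, which is safer than silently dropping an embed whose target was never migrated.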
[00:37:16] Janez Urevc: Great. And then if we are using cloud storage providers like S3. Um, are there any special considerations in that case, or how would that usually work?
[00:37:36] Lucas Hedding: Well, it depends on the provider. If you're dealing with S3 itself from Amazon, then you want to minimize data transfers if you're on a budget. If you're not on a strict budget, then maybe you don't care. But you can clone buckets very easily, so you can have bucket one and bucket two.
[00:38:03] And so your migration can be as simple as cloning the bucket. I guess I would recommend cloning at least during initial development, because those nasty developers tend to create and delete things while they're working. What happens if they run a rollback of the file migration and now all of a sudden the S3 bucket is empty, and where did all of the files go on the live site?
[00:38:41] Oops, don't want that to happen. So clone it and you do yourself a favor.
[00:38:45] Janez Urevc: That never happened before, right?
[00:38:47] Lucas Hedding: That has not thankfully happened to me. Not exactly that. Although close enough that I've had scares. So clone it. Um, so that's if you're dealing with S3. Now, if you're dealing with another provider, Backblaze or, uh, even Cloudflare now has a new object storage.
[00:39:09] And, uh, what is the other one? Um, Digital Ocean has one. A lot of folks are introducing these things and they're fairly economical, uh, use them, totally use them. In fact, you might even want to go from local storage to S3, but if you're doing that, uh, I recommend getting the files local, probably use Rsync to sync them to your local disk, and then use S3 command to push them up to Digital Ocean,
[00:39:37] if that's where you've landed, or S3 itself, right? And then use S3 command to sync them up to the cloud. It's going to be so much faster. S3 command, uh, there's no API to do synchronous HTTP, uh, directly to S3 because S3 itself doesn't quite support it, but it's kind of multi-threaded. So it'll do up to like eight file uploads all at the same time with S3 command. So you're going to be a lot faster if you're dealing with 100 gigabytes of data or 200 gigabytes of data, uh, and your pocketbook will be a whole lot better the next month, once you've rolled over to S3, because, uh, your S3-compatible storage is going to be way cheaper than local storage.
[00:40:38] Now, I say that and there's always exceptions, but every time I've ever entered that conversation, it's way cheaper to do S3 than local storage.
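The speed-up from pushing several files at once, as described above, can be illustrated with a small thread pool. This is a sketch under assumptions, not the command-line tool mentioned in the talk: `upload_one` is a hypothetical callable that you would supply (for example a wrapper around your storage provider's client), and the default of eight workers simply mirrors the figure quoted above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def upload_all(root, upload_one, workers=8):
    """Upload every file under root concurrently.

    upload_one(path, key) is a caller-supplied (hypothetical) function that
    uploads one file and raises on failure. Returns (uploaded, failed) key lists.
    """
    root = Path(root)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    uploaded, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the object key it is uploading.
        futures = {
            pool.submit(upload_one, p, p.relative_to(root).as_posix()): p.relative_to(root).as_posix()
            for p in files
        }
        for future in as_completed(futures):
            key = futures[future]
            try:
                future.result()
                uploaded.append(key)
            except Exception:
                failed.append(key)  # collect failures so they can be retried
    return sorted(uploaded), sorted(failed)
```

Collecting failed keys instead of aborting on the first error means a long sync can finish and the few failures can be retried separately.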
[00:40:50] Janez Urevc: Yeah.
[00:40:50] That's my experience as well. The fact is, like, I've been involved with a project where they decided to go with S3, uh, to save on hosting costs. The number of files grew and their hosting bills started to grow because of the size of file storage there.
[00:41:12] And then they switched to S3 for that reason.
[00:41:15] Lucas Hedding: That brings to mind, um, another two things about S3. One, uh, in Drupal 7, there was a module with the word migrate in it for S3. It was like a separate module, uh, called, what is it called? S3FS Migrate. It's not a migrate module in the general sense.
[00:41:38] It just was a bad, overly used name for moving the files from local to S3. That's now part of S3FS itself. And even in S3FS, it's still not really migrate in the Drupal Migrate sense, but it still lets you move things from local to S3. That's one thing. The other thing, uh, around S3 is there's an S3FS CORS module.
[00:42:12] This is just a free suggestion, but, uh, if you're dealing with large files, over maybe 100 megabytes or so, and you're going to be pushing them up to S3, the S3FS CORS module, uh, again, not related to migrate, but you know, free tip, will let you upload directly to S3, rather than uploading to Drupal and then Drupal turning around and forwarding it over to S3.
[00:42:40] Because then you've got timeouts for PHP, uh, of 60 seconds or 30 seconds, and max post size, and all these wonderful things. Maybe you're hosted on Pantheon and it blocks a file that's over a hundred, you know, all of these things. S3FS CORS will let you post directly to the S3 bucket. And so then all of those, I think S3, um, providers have limits on how long it'll take to upload the file, but they're like
[00:43:10] an hour and two terabytes for the size, you know, like orders of magnitude, massively bigger. So use that module, um, not for migrate, but just for using S3.
[00:43:26] Janez Urevc: Great. Thank you. Um, and let's quickly discuss, uh, migrating media from remote sources. And here we're not talking about things like YouTube videos and, you know, embeddable things, but actual file assets that we get from remote sources, usually through HTTP requests. Um, and Migrate API will generally copy those over.
[00:43:56] Um, but as we discussed, this can cause migrations to run very slowly. And I guess, are there any other problems that we might have experienced? In what cases would we resort to doing that?
[00:44:13] What's your experience there?
[00:44:14] Lucas Hedding: Well, anytime you're dealing with files, you can run into issues, but if it's a Drupal site, the Drupal site tends to have certain requirements that we're more familiar with. We're used to the fact that it's running a certain version of PHP and Apache or Nginx. And those things are usually more up to date, but if you're dealing with
[00:44:47] an external website, I've dealt with some really odd things where two days later, we finally figured out it was an SSL cert issue. And it was only when we ran the migration on Acquia, which had one version of OpenSSL, and the remote site that we were pulling from had another version of OpenSSL, that the certs were not working the way that we expected and the files were randomly not getting pulled over.
[00:45:26] They were kind of a bit secure, so there was even sort of an, uh, IP address level security where we would only allow migrations from Acquia. It wasn't private, but it was the equivalent of private on that system. And, oh, you know, the things that you can dig up when you're dealing with pulling files from a remote location. If it can happen, it will happen.
[00:45:59] And you'll have the joy of figuring it out. So just Rsync the stuff. I mean, that solves nine out of 10 problems, 99 out of 100 problems. Rsync the files.
[00:46:10] Mauricio Dinarte: When we talk about files and remote files, it's not only, you know, documents or images. It's like remote sources. So something that I have seen is that, you know, we are embedding this slideshow document from a third party service, and that was an iframe before. Um, again, if your provider uses oEmbed, you can replace the iframe by, you know, copying the oEmbed link, processing that, parsing that, and just use that as a media entity. It will be more secure and, um, easier to manage in the long run.
[00:46:48] Um, just like, an iframe in your rich text field is generally a bad idea. Uh, something else, again, I guess at this point I'm going to take, uh, multiple interpretations of remote files, because Drupal itself can have, you know, remote files, uh, and specifically I'm talking about files that are not in the file_managed table. I have some projects where they use,
[00:47:18] uh, IMCE or some other, like even FTP direct uploads to the server. And it is a Drupal site. Uh, if you go to the domain of the Drupal site, they will be served because, you know, they are on the same server, but they are not known to Drupal. They just have an interface, uh, to link to them, but if you do a file migration, they will never be detected.
[00:47:41] So that's another one that's kind of freaky, like having to like search in like basically the, the file system for things that are not managed by Drupal itself. And again, like speaking about communication with, between one environment and another, something that I have seen.
[00:48:00] Um, when working locally, um, uh, specifically with DDEV, and this is not exclusive to DDEV, but it's just like the tool that I use the most often, um, if you are trying to migrate files between one instance of DDEV and the other one, by default, uh, you will have to take some extra steps to set up the local SSL certificates.
[00:48:23] If you don't do that, uh, you can, you know, go around it by like providing, you know, uh, HTTP instead of HTTPS. Uh, and if it's a local environment, it might not be a big deal, but again, like, um, those things that happen during development and depending on what type of project you're working on, you are going to be dealing with this kind of thing.
[00:48:44] So be mindful of certificates, be mindful of locations, be mindful of files that might have been on the server but not managed by Drupal, or just remote resources that were iframes before and that you might convert to media entities.
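Searching the file system for files Drupal does not know about can be sketched as a set difference between what is on disk and what the managed-file table lists. This is only an illustration: `managed_uris` stands for the URIs you would export from the old site's file table, and the `public://` scheme and directory layout are assumptions about a typical setup.

```python
from pathlib import Path


def find_unmanaged(files_root, managed_uris, scheme="public://"):
    """Return relative paths that exist on disk but are missing from the managed list."""
    root = Path(files_root)
    # Strip the stream wrapper scheme so URIs compare against paths relative to root.
    managed = {uri[len(scheme):] for uri in managed_uris if uri.startswith(scheme)}
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    return sorted(on_disk - managed)
```

In practice you would probably also want to skip generated directories (image derivatives, aggregated CSS and JS), which this sketch does not attempt.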
[00:49:00] Janez Urevc: Great. Thank you. And for the end, I, uh, thought that we could share any interesting experiences from the past that we've seen during migrations. Uh, Lucas, do you have any? I mean, I'm sure you do.
[00:49:17] Lucas Hedding: Well, most of them aren't around files, because you can avoid a lot of the issues by Rsyncing. I can't say enough for that.
[00:49:27] Um, but if you do do a file migration, there is still, after like probably eight years, uh, an open issue on Drupal.org for a memory leak. It's just generally about a memory leak in migrate, uh, during a migration, but, um, about every six months, someone will find that issue and post on it and say, Hey, I'm having this too.
[00:49:59] And here's the workaround I did, and I'll talk about some workarounds, but it's always to do with files. It's always to do with files that they have this memory leak, where they're running their Drush command, and now it's like up to two gigabytes of memory or something, you know, like massive quantities of memory.
[00:50:22] And then the migration dies, and it's usually during the file migration. So what are some workarounds there? Um, there's a batch mode, uh, that you can pass to your source plugins. All of these migrations, um, for the most part have, well, they all have a source plugin. It's built into the, there's a couple other sessions
[00:50:45] we have, Janez, isn't there, around, um, ETL, Extract, Transform, Load, right? Yeah. So I'll let you go to that session to hear more about it. But the source plugin is, uh, for Drupal, nine times out of 10, a SQL query, and in SQL, you can pass a limit on the SQL query.
[00:51:08] And so for the file migration, do that. That'll help you. Um, the next thing is, uh, if you're using a Drush migration, Drush migrate import, and then here's the name of the thing I want to import, do it on a per-migration-name basis, rather than using like tags or a group or anything and saying, migrate all the things all at once.
[00:51:35] Uh, again, there's an issue with memory management, either in Drush or Drupal Core or somewhere, uh, where after a few hundred of these things, it runs out of memory. Um, if you break them up into migrate the basic page migration, migrate the files migration, migrate the users migration, and individually migrate each individual thing, you're going to be a whole lot happier with memory management.
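The batching idea mentioned above, putting a limit on the source SQL query so rows are read in chunks rather than all at once, can be sketched outside Drupal with a plain database cursor. This is an illustration with Python's built-in `sqlite3`, not the Migrate API itself; the table and column names follow the Drupal 7 managed-file table, and in Drupal the corresponding SQL source plugin option is, to the best of my knowledge, `batch_size`.

```python
import sqlite3


def read_in_batches(conn, batch_size):
    """Yield rows from file_managed in fixed-size batches using LIMIT and OFFSET."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT fid, uri FROM file_managed ORDER BY fid LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:
            return  # no more rows, stop iterating
        yield rows
        offset += batch_size
```

Only one batch of rows is held in memory at a time, which is the point of the workaround when a file migration keeps growing until the process dies.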
[00:52:05] Um, and then one more last thing, uh, you spent all this time and you want to get a pristine report. There were 100 files on the old site and only 99 got moved over, according to the report from Drush migrate or whatever. You're always going to have missing files, because these were files that were stored on the hard drive.
[00:52:31] I don't think I've run into a migration yet where one or two or dozens of files hadn't been removed at some random point in the past. So don't be shocked if you're going to see 404s, uh, or missing files. That's just a matter of garbage in, garbage out, and someone deleted a file because it was too big, or maybe the file got corrupted.
[00:52:56] Uh, you had an incident or some, for some reason, there's always missing files. Always.
[00:53:03] Mauricio Dinarte: Well, I have shared some of the stories already, but the one that Lucas just, um, talked about reminded me of one case in which, um, out of memory errors, um, played together with another configuration option that you can have for your source plugin, which is the high watermark. So in this context, um, the high watermark basically translates to, uh, a WHERE statement in the query that says, I only want to migrate anything above this value.
[00:53:41] The problem in this case was that, um, the high watermark, as of today, and there is an issue for this, um, is set up very early in the process, uh, when you're processing the row. And because of the out of memory error, um, you know, the high watermark has already been saved in the database as if the row was, uh, processed correctly, but it failed because of the memory error.
[00:54:10] And then the next time that you run, uh, it just, like, skips one value. And the problem is not only that it skips one value. For one, uh, it's kind of not intuitive to debug what's going on, but also you end up, uh, finding out that at the end of that migration, you're missing one file that hasn't been processed.
[00:54:33] And as it is a good idea to have, um, migration dependencies in place, then any other migration that depends on files is going to be blocked because you are one file short in your file migration. So in this case, it took some debugging to figure out what was going on, what was the cause, and then reset the high watermark value to be able to migrate that missing file.
[00:55:00] But it's like many little things working together, and as I said before, sometimes the result of one migration can affect another one, and this was one of those scenarios. Like, uh, high watermark combined with the memory leak made this file migration miss one record, and then, you know, nodes and everything else that depended on it, um, you know, didn't meet the requirements because of that file that was missing.
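The interaction described above can be reproduced in a toy model. This is a simplified simulation, not Drupal's actual code: `run_migration`, the `state` dictionary, and the `save_early` flag are all hypothetical names, used only to show why saving the high watermark before a row finishes processing causes that row to be skipped on the next run.

```python
def run_migration(rows, state, process, save_early=True):
    """Process rows whose 'changed' value is above the stored high watermark.

    state holds 'high_water' (last saved value) and 'migrated' (processed IDs).
    save_early=True models saving the watermark before the row is processed.
    """
    for row in sorted(rows, key=lambda r: r["changed"]):
        if row["changed"] <= state["high_water"]:
            continue  # equivalent of: WHERE changed > :high_water
        if save_early:
            state["high_water"] = row["changed"]
        process(row)  # may raise, for example when memory runs out
        state["migrated"].append(row["id"])
        if not save_early:
            state["high_water"] = row["changed"]
```

With three rows whose `changed` values are 10, 20 and 30, and a `process` function that fails once on the second row, running the migration twice with `save_early=True` ends with rows 1 and 3 migrated and row 2 silently skipped. With `save_early=False`, the second run picks row 2 up again and all three rows are migrated.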
[00:55:31] Janez Urevc: Interesting.
[00:55:33] All the dirty details. Always hard to debug. Um, we mentioned that you should Rsync files so many times during this episode. It's almost like the main message that we seem to want to deliver, and, um, my anecdote from the past is related to that, uh, to Rsyncing the files.
[00:55:58] Um, it was not a migration. It's from the times when I worked at Examiner.com, which was the largest Drupal website on the internet at the time. Um, it was a D7 site. Uh, but this was a few years after the migration to Drupal 7 happened. Um, but we were switching data centers, uh, because we had our own hardware, but we switched the data center provider.
[00:56:26] And, um, the infrastructure team decided that, because we had such a huge file library on that site, they didn't want to Rsync it from one data center to another. And they opted to send hard drives over mail to the other data center, because apparently they thought it would be faster and easier.
[00:56:53] So even Rsync doesn't solve every problem. I found it hilarious that it was actually easier to do it over snail mail with hard drives.
[00:57:06] Lucas Hedding: You probably still use Rsync though. You probably used Rsync to catch up.
[00:57:10] Janez Urevc: Yeah. Just, just to catch up for sure. But, uh, yeah. Rsync is there.
[00:57:17] Okay, this brings us to the end of this episode. We have some great talks coming up still.
[00:57:26] Our goal is to put one out per week over the next few months to support the community in the migration process. Um, performance, we touched on performance today and it's something that we care deeply about at Tag1.
[00:57:42] Um, and as we've seen, it applies to migrations as well. Uh, when you're handling large data sets, uh, migration can take hours or even days. Um, and we'll do talks, specifically about performance of migrations. Um, every project owner wants their migration to be a success. So we will dedicate an episode to discuss the most important factors for successful Drupal 7 to Drupal 10 migration, in order to help you successfully navigate your migration project.
[00:58:15] And other talks that we are planning to do include topics like porting custom code from Drupal 7 to 10, the future of the Migrate tooling, how to port the theme, and so much more. So we hope that you'll tune in and enjoy our upcoming team talks.
[00:58:33] At this point, I would also like to mention, uh, the upcoming Upgrading from Drupal 7 to Drupal 10 series of blog posts by Mauricio, which was inspired by the _31 Days of Migration_ series that Mauricio did in the past.
[00:58:50] So Mauricio, can you tell us a bit more about the new upcoming series?
[00:58:55] Mauricio Dinarte: Sure. Um, this will be coming up, uh, coming out early in 2024, and it can be summarized as an opinionated guide, uh, for migrating a Drupal site from Drupal 7 to Drupal 10. It contains a lot of the lessons that I have learned over the years. And similar to the original series, 31 Days of Migrations, it is going to be packed with examples.
[00:59:18] It is actually like a real project that we will be migrating together. Um, both content and configuration will be migrated, but more important than the technical part is also, like, giving advice. Like, before writing the first migration, before executing the first command, we're going to discuss things like understanding, you know, the tool that you're going to use, the Migrate API, because it is probably the most popular one, but by no means the only one.
[00:59:45] So we're going to, uh, explain how it works, what are some of the assumptions, what are some of the limitations. We're going to give recommendations about auditing your Drupal 7 site and making considerations when moving to Drupal 10. So, um, I'm looking forward to sharing it with the community.
[01:00:03] Janez Urevc: We're all looking forward to it.
[01:00:04] So, uh, yeah, stay tuned. It will be published on Tag1's website. All the links that we mentioned today will be posted online with the talk. Uh, if you like this talk, please remember to upvote, subscribe, and share it. Check our past talks at tag1.com/ttt. That's three T's for Tag1 Team Talks. As always, we'd love your feedback and topic suggestions.
[01:00:31] Write us at ttt@tag1.com. Big thanks to our two guests today and to everyone who tuned in. Thank you for joining us. Thank you.