How to approach your website content migration

Posted by Mitch Gerber on September 18, 2017

When moving to a new website platform, one of the biggest hurdles can be figuring out how to migrate all your content to the new site. In many instances, automating your content migration can save you countless hours of having a roomful of people rekey seemingly endless pages of content. But in some cases, automation may actually be more costly and time-consuming than doing the process manually.

At BlueModus, we regularly work with customers who are moving to a new platform such as Kentico. When they have a significant amount of content to carry forward, migrating it is a top concern.

When you're ready to navigate through your own content migration, here are the questions and factors you'll want to consider:

1. Do you have enough content to justify automation?

If you have 50 or fewer pages of content, it’s probably faster and more cost-effective to copy, paste and rekey it by hand. The further you go beyond 50 pages, the more likely it is that an automated solution is worth the investment. If you have hundreds of pages (or more!), the answer is obvious: automate.
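The 50-page figure is a rule of thumb, and the real break-even point depends on your own numbers. As a rough way to estimate it, the short Python sketch below compares a one-time automation setup cost against the time saved on each page. The function name and all of its inputs are hypothetical estimates you would supply yourself, not measured figures.

```python
import math


def break_even_pages(setup_hours: float,
                     manual_minutes_per_page: float,
                     automated_minutes_per_page: float) -> int:
    """Smallest page count at which automating costs fewer hours than rekeying.

    setup_hours: one-time effort to build, test and debug the migration tooling.
    manual_minutes_per_page: time to copy, paste and rekey one page by hand.
    automated_minutes_per_page: per-page effort that remains after automating
        (post-import QA, manual clean-up).
    All three values are your own estimates.
    """
    saved_per_page = manual_minutes_per_page - automated_minutes_per_page
    if saved_per_page <= 0:
        raise ValueError("automation must save time per page to ever break even")
    # Automation wins once pages * saved_per_page exceeds the setup cost;
    # at exactly setup / saved pages the two approaches cost the same.
    return math.floor(setup_hours * 60 / saved_per_page) + 1
```

With a hypothetical 20 hours of setup, 30 minutes to rekey a page by hand and 5 minutes of per-page QA after automating, `break_even_pages(20, 30, 5)` returns 49: at 48 pages or fewer, rekeying costs no more time than automating.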

2. Does your new platform provide a way to automatically import content?

If your solution provider tells you the answer is “no,” verify that this is truly the case. Importing content is not always straightforward, so a vendor may steer you away from automation simply because it is hard to do; take the time to get the complete story as to why. If you learn that rekeying really is the only path, it might make sense to consider a different platform.

3. What are your timeline and your budget?

Automation consumes time and money up front, often more than you expect, particularly if custom code needs to be written. Be sure to account for how long it will take to test, troubleshoot and fix any bugs. Are migration tools already built into your new platform of choice? If so, you are more likely to complete the migration on time and at a lower cost.

4. How accessible is your content?

If you decide to take the automation route, the next big question is: how accessible is your existing content? Will your vendor be able to access it using commonly available methods and tools, or will it require additional effort?

Let’s review some common methods of content retrieval:

  • Scraping existing website HTML
    This method uses HTTP requests to retrieve existing web pages and extracts the content from the returned HTML.

  • Copying HTML files from a web server
    In this approach, HTML files are copied from the web server and passed along to the vendor for processing.

  • Database export, either as HTML or raw data
    Exporting content directly from a database requires that your vendor be granted access to it. If database access is restricted, scraping might be the only option.

  • Retrieval via an API request
    If your current platform exposes an API, your vendor can write code that calls it to retrieve content from the web server.
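To make the scraping method more concrete, here is a minimal sketch using only Python's standard library. It assumes, hypothetically, that each legacy page wraps its main content in an element with a known `id` (here `content`); the class and function names are illustrative. Real-world markup is often messy, and unclosed tags can throw off the simple depth tracking used here, so treat it as a starting point rather than a finished scraper.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Collects the raw markup inside the element whose id matches target_id."""

    def __init__(self, target_id: str):
        super().__init__(convert_charrefs=False)  # keep entities like &amp; as-is
        self.target_id = target_id
        self.depth = 0  # nesting depth inside the target element; 0 = outside
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_ELEMENTS:
                self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1  # entering the target; its own tag is not captured

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied without changing depth.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth:  # depth 0 means this was the target's own end tag
                self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_content(html: str, target_id: str) -> str:
    """Return the inner HTML of the element with the given id."""
    parser = ContentExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def scrape_content(url: str, target_id: str = "content") -> str:
    """Fetch a legacy page over HTTP and extract its main content."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_content(html, target_id)
```

In practice you would loop `scrape_content` over the list of legacy URLs and save each result for the clean-up step.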

You've Retrieved the Existing Content. What's Next?

Now that you’ve successfully accessed the content, you’ll need to determine where to place it in your new platform’s content tree. Doing this may require using a table that maps the existing URL/location of each piece of content to the correct URL/location within the new site.
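A mapping table can be as simple as a spreadsheet exported to CSV. The sketch below assumes a hypothetical two-column file (`old_url`, `new_path`) and normalizes URLs before lookup so that small differences, such as a trailing slash, a capitalized host name, or http vs. https, don't cause misses. Anything left unmapped is reported so someone can place it by hand. The content-tree paths shown are made up for illustration.

```python
import csv
import io
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a comparable key.

    The scheme is ignored (http and https match), the host is lowercased and a
    trailing slash is dropped. The path keeps its case and the query string is
    kept, since legacy CMSs often serve different pages as /page?id=1, /page?id=2.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.netloc.lower()}{path}{query}"


def load_mapping(csv_text: str) -> dict[str, str]:
    """Read old_url,new_path rows into a lookup keyed by normalized old URL."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalize(row["old_url"]): row["new_path"] for row in reader}


def plan_moves(urls: list[str],
               mapping: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split retrieved URLs into those with a destination and those without."""
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    for url in urls:
        key = normalize(url)
        if key in mapping:
            mapped[url] = mapping[key]
        else:
            unmapped.append(url)
    return mapped, unmapped
```

Reviewing the `unmapped` list before import is a cheap way to find pages nobody has decided what to do with yet.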

In most instances, a new platform also means a revamped site design, and the existing HTML markup may be ill-suited to the new design. If this is the case, consider programmatically cleaning the HTML as much as possible. This clean-up might entail removing HTML elements, attributes and inline styles. While you can likely automate much of the cleaning, plan for some manual clean-up work as well; it is almost inevitable.
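Here is a minimal sketch of that kind of clean-up using Python's built-in `html.parser`. The allowlist of tags and attributes is a hypothetical policy; yours should reflect what the new design's templates expect. Tags outside the list are dropped while their text is kept, `<script>` and `<style>` blocks are removed entirely, and inline styles disappear because `style` is never an allowed attribute.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: adjust to what the new design's templates expect.
ALLOWED_TAGS = {"p", "h2", "h3", "ul", "ol", "li", "a", "strong", "em",
                "img", "br", "table", "tr", "td", "th"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
DROP_WITH_CONTENT = {"script", "style"}
VOID_TAGS = {"br", "img"}


class HtmlCleaner(HTMLParser):
    """Re-emits HTML keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # data arrives with entities decoded
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unwrap: drop the tag itself but keep the text inside it
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(f' {name}="{escape(value or "", quote=True)}"'
                       for name, value in attrs if name in allowed)
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # re-encode &, <, >


def clean_html(html: str) -> str:
    cleaner = HtmlCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

For example, `<p style="color:red">Hello <font face="Arial">world</font></p>` comes out as `<p>Hello world</p>`. Anything the allowlist approach can't judge, such as a table used purely for layout, is exactly the kind of thing left for the manual pass.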

Cleaned Up and Ready for Import!

Once the content is clean, it’s finally time to import it into the new platform. How you approach this will depend on the platform itself. If, for example, the platform is Kentico, a developer will likely write a customized program that uses the Kentico API to import the content into the platform's database.
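On Kentico, that import program would most likely be written in C# against Kentico's own API, which isn't shown here. The Python sketch below only illustrates two concerns that apply to most content-tree imports: parent pages must exist before their children, and the import should be safe to rerun after a bug fix. `InMemoryTree` and its methods are a hypothetical stand-in for whatever API client your platform provides.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryTree:
    """Hypothetical stand-in for the target platform's API client."""
    pages: dict[str, str] = field(default_factory=dict)  # path -> HTML

    def exists(self, path: str) -> bool:
        return path in self.pages

    def create_page(self, path: str, html: str) -> None:
        parent = path.rsplit("/", 1)[0] or "/"
        # Like a real content tree, refuse to create a page under a missing parent.
        if parent != "/" and parent not in self.pages:
            raise KeyError(f"parent {parent} does not exist")
        self.pages[path] = html


def import_pages(tree: InMemoryTree, pages: dict[str, str]) -> list[str]:
    """Create pages parent-first, skipping paths that already exist.

    Skipping existing paths makes a rerun after a failed or partial import
    safe. Returns the paths created during this run.
    """
    created: list[str] = []
    # Sorting by depth guarantees a parent is created before any of its children.
    for path in sorted(pages, key=lambda p: (p.count("/"), p)):
        if tree.exists(path):
            continue
        tree.create_page(path, pages[path])
        created.append(path)
    return created
```

If a child page's parent isn't in the mapping table at all, the sketch fails loudly rather than guessing, which is usually the safer behavior during a migration.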

Now that your content has been imported, be patient: don’t flip the switch just yet! A successful import does not guarantee your content is ready for public consumption. Plan for a post-import QA and manual editing effort, including a page-by-page inspection, to catch and fix any problems before launch. It is still work, but the good news is that you have avoided days in copy-paste mode or working those fingers rekeying content.
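Automated checks can't replace the page-by-page inspection, but they can help decide where to look first. This sketch flags imported pages that look empty, still carry inline styles, or still link to the old domain. The check names, the legacy host name and the regex-based tag stripping are illustrative assumptions, and deliberately crude ones.

```python
import re

# Crude tag stripper: good enough to tell whether a page has any visible text.
TAG_PATTERN = re.compile(r"<[^>]+>")


def qa_report(pages: dict[str, str], old_host: str) -> dict[str, list[str]]:
    """Flag imported pages that deserve a closer manual look.

    pages maps each new content-tree path to its imported HTML;
    old_host is the legacy domain (a hypothetical value you supply).
    Returns only the pages with at least one problem.
    """
    checks = {
        "empty": lambda html: not TAG_PATTERN.sub("", html).strip(),
        "inline_style": lambda html: "style=" in html.lower(),
        "old_domain_link": lambda html: old_host.lower() in html.lower(),
    }
    report: dict[str, list[str]] = {}
    for path, html in pages.items():
        problems = [name for name, check in checks.items() if check(html)]
        if problems:
            report[path] = problems
    return report
```

Pages that come back clean still need a look; the report just puts the obvious problems at the top of the list.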

Don't Rush the Decision to Automate

As you can see, it's important to carefully weigh the pros and cons of an automated content migration before jumping in. While your impulse may be to automate the import right away, we suggest holding off until you’ve considered every factor involved; taking that time will pay off in the long run.

Our team at BlueModus has been down the content import road with many clients, each with its own migration needs. If you're looking for expert advice on how we’d approach your content migration, please drop us a line today.

Mitch Gerber
Mitch Gerber is a Senior Software Developer at BlueModus. He designs and builds public- and internal-facing websites for BlueModus’ clients, delivering customized solutions on the Kentico CMS/EMS platform using ASP.NET and C#. Mitch has developed software for a host of industries over the past two decades, including financial services regulatory compliance, accounting, legal, real estate and medical devices.