Stanford University’s School of Engineering contacted Drupal Connect to migrate its legacy site into Drupal. Stanford is one of the most respected private universities in the country.
HTML Migration: Stanford School of Engineering’s legacy site consisted of over 1,300 static HTML pages that had to be converted to Drupal nodes. The markup of these pages was not always consistent, and some pages differed considerably from the norm in look and feel, which made automated content parsing more difficult. Migrated pages needed to retain all embedded images and links, as well as relevant copy.
MSSQL Migration: The legacy site also contained data supplied by an MSSQL database, including 1,000+ faculty profiles, 5,000+ publications, and hundreds of labs and organizations, each with many specific data fields. These records also had to be migrated to Drupal with no data loss and a clean, reusable implementation.
Performance: The website receives heavy traffic, so maintaining performance and reliability was a high priority.
Automated HTML retrieval, parsing, cleanup, and Drupal node creation: We built a tool that crawled each page of the legacy site, fetching and cleaning/sanitizing relevant content and creating Drupal page nodes. This tool was also responsible for bringing over images as Drupal files, correcting all local links to point to new page locations, and updating the site nav to point to the new Drupal paths. Using this tool, the process was completely portable, and we were able to run it repeatedly as the legacy site’s content was updated while development was in progress.
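As a rough illustration of the link-correction step described above, the following Python sketch rewrites local links in a fetched page so that they point to new paths. It is not the tool used on the project: the host name `legacy.example.edu`, the `PATH_MAP` table, and the function name are all hypothetical, and a regular expression stands in for whatever HTML parsing the real tool performed.

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical legacy host and old-path -> new-path table (illustration only).
LEGACY_HOST = "legacy.example.edu"
PATH_MAP = {
    "/about/index.html": "/about",
    "/research/labs.html": "/research/labs",
}

# Matches href="..." or href='...' and captures the quote style and the URL.
HREF_RE = re.compile(r'(href\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_local_links(html, page_url, path_map=PATH_MAP, legacy_host=LEGACY_HOST):
    """Point links to migrated legacy pages at their new paths.

    External links and local links with no entry in path_map are left as-is.
    """
    def replace(match):
        prefix, quote, href = match.groups()
        # Resolve relative links against the URL of the page being migrated.
        absolute = urlparse(urljoin(page_url, href))
        if absolute.netloc != legacy_host:
            return match.group(0)  # external link, mailto:, etc.
        new_path = path_map.get(absolute.path)
        if new_path is None:
            return match.group(0)  # unmapped page: leave for manual review
        if absolute.fragment:
            new_path += "#" + absolute.fragment  # keep in-page anchors
        return f"{prefix}{quote}{new_path}{quote}"

    return HREF_RE.sub(replace, html)


page_url = "http://legacy.example.edu/research/index.html"
sample = (
    '<a href="labs.html#robotics">Labs</a> '
    '<a href="http://example.org/x.html">Ext</a> '
    '<a href="../about/index.html">About</a>'
)
rewritten = rewrite_local_links(sample, page_url)
print(rewritten)
```

Because the function depends only on the page source and the mapping table, it can be rerun whenever the legacy content changes, which is the property that made the repeated migrations practical.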
Automated MSSQL to MySQL to Drupal importing: We built a set of scripts that, given an MSSQL dump of the legacy site’s data, converted it to a MySQL dump, imported it into the Drupal database as custom tables, and mapped the relevant fields to CCK fields in custom content types to create nodes. This tool was also repeatable, and was used and reused as MSSQL content was added or updated.
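The field-mapping step can be pictured with a small Python sketch. This is only an assumption about the general shape of such a mapping, not the project's scripts: the column names, the CCK-style field names, the `faculty_profile` content type, and the `row_to_node` helper are all made up for illustration.

```python
# Hypothetical mapping from legacy column names to node/CCK field names.
FACULTY_FIELD_MAP = {
    "FullName": "title",
    "Email": "field_email",
    "OfficePhone": "field_phone",
    "Bio": "body",
}


def row_to_node(row, field_map, node_type):
    """Build a node-like dict from one legacy row.

    Returns the node plus a sorted list of columns that had no mapping,
    so that nothing is dropped without being noticed.
    """
    node = {"type": node_type}
    unmapped = []
    for column, value in row.items():
        target = field_map.get(column)
        if target is None:
            unmapped.append(column)
            continue
        if isinstance(value, str):
            value = value.strip()  # trim padding left over from the export
        node[target] = value
    return node, sorted(unmapped)


row = {"FullName": " Jane Doe ", "Email": "jane@example.edu", "LegacyFlag": 1}
node, unmapped = row_to_node(row, FACULTY_FIELD_MAP, "faculty_profile")
print(node)
print(unmapped)
```

Reporting unmapped columns instead of silently skipping them is one way to check the "no data loss" requirement on each rerun of the import.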
Pressflow and Varnish: To meet the performance requirements, we built the site on Pressflow, a performance-oriented distribution of Drupal, and put it behind Varnish, a reverse proxy and HTTP accelerator that is known to serve around 3,000 requests per second.
Maintain Stanford best practices: Stanford maintains a custom Drupal theme as well as a custom Drupal module that integrates with its internal authentication/log-in system. We incorporated both into the site so that it kept a cohesive look and feel with other Stanford sites, and so that anyone with an internal Stanford log-in (i.e., all students, faculty, and staff) could log in to the Drupal site using those credentials.