Roles
- Set up CentOS 9 server on virtual machine via SSH
- Installed WordPress, MariaDB through SSH
- Updated PHP and plugins
- Set up automatic, rolling backups
- Debugged deprecated and broken lines of code
- Used Python to scrape pubmed.org for the RSS feed URLs of 6,200 rare diseases
- Created custom database table to store > 230,000 RSS feed items
- Created a WP-Cron job that periodically fetches new items from each disease's RSS URL and removes older ones, so that each disease keeps at most its 100 latest publications
Summary
checkorphan.org is a nonprofit organization that lists rare disease information for families in need. The website suffered major data loss in the past and had no current developer team.
Once I was onboarded, I was tasked with updating the site’s plugins and setting up automatic rolling backups. What seemed like two easy tasks turned out to be quite labor-intensive: the web server’s PHP had not yet been updated to PHP 8, so many plugins could not be updated.
The website runs on a self-maintained virtual server, but with no developer on staff, no one had the server’s password. I was therefore tasked with creating a new server instance on the same machine running CentOS Stream 9, installing Apache, PHP 8.1, WordPress, and MariaDB over SSH, and then migrating the fairly large website to the new address, using the Better Search Replace plugin to finalize the changes.
Once the website was up and running on the updated PHP, I had to locate and debug several lines of deprecated code. Most were array errors, easily fixed by checking that a value existed with isset() before referencing it.
Now I could finally update the plugins and set up automatic, rolling database backups using Updraft. With that finished, we could focus on regenerating some of the data the organization had previously lost. One of the most important features of the website is a widget that shows the five latest research publications on any given disease. A “more” button sends the user to a page with the latest 100 publications. Finally, a Research page tab takes the user to a listing of publications by disease and lets users search for publications.
A custom post type for publications did not perform well: there ended up being over 220,000 publications, with many more potentially coming in. The publications were automatically generated and deleted so that only the latest 100 per disease were kept, and searching by post meta was slowing down pages. The solution I came up with was a WP-Cron job that fetches 50 disease URLs per minute, sanitizing and validating the data, with a maximum of 5,000 publication items being accessed at a time. These publication items are then saved in a custom database table, and the program checks whether a disease now has more than 100 publications and deletes the oldest entries if so. From there, we can make custom database calls using wpdb to display disease publications wherever they are needed.
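The fetch, validate, store, and prune cycle described above can be sketched roughly as follows. This is only an illustration in Python with an in-memory-friendly SQLite table, not the site's actual PHP and wpdb code; the table name, column names, and the helpers create_table, parse_items, store_and_prune, and next_batch are all hypothetical, and the real job would run under WP-Cron against MariaDB.

```python
import sqlite3
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

MAX_PER_DISEASE = 100  # cap per disease, as described in the text
BATCH_SIZE = 50        # feeds handled per scheduled run


def create_table(conn):
    # Hypothetical schema; the UNIQUE constraint makes re-fetching a feed idempotent.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS publications (
               id INTEGER PRIMARY KEY,
               disease_id INTEGER NOT NULL,
               title TEXT NOT NULL,
               link TEXT NOT NULL,
               published INTEGER NOT NULL,
               UNIQUE (disease_id, link)
           )"""
    )


def parse_items(rss_xml):
    """Yield (title, link, unix_timestamp) for each valid <item>; skip the rest."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        pub_date = (item.findtext("pubDate") or "").strip()
        # Basic validation: require a title and an http(s) link.
        if not title or not link.startswith(("http://", "https://")):
            continue
        try:
            published = int(parsedate_to_datetime(pub_date).timestamp())
        except (TypeError, ValueError):
            continue  # unparseable date, drop the item
        yield title, link, published


def store_and_prune(conn, disease_id, items, keep=MAX_PER_DISEASE):
    """Insert new items, then delete everything but the `keep` newest rows."""
    conn.executemany(
        "INSERT OR IGNORE INTO publications (disease_id, title, link, published) "
        "VALUES (?, ?, ?, ?)",
        [(disease_id, title, link, published) for title, link, published in items],
    )
    # SQLite allows LIMIT inside an IN subquery; other engines may need a join.
    conn.execute(
        """DELETE FROM publications
           WHERE disease_id = ?
             AND id NOT IN (
                 SELECT id FROM publications
                 WHERE disease_id = ?
                 ORDER BY published DESC, id DESC
                 LIMIT ?
             )""",
        (disease_id, disease_id, keep),
    )
    conn.commit()


def next_batch(feed_urls, offset, size=BATCH_SIZE):
    """Return this run's slice of feeds and the offset to resume from next run."""
    batch = feed_urls[offset:offset + size]
    next_offset = offset + size if offset + size < len(feed_urls) else 0
    return batch, next_offset
```

Each scheduled run would call next_batch with the saved offset, download and parse every feed in the batch, and pass the parsed items to store_and_prune. Pruning inside the same step as the insert keeps the table bounded at roughly 100 rows per disease, so the display queries stay small.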