04-29-2023, 06:13 PM
(04-23-2023, 10:37 PM)user328 Wrote: [ -> ]As you may recall, I worked for a newspaper for about 9 years. While I was there, one of the systems I maintained was a vendor feeds system that pulled in syndicated content from the Associated Press and other sources.
I want to make something similar that loads RSS feeds from various sites and displays the headlines in a scroller.
The newspaper's vendor feed system was a hodgepodge of PHP and shell scripts written by numerous programmers over the years. When I took over maintenance duties, some of the scripts were so cryptic and convoluted that I literally had to sit down with pen and paper and reverse engineer them every time I needed to make a simple change.
Over time, I rewrote a lot of the scripts to be easier to read and modify. Changes that took days to make before could now be done in an hour or two -- modified, tested, and pushed to production. Most notably, I created the XML parser base class mentioned elsewhere that could be extended to parse just about any new feed the paper subscribed to.
It always seemed to me that something more could be done to simplify and streamline the system, though. Although I didn't know exactly how I'd go about it at the time, I had the idea that the system could be refactored as a pipeline of standardized modules; i.e., a handful of modules from a relatively small set could be chained together to import each feed. None of this business of writing a whole new script for every single feed.
That's what I wanted to do, but I never found the time.
This week, I've made considerable progress towards building such a system. The new system is divided into a few distinct stages. Filters can be plugged in between stages to make minor changes to the data, so that the standard modules can process it without needing to be modified themselves.
1. Pre-download filtering
2. Download the feed with a standard FTP or HTTP module
3. Pre-parse filtering
4. Parse the feed with a standard RSS, Atom, or JSON module
5. Pre-import filtering
6. Import the feed articles into the database and download media attachments with a standard FeedImport module
7. Post-import filtering
I'll go over some use cases for the various filtering stages later.
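To make the seven stages above a bit more concrete, here is a rough Python sketch of how such a pipeline might be wired together. This is not the actual system: the class name FeedPipeline, the helper run_filters, and the stand-in functions fake_download, parse_json, feed_import, and strip_titles are all hypothetical names made up for illustration, and the "download" just returns canned JSON so the example runs without a network connection.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, List


def run_filters(filters: List[Callable[[Any], Any]], data: Any) -> Any:
    """Pass data through each filter in order; each filter returns the new data."""
    for f in filters:
        data = f(data)
    return data


@dataclass
class FeedPipeline:
    # The three standard modules (hypothetical interfaces).
    download: Callable[[dict], str]
    parse: Callable[[str], list]
    import_articles: Callable[[list], list]
    # Optional filters that plug in around the standard modules.
    pre_download: list = field(default_factory=list)
    pre_parse: list = field(default_factory=list)
    pre_import: list = field(default_factory=list)
    post_import: list = field(default_factory=list)

    def run(self, request: dict) -> list:
        request = run_filters(self.pre_download, request)  # 1. pre-download filtering
        raw = self.download(request)                       # 2. download
        raw = run_filters(self.pre_parse, raw)             # 3. pre-parse filtering
        articles = self.parse(raw)                         # 4. parse
        articles = run_filters(self.pre_import, articles)  # 5. pre-import filtering
        imported = self.import_articles(articles)          # 6. import
        return run_filters(self.post_import, imported)     # 7. post-import filtering


def fake_download(request: dict) -> str:
    # Stand-in for an HTTP or FTP module: ignores the request, returns canned JSON.
    return '{"items": [{"title": " First headline "}, {"title": "Second headline"}]}'


def parse_json(raw: str) -> list:
    # Stand-in for a JSON parser module; assumes articles live under "items".
    return json.loads(raw)["items"]


database: list = []  # Stand-in for the real database.


def feed_import(articles: list) -> list:
    # Stand-in for a FeedImport module: "stores" the articles in a list.
    database.extend(articles)
    return articles


def strip_titles(articles: list) -> list:
    # Example pre-import filter: trim stray whitespace from each title.
    return [{**a, "title": a["title"].strip()} for a in articles]


pipeline = FeedPipeline(
    download=fake_download,
    parse=parse_json,
    import_articles=feed_import,
    pre_import=[strip_titles],
)
imported = pipeline.run({"url": "https://example.com/feed.json"})
```

The point of the sketch is that a new feed only needs a new FeedPipeline configuration and perhaps a small filter or two, rather than a whole new script. The standard download, parse, and import modules stay untouched.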