Python Automation Cookbook

Getting ready

We need to add the feedparser dependency to our requirements.txt file and reinstall the requirements:

$ echo "feedparser==5.2.1" >> requirements.txt
$ pip install -r requirements.txt

Feed URLs can be found on almost all pages that deal with publications, including blogs, news sites, podcasts, and so on. Sometimes they are very easy to find, but sometimes they are somewhat hidden. Searching the page for "feed" or "RSS" usually locates them.
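When a feed link is not visible on the page, it is often still advertised in the HTML head through a link tag with rel="alternate" and an RSS or Atom MIME type. The following is a minimal sketch of how such links could be collected using only the standard library. The names FeedLinkParser, find_feeds, and SAMPLE_HTML, as well as the example.com URLs, are hypothetical and only serve the illustration; real pages may advertise feeds differently or not at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise RSS and Atom feeds
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect feed URLs advertised in <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative links against the page URL
            self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html, base_url):
    """Return the list of feed URLs advertised in the given HTML text."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


# Hypothetical page content, used here instead of a real download
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="News" href="/feeds/news.xml">
</head><body></body></html>
"""

print(find_feeds(SAMPLE_HTML, "https://example.com/blog/"))
```

In practice, the HTML would come from downloading the page first, and an empty result only means the page does not advertise a feed this way.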

Most newspapers and news agencies have their RSS feeds divided by theme. As an example, we'll parse The New York Times home page feed, http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml. More feeds are available on the main feed page: https://archive.nytimes.com/www.nytimes.com/services/xml/rss/index.html

Please note that the feeds may be subject to terms and conditions of use. In the case of The New York Times, they are described at the end of the main feed page.

Please note that this feed changes quite often, meaning that the linked entries will differ from the examples in this book.