
Extracting HTML table data from a web page
Though it is possible to treat HTML data as a specialized form of XML, the XML package provides a dedicated function, readHTMLTable(), to extract data from HTML tables, as follows:
> library(XML)
> url <- "WorldPopulation-wiki.htm"
> tables <- readHTMLTable(url)
> world.pop <- tables[[6]]
The readHTMLTable() function parses the web page and returns a list of all the tables that are found on the page. For tables that have an id attribute, the function uses the id attribute as the name of that list element.
We are interested in extracting the "10 most populous countries" table, which is the sixth table on the page, so we use tables[[6]].
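To see how the returned list is organized before picking an index, it can help to try readHTMLTable() on a small page first. The following is a minimal sketch, not part of the recipe: the two-table HTML string, the id "areas", and the variable names are made up for illustration, and the page is parsed from text with htmlParse() so that no file is needed.

```r
library(XML)  # provides htmlParse() and readHTMLTable()

# Hypothetical two-table page, built inline so the sketch runs without a file.
# Only the first table carries an id attribute.
html <- '<html><body>
<table id="areas">
<tr><th>Country</th><th>Area</th></tr>
<tr><td>A</td><td>10</td></tr>
</table>
<table>
<tr><th>Country</th><th>Population</th></tr>
<tr><td>A</td><td>100</td></tr>
<tr><td>B</td><td>200</td></tr>
</table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# One list element per <table>; keep text columns as character vectors
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

length(tables)  # how many tables were found
names(tables)   # a table with an id attribute is named after that id

# Pick a table by position, as the recipe does with tables[[6]]
pop <- tables[[2]]

# Cell values arrive as text, so convert numeric columns explicitly
pop$Population <- as.numeric(pop$Population)
str(pop)
```

Checking length(tables) and names(tables) in this way on the real page is a quick way to confirm which index holds the table you want.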