![Python Web Scraping Cookbook](https://wfqqreader-1252317822.image.myqcloud.com/cover/240/36700240/b_36700240.jpg)
Getting ready
We will read a file named unicode.html from our local web server at http://localhost:8080/unicode.html. The file is UTF-8 encoded and contains several sets of characters from different parts of the encoding space. In a browser, the page looks as follows:
![](https://epubservercos.yuewen.com/02C97C/19470398001588706/epubprivate/OEBPS/Images/89c7b066-5d99-4dff-a318-3d97e1d6be0a.png?sign=1738838931-6XCwSLVLrHcNvPpEzFNZuIWGBqKKB9yW-0-b70d66de6b90fdbdb49f917905ab31f4)
The Page in the Browser
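Characters from different parts of the encoding space take different numbers of bytes in UTF-8. The following minimal sketch makes that concrete. The sample characters are illustrative choices, not necessarily the ones in unicode.html, and the helper name `utf8_report` is hypothetical.

```python
# Illustrative characters from different parts of the Unicode code space
# (assumed samples; not taken from unicode.html).
samples = {
    "Latin (ASCII)": "A",
    "Latin-1 Supplement": "é",
    "Cyrillic": "Ж",
    "CJK": "中",
    "Emoji": "😀",
}


def utf8_report(chars):
    """Return (label, code point, UTF-8 bytes) for each sample character."""
    rows = []
    for label, ch in chars.items():
        encoded = ch.encode("utf-8")
        rows.append((label, f"U+{ord(ch):04X}", encoded))
    return rows


for label, cp, raw in utf8_report(samples):
    # bytes.hex(sep) needs Python 3.8+; prints e.g. "d0 96" for Cyrillic Ж.
    print(f"{label:20} {cp:8} {len(raw)} byte(s): {raw.hex(' ')}")
```

ASCII characters stay at one byte, while Cyrillic needs two, CJK three, and emoji four. A scraper that assumes one byte per character, or decodes with the wrong codec, will therefore garble everything outside the ASCII range.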
When the file is opened in an editor that supports UTF-8, the Cyrillic characters in the HTML source render correctly:
![](https://epubservercos.yuewen.com/02C97C/19470398001588706/epubprivate/OEBPS/Images/afdf6e7f-3bbb-4226-bc69-356d01a27d5a.png?sign=1738838931-wPKbpeco9ARQK7tfA8MiTn3ws0kuCPOm-0-cd78065df4dc951d0c17e16b6bf635bc)
The HTML in an Editor
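An editor (or a scraper) that does not decode the file as UTF-8 shows mojibake instead. This short sketch decodes UTF-8 bytes for a Cyrillic word with the wrong codec (Windows-1252, chosen here only as a common mistaken default) and then with the correct one.

```python
# A short Cyrillic string ("hello" in Russian).
original = "Привет"

# The bytes a UTF-8 encoded file stores for that string: 2 bytes per letter.
utf8_bytes = original.encode("utf-8")

# Decoding with the wrong codec turns each two-byte Cyrillic letter
# into two unrelated Western European characters (mojibake).
garbled = utf8_bytes.decode("cp1252")

# Decoding with the correct codec recovers the original text.
restored = utf8_bytes.decode("utf-8")

print(garbled)   # ÐŸÑ€Ð¸Ð²ÐµÑ‚
print(restored)  # Привет
```

The bytes themselves are never damaged; only the interpretation is wrong, which is why re-encoding the garbled text as cp1252 and decoding it as UTF-8 recovers the original string.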
The code for this recipe is in 02/06_unicode.py.
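As a rough standard-library sketch of reading such a page without hard-coding an encoding, one option is to decode the response using the charset declared in its `Content-Type` header and fall back to UTF-8 when none is given. This is not the contents of 02/06_unicode.py. The helpers `decode_body` and `fetch_text` are hypothetical names, and the UTF-8 fallback is an assumption.

```python
from email.message import Message
from urllib.request import urlopen


def decode_body(raw, content_type, default_charset="utf-8"):
    """Decode raw bytes using the charset named in a Content-Type header.

    Falls back to default_charset (assumed UTF-8) when no charset is declared.
    """
    msg = Message()
    msg["Content-Type"] = content_type or "text/plain"
    # get_content_charset() returns the charset parameter lowercased, or None.
    charset = msg.get_content_charset() or default_charset
    return raw.decode(charset)


def fetch_text(url, default_charset="utf-8"):
    """Fetch url and return its body as a str (hypothetical helper)."""
    with urlopen(url) as response:
        raw = response.read()
        content_type = response.headers.get("Content-Type", "")
    return decode_body(raw, content_type, default_charset)
```

With the local web server running, `fetch_text("http://localhost:8080/unicode.html")` should return the page as a Python string with the Cyrillic characters intact. Keeping the decoding step in its own function means it can be checked without a network connection.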