In this tutorial, we will show how to parse an HTML page using the Python Beautiful Soup package.
1. Import library
from bs4 import BeautifulSoup
2. Create a soup object to parse html
html_content = '<html><div><span>Cocyer</span> https://www.cocyer.com</div></html>'
soup = BeautifulSoup(html_content, "html.parser")
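As a quick check that parsing worked, the sketch below walks the parsed tree with attribute-style access. It reuses the html_content string from this step; the variable names root_name and span_string are only illustrative.

```python
from bs4 import BeautifulSoup

html_content = '<html><div><span>Cocyer</span> https://www.cocyer.com</div></html>'
soup = BeautifulSoup(html_content, "html.parser")

# soup.<tag> returns the first tag with that name in the document.
root_name = soup.html.name

# Attribute access can be chained; .string is the single text child of <span>.
span_string = soup.div.span.string

print(root_name, span_string)

# prettify() renders the parsed tree as an indented string, handy for inspection.
print(soup.prettify())
```

If the printed tree does not match what you expected, the markup was probably repaired or interpreted differently by the parser.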
3. Get all div tags
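divs = soup.find_all("div")
Note that soup.find() returns only the first matching tag, while soup.find_all() returns every match.
You can get all paragraphs and links in the same way with soup.find_all().
tags = soup.find_all("p")
tags = soup.find_all("a")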
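The difference between the two lookup functions is easiest to see side by side. The following is a minimal sketch, assuming a made-up page with two div tags and two links; the sample markup and the second URL are hypothetical and not part of the tutorial's page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two <div> tags, each holding one link.
sample_html = (
    '<html><body>'
    '<div><a href="https://www.cocyer.com">Cocyer</a></div>'
    '<div><a href="https://example.com">Example</a></div>'
    '</body></html>'
)
soup = BeautifulSoup(sample_html, "html.parser")

first_div = soup.find("div")      # a single Tag (the first match)
all_divs = soup.find_all("div")   # a list-like ResultSet with every match
all_links = soup.find_all("a")
missing = soup.find("p")          # the sample has no <p>, so this is None

# Attributes of a tag are read with dictionary-style access.
link_urls = [a["href"] for a in all_links]

print(len(all_divs), link_urls, missing)
```

Because find() can return None, it is worth checking the result before using it, whereas find_all() simply returns an empty list when nothing matches.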
4. Get all div tags with specific class
divs = soup.find_all("div", { "class" : "full_name" })
This code will extract all divs with the class "full_name".
You can filter other HTML tags by class in the same way.
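Here is a small sketch of class filtering. The sample markup and the names in it are invented for illustration. It also shows the class_ keyword argument, which Beautiful Soup accepts as an alternative to the attribute dictionary.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with several class combinations.
sample_html = (
    '<div class="full_name">Ada Lovelace</div>'
    '<div class="full_name highlighted">Alan Turing</div>'
    '<div class="nickname">Al</div>'
    '<span class="full_name">Grace Hopper</span>'
)
soup = BeautifulSoup(sample_html, "html.parser")

# Dictionary form, as used in the step above.
by_dict = soup.find_all("div", {"class": "full_name"})

# Keyword form; the trailing underscore avoids Python's reserved word "class".
by_keyword = soup.find_all("div", class_="full_name")

# The same filter works for any other tag name.
spans = soup.find_all("span", class_="full_name")

div_names = [d.text for d in by_dict]
keyword_names = [d.text for d in by_keyword]
span_names = [s.text for s in spans]
print(div_names, span_names)
```

A tag with several classes, such as the second div here, still matches when any one of its classes equals the value you search for.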
5. Extract text in div
for d in divs:
    print(d.text)
.text returns the combined text of a tag and all of its descendants, with the markup removed.
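To illustrate how text extraction behaves with nested tags, the sketch below reuses the html_content string from step 2. The " | " separator is an arbitrary choice for the example.

```python
from bs4 import BeautifulSoup

html_content = '<html><div><span>Cocyer</span> https://www.cocyer.com</div></html>'
soup = BeautifulSoup(html_content, "html.parser")

div = soup.find("div")

# .text joins the text of the div and everything nested inside it.
full_text = div.text

# The nested span only contributes its own text.
span_text = div.find("span").text

# stripped_strings yields each text fragment with surrounding whitespace removed.
pieces = list(div.stripped_strings)

# get_text() lets you pick a separator and strip each fragment.
joined = div.get_text(separator=" | ", strip=True)

print(full_text)
print(pieces)
print(joined)
```

get_text() with a separator is useful when adjacent tags would otherwise run their text together.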