Parse HTML Using Python Beautiful Soup

In this tutorial, we will show how to parse an HTML page using the Python Beautiful Soup package.

1. Import the library

# The package is installed as beautifulsoup4 but imported as bs4
from bs4 import BeautifulSoup

2. Create a BeautifulSoup object to parse the HTML

html_content = '<html><div><span>Cocyer</span> https://www.cocyer.com</div></html>'
# "html.parser" is the parser built into Python's standard library
soup = BeautifulSoup(html_content, "html.parser")
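With the soup object created, the parsed tree can already be navigated. The sketch below reuses the html_content string from above: accessing a tag name as an attribute (soup.span, soup.div) returns the first matching tag, and the values in the comments follow from that sample string.

```python
from bs4 import BeautifulSoup

html_content = '<html><div><span>Cocyer</span> https://www.cocyer.com</div></html>'
soup = BeautifulSoup(html_content, "html.parser")

# soup.<tagname> is a shortcut for the first tag with that name
span_text = soup.span.text
div_text = soup.div.text

print(span_text)  # Cocyer
print(div_text)   # Cocyer https://www.cocyer.com
```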

3. Get all div tags

divs = soup.find_all("div")

You can also get all paragraphs and links with the soup.find_all() function.

paragraphs = soup.find_all("p")
links = soup.find_all("a")
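The difference between find() and find_all() is easy to miss: find() returns only the first matching tag (or None when nothing matches), while find_all() returns a list-like ResultSet of every match. A small sketch; the two-div HTML string is a made-up example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page with two div tags
html = "<div>first</div><div>second</div>"
soup = BeautifulSoup(html, "html.parser")

first_div = soup.find("div")      # a single Tag, or None if there is no match
all_divs = soup.find_all("div")   # every matching Tag, in document order

print(first_div.text)               # first
print([d.text for d in all_divs])   # ['first', 'second']
```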
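Links are usually wanted for their href attribute rather than their text. The sketch below uses a hypothetical HTML snippet: it reads each link's href with Tag.get(), which returns None instead of raising an error when the attribute is missing, and passes a list of tag names to find_all() to collect paragraphs and links together in document order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link with an href, one without
html = """
<p>Intro <a href="https://www.cocyer.com">Cocyer</a></p>
<p>Second paragraph</p>
<a>no href here</a>
"""
soup = BeautifulSoup(html, "html.parser")

paragraphs = soup.find_all("p")
links = soup.find_all("a")

# .get() returns None for a missing attribute
hrefs = [a.get("href") for a in links]

# A list of names matches any of them; nested tags are included
both = [tag.name for tag in soup.find_all(["p", "a"])]

print(hrefs)  # ['https://www.cocyer.com', None]
print(both)   # ['p', 'a', 'p', 'a']
```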

4. Get all div tags with a specific class

divs = soup.find_all("div", {"class": "full_name"})

This code extracts every div whose class is "full_name".

You can filter other HTML tags by their class attribute in the same way.
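Because class is a reserved word in Python, Beautiful Soup also accepts the class_ keyword argument, and the same filter can be written as a CSS selector with soup.select() (which relies on the soupsieve package that recent beautifulsoup4 releases install as a dependency). A sketch assuming some hypothetical markup; note that a tag with several classes, such as class="full_name highlight", still matches "full_name".

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class names matter here
html = """
<div class="full_name">Ada Lovelace</div>
<div class="full_name highlight">Alan Turing</div>
<div class="title">Mathematician</div>
<span class="full_name">Grace Hopper</span>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ has a trailing underscore because "class" is a Python keyword
divs = soup.find_all("div", class_="full_name")

# The same filter written as a CSS selector
divs_css = soup.select("div.full_name")

# Omitting the tag name matches any tag with that class
any_tag = soup.find_all(class_="full_name")

print([d.text for d in divs])       # ['Ada Lovelace', 'Alan Turing']
print([t.name for t in any_tag])    # ['div', 'div', 'span']
```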

5. Extract the text from each div

# Print the text of every div found in the previous step
for d in divs:
    print(d.text)

.text returns all the text inside a tag as one string, including the text of any nested tags.
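When a tag mixes nested tags and loose text, get_text() gives more control than .text: it accepts a separator string and a strip flag that trims whitespace around each piece. A short sketch using the tutorial's own html_content:

```python
from bs4 import BeautifulSoup

html_content = '<html><div><span>Cocyer</span> https://www.cocyer.com</div></html>'
soup = BeautifulSoup(html_content, "html.parser")
div = soup.find("div")

# .text joins every string inside the tag, including the nested <span>
text_plain = div.text

# get_text(separator, strip=True) strips each piece, then joins with the separator
text_sep = div.get_text(" | ", strip=True)

print(text_plain)  # Cocyer https://www.cocyer.com
print(text_sep)    # Cocyer | https://www.cocyer.com
```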