Scrape and Extract Text from HTML Using Python Beautiful Soup

In this tutorial, we will show how to scrape an HTML page and extract useful text from it using Python and Beautiful Soup.

1. Import libraries

import requests
from bs4 import BeautifulSoup

2. Scrape a URL using Python

We can use the Python requests package to download an HTML page and get its text content.

# Create a variable with the url
url = 'https://www.cocyer.com'

# Use requests to get the contents
r = requests.get(url)

# Get the text of the contents
html_content = r.text
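The request above has no timeout and does not check the HTTP status, so a slow server or an error page can go unnoticed. Here is a minimal sketch of a more defensive version. The helper name fetch_html and the 10-second default are assumptions made for illustration, not part of the original steps.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML text of `url` (hypothetical helper for illustration).

    Raises a requests exception on network failures, on a timeout,
    or when the server answers with a 4xx/5xx status code.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn HTTP error statuses into exceptions
    return response.text
```

You could then write html_content = fetch_html(url) and wrap the call in try/except requests.RequestException if you want to handle failures yourself.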

3. Use Beautiful Soup to parse the HTML content

soup = BeautifulSoup(html_content, "html.parser")
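To try the parsing step without a network connection, you can feed Beautiful Soup a string directly. The snippet below is a small sketch that uses made-up sample markup in place of the downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the downloaded page text
html_content = """
<html>
  <head><title>Sample page</title></head>
  <body><p>Hello, world.</p></body>
</html>
"""

# "html.parser" is the parser bundled with Python, so nothing extra is needed
soup = BeautifulSoup(html_content, "html.parser")

# The parsed tree can be navigated by tag name
print(soup.title.name)  # name of the first <title> tag
print(soup.p.text)      # text of the first <p> tag
```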

4. Extract some information from soup

HTML title

# View the title tag of the soup object
soup.title

You will get:

<title>Cocyer.com</title>

If you only want the text inside <title>, you can use this code.

soup.title.string

You will get: Cocyer.com
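Not every page has a title tag. In that case soup.title is None, and soup.title.string raises an AttributeError. The sketch below shows one way to guard against this. The helper name get_title and both sample documents are assumptions for illustration.

```python
from bs4 import BeautifulSoup


def get_title(soup):
    """Return the stripped page title, or None if the page has no <title>."""
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)


# Hypothetical sample documents: one with a title, one without
with_title = BeautifulSoup(
    "<html><head><title> Cocyer.com </title></head></html>", "html.parser"
)
without_title = BeautifulSoup(
    "<html><body><p>No title here</p></body></html>", "html.parser"
)

print(get_title(with_title))
print(get_title(without_title))
```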

Get all paragraphs and their text

px = soup.find_all('p')
for p in px:
    print(p.text)
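Paragraphs scraped from real pages often contain extra whitespace or are empty. As a sketch of one way to clean them up, get_text() accepts a separator and a strip flag. The sample markup here is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with nested tags and an empty paragraph
html_content = (
    "<p>First <b>bold</b> paragraph.</p>"
    "<p>   </p>"
    "<p>Second paragraph.</p>"
)
soup = BeautifulSoup(html_content, "html.parser")

# Join the text pieces of each paragraph with a space and strip whitespace
paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]

# Drop paragraphs that contain no text at all
paragraphs = [text for text in paragraphs if text]

for text in paragraphs:
    print(text)
```

The " " separator matters here: with strip=True and no separator, the pieces around the b tag would be joined without spaces.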

You can also get all h1, h2, or other tags with soup.find_all().
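For example, find_all() accepts a list of tag names, so several heading levels can be collected in one call. This is a small sketch on made-up sample markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup
html_content = "<h1>Main</h1><p>Intro</p><h2>Sub A</h2><h2>Sub B</h2>"
soup = BeautifulSoup(html_content, "html.parser")

# Passing a list matches any of the given tag names, in document order
headings = [
    (tag.name, tag.get_text(strip=True))
    for tag in soup.find_all(["h1", "h2"])
]

for name, text in headings:
    print(name, text)
```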

Extract all links from this HTML page

You can refer to this tutorial:

Extract All Links From Web Page Using Python Beautiful Soup
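As a rough sketch of the idea, and not necessarily the approach that tutorial takes, links can be collected with find_all("a", href=True) and made absolute with urljoin from the standard library. The helper name extract_links and the sample markup are assumptions for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors without an href, such as <a name="top">
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


# Hypothetical sample markup
sample_html = (
    '<a href="/about">About</a>'
    '<a name="top">Top</a>'
    '<a href="https://example.com/x">External</a>'
)

links = extract_links(sample_html, "https://www.cocyer.com")
for link in links:
    print(link)
```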