Extract All Links From a Web Page Using Python Beautiful Soup

In this tutorial, we will show how to extract all URLs from a web page using Python BeautifulSoup.

1. Import the Python libraries

# BeautifulSoup 3 and urllib2 are Python 2 libraries
from BeautifulSoup import BeautifulSoup
import urllib2

2. Get the text content of a web page by its URL

We can use urllib2 to get the text content of a web page. urlopen() returns a file-like object that BeautifulSoup can read directly.

html_page = urllib2.urlopen("http://test.com")
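urllib2 exists only in Python 2. As a minimal sketch of the same step on Python 3, assuming the standard urllib.request module is used in its place, the page can be fetched and decoded as below. The helper name fetch_html and the 10-second timeout are illustrative choices, not part of the original tutorial.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the decoded body of the page at `url`.

    `fetch_html` is a hypothetical helper name used for illustration only.
    """
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the response headers; assume UTF-8 if none is given.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (performs a real network request):
# html_page = fetch_html("http://test.com")
```

Decoding to a string up front is optional, since BeautifulSoup also accepts bytes or a file-like object, but it makes the encoding choice explicit.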

3. Use Python BeautifulSoup to extract all links in the web page

soup = BeautifulSoup(html_page)
# Find every <a> tag and print its href attribute (None if the tag has no href)
for link in soup.findAll('a'):
    print link.get('href')

This prints the href value of every link (<a> tag) on the page.
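The steps above target Python 2 and BeautifulSoup 3. A rough Python 3 equivalent, assuming the bs4 package (Beautiful Soup 4) is installed, might look like the sketch below. The function name extract_links and the base_url parameter are hypothetical. Resolving relative links with urljoin and skipping anchors that have no href are additions that go beyond the original snippet.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup  # Beautiful Soup 4, assumed installed (pip install beautifulsoup4)


def extract_links(html, base_url):
    """Return the absolute URLs of all <a href=...> tags found in `html`.

    `extract_links` and `base_url` are illustrative names, not from the tutorial.
    """
    # "html.parser" is Python's built-in parser, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True keeps only the anchors that actually carry an href attribute.
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links such as "/about" against the page URL.
        links.append(urljoin(base_url, anchor["href"]))
    return links


# Example with an inline HTML string:
# extract_links('<a href="/about">About</a>', "http://test.com/")
```

Passing the HTML as an argument keeps the parsing step separate from the download, so the function can be tried on a small HTML string before pointing it at a real page.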

Cocyer