In this tutorial, we will show how to extract all URLs from a web page using Python and BeautifulSoup. The code targets Python 2, using urllib2 and BeautifulSoup 3.
1. Import the required Python libraries
from BeautifulSoup import BeautifulSoup
import urllib2
2. Get the HTML content of a web page by its URL
We can use urllib2.urlopen() to open the page. It returns a file-like response object, which BeautifulSoup can read directly in the next step.
html_page = urllib2.urlopen("http://test.com")
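urllib2 exists only in Python 2; in Python 3 its urlopen() function lives in urllib.request. A minimal sketch of this step for Python 3, assuming the page is HTML and that a timeout and a browser-like User-Agent header suit your use case. The fetch_html name and the header value are illustrative, not part of the original tutorial.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (hypothetical helper)."""
    # Some sites reject urllib's default User-Agent; this value is an assumption.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (link-extractor example)"})
    with urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going if the page contains a few invalid bytes.
    return raw.decode(charset, errors="replace")
```

Usage: `html_page = fetch_html("http://test.com")`. Returning a decoded string rather than the response object means the caller does not have to keep the connection open while parsing.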
3. Use BeautifulSoup to extract all links from the web page
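soup = BeautifulSoup(html_page)
# findAll('a') returns every <a> tag in the document
for link in soup.findAll('a'):
    # get() returns None if the tag has no href attribute
    print(link.get('href'))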
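The loop above prints raw href values, so anchors without an href show up as None and relative paths are left unresolved. A minimal sketch for Python 3 with BeautifulSoup 4 (imported from the bs4 package), assuming you want absolute, de-duplicated URLs. The extract_links name and its base_url parameter are hypothetical; urljoin from the standard library resolves each href against the page's own URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs of all <a href> links in html, in document order."""
    # "html.parser" is Python's built-in parser, so no extra parser install is needed.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True keeps only <a> tags that actually have an href attribute.
    for tag in soup.find_all("a", href=True):
        # Resolve relative links like "/about" or "page.html" against base_url.
        links.append(urljoin(base_url, tag["href"].strip()))
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(links))
```

Usage: `for url in extract_links(html, "http://test.com"): print(url)`. This sketch ignores any `<base href>` tag in the page; if a page declares one, that URL, not the page URL, should be used as base_url.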
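This prints the href value of every <a> tag in the page. Tags without an href attribute print None, and relative links such as /about are printed exactly as they appear in the HTML, not as absolute URLs.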