The Man Who Records Source Code

How to crawl a webPage by python (BeautifulSoup)

Python
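from bs4 import BeautifulSoup
import urllib.request

url = 'http://~~'
# download the raw HTML of the page
source_code = urllib.request.urlopen(url).read()
soup = BeautifulSoup(source_code, "html.parser")
# find every 'a' tag and print the value of its href attribute
for link in soup.find_all('a'):
    print(link.get('href'))  # prints None if the tag has no href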

This code collects all the links on a page by finding every tag named 'a' and printing its href attribute.
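If you want to try the link collection without downloading anything, here is a minimal sketch that runs on an inline HTML string. SAMPLE_HTML and collect_links are names made up for this example and are not part of BeautifulSoup. It also skips 'a' tags that have no href attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <a href="/first">First</a>
  <a href="https://example.com/second">Second</a>
  <a name="anchor-without-href">No link</a>
</body></html>
"""

def collect_links(html):
    """Return the href value of every 'a' tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a_tag in soup.find_all('a'):
        href = a_tag.get('href')  # None when the attribute is missing
        if href is not None:
            links.append(href)
    return links

links = collect_links(SAMPLE_HTML)
print(links)
```

Collecting the links into a list instead of printing them right away makes it easier to filter or visit them later.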

 

Before you start basic crawling, you need to know a few methods, such as soup.find() and soup.find_all().

 

The find() method is usually called with one or two arguments: the tag name and, optionally, a filter on the tag's attributes.
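As a rough sketch of the one-argument and two-argument forms, assuming a tiny inline HTML string (the class names "menu" and "content" are made up for this example):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with two div tags that differ only by class.
html = '<div class="menu">Menu</div><div class="content">Body text</div>'
soup = BeautifulSoup(html, "html.parser")

# One argument: the tag name. Returns the first 'div' in the document.
first_div = soup.find('div')

# Two arguments: the tag name plus a dict of attributes to match.
content_div = soup.find('div', {'class': 'content'})

# When nothing matches, find() returns None instead of raising an error.
missing = soup.find('table')

print(first_div.getText())
print(content_div.getText())
print(missing)
```

Because find() can return None, it is safer to check the result before calling a method such as getText() on it.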

 

soup.find_all() returns a list of tag objects. For example, the find_all() call in the sample code at the top of this page returns a list of all the 'a' tags in the page at url. soup.find() returns only the first matching tag, or None when nothing matches.
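A small sketch of the difference between the two return values, again on a made-up inline HTML string: find_all() gives a list-like collection of tags, and find() gives a single tag.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with two links.
html = '<p><a href="/a">A</a> <a href="/b">B</a></p>'
soup = BeautifulSoup(html, "html.parser")

all_links = soup.find_all('a')      # list-like collection of every 'a' tag
first_link = soup.find('a')         # only the first 'a' tag
no_tables = soup.find_all('table')  # empty collection when nothing matches

# The collection supports len(), indexing, and iteration like a list.
hrefs = [a.get('href') for a in all_links]

print(len(all_links))
print(first_link.get('href'))
print(hrefs)
```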

 

You can get the text inside a tag with the .getText() method.
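Here is a short sketch of what .getText() gives back for a tag that contains another tag, assuming a made-up inline HTML string. The text of nested tags is included, and the surrounding whitespace is kept unless you strip it yourself.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: a div with extra spaces and a nested 'b' tag.
html = '<div>  Hello <b>world</b>!  </div>'
soup = BeautifulSoup(html, "html.parser")

div = soup.find('div')
raw_text = div.getText()            # text of the div and its nested tags
clean_text = div.getText().strip()  # same text without outer whitespace

print(repr(raw_text))
print(repr(clean_text))
```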

 

Because soup.find_all() returns a list-like object, you can iterate over its result in a loop statement. soup.find() returns a single tag instead, so its result is usually used directly rather than looped over.

 

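from bs4 import BeautifulSoup
import urllib.request

url = 'http://'
source_code = urllib.request.urlopen(url).read()
soup = BeautifulSoup(source_code, "html.parser")

# find_all() returns every 'div' tag; loop over them and print each one's text
soupObject = soup.find_all('div')
for div in soupObject:
    print(div.getText())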
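One possible way to organize the same steps is to split the download and the parsing into two functions, so the parsing part can be tried on an inline string. This is only a sketch: fetch_html, div_texts, the timeout value, and the sample HTML are assumptions made for this example, and fetch_html is defined but not called here.

```python
import urllib.request
from bs4 import BeautifulSoup

def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes (not called in this sketch)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def div_texts(html):
    """Return the stripped text of every 'div' tag in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [div.getText().strip() for div in soup.find_all('div')]

# Hypothetical HTML standing in for a downloaded page.
sample = '<div>one</div><div> two </div><p>not a div</p>'
texts = div_texts(sample)
print(texts)
```

Note that when div tags are nested, the text of the outer div already includes the text of the inner one, so the same text can show up more than once in the result.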