How to crawl a web page with Python (BeautifulSoup)
from bs4 import BeautifulSoup
import urllib.request

url = 'http://~~'
source_code = urllib.request.urlopen(url).read()
soup = BeautifulSoup(source_code, "html.parser")

# .get('href') returns the attribute value as a string, so print it directly
for link in soup.find_all('a'):
    print(link.get('href'))
This code collects all the links (every 'a' tag) on a page.
Before you start basic crawling, you need to know a few methods, such as soup.find() and soup.find_all().
The find() method takes at least one argument, the tag name, and optionally a second one, a dict of attributes to narrow the match.
find() returns only the first matching tag, while find_all() returns a list-like ResultSet of every match. For example, the find_all() call in the sample code at the top of this page returns a list of all the 'a' tags on the page.
You can get a tag's text with the .getText() method.
Because find_all() returns a list-like object, you can iterate over it directly in a loop statement.
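To make the difference between find() and find_all() concrete, here is a small sketch that parses an inline HTML string instead of a live URL, so it runs without any network access; the HTML content and tag names are made up for illustration:

```python
from bs4 import BeautifulSoup

# A tiny hypothetical HTML document, so the example needs no network access.
html = """
<div class="menu"><a href="/home">Home</a></div>
<div class="body"><a href="/about">About</a><a href="/contact">Contact</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag
first = soup.find('a')
print(first.get('href'))            # /home

# find_all() returns a list-like ResultSet of every match
all_links = soup.find_all('a')
print(len(all_links))               # 3

# The optional second argument filters by attributes
body_div = soup.find('div', {'class': 'body'})
print(len(body_div.find_all('a')))  # 2
```

The same find() / find_all() calls work identically on HTML fetched with urllib.request.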
url = 'http://'
source_code = urllib.request.urlopen(url).read()
soup = BeautifulSoup(source_code, "html.parser")

divs = soup.find_all('div')
for div in divs:
    print(div.getText())
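One thing to watch out for when collecting links this way: not every 'a' tag has an href attribute (some are used only as named anchors or by JavaScript), and .get('href') returns None for those. A minimal sketch, using a made-up HTML string, that filters them out:

```python
from bs4 import BeautifulSoup

# Hypothetical page: the second 'a' tag has no href attribute at all.
html = '<a href="/ok">ok</a><a name="anchor-only">no link</a>'
soup = BeautifulSoup(html, "html.parser")

# .get('href') returns None when the attribute is missing, so skip those tags
hrefs = [a.get('href') for a in soup.find_all('a') if a.get('href') is not None]
print(hrefs)  # ['/ok']
```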