Crawling Naver's real-time search rankings with Python


싯타마 2020. 9. 2. 17:01

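1. First, install the bs4 package for crawling by entering pip install bs4 in a terminal. The code below also imports requests, so run pip install requests as well if it is not already installed.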

2. Enter the crawling code

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36'}
url = 'https://datalab.naver.com/keyword/realtimeList.naver?where=main'
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.content, 'html.parser')
# Each search term sits in a <span class="item_title"> element
data = soup.select('span.item_title')
# Rank counter, starting from 1 ("위" means "rank")
i = 1
for item in data:
    print(str(i) + "위 : " + item.get_text())
    i += 1
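
The script above assumes that the request succeeds and that the selector still matches something. As a minimal sketch of a more defensive variant, the parsing can be split from the download so it can be checked offline. The helper names parse_titles and fetch_titles, the 10-second timeout, and the sample markup are my own assumptions, not part of the original code.

```python
import requests
from bs4 import BeautifulSoup


def parse_titles(html):
    """Return the text of every <span class="item_title"> in the given HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    return [span.get_text(strip=True) for span in soup.select('span.item_title')]


def fetch_titles(url, headers, timeout=10):
    """Download the page and parse it; raises requests.HTTPError on 4xx/5xx."""
    res = requests.get(url, headers=headers, timeout=timeout)
    res.raise_for_status()
    return parse_titles(res.content)


# Offline check with hypothetical markup (no network access needed)
sample_titles = parse_titles('<ul><li><span class="item_title"> alpha </span></li></ul>')
print(sample_titles)

# An empty list means the selector matched nothing, e.g. because the page layout changed
no_titles = parse_titles('<html><body><p>no rankings here</p></body></html>')
if not no_titles:
    print('No span.item_title elements found')
```

Calling fetch_titles(url, headers) with the same url and headers as above would then return the list of search terms, or raise an exception instead of silently printing nothing.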

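After some googling, I learned that the real-time search list on Naver's main page has switched to being loaded via ajax, so fetching the page and parsing it with BeautifulSoup no longer returns the rankings. There are ways around that, but to keep using BeautifulSoup I crawl Naver DataLab instead.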
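To illustrate why ajax-loaded content is invisible to BeautifulSoup, here is a small sketch. The markup, the element id keyword_list, and the API path are all hypothetical and only stand in for a page whose list is filled in by JavaScript after loading.

```python
from bs4 import BeautifulSoup

# Hypothetical page source: the list is empty until a browser runs the script
static_html = """
<div id="realtime">
  <ul id="keyword_list"></ul>
</div>
<script>
  // A browser would run this and fill #keyword_list; an HTML parser does not.
  fetch('/hypothetical/api/keywords').then(r => r.json()).then(render);
</script>
"""

soup = BeautifulSoup(static_html, 'html.parser')
container = soup.select_one('#keyword_list')
items = soup.select('#keyword_list li')

print(container is not None)  # the empty container is present in the source
print(len(items))             # but it has no list items to extract
```

BeautifulSoup only parses the HTML text it is given and never executes scripts, so anything a page inserts later through ajax is simply not there to select.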

Along the way, I pass headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36'} so that the request carries a browser User-Agent string and is identified as coming from a regular user.
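To see what the headers argument actually changes, the sketch below prepares the same request twice with requests, once without and once with the custom header, and never sends either one. The variable names are my own; the URL and User-Agent string are the ones used in this post.

```python
import requests

url = 'https://datalab.naver.com/keyword/realtimeList.naver?where=main'
browser_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36'
}

session = requests.Session()

# prepare_request() builds the final request (including default headers)
# without sending anything over the network
default_req = session.prepare_request(requests.Request('GET', url))
custom_req = session.prepare_request(requests.Request('GET', url, headers=browser_headers))

print(default_req.headers['User-Agent'])  # the library's own python-requests/... string
print(custom_req.headers['User-Agent'])   # the browser string passed in above
```

Without the custom header, requests announces itself as python-requests, which some sites may treat differently from a browser.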

Next, open the DataLab page, press F12, and inspect the search-term elements: the developer tools show that each term is marked up as a span tag with the class item_title.

So, following the select function described in the Beautiful Soup documentation (linked below), I use data = soup.select('span.item_title') to collect those elements.

https://www.crummy.com/software/BeautifulSoup/bs4/doc/
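As a quick offline illustration of that selector, the sketch below runs select on a small hand-written snippet. The markup is hypothetical (including the item_num class and the ranking_list wrapper) and only imitates the span.item_title structure seen in the developer tools.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described above
html = """
<ul class="ranking_list">
  <li><span class="item_num">1</span><span class="item_title">first keyword</span></li>
  <li><span class="item_num">2</span><span class="item_title">second keyword</span></li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')

# CSS selector: <span> elements whose class list contains "item_title"
by_select = [tag.get_text() for tag in soup.select('span.item_title')]

# The same query written with find_all
by_find_all = [tag.get_text() for tag in soup.find_all('span', class_='item_title')]

print(by_select)
print(by_select == by_find_all)
```

The selector skips the span.item_num elements and returns only the titles, in document order.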


Finally, to show the rank, I initialize i = 1, wrap it in str() to turn it into a string, and print it in front of each search term.
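The manual counter works, but the same numbering can also be written with the built-in enumerate, which is one possible alternative rather than what the original code does. The keyword list here is a made-up stand-in for the crawled titles.

```python
# Hypothetical stand-ins for the crawled search terms
keywords = ['alpha', 'beta', 'gamma']

# enumerate(..., start=1) yields (1, 'alpha'), (2, 'beta'), ...
lines = [str(rank) + "위 : " + word for rank, word in enumerate(keywords, start=1)]

for line in lines:
    print(line)
```

This removes the need to initialize and increment i by hand.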

3. Result
