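Python source code for scraping novels from 138看書網. This scraper is an original script shared by a forum user. It downloads novels from the 138 novel site, supports category search and table-of-contents crawling, and comes with a built-in bookmark system to make reading easier. The source code is available for download here; if you are looking for novels to read, give it a try.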
Author's notes on the 138看書網 scraper
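This is a record of the second novel site I scraped today. The second time went much more smoothly; the first time I got halfway and then had to find a tutorial and follow along.
Scraping novels is probably about the simplest scraping task there is, though, and the techniques involved are not very deep.
I'm posting the code so that beginners who want to get started but don't know where to begin can take a look. Tutorials always leave you in a fog; you still have to get hands-on and practice to make things stick.
This scraper is still pretty rough: there is no multithreading, some of the functions are used like *, and I have no idea how to improve it.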
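One possible way to add the missing multithreading is a thread pool from the standard library. The sketch below is not the author's code: `download_chapters`, `fetch`, and `max_workers` are hypothetical names, and `fetch` stands for any function that takes a chapter URL and returns its text (for example a thin wrapper around `requests.get`). `ThreadPoolExecutor.map` yields results in input order, so chapters can still be written to the file in sequence.

```python
from concurrent.futures import ThreadPoolExecutor


def download_chapters(urls, fetch, max_workers=8):
    """Fetch every URL in `urls` concurrently and return the results in input order.

    `fetch` is a hypothetical callable (url -> text); it is an assumption of this
    sketch, not part of the original script.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs fetch() in worker threads but yields results in the order of `urls`
        return list(pool.map(fetch, urls))


if __name__ == '__main__':
    # Stand-in fetch function so the example runs without network access
    def fake_fetch(url):
        return 'text of ' + url

    pages = download_chapters(['a/1.html', 'a/2.html', 'a/3.html'], fake_fetch)
    print(pages)
```

Because the results come back in order, the file-writing step can stay a plain loop over the returned list. A small `max_workers` value is probably the sensible choice here, since the work is limited by the site's response time rather than by the CPU.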
Source code preview for the 138看書網 scraper
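import re

import requests

# Send a browser-like User-Agent with every request
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
}
print('This script only works with 138看書網: https://www.13800100.com/')
# url_list = 'https://www.13800100.com/list/72262/'
url_list = input("Paste the URL of the novel's chapter list. It must be the chapter-list page; the novel's home page is not supported.\n")
downurl = 'https://www.13800100.com/article/'
# Fetch the chapter-list page with the same headers used for the chapter requests
url_list = requests.get(url_list, headers=headers)
text_list = url_list.text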
# Extract the novel's title and strip line breaks, the <h2> tag, and spaces
text_title = re.findall(r'<div class="cate-tit">(.*?)</h2>', text_list, re.S)[0]
text_title = text_title.replace('\r\n', '')
text_title = text_title.replace('<h2>', '')
text_title = text_title.replace(' ', '')
# Extract the chapter list as (relative URL, chapter name) pairs, matched across the whole page
text_list_info = re.findall(r'<a href="/article/(.*?)" class="name">(.*?)</a>', text_list)
for i in text_list_info:
    # Relative URL and name of each chapter
    chapter_path = i[0]
    name = i[1]
    download = downurl + chapter_path
    download_info = requests.get(url=download, headers=headers)
    html = download_info.text
    # Take the first <div class="..."> block on the page as the chapter body
    html_info = re.findall(r'<div class=".*?">(.*?)</div>', html, re.S)[0]
    # Strip spaces and turn <br/> tags into newlines
    html_info = html_info.replace(' ', '')
    html_info = html_info.replace('<br/>', '\n')
    print(name)
    # Append the chapter to a text file named after the novel
    with open('%s.txt' % text_title, 'a+', encoding='utf-8') as f:
        f.write(' ' + name + '\n')
        f.write('\n')
        f.write(html_info + '\n')
        f.write('\n')
print('Download complete')
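The parsing steps in the script can be tried without touching the network by moving them into small functions and running them on a sample string. The sketch below reuses the script's regular expressions; the function names `parse_title` and `parse_chapters` and the `SAMPLE` markup are made up for illustration and only mimic what the patterns expect, not the site's real pages.

```python
import re


def parse_title(page):
    """Return the text between <div class="cate-tit"> and </h2>, cleaned like the script does."""
    raw = re.findall(r'<div class="cate-tit">(.*?)</h2>', page, re.S)[0]
    # Same clean-up as the script: drop line breaks, the <h2> tag, and all spaces
    for junk in ('\r\n', '<h2>', ' '):
        raw = raw.replace(junk, '')
    return raw


def parse_chapters(page):
    """Return a list of (relative_url, chapter_name) tuples."""
    return re.findall(r'<a href="/article/(.*?)" class="name">(.*?)</a>', page)


# Hypothetical sample markup, shaped only to match the patterns above
SAMPLE = (
    '<div class="cate-tit">\r\n<h2>Demo Novel</h2></div>'
    '<div class="bd">'
    '<a href="/article/1/1.html" class="name">Chapter 1</a>'
    '<a href="/article/1/2.html" class="name">Chapter 2</a>'
    '</div>'
)

if __name__ == '__main__':
    print(parse_title(SAMPLE))
    print(parse_chapters(SAMPLE))
```

On this sample the title comes out as `DemoNovel`, because the clean-up removes every space, and the chapter list contains the two `(relative_url, chapter_name)` pairs in page order. Keeping the parsing in functions like these also makes it easier to adjust a pattern if the page markup turns out to differ.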