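The Qishu (奇書) novel catalog spreadsheet is a crawled index of novels, created and shared by a forum member. It lists every novel currently available on Qishu, grouped by category, and gives each one a link to its online reading page; clicking the link opens the reader directly. This post offers the spreadsheet for download together with the crawler source code, for anyone who wants to study it or use it as a reference.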
About the Qishu novel catalog spreadsheet
Lists tens of thousands of collected novels
Shows the title, genre, link, and author of each novel
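As a rough illustration of the four fields listed above, one spreadsheet row could be modelled as a small record and grouped by genre. This is only a sketch: the `Novel` type, the `group_by_genre` helper, the field order, and the sample rows are all made up for the example and do not come from the original tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Novel(NamedTuple):
    """One spreadsheet row; the field order here is an assumption."""
    title: str
    genre: str
    link: str
    author: str


def group_by_genre(novels):
    """Map each genre to the list of titles filed under it."""
    grouped = defaultdict(list)
    for novel in novels:
        grouped[novel.genre].append(novel.title)
    return dict(grouped)


# Made-up sample rows, for illustration only.
sample = [
    Novel("Example Title", "Fantasy", "http://example.com/1.html", "Author A"),
    Novel("Another Title", "History", "http://example.com/2.html", "Author B"),
    Novel("Third Title", "Fantasy", "http://example.com/3.html", "Author C"),
]

for genre, titles in group_by_genre(sample).items():
    print(genre, titles)
```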
Qishu novel catalog crawler source code
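# coding=utf-8
import re

import openpyxl
import requests

# The workbook must already exist and contain a sheet named "Sheet1".
book = openpyxl.load_workbook("d:\\qishu.xlsx")
sheet1 = book["Sheet1"]
hd = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

# Step 1: download list pages 1..1233 and append the raw HTML to one file.
p = 1
while p < 1234:
    url = "http://m.iqishu.la/full/" + str(p) + ".html"
    try:
        dat = requests.get(url, headers=hd, timeout=60)
    except requests.RequestException:
        # Retry once; if the retry also fails, the exception stops the script.
        dat = requests.get(url, headers=hd, timeout=60)
    if dat.status_code == 200:
        # The with-block closes the file, so no explicit close() is needed.
        with open("d:\\qishu.txt", "a+", encoding="utf-8") as f:
            f.write(dat.text)
    print(p)  # progress: the page just processed
    p = p + 1

# Step 2: read the same file back (the original read "qishu.htm", which step 1 never writes).
with open("d:\\qishu.txt", encoding="utf-8") as f:
    s = f.read()
pat = r'<div class="full_content"><p class="p1">(.*?)</p><p class="p2"> <a href="(.*?)" class="blue">(.*?)</a></p><p class="p3"><a>(.*?)</a></p></div>'
r = re.findall(pattern=pat, string=s)

# Step 3: write each match to the next free row, one captured group per column.
for x in r:
    print(x)
    row = sheet1.max_row + 1
    for b in range(len(x)):
        sheet1.cell(row, b + 1).value = x[b]
book.save("d:\\qishu.xlsx")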
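
Before crawling more than a thousand pages, it may help to check the extraction pattern offline. The sketch below runs the same regular expression against a small hand-written fragment. The fragment, the `extract_rows` helper, and the guess that the first group holds the genre are assumptions for illustration, not content taken from the real site.

```python
import re

# Same pattern as in the script, split across lines for readability.
PAT = (
    r'<div class="full_content"><p class="p1">(.*?)</p>'
    r'<p class="p2"> <a href="(.*?)" class="blue">(.*?)</a></p>'
    r'<p class="p3"><a>(.*?)</a></p></div>'
)

# Made-up fragment shaped to match the pattern; not copied from the site.
SAMPLE = (
    '<div class="full_content"><p class="p1">Fantasy</p>'
    '<p class="p2"> <a href="/book/1.html" class="blue">Example Title</a></p>'
    '<p class="p3"><a>Example Author</a></p></div>'
    '<div class="full_content"><p class="p1">History</p>'
    '<p class="p2"> <a href="/book/2.html" class="blue">Another Title</a></p>'
    '<p class="p3"><a>Another Author</a></p></div>'
)


def extract_rows(html):
    """Return one 4-tuple per matching block, in pattern-group order."""
    return re.findall(PAT, html)


rows = extract_rows(SAMPLE)
for row in rows:
    print(row)
```

Each tuple follows the group order of the pattern (first paragraph, link, link text, last paragraph), which is also the column order the script writes to the sheet. Because `.` does not match a newline by default, a page that breaks these tags across lines would yield no matches unless the pattern is adapted or `re.DOTALL` is passed.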