# SpidersShared

**Repository Path**: goccoder/spiders-shared

## Basic Information

- **Project Name**: SpidersShared
- **Description**: Web crawler sharing (SpidersShared)
- **Primary Language**: Python
- **License**: MulanPSL-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-10-05
- **Last Updated**: 2025-09-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


   <!-- Copyright (c) 2020 
   SpidersShared is licensed under Mulan PSL v2.
   You can use this software according to the terms and conditions of the Mulan PSL v2. 
   You may obtain a copy of Mulan PSL v2 at:
            http://license.coscl.org.cn/MulanPSL2 
   THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.  
   See the Mulan PSL v2 for more details.   -->
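# Overview
## Core Software
* Scrapy, a web crawling framework
  * Has a large number of extensions: distributed crawling, JavaScript support, and more
  * Useful documentation is everywhere; when you hit a problem, a quick Baidu search turns up plenty of material
* python3.x
  * Python 2 is obsolete, so it is better not to keep using it
* pycharm
  * The go-to IDE for Python
  * Checks in real time whether you have written something wrong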
  
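* firefox
  * A personal favorite ^^ (yes, another plug)
  * Full CSS and XPath support; Chromium and Chrome do not offer the same direct CSS path support
  * Both CSS selectors and XPath can be used by a crawler to pick out elements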

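* splash
  * For sites whose content is rendered by JavaScript, that JavaScript has to be executed before the page can be scraped
  * This is the rendering solution that works together with Scrapy
  * Using it requires Docker; otherwise, by the time you finished configuring it by hand, the crawl would long since be done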

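The Firefox notes above say that both CSS selectors and XPath can be used to pick elements out of a page. Here is a minimal sketch of the XPath side, using only the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. The page fragment and its class names are made up for illustration; the comments show how the same query would typically be written in a Scrapy callback.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <ul id="posts">
      <li class="post"><a href="/a">First post</a></li>
      <li class="post"><a href="/b">Second post</a></li>
      <li class="ad"><a href="/x">Sponsored</a></li>
    </ul>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# XPath: every <a> that is a direct child of an <li class="post">.
# The equivalent CSS selector would be: li.post > a
# In a Scrapy callback the same query is usually written as
#   response.xpath("//li[@class='post']/a/text()").getall()
#   response.css("li.post > a::text").getall()
links = root.findall(".//li[@class='post']/a")

titles = [a.text for a in links]
hrefs = [a.get("href") for a in links]

print(titles)  # ['First post', 'Second post']
print(hrefs)   # ['/a', '/b']
```

Real pages are rarely well-formed XML, so an actual crawler would rely on Scrapy's own selectors rather than `ElementTree`; the standard library is used here only to keep the sketch dependency-free.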

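As a rough illustration of the Splash item above, the sketch below asks a Splash instance to render a JavaScript-driven page through its HTTP `render.html` endpoint. It assumes Splash has been started locally with Docker (for example `docker run -p 8050:8050 scrapinghub/splash`) and listens on port 8050. The helper names `splash_render_url` and `fetch_rendered_html` are hypothetical, not part of this repository.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Splash container is listening on port 8050.
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(target_url, wait=0.5, base=SPLASH_BASE):
    """Build the render.html URL that asks Splash to load target_url.

    `wait` is the number of seconds Splash waits after the page loads,
    giving the page's JavaScript time to run.
    """
    query = urlencode({"url": target_url, "wait": wait})
    return f"{base}/render.html?{query}"


def fetch_rendered_html(target_url, wait=0.5, timeout=30):
    """Return the HTML of target_url after Splash has executed its JavaScript."""
    with urlopen(splash_render_url(target_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only works while the Splash container is running.
    print(fetch_rendered_html("https://example.com")[:200])
```

Inside a Scrapy project the usual route is the `scrapy-splash` plugin rather than raw HTTP calls; the plain version is shown because it makes visible what Splash does for the crawler.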
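## Further Discussion
* requests-html
  * For simple crawlers this handy tool is also an option; it supports JavaScript rendering directly
* Headless browsers
  * [phantomjs](https://phantomjs.org/)
    * Built-in WebKit; simple to develop with
    * Not recommended (you are better off using selenium directly)
  * selenium
    * [chromedriver](http://npm.taobao.org/mirrors/chromedriver/)
    * [geckodriver, the Firefox driver](https://github.com/mozilla/geckodriver/releases)
    * Can get around many kinds of anti-crawler mechanisms
    * [Tutorial](https://blog.csdn.net/weixin_36279318/article/details/79475388)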
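
The selenium option above could look roughly like the following sketch, which drives a headless Firefox through geckodriver and returns the page source after JavaScript has run. It assumes the `selenium` package, Firefox, and geckodriver are installed; the function name `fetch_with_headless_firefox` is hypothetical.

```python
def fetch_with_headless_firefox(url):
    """Load url in a headless Firefox and return the rendered HTML."""
    # Imported lazily so this module still loads when selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run Firefox without a visible window

    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()


if __name__ == "__main__":
    # Needs Firefox and geckodriver available on this machine.
    print(fetch_with_headless_firefox("https://example.com")[:200])
```

Starting a full browser is much slower than a plain HTTP request, so this approach is usually kept for pages that cannot be scraped any other way.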