Scrapy

Scrapy is an open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.

Hands-on installation

Upgrade pip

$   sudo pip install --upgrade pip

Log

$ sudo pip install --upgrade pip
Password:
The directory '/Users/houbinbin/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/houbinbin/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting pip
  Downloading https://files.pythonhosted.org/packages/0f/74/ecd13431bcc456ed390b44c8a6e917c1820365cbebcb6a8974d1cd045ab4/pip-10.0.1-py2.py3-none-any.whl (1.3MB)
    100% |████████████████████████████████| 1.3MB 505kB/s 
Installing collected packages: pip
  Found existing installation: pip 9.0.1
    Uninstalling pip-9.0.1:
      Successfully uninstalled pip-9.0.1
Successfully installed pip-10.0.1
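
The two warnings at the top of the log appear because pip, started through sudo, finds that the cache directory belongs to a different user than the one it is running as, so it switches caching off. A rough Python sketch of that kind of ownership comparison follows. The helper name owned_by_current_user is made up for illustration and is not part of pip, the logic is a simplification of whatever pip does internally, and it assumes a Unix-like system where os.getuid is available.

```python
import os
from pathlib import Path


def owned_by_current_user(path):
    """Return True if `path` exists and its owner is the user running this script."""
    target = Path(path).expanduser()
    if not target.exists():
        return False
    # st_uid is the numeric owner of the path; os.getuid() is the caller's user ID.
    return target.stat().st_uid == os.getuid()


if __name__ == "__main__":
    # Cache path taken from the log above, with the home directory written as "~".
    print(owned_by_current_user("~/Library/Caches/pip"))
```

As the warning itself suggests, running sudo -H pip install ... makes sudo set HOME to the target user's home directory, so the cache directory pip picks matches the user it runs as.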

Install Scrapy

$   sudo pip install scrapy

Log

$ sudo pip install scrapy
The directory '/Users/houbinbin/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/houbinbin/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting scrapy
  Downloading https://files.pythonhosted.org/packages/db/9c/cb15b2dc6003a805afd21b9b396e0e965800765b51da72fe17cf340b9be2/Scrapy-1.5.0-py2.py3-none-any.whl (251kB)
    100% |████████████████████████████████| 256kB 309kB/s 
Collecting w3lib>=1.17.0 (from scrapy)
  Downloading https://files.pythonhosted.org/packages/37/94/40c93ad0cadac0f8cb729e1668823c71532fd4a7361b141aec535acb68e3/w3lib-1.19.0-py2.py3-none-any.whl
Collecting six>=1.5.2 (from scrapy)
  Downloading https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Collecting cssselect>=0.9 (from scrapy)
  Downloading https://files.pythonhosted.org/packages/7b/44/25b7283e50585f0b4156960691d951b05d061abf4a714078393e51929b30/cssselect-1.0.3-py2.py3-none-any.whl
Collecting parsel>=1.1 (from scrapy)
  Downloading https://files.pythonhosted.org/packages/bc/b4/2fd37d6f6a7e35cbc4c2613a789221ef1109708d5d4fb9fd5f6f721a43c9/parsel-1.4.0-py2.py3-none-any.whl
Collecting service-identity (from scrapy)
  Downloading https://files.pythonhosted.org/packages/29/fa/995e364220979e577e7ca232440961db0bf996b6edaf586a7d1bd14d81f1/service_identity-17.0.0-py2.py3-none-any.whl
Requirement already satisfied: pyOpenSSL in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from scrapy) (0.13.1)
Collecting queuelib (from scrapy)
  Downloading https://files.pythonhosted.org/packages/4c/85/ae64e9145f39dd6d14f8af3fa809a270ef3729f3b90b3c0cf5aa242ab0d4/queuelib-1.5.0-py2.py3-none-any.whl
Collecting PyDispatcher>=2.0.5 (from scrapy)
  Downloading https://files.pythonhosted.org/packages/cd/37/39aca520918ce1935bea9c356bcbb7ed7e52ad4e31bff9b943dfc8e7115b/PyDispatcher-2.0.5.tar.gz
Collecting Twisted>=13.1.0 (from scrapy)
  Downloading https://files.pythonhosted.org/packages/12/2a/e9e4fb2e6b2f7a75577e0614926819a472934b0b85f205ba5d5d2add54d0/Twisted-18.4.0.tar.bz2 (3.0MB)
    100% |████████████████████████████████| 3.0MB 925kB/s 
Collecting lxml (from scrapy)
  Downloading https://files.pythonhosted.org/packages/18/95/abf8204fbbc9a01e0e156029cd1ee974237b5798b9e84477df6c4fabfbd2/lxml-4.2.1-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (8.8MB)
    100% |████████████████████████████████| 8.8MB 1.8MB/s 
Collecting attrs (from service-identity->scrapy)
  Downloading https://files.pythonhosted.org/packages/41/59/cedf87e91ed541be7957c501a92102f9cc6363c623a7666d69d51c78ac5b/attrs-18.1.0-py2.py3-none-any.whl
Requirement already satisfied: pyasn1 in /Library/Python/2.7/site-packages (from service-identity->scrapy) (0.3.3)
Collecting pyasn1-modules (from service-identity->scrapy)
  Downloading https://files.pythonhosted.org/packages/e9/51/bcd96bf6231d4b2cc5e023c511bee86637ba375c44a6f9d1b4b7ad1ce4b9/pyasn1_modules-0.2.1-py2.py3-none-any.whl (60kB)
    100% |████████████████████████████████| 61kB 42kB/s 
Collecting zope.interface>=4.4.2 (from Twisted>=13.1.0->scrapy)
  Downloading https://files.pythonhosted.org/packages/ac/8a/657532df378c2cd2a1fe6b12be3b4097521570769d4852ec02c24bd3594e/zope.interface-4.5.0.tar.gz (151kB)
    100% |████████████████████████████████| 153kB 83kB/s 
Collecting constantly>=15.1 (from Twisted>=13.1.0->scrapy)
  Downloading https://files.pythonhosted.org/packages/b9/65/48c1909d0c0aeae6c10213340ce682db01b48ea900a7d9fce7a7910ff318/constantly-15.1.0-py2.py3-none-any.whl
Collecting incremental>=16.10.1 (from Twisted>=13.1.0->scrapy)
  Downloading https://files.pythonhosted.org/packages/f5/1d/c98a587dc06e107115cf4a58b49de20b19222c83d75335a192052af4c4b7/incremental-17.5.0-py2.py3-none-any.whl
Collecting Automat>=0.3.0 (from Twisted>=13.1.0->scrapy)
  Downloading https://files.pythonhosted.org/packages/17/6a/1baf488c2015ecafda48c03ca984cf0c48c254622668eb1732dbe2eae118/Automat-0.6.0-py2.py3-none-any.whl
Collecting hyperlink>=17.1.1 (from Twisted>=13.1.0->scrapy)
  Downloading https://files.pythonhosted.org/packages/a7/b6/84d0c863ff81e8e7de87cff3bd8fd8f1054c227ce09af1b679a8b17a9274/hyperlink-18.0.0-py2.py3-none-any.whl
Requirement already satisfied: setuptools in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from zope.interface>=4.4.2->Twisted>=13.1.0->scrapy) (18.5)
Requirement already satisfied: idna>=2.5 in /Library/Python/2.7/site-packages (from hyperlink>=17.1.1->Twisted>=13.1.0->scrapy) (2.6)
matplotlib 1.3.1 requires nose, which is not installed.
matplotlib 1.3.1 requires tornado, which is not installed.
pyasn1-modules 0.2.1 has requirement pyasn1<0.5.0,>=0.4.1, but you'll have pyasn1 0.3.3 which is incompatible.
Installing collected packages: six, w3lib, cssselect, lxml, parsel, attrs, pyasn1-modules, service-identity, queuelib, PyDispatcher, zope.interface, constantly, incremental, Automat, hyperlink, Twisted, scrapy
  Found existing installation: six 1.4.1
Cannot uninstall 'six'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
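  • Workaround for the six error: install matplotlib and six with --ignore-installed, so pip overwrites the existing copies instead of trying to uninstall them
$   sudo python -mpip install -U matplotlib --ignore-installed six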
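
To see which versions of the packages named in the log are actually installed, the standard library can be queried directly. This is a minimal sketch that assumes Python 3.8 or newer, where importlib.metadata is available (the Python 2.7 interpreter shown in the log does not have it). The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names taken from the messages in the log above.
    for name in ("six", "pyasn1", "pyasn1-modules", "matplotlib"):
        print(name, installed_version(name))
```

The reported pyasn1 version can then be compared by eye against the range pyasn1-modules asks for in the log (>=0.4.1, <0.5.0).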

Update the default Python

  • Download and install Homebrew
$   ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
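  • Add Homebrew's directories to PATH (single quotes keep $PATH from being expanded before it is written to the file)
$   echo 'export PATH=/usr/local/bin:/usr/local/sbin:$PATH' >> ~/.bashrc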
  • Reload the shell configuration
$   source ~/.bashrc
  • Install Python again, this time through Homebrew
$   brew install python
  • To update later, if needed
$ brew update 
$ brew upgrade python
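
After putting /usr/local/bin ahead of the system directories on PATH and installing Python through Homebrew, it is worth confirming which interpreter actually runs. The sketch below is one way to do that. It assumes Python 3.3 or newer for shutil.which, and describe_python is a name chosen here for illustration.

```python
import shutil
import sys


def describe_python():
    """Collect the running interpreter and what `python`/`python3` resolve to on PATH."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(n) for n in sys.version_info[:3]),
        "python_on_path": shutil.which("python"),
        "python3_on_path": shutil.which("python3"),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(key, "=", value)
```

If the paths printed still point into /System/Library or /usr/bin, the PATH change has probably not taken effect in the current shell.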

Quick start

hello.py


Output