# 1.1. Introduction to Scrapy

Scrapy is an open-source framework for extracting the data you need from websites in a **fast**, **simple**, and **extensible** way.

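## Scrapy Features

* Fast and powerful: you only write the rules for extracting data, and Scrapy handles the rest
* Easily extensible: Scrapy is extensible by design, so new functionality can be plugged in without touching the core
* Portable: written in Python, so it runs across platforms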
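
To make the extensibility point concrete, here is a minimal sketch of an item pipeline, one of the extension points Scrapy provides. The class name `TitleCleanupPipeline` and the cleanup logic are hypothetical, chosen only for illustration. The `process_item(item, spider)` signature follows the long-standing Scrapy pipeline convention and may differ in newer releases.

```python
class TitleCleanupPipeline:
    """Hypothetical item pipeline that normalizes whitespace in scraped titles."""

    def process_item(self, item, spider):
        title = item.get("title")
        if isinstance(title, str):
            # Collapse runs of whitespace and trim both ends.
            item["title"] = " ".join(title.split())
        return item  # hand the item on to the next pipeline stage


# Called directly here for illustration; in a crawl, Scrapy makes this call.
pipeline = TitleCleanupPipeline()
cleaned = pipeline.process_item({"title": "  How to  Crawl\n the Web "}, spider=None)
print(cleaned)
```

A pipeline like this is a plain Python class, so it needs no changes to Scrapy itself. It would typically be enabled through the `ITEM_PIPELINES` setting, with a module path that depends on your own project layout.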

## Build and Run Your Web Crawler

```bash
$ pip install scrapy
$ cat > myspider.py <<'EOF'
import scrapy


class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        # Yield one item per post title found on the page.
        for title in response.css('.post-header>h2'):
            yield {'title': title.css('a ::text').get()}

        # Follow the pagination link and parse the next page the same way.
        for next_page in response.css('a.next-posts-link'):
            yield response.follow(next_page, self.parse)
EOF
$ scrapy runspider myspider.py
```
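
The `parse` callback above yields two kinds of results: scraped items (plain dicts) and follow-up requests. The following toy model shows how a crawl loop might consume such a generator. It is not Scrapy's actual engine: `FAKE_SITE`, `FollowRequest`, and `crawl` are hypothetical stand-ins that use only the standard library and no network access.

```python
from collections import deque

# Hypothetical in-memory "site": URL -> (post titles, next-page URL or None).
FAKE_SITE = {
    "/page/1": (["First post", "Second post"], "/page/2"),
    "/page/2": (["Third post"], None),
}


class FollowRequest:
    """Stand-in for the request object that response.follow() would return."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


def parse(url):
    """Mimics BlogSpider.parse: yield one item per title, then the next page."""
    titles, next_page = FAKE_SITE[url]
    for title in titles:
        yield {"title": title}
    if next_page is not None:
        yield FollowRequest(next_page, parse)


def crawl(start_url):
    """Toy engine loop: run callbacks, collect items, queue follow-up requests."""
    items = []
    seen = {start_url}
    queue = deque([FollowRequest(start_url, parse)])
    while queue:
        request = queue.popleft()
        for result in request.callback(request.url):
            if isinstance(result, FollowRequest):
                if result.url not in seen:  # skip URLs already scheduled
                    seen.add(result.url)
                    queue.append(result)
            else:
                items.append(result)
    return items


items = crawl("/page/1")
print(items)
```

The real framework adds asynchronous downloading, retries, and throttling around this idea, but the contract for the spider author is the same: yield dicts for data and requests for pages still to visit.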

## Deploy Your Web Crawler

Scrapy supports several deployment options; choose the one that fits your situation:

* Deploy the spider to [Scrapy Cloud](https://scrapinghub.com/scrapy-cloud/)

  ```bash
  $ pip install shub
  $ shub login
  Insert your Scrapinghub API Key: <API_KEY>

  # Deploy the spider to Scrapy Cloud
  $ shub deploy

  # Schedule the spider for execution
  $ shub schedule blogspider 
  Spider blogspider scheduled, watch it running here:
  https://app.scrapinghub.com/p/26731/job/1/8

  # Retrieve the scraped data
  $ shub items 26731/1/8
  {"title": "Improved Frontera: Web Crawling at Scale with Python 3 Support"}
  {"title": "How to Crawl the Web Politely with Scrapy"}
  ...
  ```
* Or use [Scrapyd](https://github.com/scrapy/scrapyd) to host the spider on your own server
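
The `shub items` listing in the Scrapy Cloud example prints one JSON object per line. As a rough sketch, assuming you have captured that output as text, it could be loaded back into Python like this. The helper name `parse_json_lines` is hypothetical, and the sample data is copied from the listing.

```python
import json

# Sample output copied from the `shub items` listing (one JSON object per line).
raw_output = """\
{"title": "Improved Frontera: Web Crawling at Scale with Python 3 Support"}
{"title": "How to Crawl the Web Politely with Scrapy"}
"""


def parse_json_lines(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


titles = [item["title"] for item in parse_json_lines(raw_output)]
print(titles)
```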

