# 4.2 Debugging

Running a spider from the command line

1. Change to the project root directory.

2. Run the spider with the following command:

```bash
scrapy crawl {spider_name}
```

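> Tip: the spider name is the value of the spider class's `name` attribute. A spider created with `scrapy genspider` under ArticleSpider/spiders/ gets the same name as its Python file by default.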
>
> For example, to run ArticleSpider/spiders/jobbole.py, use:
>
> ```bash
> scrapy crawl jobbole
> ```
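The two steps above can also be driven from a Python script. The sketch below is a minimal illustration, not part of the original project: `build_crawl_command` and `run_spider` are hypothetical helper names, and the project root path is a placeholder. Passing `cwd=` to `subprocess.run` takes the place of changing directories, since `scrapy crawl` only works inside a Scrapy project (Scrapy locates the project through its `scrapy.cfg` file).

```python
import subprocess
from typing import List


def build_crawl_command(spider_name: str) -> List[str]:
    """Return the argument list for `scrapy crawl <spider_name>` (hypothetical helper)."""
    return ["scrapy", "crawl", spider_name]


def run_spider(project_root: str, spider_name: str) -> None:
    """Run a spider with the project root as the working directory (hypothetical helper).

    cwd=project_root plays the role of step 1 (changing to the project root);
    check=True raises CalledProcessError if the crawl exits with a non-zero status.
    """
    subprocess.run(build_crawl_command(spider_name), cwd=project_root, check=True)
```

For example, `run_spider("/path/to/ArticleSpider", "jobbole")` would be the script equivalent of running `scrapy crawl jobbole` from the project root, where the path is a placeholder for the directory that contains `scrapy.cfg`.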

Running a spider in PyCharm

Create a main.py file in the ArticleSpider/ directory with the following content:

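```python
import os
import sys

from scrapy.cmdline import execute

# Add the directory containing main.py (ArticleSpider/) to the module search path
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

if __name__ == "__main__":
    # Equivalent to running "scrapy crawl jobbole" on the command line
    execute(["scrapy", "crawl", "jobbole"])
```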

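> Explanation:
>
> `os.path.dirname(os.path.abspath(__file__))` returns the absolute path of ArticleSpider/ (the directory that contains main.py), and `sys.path.append(...)` adds that path to Python's module search path.
>
> `execute(["scrapy", "crawl", "jobbole"])` runs the jobbole spider, just like `scrapy crawl jobbole` on the command line.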
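To make the path expression concrete, here is a small worked example. It assumes a hypothetical location for main.py and uses `posixpath` (the POSIX flavor of `os.path`) so the result is the same on every operating system; in the real main.py, `os.path` and `__file__` are used instead.

```python
import posixpath
import sys

# Hypothetical location of main.py; inside main.py you would pass __file__ instead.
main_file = "/home/user/ArticleSpider/main.py"

# abspath makes the path absolute (it already is here) and normalizes it;
# dirname then drops the last component (main.py), leaving its directory.
project_root = posixpath.dirname(posixpath.abspath(main_file))
print(project_root)  # /home/user/ArticleSpider

# Appending the directory to sys.path lets Python import modules from it.
sys.path.append(project_root)
```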

In ArticleSpider/settings.py, configure Scrapy not to obey robots.txt:

```python
ROBOTSTXT_OBEY = False
```

If `ROBOTSTXT_OBEY = True`, Scrapy fetches the robots.txt file of each site it crawls and filters out any URL that robots.txt disallows, so those pages are never requested.
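The filtering idea can be illustrated with Python's standard `urllib.robotparser`. This is only a sketch of the concept: Scrapy itself does this in its `RobotsTxtMiddleware` with its own configurable parser, not with this module. The robots.txt content, the example.com URLs, and the `is_allowed` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if the parsed robots.txt lets user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


for url in ["https://example.com/articles/1", "https://example.com/admin/login"]:
    # With ROBOTSTXT_OBEY = True, a disallowed URL would be dropped instead of crawled.
    print(url, "->", "crawl" if is_allowed(url) else "filtered")
```

Here the article URL is allowed while anything under `/admin/` is filtered, which mirrors what happens to disallowed URLs when `ROBOTSTXT_OBEY` is enabled.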
