4.6xpath基础语法

请参阅https://www.runoob.com/xpath/xpath-syntax.html

xpath用法获取内容 /h2/text() 标签固定样式 //span[@class='vote-post-up'] 标签包含某个样式 //span[contains(@class=,'vote-post-up')]

如何在scrapy中使用xpath？

scrapy抓取到http源码之后，将抓取结果保存到response中，通过respense.xpath("{xpath 表达式}")即可过滤出想要的内容。

respense.xpath()返回一个scrapy.selector.unified.SelectorList对象

通过SelectorList的extract()可以返回一个unicode字符串列表

命令行方式：scrapy shell {待抓取的url}，示例如下

scrapy shell http://blog.jobbole.com

response.xpath(".entry-header h1")
response.xpath(".entry-header h1::text").extract()
response.xpath(".entry-header h1::text").extract()[0]

pycharm调试方式

debug模式下使用

Previous4.2调试 Next4.7css选择器

Last updated 5 years ago

Was this helpful?