How to use custom filenames for Scrapy image downloads

Posted 2021-01-29 19:20:49

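For my Scrapy project I am currently using ImagesPipeline. Downloaded images are stored with the SHA1 hash of their URL as the filename.

How can I store the files under my own custom filenames?

What if my custom filename needs to include another scraped field from the same item? For example, building the image's filename from item['desc'] and item['image_url']. If I understand correctly, that would involve somehow accessing the other item fields from the image pipeline.

Any help would be appreciated.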

1 Answer

面试哥 2021-01-29

为面试而生，有面试问题，就找面试哥。

This is how I solved the problem in Scrapy 0.10. Look at the persist_image method of FSImagesStoreChangeableDirectory. The filename of the downloaded image is the key argument.

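import os

# Assumed import paths for Scrapy 0.10; adjust them to your installed version.
from scrapy.conf import settings
from scrapy.contrib.pipeline.images import FSImagesStore, ImagesPipeline
from scrapy.exceptions import NotConfigured


class FSImagesStoreChangeableDirectory(FSImagesStore):

    def persist_image(self, key, image, buf, info, append_path):
        # key is the image's filename; append_path is an extra argument
        # (a subdirectory) that the caller has to supply.
        absolute_path = self._get_filesystem_path(append_path + '/' + key)
        self._mkdir(os.path.dirname(absolute_path), info)
        image.save(absolute_path)


class ProjectPipeline(ImagesPipeline):

    def __init__(self):
        # super(ImagesPipeline, self) calls the __init__ of ImagesPipeline's
        # parent class, so the store is configured by hand below.
        super(ImagesPipeline, self).__init__()
        store_uri = settings.IMAGES_STORE
        if not store_uri:
            raise NotConfigured
        self.store = FSImagesStoreChangeableDirectory(store_uri)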
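
The code above targets Scrapy 0.10. As a rough sketch of the same idea for a recent release, assuming Scrapy 2.4 or later (where ImagesPipeline.file_path also receives the item), the key can be built from item['desc'] instead of the URL hash. The helper names default_image_key and custom_image_key and the class DescNamedImagesPipeline are made up for illustration, and the item is assumed to hold a single URL in item['image_url'] as in the question.

```python
import hashlib
import re


def default_image_key(image_url):
    """Mimic the default naming: SHA1 hex digest of the URL under full/."""
    digest = hashlib.sha1(image_url.encode("utf-8")).hexdigest()
    return "full/{}.jpg".format(digest)


def custom_image_key(desc, image_url):
    """Hypothetical helper: build a storage key from the item's description."""
    # Keep word characters and '-'; collapse every other run into one '-'.
    slug = re.sub(r"[^\w-]+", "-", desc).strip("-").lower()
    if not slug:
        # Fall back to the URL hash so an empty description still gets a name.
        slug = hashlib.sha1(image_url.encode("utf-8")).hexdigest()
    # ImagesPipeline re-encodes images as JPEG, so the key ends in .jpg.
    return "full/{}.jpg".format(slug)


try:
    from scrapy import Request
    from scrapy.pipelines.images import ImagesPipeline
except ImportError:
    # Scrapy (or Pillow) is not installed; the helpers above still work.
    ImagesPipeline = None

if ImagesPipeline is not None:

    class DescNamedImagesPipeline(ImagesPipeline):
        """Hypothetical pipeline that names each image after item['desc']."""

        def get_media_requests(self, item, info):
            # Assumes one image URL per item, stored in item['image_url'].
            yield Request(item["image_url"])

        def file_path(self, request, response=None, info=None, *, item=None):
            # The item keyword argument is assumed to be available (2.4+).
            return custom_image_key(item["desc"], request.url)
```

With this sketch, an item whose desc is "Red Shoes (size 42)" would be stored as full/red-shoes-size-42.jpg under IMAGES_STORE. Two items with the same desc would map to the same file, so a real project would probably add an ID or part of the URL hash to the name.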
