scholar.py source file: Python code snippet


python


Project: snowballing    Author: JoaoFelipe

# Method excerpt; assumes `from bs4 import BeautifulSoup` at module level.
def parse(self, html):
    """
    Parse the given HTML content, clean each resulting article as
    needed, and report each one to the parser instance via the
    handle_article callback.
    """
    self.soup = BeautifulSoup(html, "html.parser")

    # Parse any global, non-itemized attributes from the page.
    self._parse_globals()

    # Now parse out the listed articles. find_all is the current
    # Beautiful Soup 4 name for the legacy findAll alias.
    for div in self.soup.find_all(ScholarArticleParser._tag_results_checker):
        self._parse_article(div)
        self._clean_article()
        # Only report articles that ended up with a non-empty title.
        if self.article['title']:
            self.handle_article(self.article)
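
The method above reports each parsed article through a `handle_article` callback, which a subclass can presumably override to consume the results. A minimal sketch of that callback pattern follows. It is not the snowballing or scholar.py implementation: it swaps Beautiful Soup for the standard library's `html.parser.HTMLParser` so it runs without dependencies, and the class names `TitleParser` and `CollectingParser`, as well as the choice of `<h3>` as the title element, are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical stand-in: reports each <h3> title via handle_article."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_title = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_title:
            self._in_title = False
            article = {"title": "".join(self._chunks).strip()}
            # Mirror the check in parse(): skip articles with an empty title.
            if article["title"]:
                self.handle_article(article)

    def handle_article(self, article):
        """Callback hook; subclasses override this to consume results."""


class CollectingParser(TitleParser):
    """Overrides the callback to collect every reported article."""

    def __init__(self):
        super().__init__()
        self.articles = []

    def handle_article(self, article):
        self.articles.append(article)


parser = CollectingParser()
parser.feed(
    "<div><h3>First paper</h3></div>"
    "<div><h3> </h3></div>"
    "<div><h3>Second paper</h3></div>"
)
parser.close()
titles = [a["title"] for a in parser.articles]
print(titles)
```

Keeping the traversal in the base class and the result handling in an overridable hook means callers can change what happens to each article (collect, print, store) without touching the parsing logic.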