How to convert a webpage to PDF using Python

Posted on 2021-01-29 14:58:00

I was looking for a way to print a webpage to a local PDF file using Python. One good solution is to use Qt; see https://bharatikunal.wordpress.com/2010/01/

It did not work at first because I had a problem with my PyQt4 installation. It gave error messages such as "ImportError: No module named PyQt4.QtCore".

This was because PyQt4 was not installed properly. I used to keep my libraries in C:\Python27\Lib, but that does not work for PyQt4.

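In fact, you only need to download it from http://www.riverbankcomputing.com/software/pyqt/download (take care to pick the build that matches your Python version) and install it to C:\Python27 (in my case). That's it.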
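
A quick way to confirm that the interpreter you run the script with can actually see PyQt4 is to try importing it. This is only a minimal sketch; the helper name `module_available` is made up for illustration, and it reports nothing more than whether the import succeeds.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Check the modules the conversion script depends on.
for name in ("PyQt4.QtCore", "PyQt4.QtGui", "PyQt4.QtWebKit"):
    status = "ok" if module_available(name) else "missing"
    print("%s: %s" % (name, status))
```

If any of these is reported as missing, the installer was most likely run against a different Python installation than the one executing the script.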

The script now runs fine, so I want to share it. For more options when using QPrinter, see http://qt-project.org/doc/qt-4.8/qprinter.html#Orientation-enum

1 answer
  • 面试哥 2021-01-29

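    Thanks to the posts below, I was able to add the webpage URL and the current date and time to every page of the generated PDF, however many pages it has.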

    Add text to an existing PDF using Python

    https://github.com/disflux/django-mtr/blob/master/pdfgen/doc_overlay.py

    The script I want to share is as follows:

    import time
    from pyPdf import PdfFileWriter, PdfFileReader
    import StringIO
    from reportlab.pdfgen import canvas
    from reportlab.lib.pagesizes import letter
    import sys
    from PyQt4.QtCore import *
    from PyQt4.QtGui import * 
    from PyQt4.QtWebKit import *
    
    url = 'http://www.yahoo.com'
    tem_pdf = "c:\\tem_pdf.pdf"
    final_file = "c:\\younameit.pdf"
    
    app = QApplication(sys.argv)
    web = QWebView()
    #Read the URL given
    web.load(QUrl(url))
    printer = QPrinter()
    #setting format
    printer.setPageSize(QPrinter.A4)
    printer.setOrientation(QPrinter.Landscape)
    printer.setOutputFormat(QPrinter.PdfFormat)
    #export file as c:\tem_pdf.pdf
    printer.setOutputFileName(tem_pdf)
    
    def convertIt():
        web.print_(printer)
        QApplication.exit()
    
    QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
    
    # Run the Qt event loop until convertIt() calls QApplication.exit()
    app.exec_()
    
    # Below: overlay the page URL and the current date and time on the generated PDF
    
    packet = StringIO.StringIO()
    # create a new PDF with Reportlab
    can = canvas.Canvas(packet, pagesize=letter)
    can.setFont("Helvetica", 9)
    # Write the URL (left) and the timestamp (right) near the bottom edge
    oknow = time.strftime("%a, %d %b %Y %H:%M")
    can.drawString(5, 2, url)
    can.drawString(605, 2, oknow)
    can.save()
    
    #move to the beginning of the StringIO buffer
    packet.seek(0)
    new_pdf = PdfFileReader(packet)
    # read your existing PDF
    existing_pdf = PdfFileReader(file(tem_pdf, "rb"))
    pages = existing_pdf.getNumPages()
    output = PdfFileWriter()
    # add the "watermark" (which is the new pdf) on the existing page
    for x in range(0,pages):
        page = existing_pdf.getPage(x)
        page.mergePage(new_pdf.getPage(0))
        output.addPage(page)
    # finally, write "output" to a real file
    outputStream = file(final_file, "wb")
    output.write(outputStream)
    outputStream.close()
    
    print final_file, 'is ready.'
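
The script draws the URL at x=5 and the timestamp at x=605, both in PDF points, onto pages that were printed as A4 landscape. As a rough check on those hard-coded numbers, the arithmetic below converts the A4 dimensions to points and shows how much horizontal room is left to the right of the timestamp's starting position. It is only a sketch using the standard library; the helper `mm_to_points` is a name chosen here for illustration, and it does not measure the rendered width of the text.

```python
import time

MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0  # PDF user space uses 72 points per inch


def mm_to_points(mm):
    """Convert millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# A4 is 210 mm x 297 mm; landscape swaps width and height.
page_width = mm_to_points(297)   # about 841.9 points
page_height = mm_to_points(210)  # about 595.3 points

url_x, stamp_x = 5, 605  # x positions used in the script above
stamp = time.strftime("%a, %d %b %Y %H:%M")  # same format as the script

print("page: %.1f x %.1f pt" % (page_width, page_height))
print("room to the right of the timestamp start: %.1f pt" % (page_width - stamp_x))
print("timestamp text: %s" % stamp)
```

If you change the page size or orientation set on the QPrinter, these positions would need to be adjusted accordingly.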
    

