In this simple tutorial, we will learn how to extract text from a given PDF in Python. The PDF can have multiple pages; we will extract the text from all of them.

We will use the PyPDF2 module to extract text from PDF files. To install it, run the following pip command:

    pip install PyPDF2

Once the module is installed, we can write code that opens the PDF file, reads its text, and prints it on the console or writes it to a separate text file.

Using the PyPDF2 module

To extract text from a PDF file, we will use the PdfFileReader class. Its constructor takes a stream parameter, through which we provide the file stream for the PDF file.

Now let's see how we can use the PyPDF2 module to read PDF files:

    from PyPDF2 import PdfFileReader

    # open the PDF in binary read mode (the variable name pdfFile is our choice)
    pdfFile = open('mypdf.pdf', 'rb')
    # create PdfFileReader object to read the file
    pdfReader = PdfFileReader(pdfFile)

    print("Printing the document info: " + str(pdfReader.getDocumentInfo()))
    print("Number of Pages: " + str(pdfReader.getNumPages()))

In the code above, we first use the open() function to open the file for reading, and then use the resulting file object to initialize the PdfFileReader object.

Once we have the PdfFileReader object ready, we can use its methods, such as getDocumentInfo() to get the file information or getNumPages() to get the total number of pages in the PDF file.

Then we have the getPage() method, which returns a page from the PDF file by its page index (starting from 0), and finally the extractText() method, which extracts the text from that page.

To print the text of every page, we use a Python for loop over the page indexes, calling getPage() and extractText() for each one.

Once we are done, we call the close() method on the file object to release the file resource.

We can also read individual fields of the document information:

    print("PDF File name: " + str(pdfReader.getDocumentInfo().title))
    print("PDF File created by: " + str(pdfReader.getDocumentInfo().creator))

In the code above, we print the title and the creator of the PDF file mypdf.pdf (change it to your PDF file name and provide the full path to the file). Both are attributes of the object returned by the getDocumentInfo() method.

The PyPDF2 module can be used to perform many operations on PDF files, such as:

- Reading the text of a PDF file, which we just did above
- Rotating a PDF file page by any defined angle
- Finding the metadata of any PDF file, such as creator, author, and date of creation
- Appending two or more PDF files, one after another
- Merging two or more PDF files at a defined page number
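The extraction steps this tutorial describes (open the file, create a PdfFileReader, loop over the pages with getPage() and extractText(), then write the result out) can be pulled together into one small script. The following is a minimal sketch, not the tutorial's original listing. The function names extract_page_texts, format_page_texts and save_page_texts are hypothetical, and it uses a with block in place of an explicit close() call. It also assumes a PyPDF2 release before 3.0: from 3.0 onward, PdfFileReader, getNumPages(), getPage() and extractText() were removed in favour of PdfReader, reader.pages and extract_text().

```python
from pathlib import Path


def extract_page_texts(pdf_path):
    """Return a list holding the extracted text of every page in the PDF."""
    # Imported inside the function so the helpers below work without PyPDF2.
    # Assumption: PyPDF2 < 3.0, which still provides these legacy names.
    from PyPDF2 import PdfFileReader

    # PDFs are binary files, so open in 'rb' mode; the with block closes
    # the file for us, replacing the explicit close() call.
    with open(pdf_path, "rb") as pdf_file:
        pdf_reader = PdfFileReader(pdf_file)
        texts = []
        # Page indexes start at 0 and run up to getNumPages() - 1.
        for index in range(pdf_reader.getNumPages()):
            page = pdf_reader.getPage(index)
            texts.append(page.extractText())
    return texts


def format_page_texts(page_texts):
    """Join the page texts into one string, with a header line per page."""
    lines = []
    for number, text in enumerate(page_texts, start=1):
        lines.append(f"--- Page {number} ---")
        lines.append(text)
    return "\n".join(lines) + "\n"


def save_page_texts(page_texts, out_path):
    """Write the formatted page texts to a separate text file."""
    Path(out_path).write_text(format_page_texts(page_texts), encoding="utf-8")
```

With a file named mypdf.pdf in the working directory (a placeholder name), calling save_page_texts(extract_page_texts("mypdf.pdf"), "mypdf.txt") would write the text of every page to mypdf.txt. The quality of the extracted text depends on how the PDF was produced; scanned pages without a text layer yield empty strings.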