In this tutorial, we will introduce a simple way to extract text from a PDF file in Python. We will use the pdftotext library to do it.
1. Install pdftotext
pip install pdftotext
2. Import library
import pdftotext
3. Open a pdf file in binary mode
pdf_file = open("test.pdf", "rb")
4. Extract text from a pdf file
gvj_pdf = pdftotext.PDF(pdf_file)
5. Print the text in pdf
for i in gvj_pdf:  # iterate over every page in the pdf
    print(i)
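Each page we get from the loop is an ordinary Python string, so we can number the pages or join them into one text. Here is a small sketch of that idea. The helper names number_pages and join_pages are our own and are not part of pdftotext, and a plain list of strings stands in for the PDF object so the example runs without a real file.

```python
def number_pages(pages):
    """Return (page_number, text) pairs, counting pages from 1."""
    return [(number, text) for number, text in enumerate(pages, start=1)]


def join_pages(pages, separator="\n\n"):
    """Join the text of all pages into one string."""
    return separator.join(pages)


# Stand-in for a pdftotext.PDF object: any iterable of page strings works.
sample_pages = ["First page text", "Second page text"]

for number, text in number_pages(sample_pages):
    print("--- page", number, "---")
    print(text)

full_text = join_pages(sample_pages)
```

With a real file, we would pass the PDF object from step 4 in place of sample_pages.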
6. Close pdf file
pdf_file.close()
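As an alternative to closing the file by hand, we can open it in a with block, which closes the file even if an error happens while reading. The sketch below wraps steps 3 to 6 in one function. The name read_pdf_pages is our own choice, and "test.pdf" is the same example file used above.

```python
def read_pdf_pages(path):
    """Return the text of every page of the pdf at `path` as a list of strings."""
    # Third-party library; imported inside the function so this module
    # can still be loaded on a machine where pdftotext is not installed.
    import pdftotext

    # The with block closes the file automatically when it ends.
    with open(path, "rb") as pdf_file:
        pdf = pdftotext.PDF(pdf_file)
        return list(pdf)  # one string per page


# Example usage (assumes test.pdf exists in the current folder):
# for page in read_pdf_pages("test.pdf"):
#     print(page)
```

Returning a list means the page texts stay available after the file has been closed.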