In this tutorial, we will use an example to show how to extract text from a docx file using the Python mammoth library.
1. Install the mammoth library
pip install mammoth
2. Import the library
import mammoth
3. Open a docx file
with open(input_filename, "rb") as docx_file:
4. Extract text from the docx file
    result = mammoth.extract_raw_text(docx_file)
    text = result.value  # The raw text

with open('output.txt', 'w') as text_file:
    text_file.write(text)
In this tutorial, we used the mammoth.extract_raw_text() function to extract text from a docx file, then saved the result to output.txt.