In this tutorial, we will use an example to show how to convert a docx file to an HTML file using the Python mammoth library.
1. Install mammoth
pip install mammoth
2. Import the library
import mammoth
3. Convert docx to HTML using mammoth
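input_filename = "input.docx"  # path to the docx file to convert
custom_styles = "b => i"  # style map: render bold text as <i>

with open(input_filename, "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file, style_map=custom_styles)
    text = result.value  # the generated HTML

with open("output.html", "w", encoding="utf-8") as html_file:
    html_file.write(text)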
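Note the following about this code:
mammoth.convert_to_html() converts the docx file to HTML and returns a result object; result.value holds the generated HTML. The style_map parameter is optional and controls how elements of the document are mapped to HTML elements. Here, "b => i" renders bold text as <i> elements.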
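A style map can hold several rules, one per line. The following is a minimal sketch of building such a string from a list of pairs and then passing it to mammoth. The helper names build_style_map and convert_with_rules are hypothetical and not part of mammoth. The sketch also returns result.messages, which mammoth uses to report warnings raised during conversion.

```python
def build_style_map(rules):
    """Join (matcher, html_path) pairs into a style map string, one rule per line."""
    return "\n".join(f"{matcher} => {html_path}" for matcher, html_path in rules)


def convert_with_rules(docx_path, rules):
    """Convert the docx file at docx_path; return (html, messages)."""
    # Imported here so build_style_map can be used without mammoth installed.
    import mammoth

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file, style_map=build_style_map(rules))
    # result.messages lists any warnings produced during conversion.
    return result.value, result.messages


# Hypothetical rule list, using matchers that appear in this tutorial.
rules = [
    ("b", "i"),                                # bold text becomes <i>
    ("p[style-name='Heading 1']", "h1.card"),  # Heading 1 becomes <h1 class="card">
]
style_map = build_style_map(rules)
print(style_map)
```

Calling convert_with_rules("input.docx", rules) would then return the HTML together with any warnings, assuming input.docx exists in the working directory.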
We can also add our own CSS to the converted HTML file. Here is an example:
custom_css = """
<style>
.red { color: red; }
.underline { text-decoration: underline; }
ul li { list-style-type: circle; }
table, th, td { border: 1px solid black; }
</style>
"""

# Style map: one rule per line, in the form "docx element => HTML element.class"
custom_styles = """
b => b.mark
u => u.initialism
p[style-name='Heading 1'] => h1.card
table => table.table.table-hover
"""

with open(input_filename, "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file, style_map=custom_styles)
    html = result.value

# Put the <style> block in front of the generated HTML
edited_html = custom_css + html

output_filename = "output.html"
with open(output_filename, "w", encoding="utf-8") as f:
    f.write(edited_html)
Here custom_css + html places our <style> block in front of the generated HTML, so the CSS rules apply when output.html is opened in a browser.
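The HTML that mammoth returns is a fragment without <html>, <head> or <body> tags, so the concatenation above produces a file that browsers display but that is not a complete HTML document. The following is a minimal sketch of wrapping the fragment in a full page. The function wrap_html and its default title are assumptions made for illustration, not part of mammoth.

```python
from html import escape


def wrap_html(fragment, css="", title="Converted document"):
    """Wrap an HTML fragment and an optional <style> block in a complete page."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        f"{css}\n"
        "</head>\n"
        "<body>\n"
        f"{fragment}\n"
        "</body>\n"
        "</html>\n"
    )


# Stand-in values; in the tutorial these would be custom_css and result.value.
page = wrap_html("<p>Hello</p>", "<style>p { color: red; }</style>")
print(page)
```

Writing page to output.html instead of edited_html keeps the same styling while declaring the character encoding in the document itself.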