Python File Processing: Convert DOCX to HTML Using Mammoth

In this tutorial, we will use an example to show how to convert a DOCX file to an HTML file with the Python mammoth library.

1. Install mammoth

pip install mammoth

2. Import the library

import mammoth

3. Convert DOCX to HTML using mammoth

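input_filename = "input.docx"  # path to the DOCX file to convert (example name)

# Map bold text in the document to <i> elements in the HTML
custom_styles = "b => i"

with open(input_filename, "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file, style_map=custom_styles)
text = result.value  # the generated HTML fragment

with open("output.html", "w", encoding="utf-8") as html_file:
    html_file.write(text)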

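In this code, note the following:

mammoth.convert_to_html() converts the DOCX file to HTML, and result.value holds the generated HTML. The style_map parameter is optional; it controls how styles in the document are mapped to HTML elements. Here, "b => i" turns bold text into <i> elements.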
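The result returned by convert_to_html() also has a messages attribute, which lists any warnings produced during conversion. Below is a minimal sketch for printing them. The helper name summarize_messages is hypothetical, and the namedtuple is only a stand-in that mimics the type and message fields of mammoth's message objects, so the snippet runs without a DOCX file. The sample warning text is made up for illustration.

```python
from collections import namedtuple


def summarize_messages(messages):
    """Return one 'type: text' line per conversion message."""
    return [f"{m.type}: {m.message}" for m in messages]


# Stand-in for result.messages (hypothetical sample, not real mammoth output)
Message = namedtuple("Message", ["type", "message"])
sample_messages = [Message("warning", "example warning text")]

for line in summarize_messages(sample_messages):
    print(line)
```

With a real conversion, we would call summarize_messages(result.messages) right after convert_to_html().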

We can also add our own CSS to the converted HTML file. Here is an example:

custom_css = """
    <style>
    .red{
        color: red;
    }
    .underline{
        text-decoration: underline;
    }
    ul li {
        list-style-type: circle;
    }
    table, th, td {
        border: 1px solid black;
    }
    </style>
    """
custom_styles = """ b => b.mark
                    u => u.initialism
                    p[style-name='Heading 1'] => h1.card
                    table => table.table.table-hover
                    """
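input_filename = "input.docx"  # path to the DOCX file to convert (example name)
with open(input_filename, "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file, style_map=custom_styles)
    html = result.value

# Prepend the <style> block to the generated HTML fragment
edited_html = custom_css + html

output_filename = "output.html"
with open(output_filename, "w", encoding="utf-8") as f:
    f.write(edited_html)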

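Here, custom_css + html places our <style> block in front of the generated HTML, so the CSS rules apply when output.html is opened in a browser. Classes added by the style map, such as table.table-hover, only change the appearance if the CSS defines matching rules.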
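Mammoth generates an HTML fragment without <html>, <head> or <body> tags, so the file written above is not a complete HTML document. If a complete page is needed, the fragment can be wrapped by hand. The following is a minimal sketch using only the standard library. The function name build_page and its parameters are our own assumptions, not part of mammoth.

```python
from html import escape


def build_page(body_html, style_block="", title="Converted document"):
    """Wrap an HTML fragment and an optional <style> block in a full page."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        f"{style_block}\n"
        "</head>\n<body>\n"
        f"{body_html}\n"
        "</body>\n</html>\n"
    )


# Small stand-ins for custom_css and result.value from the example above
sample_css = "<style>table, th, td { border: 1px solid black; }</style>"
sample_body = "<h1>Title</h1><p>Some text</p>"
page = build_page(sample_body, sample_css, title="Report & notes")
print(page)
```

In the tutorial code, we would call build_page(html, custom_css) and write the returned string to output.html instead of custom_css + html.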