def sanitize_html(
    html_string: str,
    *,
    remove_scripts: bool = True,
    remove_styles: bool = True,
    remove_svgs: bool = True,
    remove_comments: bool = True,
    remove_long_attributes: bool = True,
    max_attribute_length: int = 500,
    preserve_attributes: list[str] | None = None,
    remove_empty_tags: bool = True,
    preserve_empty_tags: list[str] | None = None,
    minify_whitespace: bool = True,
) -> str
Sanitizes HTML content by removing unwanted elements, attributes, and whitespace. Each cleaning operation is controlled by its own keyword-only option.

Examples

from intuned_browser import sanitize_html


async def automation(page, params, **_kwargs):
    # Markup containing a script, an inline style, and an empty tag
    dirty_html = '''
    <div>
        <script>alert('xss')</script>
        <p style="color: red;">Hello World</p>
        <span></span>
    </div>
    '''
    # All options are left at their defaults
    sanitized_html = sanitize_html(dirty_html)
    # Output: '<div><p>Hello World</p></div>'
    return sanitized_html

Arguments

html_string
str
required
The HTML content to sanitize
remove_scripts
bool
default:"True"
Remove all <script> elements. Defaults to True.
remove_styles
bool
default:"True"
Remove all <style> elements. Defaults to True.
remove_svgs
bool
default:"True"
Remove all <svg> elements. Defaults to True.
remove_comments
bool
default:"True"
Remove HTML comments. Defaults to True.
remove_long_attributes
bool
default:"True"
Remove attributes longer than max_attribute_length. Defaults to True.
max_attribute_length
int
default:"500"
Maximum allowed attribute length; attributes longer than this are removed when remove_long_attributes is True. Defaults to 500.
preserve_attributes
list[str]
List of attribute names to always preserve. Defaults to ["class", "src"].
remove_empty_tags
bool
default:"True"
Remove empty tags, except those listed in preserve_empty_tags. Defaults to True.
preserve_empty_tags
list[str]
default:"['img']"
List of tag names to preserve even when empty. Defaults to ["img"].
minify_whitespace
bool
default:"True"
Remove extra whitespace between tags and empty lines. Defaults to True.
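
To make the interaction between these options concrete, here is a rough standard-library sketch of the same kinds of cleaning steps. It is an illustration only, not the intuned_browser implementation: SketchSanitizer and sketch_sanitize are hypothetical names, the sketch always applies every step rather than exposing each boolean flag, and its output is not expected to match sanitize_html exactly.

```python
import html
import re
from html.parser import HTMLParser

DROPPED_ELEMENTS = {"script", "style", "svg"}


class SketchSanitizer(HTMLParser):
    """Illustrative stand-in only; NOT the intuned_browser implementation."""

    def __init__(self, max_attribute_length=500, preserve_attributes=("class", "src")):
        super().__init__()  # convert_charrefs=True, so text arrives unescaped
        self.max_attribute_length = max_attribute_length
        self.preserve_attributes = set(preserve_attributes)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_ELEMENTS:
            self.skip_depth += 1
        if self.skip_depth:
            return
        kept = []
        for name, value in attrs:
            too_long = value is not None and len(value) > self.max_attribute_length
            # Long attributes are dropped unless explicitly preserved.
            if too_long and name not in self.preserve_attributes:
                continue
            kept.append(name if value is None else f'{name}="{html.escape(value)}"')
        self.parts.append("<" + " ".join([tag, *kept]) + ">")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags: emit as a start tag, or nothing if dropped.
        if tag not in DROPPED_ELEMENTS:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag in DROPPED_ELEMENTS:
                self.skip_depth -= 1
            return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(html.escape(data, quote=False))

    # Comments disappear because handle_comment is not overridden.


def sketch_sanitize(html_string, *, max_attribute_length=500,
                    preserve_attributes=("class", "src"),
                    preserve_empty_tags=("img",)):
    parser = SketchSanitizer(max_attribute_length, preserve_attributes)
    parser.feed(html_string)
    parser.close()
    # Collapse whitespace between tags.
    cleaned = re.sub(r">\s+<", "><", "".join(parser.parts)).strip()

    # Remove empty elements, repeating so newly emptied parents also go.
    preserved = set(preserve_empty_tags)
    empty = re.compile(r"<([a-zA-Z][\w-]*)(?:\s[^>]*)?>\s*</\1>")

    def drop(match):
        return match.group(0) if match.group(1) in preserved else ""

    previous = None
    while previous != cleaned:
        previous = cleaned
        cleaned = empty.sub(drop, cleaned)
    return cleaned


dirty = (
    '<div>\n  <!-- note -->\n  <script>alert(1)</script>\n'
    '  <p class="a-very-long-class-name" data-blob="0123456789abcdef" id="x">Hello</p>\n'
    '  <section><span></span></section>\n</div>'
)
result = sketch_sanitize(dirty, max_attribute_length=10)
print(result)
# <div><p class="a-very-long-class-name" id="x">Hello</p></div>
```

In this sketch the class attribute survives despite exceeding the length limit because it is in the preserve list, while data-blob is removed. The empty span is removed first, which leaves section empty, so it is removed on the next pass.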

Returns: str

The sanitized HTML string