CVE-2024-52595 Information

Description

lxml_html_clean is a project for HTML cleaning functionalities copied from lxml.html.clean. Prior to version 0.4.0, the HTML parser in lxml does not properly handle context switching for special HTML tags such as <svg>, <math>, and <noscript>. This behavior deviates from how web browsers parse and interpret such tags. Specifically, content in CSS comments is ignored by lxml_html_clean but may be interpreted differently by web browsers, enabling malicious scripts to bypass the cleaning process. This vulnerability could lead to Cross-Site Scripting (XSS) attacks, compromising the security of users who rely on lxml_html_clean in its default configuration to sanitize untrusted HTML content.

Users employing the HTML cleaner in a security-sensitive context should upgrade to lxml_html_clean 0.4.0, which addresses this issue. As a temporary mitigation, users can configure lxml_html_clean with the following settings to prevent exploitation of this vulnerability. Via remove_tags, one may specify tags to remove; their content is moved into their parent tags. Via kill_tags, one may specify tags to be removed completely, along with their content. Via allow_tags, one may restrict the set of permissible tags, excluding context-switching tags like <svg>, <math>, and <noscript>.
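The temporary mitigation can be sketched in Python roughly as follows. This is a minimal illustration, not the project's recommended configuration: it assumes the lxml_html_clean package is installed and that kill_tags is the option chosen, and the names CONTEXT_SWITCHING_TAGS, mitigation_kwargs, build_cleaner, and clean_untrusted are hypothetical helpers introduced only for this example.

```python
# Tags named in the advisory whose content browsers parse in a different
# context than lxml does.
CONTEXT_SWITCHING_TAGS = ("svg", "math", "noscript")


def mitigation_kwargs():
    """Return Cleaner keyword arguments that drop the risky tags entirely.

    kill_tags removes each listed tag together with its content. Using
    allow_tags with a list that omits these tags is an alternative, but
    this sketch only shows kill_tags.
    """
    return {"kill_tags": list(CONTEXT_SWITCHING_TAGS)}


def build_cleaner():
    """Build a Cleaner with the mitigation applied on top of the defaults."""
    # Imported lazily so the configuration above can be inspected even
    # where the package is not installed.
    from lxml_html_clean import Cleaner

    return Cleaner(**mitigation_kwargs())


def clean_untrusted(html):
    """Sanitize an untrusted HTML string with the mitigated Cleaner."""
    return build_cleaner().clean_html(html)
```

Calling clean_untrusted() on a fragment that contains an <svg> element should return markup with that element and everything inside it removed. This is only a stopgap; upgrading to version 0.4.0 remains the fix described in the advisory.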

Reference

https://github.com/fedora-python/lxml_html_clean/security/advisories/GHSA-5jfw-gq64-q45f
https://github.com/fedora-python/lxml_html_clean/pull/19
https://github.com/fedora-python/lxml_html_clean/commit/c5d816f86eb3707d72a8ecf5f3823e0daa1b3808
