how to remove an element in lxml

Asked on January 12, 2019 in XML.


  • 8 Answer(s)

         To remove an element in lxml, call the remove method of its parent element:

    import lxml.etree as et

    tree = et.fromstring(xml)  # xml holds the document as a string

    for bad in tree.xpath("//fruit[@state='rotten']"):
        # remove() is a method of the parent, so fetch the parent first
        bad.getparent().remove(bad)

    print(et.tostring(tree, pretty_print=True, xml_declaration=True).decode())

    Answered on January 12, 2019.

         remove is a method of the parent element, not of the tree as a whole. Call it on the parent of the element you want to delete and pass it that child element. A complete example:

    import lxml.etree as et
    xml="""
    <groceries>
      <fruit state="rotten">apple</fruit>
      <fruit state="fresh">pear</fruit>
      <punnet>
        <fruit state="rotten">strawberry</fruit>
        <fruit state="fresh">blueberry</fruit>
      </punnet>
      <fruit state="fresh">starfruit</fruit>
      <fruit state="rotten">mango</fruit>
      <fruit state="fresh">peach</fruit>
    </groceries>
    """
     
    tree=et.fromstring(xml)
     
    for bad in tree.xpath("//fruit[@state='rotten']"):
        bad.getparent().remove(bad)
     
    print(et.tostring(tree, pretty_print=True).decode())
    

    Result:

    <groceries>
      <fruit state="fresh">pear</fruit>
      <punnet>
        <fruit state="fresh">blueberry</fruit>
      </punnet>
      <fruit state="fresh">starfruit</fruit>
      <fruit state="fresh">peach</fruit>
    </groceries>
    
    Answered on January 12, 2019.

    Be aware of one pitfall. Given markup like this:

    <div>
    <script>
    some code
    </script>
    text here
    </div>
    

    div.remove(script) also removes the tail text "text here", which is usually not what you intended.
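
    The pitfall above can be reproduced with a short sketch. It assumes the markup is parsed with lxml.etree and written on one line to keep whitespace out of the picture; the variable names are illustrative only.

```python
import lxml.etree as et

# Same structure as the snippet above, written on one line
div = et.fromstring("<div><script>some code</script>text here</div>")
script = div[0]

# The text after </script> is stored on the script element as its tail
tail_before = script.tail
print(tail_before)

# Removing the element through its parent drops the tail along with it
div.remove(script)
print(et.tostring(div))
```

    The text after the closing tag belongs to the removed element as its tail, which is why it disappears together with the element.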

         etree.strip_elements is also a reasonable solution; its with_tail keyword argument (a bool) controls whether the text following each removed element is removed as well.

    I have not confirmed whether it can filter elements with an XPath expression; I mention it for information only.

    Its documentation reads:

         strip_elements(tree_or_element, *tag_names, with_tail=True)

         Delete all elements with the given tag names from a tree or subtree. This removes the elements and their entire subtree, including all their attributes, text content and descendants. It also removes the tail text of each element unless you explicitly set the with_tail keyword argument to False.

    Tag names can contain wildcards, as in _Element.iter.

         Note that this does not delete the element (or ElementTree root element) that you passed in, even if it matches; only its descendants are considered. If you want to include the root element, check its tag name directly before calling this function.

    For instance:

    strip_elements(some_element,
       'simpletagname',              # non-namespaced tag
       '{http://some/ns}tagname',    # namespaced tag
       '{http://some/other/ns}*',    # any tag from a namespace
       lxml.etree.Comment            # comments
    )
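
    As a rough illustration of the with_tail option described above, the sketch below runs strip_elements twice on the same small document. The markup and variable names are made up for the example.

```python
import lxml.etree as et

xml = "<div><script>some code</script>text here</div>"

# Default behaviour (with_tail=True): the tail text goes away with the element
stripped = et.fromstring(xml)
et.strip_elements(stripped, "script")

# with_tail=False: the element is removed but its tail text stays in the parent
kept = et.fromstring(xml)
et.strip_elements(kept, "script", with_tail=False)

print(et.tostring(stripped))
print(et.tostring(kept))
```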
    
    Answered on January 12, 2019.
    Calling remove on the parent, as in the other answers, works:

    for bad in tree.xpath("//fruit[@state='rotten']"):
        bad.getparent().remove(bad)

    But it removes the element including its tail, which is a problem if you are processing mixed-content documents like HTML:

    <div><fruit state="rotten">avocado</fruit> Hello!</div>

    Becomes

    <div></div>

    Which is, I suppose, not always what you want 🙂 I have created a helper function that removes just the element and keeps its tail:

    def remove_element(el):
        parent = el.getparent()
        # tail is None when no text follows the element
        if el.tail and el.tail.strip():
            prev = el.getprevious()
            # compare with None: an element without children is falsy
            if prev is not None:
                prev.tail = (prev.tail or '') + el.tail
            else:
                parent.text = (parent.text or '') + el.tail
        parent.remove(el)
    
    for bad in tree.xpath("//fruit[@state='rotten']"):
        remove_element(bad)

    This way it will keep the tail text:

    <div> Hello!</div>
    Answered on January 13, 2019.
    import lxml.etree as et
    
    xml="""
    <groceries>
      <fruit state="rotten">apple</fruit>
      <fruit state="fresh">pear</fruit>
      <fruit state="fresh">starfruit</fruit>
      <fruit state="rotten">mango</fruit>
      <fruit state="fresh">peach</fruit>
    </groceries>
    """
    
    tree=et.fromstring(xml)
    
    Answered on February 25, 2019.
    import lxml.etree as et
    
    xml="""
    <groceries>
      <fruit state="rotten">apple</fruit>
      <fruit state="fresh">pear</fruit>
      <punnet>
        <fruit state="rotten">strawberry</fruit>
        <fruit state="fresh">blueberry</fruit>
      </punnet>
      <fruit state="fresh">starfruit</fruit>
      <fruit state="rotten">mango</fruit>
      <fruit state="fresh">peach</fruit>
    </groceries>
    """
    
    tree=et.fromstring(xml)
    
    for bad in tree.xpath("//fruit[@state='rotten']"):
        bad.getparent().remove(bad)
    Answered on February 25, 2019.

    strip_elements(tree_or_element, *tag_names, with_tail=True)

    Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their entire subtree, including all their attributes, text content and descendants. It will also remove the tail text of the element unless you explicitly set the with_tail keyword argument option to False.

    Tag names can contain wildcards as in _Element.iter.

    Note that this will not delete the element (or ElementTree root element) that you passed even if it matches. It will only treat its descendants. If you want to include the root element, check its tag name directly before even calling this function.

    Example usage:

       strip_elements(some_element,
           'simpletagname',             # non-namespaced tag
           '{http://some/ns}tagname',   # namespaced tag
           '{http://some/other/ns}*',   # any tag from a namespace
           lxml.etree.Comment           # comments
       )
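
    The note about the root element can be checked with a small sketch. The document here is invented for the example: the root and one of its children share the tag name fruit.

```python
import lxml.etree as et

# The root and its first child both use the tag name "fruit"
root = et.fromstring("<fruit><fruit>apple</fruit><veg>kale</veg></fruit>")

# Only descendants are stripped; the element passed in is left alone
et.strip_elements(root, "fruit")

remaining = [child.tag for child in root]
print(root.tag, remaining)
```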
    Answered on February 25, 2019.
    def remove_element(el):
        parent = el.getparent()
        # tail is None when no text follows the element
        if el.tail and el.tail.strip():
            prev = el.getprevious()
            # compare with None: an element without children is falsy
            if prev is not None:
                prev.tail = (prev.tail or '') + el.tail
            else:
                parent.text = (parent.text or '') + el.tail
        parent.remove(el)
    
    for bad in tree.xpath("//fruit[@state='rotten']"):
        remove_element(bad)
    Answered on February 25, 2019.

