Parsing XML with namespace in Python via ‘ElementTree’


Asked on December 25, 2018 in XML.


  • 3 Answer(s)

    Parsing XML with namespace in Python via ‘ElementTree’:

          ElementTree is not very smart about namespaces. You need to give the .find(), .findall() and .iterfind() methods an explicit namespace dictionary. This is not documented very well:

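    import xml.etree.ElementTree as ET
    root = ET.parse('filename').getroot()  # 'filename' is a placeholder path
    namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'}  # add more as needed
    root.findall('owl:Class', namespaces)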
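A self-contained version of the lookup above might look like this. It assumes a small inline OWL/RDF document made up for illustration (the http://example.org/ URIs are placeholders) instead of reading a file.

```python
import xml.etree.ElementTree as ET

# Illustrative OWL/RDF document; the example.org URIs are placeholders.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/A"/>
  <owl:Class rdf:about="http://example.org/B"/>
</rdf:RDF>"""

# Prefix -> namespace URI mapping passed to the search methods.
namespaces = {
    'owl': 'http://www.w3.org/2002/07/owl#',
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
}

root = ET.fromstring(SAMPLE)
classes = root.findall('owl:Class', namespaces)

# Attribute names are stored as '{uri}local', so build the key from the dict.
about = [c.get('{%s}about' % namespaces['rdf']) for c in classes]
print(about)  # ['http://example.org/A', 'http://example.org/B']
```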
    

         The prefixes are looked up in the namespaces dictionary you pass in, so you can use any prefix you like. The API splits off the owl: part, looks up the corresponding namespace URI in the namespaces dictionary, and then changes the search to look for the expanded name {http://www.w3.org/2002/07/owl#}Class instead. You can use that syntax directly too:

    root.findall('{http://www.w3.org/2002/07/owl#}Class')
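To make the prefix expansion concrete, here is a minimal sketch of what the lookup amounts to. `to_clark` is a hypothetical helper written for illustration, not part of ElementTree; it handles only a single `prefix:local` name (no full paths, no default '' namespace). It also shows that the prefix in the query ('o' here) does not have to match the one used in the document.

```python
import xml.etree.ElementTree as ET

OWL = 'http://www.w3.org/2002/07/owl#'
SAMPLE = (
    '<root xmlns:owl="http://www.w3.org/2002/07/owl#">'
    '<owl:Class/><owl:Thing/><owl:Class/>'
    '</root>'
)

def to_clark(name, namespaces):
    """Hypothetical helper: expand 'prefix:local' into '{uri}local'."""
    prefix, sep, local = name.partition(':')
    if not sep:
        return name  # no prefix: this sketch leaves the name unchanged
    return '{%s}%s' % (namespaces[prefix], local)

root = ET.fromstring(SAMPLE)

# The document says 'owl:', the query says 'o:'; only the URI has to match.
my_ns = {'o': OWL}
expanded = to_clark('o:Class', my_ns)
print(expanded)  # {http://www.w3.org/2002/07/owl#}Class

by_prefix = root.findall('o:Class', my_ns)
by_clark = root.findall(expanded)
print(len(by_prefix), by_prefix == by_clark)  # 2 True
```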
    

         If you can switch to the lxml library, things are easier: lxml supports the same ElementTree API, but also collects the namespaces for you in a .nsmap attribute on elements.

    Answered on December 25, 2018.

    XML with namespace in Python:

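    from lxml import etree
    tree = etree.parse("filename")  # "filename" is a placeholder path
    root = tree.getroot()
    # nsmap holds the prefix -> URI mappings known on the root element
    root.findall('owl:Class', root.nsmap)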
    
    Answered on December 25, 2018.

         Use the ElementTree.iterparse function, listening only for namespace-start ('start-ns') events, to extract the namespace prefixes and URIs from the XML data:

    >>> from io import StringIO
    >>> from xml.etree import ElementTree
    >>> my_schema = u'''<rdf:RDF xml:base="http://dbpedia.org/ontology/"
    ...     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    ...     xmlns:owl="http://www.w3.org/2002/07/owl#"
    ...     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    ...     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    ...     xmlns="http://dbpedia.org/ontology/">
    ...
    ...     <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
    ...         <rdfs:label xml:lang="en">basketball league</rdfs:label>
    ...         <rdfs:comment xml:lang="en">
    ...           a group of sports teams that compete against each other
    ...           in Basketball
    ...         </rdfs:comment>
    ...     </owl:Class>
    ...
    ...  </rdf:RDF>'''
    >>> my_namespaces = dict([
    ...     node for _, node in ElementTree.iterparse(
    ...         StringIO(my_schema), events=['start-ns']
    ...     )
    ... ])
    >>> from pprint import pprint
    >>> pprint(my_namespaces)
    {'': 'http://dbpedia.org/ontology/',
     'owl': 'http://www.w3.org/2002/07/owl#',
     'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
     'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
     'xsd': 'http://www.w3.org/2001/XMLSchema#'}
    

         The dictionary can then be passed as an argument to the search functions:

    root = ElementTree.fromstring(my_schema)
    root.findall('owl:Class', my_namespaces)
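If you would rather not parse the document twice (once for the prefixes, once for the tree), one option is to ask iterparse for both 'start-ns' and 'start' events and keep the first element it reports as the root. This is a sketch under stated assumptions: `parse_with_namespaces` is a hypothetical helper name, and if one prefix is bound to different URIs in different parts of the document, the dictionary keeps only the last binding.

```python
import xml.etree.ElementTree as ET
from io import StringIO

def parse_with_namespaces(source):
    """Hypothetical helper: return (root, {prefix: uri}) from one iterparse pass."""
    namespaces = {}
    root = None
    for event, item in ET.iterparse(source, events=('start-ns', 'start')):
        if event == 'start-ns':
            prefix, uri = item        # 'start-ns' yields a (prefix, uri) tuple
            namespaces[prefix] = uri  # a later binding of the same prefix wins
        elif root is None:
            root = item               # the first 'start' element is the root
    # The iterator has been fully consumed, so the tree under root is complete.
    return root, namespaces

xml_text = '''<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague"/>
</rdf:RDF>'''

root, ns = parse_with_namespaces(StringIO(xml_text))
print(sorted(ns))                          # ['owl', 'rdf']
print(len(root.findall('owl:Class', ns)))  # 1
```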
    
    Answered on December 25, 2018.

