Clojure XML Parsing



  • 10 Answer(s)

    Given this XML in a file:

    <high-node>
      <low-node>my text</low-node>
    </high-node>
    

         Load clojure.xml:

    user=> (use 'clojure.xml)
    

         Once parsed, the XML has this structure:

    {:tag :high-node, :attrs nil, :content [{:tag :low-node, :attrs nil, :content ["my text"]}]}
    

         Then we can walk the parsed tree with xml-seq and pick out the content of the low-node:

    user=> (for [x (xml-seq (parse (java.io.File. file)))
                 :when (= :low-node (:tag x))]
             (first (:content x)))
     
    ("my text")
    

         If we need the whole low-node element (tag, attributes and content) rather than just its text, we can match its parent instead by changing the :when predicate to (= :high-node (:tag x)):

    user=> (for [x (xml-seq (parse (java.io.File. file)))
                 :when (= :high-node (:tag x))]
             (first (:content x)))
     
    ({:tag :low-node, :attrs nil, :content ["my text"]})
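    The same xml-seq walk can be tried without a file on disk. A minimal sketch, assuming you parse from a string; parse-str and texts-of are hypothetical helper names, not part of clojure.xml:

```clojure
(ns example.xml-seq
  (:require [clojure.xml :as xml]))

(defn parse-str
  "Parses an XML string by wrapping its UTF-8 bytes in an InputStream,
  since clojure.xml/parse treats a String argument as a URI."
  [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn texts-of
  "Hypothetical helper: the first content item of every element whose
  :tag equals tag, in document order."
  [tag s]
  (for [node (xml-seq (parse-str s))
        :when (= tag (:tag node))]
    (first (:content node))))

(texts-of :low-node "<high-node><low-node>my text</low-node></high-node>")
;; => ("my text")
```

    Passing :high-node instead returns the whole low-node element map, exactly as in the REPL session above.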
    
    Answered on January 12, 2019.

         Use clojure.data.zip.xml, which was clojure-contrib.zip-filter.xml prior to Clojure 1.3.

    file: myfile.xml

    <songs>
      <track id="t1"><name>Track one</name></track>
      <track id="t2"><name>Track two</name></track>
    </songs>
    

    code:

    ; Clojure 1.3
    (ns example
      (:use [clojure.data.zip.xml :only (attr text xml->)]) ; dep: see below
      (:require [clojure.xml :as xml]
                [clojure.zip :as zip]))

    (def doc (xml/parse "myfile.xml"))   ; parse the file into nested maps
    (def zipped (zip/xml-zip doc))       ; wrap the tree in a zipper
    (xml-> zipped :track :name text)     ; ("Track one" "Track two")
    (xml-> zipped :track (attr :id))     ; ("t1" "t2")
    

         Unfortunately, you have to pull in a dependency on data.zip to get this nice read/filter functionality:

    [org.clojure/data.zip "0.1.1"]
    

         Then see the docs for data.zip.xml; its source file is relatively small and a good place to see what is possible.
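    If you would rather avoid the data.zip dependency, the two queries above can be approximated with the plain maps clojure.xml returns. A sketch under that assumption: children-tagged, track-names and track-ids are hypothetical names, and unlike xml-> this only handles these fixed paths (it also does not collapse whitespace the way text does):

```clojure
(ns example.songs
  (:require [clojure.xml :as xml]))

(def songs-xml
  "<songs>
     <track id=\"t1\"><name>Track one</name></track>
     <track id=\"t2\"><name>Track two</name></track>
   </songs>")

(defn parse-str [^String s]
  ;; clojure.xml/parse reads a String as a URI, so wrap the text in a stream
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn children-tagged
  "Direct child elements of el whose :tag equals tag. Text nodes are
  skipped because (:tag \"some string\") is nil."
  [el tag]
  (filter #(= tag (:tag %)) (:content el)))

(defn track-names
  "Roughly (xml-> zipped :track :name text): text of each <name> under a <track>."
  [root]
  (for [t (children-tagged root :track)
        n (children-tagged t :name)]
    (apply str (filter string? (:content n)))))

(defn track-ids
  "Roughly (xml-> zipped :track (attr :id)): the id attribute of each <track>."
  [root]
  (map #(get-in % [:attrs :id]) (children-tagged root :track)))

(track-names (parse-str songs-xml)) ;; => ("Track one" "Track two")
(track-ids (parse-str songs-xml))   ;; => ("t1" "t2")
```

    For deeper or more varied queries, xml-> from data.zip stays much more concise; this is only a dependency-free fallback.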

    Answered on January 12, 2019.
    (require '[clojure.xml :as xml]
             '[clojure.zip :as zip])
    
    ;;convenience function, first seen at nakkaya.com later in clj.zip src
    (defn zip-str [s]
      (zip/xml-zip 
          (xml/parse (java.io.ByteArrayInputStream. (.getBytes s)))))
    
    ;;parse from xml-strings to internal xml representation
    user=> (zip-str "<a href='nakkaya.com'/>")
    [{:tag :a, :attrs {:href "nakkaya.com"}, :content nil} nil]
    
    ;;root can be rendered with xml/emit-element
    user=> (xml/emit-element (zip/root [{:tag :a, :attrs {:href "nakkaya.com"}, :content nil} nil]))
    <a href='nakkaya.com'/>
    
    ;;emit-element prints instead of returning a string (so nothing is lazy, and for performance); capture the output as a string with with-out-str
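    A short sketch of capturing that output, and of editing a node through the zipper before rendering it. emit-str and retarget are hypothetical helper names; zip/edit, zip/root and with-out-str are the standard functions:

```clojure
(ns example.emit
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]
            [clojure.string :as str]))

(defn zip-str [^String s]
  (zip/xml-zip (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8")))))

(defn emit-str
  "Hypothetical helper: renders an element to a String by capturing
  what xml/emit-element prints to *out*."
  [element]
  (with-out-str (xml/emit-element element)))

;; zip/edit replaces the node at the current location with (f node args...);
;; here the location is the root <a> element.
(defn retarget [xml-str href]
  (-> (zip-str xml-str)
      (zip/edit assoc-in [:attrs :href] href)
      zip/root
      emit-str
      str/trim)) ; emit-element ends its output with a newline

(retarget "<a href='nakkaya.com'/>" "example.org")
;; => "<a href='example.org'/>"
```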

     

    Answered on January 13, 2019.

    Consider the following XML in the example.nzb file:

    <?xml version="1.0" encoding="iso-8859-1" ?>
    <!-- <!DOCTYPE nzb PUBLIC "-//newzBin//DTD NZB 1.1//EN" "http://www.newzbin.com/DTD/nzb/nzb-1.1.dtd"> -->
    <nzb xmlns="http://www.newzbin.com/DTD/2003/nzb">
     <head>
       <meta type="title">Your File!</meta>
       <meta type="tag">Example</meta>
     </head>
     <file poster="Joe Bloggs &lt;bloggs@nowhere.example&gt;" date="1071674882" subject="Here's your file!  abc-mr2a.r01 (1/2)">
       <groups>
         <group>alt.binaries.newzbin</group>
         <group>alt.binaries.mojo</group>
       </groups>
       <segments>
         <segment bytes="102394" number="1">123456789abcdef@news.newzbin.com</segment>
         <segment bytes="4501" number="2">987654321fedbca@news.newzbin.com</segment>
       </segments>
     </file>
    </nzb>
    Answered on January 24, 2019.

    Let us start by creating a new project (for details on using Leiningen, see this guide):

    $ lein new nzb
    

    Now edit project.clj to contain the following:

    (defproject nzb "0.1.0-SNAPSHOT"
      :description ""
      :url ""
      :license {:name "Eclipse Public License"
                :url "http://www.eclipse.org/legal/epl-v10.html"}
      :dependencies [[org.clojure/clojure "1.4.0"]
                     [org.clojure/data.zip "0.1.1"]])
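    With that project in place, one way to pull the segment information out of example.nzb is sketched below. It is an assumption-laden sketch rather than the original author's code: it uses only clojure.xml and xml-seq instead of data.zip, works on a trimmed inline copy of the file so it runs on its own, and the names segments and total-bytes are hypothetical:

```clojure
(ns nzb.core
  (:require [clojure.xml :as xml]))

;; Trimmed copy of example.nzb. For the real file you would call
;; (xml/parse (java.io.File. "example.nzb")) instead of parse-str.
(def sample-nzb
  "<nzb xmlns=\"http://www.newzbin.com/DTD/2003/nzb\">
    <file poster=\"Joe Bloggs\" date=\"1071674882\">
      <groups><group>alt.binaries.newzbin</group></groups>
      <segments>
        <segment bytes=\"102394\" number=\"1\">123456789abcdef@news.newzbin.com</segment>
        <segment bytes=\"4501\" number=\"2\">987654321fedbca@news.newzbin.com</segment>
      </segments>
    </file>
  </nzb>")

(defn parse-str [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn segments
  "One map per <segment> element: its message-id text plus the numeric
  bytes and number attributes. Tags are keywordized from the element
  names as written, so the default xmlns on <nzb> does not affect them."
  [root]
  (for [node (xml-seq root)
        :when (= :segment (:tag node))]
    {:message-id (first (:content node))
     :bytes      (Long/parseLong (get-in node [:attrs :bytes]))
     :number     (Long/parseLong (get-in node [:attrs :number]))}))

(defn total-bytes [root]
  (reduce + (map :bytes (segments root))))

(total-bytes (parse-str sample-nzb)) ;; => 106895
```

    The same extraction could be written with xml-> from the data.zip dependency declared above; xml-seq is used here only to keep the sketch free of extra libraries.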
    Answered on January 24, 2019.
    Parses and loads the source s, which can be a File, InputStream or
    String naming a URI. Returns a tree of the xml/element struct-map,
    which has the keys :tag, :attrs, and :content. and accessor fns tag,
    attrs, and content. Other parsers can be supplied by passing
    startparse, a fn taking a source and a ContentHandler and returning
    a parser
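    To illustrate the startparse argument: the sketch below passes a hypothetical startparse-no-dtd that builds its own SAX parser with external DTD loading switched off (the Xerces feature "load-external-dtd", which the JDK's built-in parser supports). That can matter for files like example.nzb that may reference a DOCTYPE on a remote server. It assumes the JDK's default SAXParserFactory:

```clojure
(ns example.startparse
  (:require [clojure.xml :as xml])
  (:import (javax.xml.parsers SAXParserFactory)))

(defn startparse-no-dtd
  "startparse fn for clojure.xml/parse: receives the source and the
  ContentHandler and runs the parse itself, using a SAX parser that
  does not fetch external DTDs."
  [s ch]
  (let [factory (doto (SAXParserFactory/newInstance)
                  (.setFeature "http://apache.org/xml/features/nonvalidating/load-external-dtd"
                               false))]
    (.parse (.newSAXParser factory) s ch)))

(defn parse-str-no-dtd [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))
             startparse-no-dtd))

;; The DOCTYPE URL is never requested, so this parses offline.
(parse-str-no-dtd
  "<!DOCTYPE a SYSTEM \"http://invalid.example/none.dtd\"><a>x</a>")
;; => {:tag :a, :attrs nil, :content ["x"]}
```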
    Answered on February 24, 2019.

    This work is licensed under a Creative Commons Attribution 3.0 Unported License (including images & stylesheets). The source is available on GitHub.

    Answered on February 24, 2019.

    “XML is like violence – if it doesn’t work, use more”

    Clojure is awesome for parsing and processing structured data. It has a wide range of functions for handling lists, maps (associative arrays), sets, and (if you really need them) objects.

    One great example of the power of Clojure for this sort of thing is processing XML. You may hate XML, and you may use JSON or edn or YAML or anything else you can, but XML is still all over the place, and if you need to handle complex or large XML, you might want to look at Clojure.

    Answered on February 24, 2019.

    Clojure has three basic approaches to XML:

    1. Parsing as structured data
    2. Traversing the structured data as a sequence
    3. Manipulating via zippers and their friends

    Moreover, the first two of these can be done lazily, allowing for easy processing of huge data sets. More on this later.
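    As a small illustration of the second approach, the sketch below walks a parsed tree as a sequence with xml-seq and counts elements by tag. The tag-counts name is hypothetical; note that clojure.xml builds the whole tree up front, so only the walk here is lazy:

```clojure
(ns example.tag-counts
  (:require [clojure.xml :as xml]))

(defn parse-str [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn tag-counts
  "Map of tag keyword -> number of elements with that tag. xml-seq also
  yields text nodes; (:tag some-string) is nil, and keep drops the nils."
  [root]
  (frequencies (keep :tag (xml-seq root))))

(tag-counts (parse-str "<top><mid><bot/><bot/></mid><mid/></top>"))
;; => {:top 1, :mid 2, :bot 2}
```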

    Parsing XML as structured data

    Clojure comes with a built-in XML parser, clojure.xml, which can parse streams, files, or URIs into nested maps. It has no direct way to parse a string of XML (a String argument is treated as a URI), but you can wrap the string in a stream and parse that:

    (require 'clojure.xml)

    (defn parse [s]
       (clojure.xml/parse
         (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))
    

    Given XML like this, held in a string named data:

    <top>
    Baby, I'm the top
      <mid>
        <bot foo="bar">
          I'm the bottom!
        </bot>
      </mid>
    </top>
    

    calling (parse data) returns nested maps representing the data (clojure.xml keeps the whitespace around each text node; it is trimmed here for readability):

    {:tag :top,
     :attrs nil,
     :content [
       "Baby, I'm the top"
        {:tag :mid, 
         :attrs nil, 
         :content [
            {:tag :bot,
             :attrs {:foo "bar"},
             :content ["I'm the bottom!"]}]}]}
    

    Once you have nested maps in Clojure, you have a huge number of ways to manipulate the data using only core functions. For example, you can get the text of the bot element with:

    (first (:content (first (:content (second (:content (parse data)))))))
    
    => "I'm the bottom!"
    

    Or using the ->> macro:

    (->> (parse data)
         :content
         second
         :content
         first
         :content
         first)
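    Because clojure.xml stores :content as vectors, get-in with numeric indices is another way to reach the same node. A self-contained sketch, assuming the sample above is held in data; bottom-text and bot-attr are hypothetical names, and the trim is needed because clojure.xml keeps the whitespace around text:

```clojure
(ns example.navigate
  (:require [clojure.xml :as xml]
            [clojure.string :as str]))

(defn parse [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def data
  "<top>
Baby, I'm the top
  <mid>
    <bot foo=\"bar\">
      I'm the bottom!
    </bot>
  </mid>
</top>")

;; [:content 1] is <mid> (index 0 is the "Baby, I'm the top" text),
;; then <mid>'s first child <bot>, then <bot>'s text.
(defn bottom-text [root]
  (str/trim (get-in root [:content 1 :content 0 :content 0])))

(defn bot-attr [root]
  (get-in root [:content 1 :content 0 :attrs :foo]))

(bottom-text (parse data)) ;; => "I'm the bottom!"
(bot-attr (parse data))    ;; => "bar"
```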
    
    Answered on February 24, 2019.

    parse

    function

    Usage: (parse s)
           (parse s startparse)
    
    Answered on February 24, 2019.

