Regex group capture in R with multiple capture-groups


Asked on January 11, 2019 in Regex.


  • 5 Answer(s)

    Use gsub() with capture groups in the pattern and backreferences (\\1, \\2) in the replacement. For instance:

    gsub("\\((.*?) :: (0\\.[0-9]+)\\)","\\1 \\2", "(sometext :: 0.1231313213)")
    [1] "sometext 0.1231313213"
    

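    Double the backslashes inside the quoted string (write \\( rather than \(). R's string parser consumes one level of escaping, so the regex engine then receives the single backslashes it expects.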
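    A minimal sketch of the escaping rule, using the same pattern as above. It assumes R >= 4.0.0, which added raw string literals such as r"[...]"; on older versions only the doubled-backslash form works.

```r
# Both literals produce the same pattern string.
# Raw strings r"[...]" require R >= 4.0.0.
escaped     <- "\\((.*?) :: (0\\.[0-9]+)\\)"   # every regex backslash doubled
pattern_raw <- r"[\((.*?) :: (0\.[0-9]+)\)]"    # raw string: backslashes kept as typed

identical(escaped, pattern_raw)   # TRUE

# cat() prints the single backslashes that the regex engine actually receives.
cat(escaped, "\n", sep = "")

# The replacement string is an ordinary string too, so its backreferences are doubled.
out <- gsub(pattern_raw, "\\1 \\2", "(sometext :: 0.1231313213)")
out
```

    The raw form is easier to read for long patterns, but the doubled form runs on every R version.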

    Answered on January 11, 2019.

    gsub() can also return only the capture groups, dropping the text around them.

    gsub() replaces only the part of a string that the pattern matches. Anything before or after the match is kept, and, as the help page puts it:

    (…) elements of character vectors ‘x’ which are not substituted will be returned unchanged.

    So if the text to be selected lies in the middle of a longer string, add .* before and after the capture groups. The match then covers the whole string, and only the groups are returned:

    gsub(".*\\((.*?) :: (0\\.[0-9]+)\\).*","\\1 \\2", "(sometext :: 0.1231313213)")
    [1] "sometext 0.1231313213"
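    A short sketch of the .* trick on a string with text around the pair, and of the "returned unchanged" caveat for inputs that do not match. The sample strings are made up for illustration; grepl() is used to keep only the elements that matched.

```r
x <- c("prefix (alpha :: 0.25) suffix", "no pair here")
pattern <- ".*\\((.*?) :: (0\\.[0-9]+)\\).*"

out <- gsub(pattern, "\\1 \\2", x)
# out[1] is "alpha 0.25"; out[2] did not match, so it comes back unchanged.
out

# Keep only the results for inputs that actually matched the pattern.
matched <- out[grepl(pattern, x)]
matched
```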
    
    Answered on January 11, 2019.

    In R, is it possible to extract group captures from a regular expression match? As far as I can tell, none of grep, grepl, regexpr, gregexpr, sub, or gsub returns the group captures.

    I need to extract key-value pairs from strings that are encoded thus:

    \((.*?) :: (0\.[0-9]+)\)
    Answered on May 20, 2019.
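    One base-R way to get the groups directly is regexec(), which is not in the list above. It records the position of the full match and of every capture group, and regmatches() turns those positions into substrings. The sketch below uses the question's pattern on made-up sample strings. Building a data frame from the results is just one possible way to collect the key-value pairs.

```r
x <- c("(alpha :: 0.25)", "(beta :: 0.5)", "no pair here")
pattern <- "\\((.*?) :: (0\\.[0-9]+)\\)"

# One character vector per input: c(full match, group 1, group 2).
# Inputs that do not match give character(0).
m <- regmatches(x, regexec(pattern, x))

# Keep the inputs that matched and stack their groups into a matrix.
ok <- lengths(m) == 3
kv <- do.call(rbind, m[ok])

pairs <- data.frame(key = kv[, 2],
                    value = as.numeric(kv[, 3]),
                    stringsAsFactors = FALSE)
pairs
```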
    Syntax Feature
    Any character except [\^$.|?*+() Literal character
    \ followed by any of [\^$.|?*+(){} Backslash escapes a metacharacter
    . Any character
    | Alternation
    \| Alternation
    ? Greedy quantifier
    \? Greedy quantifier
    ?? Lazy quantifier
    ?+ Possessive quantifier
    * Greedy quantifier
    *? Lazy quantifier
    *+ Possessive quantifier
    + Greedy quantifier
    \+ Greedy quantifier
    +? Lazy quantifier
    ++ Possessive quantifier
    { and } Literal curly braces
    {n} where n is an integer >= 1 Fixed quantifier
    {n,m} where n >= 0 and m >= n Greedy quantifier
    {n,} where n >= 0 Greedy quantifier
    {,m} where m >= 1 Greedy quantifier
    \{n\} where n is an integer >= 1 Fixed quantifier
    \{n,m\} where n >= 0 and m >= n Greedy quantifier
    \{n,\} where n >= 0 Greedy quantifier
    \{,m\} where m >= 1 Greedy quantifier
    {n,m}? where n >= 0 and m >= n Lazy quantifier
    {n,}? where n >= 0 Lazy quantifier
    {,m}? where m >= 1 Lazy quantifier
    {n,m}+ where n >= 0 and m >= n Possessive quantifier
    {n,}+ where n >= 0 Possessive quantifier
    ^ String anchor
    ^ Line anchor
    $ String anchor
    $ Line anchor
    \a Character escape
    \A String anchor
    \A Attempt anchor
    \b Backspace character
    \b Word boundary
    \B Backslash character
    \c XML shorthand
    \ca through \cz Control character escape
    \cA through \cZ Control character escape
    \C XML shorthand
    \B Not a word boundary
    \d Digits shorthand
    \D Non-digits shorthand
    \e Escape character
    \f Form feed character
    \g{name} Named backreference
    \g-1, \g-2, etc. Relative backreference
    \g{-1}, \g{-2}, etc. Relative backreference
    \g1 through \g99 Backreference
    \g{1} through \g{99} Backreference
    \g<name> where “name” is the name of a capturing group Named subroutine call
    \g<name> where “name” is the name of a capturing group Named backreference
    \g'name' where “name” is the name of a capturing group Named subroutine call
    \g'name' where “name” is the name of a capturing group Named backreference
    \g<0> Recursion
    \g'0' Recursion
    \g<1> where 1 is the number of a capturing group Subroutine call
    \g<1> where 1 is the number of a capturing group Backreference
    \g'1' where 1 is the number of a capturing group Subroutine call
    \g'1' where 1 is the number of a capturing group Backreference
    \g<-1> where -1 is a negative integer Relative subroutine call
    \g<-1> where -1 is a negative integer Relative backreference
    \g'-1' where -1 is a negative integer Relative subroutine call
    \g'-1' where -1 is a negative integer Relative backreference
    \g<+1> where +1 is a positive integer Forward subroutine call
    \g'+1' where +1 is a positive integer Forward subroutine call
    \G Attempt anchor
    \G Match anchor
    \h Hexadecimal digit shorthand
    \h Horizontal whitespace shorthand
    \H Non-hexadecimal digit shorthand
    \H Non-horizontal whitespace shorthand
    \i XML shorthand
    \I XML shorthand
    \k<name> Named backreference
    \k'name' Named backreference
    \k{name} Named backreference
    \k<1> through \k<99> Backreference
    \k'1' through \k'99' Backreference
    \k<-1>, \k<-2>, etc. Relative backreference
    \k'-1', \k'-2', etc. Relative backreference
    \K Keep text out of the regex match
    \l Lowercase shorthand
    \L Non-lowercase shorthand
    \m Tcl word boundary
    \M Tcl word boundary
    \n Line feed character
    \N Not a line break
    Literal CRLF, LF, or CR line break Line break
    \o{7777} where 7777 is any octal number Octal escape
    \pL where L is a Unicode category Unicode category
    \PL where L is a Unicode category Unicode category
    \p{L} where L is a Unicode category Unicode category
    \p{IsL} where L is a Unicode category Unicode category
    \p{Category} Unicode category
    \p{IsCategory} Unicode category
    \p{Script} Unicode script
    \p{IsScript} Unicode script
    \p{Block} Unicode block
    \p{InBlock} Unicode block
    \p{IsBlock} Unicode block
    \P{Property} Negated Unicode property
    \p{^Property} Negated Unicode property
    \P{^Property} Unicode property
    \Q…\E Escape sequence
    \r Carriage return character
    \R Line break
    \s Whitespace shorthand
    \S Non-whitespace shorthand
    \t Tab character
    \u Uppercase shorthand
    \uFFFF where FFFF are 4 hexadecimal digits Unicode code point
    \u{FFFF} where FFFF are 1 to 4 hexadecimal digits Unicode code point
    \U Non-uppercase shorthand
    \v Vertical tab character
    \v Vertical whitespace shorthand
    \V Non-vertical whitespace shorthand
    \w Word character shorthand
    \W Non-word character shorthand
    \xFF where FF are 2 hexadecimal digits Hexadecimal escape
    \xFFFF where FFFF are 4 hexadecimal digits Unicode code point
    \x{FFFF} where FFFF are 1 to 4 hexadecimal digits Unicode code point
    \X Unicode grapheme
    \y Tcl word boundary
    \Y Tcl word boundary
    \Z String anchor
    \z String anchor
    \0 NULL escape
    \1 through \7 Octal escape
    \1 through \9 Backreference
    \10 through \77 Octal escape
    \10 through \99 Backreference
    \100 through \377 Octal escape
    \01 through \0377 Octal escape
    \` String anchor
    \` Attempt anchor
    \' String anchor
    \< GNU word boundary
    \> GNU word boundary
    [[:<:]] POSIX word boundary
    [[:>:]] POSIX word boundary
    (regex) Capturing group
    \(regex\) Capturing group
    (?:regex) Non-capturing group
    (?<name>regex) Named capturing group
    (?'name'regex) Named capturing group
    (?#comment) Comment
    (?|regex) Branch reset group
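    Not every row of this table works in R's default (TRE) engine. A few features, such as named capturing groups (?<name>regex) and \K, need the PCRE engine selected with perl = TRUE. The sketch below shows both on made-up sample strings. With perl = TRUE, regexpr() attaches capture.start, capture.length and capture.names attributes. The small helper grp() is hypothetical and exists only for this example.

```r
x <- c("(alpha :: 0.25)", "(beta :: 0.5)")

# Named capturing groups require perl = TRUE.
pattern <- "\\((?<key>.*?) :: (?<value>0\\.[0-9]+)\\)"
r <- regexpr(pattern, x, perl = TRUE)

# One row per input, one column per capture group.
starts <- attr(r, "capture.start")
lens   <- attr(r, "capture.length")

# Hypothetical helper: extract a capture group by its name.
grp <- function(name) {
  j <- match(name, attr(r, "capture.names"))
  substring(x, starts[, j], starts[, j] + lens[, j] - 1)
}

key   <- grp("key")
value <- as.numeric(grp("value"))

# \K (keep text out of the match) is also PCRE-only:
# the reported match starts after \K, so only the number is returned.
num <- regmatches(x, regexpr(" :: \\K0\\.[0-9]+", x, perl = TRUE))
```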
    Answered on May 20, 2019.

    Character Class Syntax
    Syntax Feature
    Any character except ^-]\ Literal character
    \ (backslash) followed by any of ^-]\ Backslash escapes a metacharacter
    \ Literal backslash
    - between two tokens that each specify a single character Range
    ^ immediately after the opening [ Negated character class
    [ Literal opening bracket
    [ Nested character class
    [base-[subtract]] Character class subtraction
    [base&&[intersect]] Character class intersection
    [base&&intersect] Character class intersection
    [:alpha:] POSIX class
    [:^alpha:] Negated POSIX class
    \p{Alpha} POSIX class
    \p{IsAlpha} POSIX class
    [.span-ll.] POSIX collation sequence
    [=x=] POSIX character equivalence
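    POSIX classes such as [:alpha:] are only valid inside a bracket expression. In R, you write them with an extra pair of brackets, for example "[[:alpha:]]". The minimal sketch below uses negated bracket expressions to strip the sample string from the question down to its letters or its number. Inside the brackets, "." is a literal dot.

```r
x <- "(sometext :: 0.1231313213)"

# Remove everything that is not a letter.
letters_only <- gsub("[^[:alpha:]]", "", x)

# Remove everything that is not a digit or a literal dot.
number_only <- gsub("[^[:digit:].]", "", x)

letters_only
number_only
```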

    Answered on May 20, 2019.

