Saturday, April 28, 2018

Using sed Capture Groups (Linux/Mac)

This will be a short one and belongs in the TIL bin: until two days ago I did not even know about capture groups or how to use them. What I did know was how to replace a matching string or pattern in sed:

bash-3.2$ echo "As of today swallow_v=23kph at STD" | sed -e 's/swallow_v/swallow_speed/' 
As of today swallow_speed=23kph at STD
bash-3.2$ 

I also knew how to replace from a given pattern all the way to the end of the line, or from the beginning of the line up to said pattern:

bash-3.2$ echo "As of today swallow_v=23kph at STD" | sed -e 's/swallow_v=.*/swallow_speed=42/' 
As of today swallow_speed=42
bash-3.2$ echo "As of today swallow_v=23kph at STD" | sed -e 's/^.*swallow_v=/Can you believe that swallow_speed=/' 
Can you believe that swallow_speed=23kph at STD
bash-3.2$ 
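
Both commands lean on .* to soak up whatever sits next to the pattern. As a small sketch of how flexible that is (the two input strings are made up for illustration), the same substitution works whether there is nothing or a whole sentence after the equals sign:

```shell
# .* may match nothing at all...
empty=$(echo "swallow_v=" | sed -e 's/swallow_v=.*/swallow_speed=42/')
echo "$empty"   # swallow_speed=42

# ...or everything up to the end of the line.
long=$(echo "swallow_v=23kph at STD, give or take" | sed -e 's/swallow_v=.*/swallow_speed=42/')
echo "$long"    # swallow_speed=42
```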

If you are curious, the .* in the search pattern means "any run of characters here, be it zero or a lot of them". The dot (.) matches any single character and the asterisk (*) repeats it zero or more times. This is standard regular expression (regex) syntax.

Fine, but what if I want to replace everything between two patterns while leaving the second pattern alone? Tricky. You see, replacing everything between the two patterns, inclusive, is not that hard:

raub@desktop:~$ echo "As of today swallow_v=23kph at STD" | sed -e 's/swallow_v=.*k/swallow_v=25/'
As of today swallow_v=25ph at STD
raub@desktop:~$ 

But to preserve the second pattern we need to use the capture groups mentioned in the title of this article. And that makes sense, because if it is in the title I had better use it. So, we are supposed to surround the capture group pattern with parentheses, and then we can refer to whatever it matched in the replacement. If that does not make sense, I too was confused, so let's keep on using our test string:

raub@desktop:~$ echo "As of today swallow_v=23kph at STD" | sed -e 's/swallow_v=.*(k)/swallow_v=25$1/'
As of today swallow_v=23kph at STD
raub@desktop:~$ 

Er, it does not seem to have worked according to the plan. In fact, it was supposed to at least grab swallow_v=23k but as you can see it did not find the pattern. Is $1 the proper way to output the captured string? Going nowhere slowly.

After much soul searching, I found that sed's default (basic) regex syntax requires the parentheses to be escaped; the -e option has nothing to do with it. And the captured text is output using \1 instead of $1. So we try again:

raub@desktop:~$ echo "As of today swallow_v=23kph at STD" | sed -e "s/swallow_v=.*\(k\)/swallow_v=25\1/"
As of today swallow_v=25kph at STD
raub@desktop:~$ 

Much better!
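
To see why the first attempt matched nothing, here is a small sketch (the "call f(k) now" string is made up for illustration). In the default syntax a bare parenthesis is just a literal character. Both GNU sed and the BSD sed on the Mac also accept -E, which switches to extended regular expressions where bare parentheses do capture:

```shell
# In sed's default (basic) syntax a bare ( is a literal parenthesis,
# so this pattern only matches text that really contains "(k)".
literal=$(echo "call f(k) now" | sed -e 's/(k)/[K]/')
echo "$literal"    # call f[K] now

# With -E (extended syntax) bare parentheses capture; the reference is still \1.
extended=$(echo "As of today swallow_v=23kph at STD" | sed -E 's/swallow_v=.*(k)/swallow_v=25\1/')
echo "$extended"   # As of today swallow_v=25kph at STD
```

Either spelling gives the same result; which one to use is mostly a matter of how many backslashes you can stand.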

What about the Mac? Same thing (the bash-3.2$ prompt you have seen all along is the Mac; raub@desktop:~$ is the Linux box):

bash-3.2$ echo "As of today swallow_v=23kph at STD" | sed -e 's/swallow_v=.*\(k\)/swallow_v=25\1/' 
As of today swallow_v=25kph at STD
bash-3.2$ 

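I am not going to say it is perfect though. The .* is greedy, so it runs all the way to the last blank on the line and swallows the "at" along the way: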

bash-3.2$ echo "As of today swallow_v=23kph at STD" | sed -e 's/swallow_v=.*\([[:blank:]]\)/swallow_v=25\1/' 
As of today swallow_v=25 STD
bash-3.2$ echo "As of today swallow_v=23kph at    STD" | sed -e 's/swallow_v=.*\([[:blank:]]\)/swallow_v=25\1/' 
As of today swallow_v=25 STD
bash-3.2$ 
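
One possible way around that, offered as a sketch rather than the definitive fix: replace .* with a bracket expression that matches anything except a blank, so the match has to stop at the first blank after the value.

```shell
# [^[:blank:]]* matches only non-blank characters, so it cannot run past "23kph".
fixed=$(echo "As of today swallow_v=23kph at STD" | sed -e 's/swallow_v=[^[:blank:]]*\([[:blank:]]\)/swallow_v=25\1/')
echo "$fixed"    # As of today swallow_v=25 at STD

# Extra blanks further down the line are left untouched.
spaced=$(echo "As of today swallow_v=23kph at    STD" | sed -e 's/swallow_v=[^[:blank:]]*\([[:blank:]]\)/swallow_v=25\1/')
echo "$spaced"   # As of today swallow_v=25 at    STD
```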
