Parse Strings Into Go Structs

In this post I describe how I parse strings into Go structs. For demonstration purposes I use the Common Log Format written by Apache web servers.

We start by writing a regex that parses a single log entry (one line). The regex uses multiple named capture groups. This is important because we will match struct tags against the group names to identify which part of the string belongs to which struct field.
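To make the idea of named groups concrete, here is a minimal sketch of how Go exposes them. The helper `namedGroups` and the toy `key=value` pattern are hypothetical and exist only for this illustration; the `regexp` calls (`Compile`, `FindStringSubmatch`, `SubexpNames`) are standard library.

```go
package main

import (
	"fmt"
	"regexp"
)

// namedGroups (hypothetical helper) returns group name -> matched text.
// It returns a nil map when the pattern does not match the input.
func namedGroups(pattern, s string) (map[string]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil, nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		// Index 0 is the whole match; unnamed groups have an empty name.
		if i == 0 || name == "" {
			continue
		}
		out[name] = m[i]
	}
	return out, nil
}

func main() {
	// Toy pattern, not the log regex from this post.
	groups, err := namedGroups(`(?P<key>\w+)=(?P<value>\w+)`, "color=blue")
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println(groups["key"], groups["value"])
}
```

The slice returned by `SubexpNames` lines up index by index with the slice returned by `FindStringSubmatch`, which is what lets us pair a name with its matched text.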

So let us take the following sample string:

127.0.0.1 - frank [05/Oct/2020:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

We will use this regex for parsing it:

(?P<ip>^\S+)\s-\s(?P<user>\S+)\s\[(?P<datetime>\S+\s\S+)\]\s"(?P<method>[A-Z]+)\s(?P<request>[^ "]+)\s\S+"\s(?P<status>[0-9]{3})\s(?P<size>[0-9]+|-)

There are various sites on the internet that let you test the regex against the aforementioned string. But you can also take my word for it ;)
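If you would rather check the regex locally, a small Go program does the job too. This is only a sketch: the helper `submatches` is a hypothetical name, and the request character class is written as a plain space plus a double quote (`[^ "]+`), which is my assumption about the intended pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumption: the request class excludes a plain space and a double quote.
const pattern = `(?P<ip>^\S+)\s-\s(?P<user>\S+)\s\[(?P<datetime>\S+\s\S+)\]\s"(?P<method>[A-Z]+)\s(?P<request>[^ "]+)\s\S+"\s(?P<status>[0-9]{3})\s(?P<size>[0-9]+|-)`

var re = regexp.MustCompile(pattern)

// submatches returns the whole match followed by the seven groups,
// or nil when the line does not match the pattern.
func submatches(line string) []string {
	return re.FindStringSubmatch(line)
}

func main() {
	sample := `127.0.0.1 - frank [05/Oct/2020:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326`
	names := re.SubexpNames()
	for i, s := range submatches(sample) {
		fmt.Printf("%d %q: %s\n", i, names[i], s)
	}

	// A line that does not follow the format yields a nil slice.
	fmt.Println(submatches("not a log line") == nil)
}
```

Note that a non-matching line gives back `nil`, so any code that indexes into the result should check for that first.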

Now on to the Go code.

First we’re going to create a struct type to store our parsed entries in:

type logLine struct {
	IP       string `match:"ip"`
	User     string `match:"user"`
	DateTime string `match:"datetime"`
	Method   string `match:"method"`
	Request  string `match:"request"`
	Status   string `match:"status"`
	Size     string `match:"size"`
}

Notice that I have added a Go struct tag to each field. If you’re unfamiliar with struct tags, here’s some reading material from golang.org and a great DigitalOcean post. We will use these tags to identify which regex group belongs to which struct field.
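For readers who have not read tags at runtime before, here is a minimal sketch of how the reflect package exposes them. The `entry` type and the `tagsOf` helper are hypothetical names used only for this illustration; `reflect.TypeOf`, `NumField`, `Field` and `Tag.Get` are standard library calls.

```go
package main

import (
	"fmt"
	"reflect"
)

// entry is a hypothetical two-field struct used only for this illustration.
type entry struct {
	IP   string `match:"ip"`
	User string `match:"user"`
}

// tagsOf (hypothetical helper) returns field name -> value of the "match" tag.
// It expects a struct value, not a pointer.
func tagsOf(v interface{}) map[string]string {
	out := make(map[string]string)
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		// Tag.Get returns "" when the field has no "match" tag.
		out[f.Name] = f.Tag.Get("match")
	}
	return out
}

func main() {
	for name, tag := range tagsOf(entry{}) {
		fmt.Printf("%s is tagged %q\n", name, tag)
	}
}
```

The tag key (`match` here) is arbitrary. It only has to agree with the key the parsing code asks for.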

We’ll create a function that reads our log file and calls a parsing function for each line of the file:

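func readLog(path string) error {
	file, err := os.Open(path)
	if err != nil {
		return err
	}
	defer file.Close() // release the file handle when we are done

	scanner := bufio.NewScanner(file)
	scanner.Split(bufio.ScanLines)

	for scanner.Scan() {
		parsed := parseLine(scanner.Text())
		if parsed == nil {
			continue // skip lines that could not be parsed
		}
		parsed.PrintPairs()
	}

	// Report any read error the scanner ran into.
	return scanner.Err()
}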

Next up, the parsing function itself:

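func parseLine(line string) *logLine {
	matchTag := "match"
	// Compiled on every call to keep the example short.
	r := regexp.MustCompile(logLineReg)
	matches := r.FindStringSubmatch(line)
	if matches == nil {
		return nil // the line does not match the regex
	}

	ll := &logLine{}
	t := reflect.ValueOf(ll).Elem()

	for i, n := range r.SubexpNames() {
		// Index 0 is the whole match; unnamed groups have an empty name.
		if i == 0 || n == "" {
			continue
		}

		for j := 0; j < t.NumField(); j++ {
			// Copy the submatch into the field whose tag equals the group name.
			if t.Type().Field(j).Tag.Get(matchTag) == n {
				t.Field(j).SetString(matches[i])
			}
		}
	}
	return ll
}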

Let’s break down what’s happening here. We compile our previously defined regex string and match it against the current line. Then we iterate over the names of all subexpressions (our match groups), skipping index 0, which holds the whole match. For each group we use Go’s reflect package to iterate over the fields of the logLine struct. If a field’s struct tag equals the group’s name, we set that field to the parsed value.
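The nested loop is easy to follow but repeats the tag lookup for every line. As a variation, and not what the listing in this post does, the tag-to-field mapping can be built once up front. The sketch below assumes the request class is `[^ "]+`; the names `lineRe`, `fieldByTag` and `parseLineIndexed` are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
)

type logLine struct {
	IP       string `match:"ip"`
	User     string `match:"user"`
	DateTime string `match:"datetime"`
	Method   string `match:"method"`
	Request  string `match:"request"`
	Status   string `match:"status"`
	Size     string `match:"size"`
}

// Compiled once at package level instead of once per line.
var lineRe = regexp.MustCompile(`(?P<ip>^\S+)\s-\s(?P<user>\S+)\s\[(?P<datetime>\S+\s\S+)\]\s"(?P<method>[A-Z]+)\s(?P<request>[^ "]+)\s\S+"\s(?P<status>[0-9]{3})\s(?P<size>[0-9]+|-)`)

// fieldByTag maps each "match" tag value to the index of its struct field.
var fieldByTag = func() map[string]int {
	idx := make(map[string]int)
	t := reflect.TypeOf(logLine{})
	for i := 0; i < t.NumField(); i++ {
		if tag, ok := t.Field(i).Tag.Lookup("match"); ok {
			idx[tag] = i
		}
	}
	return idx
}()

// parseLineIndexed reports false when the line does not match the regex.
func parseLineIndexed(line string) (*logLine, bool) {
	m := lineRe.FindStringSubmatch(line)
	if m == nil {
		return nil, false
	}
	ll := &logLine{}
	v := reflect.ValueOf(ll).Elem()
	for i, name := range lineRe.SubexpNames() {
		if i == 0 || name == "" {
			continue // whole match or unnamed group
		}
		if j, ok := fieldByTag[name]; ok {
			v.Field(j).SetString(m[i])
		}
	}
	return ll, true
}

func main() {
	ll, ok := parseLineIndexed(`127.0.0.1 - frank [05/Oct/2020:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326`)
	fmt.Println(ok, ll.IP, ll.Request)
}
```

Returning a boolean alongside the pointer is one way to signal a non-matching line; returning an error would work just as well.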

If we put everything together it should look like this:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"reflect"
	"regexp"
)

var logLineReg = `(?P<ip>^\S+)\s-\s(?P<user>\S+)\s\[(?P<datetime>\S+\s\S+)\]\s"(?P<method>[A-Z]+)\s(?P<request>[^ "]+)\s\S+"\s(?P<status>[0-9]{3})\s(?P<size>[0-9]+|-)`

type logLine struct {
	IP       string `match:"ip"`
	User     string `match:"user"`
	DateTime string `match:"datetime"`
	Method   string `match:"method"`
	Request  string `match:"request"`
	Status   string `match:"status"`
	Size     string `match:"size"`
}

func (ll *logLine) PrintPairs() {
	fmt.Printf("ip -> %s\nuser -> %s\ndatetime -> %s\nmethod -> %s\nrequest -> %s\nstatus -> %s\nsize -> %s\n",
		ll.IP, ll.User, ll.DateTime, ll.Method, ll.Request, ll.Status, ll.Size)
}

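func parseLine(line string) *logLine {
	matchTag := "match"
	// Compiled on every call to keep the example short.
	r := regexp.MustCompile(logLineReg)
	matches := r.FindStringSubmatch(line)
	if matches == nil {
		return nil // the line does not match the regex
	}

	ll := &logLine{}
	t := reflect.ValueOf(ll).Elem()

	for i, n := range r.SubexpNames() {
		// Index 0 is the whole match; unnamed groups have an empty name.
		if i == 0 || n == "" {
			continue
		}

		for j := 0; j < t.NumField(); j++ {
			// Copy the submatch into the field whose tag equals the group name.
			if t.Type().Field(j).Tag.Get(matchTag) == n {
				t.Field(j).SetString(matches[i])
			}
		}
	}
	return ll
}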

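func readLog(path string) error {
	file, err := os.Open(path)
	if err != nil {
		return err
	}
	defer file.Close() // release the file handle when we are done

	scanner := bufio.NewScanner(file)
	scanner.Split(bufio.ScanLines)

	for scanner.Scan() {
		parsed := parseLine(scanner.Text())
		if parsed == nil {
			continue // skip lines that could not be parsed
		}
		parsed.PrintPairs()
	}

	// Report any read error the scanner ran into.
	return scanner.Err()
}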

func main() {
	err := readLog("sample.log")
	if err != nil {
		log.Fatalln(err)
	}
}

Output:

$ go run main.go
ip -> 127.0.0.1
user -> frank
datetime -> 05/Oct/2020:13:55:36 -0700
method -> GET
request -> /apache_pb.gif
status -> 200
size -> 2326

It turns out struct tags can be used for a lot more than just parsing JSON objects :-)