Improving Golang input reading

Правка en1, от gealbermorales, 2021-08-25 18:28:35

fmt.Scan family in Golang

Prerequisites

In order to follow this post, we will need to had installed Golang and Python in your machine

Introduction

Reading input from stdin in Golang and in any other language is not a tricky thing to do, actually is quite easy. In Go you could achieve this by simple using one of these functions:

  • fmt.Scan
  • fmt.Scanf
  • fmt.Scanln

This function is the standard that you would use for reading from stdin.

The program

On a recent post, I wrote about the issue that I was having related to Time Limit Exceeded(TLE) on my submissions written in Go. Even in a ridiculous way, given that implementation of the same algorithm in Python were accepted, while my Go implementations were not. Almost all of my submissions written in Go faced TLE, so I decided to dig a little further on the issue.

After profiling a program that receive an input in this way

1 4 1 2 3 4

Where the first number is the number of test cases, the second one is the number of elements of the array, and the next elements are those elements of the array.

I won't do anything fancy with this input, I will just read them and see how fast Go can do this using fmt scans functions. This is the code I'm using, let's see it: package main

import (
    "flag"
    "fmt"
    "log"
    "os"
    "runtime/pprof"
)

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")

func main() {
    flag.Parse()
    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            log.Fatal(err)
        }
        pprof.StartCPUProfile(f)
        defer pprof.StopCPUProfile()
    }

    var tc int
    fmt.Scan(&tc)

	for i := 0; i < tc; i++ {
        var n int
        fmt.Scan(&n)
        arr := make([]int, n)
        for j := 0; j < n; j++ {
            fmt.Scan(&arr[j])
        }
        fmt.Println("DONE TC: ", i+1)
	}
}

The first part of the program is just using "runtime/pprof" to make the profiling of the program. Now if you run this program with a small input, you won't have any issue related to performance, the problem start when we try this program with an input with large amount of elements in the array. Let's write a simple script in Python to generate an input file:

with open("in_large", "w") as f:
    f.write("1\n")
    f.write("2000000\n")
    for i in range(2000000):
        f.write("1 ")

Profiling

This will generate an input with one test case and an array of $$$2*10^6$$$ elements. Now let's profile this program, here are the steps to follow:

  1. Compile the code go build -o main
  2. Generate the input file, python app.py, where app.py contains the code above
  3. Run the compiled program against the input file generated, ./main -cpuprofile=out.prof < input_large
  4. Open profile tool, go tool pprof main out.prof

Now let's see if we could identify the bottleneck on the program. After running top10, you will see something similar to this output:

Output:

File: main
Type: cpu
Time: Aug 25, 2021 at 11:00am (CDT)
Duration: 7.12s, Total samples = 7.08s (99.38%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10 
Showing nodes accounting for 4510ms, 63.70% of 7080ms total
Dropped 38 nodes (cum <= 35.40ms)
Showing top 10 nodes out of 69
      flat  flat%   sum%        cum   cum%
    3290ms 46.47% 46.47%     3790ms 53.53%  syscall.Syscall
     270ms  3.81% 50.28%     4900ms 69.21%  fmt.(*ss).ReadRune
     160ms  2.26% 52.54%      400ms  5.65%  runtime.mallocgc
     140ms  1.98% 54.52%     4630ms 65.40%  fmt.(*readRune).ReadRune
     110ms  1.55% 56.07%     2520ms 35.59%  fmt.(*ss).consume
     110ms  1.55% 57.63%     5010ms 70.76%  fmt.(*ss).getRune
     110ms  1.55% 59.18%     4460ms 62.99%  io.ReadAtLeast
     110ms  1.55% 60.73%      280ms  3.95%  runtime.exitsyscall
     110ms  1.55% 62.29%      190ms  2.68%  runtime.reentersyscall
     100ms  1.41% 63.70%     2760ms 38.98%  fmt.(*ss).SkipSpace

Ups! That's a lot of syscalls, not good for such a simple task. All this syscalls are due to repetitive call to fmt.Scan to read the elements of the array, so what could we do to avoid all this. Let's make the same thing differently, using "bufio" package we could read these elements. The code would be like this:

package main
 
import (
    "bufio"
    "flag"
    "fmt"
    "log"
    "os"
    "runtime/pprof"
    "strconv"
    "strings"
)

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")

func main() {
    flag.Parse()
    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            log.Fatal(err)
        }
        pprof.StartCPUProfile(f)
        defer pprof.StopCPUProfile()
    }

    in := bufio.NewReader(os.Stdin)
    tc := readInt(in)

	for i := 0; i < tc; i++ {
        n := readInt(in)
        _ = readArrInt(in)
        fmt.Println("N: ", n)
        fmt.Println("DONE TC: ", i+1)
	}
}

func readInt(in *bufio.Reader) int {
	nStr, _ := in.ReadString('\n')
	nStr = strings.ReplaceAll(nStr, "\r", "")
	nStr = strings.ReplaceAll(nStr, "\n", "")
	n, _ := strconv.Atoi(nStr)
	return n
}

func readLineNumbs(in *bufio.Reader) []string {
	line, _ := in.ReadString('\n')
	line = strings.ReplaceAll(line, "\r", "")
	line = strings.ReplaceAll(line, "\n", "")
	numbs := strings.Split(line, " ")
	return numbs
}

func readArrInt(in *bufio.Reader) []int {
	numbs := readLineNumbs(in)
	arr := make([]int, len(numbs))
	for i, n := range numbs {
		val, _ := strconv.Atoi(n)
		arr[i] = val
	}
	return arr
}

func readArrInt64(in *bufio.Reader) []int64 {
	numbs := readLineNumbs(in)
	arr := make([]int64, len(numbs))
	for i, n := range numbs {
		val, _ := strconv.ParseInt(n, 10, 64)
		arr[i] = val
	}
	return arr
}

This code is a little larger because I wrote four auxiliary functions, the profiling for this program is ridiculously faster than the previous one:

File: main
Type: cpu
Time: Aug 25, 2021 at 11:03am (CDT)
Duration: 302.30ms, Total samples = 230ms (76.08%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 220ms, 95.65% of 230ms total
Showing top 10 nodes out of 40
      flat  flat%   sum%        cum   cum%
      60ms 26.09% 26.09%       60ms 26.09%  indexbytebody
      40ms 17.39% 43.48%      130ms 56.52%  strings.genSplit
      30ms 13.04% 56.52%       40ms 17.39%  runtime.scanobject
      20ms  8.70% 65.22%      170ms 73.91%  main.readArrInt
      20ms  8.70% 73.91%       20ms  8.70%  strconv.Atoi
      10ms  4.35% 78.26%       10ms  4.35%  internal/bytealg.IndexByteString
      10ms  4.35% 82.61%       10ms  4.35%  runtime.(*gcBits).bitp (inline)
      10ms  4.35% 86.96%       10ms  4.35%  runtime.memclrNoHeapPointers
      10ms  4.35% 91.30%       10ms  4.35%  runtime.osyield
      10ms  4.35% 95.65%       10ms  4.35%  runtime.spanOf (inline)

GOOD BUY syscalls! This way is more efficient that using fmt.Scan family functions.

Conclusion

If you are going to read a large amount of elements, use the functions in "bufio" package instead of "fmt". I hope this is useful for someone facing the same issue.

Теги golang, input manipulator

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en4 Английский gealbermorales 2021-08-25 18:35:51 22 (published)
en3 Английский gealbermorales 2021-08-25 18:31:15 124
en2 Английский gealbermorales 2021-08-25 18:29:16 268
en1 Английский gealbermorales 2021-08-25 18:28:35 7658 Initial revision (saved to drafts)