I could not get GNU gprof to work on my Mac, so I wrote my own profiler for C++.
It prints a neatly-indented time tree for every function's entry and exit.
Example.
Given sample code like this:
#include <bits/stdc++.h>
using namespace std;
const int N = 2e8 + 5;
bool table[N];
void init() {
    for (size_t i = 0; i < N; ++i) {
        table[i] = true;
    }
}
void sieve() {
    init();
    table[0] = table[1] = false;
    for (size_t div = 2; div * div <= N; div++) {
        if (table[div]) {
            for (size_t mult = div * div; mult < N; mult += div) {
                table[mult] = false;
            }
        }
    }
}
bool isComp[N];
int primes[N / 10], cnt = 0;
void clear() {
    cnt = 0;
    for (int i = 0; i < N; ++i) {
        isComp[i] = false;
    }
}
void linear_sieve() {
    clear();
    for (int i = 2; i < N; ++i) {
        if (!isComp[i])
            primes[++cnt] = i;
        for (int j = 1; primes[j] * i < N; ++j) {
            isComp[primes[j] * i] = true;
            if (i % primes[j] == 0)
                break;
        }
    }
}
void solve() {
    sieve();
    linear_sieve();
}
int main() {
    int tt = 2;
    while (tt--)
        solve();
}
It produces this:
>> main
  >> solve()
    >> sieve()
      >> init()
      << init(): 28.065 ms
    << sieve(): 905.288 ms
    >> linear_sieve()
      >> clear()
      << clear(): 15.414 ms
    << linear_sieve(): 453.257 ms
  << solve(): 1.359 s
  >> solve()
    >> sieve()
      >> init()
      << init(): 4.098 ms
    << sieve(): 849.913 ms
    >> linear_sieve()
      >> clear()
      << clear(): 6.121 ms
    << linear_sieve(): 440.210 ms
  << solve(): 1.290 s
<< main: 2.649 s
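To make the shape of that output concrete, here is a minimal sketch of how such a tree could be printed. It is not the code from the repository: `format_duration` and `CallTree` are hypothetical names, the two-space indent is an assumption, and the ms/s switch at one second is only inferred from the sample output above.

```cpp
#include <chrono>
#include <cstdio>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: format nanoseconds as "28.065 ms" or "1.359 s".
// Integer arithmetic is used so the rounding is predictable.
std::string format_duration(long long ns) {
    char buf[64];
    long long us = (ns + 500) / 1000;            // round to microseconds
    if (us >= 1000000) {                         // one second or more
        long long ms = (ns + 500000) / 1000000;  // round to milliseconds
        std::snprintf(buf, sizeof buf, "%lld.%03lld s", ms / 1000, ms % 1000);
    } else {
        std::snprintf(buf, sizeof buf, "%lld.%03lld ms", us / 1000, us % 1000);
    }
    return buf;
}

// Hypothetical tree printer: keeps one frame per active call and
// indents each line by the current nesting depth.
class CallTree {
    using Clock = std::chrono::steady_clock;
    struct Frame {
        std::string name;
        Clock::time_point start;
    };
    std::vector<Frame> stack_;
    std::ostream& out_;

public:
    explicit CallTree(std::ostream& out) : out_(out) {}

    void enter(std::string name) {
        out_ << std::string(2 * stack_.size(), ' ') << ">> " << name << '\n';
        stack_.push_back({std::move(name), Clock::now()});
    }

    void leave() {
        if (stack_.empty()) return;  // unmatched exit: ignore it
        Frame f = std::move(stack_.back());
        stack_.pop_back();
        long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                           Clock::now() - f.start).count();
        out_ << std::string(2 * stack_.size(), ' ') << "<< " << f.name
             << ": " << format_duration(ns) << '\n';
    }
};
```

Constructing `CallTree tree(std::cerr);` and calling `tree.enter("solve()")` and `tree.leave()` around a call would print one `>>` / `<<` pair in the style shown above. A `steady_clock` is used here because it is monotonic, which suits interval timing.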
I’ve pushed the code and usage details to GitHub.
How to Run?
Command
# The exclude list below is only an example; adjust it to your setup.
g++ -std=c++23 -O2 -g -Wall \
    -finstrument-functions \
    -finstrument-functions-exclude-file-list=/bits/stl,debug.h \
    solution.cpp profiler.cpp -o prof
./prof < in.txt > out.txt # profile appears on stderr
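For readers unfamiliar with `-finstrument-functions`: with that flag GCC inserts a call to `__cyg_profile_func_enter` at the start of every instrumented function and a call to `__cyg_profile_func_exit` before it returns. The sketch below shows roughly what a pair of such hooks might look like. It is not the actual `profiler.cpp`: it prints raw addresses instead of names, and `g_depth`, `g_in_hook` and `NO_INSTRUMENT` are hypothetical names introduced for illustration.

```cpp
#include <cstdio>

// The hooks must not be instrumented themselves, otherwise each hook
// would call itself on entry and recurse forever.
#define NO_INSTRUMENT __attribute__((no_instrument_function))

static thread_local int  g_depth   = 0;      // current nesting level
static thread_local bool g_in_hook = false;  // guard against re-entry

extern "C" {

// Called by instrumented code on function entry.
NO_INSTRUMENT void __cyg_profile_func_enter(void* fn, void* call_site) {
    (void)call_site;
    if (g_in_hook) return;  // something called from a hook was instrumented
    g_in_hook = true;
    std::fprintf(stderr, "%*s>> %p\n", 2 * g_depth, "", fn);
    ++g_depth;
    g_in_hook = false;
}

// Called by instrumented code just before the function returns.
NO_INSTRUMENT void __cyg_profile_func_exit(void* fn, void* call_site) {
    (void)call_site;
    if (g_in_hook) return;
    g_in_hook = true;
    if (g_depth > 0) --g_depth;
    std::fprintf(stderr, "%*s<< %p\n", 2 * g_depth, "", fn);
    g_in_hook = false;
}

}  // extern "C"
```

The re-entry guard is one possible way to stay safe when a helper called from inside a hook turns out to be instrumented as well. The exclude list in the command above addresses the same problem at compile time.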
At a glance
- Zero code changes inside your solution.
- Skips profiling of all std:: calls. (This was super tricky to achieve.)
- Overhead ≈ 50–100 ns per call.
- POSIX only (relies on the dladdr library function); won't work on Windows.
- Tested on GCC 14.
- Multithreaded code or recursion will be hard to analyse.
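Since the hooks only receive raw addresses, the `dladdr` dependency is presumably what turns them into names. Below is a minimal sketch of that lookup, assuming the mangled symbol is then passed through `abi::__cxa_demangle`. The helper names `demangle` and `symbol_name` are hypothetical, and the real profiler may resolve names differently.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang C++ ABI)
#include <dlfcn.h>   // dladdr, Dl_info
#include <string>

// Hypothetical helper: turn a mangled symbol into readable form.
// Names that are not mangled C++ symbols (such as "main") come back unchanged.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) return mangled;
    std::string result(text);
    std::free(text);  // the returned buffer is malloc-allocated
    return result;
}

// Hypothetical helper: map a code address to a function name,
// falling back to the printed address when no symbol is found.
std::string symbol_name(void* addr) {
    Dl_info info;
    if (dladdr(addr, &info) != 0 && info.dli_sname != nullptr)
        return demangle(info.dli_sname);
    char buf[32];
    std::snprintf(buf, sizeof buf, "%p", addr);
    return buf;
}
```

With this approach `_Z5sievev` becomes `sieve()` while `main` stays `main`, which would be consistent with the sample output above. On Linux, `dladdr` typically finds symbols of the main executable only if they are exported to the dynamic symbol table (for example by linking with `-rdynamic`), and older glibc versions need `-ldl` at link time.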
Feedback and pull requests are welcome.
Peace!