Ideas
Looking at the $$$W \leq 1$$$ subbugaboo, we see that it is helpful to colour nodes based on the XOR of edges on the path to $$$0$$$. Let this value be $$$d_u$$$. In the $$$W \leq 1$$$ subbugaboo, the answer is $$$0$$$ if and only if there exists a common ancestor $$$L$$$ of $$$U$$$ and $$$V$$$, with $$$d_L = d_U = d_V$$$.
We can generalise this to compute $$$\lfloor \log_2 \text{ans} \rfloor$$$. We note that if $$$\text{ans} \lt 2^k$$$, we can only jump between nodes $$$u$$$ and $$$v$$$ with $$$d_u \oplus d_v \lt 2^k \implies (d_u \gg k) = (d_v \gg k)$$$. Thus, there must be a common ancestor $$$L$$$ of $$$U$$$ and $$$V$$$, with $$$(d_L \gg k) = (d_U \gg k) = (d_V \gg k)$$$. Let us compute $$$par(u,k)$$$ (which we will discuss later), the highest ancestor of $$$u$$$ such that $$$d_{par(u,k)} \gg k = d_u \gg k$$$. Then, this reduces to checking if $$$par(U,k)=par(V,k)$$$.
Suppose we have found out that $$$2^k \leq \text{ans} \lt 2^{k+1}$$$. Any jumps with $$$d_u \oplus d_v \lt 2^k$$$ are now "free". Using only "free jumps", we can jump from $$$U$$$ to any node $$$u$$$ such that $$$(d_u \gg k) = (d_U \gg k)$$$ and $$$u$$$ is a descendant of $$$par(U,k)$$$ (it is easy to check these two conditions are both necessary and sufficient). Call the set of such nodes $$$S_U$$$, and define $$$S_V$$$ similarly. Also, if $$$(d_u \gg (k+1)) \neq (d_v \gg (k+1))$$$, jumping from $$$u$$$ to $$$v$$$ is bad since $$$d_u \oplus d_v \geq 2^{k+1}$$$.
There are two cases here.
Case 1: $$$par(U,k)$$$ is an ancestor of $$$par(V,k)$$$.
In this case, $$$(d_V \gg k) \neq (d_U \gg k)$$$ (since if they were equal, $$$par(V,k)$$$ would not be the highest parent). In this case, to get from $$$U$$$ to $$$V$$$, we must jump from $$$S_U$$$ to $$$S_V$$$ at some point in time. Suppose we jump from $$$u \in S_U$$$ to $$$v \in S_V$$$. Then we need to minimise $$$d_u \oplus d_v$$$ over all such $$$u$$$ and $$$v$$$.
Let $$$x$$$ be the lower node. Then, we want to minimise $$$d_x \oplus d_y$$$, given that $$$y$$$ is an ancestor of $$$x$$$, and the most significant bit $$$d_x$$$ and $$$d_y$$$ differ is the $$$k$$$-th bit. Let this be $$$f(x,k)$$$. We thus have to find the minimum value of $$$f(x,k)$$$, where $$$x$$$ is in the subtree of $$$par(V,k)$$$ and $$$(d_x \gg (k+1)) = (d_{par(V,k)} \gg (k+1))$$$. Call this $$$g(par(V,k),k+1)$$$. We will discuss how to compute $$$f$$$ and $$$g$$$ later.
Case 2: $$$par(U,k)$$$ and $$$par(V,k)$$$ are not ancestors of each other.
In this case, we first jump out of the subtree of $$$par(U,k)$$$, for a cost of $$$g(par(U,k),k+1)$$$, then jump into the subtree of $$$par(V,k)$$$, for a cost of $$$g(par(V,k),k+1)$$$.
Implementation Notes
$$$par$$$ and $$$f$$$ can be computed easily with a trie. DFS from node $$$0$$$, and add $$$d_x$$$ to the trie when we enter it, then remove $$$d_x$$$ when we leave it.
I avoided implementation messiness by using an array segtree rather than pointers. I had an array of size $$$2^{21}$$$, and the left and right children of node $$$x$$$ are $$$2x$$$ and $$$2x+1$$$. The root is node $$$1$$$, and nodes $$$[2^{20},2^{21})$$$ are the leaves.
Note that $$$g(x,k)$$$ is the minimum value of $$$f(y,k-1)$$$ where $$$(d_y \gg k) = (d_x \gg k)$$$. We store a map, where the key is $$$(k, d_y \gg k)$$$, and the value is the minimum $$$f(y,k-1)$$$. Two maps can be merged using small-to-large merging to achieve a complexity of $$$N \log N \log A + Q \log \log A$$$. (Since I used regular rather than unordered maps, and found $$$k$$$ by linear search rather than binary search, my code's complexity was $$$N \log^2 N \log A + Q \log A$$$).
In my implementation, I lumped $$$(k, d_y \gg k)$$$ into a single integer.
Implementation#include "teleport.h"
#include <bits/stdc++.h>
using namespace std;
int par[50'005][21];
map<int,int> mpScore[50'005];
int minScore[50'005][21];
int dist[50'005];
int pre[50'005], post[50'005];
int seg[1<<21];
void init(int N, std::vector<int> P, std::vector<int> W) {
fill(seg,seg+(1<<21),50005);
vector<vector<pair<int,int>>> child(N);
for(int i=1;i<N;i++)
child[P[i]].push_back({i,W[i]});
int cnt = 0;
auto dfs = [&child,&cnt](auto &dfs, int x, int p) -> void {
pre[x] = cnt++;
int v = dist[x] + (1<<20);
for(int i=v;i;i>>=1)
seg[i] = min(seg[i],x);
for(int i=0;i<21;i++)
par[x][i] = seg[v>>i];
for(int i=0;i<20;i++) {
int cur = (v >> i) ^ 1;
int j = i;
if(seg[cur] == 50005)
continue;
while(cur < (1<<20)) {
j--;
int pref = (cur << 1) | ((v>>j) & 1);
if(seg[pref] == 50005)
cur = pref ^ 1;
else
cur = pref;
}
cur ^= v;
int k = v >> (i + 1);
if(mpScore[x].count(k))
mpScore[x][k] = min(mpScore[x][k], cur);
else
mpScore[x][k] = cur;
}
for(auto [i, w]: child[x])
if(i != p) {
dist[i] = dist[x] ^ w;
dfs(dfs, i, x);
if(mpScore[i].size() > mpScore[x].size())
swap(mpScore[x], mpScore[i]);
for(auto [k, v]: mpScore[i]) {
if(mpScore[x].count(k))
mpScore[x][k] = min(mpScore[x][k], v);
else
mpScore[x][k] = v;
}
}
for(int i=1;i<21;i++)
minScore[x][i] = (mpScore[x].count(v>>i)?mpScore[x][v>>i]:50005);
for(int i=v;i;i>>=1)
if(seg[i] == x)
seg[i] = 50005;
post[x] = cnt;
};
dfs(dfs, 0, -1);
}
int minimum_energy(int U, int V) {
int layer;
for(layer=0;layer<21;layer++) {
if(par[U][layer] == par[V][layer])
break;
}
if(layer == 0)
return 0;
return max(minScore[par[U][layer-1]][layer],minScore[par[V][layer-1]][layer]);
}