Compare the original array (say $$$a$$$) to the sorted one (say $$$b$$$). Since we can move only one token at once, in one move we can place at max one token to its correct place. So let's try to minimise the number of moves that don't do this.
To make optimal moves, model a directed multi-edged graph. Start with only vertices ($$$1$$$ to $$$n$$$), no edges, and add a directed edge from $$$a[i]$$$ to $$$b[i]$$$ when $$$a[i]!=b[i]$$$, for each $$$i$$$ (meaning that $$$b[i]$$$ should be placed in place of $$$a[i]$$$).
We can follow an optimal scheme like this: For now let us assume there is at least one position where correct token is placed (let it be $$$p$$$). Choose an incorrectly placed token, place it on top of $$$p$$$. This creates a gap. Place the correct token for this place. This moves the gap somewhere else. Again choose a corect token for the gap and so on. In the end, place the token stacked on top of $$$p$$$ into the gap, close the gap, and all the tokens we touched are placed correctly.
In the graph, placing a token from $$$b[i]$$$ to gap at $$$a[i]$$$ would mean moving from $$$a[i]$$$ to $$$b[i]$$$ in the graph along an edge and removing the edge. Thus the problem reduces to removing all edges of the graph in minimum number of moves.
The graph is split into disjoint connected componnts. We can thus independently resolve each component optimally for minimum number of moves.
Next, we prove that a connected component is always an euler circuit.
Claim $$$1$$$: The component would have a cycle. If it doesn't, then following some path $$$a \rightarrow b \rightarrow ...$$$, we will reach a dead end, which means that the last element has no place to go, which is not true since it belongs to the array and must have a correct position to be placed at. By induction, we can keep removing cycles (by moving along them and removing edges), and so the component is basically a set of directed cycles intersecting at vertices.
Claim $$$2$$$: It will hence form an euler circuit. Let us go along a cycle. It starts at $$$v$$$, ends at $$$v$$$. Now if there is a cyce intersecting at some vertex $$$v'$$$ in the cycle, on reaching $$$v'$$$ resolve that cycle, return at $$$v'$$$, continue on the older cycle. By this inductive argument, we can cover all edges exactly once and return to $$$v$$$, making an euler circuit.
To find the precise number of moves to resolve a component involving $$$s$$$ tokens, we need to create a gap in $$$1$$$ move, place $$$s-1$$$ tokens correctly, and place the last token we moved initially. Total $$$s+1$$$ moves. To create a gap, we need to place a token from our position to some position not in the component.
Now, proper casework:
Let the number of tokens not placed correctly be $$$co$$$. Let the number of disjoint connected components be $$$x$$$.
If $$$co \lt n$$$: There is always a correctly placed position to stack our token on and create a gap. So, each component of size s will be resolved in $$$s+1$$$. So total $$$co+x$$$ moves.
If $$$x \gt 1$$$: Since there are $$$ \gt 1$$$ components, we can use a position from the other component to create gap for one component. Again $$$co+x$$$.
$$$co==n$$$ and $$$x==1$$$: It may be a single cycle, or $$$ \gt 1$$$ joint cycles. If there are $$$ \gt 1$$$ joint cycles, resolve one of the cycles, then it reduces to the cases above. Totals to $$$n+2$$$ moves. For exactly one cycle, graph is $$$1 \rightarrow 2 \rightarrow ...n \rightarrow 1$$$. So stack $$$1$$$ on $$$2$$$, resolve rest $$$n-2$$$, total $$$n-1$$$. Then resolve $$$1$$$ and $$$2$$$ which takes $$$3$$$ moves. So total again is $$$n+2$$$. Since $$$n+1$$$ is not possible in this case, and we construct a solution in $$$n+2$$$, it is optimal.
Final solution:
$$$(co \lt n)$$$ or $$$(x \gt 1)$$$ : $$$co+x$$$
$$$(co==n)$$$ and $$$(x==1)$$$ : $$$n+2$$$