1. **Abridged Problem Statement**

We consider complete simple undirected graphs on \(N\) labeled vertices, where each edge is colored with one of \(M\) colors. Two such colored graphs are considered *isomorphic* if one can be turned into the other by renumbering the vertices (i.e., by a permutation of the vertices that preserves edge colors).  

Given integers \(N\) (up to 53), \(M\) (up to 1000), and a prime \(P\), compute the number of non‑isomorphic \(M\)-colored complete graphs on \(N\) vertices, modulo \(P\).


---

2. **Detailed Editorial**

### 2.1. What is being counted?

We have:

- Vertices: \(1,2,\dots,N\).
- Edges: all unordered pairs \(\{i,j\}\), \(i < j\). So we have \(\binom{N}{2}\) edges.
- Each edge independently gets one of \(M\) colors.

If vertices are labeled, the number of colorings is:
\[
M^{\binom{N}{2}}.
\]

But we want graphs up to *vertex relabeling*. The group that acts on the vertices is the symmetric group \(S_N\): every permutation \(\pi\in S_N\) relabels vertices, and thus permutes edges and their colors. Two colorings are in the same orbit (under \(S_N\)) iff the corresponding graphs are isomorphic.

We are counting the number of orbits of this group action.

This is a classic application of **Burnside’s Lemma** (a.k.a. the Cauchy–Frobenius lemma).
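For very small \(N\) the orbits can be counted literally, which gives reference values to test any faster method against. The following is a minimal sketch, not part of the original solution; the function name `count_by_brute_force` is a hypothetical helper, and it is only practical for \(N \le 4\) or so.

```python
from itertools import combinations, permutations, product

def count_by_brute_force(n, m):
    """Count m-colored complete graphs on n vertices up to vertex relabeling."""
    edges = list(combinations(range(n), 2))
    perms = list(permutations(range(n)))
    canonical_forms = set()
    for colors in product(range(m), repeat=len(edges)):
        coloring = dict(zip(edges, colors))
        # The orbit of a coloring is the set of all its relabelings;
        # the lexicographically smallest one serves as a canonical form.
        canon = min(
            tuple(coloring[tuple(sorted((pi[u], pi[v])))] for u, v in edges)
            for pi in perms
        )
        canonical_forms.add(canon)
    return len(canonical_forms)

print(count_by_brute_force(3, 2))
print(count_by_brute_force(4, 2))
```

With two colors this is the classical count of unlabeled simple graphs (one color meaning "edge present", the other "edge absent").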

---

### 2.2. Burnside’s Lemma

Burnside’s lemma:

\[
\#\,\text{orbits} \;=\; \frac{1}{|G|} \sum_{g\in G} \text{Fix}(g),
\]
where

- \(G = S_N\) is the group of permutations of vertices,
- \(|G| = N!\),
- \(\text{Fix}(g)\) is the number of colorings fixed (left unchanged) by the permutation \(g\).

Here, a coloring is fixed by a permutation \(g\) if for every edge \(\{i,j\}\), the color of edge \(\{i,j\}\) equals the color of edge \(\{g(i), g(j)\}\).

So we must:

1. For each permutation \(g\in S_N\),
2. Determine how many edge-orbits the permutation induces (i.e., how it permutes edges),
3. For each orbit, all edges in that orbit must have the same color to be fixed, so:
   \[
   \text{Fix}(g) = M^{\#\text{edge-orbits under } g}.
   \]
4. Sum \(\text{Fix}(g)\) over all permutations and divide by \(N!\).

Directly iterating over all \(N!\) permutations is infeasible for \(N\) up to 53 (\(53! \approx 4.3 \times 10^{69}\)). We need to group permutations by their *cycle structure*.
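The four steps above can be sketched directly for small \(N\). This is an illustrative sketch under the assumption that \(N\) is tiny (it loops over all \(N!\) permutations); the names `edge_orbits` and `burnside_direct` are hypothetical and do not appear in the solutions below.

```python
from itertools import combinations, permutations
from math import factorial

def edge_orbits(perm):
    """Number of orbits of the vertex permutation `perm` acting on edges."""
    n = len(perm)
    seen = set()
    orbits = 0
    for start in combinations(range(n), 2):
        if start in seen:
            continue
        orbits += 1
        cur = start
        # Follow the edge under repeated application of perm until it closes.
        while cur not in seen:
            seen.add(cur)
            a, b = perm[cur[0]], perm[cur[1]]
            cur = (a, b) if a < b else (b, a)
    return orbits

def burnside_direct(n, m):
    """Apply Burnside's lemma literally: average M^(edge orbits) over S_n."""
    total = sum(m ** edge_orbits(p) for p in permutations(range(n)))
    assert total % factorial(n) == 0  # the average is always an integer
    return total // factorial(n)

print(burnside_direct(3, 2), burnside_direct(4, 2))
```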

---

### 2.3. Cycle structure and partitions of \(N\)

Every permutation in \(S_N\) decomposes uniquely into disjoint cycles on the vertices. Only the **cycle lengths** matter for our counting, not which specific vertices are in which cycle.

A cycle structure is thus a **partition** of \(N\):
\[
N = c_1 + c_2 + \dots + c_k, \quad c_1 \ge c_2 \ge \dots \ge c_k \ge 1.
\]
Each \(c_i\) is a cycle length.

For a given cycle type \((c_1, c_2, \dots, c_k)\):

- **Number of permutations** with that cycle type:
  \[
  \text{count} = \frac{N!}{\left(\prod_{i=1}^{k} c_i\right)\cdot\left(\prod_{\ell} (\text{mult}(\ell)!)\right)},
  \]
  where \(\text{mult}(\ell)\) is the number of cycles of length \(\ell\) in the partition.

  Explanation:
  - First, if you ignore repeated lengths, the number of permutations with cycles of lengths \(c_i\) is:
    \[
    \frac{N!}{\prod_i c_i}.
    \]
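    (Arrange the \(N\) vertices in a row in \(N!\) ways and cut the row into consecutive blocks of lengths \(c_1, \dots, c_k\); each block, read cyclically, is a cycle. Rotating a block of length \(c\) gives \(c\) different arrangements of the same cycle, so you divide by \(c\) for each cycle.)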
  - If some lengths repeat, you can permute identical-length cycles among themselves without changing the cycle type, so you must divide further by \(\text{mult}(\ell)!\) for each length \(\ell\).

- **Number of edge-orbits** under such a permutation:
  - Let the cycle lengths be \(c_1, c_2, \dots, c_k\) (not necessarily sorted).
  - Edges are pairs of vertices; they can be:
    1. Between two **distinct cycles** of lengths \(c_i\) and \(c_j\).
    2. Within the **same cycle** of length \(c_i\).

We must count how many orbits of edges there are.
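The formula for the number of permutations with a given cycle type can be checked by enumeration for small \(N\). A minimal sketch, assuming \(N\) is small enough to list all permutations; `cycle_type`, `count_by_formula` and `count_by_enumeration` are hypothetical helper names.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

def cycle_type(perm):
    """Cycle lengths of `perm`, sorted in non-increasing order."""
    n = len(perm)
    seen = [False] * n
    lengths = []
    for start in range(n):
        if seen[start]:
            continue
        length, v = 0, start
        while not seen[v]:
            seen[v] = True
            v = perm[v]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def count_by_formula(n, ctype):
    """N! / (prod of cycle lengths * prod of mult(l)!)."""
    mult = Counter(ctype)
    denom = prod(ctype) * prod(factorial(k) for k in mult.values())
    return factorial(n) // denom

def count_by_enumeration(n):
    """Tally the cycle type of every permutation in S_n."""
    return Counter(cycle_type(p) for p in permutations(range(n)))

observed = count_by_enumeration(5)
for ctype, cnt in sorted(observed.items(), reverse=True):
    print(ctype, cnt, count_by_formula(5, ctype))
```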

---

### 2.4. Edge orbits between two different cycles (\(i \neq j\))

Consider two cycles of lengths \(c\) and \(d\). Label vertices in those cycles:

- First cycle: \(a_0, a_1, \dots, a_{c-1}\) where permutation sends \(a_t\) to \(a_{t+1 \bmod c}\).
- Second cycle: \(b_0, b_1, \dots, b_{d-1}\) similarly.

Edges between these cycles are all pairs \(\{a_x, b_y\}\).

Under the permutation:

- \(g^1\) sends edge \(\{a_x, b_y\}\) to \(\{a_{x+1}, b_{y+1}\}\),
- \(g^2\) sends it to \(\{a_{x+2}, b_{y+2}\}\),
- etc.

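Both indices advance by one at each step, the first taken modulo \(c\) and the second modulo \(d\). The edge returns to itself after \(t\) steps exactly when \(t\) is a multiple of both \(c\) and \(d\) (the two endpoints lie in different cycles, so they can never swap places). Hence every such orbit has length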
\[
L = \text{lcm}(c, d).
\]

The total number of such edges is \(c \cdot d\). Each orbit has length \(L\), so the number of orbits is:
\[
\frac{c\cdot d}{\text{lcm}(c,d)} = \gcd(c, d).
\]

So:
\[
\text{edge-orbits between cycles of lengths } c_i, c_j = \gcd(c_i, c_j).
\]

Summing over all unordered pairs \(i<j\):
\[
E_{\text{between}} = \sum_{1 \le i < j \le k} \gcd(c_i, c_j).
\]
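The \(\gcd\) count can be confirmed by simulating the \((+1, +1)\) walk on index pairs. A small sketch, with the hypothetical helper `orbits_between` standing in for two cycles of lengths `c` and `d`:

```python
from math import gcd

def orbits_between(c, d):
    """Count orbits of edges {a_x, b_y} under (x, y) -> (x+1 mod c, y+1 mod d)."""
    seen = set()
    orbits = 0
    for x in range(c):
        for y in range(d):
            if (x, y) in seen:
                continue
            orbits += 1
            cx, cy = x, y
            # Walk the orbit until it returns to an already visited pair.
            while (cx, cy) not in seen:
                seen.add((cx, cy))
                cx, cy = (cx + 1) % c, (cy + 1) % d
    return orbits

for c, d in [(2, 3), (4, 6), (5, 5), (6, 9)]:
    print(c, d, orbits_between(c, d), gcd(c, d))
```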

---

### 2.5. Edge orbits within the same cycle (\(c_i\))

Now consider edges among vertices in a single cycle of length \(c\). There are \(\binom{c}{2}\) edges in that cycle.

Again label vertices \(v_0, v_1, \dots, v_{c-1}\). Any edge is a pair \(\{v_x, v_y\}\), \(x\neq y\). Under permutation, we go:

\[
\{v_x, v_y\} \to \{v_{x+1}, v_{y+1}\} \to \{v_{x+2}, v_{y+2}\} \to \dots
\]

The key observation, a standard fact that the code relies on, is that edges can be classified by the **distance** between their endpoints on the cycle. Define distance modulo \(c\):
\[
d = (y - x) \bmod c,
\]
where \(1 \le d \le c-1\). Since the undirected edge \(\{v_x, v_y\}\) is the same as \(\{v_y, v_x\}\), distances \(d\) and \(c-d\) describe the same undirected "type" of edge (going clockwise or counterclockwise).

So distinct types of distances are:
- If \(c\) is odd: \(\frac{c-1}{2}\) types.
- If \(c\) is even: \(\frac{c}{2}\) types (because there’s also the distance \(c/2\) which is self-opposite).

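Rotating by one step preserves the undirected distance, and starting from any edge of a given distance the rotation eventually reaches every other edge of that distance. So all edges with the same undirected distance form **one orbit**, and the number of intra-cycle edge-orbits in a cycle of length \(c\) is: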

\[
\left\lfloor \frac{c}{2} \right\rfloor,
\]
which is exactly what the integer division `c / 2` computes in the code:

- For odd \(c\): \((c-1)/2\) orbits, which equals `c / 2` in integer division.
- For even \(c\): \(c/2\) orbits, which also equals `c / 2`.

So:
\[
E_{\text{within}}(c_i) = \left\lfloor \frac{c_i}{2} \right\rfloor.
\]

Summing over all cycles:
\[
E_{\text{within}} = \sum_{i=1}^{k} \left\lfloor \frac{c_i}{2} \right\rfloor.
\]
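The \(\lfloor c/2 \rfloor\) count can likewise be confirmed by rotating every edge of a single cycle. A small sketch; `orbits_within` is a hypothetical helper name.

```python
def orbits_within(c):
    """Count orbits of edges {v_x, v_y} inside one cycle of length c."""
    seen = set()
    orbits = 0
    for x in range(c):
        for y in range(x + 1, c):
            if (x, y) in seen:
                continue
            orbits += 1
            cur = (x, y)
            # Rotate both endpoints by one position until the orbit closes.
            while cur not in seen:
                seen.add(cur)
                a, b = (cur[0] + 1) % c, (cur[1] + 1) % c
                cur = (a, b) if a < b else (b, a)
    return orbits

for c in range(1, 9):
    print(c, orbits_within(c), c // 2)
```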

---

### 2.6. Total number of edge-orbits for a cycle partition

Given a partition \((c_1, \dots, c_k)\), define:

\[
\text{ex} = E_{\text{within}} + E_{\text{between}} 
= \sum_{i} \left\lfloor\frac{c_i}{2}\right\rfloor + \sum_{i < j} \gcd(c_i, c_j).
\]

Then the number of colorings fixed by a permutation with that cycle type is:
\[
\text{Fix}(\text{partition}) = M^{\text{ex}}.
\]

The total contribution of this cycle type to the Burnside sum is:
\[
\text{Fix} \times \#\{\text{permutations of this type}\}.
\]

So if we let:

- \(f(\mathbf{c}) = \text{Fix} = M^{\text{ex}(\mathbf{c})}\),
- \(k(\mathbf{c}) = \#\{\text{permutations with cycle type } \mathbf{c}\}\),

then Burnside’s lemma says:
\[
\text{answer} = \frac{1}{N!}\sum_{\mathbf{c}} k(\mathbf{c}) f(\mathbf{c}),
\]
where the sum is over all partitions \(\mathbf{c}\) of \(N\).
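As a small worked instance of this formula (an illustration, not part of the original solution), take \(N = 3\). The cycle types are \((1,1,1)\), \((2,1)\) and \((3)\), with \(k = 1, 3, 2\) permutations and \(\text{ex} = 3, 2, 1\) edge-orbits respectively, so
\[
\text{answer} = \frac{1}{3!}\left(1\cdot M^{3} + 3\cdot M^{2} + 2\cdot M\right).
\]
For \(M = 2\) this gives \((8 + 12 + 4)/6 = 4\), matching the four graphs on three vertices (with 0, 1, 2 or 3 edges present).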

However, we do not want to recompute \(k(\mathbf{c})\) each time from scratch. We can re-express the contribution in a more convenient factorized way, which matches the code.

---

### 2.7. Grouping and combinatorial formula in the code

We rewrite:

\[
k(\mathbf{c}) 
= \frac{N!}{\prod_i c_i \cdot \prod_\ell (\text{mult}(\ell)!)}.
\]

Thus:
\[
\frac{k(\mathbf{c})}{N!} 
= \frac{1}{\prod_i c_i\cdot \prod_\ell (\text{mult}(\ell)!)}.
\]

Therefore:
\[
\text{answer} 
= \sum_{\mathbf{c} \text{ partition of } N} 
\left( M^{\text{ex}(\mathbf{c})} \times \frac{1}{\prod_i c_i \cdot \prod_\ell (\text{mult}(\ell)!)}\right).
\]

This is exactly what the C++ code computes:

- `ex` is the exponent of \(M\): sum of `c / 2` plus sums of `gcd(cur[i], cur[j])`.
- `prod_ci = ∏ c_i (mod p)`.
- `freq` counts multiplicities (`mult(len)`).
- `prod_mfact = ∏ fact[mult(len)]` which is factorials of multiplicities.
- The denominator is `prod_ci * prod_mfact` (mod \(p\)).
- The code inverts this denominator modulo \(p\) via Fermat’s little theorem (since \(P\) is prime).

So for each partition, the code adds:

\[
\text{contrib}(\mathbf{c}) = M^{\text{ex}(\mathbf{c})}\cdot \left(\prod_i c_i \cdot \prod_\ell (\text{mult}(\ell)!)\right)^{-1} \pmod{P}.
\]

`fact[i]` is precomputed with `fact[0] = 1`, `fact[i] = i! mod p`.

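Note: the factor \(N!\) never has to be computed. The \(N!\) in the numerator of \(k(\mathbf{c})\) cancels against the \(1/N!\) of Burnside’s lemma, so the code sums \(M^{\text{ex}(\mathbf{c})} \cdot k(\mathbf{c})/N!\) directly instead of summing `Fix * #permutations` and dividing by \(N!\) at the end.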

Thus the final `ans` computed by `gen` is exactly the number of orbits (non-isomorphic colored graphs), modulo \(P\).
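The per-partition term can be checked in exact rational arithmetic before any modulus is involved. The following is a minimal sketch; the helper name `contribution` and the hard-coded list `PARTITIONS_OF_4` are illustrative assumptions, not part of the original code.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, gcd, prod

def contribution(c, m):
    """Exact value of M^ex / (prod c_i * prod mult(l)!) for one cycle type c."""
    k = len(c)
    ex = sum(x // 2 for x in c)  # intra-cycle edge-orbits
    ex += sum(gcd(c[i], c[j]) for i in range(k) for j in range(i + 1, k))
    denom = prod(c) * prod(factorial(t) for t in Counter(c).values())
    return Fraction(m ** ex, denom)

# All partitions of 4, written out by hand for this illustration.
PARTITIONS_OF_4 = [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]

total = sum(contribution(c, 2) for c in PARTITIONS_OF_4)
print(total)
```

The individual terms are fractions (for \(M = 2\): \(1, \tfrac{4}{3}, 2, 4, \tfrac{8}{3}\)), but their sum is an integer, here 11, the number of graphs on four vertices.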

---

### 2.8. Enumerating all partitions of \(N\)

We still need to enumerate all integer partitions of \(N\) efficiently for \(N \le 53\). The number of partitions is manageable (\(p(53) = 329{,}931\) at the upper limit), so recursion is fine.

The function:
```cpp
int64_t gen(int last, int sum_left, vector<int>& cur)
```
does a standard partition-generation:

- `sum_left` is the remaining sum to fill; initially `sum_left = N`.
- `last` is the maximum next part we can take, ensuring non-increasing order and avoiding duplicates; initially `last = N`.
- `cur` holds the current sequence of parts (cycle lengths).

Pseudo-behavior:

- If `sum_left == 0`, `cur` holds a valid partition of `N`. We evaluate that partition and return its contribution.
- Otherwise, try all possible next parts `x` from `min(last, sum_left)` down to `1`, push `x` into `cur`, recurse with `sum_left - x` and `last = x`, then pop.

This enumerates all non-increasing sequences of positive integers summing to `N`, i.e., all partitions.
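The enumeration on its own, separated from the scoring of each partition, can be sketched as a generator. This is an illustrative sketch that follows the same `last` / `sum_left` scheme; the names `partitions` and `count_partitions` are hypothetical, and `count_partitions` uses a standard counting DP (not the recursion) so that large counts can be checked cheaply.

```python
def partitions(n):
    """Yield every partition of n as a non-increasing list of parts."""
    cur = []

    def gen(last, sum_left):
        if sum_left == 0:
            yield list(cur)  # copy, because cur is mutated while backtracking
            return
        for x in range(min(last, sum_left), 0, -1):
            cur.append(x)
            yield from gen(x, sum_left - x)
            cur.pop()

    yield from gen(n, n)

def count_partitions(n):
    """Number of partitions of n via the usual 'coin change' DP."""
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for s in range(part, n + 1):
            ways[s] += ways[s - part]
    return ways[n]

print(list(partitions(4)))
print(len(list(partitions(10))), count_partitions(10))
```

For `n = 4` the generator yields `[4]`, `[3, 1]`, `[2, 2]`, `[2, 1, 1]`, `[1, 1, 1, 1]`, in that order.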

Within each completed partition (`sum_left == 0`), we:

1. Compute `ex`:  
   - Add `c / 2` for each part `c`.  
   - For each pair of parts `i < j`, add `gcd(cur[i], cur[j])`.

2. Compute `prod_ci = ∏ c_i (mod P)`.

3. Compute frequencies of each `c_i`, then `prod_mfact = ∏ fact[freq[size]] (mod P)`.

4. Denominator = `prod_ci * prod_mfact % P`.

5. `inv_denom = modular_inverse(denominator)` using exponent `P - 2` in `mod_pow`.

6. `fix = mod_pow(m, ex, P)` is `M^ex mod P`.

7. Contribution = `fix * inv_denom % P`.

We sum contributions over all partitions and take everything modulo \(P\).

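Time complexity is dominated by the number of partitions of \(N\), about \(3.3 \times 10^5\) at worst, and per partition we do:

- \(O(k^2)\) `gcd` calls for the pair sums, where \(k \le N\) is the number of parts (at most \(\binom{53}{2} = 1378\) pairs),
- two modular exponentiations (one for \(M^{\text{ex}}\), one for the inverse, the latter taking \(O(\log P)\) multiplications),
- some small overhead for the products and the frequency map.

This is easily fast enough for the constraints.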

---

### 2.9. Handling modulo \(P\)

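- \(P\) is guaranteed prime. Its exact upper bound is truncated in the statement text; a bound around \(10^9\) is typical, and for such \(P\) every product of two residues fits in `int64_t`.
- We use Fermat’s little theorem to invert numbers: \(a^{-1} \equiv a^{P-2} \pmod{P}\), valid whenever \(a \not\equiv 0 \pmod{P}\). Every denominator is a product of cycle lengths and factorials of multiplicities, all built from factors at most \(N\), so it is nonzero modulo \(P\) as long as \(P > N\).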
- All factorials and powers are taken modulo \(P\).
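The modular step can be sketched with Python's built-in three-argument `pow`. In this illustration the modulus `P = 1_000_000_007` and the helper name `inv` are assumptions chosen for the example; the `(numerator, denominator)` pairs are the three terms for \(N = 3\), \(M = 2\), namely \(2/3\), \(4/2\) and \(8/6\).

```python
P = 1_000_000_007  # hypothetical prime modulus, chosen only for this example

def inv(a, p=P):
    """Modular inverse via Fermat's little theorem; requires p prime, a % p != 0."""
    assert a % p != 0, "a has no inverse modulo p"
    return pow(a, p - 2, p)

# (M^ex, denominator) for the cycle types (3), (2,1), (1,1,1) with M = 2.
terms = [(2, 3), (4, 2), (8, 6)]
answer = sum(num * inv(den) for num, den in terms) % P
print(answer)
```

Each term is a fraction, yet the sum reduces to the integer 4 modulo `P`, the same value the exact formula gives. The assertion inside `inv` is the guard that matters when the modulus is not larger than every factor of the denominator.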

---

3. **C++ Solution with Detailed Line-by-Line Comments**

```cpp
#include <bits/stdc++.h>  // Includes almost all standard headers
using namespace std;

// Overload output operator for pair<T1, T2>
template<typename T1, typename T2>
ostream& operator<<(ostream& out, const pair<T1, T2>& x) {
    // Print pair as "first second"
    return out << x.first << ' ' << x.second;
}

// Overload input operator for pair<T1, T2>
template<typename T1, typename T2>
istream& operator>>(istream& in, pair<T1, T2>& x) {
    // Read two space-separated values into pair
    return in >> x.first >> x.second;
}

// Overload input operator for vector<T>
template<typename T>
istream& operator>>(istream& in, vector<T>& a) {
    // Read each element of the vector from stream
    for(auto& x: a) {
        in >> x;
    }
    return in;
}

// Overload output operator for vector<T>
template<typename T>
ostream& operator<<(ostream& out, const vector<T>& a) {
    // Print all elements of vector with spaces
    for(auto x: a) {
        out << x << ' ';
    }
    return out;
}

// Fast modular exponentiation: compute b^e mod mod
int64_t mod_pow(int64_t b, int64_t e, int64_t mod) {
    int64_t res = 1;   // Will hold result
    b %= mod;          // Reduce base modulo mod
    while(e) {         // While exponent is not zero
        if(e & 1) {    // If current bit of e is 1
            res = res * b % mod; // Multiply result by base
        }
        b = b * b % mod; // Square the base
        e >>= 1;         // Shift exponent right (divide by 2)
    }
    return res;
}

// Compute modular inverse of a modulo prime 'mod' using Fermat's little theorem
int64_t mod_inverse(int64_t a, int64_t mod) { 
    // a^(mod-2) mod mod is the inverse when mod is prime
    return mod_pow(a, mod - 2, mod); 
}

int n, m;       // N = number of vertices, M = number of colors
int64_t p;      // Prime modulus P
vector<int64_t> fact; // fact[i] = i! mod p

// Recursive function to enumerate partitions of 'n' and accumulate contributions
// last      - maximum allowed next part (to ensure non-increasing order)
// sum_left  - remaining sum to fill (initially n)
// cur       - current partition (cycle sizes)
int64_t gen(int last, int sum_left, vector<int>& cur) {
    // Base case: we formed a partition whose parts sum to n
    if(sum_left == 0) {
        // cur now contains cycle lengths c1, c2, ..., ck

        // ex will be the total number of edge-orbits induced by this cycle structure
        int64_t ex = 0;

        // Add intra-cycle contribution: floor(ci / 2) for each cycle ci
        for(int c: cur) {
            ex += c / 2;  // integer division
        }

        // Add inter-cycle contribution: gcd(ci, cj) for every pair i < j
        for(int i = 0; i < (int)cur.size(); i++) {
            for(int j = i + 1; j < (int)cur.size(); j++) {
                ex += gcd(cur[i], cur[j]);
            }
        }

        // Compute product of all cycle lengths, prod_ci = ∏ ci (mod p)
        int64_t prod_ci = 1;
        for(int c: cur) {
            prod_ci = prod_ci * c % p;
        }

        // Compute multiplicities of each cycle length
        map<int, int> freq; // key: cycle length, value: multiplicity
        for(int c: cur) {
            freq[c]++;
        }

        // Compute product over factorials of multiplicities:
        // prod_mfact = ∏ fact[ freq[length] ] (mod p)
        int64_t prod_mfact = 1;
        for(auto pr: freq) {
            prod_mfact = prod_mfact * fact[pr.second] % p;
        }

        // Denominator for this partition's coefficient:
        // denom = (∏ ci) * (∏ mult(len)! ) (mod p)
        // We later invert it modulo p.
        int64_t denom = prod_ci * prod_mfact % p;
        int64_t inv_denom = mod_inverse(denom, p);

        // Number of colorings fixed by this cycle type is M^ex (mod p)
        int64_t fix = mod_pow(m, ex, p);

        // Contribution of this partition to the final answer:
        // Fix * (1 / denom) mod p
        int64_t contrib = fix * inv_denom % p;
        return contrib;
    }

    // Otherwise, we still need to choose more parts (cycle sizes)
    int64_t total = 0;

    // Choose next part x between 1 and min(last, sum_left), in descending order
    for(int x = min(last, sum_left); x >= 1; x--) {
        cur.push_back(x);                        // Add x as next part
        total = (total + gen(x, sum_left - x, cur)) % p;  // Recurse
        cur.pop_back();                          // Remove x, backtrack
    }
    return total;
}

// Read input values (N, M, P)
void read() { 
    cin >> n >> m >> p; 
}

void solve() {
    // Precompute factorials fact[i] = i! mod p for i = 0..n
    fact.assign(n + 1, 1);       // fact[0] = 1
    for(int i = 1; i <= n; i++) {
        fact[i] = fact[i - 1] * i % p;
    }

    // Start recursive generation of partitions
    vector<int> cur;             // will hold one partition at a time
    int64_t ans = gen(n, n, cur); // last = n, sum_left = n initially

    // Output the final answer: number of non-isomorphic M-colored complete graphs
    cout << ans << endl;
}

int main() {
    ios_base::sync_with_stdio(false); // Fast IO
    cin.tie(nullptr);                 // Untie cin from cout
    
    int T = 1; // number of test cases (problem has single test, but template supports multiple)
    // cin >> T; // commented out; could be enabled for multiple tests
    for(int test = 1; test <= T; test++) {
        read(); // read a single test
        solve();
    }

    return 0;
}
```

---

4. **Python Solution with Detailed Comments**

```python
import sys
from math import gcd

# Read input: N = number of vertices, M = number of colors, P = prime modulus
data = sys.stdin.read().strip().split()
n = int(data[0])
m = int(data[1])
p = int(data[2])

# Fast modular exponentiation: compute base^exp % mod
def mod_pow(base, exp, mod):
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:           # if lowest bit is 1
            result = (result * base) % mod
        base = (base * base) % mod
        exp >>= 1             # shift exponent right by 1 (divide by 2)
    return result

# Modular inverse using Fermat's little theorem (since p is prime)
def mod_inverse(a, mod):
    # a^(mod-2) mod mod is the multiplicative inverse of a modulo mod
    return mod_pow(a, mod - 2, mod)

# Precompute factorials: fact[i] = i! % p for i in [0..n]
fact = [1] * (n + 1)
for i in range(1, n + 1):
    fact[i] = fact[i - 1] * i % p

# Recursive function to enumerate all integer partitions of 'n'
# last:    maximum part size we are allowed to pick next
# sum_left: the remaining sum we need to fill
# cur:     list of chosen parts (current partition)
def gen(last, sum_left, cur):
    # Base case: we've exactly filled the sum (a valid partition)
    if sum_left == 0:
        # cur is a partition of n, representing cycle lengths

        # Compute ex = number of edge-orbits for this cycle structure
        ex = 0

        # Intra-cycle orbits: for each cycle length c, add c//2
        for c in cur:
            ex += c // 2

        # Inter-cycle orbits: for every pair of cycles (i < j), add gcd(ci, cj)
        k = len(cur)
        for i in range(k):
            for j in range(i + 1, k):
                ex += gcd(cur[i], cur[j])

        # Compute product of cycle lengths: prod_ci = ∏ ci (mod p)
        prod_ci = 1
        for c in cur:
            prod_ci = (prod_ci * c) % p

        # Count multiplicities of each cycle length
        freq = {}
        for c in cur:
            freq[c] = freq.get(c, 0) + 1

        # Compute product of factorials of multiplicities:
        # prod_mfact = ∏ fact[count], mod p
        prod_mfact = 1
        for count in freq.values():
            prod_mfact = (prod_mfact * fact[count]) % p

        # Denominator: (∏ ci) * (∏ mult(len)! ) mod p
        denom = (prod_ci * prod_mfact) % p
        inv_denom = mod_inverse(denom, p)

        # Number of colorings fixed: M^ex mod p
        fix = mod_pow(m, ex, p)

        # Contribution of this partition to the final result:
        contrib = (fix * inv_denom) % p
        return contrib

    # Recursive case: still need to choose more parts
    total = 0

    # Next part x can range from min(last, sum_left) down to 1
    # Descending order ensures we generate partitions in non-increasing order
    upper = min(last, sum_left)
    for x in range(upper, 0, -1):
        cur.append(x)                           # choose part x
        total = (total + gen(x, sum_left - x, cur)) % p  # recurse
        cur.pop()                               # undo choice, backtrack

    return total

# Initialize recursion with last = n, sum_left = n, and empty partition
answer = gen(n, n, [])

# Print the answer: number of non-isomorphic M-colored complete graphs on n vertices, modulo p
print(answer)
```

This Python implementation directly mirrors the logic of the C++ solution:

- Uses the same recursion for integer partitions.
- Computes the same `ex`, `prod_ci`, `freq`, and factorial multiplicity product.
- Uses modular exponentiation and modular inverse in the same way.
- Returns the sum of contributions of all cycle partitions, i.e., the Burnside average already simplified by the combinatorial identity.

---

5. **Compressed Editorial**

We need the number of non-isomorphic edge-colored complete graphs on \(N\) vertices with \(M\) colors, modulo a prime \(P\). The group of vertex relabelings is \(S_N\). We count orbits of edge-colorings under \(S_N\) via Burnside’s lemma:

\[
\text{answer} = \frac{1}{N!} \sum_{\pi \in S_N} \text{Fix}(\pi),
\]
where \(\text{Fix}(\pi)\) is the number of colorings fixed by permutation \(\pi\).

A coloring is fixed by \(\pi\) if every edge has the same color as its image under \(\pi\). Hence if \(\pi\) induces \(E(\pi)\) orbits on the set of edges, then \(\text{Fix}(\pi) = M^{E(\pi)}\).

The key is that permutations with the same cycle structure on vertices have the same \(E(\pi)\). A cycle structure corresponds to a partition of \(N\): \(N = c_1 + \dots + c_k\).

For a given partition \((c_i)\):

- Number of permutations with this cycle type:
  \[
  \#(\text{type}) = \frac{N!}{\left(\prod_i c_i\right)\left(\prod_{\ell} \text{mult}(\ell)!\right)}.
  \]
- Edge-orbits:
  - Within a cycle of length \(c\): \( \lfloor c/2 \rfloor \) orbits.
  - Between cycles of lengths \(c_i, c_j\): \(\gcd(c_i, c_j)\) orbits.
  Thus:
  \[
  E(\mathbf{c}) = \sum_i \lfloor c_i/2 \rfloor + \sum_{i<j} \gcd(c_i, c_j).
  \]

Plugging this into Burnside’s lemma and cancelling \(N!\):
\[
\text{answer} = \sum_{\mathbf{c} \vdash N}
\frac{M^{E(\mathbf{c})}}{\left(\prod_i c_i\right)\left(\prod_{\ell} \text{mult}(\ell)!\right)}.
\]

We compute this sum modulo \(P\):

1. Enumerate all integer partitions \(\mathbf{c}\) of \(N\) via a recursion that builds non-increasing sequences of parts.
2. For each partition:
   - Compute \(E(\mathbf{c})\).
   - Compute \(\prod_i c_i \mod P\).
   - Count multiplicities \(\text{mult}(\ell)\); precompute factorials \(i!\mod P\); form \(\prod_\ell (\text{mult}(\ell)!) \mod P\).
   - Denominator \(D = \left(\prod_i c_i\right)\left(\prod_\ell \text{mult}(\ell)!\right) \mod P\).
   - Contribution \(= M^{E(\mathbf{c})} \cdot D^{-1} \mod P\), using modular inverse via \(x^{P-2}\mod P\).

Sum all contributions modulo \(P\). The number of partitions of \(N \le 53\) is at most \(p(53) = 329{,}931\), and per partition we do \(O(k^2)\) gcd operations (with \(k \le N\) parts), which fits within the time limit.