<|instruction|>
Solve the below problem. The solution should start with an abridged problem statement. Then key observations. Then full solution based on the observations. Then C++ and Python implementations with comments.

168. Matrix
time limit per test: 0.5 sec.
memory limit per test: 16000 KB
input: standard
output: standard



You are given an N*M matrix A. Find the matrix B such that B[i,j]=min{ A[x,y] : (y>=j) and (x>=i+j-y) }

Input
On the first line of the input there are two integer numbers, N and M (1<=N,M<=1000). Then matrix A follows: the next N lines contain M integers each (not greater than 32000 in absolute value). The j-th number on the i-th of these lines is A[i,j].

Output
Write matrix B in the same format as matrix A, but without N and M.

Sample test(s)

Input
3 3
1 2 3
4 5 6
7 8 9

Output
1 2 3
2 3 6
3 6 9
Author:	NNSU #2 team
Resource:	
Date:

<|response|>
1. Abridged Problem Statement  
   Given an N×M integer matrix A (1 ≤ N,M ≤ 1000), compute an N×M matrix B so that for each 0 ≤ i < N and 0 ≤ j < M:  
     B[i][j] = min { A[x][y] : y ≥ j  and  x ≥ (i + j − y) }.

2. Key Observations  
   - The constraint x ≥ i + j − y can be rewritten as x + y ≥ i + j.  
   - Define a new index s = x + y.  Then the two constraints become:  
       y ≥ j  
       s ≥ i + j  
   - If we arrange all A[x][y] into an auxiliary 2D array Q indexed by (row = y) and (col = s), then the desired minimum for B[i][j] is just the minimum of Q over the suffix region “row ≥ j, col ≥ i+j.”  
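   - We can precompute for every cell Q[r][c] the minimum over the submatrix from (r,c) downwards and rightwards by a simple DP sweep in reverse order: Q[r][c] = min(Q[r][c], Q[r+1][c], Q[r][c+1]). The two neighbouring regions overlap, but because min is idempotent no inclusion-exclusion correction is needed. Cells of Q that correspond to no entry of A hold +∞ and therefore never affect a minimum.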
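
   To sanity-check the rewritten constraint x + y ≥ i + j, a brute-force evaluation of the definition can be run on small inputs. The sketch below is a minimal reference, assuming 0-based indices; the function name `brute_force_b` is illustrative and not part of the solution. It takes O((N·M)²) time, so it is only suitable for tiny matrices.

```python
def brute_force_b(a):
    """Evaluate B straight from the definition (0-based indices)."""
    n, m = len(a), len(a[0])
    b = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Every cell with y >= j and x + y >= i + j takes part;
            # (i, j) itself always qualifies, so the set is never empty.
            b[i][j] = min(
                a[x][y]
                for y in range(j, m)
                for x in range(n)
                if x + y >= i + j
            )
    return b
```

   Calling `brute_force_b([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` returns `[[1, 2, 3], [2, 3, 6], [3, 6, 9]]`, which matches the sample output.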

3. Full Solution Approach  
   a. Read N, M and the input matrix A of size N×M.  
   b. Let S = N + M − 1.  Create Q as an M×S array, initialized to +∞ (INF).  
   c. Map A into Q:  
        for x in [0..N−1], y in [0..M−1]:  
          s = x + y  
          Q[y][s] = A[x][y]  
   d. Compute suffix minima in Q in place: scanning r = M−1..0 and, within each row, c = S−1..0, set Q[r][c] to the minimum of itself, Q[r+1][c], and Q[r][c+1], skipping any neighbour that falls outside Q.  
   e. Build B: for each (i,j), let s = i + j, then B[i][j] = Q[j][s].  
   f. Output B in the prescribed format.
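
   Steps b to e can be traced on the 3×3 sample. The following is a small sketch that splits the phases into helper functions; the names `skew`, `sweep`, and `extract` are illustrative only, and `float("inf")` is used here as a stand-in for INF.

```python
INF = float("inf")  # stands in for +infinity in unused cells of Q


def skew(a):
    """Steps b-c: place A[x][y] at Q[y][x + y]; all other cells stay INF."""
    n, m = len(a), len(a[0])
    q = [[INF] * (n + m - 1) for _ in range(m)]
    for x in range(n):
        for y in range(m):
            q[y][x + y] = a[x][y]
    return q


def sweep(q):
    """Step d: in-place suffix minimum over rows >= r and columns >= c."""
    rows, cols = len(q), len(q[0])
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r + 1 < rows:
                q[r][c] = min(q[r][c], q[r + 1][c])
            if c + 1 < cols:
                q[r][c] = min(q[r][c], q[r][c + 1])
    return q


def extract(q, n):
    """Step e: B[i][j] = Q[j][i + j]."""
    m = len(q)
    return [[q[j][i + j] for j in range(m)] for i in range(n)]
```

   For the sample, `skew` yields the rows `[1, 4, 7, inf, inf]`, `[inf, 2, 5, 8, inf]`, `[inf, inf, 3, 6, 9]`. After `sweep` they become `[1, 2, 3, 6, 9]`, `[2, 2, 3, 6, 9]`, `[3, 3, 3, 6, 9]`, and reading the cells Q[j][i+j] reproduces the sample output.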

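   Time complexity: O(M·(N+M)), because the sweep visits every cell of the M×(N+M−1) array Q (about 2·10^6 cells when N = M = 1000).  
   Memory usage: O(M·(N+M)) for Q plus O(N·M) for A. With 32-bit integers and N = M = 1000 that is roughly 8 MB + 4 MB, which fits under the 16000 KB limit.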
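
   Because the memory limit is fairly tight, it may be worth noting that the same minima can be obtained without materialising Q. The sketch below is an alternative formulation, not the method described above. Translating the sweep back into (i, j) coordinates gives B[i][j] = min(A[i][j], B[i+1][j], B[i−1][j+1]), where a missing B[i+1][j] or column j+1 is simply dropped and, for i = 0, B[0][j+1] takes the place of B[−1][j+1]. The function name `b_in_place` is hypothetical.

```python
def b_in_place(a):
    """Compute B using only an N x M table (0-based indices)."""
    n, m = len(a), len(a[0])
    b = [row[:] for row in a]  # B[i][j] starts out as A[i][j]
    # Columns right to left, rows bottom to top, so that B[i+1][j] and
    # all of column j+1 are final before B[i][j] is computed.
    for j in range(m - 1, -1, -1):
        for i in range(n - 1, -1, -1):
            if i + 1 < n:
                b[i][j] = min(b[i][j], b[i + 1][j])
            if j + 1 < m:
                # Row i-1 of the next column; row 0 stands in when i == 0.
                b[i][j] = min(b[i][j], b[max(i - 1, 0)][j + 1])
    return b
```

   The boundary cases rely on the regions being nested: for i = N−1 the dropped term B[N][j] would cover only cells already covered by B[N−2][j+1], and for i = 0 the constraint x + y ≥ j is implied by y ≥ j+1. This variant keeps the working set at O(N·M).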

4. C++ Implementation with Detailed Comments  
```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N, M;
    cin >> N >> M;

    // Read input matrix A
    vector<vector<int>> A(N, vector<int>(M));
    for(int i = 0; i < N; i++) {
        for(int j = 0; j < M; j++) {
            cin >> A[i][j];
        }
    }

    // S = number of possible diagonals (0 .. N+M-2)
    int S = N + M - 1;
    const int INF = 1'000'000'000;

    // Q[r][c] will hold values for row=r (y) and diag-index=c (s=x+y)
    // initialize to INF
    vector<vector<int>> Q(M, vector<int>(S, INF));

    // Map A[x][y] into Q[y][x+y]
    for(int x = 0; x < N; x++) {
        for(int y = 0; y < M; y++) {
            int s = x + y;
            Q[y][s] = A[x][y];
        }
    }

    // DP sweep: compute the minimum over suffix region (down + right moves)
    for(int r = M - 1; r >= 0; r--) {
        for(int c = S - 1; c >= 0; c--) {
            int best = Q[r][c];
            if (r + 1 < M) best = min(best, Q[r+1][c]);   // move down
            if (c + 1 < S) best = min(best, Q[r][c+1]);   // move right
            Q[r][c] = best;
        }
    }

    // Build and output B
    // B[i][j] = Q[row = j][col = i+j]
    for(int i = 0; i < N; i++) {
        for(int j = 0; j < M; j++) {
            int s = i + j;
            cout << Q[j][s] << (j+1 < M ? ' ' : '\n');
        }
    }

    return 0;
}
```

5. Python Implementation with Detailed Comments  
```python
import sys

def main():
    data = sys.stdin.read().split()
    it = iter(data)
    N, M = int(next(it)), int(next(it))

    # Read A
    A = [ [int(next(it)) for _ in range(M)] for _ in range(N) ]

    # Number of diagonals
    S = N + M - 1
    INF = 10**9

    # Q has M rows (y=0..M-1) and S columns (s=0..N+M-2)
    Q = [ [INF]*S for _ in range(M) ]

    # Map A[x][y] -> Q[y][x+y]
    for x in range(N):
        for y in range(M):
            s = x + y
            Q[y][s] = A[x][y]

    # DP: compute min over suffix region by scanning bottom-right to top-left
    for r in range(M-1, -1, -1):
        row = Q[r]
        row_down = Q[r+1] if r+1 < M else None
        for c in range(S-1, -1, -1):
            best = row[c]
            if row_down is not None:
                best = min(best, row_down[c])
            if c+1 < S:
                best = min(best, row[c+1])
            row[c] = best

    # Extract and print B[i][j] = Q[j][i+j]
    out = []
    for i in range(N):
        line = []
        for j in range(M):
            s = i + j
            line.append(str(Q[j][s]))
        out.append(" ".join(line))

    # End the output with a newline, matching the C++ version
    sys.stdout.write("\n".join(out) + "\n")

if __name__ == "__main__":
    main()
```