<|instruction|>
Solve the below problem. The solution should start with an abridged problem statement. Then key observations. Then full solution based on the observations. Then C++ and Python implementations with comments.

485. Arrays
Time limit per test: 1.75 second(s)
Memory limit: 262144 kilobytes
input: standard
output: standard



You are given a sequence of 3·N integers (X_1, X_2, …, X_{3N}). Create three sequences (A_1, A_2, …, A_N), (B_1, B_2, …, B_N) and (C_1, C_2, …, C_N) such that:

- each of the integers from 1 to 3·N belongs to exactly one of the sequences A, B or C;
- the value of S = Σ_{i=1..N} (X_{A_i} – X_{B_i}) · X_{C_i} is the largest possible.


Input

| Constraints on N | Constraints on T |
|---|---|
| 1 ≤ N ≤ 10 | 1 ≤ T ≤ 1000 |
| 11 ≤ N ≤ 15 | 1 ≤ T ≤ 100 |
| 16 ≤ N ≤ 20 | 1 ≤ T ≤ 10 |
| 21 ≤ N ≤ 25 | T = 1 |

The input file contains T test cases, all having the same value of N. The first line of the input file contains the integers T and N, constrained as shown in the table above. Each of the following T lines describes one test case and contains 3·N integers, the members of the sequence X. All these values are in the range from 0 to 1000.

Output
The output file should consist of T lines. Each line should contain the largest possible value of S for the corresponding test case from the input file.

Example(s)
sample input
1 2
4 1 8 2 0 5

sample output
46



Note. The maximal value is attained by taking A = (1, 3), B = (2, 5), C = (4, 6).

<|response|>
1. Abridged Problem Statement  
You have 3·N integers. You must partition them into three groups A, B, C of size N each (using each integer exactly once) and then reorder each group arbitrarily to maximize  
   S = Σ_{i=1..N} (A_i – B_i) * C_i.  
Given T test cases (all with the same N), compute the maximum S for each.

2. Key Observations  
- Because all values are non-negative and B appears with a minus sign in every term, it is always optimal to assign the N smallest input values to B.  
- After removing B, we are left with 2N values (call them AC). We must choose N of them to be A and N to be C. Once chosen, the best pairing matches the k-th largest A with the k-th largest C and the k-th smallest B: sort A and C ascending, sort B descending, and pair index-for-index.  
- In every pair we may assume A_i ≥ C_i. If A_i < C_i, swapping the two values changes the term by B_i · (C_i – A_i) ≥ 0, so the swap never hurts.  
- Brute forcing which N out of 2N go to A (and the rest to C) costs O(C(2N,N)), which is about 1.26·10^14 for N = 25 and therefore too large.  
- We can do a DP over bitmasks of N bits by scanning the sorted AC array from largest to smallest. Each scanned value becomes either an A that waits for a partner, or a C that is paired immediately with the oldest waiting A. The mask is a sliding window over the most recently scanned values: bit j is 1 when the value scanned j steps ago is an A that is still waiting.  
- An A never waits longer than N–1 steps in a state that can still be completed, so N bits are enough. This rolling-mask DP has 2N layers of at most 2^N states with O(1) transitions each, for overall O(N·2^N) time.
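
The first observations can be sanity-checked on tiny inputs. The sketch below is an illustrative brute force and not part of the intended solution; the function names `exhaustive` and `restricted` are made up for this example. `exhaustive` tries every assignment of the 3N values to A, B and C, while `restricted` fixes B to the N smallest values and only tries splits of the remaining 2N values, using the sorted pairing.

```python
from itertools import combinations, permutations

def exhaustive(values):
    """Best S over every assignment; only practical for N <= 2 or so."""
    n = len(values) // 3
    best = None
    for p in permutations(values):
        a, b, c = p[:n], p[n:2 * n], p[2 * n:]
        s = sum((a[i] - b[i]) * c[i] for i in range(n))
        if best is None or s > best:
            best = s
    return best

def restricted(values):
    """Best S when B is the N smallest values and pairs are formed in sorted order."""
    n = len(values) // 3
    xs = sorted(values)
    b = xs[:n]        # N smallest values, ascending
    rest = xs[n:]     # 2N largest values, ascending
    best = None
    for a_idx in combinations(range(2 * n), n):
        chosen = set(a_idx)
        a = [rest[i] for i in range(2 * n) if i in chosen]       # ascending
        c = [rest[i] for i in range(2 * n) if i not in chosen]   # ascending
        # The largest C meets the largest A and the smallest B,
        # so B is read from the back.
        s = sum((a[k] - b[n - 1 - k]) * c[k] for k in range(n))
        if best is None or s > best:
            best = s
    return best

sample = [4, 1, 8, 2, 0, 5]
print(exhaustive(sample), restricted(sample))   # 46 46
```

Both functions agree on the sample from the statement, which is what the observations predict. The restricted search is still exponential, so it only serves as a reference for small N.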

3. Full Solution Approach  
1. Read T and N.  
2. For each test case:  
   a. Read the 3N integers into an array X and sort X ascending.  
   b. Let B = X[0..N–1] (the N smallest values, in ascending order).  
   c. Let AC = X[N..3N–1] be the remaining 2N values, and let D be AC in descending order, so D[0] is the largest.  
3. Scan D from D[0] to D[2N–1]. Every scanned value is labeled A or C. A value labeled A waits for a partner; a value labeled C is paired at once with the oldest waiting A. This pairs the k-th largest A with the k-th largest C, and every C is no larger than its A.  
4. After t values have been scanned, the state is a mask of N bits: bit j is 1 when D[t–1–j] is an A that is still waiting. Define dp[t][mask] as the best partial sum, initialized to –∞ except dp[0][0] = 0.  
5. For a state (t, mask), let u = popcount(mask) be the number of waiting A's. The number of finished pairs is cnt = (t – u) / 2, so the next pair to be formed uses B[cnt], the (cnt+1)-th smallest B.  
6. For each layer t in [0..2N–1] and each reachable mask, let v = D[t] and try both labels:  
   a. v becomes an A, allowed when bit N–1 of mask is 0 and cnt + u < N:  
        newmask = (mask << 1) | 1  
        dp[t+1][newmask] = max(dp[t+1][newmask], dp[t][mask])  
   b. v becomes a C, allowed when mask ≠ 0. Let h be the highest set bit of mask (the oldest waiting A):  
        A_val = D[t–1–h]  
        B_val = B[cnt]  
        gain = (A_val – B_val) * v  
        newmask = (mask with bit h cleared) << 1  
        dp[t+1][newmask] = max(dp[t+1][newmask], dp[t][mask] + gain)  
7. The answer for this test is dp[2N][0], i.e. all 2N values are labeled and no A is left waiting.  

(Note: N bits are enough because the A of pair k is scanned at position k or later, while its C is scanned at position N+k or earlier, since N–k more C's must still follow it. The two are therefore at most N positions apart, which is why step 6a refuses to push a waiting A past bit N–1. Only layers t and t+1 are needed at any time, so the table can be stored as a rolling array.)
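
As an illustration of a single DP step, the sketch below lists the successors of one state. It assumes the encoding in which bit j of `mask` marks the value scanned j steps earlier as an A still waiting for its C, with `d` holding the 2N largest values in descending order and `b` the N smallest in ascending order. The helper name `successors` and its signature are hypothetical.

```python
def successors(t, mask, d, b, n):
    """Return [(new_mask, gain), ...] for labeling d[t] as A or as C."""
    u = bin(mask).count("1")          # A's still waiting
    cnt = (t - u) // 2                # pairs already finished
    out = []
    # Label d[t] as A: blocked if the oldest waiting A would fall out of the
    # n-bit window, or if n values are already labeled A.
    if not (mask >> (n - 1)) & 1 and cnt + u < n:
        out.append(((mask << 1) | 1, 0))
    # Label d[t] as C: pair it with the oldest waiting A (highest set bit).
    if mask:
        h = mask.bit_length() - 1
        gain = (d[t - 1 - h] - b[cnt]) * d[t]
        out.append(((mask ^ (1 << h)) << 1, gain))
    return out

# Sample from the statement: X = 4 1 8 2 0 5, so B = [0, 1] and D = [8, 5, 4, 2].
d, b, n = [8, 5, 4, 2], [0, 1], 2
print(successors(0, 0, d, b, n))   # [(1, 0)]
print(successors(1, 1, d, b, n))   # [(3, 0), (0, 40)]
print(successors(3, 1, d, b, n))   # [(0, 6)]
```

Following the labels A, C, A, C on the sample collects 40 + 6 = 46, the expected answer.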

Time complexity per test: O(N · 2^N). For N = 25, N · 2^N is about 8.4·10^8, so the constant factor of the inner loop matters under the 1.75 s limit.
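
To get a feel for how many states a scan actually visits, one can count the reachable masks per layer. This is a rough sketch that assumes the encoding in which bit j of a mask marks the value scanned j steps earlier as an A still waiting for its C. `layer_sizes` is a hypothetical helper; it counts reachable masks only and does not check that each of them can still be completed.

```python
def layer_sizes(n):
    """Number of reachable masks after t = 0, 1, ..., 2n scanned values."""
    layer = {0}
    sizes = [1]
    for t in range(2 * n):
        nxt = set()
        for mask in layer:
            u = bin(mask).count("1")      # A's still waiting
            cnt = (t - u) // 2            # pairs already finished
            # Next value becomes an A.
            if not (mask >> (n - 1)) & 1 and cnt + u < n:
                nxt.add((mask << 1) | 1)
            # Next value becomes a C for the oldest waiting A.
            if mask:
                h = mask.bit_length() - 1
                nxt.add((mask ^ (1 << h)) << 1)
        layer = nxt
        sizes.append(len(layer))
    return sizes

print(layer_sizes(2))   # [1, 1, 2, 2, 1]
```

Comparing `sum(layer_sizes(n))` with the dense bound `2 * n * 2**n` for moderate n gives an idea of how much a sparse or pruned table could save.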

4. C++ Implementation with Detailed Comments  
```cpp
#include <bits/stdc++.h>
using namespace std;

// Rolling-mask DP over the 2N largest values, scanned from largest to smallest.
// Every scanned value becomes an A (it waits for a partner) or a C (it is paired
// at once with the oldest waiting A). Bit j of mask is set when the value scanned
// j steps ago is an A that is still waiting. dp[mask] = best partial sum.
// Layer t only uses masks whose popcount has the parity of t, so one array can
// hold layers t and t+1 at the same time.

static const int NEG = INT_MIN;  // marks an unreachable state

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int T, N;
    cin >> T >> N;
    const int SIZE = 1 << N;
    vector<int> dp(SIZE);  // answers are at most 25 * 1000 * 1000, so int is enough

    while(T--){
        vector<int> X(3*N);
        for(int &v : X) cin >> v;
        sort(X.begin(), X.end());

        // B = the N smallest values (ascending). D = the other 2N (descending).
        vector<int> B(X.begin(), X.begin() + N);
        vector<int> D(X.rbegin(), X.rbegin() + 2*N);

        fill(dp.begin(), dp.end(), NEG);
        dp[0] = 0;

        for(int t = 0; t < 2*N; t++){
            int par = t & 1;
            // Masks of the other parity still hold layer t-1; clear them so
            // that they can receive layer t+1.
            for(int mask = 0; mask < SIZE; mask++){
                if((__builtin_popcount(mask) & 1) != par) dp[mask] = NEG;
            }

            int v = D[t];  // the value being labeled in this step
            for(int mask = 0; mask < SIZE; mask++){
                int u = __builtin_popcount(mask);  // number of waiting A's
                if((u & 1) != par) continue;       // entry belongs to layer t+1
                int cur = dp[mask];
                if(cur == NEG) continue;           // unreachable
                int cnt = (t - u) / 2;             // pairs finished so far

                // Option 1: v becomes an A and starts waiting at bit 0.
                if(((mask >> (N-1)) & 1) == 0 && cnt + u < N){
                    int newmask = (mask << 1) | 1;
                    dp[newmask] = max(dp[newmask], cur);
                }
                // Option 2: v becomes a C for the oldest waiting A (highest bit).
                if(mask != 0){
                    int h = 31 - __builtin_clz(mask);
                    int gain = (D[t-1-h] - B[cnt]) * v;
                    int newmask = (mask ^ (1 << h)) << 1;
                    dp[newmask] = max(dp[newmask], cur + gain);
                }
            }
        }

        // After 2N steps no A may be left waiting, so the result is in mask 0.
        cout << dp[0] << "\n";
    }
    return 0;
}
```

5. Python Implementation with Detailed Comments  
```python
import sys
input = sys.stdin.readline

def solve(X, N):
    X = sorted(X)
    B = X[:N]            # the N smallest go to B (ascending)
    D = X[N:][::-1]      # remaining 2N values, largest first

    # layer maps mask -> best partial sum after t scanned values.
    # Bit j of mask is set when D[t-1-j] is an A still waiting for its C.
    layer = {0: 0}
    top = 1 << (N - 1)
    for t in range(2 * N):
        v = D[t]
        nxt = {}
        for mask, cur in layer.items():
            u = bin(mask).count("1")   # number of waiting A's
            cnt = (t - u) // 2         # pairs finished so far

            # Option 1: v becomes an A and starts waiting at bit 0
            if not (mask & top) and cnt + u < N:
                newmask = (mask << 1) | 1
                if nxt.get(newmask, -1) < cur:
                    nxt[newmask] = cur
            # Option 2: v becomes a C for the oldest waiting A (highest bit)
            if mask:
                h = mask.bit_length() - 1
                val = cur + (D[t - 1 - h] - B[cnt]) * v
                newmask = (mask ^ (1 << h)) << 1
                if nxt.get(newmask, -1) < val:
                    nxt[newmask] = val
        layer = nxt

    # After 2N steps only mask 0 (no A left waiting) remains
    return layer[0]

def main():
    T, N = map(int, input().split())
    out = []
    for _ in range(T):
        X = list(map(int, input().split()))
        out.append(solve(X, N))
    print("\n".join(map(str, out)))

if __name__ == "__main__":
    main()
```

Explanation of the Python details:  
- We keep each DP layer as a dictionary from mask to best partial sum, so only reachable states are stored. Partial sums are never negative, which is why –1 works as the "missing" value.  
- For each state we know how many A's are waiting (`u`, the popcount of `mask`) and how many pairs are finished (`cnt = (t – u) / 2`), so the next pair uses `B[cnt]`.  
- We do the same two transitions as in C++: label the current value as an A, or as a C for the oldest waiting A.  
- Finally, mask 0 in the last layer (no A left waiting) holds the optimal total.