Expanding Map

Let \(S^{n}\) be the set of natural numbers from 1 to n:

\[S^{n} = \{1, 2, 3, ..., n\}\]

Let \(T^{n}_{k}\) be the set of subsets of \(S^{n}\) of size k:

\[T^{n}_{k} = \{X \in \mathcal{P}(S^{n}) \mid |X| = k\}\\\]

Since \(|T^{1000}_{400}| = \binom{1000}{400} \lt \binom{1000}{401} = |T^{1000}_{401}|\) , there exists a function \(f: T^{1000}_{400} \rightarrow T^{1000}_{401}\) that is an injection (i.e. one-to-one). Write a function, e.g. in C or Python, that implements the injection above (the input is represented by a boolean vector of length 1000). Your function should optimize for shortest code length as well as shortest runtime.

Thanks Asaf Aharoni for this super cool puzzle!

Spoiler Alert - Solution Ahead!

We will solve a slightly more general problem. For any n, k and j, we will define \(S^{n}\), \(T^{n}_{k}\), and \(T^{n}_{j}\) like above. We will code a general function \(f: T^{n}_{k} \rightarrow T^{n}_{j}\) that will work for any n, k and j such that \(\binom{n}{k} \lt \binom{n}{j}\).

Let \(x \in T^{n}_{k}\) be represented as above (a boolean vector of length n). x has k bits that are on. The function f will just flip a prefix of bits in x until there are exactly j bits that are on. Let b = j - k:

def f(x, b):
  i = 0
  while b != 0:
    b += [-1, 1][x[i]]
    x[i] = 1 - x[i]
    i += 1

This function has several nice properties: it is obviously short and runs in linear time in the worst case. Actually, with the numbers above (1000, 400, and 401) if the inputs are random elements of \(T^{1000}_{400}\) then it will touch on average only the first 5 elements of x.

Also the decoder function (i.e. the inverse function from the image of f) is exactly the same function (with b = k - j, instead of b = j - k).

A last nice property of this function, is that even in cases where \(\binom{n}{k} \gt \binom{n}{j}\) (i.e. we can’t construct an injection) the function will work on a subset of the inputs (e.g. x = 111…111000…000), and in these cases will be properly reversible. On other inputs (e.g. x = 000…000111…111) it will be able to tell you that it can’t encode that input properly.




Share this story