Generator

Generator，即數據生成器。當數據很大，手造會累死的時候，我們就需要它來幫助我們自動造數據。

簡單的例子

生成兩個 \([1,n]\) 範圍內的整數：

// clang-format off

#include "testlib.h"
#include <iostream>

using namespace std;

int main(int argc, char* argv[]) {
  registerGen(argc, argv, 1);
  int n = atoi(argv[1]);
  cout << rnd.next(1, n) << " ";
  cout << rnd.next(1, n) << endl;
}

為什麼要使用 Testlib？

有人説寫 generator 不需要用 Testlib，它在這沒什麼用。實際上這是個不正確的想法。一個好的 generator 應該滿足這一點： 在任何環境下對於相同輸入它給出相同輸出 。寫 generator 就避免不了生成隨機值，平時我們用的 rand() 或 C++11 的 mt19937/uniform_int_distribution ，當操作系統不同、使用不同編譯器編譯、不同時間運行等，它們的輸出都可能不同（對於非常常用的 srand(time(0)) ，這是顯然的），而這就會給生成數據帶來不確定性。

需要注意的是，一旦使用了 Testlib，就不能再使用標準庫中的 srand() ， rand() 等隨機數函數，否則在編譯時會報錯。因此， 請確保所有與隨機相關的函數均使用 Testlib 而非標準庫提供的。

而 Testlib 中的隨機值生成函數則保證了相同調用會輸出相同值，與 generator 本身或平台均無關。另外。它給生成各種要求的隨機值提供了很大便利，如 rnd.next("[a-z]{1,10}") 會生成一個長度在 \([1,10]\) 範圍內的串，每個字符為 a 到 z ，很方便吧！

Testlib 能做什麼？

在一切之前，先執行 registerGen(argc, argv, 1) 初始化 Testlib（其中 1 是使用的 generator 版本，通常保持不變），然後我們就可以使用 rnd 對象來生成隨機值。隨機數種子取自命令行參數的哈希值，對於某 generator g.cpp ， g 100 (Unix-Like) 和 g.exe "100" (Windows) 將會有相同的輸出，而 g 100 0 則與它們不同。

rnd 對象的類型為 random_t ，你可以建立一個新的隨機值生成對象，不過通常你不需要這麼做。

該對象有許多有用的成員函數，下面是一些例子：

調用	含義
`rnd.next(4)`	等概率生成一個 \([0,4)\) 範圍內的整數
`rnd.next(4, 100)`	等概率生成一個 \([4,100]\) 範圍內的整數
`rnd.next(10.0)`	等概率生成一個 \([0,10.0)\) 範圍內的浮點數
`rnd.next("one \| two \| three")`	等概率從 `one` , `two` , `three` 三個串中返回一個
`rnd.wnext(4, t)`	`wnext()` 是一個生成不等分佈（具有偏移期望）的函數， \(t\) 表示調用 `next()` 的次數，並取生成值的最大值。例如 `rnd.wnext(3, 1)` 等同於 `max({rnd.next(3), rnd.next(3)})` ； `rnd.wnext(4, 2)` 等同於 `max({rnd.next(4), rnd.next(4), rnd.next(4)})` 。如果 \(t<0\) ，則為調用 \(-t\) 次，取最小值；如果 \(t=0\) ，等同於 `next()` 。
`rnd.any(container)`	等概率返回一個具有隨機訪問迭代器（如 `std::vector` 和 `std::string` ）的容器內的某一元素的引用

附：關於 rnd.wnext(i,t) 的形式化定義：

\[ \operatorname{wnext}(i,t)=\begin{cases}\operatorname{next}(i) & t=0 \\\max(\operatorname{next}(i),\operatorname{wnext}(i,t-1)) & t>0 \\\min(\operatorname{next}(i),\operatorname{wnext}(i,t+1)) & t<0\end{cases} \]

另外，不要使用 std::random_shuffle() ，請使用 Testlib 中的 shuffle() ，它同樣接受一對迭代器。它使用 rnd 來打亂序列，即滿足如上「好的 generator」的要求。

示例：生成一棵樹

下面是生成一棵樹的主要代碼，它接受兩個參數——頂點數和伸展度。例如，當 \(n=10,t=1000\) 時，可能會生成鏈；當 \(n=10,t=-1000\) 時，可能會生成菊花。

#define forn(i, n) for (int i = 0; i < int(n); i++)

registerGen(argc, argv, 1);

int n = atoi(argv[1]);
int t = atoi(argv[2]);

vector<int> p(n);

/* 為節點 1..n-1 設置父親 */
forn(i, n) if (i > 0) p[i] = rnd.wnext(i, t);

printf("%d\n", n);

/* 打亂節點 1..n-1 */
vector<int> perm(n);
forn(i, n) perm[i] = i;
shuffle(perm.begin() + 1, perm.end());

/* 根據打亂的節點順序加邊 */
vector<pair<int, int> > edges;
for (int i = 1; i < n; i++)
  if (rnd.next(2))
    edges.push_back(make_pair(perm[i], perm[p[i]]));
  else
    edges.push_back(make_pair(perm[p[i]], perm[i]));

/* 打亂邊 */
shuffle(edges.begin(), edges.end());

for (int i = 0; i + 1 < n; i++)
  printf("%d %d\n", edges[i].first + 1, edges[i].second + 1);

一次性生成多組數據

跟不使用 Testlib 編寫的時候一樣，每次輸出前重定向輸出流就好，不過 Testlib 提供了一個輔助函數 startTest(test_index) ，它幫助你將輸出流重定向到 test_index 文件。

一些注意事項

嚴格遵循題目的格式要求，如空格和換行，注意文件的末尾應有一個換行。
對於大數據首選 printf 而非 cout ，以提高性能。（不建議在使用 Testlib 時關閉流同步）
不使用 UB（Undefined Behavior，未定義行為），如本文開頭的那個示例，輸出如果寫成 cout << rnd.next(1, n) << " " << rnd.next(1, n) << endl; ，則 rnd.next() 的調用順序沒有定義。

新特性：解析命令行參數

在之前，我們通常使用類似 int n = atoi(argv[3]); 的代碼，但是這樣並不好。有以下幾點原因：

不存在第三個命令行參數的時候是不安全的；
第三個命令行參數可能不是有效的 32 位整數。

現在，你可以這樣寫： int n = opt<int>(3) 。與此同時，你也可以使用 int64_t m = opt<int64_t>(1); ， bool t = opt<bool>(2); 和 string s = opt(4); 等。

另外，testlib 同時也支持命名參數。如果有很多參數，這樣 g 10 20000 a true 的可讀性就會比 g -n10 -m200000 -t=a -increment 差。

在這種情況下，現在你可以在 generator 中使用以下代碼：

int n = opt<int>("n");
long long n = opt<long long>("m");
string t = opt("t");
bool increment = opt<bool>("increment");

你可以自由地混合使用按下標和按名稱讀取參數的方式。

支持的用於編寫命名參數的方案有以下幾種：

--key=value 或 -key=value ；
--key value 或 -key value ——如果 value 不是新參數的開頭（不以連字符 - 開頭或一個/兩個連字符後沒有跟隨字母）；
--k12345 或 -k12345 ——如果 key k 是一個字母，且後面是一個數字；
-prop 或 --prop ——啓用 bool 屬性。

下面是一些例子：

g1 -n1
g2 --len=4 --s=oops
g3 -inc -shuffle -n=5
g4 --length 5 --total 21 -ord