reader.py 文件源码

python
阅读 32 收藏 0 点赞 0 评论 0

项目:RecurrentHighwayNetworks 作者: julian121266 项目源码 文件源码
def enwik8_raw_data(data_path=None, num_test_symbols=5000000):
  """Load raw data from data directory "data_path".

  The raw Hutter prize data is at:
  http://mattmahoney.net/dc/enwik8.zip

  Args:
    data_path: string path to the directory where simple-examples.tgz has
      been extracted.
    num_test_symbols: number of symbols at the end that make up the test set

  Returns:
    tuple (train_data, valid_data, test_data, unique)
    where each of the data objects can be passed to hutter_iterator.
  """

  data_path = os.path.join(data_path, "enwik8")

  raw_data = _read_symbols(data_path)
  raw_data = np.fromstring(raw_data, dtype=np.uint8)
  unique, data = np.unique(raw_data, return_inverse=True)
  train_data = data[: -2 * num_test_symbols]
  valid_data = data[-2 * num_test_symbols: -num_test_symbols]
  test_data = data[-num_test_symbols:]
  return train_data, valid_data, test_data, unique
评论列表
文章目录


问题


面经


文章

微信
公众号

扫码关注公众号