utils.py 文件源码

python
阅读 29 收藏 0 点赞 0 评论 0

项目:deeppavlov 作者: deepmipt 项目源码 文件源码
def train_test_split(inpath, train, test, split, random_seed):
    """
    RuCor doesn't provide train/test data splitting, it makes random splitting.
    Args:
        inpath: path to data
        train: path to train folder
        test: path to test folder
        split: int, split ratio
        random_seed: seed for random module

    Returns:

    """
    print('Start train-test splitting ...')
    z = os.listdir(inpath)
    doc_split = ShuffleSplit(1, test_size=split, random_state=random_seed)
    for train_indeses, test_indeses in doc_split.split(z): 
        train_set = [z[i] for i in sorted(list(train_indeses))]
        test_set = [z[i] for i in sorted(list(test_indeses))]
    for x in train_set:
        build_data.move(os.path.join(inpath, x), os.path.join(train, x))
    for x in test_set:
        build_data.move(os.path.join(inpath, x), os.path.join(test, x))
    print('End train-test splitts.')
    return None
评论列表
文章目录


问题


面经


文章

微信
公众号

扫码关注公众号