csvcorpus.py 文件源码-python代码片段

csvcorpus.py 文件源码

python

阅读 38 收藏 0 点赞 0 评论 0

项目：topical_word_embeddings 作者: thunlp 项目源码文件源码

def __init__(self, fname, labels):
        """
        Initialize the corpus from a file.
        `labels` = are class labels present in the input file? => skip the first column

        """
        logger.info("loading corpus from %s" % fname)
        self.fname = fname
        self.length = None
        self.labels = labels

        # load the first few lines, to guess the CSV dialect
        head = ''.join(itertools.islice(open(self.fname), 5))
        self.headers = csv.Sniffer().has_header(head)
        self.dialect = csv.Sniffer().sniff(head)
        logger.info("sniffed CSV delimiter=%r, headers=%s" % (self.dialect.delimiter, self.headers))