sampleData.py source code

python

Project: cs145-duplicats-in-space  Author: bchalala
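import csv


def buildSampleData(numPapers, inputDir, outputDir):
    """Write a sample of PaperAuthor.csv to outputDir and return the IDs seen."""
    papers = set()
    authors = set()

    # Read from inputDir and write to outputDir (the original referenced the
    # undefined names dataDir and sampleDataDir).
    with open(inputDir + "/PaperAuthor.csv") as infile:
        reader = csv.DictReader(infile)

        with open(outputDir + "/PaperAuthor.csv", 'w') as outfile:
            writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)

            writer.writeheader()
            for row in reader:
                # Stop once numPapers distinct papers have been collected.
                # Note: the check runs before the row is examined, so later
                # rows of the last collected paper are not written.
                if len(papers) >= numPapers:
                    break

                papers.add(row["PaperId"])
                authors.add(row["AuthorId"])
                writer.writerow(row)

    # copyFile is not shown in this excerpt; it is expected to be defined
    # elsewhere in the project.
    copyFile("Author.csv", authors, inputDir, outputDir)
    copyFile("Paper.csv", papers, inputDir, outputDir)
    return papers, authors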
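
The excerpt calls copyFile but does not show it. The following is only a sketch of what such a helper might look like, not the author's implementation. The name copy_file_sketch, the idColumn parameter, and the assumption that Author.csv and Paper.csv identify rows by a column named "Id" are all hypothetical. It assumes Python 3, where the csv module recommends opening files with newline="".

```python
import csv
import os


def copy_file_sketch(fileName, ids, inputDir, outputDir, idColumn="Id"):
    """Hypothetical stand-in for copyFile.

    Copies fileName from inputDir to outputDir, keeping only the rows whose
    idColumn value is in ids. Returns the number of rows written.
    """
    written = 0
    with open(os.path.join(inputDir, fileName), newline="") as infile:
        reader = csv.DictReader(infile)
        with open(os.path.join(outputDir, fileName), "w", newline="") as outfile:
            writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                # ids holds strings, because csv.DictReader yields string values
                if row[idColumn] in ids:
                    writer.writerow(row)
                    written += 1
    return written
```

Under these assumptions, a call such as copy_file_sketch("Author.csv", authors, inputDir, outputDir) would keep only the authors that appear in the sampled PaperAuthor.csv, so the sample stays consistent across the three files.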