negative_positive_dict.py source code

python
Project: skills-ml  Author: workforce-data-initiative
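import csv
import logging
import re

import requests
import us

# PLACEURL, ONETURL and SUFFIXES are module-level constants defined elsewhere in the source file.


def negative_positive_dict():
    """
    Construct a dictionary of terms that are considered not to be part of a job title
    (state names, state abbreviations, place names), together with O*NET job titles.
    Returns: dictionary of lists with keys 'states', 'places' and 'onetjobs'
    """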
    logging.info("Beginning negative dictionary build")
    states = []
    # Collect lowercase state names and their abbreviations from a single mapping
    state_abbrs = us.states.mapping('name', 'abbr')
    states.extend(name.lower() for name in state_abbrs.keys())
    states.extend(abbr.lower() for abbr in state_abbrs.values())

    places = []
    download = requests.get(PLACEURL)
    # csv.reader expects text lines in Python 3, so decode the bytes and do not re-encode them
    reader = csv.reader(download.content.decode('latin-1').splitlines(), delimiter=',')
    next(reader)
    for row in reader:
        cleaned_placename = re.sub(r'\([^)]*\)', '', row[4]).rstrip()
        for suffix in SUFFIXES:
            if cleaned_placename.endswith(suffix):
                # Slice off only the trailing suffix; str.replace would also remove earlier occurrences
                cleaned_placename = cleaned_placename[:-len(suffix)].rstrip()
        places.append(cleaned_placename.lower())

    places = set(places)
    # discard() does not raise if the placeholder row is missing from the download
    places.discard('not in a census designated place or incorporated place')
    places = list(places)

    onetjobs = []
    download = requests.get(ONETURL)
    # Use the decoded response text; csv.reader cannot iterate over bytes lines in Python 3
    reader = csv.reader(download.text.splitlines(), delimiter='\t')
    next(reader)
    for row in reader:
        onetjobs.append(row[2].lower())
        onetjobs.append(row[3].lower())
    onetjobs = list(set(onetjobs))

    return {'states': states, 'places': places, 'onetjobs': onetjobs}
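
The place-name cleaning step inside the loop above can be tried in isolation, without downloading anything. The following is a minimal sketch, not the project's own code: `EXAMPLE_SUFFIXES` and `clean_placename` are hypothetical names introduced here for illustration, and the suffix values are assumptions, since the real `SUFFIXES` constant is not shown in this file.

```python
import re

# Hypothetical suffix list for illustration only; the real SUFFIXES constant is not shown.
EXAMPLE_SUFFIXES = [' city', ' town', ' village', ' CDP']


def clean_placename(raw_name, suffixes=EXAMPLE_SUFFIXES):
    """Remove parenthesised qualifiers and trailing suffixes, then lowercase."""
    # Drop anything in parentheses, e.g. "(balance)", and trailing whitespace
    name = re.sub(r'\([^)]*\)', '', raw_name).rstrip()
    for suffix in suffixes:
        # Skip empty suffixes so the slice below never empties the name
        if suffix and name.endswith(suffix):
            name = name[:-len(suffix)].rstrip()
    return name.lower()


if __name__ == '__main__':
    for raw in ['Springfield city', 'Athens (balance)', 'City View town']:
        print(repr(raw), '->', repr(clean_placename(raw)))
```

Slicing by `len(suffix)` removes the suffix only at the end of the name, so a name that happens to contain the same text earlier is left intact.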