Matching keywords in a pandas column against other lists of elements

Posted on 2021-01-29 17:25:08

I have a pandas DataFrame like this:

word_list
['nuclear','election','usa','baseball']
['football','united','thriller']
['marvels','hollywood','spiderman']
....................
....................
....................

I also have several lists named after categories, for example:

movies=['spiderman','marvels','thriller']

sports=['baseball','hockey','football']

politics=['election','china','usa'] and many other categories.

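I want to match the keywords in the pandas column word_list against my category lists. If a keyword matches, the name of the corresponding list should be written to a separate column; if a keyword is not in any list, it should simply be labelled miscellaneous. So the output I am looking for is: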

word_list                                          matched_list_names
['nuclear','election','usa','baseball']            politics,sports,miscellaneous
['football','united','thriller']                   sports,movies,miscellaneous               
['marvels','spiderman','hockey']                   movies,sports

....................                               .....................
....................                               .....................
....................                               ....................

I managed to get the matching keywords with:

for i in df['word_list']:
    for j in movies:
        if i in j:
           print (i)

But this only gives me a list of the matched keywords. How can I get the list names and add them to a pandas column?

1 Answer
  • 面试哥 2021-01-29

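    You can first flatten the dictionary of lists into a keyword-to-category mapping, then look up each word with .get, using miscellaneous as the default for unmatched values, then convert the results to a set to get the unique categories, and finally combine them into a string with join: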

    movies=['spiderman','marvels','thriller']
    sports=['baseball','hockey','football']
    politics=['election','china','usa']
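    # Map each category name to its keyword list
    d = {'movies':movies, 'sports':sports, 'politics':politics}
    # Flatten to keyword -> category name
    d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
    
    # Look up each word (default 'miscellaneous'); set() removes duplicates,
    # so the order of the joined names is arbitrary
    f = lambda x: ','.join(set([d1.get(y, 'miscellaneous') for y in x]))
    df['matched_list_names'] = df['word_list'].apply(f)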
    print (df)
    
                                     word_list             matched_list_names
    0       [nuclear, election, usa, baseball]  politics,miscellaneous,sports
    1             [football, united, thriller]    miscellaneous,sports,movies
    2  [marvels, hollywood, spiderman, budget]           miscellaneous,movies
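
Because a set has no guaranteed order, the category names in the output above can come out in a different order from one run to the next. If a stable order matters, one option is to de-duplicate with dict.fromkeys, which keeps first-seen order. The following is a minimal, self-contained sketch of that variant. The DataFrame is rebuilt from the three sample rows in the question, and the names categories, keyword_to_category and match_categories are made up for illustration.

```python
import pandas as pd

# Sample data taken from the question (only the three rows shown there).
df = pd.DataFrame({
    "word_list": [
        ["nuclear", "election", "usa", "baseball"],
        ["football", "united", "thriller"],
        ["marvels", "hollywood", "spiderman"],
    ]
})

# Category name -> keywords (same lists as in the question).
categories = {
    "movies": ["spiderman", "marvels", "thriller"],
    "sports": ["baseball", "hockey", "football"],
    "politics": ["election", "china", "usa"],
}

# Invert to keyword -> category name for O(1) lookups.
keyword_to_category = {
    keyword: name for name, keywords in categories.items() for keyword in keywords
}

def match_categories(words):
    """Return the comma-joined category names for one row, in first-seen order."""
    names = [keyword_to_category.get(word, "miscellaneous") for word in words]
    # dict.fromkeys drops duplicates while preserving insertion order.
    return ",".join(dict.fromkeys(names))

df["matched_list_names"] = df["word_list"].apply(match_categories)
print(df)
```

Note that the inverted dictionary holds one category per keyword, so if a keyword appeared in more than one list, only the last list processed would be reported for it.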
    

    A similar solution using a list comprehension:

    df['matched_list_names'] = [','.join(set([d1.get(y, 'miscellaneous') for y in x])) 
                                for x in df['word_list']]
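
For larger frames, the same lookup can also be expressed with pandas operations instead of a Python-level function per row. The sketch below is one possible approach rather than part of the answer above: it explodes the lists into one word per row, maps each word through the flattened dictionary, and groups back by the original row index. The sample data is rebuilt from the question, and the intermediate names exploded and labels are illustrative.

```python
import pandas as pd

# Sample data taken from the question (only the three rows shown there).
df = pd.DataFrame({
    "word_list": [
        ["nuclear", "election", "usa", "baseball"],
        ["football", "united", "thriller"],
        ["marvels", "hollywood", "spiderman"],
    ]
})

movies = ["spiderman", "marvels", "thriller"]
sports = ["baseball", "hockey", "football"]
politics = ["election", "china", "usa"]
d = {"movies": movies, "sports": sports, "politics": politics}
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}  # keyword -> category

# One word per row; the original row index is repeated for each word.
exploded = df["word_list"].explode()

# Words missing from d1 map to NaN, which is then labelled 'miscellaneous'.
labels = exploded.map(d1).fillna("miscellaneous")

# Regroup by the original row index and join unique names in first-seen order.
df["matched_list_names"] = labels.groupby(level=0).agg(
    lambda s: ",".join(dict.fromkeys(s))
)
print(df)
```

One behavioural difference to be aware of: explode turns an empty list into a single NaN entry, so a row with no words would be labelled miscellaneous here, whereas the apply-based versions would give it an empty string.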
    

