在Python中查找日期范围重叠

发布于 2021-01-29 18:05:42

我试图找到一种基于特定列(id)在数据框中查找重叠数据范围(每行提供的开始/结束日期)的更有效方法。

数据框在“来自”列上排序

我认为有一种方法可以像我一样避免“双重”应用功能…

import pandas as pd
from datetime import datetime

df = pd.DataFrame(columns=['id','from','to'], index=range(5), \
                  data=[[878,'2006-01-01','2007-10-01'],
                        [878,'2007-10-02','2008-12-01'],
                        [878,'2008-12-02','2010-04-03'],
                        [879,'2010-04-04','2199-05-11'],
                        [879,'2016-05-12','2199-12-31']])

df['from'] = pd.to_datetime(df['from'])
df['to'] = pd.to_datetime(df['to'])


    id  from        to
0   878 2006-01-01  2007-10-01
1   878 2007-10-02  2008-12-01
2   878 2008-12-02  2010-04-03
3   879 2010-04-04  2199-05-11
4   879 2016-05-12  2199-12-31

我使用“应用”功能在所有组上循环,并且在每个组中,每行使用“应用”:

def check_date_by_id(df):

    df['prevFrom'] = df['from'].shift()
    df['prevTo'] = df['to'].shift()

    def check_date_by_row(x):

        if pd.isnull(x.prevFrom) or pd.isnull(x.prevTo):
            x['overlap'] = False
            return x

        latest_start = max(x['from'], x.prevFrom)
        earliest_end = min(x['to'], x.prevTo)
        x['overlap'] = int((earliest_end - latest_start).days) + 1 > 0
        return x

    return df.apply(check_date_by_row, axis=1).drop(['prevFrom','prevTo'], axis=1)

df.groupby('id').apply(check_date_by_id)

    id  from        to          overlap
0   878 2006-01-01  2007-10-01  False
1   878 2007-10-02  2008-12-01  False
2   878 2008-12-02  2010-04-03  False
3   879 2010-04-04  2199-05-11  False
4   879 2016-05-12  2199-12-31  True
关注者
0
被浏览
42
1 个回答
  • 面试哥
    面试哥 2021-01-29
    为面试而生,有面试问题,就找面试哥。

    您可以移动to列并直接减去日期时间。

    df['overlap'] = (df['to'].shift()-df['from']) > timedelta(0)
    

    分组时应用它id可能看起来像

    df['overlap'] = (df.groupby('id')
                       .apply(lambda x: (x['to'].shift() - x['from']) > timedelta(0))
                       .reset_index(level=0, drop=True))
    

    演示版

    >>> df
        id       from         to
    0  878 2006-01-01 2007-10-01
    1  878 2007-10-02 2008-12-01
    2  878 2008-12-02 2010-04-03
    3  879 2010-04-04 2199-05-11
    4  879 2016-05-12 2199-12-31
    
    >>> df['overlap'] = (df.groupby('id')
                           .apply(lambda x: (x['to'].shift() - x['from']) > timedelta(0))
                           .reset_index(level=0, drop=True))
    
    >>> df
        id       from         to overlap
    0  878 2006-01-01 2007-10-01   False
    1  878 2007-10-02 2008-12-01   False
    2  878 2008-12-02 2010-04-03   False
    3  879 2010-04-04 2199-05-11   False
    4  879 2016-05-12 2199-12-31    True
    


知识点
面圈网VIP题库

面圈网VIP题库全新上线,海量真题题库资源。 90大类考试,超10万份考试真题开放下载啦

去下载看看