II. Graph Features Extraction

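The code converts logon event data into several graph-style structures for later analysis.

These graph features include host-to-user mappings, host-to-source-host mappings, and similar relationships.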

Data preprocessing

First, the code reads the logon data file and keeps a subset of its columns for feature extraction.

import pandas as pd
# The file has no header row, so the column names are supplied explicitly.
auth = pd.read_csv("../auth_ntlm.txt", header=None, names=['time', 'src_t', 'user', 'src', 'dst', 'auth_t', 'log_t', 'auth_o', 's_f'])
authlog = auth[['time', 'user', 'src', 'dst']]
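
The builder functions in this section group events by day, which means turning the `time` column into a day index. The following is a minimal sketch of that step, assuming `time` holds integer seconds; the sample rows are made up for illustration and are not taken from the real file.

```python
import pandas as pd

# Hypothetical sample rows standing in for authlog; values are illustrative only.
sample = pd.DataFrame({
    "time": [1, 86399, 86400, 172800],
    "user": ["U1", "U1", "U2", "U1"],
    "src": ["C1", "C1", "C2", "C1"],
    "dst": ["C9", "C9", "C9", "C8"],
})

# Floor division maps every second inside the same 24-hour window to one day index.
sample["day"] = sample["time"] // 86400
days = sample["day"].tolist()
print(days)
```

Floor division is used on purpose: in Python 3, `/` returns a float, so nearly every timestamp would end up with its own "day".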

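Feature extraction functions

1. BuildInHostUserMap

This function builds a nested mapping that records, for each destination host (dst), how many times each user (user) logged in to it on each day (day). The result is indexed as InHostUserMap[dst][user][day].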

def BuildInHostUserMap(authlog):
    # Nested counts: InHostUserMap[dst][user][day] -> number of logon events.
    InHostUserMap = {}
    for index, event in authlog.iterrows():
        if event['dst'] not in InHostUserMap:
            InHostUserMap[event['dst']] = {}
        if event['user'] not in InHostUserMap[event['dst']]:
            InHostUserMap[event['dst']][event['user']] = {}
        # Floor division buckets the timestamp (seconds) into a whole-day index;
        # true division (/) would produce a distinct float key per timestamp.
        day = event['time'] // 86400
        if day not in InHostUserMap[event['dst']][event['user']]:
            InHostUserMap[event['dst']][event['user']][day] = 0
        InHostUserMap[event['dst']][event['user']][day] += 1

    return InHostUserMap
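
The builder functions in this section all follow the same pattern: pick an outer key, pick an inner key, and count events per day. As a sketch of how that pattern could be factored out, the helper below takes the two column names as parameters. `build_count_map` is a hypothetical name that does not appear in the original code, and the sample rows are illustrative only.

```python
from collections import defaultdict

import pandas as pd


def build_count_map(authlog, outer, inner):
    """Return counts[outer value][inner value][day] (hypothetical helper)."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    # itertuples avoids building a Series per row, unlike iterrows.
    for time, outer_value, inner_value in authlog[["time", outer, inner]].itertuples(index=False):
        counts[outer_value][inner_value][int(time) // 86400] += 1
    # Convert to plain dicts: lambda-based defaultdicts cannot be pickled.
    return {
        o: {i: dict(per_day) for i, per_day in inner_map.items()}
        for o, inner_map in counts.items()
    }


# Illustrative rows only; not taken from the real dataset.
authlog = pd.DataFrame({
    "time": [10, 20, 90000],
    "user": ["U1", "U1", "U1"],
    "src": ["C1", "C2", "C1"],
    "dst": ["C9", "C9", "C9"],
})
in_host_user = build_count_map(authlog, "dst", "user")
in_host_src = build_count_map(authlog, "dst", "src")
print(in_host_user)
print(in_host_src)
```

With this shape, `build_count_map(authlog, "dst", "user")` plays the role of BuildInHostUserMap and `build_count_map(authlog, "src", "dst")` the role of BuildOutHostDstMap. The combined-key variants would need an extra column holding the combined key.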

2. BuildInHostSrcMap

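This function builds a nested mapping that records, for each destination host (dst), how many logons it received from each source host (src) on each day (day).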

def BuildInHostSrcMap(authlog):
    # Nested counts: InHostSrcMap[dst][src][day] -> number of logon events.
    InHostSrcMap = {}
    for index, event in authlog.iterrows():
        if event['dst'] not in InHostSrcMap:
            InHostSrcMap[event['dst']] = {}
        if event['src'] not in InHostSrcMap[event['dst']]:
            InHostSrcMap[event['dst']][event['src']] = {}
        # Floor division yields a whole-day index rather than a float.
        day = event['time'] // 86400
        if day not in InHostSrcMap[event['dst']][event['src']]:
            InHostSrcMap[event['dst']][event['src']][day] = 0
        InHostSrcMap[event['dst']][event['src']][day] += 1

    return InHostSrcMap

3. BuildInHostUsrSrcMap

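This function builds a nested mapping that records, for each destination host (dst), how many times each user (user) logged in from each source host (src) on each day (day). The user and source host are concatenated into a single string that serves as the inner key.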

def BuildInHostUsrSrcMap(authlog):
    # Nested counts: InHostUsrSrcMap[dst][user + src][day] -> number of logon events.
    InHostUsrSrcMap = {}
    for index, event in authlog.iterrows():
        if event['dst'] not in InHostUsrSrcMap:
            InHostUsrSrcMap[event['dst']] = {}
        # Combined key: user and source host concatenated into one string.
        key = event['user'] + event['src']
        if key not in InHostUsrSrcMap[event['dst']]:
            InHostUsrSrcMap[event['dst']][key] = {}
        # Floor division yields a whole-day index rather than a float.
        day = event['time'] // 86400
        if day not in InHostUsrSrcMap[event['dst']][key]:
            InHostUsrSrcMap[event['dst']][key][day] = 0
        InHostUsrSrcMap[event['dst']][key][day] += 1

    return InHostUsrSrcMap
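
One thing to be aware of with the combined key: the two strings are joined without a separator, so in principle two different (user, src) pairs could produce the same key. Whether this can happen depends on the naming scheme in the actual data. The sketch below uses made-up names to show the collision and how a tuple key avoids it. This is an alternative for illustration, not what the original code does.

```python
# Two different (user, src) pairs that collide under plain string concatenation.
# The names are invented for the example.
pair_a = ("U1", "2C3")
pair_b = ("U12", "C3")

concat_a = pair_a[0] + pair_a[1]
concat_b = pair_b[0] + pair_b[1]
collides = concat_a == concat_b  # both strings are "U12C3"

# Tuples keep the two fields separate, so the keys stay distinct.
counts = {}
for user, src in (pair_a, pair_b):
    counts[(user, src)] = counts.get((user, src), 0) + 1

print(collides, len(counts))
```

Tuple keys are hashable and picklable, so they can be used in the nested dictionaries in the same way as string keys.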

4. BuildOutHostUsrMap

This function builds a nested mapping that records, for each source host (src), how many logons each user (user) initiated from it on each day (day).

def BuildOutHostUsrMap(authlog):
    # Nested counts: OutHostUserMap[src][user][day] -> number of logon events.
    OutHostUserMap = {}
    for index, event in authlog.iterrows():
        if event['src'] not in OutHostUserMap:
            OutHostUserMap[event['src']] = {}
        if event['user'] not in OutHostUserMap[event['src']]:
            OutHostUserMap[event['src']][event['user']] = {}
        # Floor division yields a whole-day index rather than a float.
        day = event['time'] // 86400
        if day not in OutHostUserMap[event['src']][event['user']]:
            OutHostUserMap[event['src']][event['user']][day] = 0
        OutHostUserMap[event['src']][event['user']][day] += 1

    return OutHostUserMap

5. BuildOutHostDstMap

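This function builds a nested mapping that records, for each source host (src), how many logons it initiated to each destination host (dst) on each day (day).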

def BuildOutHostDstMap(authlog):
    # Nested counts: OutHostDstMap[src][dst][day] -> number of logon events.
    OutHostDstMap = {}
    for index, event in authlog.iterrows():
        if event['src'] not in OutHostDstMap:
            OutHostDstMap[event['src']] = {}
        if event['dst'] not in OutHostDstMap[event['src']]:
            OutHostDstMap[event['src']][event['dst']] = {}
        # Floor division yields a whole-day index rather than a float.
        day = event['time'] // 86400
        if day not in OutHostDstMap[event['src']][event['dst']]:
            OutHostDstMap[event['src']][event['dst']][day] = 0
        OutHostDstMap[event['src']][event['dst']][day] += 1

    return OutHostDstMap

6. BuildOutHostUsrDstMap

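This function builds a nested mapping that records, for each source host (src), how many logons each user (user) initiated to each destination host (dst) on each day (day). The user and destination host are concatenated into a single string that serves as the inner key.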

def BuildOutHostUsrDstMap(authlog):
    # Nested counts: OutHostUsrDstMap[src][user + dst][day] -> number of logon events.
    OutHostUsrDstMap = {}
    for index, event in authlog.iterrows():
        if event['src'] not in OutHostUsrDstMap:
            OutHostUsrDstMap[event['src']] = {}
        # Combined key: user and destination host concatenated into one string.
        key = event['user'] + event['dst']
        if key not in OutHostUsrDstMap[event['src']]:
            OutHostUsrDstMap[event['src']][key] = {}
        # Floor division yields a whole-day index rather than a float.
        day = event['time'] // 86400
        if day not in OutHostUsrDstMap[event['src']][key]:
            OutHostUsrDstMap[event['src']][key][day] = 0
        OutHostUsrDstMap[event['src']][key][day] += 1

    return OutHostUsrDstMap
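
Row-by-row loops with `iterrows` can be slow on large logs. As a hedged alternative, the same per-day counts can be computed with a pandas `groupby`. The sketch below reproduces the counts of BuildOutHostDstMap on a few made-up rows. It returns a Series indexed by (src, dst, day) rather than nested dictionaries, so downstream code would need to be adapted accordingly.

```python
import pandas as pd

# Illustrative rows only; not taken from the real dataset.
authlog = pd.DataFrame({
    "time": [10, 20, 90000, 30],
    "user": ["U1", "U1", "U1", "U2"],
    "src": ["C1", "C1", "C1", "C2"],
    "dst": ["C9", "C9", "C9", "C9"],
})

# One count per (src, dst, day) combination, computed without a Python-level loop.
daily = (
    authlog.assign(day=authlog["time"] // 86400)
    .groupby(["src", "dst", "day"])
    .size()
)

print(daily)
print(int(daily.loc[("C1", "C9", 0)]))
```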

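Code summary

  1. Feature extraction: each function extracts graph features along a different dimension, such as user-to-host mappings and source-host-to-destination-host mappings.
  2. Data storage: the extracted graph features are serialized with pickle and saved to files for later use.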
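
The storage step is described but not shown. A minimal sketch of how one of the maps could be written and read back with `pickle` follows. The file name and location are hypothetical, and the dictionary is a small stand-in for the output of one of the builder functions.

```python
import os
import pickle
import tempfile

# Stand-in for the output of one of the Build* functions (illustrative values).
in_host_user_map = {"C9": {"U1": {0: 2, 1: 1}}}

# Hypothetical file name; the original text does not state one.
path = os.path.join(tempfile.gettempdir(), "InHostUserMap.pkl")

# Serialize the nested dictionary to disk.
with open(path, "wb") as f:
    pickle.dump(in_host_user_map, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back for later analysis.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == in_host_user_map)
```

Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code embedded in the file.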