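II. Graph Feature Extraction
The code turns login event data into several graph-style mappings for later analysis.
These graph features include the host-to-user mapping, the host-to-source mapping, and similar relationships.
Data Preprocessing
First, the code reads the login data file and keeps only the columns needed for feature extraction: time, user, src, and dst.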
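import pandas as pd

# The file has no header row, so column names are supplied explicitly.
auth = pd.read_csv("../auth_ntlm.txt", header=None, names=['time', 'src_t', 'user', 'src', 'dst', 'auth_t', 'log_t', 'auth_o', 's_f'])
authlog = auth[['time', 'user', 'src', 'dst']]  # columns used for feature extraction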
Feature Extraction Functions
1. BuildInHostUserMap
This function builds a nested mapping that records, for each destination host (dst), how many times each user (user) logged in to it on each day (day).
def BuildInHostUserMap(authlog):
    # Nested mapping: dst -> user -> day -> login count.
    InHostUserMap = {}
    for index, event in authlog.iterrows():
        if event['dst'] not in InHostUserMap:
            InHostUserMap[event['dst']] = {}
        if event['user'] not in InHostUserMap[event['dst']]:
            InHostUserMap[event['dst']][event['user']] = {}
        # Integer day index (time assumed to be in seconds); true division
        # would produce a distinct float key for almost every timestamp.
        day = int(event['time']) // 86400
        if day not in InHostUserMap[event['dst']][event['user']]:
            InHostUserMap[event['dst']][event['user']][day] = 0
        InHostUserMap[event['dst']][event['user']][day] += 1
    return InHostUserMap
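Row-by-row iteration with iterrows() can be slow on large logs. As a rough sketch, and not the original author's method, the same dst -> user -> day -> count structure could be built with a pandas groupby. The function name build_in_host_user_map_grouped and the sample rows are hypothetical, and the sketch assumes time holds seconds, so floor division by 86400 gives an integer day index.

```python
import pandas as pd


def build_in_host_user_map_grouped(authlog):
    """Count logins per (dst, user, day) with groupby, then nest into dicts."""
    # Assumption: 'time' is in seconds; floor division gives an integer day index.
    day = authlog["time"].astype("int64") // 86400
    counts = authlog.assign(day=day).groupby(["dst", "user", "day"]).size()

    nested = {}
    for (dst, user, d), n in counts.items():
        # Convert NumPy integers to plain ints so keys and values stay simple.
        nested.setdefault(dst, {}).setdefault(user, {})[int(d)] = int(n)
    return nested


# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "time": [10, 20, 86400 + 5, 30],
        "user": ["U1", "U1", "U1", "U2"],
        "src": ["C1", "C1", "C2", "C1"],
        "dst": ["C9", "C9", "C9", "C9"],
    }
)
in_host_user_map = build_in_host_user_map_grouped(sample)
print(in_host_user_map)
```

The grouping does the counting in one pass, and only the final nesting step runs in a Python loop over the distinct (dst, user, day) combinations.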
2. BuildInHostSrcMap
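This function builds a nested mapping that records, for each destination host (dst), how many logins it received from each source host (src) on each day (day).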
def BuildInHostSrcMap(authlog):
    # Nested mapping: dst -> src -> day -> login count.
    InHostSrcMap = {}
    for index, event in authlog.iterrows():
        if event['dst'] not in InHostSrcMap:
            InHostSrcMap[event['dst']] = {}
        if event['src'] not in InHostSrcMap[event['dst']]:
            InHostSrcMap[event['dst']][event['src']] = {}
        # Integer day index (time assumed to be in seconds).
        day = int(event['time']) // 86400
        if day not in InHostSrcMap[event['dst']][event['src']]:
            InHostSrcMap[event['dst']][event['src']][day] = 0
        InHostSrcMap[event['dst']][event['src']][day] += 1
    return InHostSrcMap
3. BuildInHostUsrSrcMap
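This function builds a nested mapping that records, for each destination host (dst), how many times each user (user) logged in from each source host (src) on each day (day). The user and source host are concatenated into a single string that serves as the key.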
def BuildInHostUsrSrcMap(authlog):
    # Nested mapping: dst -> (user + src) -> day -> login count.
    InHostUsrSrcMap = {}
    for index, event in authlog.iterrows():
        if event['dst'] not in InHostUsrSrcMap:
            InHostUsrSrcMap[event['dst']] = {}
        # Combined key: user and source host concatenated into one string.
        key = event['user'] + event['src']
        if key not in InHostUsrSrcMap[event['dst']]:
            InHostUsrSrcMap[event['dst']][key] = {}
        # Integer day index (time assumed to be in seconds).
        day = int(event['time']) // 86400
        if day not in InHostUsrSrcMap[event['dst']][key]:
            InHostUsrSrcMap[event['dst']][key][day] = 0
        InHostUsrSrcMap[event['dst']][key][day] += 1
    return InHostUsrSrcMap
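One thing to keep in mind with a concatenated key is that two different (user, host) pairs can produce the same string. Whether this can happen in practice depends on the identifier format in the data. The small sketch below illustrates the issue with hypothetical identifiers and shows a tuple key as one possible alternative. The helper names concat_key and tuple_key are made up for this example and are not part of the original code.

```python
def concat_key(user, host):
    # Mirrors the string concatenation used for the combined key.
    return user + host


def tuple_key(user, host):
    # Alternative (an assumption, not from the original code): keep both parts.
    return (user, host)


# Hypothetical identifiers chosen to show the ambiguity.
pair_a = ("U1", "1C2")
pair_b = ("U11", "C2")

same_concat = concat_key(*pair_a) == concat_key(*pair_b)
same_tuple = tuple_key(*pair_a) == tuple_key(*pair_b)
print(same_concat)  # True: both pairs become the string "U11C2"
print(same_tuple)   # False: the tuples keep the pairs distinct
```

Tuples are hashable, so they work as dictionary keys and survive pickling. A separator character that cannot appear in either identifier would be another way to keep string keys unambiguous.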
4. BuildOutHostUsrMap
This function builds a nested mapping that records, for each source host (src), how many logins each user (user) initiated from it on each day (day).
def BuildOutHostUsrMap(authlog):
    # Nested mapping: src -> user -> day -> login count.
    OutHostUserMap = {}
    for index, event in authlog.iterrows():
        if event['src'] not in OutHostUserMap:
            OutHostUserMap[event['src']] = {}
        if event['user'] not in OutHostUserMap[event['src']]:
            OutHostUserMap[event['src']][event['user']] = {}
        # Integer day index (time assumed to be in seconds).
        day = int(event['time']) // 86400
        if day not in OutHostUserMap[event['src']][event['user']]:
            OutHostUserMap[event['src']][event['user']][day] = 0
        OutHostUserMap[event['src']][event['user']][day] += 1
    return OutHostUserMap
5. BuildOutHostDstMap
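This function builds a nested mapping that records, for each source host (src), how many logins it initiated to each destination host (dst) on each day (day).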
def BuildOutHostDstMap(authlog):
    # Nested mapping: src -> dst -> day -> login count.
    OutHostDstMap = {}
    for index, event in authlog.iterrows():
        if event['src'] not in OutHostDstMap:
            OutHostDstMap[event['src']] = {}
        if event['dst'] not in OutHostDstMap[event['src']]:
            OutHostDstMap[event['src']][event['dst']] = {}
        # Integer day index (time assumed to be in seconds).
        day = int(event['time']) // 86400
        if day not in OutHostDstMap[event['src']][event['dst']]:
            OutHostDstMap[event['src']][event['dst']][day] = 0
        OutHostDstMap[event['src']][event['dst']][day] += 1
    return OutHostDstMap
6. BuildOutHostUsrDstMap
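This function builds a nested mapping that records, for each source host (src), how many logins each user (user) initiated to each destination host (dst) on each day (day). The user and destination host are concatenated into a single string that serves as the key.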
def BuildOutHostUsrDstMap(authlog):
    # Nested mapping: src -> (user + dst) -> day -> login count.
    OutHostUsrDstMap = {}
    for index, event in authlog.iterrows():
        if event['src'] not in OutHostUsrDstMap:
            OutHostUsrDstMap[event['src']] = {}
        # Combined key: user and destination host concatenated into one string.
        key = event['user'] + event['dst']
        if key not in OutHostUsrDstMap[event['src']]:
            OutHostUsrDstMap[event['src']][key] = {}
        # Integer day index (time assumed to be in seconds).
        day = int(event['time']) // 86400
        if day not in OutHostUsrDstMap[event['src']][key]:
            OutHostUsrDstMap[event['src']][key][day] = 0
        OutHostUsrDstMap[event['src']][key][day] += 1
    return OutHostUsrDstMap
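The six functions above differ only in which column is the outer key and which columns form the inner key. As a rough sketch, and not the original author's code, they could be expressed through a single helper. The name build_count_map, its parameters, and the sample rows are hypothetical. The sketch assumes time is in seconds and uses integer division to get the day index.

```python
from collections import defaultdict

import pandas as pd


def build_count_map(authlog, host_col, key_cols):
    """Return {host: {key: {day: count}}} for the chosen columns.

    host_col: column used as the outer key ('src' or 'dst').
    key_cols: columns whose string values are concatenated into the inner key.
    """
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for event in authlog.itertuples(index=False):
        host = getattr(event, host_col)
        key = "".join(str(getattr(event, col)) for col in key_cols)
        day = int(event.time) // 86400  # assumes 'time' is in seconds
        counts[host][key][day] += 1
    # Convert to plain dicts: defaultdicts built from lambdas cannot be pickled.
    return {
        host: {key: dict(days) for key, days in keys.items()}
        for host, keys in counts.items()
    }


# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "time": [10, 20, 86400 + 5, 30],
        "user": ["U1", "U1", "U1", "U2"],
        "src": ["C1", "C1", "C2", "C1"],
        "dst": ["C9", "C9", "C9", "C9"],
    }
)
in_host_user = build_count_map(sample, "dst", ["user"])
out_host_usr_dst = build_count_map(sample, "src", ["user", "dst"])
print(in_host_user)
print(out_host_usr_dst)
```

Under these assumptions, calling the helper with ("dst", ["user"]) plays the role of BuildInHostUserMap, and ("src", ["user", "dst"]) plays the role of BuildOutHostUsrDstMap.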
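Code Summary
- Feature extraction: each function extracts graph features along a different dimension, such as the user-to-host mapping and the source-host-to-destination-host mapping.
- Data storage: the extracted graph features are serialized with pickle and written to files for later use.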
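The storage step is not shown in this section, so the following is only a minimal sketch of how one of the nested mappings could be saved and reloaded with pickle. The file name InHostUserMap.pkl, the temporary directory, and the sample mapping are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical feature map in the dst -> user -> day -> count shape.
in_host_user_map = {"C9": {"U1": {0: 2, 1: 1}, "U2": {0: 1}}}

# Hypothetical output path; a temporary directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "InHostUserMap.pkl")

# Serialize the mapping to disk in binary mode.
with open(path, "wb") as f:
    pickle.dump(in_host_user_map, f)

# Load it back later, for example in a separate analysis script.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == in_host_user_map)
```

Plain nested dicts with string and integer keys round-trip through pickle unchanged. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.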