﻿1
00:00:00,600 --> 00:00:02,736
(dramatic music)

2
00:00:09,976 --> 00:00:13,680
(audience applauds)

3
00:00:13,680 --> 00:00:15,482
- Welcome to Quantify Your Hunt

4
00:00:15,482 --> 00:00:17,750
and if you saw this
at BSides Charm,

5
00:00:17,751 --> 00:00:20,520
I am very pleased to remind you,

6
00:00:20,520 --> 00:00:22,088
this is not that talk.

7
00:00:22,088 --> 00:00:23,323
This is about,

8
00:00:23,323 --> 00:00:24,758
about half of this is
gonna be new content

9
00:00:24,758 --> 00:00:27,727
and it's gonna end in
a very different place.

10
00:00:27,727 --> 00:00:30,497
So without further
ado, let's get started.

11
00:00:30,497 --> 00:00:31,731
This is the agenda.

12
00:00:31,731 --> 00:00:33,533
We're gonna cover
all of these topics

13
00:00:33,533 --> 00:00:35,835
at some point in this talk.

14
00:00:35,835 --> 00:00:37,303
We may not have
time for questions

15
00:00:37,303 --> 00:00:39,406
but we will stick
around this evening,

16
00:00:39,406 --> 00:00:40,974
so if you've got anything
you wanna follow up with,

17
00:00:40,974 --> 00:00:42,308
please let us know.

18
00:00:42,308 --> 00:00:44,611
And so next, who we are.

19
00:00:46,379 --> 00:00:49,015
We're not as interesting
as this topic.

20
00:00:49,949 --> 00:00:51,217
Yeah, my name's Devon.

21
00:00:51,217 --> 00:00:52,719
I spend a lotta time doing IR

22
00:00:52,719 --> 00:00:56,156
and more recently working
on Endgame solutions,

23
00:00:56,156 --> 00:00:59,759
basically trying to make this
easier for threat hunters.

24
00:00:59,759 --> 00:01:01,795
Particularly things like,

25
00:01:01,795 --> 00:01:04,964
make it easier for you to
get to initial compromise

26
00:01:04,964 --> 00:01:07,500
so you can really
contain very quickly.

27
00:01:07,500 --> 00:01:09,736
I run our SOC, I run our IR team

28
00:01:09,736 --> 00:01:11,905
and I do a bunch of other
stuff in the public.

29
00:01:11,905 --> 00:01:14,340
I'm not the original author
of Red Team Automation

30
00:01:14,340 --> 00:01:15,708
which is one of the
simulation frameworks

31
00:01:15,708 --> 00:01:17,010
that we kinda talk about,

32
00:01:17,010 --> 00:01:19,446
but I am its biggest
spokesperson.

33
00:01:19,446 --> 00:01:21,114
Roberto take it away.

34
00:01:21,114 --> 00:01:22,515
- All right,

35
00:01:22,515 --> 00:01:25,118
thank you very much everybody
for coming today also.

36
00:01:25,118 --> 00:01:27,554
And it's my first time here
in the Threat Hunting Summit.

37
00:01:27,554 --> 00:01:29,422
So, I'm very excited
about this opportunity.

38
00:01:29,422 --> 00:01:32,258
And I work actually
at SpecterOps

39
00:01:32,258 --> 00:01:35,527
and my, what's it called,
Twitter handle is Cyb3rWard0g.

40
00:01:35,528 --> 00:01:38,431
I think you guys have seen
some of the projects out there.

41
00:01:38,431 --> 00:01:40,366
I'm the author of the
ThreatHunter-Playbook,

42
00:01:40,366 --> 00:01:42,801
the HELK, you know,
OSSEM which is more like

43
00:01:42,802 --> 00:01:45,038
documenting datasets
and things like that.

44
00:01:45,038 --> 00:01:48,041
And the primary two
goals I guess that I have

45
00:01:48,041 --> 00:01:51,578
with my projects is always
to learn something new,

46
00:01:51,578 --> 00:01:53,780
because that's how I
learn actually little bit

47
00:01:53,780 --> 00:01:55,081
about MITRE ATT&CK,

48
00:01:55,081 --> 00:01:56,483
a little bit of Elastic
and things like that

49
00:01:56,483 --> 00:02:00,086
is just to learn the application

50
00:02:00,086 --> 00:02:02,155
but at the same
time also to share

51
00:02:02,155 --> 00:02:03,456
pretty much everything that
I do with the community.

52
00:02:03,456 --> 00:02:06,192
I think that that's
what keeps me going

53
00:02:06,192 --> 00:02:07,860
and every time I have an idea,

54
00:02:07,861 --> 00:02:09,329
I'm like, you know what,

55
00:02:09,329 --> 00:02:10,563
let's make it an
open source project

56
00:02:10,562 --> 00:02:12,097
and then just share
it with everybody.

57
00:02:12,098 --> 00:02:14,167
And I'm the former Capital
One Senior Threat Hunter.

58
00:02:14,167 --> 00:02:16,503
That's where I actually
had my first experience

59
00:02:16,503 --> 00:02:20,206
to understand data at scale,

60
00:02:20,206 --> 00:02:22,841
'cause I had never worked with
more than 500,000 endpoints

61
00:02:22,842 --> 00:02:24,310
and things like that.

62
00:02:24,310 --> 00:02:25,745
And that was actually
really interesting to see

63
00:02:25,745 --> 00:02:27,847
how things change from a 20,000

64
00:02:27,847 --> 00:02:30,817
to a 500,000-endpoint environment.

65
00:02:30,817 --> 00:02:32,318
All right, so why this talk?

66
00:02:32,318 --> 00:02:36,556
And I've heard a lot about
different talks that happened

67
00:02:36,556 --> 00:02:38,024
today and yesterday

68
00:02:38,024 --> 00:02:40,193
and there's some great ideas
about how you can start

69
00:02:40,193 --> 00:02:42,729
being effective hunting
and things like that,

70
00:02:42,729 --> 00:02:45,331
but something that we
like to share today

71
00:02:45,331 --> 00:02:49,636
is to start kinda also focusing
on the things we have seen

72
00:02:49,636 --> 00:02:51,337
companies still struggle with.

73
00:02:51,337 --> 00:02:54,440
And one of the things could
be just as basic as mapping

74
00:02:54,440 --> 00:02:57,043
your hunting engagements
to business use cases

75
00:02:57,043 --> 00:02:59,546
and how you do start
sharing the effectiveness

76
00:02:59,546 --> 00:03:01,947
and how you're going over
with your hunting engagement

77
00:03:01,948 --> 00:03:04,083
but at the same time
also how you can start

78
00:03:04,083 --> 00:03:07,020
showing transparency to
your senior leadership.

79
00:03:07,020 --> 00:03:08,221
Right, 'cause you gotta show 'em

80
00:03:08,221 --> 00:03:09,556
how you're doing your
progress over time

81
00:03:09,556 --> 00:03:10,490
and things like that,

82
00:03:10,490 --> 00:03:12,192
how you map your tools back to,

83
00:03:12,192 --> 00:03:14,093
for example techniques
and things like that.

84
00:03:14,093 --> 00:03:17,030
That will show also
the effectiveness of
hunting engagement.

85
00:03:17,030 --> 00:03:19,132
And of course at
the end for people

86
00:03:19,132 --> 00:03:21,935
that don't know where to
start and things like that,

87
00:03:21,935 --> 00:03:24,304
I believe that this talk
also helps you to understand

88
00:03:24,304 --> 00:03:26,206
how you can prioritize
what you do

89
00:03:26,206 --> 00:03:28,107
before you want to
start hunting engagement

90
00:03:28,107 --> 00:03:29,042
and things like that.

91
00:03:29,042 --> 00:03:30,577
And we're gonna start using also

92
00:03:30,577 --> 00:03:34,747
some type of understanding
of using simulation for

93
00:03:34,747 --> 00:03:36,182
measuring your
hunting engagement

94
00:03:36,182 --> 00:03:37,784
and things like that as well.

95
00:03:37,784 --> 00:03:39,786
But first it's very
important to understand

96
00:03:39,786 --> 00:03:42,355
exactly what
effective hunting is.

97
00:03:42,355 --> 00:03:46,226
And I think that that's pretty
much what drives exactly

98
00:03:46,226 --> 00:03:48,428
what you do in a hunting
engagement, how you prepare,

99
00:03:48,428 --> 00:03:49,896
and things like that.

100
00:03:49,896 --> 00:03:51,731
And the problem that I
see a lot in the industry

101
00:03:51,731 --> 00:03:54,299
is that there is this
buzzword that you can use,

102
00:03:54,300 --> 00:03:55,802
you know, everything
is effective.

103
00:03:55,802 --> 00:03:57,237
Right and you can use a product,

104
00:03:57,237 --> 00:03:58,738
you can just use anything

105
00:03:58,738 --> 00:04:00,840
and can just be an effective
threat hunting tool.

106
00:04:00,840 --> 00:04:03,443
- [Devon] You could dance up
on a robot with a dress on.

107
00:04:03,443 --> 00:04:04,911
All those things work.

108
00:04:04,911 --> 00:04:07,814
- The idea is to start,
make sure that we understand

109
00:04:07,814 --> 00:04:10,083
why that's so
important to define

110
00:04:10,083 --> 00:04:12,952
this specific word,
like effectiveness.

111
00:04:12,952 --> 00:04:15,188
And I think I've heard
people using efficiency,

112
00:04:15,188 --> 00:04:18,257
efficacy, and effectiveness
like if it was the same thing.

113
00:04:18,257 --> 00:04:21,027
Problem is that
actually, efficiency is

114
00:04:21,027 --> 00:04:24,697
when you start making the most

115
00:04:24,697 --> 00:04:26,299
out of the resources
that you have,

116
00:04:26,299 --> 00:04:29,669
and then efficacy focuses more
on achieving the objective.

117
00:04:29,669 --> 00:04:32,739
And I think that there is
where we start thinking

118
00:04:32,739 --> 00:04:34,039
from a hunting perspective,

119
00:04:34,040 --> 00:04:36,142
we say we gotta find evil

120
00:04:36,142 --> 00:04:37,909
and we gotta
uncover an incident.

121
00:04:37,910 --> 00:04:40,179
You start just
creating this mentality

122
00:04:40,179 --> 00:04:42,515
that that's all you have
to do when you hunt.

123
00:04:42,515 --> 00:04:45,183
Well the problem is that
if you wanna be effective,

124
00:04:45,184 --> 00:04:47,887
you gotta take pretty much
both in consideration, right?

125
00:04:47,887 --> 00:04:50,857
You have to understand how you
can achieve your objective,

126
00:04:50,857 --> 00:04:53,860
but how you can do it using
the resources that you have.

127
00:04:53,860 --> 00:04:56,362
We've been talking about data
a lot in this past couple

128
00:04:56,362 --> 00:04:58,031
of presentations before ours,

129
00:04:58,031 --> 00:04:59,666
and I'm so excited that actually

130
00:04:59,666 --> 00:05:02,869
that's being also shared
with everybody, nowadays,

131
00:05:02,869 --> 00:05:06,673
'cause last year I didn't
hear that much about

132
00:05:06,673 --> 00:05:08,274
basing your hunting engagement

133
00:05:08,274 --> 00:05:09,709
on the data that you have
and things like that.

134
00:05:09,709 --> 00:05:11,210
So effectiveness is huge.

135
00:05:11,210 --> 00:05:13,279
Understand like why, how
you're gonna start mapping

136
00:05:13,279 --> 00:05:14,213
that to your engagements.

137
00:05:14,213 --> 00:05:16,115
Now in this, I can tell you

138
00:05:16,115 --> 00:05:18,785
that from an
efficiency perspective,

139
00:05:18,785 --> 00:05:20,586
that's when you
start thinking about,

140
00:05:20,586 --> 00:05:23,122
MITRE ATT&CK, for example,
like it's a number one,

141
00:05:23,122 --> 00:05:25,024
where you have to be
able to categorize

142
00:05:25,024 --> 00:05:26,893
your techniques and
things that you're doing.

143
00:05:26,893 --> 00:05:28,861
And I think that that's
pretty much where it fits.

144
00:05:28,861 --> 00:05:30,830
It's that it helps you to be
a little bit more efficient

145
00:05:30,830 --> 00:05:32,865
than just saying,
let's hunt for,

146
00:05:32,865 --> 00:05:34,400
I don't know, like
this new thing

147
00:05:34,400 --> 00:05:36,202
that I don't know what it is,
but let's just figure it out.

148
00:05:36,202 --> 00:05:39,105
At least the data allows you
to create some type of like

149
00:05:39,105 --> 00:05:40,406
order in what you do,

150
00:05:40,406 --> 00:05:41,641
understanding what you have

151
00:05:41,641 --> 00:05:44,377
because you wanna make
sure that it's not just,

152
00:05:45,745 --> 00:05:48,381
I would say like,
normal just to say

153
00:05:48,381 --> 00:05:50,082
let's enable
process monitoring,

154
00:05:50,083 --> 00:05:52,151
but then at the end you
don't understand exactly

155
00:05:52,151 --> 00:05:54,087
if the data's complete
and things like that.

156
00:05:54,087 --> 00:05:55,854
So you gotta be more
efficient into what you do

157
00:05:55,855 --> 00:05:57,557
and what you enable
in your environment.

158
00:05:57,557 --> 00:05:58,758
And at the same time

159
00:05:58,758 --> 00:06:00,193
understand if you
have the right people,

160
00:06:00,193 --> 00:06:02,061
right, the right skills, right?

161
00:06:02,061 --> 00:06:05,998
Because as we were hearing
also Rob Lee talking about

162
00:06:05,998 --> 00:06:08,134
that if you even apply
a machine learning model

163
00:06:08,134 --> 00:06:09,602
and things like that,

164
00:06:09,602 --> 00:06:11,371
if you don't have the right
people that can understand that,

165
00:06:11,371 --> 00:06:13,673
the results and how you
can even make it better

166
00:06:13,673 --> 00:06:14,974
with the data that you have,

167
00:06:14,974 --> 00:06:17,043
then you might not
be as successful

168
00:06:17,043 --> 00:06:18,778
with that specific model.

169
00:06:18,778 --> 00:06:21,214
And efficacy of
course in this part,

170
00:06:21,214 --> 00:06:23,082
is where we start
thinking more like,

171
00:06:23,082 --> 00:06:25,017
if you measure your
hunting engagements

172
00:06:25,017 --> 00:06:27,253
by just how many
incidents you uncover,

173
00:06:27,253 --> 00:06:29,188
let me tell you, you're
gonna definitely have

174
00:06:29,188 --> 00:06:33,058
a really bad time telling
your senior leadership

175
00:06:33,059 --> 00:06:34,227
what you do everyday

176
00:06:34,227 --> 00:06:36,061
and how that's affecting
your organization.

177
00:06:36,062 --> 00:06:37,930
'Cause you might
find something today,

178
00:06:37,930 --> 00:06:40,032
but probably in three
months you don't find anything.

179
00:06:40,032 --> 00:06:41,366
And then it turns
back into like,

180
00:06:41,367 --> 00:06:43,669
hey you're actually
not doing anything,

181
00:06:43,669 --> 00:06:45,138
from a hunting perspective.

182
00:06:45,138 --> 00:06:46,938
So that's where you
show it more from,

183
00:06:46,939 --> 00:06:48,040
hey I'm being efficient,

184
00:06:48,040 --> 00:06:50,009
I'm actually validating
the detection

185
00:06:50,009 --> 00:06:52,011
of this technique
rather than just saying

186
00:06:52,011 --> 00:06:55,448
I'm gonna uncover an incident
today and things like that.

187
00:06:55,448 --> 00:06:57,215
- [Devon] So where do we start?

188
00:06:57,216 --> 00:06:58,718
This is the question
we often get

189
00:06:58,718 --> 00:07:01,320
and we're gonna give
you a couple of ideas

190
00:07:01,320 --> 00:07:04,090
for places that we
think are reasonable.

191
00:07:04,090 --> 00:07:07,460
You know obviously if
we just kind of run off

192
00:07:07,460 --> 00:07:08,661
and randomly do this,

193
00:07:08,661 --> 00:07:10,196
we're gonna wind
up getting nowhere.

194
00:07:10,196 --> 00:07:12,865
So our suggestion is to be
a little bit more targeted

195
00:07:12,865 --> 00:07:17,270
and to think about the type of
attack model that's helpful.

196
00:07:18,571 --> 00:07:20,072
This really started,

197
00:07:20,072 --> 00:07:22,742
I think with some of the
research that Roberto did.

198
00:07:22,742 --> 00:07:24,143
So How Hot is Your Hunt Team,

199
00:07:24,143 --> 00:07:26,179
I think is one of the
first blog posts I read

200
00:07:26,179 --> 00:07:29,147
that got me really excited
about threat hunting again.

201
00:07:29,148 --> 00:07:31,384
I felt like it kind of
had fallen off a cliff

202
00:07:31,384 --> 00:07:33,519
so to speak, there wasn't
a lotta innovation.

203
00:07:33,519 --> 00:07:37,323
And then as soon as this
was conceptualized for me,

204
00:07:37,323 --> 00:07:38,724
actually I think
that's when Roberto

205
00:07:38,724 --> 00:07:41,294
and I started collaborating
a lot more often.

206
00:07:41,294 --> 00:07:42,527
And then much more recently,

207
00:07:42,528 --> 00:07:44,597
Ready to hunt?
Show me your data.

208
00:07:44,597 --> 00:07:46,698
Basically assessing data quality

209
00:07:46,699 --> 00:07:48,000
as an input in this process

210
00:07:48,000 --> 00:07:50,102
which I think is
really foundational.

211
00:07:50,102 --> 00:07:52,003
- And the idea with
these two blog posts,

212
00:07:52,004 --> 00:07:53,506
'cause I wanna make sure
that everybody understands

213
00:07:53,506 --> 00:07:55,707
the value that I was
trying to share with this.

214
00:07:55,708 --> 00:07:57,477
Like the first one
with the heat map,

215
00:07:57,477 --> 00:08:00,012
was just the idea
that if you wanna show

216
00:08:00,012 --> 00:08:03,583
some type of either coverage
or how good you are,

217
00:08:03,583 --> 00:08:06,319
or how you can pretty
much track your progress,

218
00:08:06,319 --> 00:08:09,522
visualization goes a long way.

219
00:08:09,522 --> 00:08:11,491
I actually talked
to senior leadership

220
00:08:11,491 --> 00:08:12,758
when they were
trying to buy a tool

221
00:08:12,758 --> 00:08:13,993
and they explain to you

222
00:08:13,993 --> 00:08:16,596
how you can tell them
how effective you can be

223
00:08:16,596 --> 00:08:18,064
with that tool and
things like that.

224
00:08:18,064 --> 00:08:20,333
If you start going through
like five, six slides,

225
00:08:20,333 --> 00:08:21,801
they just start
looking at their phones

226
00:08:21,801 --> 00:08:23,034
and they don't even care,

227
00:08:23,035 --> 00:08:24,971
but if you're at least
showing a visualization,

228
00:08:24,971 --> 00:08:28,040
then the conversation kinda
like starts going through

229
00:08:28,040 --> 00:08:30,176
with your nice heat map
and you can talk about it

230
00:08:30,176 --> 00:08:33,412
a little bit where
your weaknesses are
and things like that.
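
[Editor's sketch] The heat map idea described here could look something like the following. This is a minimal illustration, not the code from the blog post: the 0-to-3 scoring scale, the `coverage` dictionary, and the scores themselves are assumptions made up for the example.

```python
# Hypothetical coverage scores per technique (assumed scale, not from the talk):
# 0 = no data, 1 = data collected, 2 = analytic exists, 3 = detection validated.
coverage = {
    "Execution": {"PowerShell": 3, "Scheduled Task": 2},
    "Lateral Movement": {"Remote Desktop Protocol": 1, "Pass the Hash": 0},
    "Credential Access": {"Credential Dumping": 2},
}

# One ASCII shade per score so the map prints in any terminal.
SHADES = [".", "-", "+", "#"]


def render_heat_map(scores):
    """Return one text row per technique, shaded by its coverage score."""
    rows = []
    for tactic, techniques in scores.items():
        for name, score in techniques.items():
            rows.append(f"{tactic:<18}{name:<26}{SHADES[score] * 4} ({score})")
    return "\n".join(rows)


def weakest(scores, threshold=1):
    """List (tactic, technique) pairs scoring at or below the threshold."""
    return sorted(
        (tactic, name)
        for tactic, techniques in scores.items()
        for name, score in techniques.items()
        if score <= threshold
    )


print(render_heat_map(coverage))
print("Weak spots:", weakest(coverage))
```

The `weakest` list is the part that supports the conversation with leadership: it names the gaps rather than only coloring them.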

231
00:08:33,412 --> 00:08:35,615
That's a transparency that
you gotta start going forward.

232
00:08:35,615 --> 00:08:37,149
But that was just an idea.

233
00:08:37,149 --> 00:08:40,720
That was a how you can start
at least showing something.

234
00:08:40,720 --> 00:08:43,322
Then how you, you know,

235
00:08:43,322 --> 00:08:45,224
build on the top of
it and how you use it,

236
00:08:45,224 --> 00:08:46,526
that depends on
your organization

237
00:08:46,526 --> 00:08:48,261
and exactly what you
wanna show with it.

238
00:08:48,261 --> 00:08:49,629
And the second one was of course

239
00:08:49,629 --> 00:08:51,531
just focusing more into, hey,

240
00:08:51,531 --> 00:08:53,498
it's not as easy as just saying,

241
00:08:53,499 --> 00:08:54,767
I'm gonna hunt today.

242
00:08:54,767 --> 00:08:56,435
Let's go on a hunt
for this weekend.

243
00:08:56,435 --> 00:08:58,838
No, first show me the data
that you wanna hunt with.

244
00:08:58,838 --> 00:09:00,640
I think that also
was mentioned before,

245
00:09:00,640 --> 00:09:03,809
that that would avoid
probably spending 30 days

246
00:09:03,809 --> 00:09:05,310
of a hunting engagement
that you might not even have

247
00:09:05,311 --> 00:09:06,879
the data to work with.

248
00:09:06,879 --> 00:09:09,115
And how you gonna start
measuring that data,

249
00:09:09,115 --> 00:09:10,683
like the quality of your data.

250
00:09:10,683 --> 00:09:12,051
So there's where we start.

251
00:09:12,051 --> 00:09:14,453
So what are you potentially
measuring already?

252
00:09:14,453 --> 00:09:15,688
This is also very interesting,

253
00:09:15,688 --> 00:09:17,957
because you might not
even think about it,

254
00:09:17,957 --> 00:09:19,325
if you're not doing
some type of numbers

255
00:09:19,325 --> 00:09:20,593
around your hunting engagement,

256
00:09:20,593 --> 00:09:23,429
but the conversations
that I've had before,

257
00:09:23,429 --> 00:09:25,097
are hey can we, uh,

258
00:09:25,097 --> 00:09:27,934
if this happens next
week, in this environment,

259
00:09:27,934 --> 00:09:29,367
can we detect it?

260
00:09:29,368 --> 00:09:31,637
And you start thinking, maybe?

261
00:09:31,637 --> 00:09:34,473
I mean like probability
could be like 20%, 30%.

262
00:09:34,473 --> 00:09:35,974
That to me sounds
a little bit like

263
00:09:35,975 --> 00:09:37,743
when you want to forecast
the weather and you say,

264
00:09:37,743 --> 00:09:41,948
eh give this a probability of
50% it might rain next week.

265
00:09:41,948 --> 00:09:44,150
That actually, it's a thing.

266
00:09:44,150 --> 00:09:47,620
And it's actually
was explained by,

267
00:09:47,620 --> 00:09:49,789
I can't see that
well, Ryan McGeehan,

268
00:09:50,957 --> 00:09:52,825
who goes by
Magoo on Twitter.

269
00:09:52,825 --> 00:09:54,694
And he talks about
risk forecasting.

270
00:09:54,694 --> 00:09:57,630
And this is great, because
now I can start thinking

271
00:09:57,630 --> 00:10:01,434
how I can provide some type
of numbers with what I do.

272
00:10:01,434 --> 00:10:02,668
And in this case,

273
00:10:02,668 --> 00:10:04,937
he's showing how you
can choose a scenario,

274
00:10:04,937 --> 00:10:06,205
then you decompose a scenario,

275
00:10:06,205 --> 00:10:07,673
then define threats.

276
00:10:07,673 --> 00:10:09,942
Then you start pretty much
gathering supporting data

277
00:10:09,942 --> 00:10:11,344
to reduce the bias.

278
00:10:11,344 --> 00:10:12,612
And then you can forecast

279
00:10:12,612 --> 00:10:14,213
and say hey this
is the probability.

280
00:10:14,213 --> 00:10:16,616
I start with a
baseline of like 20%

281
00:10:16,616 --> 00:10:19,752
and then, as I go and
execute the controls,

282
00:10:19,752 --> 00:10:21,920
not necessarily
hunting engagements,

283
00:10:21,921 --> 00:10:24,156
but you can start feeling
like, you know what,

284
00:10:24,156 --> 00:10:28,493
yeah 10%, 15%, and at the
end you can see for example

285
00:10:28,494 --> 00:10:31,397
you can increase confidence;
again, it's a 15% difference

286
00:10:31,397 --> 00:10:34,300
than when you start, where
you're doing your forecasting.
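
[Editor's sketch] One way to keep track of a forecast like this is to record the baseline and each later revision, then report the change. The `Forecast` class, the scenario wording, and the 20% to 35% figures below are illustrative assumptions that loosely echo the numbers mentioned in the talk; this is not Ryan McGeehan's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Forecast:
    """A detection-probability forecast for one scenario (hypothetical structure)."""
    scenario: str
    baseline: float  # initial estimate, between 0 and 1
    revisions: list = field(default_factory=list)

    def revise(self, reason, probability):
        """Record a new estimate together with the evidence that motivated it."""
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        self.revisions.append((reason, probability))

    @property
    def current(self):
        """Latest estimate, or the baseline if nothing has been revised yet."""
        return self.revisions[-1][1] if self.revisions else self.baseline

    def change_in_points(self):
        """Difference between the current estimate and the baseline, in percentage points."""
        return round((self.current - self.baseline) * 100, 1)


forecast = Forecast("We detect this scenario if it happens next week", baseline=0.20)
forecast.revise("validated detection for one technique", 0.30)
forecast.revise("added a missing data source", 0.35)
print(f"{forecast.current:.0%} now, {forecast.change_in_points():+} points vs. baseline")
```

Keeping the reason next to each revision is a design choice: it ties every change in confidence to supporting data, which is the step meant to reduce bias.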

287
00:10:34,300 --> 00:10:36,267
Very interesting because

288
00:10:36,268 --> 00:10:38,337
this would definitely
start mapping to

289
00:10:38,337 --> 00:10:40,271
concepts that you
might be familiar with,

290
00:10:40,272 --> 00:10:42,575
which is just basic threat
modeling, for example.

291
00:10:42,575 --> 00:10:44,076
You know, where you
model the system,

292
00:10:44,076 --> 00:10:46,178
you pick the system, you model
and identify the threats,

293
00:10:46,178 --> 00:10:48,114
understand how
those threats work,

294
00:10:48,114 --> 00:10:49,882
and then you
address the threats,

295
00:10:49,882 --> 00:10:52,050
validate the detection of this.

296
00:10:52,051 --> 00:10:54,387
You measure again and
you keep going that way.

297
00:10:54,387 --> 00:10:56,188
This is very interesting because

298
00:10:56,188 --> 00:10:58,557
now in these scenarios
for example,

299
00:10:58,557 --> 00:11:01,060
you start identifying
the connections

300
00:11:01,060 --> 00:11:03,529
between this crown
jewel, for example?

301
00:11:03,529 --> 00:11:04,764
How the data flows.

302
00:11:04,764 --> 00:11:06,966
What are the
external connections?

303
00:11:06,966 --> 00:11:09,535
Trust boundaries
around those things.

304
00:11:09,535 --> 00:11:11,904
Very interesting, because
when you start thinking about

305
00:11:11,904 --> 00:11:14,439
what can I measure
from this perspective,

306
00:11:15,608 --> 00:11:16,842
there's a lot of data
that gets generated

307
00:11:16,842 --> 00:11:19,345
across this model for example.

308
00:11:19,345 --> 00:11:22,348
Okay, so where do you
fit hunt with this?

309
00:11:22,348 --> 00:11:24,684
And I think that this is
also an interesting question

310
00:11:24,684 --> 00:11:28,254
because threat hunting
we all see it as this

311
00:11:28,254 --> 00:11:30,489
nice program where you
have different steps,

312
00:11:30,489 --> 00:11:31,924
identify a technique,
hypothesis,

313
00:11:31,924 --> 00:11:33,759
but at the end of the day,

314
00:11:33,759 --> 00:11:35,294
if you don't understand

315
00:11:35,294 --> 00:11:38,864
like exactly how you can map
that to a business use case

316
00:11:38,864 --> 00:11:41,567
or to how your
systems actually work,

317
00:11:41,567 --> 00:11:44,904
then to me it's not actually
that valuable, okay?

318
00:11:44,904 --> 00:11:46,572
So when you start now going back

319
00:11:46,572 --> 00:11:48,974
to the other frameworks
that I was talking about,

320
00:11:48,974 --> 00:11:50,943
like threat modeling
or risk forecasting,

321
00:11:50,943 --> 00:11:53,846
there are some similarities
going on in here, for example.

322
00:11:53,846 --> 00:11:56,415
Some of you might agree, some
of you might disagree that

323
00:11:56,415 --> 00:11:57,983
for example threat hunting to me

324
00:11:57,983 --> 00:12:00,886
could be also part of when
you try to mitigate risk,

325
00:12:00,886 --> 00:12:04,323
when you wanna reduce
the risk of the attacker

326
00:12:04,323 --> 00:12:06,257
achieving an objective.

327
00:12:06,258 --> 00:12:07,526
Threat hunting could
be part of that,

328
00:12:07,526 --> 00:12:09,228
but at the same time you
could start identifying

329
00:12:09,228 --> 00:12:11,796
that some steps
might be similar,

330
00:12:11,797 --> 00:12:13,733
and how you can
start then mapping them

331
00:12:13,733 --> 00:12:15,800
to how your organization works,

332
00:12:15,801 --> 00:12:17,203
your systems and
your crown jewels.

333
00:12:17,203 --> 00:12:19,738
In this case, if you
model what happens

334
00:12:19,739 --> 00:12:21,674
around your most
important systems,

335
00:12:21,674 --> 00:12:24,043
you might probably
then start identifying,

336
00:12:24,043 --> 00:12:26,078
for example MITRE
ATT&CK techniques

337
00:12:26,078 --> 00:12:29,515
that would fit into
exactly what the,

338
00:12:29,515 --> 00:12:30,750
let me go back in here,

339
00:12:30,750 --> 00:12:32,451
exactly where the
data flows go.

340
00:12:32,451 --> 00:12:34,019
I focused on some,

341
00:12:34,019 --> 00:12:36,021
for example lateral
movement between systems

342
00:12:36,021 --> 00:12:38,690
that might get to this
crown jewel that you have.

343
00:12:38,691 --> 00:12:40,092
You give it a meaning.

344
00:12:40,092 --> 00:12:42,795
It's not anymore
this isolated process

345
00:12:42,795 --> 00:12:45,531
that you want to just
hunt for the weekend.

346
00:12:45,531 --> 00:12:47,900
You actually have some
type of connection

347
00:12:47,900 --> 00:12:51,337
across what you're doing in
your organization most likely.

348
00:12:51,337 --> 00:12:53,305
So what can we measure
from the hunt,

349
00:12:53,305 --> 00:12:56,040
or in this case I like when
Devon actually crossed hunt,

350
00:12:56,041 --> 00:12:57,910
and say it's just
basically detection.

351
00:12:57,910 --> 00:13:00,679
Then when you start
understanding that

352
00:13:00,679 --> 00:13:02,180
if you're modeling your systems

353
00:13:02,181 --> 00:13:04,617
and you're hunting
across that model,

354
00:13:04,617 --> 00:13:06,385
then I wanna understand
what happened

355
00:13:06,385 --> 00:13:09,521
across my trust boundaries,
how my data flows.

356
00:13:09,522 --> 00:13:10,990
Then I start saying,

357
00:13:10,990 --> 00:13:13,192
you know what, to me it's
very important to know

358
00:13:13,192 --> 00:13:16,362
what percentage of data
I have for detection

359
00:13:16,362 --> 00:13:21,367
or for any type of preventative
measure in my model.
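
[Editor's sketch] The "what percentage of data do I have" question can be answered with very little code. The mapping of techniques to data sources and the set of collected sources below are hypothetical values chosen for illustration; in practice they would come from your own threat model and telemetry inventory.

```python
# Hypothetical: data sources that could reveal each technique in the model.
required = {
    "Remote Desktop Protocol": {"authentication logs", "netflow"},
    "Pass the Hash": {"authentication logs"},
    "PowerShell": {"powershell logs"},
    "Scheduled Task": {"process monitoring", "file monitoring"},
}

# Hypothetical: data sources actually collected in the environment.
collected = {"process monitoring", "authentication logs"}


def coverage_percent(required, collected):
    """Percentage of techniques with at least one collected data source."""
    if not required:
        return 0.0
    covered = [t for t, sources in required.items() if sources & collected]
    return 100.0 * len(covered) / len(required)


def techniques_per_source(required, collected):
    """Rank collected sources by how many techniques each one helps cover."""
    counts = {source: 0 for source in collected}
    for sources in required.values():
        for source in sources & collected:
            counts[source] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(f"Coverage: {coverage_percent(required, collected):.0f}%")
print("Most useful sources:", techniques_per_source(required, collected))
```

Note that "at least one source" is a generous definition of coverage; a stricter version could require every listed source, or weight sources by data quality.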

360
00:13:22,468 --> 00:13:24,036
What tools are
helping me the most?

361
00:13:24,036 --> 00:13:27,305
What data's helping me the
most across all my data sets,

362
00:13:27,306 --> 00:13:29,975
because we also know that
even though we say that

363
00:13:29,975 --> 00:13:31,811
it's cheaper now to store data,

364
00:13:31,811 --> 00:13:34,747
when we start going
with solutions that

365
00:13:34,747 --> 00:13:36,048
charge a lotta money for data,

366
00:13:36,048 --> 00:13:38,150
you have to understand
what data is,

367
00:13:38,150 --> 00:13:41,220
what is the most value
in your organization.

368
00:13:41,220 --> 00:13:42,855
How much you can
cover with that data

369
00:13:42,855 --> 00:13:44,089
and things like that.

370
00:13:44,089 --> 00:13:45,457
And at the end you're
gonna start asking,

371
00:13:45,457 --> 00:13:46,625
I'm sorry, answering questions,

372
00:13:46,625 --> 00:13:48,994
are how are we reducing
the probability

373
00:13:48,994 --> 00:13:51,163
of an attacker to
achieve an objective.

374
00:13:51,163 --> 00:13:53,833
You can provide data,

375
00:13:53,833 --> 00:13:57,670
numbers and metrics to
answer those questions.

376
00:13:57,670 --> 00:14:00,104
- So this kinda all
brings us to ATT&CK.

377
00:14:00,105 --> 00:14:02,308
You know Roberto
alluded to it before.

378
00:14:03,576 --> 00:14:05,144
You know, Enterprise
ATT&CK is really

379
00:14:05,144 --> 00:14:07,947
the practitioners
knowledge base.

380
00:14:07,947 --> 00:14:09,748
I'm gonna try and
trademark that.

381
00:14:09,748 --> 00:14:12,484
But I think MITRE
has got a pretty good

382
00:14:12,484 --> 00:14:14,053
description for what it is.

383
00:14:14,053 --> 00:14:17,923
My description is really
the most complete source

384
00:14:17,923 --> 00:14:20,826
of knowledge about
adversary tactics

385
00:14:20,826 --> 00:14:23,429
loosely categorized
into 11 different types

386
00:14:23,429 --> 00:14:27,499
of adversary objectives
with groups, software.

387
00:14:28,868 --> 00:14:32,037
And again if you haven't
expressed your love for ATT&CK

388
00:14:32,037 --> 00:14:34,506
and you do love ATT&CK,
the MITRE ATT&CK team

389
00:14:34,506 --> 00:14:37,810
is here in the back row,
second from the rear.

390
00:14:37,810 --> 00:14:41,213
We're big fans and
there's a lot to like.

391
00:14:41,213 --> 00:14:42,781
And particularly the fact that

392
00:14:42,781 --> 00:14:45,117
all of these techniques
are cross-referenced.

393
00:14:45,117 --> 00:14:47,719
They all contain metadata
that's useful to analysts,

394
00:14:47,720 --> 00:14:49,321
who don't necessarily understand

395
00:14:49,321 --> 00:14:51,423
what these techniques
look like in the wild.

396
00:14:51,423 --> 00:14:54,126
And even better, they
contain reference material,

397
00:14:54,126 --> 00:14:56,694
so links to published
documentation,

398
00:14:56,695 --> 00:15:00,699
whether that's reports,
blog posts, GitHub repos.

399
00:15:00,699 --> 00:15:03,401
It is a tremendous
asset to teams

400
00:15:03,402 --> 00:15:06,105
especially if you have a
knowledge curation problem.

401
00:15:06,105 --> 00:15:09,375
This is a place where
knowledge is curated.

402
00:15:09,375 --> 00:15:11,610
So this is just some statistics

403
00:15:11,610 --> 00:15:14,480
as of the most recent
update to ATT&CK's content.

404
00:15:15,347 --> 00:15:16,949
Again, we broke it down by OS

405
00:15:16,949 --> 00:15:19,285
and we'll show some
further statistics

406
00:15:19,285 --> 00:15:20,519
from our analysis of ATT&CK

407
00:15:20,519 --> 00:15:22,221
that we think you
could weaponize.

408
00:15:23,622 --> 00:15:25,357
- [Roberto] So, how can
you measure against ATT&CK?

409
00:15:25,357 --> 00:15:26,824
I think that, also, right,

410
00:15:26,825 --> 00:15:30,829
'cause we see ATT&CK going
this whole conference,

411
00:15:30,829 --> 00:15:32,330
which is great, right,

412
00:15:32,331 --> 00:15:34,400
so how can you start pretty
much getting your strategy

413
00:15:34,400 --> 00:15:36,802
of what you have
in your environment

414
00:15:36,802 --> 00:15:38,671
and just mapping
it back to ATT&CK?

415
00:15:38,671 --> 00:15:40,306
I think it's important
that you understand

416
00:15:40,306 --> 00:15:42,775
what ATT&CK actually
gives to you,

417
00:15:42,775 --> 00:15:44,810
besides just the technique name,

418
00:15:44,810 --> 00:15:47,378
besides just the tactic
name, and the ID,

419
00:15:47,379 --> 00:15:48,881
there is actually some data

420
00:15:48,881 --> 00:15:51,583
that you can start using
to probably map it to

421
00:15:51,583 --> 00:15:54,153
what I was referring to earlier
as your threat model,

422
00:15:54,153 --> 00:15:56,522
for example, where
you have in this case

423
00:15:56,522 --> 00:15:58,457
what are the permissions
an attacker needs

424
00:15:58,457 --> 00:15:59,725
in order to accomplish

425
00:15:59,725 --> 00:16:01,460
in a specific
technique for example?

426
00:16:01,460 --> 00:16:03,729
And you can start
mapping that to your,

427
00:16:03,729 --> 00:16:05,364
for example, trust boundaries.

428
00:16:05,364 --> 00:16:08,400
The users that have access
to that specific system,

429
00:16:08,400 --> 00:16:11,536
like, how does that look like
from an attack perspective?

430
00:16:11,537 --> 00:16:12,805
What are the techniques

431
00:16:12,805 --> 00:16:14,272
that probably you're
going to start using

432
00:16:14,273 --> 00:16:17,743
from a user being an admin
as well, for example?

433
00:16:17,743 --> 00:16:21,513
Later, what is the permissions
the attacker will obtain,

434
00:16:21,513 --> 00:16:23,983
in this case for
example, SYSTEM.

435
00:16:23,983 --> 00:16:27,085
What are the consequences
of this happening?

436
00:16:27,086 --> 00:16:29,722
And that's when you start
mapping what you're doing, okay?

437
00:16:29,722 --> 00:16:33,891
If an adversary becomes
SYSTEM here or an Admin here,

438
00:16:33,892 --> 00:16:35,361
this is what we can do next,

439
00:16:35,361 --> 00:16:36,795
and, you know, things like that.

440
00:16:36,795 --> 00:16:38,329
The other one that I
love the most, of course,

441
00:16:38,330 --> 00:16:41,500
is the data sources that are
recommended for the detection

442
00:16:41,500 --> 00:16:44,069
and the validation for the
detection of each technique,

443
00:16:44,069 --> 00:16:44,902
for example.

444
00:16:44,903 --> 00:16:46,338
Very, very important

445
00:16:46,338 --> 00:16:50,576
and we had a presentation
yesterday by Hacker Hurricane,

446
00:16:50,576 --> 00:16:52,144
uh, Michael, right?

447
00:16:52,144 --> 00:16:54,213
And then he was talking about
how you are going to start

448
00:16:54,213 --> 00:16:55,848
mapping data sources
to techniques,

449
00:16:55,848 --> 00:16:57,516
and something that
I'm also doing,

450
00:16:57,516 --> 00:16:58,916
which we're gonna
probably hopefully

451
00:16:58,917 --> 00:17:01,320
be collaborating on that piece

452
00:17:01,320 --> 00:17:03,489
and the one that I
like also the most is

453
00:17:03,489 --> 00:17:05,024
something that I didn't know,

454
00:17:05,023 --> 00:17:08,127
it actually, I asked Blake
directly, and I was like,

455
00:17:08,127 --> 00:17:10,329
hey, what is this
supports remote,

456
00:17:10,329 --> 00:17:12,063
I see that in some techniques.

457
00:17:12,064 --> 00:17:15,100
He said, oh that's
actually for execution,

458
00:17:15,099 --> 00:17:16,001
you know, techniques.

459
00:17:16,001 --> 00:17:17,469
You know, yes or no,

460
00:17:17,469 --> 00:17:19,671
which technique can be
used for the remote event

461
00:17:19,671 --> 00:17:20,906
or something,

462
00:17:20,906 --> 00:17:21,973
you can execute
something remotely.

463
00:17:21,973 --> 00:17:23,408
Very interesting,

464
00:17:23,409 --> 00:17:25,577
because when you start mapping
that to your, let's say,

465
00:17:25,577 --> 00:17:28,280
to your model into how
your system operates

466
00:17:28,280 --> 00:17:30,682
and how the data flows
go from system to system,

467
00:17:30,682 --> 00:17:32,684
then you can start
mapping which techniques

468
00:17:32,684 --> 00:17:36,455
will actually happen
across two systems,

469
00:17:36,455 --> 00:17:37,856
for example.

470
00:17:37,856 --> 00:17:39,491
What can you execute remotely
and things like that.

471
00:17:39,491 --> 00:17:40,726
When you start
understanding this,

472
00:17:40,726 --> 00:17:41,960
then you can start then saying,

473
00:17:41,960 --> 00:17:45,497
you know what from
a data perspective,

474
00:17:46,799 --> 00:17:48,333
process monitoring, for example,

475
00:17:48,333 --> 00:17:52,805
covers 70% of the current
219 techniques in ATT&CK,

476
00:17:52,805 --> 00:17:54,039
for example,

477
00:17:54,039 --> 00:17:55,474
and if you go a little
further and you say,

478
00:17:55,474 --> 00:17:57,509
okay, now map techniques
to data sources,

479
00:17:57,509 --> 00:18:00,446
I can map into a tactic but
all the way to a platform,

480
00:18:00,446 --> 00:18:01,679
then you can say,

481
00:18:01,680 --> 00:18:03,949
okay, you know what,
based out of this 70%,

482
00:18:03,949 --> 00:18:07,219
now of course macOS,
Linux, and Windows,

483
00:18:07,219 --> 00:18:10,622
from a data perspective,
Mac, it's kind of like,

484
00:18:13,258 --> 00:18:15,327
it kind of overlaps,
that's the word, sorry.

485
00:18:15,327 --> 00:18:17,463
It kind of overlaps,
that's why if you add,

486
00:18:17,463 --> 00:18:18,629
for example, the
numbers in there,

487
00:18:18,630 --> 00:18:20,666
it's not going to give you 149,

488
00:18:20,666 --> 00:18:24,436
but it actually tells you
from an attack perspective,

489
00:18:24,436 --> 00:18:27,939
from a Windows
side, 60% is still,

490
00:18:27,940 --> 00:18:30,175
process monitoring is
going to help you across

491
00:18:30,175 --> 00:18:31,443
all these techniques.
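(A minimal sketch of the percentage being described, not taken from the slides. The four techniques and their data sources below are made-up sample data, and `coverage` is a hypothetical helper. It counts how many techniques list a given data source, overall or for a single platform.)
```python
# Toy data: technique -> (platforms, recommended data sources).
# Illustrative sample only, NOT real ATT&CK content or counts.
TECHNIQUES = {
    "PowerShell": ({"Windows"}, {"Process monitoring", "Process command-line parameters"}),
    "Scheduled Task": ({"Windows"}, {"Process monitoring", "File monitoring"}),
    "Bash History": ({"Linux", "macOS"}, {"File monitoring"}),
    "Launch Agent": ({"macOS"}, {"File monitoring", "Process monitoring"}),
}
def coverage(data_source, platform=None):
    """Share of techniques (optionally on one platform) that list data_source."""
    in_scope = [ds for plats, ds in TECHNIQUES.values() if platform is None or platform in plats]
    if not in_scope:
        return 0.0
    return sum(data_source in ds for ds in in_scope) / len(in_scope)
print(f"all platforms: {coverage('Process monitoring'):.0%}")
print(f"Windows only:  {coverage('Process monitoring', 'Windows'):.0%}")
```
(Because one technique can apply to several platforms, the per-platform counts overlap and do not add up to the overall total.)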

492
00:18:31,443 --> 00:18:32,845
If you go a little
bit further then,

493
00:18:32,845 --> 00:18:35,013
and you start getting
information from

494
00:18:35,013 --> 00:18:36,581
an adversarial perspective,

495
00:18:36,582 --> 00:18:39,885
and you can map it from
adversary uses this software,

496
00:18:39,885 --> 00:18:41,086
software is used
for this technique,

497
00:18:41,086 --> 00:18:42,554
and things like that,

498
00:18:42,554 --> 00:18:44,523
then you can start getting
some numbers such as,

499
00:18:44,523 --> 00:18:46,692
based on all this information,

500
00:18:46,692 --> 00:18:49,394
I can tell that 65% of
the groups identified

501
00:18:49,394 --> 00:18:50,629
in MITRE ATT&CK,

502
00:18:50,629 --> 00:18:53,665
they like a shell,
or in this case,

503
00:18:53,665 --> 00:18:57,101
would be a reverse shell,
using a standard protocol.

504
00:18:57,102 --> 00:18:58,670
For example, application
layer protocol,

505
00:18:58,670 --> 00:19:00,205
such as DNS and stuff like that.

506
00:19:00,205 --> 00:19:01,673
So, you can get information

507
00:19:01,673 --> 00:19:04,042
and then you can
pretty much map to

508
00:19:04,042 --> 00:19:05,444
what you're trying
to accomplish with

509
00:19:05,444 --> 00:19:07,412
your hunting engagement.

510
00:19:07,412 --> 00:19:09,515
And, in this specific scenario,

511
00:19:09,515 --> 00:19:11,150
from an adversarial perspective,

512
00:19:11,150 --> 00:19:14,353
Katie Nickels has done a
great job in talking about

513
00:19:14,353 --> 00:19:15,787
all those different
presentations,

514
00:19:15,787 --> 00:19:17,990
MITRE has been pushing
that a lot, which is great,

515
00:19:17,990 --> 00:19:21,360
and those are two presentations
that you can actually see

516
00:19:21,360 --> 00:19:23,328
and you can get
information such as

517
00:19:23,328 --> 00:19:26,732
challenges you face when
you try to, you know,

518
00:19:26,732 --> 00:19:29,168
use ATT&CK for cyber
threat intelligence

519
00:19:29,168 --> 00:19:31,602
and things like that,
so pretty useful.

520
00:19:31,603 --> 00:19:35,207
So, now that we understand
how we can use ATT&CK

521
00:19:35,207 --> 00:19:38,810
and start mapping it to
a detection strategy,

522
00:19:38,810 --> 00:19:40,979
then let's go back
to the data sources.

523
00:19:40,979 --> 00:19:42,214
In a data source piece,

524
00:19:42,214 --> 00:19:43,415
it's when you start
understanding that

525
00:19:43,415 --> 00:19:44,683
they give you a lot of data,

526
00:19:44,683 --> 00:19:47,019
and this is just for reference,

527
00:19:47,019 --> 00:19:48,253
'cause you can go
back to the slides

528
00:19:48,253 --> 00:19:49,487
and you can see all
the data sources

529
00:19:49,488 --> 00:19:52,724
that MITRE maps
their techniques.

530
00:19:52,724 --> 00:19:54,659
You can then just pick,
let's say these two.

531
00:19:54,660 --> 00:19:57,763
Process command-line parameters
and process monitoring.

532
00:19:57,763 --> 00:20:00,765
To me, that pretty
much maps to a process.

533
00:20:00,766 --> 00:20:02,301
What a process is, right?

534
00:20:02,301 --> 00:20:04,903
And I define what I
need from that process.

535
00:20:04,903 --> 00:20:07,005
When you start
doing this exercise,

536
00:20:07,005 --> 00:20:09,208
you're technically
modeling your data,

537
00:20:09,208 --> 00:20:12,044
because you're defining the
structure of your data,

538
00:20:12,044 --> 00:20:13,645
and you're finding relationships

539
00:20:13,645 --> 00:20:16,415
across what you're
identifying with ATT&CK.

540
00:20:16,415 --> 00:20:18,783
MITRE did a great
job also by creating,

541
00:20:18,784 --> 00:20:21,553
starting a model, like
a couple years ago,

542
00:20:21,553 --> 00:20:24,656
with the first versions
of Sysmon, and there,

543
00:20:24,656 --> 00:20:28,527
we're using the Cyber
Observable Objects from STIX 2

544
00:20:28,527 --> 00:20:32,598
to define what pretty
much the data objects are.

545
00:20:32,598 --> 00:20:36,501
In this case, I'm defining my
own too in my project OSSEM,

546
00:20:36,501 --> 00:20:41,306
when I say an IP, it's supposed
to have this information.

547
00:20:41,306 --> 00:20:44,743
A process is supposed to have
this, and then a file as well.

548
00:20:44,743 --> 00:20:46,445
When I started doing this,

549
00:20:46,445 --> 00:20:48,614
I find that there
are relationships
across these objects

550
00:20:48,614 --> 00:20:50,649
because a process
creates a file,

551
00:20:50,649 --> 00:20:52,417
a process creates
another process,

552
00:20:52,417 --> 00:20:54,519
and a process can
connect to an IP.

553
00:20:54,519 --> 00:20:56,989
When I started doing
that, then I understand,

554
00:20:56,989 --> 00:20:59,458
I start building relationships,

555
00:20:59,458 --> 00:21:01,526
source destination,
my relationship,

556
00:21:01,526 --> 00:21:03,028
like a basic probably graphing,

557
00:21:03,028 --> 00:21:05,364
but you're defining the model,

558
00:21:05,364 --> 00:21:09,334
so in this case, if we go back
to process use of network,

559
00:21:09,334 --> 00:21:11,737
then I can tell you
that's basically

560
00:21:11,737 --> 00:21:15,073
a process connecting to an
IP to a host name to a URL.

561
00:21:15,974 --> 00:21:18,510
Then I can say, you know what,

562
00:21:18,510 --> 00:21:19,878
the information that
I need from a process,

563
00:21:19,878 --> 00:21:21,612
the information I
need from an IP,

564
00:21:21,613 --> 00:21:23,515
will give me that information,

565
00:21:23,515 --> 00:21:27,352
so I can start saying
Sysmon can give me process,

566
00:21:27,352 --> 00:21:29,788
at the same time, Windows Security
event logs can give me

567
00:21:29,788 --> 00:21:31,156
a process information,

568
00:21:31,156 --> 00:21:34,726
it's on the, that is
being done already.

569
00:21:34,726 --> 00:21:36,595
But, at the same time, you
have to understand that

570
00:21:36,595 --> 00:21:39,431
when you start looking into
the data that you need,

571
00:21:39,431 --> 00:21:41,133
versus what you have,

572
00:21:41,133 --> 00:21:44,770
you're gonna start measuring
that you can tell that

573
00:21:44,770 --> 00:21:49,741
4688 is missing some fields
that might be valuable to you

574
00:21:50,509 --> 00:21:51,777
that Sysmon provides.

575
00:21:51,777 --> 00:21:53,945
Now, this is not the only thing.

576
00:21:53,945 --> 00:21:58,917
If you look into this
specific relationship,

577
00:22:00,385 --> 00:22:02,220
we're just talking about it
from a process perspective.

578
00:22:02,220 --> 00:22:04,890
Sysmon event ID
one, and then 4688

579
00:22:04,890 --> 00:22:09,895
but our relationship was a
process connecting to an IP,

580
00:22:10,495 --> 00:22:11,963
so for that,

581
00:22:11,963 --> 00:22:13,899
you will pretty much need
Sysmon event ID three,

582
00:22:13,899 --> 00:22:15,132
for example.

583
00:22:15,133 --> 00:22:17,903
So, that changes how
your data connects to

584
00:22:17,903 --> 00:22:20,939
what MITRE pretty much asks
you to, or not ask you,

585
00:22:20,939 --> 00:22:25,077
but request, no, recommend
to use as a data set

586
00:22:25,077 --> 00:22:26,812
for the detection
of that technique.

587
00:22:26,812 --> 00:22:28,080
So, that's when you start saying

588
00:22:28,080 --> 00:22:30,482
I shouldn't just be
mapping out process,

589
00:22:30,482 --> 00:22:34,653
data set that MITRE recommends
to Sysmon Event ID One,

590
00:22:34,653 --> 00:22:37,756
because then if you require
process use of network,

591
00:22:37,756 --> 00:22:40,759
you might not get it
with 4688 by itself.

592
00:22:40,759 --> 00:22:42,094
That's when you
start pretty much

593
00:22:42,094 --> 00:22:44,262
putting numbers
into what you have

594
00:22:44,262 --> 00:22:46,231
in order to go
and start hunting.

595
00:22:46,231 --> 00:22:49,368
In this case, I was able to
create just a basic model

596
00:22:49,368 --> 00:22:51,303
also from what Sysmon provides

597
00:22:51,303 --> 00:22:53,305
so depending on the events,

598
00:22:53,305 --> 00:22:56,441
a process creates a
process creates a file,

599
00:22:56,441 --> 00:22:58,276
creates a registry,
it renames a registry,

600
00:22:58,276 --> 00:22:59,778
it updates a registry,

601
00:22:59,778 --> 00:23:02,481
you have a lot of information
that they provide to you.

602
00:23:02,481 --> 00:23:05,817
That's what you should be doing,
modeling your data sources,

603
00:23:05,817 --> 00:23:08,854
so you can tell what data
sources will help you

604
00:23:08,854 --> 00:23:11,390
for specific techniques,
and not just say,

605
00:23:11,390 --> 00:23:12,891
process is what I have.

606
00:23:12,891 --> 00:23:14,926
No, if a process talks to an IP,

607
00:23:14,926 --> 00:23:17,462
I might need a
different data source.
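(A rough sketch of the modeling idea, assuming a small lookup table. Sysmon Event IDs 1, 3 and 11 and Security 4688 are the standard events for these actions. The `RELATIONSHIPS` layout and the `usable_relationships` helper are hypothetical and are not the OSSEM schema.)
```python
# Relationship (source object, action, destination object) -> events that record it.
# The table layout is an assumption made for illustration.
RELATIONSHIPS = {
    ("process", "created", "process"): {"Sysmon 1", "Security 4688"},
    ("process", "created", "file"): {"Sysmon 11"},
    ("process", "connected_to", "ip"): {"Sysmon 3"},
}
def usable_relationships(collected):
    """Return the relationships that at least one collected event can show."""
    return [rel for rel, events in RELATIONSHIPS.items() if events & set(collected)]
only_4688 = usable_relationships(["Security 4688"])
with_sysmon = usable_relationships(["Sysmon 1", "Sysmon 3", "Sysmon 11"])
print(len(only_4688), "of", len(RELATIONSHIPS), "relationships with 4688 alone")
print(len(with_sysmon), "of", len(RELATIONSHIPS), "relationships with Sysmon 1, 3, 11")
```
(The count per collected source is the kind of number you can report before a hunt starts.)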

608
00:23:17,462 --> 00:23:18,697
So, in this case,

609
00:23:18,697 --> 00:23:19,965
what we're trying
to accomplish is

610
00:23:19,965 --> 00:23:22,567
that we don't just wanna
do a mapping of a tactic

611
00:23:22,567 --> 00:23:24,069
straight to Sysmon,

612
00:23:24,069 --> 00:23:27,339
we want to start creating
a connection that will

613
00:23:27,339 --> 00:23:30,108
make sense where
metrics can be applied.

614
00:23:30,108 --> 00:23:33,311
I can just start measuring
exactly what I need for Sysmon,

615
00:23:33,311 --> 00:23:35,880
what I have, how it
helps me to create also

616
00:23:35,881 --> 00:23:38,116
analytics based
on the data model,

617
00:23:38,116 --> 00:23:41,520
that to me is more
valuable than just saying,

618
00:23:41,520 --> 00:23:45,824
I bought this tool and
it detects PowerShell.

619
00:23:45,824 --> 00:23:47,225
No, right?

620
00:23:47,225 --> 00:23:49,727
So that gives me extra
context that would allow me to

621
00:23:49,728 --> 00:23:51,363
be a little bit more efficient.

622
00:23:51,363 --> 00:23:53,331
We come back to efficiency.

623
00:23:53,331 --> 00:23:55,000
I don't want to just
detect something.

624
00:23:55,000 --> 00:23:59,471
I want to be efficient
at detecting that
specific technique.

625
00:23:59,471 --> 00:24:01,173
And, of course, remember,

626
00:24:01,173 --> 00:24:05,177
as Matt Graeber and also
Lee Christensen at Black Hat

627
00:24:05,177 --> 00:24:07,344
they were showing how also

628
00:24:07,345 --> 00:24:09,181
an attacker can
influence the data,

629
00:24:09,181 --> 00:24:11,516
and in this case, in
this specific event,

630
00:24:11,516 --> 00:24:12,884
these are the data fields that

631
00:24:12,884 --> 00:24:15,287
have a high attacker
influence rating.

632
00:24:15,287 --> 00:24:17,489
So, when you understand
that as well,

633
00:24:17,489 --> 00:24:19,791
then you can start weighting

634
00:24:19,791 --> 00:24:22,427
when you build your
signatures for example,

635
00:24:22,427 --> 00:24:24,896
if you have to build
a signature at the end

636
00:24:24,896 --> 00:24:26,697
because your hunting
engagement might not be

637
00:24:26,698 --> 00:24:30,669
just needed to happen every day,

638
00:24:30,669 --> 00:24:33,605
you have to be mindful
that you gotta understand

639
00:24:33,605 --> 00:24:35,907
what an attacker can
also influence as well,

640
00:24:35,907 --> 00:24:38,577
and that might affect
your metrics as well.

641
00:24:38,577 --> 00:24:41,246
- [Devon] So this kind of
brings us to the question of,

642
00:24:41,246 --> 00:24:42,514
what can I measure?

643
00:24:42,514 --> 00:24:45,050
Because we talk about
measuring techniques,

644
00:24:45,050 --> 00:24:46,885
do we finally have what we need?

645
00:24:48,253 --> 00:24:50,621
And, you know, data quality
is an important point.

646
00:24:50,622 --> 00:24:54,092
I mean, we can express
availability of
evidence as a ratio,

647
00:24:54,092 --> 00:24:57,562
we can attempt to assess
data just by eyeballing it,

648
00:24:57,562 --> 00:24:59,997
but there actually are
ways for us to measure

649
00:24:59,998 --> 00:25:01,900
the quality of our data.

650
00:25:01,900 --> 00:25:04,269
This is a quote that
basically just gives

651
00:25:04,269 --> 00:25:07,472
an all purpose description
of what data quality is.

652
00:25:07,472 --> 00:25:09,307
The thing to take away is,

653
00:25:09,307 --> 00:25:11,743
it's only of high quality
if it serves the purpose

654
00:25:11,743 --> 00:25:14,279
you need it to work for,

655
00:25:14,279 --> 00:25:16,248
and the DOD, fortunately,

656
00:25:16,248 --> 00:25:20,018
has a standard that gives us
six attributes of data quality,

657
00:25:20,018 --> 00:25:23,355
and each one is represented
as a ratio or a percentage,

658
00:25:23,355 --> 00:25:24,856
which, again,

659
00:25:24,856 --> 00:25:26,892
fits right into our plan
of being able to quantify

660
00:25:26,892 --> 00:25:28,827
every aspect of this.

661
00:25:28,827 --> 00:25:32,497
We're gonna focus on
three of these aspects

662
00:25:32,497 --> 00:25:36,368
that we think are most
relevant to our purposes,

663
00:25:36,368 --> 00:25:38,970
just be aware those other
attributes do still exist

664
00:25:38,970 --> 00:25:40,472
and can still be measured.

665
00:25:40,472 --> 00:25:43,775
- And the reason why we also
decided to pick these three,

666
00:25:43,775 --> 00:25:46,678
is because I heard this a lot,

667
00:25:46,678 --> 00:25:49,514
why do I have to focus
on doing some like

668
00:25:49,514 --> 00:25:51,081
data governance and management.

669
00:25:51,082 --> 00:25:52,817
I'm a threat hunter,

670
00:25:52,817 --> 00:25:54,886
what am I supposed to
be doing that stuff?

671
00:25:54,886 --> 00:25:57,722
Well, you as a hunter, based
on my experience as well,

672
00:25:57,722 --> 00:26:00,559
you can influence these
areas of data quality.

673
00:26:00,559 --> 00:26:03,461
You can influence completeness,
consistency, and timeliness

674
00:26:03,461 --> 00:26:04,729
in your engagements.

675
00:26:04,729 --> 00:26:06,264
And I think that's very
important to understand,

676
00:26:06,264 --> 00:26:08,233
so, also before that,

677
00:26:08,233 --> 00:26:10,735
it's very important to do this,

678
00:26:10,735 --> 00:26:14,139
because it's not just as
let's collect more data,

679
00:26:14,139 --> 00:26:15,106
I think I need this,

680
00:26:15,106 --> 00:26:17,175
let's collect this and use it.

681
00:26:17,175 --> 00:26:19,578
There should be a
process behind that,

682
00:26:19,578 --> 00:26:21,879
and data quality
accomplishes that,

683
00:26:21,880 --> 00:26:23,782
and just as data scientists,

684
00:26:23,782 --> 00:26:26,318
they spend the longest
time making sure

685
00:26:26,318 --> 00:26:29,254
that their data's consistent,
their data's complete,

686
00:26:29,254 --> 00:26:31,188
that's a similar process
that you should be doing.

687
00:26:31,189 --> 00:26:34,092
So, you as a hunter should
be also focusing on that.

688
00:26:34,092 --> 00:26:35,327
'Cause at the end,

689
00:26:35,327 --> 00:26:36,595
you're gonna come up
with the best analytic

690
00:26:36,595 --> 00:26:39,197
that you could use based
on the data you have,

691
00:26:39,197 --> 00:26:40,564
and you have to
create the process

692
00:26:40,565 --> 00:26:43,535
that every data set you
ingest something new

693
00:26:43,535 --> 00:26:45,036
has to go through
the same process.

694
00:26:45,036 --> 00:26:47,272
Is this consistent,
and is this complete,

695
00:26:47,272 --> 00:26:49,206
or do I have enough
of that data?

696
00:26:49,207 --> 00:26:50,609
From a completeness perspective,

697
00:26:50,609 --> 00:26:52,711
you know, we went over
this in BSides Charm

698
00:26:52,711 --> 00:26:54,279
and this is basically
just saying,

699
00:26:54,279 --> 00:26:58,516
if I have Sysmon event ID
One and Windows Security 4688

700
00:26:58,516 --> 00:27:00,185
sure, am I collecting
everything?

701
00:27:00,185 --> 00:27:02,553
Well make sure that you are
enabling Process Command Line.

702
00:27:02,554 --> 00:27:06,291
So, understand what you really
need from a data perspective.

703
00:27:06,291 --> 00:27:09,761
Also, completeness can
translate into coverage as well,

704
00:27:09,761 --> 00:27:11,863
from a hunting
engagement, measure

705
00:27:11,863 --> 00:27:13,698
What is the data
available for you?

706
00:27:13,698 --> 00:27:16,434
Let's say, your scope
touches a hundred computers,

707
00:27:16,434 --> 00:27:18,637
and at the end of the day
you say, you know what,

708
00:27:18,637 --> 00:27:21,973
sorry, we couldn't reach,
you know, thirty computers.

709
00:27:21,973 --> 00:27:23,408
Measure that.

710
00:27:23,408 --> 00:27:25,010
Because, that would
actually drive the change

711
00:27:25,010 --> 00:27:27,178
to enabling probably logging,

712
00:27:27,178 --> 00:27:30,115
enabling controls that would
give you that data next.
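(A small sketch of that coverage number, using the hundred-computer example from the talk. The `completeness` function name is hypothetical.)
```python
def completeness(in_scope, reachable):
    """Fraction of in-scope hosts that actually returned data."""
    if in_scope <= 0:
        raise ValueError("scope must contain at least one host")
    return reachable / in_scope
hosts_in_scope = 100
hosts_unreachable = 30
ratio = completeness(hosts_in_scope, hosts_in_scope - hosts_unreachable)
print(f"host coverage for this engagement: {ratio:.0%}")
```
(Recording this ratio per engagement gives you a trend you can use to argue for more logging.)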

713
00:27:30,115 --> 00:27:32,484
Consistency, very
interesting, as you can see,

714
00:27:32,484 --> 00:27:36,420
like Sysmon calls the
process Whole Path Image

715
00:27:36,421 --> 00:27:41,226
and then 4688 gives you New Process
Name as the data field name.

716
00:27:41,226 --> 00:27:44,696
If I run a query just saying
image equal something,

717
00:27:44,696 --> 00:27:47,131
I might not hit
those devices that

718
00:27:47,132 --> 00:27:49,734
probably don't have Sysmon
but that have, you know,

719
00:27:49,734 --> 00:27:51,136
Windows Security event logs.
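(One way to picture the fix, as a sketch only. `Image` and `NewProcessName` are the real field names for Sysmon Event ID 1 and Security 4688. The common name `process_path`, the source labels and the host names are assumptions made for this example.)
```python
# Source-specific field -> common field name ("process_path" is a made-up name).
FIELD_MAP = {
    "sysmon_1": {"Image": "process_path"},
    "security_4688": {"NewProcessName": "process_path"},
}
def normalize(source, event):
    """Rename known fields to the common schema; leave other fields untouched."""
    mapping = FIELD_MAP.get(source, {})
    return {mapping.get(key, key): value for key, value in event.items()}
events = [
    normalize("sysmon_1", {"Image": r"C:\Windows\System32\cmd.exe", "host": "ws01"}),
    normalize("security_4688", {"NewProcessName": r"C:\Windows\System32\cmd.exe", "host": "ws02"}),
]
# One query on the common field now reaches hosts with either log source.
hits = [e["host"] for e in events if e["process_path"].lower().endswith("\\cmd.exe")]
print(hits)
```
(Without the mapping, a query on `Image` alone would silently skip the host that only has Security logs.)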

720
00:27:51,136 --> 00:27:55,072
Very, very important,
as it was said before,

721
00:27:55,073 --> 00:27:58,009
not just because you run
a query on your console

722
00:27:58,009 --> 00:28:00,145
and nothing comes back,

723
00:28:00,145 --> 00:28:01,913
doesn't mean that
nothing happened.

724
00:28:01,913 --> 00:28:03,148
So, if you're not aware of

725
00:28:03,148 --> 00:28:06,150
the consistency
of your data sets,

726
00:28:06,151 --> 00:28:08,586
you're gonna have a hard time
proving that that's something,

727
00:28:08,586 --> 00:28:09,821
probably something
needs to happen.

728
00:28:09,821 --> 00:28:11,055
- In my old life,
I had a customer

729
00:28:11,056 --> 00:28:12,524
who really struggled with this,

730
00:28:12,524 --> 00:28:15,092
spending six figures
on data normalization

731
00:28:15,093 --> 00:28:16,861
just so that they
could run one query

732
00:28:16,861 --> 00:28:18,096
and not multiple queries

733
00:28:18,096 --> 00:28:19,664
'cause of all these
different field names.

734
00:28:19,664 --> 00:28:21,199
So, it does matter.

735
00:28:21,199 --> 00:28:23,201
It does, there are dollars
attached to those things.

736
00:28:23,201 --> 00:28:24,669
- So, that's actually,

737
00:28:24,669 --> 00:28:27,372
I'm actually trying to open
a company in doing that

738
00:28:28,540 --> 00:28:29,908
- [Devon] Exactly

739
00:28:29,908 --> 00:28:32,110
- So, in the
timeliness perspective,

740
00:28:32,110 --> 00:28:34,111
it goes in my opinion both ways.

741
00:28:34,112 --> 00:28:37,082
It could be the time that it
takes data to pretty much get

742
00:28:37,082 --> 00:28:40,351
to your database all the
way from the ETL procedures,

743
00:28:40,351 --> 00:28:43,154
extraction, transformation,
and load of the data.

744
00:28:43,154 --> 00:28:44,656
That's important,

745
00:28:44,656 --> 00:28:46,457
because that might define
reality in your environment.

746
00:28:46,458 --> 00:28:48,960
I was in an environment
that was taking minutes,

747
00:28:48,960 --> 00:28:51,596
up to an hour to actually show

748
00:28:51,596 --> 00:28:54,632
that that event happened
in an environment.

749
00:28:54,632 --> 00:28:56,134
That's very interesting,

750
00:28:56,134 --> 00:28:58,069
because when you do
some type of simulation,

751
00:28:58,069 --> 00:29:00,138
you might say, uh,
that didn't happen,

752
00:29:00,138 --> 00:29:02,340
I might not have detection,
or the data sources.

753
00:29:02,340 --> 00:29:06,010
No, it's just taking so much
time to get to your stack,

754
00:29:06,010 --> 00:29:08,747
to your ELK stack or anything,
and then at the end of course,

755
00:29:08,747 --> 00:29:12,117
the data retention,
because, believe me,

756
00:29:12,117 --> 00:29:14,185
you might not find a
pattern in seven days,

757
00:29:14,185 --> 00:29:16,321
but if you extend it to a month,

758
00:29:16,321 --> 00:29:17,889
then you might have a
better understanding

759
00:29:17,889 --> 00:29:19,389
of your environment.

760
00:29:19,390 --> 00:29:22,761
I've done hunts in seven
days, they are the worst,

761
00:29:22,761 --> 00:29:27,065
because, unless you export
the data out of every week

762
00:29:27,065 --> 00:29:29,601
and then kind of like put it
together, it's really hard,

763
00:29:29,601 --> 00:29:31,870
because by the time you
get to the next week

764
00:29:31,870 --> 00:29:34,205
and the third week, the data
that you wanted to test later,

765
00:29:34,205 --> 00:29:36,174
then it's gone and then
you're testing new data

766
00:29:36,174 --> 00:29:37,841
and your whole pattern changes.

767
00:29:37,842 --> 00:29:39,577
It's really hard,
so measure that.

768
00:29:39,577 --> 00:29:42,080
Every engagement, how
much data do we have,

769
00:29:42,080 --> 00:29:45,183
and believe me, that drives
also changes to your,

770
00:29:45,183 --> 00:29:47,585
you know, to your
resources that you have.

771
00:29:47,585 --> 00:29:49,888
- [Devon] So, there's lots
of things you could measure,

772
00:29:49,888 --> 00:29:51,356
you could talk about,

773
00:29:51,356 --> 00:29:53,324
like hunt metrics as part
of your organization,

774
00:29:53,324 --> 00:29:54,993
we provide some
examples right here

775
00:29:54,993 --> 00:29:57,162
of things that
might be relevant.

776
00:29:57,162 --> 00:29:59,264
But, we thought we would give
you a couple of alternatives

777
00:29:59,264 --> 00:30:01,800
to this that might be a
little bit more effective,

778
00:30:01,800 --> 00:30:04,369
and I think, next slide Roberto.

779
00:30:04,369 --> 00:30:05,637
- [Roberto] Yup.

780
00:30:05,637 --> 00:30:07,004
- [Devon] And I think
that this is a graphic

781
00:30:07,005 --> 00:30:08,406
that we've shown before.

782
00:30:08,406 --> 00:30:11,109
This is how some folks
attempt to represent coverage,

783
00:30:11,109 --> 00:30:12,677
and I mean,

784
00:30:12,677 --> 00:30:15,513
how do you derive any meaning
from something like this?

785
00:30:15,513 --> 00:30:17,081
- [Roberto] Yeah, like, I
believe that, for example,

786
00:30:17,081 --> 00:30:19,216
I started doing something
similar before as well,

787
00:30:19,217 --> 00:30:22,120
and this is what I like
to keep doing research,

788
00:30:22,120 --> 00:30:25,122
'cause also improves the way
how I try to look at things,

789
00:30:25,123 --> 00:30:26,491
and I learn from other people,

790
00:30:26,491 --> 00:30:28,726
and I think that we all
should be doing that.

791
00:30:28,726 --> 00:30:31,362
In this case, even though now
we have a better understanding

792
00:30:31,362 --> 00:30:33,731
that a color now might be tied

793
00:30:33,731 --> 00:30:36,467
to percentages of
how much data I have

794
00:30:36,467 --> 00:30:38,903
for that detection of that
technique and things like that,

795
00:30:38,903 --> 00:30:42,073
you might have that
understanding now,
after this talk,

796
00:30:42,073 --> 00:30:45,243
but still, hard to visualize.

797
00:30:45,243 --> 00:30:47,779
You might have a legend
with like ten colors,

798
00:30:47,779 --> 00:30:49,581
and you would be going back
to your legend and saying,

799
00:30:49,581 --> 00:30:52,449
wait, red meant this,
purple meant, wait, green,

800
00:30:52,450 --> 00:30:54,853
no, green to me actually
means something better than--

801
00:30:54,853 --> 00:30:56,054
- [Devon] Green with black
text, green with white text.

802
00:30:56,054 --> 00:30:57,088
- [Roberto] Yeah, (laughs)

803
00:30:57,088 --> 00:30:58,556
so, to me,

804
00:30:58,556 --> 00:31:00,625
you have to start understanding
that your visualizations,

805
00:31:00,625 --> 00:31:03,161
you gotta tell a story
with your visualizations,

806
00:31:03,161 --> 00:31:04,662
and, in this case, for example,

807
00:31:04,662 --> 00:31:08,199
I'm a fan of going now
from the lightest color

808
00:31:08,199 --> 00:31:10,702
to the darkest color because
it tells you something.

809
00:31:10,702 --> 00:31:12,670
This one tells me
a lot of things,

810
00:31:12,670 --> 00:31:13,938
but I don't know what they are.

811
00:31:13,938 --> 00:31:15,373
This tells me at least that

812
00:31:15,373 --> 00:31:18,009
there is some type of
progress might be happening,

813
00:31:18,009 --> 00:31:20,712
something that
probably, a weakness,

814
00:31:20,712 --> 00:31:23,281
to something more like
stronger and things like that,

815
00:31:23,281 --> 00:31:25,015
so it gives me something
to think about,

816
00:31:25,016 --> 00:31:27,418
and then you apply of
course now what they mean

817
00:31:27,418 --> 00:31:29,954
and things like that, but this
should tell you something.

818
00:31:29,954 --> 00:31:33,725
Now, what I like about
using a heat map, also,

819
00:31:33,725 --> 00:31:38,730
was that after talking
about data against ATT&CK,

820
00:31:39,464 --> 00:31:40,665
if somebody now asks me,

821
00:31:40,665 --> 00:31:43,167
what is my detection
for PowerShell,

822
00:31:43,167 --> 00:31:46,638
if I go to the data sources
that I'm actually collecting,

823
00:31:46,638 --> 00:31:50,975
to validate the
detection of a technique

824
00:31:50,975 --> 00:31:54,478
that uses a variant, that
uses PowerShell, then to me,

825
00:31:54,479 --> 00:31:58,016
it's easy to say, hey, all
these techniques in here,

826
00:31:58,016 --> 00:32:00,285
depending on the variant
that you're using,

827
00:32:00,285 --> 00:32:02,253
I have data sources
that map to PowerShell,

828
00:32:02,253 --> 00:32:04,422
so my detection about PowerShell

829
00:32:04,422 --> 00:32:07,825
is across these fifty techniques,
or a hundred techniques.

830
00:32:07,825 --> 00:32:12,830
Rather than saying, PowerShell is just
one square in the framework.

831
00:32:15,166 --> 00:32:16,801
So, once again,

832
00:32:16,801 --> 00:32:19,870
the heat map to you
at the beginning might
not be as useful,

833
00:32:19,871 --> 00:32:21,239
'cause you start
identifying those things,

834
00:32:21,239 --> 00:32:23,574
just say wait, but
PowerShell is at one,

835
00:32:23,574 --> 00:32:24,842
so I don't like the
heat map anymore,

836
00:32:24,842 --> 00:32:27,045
I don't like the idea of
having this whole thing.

837
00:32:27,045 --> 00:32:29,480
It depends how you
use the heat map,

838
00:32:29,480 --> 00:32:31,482
and depends how
you are approaching

839
00:32:31,482 --> 00:32:33,084
the detection of techniques.

840
00:32:33,084 --> 00:32:35,687
In this case, if I look at
it from a data perspective,

841
00:32:35,687 --> 00:32:37,221
then to me it makes more sense.

842
00:32:37,221 --> 00:32:41,526
I can tell you what techniques
require specific data sources

843
00:32:41,526 --> 00:32:44,028
that might map to something
like PowerShell, for example.

844
00:32:44,028 --> 00:32:45,697
And then, I would
love to do also,

845
00:32:45,697 --> 00:32:48,032
hunts depending on the data.

846
00:32:48,032 --> 00:32:50,702
I like to do that because it
touches several techniques,

847
00:32:50,702 --> 00:32:52,570
it doesn't just touch one thing.

848
00:32:52,570 --> 00:32:53,972
So, I think that
keep that in mind

849
00:32:53,972 --> 00:32:56,074
when you start applying
this type of scenarios.

850
00:32:56,074 --> 00:32:59,777
- And what you're seeing
here is ATT&CK Navigator,

851
00:32:59,777 --> 00:33:00,945
another wonderful thing

852
00:33:00,945 --> 00:33:02,413
that MITRE has released
to the community,

853
00:33:02,413 --> 00:33:04,816
and it's just the proportion
of data sources covered,

854
00:33:04,816 --> 00:33:07,185
versus those that exist as
recommended data sources.

855
00:33:07,185 --> 00:33:10,288
So, it's a simple ratio,
easy to get this map.

856
00:33:10,288 --> 00:33:12,056
- [Roberto] And when you start
doing these heat maps also,

857
00:33:12,056 --> 00:33:13,725
we recommend adding, um,

858
00:33:13,725 --> 00:33:16,093
you were saying that Endgame
does that as well,

859
00:33:16,094 --> 00:33:19,697
which is a, starts
dividing something

860
00:33:19,697 --> 00:33:21,599
like coverage from
a data perspective,

861
00:33:21,599 --> 00:33:23,266
from an analytics perspective.

862
00:33:23,267 --> 00:33:26,104
You can start showing all
these different scenarios,

863
00:33:26,104 --> 00:33:27,672
and then have a
better understanding

864
00:33:27,672 --> 00:33:30,575
of what you're trying to
accomplish with each heat map.

865
00:33:30,575 --> 00:33:32,510
In this case, we
have data sources,

866
00:33:32,510 --> 00:33:34,178
in the other case
we have analytics,

867
00:33:34,178 --> 00:33:37,215
which is something that Devon
then will be talking about.

868
00:33:37,215 --> 00:33:40,118
- [Devon] Yeah, so we're
gonna jump right into this.

869
00:33:40,118 --> 00:33:42,587
This is not gonna be
a big data science

870
00:33:42,587 --> 00:33:44,288
or machine learning talk,

871
00:33:44,288 --> 00:33:47,225
but those principles
are definitely gonna
play a role here,

872
00:33:47,225 --> 00:33:49,459
if we can advance,

873
00:33:49,460 --> 00:33:51,462
and we're gonna go
right past this slide.

874
00:33:51,462 --> 00:33:52,964
So, we're gonna start
with statistics.

875
00:33:52,964 --> 00:33:57,235
In our last talk, we showed
this slide from Chris Gerritz.

876
00:33:57,235 --> 00:33:59,102
It is a great slide
that illustrates

877
00:33:59,103 --> 00:34:01,806
kind of the trade-offs
between threat hunting

878
00:34:01,806 --> 00:34:04,509
and traditional like
alert-based products,

879
00:34:04,509 --> 00:34:05,810
and basically the difference is,

880
00:34:05,810 --> 00:34:08,246
hunt products need to
be resistant to things

881
00:34:08,246 --> 00:34:10,248
like false negatives,
and false positives,

882
00:34:10,248 --> 00:34:12,617
but you don't want
that from a platform

883
00:34:12,617 --> 00:34:14,118
that's supposed to alert you,

884
00:34:14,118 --> 00:34:15,987
and where this usually
comes into conflict is

885
00:34:15,987 --> 00:34:18,689
if you use a product
that does both things,

886
00:34:18,688 --> 00:34:21,458
so a multi-purpose device.

887
00:34:21,458 --> 00:34:23,493
So, what are we measuring now?

888
00:34:23,494 --> 00:34:25,830
Well, when alerts fire,
when we run hunts,

889
00:34:25,830 --> 00:34:28,165
we collect metadata
about true positives,

890
00:34:28,166 --> 00:34:30,968
which is just malicious
things we correctly call bad,

891
00:34:30,967 --> 00:34:32,335
and false positives,

892
00:34:32,335 --> 00:34:35,038
which are benign things
we incorrectly call bad.

893
00:34:35,039 --> 00:34:36,574
And, fortunately,

894
00:34:36,574 --> 00:34:39,077
statistics gives us a tool
for this called precision,

895
00:34:39,077 --> 00:34:40,745
which is just a simple division.

896
00:34:40,745 --> 00:34:43,181
All the stuff you label
in the denominator,

897
00:34:43,181 --> 00:34:45,550
the true positive
count in the numerator,

898
00:34:45,550 --> 00:34:48,419
and this winds up giving
you a confidence score,

899
00:34:48,418 --> 00:34:51,755
which is basically how much
you tolerate false positives,

900
00:34:51,755 --> 00:34:53,757
and here is just an example.

901
00:34:53,757 --> 00:34:58,763
100 events, 74 true
positives, a .74 precision,

902
00:34:58,763 --> 00:35:03,634
so 74% tolerant of true
and false positives.

903
00:35:03,634 --> 00:35:06,037
But, that's not a
good measurement

904
00:35:06,037 --> 00:35:09,707
of how well the analytic works.

905
00:35:09,707 --> 00:35:11,142
So, for that,

906
00:35:11,142 --> 00:35:13,144
statistics gives us a different
measurement called recall,

907
00:35:13,144 --> 00:35:15,546
and for this, we
need false negatives.

908
00:35:15,546 --> 00:35:18,049
These are the
things that are bad

909
00:35:18,049 --> 00:35:20,384
but which we don't
characterize as bad.

910
00:35:20,384 --> 00:35:21,886
These are the things we miss.

911
00:35:21,886 --> 00:35:23,988
So, when leadership asks
you, what have you missed?

912
00:35:23,988 --> 00:35:27,058
These are gonna be the things
that you wanna count up.

913
00:35:27,058 --> 00:35:29,827
And, again, you simply
divide your true positives

914
00:35:29,827 --> 00:35:31,996
by all the stuff
that is actually bad,

915
00:35:31,996 --> 00:35:34,132
whether you found it or not.

916
00:35:34,132 --> 00:35:36,367
In this example, does
give you that metadata.

917
00:35:36,367 --> 00:35:41,005
But you have to be wary of going
too far one way or another.

918
00:35:42,306 --> 00:35:44,475
If you happen to
have perfect recall,

919
00:35:44,475 --> 00:35:48,145
it basically means that
everything that you find,

920
00:35:48,146 --> 00:35:53,017
basically is bad, but all
the other stuff is bad too.

921
00:35:53,017 --> 00:35:54,551
You have so many false positives

922
00:35:54,552 --> 00:35:56,621
that it becomes a
meaningless measurement.

923
00:35:56,621 --> 00:35:58,623
And, with perfect precision,

924
00:35:58,623 --> 00:36:00,358
you wind up with a
similar challenge.

925
00:36:00,358 --> 00:36:02,994
So, going too far one way or
another does not help you,

926
00:36:02,994 --> 00:36:04,996
and you do have to
find a middle ground.

927
00:36:06,797 --> 00:36:08,699
So, there was a step I skipped,

928
00:36:08,699 --> 00:36:11,269
just for those folks
that might be concerned,

929
00:36:11,269 --> 00:36:14,906
how do you determine false
negatives in these contexts?

930
00:36:14,906 --> 00:36:17,842
And it's really the place
where adversary simulation

931
00:36:17,842 --> 00:36:19,544
plays a role.

932
00:36:19,544 --> 00:36:22,279
Adversary simulation
gives you counts

933
00:36:22,280 --> 00:36:24,615
of the techniques
that you want to find,

934
00:36:24,615 --> 00:36:28,386
so whether you're
using something like
Red Team Automation,

935
00:36:28,386 --> 00:36:32,390
whether you're using Uber's
Metta, MITRE's Caldera,

936
00:36:32,390 --> 00:36:35,593
any of the other frameworks
that you can package,

937
00:36:35,593 --> 00:36:36,961
you can get those numbers.

938
00:36:36,961 --> 00:36:40,096
But, that doesn't mean
you can't use a red team

939
00:36:40,097 --> 00:36:41,632
or a third party
service provider,

940
00:36:41,632 --> 00:36:43,634
because they will
give you reports.

941
00:36:43,634 --> 00:36:46,470
So, harvest that data,
count those numbers,

942
00:36:46,470 --> 00:36:50,174
and suddenly you have the
ability to measure recall,

943
00:36:50,174 --> 00:36:51,876
which is a much more
meaningful measurement

944
00:36:51,876 --> 00:36:54,312
of how good you are
at detecting a thing.

945
00:36:54,312 --> 00:36:57,281
It also has a strong
relationship with
data quality,

946
00:36:57,281 --> 00:36:59,217
because if you don't have
the sources of evidence,

947
00:36:59,217 --> 00:37:00,985
if you can't run
analytics against them,

948
00:37:00,985 --> 00:37:03,087
if it's in a
different time frame,

949
00:37:03,087 --> 00:37:04,922
suddenly this whole
thing falls apart.

950
00:37:04,922 --> 00:37:06,958
All of your numbers
go sideways on you.

951
00:37:06,958 --> 00:37:08,226
- [Roberto] And the reason
why we're approaching it

952
00:37:08,226 --> 00:37:09,727
this way, also, is

953
00:37:09,727 --> 00:37:12,196
and keeping it separate from
the data quality measurements

954
00:37:12,196 --> 00:37:14,232
and this and that from
a hunting perspective,

955
00:37:14,232 --> 00:37:18,336
is that there might be
100, 10, or 5 variations

956
00:37:18,336 --> 00:37:20,770
of a technique, so
you cannot just say,

957
00:37:20,771 --> 00:37:23,474
if I have five
analytics, I'm great,

958
00:37:23,474 --> 00:37:25,476
'cause you cannot measure
it against something

959
00:37:25,476 --> 00:37:28,079
that might be infinite
or it's just unknown.

960
00:37:28,079 --> 00:37:29,313
It's something you cannot see,

961
00:37:29,313 --> 00:37:30,915
so that's why we like
to keep it separate

962
00:37:30,915 --> 00:37:33,150
and approach security
analytics development

963
00:37:33,150 --> 00:37:36,287
by applying the concept
of a true positive

964
00:37:36,287 --> 00:37:38,089
or a false positive,
false negatives

965
00:37:38,089 --> 00:37:40,891
and things like that,
so that's actually--

966
00:37:40,891 --> 00:37:43,661
- [Devon] And this works whether
you're talking about rules,

967
00:37:43,661 --> 00:37:47,598
signatures, human processes,
or threat hunting.

968
00:37:47,598 --> 00:37:48,799
Those are all detections,

969
00:37:48,799 --> 00:37:50,835
and they can all be
measured in the same way

970
00:37:50,835 --> 00:37:52,937
using adversary
simulation as one input,

971
00:37:52,937 --> 00:37:55,406
again noting that adversary
simulation might be

972
00:37:55,406 --> 00:37:57,541
your red team or
other human beings.

973
00:37:57,541 --> 00:38:01,846
And, getting closer
to precision, getting
closer to recall,

974
00:38:01,846 --> 00:38:04,148
also lets you figure
out how good you are

975
00:38:04,148 --> 00:38:05,716
at assessing badness.

976
00:38:05,716 --> 00:38:09,553
And so, data science gives us
this thing called an F1 score.

977
00:38:09,553 --> 00:38:11,188
You can see the
formula kinda up here,

978
00:38:11,188 --> 00:38:13,658
and using the
examples from before,

979
00:38:13,658 --> 00:38:17,627
those analytics basically
gave us an F1 score of .71

980
00:38:17,628 --> 00:38:19,397
Now, this isn't an
arithmetic average,

981
00:38:19,397 --> 00:38:21,732
it's a harmonic average, which
makes it a little bit easier

982
00:38:21,732 --> 00:38:23,567
to express averages of rates,

983
00:38:23,567 --> 00:38:24,769
which is what we
are talking about,

984
00:38:24,769 --> 00:38:27,238
rates of true/false
positives, etc.

985
00:38:27,238 --> 00:38:30,241
and we can even map
these to visualizations

986
00:38:30,241 --> 00:38:33,778
that help us understand
what that looks like.

987
00:38:33,778 --> 00:38:37,781
So, here's a notion of what F1
scores look like in a table.

988
00:38:37,782 --> 00:38:41,719
This .71 basically means
that we're finding about 60%

989
00:38:41,719 --> 00:38:43,187
of all the bad stuff,

990
00:38:43,187 --> 00:38:45,122
which is the true positives
and the false negatives.

991
00:38:45,122 --> 00:38:46,857
We're just calculating
that ratio.

992
00:38:46,857 --> 00:38:49,393
But, if we advance, you'll
see that we might have

993
00:38:49,393 --> 00:38:52,596
an organizational need
for a different number.

994
00:38:52,596 --> 00:38:53,864
Slide ahead.

995
00:38:53,864 --> 00:38:55,166
So, what if your
organization says,

996
00:38:55,166 --> 00:38:58,202
everything we do must
be 80% successful?

997
00:38:58,202 --> 00:39:01,138
Well, we could just figure
out our threshold is a .4

998
00:39:01,138 --> 00:39:02,373
and work backwards.

999
00:39:02,373 --> 00:39:05,009
We can figure out
how tolerant we are,

1000
00:39:05,009 --> 00:39:07,745
and then use available
data and human processes

1001
00:39:07,745 --> 00:39:11,115
to tweak these approaches,
whether it's a threat hunt,

1002
00:39:11,115 --> 00:39:14,185
an analytic, a signature,
we control this.

1003
00:39:14,185 --> 00:39:16,320
And this is a data
science based approach,

1004
00:39:16,320 --> 00:39:18,756
not a machine learning
based approach,

1005
00:39:18,756 --> 00:39:20,958
we do have to know what we miss

1006
00:39:20,958 --> 00:39:23,794
in order to create
these thresholds.

1007
00:39:23,794 --> 00:39:25,496
However, from there
you could go on it

1008
00:39:25,496 --> 00:39:28,199
and you could look at
finding ideal thresholds,

1009
00:39:28,199 --> 00:39:30,468
and this is a graph
that basically shows

1010
00:39:30,468 --> 00:39:33,703
what's called a Receiver
Operating Characteristic curve

1011
00:39:33,704 --> 00:39:35,172
or a ROC curve.

1012
00:39:35,172 --> 00:39:36,640
This is a tool that a lot
of data scientists use

1013
00:39:36,640 --> 00:39:39,777
to express how tolerant they
are of these thresholds,

1014
00:39:39,777 --> 00:39:42,813
basically the area underneath
is how well you're doing,

1015
00:39:42,813 --> 00:39:45,182
and you can measure
yourself against this,

1016
00:39:45,182 --> 00:39:47,651
even plot out
hypothetical curves,

1017
00:39:47,651 --> 00:39:49,486
to see if maybe
those give you better

1018
00:39:49,487 --> 00:39:51,255
true and false
positive averages.

1019
00:39:51,255 --> 00:39:54,191
And all of this can be
done with common scripting

1020
00:39:54,191 --> 00:39:57,227
and just a little bit of
visibility into your data.

1021
00:39:58,462 --> 00:40:00,998
So, that's a lot
- [Roberto] That's a lot

1022
00:40:00,998 --> 00:40:04,769
- That's a lot of stuff, we
could probably keep going

1023
00:40:04,769 --> 00:40:08,138
with more aspects of this topic,

1024
00:40:08,139 --> 00:40:10,574
I think measuring how well
we are at finding evil

1025
00:40:10,574 --> 00:40:12,576
is very important to
the entire industry

1026
00:40:13,811 --> 00:40:15,346
- Something I wanted
to add in there,

1027
00:40:15,346 --> 00:40:18,249
just a closing thought,
is that if you thought

1028
00:40:18,249 --> 00:40:20,451
that threat hunting was
just going to a console

1029
00:40:20,451 --> 00:40:23,220
and run a query and then
hunt like that for the month,

1030
00:40:23,220 --> 00:40:25,823
that's not what threat
hunting would be.

1031
00:40:25,823 --> 00:40:27,324
As you can tell,

1032
00:40:27,324 --> 00:40:29,727
there is a lot of stuff
that happens besides that.

1033
00:40:29,727 --> 00:40:32,163
You know, it's interesting when
I talk to people that does,

1034
00:40:32,163 --> 00:40:33,397
you know, data
science, I'm like,

1035
00:40:33,397 --> 00:40:35,199
where do you spend
your time the most?

1036
00:40:35,199 --> 00:40:38,135
I dunno, man, I do a lot of
this, like transformation,

1037
00:40:38,135 --> 00:40:40,638
make sure that everything's
good, the quality of my data,

1038
00:40:40,638 --> 00:40:43,107
and then when I
apply my model, yeah,

1039
00:40:43,107 --> 00:40:46,911
it might not be as the same
time as I spend doing the rest.

1040
00:40:46,911 --> 00:40:50,047
I see that as a similar
thing happening here

1041
00:40:50,047 --> 00:40:52,650
because we're trying to develop,
right, all these analytics,

1042
00:40:52,650 --> 00:40:55,619
trying to do all these crazy
stuff that is really cool,

1043
00:40:55,619 --> 00:40:57,188
but threat hunting,
in my opinion,

1044
00:40:57,188 --> 00:40:59,623
should be also looked at,
as there are procedures,

1045
00:40:59,623 --> 00:41:01,725
things that should happen
before you do that,

1046
00:41:01,725 --> 00:41:03,961
and also after you do
your hunting engagement,

1047
00:41:03,961 --> 00:41:07,031
that involve the whole
process of threat hunting.

1048
00:41:07,031 --> 00:41:08,866
And I think that's what
I want you to take away

1049
00:41:08,866 --> 00:41:10,835
from this presentation,
is to understand

1050
00:41:10,835 --> 00:41:14,371
what needs to happen when you
want to do threat hunting,

1051
00:41:14,371 --> 00:41:15,306
right?

1052
00:41:15,306 --> 00:41:16,540
Going to the console, go crazy,

1053
00:41:16,540 --> 00:41:18,642
do those analytics and
stuff, it's beyond that,

1054
00:41:18,642 --> 00:41:20,077
as you can tell.

1055
00:41:20,077 --> 00:41:22,279
You gotta focus a lot on
what you have, what you need,

1056
00:41:22,279 --> 00:41:24,081
how you can get better,
how can you be efficient,

1057
00:41:24,081 --> 00:41:26,250
how can you be organized, right?

1058
00:41:26,250 --> 00:41:27,852
You know, from an ATT&CK
perspective, for example,

1059
00:41:27,852 --> 00:41:29,386
MITRE ATT&CK framework,

1060
00:41:29,386 --> 00:41:31,788
and then how you can start
to measure your analytics,

1061
00:41:31,789 --> 00:41:34,758
in a way that also would
impact your organization.

1062
00:41:34,758 --> 00:41:36,926
- So we've got three last
things, if we advance,

1063
00:41:36,927 --> 00:41:38,429
we do want to thank

1064
00:41:38,429 --> 00:41:39,697
- Oh
- Oh, so we've got some links.

1065
00:41:39,697 --> 00:41:41,165
We also have an appendix,

1066
00:41:41,165 --> 00:41:42,700
so it's stuff that we
didn't include in the talk

1067
00:41:42,700 --> 00:41:44,567
but we wanted you to
have in the slides,

1068
00:41:44,568 --> 00:41:47,671
but we also wanted to just
thank some folks, you know,

1069
00:41:47,671 --> 00:41:49,439
basically everybody
who contributed,

1070
00:41:49,440 --> 00:41:53,944
the Beyond Science Data
blog, MITRE of course,

1071
00:41:53,944 --> 00:41:56,547
and all of the defenders
who have given us feedback

1072
00:41:56,547 --> 00:41:58,349
since BSides Charm

1073
00:41:58,349 --> 00:42:00,551
- Thank you, thank you very much

1074
00:42:00,551 --> 00:42:05,523
(Applause)
(tinkling piano music)

1075
00:42:11,161 --> 00:42:13,898
(dramatic music)

