So I have a black box that is throwing streams of network data into two bins. I want to run a Bayesian classifier (or some sort of classifier) to try to understand how it makes its choices. The input/output is easy... I have parsers to break the problem into about 50 binary dimensions. But the problem is I want to tune a classifier and then understand the classification it comes up with in human-ish terms, not as some n-dimensional mapping based on quadratic functions or whatever the fuck you mathematicians do while wanking it.
No, I need actual things like "oie, it always goes bugger when the layer two protocol is six, mate!"
How does one do that?
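One classic answer for "rules a human can read" is a decision tree: each split reads directly as "if this bit is set, it goes in that bin". A minimal pure-Python sketch (the feature names and toy samples below are made up for illustration; a real run would use the ~50 parsed dimensions):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_feature(rows, labels, feats):
    # pick the binary feature whose split yields the largest information gain
    def gain(f):
        total = entropy(labels)
        for v in (0, 1):
            sub = [l for r, l in zip(rows, labels) if r[f] == v]
            if sub:
                total -= len(sub) / len(labels) * entropy(sub)
        return total
    return max(feats, key=gain)

def build_tree(rows, labels, feats, depth=0, max_depth=3):
    if len(set(labels)) == 1 or not feats or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority label
    f = best_feature(rows, labels, feats)
    branches = {}
    for v in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[f] == v]
        if idx:
            branches[v] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [g for g in feats if g != f],
                                     depth + 1, max_depth)
        else:
            branches[v] = Counter(labels).most_common(1)[0][0]
    return (f, branches)

def dump(tree, names, indent=""):
    # print the tree as human-readable if/else rules
    if not isinstance(tree, tuple):
        print(indent + "-> " + tree)
        return
    f, branches = tree
    for v in (0, 1):
        print(f"{indent}if {names[f]} == {v}:")
        dump(branches[v], names, indent + "  ")

# toy data: the bin is 'bad' exactly when the (hypothetical) first bit is set
names = ["l2_proto_is_6", "tcp_syn", "out_of_order"]
rows = [(1, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 0), (0, 0, 1), (0, 1, 1)]
labels = ["bad", "bad", "bad", "ok", "ok", "ok"]
dump(build_tree(rows, labels, list(range(3))), names)
```

On the toy data this prints exactly the kind of statement you want: "if l2_proto_is_6 == 1: -> bad".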
Name:
Anonymous2013-05-11 2:28
Any more info about the black box? Input layer size / type (binary inputs?), hidden layer size?
That tends to be the problem with ML techniques (including neural networks): "it works very well, but we don't know exactly how it works/how to fine-tune it."
Name:
Anonymous2013-05-11 7:36
>>3
>it works very well
No it doesn't. Or at least I've never managed to get anything other than Rube Goldberg solutions for simple problems and they still take forever to converge.
Name:
Anonymous2013-05-11 9:29
How are you training it? Do you have multiple test sets? Are you overtraining?
If you're looking for specific characteristics, you're going to need to supervise the thing yourself while it's training, or it might never actually become trained to find those characteristics.
In other words, use an algorithm to run test sets and count outputs. If it's trained well enough your shit should find all kinds of correlations between shits
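"Count outputs and find correlations" can literally be a tally: for every input bit, how often does each bin fire when that bit is set versus clear? A stdlib-only sketch over hypothetical (features, bin) pairs:

```python
from collections import defaultdict

def correlate(pairs, n_features):
    """For each binary feature f, estimate P(bin='bad' | bit=0) and P(bin='bad' | bit=1)."""
    hits = defaultdict(lambda: [0, 0])   # feature -> ['bad' count when bit=0, when bit=1]
    seen = defaultdict(lambda: [0, 0])   # feature -> [total count when bit=0, when bit=1]
    for feats, bin_ in pairs:
        for f in range(n_features):
            seen[f][feats[f]] += 1
            if bin_ == "bad":
                hits[f][feats[f]] += 1
    return {f: tuple(hits[f][v] / seen[f][v] if seen[f][v] else 0.0
                     for v in (0, 1))
            for f in range(n_features)}

# toy pairs: bit 0 tracks the bin perfectly, bit 1 is pure noise
pairs = [((1, 0), "bad"), ((1, 1), "bad"), ((0, 0), "ok"), ((0, 1), "ok")]
print(correlate(pairs, 2))
```

A feature whose two probabilities are far apart (here bit 0 gives 0.0 vs 1.0) is one the box is plausibly keying on; bit 1 comes out 0.5 vs 0.5, i.e. irrelevant.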
Name:
Anonymous2013-05-11 13:58
Hax my separating hyperplane
Name:
Anonymous2013-05-11 14:25
separate my hyperanus
Name:
Anonymous2013-05-11 17:22
>>2
The black box is a network intrusion detection device. The network traffic is known to have specific attacks for training purposes. So I take it back about each dimension being binary, but each dimension will be pretty limited. The inputs will be stuff like protocol type at each layer, presence of flags in the tcp header, out-of-order packet, etc.
>>6
I have about 10,000 test pairs. I'll probably train with a random half, test with the other half. I may also do several iterations with initial values influenced by the lowest error in a previous trial.
Problem is I have done this (with moderate success) with neural networks on other problems, but as >>3 points out, parsing the meaning of the network feels as difficult as the original problem. And as >>4 points out, yes that shit can be slow to converge. I'm willing to throw a few thousand trials at finding the right coefficient of learning though.
Maybe this could be tackled with fuzzy predicate logic? Like I give all test pairs and it tries to find a small set of theories that maximizes the number of conditions that evaluate correctly? I'm pretty rusty on that shit.
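That "small set of theories" idea can be prototyped crudely without any fuzzy-logic machinery: enumerate every single-feature predicate ("bit f == v implies bad"), score each by pairs explained minus false alarms, and greedily keep rules until the bad pairs are covered. A set-cover-style sketch with made-up toy data:

```python
def theories(n_features):
    # candidate single-feature predicates: "feature f == v implies bad"
    for f in range(n_features):
        for v in (0, 1):
            yield (f, v)

def score(theory, uncovered, ok_pairs):
    f, v = theory
    # bad pairs the theory explains, minus false alarms on the 'ok' pairs
    return (sum(1 for feats, _ in uncovered if feats[f] == v)
            - sum(1 for feats, _ in ok_pairs if feats[f] == v))

def greedy_rules(pairs, n_features, max_rules=3):
    ok_pairs = [p for p in pairs if p[1] == "ok"]
    uncovered = [p for p in pairs if p[1] == "bad"]
    rules = []
    while uncovered and len(rules) < max_rules:
        best = max(theories(n_features),
                   key=lambda t: score(t, uncovered, ok_pairs))
        if score(best, uncovered, ok_pairs) <= 0:
            break                      # nothing left that helps more than it hurts
        rules.append(best)
        f, v = best
        uncovered = [p for p in uncovered if p[0][f] != v]
    return rules

pairs = [((1, 0), "bad"), ((1, 1), "bad"), ((0, 1), "bad"), ((0, 0), "ok")]
print(greedy_rules(pairs, 2))   # -> [(0, 1), (1, 1)]
```

Each returned (f, v) pair reads as one human statement: here, "bit 0 set means bad, and so does bit 1 set". Extending the candidates to two-feature conjunctions is the obvious next step if single bits don't explain enough.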
hmm.. I don't know if this'll work, but anyway..
You must have labels, I suppose, so split the training data into positives (Bad) and negatives (Allowed)..
Then, train two half-networks (first layer / semi-unsupervised) using the good ol' RBM method, slightly modified such that it mimics PCA. [Basically just start with one neuron and add them in as training progresses..]
Finally, combine the two halved first layers.. and train the final layer as a simple classifier using grad descent / etc..
It possibly might also help to do a bit of soft rbm on the composite first layer using the full data set before training the final layer..
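For the "train the final layer as a simple classifier using grad descent" step above, here's a dependency-free logistic-regression sketch (the learning rate, epoch count, and toy data are arbitrary; in the real pipeline the inputs would be the combined first-layer activations, not raw bits):

```python
import math

def train_logistic(rows, labels, lr=0.5, epochs=200):
    """Plain gradient descent on log-loss; rows are tuples of 0/1 features."""
    n = len(rows[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            err = p - y                      # gradient of log-loss w.r.t. z
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b

def predict(w, b, x):
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# toy separable data: the label is just the first bit
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = [1, 1, 0, 0]
w, b = train_logistic(rows, labels)
print([predict(w, b, x) for x in rows])
```

A side benefit for interpretability: after training, the sign and magnitude of each weight say which input pushes toward which bin, so even the final layer gives some human-readable signal.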