Full Example
We have covered how to learn the structure and parameters of a Bayesian Belief Network (BBN). Let’s see how to combine the structure and parameters to create a BBN, and then how to use the BBN for exact inference.
Load data
Let’s read our data into a Spark DataFrame (SDF).
[1]:
from pysparkbbn.discrete.data import DiscreteData
sdf = spark.read.csv('hdfs://localhost/data-1479668986461.csv', header=True)
data = DiscreteData(sdf)
Structure learning
Let’s use the naive Bayes algorithm to learn the structure, with n3 as the class variable.
[2]:
from pysparkbbn.discrete.scblearn import Naive
naive = Naive(data, 'n3')
g = naive.get_network()
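To make the shape of a naive Bayes structure concrete, here is a minimal sketch in plain Python. It is not the library's implementation: `naive_bayes_edges` is a hypothetical helper, and the column names are assumed from the nodes that appear in the learned parameters. The second argument to `Naive`, `'n3'`, appears to name the class variable, since every other node in the learned parameters is conditioned on `n3`.

```python
def naive_bayes_edges(columns, class_name):
    """Return (parent, child) edges where class_name is the only parent of every other column.

    Hypothetical helper for illustration; it does not call pysparkbbn.
    """
    if class_name not in columns:
        raise ValueError(f'{class_name} is not one of the columns')
    return [(class_name, column) for column in columns if column != class_name]


# Column names assumed from the nodes that appear in the learned parameters.
edges = naive_bayes_edges(['n1', 'n2', 'n3', 'n4', 'n5'], 'n3')
print(edges)
```

Every edge points from the class variable to a feature, so each feature's conditional probability table depends on `n3` alone.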
Parameter learning
After we have a structure, we can learn the parameters.
[3]:
from pysparkbbn.discrete.plearn import ParamLearner
import json
param_learner = ParamLearner(data, g)
p = param_learner.get_params()
print(json.dumps(p, indent=2))
{
  "n3": [
    {
      "n3": "f",
      "__p__": 0.47345
    },
    {
      "n3": "t",
      "__p__": 0.52655
    }
  ],
  "n1": [
    {
      "n1": "f",
      "n3": "f",
      "__p__": 0.8588024078572183
    },
    {
      "n1": "t",
      "n3": "f",
      "__p__": 0.1411975921427817
    },
    {
      "n1": "f",
      "n3": "t",
      "__p__": 0.6534042351153737
    },
    {
      "n1": "t",
      "n3": "t",
      "__p__": 0.34659576488462635
    }
  ],
  "n2": [
    {
      "n2": "f",
      "n3": "f",
      "__p__": 0.8773893758580632
    },
    {
      "n2": "t",
      "n3": "f",
      "__p__": 0.12261062414193685
    },
    {
      "n2": "f",
      "n3": "t",
      "__p__": 0.44725097331687397
    },
    {
      "n2": "t",
      "n3": "t",
      "__p__": 0.552749026683126
    }
  ],
  "n4": [
    {
      "n4": "f",
      "n3": "f",
      "__p__": 0.667546731439434
    },
    {
      "n4": "t",
      "n3": "f",
      "__p__": 0.33245326856056606
    },
    {
      "n4": "f",
      "n3": "t",
      "__p__": 0.1642768967809325
    },
    {
      "n4": "t",
      "n3": "t",
      "__p__": 0.8357231032190675
    }
  ],
  "n5": [
    {
      "n5": "maybe",
      "n3": "f",
      "__p__": 0.29675784137712535
    },
    {
      "n5": "no",
      "n3": "f",
      "__p__": 0.4307741049741261
    },
    {
      "n5": "yes",
      "n3": "f",
      "__p__": 0.27246805364874854
    },
    {
      "n5": "maybe",
      "n3": "t",
      "__p__": 0.29503370999905043
    },
    {
      "n5": "no",
      "n3": "t",
      "__p__": 0.17909030481435761
    },
    {
      "n5": "yes",
      "n3": "t",
      "__p__": 0.525875985186592
    }
  ]
}
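A quick sanity check on learned parameters is that the probabilities for each parent configuration sum to 1. The sketch below runs that check on a copy of two entries from the output above. `conditional_sums` is a hypothetical helper, not part of pysparkbbn, and it assumes the row layout shown in the output: one key per variable plus a `__p__` key.

```python
from collections import defaultdict

# Two entries copied from the printed parameters; the other nodes follow the same layout.
params = {
    'n3': [
        {'n3': 'f', '__p__': 0.47345},
        {'n3': 't', '__p__': 0.52655},
    ],
    'n1': [
        {'n1': 'f', 'n3': 'f', '__p__': 0.8588024078572183},
        {'n1': 't', 'n3': 'f', '__p__': 0.1411975921427817},
        {'n1': 'f', 'n3': 't', '__p__': 0.6534042351153737},
        {'n1': 't', 'n3': 't', '__p__': 0.34659576488462635},
    ],
}


def conditional_sums(node, rows):
    """Sum __p__ over the values of node, grouped by the assignment of its parents."""
    sums = defaultdict(float)
    for row in rows:
        # Every key other than the node itself and __p__ is treated as a parent.
        parents = tuple(sorted((k, v) for k, v in row.items() if k not in (node, '__p__')))
        sums[parents] += row['__p__']
    return dict(sums)


for node, rows in params.items():
    for parents, total in conditional_sums(node, rows).items():
        print(f'{node} given {dict(parents)}: total={total:.5f}')
```

A root node such as `n3` has no parents, so its rows form a single group keyed by the empty tuple.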
BBN
Now that we have the structure and parameters, we can build a BBN. Use the get_bbn utility function to bring the structure and parameters together.
[4]:
from pysparkbbn.discrete.bbn import get_bbn
bbn = get_bbn(g, p, data.get_profile())
Inference
With the BBN defined, we can use py-bbn to perform exact inference. Since no evidence is set, the posteriors printed below are the marginal probabilities of each node.
[5]:
from pybbn.pptc.inferencecontroller import InferenceController
join_tree = InferenceController.apply(bbn)
for node, posteriors in join_tree.get_posteriors().items():
    # use a new name so the learned parameters in p are not overwritten
    summary = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {summary}')
n3 : f=0.47345, t=0.52655
n1 : f=0.75065, t=0.24935
n2 : f=0.65090, t=0.34910
n4 : f=0.40255, t=0.59745
n5 : maybe=0.29585, no=0.29825, yes=0.40590
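With no evidence set, these posteriors can be reproduced by hand from the learned parameters. The sketch below sums out the parent n3 to obtain the marginal of n1, using values copied from the parameter output. `marginalize` is a hypothetical helper written for illustration; it is not how py-bbn computes posteriors internally.

```python
# Values copied from the learned parameters printed in the parameter learning step.
p_n3 = {'f': 0.47345, 't': 0.52655}

# p_n1_given_n3[n3_value][n1_value] = P(n1 | n3)
p_n1_given_n3 = {
    'f': {'f': 0.8588024078572183, 't': 0.1411975921427817},
    't': {'f': 0.6534042351153737, 't': 0.34659576488462635},
}


def marginalize(prior, conditional):
    """Sum out the parent: P(child) = sum over parent values of P(child | parent) * P(parent)."""
    marginal = {}
    for parent_value, parent_prob in prior.items():
        for child_value, child_prob in conditional[parent_value].items():
            marginal[child_value] = marginal.get(child_value, 0.0) + child_prob * parent_prob
    return marginal


p_n1 = marginalize(p_n3, p_n1_given_n3)
print(', '.join(f'{val}={prob:.5f}' for val, prob in p_n1.items()))
# prints: f=0.75065, t=0.24935
```

The result agrees with the n1 line of the inference output. The same calculation applies to n2, n4, and n5, because each has n3 as its only parent.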