Full Example

We have covered how to learn the structure and parameters of a Bayesian Belief Network (BBN). Let’s see how we can combine the learned structure and parameters to create a BBN, and then use the BBN for exact inference.

Load data

Let’s read our data into a Spark DataFrame (SDF).

[1]:
from pysparkbbn.discrete.data import DiscreteData

# assumes an active SparkSession is already bound to `spark` (as in a notebook)
sdf = spark.read.csv('hdfs://localhost/data-1479668986461.csv', header=True)
data = DiscreteData(sdf)

Structure learning

Let’s pick the naive Bayes algorithm to learn the structure, using n3 as the class variable.

[2]:
from pysparkbbn.discrete.scblearn import Naive

naive = Naive(data, 'n3')
g = naive.get_network()
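The naive Bayes algorithm fixes the shape of the structure up front: the chosen class variable (n3 here) becomes the sole parent of every other variable. A plain-Python sketch of the expected edge set, assuming the remaining columns in this dataset are n1, n2, n4, and n5:

```python
# Naive Bayes structure: one directed edge from the class node to each feature node.
clazz = 'n3'
features = ['n1', 'n2', 'n4', 'n5']  # the non-class columns in this dataset

edges = [(clazz, f) for f in features]
print(edges)  # [('n3', 'n1'), ('n3', 'n2'), ('n3', 'n4'), ('n3', 'n5')]
```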

Parameter learning

After we have a structure, we can learn the parameters.

[3]:
from pysparkbbn.discrete.plearn import ParamLearner
import json

param_learner = ParamLearner(data, g)
p = param_learner.get_params()

print(json.dumps(p, indent=2))
{
  "n3": [
    {
      "n3": "f",
      "__p__": 0.47345
    },
    {
      "n3": "t",
      "__p__": 0.52655
    }
  ],
  "n1": [
    {
      "n1": "f",
      "n3": "f",
      "__p__": 0.8588024078572183
    },
    {
      "n1": "t",
      "n3": "f",
      "__p__": 0.1411975921427817
    },
    {
      "n1": "f",
      "n3": "t",
      "__p__": 0.6534042351153737
    },
    {
      "n1": "t",
      "n3": "t",
      "__p__": 0.34659576488462635
    }
  ],
  "n2": [
    {
      "n2": "f",
      "n3": "f",
      "__p__": 0.8773893758580632
    },
    {
      "n2": "t",
      "n3": "f",
      "__p__": 0.12261062414193685
    },
    {
      "n2": "f",
      "n3": "t",
      "__p__": 0.44725097331687397
    },
    {
      "n2": "t",
      "n3": "t",
      "__p__": 0.552749026683126
    }
  ],
  "n4": [
    {
      "n4": "f",
      "n3": "f",
      "__p__": 0.667546731439434
    },
    {
      "n4": "t",
      "n3": "f",
      "__p__": 0.33245326856056606
    },
    {
      "n4": "f",
      "n3": "t",
      "__p__": 0.1642768967809325
    },
    {
      "n4": "t",
      "n3": "t",
      "__p__": 0.8357231032190675
    }
  ],
  "n5": [
    {
      "n5": "maybe",
      "n3": "f",
      "__p__": 0.29675784137712535
    },
    {
      "n5": "no",
      "n3": "f",
      "__p__": 0.4307741049741261
    },
    {
      "n5": "yes",
      "n3": "f",
      "__p__": 0.27246805364874854
    },
    {
      "n5": "maybe",
      "n3": "t",
      "__p__": 0.29503370999905043
    },
    {
      "n5": "no",
      "n3": "t",
      "__p__": 0.17909030481435761
    },
    {
      "n5": "yes",
      "n3": "t",
      "__p__": 0.525875985186592
    }
  ]
}
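As a quick sanity check, each conditional probability table should sum to one for every parent configuration. The sketch below is plain Python over the parameter format printed above (only n3 and n1 are copied in, for brevity).

```python
import math

# A subset of the learned parameters printed above.
p = {
    'n3': [
        {'n3': 'f', '__p__': 0.47345},
        {'n3': 't', '__p__': 0.52655},
    ],
    'n1': [
        {'n1': 'f', 'n3': 'f', '__p__': 0.8588024078572183},
        {'n1': 't', 'n3': 'f', '__p__': 0.1411975921427817},
        {'n1': 'f', 'n3': 't', '__p__': 0.6534042351153737},
        {'n1': 't', 'n3': 't', '__p__': 0.34659576488462635},
    ],
}

# Group each node's rows by parent assignment and check each group sums to 1.
for node, rows in p.items():
    totals = {}
    for row in rows:
        parents = tuple(sorted((k, v) for k, v in row.items() if k not in (node, '__p__')))
        totals[parents] = totals.get(parents, 0.0) + row['__p__']
    for parents, total in totals.items():
        assert math.isclose(total, 1.0, abs_tol=1e-9), (node, parents, total)
print('all conditional probability tables sum to 1')
```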

BBN

Now that we have the structure and parameters, we can build a BBN. Use the get_bbn utility function to bring the structure and parameters together.

[4]:
from pysparkbbn.discrete.bbn import get_bbn

bbn = get_bbn(g, p, data.get_profile())

Inference

With the BBN defined, we can use py-bbn to perform exact inference.

[5]:
from pybbn.pptc.inferencecontroller import InferenceController

join_tree = InferenceController.apply(bbn)

for node, posteriors in join_tree.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')
n3 : f=0.47345, t=0.52655
n1 : f=0.75065, t=0.24935
n2 : f=0.65090, t=0.34910
n4 : f=0.40255, t=0.59745
n5 : maybe=0.29585, no=0.29825, yes=0.40590
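With no evidence set, each posterior is just the marginal of that node. As a cross-check, we can recover the posterior of n1 by hand from the learned parameters printed earlier, marginalizing over its only parent n3: P(n1=f) = P(n1=f | n3=f) P(n3=f) + P(n1=f | n3=t) P(n3=t).

```python
# Values copied from the learned parameters above.
p_n3 = {'f': 0.47345, 't': 0.52655}
p_n1f_given_n3 = {'f': 0.8588024078572183, 't': 0.6534042351153737}

# Marginalize n3 out of P(n1=f | n3).
p_n1_f = sum(p_n1f_given_n3[v] * p_n3[v] for v in ('f', 't'))
print(f'P(n1=f) = {p_n1_f:.5f}')  # P(n1=f) = 0.75065, matching the posterior above
```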