Search-and-Scoring
Search-and-scoring approaches to learning BBN structures search over a space of candidate BBN structures and score each candidate with a scoring function. The highest-scoring BBN structure found is typically returned as the output.
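As a concrete illustration of the scoring half, the sketch below uses the BIC (Bayesian information criterion) score, one common decomposable score for discrete BBNs. This page does not say which score pysparkbbn uses, so BIC, the hypothetical `bic_score` helper, and the in-memory list-of-dicts data format are assumptions for illustration only. The "search" here is simply picking the best of two hand-written candidate structures.

```python
import math
from collections import Counter

def bic_score(rows, parents):
    """BIC score of a discrete DAG (higher is better). Hypothetical helper.

    rows:    list of dicts mapping variable -> observed value
             (a toy, in-memory stand-in for a Spark DataFrame).
    parents: dict mapping every variable -> tuple of its parent variables.
    """
    n = len(rows)
    # Observed states of every variable
    states = {v: {r[v] for r in rows} for v in parents}
    score = 0.0
    for child, pa in parents.items():
        # N_ijk: counts of (parent configuration, child value)
        joint = Counter((tuple(r[p] for p in pa), r[child]) for r in rows)
        # N_ij: counts of each parent configuration
        marg = Counter(tuple(r[p] for p in pa) for r in rows)
        # Maximized log-likelihood: sum of N_ijk * log(N_ijk / N_ij)
        score += sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        # Complexity penalty: (log n / 2) * q_i * (r_i - 1) free parameters
        q = math.prod(len(states[p]) for p in pa)
        score -= 0.5 * math.log(n) * q * (len(states[child]) - 1)
    return score

# Search-and-score over two candidate structures for data where B copies A
rows = [{'A': 0, 'B': 0}, {'A': 1, 'B': 1}] * 10
candidates = [{'A': (), 'B': ()}, {'A': (), 'B': ('A',)}]
best = max(candidates, key=lambda pa: bic_score(rows, pa))
print(best)  # {'A': (), 'B': ('A',)}
```

The penalty term is what stops the search from always preferring denser graphs: an extra edge is only kept when the gain in log-likelihood outweighs the extra parameters it introduces.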
Load data
Let’s read our data into a Spark DataFrame (SDF).
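from pyspark.sql import SparkSession
from pysparkbbn.discrete.data import DiscreteData
spark = SparkSession.builder.getOrCreate()  # reuse the active session or create one
sdf = spark.read.csv('hdfs://localhost/data-1479668986461.csv', header=True)  # first row holds column names
data = DiscreteData(sdf)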
Genetic algorithm
We use a genetic algorithm (GA) as a search-and-scoring approach to learning BBN structures. In general, a GA has the following major steps.
Initialization: an initial population of BBN structures is generated
Fitness: the population is scored according to a fitness function and filtered, keeping the fitter structures
Crossover: two parents from the population undergo a crossover operation to produce two new offspring
Mutation: each offspring undergoes a mutation operation
The fitness, crossover and mutation steps are repeated until a maximum number of iterations is reached or the search converges (a higher-scoring BBN structure cannot be found).
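The steps above can be sketched as a toy GA in plain Python. This is not pysparkbbn's implementation: the fixed variable ordering, the bit-string encoding, keeping the fitter half as parents, single-point crossover, and the hypothetical `ga_search` helper with its parameters are all assumptions. Under a fixed ordering, each bit switches one forward edge on or off, so every individual decodes to an acyclic structure and crossover and mutation can never create a cycle.

```python
import random
from itertools import combinations

def ga_search(nodes, fitness, pop_size=20, max_iters=50,
              mutation_rate=0.05, patience=5, seed=0):
    """Toy GA over DAGs under a fixed node ordering (hypothetical helper).

    fitness: callable taking a set of (parent, child) edges; higher is better.
    Requires pop_size >= 2.
    """
    rng = random.Random(seed)
    pairs = list(combinations(nodes, 2))  # forward pairs under the given ordering

    def edges(bits):
        return {pair for pair, bit in zip(pairs, bits) if bit}

    def score(bits):
        return fitness(edges(bits))

    # Initialization: a random population of structures
    population = [[rng.randint(0, 1) for _ in pairs] for _ in range(pop_size)]
    best = max(population, key=score)
    stale = 0
    for _ in range(max_iters):
        # Fitness: score everyone and keep the fitter half as parents
        population.sort(key=score, reverse=True)
        parents = population[:max(2, pop_size // 2)]
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            # Crossover: single-point exchange yields two offspring
            cut = rng.randrange(1, len(pairs)) if len(pairs) > 1 else 0
            for child in (mom[:cut] + dad[cut:], dad[:cut] + mom[cut:]):
                # Mutation: flip each bit (toggle an edge) with probability mutation_rate
                children.append([bit ^ (rng.random() < mutation_rate) for bit in child])
        population = parents + children[:pop_size - len(parents)]
        # Convergence: stop after `patience` iterations without a better structure
        candidate = max(population, key=score)
        if score(candidate) > score(best):
            best, stale = candidate, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return edges(best)

# Toy usage: reward structures close to a known target edge set
target = {('A', 'B'), ('B', 'C'), ('A', 'D')}
learned = ga_search(['A', 'B', 'C', 'D'], lambda e: -len(e ^ target))
print(sorted(learned))
```

In a real structure learner the toy fitness would be replaced by a data-driven structure score. Because the parents survive into the next generation, the best score found never decreases, so the `patience` counter only measures stagnation. The fixed-ordering encoding trades away structures that contradict the ordering in exchange for never needing a cycle-repair step.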
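from pysparkbbn.discrete.ssslearn import Ga
sc = spark.sparkContext  # the SparkContext passed to Ga
ga = Ga(data, sc, max_iters=3)  # max_iters: the iteration threshold for the GA loop
g = ga.get_network()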
import matplotlib.pyplot as plt
import networkx as nx

# Draw the learned structure with labelled nodes and visible edge arrows
fig, ax = plt.subplots(figsize=(5, 5))
nx.draw(g,
        with_labels=True,
        node_size=500,
        alpha=0.8,
        font_weight='bold',
        font_family='monospace',
        node_color='r',
        arrowsize=15,
        ax=ax)
plt.show()