In this tutorial, we explore how to leverage the PyBEL ecosystem to construct and analyze rich biological knowledge graphs directly within Google Colab. We start by installing all the necessary packages, including PyBEL, NetworkX, Matplotlib, Seaborn, and Pandas. We then show how to define proteins, processes, and modifications using the PyBEL DSL. From there, we guide you through building an Alzheimer's disease-related pathway, showing how to encode causal relationships, protein-protein interactions, and phosphorylation events. Alongside graph construction, we present advanced network analysis, including centrality measures, node classification, and subgraph extraction, as well as techniques for extracting citation and evidence data. By the end of this section, you will have a fully annotated BEL graph ready for downstream visualization and enrichment analysis, laying a solid foundation for interactive exploration of biological knowledge.
!pip install pybel pybel-tools networkx matplotlib seaborn pandas -q
import pybel
import pybel.dsl as dsl
from pybel import BELGraph
from pybel.io import to_pickle, from_pickle
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')
print("PyBEL Advanced Tutorial: Biological Expression Language Ecosystem")
print("=" * 65)
We start by installing PyBEL and its dependencies directly in Colab, ensuring that all the necessary libraries (NetworkX, Matplotlib, Seaborn, and Pandas) are available for our analysis. Once installed, we import the core modules and suppress warnings to keep the notebook output clean and focused on the results.
print("\n1. Building a Biological Knowledge Graph")
print("-" * 40)
graph = BELGraph(
    name="Alzheimer's Disease Pathway",
    version="1.0.0",
    description="Example pathway showing protein interactions in AD",
    authors="PyBEL Tutorial"
)
app = dsl.Protein(name="APP", namespace="HGNC")
abeta = dsl.Protein(name="Abeta", namespace="CHEBI")
tau = dsl.Protein(name="MAPT", namespace="HGNC")
gsk3b = dsl.Protein(name="GSK3B", namespace="HGNC")
inflammation = dsl.BiologicalProcess(name="inflammatory response", namespace="GO")
apoptosis = dsl.BiologicalProcess(name="apoptotic process", namespace="GO")
graph.add_increases(app, abeta, citation="PMID:12345678", evidence="APP cleavage produces Abeta")
graph.add_increases(abeta, inflammation, citation="PMID:87654321", evidence="Abeta triggers neuroinflammation")
# "Ph" denotes a phosphorylation; variants must be passed as a list
tau_phosphorylated = dsl.Protein(name="MAPT", namespace="HGNC",
                                 variants=[dsl.ProteinModification("Ph")])
graph.add_increases(gsk3b, tau_phosphorylated, citation="PMID:11111111", evidence="GSK3B phosphorylates tau")
graph.add_increases(tau_phosphorylated, apoptosis, citation="PMID:22222222", evidence="Hyperphosphorylated tau causes cell death")
graph.add_increases(inflammation, apoptosis, citation="PMID:33333333", evidence="Inflammation promotes apoptosis")
graph.add_association(abeta, tau, citation="PMID:44444444", evidence="Abeta and tau interact synergistically")
print(f"Created BEL graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")
We initialize a BELGraph with metadata for an Alzheimer's disease pathway and define proteins and processes using the PyBEL DSL. By adding causal relationships, protein modifications, and associations, we build a comprehensive network that captures key molecular interactions.
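Each `add_increases` or `add_association` call above corresponds to one BEL statement. As a rough, library-free illustration of what such statements look like in BEL text form, the sketch below renders a few of them by hand. The `Term` class and `to_bel_statement` helper are hypothetical and not part of PyBEL; for real graphs, PyBEL's own serializers are the better choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """Hypothetical, minimal stand-in for a BEL term (not PyBEL's DSL)."""
    function: str               # BEL function abbreviation: "p" (protein), "bp" (biological process)
    namespace: str              # e.g. "HGNC", "CHEBI", "GO"
    name: str
    pmod: Optional[str] = None  # optional protein modification, e.g. "Ph" for phosphorylation

    def to_bel(self) -> str:
        # BEL requires quoting names that contain spaces
        name = f'"{self.name}"' if " " in self.name else self.name
        inner = f"{self.namespace}:{name}"
        if self.pmod:
            inner += f", pmod({self.pmod})"
        return f"{self.function}({inner})"


def to_bel_statement(subject: Term, relation: str, obj: Term) -> str:
    """Render a subject-relation-object triple as a BEL statement string."""
    return f"{subject.to_bel()} {relation} {obj.to_bel()}"


app = Term("p", "HGNC", "APP")
abeta = Term("p", "CHEBI", "Abeta")
gsk3b = Term("p", "HGNC", "GSK3B")
tau_ph = Term("p", "HGNC", "MAPT", pmod="Ph")
inflammation = Term("bp", "GO", "inflammatory response")
apoptosis = Term("bp", "GO", "apoptotic process")

statements = [
    to_bel_statement(app, "increases", abeta),
    to_bel_statement(gsk3b, "increases", tau_ph),
    to_bel_statement(inflammation, "increases", apoptosis),
]
for statement in statements:
    print(statement)
```

This prints statements such as `p(HGNC:GSK3B) increases p(HGNC:MAPT, pmod(Ph))`, which makes it easier to see how the phosphorylated form of MAPT differs from the unmodified protein node.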
print("\n2. Advanced Network Analysis")
print("-" * 30)
degree_centrality = nx.degree_centrality(graph)
betweenness_centrality = nx.betweenness_centrality(graph)
closeness_centrality = nx.closeness_centrality(graph)
most_central = max(degree_centrality, key=degree_centrality.get)
print(f"Most connected node: {most_central}")
print(f"Degree centrality: {degree_centrality[most_central]:.3f}")
We calculate degree, betweenness, and closeness centrality to quantify the importance of each node within the graph. By identifying the most connected nodes, we gain insight into potential hubs that may drive disease mechanisms.
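Degree centrality is the simplest of the three measures: a node's degree divided by n - 1, the largest degree possible in a simple graph with n nodes. For a directed graph, NetworkX counts both incoming and outgoing edges, and in a multigraph each parallel edge counts separately, so values above 1 are possible. The stdlib sketch below reproduces that calculation on a toy edge list mirroring the tutorial's statements; keeping the association as a single edge and ignoring any extra edges PyBEL adds internally are our simplifications.

```python
from collections import Counter

# Toy directed edge list mirroring the tutorial's statements (simplified)
edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("Abeta", "MAPT"),
    ("GSK3B", "p-MAPT"),
    ("p-MAPT", "apoptosis"),
    ("inflammation", "apoptosis"),
]

nodes = sorted({node for edge in edges for node in edge})
degree = Counter()
for u, v in edges:
    degree[u] += 1  # outgoing edge counts toward u's degree
    degree[v] += 1  # incoming edge counts toward v's degree

n = len(nodes)
# Normalize by n - 1, as nx.degree_centrality does
centrality = {node: degree[node] / (n - 1) for node in nodes}
hub = max(centrality, key=centrality.get)
print(f"Most connected node: {hub} ({centrality[hub]:.3f})")
```

With seven nodes, Abeta touches three edges and scores 3/6 = 0.5, which makes it the hub of this toy network.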
print("\n3. Biological Entity Classification")
print("-" * 35)
node_types = Counter()
for node in graph.nodes():
    node_types[node.function] += 1  # e.g. 'Protein', 'BiologicalProcess'
print("Node distribution:")
for func, count in node_types.items():
    print(f" {func}: {count}")
We classify each node by its function, such as Protein or BiologicalProcess, and tally the counts. This breakdown helps us understand the composition of our network at a glance.
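A `Counter` gives only totals. If you also want to see which entities fall into each class, a small variation keeps the member names. The sketch below runs on a hand-written list of (name, function) pairs rather than a live BELGraph; the function strings follow the values compared against in this tutorial, but the list itself is illustrative.

```python
from collections import Counter, defaultdict

# Hand-written (name, function) pairs mirroring the tutorial's nodes
nodes = [
    ("APP", "Protein"),
    ("Abeta", "Protein"),
    ("MAPT", "Protein"),
    ("MAPT, pmod(Ph)", "Protein"),  # the phosphorylated form is a separate node
    ("GSK3B", "Protein"),
    ("inflammatory response", "BiologicalProcess"),
    ("apoptotic process", "BiologicalProcess"),
]

by_function = defaultdict(list)
for name, function in nodes:
    by_function[function].append(name)  # keep the members, not just a tally

counts = Counter({function: len(members) for function, members in by_function.items()})
for function, members in by_function.items():
    print(f"{function} ({counts[function]}): {', '.join(members)}")
```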
print("\n4. Pathway Analysis")
print("-" * 20)
proteins = [node for node in graph.nodes() if node.function == 'Protein']
processes = [node for node in graph.nodes() if node.function == 'BiologicalProcess']
print(f"Proteins in pathway: {len(proteins)}")
print(f"Biological processes: {len(processes)}")
edge_types = Counter()
for u, v, data in graph.edges(data=True):
    edge_types[data.get('relation')] += 1
print("\nRelationship types:")
for rel, count in edge_types.items():
    print(f" {rel}: {count}")
We separate proteins and processes to gauge the pathway's scope and complexity. Counting the different relationship types reveals which interactions, such as increases or associations, dominate our model.
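Raw counts are easier to compare as proportions. The sketch below works on hand-written edge data (five `increases` edges and one `association`, matching the statements added earlier but ignoring any structural edges PyBEL may insert) and uses `Counter.most_common` to rank relation types and report each one's share.

```python
from collections import Counter

# Hand-written edge data using the same 'relation' key as the tutorial's loop
edge_data = [{"relation": "increases"} for _ in range(5)] + [{"relation": "association"}]

edge_types = Counter(data.get("relation") for data in edge_data)
total = sum(edge_types.values())
for relation, count in edge_types.most_common():  # sorted from most to least frequent
    print(f"{relation}: {count} ({count / total:.0%})")

dominant, dominant_count = edge_types.most_common(1)[0]
```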
print("\n5. Literature Evidence Analysis")
print("-" * 32)
citations = []
evidences = []
for _, _, data in graph.edges(data=True):
    if 'citation' in data:
        citations.append(data['citation'])
    if 'evidence' in data:
        evidences.append(data['evidence'])
print(f"Total citations: {len(citations)}")
print(f"Unique citations: {len(set(citations))}")
print(f"Evidence statements: {len(evidences)}")
To assess how well our graph is grounded in published research, we extract the citation identifiers and evidence statements from each edge. Summarizing the total and unique counts lets us evaluate the breadth of the supporting literature.
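Not every edge necessarily carries a citation: structural edges (for example, a `hasVariant` edge linking MAPT to its phosphorylated form, which PyBEL may add automatically) typically have no literature support, so the citation count can be lower than the edge count. The sketch below groups evidence by citation and counts uncited edges. It assumes citations are plain strings, which is a simplification; PyBEL stores richer citation objects on its edges.

```python
from collections import defaultdict

# Toy edge-data dictionaries (citations simplified to plain strings)
edge_data = [
    {"relation": "increases", "citation": "PMID:12345678", "evidence": "APP cleavage produces Abeta"},
    {"relation": "increases", "citation": "PMID:87654321", "evidence": "Abeta triggers neuroinflammation"},
    {"relation": "increases", "citation": "PMID:11111111", "evidence": "GSK3B phosphorylates tau"},
    {"relation": "hasVariant"},  # structural edge with no literature support
]

evidence_by_citation = defaultdict(list)
uncited = 0
for data in edge_data:
    citation = data.get("citation")
    if citation is None:
        uncited += 1
        continue
    evidence_by_citation[citation].append(data.get("evidence", ""))

print(f"Unique citations: {len(evidence_by_citation)}")
print(f"Edges without a citation: {uncited}")
for citation, evidence in sorted(evidence_by_citation.items()):
    print(citation, "->", evidence)
```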
print("\n6. Subgraph Analysis")
print("-" * 22)
inflammation_nodes = [inflammation]
inflammation_neighbors = list(graph.predecessors(inflammation)) + list(graph.successors(inflammation))
inflammation_subgraph = graph.subgraph(inflammation_nodes + inflammation_neighbors)
print(f"Inflammation subgraph: {inflammation_subgraph.number_of_nodes()} nodes, {inflammation_subgraph.number_of_edges()} edges")
We isolate the inflammation subgraph by collecting its direct neighbors, giving a focused view of inflammatory crosstalk. This targeted subnetwork highlights how inflammation interfaces with other disease processes.
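Taking predecessors plus successors amounts to a radius-1 neighborhood on the undirected version of the graph. The stdlib sketch below makes that explicit with a breadth-first search over a toy edge list (our simplification of the tutorial graph), and the hypothetical `radius` parameter shows how the view could be widened to two hops. NetworkX's `ego_graph(G, n, radius=..., undirected=True)` offers the same idea directly.

```python
from collections import defaultdict, deque

edges = [
    ("APP", "Abeta"),
    ("Abeta", "inflammation"),
    ("Abeta", "MAPT"),
    ("GSK3B", "p-MAPT"),
    ("p-MAPT", "apoptosis"),
    ("inflammation", "apoptosis"),
]


def neighborhood(edges, center, radius=1):
    """Nodes within `radius` hops of `center`, ignoring edge direction."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)  # successor
        adjacency[v].add(u)  # predecessor
    seen = {center}
    queue = deque([(center, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return seen


nodes = neighborhood(edges, "inflammation")
# Induced subgraph: keep only edges whose endpoints both lie in the neighborhood
sub_edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
print(sorted(nodes), len(sub_edges))
print(len(neighborhood(edges, "inflammation", radius=2)))
```

At radius 1 the subgraph holds Abeta, inflammation, and apoptosis with two edges; at radius 2 it grows to six nodes, pulling in the tau branch.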
print("\n7. Advanced Graph Querying")
print("-" * 28)
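# all_simple_paths yields nothing (rather than raising) when no path exists
paths = list(nx.all_simple_paths(graph, app, apoptosis, cutoff=3))
print(f"Paths from APP to apoptosis: {len(paths)}")
if paths:
    # the first path found is not necessarily the shortest, so take the minimum
    print(f"Shortest path length: {min(len(path) for path in paths) - 1}")
else:
    print("No paths found between APP and apoptosis")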
apoptosis_inducers = list(graph.predecessors(apoptosis))
print(f"Factors that increase apoptosis: {len(apoptosis_inducers)}")
We enumerate the simple paths from APP to apoptosis (up to three edges long) to explore mechanistic routes and identify key mediators. Listing all predecessors of apoptosis also shows which factors can promote cell death.
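The `cutoff` argument matters: a route longer than the cutoff is silently dropped. The stdlib sketch below enumerates simple paths by depth-first search on a toy adjacency list and compares two cutoffs. It assumes a hypothetical `MAPT -> p-MAPT` edge standing in for the link between a protein and its phosphorylated form; whether and how that edge appears in the real graph depends on PyBEL.

```python
# Toy directed adjacency (includes a hypothetical MAPT -> p-MAPT edge)
adjacency = {
    "APP": ["Abeta"],
    "Abeta": ["inflammation", "MAPT"],
    "MAPT": ["p-MAPT"],
    "GSK3B": ["p-MAPT"],
    "p-MAPT": ["apoptosis"],
    "inflammation": ["apoptosis"],
}


def simple_paths(adjacency, source, target, cutoff):
    """All paths from source to target with at most `cutoff` edges and no repeated nodes."""
    paths = []

    def extend(path):
        node = path[-1]
        if node == target:
            paths.append(list(path))
            return
        if len(path) - 1 == cutoff:
            return  # path already uses the maximum number of edges
        for nxt in adjacency.get(node, []):
            if nxt not in path:  # "simple" means never revisiting a node
                path.append(nxt)
                extend(path)
                path.pop()

    extend([source])
    return paths


paths3 = simple_paths(adjacency, "APP", "apoptosis", cutoff=3)
paths4 = simple_paths(adjacency, "APP", "apoptosis", cutoff=4)
shortest = min(len(path) - 1 for path in paths4)
inducers = sorted(node for node, successors in adjacency.items() if "apoptosis" in successors)
print(len(paths3), len(paths4), shortest, inducers)
```

With `cutoff=3` only the inflammation route survives; raising it to 4 also recovers the route through phosphorylated tau, while the shortest path stays three edges long.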
print("\n8. Data Export and Visualization")
print("-" * 35)
adj_matrix = nx.adjacency_matrix(graph)
node_labels = [str(node) for node in graph.nodes()]
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
pos = nx.spring_layout(graph, k=2, iterations=50)
nx.draw(graph, pos, with_labels=False, node_color="lightblue",
        node_size=1000, font_size=8, font_weight="bold")
plt.title("BEL Network Graph")
plt.subplot(2, 2, 2)
centralities = list(degree_centrality.values())
plt.hist(centralities, bins=10, alpha=0.7, color="green")
plt.title("Degree Centrality Distribution")
plt.xlabel("Centrality")
plt.ylabel("Frequency")
plt.subplot(2, 2, 3)
functions = list(node_types.keys())
counts = list(node_types.values())
plt.pie(counts, labels=functions, autopct="%1.1f%%", startangle=90)
plt.title("Node Type Distribution")
plt.subplot(2, 2, 4)
relations = list(edge_types.keys())
rel_counts = list(edge_types.values())
plt.bar(relations, rel_counts, color="orange", alpha=0.7)
plt.title("Relationship Types")
plt.xlabel("Relation")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
We prepare the adjacency matrix and node labels for downstream use and generate a multi-panel figure showing the network structure, the centrality distribution, node-type proportions, and edge-type counts. This visualization brings our BEL graph to life, supporting a deeper biological interpretation.
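An adjacency matrix is only useful downstream if its rows and columns stay tied to node labels. The stdlib sketch below builds a dense, labeled matrix from a toy edge list (our simplification of the tutorial graph) and writes it as CSV text; with pandas installed, `nx.to_pandas_adjacency(graph)` yields a comparable labeled table directly.

```python
import csv
import io

edges = [("APP", "Abeta"), ("Abeta", "inflammation"), ("Abeta", "MAPT"),
         ("GSK3B", "p-MAPT"), ("p-MAPT", "apoptosis"), ("inflammation", "apoptosis")]
nodes = sorted({node for edge in edges for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# Dense adjacency matrix: matrix[i][j] counts edges from nodes[i] to nodes[j]
matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    matrix[index[u]][index[v]] += 1

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([""] + nodes)      # header row with column labels
for node, row in zip(nodes, matrix):
    writer.writerow([node] + row)  # each row labeled with its source node
print(buffer.getvalue())
```

Each row sum equals that node's out-degree, which is a quick sanity check before handing the table to another tool.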
In this tutorial, we have shown the power and flexibility of PyBEL for modeling complex biological systems. We showed how easily one can construct a curated BEL graph of Alzheimer's disease interactions, perform network-level analyses to identify key hub nodes, and extract biologically meaningful subgraphs for focused study. We also covered essential practices for mining literature evidence and preparing data structures for compelling visualizations. As a next step, we encourage you to extend this framework to your own pathways, integrate additional omics data, run enrichment tests, or couple the graph with machine-learning workflows.
Sana Hassan, a consulting intern at MarkTechPost and IIT Madras, is enthusiastic about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, Sana brings a fresh perspective to the intersection of AI and real-life solutions.
