This notebook is an introduction to the soil agent-based social network simulation framework. In particular, we will focus on a specific use case: studying the propagation of news in a social network.
The steps we will follow are:
But before that, let's import the soil module and networkx.
import soil
import networkx as nx
%load_ext autoreload
%autoreload 2
%pylab inline
# To display plots in the notebook
There are three main elements in a soil simulation:
Soil is based on simpy, which is a process-based discrete-event simulation library.
Soil provides several abstractions over events to make developing agents easier.
This means you can use events (timeouts, delays) in soil, but for the most part we will assume your models will be step-based.
Our first step will be to model how every person in the social network reacts when it comes to news. We will follow a very simple model (a finite state machine).
There are two types of people: those who have heard about a newsworthy event (infected) and those who have not (neutral). A neutral person may hear about the news either on TV (with probability prob_tv_spread) or through their friends. Once a person has heard the news, they will spread it to their friends (with probability prob_neighbor_spread). Some users do not have a TV, so they rely only on their friends.
The spreading probabilities will change over time due to different factors. We will represent this variance using an environment agent.
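To make the model concrete before turning to soil, here is a minimal, soil-free sketch of the same two-state process in plain Python. The function name simulate_news, the dictionary-based graph and the fixed probabilities are assumptions made for illustration only. Unlike the soil agents defined later, this toy version updates all nodes synchronously and keeps the probabilities constant.

```python
import random

def simulate_news(neighbors, has_tv, prob_tv, prob_neighbor, steps, seed=0):
    """Toy version of the model: return the set of infected nodes after `steps` steps."""
    rng = random.Random(seed)
    infected = set()
    for _ in range(steps):
        newly_infected = set()
        for node in neighbors:
            if node in infected:
                # Infected people try to spread the news to each neutral friend
                for friend in neighbors[node]:
                    if friend not in infected and rng.random() < prob_neighbor:
                        newly_infected.add(friend)
            elif has_tv.get(node, False) and rng.random() < prob_tv:
                # Neutral people with a TV may hear the news on their own
                newly_infected.add(node)
        infected |= newly_infected
    return infected

# Hypothetical 5-node network: 0-1, 0-2, 2-3, and an isolated node 4
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2], 4: []}
has_tv = {0: True}  # only node 0 owns a TV

# With both probabilities at 1, the news reaches every node connected to node 0
result = simulate_news(neighbors, has_tv, prob_tv=1, prob_neighbor=1, steps=5)
print(sorted(result))
```

With both probabilities set to 0 nobody ever hears the news, and the isolated node 4 stays neutral in every case because it has neither a TV nor friends.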
A basic network agent in Soil should inherit from soil.agents.BaseAgent, and define its behaviour in every step of the simulation by implementing a step(self) method.
The most important attributes of the agent are:
agent.state, a dictionary with the state of the agent. agent.state['id'] reflects the state id of the agent. That state id can be used to look for other agents in that specific state. The state can be accessed via the agent as well. For instance:
a = soil.agents.BaseAgent(env=env)
a['hours_of_sleep'] = 10
print(a['hours_of_sleep'])
The state of the agent is stored in every step of the simulation:
print(a['hours_of_sleep', 10]) # hours of sleep before step #10
print(a[None, 0]) # whole state of the agent before step #0
agent.env, a reference to the environment. Most commonly used to get access to the environment parameters and the topology:
a.env.G.nodes() # Get all nodes ids in the topology
a.env['minimum_hours_of_sleep']
Since our model is a finite state machine, we will be basing it on soil.agents.FSM.
With soil.agents.FSM, we do not need to specify a step method.
Instead, we describe every state as a function.
To change to another state, a function may return the new state.
If no state is returned, the state remains unchanged.
Our agent will consist of two states, neutral (default) and infected.
Here's the code:
import random

class NewsSpread(soil.agents.FSM):
    @soil.agents.default_state
    @soil.agents.state
    def neutral(self):
        r = random.random()
        if self['has_tv'] and r < self.env['prob_tv_spread']:
            return self.infected
        return

    @soil.agents.state
    def infected(self):
        prob_infect = self.env['prob_neighbor_spread']
        for neighbor in self.get_neighboring_agents(state_id=self.neutral.id):
            r = random.random()
            if r < prob_infect:
                neighbor.state['id'] = self.infected.id
        return
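The idea of describing each state as a function that may return the next state can be illustrated without soil. The sketch below is a hypothetical stand-in (the TinyFSM and Listener classes are invented for this example and are not soil's implementation); it only shows the dispatch pattern: run the current state, and switch if it returns another state.

```python
class TinyFSM:
    """Hypothetical stand-in for an FSM agent: each state is a method."""

    def __init__(self):
        self.current = self.neutral  # default state

    def step(self):
        # Run the current state; switch only if it returns a new state
        next_state = self.current()
        if next_state is not None:
            self.current = next_state


class Listener(TinyFSM):
    def __init__(self, heard_after):
        super().__init__()
        self.heard_after = heard_after
        self.ticks = 0

    def neutral(self):
        self.ticks += 1
        if self.ticks >= self.heard_after:
            return self.infected  # returning a state triggers the transition
        # returning nothing keeps the agent in `neutral`

    def infected(self):
        return None  # no transition: the agent stays infected


agent = Listener(heard_after=3)
history = []
for _ in range(4):
    agent.step()
    history.append(agent.current.__name__)
print(history)
```

The agent remains neutral for two steps, switches on the third, and stays infected afterwards, which mirrors how returning (or not returning) a state drives the transitions in NewsSpread.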
Environment agents allow us to control the state of the environment. In this case, we will use an environment agent to simulate a very viral event.
When the event happens, the agent will modify the probability of spreading the rumor.
NEIGHBOR_FACTOR = 0.9
TV_FACTOR = 0.5

class NewsEnvironmentAgent(soil.agents.BaseAgent):
    def step(self):
        if self.now == self['event_time']:
            self.env['prob_tv_spread'] = 1
            self.env['prob_neighbor_spread'] = 1
        elif self.now > self['event_time']:
            self.env['prob_tv_spread'] = self.env['prob_tv_spread'] * TV_FACTOR
            self.env['prob_neighbor_spread'] = self.env['prob_neighbor_spread'] * NEIGHBOR_FACTOR
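Because the probability is reset to 1 at the event and multiplied by a constant factor on every later step, its value from the event onward has a simple closed form: factor ** (now - event_time). The sketch below assumes the environment agent runs exactly once per step; the helper name prob_after is hypothetical and not part of soil.

```python
def prob_after(factor, event_time, now):
    """Probability at step `now`, assuming it is reset to 1 at `event_time`
    and multiplied by `factor` once on every later step."""
    if now < event_time:
        raise ValueError("the event has not happened yet")
    return factor ** (now - event_time)

NEIGHBOR_FACTOR = 0.9
TV_FACTOR = 0.5

# Compare how quickly each probability fades after an event at step 10
for t in (10, 11, 15, 20):
    print(t, prob_after(NEIGHBOR_FACTOR, 10, t), prob_after(TV_FACTOR, 10, t))
```

With these factors, the TV probability halves on every step while the neighbor probability loses only 10% per step, so word of mouth stays relevant for much longer than the broadcast.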
Feel free to skip this section if this is your first time with soil.
Testing agents is not easy, and this is not a thorough testing process for agents. Rather, this section aims to show you how to access internal parts of soil so you can test your agents.
First of all, let's check if our network agent has the states we would expect:
NewsSpread.states
Now, let's run a simulation on a simple network. It is composed of five nodes, one of which is isolated:
G = nx.Graph()
G.add_edge(0, 1)
G.add_edge(0, 2)
G.add_edge(2, 3)
G.add_node(4)
pos = nx.spring_layout(G)
nx.draw_networkx(G, pos, node_color='red')
nx.draw_networkx(G, pos, nodelist=[0], node_color='blue')
Let's run a simple simulation that assigns a NewsSpread agent to all the nodes in that network. Notice how node 0 is the only one with a TV.
env_params = {'prob_tv_spread': 0,
              'prob_neighbor_spread': 0}

MAX_TIME = 100
EVENT_TIME = 10

sim = soil.simulation.SoilSimulation(
    topology=G,
    num_trials=1,
    max_time=MAX_TIME,
    environment_agents=[{'agent_type': NewsEnvironmentAgent,
                         'state': {
                             'event_time': EVENT_TIME
                         }}],
    network_agents=[{'agent_type': NewsSpread,
                     'weight': 1}],
    states={0: {'has_tv': True}},
    default_state={'has_tv': False},
    environment_params=env_params)
env = sim.run_simulation()[0]
Now we can access the results of the simulation and compare them to our expected results:
agents = list(env.network_agents)

# Until the event, all agents are neutral
for t in range(10):
    for a in agents:
        assert a['id', t] == a.neutral.id

# After the event, the node with a TV is infected, the rest are not
assert agents[0]['id', 11] == NewsSpread.infected.id

for a in agents[1:4]:
    assert a['id', 11] == NewsSpread.neutral.id

# At the end, the agents connected to the infected one will probably be infected, too.
assert agents[1]['id', MAX_TIME] == NewsSpread.infected.id
assert agents[2]['id', MAX_TIME] == NewsSpread.infected.id

# But the node with no friends should not be affected
assert agents[4]['id', MAX_TIME] == NewsSpread.neutral.id
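The last assertion holds regardless of the random draws, because node 4 has no TV and no path to node 0. A minimal stdlib sketch of that reasoning, using a breadth-first search over the same edges as the test network (the helper reachable_from is hypothetical and not part of soil or networkx):

```python
from collections import deque

def reachable_from(edges, start):
    """Return the set of nodes connected to `start` through undirected edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for friend in adjacency.get(node, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen

edges = [(0, 1), (0, 2), (2, 3)]  # same edges as the test network; node 4 has none
can_hear = reachable_from(edges, start=0)
print(sorted(can_hear), 4 in can_hear)
```

Only nodes reachable from the single TV owner can ever become infected, which is why node 3 may be infected eventually while node 4 never is.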
Lastly, let's see if the probabilities have decreased as expected:
assert abs(env.environment_params['prob_neighbor_spread'] - (NEIGHBOR_FACTOR**(MAX_TIME-1-10))) < 10e-4
assert abs(env.environment_params['prob_tv_spread'] - (TV_FACTOR**(MAX_TIME-1-10))) < 10e-6
To run a simulation, we need a configuration. Soil can load configurations from Python dictionaries as well as JSON and YAML files. For this demo, we will use a Python dictionary:
config = {
    'name': 'ExampleSimulation',
    'max_time': 20,
    'interval': 1,
    'num_trials': 1,
    'network_params': {
        'generator': 'complete_graph',
        'n': 500,
    },
    'network_agents': [
        {
            'agent_type': NewsSpread,
            'weight': 1,
            'state': {
                'has_tv': False
            }
        },
        {
            'agent_type': NewsSpread,
            'weight': 2,
            'state': {
                'has_tv': True
            }
        }
    ],
    'environment_agents': [
        {'agent_type': NewsEnvironmentAgent,
         'state': {
             'event_time': 10
         }
        }
    ],
    'states': [ {'has_tv': True} ],
    'environment_params': {
        'prob_tv_spread': 0.01,
        'prob_neighbor_spread': 0.5
    }
}
Let's run our simulation:
soil.simulation.run_from_config(config, dump=False)
In real life, you probably want to run several simulations, varying some of the parameters so that you can compare and answer your research questions.
For instance:
network_1 = {
    'generator': 'erdos_renyi_graph',
    'n': 500,
    'p': 0.1
}
network_2 = {
    'generator': 'barabasi_albert_graph',
    'n': 500,
    'm': 2
}

for net in [network_1, network_2]:
    for i in range(5):
        prob = i / 10
        config['environment_params']['prob_neighbor_spread'] = prob
        config['network_params'] = net
        config['name'] = 'Spread_{}_prob_{}'.format(net['generator'], prob)
        s = soil.simulation.run_from_config(config)
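The loop above mutates the shared config dictionary on every iteration. If you would rather keep the base configuration untouched and inspect every variant before running it, one option is to build independent copies first. This is a sketch using only the standard library; the helper build_configs is hypothetical, and each resulting dictionary would still need to be passed to soil.simulation.run_from_config as above.

```python
import copy
import itertools

def build_configs(base_config, networks, probabilities):
    """Return one independent config per (network, probability) pair."""
    configs = []
    for net, prob in itertools.product(networks, probabilities):
        cfg = copy.deepcopy(base_config)  # leave the base config untouched
        cfg['network_params'] = dict(net)
        cfg['environment_params']['prob_neighbor_spread'] = prob
        cfg['name'] = 'Spread_{}_prob_{}'.format(net['generator'], prob)
        configs.append(cfg)
    return configs

# Trimmed-down stand-ins for the configuration and networks used in this notebook
base = {'name': 'ExampleSimulation',
        'network_params': {'generator': 'complete_graph', 'n': 500},
        'environment_params': {'prob_tv_spread': 0.01, 'prob_neighbor_spread': 0.5}}
nets = [{'generator': 'erdos_renyi_graph', 'n': 500, 'p': 0.1},
        {'generator': 'barabasi_albert_graph', 'n': 500, 'm': 2}]

configs = build_configs(base, nets, [i / 10 for i in range(5)])
print(len(configs), configs[0]['name'])
```

Building the list up front also makes it easy to check the generated names, since they determine the output directories that the analysis steps load later.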
The results are conveniently stored in pickle (simulation), csv and sqlite (history of agent and environment state) and gexf (dynamic network) format.
!tree soil_output
!du -xh soil_output/*
Once the simulations are over, we can use soil to analyse the results.
Soil allows you to load results for specific trials, or for a set of trials if you specify a pattern. The specific methods are:
analysis.read_data(<directory pattern>) to load all the results from a directory, e.g. read_data('my_simulation/'). For each trial it finds in each folder matching the pattern, it will return the dumped configuration for the simulation, the results of the trial, and the configuration itself. By default, it will try to load data from the sqlite database.
analysis.read_csv(<csv_file>) to load all the results from a CSV file, e.g. read_csv('my_simulation/my_simulation_trial0.environment.csv').
analysis.read_sql(<sqlite_file>) to load all the results from a sqlite database, e.g. read_sql('my_simulation/my_simulation_trial0.db.sqlite').
Let's see it in action by loading the stored results into a pandas dataframe:
from soil.analysis import *
df = read_csv('soil_output/Spread_barabasi_albert_graph_prob_0.0/Spread_barabasi_albert_graph_prob_0.0_trial_0.environment.csv', keys=['id'])
df
Soil can also process the data for us and return a dataframe with as many columns as there are attributes in the environment and the agent states:
env, agents = process(df)
agents
The index of the results is made up of the simulation step and the agent_id. Hence, we can access the state of the simulation at a given step:
agents.loc[0]
Or, we can perform more complex tasks such as showing the agents that have changed their state between two simulation steps:
changed = agents.loc[1]['id'] != agents.loc[0]['id']
agents.loc[0][changed]
To focus on specific agents, we can swap the levels of the index:
agents1 = agents.swaplevel()
agents1.loc['0'].dropna(axis=1)
If you don't want to work with pandas, you can also use some pre-defined functions from soil to conveniently plot the results:
plot_all('soil_output/Spread_barabasi_albert_graph_prob_0.0/', get_count, 'id');
plot_all('soil_output/Spread_barabasi*', get_count, 'id');
plot_all('soil_output/Spread_erdos*', get_value, 'prob_tv_spread');
Although the simplest way to visualize the results of a simulation is to use the built-in methods in the analysis module, sometimes the setup is more complicated and we need to explore the data a little further.
For that, we can use native pandas over the results.
Soil provides some convenience methods to simplify common operations:
analysis.split_df to separate a history dataframe into environment and agent parameters.
analysis.get_count to get a dataframe with the value counts for different attributes during the simulation.
analysis.get_value to get the evolution of the value of an attribute during the simulation.
And, as we saw earlier, analysis.process can turn a dataframe in canonical form into a dataframe with a column per attribute.
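To give an intuition for what a per-step value count computes, here is a stdlib-only sketch over a tiny hand-written history in long form, one row per (agent_id, t_step, key, value). The function count_values and the sample rows are hypothetical; this is not how soil implements get_count, only an illustration of the idea.

```python
from collections import Counter, defaultdict

def count_values(rows, key):
    """Count, for every step, how many agents hold each value of `key`.

    `rows` is an iterable of (agent_id, t_step, key, value) tuples.
    """
    counts = defaultdict(Counter)
    for agent_id, t_step, row_key, value in rows:
        if row_key == key:
            counts[t_step][value] += 1
    return {step: dict(counter) for step, counter in sorted(counts.items())}

# Hypothetical history of three agents over two steps
rows = [
    (0, 0, 'id', 'neutral'), (1, 0, 'id', 'neutral'), (2, 0, 'id', 'neutral'),
    (0, 1, 'id', 'infected'), (1, 1, 'id', 'neutral'), (2, 1, 'id', 'neutral'),
    (0, 0, 'has_tv', True),  # rows for other keys are ignored
]
state_counts = count_values(rows, 'id')
print(state_counts)
```

Each step maps to the number of agents in each state, which is the kind of table that can be plotted as one line per state over time.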
p = read_sql('soil_output/Spread_barabasi_albert_graph_prob_0.0/Spread_barabasi_albert_graph_prob_0.0_trial_0.db.sqlite')
env, agents = split_df(p);
Let's look at the evolution of agent parameters in the simulation:
res = agents.groupby(by=['t_step', 'key', 'value']).size().unstack(level=[1,2]).fillna(0)
res.plot();
As we can see, event_time is cluttering our results, so let's remove it:
del res['event_time']
res.plot()
processed = process_one(agents);
processed
Which is equivalent to:
get_count(agents, 'id', 'has_tv').plot()
get_value(agents, 'event_time').plot()
import pandas as pd  # needed for pd.to_numeric below
from soil import analysis
!du -xsh ../rabbits/soil_output/rabbits_example/
If we tried to load the entire history, we would probably run out of memory. Hence, it is recommended that you also specify the attributes you are interested in.
p = analysis.plot_all('../rabbits/soil_output/rabbits_example/', analysis.get_count, 'id')
df = analysis.read_sql('../rabbits/soil_output/rabbits_example/rabbits_example_trial_0.db.sqlite', keys=['id', 'rabbits_alive'])
states = analysis.get_count(df, 'id')
states.plot()
alive = analysis.get_value(df, 'rabbits_alive', 'rabbits_alive', aggfunc='sum').apply(pd.to_numeric)
alive.plot()
h = alive.join(states);
h.plot();
states[[('id','newborn'),('id','fertile'),('id', 'pregnant')]].sum(axis=1).sub(alive['rabbits_alive'], fill_value=0)