Large system

We now consider a larger system of 40 variables. At this size, exact Maximum Likelihood Estimation (MLE) is computationally intractable, so we compare the performance of the Erasure Machine (EM) only with the Hopfield approximation (HF) and Pseudo-Likelihood Estimation (PLE).
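To see why exact MLE is out of reach, note that the likelihood of a pairwise model requires the partition function, a sum over all 2^n_var spin configurations. A back-of-the-envelope check (the throughput figure below is only an assumed order of magnitude, not a benchmark):

```python
# Exact likelihood needs the partition function Z: a sum over every
# configuration of n_var binary variables.
n_var = 40
n_states = 2 ** n_var
print(f"{n_states:.2e} configurations")  # ~1.10e+12

# Assuming ~1e8 configuration evaluations per second, a single exact
# likelihood evaluation would already take hours -- and MLE needs many.
hours = n_states / 1e8 / 3600
print(f"~{hours:.1f} hours per likelihood evaluation")
```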

As usual, we import the necessary packages into the Jupyter notebook:

[1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import emachine as EM
[2]:
np.random.seed(0)

Weak interactions and large sample size

[3]:
n_var = 40
g = 0.5
n_seq = 8000

The synthetic data are generated using generate_seq.

[4]:
w_true,seqs = EM.generate_seq(n_var,n_seq,g=g)

We reconstruct the interactions from the sequences seqs (converted to the operator matrix ops).

[5]:
# Erasure Machine (EM)
ops = EM.operators(seqs)
eps_list = np.linspace(0.5,0.8,16)
E_eps = np.zeros(len(eps_list))
w_eps = np.zeros((len(eps_list),ops.shape[1]))
for i,eps in enumerate(eps_list):
    w_eps[i,:],E_eps[i] = EM.fit(ops,eps=eps,max_iter=100)
    #print(eps,E_eps[i])
ieps = np.argmax(E_eps)
print('The optimal value of eps:',eps_list[ieps])

w_em = w_eps[ieps]
The optimal value of eps: 0.64
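For intuition, EM.operators presumably maps each ±1 configuration to the terms of a pairwise model: the single-spin values s_i and the pairwise products s_i s_j. A minimal sketch of that construction (our own illustration, not the package's actual implementation):

```python
import numpy as np
from itertools import combinations

def operators_sketch(seqs):
    """Build [s_i] and [s_i * s_j] columns for +/-1 data of shape (n_seq, n_var)."""
    n_seq, n_var = seqs.shape
    pair_cols = [seqs[:, i] * seqs[:, j] for i, j in combinations(range(n_var), 2)]
    return np.hstack([seqs, np.column_stack(pair_cols)])

toy = np.array([[ 1, -1,  1],
                [-1, -1,  1]])
ops_toy = operators_sketch(toy)
print(ops_toy.shape)  # (2, 6): 3 single-spin columns + C(3,2) = 3 pairwise columns
```

For n_var = 40 this gives 40 + 780 = 820 operator columns, one weight per column.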
[6]:
# Hopfield approximation (HF)
w_hf = EM.hopfield_method(seqs)

# Pseudo Likelihood Estimation (PLE)
w_ple = EM.PLE_method(seqs)

We compare the performance of these methods by plotting the inferred interactions against the actual ones.

[7]:
nx,ny = 1,1
fig, ax = plt.subplots(ny,nx,figsize=(nx*3.5,ny*2.8))

ax.plot([-1,1],[-1,1])
ax.plot(w_true,w_hf,'g^',mfc='none',markersize=4,label='HF')
ax.plot(w_true,w_ple,'bs',mfc='none',markersize=4,label='PLE')
ax.plot(w_true,w_em,'ro',markersize=4,label='EM')
ax.set_xlim([-0.6,0.6])
ax.set_ylim([-0.6,0.6])
ax.set_xticks([-0.6,0,0.6])
ax.set_yticks([-0.6,0,0.6])
ax.set_xlabel('Actual interactions')
ax.set_ylabel('Inferred interactions')

ax.legend()
[7]:
<matplotlib.legend.Legend at 0x7fd16f694a58>
[Figure: inferred vs. actual interactions for HF, PLE and EM (../_images/codesource_largesystem_12_1.png)]
[8]:
MSE_em = ((w_true - w_em)**2).mean()
MSE_hf = ((w_true - w_hf)**2).mean()
MSE_ple = ((w_true - w_ple)**2).mean()

df = pd.DataFrame([['EM',MSE_em],['HF',MSE_hf],['PLE',MSE_ple]],
                  columns = ['Method','Mean squared error'])
df
[8]:
  Method  Mean squared error
0     EM            0.000515
1     HF            0.002071
2    PLE            0.000494
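The table reports absolute MSE, which grows with the coupling strength g. To compare the four settings of this page on an equal footing, one option (our own convention, not part of the emachine package) is to normalize by the variance of the true couplings:

```python
import numpy as np

def relative_mse(w_true, w_est):
    """MSE normalized by the variance of the true couplings (scale-free)."""
    return ((w_true - w_est) ** 2).mean() / w_true.var()

# Toy check: perfect recovery gives 0, noisy recovery gives a small positive value.
rng = np.random.default_rng(0)
w = rng.normal(0, 0.5, 100)
print(relative_mse(w, w))                          # exactly 0.0
print(relative_mse(w, w + rng.normal(0, 0.05, 100)))
```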

Weak interactions and small sample size

[9]:
n_var = 40
g = 0.5
n_seq = 2000
[10]:
w_true,seqs = EM.generate_seq(n_var,n_seq,g=g)
[11]:
ops = EM.operators(seqs)
eps_list = np.linspace(0.5,0.8,16)
E_eps = np.zeros(len(eps_list))
w_eps = np.zeros((len(eps_list),ops.shape[1]))
for i,eps in enumerate(eps_list):
    w_eps[i,:],E_eps[i] = EM.fit(ops,eps=eps,max_iter=100)
    #print(eps,E_eps[i])
ieps = np.argmax(E_eps)
print('The optimal value of eps:',eps_list[ieps])

w_em = w_eps[ieps]

w_hf = EM.hopfield_method(seqs)
w_ple = EM.PLE_method(seqs)
The optimal value of eps: 0.72
[12]:
nx,ny = 1,1
fig, ax = plt.subplots(ny,nx,figsize=(nx*3.5,ny*2.8))

ax.plot([-1,1],[-1,1])
ax.plot(w_true,w_hf,'g^',mfc='none',markersize=4,label='HF')
ax.plot(w_true,w_ple,'bs',mfc='none',markersize=4,label='PLE')
ax.plot(w_true,w_em,'ro',markersize=4,label='EM')
ax.set_xlim([-0.6,0.6])
ax.set_ylim([-0.6,0.6])
ax.set_xticks([-0.6,0,0.6])
ax.set_yticks([-0.6,0,0.6])
ax.set_xlabel('Actual interactions')
ax.set_ylabel('Inferred interactions')

ax.legend()
[12]:
<matplotlib.legend.Legend at 0x7fd170168b70>
[Figure: inferred vs. actual interactions for HF, PLE and EM (../_images/codesource_largesystem_18_1.png)]
[13]:
MSE_em = ((w_true - w_em)**2).mean()
MSE_hf = ((w_true - w_hf)**2).mean()
MSE_ple = ((w_true - w_ple)**2).mean()

df = pd.DataFrame([['EM',MSE_em],['HF',MSE_hf],['PLE',MSE_ple]],
                  columns = ['Method','Mean squared error'])
df
[13]:
  Method  Mean squared error
0     EM            0.001388
1     HF            0.003787
2    PLE            0.002020

Strong interactions and large sample size

[14]:
n_var = 40
g = 1.0
n_seq = 8000
[15]:
w_true,seqs = EM.generate_seq(n_var,n_seq,g=g)
[16]:
ops = EM.operators(seqs)
eps_list = np.linspace(0.5,0.8,16)
E_eps = np.zeros(len(eps_list))
w_eps = np.zeros((len(eps_list),ops.shape[1]))
for i,eps in enumerate(eps_list):
    w_eps[i,:],E_eps[i] = EM.fit(ops,eps=eps,max_iter=100)
    #print(eps,E_eps[i])
ieps = np.argmax(E_eps)
print('The optimal value of eps:',eps_list[ieps])

w_em = w_eps[ieps]

w_hf = EM.hopfield_method(seqs)
w_ple = EM.PLE_method(seqs)
The optimal value of eps: 0.64
[17]:
nx,ny = 1,1
fig, ax = plt.subplots(ny,nx,figsize=(nx*3.5,ny*2.8))

ax.plot([-1,1],[-1,1])
ax.plot(w_true,w_hf,'g^',mfc='none',markersize=4,label='HF')
ax.plot(w_true,w_ple,'bs',mfc='none',markersize=4,label='PLE')
ax.plot(w_true,w_em,'ro',markersize=4,label='EM')
ax.set_xlim([-0.6,0.6])
ax.set_ylim([-0.6,0.6])
ax.set_xticks([-0.6,0,0.6])
ax.set_yticks([-0.6,0,0.6])
ax.set_xlabel('Actual interactions')
ax.set_ylabel('Inferred interactions')

ax.legend()
[17]:
<matplotlib.legend.Legend at 0x7fd16fbbef60>
[Figure: inferred vs. actual interactions for HF, PLE and EM (../_images/codesource_largesystem_24_1.png)]
[18]:
MSE_em = ((w_true - w_em)**2).mean()
MSE_hf = ((w_true - w_hf)**2).mean()
MSE_ple = ((w_true - w_ple)**2).mean()

df = pd.DataFrame([['EM',MSE_em],['HF',MSE_hf],['PLE',MSE_ple]],
                  columns = ['Method','Mean squared error'])
df
[18]:
  Method  Mean squared error
0     EM            0.005574
1     HF            0.077182
2    PLE            0.029280

Strong interactions and small sample size

[19]:
n_var = 40
g = 1.0
n_seq = 2000
[20]:
w_true,seqs = EM.generate_seq(n_var,n_seq,g=g)
[21]:
ops = EM.operators(seqs)
eps_list = np.linspace(0.5,0.8,16)
E_eps = np.zeros(len(eps_list))
w_eps = np.zeros((len(eps_list),ops.shape[1]))
for i,eps in enumerate(eps_list):
    w_eps[i,:],E_eps[i] = EM.fit(ops,eps=eps,max_iter=100)
    #print(eps,E_eps[i])
ieps = np.argmax(E_eps)
print('The optimal value of eps:',eps_list[ieps])
w_em = w_eps[ieps]

print('Hopfield')
w_hf = EM.hopfield_method(seqs)

print('PLE')
w_ple = EM.PLE_method(seqs)
The optimal value of eps: 0.68
Hopfield
PLE
[22]:
nx,ny = 1,1
fig, ax = plt.subplots(ny,nx,figsize=(nx*3.5,ny*2.8))

ax.plot([-1,1],[-1,1])
ax.plot(w_true,w_hf,'g^',mfc='none',markersize=4,label='HF')
ax.plot(w_true,w_ple,'bs',mfc='none',markersize=4,label='PLE')
ax.plot(w_true,w_em,'ro',markersize=4,label='EM')
ax.set_xlim([-0.6,0.6])
ax.set_ylim([-0.6,0.6])
ax.set_xticks([-0.6,0,0.6])
ax.set_yticks([-0.6,0,0.6])
ax.set_xlabel('Actual interactions')
ax.set_ylabel('Inferred interactions')

ax.legend()
[22]:
<matplotlib.legend.Legend at 0x7fd16fd23198>
[Figure: inferred vs. actual interactions for HF, PLE and EM (../_images/codesource_largesystem_30_1.png)]
[23]:
MSE_em = ((w_true - w_em)**2).mean()
MSE_hf = ((w_true - w_hf)**2).mean()
MSE_ple = ((w_true - w_ple)**2).mean()

df = pd.DataFrame([['EM',MSE_em],['HF',MSE_hf],['PLE',MSE_ple]],
                  columns = ['Method','Mean squared error'])
df
[23]:
  Method  Mean squared error
0     EM            0.009827
1     HF            0.079888
2    PLE            0.061577