Figure 1. Q-learning, a long-term learning adoption rule, improves and stabilizes the cooperation of agents forming various small-world networks in Prisoner's Dilemma games.

The modified Watts-Strogatz small-world network was built on a 15×15 lattice in which each node was connected to its eight nearest neighbors. The rewiring probabilities of the links originally placed on the regular lattice were 0.01 (left panels) and 0.04 (right panels). For a description of the canonical repeated Prisoner's Dilemma game, as well as of the best-takes-over (top panels) and Q-learning (bottom panels) strategy adoption rules, see Methods and the ESM1. The temptation level, T, was 3.6. The networks, showing the last round of 5,000 plays, were visualized using the Kamada-Kawai algorithm of the Pajek program [46]. Dark blue dots and diamonds correspond to cooperators and defectors, respectively. The figure shows that both the extent and the distribution of cooperators vary when the best-takes-over strategy adoption rule is used (top panels), whereas they remain rather stable with the Q-learning strategy update rule (bottom panels).
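The setup of the top panels can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions: the 15×15 lattice is taken to have periodic boundaries (a Moore neighborhood of eight nodes), each original link is rewired with probability p by keeping one endpoint and choosing a new random partner (the standard Watts-Strogatz step, not necessarily the paper's "modified" variant), the payoffs R = 3, S = 0, P = 1 are hypothetical placeholders for the values given in Methods, and the initial strategies are drawn at random with 50% cooperators. All function names (`build_small_world`, `payoff`, `play_round`, `best_takes_over`, `simulate`) are hypothetical. Only the best-takes-over rule is sketched; the Q-learning rule is defined in Methods and the ESM1 and is not reproduced here.

```python
import random

# Hypothetical payoff values; only T = 3.6 comes from the caption.
R, S, P = 3.0, 0.0, 1.0


def build_small_world(side=15, p=0.01, seed=None):
    """Lattice with 8 nearest neighbors (periodic boundaries assumed), then
    Watts-Strogatz-style rewiring of each original link with probability p."""
    rng = random.Random(seed)
    n = side * side
    edges = set()
    for r in range(side):
        for c in range(side):
            u = r * side + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    v = ((r + dr) % side) * side + (c + dc) % side
                    edges.add((min(u, v), max(u, v)))
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Keep endpoint u, move the other end to a node that is not yet a neighbor,
    # so the number of links is preserved and no self-loops or duplicates appear.
    for u, v in sorted(edges):
        if rng.random() < p:
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if not candidates:
                continue
            w = rng.choice(candidates)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj


def payoff(me, other, T):
    """One-shot Prisoner's Dilemma payoff; True means cooperate."""
    if me and other:
        return R
    if me:
        return S
    if other:
        return T
    return P


def play_round(adj, coop, T):
    """Each agent plays every neighbor once and sums its payoffs."""
    return [sum(payoff(coop[u], coop[v], T) for v in adj[u]) for u in range(len(coop))]


def best_takes_over(adj, coop, scores):
    """Synchronous update: copy the strategy of the highest scorer among the
    agent and its neighbors; ties keep the agent's own strategy (u is listed first)."""
    return [coop[max([u, *adj[u]], key=lambda x: scores[x])] for u in range(len(coop))]


def simulate(side=15, p=0.01, T=3.6, rounds=5000, seed=0):
    """Return the final fraction of cooperators."""
    rng = random.Random(seed)
    adj = build_small_world(side, p, seed)
    coop = [rng.random() < 0.5 for _ in range(side * side)]
    for _ in range(rounds):
        scores = play_round(adj, coop, T)
        coop = best_takes_over(adj, coop, scores)
    return sum(coop) / len(coop)
```

Calling, for example, `simulate(p=0.01, seed=s)` and `simulate(p=0.04, seed=s)` for several seeds `s` is one way to probe the run-to-run variability in the extent of cooperation that the top panels illustrate; the numbers obtained depend on the assumed payoffs and initial conditions.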