1. Envisioning City Spaces, Aligning DNA Sequences, Classifying Emotional Speech and More: Wolfram Community Highlights
In this roundup of our recent Wolfram Community favorites, our talented users explore different methods of accessing, interpreting and representing data—creating some eye-catching results that offer new ways of looking at the world. We’re also excited to showcase a few projects from alumni of our annual Wolfram High School Summer Camp and Wolfram Summer School. Check out the culmination of their hard work, as well as how Community members find clever solutions using the Wolfram Language.
William Goodall (Wolfram Summer Camp 2019)
How can you decipher the scale of an area, or the size of an object within that area, if the area is shown using a satellite image? Though this is typically quite a challenging issue, as Earth’s terrain can look the same at different zoom levels, William Goodall makes it look easy by using the Wolfram Language. For his Summer Camp project, William produced an algorithm that uses feature extraction and a neural network to accurately predict map zoom.
Kotaro Okazaki
The emblems representing the 2020 Summer Olympics and Paralympics combine checkered patterns used in Japanese culture (named ichimatsu moyo during the Edo period), indigo blue (a traditional Japanese color) and varieties of rectangular shapes (representing the diversity promoted by and found within the Olympic and Paralympic Games). Kotaro Okazaki, an inventor at Fujitsu Limited in Japan, explores the mathematics used to create the emblems.
Diego Zviovich
After wondering which US cities have reported the highest crime rates, Diego Zviovich set out to uncover the answer—and found the Wolfram Language crucial in interpreting and analyzing the crime statistics data compiled by the FBI. Diego homed in on specific fields of interest and assembled them into a Dataset, finally creating a BubbleChart that gives an at-a-glance visualization of national crime rates.
Ahmed Elbanna (Wolfram Summer School 2019)
Predicting urban development is essential for citizens, governments and companies. Ahmed Elbanna, a researcher and lecturer at Tanta University, Egypt, explores several different ways to collect satellite images, including manual collection and using the API from NASA’s website, to predict how a city can grow and change. Then to build the model, Ahmed uses the machine learning superfunction Predict to generate a new predicted binary image and add it to the list of binary images of the area under study.
Richard Lapin (Wolfram Summer School 2019)
Richard Lapin, a recent BSc economics graduate from University College London, created the Cartogram function in order to generate cartograms (geometrically distorted maps where the distortion conveys information derived from input data). With Cartogram, Richard made great use of the Wolfram Function Repository—a public resource of user-contributed Wolfram Language functions.
Seth Chandler
Since we’ve announced the Free Wolfram Engine for Developers, more users than ever before have access to the power of Wolfram’s computational intelligence. One popular way to take advantage of the Wolfram Engine is to call it from a system like Jupyter. Seth Chandler, Wolfram Innovator Award winner and a Foundation Professor of Law at the University of Houston, shows us how by walking us through a comprehensive tutorial on installing a Wolfram Engine kernel for Jupyter.
Dev Chheda (Wolfram Summer Camp 2019)
Dev Chheda created a system capable of detecting three moods in human speech: angry, happy and sad. Dev determined the features most useful for detecting mood, extracting the amplitudes, fundamental frequencies and formant frequencies of each training clip. He then trained the system on labeled samples of emotional speech using Classify, allowing him to identify and label new audio recordings with the mood of the speaker.
George Varnavides
Chord diagrams are an elegant way to represent interrelationships between variables. After searching for built-in capabilities or other user-provided answers and coming up short, George Varnavides built a chord diagram from scratch using the visualization capabilities of the Wolfram Language. In addition to sharing his process on Wolfram Community, George also made sure other users would be able to easily access his code by submitting ChordDiagram to the Wolfram Function Repository. Thanks, George!
Jessica Shi (Wolfram Summer Camp 2019)
In bioinformatics, DNA sequence alignment is an algorithm-based method of arranging sequences to find similarities between them. Jessica Shi, another Summer Camp student, used pairwise alignment (and a bit of inspiration from a Wolfram Demonstration published in 2011) to create a visual output representing sequence alignment.
If you haven’t yet signed up to be a member of Wolfram Community, please do so! You can join in on similar discussions, post your own work in groups that cover your interests and browse the complete list of Staff Picks. If you’re looking for more interesting work produced by our Summer Program alumni, visit our Wolfram Community groups for Summer Camp and Summer School projects.
2. Powering Engineering Education with Wolfram Virtual Labs
How can you make teaching come alive and be more engaging? For many educators, the answer turns out to be not so much a single solution, but rather a set of tools that can vary according to subject and even by student. So today, I want to add something new to the pedagogical toolkit: Wolfram Virtual Labs.
Wolfram Virtual Labs are open educational resources in the form of interactive courseware that are used to explain different concepts in the classroom. Our ambition is to provide an easy way to study difficult concepts and promote student curiosity.
For this post, I spoke with Dr. Matteo Fasano about his experience with using Virtual Labs as a course complement in the master’s courses in which he acts as a teaching assistant. He also told me why and how he supported the Wolfram MathCore group to develop the CollegeThermal Virtual Labs (now available) and how they can help teachers or instructors make learning more engaging.
Tell us a bit about yourself. What are these Virtual Labs?
I am a postdoctoral researcher at Politecnico di Torino. I am a teaching assistant for five master’s courses on energy and thermal engineering. I was the recipient of the Young Researcher of the Year award from energy company Eni in 2017 for my research work “Heat and Mass Transfer of Water at Nanoscale Solid-Liquid Interfaces.”
It was in May last year when I—along with my colleagues—attended a Wolfram SystemModeler demo. During this demo, we got to know about an internal project at Wolfram called Virtual Labs. The idea was simple: these were a set of interactive computer exercises meant as a complement to teaching materials in which you explore a specific topic using system models, either by creating new models to describe the topic or by interacting with prebuilt models. It was planned to be distributed as an open educational resource to teachers, students and anyone else wishing to learn about a subject.
I was intrigued by this concept and started corresponding with your team. I sent some exercises from my course material to check if it was possible to prepare models for these case studies, and if they could be modeled as a Virtual Lab with interactive instructions for students. On first review it looked doable; however, because of the length of the content, we proposed splitting it into two Virtual Labs instead, and these are now available to everyone as the CollegeThermal library.
What prior knowledge is required to use these materials?
We tried to make sure that the content can be followed by anyone with basic thermodynamic knowledge.
Can you walk us through one of the labs?
Ever wondered what the thermal power of the radiator in your house should be to guarantee thermal comfort, or how the ambient conditions affect your room temperature? To answer these questions, it is important to understand the thermal behavior of different components in your house. The Room Heating lab has the ambition to model the most significant components of a house and combine them to see how your room temperature changes with ambient temperature fluctuations.
We first begin by observing the thermal behavior of a multilayer wall crossed by a heat flux normal to its surface. The wall consists of three layers in series: insulation, brick and insulation, respectively. As a first approximation, the thermal response of this configuration can be studied by a one-dimensional, lumped-parameter model, in which only heat conduction through the wall is considered (for the moment). In the following figure, we show how the thermal conductivity of the different materials affects the temperature distribution inside the wall, at fixed layer thicknesses:
There is a first interesting behavior that can be noticed here: if we consider typical values for the thermal insulating and brick layers, namely 0.07 W/m K and 0.7 W/m K, respectively, we observe a large temperature drop across the insulation layers, whereas the temperature drop across the brick layer is only minimal, even if the thickness of the brick layer is three times that of the insulation layers. In fact, the law of heat conduction (also known as Fourier’s law) states that—at fixed heat flux—temperature gradients are linearly proportional to thermal resistance, which here can be determined as the ratio between the layer thickness and thermal conductivity.
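To make the resistance argument concrete, here is a small illustrative Python sketch (not the Virtual Lab code itself): the conductivities are the values quoted above, while the heat flux and layer thicknesses are assumed for illustration.

```python
# Illustrative sketch, not the Virtual Lab code: temperature drops across a
# three-layer wall in series, via Fourier's law. Conductivities are the values
# quoted in the text; the heat flux and thicknesses are assumed.
def layer_resistances(thicknesses, conductivities):
    """Per-unit-area conductive resistance of each layer: R = L / k."""
    return [L / k for L, k in zip(thicknesses, conductivities)]

def temperature_drops(q, resistances):
    """At fixed heat flux q, the drop across each layer is q * R."""
    return [q * R for R in resistances]

# insulation / brick / insulation; the brick layer is three times as thick
thicknesses = [0.05, 0.15, 0.05]     # m (assumed)
conductivities = [0.07, 0.7, 0.07]   # W/(m K), from the text
q = 10.0                             # W/m^2 (assumed)

drops = temperature_drops(q, layer_resistances(thicknesses, conductivities))
```

With these numbers each insulation layer has a resistance of about 0.71 m² K/W versus about 0.21 m² K/W for the brick, so most of the temperature drop falls across the insulation even though the brick is three times thicker.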
We now add another part to our model: the convective heat transfer from the ambient room to the internal wall, and from the external wall to the environment. The thermal resistances due to convection are added in series with respect to the conductive ones through the wall, thus causing further temperature drops at both sides of the wall. In this case, the typical boundary layer of natural convection can be seen by the nonlinear temperature profiles of air in the proximity of the wall surface:
In addition to opaque walls, buildings have transparent walls (windows). The thermal model of windows here considers the solar radiation through the glass, the air leakages through the window and the thermal conduction across the glass and frame of the window. We can analyze the model response by observing the heat-flow values for the different contributions. In this case, the indoor and outdoor temperatures are set at 20 °C and 10 °C, respectively. Results in the following figure show that the heat flux is positive when it flows from the external ambient to the internal one, as in the case of solar radiation, whereas heat fluxes are negative (i.e. thermal losses) when they flow in the opposite direction, as in the cases of leakages and conduction through the window:
The thermal model of both opaque and transparent walls can be combined to represent a wall with a window. This assembled model allows you to compare the thermal flux flowing in/out to/from the room at different ambient and room temperatures. Clearly, a nonzero thermal flux balance would lead to a dynamical change of room temperature. If the net thermal flux has a negative sign, the room temperature will tend to decrease with time, therefore reducing the thermal comfort:
To observe the actual dynamic behavior of temperature with time, we then refine the thermal model of the room by introducing: (1) other relevant components of the room that significantly affect its temperature, namely the roof, floor (opaque walls) and people (inner heat source); and (2) the thermal capacitance of air in the room. All the components are then combined to create a more comprehensive room model. This model is tested for a given outside temperature:
In this case, we observe from the following figure that the net heat outflow is greater than the inflow one; as a result, the temperature in the room decreases and eventually stabilizes at 10 °C, when inlet and outlet thermal fluxes equalize:
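The balance argument can be sketched with a minimal lumped-capacitance model in Python. All parameter values below are assumptions for illustration (they are not the lab’s parameters); the point is only that the temperature relaxes to the level where inflow and outflow equalize, T_eq = T_out + Q_in/UA:

```python
# Minimal lumped-capacitance sketch of the room's thermal balance; every
# parameter value below is an assumption for illustration only.
def simulate_room(T0, T_out, UA, Q_in, C, dt=60.0, steps=5000):
    """Forward-Euler integration of C * dT/dt = Q_in - UA * (T - T_out)."""
    T = T0
    for _ in range(steps):
        T += dt * (Q_in - UA * (T - T_out)) / C
    return T

# With these numbers the equilibrium is T_eq = T_out + Q_in/UA = 5 + 500/100 = 10 C,
# reached regardless of the starting temperature.
T_final = simulate_room(T0=20.0, T_out=5.0, UA=100.0, Q_in=500.0, C=5e6)
```

Starting above or below the equilibrium, the simulated temperature settles to the same value, mirroring the behavior described above.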
Such equilibrium temperature causes thermal discomfort in the room, which should be avoided by introducing a heating system in the building.
In the Virtual Lab, we have modeled the heating system using radiators as heat exchangers. Specifically, both electric and water radiators, as shown in the following diagrams, have been considered, since the multi-domain approach of SystemModeler allows us to combine different domains such as electric, thermal and control in one model. As a first implementation, the radiator operates at a fixed nominal thermal power and is controlled by an on/off strategy:
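As a rough illustration of the on/off strategy (a hypothetical Python sketch, not the SystemModeler model; all parameter values are assumed), a thermostat with a small hysteresis band toggles a fixed-power radiator, and the room temperature settles into a sawtooth around the setpoint:

```python
# Hypothetical sketch (assumed parameters) of the on/off control strategy: a
# radiator at fixed nominal power toggled by a thermostat with hysteresis.
def simulate_on_off(T0, T_out, UA, C, P_nom, T_set, band, dt=60.0, steps=5000):
    T, on, history = T0, False, []
    for _ in range(steps):
        if T < T_set - band:     # too cold: switch the radiator on
            on = True
        elif T > T_set + band:   # too warm: switch it off
            on = False
        Q = P_nom if on else 0.0
        T += dt * (Q - UA * (T - T_out)) / C  # lumped room thermal balance
        history.append(T)
    return history

hist = simulate_on_off(T0=15.0, T_out=0.0, UA=100.0, C=5e6,
                       P_nom=4000.0, T_set=20.0, band=0.5)
```

After an initial warm-up, the temperature cycles within the hysteresis band around the 20 °C setpoint; the fraction of time the radiator is on is what drives the energy-consumption estimate discussed later.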
We will now use Wolfram|Alpha to get temperature data for one example day during the winter season in Rome as a case study, and use it to define the average external temperature required by our complete room + radiator model:
tempsWinter = 
 WeatherData["Rome", "Temperature", {{2016, 01, 01}}]["Path"];
tempsInSecondsWinter = {tempsWinter[[All, 1]] - tempsWinter[[1, 1]], 
   QuantityMagnitude[tempsWinter[[All, 2]], 
    "DegreesCelsius"]}\[Transpose];

The following plot shows the temperature variation for a span of about nine hours. This data can then be fed to our model:
ListPlot[tempsInSecondsWinter, 
 AxesLabel -> {"Time[s]", "Temperature[\[Degree]C]"}]

Thanks to an intuitive graphical user interface, the user can now perform fast sensitivity analyses to assess the effect of the model parameters on the room temperature. For example, in this figure, the room reference temperature, the heat capacity of the room and the solar irradiance can be changed to explore their effects on the on/off cycles of the radiator (and thus the heating system):
The overall energy consumption by the heating system is also estimated during the day. Here the students can appreciate the energy (and cost) saving potential of decreasing the room reference temperature by just 1 °C during the heating season, as well as the importance of an adequate sizing of the heating system to guarantee stable comfort conditions throughout the whole day.
In this lab, you saw how we started with trivial models and ended up observing a nontrivial example. The students have access to all the system models used in the labs, where they can learn how these models were created and try to create new models or to refine the existing ones.
What are the problems you are trying to address using these labs?
Existing ways of teaching using only presentations and blackboards have no doubt worked for a long time. I believe that this teaching approach is still required to get a full preliminary understanding of the physical heat- and mass-transfer mechanisms and the thermodynamics underlying thermal systems. Nowadays, this approach can be assisted by the real-time simulation of thermal systems to get fast, hands-on experience of their actual operating conditions. Virtual Labs can support this approach without the need for prior knowledge of low-level coding, which is a plus for students without a solid programming background.
“It took me less than a day to create a model, and I felt that I could now create Virtual Labs by myself.”
How was your experience using Wolfram tech?
The notebook interface and wide visualization tools of Mathematica provide a convenient way to create dynamic content. Modeling using SystemModeler is also easy with its wide range of built-in components and equation-based models. It took me less than a day to create a model, and I felt that I could now create Virtual Labs by myself. I am curious to also test the power of the Wolfram Language for my research: in the near future, I would like to explore the machine learning capabilities of Mathematica to predict heat- and mass-transfer processes—for instance, for the injection molding of nanocomposites. And it’s not only me: I can also see my students making use of these tools to improve their understanding of the physical concepts learned during the lectures.
Do you have anything to say to the teachers?
A small tip to teachers: the Wolfram MathCore group is easy to talk to and open to providing help. If you have any ideas or questions on how you could improve your teaching capacities, do contact them! If you want your students to experience innovative learning processes, take an extra step to transform your courses with the help of flexible, intuitive and high-level programming.
Have a good idea for an educational area where Virtual Labs could be helpful? Let us know by sending us an email at virtuallabs@wolfram.com!
3. Innovating in Education, Analytics and Engineering: Thirty Years Using Wolfram Technology
Robert Prince-Wright has been using Mathematica since its debut in 1988 to develop computational tools in education, business consulting and offshore engineering. We recently talked to Prince-Wright about his work developing simulation models for deepwater drilling equipment at safety and systems engineering company Berkeley & Imperial.
His latest work is cutting edge—but it’s only part of the story. Throughout his career, Prince-Wright has used Wolfram technologies for “modeling systems as varied as downhole wellbore trajectory, radionuclide dispersion and PID control of automation systems.” Read on to learn more about Prince-Wright’s accomplishments and discover why Wolfram technology is his go-to for developing unique computational solutions.
Simplifying Complex Ideas
When Mathematica Version 1.0 was released, Prince-Wright was in a PhD program in probabilistic modeling at the University of Glasgow. At the time, Mathematica was sold as a specialized tool for symbolic algebra. Its unique treatment of differential equations and linear algebra made it indispensable to Prince-Wright in his research on reliability analysis.
Over the next few years, while teaching engineering mathematics, he found the symbolic approach of the Wolfram Language helpful in demonstrating math concepts intuitively. In particular, he used it to show students how the use of vectors and complex numbers made location and rotation algebra much simpler than trigonometry. He also notes that even this early interface produced higher-quality visualizations faster than any other software at the time.
Publishing Interactive Models
Eventually, Prince-Wright moved to the private sector—analyzing engineering systems, presenting findings and making policy proposals for a range of clients. Though he tried other software, he always gravitated toward the Wolfram Language for its highly automated interface construction (after all, he was a civil engineer, not a programmer). In the early 2000s, he began taking advantage of the publishing capabilities of Wolfram Notebooks. He could readily demonstrate custom models to clients anywhere, either by sending them full-featured standalone notebooks or by generating interactive notebook-based interfaces that connected to pre-deployed models on the back end.
Ongoing improvements in Mathematica’s modeling and analysis functionality also allowed Prince-Wright to tackle new challenges. In 2005, he helped develop a new well-control algorithm for a major oil company that combined concepts from several engineering domains. He was pleased by how easy it was to repurpose the language to create this sophisticated multiphysics solution: “What was remarkable was thinking back 14 years and using the skills I developed using Mathematica as a teacher to solve real-world problems.”
Unified Analysis and Simulation
In 2012, while working in safety and systems engineering at Berkeley & Imperial, Prince-Wright discovered Wolfram SystemModeler and its Wolfram Language integration features. He began leveraging this new Modelica interface to test and improve models he’d built in the Wolfram Language. Combining the drag-and-drop simulation workflow of SystemModeler with the real-time analysis of the Wolfram Language allowed him to achieve unprecedented speed and accuracy in his models.
Around that time, Prince-Wright heard about the Wolfram Connected Devices Project. In addition to improved integration with low-level devices, he soon realized the project gave him a framework for comparing system models to the actual systems they represent—in his words, “integrating the language with the real world around us.” This workflow turned out to be ideal for simulating and testing the kinds of embedded systems used in deepwater drilling. Since then, he and his team have continued to push this concept further, exploring the use of digital twins and hardware-in-the-loop simulations to continue improving his models.
An Ever-Evolving System
In many ways, the modern Wolfram technology stack has advanced in parallel with Prince-Wright’s career path, growing from a symbolic math tool to a cross-platform computational system to the full-featured modern engineering and analytics suite available today. Like his work, Wolfram technology can be applied across a diverse range of engineering fields. And the Wolfram Language maintains consistent syntax and structure throughout, making it easy to try different techniques—and complementing Prince-Wright’s approach of constantly pushing the boundaries. These qualities combine to provide lasting value for innovators like Prince-Wright: “The evolution of the Wolfram System has been very carefully designed to ensure that the core language was honored through the process. What that means now is that using the Wolfram Language gives you an opportunity to solve even bigger and more complex problems.”
Find out more about Robert Prince-Wright, as well as other users’ exciting Wolfram Language applications, on our Customer Stories pages.
Build and test your own engineering models with Wolfram SystemModeler, the easy-to-use, next-generation modeling and simulation environment.


4. Announcing the Rule 30 Prizes
The Story of Rule 30
How can something that simple produce something that complex? It’s been nearly 40 years since I first saw rule 30, but it still amazes me. Long ago it became my personal all-time favorite science discovery, and over the years it’s changed my whole worldview and led me to all sorts of science, technology, philosophy and more.
But even after all these years, there are still many basic things we don’t know about rule 30. And I’ve decided that it’s now time to do what I can to stimulate the process of finding more of them out. So as of today, I am offering $30,000 in prizes for the answers to three basic questions about rule 30.
The setup for rule 30 is extremely simple. One’s dealing with a sequence of lines of black and white cells. And given a particular line of black and white cells, the colors of the cells on the line below are determined by looking at each cell and its immediate neighbors and then applying the following simple rule:
RulePlot[CellularAutomaton[30]]
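In Boolean form, rule 30 updates each cell as left XOR (center OR right). As a quick cross-check outside the Wolfram Language, the evolution can be reproduced in a few lines of Python:

```python
# Cross-check outside the Wolfram Language: rule 30 updates each cell as
#   new = left XOR (center OR right)
def rule30_rows(steps):
    """Rows of the rule 30 evolution, starting from a single black (1) cell."""
    row = [0] * steps + [1] + [0] * steps  # pad so the pattern never reaches the edge
    rows = [row]
    for _ in range(steps):
        row = [row[i - 1] ^ (row[i] | row[i + 1]) if 0 < i < len(row) - 1 else 0
               for i in range(len(row))]
        rows.append(row)
    return rows

rows = rule30_rows(5)  # the initial cell sits at index 5
```

The third row already shows the 1 1 0 0 1 block visible near the apex of the pattern above.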
If you start with a single black cell, what will happen? One might assume, as I at first did, that the rule is simple enough that the pattern it produces must somehow be correspondingly simple. But if you actually do the experiment, here’s what you find happens over the first 50 steps:
RulePlot[CellularAutomaton[30], {{1}, 0}, 50, Mesh -> All, 
 ImageSize -> Full]
But surely, one might think, this must eventually resolve into something much simpler. Yet here’s what happens over the first 300 steps:
And, yes, there’s some regularity over on the left. But many aspects of this pattern look for all practical purposes random. It’s amazing that a rule so simple can produce behavior that’s so complex. But I’ve discovered that in the computational universe of possible programs this kind of thing is common, even ubiquitous. And I’ve built a whole new kind of science with all sorts of principles based on this.
And gradually there’s been more and more evidence for these principles. But what specifically can rule 30 tell us? What concretely can we say about how it behaves? Even the most obvious questions turn out to be difficult. And after decades without answers, I’ve decided it’s time to define some specific questions about rule 30, and offer substantial prizes for their solutions.
I did something similar in 2007, putting a prize on a core question about a particular Turing machine. And at least in that case the outcome was excellent. In just a few months, the prize was won, establishing forever what the simplest possible universal Turing machine is, as well as providing strong further evidence for my general Principle of Computational Equivalence.
The Rule 30 Prize Problems again get at a core issue: just how complex really is the behavior of rule 30? Each of the problems asks this in a different, concrete way. Like rule 30 itself, they’re all deceptively simple to state. Yet to solve any of them will be a major achievement that will help illuminate fundamental principles about the computational universe that go far beyond the specifics of rule 30.
I’ve wondered about every one of the problems for more than 35 years. And all that time I’ve been waiting for the right idea, or the right kind of mathematical or computational thinking, to finally be able to crack even one of them. But now I want to open this process up to the world. And I’m keen to see just what can be achieved, and what methods it will take.
The Rule 30 Prize Problems
For the Rule 30 Prize Problems, I’m concentrating on a particularly dramatic feature of rule 30: the apparent randomness of its center column of cells. Start from a single black cell, then just look down the sequence of values of this cell and it seems random:
ArrayPlot[
 MapIndexed[If[#2[[2]] != 21, # /. {0 -> 0.2, 1 -> .6}, #] &, 
  CellularAutomaton[30, {{1}, 0}, 20], {2}], Mesh -> All]
But in what sense is it really random? And can one prove it? Each of the Prize Problems in effect uses a different criterion for randomness, then asks whether the sequence is random according to that criterion.
Problem 1: Does the center column always remain nonperiodic?
Here’s the beginning of the center column of rule 30:
ArrayPlot[List@CellularAutomaton[30, {{1}, 0}, {80, {{0}}}], 
 Mesh -> True, ImageSize -> Full]
It’s easy to see that this doesn’t repeat; it doesn’t become periodic. But this problem is about whether the center column ever becomes periodic, even after an arbitrarily large number of steps. Just by running rule 30, we know the sequence doesn’t become periodic in the first billion steps. But what about ever? To establish that, we need a proof. (Here are the first million and first billion bits in the sequence, by the way, as entries in the Wolfram Data Repository.)
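No finite computation can settle "ever", but it is easy to machine-check the absence of any short period in a finite prefix. A small illustrative Python sketch (the prefix length and period bound here are arbitrary choices):

```python
# Finite sanity check only: verify that no short period fits the first few
# hundred bits of the rule 30 center column.
def center_column(n):
    """First n values of the rule 30 center column (t = 0 .. n-1)."""
    width = 2 * n + 1
    row = [0] * n + [1] + [0] * n
    col = [row[n]]
    for _ in range(n - 1):
        row = [row[i - 1] ^ (row[i] | row[i + 1]) if 0 < i < width - 1 else 0
               for i in range(width)]
        col.append(row[n])
    return col

def is_periodic(bits, max_p):
    """True if the whole sequence repeats with some period p <= max_p.
    (A check for *eventual* periodicity would also scan start offsets.)"""
    return any(all(bits[k] == bits[k + p] for k in range(len(bits) - p))
               for p in range(1, max_p + 1))

bits = center_column(500)
```

As expected, no period up to 100 fits the first 500 bits; of course, this says nothing about arbitrarily large numbers of steps, which is exactly why a proof is needed.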
Problem 2: Does each color of cell occur on average equally often in the center column?
Here’s what one gets if one tallies the number of black and of white cells in successively more steps in the center column of rule 30:
Dataset[{{1, 1, 0, ""}, {10, 7, 3, 2.3333333333333335}, {100, 52, 48, 1.0833333333333333},
{1000, 481, 519, 0.9267822736030829}, {10000, 5032, 4968, 1.0128824476650564},
{100000, 50098, 49902, 1.0039276982886458}, {1000000, 500768, 499232,
1.003076725850907}, {10000000, 5002220, 4997780, 1.0008883944471345},
{100000000, 50009976, 49990024, 1.000399119632349},
{1000000000, 500025038, 499974962, 1.0001001570154626}}]
The results are certainly close to equal for black vs. white. But what this problem asks is whether the limit of the ratio after an arbitrarily large number of steps is exactly 1.
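On a much smaller scale than the table above, the tally can be reproduced with a short Python sketch; the count over the first 10 values matches the 7 black / 3 white entry in the table:

```python
# Tallying black (1) vs. white (0) center-column cells, mirroring the table
# above on a much smaller scale.
def center_column(n):
    """First n values of the rule 30 center column."""
    row = [0] * n + [1] + [0] * n
    col = [row[n]]
    for _ in range(n - 1):
        row = [row[i - 1] ^ (row[i] | row[i + 1]) if 0 < i < len(row) - 1 else 0
               for i in range(len(row))]
        col.append(row[n])
    return col

bits = center_column(1000)
ones = sum(bits)                   # black cells
ratio = ones / (len(bits) - ones)  # black : white ratio
```

Even at a thousand steps the ratio is already close to 1, but closeness at any finite depth is not what the problem asks; the question is the exact limit.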
Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?
To find the nth cell in the center column, one can always just run rule 30 for n steps, computing the values of all the cells in this diamond:
With[{n = 100}, 
 ArrayPlot[
  MapIndexed[If[Total[Abs[#2 - n/2 - 1]] <= n/2, #, #/4] &, 
   CellularAutomaton[30, CenterArray[{1}, n + 1], n], {2}]]]
But if one does this directly, one’s doing n² individual cell updates, so the computational effort required goes up like O(n²). This problem asks if there’s a shortcut way to compute the value of the nth cell, without all this intermediate computation, or, in particular, in less than O(n) computational effort.
The Digits of Pi
Rule 30 is a creature of the computational universe: a system found by exploring possible simple programs with the new intellectual framework that the paradigm of computation provides. But the problems I’ve defined about rule 30 have analogs in mathematics that are centuries old.
Consider the digits of π. They’re a little like the center column of rule 30. There’s a definite algorithm for generating them. Yet once generated they seem for all practical purposes random:
N[Pi, 85]
Just to make the analog a little closer, here are the first few digits of π in base 2:
BaseForm[N[Pi, 25], 2]
And here are the first few bits in the center column of rule 30:
Row[CellularAutomaton[30, {{1}, 0}, {90, {{0}}}]]
Just for fun, one can convert these to base 10:
N[FromDigits[{Flatten[CellularAutomaton[30, {{1}, 0}, {500, {0}}]],
0}, 2], 85]
Of course, the known algorithms for generating the digits of π are considerably more complicated than the simple rule for generating the center column of rule 30. But, OK, so what’s known about the digits of π?
Well, we know they don’t repeat. That was proved in the 1760s when it was shown that π is an irrational number, because the only numbers whose digits repeat are rational numbers. (It was also shown in 1882 that π is transcendental, i.e. that it cannot be expressed in terms of roots of polynomials.)
How about the analog of problem 2? Do we know if, in the digit sequence of π, different digits occur with equal frequency? By now more than 100 trillion binary digits have been computed, and the measured frequencies of digits are very close (in the first 40 trillion binary digits the ratio of 1s to 0s is about 0.9999998064). But in the limit, are the frequencies exactly the same? People have been wondering about this for several centuries. But so far mathematics hasn’t succeeded in delivering any results.
For rational numbers, digit sequences are periodic, and it’s easy to work out relative frequencies of digits. But for the digit sequences of all other naturally constructed numbers, basically there’s nothing known about limiting frequencies of digits. It’s a reasonable guess that the digits of π (as well as the center column of rule 30) are “normal”, in the sense that not only every individual digit, but also every block of digits of any given length, in the limit occurs with equal frequency. And as was noted in the 1930s, it’s perfectly possible to digit-construct normal numbers. Champernowne’s number, formed by concatenating the digits of successive integers, is an example (and, yes, this works in any base, and one can also get normal numbers by concatenating values of functions of successive integers):
N[ChampernowneNumber[10], 85]
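For illustration, here is how Champernowne’s base-10 digit string can be built and its digit frequencies tallied in Python (a sketch, with an arbitrary cutoff of 10,000 digits):

```python
# Building the base-10 Champernowne digit string by concatenating successive
# integers, then tallying digit frequencies (the 10,000-digit cutoff is arbitrary).
from collections import Counter

def champernowne_digits(n):
    """First n decimal digits after the point: 1 2 3 4 5 6 7 8 9 1 0 1 1 ..."""
    s, k = "", 1
    while len(s) < n:
        s += str(k)
        k += 1
    return s[:n]

digits = champernowne_digits(10000)
freqs = Counter(digits)  # all ten digits appear; frequencies tend to 1/10
```

Unlike for π, normality here is a theorem: each digit’s limiting frequency is exactly 1/10, even though in any finite prefix the counts are still visibly skewed toward the leading digits of the integers concatenated so far.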
But the point is that for “naturally constructed” numbers formed by combinations of standard mathematical functions, there’s simply no example known where any regularity of digits has been found. Of course, it ultimately depends what one means by “regularity” and at some level the problem devolves into a kind of numberdigit analog of the search for extraterrestrial intelligence. But there’s absolutely no proof that one couldn’t, for example, find even some strange combination of square roots that would have a digit sequence with some very obvious regularity.
OK, so what about the analog of problem 3 for the digits of π? Unlike rule 30, where the obvious way to compute elements in the sequence is one step at a time, traditional ways of computing digits of π involve getting better approximations to π as a complete number. With the standard (bizarre-looking) series invented by Ramanujan in 1910 and improved by the Chudnovsky brothers in 1989, the first few terms in the series give the following approximations:
Style[Table[
  N[(12*Sum[((-1)^k*((6 k)!)*(13591409 + 545140134 k))/(((3 k)!)*
        ((k!)^3)*640320^(3 k + 3/2)), {k, 0, n}])^-1, 100], {n, 10}] // 
  Column, 9]
So how much computational effort is it to find the nth digit? The number of terms required in the series is O(n). But each term needs to be computed to n-digit precision, which requires at least O(n) individual digit operations, implying that altogether the computational effort required is more than O(n).
Until the 1990s it was assumed that there wasn’t any way to compute the nth digit of π without computing all previous ones. But in 1995 Simon Plouffe discovered that actually it’s possible to compute (albeit slightly probabilistically) the nth digit without computing earlier ones. And while one might have thought that this would allow the nth digit to be obtained with less than O(n) computational effort, the fact that one has to do computations at n-digit precision means that at least O(n) computational effort is still required.
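Digit extraction along these lines is usually implemented with the Bailey–Borwein–Plouffe (BBP) formula, π = Σ 16^(−k) (4/(8k+1) − 2/(8k+4) − 1/(8k+5) − 1/(8k+6)). Here is a standard Python sketch of the idea (an illustration, not code from the post); note that it still performs O(n) modular-exponentiation steps:

```python
def pi_hex_digit(n):
    """The nth hexadecimal digit of pi after the point (n is 1-based)."""
    def series(j):
        # fractional part of sum_k 16^(n-1-k) / (8k + j)
        total = 0.0
        for k in range(n):  # head: modular arithmetic keeps only the fraction
            total = (total + pow(16, n - 1 - k, 8 * k + j) / (8 * k + j)) % 1.0
        k = n               # tail: terms shrink by a factor ~16 each step
        while True:
            term = 16.0 ** (n - 1 - k) / (8 * k + j)
            if term < 1e-17:
                return total % 1.0
            total += term
            k += 1
    frac = (4 * series(1) - 2 * series(4) - series(5) - series(6)) % 1.0
    return int(frac * 16)

print([pi_hex_digit(i) for i in range(1, 7)])  # pi = 3.243F6A88... in hex
```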
Results, Analogies and Intuitions
Problem 1: Does the center column always remain nonperiodic?
Of the three Rule 30 Prize Problems, this is the one on which the most progress has already been made. Because while it’s not known if the center column in the rule 30 pattern ever becomes periodic, Erica Jen showed in 1986 that no two columns can both become periodic. And in fact, one can also give arguments that a single column plus scattered cells in another column can’t both be periodic.
The proof about a pair of columns uses a special feature of rule 30. Consider the structure of the rule:
#10005
RulePlot[CellularAutomaton[30]]
Normally one would just say that given each triple of cells, the rule determines the color of the center cell below. But for rule 30, one can effectively also run the rule sideways: given the cell to the right and above, one can also uniquely determine the color of the cell to the left. And what this means is that if one is given two adjacent columns, it’s possible to reconstruct the whole pattern to the left:
#10005
GraphicsRow[
 ArrayPlot[#, PlotRange -> 1, Mesh -> All,
    Background -> LightGray,
    ImageSize -> {Automatic, 80}] & /@ (PadLeft[#, {Length[#], 10},
      10] & /@
    Module[{data = {{0, 1}, {1, 1}, {0, 0}, {0, 1}, {1, 1}, {1,
        0}, {0, 1}, {1, 10}}},
     Flatten[{{data},
       Table[Join[
         Table[Module[{p, q = data[[n, 1]], r = data[[n, 2]],
            s = data[[n + 1, 1]]},
           p = Mod[q + r + q r + s, 2];
           PrependTo[data[[n]], p]], {n, 1, Length[data] - i}],
         PrependTo[data[[#]], 10] & /@ Reverse[Range[i]]], {i, 7}]},
      1]])]
But if the columns were periodic, it immediately follows that the reconstructed pattern would also have to be periodic. Yet by construction at least the initial condition is definitely not periodic, and hence the columns cannot both be periodic. The same argument works if the columns are not adjacent, and if one doesn’t know every cell in both columns. But there’s no known way to extend the argument to a single column, such as the center column, and thus it doesn’t resolve the first Rule 30 Prize Problem.
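The “sideways” determinism is easy to verify computationally. This Python sketch (an independent illustration, not the Wolfram code above) runs rule 30 forward and then checks that every cell to the left is recovered from its two right-hand neighbors and the cell below, via p = Mod[q + r + q r + s, 2]:

```python
def rule30_step(row):
    """One step of rule 30 on a row, widening it by one cell on each side."""
    padded = [0, 0] + row + [0, 0]
    return [padded[i - 1] ^ (padded[i] | padded[i + 1])
            for i in range(1, len(padded) - 1)]

rows = [[1]]
for _ in range(30):
    rows.append(rule30_step(rows[-1]))

def cell(x, t):
    """Cell at horizontal offset x on step t (0 outside the light cone)."""
    return rows[t][x + t] if abs(x) <= t else 0

# p = a(x-1, t) is determined by q = a(x, t), r = a(x+1, t), s = a(x, t+1)
ok = all(cell(x - 1, t) == (cell(x, t) + cell(x + 1, t)
                            + cell(x, t) * cell(x + 1, t)
                            + cell(x, t + 1)) % 2
         for t in range(29) for x in range(-t, t + 1))
print(ok)
```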
OK, so what would be involved in resolving it? Well, if it turns out that the center column is eventually periodic, one could just compute it, and show that. We know it’s not periodic for the first billion steps, but one could at least imagine that there could be a trillion-step transient, after which it’s periodic.
Is that plausible? Well, transients do happen, and theoretically (just like in the classic Turing machine halting problem) they can even be arbitrarily long. Here’s a somewhat funky example, found by a search, of a rule with 4 possible colors (totalistic code 150898). Run it for 200 steps, and the center column looks quite random:
#10005
ArrayPlot[
 CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {200, 150 {1, 1}}],
 ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92],
   2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]},
 PixelConstrained -> 2, Frame -> False]
After 500 steps, the whole pattern still looks quite random:
#10005
ArrayPlot[
 CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {500, 300 {1, 1}}],
 ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92],
   2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, Frame -> False,
 ImagePadding -> 0, PlotRangePadding -> 0, PixelConstrained -> 1]
But if one zooms in around the center column, there’s something surprising: after 251 steps, the center column seems to evolve to a fixed value (or at least it’s fixed for more than a million steps):
#10005
Grid[{ArrayPlot[#, Mesh -> True,
     ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92],
       2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, ImageSize -> 38,
     MeshStyle -> Lighter[GrayLevel[.5, .65], .45]] & /@
   Partition[
    CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {1400, {4, 4}}],
    100]}, Spacings -> .35]
Could some transient like this happen in rule 30? Well, take a look at the rule 30 pattern, now highlighting where the diagonals on the left are periodic:
#10005
steps = 500;
diagonalsofrule30 =
  Reverse /@
   Transpose[
    MapIndexed[RotateLeft[#1, (steps + 1) - #2[[1]]] &,
     CellularAutomaton[30, {{1}, 0}, steps]]];
diagonaldataofrule30 =
  Table[With[{split =
      Split[Partition[Drop[diagonalsofrule30[[k]], 1], 8]],
     ones = Flatten[
       Position[Reverse[Drop[diagonalsofrule30[[k]], 1]],
        1]]}, {Length[split[[1]]], split[[1, 1]],
     If[Length[split] > 1, split[[2, 1]],
      Length[diagonalsofrule30[[k]]] - Floor[k/2]]}], {k, 1,
    2 steps + 1}];
transientdiagonalrule30 = %;
transitionpointofrule30 =
  If[IntegerQ[#[[3]]], #[[3]],
     If[#[[1]] > 1,
      8 #[[1]] + Count[Split[#[[2]] - #[[3]]][[1]], 0] + 1, 0]] & /@
   diagonaldataofrule30;
decreasingtransitionpointofrule30 =
  Append[Min /@ Partition[transitionpointofrule30, 2, 1], 0];
transitioneddiagonalsofrule30 =
  Table[Join[
    Take[diagonalsofrule30[[n]],
      decreasingtransitionpointofrule30[[n]]] + 2,
    Drop[diagonalsofrule30[[n]],
     decreasingtransitionpointofrule30[[n]]]], {n, 1, 2 steps + 1}];
transientdiagonalrule30 =
  MapIndexed[RotateRight[#1, (steps + 1) - #2[[1]]] &,
   Transpose[Reverse /@ transitioneddiagonalsofrule30]];
smallertransientdiagonalrule30 =
  Take[#, {225, 775}] & /@ Take[transientdiagonalrule30, 275];
Framed[ArrayPlot[smallertransientdiagonalrule30,
  ColorRules -> {0 -> White, 1 -> Gray, 2 -> Hue[0.14, 0.55, 1],
    3 -> Hue[0.07, 1, 1]}, PixelConstrained -> 1,
  Frame -> None,
  ImagePadding -> 0, ImageMargins -> 0,
  PlotRangePadding -> 0
  ], FrameMargins -> 0, FrameStyle -> GrayLevel[.75]]
There seems to be a boundary that separates order on the left from disorder on the right. And at least over the first 100,000 or so steps, the boundary seems to move on average about 0.252 steps to the left at each step with roughly random fluctuations:
#10005
data = CloudGet[
   CloudObject[
    "https://www.wolframcloud.com/obj/bc470188f6294497965da10fe057e2fd"]];
ListLinePlot[
 MapIndexed[{First[#2], # - .252 First[#2]} &,
  Module[{m = 1, w},
   w = If[First[#] > m, m = First[#], m] & /@ data[[1]]; m = 1;
   Table[While[w[[m]] < i, m++]; m - i, {i, 100000}]]],
 Filling -> Axis, AspectRatio -> 1/4, MaxPlotPoints -> 10000,
 Frame -> True, PlotRangePadding -> 0, AxesOrigin -> {Automatic, 0},
 PlotStyle -> Hue[0.07`, 1, 1],
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]
But how do we know that there won’t at some point be a huge fluctuation, that makes the order on the left cross the center column, and perhaps even make the whole pattern periodic? From the data we have so far, it looks unlikely, but I don’t know any way to know for sure.
And it’s certainly the case that there are systems with exceptionally long “transients”. Consider the distribution of primes, and compute LogIntegral[n] - PrimePi[n]:
#10005
DiscretePlot[LogIntegral[n] - PrimePi[n], {n, 10000},
 Filling -> Axis,
 Frame -> True, PlotRangePadding -> 0, AspectRatio -> 1/4,
 Joined -> True, PlotStyle -> Hue[0.07`, 1, 1],
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]
Yes, there are fluctuations. But from this picture it certainly looks as if this difference is always going to be positive. And that’s, for example, what Ramanujan thought. But it turns out it isn’t true. At first the bound for where it would fail was astronomically large (Skewes’s number 10^10^10^964). And although still nobody has found an explicit value of n for which the difference is negative, it’s known that before n = 10^317 there must be one (and eventually the difference will be negative at least nearly a millionth of the time).
I strongly suspect that nothing like this happens with the center column of rule 30. But until we have a proof that it can’t, who knows?
One might think, by the way, that while one might be able to prove periodicity by exposing regularity in the center column of rule 30, nothing like that would be possible for nonperiodicity. But actually, there are patterns whose center columns one can readily see are nonperiodic, even though they’re very regular. The main class of examples are nested patterns. Here’s a very simple example, from rule 161, in which the center column has white cells when n = 2^k:
#10005
GraphicsRow[
 ArrayPlot[CellularAutomaton[161, {{1}, 0}, #]] & /@ {40, 200}]
Here’s a slightly more elaborate example (from the 2-neighbor, 2-color rule 69540422), in which the center column is a Thue–Morse sequence ThueMorse[n]:
#10005
GraphicsRow[
 ArrayPlot[
    CellularAutomaton[{69540422, 2, 2}, {{1},
      0}, {#, {#, #}}]] & /@ {40, 400}]
One can think of the Thue–Morse sequence as being generated by successively applying the substitutions:
#10005
RulePlot[SubstitutionSystem[{0 -> {0, 1}, 1 -> {1, 0}}],
 Appearance -> "Arrow"]
And it turns out that the nth term in this sequence is given by Mod[DigitCount[n, 2, 1], 2], which is never periodic.
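Both characterizations are easy to cross-check; a minimal Python sketch (illustrative, using the same substitution rules shown above):

```python
def thue_morse_by_substitution(steps):
    """Apply 0 -> 01, 1 -> 10 repeatedly, starting from 0."""
    seq = [0]
    for _ in range(steps):
        seq = [bit for x in seq for bit in ((0, 1) if x == 0 else (1, 0))]
    return seq

# the nth term is the parity of the number of 1s in the binary digits of n
by_formula = [bin(n).count("1") % 2 for n in range(2 ** 6)]
print(thue_morse_by_substitution(6) == by_formula)
```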
Will it turn out that the center column of rule 30 can be generated by a substitution system? Again, I’d be amazed (although there are seemingly natural examples where very complex substitution systems do appear). But once again, until one has a proof, who knows?
Here’s something else, that may be confusing, or may be helpful. The Rule 30 Prize Problems all concern rule 30 running in an infinite array of cells. But what if one considers just n cells, say with periodic boundary conditions (i.e. taking the right neighbor of the rightmost cell to be the leftmost cell, and vice versa)? There are 2^n possible total states of the system, and one can draw a state transition diagram that shows which state evolves to which other. Here’s the diagram for n = 4:
#10005
Graph[# -> CellularAutomaton[30][#] & /@ Tuples[{1, 0}, 4],
 VertexLabels -> ((# ->
      ArrayPlot[{#}, ImageSize -> 30, Mesh -> True]) & /@
    Tuples[{1, 0}, 4])]
And here it is for n = 4 through n = 11:
#10005
Row[Table[
  Framed[Graph[# -> CellularAutomaton[30][#] & /@
     Tuples[{1, 0}, n]]], {n, 4, 11}]]
The structure is that there are a bunch of states that appear only as transients, together with other states that are on cycles. Inevitably, no cycle can be longer than 2^n (actually, symmetry considerations show that it always has to be somewhat less than this).
OK, so on a size-n array, rule 30 always has to show behavior that becomes periodic with a period that’s less than 2^n. Here are the actual periods starting from a single-black-cell initial condition, plotted on a log scale:
#10005
ListLogPlot[
 Normal[Values[
   ResourceData[
     "Repetition Periods for Elementary Cellular Automata"][
    Select[#Rule == 30 &]][All, "RepetitionPeriods"]]],
 Joined -> True, Filling -> Bottom, Mesh -> All,
 MeshStyle -> PointSize[.008], AspectRatio -> 1/3, Frame -> True,
 PlotRange -> {{47, 2}, {0, 10^10}}, PlotRangePadding -> .1,
 PlotStyle -> Hue[0.07`, 1, 1],
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]
And at least for these values of n, a decent fit is that the period is about 2^(0.63 n). And, yes, at least in all these cases, the period of the center column is equal to the period of the whole evolution. But what do these finite-size results imply about the infinite-size case? I, at least, don’t immediately see.
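The finite-size periods are easy to compute directly for small n. A minimal Python sketch (an illustration, not the ResourceData computation above) that finds the cycle length of rule 30 on n cells with periodic boundary conditions, starting from a single black cell:

```python
def rule30_cycle_length(n):
    """Length of the cycle eventually entered by rule 30 on n cells
    with periodic boundary conditions, from a single black cell."""
    state = tuple(1 if i == n // 2 else 0 for i in range(n))
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = tuple(state[i - 1] ^ (state[i] | state[(i + 1) % n])
                      for i in range(n))
        t += 1
    return t - seen[state]  # transient steps are excluded

periods = {n: rule30_cycle_length(n) for n in range(1, 15)}
print(periods)
# every period is necessarily below 2^n, the total number of states
```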
Problem 2: Does each color of cell occur on average equally often in the center column?
Here’s a plot of the running excess of 1s over 0s in 10,000 steps of the center column of rule 30:
#10005
ListLinePlot[
 Accumulate[2 CellularAutomaton[30, {{1}, 0}, {10^4 - 1, {{0}}}] - 1],
 AspectRatio -> 1/4, Frame -> True, PlotRangePadding -> 0,
 AxesOrigin -> {Automatic, 0}, Filling -> Axis,
 PlotStyle -> Hue[0.07`, 1, 1],
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]
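The same running excess can be reproduced from scratch in a few lines. A minimal Python sketch (illustrative; it regenerates the center column directly instead of calling CellularAutomaton):

```python
def center_column(steps):
    """First `steps` values of the center column of rule 30,
    grown from a single black cell."""
    row, column = [1], [1]
    for _ in range(steps - 1):
        padded = [0, 0] + row + [0, 0]
        row = [padded[i - 1] ^ (padded[i] | padded[i + 1])
               for i in range(1, len(padded) - 1)]
        column.append(row[len(row) // 2])
    return column

col = center_column(2000)
excess, running = [], 0
for v in col:          # +1 for a black (1) cell, -1 for a white (0) cell
    running += 2 * v - 1
    excess.append(running)
print(excess[-1])
```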
Here it is for a million steps:
#10005
ListLinePlot[
 Accumulate[
  2 ResourceData[
     "A Million Bits of the Center Column of the Rule 30 Cellular Automaton"] - 1],
 Filling -> Axis, Frame -> True, PlotRangePadding -> 0,
 AspectRatio -> 1/4, MaxPlotPoints -> 1000, PlotStyle -> Hue[0.07`, 1, 1],
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]
And a billion steps:
#10005
data = Flatten[
   IntegerDigits[#, 2, 8] & /@
    Normal[ResourceData[
      "A Billion Bits of the Center Column of the Rule 30 Cellular Automaton"]]];
data = Accumulate[2 data - 1];
sdata = Downsample[data, 10^5];
ListLinePlot[Transpose[{Range[10000] 10^5, sdata}], Filling -> Axis,
 Frame -> True, PlotRangePadding -> 0, AspectRatio -> 1/4,
 MaxPlotPoints -> 1000, PlotStyle -> Hue[0.07`, 1, 1],
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]
We can see that there are times when there’s an excess of 1s over 0s, and vice versa, though, yes, as we approach a billion steps 1 seems to be winning over 0, at least for now.
But let’s compute the ratio of the total number of 1s to the total number of 0s. Here’s what we get after 10,000 steps:
#10005
Quiet[ListLinePlot[
  MapIndexed[#/(First[#2] - #) &,
   Accumulate[CellularAutomaton[30, {{1}, 0}, {10^4 - 1, {{0}}}]]],
  AspectRatio -> 1/4, Filling -> Axis, AxesOrigin -> {Automatic, 1},
  Frame -> True, PlotRangePadding -> 0, PlotStyle -> Hue[0.07`, 1, 1],
  FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]],
  PlotRange -> {Automatic, {.88, 1.04}}]]
Is this approaching the value 1? It’s hard to tell. Go on a little longer, and this is what we see:
#10005
Quiet[ListLinePlot[
  MapIndexed[#/(First[#2] - #) &,
   Accumulate[CellularAutomaton[30, {{1}, 0}, {10^5 - 1, {{0}}}]]],
  AspectRatio -> 1/4, Filling -> Axis, AxesOrigin -> {Automatic, 1},
  Frame -> True, PlotRangePadding -> 0, PlotStyle -> Hue[0.07`, 1, 1],
  FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]],
  PlotRange -> {Automatic, {.985, 1.038}}]]
The scale is getting smaller, but it’s still hard to tell what will happen. Plotting the difference from 1 on a log-log plot up to a billion steps suggests it’s fairly systematically getting smaller:
#10005
accdata = Accumulate[
   Flatten[IntegerDigits[#, 2, 8] & /@
     Normal[ResourceData[
       "A Billion Bits of the Center Column of the Rule 30 Cellular Automaton"]]]];
diffratio = FunctionCompile[
   Function[Typed[arg, TypeSpecifier["PackedArray"]["MachineInteger", 1]],
    MapIndexed[Abs[N[#]/(First[#2] - N[#]) - 1.] &, arg]]];
data = diffratio[accdata];
ListLogLogPlot[
 Join[Transpose[{Range[3, 10^5], data[[3 ;; 10^5]]}],
  Transpose[{Range[10^5 + 1000, 10^9, 1000],
    data[[10^5 + 1000 ;; 10^9 ;; 1000]]}]],
 Joined -> True, AspectRatio -> 1/4, Frame -> True, Filling -> Axis,
 PlotRangePadding -> 0, PlotStyle -> Hue[0.07`, 1, 1],
 FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]]
But how do we know this trend will continue? Right now, we don’t. And, actually, things could get quite pathological. Maybe the fluctuations in 1s vs. 0s grow, so even though we’re averaging over longer and longer sequences, the overall ratio will never converge to a definite value.
Again, I doubt this is going to happen in the center column of rule 30. But without a proof, we don’t know for sure.
We’re asking here about the frequencies of black and white cells. But an obvious and potentially illuminating generalization is to ask instead about the frequencies for blocks of cells of length k. We can ask if all 2^k such blocks have equal limiting frequency. Or we can ask the more basic question of whether all the blocks even ever occur; in other words, whether, if one goes far enough, the center column of rule 30 will contain any given sequence of length k (say, a bitwise representation of some work of literature).
Again, we can get empirical evidence. For example, at least up to k = 22, all 2^k sequences do occur, and here’s how many steps it takes:
#10005
ListLogPlot[{3, 7, 13, 63, 116, 417, 1223, 1584, 2864, 5640, 23653,
  42749, 78553, 143591, 377556, 720327, 1569318, 3367130, 7309616,
  14383312, 32139368, 58671803}, Joined -> True, AspectRatio -> 1/4,
 Frame -> True, Mesh -> True,
 MeshStyle ->
  Directive[{Hue[0.07, 0.95, 0.99], PointSize[.01]}],
 PlotTheme -> "Detailed",
 PlotStyle -> Directive[{Thickness[.004], Hue[0.1, 1, 0.99]}]]
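A direct scan for this is straightforward; a minimal Python sketch (illustrative; its step-counting convention, the position of the last cell of the block that completes the set, may differ slightly from the convention used for the plot above):

```python
def center_column(steps):
    """First `steps` values of the rule 30 center column."""
    row, column = [1], [1]
    for _ in range(steps - 1):
        padded = [0, 0] + row + [0, 0]
        row = [padded[i - 1] ^ (padded[i] | padded[i + 1])
               for i in range(1, len(padded) - 1)]
        column.append(row[len(row) // 2])
    return column

def steps_until_all_blocks(col, k):
    """Position of the last cell of the block completing all 2^k blocks."""
    seen = set()
    for i in range(len(col) - k + 1):
        seen.add(tuple(col[i:i + k]))
        if len(seen) == 2 ** k:
            return i + k
    return None  # not all blocks occurred in this prefix

col = center_column(1500)
print([steps_until_all_blocks(col, k) for k in range(1, 7)])
```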
It’s worth noticing that one can succeed perfectly for blocks of one length, but then fail for larger blocks. For example, the Thue–Morse sequence mentioned above has exactly equal frequencies of 0 and 1, but pairs don’t occur with equal frequencies, and triples of identical elements simply never occur.
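These Thue–Morse facts can be checked mechanically; a minimal Python sketch (illustrative):

```python
from collections import Counter

# Thue-Morse: the nth term is the parity of the binary digit sum of n
tm = [bin(n).count("1") % 2 for n in range(2 ** 14)]

singles = Counter(tm)
pairs = Counter(zip(tm, tm[1:]))
as_text = "".join(map(str, tm))

print(singles[0] == singles[1])            # 0 and 1 occur equally often
print(pairs[(0, 1)] == pairs[(0, 0)])      # but pairs do not
print("000" in as_text, "111" in as_text)  # identical triples never occur
```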
In traditional mathematics, and particularly dynamical systems theory, one approach to take is to consider not just evolution from a single-cell initial condition, but evolution from all possible initial conditions. And in this case it’s straightforward to show that, yes, if one evolves with equal probability from all possible initial conditions, then columns of cells generated by rule 30 will indeed contain every block with equal frequency. But if one asks the same thing for different distributions of initial conditions, one gets different results, and it’s not clear what the implication of this kind of analysis is for the specific case of a single-cell initial condition.
If different blocks occurred with different frequencies in the center column of rule 30, then that would immediately show that the center column is not “random”, or in other words that it has statistical regularities that could be used to at least statistically predict it. Of course, at some level the center column is completely “predictable”: you just have to run rule 30 to find it. But the question is whether, given just the values in the center column on their own, there’s a way to predict or compress them, say with much less computational effort than generating an arbitrary number of steps in the whole rule 30 pattern.
One could imagine running various data compression or statistical analysis algorithms, and asking whether they would succeed in finding regularities in the sequence. And particularly when one starts thinking about the overall computational capabilities of rule 30, it’s conceivable that one could prove something about how, across a spectrum of possible analysis algorithms, there’s a limit to how much they could reduce the computation associated with the evolution of rule 30. But even given this, it’d likely still be a major challenge to say anything about the specific case of relative frequencies of black and white cells.
It’s perhaps worth mentioning one additional mathematical analog. Consider treating the values in a row of the rule 30 pattern as digits in a real number, say with the first digit of the fractional part being on the center column. Now, so far as we know, the evolution of rule 30 has no relation to any standard operations (like multiplication or taking powers) that one does on real numbers. But we can still ask about the sequence of numbers formed by looking at the right-hand side of the rule 30 pattern. Here’s a plot for the first 200 steps:
#10005
ListLinePlot[
 FromDigits[{#, 0}, 2] & /@
  CellularAutomaton[30, {{1}, 0}, {200, {0, 200}}], Mesh -> All,
 AspectRatio -> 1/4, Frame -> True,
 MeshStyle ->
  Directive[{Hue[0.07, 0.95, 0.99], PointSize[.0085]}],
 PlotTheme -> "Detailed", PlotStyle -> Directive[{
   Hue[0.1, 1, 0.99]}], ImageSize -> 575]
And here’s a histogram of the values reached at successively more steps:
#10005
Grid[{Table[
   Histogram[
    FromDigits[{#, 0}, 2] & /@
     CellularAutomaton[30, {{1}, 0}, {10^n, {0, 20}}], {.01},
    Frame -> True,
    FrameTicks -> {{None,
       None}, {{{0, "0"}, .2, .4, .6, .8, {1, "1"}}, None}},
    PlotLabel -> (StringTemplate["`` steps"][10^n]),
    ChartStyle -> Directive[Opacity[.5], Hue[0.09, 1, 1]],
    ImageSize -> 208,
    PlotRangePadding -> {{0, 0}, {0, Scaled[.06]}}], {n, 4, 6}]},
 Spacings -> .2]
And, yes, it’s consistent with the limiting histogram being flat, or in other words, with these numbers being uniformly distributed in the interval 0 to 1.
Well, it turns out that in the early 1900s there were a bunch of mathematical results established about this kind of equidistribution. In particular, it’s known that FractionalPart[h n] for successive n is always equidistributed if h isn’t a rational number. It’s also known that FractionalPart[h^n] is equidistributed for almost all h (Pisot numbers like the golden ratio are exceptions). But specific cases like FractionalPart[(3/2)^n] have eluded analysis for at least half a century. (By the way, it’s known that the digits of π in base 16, and thus base 2, can be generated by a recurrence of the form x[n] = FractionalPart[16 x[n - 1] + r[n]], where r[n] is a fixed rational function of n.)
Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?
Consider the pattern made by rule 150:
#10005
Row[{ArrayPlot[CellularAutomaton[150, {{1}, 0}, 30], Mesh -> All,
   ImageSize -> 315],
  ArrayPlot[CellularAutomaton[150, {{1}, 0}, 200], ImageSize -> 300]}]
It’s a very regular, nested pattern. Its center column happens to be trivial (all cells are black). But if we look one column to the left or right, we find:
#10005
ArrayPlot[{Table[Mod[IntegerExponent[t, 2], 2], {t, 80}]},
 Mesh -> All, ImageSize -> Full]
How do we work out the value of the nth cell? Well, in this particular case, it turns out there’s essentially just a simple formula: the value is given by Mod[IntegerExponent[n, 2], 2]. In other words, just look at the number n in base 2, and ask whether the number of zeros it has at the end is even or odd.
How much computational effort does it take to evaluate this “formula”? Well, even if we have to check every bit in n, there are only about Log[2, n] of those. So we can expect that the computational effort is O(log n).
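The formula itself is cheap to evaluate in any language. A Python sketch (illustrative; the function name is mine, and IntegerExponent[n, 2] is simply the number of trailing zeros in the binary form of n):

```python
def side_column_value(n):
    """Mod[IntegerExponent[n, 2], 2]: parity of the number of
    trailing zeros in the binary representation of n (n >= 1)."""
    trailing_zeros = (n & -n).bit_length() - 1  # O(log n) bit work
    return trailing_zeros % 2

# first 16 values: [0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0]
print([side_column_value(n) for n in range(1, 17)])
```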
But what about the rule 30 case? We know we can work out the value of the nth cell in the center column just by explicitly applying the rule 30 update rule n^2 times. But the question is whether there’s a way to reduce the computational work that’s needed. In the past, there’s tended to be an implicit assumption throughout the mathematical sciences that if one has the right model for something, then by just being clever enough one will always find a way to make predictions, or in other words, to work out what a system will do, using a lot less computational effort than the actual evolution of the system requires.
And, yes, there are plenty of examples of exact solutions (think 2body problem, 2D Ising model, etc.) where we essentially just get a formula for what a system will do. But there are also other cases (think 3body problem, 3D Ising model, etc.) where this has never successfully been done.
And as I first discussed in the early 1980s, I suspect that there are actually many systems (including these) that are computationally irreducible, in the sense that there’s no way to significantly reduce the amount of computational work needed to determine their behavior.
So in effect Problem 3 is asking about the computational irreducibility of rule 30, or at least a specific aspect of it. (The choice of O(n) computational effort is somewhat arbitrary; another version of this problem could ask the same thing for O(n^α) for any α > 1.)
Rule 30, however, shows no such obvious modularity. Universality has so far always been established in an “engineering” way, by finding identifiable structures in a system that can be assembled into the components of a universal computer, and nothing like that is visible in rule 30, so it doesn’t seem plausible that one can establish universality in the way it’s been established for all other known-to-be-universal systems. Still, my Principle of Computational Equivalence strongly suggests that rule 30 is indeed universal; we just don’t yet have an obvious direction to take in trying to prove it.
If one can show that a system is universal, however, then this does have implications that are closer to our rule 30 problem. In particular, if a system is universal, then there’ll be questions (like the halting problem) about its infinite-time behavior that will be undecidable, and which no guaranteed-finite-time computation can answer. But as such, universality is a statement about the existence of initial conditions that reproduce a given computation. It doesn’t say anything about the specifics of a particular initial condition, or about how long it will take to compute a particular result.
OK, but what about a different direction: what about getting empirical evidence about our Problem 3? Is there a way to use statistics, or cryptanalysis, or mathematics, or machine learning to even slightly reduce the computational effort needed to compute the nth value in the center column?
Well, we know that the whole 2D pattern of rule 30 is far from random. In fact, of all the 2^(m^2) conceivable m×m patches, only a much smaller number can actually occur (each patch is fixed by the roughly 3m cells it depends on along its top, so at most about 2^(3m) are possible), and in practice the number weighted by probability is much smaller. And I don’t doubt that facts like this can be used to reduce the effort to compute the center column to less than O(n^2) effort (and that would be a nice partial result). But can it be less than O(n) effort? That’s a much more difficult question.
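One can count the distinct patches empirically. A minimal Python sketch (illustrative): it evolves rule 30 on a zero background wide enough that edge effects never reach the light cone, then collects all 3×3 patches:

```python
def rule30_rows(steps, width):
    """Rule 30 from a single black cell on a width-cell zero background."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = [(row[i - 1] if i > 0 else 0)
               ^ (row[i] | (row[i + 1] if i + 1 < width else 0))
               for i in range(width)]
        rows.append(row)
    return rows

m = 3
rows = rule30_rows(60, 201)  # width > 2 * steps, so boundaries stay quiet
patches = {tuple(tuple(rows[t + dt][x + dx] for dx in range(m))
                 for dt in range(m))
           for t in range(len(rows) - m) for x in range(201 - m)}
print(len(patches), "of", 2 ** (m * m), "conceivable patches")
```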
Clearly if Problem 1 were answered in the negative, then it could be. But in a sense asking for less than O(n) computation of the center column is precisely like asking whether there are predictable regularities in it. Of course, even if one could find small-scale statistical regularities in the sequence (as answering Problem 2 in the negative would imply), these wouldn’t on their own give one a way to do more than perhaps slightly improve a constant multiplier in the speed of computing the sequence.
Could there be some systematically reduced way to compute the sequence using a neural net, which is essentially a collection of nested real-number functions? I’ve tried to find such a neural net using our current deep-learning technology, and haven’t been able to get anywhere at all.
What about statistical methods? If we could find statistical nonrandomness in the sequence, then that would imply an ability to compress the sequence, and thus some redundancy or predictability in the sequence. But I’ve tried all sorts of statistical randomness tests on the center column of rule 30, and never found any significant deviation from randomness. (And for many years, until we found a slightly more efficient rule, we used sequences from finite-size rule 30 systems as our source of random numbers in the Wolfram Language, and no legitimate “it’s not random!” bugs ever showed up.)
Statistical tests of randomness typically work by saying, “Take the supposedly random sequence and process it in some way, then see if the result is obviously nonrandom”. But what kind of processing should be done? One might see if blocks occur with equal frequency, or if correlations exist, or if some compression algorithm succeeds in doing compression. But typically batteries of tests end up seeming a bit haphazard and arbitrary. In principle one can imagine enumerating all possible tests, by enumerating all possible programs that can be applied to the sequence. But I’ve tried doing this, for example for classes of cellular automaton rules, and have never managed to detect any nonrandomness in the rule 30 sequence.
So how about using ideas from mathematics to predict the rule 30 sequence? Well, as such, rule 30 doesn’t seem connected to any well-developed area of math. But of course it’s conceivable that some mapping could be found between rule 30 and ideas, say, in an area like number theory, and that these could either help in finding a shortcut for computing rule 30, or could show that computing it is equivalent to some problem, like integer factoring, that’s thought to be fundamentally difficult.
I know a few examples of interesting interplays between traditional mathematical structures and cellular automata. For example, consider the digits of successive powers of 3 in base 2 and in base 6:
#10005
Row[Riffle[
  ArrayPlot[#, ImageSize -> {Automatic, 275}] & /@ {Table[
     IntegerDigits[3^t, 2, 159], {t, 100}],
    Table[IntegerDigits[3^t, 6, 62], {t, 100}]}, Spacer[10]]]
It turns out that in the base 6 case, the rule for generating the pattern is exactly a cellular automaton. (For base 2, there are additional long-range carries.) But although both these patterns look complex, it turns out that their mathematical structure lets us speed up making certain predictions about them.
Consider the sth digit from the right-hand edge of line n in each pattern. It’s just the sth digit in 3^n, which is given by the formula (where b is the base, here 2 or 6) Mod[Quotient[3^n, b^s], b]. But how easy is it to evaluate this formula? One might think that to compute 3^n one would have to do n multiplications. But this isn’t the case: instead, one can for example build up 3^n using repeated squaring, with about log(n) multiplications. That this is possible is a consequence of the associativity of multiplication. There’s nothing obviously like that for rule 30, but it’s always conceivable that some mapping to a mathematical structure like this could be found.
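In Python, the repeated-squaring idea is exactly what the built-in three-argument pow provides. A sketch (illustrative; the function name is mine) of getting the sth base-b digit of 3^n without doing n multiplications:

```python
def power_digit(n, b, s):
    """The s-th base-b digit (s = 0 is least significant) of 3^n,
    i.e. Mod[Quotient[3^n, b^s], b], computed with modular
    exponentiation (~log n multiplications rather than n of them)."""
    return pow(3, n, b ** (s + 1)) // b ** s

# brute-force cross-check against the direct formula
checks = all(power_digit(n, b, s) == (3 ** n // b ** s) % b
             for n in (5, 50, 500) for b in (2, 6) for s in (0, 3, 10))
print(checks)
```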
Talking of mathematical structure, it’s worth mentioning that there are more formula-like ways to state the basic rule for rule 30. For example, taking the values of three adjacent cells to be p, q, r, the basic rule is just p ⊻ (q ∨ r), or Xor[p, Or[q, r]]. With numerical cell values 0 and 1, the basic rule is just Mod[p + q + r + q r, 2]. Do these forms help? I don’t know. But, for example, it’s remarkable that in a sense all the complexity of rule 30 comes from the presence of that one little nonlinear q r term, for without that term, one would have rule 150, about which one can develop a complete algebraic theory using quite traditional mathematics.
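It’s straightforward to confirm that these two forms agree with each other and assemble to rule number 30; a Python sketch (illustrative):

```python
outputs = {}
for p in (0, 1):
    for q in (0, 1):
        for r in (0, 1):
            xor_form = p ^ (q | r)              # Xor[p, Or[q, r]]
            mod_form = (p + q + r + q * r) % 2  # Mod[p + q + r + q r, 2]
            assert xor_form == mod_form
            outputs[(p, q, r)] = xor_form

# bit i of the rule number is the output for the neighborhood
# whose three cells read as the binary number i
rule_number = sum(outputs[((i >> 2) & 1, (i >> 1) & 1, i & 1)] << i
                  for i in range(8))
print(rule_number)  # 30
```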
To work out n steps in the evolution of rule 30, one’s effectively got to repeatedly compose the basic rule. And so far as one can tell, the symbolic expressions that arise just get more and more complicated, and don’t show any sign of simplifying in such a way as to save computational work.
In Problem 3, we’re talking about the computational effort to compute the nth value in the center column of rule 30, and asking if it can be less than O(n). But imagine that we have a definite algorithm for doing the computation. For any given n, we can see what computational resources it uses. Say the result is r[n]. Then what we’re asking is whether r[n] is less than “big O” of n, or whether MaxLimit[r[n]/n, n → ∞] is finite.

Stating the problems more formally (with c[t] denoting the list of the first t values in the center column):

Problem 1: Does the center column always remain nonperiodic?

#10005
∄_(p, i) ∀_(t, t > i) c[t + p] == c[t]
or
#10005
NotExists[{p, i}, ForAll[t, t > i, c[t + p] == c[t]]]
or “there does not exist a period p and an initial length i such that for all t with t>i, c[t + p] equals c[t]”.
Problem 2: Does each color of cell occur on average equally often in the center column?
#10005
\!\(\*UnderscriptBox[\(\[Limit]\), \(t\*
UnderscriptBox["\[Rule]",
TemplateBox[{},
"Integers"]]\[Infinity]\)]\) Total[c[t]]/t == 1/2
or
#10005
DiscreteLimit[Total[c[t]]/t, t > Infinity] == 1/2
or “the discrete limit of Total[c[t]]/t as t → ∞ is 1/2”.
Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?
Define machine[m] to be a machine parametrized by m (for example TuringMachine[...]), and let machine[m][n] give {v, t}, where v is the output value, and t is the amount of computational effort taken (e.g. number of steps). Then the problem can be formulated as:
#10005
\!\(
\*SubscriptBox[\(\[NotExists]\), \(m\)]\((
\*SubscriptBox[\(\[ForAll]\), \(n\)]\(\(machine[m]\)[n]\)[[1]] ==
Last[c[n]]\ \[And] \
\*UnderscriptBox[\(\[MaxLimit]\), \(n -> \[Infinity]\)]
\*FractionBox[\(\(\(machine[m]\)[n]\)[[
2]]\), \(n\)] < \[Infinity])\)\)
or “there does not exist a machine m which for all n gives the nth value of the center column, and for which the lim sup of the amount of computational effort spent, divided by n, is finite”. (Yes, one should also require that m be finite, so the machine’s rule can’t just store the answer.)
The Formal Character of Solutions
Before we discuss the individual problems, an obvious question to ask is what the interdependence of the problems might be. If the answer to Problem 3 is negative (which I very strongly doubt), then that holds out the possibility of simple algorithms or formulas from which the answers to Problems 1 and 2 might become straightforward. If the answer to Problem 3 is affirmative (as I strongly suspect), then it implies that the answer to Problem 1 must also be affirmative. The contrapositive also holds: if the answer to Problem 1 is negative, then the answer to Problem 3 must also be negative.
If the answer to Problem 1 is negative, so that there is some periodic sequence that appears in the center column, then if one explicitly knows that sequence, one can immediately answer Problem 2. One might think that answering Problem 2 in the negative would imply something about Problem 3. And, yes, unequal probabilities for black and white would imply compression by a constant factor, in a Shannon-information sense. But to compute the value with less than O(n) resources, and therefore to answer Problem 3 in the negative, requires that one be able to identify, in a sense, infinitely more compression.
So what does it take to establish the answers to the problems?
If Problem 1 is answered in the negative, then one can imagine explicitly exhibiting the pattern generated by rule 30 at some known step and being able to see the periodic sequence in the center. Of course, Problem 1 could still be answered in the negative, but less constructively. One might be able to show that eventually the sequence has to be periodic, but not know even any bound on where this might happen. If Problem 3 is answered in the negative, a way to do this is to explicitly give an algorithm (or, say, a Turing machine) that does the computation with less than O(n) computational resources.
But let’s say one has such an algorithm. One still has to prove that for all n, the algorithm will correctly reproduce the nth value. This might be easy. Perhaps there would just be a proof by induction or some such. But it might be arbitrarily hard. For example, it could be that for most n, the running time of the algorithm is clearly less than n. But it might not be obvious that the running time will always even be finite. Indeed, the halting problem for the algorithm might simply be undecidable. But just showing that a particular algorithm doesn’t halt for a given n doesn’t really tell one anything about the answer to the problem. For that one would have to show that there’s no algorithm at all that will successfully halt in less than O(n) time.
The mention of undecidability brings up an issue, however: just what axiom system is one supposed to use to answer the problems? For the purposes of the Prize, I’ll just say “the traditional axioms of standard mathematics”, which one can assume are Peano arithmetic and/or the axioms of set theory (with or without the continuum hypothesis).
Could it be that the answers to the problems depend on the choice of axioms, or even that they’re independent of the traditional axioms (in the sense of Gödel’s incompleteness theorem)? Historical experience in mathematics makes this seem extremely unlikely, because, to date, essentially all natural problems in mathematics seem to have turned out to be decidable in the (sometimes rather implicit) axiom system that’s used in doing the mathematics.
In the computational universe, though, freed from the bounds of historical math tradition, it’s vastly more common to run into undecidability. And, actually, my guess is that a fair fraction of long-unsolved problems even in traditional mathematics will also turn out to be undecidable. So that definitely raises the possibility that the problems here could be independent of at least some standard axiom systems.
OK, but assume there’s no undecidability around, and one’s not dealing with the few cases in which one can just answer a problem by saying “look at this explicitly constructed thing”. Well, then to answer the problem, we’re going to have to give a proof.
In essence what drives the need for proof is the presence of something infinite. We want to know something for any n, even infinitely large, etc. And the only way to handle this is to represent things symbolically (“the symbol Infinity means infinity”, etc.), and apply formal rules to everything, defined by the axioms in the underlying axiom system one’s assuming.
In the best case, one might be able to just explicitly exhibit that series of rule applications in such a way that a computer can immediately verify that they’re correct. Perhaps the series of rule applications could be found by automated theorem proving (as in FindEquationalProof). More likely, it might be constructed using a proof assistant system.
It would certainly be exciting to have a fully formalized proof of the answer to any of the problems. But my guess is that it’ll be vastly easier to construct a standard proof of the kind human mathematicians traditionally do. What is such a proof? Well, it’s basically an argument that will convince other humans that a result is correct.
There isn’t really a precise definition of that. In our step-by-step solutions in Wolfram|Alpha, we’re effectively proving results (say in calculus) in such a way that students can follow them. In an academic math journal, one’s giving proofs that successfully get past the peer review process for the journal.
My own guess would be that if one were to try to formalize essentially any nontrivial proof in the math literature, one would find little corners that require new results, though usually ones that wouldn’t be too hard to get.
How can we handle this in practice for our prizes? In essence, we have to define a computational contract for what constitutes success, and when prize money should be paid out. For a constructive proof, we can get Wolfram Language code that can explicitly be run on any sufficiently large computer to establish the result. For formalized proofs, we can get Wolfram Language code that can run through the proof, validating each step.
But what about for a “human proof”? Ultimately we have no choice but to rely on some kind of human review process. We can ask multiple people to verify the proof. We could have some blockchain-inspired scheme where people stake the correctness of the proof, then if one eventually gets consensus (whatever this means) one pays out to people some of the prize money, in proportion to their stake. But whatever is done, it’s going to be an imperfect, societal result, like almost all of the pure mathematics that’s so far been done in the world.
What Will It Take?
OK, so for people interested in working on the Problems, what skills are relevant? I don’t really know. It could be discrete and combinatorial mathematics. It could be number theory, if there’s a correspondence with number-based systems found. It could be some branch of algebraic mathematics, if there’s a correspondence with algebraic systems found. It could be dynamical systems theory. It could be something closer to mathematical logic or theoretical computer science, like the theory of term rewriting systems.
Of course, it could be that no existing towers of knowledge (say, in branches of mathematics) will be relevant to the problems, and that to solve them will require building “from the ground up”. And indeed that’s effectively what ended up happening in the solution for my 2,3 Turing Machine Prize in 2007.
I’m a great believer in the power of computer experiments, and of course it’s on the basis of computer experiments that I’ve formulated the Rule 30 Prize Problems. But there are definitely more computer experiments that could be done. So far we know a billion elements in the center column sequence. And so far the sequence doesn’t seem to show any deviation from randomness (at least based on tests I’ve tried). But maybe at a trillion elements (which should be well within range of current computer systems), or a quadrillion elements, or more, it eventually will, and it’s definitely worth doing the computations to check.
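One simple experiment along these lines (a sketch of mine, not the tests the author ran; the prefix length and block size are arbitrary) is to count how often each 8-bit block occurs in a prefix of the center column; for a random-looking sequence the counts should be roughly uniform:

```python
from collections import Counter

def center_column_bits(n):
    """First n center-column values of rule 30, row packed into one integer."""
    a, out = 1, []
    for i in range(n):
        out.append((a >> i) & 1)    # center cell of step i sits at bit i
        a ^= (a << 1) | (a << 2)    # rule 30: p XOR (q OR r) on the whole row
    return out

def block_counts(bits, k):
    """Occurrences of each overlapping k-bit block in the sequence."""
    return Counter(tuple(bits[t:t + k]) for t in range(len(bits) - k + 1))
```

A serious randomness study would of course use a proper statistical battery rather than eyeballing counts, but even this crude tally is consistent with the equal-frequency behavior described above.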
The direct way to compute n elements in the center column is to run rule 30 for n steps, using up to n cells of memory at the intermediate stages. The actual computation is quite well optimized in the Wolfram Language. Running on my desktop computer, it takes less than 0.4 seconds to compute 100,000 elements:
CellularAutomaton[30, {{1}, 0}, {100000, {{0}}}]; // Timing
Internally, this is using the fact that rule 30 can be expressed as Xor[p, Or[q, r]], and implemented using bitwise operations on whole words of data at a time. Using explicit bitwise operations on long integers takes about twice as long as the builtin CellularAutomaton function:
Module[{a = 1},
 Table[BitGet[a, a = BitXor[a, BitOr[2 a, 4 a]]; i - 1], {i,
   100000}]]; // Timing
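For readers outside the Wolfram Language, the same bit-packing trick carries over directly to Python’s arbitrary-precision integers (my own transcription, not code from the post):

```python
def center_column(n):
    """Rule 30 center column via word-wide bit operations: the whole row
    lives in one arbitrary-precision integer, and
    a ^= (a << 1) | (a << 2) is BitXor[a, BitOr[2 a, 4 a]], applying
    p XOR (q OR r) to every cell at once.  The step-i center cell is bit i."""
    a, out = 1, []
    for i in range(n):
        out.append((a >> i) & 1)
        a ^= (a << 1) | (a << 2)
    return out
```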
But these results are from single CPU processors. It’s perfectly possible to imagine parallelizing across many CPUs, or using GPUs. One might imagine that one could speed up the computation by effectively caching the results of many steps in rule 30 evolution, but the fact that across the rows of the rule 30 pattern all blocks appear to occur with at least roughly equal frequency makes it seem as though this would not lead to significant speedup.
Solving some types of math-like problems seems pretty certain to require deep knowledge of high-level existing mathematics. For example, it seems quite unlikely that there can be an elementary proof of Fermat’s last theorem, or even of the four-color theorem. But for the Rule 30 Prize Problems it’s not clear to me. Each of them might need sophisticated existing mathematics, or they might not. They might be accessible only to people professionally trained in mathematics, or they might be solvable by clever “programming-style” or “puzzle-style” work, without sophisticated mathematics.
Generalizations and Relations
Sometimes the best way to solve a specific problem is first to solve a related problem (often a more general one) and then come back to the specific problem. And there are certainly many problems related to the Rule 30 Prize Problems that one can consider.
For example, instead of looking at the vertical column of cells at the center of the rule 30 pattern, one could look at a column of cells in a different direction. At 45°, it’s easy to see that any such sequence must be periodic. On the left the periods increase very slowly; on the right they increase rapidly. But what about other angles?
Or what about looking at rows of cells in the pattern? Do all possible blocks occur? How many steps is it before any given block appears? The empirical evidence doesn’t show any deviation from blocks occurring at random, but obviously, for example, successive rows are highly correlated.
What about different initial conditions? There are many dynamical systems–style results about the behavior of rule 30 starting with equal probability from all possible infinite initial conditions. In this case, for example, it’s easy to show that all possible blocks occur with equal frequency, both at a given row, and in a given vertical column. Things get more complicated if one asks for initial conditions that correspond, for example, to all possible sequences generated by a given finite state machine, and one could imagine that from a sequence of results about different sets of possible initial conditions, one would eventually be able to say something about the case of the single black cell initial condition.
Another straightforward generalization is just to look not at a single black cell initial condition, but at other special initial conditions. An infinite periodic initial condition will always give periodic behavior (that’s the same as one gets in a finite-size region with periodic boundary conditions). But one can, for example, study what happens if one puts a single defect in the periodic pattern:
GraphicsRow[(ArrayPlot[
     CellularAutomaton[30,
      MapAt[1 - # &, Flatten[Table[#, Round[150/Length[#]]]], 50],
      100]] &) /@ {{1, 0}, {1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0}, {1,
    0, 0, 0, 0, 0, 0}, {1, 1, 1, 0, 0}}]
One can also ask what happens when one has not just a single black cell, but some longer sequence in the initial conditions. How does the center column change with different initial sequences? Are there finite initial sequences that lead to simpler center columns?
Or are there infinite initial conditions generated by other computational systems (say substitution systems) that aren’t periodic, but still give somehow simple rule 30 patterns?
Then one can imagine going beyond rule 30. What happens if one adds longer-range exceptions to the rules? When do extensions to rule 30 show behavior that can be analyzed in one way or another? And can one then see the effect of removing the exceptions in the rule?
Of course, one can consider rules quite different from rule 30 as well, and perhaps hope to develop intuition or methods relevant to rule 30 by looking at other rules. Even among the 256 two-color, nearest-neighbor rules, there are others that show complex behavior starting from a simple initial condition:
Row[Riffle[
  Labeled[ArrayPlot[CellularAutomaton[#, {{1}, 0}, {150, All}],
      PixelConstrained -> 1, Frame -> False],
     Style[Text[StringTemplate["rule ``"][#]], 12],
     LabelStyle -> Opacity[.5]] & /@ {45, 73}, Spacer[8]]]
And if one looks at larger numbers of colors and larger neighborhoods, one can find an infinite number of examples. There’s all sorts of behavior that one sees. And, for example, given any particular sequence, one can search for rules that will generate it as their center column. One can also try to classify the center-column sequences that one sees, perhaps identifying a general class “like rule 30” about which global statements can be made.
But let’s discuss the specific Rule 30 Prize Problems. To investigate the possibility of periodicity in rule 30 (as in Problem 1), one could study lots of different rules, looking for examples with very long periods, or very long transients, and try to use these to develop an intuition for how and when these can occur.
To investigate the equal-frequency phenomenon of Problem 2, one can look at different statistical features, and see, both in rule 30 and across different rules, when it’s possible to detect regularity.
For Problem 3, one can start looking at different levels of computational effort. Can one find the nth value with computational effort O(n ) for any
5. Deploying Bandpass Filters Using the Wolfram Language Microcontroller Kit (Tue., Sept. 24)
Real-time filters work like magic. Usually out of sight, they clean data to make it useful for the larger system they are part of, and sometimes even for human consumption. A fascinating thing about these filters is that they don’t have a big-picture perspective. They work wonders with only a small window into the data that is streaming in. On the other hand, if I had a stream of numbers flying across my screen, I would at the very least need to plot it to make sense of the data. These types of filters are very simple as well.
Take a basic low-pass filter based on two weights, 49/50 and 1/50. The filter computes its next output as a weighted sum of the current output and input. Specifically, output_{k+1} = 49/50 output_{k} + 1/50 input_{k}. So with a few elementary operations, it essentially strips away the high-frequency component on a properly sampled input signal:
With[{u = Table[Sin[t/10] + Sin[10 t], {t, 0, 150, 1/10.}]},
 ListLinePlot[{u, RecurrenceFilter[{{1, -(49/50)}, {1/50}}, u]},
  PlotLegends -> {"original signal", "filtered signal"}]]
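The recurrence itself is tiny in any language. Here is a minimal Python sketch of the same two-weight filter (my own, for illustration, not part of the original post):

```python
def lowpass(samples, a=49/50, b=1/50):
    """Two-weight low-pass from the text: y[k+1] = a*y[k] + b*u[k]."""
    y, out = 0.0, []
    for u in samples:
        y = a * y + b * u
        out.append(y)
    return out
```

Note that the DC gain is b/(1 − a) = (1/50)/(1/50) = 1, so slowly varying components pass through at full amplitude while fast oscillations are averaged away.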
In hindsight, this output makes sense. The filter relies only a bit on the current input, and hence misses the high-frequency portion of the signal, effectively getting rid of it. Any slowly changing component has already been captured in the output.
The filter weights need not be conjured up or arrived at by trial and error. They can be obtained in a systematic fashion using the signal processing functions in the Wolfram Language. (And that’s what I will be doing in the next section for a couple of nontrivial filters.) I can also compute the frequency response of the filter, which will tell me how it will shift and smother or amplify a signal. This gives a ballpark idea of what to expect from the real-time responses of the filter. I can simulate its real-time response to various signals as well. But how does the filter perform in real time? Can I see the shifting and smothering? And how does this compare to what is predicted by the frequency response?
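For the simple two-weight filter above, the frequency response can even be written in closed form as H(e^{iω}) = b/(e^{iω} − a). A small Python sketch (mine, separate from the Wolfram Language workflow used below):

```python
import cmath
import math

def response(omega, a=49/50, b=1/50):
    """Frequency response H(e^{i*omega}) = b / (e^{i*omega} - a) of the
    recurrence y[k+1] = a*y[k] + b*u[k]; omega is in radians per sample."""
    return b / (cmath.exp(1j * omega) - a)
```

Evaluating the magnitude at ω = 0 gives unity gain, while at the Nyquist frequency (ω = π) the gain is b/(1 + a) ≈ 0.01, which is exactly the smothering seen in the plot above.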
To investigate all this, I will pick some slightly more involved filters—namely a bandpass Butterworth and a Chebyshev 1 filter—and deploy them to an Arduino Nano. To go about this, I need several things. Luckily, they are all now available in the Wolfram Language.
The functionality to analyze and design filters has existed in the Wolfram Language for quite some time now. The next link in the workflow is the C or Arduino filter code that needs to be deployed to the microcontroller. Thanks to the Microcontroller Kit, which was released with Version 12, I can generate and deploy the code directly from the Wolfram Language. It will automatically generate the needed microcontroller code, sparing me the wearisome task of having to write the code and make sure that I have the coefficients correct. And finally, I use the Wolfram Device Framework to transmit and receive the data from the Nano.
In the first part of this blog post, I will compute, analyze and deploy the filters. In the second part, I will acquire and analyze the filtered data to visualize the responses and evaluate the performance of the filters.
Compute, Analyze and Deploy the Filter
I start off by creating a function to compute a bandpass filter that will pass signals with frequencies between 2 Hz and 5 Hz and attenuate signals of 1 Hz and 10 Hz by 30 dB:
tfm[filter_, s_: s] :=
 filter[{"Bandpass", {1, 2, 5, 10}, {30, 1}}, s] // N //
  TransferFunctionExpand // Chop
With this, I obtain the corresponding Butterworth and Chebyshev 1 bandpass filters:
{brw, cbs} = tfm /@ {ButterworthFilterModel, Chebyshev1FilterModel};
Grid[{{"Butterworth filter", brw}, {"Chebyshev 1 filter", cbs}},
 Background -> {None, {Lighter[#, 0.6] & /@ {cB, cC}}}, Frame -> True,
 Spacings -> {1, 1}, Alignment -> Left]
The Bode magnitude plot attests that frequencies between 2 Hz and 5 Hz will be passed through, and frequencies outside this range will be attenuated, with an attenuation of around 35 dB at 1 Hz and 10 Hz:
{bMagPlot, bPhasePlot} =
  BodePlot[{brw, cbs}, {0.5, 18}, PlotLayout -> "List",
   GridLines -> Table[{{1, 2, 5, 10}, Automatic}, {2}],
   FrameLabel -> {{"frequency (Hz)",
      "magnitude (dB)"}, {"frequency (Hz)", "phase (deg)"}},
   FrameTicks ->
    Table[{{Automatic, Automatic}, {{1, 2, 5, 10}, Automatic}}, {2}],
   ImageSize -> Medium, PlotStyle -> {cB, cC, cI},
   PlotTheme -> "Detailed",
   PlotLegends -> {"Butterworth", "Chebyshev 1"}];
The phase plot shows that at around 3 Hz for the Butterworth filter and 2 Hz for the Chebyshev, there will be no phase shift. For all other frequencies, there will be varying amounts of phase shifts:
Later, I will put these frequency responses side by side with the filtered response from the microcontroller to check if they add up.
For now, I will simulate the response of the filters to a signal with three frequency components. The second component lies in the bandpass range, while the other two lie outside it:
inpC = Sin[0.75 t] + Sin[2.5 t] + Sin[10 t];
The responses have a single frequency component, and the two frequencies outside the bandpass range have in fact been stripped away:
outC = Table[OutputResponse[sys, inpC, {t, 0, 15}], {sys, {brw, cbs}}];
Plot[{outC, inpC}, {t, 0, 15},
 PlotLegends -> {"Butterworth", "Chebyshev 1", "Input"},
 PlotStyle -> {cB, cC, cI}, PlotTheme -> "Detailed"]
I then discretize the filters and put them together into one model:
StateSpaceModel[ToDiscreteTimeModel[#, sp]] & /@ {brw, cbs};
sysD = NonlinearStateSpaceModel[
    SystemsModelParallelConnect[Sequence @@ %, {{1, 1}}, {}]] /.
   Times[1.`, v_] :> v // Chop
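The mechanics of continuous-to-discrete conversion can be sketched on a toy example. One standard mapping is the bilinear (Tustin) transform; ToDiscreteTimeModel supports several discretization methods, and I am not claiming this is the one used for these bandpass filters, just illustrating the idea on a first-order low-pass (my own Python sketch):

```python
def tustin_first_order(a, T):
    """Bilinear (Tustin) discretization of H(s) = a/(s + a), a unit-DC-gain
    first-order low-pass, with sampling period T.  Substituting
    s -> (2/T)(z - 1)/(z + 1) gives the difference equation
    y[k] = a1*y[k-1] + b0*u[k] + b1*u[k-1]; returns (b0, b1, a1)."""
    k = 2.0 / T
    return a / (k + a), a / (k + a), (k - a) / (k + a)
```

A quick sanity check of any discretization is that the DC gain of the difference equation, (b0 + b1)/(1 − a1), matches the continuous-time DC gain (here, 1).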
For good measure, I simulate and verify the response of the discretized model as well:
With[{tmax = 15}, With[{inpD = Table[inpC, {t, 0, tmax, sp}]},
  ListLinePlot[Join[OutputResponse[sysD, inpD], {inpD}],
   PlotLegends -> {"Butterworth", "Chebyshev 1", "Input"},
   PlotStyle -> {cB, cC, cI}, PlotTheme -> "Detailed",
   PlotMarkers -> {Automatic, Scaled[0.025]}]]]
And I wrap up this first stage by deploying the filter and setting up the Arduino to send and receive the signals over the serial pins:
\[ScriptCapitalM] =
 MicrocontrollerEmbedCode[
  sysD, <|"Target" -> "ArduinoNano", "Inputs" -> "Serial",
   "Outputs" -> {"Serial", "Serial"}|>, "/dev/cu.usbserial-A106PX6Q"]
The port name /dev/cu.usbserial-A106PX6Q will not work for you. If you are following along, you will have to change it to the correct value. You can figure it out using Device Manager on Windows, or by searching for file names of the form /dev/cu.usb* and /dev/ttyUSB* on Mac and Linux systems, respectively.
At this point, I can connect any other serial device to send and receive the data. I will use the Device Framework in Mathematica to do that, as its notebook interface provides a great way to visualize the data in real time.
Acquire and Display the Data
To set up the data transfer, I begin by identifying the start, delimiter and end bytes:
{sB, dB, eB} =
 Lookup[\[ScriptCapitalM]["Serial"], {"StartByte", "DelimiterByte",
   "EndByte"}]
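The framing itself is simple to mimic. Here is a Python sketch of how such a byte stream could be split into frames (my own illustration; the specific byte values are made up, not the ones MicrocontrollerEmbedCode assigns):

```python
def parse_frames(stream, start=0x01, delim=0x2C, end=0x04):
    """Hypothetical framing: split a raw serial byte stream into frames
    bounded by start/end bytes; fields inside a frame are separated by
    the delimiter byte (here 0x2C, a comma)."""
    frames, current, inside = [], bytearray(), False
    for byte in stream:
        if byte == start:
            inside, current = True, bytearray()   # begin a fresh frame
        elif byte == end and inside:
            frames.append(bytes(current).split(bytes([delim])))
            inside = False                        # frame complete
        elif inside:
            current.append(byte)
    return frames
```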
Then I create a scheduled task that reads the filtered output signals and sends the input signal over to the Arduino, and runs at exactly the same sampling period as the discretized filters:
i1 = i2 = 1;
yRaw = y1 = y2 = u1 = u2 = {};
task1 = ScheduledTask[
   If[DeviceExecute[dev, "SerialReadyQ"],
    If[i1 > len1, i1 = 1]; If[i2 > len2, i2 = 1];
    AppendTo[yRaw, DeviceReadBuffer[dev, "ReadTerminator" -> eB]];
    AppendTo[u1, in1[[i1++]]];
    AppendTo[u2, in2[[i2++]]];
    DeviceWrite[dev, sB];
    DeviceWrite[dev, ToString[Last[u1] + Last[u2]]];
    DeviceWrite[dev, eB]
    ],
   sp];
I also create a second and less frequent scheduled task that parses the data and discards the old values:
task2 = ScheduledTask[
   yRaw1 = yRaw;
   yRaw = Drop[yRaw, Length@yRaw1];
   {y1P, y2P} = Transpose[parseData /@ yRaw1];
   y1 = Join[y1, y1P];
   y2 = Join[y2, y2P];
   {u1, u2, y1, y2} = drop /@ {u1, u2, y1, y2};,
   3 sp];
Now I am ready to actually send and receive the data, so I open a connection to the Arduino:
dev = DeviceOpen["Serial", "/dev/cu.usbserial-A106PX6Q"]
I then generate the input signals and submit the scheduled tasks to the kernel:
signals[w1, g1, w2, g2];
{taskObj1, taskObj2} = SessionSubmit /@ {task1, task2};
The input signals are of the form g1 Sin[w1 t] and g2 Sin[w2 t]. The utility function signals generates a cyclically padded list of the sampled values for each input. The code for this and the other utility functions is available in the downloadable notebook.
At this point, the data is going back and forth between my Mac and the Arduino. To visualize the data and control the input signals, I create a panel. From the panel, I control the frequency and magnitude of the input signals. I plot the input and filtered signals, and also the frequency response of the filters. The frequency response plots have lines showing the expected magnitude and phase of the filtered signals, which I can verify on the signal plots:
DynamicModule[{
  iPart, ioPart,
  plotOpts = {ImageSize -> Medium, GridLines -> Automatic,
    Frame -> True, PlotTheme -> "Business"}}, Panel[
  Column[{
    Grid[{{
       Panel[
        Column[{
          Style["Input signal 1", Bold, cI1],
          Grid[{{"freq.",
             Slider[
              Dynamic[w1, {None, (w1 = #) &, signals[#, g1, w2, g2] &}],
              {0, 7, 0.25}, Appearance -> "Labeled"]}}],
          Grid[{{"mag.",
             Slider[
              Dynamic[g1, {None, (g1 = #) &, signals[w1, #, w2, g2] &}],
              {0, 5, 0.25}, Appearance -> "Labeled"]}}]}]],
       Panel[
        Column[{
          Style["Input signal 2", Bold, cI2],
          Grid[{{"freq.",
             Slider[
              Dynamic[w2, {None, (w2 = #) &, signals[w1, g1, #, g2] &}],
              {0, 7, 0.25}, Appearance -> "Labeled"]}}],
          Grid[{{"mag.",
             Slider[
              Dynamic[g2, {None, (g2 = #) &, signals[w1, g1, w2, #] &}],
              {0, 5, 0.25}, Appearance -> "Labeled"]}}]}]]}}],
    Grid[{{
       Dynamic[
        ListLinePlot[
         Part[{u1, u2, u1 + u2}, iPart],
         PlotStyle -> Part[{cI1, cI2, cI}, iPart],
         PlotRange -> {All, (g1 + g2) {-1, 1}}, plotOpts],
        TrackedSymbols :> {u1, u2},
        Initialization :> (iPart = {3}; g1 = (g2 = 0); Null)],
       Dynamic[
        ListLinePlot[
         Part[Join[{y1, y2}, {u1 + u2}], ioPart],
         PlotStyle -> Part[{cB, cC, cI}, ioPart],
         PlotRange -> {All, (g1 + g2) {-1, 1}},
         FrameTicks -> {{None, All}, {Automatic, Automatic}},
         plotOpts], TrackedSymbols :> {y1, y2, u1, u2},
        Initialization :> (ioPart = {1, 2})]}, {
       Grid[{{
          CheckboxBar[
           Dynamic[iPart],
           Thread[Range[3] -> ""], Appearance -> "Vertical"], lI}}],
       Grid[{{
          CheckboxBar[
           Dynamic[ioPart],
           Thread[Range[3] -> ""], Appearance -> "Vertical"], lIO}}]}, {
       Labeled[
        Dynamic[
         BodePlot[{brw, cbs}, {0.5, 18}, PlotLayout -> "Magnitude",
          PlotLegends -> None, GridLines -> Automatic,
          ImageSize -> Medium, PlotStyle -> {cB, cC},
          PlotTheme -> "Detailed", Epilog -> Join[
            mLine[w1, cI1],
            mLine[w2, cI2]]], TrackedSymbols :> {w1, w2},
         SaveDefinitions -> True], "Bode magnitude"],
       Labeled[
        Dynamic[
         BodePlot[{brw, cbs}, {0.5, 18}, PlotLayout -> "Phase",
          PlotLegends -> None, GridLines -> Automatic,
          ImageSize -> Medium, PlotStyle -> {cB, cC},
          PlotTheme -> "Detailed", Epilog -> Join[
            pLine[w1, cI1],
            pLine[w2, cI2]]], TrackedSymbols :> {w1, w2},
         SaveDefinitions -> True], "Bode phase"]}},
     Frame -> {All, {True, None, True}},
     FrameStyle -> Lighter[Gray, 0.6]]}, Center]]]
Since the Arduino needs to be up and running to see the panel update dynamically, I’ll show you some examples of my results here:
Finally, before disconnecting the Arduino, I remove the tasks and close the connection to the device:
TaskRemove /@ {taskObj1, taskObj2};
DeviceClose[dev]
It’s interesting to see in real time how the response of the deployed filter closely matches that of the analog version.
I think it’s extremely convenient to have the design and the microcontroller code so closely coupled. Looking at just the microcontroller code, the coefficients of the filters are abstruse, especially compared to those of the simple first-order low-pass filter. The design specifications from which they were obtained, however, are very concrete; having both in one unified place makes the whole setup better suited for further analysis and experimentation.
The entire notebook with the filter design, analysis, code generation and visualization is available for download. Of course, you need an Arduino Nano or a similar microcontroller. And with just a few small tweaks—in some cases just changing the target and port name—you can replicate it on a whole host of other microcontrollers. Happy tinkering!
6. From Data to Insights: An Online Course for Learning the Multiparadigm Workflow (Tue., Sept. 17)
Our interactive Multiparadigm Data Science (MPDS) course has been up at Wolfram U for over a month now, and we’re pretty satisfied with the results so far. Hundreds of people have started the course—including students from our first Data Science Boot Camp, who joined us at Wolfram headquarters for a three-week training camp. Thanks to the success of the boot camp, we have also had several projects submitted for advanced MPDS certification, which will soon be available within the interactive course.
But what exactly does it mean to be a practitioner of MPDS? And how might the multiparadigm approach improve my computational projects? To find out, I decided to try this free course for myself.
About the Course-Taker
My background is pretty broad—I’ve done some technical support, audio engineering and web development, and my BS is in computer science and physics. Though I’ve had some experience with error propagation, linear regression and plenty of programming projects related to data analysis, I have never thought of myself as a data scientist.
As a technical writer for Wolfram, I get to play around with Wolfram Language functions here and there. I have had the pleasure of working on some fascinating pieces that really got my gears turning (text analytics and geovisualization are currently my favorite coding areas). And recently, I’ve become pretty well acquainted with MPDS and how our technology enables it. The approach resonates with me, and I’ve been thinking about getting more into data science.
I don’t normally have the right schedule for traditional classes or guided webinars. This open, interactive course seemed just right for my situation and learning goals. And it’s free, so of course I jumped at the chance.
Starting with a Concrete Plan
I’ve seen and used a lot of data science functionality before, but mainly in isolation. Getting quick, highlevel output is easy with the Wolfram Language; for me, the challenge is usually figuring out what to do next. So I appreciated having the full workflow laid out up front:
Mapping the stages out this way helps me better understand which functions are useful for which steps. And having a repeatable process makes data science seem more like something I can achieve. Sometimes starting with the right question and following a consistent process can do more to solve a problem than an entire collection of neural nets.
At the same time, the opening section emphasizes that the process is iterative: later stages can generate new information to feed back into the earlier ones. Though continually “starting over” might seem counterintuitive, I see this process as comparable to the Agile approach I was taught for software development. Besides, scientific discovery comes from trying a lot of different things. Why should data science be any different?
Course Highlights
This was my first time taking an online course, so I was glad to see both a transcript and an interactive scratch notebook. I think I finished much faster (and probably learned more) because those made it easy to follow along. As a relentless tinkerer, I found the scratch notebook extra helpful in the more code-heavy sections, because it allowed me to experiment until I understood exactly what each piece of code was doing:
The variety of examples throughout the course was useful for me as well. Seeing MPDS applied to a range of subjects gave me a lot of food for thought. Beyond that, it helped solidify the idea that methods and techniques don’t need to be subjectspecific. For instance, while I’m familiar with using regression analysis as part of a lab experiment, I had never considered how it might apply to estimating a credit score:
Though it was by far the longest section in the course, the fourth section, Assemble a Multiparadigm Toolkit, turned out to be the most informative part for me. As Abrita’s post points out, there are a lot of different tools available, and this gave me a great overview without getting bogged down in unnecessary detail. I especially like that these later videos refer back to the Question stage, pointing out the different questions that might be answered by each technique described. That led to a few big “Aha!” moments for me.
I also got a lot out of the final section, Getting the Message Across, which has some excellent examples of easy Wolfram Language deployments. Report generation is one functional area I’ve always been curious to explore but have never managed to get the hang of on my own. The examples here worked well as springboards for creating my own reports:
The quiz questions are mostly about Wolfram Language functionality, so I found them pretty straightforward. Even for those less familiar with the language, the videos answer the questions directly. And if you’re having difficulty, you can always review the content and try again. Once I completed all the videos and quizzes, I earned a certificate:
Hungry for More
After finishing the course, I walked away with a lot of questions—but in a good way! I found myself thinking about how I could apply my new understanding to my own computational ideas. How can I represent my data better? What happens if I swap classification methods? Will a different visualization show me new patterns?
That questioning nature is what drives successful exploration and discovery, and it’s a big part of what makes MPDS so effective. Following the multiparadigm workflow can open up new avenues for discovering all kinds of unique insights. But don’t take my word for it—try it for yourself!
7. The Ease of Wolfram|Alpha, the Power of Mathematica: Introducing Wolfram|Alpha Notebook Edition (Thu., Sept. 12)
The Next Big Step for Wolfram|Alpha
Wolfram|Alpha has been a huge hit with students. Whether in college or high school, Wolfram|Alpha has become a ubiquitous way for students to get answers. But it’s a one-shot process: a student enters the question they want to ask (say, in math) and Wolfram|Alpha gives them the (usually richly contextualized) answer. It’s incredibly useful—especially when coupled with its step-by-step solution capabilities.
But what if one doesn’t want just a one-shot answer? What if one wants to build up (or work through) a whole computation? Well, that’s what we created Mathematica and its whole notebook interface to do. And for more than 30 years that’s how countless inventions and discoveries have been made around the world. It’s also how generations of higher-level students have been taught.
But what about students who aren’t ready to use Mathematica yet? What if we could take the power of Mathematica (and what’s now the Wolfram Language), but combine it with the ease of Wolfram|Alpha?
Well, that’s what we’ve done in Wolfram|Alpha Notebook Edition.
It’s built on a huge tower of technology, but what it does is to let any student—without learning any syntax or reading any documentation—immediately build up or work through computations. Just type input the way you would in Wolfram|Alpha. But now you’re not just getting a one-shot answer. Instead, everything is in a Wolfram Notebook, where you can save and use previous results, and build up or work through a whole computation:
The Power of Notebooks
Being able to use Wolfram|Alpha-style free-form input is what opens Wolfram|Alpha Notebook Edition up to the full range of students. But it’s the use of the notebook environment that makes it so uniquely valuable for education. Because by being able to work through things in a sequence of steps, students get to really engage with the computations they’re doing.
Try one step. See what happens. Change it if you want. Understand the output. See how it fits into the next step. And then—right there in the notebook—see how all your steps fit together to give your final results. And then save your work in the notebook, to continue—or review what you did—another time.
But notebooks aren’t just for storing computations. They can also contain text and structure. So students can use them not just to do their computations, but also to keep notes, and to explain the computations they’re doing, or the results they get:
And in fact, Wolfram Notebooks enable a whole new kind of student work: computational essays. A computational essay has both text and computation—combined to build up a narrative to which both human and computer contribute.
The process of creating a computational essay is a great way for students to engage with material they’re studying. Computational essays can also provide a great showcase of student achievement, as well as a means of assessing student understanding. And they’re not just something to produce for an assignment: they’re active computable documents that students can keep and use at any time in the future.
But students aren’t the only ones to produce notebooks. In Wolfram|Alpha Notebook Edition, notebooks are also a great medium for teachers to provide material to students. Describe a concept in a notebook, then let students explore by doing their own computations right there in the notebook. Or make a notebook defining an assignment or a test—then let the students fill in their work (and grade it right there in the notebook).
It’s very common to use Wolfram|Alpha Notebook Edition to create visualizations of concepts. Often students will just ask for the visualizations themselves. But teachers can also set up templates for visualizations, and let students fill in their own functions or data to explore for themselves.
Wolfram|Alpha Notebook Edition also supports dynamic interactive visualizations—for example, using the Wolfram Language Manipulate function. And in Wolfram|Alpha Notebook Edition students (and teachers!) can build all sorts of dynamic visualizations just using natural language:
But what if you want some more sophisticated interactive demonstration that might be hard to specify? Well, Wolfram|Alpha Notebook Edition has direct access to the Wolfram Demonstrations Project, which contains over 12,000 Demonstrations. You can ask for Demonstrations using natural language, or you can just browse the Demonstrations Project website, select a Demonstration, copy it into your Wolfram|Alpha Notebook Edition notebook, and then immediately use it there:
With Wolfram|Alpha Notebook Edition it’s very easy to create compelling content. The content can involve pure calculations or visualizations. But—using the capabilities of the Wolfram Knowledgebase—it can also involve a vast range of real-world data, whether about countries, chemicals, words or artworks. And you can access it using natural language, and work with it directly in a notebook:
Wolfram|Alpha Notebook Edition is a great tool for students to use on their own computers. But it’s also a great tool for lectures and class demonstrations (as well as for student presentations). Go to File > New > Presenter Notebook, and you’ll get a notebook that’s set up to create a Wolfram|Alpha Notebook Edition slide show:
Click Start Presentation and you can start presenting. But what you’ll have is not just a “PowerPoint-style” slide show. It’s a fully interactive, editable, computable slide show. The Manipulate interfaces work. Everything is immediately editable. And you can do computations right there during the presentation, exploring different cases, pulling in different data, and so on.
Making Code from Natural Language
We invented notebooks more than 30 years ago, and they’ve been widely used in Mathematica ever since. But while in Mathematica (and Wolfram Desktop) notebooks you (by default) specify computations in the precise syntax and semantics of the Wolfram Language, in Wolfram|Alpha Notebook Edition notebooks you instead specify them just using free-form Wolfram|Alpha-style input.
And indeed one of the key technical achievements that’s made Wolfram|Alpha Notebook Edition possible is that we’ve now developed increasingly robust natural-language-to-code technology that’s able to go from the free-form natural language input you type to precise Wolfram Language code that can be used to build up computations:
By default, Wolfram|Alpha Notebook Edition is set up to show you the Wolfram Language code it generates. You don’t need to look at this code (and you can set it to always be hidden). But—satisfyingly for me as a language designer—students seem to find it very easy to read, often actually easier than math. And reading it gives them an extra opportunity to understand what’s going on—and to make sure the computation they’ve specified is actually the one they want.
And there’s a great side effect to the fact that Wolfram|Alpha Notebook Edition generates code: through routinely being exposed to code that represents natural language they’ve entered, students gradually absorb the idea of expressing things in computational language, and the concepts of computational thinking.
If a student wants to change a computation when they’re using Wolfram|Alpha Notebook Edition, they can always edit the free-form input they gave. But they can also directly edit the Wolfram Language that’s been generated, giving them real computational language experience.
What Should I Do Next? The Predictive Interface
A central goal of Wolfram|Alpha Notebook Edition is to be completely “self-service”—so that students at all levels can successfully use it without any outside instruction or assistance. Of course, free-form input is a key part of achieving this. But another part is the Wolfram|Alpha Notebook Edition Predictive Interface—which suggests what to do next based on what students have done.
Enter a computation and you’ll typically see some buttons pop up under the input field:
These buttons will suggest directions to take. Here step-by-step solution generates an enhanced interactive version of Wolfram|Alpha Pro step-by-step functionality—all right in the notebook:
Click related computations and you’ll see suggestions for different computations you might want to do:
It suggests plotting the integrand and the integral:
It also suggests you might like to see a series expansion:
Now notice that underneath the output there’s a bar of suggestions about possible follow-on computations to do on this output. Click, for example, coefficient list to find the list of coefficients:
Now there are new suggestions. Click, for example, total to find the total of the coefficients:
The Math Experience
Wolfram|Alpha Notebook Edition has got lots of features to enhance the “math experience”. For example, click the button at the top of the notebook and you’ll get a “math keyboard” that you can use to directly enter math notation:
The Wolfram Language that underlies Wolfram|Alpha Notebook Edition routinely handles the math that’s needed by the world’s top mathematicians. But having all that sophisticated math can sometimes lead to confusions for students. So in Wolfram|Alpha Notebook Edition there are ways to say “keep the math simple”. For example, you can set it to minimize the use of complex numbers:
Wolfram|Alpha Notebook Edition also by default does things like adding constants of integration to indefinite integrals:
By the way, Wolfram|Alpha Notebook Edition by default automatically formats mathematical output in elegant “traditional textbook” form. But it always includes a little button next to each output, so you can toggle between “traditional form” and standard Wolfram Language form.
It’s quite common in doing math to have a function, and just say “I want to plot that!” But what range should you use? In Mathematica (or the Wolfram Language), you’d have to specify it. But in Wolfram|Alpha Notebook Edition there’s always an automatic range that’s picked:
But since you can see the Wolfram Language code—including the range—it’s easy to change that, and specify whatever range you want.
What if you want to get an interactive control to change the range, or to change a parameter in the function? In Mathematica (or the Wolfram Language) you’d have to write a Manipulate. But in Wolfram|Alpha Notebook Edition, you can build a whole interactive interface just using natural language:
And because in Wolfram|Alpha Notebook Edition the Manipulate computations are all running directly on your local computer, nothing is being slowed down by network transmission—and so everything moves at full speed. (Also, if you have a long computation, you can just let it keep running on your computer; there’s no timeout like in Wolfram|Alpha on the web.)
Multistep Computation
One of the important features of Wolfram|Alpha Notebook Edition is that it doesn’t just do one-shot computations; it allows you to do multistep computations that in effect involve a back-and-forth conversation with the computer, in which you routinely refer to previous results:
Often it’s enough to just talk about the most recent result, and say things like “plot it as a function of x”. But it’s also quite common to want to refer back to results earlier in the notebook. One way to do this is to say things like “the result before last”—or to use the Out[n] labels for each result. But another thing that Wolfram|Alpha Notebook Edition allows you to do is to set values of variables that you can then use throughout your session:
It’s also possible to define functions, all with natural language:
There are lots of complicated design and implementation issues that arise in dealing with multistep computations. For example, if you have a traditional result for an indefinite integral, with a constant of integration, what do you do with the constant when you want to plot the result? (Wolfram|Alpha Notebook Edition consistently handles arbitrary additive constants in plots by effectively setting them to zero.)
It can also be complicated to know what refers to what in the “conversation”. If you say “plot”, are you trying to plot your latest result, or are you asking for an interface to create a completely new plot? If you use a pronoun, as in “plot it”, then it’s potentially more obvious what you mean, and Wolfram|Alpha Notebook Edition has a better chance of being able to use its natural language understanding capabilities to figure it out.
The World with Wolfram|Alpha Notebook Edition
It’s been very satisfying to see how extensively Wolfram|Alpha has been adopted by students. But mostly that adoption has been outside the classroom. Now, with Wolfram|Alpha Notebook Edition, we’ve got a tool that can immediately be put to use in the classroom, across the whole college and pre-college spectrum. And I’m excited to see how it can streamline coursework, deepen understanding, enable new concepts to be taught, and effectively provide a course-based personal AI tutor for every student.
Starting today, Wolfram|Alpha Notebook Edition is available on all standard computer platforms (Mac, Windows, Linux). (A cloud version will also be available on the web soon.) Colleges and universities with full Wolfram Technology System site licenses can automatically start using Wolfram|Alpha Notebook Edition today; at schools with other site licenses, it can immediately be added. It’s available to K–12 schools and junior colleges in classroom packs, or as a site license. And, of course, it’s also available to individual teachers, students, hobbyists and others.
(Oh, and if you have Mathematica or Wolfram Desktop, it’ll also be possible in future versions to create “Wolfram|Alpha mode” notebooks that effectively integrate Wolfram|Alpha Notebook Edition capabilities. And in general there’s perfect compatibility among Wolfram|Alpha Notebook Edition, Mathematica, Wolfram Desktop, Wolfram Cloud, Wolfram Programming Lab, etc.—providing a seamless experience for people progressing across education and through professional careers.)
Like Wolfram|Alpha—and the Wolfram Language—Wolfram|Alpha Notebook Edition will continue to grow in capabilities far into the future. But what’s there today is already a remarkable achievement that I think will be transformative in many educational settings.
More than 31 years ago we introduced Mathematica (and what’s now the Wolfram Language). A decade ago we introduced Wolfram|Alpha. Now, today, with the release of Wolfram|Alpha Notebook Edition we’re giving a first taste—in the context of education—of a whole new approach to computing: a full computing environment that’s driven by natural language. It doesn’t supplant Wolfram Language or Wolfram|Alpha—but it defines a new direction that in time will bring the power of computation to a whole massive new audience.
To comment, please visit the copy of this post at Stephen Wolfram Writings »
8. Authorship Forensics from Twitter Data (Thu., Sept. 5)
A Year Ago Today
On September 5 of last year, The New York Times took the unusual step of publishing an op-ed anonymously. It began “I Am Part of the Resistance Inside the Trump Administration,” and quickly became known as the “Resistance” op-ed. From the start, there was wide-ranging speculation as to who might have been the author(s); to this day, that has not been settled. (Spoiler alert: it will not be settled in this blog post, either. But that’s getting ahead of things.) When I learned of this op-ed, the first thing that came to mind, of course, was, “I wonder if authorship attribution software could….” This was followed by, “Well, of course it could. If given the right training data.” When time permitted, I had a look on the internet into where one might find training data, and for that matter who were the people to consider for the pool of candidate authors. I found at least a couple of blog posts that mentioned the possibility of using tweets from administration officials. One gave a preliminary analysis (with President Trump himself receiving the highest score, though by a narrow margin—go figure). It even provided a means of downloading a dataset that the poster had gone to some work to cull from the Twitter site.
The code from that blog was in a language/script in which I am not fluent. My coauthor on two authorship attribution papers (and other work), Catalin Stoean, was able to download the data successfully. I first did some quick validation (to be seen) and got solid results. Upon setting the software loose on the op-ed in question, a clear winner emerged. So for a short time I “knew” who wrote that piece. Except. I decided more serious testing was required. I expanded the candidate pool, in the process learning how to gather tweets by author (code for which can be found in the downloadable notebook). At that point I started getting two more-or-less-clear signals. Okay, so two authors. Maybe. Then I began to test against op-eds of known (or at least stated) authorship. And I started looking hard at winning scores. Looking into the failures, it became clear that my data needed work (to be described). Rinse, repeat. At some point I realized: (1) I had no idea who wrote the anonymous op-ed; and (2) interesting patterns were emerging from analysis of other op-eds. So it was the end of one story (failure, by and large) but the beginning of another. In short, what follows is a case study that exposes both strengths and weaknesses of stylometry in general, and of one methodology in particular.
Stylometry Methodology
In two prior blog posts, I wrote about determining authorship, the main idea being to use data of known provenance coupled with stylometry analysis software to deduce authorship of other works. The first post took up the case of the disputed Federalist Papers. In the second, as a proof of concept I split a number of Wolfram blog posts into training and test sets, and used the first in an attempt to verify authorship in the second. In both of these, the training data was closely related to the test sets, insofar as they shared genres. For purposes of authorship forensics, e.g. discovering who wrote a given threatening email or the like, there may be little or nothing of a similar nature to use for comparison, and so one must resort to samples from entirely different areas. Which is, of course, exactly the case in point.
Let’s begin with the basics. The general idea is to use one or more measures of similarity between works of known authorship to a given text in order to determine which author is the likeliest to have written that text. The “closed” problem assumes it was in fact written by one of the authors of the known texts. In the real world, one deals with the “open” variant, where the best answer might be “none of the above.”
How does one gauge similarity? Various methods have been proposed, the earliest involving collection of various statistics, such as average sentence length (in words and/or characters), frequency of usage of certain words (common and uncommon words both can play important roles here), frequency of certain character combinations and many others. Some of the more recent methods involve breaking sentences into syntactic entities and using those for comparison. There are also semantic possibilities, e.g. using frequencies of substitution words. Overall, the methods thus break into categories of lexical, syntactic and semantic, and each of these has numerous subcategories as well as overlap with the other categories.
To date, there is no definitive method, and very likely there never will be. Even the best class is not really clear. But it does seem that, at this time, certain lexical-based methods have the upper hand, at least as gauged by various benchmark tests. Among these, ones that use character sequence (“n-gram”) frequencies seem to perform particularly well—and that is the method I use for my analyses.
Origin of the Method in Use
It is an odd duckling, as n-gram methods go. We never really count anything. Instead we turn text samples into images, and use further processing on the images to obtain a classifier. But n-gram frequencies are implicit to the method of creating images. The process extends Joel Jeffrey’s “Chaos Game Representation” (CGR), used for creating images from genetic nucleotide sequences, to alphabets of more than four characters. Explained in the CGR literature is the “Frequency Chaos Game Representation” (FCGR), which creates a pixelated grayscale array for which the darkness of pixels corresponds to n-gram frequencies.
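As a rough illustration of the construction (the actual analysis code is Wolfram Language in the downloadable notebook), here is a minimal Python sketch of FCGR for the classic four-letter nucleotide alphabet; the text method extends the same idea to larger alphabets:

```python
import numpy as np

# Each symbol owns a corner of the unit square; the chaos game moves the
# current point halfway toward the corner of each successive symbol.
# Binning the visited points on a 2^k x 2^k grid counts k-mer frequencies.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def fcgr(seq, k=3):
    n = 2 ** k
    grid = np.zeros((n, n))
    x, y = 0.5, 0.5
    for i, ch in enumerate(seq):
        cx, cy = CORNERS[ch]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        if i >= k - 1:  # the point's cell now encodes the last k symbols
            grid[min(int(y * n), n - 1), min(int(x * n), n - 1)] += 1
    return grid / max(grid.sum(), 1.0)  # grayscale: frequency per cell

image = fcgr("ACGTACGGTCAG" * 40, k=3)
```

Each cell of the resulting 8×8 array holds the relative frequency of one 3-gram, which is why the pixel darkness in an FCGR image encodes n-gram statistics without any explicit counting of substrings.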
What with having other projects—oh, and that small matter of a day job—I put this post on the back burner for a while. When I returned after a few months, I had learned a few more tricks. The biggest was that I had seen how very simple neural networks might be trained to distinguish numeric vectors even at high dimension. As it happens, the nearest-image code I had written and applied to FCGR images creates vectors in the process of dimension reduction of the images. This can be done by flattening the image matrices into vectors and then using the singular value decomposition to retain the “strong” components of the vectorized images. (There is an optional prior dimension-reduction step, using a Fourier trig transform and retaining only low-frequency components. It seems to work well for FCGR images from genomes, but not so well for our extension of FCGR to text.) We then trained neural network classifiers on FCGR images produced from text training sets of several authorship benchmark suites. Our test results were now among the top scores for several such benchmarks.
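In outline, the flatten-then-SVD reduction looks like the following Python sketch (illustrative only; the component count is an assumption, and the real pipeline is Wolfram Language):

```python
import numpy as np

def reduce_images(images, n_components=50):
    """Flatten FCGR images to vectors and keep the strongest SVD components."""
    X = np.stack([img.ravel() for img in images])   # one row per image
    X = X - X.mean(axis=0)                          # center the rows
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:n_components].T                     # dominant directions
    return X @ basis                                # reduced feature vectors

rng = np.random.default_rng(0)
images = [rng.random((16, 16)) for _ in range(20)]
vectors = reduce_images(images, n_components=5)     # 20 short feature vectors
```

The reduced vectors, rather than the raw pixel arrays, are what get fed to the simple neural network classifiers described above.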
Obtaining and Cleaning Text from Twitter
As I’ve mentioned, experiments with Twitter data showed some signs of success. I did need to expand the author pool, though, since it seemed like a good idea to include any number of administration officials who were—let us not mince words—being publicly accused by one or another person of having written the oped in question. It also seemed reasonable to include family members (some of whom are advisors in an official capacity), undersecretaries and a few others who are in some sense highranking but not in official appointments. So how to access the raw training data? It turns out that the Wolfram Language has a convenient interface to Twitter. I had to first create a Twitter account (my kids still can’t believe I did this), and then evaluate this code:
twitter = ServiceConnect["Twitter", "New"];

A webpage pops up requiring that I click a permission box, and we’re off and running:
Now I can download tweets for a given tweeter (is that the term?) quite easily. Say I want to get tweets from Secretary of State Mike Pompeo. A web search indicates that his Twitter handle is @SecPompeo. I can gather a sample of his tweets, capping at a maximum of 5,000 tweets so as not to overwhelm the API.
secPompeoTweets =
 twitter["TweetList", "Username" -> "SecPompeo", MaxItems -> 5000];

Tweets are fairly short as text goes, and the methods I use tend to do best with a minimum of a few thousand characters for each FCGR image. So we concatenate tweets into strings of a fixed size, create FCGR images, reduce dimension and use the resulting vectors to train a neural net classifier.
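The concatenation step can be sketched as follows (in Python for illustration; the chunk size here is an assumption, chosen to match the "few thousand characters" minimum mentioned above):

```python
def tweet_chunks(tweets, chunk_size=5000):
    """Concatenate short tweets and cut the result into fixed-size strings."""
    text = " ".join(tweets)
    # Keep only full-size chunks, so every FCGR image is built from a
    # comparable amount of text.
    return [text[i:i + chunk_size]
            for i in range(0, len(text) - chunk_size + 1, chunk_size)]

chunks = tweet_chunks(["short tweet"] * 2000, chunk_size=5000)
```

Each fixed-size chunk then becomes one FCGR image, one reduced vector, and ultimately one training example for the classifier.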
A first experiment was to divide each author’s set of strings into two parts, train on one set and see how well the second set was recognized. The first thing I noticed was that underrepresented authors fared poorly. But this was really restricted to those with training data below a threshold; above that, most were recognized. So I had to exclude a few candidates on the basis of insufficient training data. A related issue is balance, in that authors with a huge amount of training data can sort of overwhelm the classifier, biasing it against authors with more modest amounts of training text. A heuristic I found to work well is to cap the training text size at around three times the minimum threshold. The pool meeting the threshold is around 30 authors (I have on occasion added some, found they do not show any sign of Resistance oped authorship and so removed them). With this pool, a typical result is a recognition rate above 90%, with some parameter settings taking the rate over 95%. So this was promising.
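The threshold-and-cap balancing heuristic can be sketched like so (the character counts here are hypothetical; the post does not state exact values):

```python
def balance_training_text(text_by_author, min_chars=20000, cap_factor=3):
    """Drop under-represented authors and cap over-represented ones.

    Authors below the minimum threshold are excluded outright; everyone
    else is truncated to roughly cap_factor times that threshold so that
    no single author overwhelms the classifier.
    """
    cap = cap_factor * min_chars
    return {author: text[:cap]
            for author, text in text_by_author.items()
            if len(text) >= min_chars}

corpus = {"A": "x" * 5000, "B": "y" * 30000, "C": "z" * 100000}
balanced = balance_training_text(corpus)  # "A" is dropped, "C" is capped
```

This mirrors the tradeoff described above: too little text and an author is unrecognizable; too much and the classifier is biased toward that author.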
Authorship of Op-Eds Can Be Trickier Than You Would Expect
When set loose on op-eds, however, the results were quite lame. I started looking harder at the raw training data. Lo and behold, it is not so clean as I might have thought. Silly me, I had no idea people might do things with other people’s tweets, like maybe retweet them (yes, I really am that far behind the times). So I needed some way to clean the data.
You can find the full code in the downloadable notebook, but the idea is to use string replacements to remove tweets that begin with “RT”, contain “Retweeted” or have any of several other unwanted features. While at it, I remove the URL parts (strings beginning with “http”). At this point I have raw data that validates quite well. Splitting into training vs. validation gives a recognition rate of 98.9%.
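In Python pseudo-form (the notebook does this with Wolfram Language string replacements), the cleanup amounts to something like:

```python
import re

def clean_tweets(tweets):
    """Drop retweets and strip URL fragments before building training text."""
    kept = []
    for t in tweets:
        if t.startswith("RT") or "Retweeted" in t:
            continue  # discard tweets that are someone else's words
        kept.append(re.sub(r"http\S+", "", t).strip())  # remove URL parts
    return kept

sample = ["RT @x: not mine", "Great meeting today http://t.co/abc", "Policy update"]
cleaned = clean_tweets(sample)
```

Removing retweets matters because a retweet injects another author's prose directly into the training data, which is exactly the contamination that was degrading the op-ed results.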
Here’s a plot of the confusion matrix:
(To those for whom this term is itself a confusion, this matrix indicates by color strength the positioning of the larger elements. The off-diagonal elements indicate incorrect attributions, and so a strongly colored main diagonal is a sign that things are working well.)
Now we move on to a form of “transference,” wherein we apply a classifier trained on Twitter text to authorship assessment of op-eds. The first thing to check is that this might be able to give plausible results. We again do a form of validation, by testing against articles of known authorship. Except we do not really know who writes what, do we? For tweets, it seems likely that the tweeter is the actual owner of the Twitter handle (we will give a sensible caveat on this later); for articles, it is a different matter. Why? We’ll start with the obvious: high-level elected politicians and appointed officials are often extremely busy. And not all possess professional writing skills. Some have speechwriters who craft works based on draft outlines, and this could well extend to written opinion pieces. And they may have editors and technical assistants who do heavy revising. Ideas may also be shared among two or more people, and contributions on top of initial drafts might be solicited. On occasion, two people might agree that what one says would get wider acceptance if associated with the other. The list goes on, the point being that the claimed authorship can depart in varying degrees from the actual, for reasons ranging from quite legitimate, to inadvertent, to (let’s call it what it is) shady.
Some Testing on Op-Ed Pieces
For the actual test data, I located and downloaded numerous articles available electronically on the internet, which appeared under the bylines of various high-ranking officials. As most of these are covered by copyright, I will need to exclude them from the notebook, but I will give appropriate reference information for several that get used explicitly here. I also had to get teaser subscriptions to two papers. These cost me one and two dollars, respectively (I wonder if Wolfram Research will reimburse me—I should try to find out). Early testing showed that tweaking parameters in the neural net training led to tradeoffs in strengths, so the protocol I used involved several classifiers trained with slightly different parameters, and aggregation of scores to determine the outcomes. This has at least two advantages over using just one trained neural net. For one, it tends to lower variance in outcomes caused by the randomness intrinsic to neural net training methods. The other is that it usually delivers outcomes that are near or sometimes even exceed the best of the individual classifiers. These observations came from when Catalin and I tested against some benchmark suites. It seems like a sensible route to take here as well.
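The aggregation step is straightforward; here is a hedged Python sketch using simple score averaging (one reasonable way to combine the classifiers; the author names and numbers are made up for illustration):

```python
def aggregate_scores(per_classifier_scores):
    """Average per-author scores across several independently trained classifiers."""
    authors = per_classifier_scores[0].keys()
    n = len(per_classifier_scores)
    return {a: sum(scores[a] for scores in per_classifier_scores) / n
            for a in authors}

# Two hypothetical runs with slightly different training parameters.
runs = [{"Acosta": 9.1, "I. Trump": 6.4},
        {"Acosta": 8.7, "I. Trump": 6.8}]
combined = aggregate_scores(runs)  # Acosta scores highest on average
```

Averaging several independently trained nets smooths out the run-to-run randomness of neural net training, which is exactly the variance-reduction benefit described above.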
Labor Secretary Alexander Acosta
We begin with Secretary of Labor Alexander Acosta. Because they were easy to find and download, I took for the test data two items that are not op-eds, but rather articles found at a webpage of White House work. One is from July 23, 2018, called “Reinvesting in the American Workforce.” The other is “Trump Is Strengthening US Retirement Security. Here’s How.” from August 31, 2018. The following graph, intentionally unlabeled, shows a relatively high score above 9, a second score around 6.5 and all the rest very close to one another at a bit above 5:
The high score goes to Secretary Acosta. The second-highest goes to Ivanka Trump. This might indicate a coincidental similarity of styles. Another possibility is she might have contributed some ideas to one or the other of these articles. Or maybe her tweets, like some of her op-ed articles, overlap with these articles in terms of subject matter; as a general remark, genre seems to play a role in stylometry similarities.
Ivanka Trump
Having mentioned Ivanka Trump, it makes sense to have a look at her work next. First is “Empower Women to Foster Freedom,” published in The Wall Street Journal on February 6, 2019. The plot has one score way above the others; it attributes the authorship to Ms. Trump:
The next is “Why We Need to Start Teaching Tech in Kindergarten” from The New York Post on October 4, 2017. The plot now gives Ms. Trump the second top score, with the highest going to Secretary Acosta. This illustrates an important point: we cannot always be certain we have it correct, and sometimes it might be best to say “uncertain” or, at most, “weakly favors authorship by candidate X.” And it is another indication that they share similarities in subject matter and/or style:
Last on the list for Ivanka Trump is “The Trump Administration Is Taking Bold Action to Combat the Evil of Human Trafficking” from The Washington Post on November 29, 2018. This time the plot suggests a different outcome:
Now the high score goes to (then) Secretary of Homeland Security Kirstjen Nielsen. But again, this is not significantly higher than the low scores. It is clearly a case of “uncertain.” The next score is for Ambassador to the United Nations John Bolton; Ms. Trump’s score is fourth.
Donald Trump Jr.
We move now to Donald Trump Jr. Here, I combine two articles. One is from The Denver Post of September 21, 2018, and the other is from the Des Moines Register of August 31, 2018. And the high score, by a fair amount, does in fact go to Mr. Trump. The second score, which is noticeably above the remaining ones, goes to President Trump:
Secretary of Health and Human Services Alex Azar
Our test pieces are “Why Drug Prices Keep Going Up—and Why They Need to Come Down” from January 29, 2019, and a USA Today article from September 19, 2018. The graph shows a very clear outcome, and indeed it goes to Secretary Alex Azar:
We will later see a bit more about Secretary Azar.
Secretary of Housing and Urban Development Ben Carson
Here, we can see that a USA Today article from December 15, 2017, is very clearly attributed to Secretary Ben Carson:
I also used “My American Dream” from Forbes on February 20, 2016, and a Washington Examiner piece from January 18, 2019. In each case, it is Secretary Azar who has the high score, with Secretary Carson a close second on one and a distant second on the other. It should be noted, however, that the top scores were not so far above the average as to be taken as anywhere approaching definitive.
Secretary of Education Betsy DeVos
I used an article from The Washington Post of November 20, 2018. An interesting plot emerges:
The top score goes to Secretary Betsy DeVos, with a close second for Secretary Azar. This might indicate a secondary coauthor status, or it might be due to similarity of subject matter: I now show another article by Secretary Azar that explicitly mentions a trip to a school that he took with Secretary DeVos. It is titled “Put Mental Health Services in Schools” from August 10, 2018:
In this instance she gets the top classifier score, though Azar’s is, for all intents and purposes, tied. It would be nice to know if they collaborate: that would be much better than, say, turf wars and interdepartmental squabbling.
Classification Failures
I did not include former US envoy Brett McGurk in the training set, so it might be instructive to see how the classifier performs on an article of his. I used one from January 18, 2019 (wherein he announces his resignation); the topic pertains to the fight against the Islamic State. The outcome shows a clear winner (Ambassador to the UN Bolton). This might reflect commonality of subject matter of the sort that might be found in the ambassador’s tweets:
Ambassador Bolton was also a clear winner for authorship of an article by his predecessor, Nikki Haley (I had to exclude Ms. Haley from the candidate pool because the quantity of tweets was not quite sufficient for obtaining viable results). Moreover, a similar failure appeared when I took an article by former president George W. Bush. In that instance, the author was determined to be Secretary Azar.
An observation is in order, however. When the pool is enlarged, such failures tend to be less common. The more likely scenario, when the actual author is not among the candidates, is that there is no clear winner.
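The informal criterion used throughout (a winner is declared only when the top score stands well above the rest) can be captured by a simple heuristic. The following Python sketch is purely my own illustration of that reading; the threshold rule and the value of `k` are arbitrary choices, not the actual procedure used here:

```python
import statistics

def clear_winner(scores, k=3.0):
    """Return the top-scoring candidate only if its score exceeds the mean
    of the remaining scores by k standard deviations; otherwise None."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_name, top_score = ranked[0]
    rest = [v for _, v in ranked[1:]]
    threshold = statistics.mean(rest) + k * statistics.stdev(rest)
    return top_name if top_score > threshold else None

# One candidate far above the pack: a clear winner
print(clear_winner({"A": 0.90, "B": 0.05, "C": 0.03, "D": 0.02}))  # "A"
# All scores clustered near the average: no clear winner
print(clear_winner({"A": 0.30, "B": 0.28, "C": 0.22, "D": 0.20}))  # None
```

With a larger candidate pool, a spurious top score is less likely to clear such a threshold, which matches the observation above.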
Secretary of Transportation Elaine Chao
Here I used a piece in the Orlando Sentinel from February 15, 2018, called “Let’s Invest in US Future.” Summary conclusion: Secretary Elaine Chao (presumably the person behind the tweets from handle @USDOT) is a very clear winner:
President Donald Trump
An article under President Donald Trump’s name was published in the Washington Post on April 30, 2017. The classifier does not obtain scores sufficiently high as to warrant attribution. That stated, one of the two top scores is indeed from the Twitter handle @realDonaldTrump:
A second piece under the president’s name is a whitehouse.gov statement, issued on November 20, 2018. The classifier in this instance gives a clear winner, and it is President Trump. The distant second is Ambassador Bolton, and this is quite possibly from commonality of subject matter (and articles by the ambassador); the topic is US relations with Saudi Arabia and Iran:
USA Today published an article on October 10, 2018, under the name of President Donald Trump. The classifier gives the lone high score to Secretary Azar. Not surprisingly, there is overlap in subject insofar as the article was about health plans:
Moreover, when I use the augmented training set, the apparent strong score for Secretary Azar largely evaporates. He does get the top score in the following graph, but it is now much closer to the average scores. The close second is Senator Bernie Sanders. Again, I would surmise there is commonality in the subject matter from their tweets:
Some Others in Brief
A couple of pieces published under the name of Ambassador Bolton were classified fairly strongly as being authored by him. Two by (then) Secretary Nielsen were strongly classified as hers, while a third was in the “no clear decision” category. Similarly, an article published by Wilbur Ross is attributed to him, with two others getting no clear winner from the classifier. An article under Mike Pompeo’s name was attributed to him by a fair margin. Another was weakly attributed to him, with the (then) State Department Spokesperson Heather Nauert receiving a second place score well above the rest. Possibly one wrote and the other revised, or this could once again just be due to similarity of subject matter in their respective tweets. One article published under Sonny Perdue’s name is strongly attributed to him, with another weakly so. A third has Donald Trump Jr. and Secretary DeVos on top but not to any great extent (certainly not at a level indicative of authorship on their parts).
I tested two that were published under Director of the National Economic Council Lawrence Kudlow’s name. They had him a close second to the @WhiteHouseCEA Twitter handle. I will say more about this later, but in brief, that account is quite likely to post content by or about Mr. Kudlow (among others). So no great surprises here.
An article published under the name of Vice President Mike Pence was attributed to him by the classifier; two others were very much in the “no clear winner” category. One of those, an op-ed piece on abortion law, showed an interesting feature. When I augmented the training set with Twitter posts from several prominent people outside the administration, it was attributed to former Ambassador Hillary Clinton. What can I say? The software is not perfect.
Going outside the Current Administration
Two articles published under former president Bill Clinton’s name have no clear winner. Similarly, two articles under former president Barack Obama’s name are unclassified. An article under Bobby Jindal’s name correctly identifies him as the top scorer, but not by enough to consider the classification definitive. An article published by Hillary Clinton has the top score going to her, but it is too close to the pack to be considered definitive. An article published by Senate Majority Leader Mitch McConnell is attributed to him. Three articles published under Rand Paul’s name are attributed to him, although two only weakly so.
An article published under Senator Bernie Sanders’s name is very strongly attributed to him. When tested using only members of the current administration, Secretary Azar is a clear winner, although not in the off-the-charts way that Sanders is. This might, however, indicate some modest similarity in their styles. Perhaps more surprising, an article published by Joe Biden is attributed to Secretary Azar even when tweets from Biden are in the training set (again, go figure). Another is weakly attributed to Mr. Biden. A third has no clear winner.
Some Surprises
Mitt Romney
An op-ed piece under (now) Senator Mitt Romney’s name from June 24, 2018, and another from January 2, 2019, were tested. Neither had a winner in terms of attribution; all scores hovered near the average. Either his tweets are in some stylistic ways very different from his articles, or editing by others makes substantial changes. Yet in a Washington Post piece under his name from January 8, 2017, we do get a clear classification. The subject matter was an endorsement for confirmation of Betsy DeVos to be the secretary of education:
The classifier did not, however, attribute this to Mr. Romney. It went, rather, to Secretary DeVos.
The Council of Economic Advisers
At the time of this writing, Kevin Hassett is the chairman of the Council of Economic Advisers. This group has several members. Among them are some with the title “chief economist,” and this includes Casey Mulligan (again at the time of this writing, he is on leave from the University of Chicago where he is a professor of economics, and slated to return there in the near future). We begin with Chairman Hassett. I took two articles under his name that appeared in the National Review, one from March 6, 2017, and the other from April 3, 2017:
The classifier gives the high score to the Twitter handle @WhiteHouseCEA and I had, at the time of testing, believed that to be used by Chairman Hassett.
I also used a few articles by Casey Mulligan. He has written numerous columns for various publications, and has, by his count, several hundred blog items (he has rights to the articles, and they get reposted as blogs not long after appearing in print). I show the result after lumping together three such; individual results were similar. They are from Seeking Alpha on January 4, 2011, a Wall Street Journal article from July 5, 2017, and one in The Hill from March 31, 2018. The classifier assigns authorship to Professor Mulligan here:
A fourth article under Mulligan’s name was published in The Hill on May 7, 2018. You can see that the high dot is located in a very different place:
This time the classifier associates the work with the @WhiteHouseCEA handle. When I first encountered the article, it was from Professor Mulligan’s blog, and I thought perhaps he had posted a guest blog. That was not the case. This piqued my curiosity, but now I had a question to pose that might uncomfortably be interpreted as “Did you really write that?”
In addition to motive, I also had opportunity. As it happens, Casey Mulligan makes considerable (and sophisticated) use of Mathematica in his work. I’ve seen this myself, in a very nice technical article. He is making productive use of our software and promoting it among his peers. Time permitting, he hopes to give a talk at our next Wolfram Technology Conference. He is really, really smart, and I don’t know for a certainty that he won’t eat a pesky blogger for lunch. This is not someone I care to annoy!
A colleague at work was kind enough to send a note of mutual introduction, along with a link to my recent authorship analysis blog post. Casey in turn agreed to speak to me. I began with some background for the current blog. He explained that many items related to his blogs get tweeted by himself, but some also get tweeted from the @WhiteHouseCEA account, and this was happening as far back as early 2018, before his leave for government policy work in DC. So okay, mystery solved: the Twitter handle for the Council of Economic Advisers is shared among many people, and so there is some reason it might score quite high when classifying work by any one of those people.
Anonymous Op-Ed Pieces
The “Resistance” Op-Ed
I claimed from the outset that using the data I found and the software at my disposal, I obtained no indication of who wrote this piece. The plot shows this quite clearly. The top scores are nowhere near far enough above the others to make any assessment, and they are too close to one another to distinguish. Additionally, one of them goes to President Trump himself:
When I test against the augmented training set having Senators Sanders, McConnell and others, as well as Ambassador Clinton and Presidents Clinton and Obama, the picture becomes more obviously a “don’t know” scenario:
Scaling notwithstanding, these scores all cluster near the average. And now Senators Graham and Sanders are at the top.
“I Hope a Long Shutdown Smokes Out the Resistance”
On January 14, 2019, The Daily Caller published an op-ed that was similarly attributed to an anonymous senior member of the Trump administration. While it did not garner the notoriety of the “Resistance” op-ed, there was speculation as to its authorship. My own question was whether the two might have the same author, or if they might overlap should either or both have multiple authors.
The plot shows similar features to that for the “Resistance” op-ed. And again, there is no indication of who might have written it. Another mystery.
I will describe another experiment. For test examples, I selected several op-eds of known (or at least strongly suspected) authorship and also used the “Resistance” and “Shutdown” pieces. I then repeatedly selected between 8 and 12 of the Twitter accounts and used them for training a classifier. So sometimes a test piece would have its author in the training set, and sometimes not. In almost all cases where the actual author was represented in the training set, that person would correctly be ascribed as the author of the test piece. When the correct author was not in the training set, things got interesting. It turns out that for most such cases, the training authors with the top probabilities overlapped only modestly with those for the “Resistance” op-ed; this was also the case for the “Shutdown” piece. But the ascribed authors for those two pieces did tend to overlap heavily (with usually the same top two, in the same order). While far from proof of common authorship, this again indicates a fair degree of similarity in style.
Parting Thoughts
This has been something of an educational experience for me. On the one hand, I had no sound reason to expect that Twitter data could be at all useful for deducing authorship of other materials. And I have seen (and shown) some problematic failures, wherein an article is strongly attributed to someone we know did not write it. I learned that common subject matter (“genre,” in the authorship attribution literature) might be playing a greater role than I would have expected. I learned that a candidate pool somewhat larger than 30 tends to tamp down on those problematic failures, at least for the subject matter under scrutiny herein. Maybe the most pleasant surprise was that the methodology can be made to work quite well. Actually, it was the second most pleasant surprise—the first being not having been eaten for lunch in that phone conversation with the (thankfully kindly disposed) economics professor.
Analyze tweets and more with Wolfram|One, the first fully cloud-desktop hybrid, integrated computation platform.
9. Wolfram Cloud 1.50 and 1.51: Major New Releases Bring Cloud Another Step Closer to the Desktop (Thu., Aug. 29)
A couple weeks ago, we released Version 1.51 of the Wolfram Cloud. We’ve made quite a few significant functionality improvements even since 1.50—a major milestone from many months of hard work—as we continue to make cloud notebooks as easy and powerful to use as the notebooks on our desktop clients for Wolfram|One and Mathematica. You can read through everything that’s new in 1.51 in the detailed release notes. After working on this version through to its release, I’m excited to show off Wolfram Cloud 1.51—I’ve put together a few of the highlights and favorite new features for you here.
Cloud 1.50 Recap
But before we get into what’s new, let’s first catch up on what Version 1.51 is building on. The release of Wolfram Cloud 1.50 in May was an incredibly massive effort, especially for cloud notebooks. Here are some of our major accomplishments from 1.50:
 License modernization (simplification of products and subscriptions)
 New URL structure (/obj for deployed view, /env for editable view)
 DynamicGeoGraphics is supported in the cloud
 Improvements to server-side rendering of cloud notebooks (the HTML cache)
 CodeAssistOptions are implemented in the cloud
 The cell insertion menu design matches the desktop
 Autocompletion for special characters works in text cells as well as input cells
 ListPicker is modernized in the cloud
 New frontend web fonts are supported in the cloud
 Internal and Wolfram Enterprise Private Cloud (EPC) users can disable embedded-view branding via the Wolfram Language
 … and much, much more
Click-to-Copy for Non-editable Cells
In cloud notebooks, output cells are generally not editable, and selections within them didn’t work very well because browsers had a hard time dealing with their structure. To make it easy to still “grab” some output, we added click-to-copy in 1.51: you only need to click an output cell to copy its whole content to the clipboard. You can then paste it into another cell, cloud notebook, desktop or any other application (rich structures might not be preserved in other applications, though).
Click-to-copy is disabled on controls that are interactive by themselves, e.g. buttons, sliders and graphics (which you can select individually), in order not to interfere with that interactivity.
Of course, the longer-term goal is to actually support the same granular selection mechanism as on the desktop.
Evaluate in Place
You can evaluate expressions “in place” now, just like in desktop notebooks. Just select a (sub)expression in an input cell and press Ctrl+Shift+Enter (on Windows) or Cmd+Return (on Mac) to evaluate it, replacing the original expression with the result of the computation.
Since cloud notebooks don’t yet support the feature-rich “two-dimensional” input found on the desktop, we added a new control that allows you to enter a textual representation of various typesetting constructs (what we call InputForm) and turn it into the corresponding two-dimensional boxes. Press Ctrl+Shift+1 (or, put differently: Ctrl+!) to bring up the input field, and press Enter to “resolve” it into its typeset form. It works both in input cells (with the right evaluation semantics) and in textual cells.
You can enter x/y for a fraction, x^y for superscripts, Sqrt[x] for square roots and various other forms. Clicking the resulting output brings you back to edit mode.
The control is quite similar to Ctrl+= for free-form linguistic input in the Wolfram Language. We’re working on another similar control that would allow entering TeX syntax.
Inline Cell Group Openers and Closers
We now have the same cell group openers and closers as on the desktop (Version 12). Closed cell groups have a little chevron attached to them to open the group, which makes the content within them easier to discover. If you use that opener, there will also be a corresponding closer.
Improved Support for Special Unicode Characters
Cloud notebooks are now better at rendering non-plane-0 Unicode characters—including emojis! If you know their hexadecimal code point, you can enter them using the special \|xxxxxx notation, or you might use some built-in OS functionality (e.g. Ctrl+Cmd+Space on Mac). They work in text as well as in computations.
\|01f329=(\|01f600+\|01f600+\|01f600)*17

However, extended Unicode support isn’t quite perfect yet: ToCharacterCode still splits up non-plane-0 characters into surrogate pairs (we’re currently working on a fix, which will come in Wolfram Cloud 1.52), and the editor cursor can be moved “inside” emojis (this will be improved in the longer term).
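The surrogate-pair behavior mentioned here is a property of UTF-16: any code point beyond plane 0 is split into a high and a low surrogate. As a minimal sketch of the arithmetic (in Python, purely for illustration; ToCharacterCode is the Wolfram Language function in question):

```python
def surrogate_pair(cp):
    """Split a non-plane-0 code point into its UTF-16 surrogate pair."""
    assert cp > 0xFFFF, "plane-0 code points need no surrogates"
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

# U+1F600 (grinning face) splits into the pair 0xD83D, 0xDE00
print([hex(u) for u in surrogate_pair(0x1F600)])
```

A fixed ToCharacterCode would return the single code point 128512 (0x1F600) instead of these two surrogate values.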
There are many other new features, and we fixed quite a few bugs as well. Check out the release notes for all the details! We always love hearing from our users, so let us know in the comments section if you have any questions or suggestions. You can also join us on Wolfram Community to continue the discussion about cloud news and developments. We’re continuing our hard work improving the cloud even after 1.51, so keep an eye out for 1.52, coming soon!
10. Embracing Uncertainty: Better Model Selection with Bayesian Linear Regression (Thu., Aug. 22)
Readers who follow the Mathematica Stack Exchange (which I highly recommend to any Wolfram Language user) may have seen this post recently, in which I showed a function I wrote to make Bayesian linear regression easy to do. After finishing that function, I have been playing around with it to get a better feel of what it can do, and how it compares against regular fitting algorithms such as those used by Fit. In this blog post, I don’t want to focus too much on the underlying technicalities (check out my previous blog post to learn more about Bayesian neural network regression); rather, I will show you some of the practical applications and interpretations of Bayesian regression, and share some of the surprising results you can get from it.
Getting Started
The easiest way to get my BayesianLinearRegression function is through my submission to the Wolfram Function Repository. To use this version of the function with the code examples in this blog, you can evaluate the following line, which creates a shortcut for the resource function:
BayesianLinearRegression = ResourceFunction["BayesianLinearRegression"]

You can also visit the GitHub repository and follow the installation instructions to load the BayesianInference package with the following:
<< BayesianInference`

Alternatively, you can get the standalone source file of BayesianLinearRegression and evaluate it to make the function available to you, though the function regressionPlot1D I use later will not work in the following examples if you don’t get the full BayesianInference package. Its definition is in the BayesianVisualisations.wl file.
Back to Basics
I want to do something that will be very familiar to most people with some background in data fitting: polynomial regression. I could have picked something more complicated, but it turns out that the Bayesian take on fitting data adds quite a bit of depth and new possibilities even to something as simple as fitting polynomials, making it an ideal demonstration example.
I picked the example data here specifically because it hints at a straight trend, while still leaving some doubt:
data = CompressedData["
1:eJxTTMoPSmViYGAQAWIQDQE/9kPobxC64SuUz3wAKm8Poc9A6IYPUP4zKP0A
qv4xlP4HoQ+wQPX/gqr7DNXPcABFHcNHmDlQ+g5U/BKUfoFGMzpA6Bsw9Wju
+Yqm/hOUPgOln0DV3YLSF2Dut0d11y8o/QFKv9qPat+O/QBd7zpq
"];
plot = ListPlot[data, PlotStyle -> Red]

So let’s start off by doing the first sensible thing to do: fit a line through it. Concretely, I will fit the model y == a + b x + ϵ, where ϵ ~ NormalDistribution[0, σ]. BayesianLinearRegression uses the same syntax as LinearModelFit, but it returns an association containing all relevant information about the fit:
fit = N@BayesianLinearRegression[data, {1, \[FormalX]}, \[FormalX]];

I will focus most on explaining the posterior distributions and the log-evidence. The log-evidence is a number that indicates how well the model fits with the data:
fit["LogEvidence"]

Is this value good or bad? We can’t tell yet: it only means something when compared to the log-evidences of other fits of the same data. I will come back to model comparison in the section on the Bayesian Occam’s razor, but let’s first take a look in more detail at the linear fit we just computed. The fit association has a key called "Posterior" that contains a number of useful probability distributions:
KeyValueMap[List, Short /@ fit["Posterior"]] // TableForm

In Bayesian inference, the word “posterior” refers to a state of knowledge after having observed the data (as opposed to “prior,” which refers to the state where you do not know the data yet). The posterior distribution of the regression parameters tells us how well the parameters a and b are constrained by the data. It’s best visualized with a ContourPlot:
With[{coefficientDist =
fit["Posterior", "RegressionCoefficientDistribution"]},
ContourPlot[
Evaluate[PDF[coefficientDist, {\[FormalA], \[FormalB]}]],
{\[FormalA], -1, 1},
{\[FormalB], 0, 1},
PlotRange -> {0, All}, PlotPoints -> 20, FrameLabel -> {"a", "b"},
ImageSize -> 400,
PlotLabel -> "Posterior PDF of regression coefficients"
]
]

The slant of the ellipses in the contour plot shows that there seems to be some positive correlation between a and b. You can calculate exactly how much correlation there is between them with Correlation:
Correlation[
fit["Posterior", "RegressionCoefficientDistribution"]] // MatrixForm

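For intuition, the same positive correlation can be reproduced with a few lines of linear algebra. In the conjugate Gaussian setting, the posterior covariance of (a, b) is (XᵀX/σ² + I/τ²)⁻¹, whose off-diagonal entry is proportional to −Σxᵢ, so data lying mostly left of the origin forces a positive correlation. A minimal Python sketch with made-up data and a known noise variance (an assumption this post does not make; BayesianLinearRegression integrates over σ as well):

```python
import numpy as np

# Synthetic x values sitting mostly left of the origin (illustrative only)
x = np.linspace(-2.5, 0.5, 30)
X = np.column_stack([np.ones_like(x), x])  # design matrix for a + b*x

s2, tau2 = 0.25, 10.0  # assumed noise variance and prior variance

# Posterior covariance of (a, b) under a N(0, tau2*I) prior with known s2;
# note it depends only on the x values, not on y
cov = np.linalg.inv(X.T @ X / s2 + np.eye(2) / tau2)
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# sum(x) < 0 here, so the intercept and slope correlate positively
print(corr > 0)
```

Shifting the x values further to the left (as done with the cloud of points below) makes this correlation stronger.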
At first glance, this may seem strange. Why are a and b correlated with each other? One way to look at this is to consider how the fit changes when you force one of the two coefficients to change. For example, you can fix a and then find the best value of b that fits the data with FindFit. The following illustrates this thought experiment:
With[{dat = data, p = plot},
Animate[
With[{fit =
a + \[FormalB] \[FormalX] /.
FindFit[dat, a + \[FormalB] \[FormalX], \[FormalB], \[FormalX]]},
Show[
Plot[
fit,
{\[FormalX], -2.5, 2.5},
PlotLabel -> \[FormalY] == fit,
PlotRange -> {-2.5, 2.5}, PlotStyle -> Directive[Black, Dashed]
],
p
]
],
{{a, 0.}, -1, 1},
TrackedSymbols :> {a},
AnimationDirection -> ForwardBackward
]
]

Intuitively, there is correlation between a and b because the center of mass of the points is slightly to the bottom left of the origin:
This means that when a increases, b also needs to increase so that the line can still catch the center of mass of the cloud. Here I’ve moved all points even further down and to the left to amplify the effect:
With[{dat = data - 1},
Animate[
With[{fit =
a + \[FormalB] \[FormalX] /.
FindFit[dat, a + \[FormalB] \[FormalX], \[FormalB], \[FormalX]]},
Show[
Plot[
fit,
{\[FormalX], -3.5, 1.5},
PlotLabel -> \[FormalY] == fit,
PlotRange -> {-3.5, 1.5}, PlotStyle -> Directive[Black, Dashed]
],
ListPlot[dat, PlotStyle -> Red]
]
],
{{a, 0.}, -1, 1},
TrackedSymbols :> {a},
AnimationDirection -> ForwardBackward
]
]

Another way to think of the posterior distribution over a and b is to think of the fit as consisting of infinitely many lines, drawn at random across the page and with each line weighted according to how well it fits the data. Calculating that weight is essentially what Bayesian inference is about. In the following plot, I used RandomVariate to draw many lines from the posterior. By making the opacity of each individual line quite low, you can see the regions where the lines tend to concentrate:
lines = With[{x1 = 0, x2 = 1},
Apply[
InfiniteLine[{{x1, #1}, {x2, #2}}] &,
RandomVariate[
fit["Posterior", "RegressionCoefficientDistribution"],
2000].{{1, 1}, {x1, x2}},
{1}
]
];
Show[
plot,
Graphics[{Black, Opacity[40/Length[lines]], Thickness[0.002], lines}],
PlotRange -> {{-2.5, 2.5}, All}
]

The distribution of this cloud of lines is what I like to call the “underlying value distribution,” which is one of the distributions returned by the fit. To plot it, it’s convenient to use InverseCDF to calculate quantiles of the distribution. In the next example, I plotted the 5-, 50- and 95-percent quantiles, meaning that you’d expect 90% of the lines to fall within the shaded areas:
With[{valueDist = fit["Posterior", "UnderlyingValueDistribution"],
bands = Quantity[{95, 50, 5}, "Percent"]},
Show[
Plot[Evaluate@InverseCDF[valueDist, bands], {\[FormalX], -5, 5},
Filling -> {1 -> {2}, 3 -> {2}}, PlotLegends -> bands,
Prolog -> {Black, Opacity[15/Length[lines]], Thickness[0.002],
lines}
],
plot,
PlotRange -> All,
PlotLabel -> "Posterior distribution of underlying values"
]
]

BayesianLinearRegression also estimates the standard deviation σ of the error term ϵ, and just like the regression coefficients a and b, it has its own distribution. The variance σ² follows an InverseGammaDistribution, so to get the distribution of σ I use TransformedDistribution:
Quiet@Plot[
Evaluate@PDF[
TransformedDistribution[
Sqrt[\[FormalV]], \[FormalV] \[Distributed]
fit["Posterior", "ErrorDistribution"]],
\[FormalX]],
{\[FormalX], 0, 1.5},
PlotRange -> {0, All},
Filling -> Axis, Frame -> True, FrameLabel -> {\[Sigma], "PDF"},
PlotLabel -> \[Sigma]^2 \[Distributed]
fit["Posterior", "ErrorDistribution"]
]

So in short, the posterior error terms are distributed as NormalDistribution[0, σ], where σ² ~ InverseGammaDistribution[10.005, 3.09154]. We can calculate this distribution with ParameterMixtureDistribution:
epsDist =
ParameterMixtureDistribution[
NormalDistribution[0, Sqrt[\[FormalV]]], \[FormalV] \[Distributed]
InverseGammaDistribution[10.005`, 3.091536049663807`]];
Plot[PDF[epsDist, \[FormalX]], {\[FormalX], -2, 2}, Filling -> Axis,
PlotRange -> {0, All}, Frame -> True,
PlotLabel -> \[Epsilon] \[Distributed] epsDist]

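As an aside, this particular mixture has a well-known closed form: a normal whose variance follows an inverse-gamma distribution is a scaled Student-t distribution (with 2α degrees of freedom), and its variance equals β/(α − 1). A quick Monte Carlo check in Python (a sketch only; the Wolfram Language code above computes the same distribution exactly):

```python
import numpy as np

# Posterior from the fit above: sigma^2 ~ InverseGamma(10.005, 3.09154)
alpha, beta = 10.005, 3.09154
rng = np.random.default_rng(42)
n = 200_000

# Sample the scale mixture: v ~ InvGamma(alpha, beta), eps ~ Normal(0, sqrt(v)).
# An InvGamma(alpha, beta) draw is beta divided by a Gamma(alpha, 1) draw.
v = beta / rng.gamma(alpha, 1.0, n)
eps = rng.normal(0.0, np.sqrt(v))

# The mixture's variance should match the analytic value beta/(alpha - 1)
print(np.var(eps), beta / (alpha - 1))
```

The heavier-than-normal tails of this Student-t shape are exactly the extra “uncertainty about the uncertainty” discussed next.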
Uncertain Uncertainties
Just to make the point clear: there is uncertainty about the uncertainty of the model, which was an idea I initially found a bit difficult to get my head around. To make things even more complicated, all of the lines I mentioned previously also have distributions of error bars that correlate with a and b. So if you want to make a prediction from this model, you need to consider an infinite number of trend lines that each carry an infinite number of error bars. That’s a lot of uncertainty to consider!
Thinking about regression problems this way makes it clear why Bayesian inference can be a daunting task that involves lots of complicated integrals. For linear regression, though, we’re fortunate enough that it’s possible to do all of these integrals symbolically and plow our way through the infinities. This is one of the reasons why scientists and engineers like linear mathematics so much: everything remains tractable, with nicelooking formulas.
The distribution that tells you where to expect to find future data (taking all of the aforementioned uncertainties into consideration) is called the posterior predictive distribution. For the data I just fitted, it looks like this:
fit["Posterior", "PredictiveDistribution"]

Now I will plot this distribution using the function regressionPlot1D from the Git repository, which is just shorthand for the underlying values plot I showed earlier. I also included the distribution you get when you make a “point estimate” of the predictive model. This means that you take the best values of a, b and σ from the posterior and plot the predictive distribution using those values as if they were completely certain. Point estimates can be useful in making your results more manageable since probability distributions can be difficult to work with, but reducing your distributions to single numbers always discards information. Here I used the Mean (expected value) to reduce the distributions to single values, but you could also use other measures of centrality like the median or mode:
With[{
predictiveDist = fit["Posterior", "PredictiveDistribution"],
pointEst = NormalDistribution[
Mean@fit["Posterior",
"RegressionCoefficientDistribution"].{1, \[FormalX]},
Mean[TransformedDistribution[
Sqrt[\[FormalV]], \[FormalV] \[Distributed]
fit["Posterior", "ErrorDistribution"]]]
],
bands = {0.95, 0.5, 0.05}
},
Show[
regressionPlot1D[predictiveDist, {\[FormalX], -5, 5}],
regressionPlot1D[pointEst, {\[FormalX], -5, 5}, PlotStyle -> Dashed,
PlotLegends -> {"Point estimate"}],
plot,
PlotRange -> All,
PlotLabel -> "Posterior predictive distribution"
]
]

So that covers fitting a straight line through the points. However, the shape of the data suggests that maybe a quadratic fit is more appropriate (I cooked up the example just for this purpose, of course), so this is a good moment to demonstrate how Bayesian model comparison works. I am going to fit polynomials up to degree 5 to the data, with each of these polynomials representing a model that could explain the data. From this micro-universe of possible explanations of the data, I want to select the best one.
The Bayesian Occam’s Razor
In Bayesian inference, the models have probability distributions in the same way that the parameters a, b and σ have distributions. The posterior probability of a model depends on two factors:
 A quantity commonly referred to as evidence or marginal likelihood. This measures how well a model fits the data while taking into account the uncertainties in the regression parameters such as a, b and σ. BayesianLinearRegression reports this quantity as the "LogEvidence": the higher the log-evidence, the better the model. The evidence automatically accounts for model complexity due to an effect that is sometimes referred to as the Bayesian Occam’s razor (see, for example, chapter 28 in this book by David MacKay).
 The prior probability of the model.
Usually you’ll only want to consider models that, before seeing the data, you consider approximately equally likely. However, sometimes you do have strong prior information available that the data was generated by a certain model, and you’d only accept other explanations if the evidence overwhelmingly disproves that model in favor of another.
For example, if the data shown previously came from a noisy (and possibly biased) measurement of current through a standard resistor as a function of voltage, I would have little doubt that Ohm’s law applies here and that fitting a straight line is the correct way to interpret the data. It’s simply more likely that a little noise accidentally made the data look quadratic rather than that Ohm’s law suddenly stopped working in my experiment. In this case, I would assign a prior probability very close to 1 to Ohm’s law and divide the remaining sliver of probability among competing models I’m willing to consider. This is in accordance with a principle called Cromwell’s rule, which was coined by Dennis Lindley:
“Leave a little probability for the Moon being made of green cheese; it can be as small as 1 in a million, but have it there since otherwise an army of astronauts returning with samples of the said cheese will leave you unmoved.”
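The Occam effect in the evidence can be seen directly with a small amount of linear algebra: marginalizing the coefficients of a polynomial model with a Gaussian prior gives y ~ N(0, σ²I + τ²ΦΦᵀ), and evaluating that log-density for different degrees penalizes unused flexibility automatically. Here is a Python sketch with made-up, roughly quadratic data and assumed values of σ and τ (BayesianLinearRegression also integrates over σ, which this simplified sketch does not):

```python
import numpy as np

# Toy data with a genuine quadratic trend plus a small deterministic wiggle
x = np.linspace(-2, 2, 30)
y = 0.5 * x**2 - x + 0.1 * np.sin(5 * x)

def log_evidence(degree, s2=0.25, tau2=1.0):
    """Log marginal likelihood of a degree-d polynomial model with a
    Gaussian coefficient prior N(0, tau2*I) and known noise variance s2."""
    Phi = np.column_stack([x**k for k in range(degree + 1)])
    S = s2 * np.eye(len(x)) + tau2 * Phi @ Phi.T  # marginal covariance of y
    _, logdet = np.linalg.slogdet(S)
    quad = y @ np.linalg.solve(S, y)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

ev = {d: log_evidence(d) for d in range(6)}
# Degree 1 underfits the curvature; degree 5 fits it too, but pays an Occam
# penalty through the log-determinant term for flexibility it never uses.
print(ev)
```

The same trade-off, with σ integrated out as well, is what drives the "LogEvidence" values reported in the fits below.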
So let’s see how the fits of different polynomials look and what their evidence is:
models = AssociationMap[
BayesianLinearRegression[data, \[FormalX]^
Range[0, #], \[FormalX]] &,
Range[0, 5]
];
Multicolumn[
KeyValueMap[
Show[
regressionPlot1D[#2["Posterior",
"PredictiveDistribution"], {\[FormalX], -5, 5}],
plot,
PlotRange -> All,
PlotLabel ->
StringForm[
"Degree: `1`\nLog evidence: `2`", #1, #2["LogEvidence"]]
] &,
models
],
2
]

As you can see, the first- and second-degree fits are in close competition for being the best model. If we assume that the prior probabilities for all models are the same, we can calculate the relative probabilities of the fits by exponentiating the log-evidences and normalizing them. For this operation, we can borrow SoftmaxLayer from the neural network framework:
calculateRelativeProbabilities[models_] := Module[{
   probabilities,
   temp = SortBy[-#LogEvidence &]@models
  },
  probabilities = SoftmaxLayer[]@Replace[
     temp[[All, "LogEvidence"]],
     assoc_?AssociationQ :> Values[assoc]
   ];
  temp[[All, "Probability"]] = probabilities;
  temp[[All, "AccumulatedProbability"]] = Accumulate[probabilities];
  temp
];
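The normalization itself is not specific to the Wolfram Language. Here is a short, language-independent sketch in Python (with made-up log-evidence values, not the ones from this post) of the same softmax operation, subtracting the maximum before exponentiating so that very negative log-evidences don't underflow:

```python
import math

# Hypothetical log-evidences for polynomial models of degree 0 through 5.
log_evidences = {0: -150.2, 1: -65.8, 2: -64.9, 3: -67.3, 4: -70.1, 5: -73.0}

# Softmax: subtract the maximum first for numerical stability.
m = max(log_evidences.values())
unnormalized = {deg: math.exp(le - m) for deg, le in log_evidences.items()}
total = sum(unnormalized.values())
probabilities = {deg: w / total for deg, w in unnormalized.items()}

# Accumulate from the most to the least probable model, so you can see
# how much total probability each prefix of the ranking covers.
acc, accumulated = 0.0, {}
for deg, p in sorted(probabilities.items(), key=lambda kv: -kv[1]):
    acc += p
    accumulated[deg] = acc
```

Note how the degree-0 model, some 85 log-units behind, ends up with essentially zero probability: differences in log-evidence translate into exponentially large probability ratios.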

models = calculateRelativeProbabilities[models];
Dataset[models][All, {"LogEvidence", "Probability",
"AccumulatedProbability"}]

In this table, I sorted the models by probability and then accumulated the probabilities so you can easily see how much of the total you’ve covered going from top to bottom. Keep in mind that these probabilities only mean anything in the micro-universe we’re currently looking at: the moment I start adding more models, all of these probabilities could change. In Bayesian inference, you can only compare models you’re willing to formulate in the first place: there is no universal measure that tells you whether a fit of the data is actually good in some completely objective sense. All you can really hope for is to find the best model(s) in the group you’re considering, because the set of models you’re not considering is impossible to compare against.
Interlude: Comparing against LinearModelFit
At this point, it’s also good to compare the fits from BayesianLinearRegression against those from LinearModelFit, which also provides several goodness-of-fit measures:
standardFits = AssociationMap[
  LinearModelFit[
    data, \[FormalX]^Range[0, #], \[FormalX],
    ConfidenceLevel -> 0.90 (* to agree with our earlier choice of 95% and 5% prediction bands *)
  ] &,
  Range[0, 5]
];

With[{
keys = {"BIC", "AIC", "AICc", "AdjustedRSquared", "RSquared"}
},
AssociationThread[keys, #[keys]] & /@ standardFits // Dataset
]
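As a quick reminder of what the criteria in this table measure, here is a small sketch of my own in Python (not code from the post), computing BIC, AIC and AICc from a model's maximized log-likelihood logL, its number of free parameters k and the sample size n. The example numbers are hypothetical:

```python
import math

# Lower is better for all three criteria; each trades fit quality
# (the -2*logL term) against model complexity (the terms in k).
def bic(logL, k, n):
    return k * math.log(n) - 2 * logL

def aic(logL, k):
    return 2 * k - 2 * logL

def aicc(logL, k, n):
    # AIC with a correction that matters for small samples.
    return aic(logL, k) + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical comparison: a cubic fit that gains only a sliver of
# log-likelihood over a quadratic one loses once the extra parameter
# is penalized.
n = 30
quadratic = bic(logL=-40.0, k=3, n=n)
cubic = bic(logL=-39.5, k=4, n=n)
```

This is the same trade-off the Bayesian evidence makes automatically; BIC can be derived as a large-sample approximation to it.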

As expected, the R² measure goes up with the degree of the model, since it does not penalize for model complexity. All the other measures seem to favor the second-order model. Keep in mind that better models have lower values of the Bayesian information criterion (or BIC, which is an approximation to −2 times the log-evidence), the Akaike information criterion (AIC) and the corrected AIC (AICc, which corrects for small sample sizes); for adjusted R², the highest value indicates the best model. Let’s also compare the prediction bands from LinearModelFit with the Bayesian ones for the second-order model:
With[{n = 2, xmin = -5, xmax = 5},
 Show[
  regressionPlot1D[
    models[n, "Posterior", "PredictiveDistribution"],
    {\[FormalX], xmin, xmax}, {95, 5}],
  Plot[
    Evaluate[Reverse@standardFits[n]["SinglePredictionBands"]],
    {\[FormalX], xmin, xmax},
    Filling -> {1 -> {2}}, PlotStyle -> Dashed,
    PlotLegends -> {"LinearModelFit"},
    PlotLabel -> "Comparison of prediction bands"
  ],
  plot
 ]
]

As you can see, the confidence bands from LinearModelFit are slightly wider (and therefore more pessimistic) than the posterior prediction bands. The main reason for this difference is that the Bayesian prediction bands take into account all correlations in the posterior distribution between the model parameters and propagate these correlations into the predictions, narrowing them down a little further. Another way to put it is that the Bayesian analysis does not discard information prematurely when calculating the prediction bands, because it retains all intermediate distributions in full. This effect becomes more pronounced when less data is available, as the following fits illustrate (if you’re skeptical about the role of the Bayesian prior in this example, I recommend trying it yourself with decreasingly informative priors):
BlockRandom@Module[{dat, fits, priorScale = 1/100000, ndat = 5,
   bands = Quantity[{95, 5}, "Percent"]},
  SeedRandom[2];
  dat = RandomVariate[BinormalDistribution[0.7], ndat];
  fits = {
    InverseCDF[
      BayesianLinearRegression[dat, \[FormalX], \[FormalX],
        "PriorParameters" -> <|"B" -> {0, 0},
          "Lambda" -> priorScale {{1, 0}, {0, 1}}, "V" -> priorScale,
          "Nu" -> priorScale|>
      ]["Posterior", "PredictiveDistribution"],
      bands
    ],
    Reverse@LinearModelFit[dat, \[FormalX], \[FormalX],
       ConfidenceLevel -> 0.90]["SinglePredictionBands"]
  };
  Show[
   Plot[Evaluate@fits[[1]], {\[FormalX], -2, 2},
     Filling -> {1 -> {2}}, PlotLegends -> bands],
   Plot[Evaluate@fits[[2]], {\[FormalX], -2, 2},
     Filling -> {1 -> {2}}, PlotStyle -> Dashed,
     PlotLegends -> {"LinearModelFit"}],
   ListPlot[dat, PlotStyle -> Red]
  ]
]

Mixing It Up
So now we’re in a position where we have several polynomial models, two of them competing for the first position and no very clear winner. Which one do we choose? The Bayesian answer to this question is simple: why not both? Why not all? We’re still working from a probabilistic perspective: the truth is simply somewhere in the middle, and there’s no need to make a definite choice. Picking a model is also a form of point estimate, and we’ve seen before that point estimates discard potentially useful information. Instead, we can just split the difference in the same way that we did earlier when averaging over all possible values of a, b and σ while fitting a straight line to the data. The function MixtureDistribution is of great use here for combining the different posterior predictions into a new distribution. It is not even necessary to consider only the first- and second-order models: we can just combine all of them by weight:
mixDists = MixtureDistribution[
     Values@models[[All, "Probability"]],
     Values@models[[All, "Posterior", #]]
   ] & /@ {"PredictiveDistribution", "UnderlyingValueDistribution"};
mixDists
Show[
 regressionPlot1D[mixDists[[1]], {\[FormalX], -4, 4}],
 regressionPlot1D[mixDists[[2]], {\[FormalX], -4, 4},
   PlotLegends -> {"Underlying value"}, PlotStyle -> Dashed],
 plot,
 PlotRange -> All,
 PlotLabel -> "Predictions by mixture model up to degree 5"
]

(Note the custom compact formatting of MixtureDistribution from the BayesianInference package.) The underlying value bands highlight the hybridization of models particularly well.
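The model-averaging step itself is easy to state in any language. Here is a minimal Python sketch (with made-up weights and component distributions, not the mixture computed above) showing why a mixture's prediction bands widen: its variance includes the spread between the model means, which is exactly the model uncertainty:

```python
from statistics import NormalDist

# At a fixed x, suppose two models give normal predictive distributions
# and carry posterior model probabilities w[0] and w[1].
w = [0.7, 0.3]
components = [NormalDist(1.0, 0.5), NormalDist(1.4, 0.8)]

def mixture_pdf(y):
    # The mixture density is the probability-weighted sum of the components.
    return sum(wi * c.pdf(y) for wi, c in zip(w, components))

# The mixture mean is the weighted mean of the component means...
mean = sum(wi * c.mean for wi, c in zip(w, components))

# ...but the mixture variance picks up an extra between-model term
# (c.mean - mean)^2, which is how model uncertainty widens the bands.
var = sum(wi * (c.stdev**2 + (c.mean - mean) ** 2)
          for wi, c in zip(w, components))
```

Dropping the between-model term is what "picking a winner" amounts to, and it understates the real uncertainty.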
And why even stop there? There are more models we could explore in the polynomial universe, so let’s expand the horizons a bit. For example, why not try a fit like a x + b x² (i.e. one without the constant offset)? In total, there are 63 different polynomials of up to order 5 to try:
polynomialBases = DeleteCases[{}]@Subsets[\[FormalX]^Range[0, 5]];
Short[polynomialBases]
Length[polynomialBases]

Here is a function that makes fitting these models a little easier and faster:
fitModels[data_, modelBases_, vars_] := Module[{
   models
  },
  models = Reverse@SortBy[#LogEvidence &]@ParallelMap[
      BayesianLinearRegression[data, #, vars, IncludeConstantBasis -> False] &,
      modelBases,
      Method -> "CoarsestGrained"
    ];
  calculateRelativeProbabilities[models]
];

Fit all possible polynomials up to degree 5, and view the 10 best fits:
models2 = fitModels[data, polynomialBases, \[FormalX]];
Dataset[models2][
;; 10,
{"LogEvidence", "Probability", "AccumulatedProbability", "Basis"}
]

I can now calculate the prediction bands again, but it’s not really necessary to include all models in the mixture, since only a few carry any significant weight. To make the distribution more manageable, I will throw away the least likely models, which together account for less than 1 percent of the total probability mass:
selection = Take[models2,
   UpTo[LengthWhile[models2, #AccumulatedProbability <= 0.99 &] + 1]
];
With[{
  mixDist = MixtureDistribution[
      selection[[All, "Probability"]],
      selection[[All, "Posterior", #]]
    ] & /@ {"PredictiveDistribution", "UnderlyingValueDistribution"}
 },
 Show[
  regressionPlot1D[mixDist[[1]], {\[FormalX], -4, 4}],
  regressionPlot1D[mixDist[[2]], {\[FormalX], -4, 4},
    PlotLegends -> {"Underlying value"}, PlotStyle -> Dashed],
  plot,
  PlotRange -> All,
  PlotLabel -> StringForm[
    "Mixture model over `1` most likely polynomials (degree \[LessEqual] 5)",
    Length[selection]
  ]
 ]
]
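The pruning rule used here (keep models while the accumulated probability is at most 0.99, plus the one that crosses the line) is simple enough to sketch in a few lines of Python, with made-up probabilities already sorted from most to least likely:

```python
# Hypothetical model probabilities, sorted in descending order.
probabilities = [0.55, 0.30, 0.10, 0.03, 0.015, 0.004, 0.001]

# Keep models until the accumulated probability exceeds 0.99,
# including the model that pushes the total over the line.
acc, kept = 0.0, []
for p in probabilities:
    kept.append(p)
    acc += p
    if acc > 0.99:
        break

# Everything discarded accounts for less than 1% of the probability mass.
discarded_mass = 1.0 - acc
```

With these numbers, the first five models cover 99.5% of the probability, so the mixture loses almost nothing by dropping the rest while becoming much cheaper to evaluate.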

It is interesting to see that models with no constant offset are strongly preferred, leading to a very narrow estimate of the underlying value near the origin. The extrapolation bands have also become wider than before, but that’s just a warning about the dangers of extrapolation: if we really have the prior belief that all of these polynomials are equal contenders for explaining the data, then the wide extrapolation bands are a natural consequence of that assumption since there are so many different possible explanations of the data. It’s better than the alternative: thinking you can extrapolate very accurately and then being proven wrong later, after having made important decisions based on false accuracy.
Additionally, it’s interesting to consider what we now know about the regression coefficients of the basis functions of our fit. In the following code, I compute the MarginalDistribution of each component of the regression coefficient distributions and visualize their credible intervals:
marginals = With[{
   zeroDist = ProbabilityDistribution[
     DiracDelta[\[FormalX]], {\[FormalX], -\[Infinity], \[Infinity]}]
  },
  Association@Table[
    basefun -> MixtureDistribution[
      selection[[All, "Probability"]],
      Map[
        If[MemberQ[#Basis, basefun],
          MarginalDistribution[
            #["Posterior", "RegressionCoefficientDistribution"],
            First@FirstPosition[#Basis, basefun]
          ],
          zeroDist
        ] &,
        selection
      ]
    ],
    {basefun, \[FormalX]^Range[0, 5]}
  ]
];
intervalPlot[assoc_, bands_] := ListPlot[
  MapAt[Thread[Callout[#, Keys[assoc]]] &, Length[bands]]@Transpose[
    KeyValueMap[
      Thread@Tooltip[Quiet@InverseCDF[#2, bands], #1] &,
      assoc
    ]
  ],
  Filling -> {1 -> {2}, 3 -> {2}}, PlotLegends -> bands,
  Ticks -> {None, Automatic}, PlotRange -> All
];
Show[intervalPlot[marginals, Quantity[{99, 50, 1}, "Percent"]],
 PlotLabel -> "Credible intervals of coefficients"]

Compare this against the credible intervals you get when fitting a single fifth-degree polynomial rather than 63 different ones; the single fit is significantly less informative:
Module[{
   fit5 = BayesianLinearRegression[data, \[FormalX]^Range[0, 5], \[FormalX]],
   marginals
  },
  marginals = AssociationMap[
    MarginalDistribution[
      fit5["Posterior", "RegressionCoefficientDistribution"], # + 1] &,
    Range[0, 5]
  ];
  Show[intervalPlot[marginals, Quantity[{99, 50, 1}, "Percent"]],
   PlotLabel -> "Credible intervals of coefficients of single polynomial"]
]

The first plot shows us that we can be quite sure that several of the terms in the polynomial are 0, and we now also know the signs of the remaining terms, but we can only come to these conclusions by trying out all possible polynomials and comparing them against each other.
This procedure is also quite different from another popular method for making fitting terms disappear: LASSO regularization, available through the FitRegularization option of Fit:
Fit[data, \[FormalX]^Range[0, 5], \[FormalX],
  FitRegularization -> {"LASSO", 10}]
Show[Plot[%, {\[FormalX], -2.5, 2.5}], plot]

Interestingly, the LASSO penalty used here discards the quadratic term rather than the cubic one, though the result depends on the strength of the regularization (which I somewhat arbitrarily set to 10). The main difference between the two procedures is that at no point during the Bayesian analysis did I have to apply an artificial penalty for models with more terms: the complexity penalty is a result that simply drops out after considering all the possibilities and working out their probabilities.
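For intuition on why LASSO can zero out coefficients at all: in the special case of an orthonormal design, the L1 penalty reduces each least-squares coefficient to a soft-thresholded value. This is a standard textbook result about LASSO, not code from this post; a minimal Python sketch:

```python
# Soft-thresholding: coefficients within lam of zero are snapped exactly
# to zero; larger ones are shrunk toward zero by lam.
def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical least-squares coefficients and penalty strength.
coeffs = [0.05, -0.3, 2.1]
shrunk = [soft_threshold(z, lam=0.5) for z in coeffs]  # [0.0, 0.0, 1.6]
```

This makes the contrast explicit: LASSO imposes the shrinkage as a penalty chosen up front, while in the Bayesian model comparison above, terms vanish because whole models without them turn out to be more probable.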
Find Your Own Fit
I hope that this exploration of Bayesian regression was as useful for you to read as it was for me to write. In case you’re interested in the underlying mathematics used by BayesianLinearRegression, you can read more about Bayesian linear regression, Bayesian multivariate regression and conjugate priors.
Check out BayesianLinearRegression in the Wolfram Function Repository or download the BayesianInference package, and see how you can use Bayesian regression to better model your own data. I’m excited to see the different ways you’re using BayesianLinearRegression—share your results and tips on Wolfram Community!