Wolfram Blog
News, views, and ideas from the front lines at Wolfram Research.

 
 
1. Envisioning City Spaces, Aligning DNA Sequences, Classifying Emotional Speech and More: Wolfram Community Highlights


In this roundup of our recent Wolfram Community favorites, our talented users explore different methods of accessing, interpreting and representing data, creating some eye-catching results that offer new ways of looking at the world. We're also excited to showcase a few projects from alumni of our annual Wolfram High School Summer Camp and Wolfram Summer School. Check out the culmination of their hard work, as well as how Community members find clever solutions using the Wolfram Language.

Fitting the World: Physical Scale from Satellite Images

William Goodall (Wolfram Summer Camp 2019)

How can you decipher the scale of an area, or the size of an object within that area, from a satellite image? Though this is typically quite a challenging problem, as Earth's terrain can look the same at different zoom levels, William Goodall makes it look easy using the Wolfram Language. For his Summer Camp project, William produced an algorithm that uses feature extraction and a neural network to accurately predict a map's zoom level.

Physical scale from satellite images
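
If you want to experiment with the general idea, here is a minimal sketch (not William's actual code, and using Classify rather than his feature extraction and neural network): render satellite-style tiles at known zoom levels with GeoGraphics, then train a classifier to recover the zoom level from the image alone. The locations, zoom range and tiny training set are illustrative assumptions.

(* Minimal sketch (illustrative, not the project's code): learn to recover the
   GeoZoomLevel of a satellite-style tile from the image alone *)
zoomLevels = Range[12, 16];
locations = {Entity["City", {"Chicago", "Illinois", "UnitedStates"}],
   Entity["City", {"Tokyo", "Tokyo", "Japan"}]};

(* render a satellite-background tile centered on a location at a given zoom *)
tile[loc_, z_] := Rasterize[GeoGraphics[{}, GeoCenter -> loc,
    GeoBackground -> "Satellite", GeoZoomLevel -> z, ImageSize -> 200]];

(* a deliberately tiny training set; a real project would use many locations *)
training = Flatten[Table[tile[loc, z] -> z, {loc, locations}, {z, zoomLevels}]];
zoomClassifier = Classify[training];

zoomClassifier[tile[First[locations], 14]]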

Tokyo 2020 Olympic and Paralympic Emblems

Kotaro Okazaki

Tokyo 2020 Olympic and Paralympic emblems

The emblems representing the 2020 Summer Olympics and Paralympics combine checkered patterns used in Japanese culture (named ichimatsu moyo during the Edo period), indigo blue (a traditional Japanese color) and varieties of rectangular shapes (representing the diversity promoted by and found within the Olympic and Paralympic Games). Kotaro Okazaki, an inventor at Fujitsu Limited in Japan, explores the mathematics used to create the emblems.

Crime Rates in US Major Cities (2014)

Diego Zviovich

After wondering which US cities have reported the highest crime rates, Diego Zviovich set out to uncover the answer, and found the Wolfram Language crucial in interpreting and analyzing the crime statistics data compiled by the FBI. Diego homed in on specific fields of interest and assembled them into a Dataset, finally creating a BubbleChart that gives an at-a-glance visualization of national crime rates.
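
As a rough illustration of that workflow (with a few hand-entered placeholder rows rather than the actual FBI tables), one could assemble the fields of interest into a Dataset and plot per-capita rates with BubbleChart:

(* Illustrative sketch only: placeholder numbers, not the FBI statistics *)
crime = Dataset[{
    <|"City" -> "City A", "Population" -> 8400000,
      "ViolentCrime" -> 52000, "PropertyCrime" -> 140000|>,
    <|"City" -> "City B", "Population" -> 3900000,
      "ViolentCrime" -> 29000, "PropertyCrime" -> 95000|>,
    <|"City" -> "City C", "Population" -> 2700000,
      "ViolentCrime" -> 24000, "PropertyCrime" -> 82000|>}];

(* per-100,000 rates on the axes; bubble size scaled by population *)
rates = Normal[crime[All,
    {N[100000 #ViolentCrime/#Population],
      N[100000 #PropertyCrime/#Population], #Population} &]];

BubbleChart[rates, ChartLabels -> Normal[crime[All, "City"]],
 FrameLabel -> {"Violent crimes per 100,000", "Property crimes per 100,000"}]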

Monitoring the Development and Spread of Cities

Ahmed Elbanna (Wolfram Summer School 2019)

Predicting urban development is essential for citizens, governments and companies. Ahmed Elbanna, a researcher and lecturer at Tanta University, Egypt, explores several different ways to collect satellite images, including manual collection and using the API from NASA's website, to predict how a city can grow and change. Then, to build the model, Ahmed uses the machine learning superfunction Predict to generate a new predicted binary image and add it to the list of binary images of the area under study.

Development and spread of Dubai
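
A stripped-down sketch of that per-pixel idea might look like the following; the synthetic "growth" frames below are stand-ins for the project's binarized satellite images, and a real analysis would use far more pixels and time steps.

(* Sketch only: synthetic frames stand in for binarized satellite images *)
frames = Table[
   Image[Table[Boole[x^2 + y^2 < (0.2 t)^2], {y, -1, 1, .05}, {x, -1, 1, .05}]],
   {t, 1, 5}];

(* each training example: one pixel's history in the earlier frames -> its
   value in the latest frame *)
histories = Transpose[Flatten[ImageData[#]] & /@ Most[frames]];
targets = Flatten[ImageData[Last[frames]]];
p = Predict[histories -> targets];

(* slide the window forward by one frame and predict the next binary image *)
next = p /@ Transpose[Flatten[ImageData[#]] & /@ Rest[frames]];
predictedFrame = Image[Partition[Round[next], First[ImageDimensions[First[frames]]]]]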

Cartogram Function

Richard Lapin (Wolfram Summer School 2019)

Richard Lapin, a recent BSc economics graduate from University College London, created the Cartogram function in order to generate cartograms (geometrically distorted maps where the distortion conveys information derived from input data). With Cartogram, Richard made great use of the Wolfram Function Repository, a public resource of user-contributed Wolfram Language functions.

Cartogram
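
To try the function yourself, you can pull it straight from the Wolfram Function Repository with ResourceFunction. The argument form below (an association of countries to values to be conveyed by the distortion) is an assumption for illustration; check the function's repository page for its documented signature and options.

(* Hedged usage sketch: the exact argument form is an assumption; see the
   Wolfram Function Repository page for ResourceFunction["Cartogram"] *)
cartogram = ResourceFunction["Cartogram"];
countries = {Entity["Country", "France"], Entity["Country", "Germany"],
   Entity["Country", "Spain"], Entity["Country", "Italy"],
   Entity["Country", "Poland"]};
populations = AssociationMap[
   QuantityMagnitude[CountryData[#, "Population"]] &, countries];
cartogram[populations]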

More Instructions on Installing a Wolfram Engine Kernel for Jupyter

Seth Chandler

Since we announced the Free Wolfram Engine for Developers, more users than ever have access to the power of Wolfram's computational intelligence. One popular way to take advantage of the Wolfram Engine is to call it from a system like Jupyter. Seth Chandler, Wolfram Innovator Award winner and a Foundation Professor of Law at the University of Houston, shows us how by walking us through a comprehensive tutorial on installing a Wolfram Engine kernel for Jupyter.

Mood Detection in Human Speech

Dev Chheda (Wolfram Summer Camp 2019)

Dev Chheda created a system capable of detecting three moods in human speech: angry, happy and sad. Dev determined the features most useful for detecting mood, extracting the amplitudes, fundamental frequencies and formant frequencies of each training clip. He then trained the system on labeled samples of emotional speech using Classify, allowing him to identify and label new audio recordings with the mood of the speaker.

Mood detection in human speech
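
A compressed sketch of that kind of pipeline in the Wolfram Language might look like the following. The feature summary (per-clip means of amplitude, fundamental frequency and formants) and the placeholder training association are assumptions for illustration, not Dev's actual feature set or data.

(* Sketch only: summarize each clip by mean amplitude, mean fundamental
   frequency and mean formant frequency, then train Classify on labeled clips *)
features[a_Audio] := N[Flatten[QuantityMagnitude[{
     Mean[AudioLocalMeasurements[a, "RMSAmplitude"]["Values"]],
     Mean[DeleteMissing[AudioLocalMeasurements[a, "FundamentalFrequency"]["Values"]]],
     Mean[DeleteMissing[Flatten[AudioLocalMeasurements[a, "Formants"]["Values"]]]]}]]];

(* placeholder: replace the empty lists with your own labeled Audio clips *)
trainingClips = <|"angry" -> {}, "happy" -> {}, "sad" -> {}|>;
trainingData = Flatten[KeyValueMap[
    Function[{mood, clips}, Thread[(features /@ clips) -> mood]], trainingClips]];

(* once clips are supplied:
   moodClassifier = Classify[trainingData];
   moodClassifier[features[newRecording]]  labels a new clip's mood *)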

Visualizing Data with Chord Diagrams

George Varnavides

Visualizing data with chord diagrams

Chord diagrams are an elegant way to represent interrelationships between variables. After searching for built-in capabilities or other user-provided answers and coming up short, George Varnavides built a chord diagram from scratch using the visualization capabilities of the Wolfram Language. In addition to sharing his process on Wolfram Community, George also made sure other users would be able to easily access his code by submitting ChordDiagram to the Wolfram Function Repository. Thanks, George!
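
George's function can now be pulled directly from the Wolfram Function Repository. The matrix argument below is an assumption for illustration; the repository page documents the exact signature and options.

(* Hedged usage sketch for ResourceFunction["ChordDiagram"]: a small symmetric
   matrix of interrelationship strengths between four variables *)
chordDiagram = ResourceFunction["ChordDiagram"];
m = {{0, 5, 2, 1},
     {5, 0, 3, 4},
     {2, 3, 0, 6},
     {1, 4, 6, 0}};
chordDiagram[m]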

Implementing a Visualizer for DNA Sequence Alignments

Jessica Shi (Wolfram Summer Camp 2019)

In bioinformatics, DNA sequence alignment is an algorithm-based method of arranging sequences to find similarities between them. Jessica Shi, another Summer Camp student, used pairwise alignment (and a bit of inspiration from a Wolfram Demonstration published in 2011) to create a visual output representing sequence alignment.

DNA sequence alignments
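
The built-in SequenceAlignment function performs the pairwise alignment itself; a small sketch of the visualization idea (not Jessica's actual visualizer) could simply color matching and differing segments:

(* Sketch: pairwise-align two short DNA strings and highlight the differences *)
s1 = "ACGTGCTAGCTTAGGCTA";
s2 = "ACGTCCTAGTTTAGGATA";
alignment = SequenceAlignment[s1, s2];

Row[alignment /. {
   common_String :> Style[common, Darker[Green], Bold],
   {a_String, b_String} :> Column[{Style[a, Red], Style[b, Blue]}]}]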

If you haven't yet signed up to be a member of Wolfram Community, please do so! You can join in on similar discussions, post your own work in groups that cover your interests and browse the complete list of Staff Picks. If you're looking for more interesting work produced by our Summer Program alumni, visit our Wolfram Community groups for Summer Camp and Summer School projects.


2. Powering Engineering Education with Wolfram Virtual Labs

How can you make teaching come alive and be more engaging? For many educators, the answer turns out to be not so much a single solution, but rather a set of tools that can vary according to subject and even by student. So today, I want to add something new to the pedagogical toolkit: Wolfram Virtual Labs.

Wolfram Virtual Labs are open educational resources in the form of interactive courseware that are used to explain different concepts in the classroom. Our ambition is to provide an easy way to study difficult concepts and promote student curiosity.

Powering Engineering Education with Wolfram Virtual Labs

For this post, I spoke with Dr. Matteo Fasano about his experience using Virtual Labs as a course complement in the master's courses in which he acts as a teaching assistant. He also told me why and how he supported the Wolfram MathCore group in developing the CollegeThermal Virtual Labs (now available) and how they can help teachers or instructors make learning more engaging.

Tell us a bit about yourself. What are these Virtual Labs?

I am a postdoctoral researcher at Politecnico di Torino. I am a teaching assistant for five master's courses on energy and thermal engineering. I was the recipient of the Young Researcher of the Year award from the energy company Eni in 2017 for my research work "Heat and Mass Transfer of Water at Nanoscale Solid-Liquid Interfaces."

It was in May last year when I, along with my colleagues, attended a Wolfram SystemModeler demo. During this demo, we got to know about an internal project at Wolfram called Virtual Labs. The idea was simple: these were a set of interactive computer exercises meant as a complement to teaching materials, in which you explore a specific topic using system models, either by creating new models to describe the topic or by interacting with pre-built models. It was planned to be distributed as an open educational resource to teachers, students and anyone else wishing to learn about a subject.

I was intrigued by this concept and started correspondence with your team. I sent some exercises from my course material to check if it was possible to prepare models for these case studies, and if it could be modeled as a Virtual Lab with interactive instructions for students. On first review it looked doable; however, because of the length of the content, we proposed for it to be split into two Virtual Labs instead, and these are now available to everyone as the CollegeThermal library.

What prior knowledge is required to use these materials?

We tried to make sure that the content can be followed by anyone with basic thermodynamic knowledge.

Can you walk us through one of the labs?

Ever wondered what the thermal power of the radiator in your house should be to guarantee thermal comfort, or how the ambient conditions affect your room temperature? To answer these questions, it is important to understand the thermal behavior of the different components in your house. The Room Heating lab aims to model the most significant components of a house and combine them to see how the room temperature changes with ambient temperature fluctuations.

We begin by observing the thermal behavior of a multi-layer wall crossed by a heat flux normal to its surface. The wall consists of three layers in series: insulation, brick and insulation. As a first approximation, the thermal response of this configuration can be studied with a one-dimensional, lumped-parameter model in which only heat conduction through the wall is considered (for the moment). In the following figure, we show how the thermal conductivity of the different materials affects the temperature distribution inside the wall at fixed layer thicknesses:

Temperature distribution

There is a first interesting behavior that can be noticed here: if we consider typical conductivity values for the insulating and brick layers, namely 0.07 W/(m K) and 0.7 W/(m K), respectively, we observe a large temperature drop across the insulation layers, whereas the temperature drop across the brick layer is only minimal, even though the thickness of the brick layer is three times that of the insulation layers. In fact, the law of heat conduction (also known as Fourier's law) implies that, at fixed heat flux, the temperature drop across each layer is proportional to its thermal resistance, which can here be determined as the ratio between the layer thickness and the thermal conductivity.
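
As a quick back-of-the-envelope check of that statement, one can compute the per-unit-area resistances (thickness divided by conductivity) and the fraction of the total temperature drop each layer takes; the layer thicknesses below are illustrative values, not the lab's exact parameters.

(* Conductive resistance per unit area of each layer: thickness/conductivity *)
kIns = 0.07; kBrick = 0.7;      (* thermal conductivities, W/(m K) *)
lIns = 0.05; lBrick = 3 lIns;   (* illustrative thicknesses, m *)

resistances = <|"inner insulation" -> lIns/kIns, "brick" -> lBrick/kBrick,
   "outer insulation" -> lIns/kIns|>;   (* K m^2/W *)

(* at fixed heat flux, each layer's share of the total temperature drop *)
N[resistances/Total[resistances]]

With these numbers, each insulation layer takes roughly 43% of the total drop and the brick layer only about 13%, matching the behavior described above.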

We now add another part to our model: the convective heat transfer from the ambient room to the internal wall, and from the external wall to the environment. The thermal resistances due to convection are added in series with respect to the conductive ones through the wall, thus causing further temperature drops at both sides of the wall. In this case, the typical boundary layer of natural convection can be seen by the nonlinear temperature profiles of air in the proximity of the wall surface:

Typical boundary layer of natural convection

In addition to opaque walls, buildings have transparent walls (windows). The thermal model of windows here considers the solar radiation through the glass, the air leakages through the window and the thermal conduction across the glass and frame of the window. We can analyze the model response by observing the heat-flow values for the different contributions. In this case, the indoor and outdoor temperatures are set at 20 °C and -10 °C, respectively. Results in the following figure show that the heat flux is positive when it flows from the external environment to the internal one, as in the case of solar radiation, whereas heat fluxes are negative (i.e. thermal losses) when they flow in the opposite direction, as in the cases of leakages and conduction through the window:

Positive heat flux

The thermal models of both opaque and transparent walls can be combined to represent a wall with a window. This assembled model allows you to compare the thermal flux flowing into and out of the room at different ambient and room temperatures. Clearly, a nonzero net thermal flux leads to a dynamic change of the room temperature: if the net thermal flux is negative, the room temperature will tend to decrease with time, therefore reducing thermal comfort:

Room temperature decrease

To observe the actual dynamic behavior of temperature with time, we then refine the thermal model of the room by introducing: (1) other relevant components of the room that significantly affect its temperature, namely the roof, floor (opaque walls) and people (inner heat source); and (2) the thermal capacitance of air in the room. All the components are then combined to create a more comprehensive room model. This model is tested for a given outside temperature:

This model is tested for a given outside temperature

In this case, we observe from the following figure that the net heat outflow is greater than the inflow; as a result, the temperature in the room decreases and eventually stabilizes at 10 °C, where the inlet and outlet thermal fluxes equalize:

Net heat outflow

Such equilibrium temperature causes thermal discomfort in the room, which should be avoided by introducing a heating system in the building.

In the Virtual Lab, we have modeled the heating system using radiators as heat exchangers. Specifically, both electric and water radiators, as shown in the following diagrams, have been considered, since the multidomain approach of SystemModeler allows us to combine different domains such as electric, thermal and control in one model. As a first implementation, the radiator operates at a fixed nominal thermal power and is controlled by an on/off strategy:

Thermal model of the electric radiator

Model of water radiator
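
The lab builds this with SystemModeler components, but the essence of an on/off-controlled radiator can also be sketched directly in the Wolfram Language with NDSolve and WhenEvent; every parameter value below is illustrative rather than taken from the lab.

(* Sketch: lumped room thermal capacitance, loss to the outside, and a
   fixed-power radiator switched by a hysteresis (on/off) controller *)
cRoom = 5.*10^5;    (* J/K, lumped heat capacity of the room *)
ua    = 120.;       (* W/K, overall loss coefficient to the outside *)
pRad  = 6000.;      (* W, nominal radiator power *)
tOut  = -5.; tSet = 20.; band = 0.5;   (* outside and set-point temperatures, °C *)

sol = First@NDSolve[{
     cRoom T'[t] == on[t] pRad - ua (T[t] - tOut),
     T[0] == 15., on[0] == 1,
     WhenEvent[T[t] > tSet + band, on[t] -> 0],   (* too warm: radiator off *)
     WhenEvent[T[t] < tSet - band, on[t] -> 1]},  (* too cold: radiator on *)
    {T, on}, {t, 0, 24*3600}, DiscreteVariables -> {on}];

Plot[T[t] /. sol, {t, 0, 24*3600},
 AxesLabel -> {"Time (s)", "Room temperature (\[Degree]C)"}]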

We will now use Wolfram|Alpha to get temperature data for an example day during the winter season in Rome as a case study, and use it to define the average external temperature required by our complete room + radiator model:

tempsWinter = WeatherData["Rome", "Temperature", {{2016, 01, 01}}]["Path"];
tempsInSecondsWinter = Transpose[{tempsWinter[[All, 1]] - tempsWinter[[1, 1]],
    QuantityMagnitude[tempsWinter[[All, 2]], "DegreesCelsius"]}];

The following plot shows the temperature variation for a span of about nine hours. This data can then be fed to our model:

ListPlot[tempsInSecondsWinter,
 AxesLabel -> {"Time[s]", "Temperature[\[Degree]C]"}]

Thanks to an intuitive graphical user interface, the user can now perform fast sensitivity analyses to assess the effect of the model parameters on the room temperature. For example, in this figure, the room reference temperature, the heat capacity of the room and the solar irradiance can be changed to explore their effects on the on/off cycles of the radiator (and thus the heating system):

Effects on the on/off cycles of the radiator

The overall energy consumption of the heating system is also estimated over the day. Here the students can appreciate the energy- (and cost-) saving potential of decreasing the room reference temperature by just 1 °C during the heating season, as well as the importance of adequately sizing the heating system to guarantee stable comfort conditions throughout the whole day.
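
To see roughly where the 1 °C figure comes from, a steady-state estimate is enough: the average heating power scales with the indoor-outdoor temperature difference, so lowering the set point by 1 °C saves roughly a fraction 1/(set point minus average outdoor temperature) of the heating energy. The loss coefficient and average outdoor temperature below are illustrative assumptions.

(* Rough steady-state estimate of the saving from a 1 °C lower set point *)
ua = 120.;        (* W/K, illustrative overall loss coefficient *)
tOutAvg = 5.;     (* °C, illustrative average outdoor temperature *)
dailyEnergy[tSet_] := ua (tSet - tOutAvg)*24/1000.;   (* kWh per day *)

{dailyEnergy[20.], dailyEnergy[19.],
 1 - dailyEnergy[19.]/dailyEnergy[20.]}   (* about a 6.7% saving in this example *)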

In this lab, you saw how we started with simple models and ended up with a nontrivial example. The students have access to all the system models used in the labs, where they can learn how these models were created and try to create new models or refine the existing ones.

What are the problems you are trying to address using these labs?

Existing ways of teaching using only presentations and blackboards have no doubt worked for a long time. I believe that this teaching approach is still required to get a full preliminary understanding of the physical heat- and mass-transfer mechanisms and the thermodynamics underlying thermal systems. Nowadays, this approach can be assisted by the real-time simulation of thermal systems to get a fast, hands-on experience of their actual operating conditions. Virtual Labs can support this approach without the need for prior knowledge of low-level coding, which is a plus for students without a solid programming background.

It took me less than a day to create a model, and I felt that I could now create Virtual Labs by myself.

How was your experience using Wolfram tech?

The notebook interface and the wide range of visualization tools in Mathematica provide a convenient way to create dynamic content. Modeling using SystemModeler is also easy with its wide range of built-in components and equation-based models. It took me less than a day to create a model, and I felt that I could now create Virtual Labs by myself. I am curious to also test the power of the Wolfram Language for my research: in the near future, I would like to explore the machine learning capabilities of Mathematica to predict heat- and mass-transfer processes, for instance for the injection molding of nanocomposites. And not only me: I can also see my students making use of these tools to improve their understanding of the physical concepts learned during the lectures.

Do you have anything to say to the teachers?

A small tip to teachers: the Wolfram MathCore group is easy to talk to and open to providing help. If you have any ideas or questions on how you could improve your teaching capacities, do contact them! If you want your students to experience innovative learning processes, take an extra step to transform your courses with the help of flexible, intuitive and high-level programming.

Have a good idea for an educational area where Virtual Labs could be helpful? Let us know by sending us an email at virtuallabs@wolfram.com!

Get full access to the latest Wolfram Language functionality for SystemModeler models and Virtual Labs with Mathematica 12.

Learn more


3. Innovating in Education, Analytics and Engineering: Thirty Years Using Wolfram Technology

Robert Prince-Wright has been using Mathematica since its debut in 1988 to develop computational tools in education, business consulting and offshore engineering. We recently talked to Prince-Wright about his work developing simulation models for deepwater drilling equipment at safety and systems engineering company Berkeley & Imperial.

Innovating in Education, Analytics and Engineering: Thirty Years Using Wolfram Technology

His latest work is cutting edge, but it's only part of the story. Throughout his career, Prince-Wright has used Wolfram technologies for modeling systems as varied as downhole wellbore trajectory, radionuclide dispersion and PID control of automation systems. Read on to learn more about Prince-Wright's accomplishments and discover why Wolfram technology is his go-to for developing unique computational solutions.

Simplifying Complex Ideas

When Mathematica Version 1.0 was released, Prince-Wright was in a PhD program in probabilistic modeling at the University of Glasgow. At the time, Mathematica was sold as a specialized tool for symbolic algebra. Its unique treatment of differential equations and linear algebra made it indispensable to Prince-Wright in his research on reliability analysis.

Mathematica 1.0

Over the next few years, while teaching engineering mathematics, he found the symbolic approach of the Wolfram Language helpful in demonstrating math concepts intuitively. In particular, he used it to show students how the use of vectors and complex numbers made location and rotation algebra much simpler than trigonometry. He also notes that even this early interface produced higher-quality visualizations faster than any other software at the time.

Publishing Interactive Models

Eventually, Prince-Wright moved to the private sector, analyzing engineering systems, presenting findings and making policy proposals for a range of clients. Though he tried other software, he always gravitated toward the Wolfram Language for its highly automated interface construction (after all, he was a civil engineer, not a programmer). In the early 2000s, he began taking advantage of the publishing capabilities of Wolfram Notebooks. He could readily demonstrate custom models to clients anywhere, either by sending them full-featured standalone notebooks or by generating interactive notebook-based interfaces that connected to pre-deployed models on the back end.

Ongoing improvements in Mathematica's modeling and analysis functionality also allowed Prince-Wright to tackle new challenges. In 2005, he helped develop a new well-control algorithm for a major oil company that combined concepts from several engineering domains. He was pleased by how easy it was to repurpose the language to create this sophisticated multiphysics solution: "What was remarkable was thinking back 14 years and using the skills I developed using Mathematica as a teacher to solve real-world problems."

Unified Analysis and Simulation

In 2012, while working in safety and systems engineering at Berkeley & Imperial, Prince-Wright discovered Wolfram SystemModeler and its Wolfram Language integration features. He began leveraging this new Modelica interface to test and improve models he'd built in the Wolfram Language. Combining the drag-and-drop simulation workflow of SystemModeler with the real-time analysis of the Wolfram Language allowed him to achieve unprecedented speed and accuracy in his models.

Systems modeling

Around that time, Prince-Wright heard about the Wolfram Connected Devices Project. In addition to improved integration with low-level devices, he soon realized the project gave him a framework for comparing system models to the actual systems they represent; in his words, "integrating the language with the real world around us." This workflow turned out to be ideal for simulating and testing the kinds of embedded systems used in deepwater drilling. Since then, he and his team have continued to push this concept further, exploring the use of digital twins and hardware-in-the-loop simulations to continue improving his models.

An Ever-Evolving System

In many ways, the modern Wolfram technology stack has advanced in parallel with Prince-Wright's career path, growing from a symbolic math tool to a cross-platform computational system to the full-featured modern engineering and analytics suite available today. Like his work, Wolfram technology can be applied across a diverse range of engineering fields. And the Wolfram Language maintains consistent syntax and structure throughout, making it easy to try different techniques and complementing Prince-Wright's approach of constantly pushing the boundaries. These qualities combine to provide lasting value for innovators like Prince-Wright: "The evolution of the Wolfram System has been very carefully designed to ensure that the core language was honored through the process. What that means now is that using the Wolfram Language gives you an opportunity to solve even bigger and more complex problems."

Find out more about Robert Prince-Wright, as well as other users’ exciting Wolfram Language applications, on our Customer Stories pages.

Build and test your own engineering models with Wolfram SystemModeler, the easy-to-use, next-generation modeling and simulation environment.

Try now


4. Announcing the Rule 30 Prizes
The Story of Rule 30

How can something that simple produce something that complex? It's been nearly 40 years since I first saw rule 30, but it still amazes me. Long ago it became my personal all-time favorite science discovery, and over the years it's changed my whole worldview and led me to all sorts of science, technology, philosophy and more. But even after all these years, there are still many basic things we don't know about rule 30. And I've decided that it's now time to do what I can to stimulate the process of finding more of them out.

So as of today, I am offering $30,000 in prizes for the answers to three basic questions about rule 30.

The setup for rule 30 is extremely simple. One's dealing with a sequence of lines of black and white cells. And given a particular line of black and white cells, the colors of the cells on the line below are determined by looking at each cell and its immediate neighbors and then applying the following simple rule:

RulePlot[CellularAutomaton[30]]

If you start with a single black cell, what will happen? One might assume, as I at first did, that the rule is simple enough that the pattern it produces must somehow be correspondingly simple. But if you actually do the experiment, here's what you find happens over the first 50 steps:

RulePlot[CellularAutomaton[30], {{1}, 0}, 50, Mesh -> All, ImageSize -> Full]

But surely, one might think, this must eventually resolve into something much simpler. Yet here's what happens over the first 300 steps:

And, yes, there's some regularity over on the left. But many aspects of this pattern look for all practical purposes random. It's amazing that a rule so simple can produce behavior that's so complex. But I've discovered that in the computational universe of possible programs this kind of thing is common, even ubiquitous. And I've built a whole new kind of science with all sorts of principles based on this.

And gradually there's been more and more evidence for these principles. But what specifically can rule 30 tell us? What concretely can we say about how it behaves? Even the most obvious questions turn out to be difficult. And after decades without answers, I've decided it's time to define some specific questions about rule 30, and offer substantial prizes for their solutions.

I did something similar in 2007, putting a prize on a core question about a particular Turing machine. And at least in that case the outcome was excellent. In just a few months, the prize was won, establishing forever what the simplest possible universal Turing machine is, as well as providing strong further evidence for my general Principle of Computational Equivalence.

The Rule 30 Prize Problems again get at a core issue: just how complex really is the behavior of rule 30? Each of the problems asks this in a different, concrete way. Like rule 30 itself, they're all deceptively simple to state. Yet to solve any of them will be a major achievement that will help illuminate fundamental principles about the computational universe that go far beyond the specifics of rule 30.

I've wondered about every one of the problems for more than 35 years. And all that time I've been waiting for the right idea, or the right kind of mathematical or computational thinking, to finally be able to crack even one of them. But now I want to open this process up to the world. And I'm keen to see just what can be achieved, and what methods it will take.
The Rule 30 Prize Problems

For the Rule 30 Prize Problems, I'm concentrating on a particularly dramatic feature of rule 30: the apparent randomness of its center column of cells. Start from a single black cell, then just look down the sequence of values of this cell, and it seems random:

ArrayPlot[
 MapIndexed[If[#2[[2]] != 21, # /. {0 -> 0.2, 1 -> .6}, #] &,
  CellularAutomaton[30, {{1}, 0}, 20], {2}], Mesh -> All]

But in what sense is it really random? And can one prove it? Each of the Prize Problems in effect uses a different criterion for randomness, then asks whether the sequence is random according to that criterion.

Problem 1: Does the center column always remain non-periodic?

Here's the beginning of the center column of rule 30:

ArrayPlot[List@CellularAutomaton[30, {{1}, 0}, {80, {{0}}}],
 Mesh -> True, ImageSize -> Full]

It's easy to see that this doesn't repeat; it doesn't become periodic. But this problem is about whether the center column ever becomes periodic, even after an arbitrarily large number of steps. Just by running rule 30, we know the sequence doesn't become periodic in the first billion steps. But what about ever? To establish that, we need a proof. (Here are the first million and first billion bits in the sequence, by the way, as entries in the Wolfram Data Repository.)

Problem 2: Does each color of cell occur on average equally often in the center column?

Here's what one gets if one tallies the number of black and of white cells in successively more steps in the center column of rule 30:

Dataset[{{1, 1, 0, ""}, {10, 7, 3, 2.3333333333333335}, {100, 52, 48,
   1.0833333333333333}, {1000, 481, 519, 0.9267822736030829}, {10000,
   5032, 4968, 1.0128824476650564}, {100000, 50098, 49902,
   1.0039276982886458}, {1000000, 500768, 499232,
   1.003076725850907}, {10000000, 5002220, 4997780,
   1.0008883944471345}, {100000000, 50009976, 49990024,
   1.000399119632349}, {1000000000, 500025038, 499974962,
   1.0001001570154626}}]

The results are certainly close to equal for black vs. white. But what this problem asks is whether the limit of the ratio after an arbitrarily large number of steps is exactly 1.

Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?

To find the nth cell in the center column, one can always just run rule 30 for n steps, computing the values of all the cells in this diamond:

With[{n = 100},
 ArrayPlot[
  MapIndexed[If[Total[Abs[#2 - n/2 - 1]] <= n/2, #, #/4] &,
   CellularAutomaton[30, CenterArray[{1}, n + 1], n], {2}]]]

But if one does this directly, one's doing n^2 individual cell updates, so the computational effort required goes up like O(n^2). This problem asks if there's a shortcut way to compute the value of the nth cell, without all this intermediate computation, or, in particular, in less than O(n) computational effort.

The Digits of Pi

Rule 30 is a creature of the computational universe: a system found by exploring possible simple programs with the new intellectual framework that the paradigm of computation provides. But the problems I've defined about rule 30 have analogs in mathematics that are centuries old.

Consider the digits of π. They're a little like the center column of rule 30. There's a definite algorithm for generating them.
Yet once generated they seem for all practical purposes random:

N[Pi, 85]

Just to make the analog a little closer, here are the first few digits of π in base 2:

BaseForm[N[Pi, 25], 2]

And here are the first few bits in the center column of rule 30:

Row[CellularAutomaton[30, {{1}, 0}, {90, {{0}}}]]

Just for fun, one can convert these to base 10:

N[FromDigits[{Flatten[CellularAutomaton[30, {{1}, 0}, {500, {0}}]], 0}, 2], 85]

Of course, the known algorithms for generating the digits of π are considerably more complicated than the simple rule for generating the center column of rule 30. But, OK, so what's known about the digits of π? Well, we know they don't repeat. That was proved in the 1760s when it was shown that π is an irrational number, because the only numbers whose digits repeat are rational numbers. (It was also shown in 1882 that π is transcendental, i.e. that it cannot be expressed in terms of roots of polynomials.)

How about the analog of problem 2? Do we know if, in the digit sequence of π, different digits occur with equal frequency? By now more than 100 trillion binary digits have been computed, and the measured frequencies of digits are very close (in the first 40 trillion binary digits the ratio of 1s to 0s is about 0.9999998064). But in the limit, are the frequencies exactly the same? People have been wondering about this for several centuries. But so far mathematics hasn't succeeded in delivering any results.

For rational numbers, digit sequences are periodic, and it's easy to work out relative frequencies of digits. But for the digit sequences of all other naturally constructed numbers, basically there's nothing known about limiting frequencies of digits. It's a reasonable guess that actually the digits of π (as well as the center column of rule 30) are "normal", in the sense that not only every individual digit, but also every block of digits of any given length, occurs in the limit with equal frequency. And as was noted in the 1930s, it's perfectly possible to explicitly construct normal numbers. Champernowne's number, formed by concatenating the digits of successive integers, is an example (and, yes, this works in any base, and one can also get normal numbers by concatenating values of functions of successive integers):

N[ChampernowneNumber[10], 85]

But the point is that for naturally constructed numbers formed by combinations of standard mathematical functions, there's simply no example known where any regularity of digits has been found. Of course, it ultimately depends on what one means by "regularity", and at some level the problem devolves into a kind of number-digit analog of the search for extraterrestrial intelligence. But there's absolutely no proof that one couldn't, for example, find even some strange combination of square roots that would have a digit sequence with some very obvious regularity.

OK, so what about the analog of problem 3 for the digits of π? Unlike rule 30, where the obvious way to compute elements in the sequence is one step at a time, traditional ways of computing digits of π involve getting better approximations to π as a complete number.
With the standard (bizarre-looking) series invented by Ramanujan in 1910 and improved by the Chudnovsky brothers in 1989, the first few terms in the series give the following approximations:

Style[Table[
  N[(12*Sum[((-1)^k*(6 k)!*(13591409 + 545140134 k))/((3 k)!*(k!)^3*
        640320^(3 k + 3/2)), {k, 0, n}])^-1, 100], {n, 10}] // Column, 9]

So how much computational effort is it to find the nth digit? The number of terms required in the series is O(n). But each term needs to be computed to n-digit precision, which requires at least O(n) individual digit operations, implying that altogether the computational effort required is more than O(n).

Until the 1990s it was assumed that there wasn't any way to compute the nth digit of π without computing all previous ones. But in 1995 Simon Plouffe discovered that actually it's possible to compute, albeit slightly probabilistically, the nth digit without computing earlier ones. And while one might have thought that this would allow the nth digit to be obtained with less than O(n) computational effort, the fact that one has to do computations at n-digit precision means that at least O(n) computational effort is still required.

Results, Analogies and Intuitions

Problem 1: Does the center column always remain non-periodic?

Of the three Rule 30 Prize Problems, this is the one on which the most progress has already been made. Because while it's not known if the center column in the rule 30 pattern ever becomes periodic, Erica Jen showed in 1986 that no two columns can both become periodic. And in fact, one can also give arguments that a single column plus scattered cells in another column can't both be periodic.

The proof about a pair of columns uses a special feature of rule 30. Consider the structure of the rule:

RulePlot[CellularAutomaton[30]]

Normally one would just say that given each triple of cells, the rule determines the color of the center cell below. But for rule 30, one can effectively also run the rule sideways: given the cell to the right and above, one can also uniquely determine the color of the cell to the left. And what this means is that if one is given two adjacent columns, it's possible to reconstruct the whole pattern to the left:

GraphicsRow[
 ArrayPlot[#, PlotRange -> 1, Mesh -> All, Background -> LightGray,
    ImageSize -> {Automatic, 80}] & /@ (PadLeft[#, {Length[#], 10}, 10] & /@
    Module[{data = {{0, 1}, {1, 1}, {0, 0}, {0, 1}, {1, 1}, {1, 0}, {0, 1}, {1, 10}}},
     Flatten[{{data},
       Table[Join[
         Table[Module[{p, q = data[[n, 1]], r = data[[n, 2]],
            s = data[[n + 1, 1]]}, p = Mod[-q - r - q r + s, 2];
           PrependTo[data[[n]], p]], {n, 1, Length[data] - i}],
         PrependTo[data[[-#]], 10] & /@ Reverse[Range[i]]], {i, 7}]}, 1]])]

But if the columns were periodic, it immediately follows that the reconstructed pattern would also have to be periodic. Yet by construction at least the initial condition is definitely not periodic, and hence the columns cannot both be periodic. The same argument works if the columns are not adjacent, and if one doesn't know every cell in both columns. But there's no known way to extend the argument to a single column, such as the center column, and thus it doesn't resolve the first Rule 30 Prize Problem.

OK, so what would be involved in resolving it?
Well, if it turns out that the center column is eventually periodic, one could just compute it, and show that. We know it s not periodic for the first billion steps, but one could at least imagine that there could be a trillion-step transient, after which it s periodic. Is that plausible? Well, transients do happen and theoretically (just like in the classic Turing machine halting problem) they can even be arbitrarily long. Here s a somewhat funky example found by a search of a rule with 4 possible colors (totalistic code 150898). Run it for 200 steps, and the center column looks quite random: #10005 ArrayPlot[ CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {200, 150 {-1, 1}}], ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, PixelConstrained -> 2, Frame -> False] After 500 steps, the whole pattern still looks quite random: #10005 ArrayPlot[ CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {500, 300 {-1, 1}}], ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, Frame -> False, ImagePadding -> 0, PlotRangePadding -> 0, PixelConstrained -> 1] But if one zooms in around the center column, theres something surprising: after 251 steps, the center column seems to evolve to a fixed value (or at least its fixed for more than a million steps): #10005 Grid[{ArrayPlot[#, Mesh -> True, ColorRules -> {0 -> Hue[0.12, 1, 1], 1 -> Hue[0, 0.73, 0.92], 2 -> Hue[0.13, 0.5, 1], 3 -> Hue[0.17, 0, 1]}, ImageSize -> 38, MeshStyle -> Lighter[GrayLevel[.5, .65], .45]] /@ Partition[ CellularAutomaton[{150898, {4, 1}, 1}, {{1}, 0}, {1400, {-4, 4}}], 100]}, Spacings -> .35] Could some transient like this happen in rule 30? Well, take a look at the rule 30 pattern, now highlighting where the diagonals on the left are periodic: #10005 steps = 500; diagonalsofrule30 = Reverse /@ Transpose[ MapIndexed[RotateLeft[#1, (steps + 1) - #2[[1]]] , CellularAutomaton[30, {{1}, 0}, steps]]]; diagonaldataofrule30 = Table[With[{split = Split[Partition[Drop[diagonalsofrule30[[k]], 1], 8]], ones = Flatten[ Position[Reverse[Drop[diagonalsofrule30[[k]], 1]], 1]]}, {Length[split[[1]]], split[[1, 1]], If[Length[split] > 1, split[[2, 1]], Length[diagonalsofrule30[[k]]] - Floor[k/2]]}], {k, 1, 2 steps + 1}]; transientdiagonalrule30 = %; transitionpointofrule30 = If[IntegerQ[#[[3]]], #[[3]], If[#[[1]] > 1, 8 #[[1]] + Count[Split[#[[2]] - #[[3]]][[1]], 0] + 1, 0] ] /@ diagonaldataofrule30; decreasingtransitionpointofrule30 = Append[Min /@ Partition[transitionpointofrule30, 2, 1], 0]; transitioneddiagonalsofrule30 = Table[Join[ Take[diagonalsofrule30[[n]], decreasingtransitionpointofrule30[[n]]] + 2, Drop[diagonalsofrule30[[n]], decreasingtransitionpointofrule30[[n]]]], {n, 1, 2 steps + 1}]; transientdiagonalrule30 = MapIndexed[RotateRight[#1, (steps + 1) - #2[[1]]] , Transpose[Reverse /@ transitioneddiagonalsofrule30]]; smallertransientdiagonalrule30 = Take[#, {225, 775}] /@ Take[transientdiagonalrule30, 275]; Framed[ArrayPlot[smallertransientdiagonalrule30, ColorRules -> {0 -> White, 1 -> Gray, 2 -> Hue[0.14, 0.55, 1], 3 -> Hue[0.07, 1, 1]}, PixelConstrained -> 1, Frame -> None, ImagePadding -> 0, ImageMargins -> 0, PlotRangePadding -> 0, PlotRangePadding -> Full ], FrameMargins -> 0, FrameStyle -> GrayLevel[.75]] There seems to be a boundary that separates order on the left from disorder on the right. 
And at least over the first 100,000 or so steps, the boundary seems to move on average about 0.252 steps to the left at each step with roughly random fluctuations: #10005 data = CloudGet[ CloudObject[ "https://www.wolframcloud.com/obj/bc470188-f629-4497-965d-\ a10fe057e2fd"]]; ListLinePlot[ MapIndexed[{First[#2], -# - .252 First[#2]} , Module[{m = -1, w}, w = If[First[#] > m, m = First[#], m] /@ data[[1]]; m = 1; Table[While[w[[m]] < i, m++]; m - i, {i, 100000}]]], Filling -> Axis, AspectRatio -> 1/4, MaxPlotPoints -> 10000, Frame -> True, PlotRangePadding -> 0, AxesOrigin -> {Automatic, 0}, PlotStyle -> Hue[0.07`, 1, 1], FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]] But how do we know that there wont at some point be a huge fluctuation, that makes the order on the left cross the center column, and perhaps even make the whole pattern periodic? From the data we have so far, it looks unlikely, but I dont know any way to know for sure. And it s certainly the case that there are systems with exceptionally long transients . Consider the distribution of primes, and compute LogIntegral[n] - PrimePi[n]: #10005 DiscretePlot[LogIntegral[n] - PrimePi[n], {n, 10000}, Filling -> Axis, Frame -> True, PlotRangePadding -> 0, AspectRatio -> 1/4, Joined -> True, PlotStyle -> Hue[0.07`, 1, 1], FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]] Yes, there are fluctuations. But from this picture it certainly looks as if this difference is always going to be positive. And thats, for example, what Ramanujan thought. But it turns out it isn t true. At first the bound for where it would fail was astronomically large (Skewes s number 10^10^10^964). And although still nobody has found an explicit value of n for which the difference is negative, its known that before n = 10317 there must be one (and eventually the difference will be negative at least nearly a millionth of the time). I strongly suspect that nothing like this happens with the center column of rule 30. But until we have a proof that it can t, who knows? One might think, by the way, that while one might be able to prove periodicity by exposing regularity in the center column of rule 30, nothing like that would be possible for non-periodicity. But actually, there are patterns whose center columns one can readily see are non-periodic, even though they re very regular. The main class of examples are nested patterns. Here s a very simple example, from rule 161 in which the center column has white cells when n = 2k: #10005 GraphicsRow[ ArrayPlot[CellularAutomaton[161, {{1}, 0}, #]] /@ {40, 200}] Heres a slightly more elaborate example (from the 2-neighbor 2-color rule 69540422), in which the center column is a ThueMorse sequence ThueMorse[n]: #10005 GraphicsRow[ ArrayPlot[ CellularAutomaton[{69540422, 2, 2}, {{1}, 0}, {#, {-#, #}}]] /@ {40, 400}] One can think of the ThueMorse sequence as being generated by successively applying the substitutions: #10005 RulePlot[SubstitutionSystem[{0 -> {0, 1}, 1 -> {1, 0}}], Appearance -> "Arrow"] And it turns out that the nth term in this sequence is given by Mod[DigitCount[n, 2, 1], 2] which is never periodic. Will it turn out that the center column of rule 30 can be generated by a substitution system? Again, I d be amazed (although there are seemingly natural examples where very complex substitution systems do appear). But once again, until one has a proof, who knows? Heres something else, that may be confusing, or may be helpful. 
The Rule 30 Prize Problems all concern rule 30 running in an infinite array of cells. But what if one considers just n cells, say with the periodic boundary conditions (i.e. taking the right neighbor of the rightmost cell to be the leftmost cell, and vice versa)? There are 2n possible total states of the system and one can draw a state transition diagram that shows which state evolves to which other. Heres the diagram for n = 5: #10005 Graph[# -> CellularAutomaton[30][#] /@ Tuples[{1, 0}, 4], VertexLabels -> ((# -> ArrayPlot[{#}, ImageSize -> 30, Mesh -> True]) /@ Tuples[{1, 0}, 4])] And here it is for n = 4 through n = 11: #10005 Row[Table[ Framed[Graph[# -> CellularAutomaton[30][#] /@ Tuples[{1, 0}, n]]], {n, 4, 11}]] The structure is that there are a bunch of states that appear only as transients, together with other states that are on cycles. Inevitably, no cycle can be longer than 2n (actually, symmetry considerations show that it always has to be somewhat less than this). OK, so on a size-n array, rule 30 always has to show behavior that becomes periodic with a period that s less than 2n. Here are the actual periods starting from a single black cell initial condition, plotted on a log scale: #10005 ListLogPlot[ Normal[Values[ ResourceData[ "Repetition Periods for Elementary Cellular Automata"][ Select[#Rule == 30 ]][All, "RepetitionPeriods"]]], Joined -> True, Filling -> Bottom, Mesh -> All, MeshStyle -> PointSize[.008], AspectRatio -> 1/3, Frame -> True, PlotRange -> {{47, 2}, {0, 10^10}}, PlotRangePadding -> .1, PlotStyle -> Hue[0.07`, 1, 1], FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]] And at least for these values of n, a decent fit is that the period is about 20.63 n. And, yes, at least in all these cases, the period of the center column is equal to the period of the whole evolution. But what do these finite-size results imply about the infinite-size case? I, at least, dont immediately see. Problem 2: Does each color of cell occur on average equally often in the center column? Heres a plot of the running excess of 1s over 0s in 10,000 steps of the center column of rule 30: #10005 ListLinePlot[ Accumulate[2 CellularAutomaton[30, {{1}, 0}, {10^4 - 1, {{0}}}] - 1], AspectRatio -> 1/4, Frame -> True, PlotRangePadding -> 0, AxesOrigin -> {Automatic, 0}, Filling -> Axis, PlotStyle -> Hue[0.07`, 1, 1], FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]] Here it is for a million steps: #10005 ListLinePlot[ Accumulate[ 2 ResourceData[ "A Million Bits of the Center Column of the Rule 30 Cellular Automaton"] - 1], Filling -> Axis, Frame -> True, PlotRangePadding -> 0, AspectRatio -> 1/4, MaxPlotPoints -> 1000, PlotStyle -> Hue[0.07`, 1, 1], FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]]] And a billion steps: #10005 data=Flatten[IntegerDigits[#,2,8] /@Normal[ResourceData["A Billion Bits of the Center Column of the Rule 30 Cellular Automaton"]]]; data=Accumulate[2 data-1]; sdata=Downsample[data,10^5]; ListLinePlot[Transpose[{Range[10000] 10^5,sdata}],Filling->Axis,Frame->True,PlotRangePadding->0,AspectRatio->1/4,MaxPlotPoints->1000,PlotStyle->Hue[0.07`,1,1],FillingStyle->Directive[Opacity[0.35`],Hue[0.12`,1,1]]] We can see that there are times when theres an excess of 1s over 0s, and vice versa, though, yes, as we approach a billion steps 1 seems to be winning over 0, at least for now. But let s compute the ratio of the total number of 1s to the total number 0f 0s. 
Here s what we get after 10,000 steps: #10005 Quiet[ListLinePlot[ MapIndexed[#/(First[#2] - #) , Accumulate[CellularAutomaton[30, {{1}, 0}, {10^4 - 1, {{0}}}]]], AspectRatio -> 1/4, Filling -> Axis, AxesOrigin -> {Automatic, 1}, Frame -> True, PlotRangePadding -> 0, PlotStyle -> Hue[0.07`, 1, 1], FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]], PlotRange -> {Automatic, {.88, 1.04}}]] Is this approaching the value 1? Its hard to tell. Go on a little longer, and this is what we see: #10005 Quiet[ListLinePlot[ MapIndexed[#/(First[#2] - #) , Accumulate[CellularAutomaton[30, {{1}, 0}, {10^5 - 1, {{0}}}]]], AspectRatio -> 1/4, Filling -> Axis, AxesOrigin -> {Automatic, 1}, Frame -> True, PlotRangePadding -> 0, PlotStyle -> Hue[0.07`, 1, 1], FillingStyle -> Directive[Opacity[0.35`], Hue[0.12`, 1, 1]], PlotRange -> {Automatic, {.985, 1.038}}]] The scale is getting smaller, but its still hard to tell what will happen. Plotting the difference from 1 on a log-log plot up to a billion steps suggests its fairly systematically getting smaller: #10005 accdata=Accumulate[Flatten[IntegerDigits[#,2,8] /@Normal[ResourceData["A Billion Bits of the Center Column of the Rule 30 Cellular Automaton"]]]]; diffratio=FunctionCompile[Function[Typed[arg,TypeSpecifier["PackedArray"]["MachineInteger",1]],MapIndexed[Abs[N[#]/(First[#2]-N[#])-1.] ,arg]]]; data=diffratio[accdata]; ListLogLogPlot[Join[Transpose[{Range[3,10^5],data[[3;;10^5]]}],Transpose[{Range[10^5+1000,10^9,1000],data[[10^5+1000;;10^9;;1000]]}]],Joined->True,AspectRatio->1/4,Frame->True,Filling->Axis,PlotRangePadding->0,PlotStyle->Hue[0.07`,1,1],FillingStyle->Directive[Opacity[0.35`],Hue[0.12`,1,1]]] But how do we know this trend will continue? Right now, we dont. And, actually, things could get quite pathological. Maybe the fluctuations in 1s vs. 0s grow, so even though were averaging over longer and longer sequences, the overall ratio will never converge to a definite value. Again, I doubt this is going to happen in the center column of rule 30. But without a proof, we dont know for sure. Were asking here about the frequencies of black and white cells. But an obvious and potentially illuminating generalization is to ask instead about the frequencies for blocks of cells of length k. We can ask if all 2k such blocks have equal limiting frequency. Or we can ask the more basic question of whether all the blocks even ever occur or, in other words, whether if one goes far enough, the center column of rule 30 will contain any given sequence of length k (say a bitwise representation of some work of literature). Again, we can get empirical evidence. For example, at least up to k = 22, all 2k sequences do occur and heres how many steps it takes: #10005 ListLogPlot[{3, 7, 13, 63, 116, 417, 1223, 1584, 2864, 5640, 23653, 42749, 78553, 143591, 377556, 720327, 1569318, 3367130, 7309616, 14383312, 32139368, 58671803}, Joined -> True, AspectRatio -> 1/4, Frame -> True, Mesh -> True, MeshStyle -> Directive[{Hue[0.07, 0.9500000000000001, 0.99], PointSize[.01]}], PlotTheme -> "Detailed", PlotStyle -> Directive[{Thickness[.004], Hue[0.1, 1, 0.99]}]] It s worth noticing that one can succeed perfectly for blocks of one length, but then fail for larger blocks. For example, the ThueMorse sequence mentioned above has exactly equal frequencies of 0 and 1, but pairs don t occur with equal frequencies, and triples of identical elements simply never occur. 
In traditional mathematics and particularly dynamical systems theory one approach to take is to consider not just evolution from a single-cell initial condition, but evolution from all possible initial conditions. And in this case it s straightforward to show that, yes, if one evolves with equal probability from all possible initial conditions, then columns of cells generated by rule 30 will indeed contain every block with equal frequency. But if one asks the same thing for different distributions of initial conditions, one gets different results, and it s not clear what the implication of this kind of analysis is for the specific case of a single-cell initial condition. If different blocks occurred with different frequencies in the center column of rule 30, then that would immediately show that the center column is not random , or in other words that it has statistical regularities that could be used to at least statistically predict it. Of course, at some level the center column is completely predictable : you just have to run rule 30 to find it. But the question is whether, given just the values in the center column on their own, there s a way to predict or compress them, say with much less computational effort than generating an arbitrary number of steps in the whole rule 30 pattern. One could imagine running various data compression or statistical analysis algorithms, and asking whether they would succeed in finding regularities in the sequence. And particularly when one starts thinking about the overall computational capabilities of rule 30, it s conceivable that one could prove something about how across a spectrum of possible analysis algorithms, there s a limit to how much they could reduce the computation associated with the evolution of rule 30. But even given this, it d likely still be a major challenge to say anything about the specific case of relative frequencies of black and white cells. It s perhaps worth mentioning one additional mathematical analog. Consider treating the values in a row of the rule 30 pattern as digits in a real number, say with the first digit of the fractional part being on the center column. Now, so far as we know, the evolution of rule 30 has no relation to any standard operations (like multiplication or taking powers) that one does on real numbers. But we can still ask about the sequence of numbers formed by looking at the right-hand side of the rule 30 pattern. Here s a plot for the first 200 steps: #10005 ListLinePlot[ FromDigits[{#, 0}, 2] /@ CellularAutomaton[30, {{1}, 0}, {200, {0, 200}}], Mesh -> All, AspectRatio -> 1/4, Frame -> True, MeshStyle -> Directive[{Hue[0.07, 0.9500000000000001, 0.99], PointSize[.0085]}], PlotTheme -> "Detailed", PlotStyle -> Directive[{ Hue[0.1, 1, 0.99]}], ImageSize -> 575] And heres a histogram of the values reached at successively more steps: #10005 Grid[{Table[ Histogram[ FromDigits[{#, 0}, 2] /@ CellularAutomaton[30, {{1}, 0}, {10^n, {0, 20}}], {.01}, Frame -> True, FrameTicks -> {{None, None}, {{{0, "0"}, .2, .4, .6, .8, {1, "1"}}, None}}, PlotLabel -> (StringTemplate["`` steps"][10^n]), ChartStyle -> Directive[Opacity[.5], Hue[0.09, 1, 1]], ImageSize -> 208, PlotRangePadding -> {{0, 0}, {0, Scaled[.06]}}], {n, 4, 6}]}, Spacings -> .2] And, yes, its consistent with the limiting histogram being flat, or in other words, with these numbers being uniformly distributed in the interval 0 to 1. 
Well, it turns out that in the early 1900s there were a bunch of mathematical results established about this kind of equidistribution. In particular, it s known that FractionalPart[h n] for successive n is always equidistributed if h isn t a rational number. It s also known that FractionalPart[hn] is equidistributed for almost all h (Pisot numbers like the golden ratio are exceptions). But specific cases like FractionalPart[(3/2)n] have eluded analysis for at least half a century. (By the way, its known that the digits of in base 16 and thus base 2 can be generated by a recurrence of the form xn = FractionalPart[16 xn-1 + r[n]] where r[n] is a fixed rational function of n.) Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort? Consider the pattern made by rule 150: #10005 Row[{ArrayPlot[CellularAutomaton[150, {{1}, 0}, 30], Mesh -> All, ImageSize -> 315], ArrayPlot[CellularAutomaton[150, {{1}, 0}, 200], ImageSize -> 300]}] Its a very regular, nested pattern. Its center column happens to be trivial (all cells are black). But if we look one column to the left or right, we find: #10005 ArrayPlot[{Table[Mod[IntegerExponent[t, 2], 2], {t, 80}]}, Mesh -> All, ImageSize -> Full] How do we work out the value of the nth cell? Well, in this particular case, it turns out theres essentially just a simple formula: the value is given by Mod[IntegerExponent[n, 2], 2]. In other words, just look at the number n in base 2, and ask whether the number of zeros it has at the end is even or odd. How much computational effort does it take to evaluate this formula ? Well, even if we have to check every bit in n, there are only about Log[2, n] of those. So we can expect that the computational effort is O(log n). But what about the rule 30 case? We know we can work out the value of the nth cell in the center column just by explicitly applying the rule 30 update rule n2 times. But the question is whether there s a way to reduce the computational work that s needed. In the past, there s tended to be an implicit assumption throughout the mathematical sciences that if one has the right model for something, then by just being clever enough one will always find a way to make predictions or in other words, to work out what a system will do, using a lot less computational effort than the actual evolution of the system requires. And, yes, there are plenty of examples of exact solutions (think 2-body problem, 2D Ising model, etc.) where we essentially just get a formula for what a system will do. But there are also other cases (think 3-body problem, 3D Ising model, etc.) where this has never successfully been done. And as I first discussed in the early 1980s, I suspect that there are actually many systems (including these) that are computationally irreducible, in the sense that there s no way to significantly reduce the amount of computational work needed to determine their behavior. So in effect Problem 3 is asking about the computational irreducibility of rule 30 or at least a specific aspect of it. (The choice of O(n) computational effort is somewhat arbitrary; another version of this problem could ask for O(n ) for any 1] Rule 30, however, shows no such obvious modularity so it doesnt seem plausible that one can establish universality in the engineering way its been established for all other known-to-be-universal systems. 
Still, my Principle of Computational Equivalence strongly suggests that rule 30 is indeed universal; we just don't yet have an obvious direction to take in trying to prove it. If one can show that a system is universal, however, then this does have implications that are closer to our rule 30 problem. In particular, if a system is universal, then there'll be questions (like the halting problem) about its infinite-time behavior that will be undecidable, and which no guaranteed-finite-time computation can answer. But as such, universality is a statement about the existence of initial conditions that reproduce a given computation. It doesn't say anything about the specifics of a particular initial condition, or about how long it will take to compute a particular result.

OK, but what about a different direction: what about getting empirical evidence about our Problem 3? Is there a way to use statistics, or cryptanalysis, or mathematics, or machine learning to even slightly reduce the computational effort needed to compute the nth value in the center column?

Well, we know that the whole 2D pattern of rule 30 is far from random. In fact, of all 2^(m^2) conceivable m-by-m patches, only a number growing exponentially in m (rather than in m^2) can possibly occur, and in practice the number weighted by probability is much smaller. And I don't doubt that facts like this can be used to reduce the effort to compute the center column to less than O(n^2) effort (and that would be a nice partial result). But can it be less than O(n) effort? That's a much more difficult question. Clearly if Problem 1 was answered in the negative then it could be. But in a sense asking for less than O(n) computation of the center column is precisely like asking whether there are predictable regularities in it. Of course, even if one could find small-scale statistical regularities in the sequence (as answering Problem 2 in the negative would imply), these wouldn't on their own give one a way to do more than perhaps slightly improve a constant multiplier in the speed of computing the sequence.

Could there be some systematically reduced way to compute the sequence using a neural net, which is essentially a collection of nested real-number functions? I've tried to find such a neural net using our current deep-learning technology, and haven't been able to get anywhere at all.

What about statistical methods? If we could find statistical non-randomness in the sequence, then that would imply an ability to compress the sequence, and thus some redundancy or predictability in the sequence. But I've tried all sorts of statistical randomness tests on the center column of rule 30, and never found any significant deviation from randomness. (And for many years, until we found a slightly more efficient rule, we used sequences from finite-size rule 30 systems as our source of random numbers in the Wolfram Language, and no legitimate "it's not random!" bugs ever showed up.)

Statistical tests of randomness typically work by saying, "Take the supposedly random sequence and process it in some way, then see if the result is obviously non-random." But what kind of processing should be done? One might see if blocks occur with equal frequency, or if correlations exist, or if some compression algorithm succeeds in doing compression. But typically batteries of tests end up seeming a bit haphazard and arbitrary. In principle one can imagine enumerating all possible tests, by enumerating all possible programs that can be applied to the sequence.
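As a concrete illustration of the simplest kinds of processing (a sketch of my own with arbitrarily chosen sample sizes, not part of the original analysis), one can tally short blocks in a stretch of the center column and compare how well Compress does on it versus on a pseudorandom 0-1 sequence of the same length:

col = Flatten[CellularAutomaton[30, {{1}, 0}, {10^5, {{0}}}]];  (* first 100,001 center-column values *)
Sort[Tally[Partition[col, 3, 1]]][[All, 2]]  (* counts of each length-3 block that occurs *)
{ByteCount[Compress[col]], ByteCount[Compress[RandomInteger[1, Length[col]]]]}  (* crude compressibility comparison *)

The absolute numbers are not meaningful in themselves; the point is simply whether the rule 30 column stands out against the random baseline.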
But I've tried this kind of systematic enumeration of tests, for example for classes of cellular automaton rules, and have never managed to detect any non-randomness in the rule 30 sequence.

So how about using ideas from mathematics to predict the rule 30 sequence? Well, as such, rule 30 doesn't seem connected to any well-developed area of math. But of course it's conceivable that some mapping could be found between rule 30 and ideas, say, in an area like number theory, and that these could either help in finding a shortcut for computing rule 30, or could show that computing it is equivalent to some problem like integer factoring that's thought to be fundamentally difficult.

I know a few examples of interesting interplays between traditional mathematical structures and cellular automata. For example, consider the digits of successive powers of 3 in base 2 and in base 6:

Row[Riffle[
  ArrayPlot[#, ImageSize -> {Automatic, 275}] & /@ {Table[IntegerDigits[3^t, 2, 159], {t, 100}],
    Table[IntegerDigits[3^t, 6, 62], {t, 100}]}, Spacer[10]]]

It turns out that in the base 6 case, the rule for generating the pattern is exactly a cellular automaton. (For base 2, there are additional long-range carries.) But although both these patterns look complex, it turns out that their mathematical structure lets us speed up making certain predictions about them.

Consider the sth digit from the right-hand edge of line n in each pattern. It's just the sth digit in 3^n, which is given by the formula (where b is the base, here 2 or 6) Mod[Quotient[3^n, b^s], b]. But how easy is it to evaluate this formula? One might think that to compute 3^n one would have to do n multiplications. But this isn't the case: instead, one can for example build up 3^n using repeated squaring, with about log(n) multiplications. That this is possible is a consequence of the associativity of multiplication. There's nothing obviously like that for rule 30, but it's always conceivable that some mapping to a mathematical structure like this could be found.

Talking of mathematical structure, it's worth mentioning that there are more formula-like ways to state the basic rule for rule 30. For example, taking the values of three adjacent cells to be p, q, r, the basic rule is just p XOR (q OR r), or Xor[p, Or[q, r]]. With numerical cell values 0 and 1, the basic rule is just Mod[p + q + r + q r, 2]. Do these forms help? I don't know. But, for example, it's remarkable that in a sense all the complexity of rule 30 comes from the presence of that one little nonlinear q r term, for without that term, one would have rule 150, about which one can develop a complete algebraic theory using quite traditional mathematics.

To work out n steps in the evolution of rule 30, one's effectively got to repeatedly compose the basic rule. And so far as one can tell, the symbolic expressions that arise just get more and more complicated, and don't show any sign of simplifying in such a way as to save computational work.

In Problem 3, we're talking about the computational effort to compute the nth value in the center column of rule 30, and asking if it can be less than O(n). But imagine that we have a definite algorithm for doing the computation. For any given n, we can see what computational resources it uses. Say the result is r[n].
Then what we're asking is whether r[n] is less than O(n), or in other words whether MaxLimit[r[n]/n, n -> Infinity] is finite.

The Formal Statement of the Problems

Here are the three problems stated more formally, in terms of the center column c.

Problem 1: Does the center column always remain non-periodic?

NotExists[{p, i}, ForAll[t, t > i, c[t + p] == c[t]]]

or: there does not exist a period p and an initial length i such that for all t with t > i, c[t + p] equals c[t].

Problem 2: Does each color of cell occur on average equally often in the center column?

DiscreteLimit[Total[c[t]]/t, t -> Infinity] == 1/2

or: the discrete limit of the total of the values in c[t], divided by t, as t -> Infinity, is 1/2.

Problem 3: Does computing the nth cell of the center column require at least O(n) computational effort?

Define machine[m] to be a machine parametrized by m (for example TuringMachine[...]), and let machine[m][n] give {v, t}, where v is the output value, and t is the amount of computational effort taken (e.g. number of steps). Then the problem can be formulated as:

NotExists[m, ForAll[n, machine[m][n][[1]] == Last[c[n]]] && MaxLimit[machine[m][n][[2]]/n, n -> Infinity] < Infinity]

or: there does not exist a machine m which for all n gives c[n], and for which the lim sup of the amount of computational effort spent, divided by n, is finite. (Yes, one should also require that m be finite, so the machine's rule can't just store the answer.)

The Formal Character of Solutions

Before we discuss the individual problems, an obvious question to ask is what the interdependence of the problems might be. If the answer to Problem 3 is negative (which I very strongly doubt), then it holds the possibility for simple algorithms or formulas from which the answers to Problems 1 and 2 might become straightforward. If the answer to Problem 3 is affirmative (as I strongly suspect), then it implies that the answer to Problem 1 must also be affirmative. The contrapositive is also true: if the answer to Problem 1 is negative, then it implies that the answer to Problem 3 must also be negative.

If the answer to Problem 1 is negative, so that there is some periodic sequence that appears in the center column, then if one explicitly knows that sequence, one can immediately answer Problem 2. One might think that answering Problem 2 in the negative would imply something about Problem 3. And, yes, unequal probabilities for black and white imply compression by a constant factor in a Shannon-information way. But to compute the value with less than O(n) resources, and therefore to answer Problem 3 in the negative, requires that one be able to identify, in a sense, infinitely more compression.

So what does it take to establish the answers to the problems? If Problem 1 is answered in the negative, then one can imagine explicitly exhibiting the pattern generated by rule 30 at some known step, and being able to see the periodic sequence in the center. Of course, Problem 1 could still be answered in the negative, but less constructively. One might be able to show that eventually the sequence has to be periodic, but not know even any bound on where this might happen. If Problem 3 is answered in the negative, a way to do this is to explicitly give an algorithm (or, say, a Turing machine) that does the computation with less than O(n) computational resources. But let's say one has such an algorithm.
One still has to prove that for all n, the algorithm will correctly reproduce the nth value. This might be easy. Perhaps there would just be a proof by induction or some such. But it might be arbitrarily hard. For example, it could be that for most n, the running time of the algorithm is clearly less than n. But it might not be obvious that the running time will always even be finite. Indeed, the halting problem for the algorithm might simply be undecidable. But just showing that a particular algorithm doesn't halt for a given n doesn't really tell one anything about the answer to the problem. For that one would have to show that there's no algorithm that exists that will successfully halt in less than O(n) time.

The mention of undecidability brings up an issue, however: just what axiom system is one supposed to use to answer the problems? For the purposes of the Prize, I'll just say "the traditional axioms of standard mathematics", which one can assume are Peano arithmetic and/or the axioms of set theory (with or without the continuum hypothesis). Could it be that the answers to the problems depend on the choice of axioms, or even that they're independent of the traditional axioms (in the sense of Gödel's incompleteness theorem)? Historical experience in mathematics makes this seem extremely unlikely, because, to date, essentially all natural problems in mathematics seem to have turned out to be decidable in the (sometimes rather implicit) axiom system that's used in doing the mathematics.

In the computational universe, though, freed from the bounds of historical math tradition, it's vastly more common to run into undecidability. And, actually, my guess is that a fair fraction of long-unsolved problems even in traditional mathematics will also turn out to be undecidable. So that definitely raises the possibility that the problems here could be independent of at least some standard axiom systems.

OK, but assume there's no undecidability around, and one's not dealing with the few cases in which one can just answer a problem by saying "look at this explicitly constructed thing". Well, then to answer the problem, we're going to have to give a proof. In essence what drives the need for proof is the presence of something infinite. We want to know something for any n, even infinitely large, etc. And the only way to handle this is then to represent things symbolically ("the symbol Infinity means infinity", etc.), and apply formal rules to everything, defined by the axioms in the underlying axiom system one's assuming.

In the best case, one might be able to just explicitly exhibit that series of rule applications, in such a way that a computer can immediately verify that they're correct. Perhaps the series of rule applications could be found by automated theorem proving (as in FindEquationalProof). More likely, it might be constructed using a proof assistant system. It would certainly be exciting to have a fully formalized proof of the answer to any of the problems. But my guess is that it'll be vastly easier to construct a standard proof of the kind human mathematicians traditionally do.

What is such a proof? Well, it's basically an argument that will convince other humans that a result is correct. There isn't really a precise definition of that. In our step-by-step solutions in Wolfram|Alpha, we're effectively proving results (say in calculus) in such a way that students can follow them. In an academic math journal, one's giving proofs that successfully get past the peer review process for the journal.
My own guess would be that if one were to try to formalize essentially any nontrivial proof in the math literature, one would find little corners that require new results, though usually ones that wouldn't be too hard to get.

How can we handle this in practice for our prizes? In essence, we have to define a computational contract for what constitutes success, and when prize money should be paid out. For a constructive proof, we can get Wolfram Language code that can explicitly be run on any sufficiently large computer to establish the result. For formalized proofs, we can get Wolfram Language code that can run through the proof, validating each step. But what about for a "human proof"? Ultimately we have no choice but to rely on some kind of human review process. We can ask multiple people to verify the proof. We could have some blockchain-inspired scheme where people stake the correctness of the proof, then if one eventually gets consensus (whatever this means) one pays out to people some of the prize money, in proportion to their stake. But whatever is done, it's going to be an imperfect, societal result, like almost all of the pure mathematics that's so far been done in the world.

What Will It Take?

OK, so for people interested in working on the Problems, what skills are relevant? I don't really know. It could be discrete and combinatorial mathematics. It could be number theory, if there's a correspondence with number-based systems found. It could be some branch of algebraic mathematics, if there's a correspondence with algebraic systems found. It could be dynamical systems theory. It could be something closer to mathematical logic or theoretical computer science, like the theory of term rewriting systems.

Of course, it could be that no existing towers of knowledge, say in branches of mathematics, will be relevant to the problems, and that to solve them will require building "from the ground up". And indeed that's effectively what ended up happening in the solution for my 2,3 Turing Machine Prize in 2007.

I'm a great believer in the power of computer experiments, and of course it's on the basis of computer experiments that I've formulated the Rule 30 Prize Problems. But there are definitely more computer experiments that could be done. So far we know a billion elements in the center column sequence. And so far the sequence doesn't seem to show any deviation from randomness (at least based on tests I've tried). But maybe at a trillion elements (which should be well within range of current computer systems), or a quadrillion elements, or more, it eventually will, and it's definitely worth doing the computations to check.

The direct way to compute n elements in the center column is to run rule 30 for n steps, using at an intermediate stage up to n cells of memory. The actual computation is quite well optimized in the Wolfram Language. Running on my desktop computer, it takes less than 0.4 seconds to compute 100,000 elements:

CellularAutomaton[30, {{1}, 0}, {100000, {{0}}}]; // Timing

Internally, this is using the fact that rule 30 can be expressed as Xor[p, Or[q, r]], and implemented using bitwise operations on whole words of data at a time. Using explicit bitwise operations on long integers takes about twice as long as the built-in CellularAutomaton function:

Module[{a = 1}, Table[BitGet[a, a = BitXor[a, BitOr[2 a, 4 a]]; i - 1], {i, 100000}]]; // Timing

But these results are from a single CPU. It's perfectly possible to imagine parallelizing across many CPUs, or using GPUs.
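As a rough way to see the scaling of the direct computation (a sketch of my own; timings are machine dependent, the smallest case is dominated by overhead, and the largest one may take tens of seconds), one can time it for increasing n and check that the cost grows roughly like n^2:

Table[{10^k, First[AbsoluteTiming[CellularAutomaton[30, {{1}, 0}, {10^k, {{0}}}];]]}, {k, 4, 6}]  (* {n, seconds} pairs *)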
One might imagine that one could speed up the computation by effectively caching the results of many steps in rule 30 evolution, but the fact that across the rows of the rule 30 pattern all blocks appear to occur with at least roughly equal frequency makes it seem as though this would not lead to significant speedup.

Solving some types of math-like problems seems pretty certain to require deep knowledge of high-level existing mathematics. For example, it seems quite unlikely that there can be an elementary proof of Fermat's last theorem, or even of the four-color theorem. But for the Rule 30 Prize Problems it's not clear to me. Each of them might need sophisticated existing mathematics, or they might not. They might be accessible only to people professionally trained in mathematics, or they might be solvable by clever programming-style or puzzle-style work, without sophisticated mathematics.

Generalizations and Relations

Sometimes the best way to solve a specific problem is first to solve a related problem, often a more general one, and then come back to the specific problem. And there are certainly many problems related to the Rule 30 Prize Problems that one can consider.

For example, instead of looking at the vertical column of cells at the center of the rule 30 pattern, one could look at a column of cells in a different direction. At 45°, it's easy to see that any sequence must be periodic. On the left the periods increase very slowly; on the right they increase rapidly. But what about other angles?

Or what about looking at rows of cells in the pattern? Do all possible blocks occur? How many steps is it before any given block appears? The empirical evidence doesn't see any deviation from blocks occurring at random, but obviously, for example, successive rows are highly correlated.

What about different initial conditions? There are many dynamical systems-style results about the behavior of rule 30 starting with equal probability from all possible infinite initial conditions. In this case, for example, it's easy to show that all possible blocks occur with equal frequency, both at a given row, and in a given vertical column. Things get more complicated if one asks for initial conditions that correspond, for example, to all possible sequences generated by a given finite state machine, and one could imagine that from a sequence of results about different sets of possible initial conditions, one would eventually be able to say something about the case of the single black cell initial condition.

Another straightforward generalization is just to look not at a single black cell initial condition, but at other special initial conditions. An infinite periodic initial condition will always give periodic behavior (that's the same as one gets in a finite-size region with periodic boundary conditions). But one can, for example, study what happens if one puts a single defect in the periodic pattern:

GraphicsRow[(ArrayPlot[
     CellularAutomaton[30,
      MapAt[1 - #1 &, Flatten[Table[#1, Round[150/Length[#1]]]], 50],
      100]] &) /@ {{1, 0}, {1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0}, {1, 0, 0, 0, 0, 0, 0}, {1, 1, 1, 0, 0}}]

One can also ask what happens when one has not just a single black cell, but some longer sequence in the initial conditions. How does the center column change with different initial sequences? Are there finite initial sequences that lead to simpler center columns?
Or are there infinite initial conditions generated by other computational systems (say substitution systems) that aren't periodic, but still give somehow simple rule 30 patterns?

Then one can imagine going beyond rule 30. What happens if one adds longer-range exceptions to the rules? When do extensions to rule 30 show behavior that can be analyzed in one way or another? And can one then see the effect of removing the exceptions in the rule?

Of course, one can consider rules quite different from rule 30 as well, and perhaps hope to develop intuition or methods relevant to rule 30 by looking at other rules. Even among the 256 two-color nearest-neighbor rules, there are others that show complex behavior starting from a simple initial condition:

Row[Riffle[
  Labeled[ArrayPlot[CellularAutomaton[#, {{1}, 0}, {150, All}], PixelConstrained -> 1, Frame -> False],
     Style[Text[StringTemplate["rule ``"][#]], 12], LabelStyle -> Opacity[.5]] & /@ {45, 73},
  Spacer[8]]]

And if one looks at larger numbers of colors and larger neighborhoods, one can find an infinite number of examples. There's all sorts of behavior that one sees. And, for example, given any particular sequence, one can search for rules that will generate it as their center column. One can also try to classify the center-column sequences that one sees, perhaps identifying a general class like rule 30 about which global statements can be made.

But let's discuss the specific Rule 30 Prize Problems. To investigate the possibility of periodicity in rule 30 (as in Problem 1), one could study lots of different rules, looking for examples with very long periods, or very long transients, and try to use these to develop an intuition for how and when these can occur. To investigate the equal-frequency phenomenon of Problem 2, one can look at different statistical features, and see, both in rule 30 and across different rules, when it's possible to detect regularity. For Problem 3, one can start looking at different levels of computational effort. Can one find the nth value with computational effort O(n^a) for any a < 2, in other words with less effort than explicitly running the whole cellular automaton?

5. Deploying Bandpass Filters Using the Wolfram Language Microcontroller Kit., 24 .[−]

Deploying Bandpass Filters Using the Wolfram Language Microcontroller Kit

Real-time filters work like magic. Usually out of sight, they clean data to make it useful for the larger system they are part of, and sometimes even for human consumption. A fascinating thing about these filters is that they don't have a big-picture perspective. They work wonders with only a small window into the data that is streaming in. On the other hand, if I had a stream of numbers flying across my screen, I would at the very least need to plot it to make sense of the data. What's more, these filters are often very simple.

Take a basic lowpass filter based on two weights, 49/50 and 1/50. The filter computes its next output as a weighted sum of the current output and input. Specifically, output[k + 1] = (49/50) output[k] + (1/50) input[k]. So with a few elementary operations, it essentially strips away the high-frequency component on a properly sampled input signal:

With[{u = Table[Sin[t/10] + Sin[10 t], {t, 0, 150, 1/10.}]},
 ListLinePlot[{u, RecurrenceFilter[{{1, -(49/50)}, {1/50}}, u]},
  PlotLegends -> {"original signal", "filtered signal"}]]

In hindsight, this output makes sense. The filter relies only a bit on the current input, and hence misses the high-frequency portion of the signal, effectively getting rid of it. Any slowly changing component has already been captured in the output.

The filter weights need not be conjured up or arrived at by trial and error. They can be obtained in a systematic fashion using the signal processing functions in the Wolfram Language. (And that's what I will be doing in the next section for a couple of nontrivial filters.) I can also compute the frequency response of the filter, which will tell me how it will shift and smother or amplify a signal. This gives a ballpark idea of what to expect from the real-time responses of the filter. I can simulate its real-time response to various signals as well. But how does the filter perform in real time? Can I see the shifting and smothering? And how does this compare to what is predicted by the frequency response?
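For instance, here is a quick sketch (my own illustration, not part of the workflow that follows) of the frequency response of that simple two-weight filter, written as a discrete-time transfer function with the same 1/10 sampling period used for the sampled signal above:

lpf = TransferFunctionModel[(1/50)/(z - 49/50), z, SamplingPeriod -> 1/10];  (* output[k+1] = (49/50) output[k] + (1/50) input[k] *)
BodePlot[lpf, PlotLayout -> "Magnitude"]

The magnitude sits near 0 dB at low frequencies and rolls off well before the high-frequency component of the test signal, which is consistent with what the time-domain plot shows.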

To investigate all this, I will pick some slightly more involved filters, namely a bandpass Butterworth and a Chebyshev 1 filter, and deploy them to an Arduino Nano. To go about this, I need several things. Luckily, they are all now available in the Wolfram Language.

The functionality to analyze and design filters has existed in the Wolfram Language for quite some time now. The next link in the workflow is the C or Arduino filter code that needs to be deployed to the microcontroller. Thanks to the Microcontroller Kit, which was released with Version 12, I can generate and deploy the code directly from the Wolfram Language. It will automatically generate the needed microcontroller code, sparing me the wearisome task of having to write the code and make sure that I have the coefficients correct. And finally, I use the Wolfram Device Framework to transmit and receive the data from the Nano.

In the first part of this blog post, I will compute, analyze and deploy the filters. In the second part, I will acquire and analyze the filtered data to visualize the responses and evaluate the performance of the filters.

Compute, Analyze and Deploy the Filter

I start off by creating a function to compute a bandpass filter that will pass signals with frequencies between 2 Hz and 5 Hz and attenuate signals at 1 Hz and 10 Hz by at least 30 dB:

tfm[filter_, s_: s] :=
 filter[{"Bandpass", {1, 2, 5, 10}, {30, 1}}, s] // N //
   TransferFunctionExpand // Chop

With this, I obtain the corresponding Butterworth and Chebyshev 1 bandpass filters:

{brw, cbs} = tfm /@ {ButterworthFilterModel, Chebyshev1FilterModel};

Grid[{{"Butterworth filter", brw}, {"Chebyshev 1 filter", cbs}},
 Background -> {None, {Lighter[#, 0.6] & /@ {cB, cC}}}, Frame -> True,
  Spacings -> {1, 1}, Alignment -> Left]

The Bode magnitude plot confirms that frequencies between 2 Hz and 5 Hz will be passed through, and frequencies outside this range will be attenuated, with the gain down to around -35 dB at 1 Hz and 10 Hz:

{bMagPlot, bPhasePlot} =
  BodePlot[{brw, cbs}, {0.5, 18}, PlotLayout -> "List",
   GridLines -> Table[{{1, 2, 5, 10}, Automatic}, {2}],
   FrameLabel -> {{"frequency (Hz)",
      "magnitude (dB)"}, {"frequency (Hz)", "phase (deg)"}},
   FrameTicks ->
    Table[{{Automatic, Automatic}, {{1, 2, 5, 10}, Automatic}}, {2}],
   ImageSize -> Medium, PlotStyle -> {cB, cC, cI},
   PlotTheme -> "Detailed",
   PlotLegends -> {"Butterworth", "Chebyshev 1"}];

bMagPlot
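As a numeric cross-check (a sketch of my own; it assumes the frequencies are expressed in the same units used in the filter specification and plots), the gains of the two transfer functions can be evaluated directly at 1 and 10:

gain[tf_, f_] := 20 Log10[Abs[First[First[tf[I f]]]]]  (* gain in dB at frequency f *)
Table[gain[tf, f], {tf, {brw, cbs}}, {f, {1, 10}}]

The values should land at or below the -30 dB required by the specification, in line with the plot.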

The phase plot shows that at around 3 Hz for the Butterworth filter and 2 Hz for the Chebyshev, there will be no phase shift. For all other frequencies, there will be varying amounts of phase shifts:

bPhasePlot
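Similarly (again my own sketch, with the same caveat about frequency units), one can solve numerically for the frequencies at which each filter's phase crosses zero, which should land near 3 and 2 respectively:

phase[tf_, w_?NumericQ] := Arg[First[First[tf[I w]]]]  (* phase in radians at frequency w *)
{FindRoot[phase[brw, w], {w, 2.5, 3.5}], FindRoot[phase[cbs, w], {w, 1.5, 2.5}]}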

Later, I will put these frequency responses side by side with the filtered response from the microcontroller to check if they add up.

For now, I will simulate the response of the filters to a signal with three frequency components. The second component lies in the bandpass range, while the other two lie outside it:

inpC = Sin[0.75 t] + Sin[2.5 t] + Sin[10 t];

The responses contain a single frequency component; the two components outside the passband have in fact been stripped away:

outC = Table[OutputResponse[sys, inpC, {t, 0, 15}], {sys, {brw, cbs}}];
Plot[{outC, inpC}, {t, 0, 15},
 PlotLegends -> {"Butterworth", "Chebyshev 1", "Input"},
 PlotStyle -> {cB, cC, cI}, PlotTheme -> "Detailed"]

I then discretize the filters and put them together into one model:

sp = 0.1;

StateSpaceModel[ToDiscreteTimeModel[#, sp]] & /@ {brw, cbs} ;
sysD = NonlinearStateSpaceModel[
    SystemsModelParallelConnect[Sequence @@ %, {{1, 1}}, {}]] /.
   Times[1.`, v_] :> v // Chop

For good measure, I simulate and verify the response of the discretized model as well:

With[{tmax = 15}, With[{inpD = Table[inpC, {t, 0, tmax, sp}]},
  ListLinePlot[Join[OutputResponse[sysD, inpD], {inpD}],
   PlotLegends -> {"Butterworth", "Chebyshev 1", "Input"},
   PlotStyle -> {cB, cC, cI}, PlotTheme -> "Detailed",
   PlotMarkers -> {Automatic, Scaled[0.025]}]]]

And I wrap up this first stage by deploying the filter and setting up the Arduino to send and receive the signals over the serial pins:

\[ScriptCapitalM] =
 MicrocontrollerEmbedCode[
  sysD, <|"Target" -> "ArduinoNano", "Inputs" -> "Serial",
   "Outputs" -> {"Serial", "Serial"}|>, "/dev/cu.usbserial-A106PX6Q"]

The port name /dev/cu.usbserial-A106PX6Q will not work for you. If you are following along, you will have to change it to the correct value. You can figure it out using Device Manager on Windows, or by searching for file names of the form /dev/cu.usb* and /dev/ttyUSB* on Mac and Linux systems, respectively.
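If you prefer to stay in the Wolfram Language, a quick way to list candidate ports (a convenience sketch; the exact names depend on your operating system and USB-serial driver) is:

FileNames["cu.usb*", "/dev"]   (* macOS *)
FileNames["ttyUSB*", "/dev"]   (* Linux *)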

At this point, I can connect any other serial device to send and receive the data. I will use the Device Framework in Mathematica to do that, as its notebook interface provides a great way to visualize the data in real time.

Acquire and Display the Data

To set up the data transfer, I begin by identifying the start, delimiter and end bytes:

{sB, dB, eB} =
 Lookup[\[ScriptCapitalM]["Serial"], {"StartByte", "DelimiterByte",
   "EndByte"}]

Then I create a scheduled task that reads the filtered output signals and sends the input signal over to the Arduino, and runs at exactly the same sampling period as the discretized filters:

i1 = i2 = 1;
 yRaw = y1 = y2 = u1 = u2 = {};
task1 = ScheduledTask[
   If[DeviceExecute[dev, "SerialReadyQ"],
    If[i1 > len1, i1 = 1]; If[i2 > len2, i2 = 1];
    AppendTo[yRaw, DeviceReadBuffer[dev, "ReadTerminator" -> eB]];
    AppendTo[u1, in1[[i1++]]];
    AppendTo[u2, in2[[i2++]]];
    DeviceWrite[dev, sB];
    DeviceWrite[dev, ToString[Last[u1] + Last[u2]]];
    DeviceWrite[dev, eB]
    ]
   ,
   sp];

I also create a second and less frequent scheduled task that parses the data and discards the old values:

task2 = ScheduledTask[
   yRaw1 = yRaw;
   yRaw = Drop[yRaw, Length@yRaw1];
   {y1P, y2P} = Transpose[parseData /@ yRaw1];
   y1 = Join[y1, y1P];
   y2 = Join[y2, y2P];
   {u1, u2, y1, y2} = drop /@ {u1, u2, y1, y2};
   ,
   3 sp
   ];

Now I am ready to actually send and receive the data, so I open a connection to the Arduino:

dev = DeviceOpen["Serial", "/dev/cu.usbserial-A106PX6Q"]

I then generate the input signals and submit the scheduled tasks to the kernel:

signals[w1, g1, w2, g2];
{taskObj1, taskObj2} = SessionSubmit /@ {task1, task2};

The input signals are of the form g1 Sin[w1 t] and g2 Sin[w2 t]. The utility function signals generates a cyclically padded list of the sampled values for each input. The code for this and the other utility functions is available in the downloadable notebook.
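If you want to follow the structure without downloading the notebook, here is one possible shape for these helpers. This is purely illustrative, a hypothetical sketch rather than the actual notebook code; it only assumes the globals in1, in2, len1, len2 and sp used by the tasks above:

(* hypothetical sketch: one reusable period of each sampled sinusoid, cycled through by task1 *)
signals[w1_, g1_, w2_, g2_] := (
  in1 = If[w1 == 0, {0.}, Table[g1 Sin[w1 t], {t, 0, 2 Pi/w1 - sp, sp}]];
  in2 = If[w2 == 0, {0.}, Table[g2 Sin[w2 t], {t, 0, 2 Pi/w2 - sp, sp}]];
  {len1, len2} = Length /@ {in1, in2};)

(* hypothetical sketch: split the raw serial bytes at the delimiter byte and convert to numbers *)
parseData[bytes_] := ToExpression /@ StringSplit[FromCharacterCode[bytes], FromCharacterCode[dB]]

(* hypothetical sketch: keep only the most recent samples so the plots stay bounded *)
drop[l_] := If[Length[l] > 300, Take[l, -300], l]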

At this point, the data is going back and forth between my Mac and the Arduino. To visualize the data and control the input signals, I create a panel. From the panel, I control the frequency and magnitude of the input signals. I plot the input and filtered signals, and also the frequency response of the filters. The frequency response plots have lines showing the expected magnitude and phase of the filtered signals, which I can verify on the signal plots:

DynamicModule[{
 iPart, ioPart,
  plotOpts = {ImageSize -> Medium, GridLines -> Automatic,
    Frame -> True, PlotTheme -> "Business"}}, Panel[
Column[{
Grid[{{
Panel[
Column[{
Style["Input signal 1", Bold, cI1],
Grid[{{"freq.",
Slider[
Dynamic[w1, {None, (w1 = #)& , signals[#, g1, w2, g2]& }], {
              0, 7, 0.25}, Appearance -> "Labeled"]}}],
Grid[{{"mag.",
Slider[
Dynamic[g1, {None, (g1 = #)& , signals[w1, #, w2, g2]& }], {
              0, 5, 0.25}, Appearance -> "Labeled"]}}]}]],
Panel[
Column[{
Style["Input signal 2", Bold, cI2],
Grid[{{"freq.",
Slider[
Dynamic[w2, {None, (w2 = #)& , signals[w1, g1, #, g2]& }], {
              0, 7, 0.25}, Appearance -> "Labeled"]}}],
Grid[{{"mag.",
Slider[
Dynamic[g2, {None, (g2 = #)& , signals[w1, g1, w2, #]& }], {
              0, 5, 0.25}, Appearance -> "Labeled"]}}]}]]}}],
Grid[{{
Dynamic[
ListLinePlot[
Part[{u1, u2, u1 + u2}, iPart],
         PlotStyle -> Part[{cI1, cI2, cI}, iPart],
         PlotRange -> {All, (g1 + g2) {1, -1}}, plotOpts],
        TrackedSymbols :> {u1, u2},
        Initialization :> (iPart = {3}; g1 = (g2 = 0); Null)],
Dynamic[
ListLinePlot[
Part[
Join[{y1, y2}, {u1 + u2}], ioPart],
         PlotStyle -> Part[{cB, cC, cI}, ioPart],
         PlotRange -> {All, (g1 + g2) {1, -1}},
         FrameTicks -> {{None, All}, {Automatic, Automatic}},
         plotOpts], TrackedSymbols :> {y1, y2, u1, u2},
        Initialization :> (ioPart = {1, 2})]}, {
Grid[{{
CheckboxBar[
Dynamic[iPart],
Thread[Range[3] -> ""], Appearance -> "Vertical"], lI}}],
Grid[{{
CheckboxBar[
Dynamic[ioPart],
Thread[Range[3] -> ""], Appearance -> "Vertical"], lIO}}]}, {
Labeled[
Dynamic[
BodePlot[{brw, cbs}, {0.5, 18}, PlotLayout -> "Magnitude",
          PlotLegends -> None, GridLines -> Automatic,
          ImageSize -> Medium, PlotStyle -> {cB, cC},
          PlotTheme -> "Detailed", Epilog -> Join[
mLine[w1, cI1],
mLine[w2, cI2]]], TrackedSymbols :> {w1, w2},
         SaveDefinitions -> True], "Bode magnitude"],
Labeled[
Dynamic[
BodePlot[{brw, cbs}, {0.5, 18}, PlotLayout -> "Phase",
          PlotLegends -> None, GridLines -> Automatic,
          ImageSize -> Medium, PlotStyle -> {cB, cC},
          PlotTheme -> "Detailed", Epilog -> Join[
pLine[w1, cI1],
pLine[w2, cI2]]], TrackedSymbols :> {w1, w2},
         SaveDefinitions -> True], "Bode phase"]}},
     Frame -> {All, {True, None, True}},
     FrameStyle -> Lighter[Gray, 0.6]]}, Center]]]

Since the Arduino needs to be up and running to see the panel update dynamically, I'll show you some examples of my results here:

Finally, before disconnecting the Arduino, I remove the tasks and close the connection to the device:

TaskRemove /@ {taskObj1, taskObj2};
DeviceClose[dev]

It's interesting to see in real time how the response of the deployed filter closely matches that of the analog version.

I think it's extremely convenient to have the design and the microcontroller code so closely coupled. Looking at the microcontroller code alone, the filter coefficients are abstruse, especially compared with those of the simple first-order lowpass filter. The design specifications from which they were obtained, however, are very concrete; having both in one unified place makes the whole setup better suited for further analysis and experimentation.

The entire notebook with the filter design, analysis, code generation and visualization is available for download. Of course, you need an Arduino Nano or a similar microcontroller. And with just a few small tweaks, in some cases just changing the target and the port name, you can replicate it on a whole host of other microcontrollers. Happy tinkering!

Get full access to the latest Wolfram Language functionality for the Microcontroller Kit with Mathematica 12.

Buy now


6. From Data to Insights: An Online Course for Learning the Multiparadigm Workflow., 17 .[−]

From Data to Insights: An Online Course for Learning the Multiparadigm Workflow

Our interactive Multiparadigm Data Science (MPDS) course has been up at Wolfram U for over a month now, and we're pretty satisfied with the results so far. Hundreds of people have started the course, including students from our first Data Science Boot Camp, who joined us at Wolfram headquarters for a three-week training camp. Thanks to the success of the boot camp, we have also had several projects submitted for advanced MPDS certification, which will soon be available within the interactive course.

But what exactly does it mean to be a practitioner of MPDS? And how might the multiparadigm approach improve my computational projects? To find out, I decided to try this free course for myself.

About the Course-Taker

My background is pretty broad: I've done some technical support, audio engineering and web development, and my BS is in computer science and physics. Though I've had some experience with error propagation, linear regression and plenty of programming projects related to data analysis, I have never thought of myself as a data scientist.

As a technical writer for Wolfram, I get to play around with Wolfram Language functions here and there. I have had the pleasure of working on some fascinating pieces that really got my gears turning (text analytics and geovisualization are currently my favorite coding areas). And recently, I've become pretty well acquainted with MPDS and how our technology enables it. The approach resonates with me, and I've been thinking about getting more into data science.

I don't normally have the right schedule for traditional classes or guided webinars. This open, interactive course seemed just right for my situation and learning goals. And it's free, so of course I jumped at the chance.

Starting with a Concrete Plan

I've seen and used a lot of data science functionality before, but mainly in isolation. Getting quick, high-level output is easy with the Wolfram Language; for me, the challenge is usually figuring out what to do next. So I appreciated having the full workflow laid out up front:

MPDS workflow

Mapping the stages out this way helps me better understand which functions are useful for which steps. And having a repeatable process makes data science seem more like something I can achieve. Sometimes starting with the right question and following a consistent process can do more to solve a problem than an entire collection of neural nets.

At the same time, the opening section emphasizes that the process is iterative: later stages can generate new information to feed back into the earlier ones. Though continually starting over might seem counterintuitive, I see this process as comparable to the Agile approach I was taught for software development. Besides, scientific discovery comes from trying a lot of different things. Why should data science be any different?

Course Highlights

This was my first time taking an online course, so I was glad to see both a transcript and an interactive scratch notebook. I think I finished much faster (and probably learned more) because those made it easy to follow along. As a relentless tinkerer, I found the scratch notebook extra helpful in the more code-heavy sections, because it allowed me to experiment until I understood exactly what each piece of code was doing:

Course layout

The variety of examples throughout the course was useful for me as well. Seeing MPDS applied to a range of subjects gave me a lot of food for thought. Beyond that, it helped solidify the idea that methods and techniques don't need to be subject-specific. For instance, while I'm familiar with using regression analysis as part of a lab experiment, I had never considered how it might apply to estimating a credit score:

Credit score

Though it was by far the longest section in the course, the fourth section, Assemble a Multiparadigm Toolkit, turned out to be the most informative part for me. As Abrita's post points out, there are a lot of different tools available, and this gave me a great overview without getting bogged down in unnecessary detail. I especially like that these later videos refer back to the Question stage, pointing out the different questions that might be answered by each technique described. That led to a few big "aha!" moments for me.

Course questions

I also got a lot out of the final section, Getting the Message Across, which has some excellent examples of easy Wolfram Language deployments. Report generation is one functional area I've always been curious to explore but have never managed to get the hang of on my own. The examples here worked well as springboards for creating my own reports:

Create reports

The quiz questions are mostly about Wolfram Language functionality, so I found them pretty straightforward. Even for those less familiar with the language, the videos answer the questions directly. And if you're having difficulty, you can always review the content and try again. Once I completed all the videos and quizzes, I earned a certificate:

Course certificate

Hungry for More

After finishing the course, I walked away with a lot of questions, but in a good way! I found myself thinking about how I could apply my new understanding to my own computational ideas. How can I represent my data better? What happens if I swap classification methods? Will a different visualization show me new patterns?

That questioning nature is what drives successful exploration and discovery, and it's a big part of what makes MPDS so effective. Following the multiparadigm workflow can open up new avenues for discovering all kinds of unique insights. But don't take my word for it: try it for yourself!

Take the interactive MPDS course now to streamline your data science workflow, or check out our Data Science & Statistics page for the latest Wolfram U events and courses.

7. The Ease of Wolfram|Alpha, the Power of Mathematica: Introducing Wolfram|Alpha Notebook Edition., 12 .[−]

Wolfram|Alpha Notebook Edition

The Next Big Step for Wolfram|Alpha

Wolfram|Alpha has been a huge hit with students. Whether in college or high school, Wolfram|Alpha has become a ubiquitous way for students to get answers. But it's a one-shot process: a student enters the question they want to ask (say in math) and Wolfram|Alpha gives them the (usually richly contextualized) answer. It's incredibly useful—especially when coupled with its step-by-step solution capabilities.

But what if one doesn’t want just a one-shot answer? What if one wants to build up (or work through) a whole computation? Well, that’s what we created Mathematica and its whole notebook interface to do. And for more than 30 years that’s how countless inventions and discoveries have been made around the world. It’s also how generations of higher-level students have been taught.

But what about students who aren’t ready to use Mathematica yet? What if we could take the power of Mathematica (and what’s now the Wolfram Language), but combine it with the ease of Wolfram|Alpha?

Well, that’s what we’ve done in Wolfram|Alpha Notebook Edition.

It’s built on a huge tower of technology, but what it does is to let any student—without learning any syntax or reading any documentation— immediately build up or work through computations. Just type input the way you would in Wolfram|Alpha. But now you’re not just getting a one-shot answer. Instead, everything is in a Wolfram Notebook, where you can save and use previous results, and build up or work through a whole computation:

Wolfram Notebook

The Power of Notebooks

Being able to use Wolfram|Alpha-style free-form input is what opens Wolfram|Alpha Notebook Edition up to the full range of students. But it's the use of the notebook environment that makes it so uniquely valuable for education. Because by being able to work through things in a sequence of steps, students get to really engage with the computations they're doing.

Try one step. See what happens. Change it if you want. Understand the output. See how it fits into the next step. And then—right there in the notebook—see how all your steps fit together to give your final results. And then save your work in the notebook, to continue—or review what you did—another time.

But notebooks aren’t just for storing computations. They can also contain text and structure. So students can use them not just to do their computations, but also to keep notes, and to explain the computations they’re doing, or the results they get:

Student notebook

And in fact, Wolfram Notebooks enable a whole new kind of student work: computational essays. A computational essay has both text and computation—combined to build up a narrative to which both human and computer contribute.

The process of creating a computational essay is a great way for students to engage with material they’re studying. Computational essays can also provide a great showcase of student achievement, as well as a means of assessing student understanding. And they’re not just something to produce for an assignment: they’re active computable documents that students can keep and use at any time in the future.

But students aren't the only ones to produce notebooks. In Wolfram|Alpha Notebook Edition, notebooks are also a great medium for teachers to provide material to students. Describe a concept in a notebook, then let students explore by doing their own computations right there in the notebook. Or make a notebook defining an assignment or a test—then let the students fill in their work (and grade it right there in the notebook).

Assignment

It's very common to use Wolfram|Alpha Notebook Edition to create visualizations of concepts. Often students will just ask for the visualizations themselves. But teachers can also set up templates for visualizations, and let students fill in their own functions or data to explore for themselves.

Visualizations

Wolfram|Alpha Notebook Edition also supports dynamic interactive visualizations—for example using the Wolfram Language Manipulate function. And in Wolfram|Alpha Notebook Edition students (and teachers!) can build all sorts of dynamic visualizations just using natural language:

Dynamic visualizations

But what if you want some more sophisticated interactive demonstration, that might be hard to specify? Well, Wolfram|Alpha Notebook Edition has direct access to the Wolfram Demonstrations Project, which contains over 12,000 Demonstrations. You can ask for Demonstrations using natural language, or you can just browse the Demonstrations Project website, select a Demonstration, copy it into your Wolfram|Alpha Notebook Edition notebook, and then immediately use it there:

Demonstrations

With Wolfram|Alpha Notebook Edition it's very easy to create compelling content. The content can involve pure calculations or visualizations. But—using the capabilities of the Wolfram Knowledgebase—it can also involve a vast range of real-world data, whether about countries, chemicals, words or artworks. And you can access it using natural language, and work with it directly in a notebook:

Using natural language

Wolfram|Alpha Notebook Edition is a great tool for students to use on their own computers. But it’s also a great tool for lectures and class demonstrations (as well as for student presentations). Go to File > New > Presenter Notebook, and you’ll get a notebook that’s set up to create a Wolfram|Alpha Notebook Edition slide show:

Presenter notebook

Click Start Presentation and you can start presenting. But what you'll have is not just a PowerPoint-style slide show. It's a fully interactive, editable, computable slide show. The Manipulate interfaces work. Everything is immediately editable. And you can do computations right there during the presentation, exploring different cases, pulling in different data, and so on.

Slide show

Making Code from Natural Language

We invented notebooks more than 30 years ago, and they've been widely used in Mathematica ever since. But while in Mathematica (and Wolfram Desktop) notebooks you (by default) specify computations in the precise syntax and semantics of the Wolfram Language, in Wolfram|Alpha Notebook Edition notebooks you instead specify them just using free-form Wolfram|Alpha-style input.

And indeed one of the key technical achievements that’s made Wolfram|Alpha Notebook Edition possible is that we’ve now developed increasingly robust natural-language-to-code technology that’s able to go from the free-form natural language input you type to precise Wolfram Language code that can be used to build up computations:

Natural language to code

By default, Wolfram|Alpha Notebook Edition is set up to show you the Wolfram Language code it generates. You don't need to look at this code (and you can set it to always be hidden). But—satisfyingly for me as a language designer—students seem to find it very easy to read, often actually easier than math. And reading it gives them an extra opportunity to understand what's going on—and to make sure the computation they've specified is actually the one they want.

And there’s a great side effect to the fact that Wolfram|Alpha Notebook Edition generates code: through routinely being exposed to code that represents natural language they’ve entered, students gradually absorb the idea of expressing things in computational language, and the concepts of computational thinking.

If a student wants to change a computation when they’re using Wolfram|Alpha Notebook Edition, they can always edit the free-form input they gave. But they can also directly edit the Wolfram Language that’s been generated, giving them real computational language experience.

Free-form input

What Should I Do Next? The Predictive Interface

A central goal of Wolfram|Alpha Notebook Edition is to be completely self-service—so that students at all levels can successfully use it without any outside instruction or assistance. Of course, free-form input is a key part of achieving this. But another part is the Wolfram|Alpha Notebook Edition Predictive Interface—that suggests what to do next based on what students have done.

Enter a computation and you'll typically see some buttons pop up under the input field:

Buttons

These buttons will suggest directions to take. Here step-by-step solution generates an enhanced interactive version of Wolfram|Alpha Pro step-by-step functionality—all right in the notebook:

Step-by-step functionality

Click related computations and you'll see suggestions for different computations you might want to do:

Related computations

It suggests plotting the integrand and the integral:

Plotting the integrand and the integral

It also suggests you might like to see a series expansion:

Series expansion

Now notice that underneath the output there's a bar of suggestions about possible follow-on computations to do on this output. Click, for example, coefficient list to find the list of coefficients:

Coefficient list

Now there are new suggestions. Click, for example, total to find the total of the coefficients:

Find the total of the coefficients

The Math Experience

Wolfram|Alpha Notebook Edition has got lots of features to enhance the math experience. For example, click the button at the top of the notebook and you'll get a math keyboard that you can use to directly enter math notation:

Math keyboard

The Wolfram Language that underlies Wolfram|Alpha Notebook Edition routinely handles the math that's needed by the world's top mathematicians. But having all that sophisticated math can sometimes lead to confusions for students. So in Wolfram|Alpha Notebook Edition there are ways to say "keep the math simple". For example, you can set it to minimize the use of complex numbers:

Simplified

Simplified

Wolfram|Alpha Notebook Edition also by default does things like adding constants of integration to indefinite integrals:

Constants of integration

By the way, Wolfram|Alpha Notebook Edition by default automatically formats mathematical output in elegant “traditional textbook” form. But it always includes a little button next to each output, so you can toggle between traditional form, and standard Wolfram Language form.

It's quite common in doing math to have a function, and just say "I want to plot that!" But what range should you use? In Mathematica (or the Wolfram Language), you'd have to specify it. But in Wolfram|Alpha Notebook Edition there's always an automatic range that's picked:

Automatic range

But since you can see the Wolfram Language code—including the range—it's easy to change that, and specify whatever range you want.

Specify range

What if you want to get an interactive control to change the range, or to change a parameter in the function? In Mathematica (or the Wolfram Language) you’d have to write a Manipulate. But in Wolfram|Alpha Notebook Edition, you can build a whole interactive interface just using natural language:

Interactive interface
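For comparison, this is roughly the kind of Wolfram Language input one would otherwise write by hand; it is a generic illustration, not the code generated in the example above:

Manipulate[Plot[Sin[a x], {x, 0, 10}], {a, 1, 5}]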

And because in Wolfram|Alpha Notebook Edition the Manipulate computations are all running directly on your local computer, nothing is being slowed down by network transmission—and so everything moves at full speed. (Also, if you have a long computation, you can just let it keep running on your computer; there’s no timeout like in Wolfram|Alpha on the web.)

Multistep Computation

One of the important features of Wolfram|Alpha Notebook Edition is that it doesn't just do one-shot computations; it allows you to do multistep computations that in effect involve a back-and-forth conversation with the computer, in which you routinely refer to previous results:

Multistep computation

Often it’s enough to just talk about the most recent result, and say things like “plot it as a function x”. But it’s also quite common to want to refer back to results earlier in the notebook. One way to do this is to say things like “the result before last”—or to use the Out[n] labels for each result. But another thing that Wolfram|Alpha Notebook Edition allows you to do is to set values of variables, that you can then use throughout your session:

Set values

It's also possible to define functions, all with natural language:

Define functions

There are lots of complicated design and implementation issues that arise in dealing with multistep computations. For example, if you have a traditional result for an indefinite integral, with a constant of integration, what do you do with the constant when you want to plot the result? (Wolfram|Alpha Notebook Edition consistently handles arbitrary additive constants in plots by effectively setting them to zero.)

Integrate x

It can also be complicated to know what refers to what in the conversation. If you say "plot", are you trying to plot your latest result, or are you asking for an interface to create a completely new plot? If you use a pronoun, as in "plot it", then it's potentially more obvious what you mean, and Wolfram|Alpha Notebook Edition has a better chance of being able to use its natural language understanding capabilities to figure it out.

The World with Wolfram|Alpha Notebook Edition

It's been very satisfying to see how extensively Wolfram|Alpha has been adopted by students. But mostly that adoption has been outside the classroom. Now, with Wolfram|Alpha Notebook Edition, we've got a tool that can immediately be put to use in the classroom, across the whole college and precollege spectrum. And I'm excited to see how it can streamline coursework, deepen understanding, enable new concepts to be taught, and effectively provide a course-based personal AI tutor for every student.

Starting today, Wolfram|Alpha Notebook Edition is available on all standard computer platforms (Mac, Windows, Linux). (A cloud version will also be available on the web soon.) Colleges and universities with full Wolfram Technology System site licenses can automatically start using Wolfram|Alpha Notebook Edition today; at schools with other site licenses, it can immediately be added. It’s available to K12 schools and junior colleges in classroom packs, or as a site license. And, of course, it’s also available to individual teachers, students, hobbyists and others.

(Oh, and if you have Mathematica or Wolfram Desktop, it’ll also be possible in future versions to create “Wolfram|Alpha mode” notebooks that effectively integrate Wolfram|Alpha Notebook Edition capabilities. And in general there’s perfect compatibility among Wolfram|Alpha Notebook Edition, Mathematica, Wolfram Desktop, Wolfram Cloud, Wolfram Programming Lab, etc.—providing a seamless experience for people progressing across education and through professional careers.)

Like Wolfram|Alpha—and the Wolfram Language—Wolfram|Alpha Notebook Edition will continue to grow in capabilities far into the future. But what’s there today is already a remarkable achievement that I think will be transformative in many educational settings.

More than 31 years ago we introduced Mathematica (and what’s now the Wolfram Language). A decade ago we introduced Wolfram|Alpha. Now, today, with the release of Wolfram|Alpha Notebook Edition we’re giving a first taste—in the context of education—of a whole new approach to computing: a full computing environment that’s driven by natural language. It doesn’t supplant Wolfram Language, or Wolfram|Alpha—but it defines a new direction that in time will bring the power of computation to a whole massive new audience.


To comment, please visit the copy of this post at Stephen Wolfram Writings


8. Authorship Forensics from Twitter Data., 05 .[−]

Authorship Forensics from Twitter Data

A Year Ago Today

On September 5 of last year, The New York Times took the unusual step of publishing an op-ed anonymously. It began “I Am Part of the Resistance inside the Trump Administration,” and quickly became known as the “Resistance” op-ed. From the start, there was wide-ranging speculation as to who might have been the author(s); to this day, that has not been settled. (Spoiler alert: it will not be settled in this blog post, either. But that’s getting ahead of things.) When I learned of this op-ed, the first thing that came to mind, of course, was, “I wonder if authorship attribution software could….” This was followed by, “Well, of course it could. If given the right training data.” When time permitted, I had a look on the internet into where one might find training data, and for that matter who were the people to consider for the pool of candidate authors. I found at least a couple of blog posts that mentioned the possibility of using tweets from administration officials. One gave a preliminary analysis (with President Trump himself receiving the highest score, though by a narrow margin; go figure). It even provided a means of downloading a dataset that the poster had gone to some work to cull from the Twitter site.

The code from that blog was in a language/script in which I am not fluent. My coauthor on two authorship attribution papers (and other work), Catalin Stoean, was able to download the data successfully. I first did some quick validation (to be seen) and got solid results. Upon setting the software loose on the op-ed in question, a clear winner emerged. So for a short time I knew who wrote that piece. Except. I decided more serious testing was required. I expanded the candidate pool, in the process learning how to gather tweets by author (code for which can be found in the downloadable notebook). At that point I started getting two more-or-less-clear signals. Okay, so two authors. Maybe. Then I began to test against op-eds of known (or at least stated) authorship. And I started looking hard at winning scores. Looking into the failures, it became clear that my data needed work (to be described). Rinse, repeat. At some point I realized: (1) I had no idea who wrote the anonymous op-ed; and (2) interesting patterns were emerging from analysis of other op-eds. So it was the end of one story (failure, by and large) but the beginning of another. In short, what follows is a case study that exposes both strengths and weaknesses of stylometry in general, and of one methodology in particular.

Stylometry Methodology

In two prior blog posts, I wrote about determining authorship, the main idea being to use data of known provenance coupled with stylometry analysis software to deduce authorship of other works. The first post took up the case of the disputed Federalist Papers. In the second, as a proof of concept I split a number of Wolfram blog posts into training and test sets, and used the first in an attempt to verify authorship in the second. In both of these, the training data was closely related to the test sets, insofar as they shared genres. For purposes of authorship forensics, e.g. discovering who wrote a given threatening email or the like, there may be little or nothing of a similar nature to use for comparison, and so one must resort to samples from entirely different areas. Which is, of course, exactly the case in point.

Let’s begin with the basics. The general idea is to use one or more measures of similarity between works of known authorship and a given text in order to determine which author is the likeliest to have written that text. The “closed” problem assumes it was in fact written by one of the authors of the known texts. In the real world, one deals with the “open” variant, where the best answer might be “none of the above.”

How does one gauge similarity? Various methods have been proposed, the earliest involving collection of various statistics, such as average sentence length (in words and/or characters), frequency of usage of certain words (common and uncommon words both can play important roles here), frequency of certain character combinations and many others. Some of the more recent methods involve breaking sentences into syntactic entities and using those for comparison. There are also semantic possibilities, e.g. using frequencies of substitution words. Overall, the methods thus break into categories of lexical, syntactic and semantic, and each of these has numerous subcategories as well as overlap with the other categories.

To date, there is no definitive method, and very likely there never will be. Even the best class of methods is not really clear. But it does seem that, at this time, certain lexical-based methods have the upper hand, at least as gauged by various benchmark tests. Among these, ones that use character sequence (n-gram) frequencies seem to perform particularly well, and that is the method I use for my analyses.
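
As a generic illustration of what character n-gram counting looks like in the Wolfram Language (just a sketch of the general idea, not the image-based method described next), one can tally overlapping substrings of a text:

(* count overlapping character 2-grams and show the five most frequent *)
text = "the quick brown fox jumps over the lazy dog";
TakeLargest[Counts[StringPartition[text, 2, 1]], 5]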

Origin of the Method in Use

It is an odd duckling, as n-gram methods go. We never really count anything. Instead we turn text samples into images, and use further processing on the images to obtain a classifier. But n-gram frequencies are implicit in the method of creating images. The process extends Joel Jeffrey’s Chaos Game Representation (CGR), used for creating images from genetic nucleotide sequences, to alphabets of more than four characters. Explained in the CGR literature is the “Frequency Chaos Game Representation” (FCGR), which creates a pixelated grayscale array for which the darkness of pixels corresponds to n-gram frequencies.

What with having other projects (oh, and that small matter of a day job), I put this post on the back burner for a while. When I returned after a few months, I had learned a few more tricks. The biggest was that I had seen how very simple neural networks might be trained to distinguish numeric vectors even at high dimension. As it happens, the nearest-image code I had written and applied to FCGR images creates vectors in the process of dimension reduction of the images. This can be done by flattening the image matrices into vectors and then using the singular value decomposition to retain the strong components of the vectorized images. (There is an optional prior dimension-reduction step, using a Fourier trig transform and retaining only low-frequency components. It seems to work well for FCGR images from genomes, but not so well for our extension of FCGR to text.) We then trained neural network classifiers on FCGR images produced from text training sets of several authorship benchmark suites. Our test results were now among the top scores for several such benchmarks.
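
Here is a rough sketch of that flatten-and-reduce step (the random matrices, the image size and the number of retained components are placeholders of my own, not the settings used for the benchmarks):

(* stand-ins for a set of grayscale FCGR image matrices *)
imageMatrices = Table[RandomReal[1, {64, 64}], {20}];

(* flatten each matrix into a vector *)
vectors = Flatten /@ imageMatrices;

(* retain the strongest components via the singular value decomposition *)
k = 10;
{u, s, v} = SingularValueDecomposition[vectors, k];
reducedVectors = u.s; (* one length-k vector per original image *)
Dimensions[reducedVectors]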

Obtaining and Cleaning Text from Twitter

As I’ve mentioned, experiments with Twitter data showed some signs of success. I did need to expand the author pool, though, since it seemed like a good idea to include any number of administration officials who were (let us not mince words) being publicly accused by one or another person of having written the op-ed in question. It also seemed reasonable to include family members (some of whom are advisors in an official capacity), undersecretaries and a few others who are in some sense high-ranking but not in official appointments. So how to access the raw training data? It turns out that the Wolfram Language has a convenient interface to Twitter. I had to first create a Twitter account (my kids still can’t believe I did this), and then evaluate this code:

twitter = ServiceConnect["Twitter", "New"];

A webpage pops up requiring that I click a permission box, and we’re off and running:

Twitter sign-in

Now I can download tweets for a given tweeter (is that the term?) quite easily. Say I want to get tweets from Secretary of State Mike Pompeo. A web search indicates that his Twitter handle is @SecPompeo. I can gather a sample of his tweets, capping at a maximum of 5,000 tweets so as not to overwhelm the API.

secPompeoTweets =
  twitter["TweetList", "Username" -> "SecPompeo", MaxItems -> 5000];

Tweets are fairly short as text goes, and the methods I use tend to do best with a minimum of a few thousand characters for each FCGR image. So we concatenate tweets into strings of a fixed size, create FCGR images, reduce dimension and use the resulting vectors to train a neural net classifier.
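
Schematically, the concatenation step amounts to something like the following (the chunk size is an arbitrary choice for illustration, and secPompeoTweets is assumed to be the list of tweet strings downloaded above):

(* join the tweets and cut the result into fixed-size training strings *)
chunkSize = 5000;
allText = StringRiffle[secPompeoTweets, " "];
trainingChunks = StringPartition[allText, chunkSize]; (* any trailing partial chunk is dropped *)
Length[trainingChunks]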

A first experiment was to divide each author’s set of strings into two parts, train on one set and see how well the second set was recognized. The first thing I noticed was that underrepresented authors fared poorly. But this was really restricted to those with training data below a threshold; above that, most were recognized. So I had to exclude a few candidates on the basis of insufficient training data. A related issue is balance, in that authors with a huge amount of training data can sort of overwhelm the classifier, biasing it against authors with more modest amounts of training text. A heuristic I found to work well is to cap the training text size at around three times the minimum threshold. The pool meeting the threshold is around 30 authors (I have on occasion added some, found they do not show any sign of Resistance op-ed authorship and so removed them). With this pool, a typical result is a recognition rate above 90%, with some parameter settings taking the rate over 95%. So this was promising.

Authorship of Op-Eds Can Be Trickier Than You Would Expect

When set loose on op-eds, however, the results were quite lame. I started looking harder at the raw training data. Lo and behold, it is not so clean as I might have thought. Silly me, I had no idea people might do things with other people’s tweets, like maybe retweet them (yes, I really am that far behind the times). So I needed some way to clean the data.

You can find the full code in the downloadable notebook, but the idea is to use string replacements to remove tweets that begin with “RT,” contain “Retweeted” or have any of several other unwanted features. While at it, I remove the URL parts (strings beginning with “http”). At this point I have raw data that validates quite well. Splitting into training vs. validation gives a recognition rate of 98.9%.
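
The gist of that cleaning step looks roughly like this (a simplified sketch; the patterns in the actual notebook are more thorough, and secPompeoTweets stands for a downloaded list of tweet strings):

(* drop retweets and strip URLs from a list of tweet strings *)
cleanTweets[tweets_List] := Map[
   StringDelete[#, "http" ~~ Except[WhitespaceCharacter] ...] &,
   Select[tweets, ! StringStartsQ[#, "RT"] && ! StringContainsQ[#, "Retweeted"] &]
   ];

cleanedTweets = cleanTweets[secPompeoTweets];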

Here’s a plot of the confusion matrix:

Plot 1

(To those for whom this term is itself a source of confusion, this matrix indicates by color strength the positioning of the larger elements. The off-diagonal elements indicate incorrect attributions, and so a strongly colored main diagonal is a sign that things are working well.)
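
If you already have the trained classifier and a labeled validation set in hand, a plot like this can be produced along these lines (a generic sketch; classifier and validationSet are stand-ins for the trained net and the held-out labeled examples, not variables defined in this post):

(* measure a classifier on labeled validation data and show its confusion matrix *)
cm = ClassifierMeasurements[classifier, validationSet];
cm["Accuracy"]
cm["ConfusionMatrixPlot"]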

Now we move on to a form of transference, wherein we apply a classifier trained on Twitter text to authorship assessment of op-eds. The first thing to check is that this might be able to give plausible results. We again do a form of validation, by testing against articles of known authorship. Except we do not really know who writes what, do we? For tweets, it seems likely that the tweeter is the actual owner of the Twitter handle (we will give a sensible caveat on this later); for articles, it is a different matter. Why? We’ll start with the obvious: high-level elected politicians and appointed officials are often extremely busy. And not all possess professional writing skills. Some have speechwriters who craft works based on draft outlines, and this could well extend to written opinion pieces. And they may have editors and technical assistants who do heavy revising. Ideas may also be shared among two or more people, and contributions on top of initial drafts might be solicited. On occasion, two people might agree that what one says would get wider acceptance if associated with the other. The list goes on, the point being that the claimed authorship can depart in varying degrees from the actual, for reasons ranging from quite legitimate, to inadvertent, to (let’s call it for what it is) shady.

Some Testing on Op-Ed Pieces

For the actual test data, I located and downloaded numerous articles available electronically on the internet, which appeared under the bylines of numerous high-ranking officials. As most of these are covered by copyright, I will need to exclude them from the notebook, but I will give appropriate reference information for several that get used explicitly here. I also had to get teaser subscriptions to two papers. These cost me one and two dollars, respectively (I wonder if Wolfram Research will reimburse me; I should try to find out). Early testing showed that tweaking parameters in the neural net training led to tradeoffs in strengths, so the protocol I used involved several classifiers trained with slightly different parameters, and aggregation of scores to determine the outcomes. This has at least two advantages over using just one trained neural net. For one, it tends to lower variance in outcomes caused by the randomness intrinsic to neural net training methods. The other is that it usually delivers outcomes that are near, or sometimes even exceed, the best of the individual classifiers. These observations came from when Catalin and I tested against some benchmark suites. It seems like a sensible route to take here as well.
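
Schematically, the aggregation amounts to averaging the per-author probabilities reported by each trained classifier and then ranking the candidates (a sketch with hypothetical names; classifiers would be the list of trained nets):

(* average class probabilities over several classifiers and rank candidate authors *)
aggregateScores[classifiers_List, text_] :=
  ReverseSort[Merge[Map[#[text, "Probabilities"] &, classifiers], Mean]];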

Labor Secretary Alexander Acosta

We begin with Secretary of Labor Alexander Acosta. Because they were easy to find and download, I took for the test data two items that are not op-eds, but rather articles found at a webpage of White House work. One is from July 23, 2018, called “Reinvesting in the American Workforce.” The other is “Trump Is Strengthening US Retirement Security. Here’s How.” from August 31, 2018. The following graph, intentionally unlabeled, shows a relatively high score above 9, a second score around 6.5 and all the rest very close to one another at a bit above 5:

Plot 2

The high score goes to Secretary Acosta. The second-highest goes to Ivanka Trump. This might indicate a coincidental similarity of styles. Another possibility is she might have contributed some ideas to one or the other of these articles. Or maybe her tweets, like some of her op-ed articles, overlap with these articles in terms of subject matter; as a general remark, genre seems to play a role in stylometry similarities.

Ivanka Trump

Having mentioned Ivanka Trump, it makes sense to have a look at her work next. First is “Empower Women to Foster Freedom,” published in The Wall Street Journal on February 6, 2019. The plot has one score way above the others; it attributes the authorship to Ms. Trump:

Plot 3

The next is “Why We Need to Start Teaching Tech in Kindergarten” from The New York Post on October 4, 2017. The plot now gives Ms. Trump the second top score, with the highest going to Secretary Acosta. This illustrates an important point: we cannot always be certain we have it correct, and sometimes it might be best to say “uncertain” or, at most, “weakly favors authorship by candidate X.” And it is another indication that they share similarities in subject matter and/or style:

Plot 4

Last on the list for Ivanka Trump is “The Trump Administration Is Taking Bold Action to Combat the Evil of Human Trafficking” from The Washington Post on November 29, 2018. This time the plot suggests a different outcome:

Plot 5

Now the high score goes to (then) Secretary of Homeland Security Kirstjen Nielsen. But again, this is not significantly higher than the low scores. It is clearly a case of “uncertain.” The next score is for Ambassador to the United Nations John Bolton; Ms. Trump’s score is fourth.

Donald Trump Jr.

We move now to Donald Trump Jr. Here, I combine two articles. One is from The Denver Post of September 21, 2018, and the other is from the Des Moines Register of August 31, 2018. And the high score, by a fair amount, does in fact go to Mr. Trump. The second score, which is noticeably above the remaining ones, goes to President Trump:

Plot 6

Secretary of Health and Human Services Alex Azar

Our test pieces are “Why Drug Prices Keep Going Up – and Why They Need to Come Down” from January 29, 2019, and a USA Today article from September 19, 2018. The graph shows a very clear outcome, and indeed it goes to Secretary Alex Azar:

Plot 7

We will later see a bit more about Secretary Azar.

Secretary of Housing and Urban Development Ben Carson

Here, we can see that a USA Today article from December 15, 2017, is very clearly attributed to Secretary Ben Carson:

Plot 8

I also used “My American Dream” from Forbes on February 20, 2016, and a Washington Examiner piece from January 18, 2019. In each case, it is Secretary Azar who has the high score, with Secretary Carson a close second on one and a distant second on the other. It should be noted, however, that the top scores were not so far above the average as to be taken as anywhere approaching definitive.

Secretary of Education Betsy DeVos

I used an article from The Washington Post of November 20, 2018. An interesting plot emerges:

Plot 9

The top score goes to Secretary Betsy DeVos, with a close second for Secretary Azar. This might indicate a secondary coauthor status, or it might be due to similarity of subject matter: I now show another article by Secretary Azar that explicitly mentions a trip to a school that he took with Secretary DeVos. It is titled “Put Mental Health Services in Schools” from August 10, 2018:

Plot 10

In this instance she gets the top classifier score, though Azar’s is, for all intents and purposes, tied. It would be nice to know if they collaborate: that would be much better than, say, turf wars and interdepartmental squabbling.

Classification Failures

I did not include former US envoy Brett McGurk in the training set, so it might be instructive to see how the classifier performs on an article of his. I used one from January 18, 2019 (wherein he announces his resignation); the topic pertains to the fight against the Islamic State. The outcome shows a clear winner (Ambassador to the UN Bolton). This might reflect commonality of subject matter of the sort that might be found in the ambassador’s tweets:

Plot 11

Ambassador Bolton was also a clear winner for authorship of an article by his predecessor, Nikki Haley (I had to exclude Ms. Haley from the candidate pool because the quantity of tweets was not quite sufficient for obtaining viable results). Moreover, a similar failure appeared when I took an article by former president George W. Bush. In that instance, the author was determined to be Secretary Azar.

An observation is in order, however. When the pool is enlarged, such failures tend to be less common. The more likely scenario, when the actual author is not among the candidates, is that there is no clear winner.

Secretary of Transportation Elaine Chao

Here I used a piece in the Orlando Sentinel from February 15, 2018, called “Let’s Invest in US Future.” Summary conclusion: Secretary Elaine Chao (presumably the person behind the tweets from handle @USDOT) is a very clear winner:

Plot 12

President Donald Trump

An article under President Donald Trump’s name was published in The Washington Post on April 30, 2017. The classifier does not obtain scores sufficiently high as to warrant attribution. That stated, one of the two top scores is indeed from the Twitter handle @realDonaldTrump:

Plot 13

A second piece under the president’s name is a whitehouse.gov statement, issued on November 20, 2018. The classifier in this instance gives a clear winner, and it is President Trump. The distant second is Ambassador Bolton, and this is quite possibly from commonality of subject matter (and articles by the ambassador); the topic is US relations with Saudi Arabia and Iran:

Plot 14

USA Today published an article on October 10, 2018, under the name of President Donald Trump. The classifier gives the lone high score to Secretary Azar. Not surprisingly, there is overlap in subject insofar as the article was about health plans:

Plot 15

Moreover, when I use the augmented training set, the apparent strong score for Secretary Azar largely evaporates. He does get the top score in the following graph, but it is now much closer to the average scores. The close second is Senator Bernie Sanders. Again, I would surmise there is commonality in the subject matter from their tweets:

Plot 16

Some Others in Brief

A couple of pieces published under the name of Ambassador Bolton were classified fairly strongly as being authored by him. Two by (then) Secretary Nielsen were strongly classified as hers, while a third was in the “no clear decision” category. Similarly, an article published by Wilbur Ross is attributed to him, with two others getting no clear winner from the classifier. An article under Mike Pompeo’s name was attributed to him by a fair margin. Another was weakly attributed to him, with the (then) State Department Spokesperson Heather Nauert receiving a second-place score well above the rest. Possibly one wrote and the other revised, or this could once again just be due to similarity of subject matter in their respective tweets. One article published under Sonny Perdue’s name is strongly attributed to him, with another weakly so. A third has Donald Trump Jr. and Secretary DeVos on top but not to any great extent (certainly not at a level indicative of authorship on their parts).

I tested two that were published under Director of the National Economic Council Lawrence Kudlow’s name. They had him a close second to the @WhiteHouseCEA Twitter handle. I will say more about this later, but in brief, that account is quite likely to post content by or about Mr. Kudlow (among others). So no great surprises here.

An article published under the name of Vice President Mike Pence was attributed to him by the classifier; two others were very much in the “no clear winner” category. One of those, an op-ed piece on abortion law, showed an interesting feature. When I augmented the training set with Twitter posts from several prominent people outside the administration, it was attributed to former Ambassador Hillary Clinton. What can I say? The software is not perfect.

Going outside the Current Administration

Two articles published under former president Bill Clinton’s name have no clear winner. Similarly, two articles under former president Barack Obama’s name are unclassified. An article under Bobby Jindal’s name correctly identifies him as the top scorer, but not by enough to consider the classification definitive. An article published by Hillary Clinton has the top score going to her, but it is too close to the pack to be considered definitive. An article published by Senate Majority Leader Mitch McConnell is attributed to him. Three articles published under Rand Paul’s name are attributed to him, although two only weakly so.

An article published under Senator Bernie Sanders’s name is very strongly attributed to him. When tested using only members of the current administration, Secretary Azar is a clear winner, although not in the off-the-charts way that Sanders is. This might, however, indicate some modest similarity in their styles. Perhaps more surprising, an article published by Joe Biden is attributed to Secretary Azar even when tweets from Biden are in the training set (again, go figure). Another is weakly attributed to Mr. Biden. A third has no clear winner.

Some Surprises

Mitt Romney

An op-ed piece under (now) Senator Mitt Romney’s name from June 24, 2018, and another from January 2, 2019, were tested. Neither had a winner in terms of attribution; all scores were hovering near the average. Either his tweets are in some stylistic ways very different from his articles, or editing by others makes substantial changes. Yet for a Washington Post piece under his name from January 8, 2017, we do get a clear classification. The subject matter was an endorsement for confirmation of Betsy DeVos to be the secretary of education:

Plot 17

The classifier did not, however, attribute this to Mr. Romney. It went, rather, to Secretary DeVos.

The Council of Economic Advisers

At the time of this writing, Kevin Hassett is the chairman of the Council of Economic Advisers. This group has several members. Among them are some with the title chief economist, and this includes Casey Mulligan (again at the time of this writing, he is on leave from the University of Chicago where he is a professor of economics, and slated to return there in the near future). We begin with Chairman Hassett. I took two articles under his name that appeared in the National Review, one from March 6, 2017, and the other from April 3, 2017:

Plot 18

The classifier gives the high score to the Twitter handle @WhiteHouseCEA and I had, at the time of testing, believed that to be used by Chairman Hassett.

I also used a few articles by Casey Mulligan. He has written numerous columns for various publications, and has, by his count, several hundred blog items (he has rights to the articles, and they get reposted as blogs not long after appearing in print). I show the result after lumping together three such; individual results were similar. They are from Seeking Alpha on January 4, 2011, a Wall Street Journal article from July 5, 2017, and one in The Hill from March 31, 2018. The classifier assigns authorship to Professor Mulligan here:

Plot 19

The fourth article under Mulligan’s name was published in The Hill on May 7, 2018. You can see that the high dot is located in a very different place:

Plot 20

This time the classifier associates the work with the @WhiteHouseCEA handle. When I first encountered the article, it was from Professor Mulligan’s blog, and I thought perhaps he had posted a guest blog. That was not the case. This piqued my curiosity, but now I had a question to pose that might uncomfortably be interpreted as “Did you really write that?”

In addition to motive, I also had opportunity. As it happens, Casey Mulligan makes considerable (and sophisticated) use of Mathematica in his work. I’ve seen this myself, in a very nice technical article. He is making productive use of our software and promoting it among his peers. Time permitting, he hopes to give a talk at our next Wolfram Technology Conference. He is really, really smart, and I don’t know for a certainty that he won’t eat a pesky blogger for lunch. This is not someone I care to annoy!

A colleague at work was kind enough to send a note of mutual introduction, along with a link to my recent authorship analysis blog post. Casey in turn agreed to speak to me. I began with some background for the current blog. He explained that many items related to his blogs get tweeted by himself, but some also get tweeted from the @WhiteHouseCEA account, and this was happening as far back as early 2018, before his leave for government policy work in DC. So okay, mystery solved: the Twitter handle for the Council of Economic Advisers is shared among many people, and so there is some reason it might score quite high when classifying work by any one of those people.

Anonymous Op-Ed Pieces

The “Resistance” Op-Ed

I claimed from the outset that using the data I found and the software at my disposal, I obtained no indication of who wrote this piece. The plot shows this quite clearly. The top scores are nowhere near far enough above the others to make any assessment, and they are too close to one another to distinguish. Additionally, one of them goes to President Trump himself:

Plot 21

When I test against the augmented training set having Senators Sanders, McConnell and others, as well as Ambassador Clinton and Presidents Clinton and Obama, the picture becomes more obviously a “don’t know” scenario:

Plot 22

Scaling notwithstanding, these scores all cluster near the average. And now Senators Graham and Sanders are at the top.

“I Hope a Long Shutdown Smokes Out the Resistance”

On January 14, 2019, The Daily Caller published an op-ed that was similarly attributed to an anonymous senior member of the Trump administration. While it did not garner the notoriety of the Resistance op-ed, there was speculation as to its authorship. My own question was whether the two might have the same author, or if they might overlap should either or both have multiple authors.

Plot 23

The plot shows similar features to that for the Resistance op-ed. And again, there is no indication of who might have written it. Another mystery.

I will describe another experiment. For test examples, I selected several op-eds of known (or at least strongly suspected) authorship and also used the Resistance and Shutdown pieces. I then repeatedly selected between 8 and 12 of the Twitter accounts and used them for training a classifier. So sometimes a test piece would have its author in the training set, and sometimes not. In almost all cases where the actual author was represented in the training set, that person would correctly be identified as the author of the test piece. When the correct author was not in the training set, things got interesting. It turns out that for most such cases, the training authors with the top probabilities overlapped only modestly with those for the Resistance op-ed; this was also the case for the Shutdown piece. But the ascribed authors for those two pieces did tend to overlap heavily (with usually the same top two, in the same order). While far from proof of common authorship, this again indicates a fair degree of similarity in style.
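
In outline, that experiment looks like the sketch below (authorTexts, trainClassifier and opEdPieces are hypothetical names for the per-author training data, the training routine described earlier and the test op-eds; none of them are defined in this post):

(* train on a random subset of 8-12 authors and record the top attribution for each test piece *)
runTrial[authorTexts_Association, opEdPieces_Association] := Module[{pool, classifier},
   pool = RandomSample[Keys[authorTexts], RandomInteger[{8, 12}]];
   classifier = trainClassifier[KeyTake[authorTexts, pool]]; (* hypothetical training routine *)
   First[Keys[ReverseSort[classifier[#, "Probabilities"]]]] & /@ opEdPieces
   ];

trials = Table[runTrial[authorTexts, opEdPieces], {25}];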

Parting Thoughts

This has been something of an educational experience for me. On the one hand, I had no sound reason to expect that Twitter data could be at all useful for deducing authorship of other materials. And I have seen (and shown) some problematic failures, wherein an article is strongly attributed to someone we know did not write it. I learned that common subject matter (genre, in the authorship attribution literature) might be playing a greater role than I would have expected. I learned that a candidate pool somewhat larger than 30 tends to tamp down on those problematic failures, at least for the subject matter under scrutiny herein. Maybe the most pleasant surprise was that the methodology can be made to work quite well. Actually, it was the second most pleasant surprise; the first was not having been eaten for lunch in that phone conversation with the (thankfully kindly disposed) economics professor.

Analyze tweets and more with Wolfram|One, the first fully cloud-desktop hybrid, integrated computation platform.

Get free trial


9. Wolfram Cloud 1.50 and 1.51: Major New Releases Bring Cloud Another Step Closer to the Desktop., 29 .[−]

Wolfram Cloud 1.50 and 1.51: Major New Releases Bring Cloud Another Step Closer to the Desktop

A couple of weeks ago, we released Version 1.51 of the Wolfram Cloud. We’ve made quite a few significant functionality improvements even since 1.50 (a major milestone from many months of hard work) as we continue to make cloud notebooks as easy and powerful to use as the notebooks on our desktop clients for Wolfram|One and Mathematica. You can read through everything that’s new in 1.51 in the detailed release notes. After working on this version through to its release, I’m excited to show off Wolfram Cloud 1.51; I’ve put together a few of the highlights and favorite new features for you here.

Cloud 1.50 Recap

But before we get into what’s new, let’s first catch up on what Version 1.51 is building on. The release of Wolfram Cloud 1.50 in May was an incredibly massive effort, especially for cloud notebooks. Here are some of our major accomplishments from 1.50:

  • License modernization (simplification of products and subscriptions)
  • New URL structure (/obj for deployed view, /env for editable view)
  • DynamicGeoGraphics is supported in the cloud
  • Improvements to server-side rendering of cloud notebooks (the HTML cache)
  • CodeAssistOptions are implemented in the cloud
  • The cell insertion menu design matches the desktop
  • Autocompletion for special characters works in text cells as well as input cells
  • ListPicker is modernized in the cloud
  • New front-end web fonts are supported in the cloud
  • Internal and Wolfram Enterprise Private Cloud (EPC) users can disable embedded-view branding via the Wolfram Language
  • … and much, much more

Click-to-Copy for Noneditable Cells

In cloud notebooks, output cells are generally not editable, and selections within them didn’t work very well because browsers had a hard time dealing with their structure. To make it easy to still grab some output, we added click-to-copy in 1.51: you only need to click an output cell to copy its whole content to the clipboard. You can then paste it into another cell, cloud notebook, desktop or any other application (rich structures might not be preserved in other applications, though).

Click-to-copy for noneditable cells

Click-to-copy is disabled on controls that are interactive by themselves, e.g. buttons, sliders and graphics (which you can select individually), in order not to interfere with that interactivity.

Of course, the longer-term goal is to actually support the same granular selection mechanism as on the desktop.

Evaluate in Place

You can evaluate expressions in place now, just like in desktop notebooks. Just select a (sub)expression in an input cell and press Ctrl+Shift+Enter (on Windows) or Cmd+Return (on Mac) to evaluate it, replacing the original expression with the result from the computation.

Evaluate in place

InputForm-Based Input

Since cloud notebooks don’t yet support the feature-rich two-dimensional input found on the desktop, we added a new control that allows you to enter a textual representation of various typesetting constructs (what we call InputForm) and turn it into the corresponding two-dimensional boxes. Press Ctrl+Shift+1 (or, put differently: Ctrl+!) to bring up the input field, and press Enter to resolve it into its typeset form. It works both in input cells (with the right evaluation semantics) and textual cells.

InputForm input
You can enter x/y for a fraction, x^y for superscripts, Sqrt[x] for square roots and various other forms. Clicking the resulting output brings you back to the edit mode.

The control is quite similar to Ctrl+= for free-form linguistic input in the Wolfram Language. We’re working on another similar control that would allow entering TeX syntax.

Inline Cell Group Openers and Closers

We now have the same cell group openers and closers as on the desktop (Version 12). Closed cell groups have a little chevron attached to them to open the group, which makes the content within them easier to discover. If you use that opener, there will also be a corresponding closer.

Inline cell openers and closers

Improved Support for Special Unicode Characters

Cloud notebooks are now better at rendering non-plane-0 Unicode characters, including emojis! If you know their hexadecimal code point, you can enter them using the special \|xxxxxx notation, or you might use some built-in OS functionality (e.g. Ctrl+Cmd+Space on Mac). They work in text as well as in computations.

\|01f329=(\|01f600+\|01f600+\|01f600)*17

However, extended Unicode support isn’t quite perfect yet: ToCharacterCode still splits up non-plane-0 characters into surrogate pairs (we’re currently working on a fix, which will come in Wolfram Cloud 1.52), and the editor cursor can be moved “inside” emojis (this will be improved in the longer term).

There are many other new features, and we fixed quite a few bugs as well. Check out the release notes for all the details! We always love hearing from our users, so let us know in the comments section if you have any questions or suggestions. You can also join us on Wolfram Community to continue the discussion about cloud news and developments. We’re continuing our hard work improving the cloud even after 1.51, so keep an eye out for 1.52, coming soon!

The Wolfram Cloud, powering multiple Wolfram products, combines a state-of-the-art notebook interface with the world’s most productive programming language.

Sign up now


10. Embracing Uncertainty: Better Model Selection with Bayesian Linear Regression., 22 .[−]

Embracing Uncertainty: Better Model Selection with Bayesian Linear Regression

Readers who follow the Mathematica Stack Exchange (which I highly recommend to any Wolfram Language user) may have seen this post recently, in which I showed a function I wrote to make Bayesian linear regression easy to do. After finishing that function, I have been playing around with it to get a better feel for what it can do, and how it compares against regular fitting algorithms such as those used by Fit. In this blog post, I don’t want to focus too much on the underlying technicalities (check out my previous blog post to learn more about Bayesian neural network regression); rather, I will show you some of the practical applications and interpretations of Bayesian regression, and share some of the surprising results you can get from it.

Getting Started

The easiest way to get my BayesianLinearRegression function is through my submission to the Wolfram Function Repository. To use this version of the function with the code examples in this blog, you can evaluate the following line, which creates a shortcut for the resource function:

BayesianLinearRegression = ResourceFunction["BayesianLinearRegression"]

You can also visit the GitHub repository and follow the installation instructions to load the BayesianInference package with the following:

<< BayesianInference`

Alternatively, you can get the standalone source file of BayesianLinearRegression and evaluate it to make the function available to you, though the function regressionPlot1D I use later will not work in the following examples if you don’t get the full BayesianInference package. Its definition is in the BayesianVisualisations.wl file.

Back to Basics

I want to do something that will be very familiar to most people with some background in data fitting: polynomial regression. I could have picked something more complicated, but it turns out that the Bayesian take on fitting data adds quite a bit of depth and new possibilities even to something as simple as fitting polynomials, making it an ideal demonstration example.

I picked the example data here specifically because it hints at a straight trend, while still leaving some doubt:

data = CompressedData["
1:eJxTTMoPSmViYGAQAWIQDQE/9kPobxC64SuUz3wAKm8Poc9A6IYPUP4zKP0A
qv4xlP4HoQ+wQPX/gqr7DNXPcABFHcNHmDlQ+g5U/BKUfoFGMzpA6Bsw9Wju
+Yqm/hOUPgOln0DV3YLSF2Dut0d11y8o/QFKv9qPat+O/QBd7zpq
"];
plot = ListPlot[data, PlotStyle -> Red]

So let’s start off by doing the first sensible thing: fit a line through it. Concretely, I will fit the model y == a + b x + ε, where ε ~ NormalDistribution[0, σ]. BayesianLinearRegression uses the same syntax as LinearModelFit, but it returns an association containing all relevant information about the fit:

fit = N@BayesianLinearRegression[data, {1, \[FormalX]}, \[FormalX]];

Keys[fit]

I will focus most on explaining the posterior distributions and the log-evidence. The log-evidence is a number that indicates how well the model fits with the data:

fit["LogEvidence"]

Is this value good or bad? We can’t tell yet: it only means something when compared to the log-evidences of other fits of the same data. I will come back to model comparison in the section on the Bayesian Occam’s razor, but let’s first take a look in more detail at the linear fit we just computed. The fit association has a key called "Posterior" that contains a number of useful probability distributions:

KeyValueMap[List, Short /@ fit["Posterior"]] // TableForm

In Bayesian inference, the word “posterior” refers to a state of knowledge after having observed the data (as opposed to “prior,” which refers to the state where you do not know the data yet). The posterior distribution of the regression parameters tells us how well the parameters a and b are constrained by the data. It’s best visualized with a ContourPlot:

With[{coefficientDist =
   fit["Posterior", "RegressionCoefficientDistribution"]},
 ContourPlot[
  Evaluate[PDF[coefficientDist, {\[FormalA], \[FormalB]}]],
  {\[FormalA], -1, 1},
  {\[FormalB], 0, 1},
  PlotRange -> {0, All}, PlotPoints -> 20, FrameLabel -> {"a", "b"},
  ImageSize -> 400,
  PlotLabel -> "Posterior PDF of regression coefficients"
  ]
 ]

The slant of the ellipses in the contour plot shows that there seems to be some positive correlation between a and b. You can calculate exactly how much correlation there is between them with Correlation:

Correlation[
  fit["Posterior", "RegressionCoefficientDistribution"]] // MatrixForm

At first glance, this may seem strange. Why are a and b correlated with each other? One way to look at this is to consider how the fit changes when you force one of the two coefficients to change. For example, you can fix a and then try to find the best value of b that fits the data with FindFit. The following illustrates this thought experiment:

With[{dat = data, p = plot},
 Animate[
  With[{fit =
     a + \[FormalB] \[FormalX] /.
      FindFit[dat, a + \[FormalB] \[FormalX], \[FormalB], \[FormalX]]},
   Show[
    Plot[
     fit,
     {\[FormalX], -2.5, 2.5},
     PlotLabel -> \[FormalY] == fit,
     PlotRange -> {-2.5, 2.5}, PlotStyle -> Directive[Black, Dashed]
     ],
    p
    ]
   ],
  {{a, 0.}, -1, 1},
  TrackedSymbols :> {a},
  AnimationDirection -> ForwardBackward
  ]
 ]

Intuitively, there is correlation between a and b because the center of mass of the points is slightly to the bottom left of the origin:

Mean[data]

This means that when a increases, b also needs to increase so that the line can still catch the center of mass of the cloud. Here I’ve moved all points even further down and left to amplify the effect:

With[{dat = data - 1},
 Animate[
  With[{fit =
     a + \[FormalB] \[FormalX] /.
      FindFit[dat, a + \[FormalB] \[FormalX], \[FormalB], \[FormalX]]},
   Show[
    Plot[
     fit,
     {\[FormalX], -3.5, 1.5},
     PlotLabel -> \[FormalY] == fit,
     PlotRange -> {-3.5, 1.5}, PlotStyle -> Directive[Black, Dashed]
     ],
    ListPlot[dat, PlotStyle -> Red]
    ]
   ],
  {{a, 0.}, -1, 1},
  TrackedSymbols :> {a},
  AnimationDirection -> ForwardBackward
  ]
 ]

Another way to think of the posterior distribution over a and b is to think of the fit as consisting of infinitely many lines, drawn at random across the page and with each line weighted according to how well it fits the data. Calculating that weight is essentially what Bayesian inference is about. In the following plot, I used RandomVariate to draw many lines from the posterior. By making the opacity of each individual line quite low, you can see the regions where the lines tend to concentrate:

lines = With[{x1 = 0, x2 = 1},
   Apply[
    InfiniteLine[{{x1, #1}, {x2, #2}}] &,
    RandomVariate[
      fit["Posterior", "RegressionCoefficientDistribution"],
      2000].{{1, 1}, {x1, x2}},
    {1}
    ]
   ];
Show[
 plot,
 Graphics[{Black, Opacity[40/Length[lines]], Thickness[0.002], lines}],
 PlotRange -> {{-2.5, 2.5}, All}
 ]

The distribution of this cloud of lines is what I like to call the underlying value distribution, which is one of the distributions returned by the fit. To plot it, it’s convenient to use InverseCDF to calculate quantiles of the distribution. In the next example, I plotted the 5-, 50- and 95-percent quantiles, meaning that you’d expect 90% of the lines to fall within the shaded areas:

With[{valueDist = fit["Posterior", "UnderlyingValueDistribution"],
  bands = Quantity[{95, 50, 5}, "Percent"]},
 Show[
  Plot[Evaluate@InverseCDF[valueDist, bands], {\[FormalX], -5, 5},
   Filling -> {1 -> {2}, 3 -> {2}}, PlotLegends -> bands,
   Prolog -> {Black, Opacity[15/Length[lines]], Thickness[0.002],
     lines}
   ],
  plot,
  PlotRange -> All,
  PlotLabel -> "Posterior distribution of underlying values"
  ]
 ]

BayesianLinearRegression also estimates the standard deviation σ of the error term ε, and just like the regression coefficients a and b, it has its own distribution. The variance σ² follows an InverseGammaDistribution, so to get the distribution of σ I use TransformedDistribution:

Quiet@Plot[
  Evaluate@PDF[
    TransformedDistribution[
     Sqrt[\[FormalV]], \[FormalV] \[Distributed]
      fit["Posterior", "ErrorDistribution"]],
    \[FormalX]],
  {\[FormalX], 0, 1.5},
  PlotRange -> {0, All},
  Filling -> Axis, Frame -> True, FrameLabel -> {\[Sigma], "PDF"},
  PlotLabel -> \[Sigma]^2 \[Distributed]
    fit["Posterior", "ErrorDistribution"]
  ]

So in short, the posterior error terms are distributed as NormalDistribution[0, σ], where σ² ~ InverseGammaDistribution[10.005, 3.09154]. We can calculate this distribution with ParameterMixtureDistribution:

epsDist =
  ParameterMixtureDistribution[
   NormalDistribution[0, Sqrt[\[FormalV]]], \[FormalV] \[Distributed]
    InverseGammaDistribution[10.005`, 3.091536049663807`]];
Plot[PDF[epsDist, \[FormalX]], {\[FormalX], -2, 2}, Filling -> Axis,
 PlotRange -> {0, All}, Frame -> True,
 PlotLabel -> \[Epsilon] \[Distributed] epsDist]

Uncertain Uncertainties

Just to make the point clear: there is uncertainty about the uncertainty of the model, which for me was an idea that was a bit difficult to get my head around initially. To make things even more complicated, all of the lines I mentioned previously also have distributions of error bars that correlate with a and b. So if you want to make a prediction from this model, you need to consider an infinite number of trend lines that each carry an infinite number of error bars. That’s a lot of uncertainty to consider!

Thinking about regression problems this way makes it clear why Bayesian inference can be a daunting task that involves lots of complicated integrals. For linear regression, though, we’re fortunate enough that it’s possible to do all of these integrals symbolically and plow our way through the infinities. This is one of the reasons why scientists and engineers like linear mathematics so much: everything remains tractable, with nice-looking formulas.

The distribution that tells you where to expect to find future data (taking all of the aforementioned uncertainties into consideration) is called the posterior predictive distribution. For the data I just fitted, it looks like this:

fit["Posterior", "PredictiveDistribution"]

Now I will plot this distribution using the function regressionPlot1D from the Git repository, which is just shorthand for the underlying values plot I showed earlier. I also included the distribution you get when you make a point estimate of the predictive model. This means that you take the best values of a, b and σ from the posterior and plot using those values as if they were completely certain. Point estimates can be useful in making your results more manageable since probability distributions can be difficult to work with, but reducing your distributions to single numbers always discards information. Here I used the Mean (expected value) to reduce the distributions to single values, but you could also use other measures of centrality like the median or mode:

With[{
  predictiveDist = fit["Posterior", "PredictiveDistribution"],
  pointEst = NormalDistribution[
    Mean@fit["Posterior",
       "RegressionCoefficientDistribution"].{1, \[FormalX]},
    Mean[TransformedDistribution[
      Sqrt[\[FormalV]], \[FormalV] \[Distributed]
       fit["Posterior", "ErrorDistribution"]]]
    ],
  bands = {0.95, 0.5, 0.05}
  },
 Show[
  regressionPlot1D[predictiveDist, {\[FormalX], -5, 5}],
  regressionPlot1D[pointEst, {\[FormalX], -5, 5}, PlotStyle -> Dashed,
    PlotLegends -> {"Point estimate"}],
  plot,
  PlotRange -> All,
  PlotLabel -> "Posterior predictive distribution"
  ]
 ]

So that covers fitting a straight line through the points. However, the shape of the data suggests that maybe a quadratic fit is more appropriate (I cooked up the example just for this purpose, of course), so this is a good moment to demonstrate how Bayesian model comparison works. I am going to fit polynomials up to degree 5 to the data, with each of these polynomials representing a model that could explain the data. From this micro-universe of possible explanations of the data, I want to select the best one.

The Bayesian Occam’s Razor

In Bayesian inference, the models have probability distributions in the same way that the regression coefficients a, b and σ have distributions. The posterior probability of a model depends on two factors:

  • A quantity commonly referred to as evidence or marginal likelihood. This measures how well a model fits the data while taking into account the uncertainties in the regression parameters such as a, b and σ. BayesianLinearRegression reports this quantity as the "LogEvidence": the higher the log-evidence, the better the model. The evidence automatically accounts for model complexity due to an effect that is sometimes referred to as the Bayesian Occam’s razor (see, for example, chapter 28 in this book by David MacKay).
  • The prior probability of the model.

Usually you’ll only want to consider models that, before seeing the data, you consider approximately equally likely. However, sometimes you do have strong prior information available that the data was generated by a certain model, and you’d only accept other explanations if the evidence overwhelmingly disproves that model in favor of another.

For example, if the data shown previously came from a noisy (and possibly biased) measurement of current through a standard resistor as a function of voltage, I would have little doubt that Ohm’s law applies here and that fitting a straight line is the correct way to interpret the data. It’s simply more likely that a little noise accidentally made the data look quadratic rather than that Ohm’s law suddenly stopped working in my experiment. In this case, I would assign a prior probability very close to 1 to Ohm’s law and divide the remaining sliver of probability among competing models I’m willing to consider. This is in accordance with a principle called Cromwell’s rule, which was coined by Dennis Lindley:

Leave a little probability for the Moon being made of green cheese; it can be as small as 1 in a million, but have it there since otherwise an army of astronauts returning with samples of the said cheese will leave you unmoved.

So let’s see how the fits of different polynomials look and what their evidence is:

models = AssociationMap[
   BayesianLinearRegression[data, \[FormalX]^
     Range[0, #], \[FormalX]] &,
   Range[0, 5]
   ];
Multicolumn[
 KeyValueMap[
  Show[
    regressionPlot1D[#2["Posterior",
      "PredictiveDistribution"], {\[FormalX], -5, 5}],
    plot,
    PlotRange -> All,
    PlotLabel ->
     StringForm[
      "Degree: `1`\nLog evidence: `2`", #1, #2["LogEvidence"] ]
    ] &,
  models
  ],
 2
 ]

As you can see, the first- and second-degree fits are in close competition for being the best model. If we assume that the prior probabilities for all models are the same, we can calculate the relative probabilities of the fits by exponentiating the log-evidences and normalizing them. For this operation, we can borrow SoftmaxLayer from the neural network framework:

calculateRelativeProbabilities[models_] := Module[{
    probabilities ,
    temp = SortBy[-#LogEvidence &]@models
    },
   probabilities =
    SoftmaxLayer[]@
     Replace[temp[[All, "LogEvidence"]],
      assoc_?AssociationQ :> Values[assoc]];
   temp[[All, "Probability"]] = probabilities;
   temp[[All, "AccumulatedProbability"]] = Accumulate[probabilities];
   temp
   ];

models = calculateRelativeProbabilities[models];
Dataset[models][All, {"LogEvidence", "Probability",
  "AccumulatedProbability"}]

In this table, I sorted the models by probabilities and then accumulated them so you can easily see how much total probability you’ve covered going from top to bottom. Keep in mind that these probabilities only mean anything in the micro-universe we’re currently looking at: the moment I would start adding more models, all of these probabilities could change. In Bayesian inference, you can only compare models you’re willing to formulate in the first place: there is no universal measure that tells you if the fit of the data is actually good or not in some completely objective sense. All you can really hope for is to find the best model(s) from the group you’re considering because the set of models you’re not considering is impossible to compare against.

Interlude: Comparing against LinearModelFit

At this point, it’s also good to compare the fits by BayesianLinearRegression against those by LinearModelFit, which also presents several goodness-of-fit measures:

standardFits = AssociationMap[
   LinearModelFit[
     data, \[FormalX]^Range[0, #], \[FormalX],
     ConfidenceLevel -> 0.90 (*
     To agree with our earlier choice of 95% and 5% prediction bands *)
     ] &,
   Range[0, 5]
   ];

With[{
  keys = {"BIC", "AIC", "AICc", "AdjustedRSquared", "RSquared"}
  },
 AssociationThread[keys, #[keys]] & /@ standardFits // Dataset
 ]

As expected, the R-squared measure goes up with the degree of the model since it does not penalize for model complexity. All the other measures seem to favor the second-order model: keep in mind that better models have lower values for the Bayesian information criterion (or BIC, which is an approximation to the negative log-evidence), the Akaike information criterion (AIC) and the AICc (AIC corrected for small sample size). For adjusted R-squared, the highest value indicates the best model. Let’s also compare the prediction bands from LinearModelFit with the Bayesian ones for the second-order model:

With[{n = 2, xmin = -5, xmax = 5},
 Show[
  regressionPlot1D[
   models[n, "Posterior", "PredictiveDistribution"], {\[FormalX],
    xmin, xmax}, {95, 5}],
  Plot[
   Evaluate[Reverse@standardFits[n]["SinglePredictionBands"]],
   {\[FormalX], xmin, xmax},
   Filling -> {1 -> {2}}, PlotStyle -> Dashed,
   PlotLegends -> {"LinearModelFit"},
   PlotLabel -> "Comparison of prediction bands"
   ],
  plot
  ]
 ]

As you can see, the confidence bands from LinearModelFit are slightly wider (and therefore more pessimistic) than the posterior prediction bands. The main reason for this difference is that the Bayesian prediction bands take into account all correlations in the posterior distribution between the model parameters and propagate these correlations into the predictions to narrow them down a little bit further. Another way to put it is that the Bayesian analysis does not discard information prematurely when calculating the prediction bands because it retains all intermediate distributions fully. This effect becomes more pronounced when less data is available, as the following fits illustrate (if you’re skeptical about the role of the Bayesian prior in this example, I recommend you try this for yourself with decreasingly informative priors):

BlockRandom@
 Module[{dat, fits, priorScale = 1/100000, ndat = 5,
   bands = Quantity[{95, 5}, "Percent"]},
  SeedRandom[2];
  dat = RandomVariate[BinormalDistribution[0.7], ndat];
  fits = {
    InverseCDF[
     BayesianLinearRegression[dat, \[FormalX], \[FormalX],
       "PriorParameters" -> <|"B" -> {0, 0},
         "Lambda" -> priorScale {{1, 0}, {0, 1}}, "V" -> priorScale,
         "Nu" -> priorScale|>
       ]["Posterior", "PredictiveDistribution"],
     bands
     ],
    Reverse@
     LinearModelFit[dat, \[FormalX], \[FormalX],
       ConfidenceLevel -> 0.90]["SinglePredictionBands"]
    };
  Show[
   Plot[Evaluate@fits[[1]], {\[FormalX], -2, 2},
    Filling -> {1 -> {2}}, PlotLegends -> bands],
   Plot[Evaluate@fits[[2]], {\[FormalX], -2, 2},
    Filling -> {1 -> {2}}, PlotStyle -> Dashed,
    PlotLegends -> {"LinearModelFit"}],
   ListPlot[dat, PlotStyle -> Red]
   ]
  ]

Mixing It Up

So now we're in a position where we have several polynomial models, with two of them competing for the first position and no very clear winner. Which one do we choose? The Bayesian answer to this question is simple: why not both? Why not all? We're still working from a probabilistic perspective: the truth is simply somewhere in the middle, and there's no need to make a definite choice. Picking a single model is also a form of point estimate, and we've seen before that point estimates discard potentially useful information. Instead, we can just split the difference in the same way we did earlier, by averaging over all possible values of the regression coefficients and the noise variance while fitting a straight line to the data. The function MixtureDistribution is of great use here to combine the different posterior predictions into a new distribution. It is not even necessary to consider only the first- and second-order models: we can just combine all of them by weight:

mixDists = MixtureDistribution[
     Values@models[[All, "Probability"]],
     Values@models[[All, "Posterior", #]]
     ] & /@ {"PredictiveDistribution",
    "UnderlyingValueDistribution"};
mixDists
Show[
 regressionPlot1D[mixDists[[1]], {\[FormalX], -4, 4}],
 regressionPlot1D[mixDists[[2]], {\[FormalX], -4, 4},
  PlotLegends -> {"Underlying value"}, PlotStyle -> Dashed],
 plot,
 PlotRange -> All,
 PlotLabel -> "Predictions by mixture model up to degree 5"
 ]

(Note the custom compact formatting of MixtureDistribution from the BayesianInference package.) The underlying value bands highlight the hybridization of models particularly well.

And why even stop there? There are more models we could explore in the polynomial universe, so let's expand the horizons a bit. For example, why not try a fit that omits the constant offset altogether? Taking all nonempty subsets of the basis functions, there are 63 different polynomials of up to order 5 to try:

polynomialBases = DeleteCases[{}]@Subsets[\[FormalX]^Range[0, 5]];
Short[polynomialBases]
Length[polynomialBases]

Here is a function that makes fitting these models a little easier and faster:

fitModels[data_, modelBases_, vars_] := Module[{
    models
    },
   models = Reverse@SortBy[#LogEvidence &]@ParallelMap[
       BayesianLinearRegression[data, #, vars,
         IncludeConstantBasis -> False] &,
       modelBases,
       Method -> "CoarsestGrained"
       ];
   calculateRelativeProbabilities[models]
   ];

Fit all possible polynomials up to degree 5, and view the 10 best fits:

models2 = fitModels[data, polynomialBases, \[FormalX]];
Dataset[models2][
 ;; 10,
 {"LogEvidence", "Probability", "AccumulatedProbability", "Basis"}
 ]

I can now calculate the prediction bands again, but it's not really necessary to include all models in the mixture, since only a few carry any significant weight. To make the distribution more manageable, I will throw away the least likely models, which together account for less than 1 percent of the total probability mass:

selection = Take[models2,
   UpTo[LengthWhile[models2, #AccumulatedProbability <= 0.99 &] + 1]
   ];
With[{
  mixDist = MixtureDistribution[
      selection[[All, "Probability"]],
      selection[[All, "Posterior", #]]
      ] & /@ {"PredictiveDistribution",
     "UnderlyingValueDistribution"}
  },
 Show[
  regressionPlot1D[mixDist[[1]], {\[FormalX], -4, 4}],
  regressionPlot1D[mixDist[[2]], {\[FormalX], -4, 4},
   PlotLegends -> {"Underlying value"}, PlotStyle -> Dashed],
  plot,
  PlotRange -> All,
  PlotLabel -> StringForm[
    "Mixture model over `1` most likely polynomials (degree \
\[LessEqual] 5)",
    Length[selection]
    ]
  ]
 ]

It is interesting to see that models with no constant offset are strongly preferred, leading to a very narrow estimate of the underlying value near the origin. The extrapolation bands have also become wider than before, but that's just a warning about the dangers of extrapolation: if we really have the prior belief that all of these polynomials are equal contenders for explaining the data, then the wide extrapolation bands are a natural consequence of that assumption, since there are so many different possible explanations of the data. It's better than the alternative: thinking you can extrapolate very accurately and then being proven wrong later, after having made important decisions based on that false accuracy.

Additionally, it's interesting to consider what we now know about the regression coefficients of the basis functions of our fit. In the following code, I compute the MarginalDistribution of each component of the regression coefficient distributions and visualize their credible intervals:

marginals =
  With[{zeroDist =
     ProbabilityDistribution[
      DiracDelta[\[FormalX]], {\[FormalX], -\[Infinity], \
\[Infinity]}]},
   Association@Table[
     basefun -> MixtureDistribution[
       selection[[All, "Probability"]],
       Map[
        If[MemberQ[#Basis, basefun],
          MarginalDistribution[
           #["Posterior", "RegressionCoefficientDistribution"],
           First@FirstPosition[#Basis, basefun]
           ],
          zeroDist
          ] &,
        selection
        ]
       ],
     {basefun, \[FormalX]^Range[0, 5]}
     ]
   ];

intervalPlot[assoc_, bands_] := ListPlot[
   MapAt[Thread[Callout[#, Keys[assoc]]] &, Length[bands]]@Transpose[
     KeyValueMap[
      Thread@Tooltip[Quiet@InverseCDF[#2, bands], #1] &,
      assoc
      ]
     ],
   Filling -> {1 -> {2}, 3 -> {2}}, PlotLegends -> bands,
   Ticks -> {None, Automatic}, PlotRange -> All
   ];

Show[intervalPlot[marginals, Quantity[{99, 50, 1}, "Percent"]],
 PlotLabel -> "Credible intervals of coefficients"]

Compare this against the credible intervals you get when you fit a single fifth-degree polynomial rather than 63 different ones; the single fit is significantly less informative:

Module[{fit5 =
   BayesianLinearRegression[data, \[FormalX]^Range[0, 5], \[FormalX]],
   marginals},
 marginals = AssociationMap[
   MarginalDistribution[
     fit5["Posterior", "RegressionCoefficientDistribution"], # +
      1] &,
   Range[0, 5]
   ];
 Show[intervalPlot[marginals, Quantity[{99, 50, 1}, "Percent"]],
  PlotLabel ->
   "Credible intervals of coefficients of single polynomial"]
 ]

The first plot shows us that we can be quite sure several of the terms in the polynomial are effectively 0, and it also tells us the signs of the remaining terms, but we can only come to these conclusions by trying out all possible polynomials and comparing them against each other.
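As a complementary check (not part of the original analysis), you can also read off how much posterior weight is assigned to models that leave a given basis function out entirely, i.e. the probability within this model universe that its coefficient is exactly zero. This short sketch reuses the selection list from above:

(* For each basis function, total the probabilities of the selected models
   that do not include it; values near 1 mean the term is almost certainly absent. *)
AssociationMap[
  Function[basefun,
   Total@Pick[
     selection[[All, "Probability"]],
     Map[! MemberQ[#Basis, basefun] &, selection]
     ]
   ],
  \[FormalX]^Range[0, 5]
  ]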

This procedure is also quite different from another popular method for making fitting terms disappear: LASSO regularization, available through the FitRegularization option of Fit:

Fit[data, \[FormalX]^Range[0, 5], \[FormalX],
 FitRegularization -> {"LASSO", 10}]
Show[Plot[%, {\[FormalX], -2.5, 2.5}], plot]

Interestingly, the LASSO penalty used here discards the quadratic term rather than the cubic one, but the result depends on the regularization strength (which I somewhat arbitrarily set to 10). The main difference between the two procedures is that at no point during the Bayesian analysis did I have to apply an artificial penalty to models with more terms: the preference for simpler models is a result that simply drops out after considering all the possibilities and working out their probabilities.
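If you want to see how sensitive the LASSO outcome is to that choice of penalty, a quick (and admittedly ad hoc) way is to sweep a few strengths and compare the resulting fits. This sketch reuses the same data and basis as the Fit call above; the particular values {1, 10, 100} are arbitrary choices of mine:

(* Sweep a few LASSO penalty strengths to see which terms get discarded at each level. *)
Column@Table[
  lambda -> Fit[data, \[FormalX]^Range[0, 5], \[FormalX],
    FitRegularization -> {"LASSO", lambda}],
  {lambda, {1, 10, 100}}
  ]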

Find Your Own Fit

I hope that this exploration of Bayesian regression was as useful for you to read as it was for me to write. In case you're interested in the underlying mathematics used by BayesianLinearRegression, you can read more about Bayesian linear regression, Bayesian multivariate regression and conjugate priors.

Check out BayesianLinearRegression in the Wolfram Function Repository or download the BayesianInference package, and see how you can use Bayesian regression to better model your own data. I'm excited to see the different ways you're using BayesianLinearRegression, so share your results and tips on Wolfram Community!
