Join us Wednesday, October 17, 2018, from 9:30–11:30pm CT for an exciting adventure in livecoding! During our annual Wolfram Technology Conference, we put our internal experts and guests to the test. Coding questions ranging from physics to pop culture and from image processing to visualization, along with other challenging topics, will be posed to participants live.

Who will take home the trophy belt this year? A senior developer from our Machine Learning group? A high-school kid with serious coding chops? You? Now in its third year, the Wolfram Livecoding Championship promises to be bigger and better than ever. The event is concurrently livestreamed on Twitch and YouTube Live, so if you’re not able to be here in person, we’d love to see you on the stream. The livestream will also be available on Stephen Wolfram’s Twitch channel, with a special livestreamed introduction from Stephen himself. See last year’s competition and get a taste of what the event has to offer:

New this year will be running commentary on competitors’ progress as they each take their own unique approach to problem solving, highlighting the depth and breadth of possibilities in the Wolfram Language.

Stay tuned for more competitions, and we hope to see you there!

Between October 1787 and April 1788, a series of essays was published under the pseudonym "Publius." Altogether, 77 appeared in four New York City periodicals, and a collection containing these and eight more appeared in book form as The Federalist soon after. Since the twentieth century, these have been known collectively as The Federalist Papers. The aim of these essays, in brief, was to explain the proposed Constitution and to sway the citizens of the day in favor of its ratification. The authors were Alexander Hamilton, James Madison and John Jay.

On July 11, 1804, Alexander Hamilton was mortally wounded by Aaron Burr in a duel beneath the New Jersey Palisades in Weehawken (a town better known in modern times for its tunnels to Manhattan). Hamilton died the next day. Soon after, a list he had drafted became public, claiming authorship of more than sixty of the essays. James Madison publicized his claims to authorship only after his term as president had come to an end, many years after Hamilton's death. The two lists overlapped: essays 49–58 and 62–63 were claimed by both men. Both men described three essays as collaborative works, and essays 2–5 and 64 were written by Jay (intervening illness being the cause of the gap). Herein we refer to the 12 claimed by both men as "the disputed essays."

Debate over this authorship, among historians and others, ensued for well over a century. In 1944 Douglass Adair published "The Authorship of the Disputed Federalist Papers," wherein he proposed that Madison had been the author of all 12. It was not until 1963, however, that a statistical analysis was performed. In "Inference in an Authorship Problem," Frederick Mosteller and David Wallace concurred that Madison had indeed been the author of all of them. An excellent account of their work, written much later, is Mosteller's "Who Wrote the Disputed Federalist Papers, Hamilton or Madison?" Mosteller's work on the problem also had its beginnings in the 1940s, but it was not until the era of "modern" computers that the necessary statistical computations could realistically be carried out.

Since that time, numerous analyses have appeared, and most tend to corroborate this finding. Indeed, the problem has become something of a standard for testing authorship attribution methodologies. I recently had occasion to delve into it myself. Using technology developed in the Wolfram Language, I will show results for the disputed essays that are mostly in agreement with this consensus opinion. Not entirely so, however—there is always room for surprise. Brief background: in early 2017 I convinced Catalin Stoean, a coauthor from a different project, to work with me in developing an authorship attribution method based on the Frequency Chaos Game Representation (FCGR) and machine learning. Our paper "Text Documents Encoding through Images for Authorship Attribution" was recently published, and will be presented at SLSP 2018. The method outlined in this blog comes from this recent work.

Stylometry

The idea that rigorous, statistical analysis of text might be brought to bear on the determination of authorship goes back at least to Thomas Mendenhall's "The Characteristic Curves of Composition" in 1887 (earlier work along these lines had been done, but it tended to be less formal in nature). The methods originally used mostly involved comparisons of various statistics, such as frequencies of sentence or word lengths (the latter measured in both characters and syllables), frequency of usage of certain words and the like. Such measures can be used because different authors tend to show distinct characteristics when assessed over many such statistics. The difficulty with the disputed essays was that, by the measures then in use, the authors agreed to a remarkable extent. More refined measures were needed.

Modern approaches to authorship attribution are collectively known as "stylometry." Most approaches fall into one or more of the following categories: lexical characteristics (e.g. word frequencies, character attributes such as n-gram frequencies, usage of white space), syntax (e.g. structure of sentences, usage of punctuation) and semantic features (e.g. use of certain uncommon words, relative frequencies of members of synonym families).

Among advantages enjoyed by modern approaches, there is the ready availability on the internet of large corpora, and the increasing availability (and improvement) of powerful machine learning capabilities. In terms of corpora, one can find all manner of texts, newspaper and magazine articles, technical articles and more. As for machine learning, recent breakthroughs in image recognition, speech translation, virtual assistant technology and the like all showcase some of the capabilities in this realm. The past two decades have seen an explosion in the use of machine learning (dating to before that term came into vogue) in the area of authorship attribution.

A typical workflow will involve reading in a corpus, programmatically preprocessing to group by words or sentences, then gathering various statistics. These are converted into a format, such as numeric vectors, that can be used to train a machine learning classifier. One then takes text of known or unknown authorship (for purposes of validation or testing, respectively) and performs similar preprocessing. The resulting vectors are classified by the result of the training step.
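As a concrete (if toy) illustration of this workflow, here is a sketch in Python rather than the Wolfram Language used in the actual work. The two-"author" corpus, the four-word vocabulary and the nearest-neighbor classifier are all hypothetical stand-ins for the real preprocessing and training steps:

```python
from collections import Counter

def word_frequency_vector(text, vocabulary):
    """Map a text to a vector of relative word frequencies
    (a simple stylometric feature)."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in vocabulary]

# Hypothetical mini-corpus: two "authors" with distinctive function-word habits.
corpus = {
    "A": "upon the whole upon reflection the matter rests upon this",
    "B": "while we think while we write we pause while we read",
}
vocabulary = ["upon", "while", "the", "we"]
training = {author: word_frequency_vector(text, vocabulary)
            for author, text in corpus.items()}

def classify(text):
    """Nearest-neighbor classification by distance in feature space."""
    v = word_frequency_vector(text, vocabulary)
    def dist(u):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(training, key=lambda author: dist(training[author]))

print(classify("it rests upon the whole upon this matter"))  # → A
```

A real system replaces the hand-picked features with something far richer, and the nearest-neighbor rule with a trained classifier, but the shape of the pipeline is the same.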

We will return to this after a brief foray to describe a method for visualizing DNA sequences.

The Chaos Game Representation

Nearly thirty years ago, H. J. Jeffrey introduced a method of visualizing long DNA sequences in "Chaos Game Representation of Gene Structure." In brief, one labels the four corners of a square with the four DNA nucleotide bases. Given a sequence of nucleotides, one starts at the center of this square and places a dot halfway from the current spot to the corner labeled with the next nucleotide in the sequence. One continues placing dots in this manner until the end of the sequence is reached. This in effect makes nucleotide strings into instruction sets, akin to punched cards in mechanized looms.
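The chaos game itself is only a few lines of code. Here is a minimal sketch in Python; the corner labeling shown is one common convention, though published work varies on which base goes to which corner:

```python
# Corners of the unit square labeled with the four nucleotide bases
# (one common convention; assignments vary in the literature).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(sequence):
    """Chaos game: start at the center of the square and repeatedly move
    halfway toward the corner labeled with the next base, recording
    each point along the way."""
    x, y = 0.5, 0.5
    points = []
    for base in sequence:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

pts = cgr_points("ACGT")  # four points, one per base
```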

One common computational approach is slightly different. It is convenient to select a level of pixelation, such that the final result is a rasterized image. The details go by the name of the Frequency Chaos Game Representation, or FCGR for short. In brief, a square image space is divided into discrete boxes. The gray level of each such pixelized box in the resulting image is based on how many points from the chaos game representation (CGR) land in it.
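The pixelation step can be sketched as follows; the grid size 2^k and the toy point list are illustrative choices, not the parameters used in any particular study:

```python
def fcgr_matrix(points, k):
    """Divide the unit square into a 2^k x 2^k grid and count how many
    chaos-game points fall into each box; the counts become gray levels."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for x, y in points:
        i = min(int(y * n), n - 1)   # row index from the y coordinate
        j = min(int(x * n), n - 1)   # column index from the x coordinate
        grid[i][j] += 1
    return grid

# Four toy points, one in each quadrant of the unit square (k = 1 gives 2 x 2).
grid = fcgr_matrix([(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)], 1)
print(grid)  # → [[1, 1], [1, 1]]
```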

It turns out that images created from different subsequences of the same nucleotide sequence tend not to vary much from one another. For example, the previous images were created from the initial subsequences of length 150,000 from their respective chromosomes. Corresponding images from the final subsequences of the same chromosomes are shown here:

As is noted in the referenced article, dimension-reduction methods can now be used on such images, for the purpose of creating a “nearest image” lookup capability. This can be useful, say, for quick identification of the approximate biological family a given nucleotide sequence belongs to. More refined methods can then be brought to bear to obtain a full classification. (It is not known whether image lookup based on FCGR images is alone sufficient for full identification—to the best of my knowledge, it has not been attempted on large sets containing closer neighbor species than the six shown in this section). It perhaps should go without saying (but I’ll note anyway) that even without any processing, the Wolfram Language function Nearest will readily determine which images from the second set correspond to similar images from the first.

FCGR on Text

A key aspect of CGR is that it uses an alphabet of length four. This is responsible for a certain fractal effect, in that blocks from each quadrant tend to be approximately repeated in nested subblocks in corresponding nested subquadrants. To adapt the scheme to text, it was convenient to work with an alphabet whose size is a power of four, encoding each character as multiple base 4 digits. Some experiments indicated that an alphabet of length 16 would work well. Since there are 26 characters in the English version of the Latin alphabet, as well as punctuation, numeric characters, white space and more, some amount of merging was done, with the general idea that "similar" characters could go into the same overall class. For example, we have one class comprising {c,k,q,x,z}, another {b,d,p} and so on. This brought the modified alphabet to 16 characters. Written in base 4, the 16 class indices give all possible pairs of base 4 digits. The string of base 4 digits thus produced is then used to produce an image from the text.
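The encoding step might look something like the following Python sketch. Only the {c,k,q,x,z} and {b,d,p} classes come from the description above; the remaining grouping here is purely illustrative, and the paper's full 16-class alphabet is not reproduced:

```python
# Two character classes ({c,k,q,x,z} and {b,d,p}) are from the text; the
# remaining grouping is purely illustrative, not the paper's actual alphabet.
CLASSES = [set("ckqxz"), set("bdp"), set("aeiou"), set("fv"),
           set("gj"), set("h"), set("lr"), set("mn"),
           set("st"), set("wy"), set(" "), set(".,;:!?"),
           set("0123456789"), set("'\""), set("-"), set("()")]

def to_base4_digits(text):
    """Map each character to its class index (0-15), written as a pair of
    base 4 digits; characters outside every class are skipped."""
    digits = []
    for ch in text.lower():
        for index, cls in enumerate(CLASSES):
            if ch in cls:
                digits.extend(divmod(index, 4))  # high digit, then low digit
                break
    return digits

print(to_base4_digits("ab"))  # → [0, 2, 0, 1]
```

The resulting base 4 digit string then drives the chaos game exactly as a nucleotide string would.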

For relatively short texts, up to a few thousand characters, say, we simply create one image. Longer texts we break into chunks of some specified size (typically in the range of 2,000–10,000 characters) and make an image for each such chunk. Using ExampleData["Text"] from the Wolfram Language, we show the result for the first and last chunks from Alice in Wonderland and Pride and Prejudice, respectively:

While there is not so much for the human eye to discern between these pairs, machine learning does quite well in this area.

Authorship Attribution Using FCGR

The paper with Stoean provides details for a methodology that has proven to be best from among variations we have tried. We use it to create one-dimensional vectors from the two-dimensional image arrays; use a common dimension reduction via the singular-value decomposition to make the sizes manageable; and feed the training data, thus vectorized, into a simple neural network. The result is a classifier that can then be applied to images from text of unknown authorship.
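The vectorization and dimension-reduction steps might be sketched as follows in Python. Power iteration for a single principal direction stands in for the singular-value decomposition actually used, and the tiny "images" are placeholders:

```python
def flatten(image):
    """Flatten a 2D FCGR image into a 1D feature vector."""
    return [value for row in image for value in row]

def top_direction(vectors, steps=100):
    """Power iteration for the leading principal direction of a set of
    mean-centered vectors -- a one-component stand-in for the truncated
    SVD used for dimension reduction."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    centered = [[v[i] - mean[i] for i in range(dim)] for v in vectors]
    d = [1.0] * dim
    for _ in range(steps):
        # Multiply by X^T X: project every sample onto d, then recombine.
        scores = [sum(c[i] * d[i] for i in range(dim)) for c in centered]
        d = [sum(scores[j] * centered[j][i] for j in range(len(centered)))
             for i in range(dim)]
        norm = sum(x * x for x in d) ** 0.5 or 1.0
        d = [x / norm for x in d]
    return mean, d

def project(image, mean, d):
    """Reduced (here one-dimensional) representation of an image."""
    v = flatten(image)
    return sum((v[i] - mean[i]) * d[i] for i in range(len(v)))

# Two placeholder "images"; real inputs are FCGR rasters of text chunks.
images = [[[1, 0], [0, 0]], [[3, 0], [0, 0]]]
mean, d = top_direction([flatten(im) for im in images])
coords = [project(im, mean, d) for im in images]
```

In the real pipeline, the reduced vectors (with many more components than one) are what get fed to the neural network classifier.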

While there are several moving parts, so to speak, the breadth of the Wolfram Language makes this fairly straightforward. The main tools are as follows:

1. StringDrop, StringReplace and similar string manipulation functions, used for removing initial sections (as they often contain identifying information) and for other basic preprocessing.

2. Simple replacement rules to go from text to base 4 strings.

3. Simple code to implement FCGR, such as can be found in the Community forum.

4. Machine learning functionality, at a fairly basic level (which is the limit of what I can handle). The functions I use are NetChain and NetTrain, and both work with a simple neural net.

5. Basic statistics functions such as Total, Sort and Tally, useful for assessing results.

Common practice in this area is to show results of a methodology on one or more sets of standard benchmarks. We used three such sets in the referenced paper. Two come from Reuters articles in the realm of corporate/industrial news. One is known as Reuters_50_50 (also called CCAT50). It has fifty authors represented, each with 50 articles for training and 50 for testing. Another is a subset of this, comprised of 50 training and 50 testing articles from ten of the fifty authors. One might think that using both sets entails a certain level of redundancy, but, perhaps surprisingly, past methods that perform very well on either of these tend not to do quite so well on the other. We also used a more recent set of articles, this time in Portuguese, from Brazilian newspapers. The only change to the methodology that this necessitated involved character substitutions to handle e.g. the c-with-cedilla character ç.

Results of this approach were quite strong. As best we could find in prior literature, scores equaled or exceeded past top scores on all three datasets. Since that time, we have applied the method to two other commonly used examples. One is a corpus comprised of IMDb reviews from 62 prolific reviewers. This time we were not the top performer, but came in close behind two other methods. Each was actually a “hybrid” comprised of weighted scores from some submethods. (Anecdotally, our method seems to make different mistakes from others, at least in examples we have investigated closely. This makes it a sound candidate for adoption in hybridized approaches.) As for the other new test, well, that takes us to the next section.

The Federalist Papers

We now return to The Federalist Papers. The first step, of course, is to convert the text to images. We show a few here, created from first and last chunks from two essays. The ones on the top are from Federalist No. 33 (Hamilton) while those on the bottom are from Federalist No. 44 (Madison). Not surprisingly, they are not different in the obvious ways that the genome-based images were different:

Before attempting to classify the disputed essays, it is important to ascertain that the methodology is sound. This requires a validation step. We proceeded as follows: we began with those essays known to have been written by either Hamilton or Madison (discarding the three they coauthored, because they do not contain sufficient data to use). We held back three entire essays from those written by Madison, and eight from the set by Hamilton (in approximate proportion to the relative number each penned). These withheld essays form our first validation set. We also withheld the final chunk from each of the 54 essays that remained, to be used as a second validation set. (This two-pronged validation appears to be more than is used elsewhere in the literature. We like to think we have been diligent.)

The results for the first validation set are perfect: every one of the 70 chunks from the withheld essays is ascribed to its correct author. For the second set, two were erroneously ascribed. For most chunks, the winner's score is around four to seven times higher than the loser's. For the two that were mistaken, these ratios dropped considerably, in one case to a factor of three and in the other to around 1.5. Overall, even with the two misses, these are extremely good results as compared to methods reported in past literature. I will remark that all processing, from importing the essays through classifying all chunks, takes less than half a minute on my desktop machine (with the bulk of that occupied by multiple training runs of the neural network classifier).

In order to avail ourselves of the full corpus of training data, we next merge the validation chunks into the training set and retrain. When we run the classifier on chunks from the disputed essays, things are mostly in accordance with prior conclusions. Except…

The first ten essays go strongly to Madison. Indeed, every chunk therein is ascribed to him. The last two go to Hamilton, albeit far less convincingly. A typical aggregated score for one of the convincing outcomes might be approximately 35:5 favoring Madison, whereas for the last two that go to Hamilton the scores are 34:16 and 42:27, respectively. A look at the chunk level suggests a perhaps more interesting interpretation. Essay 62, the next-to-last, has the five-chunk score pairs shown here (first is Hamilton's score, then Madison's):

Three are fairly strongly in favor of Hamilton as author (one of which could be classified as overwhelmingly so). The second and fourth are quite close, suggesting that despite the ability to do solid validation, these might be too close to call (or might be written by one and edited by the other).

The results from the final disputed essay are even more stark:

The first four chunks go strongly to Hamilton. The next two go strongly to Madison. The last also favors Madison, albeit weakly. This would suggest again a collaborative effort, with Hamilton writing the first part, Madison roughly a third and perhaps both working on the final paragraphs.

The reader is reminded that this result comes from but one method. In its favor is that it performs extremely well on established benchmarks, and also in the validation step for the corpus at hand. On the other side, many other approaches, over a span of decades, all point to a different outcome. That stated, most (or perhaps all) prior work has not been done at the level of chunks, and that granularity can give a better outcome in cases where different authors work on different sections. While these discrepancies with established consensus are of course not definitive, they might serve to prod new work on this very old topic. At the least, other methods might be deployed at the granularity of the chunks we used (or similar, perhaps based on paragraphs), to see if parts of essays 62 and 63 then show indications of Hamilton authorship.

Dedication

To two daughters of Weehawken. My wonderful mother-in-law, Marie Wynne, was a library clerk during her working years. My cousin Sharon Perlman (1953–2016) was a physician and advocate for children, highly regarded by peers and patients in her field of pediatric nephrology. Her memory is a blessing.

People from around the globe continue to join Wolfram Community, our tech-oriented social network, which now surpasses 19,000 members. Along with an improved platform design, we have also introduced new features—now, discussions contain statistics of likes, views and comments, so when your post becomes popular you can showcase the metrics of your success. Sharing has also become easier with an in-discussion, social media–sharing toolbar. We’ve introduced skills and job opportunities in member profiles, so keep yours up to date—it might be quite beneficial for your networking and career.

Take a look at some of the posts making Wolfram Community so popular. We’d love to see you posting your Wolfram technology–based projects too!

How does a neural network "see the world" if it has only been trained on beautiful images? Marco Thiel, a professor from the University of Aberdeen, UK, shows how easy it is to answer this not-so-easy question with the Wolfram Language. The diversity of models in the Wolfram Neural Net Repository and the elegant architecture of the Wolfram Language across various domains make this usually laborious project a breeze.

When processing natural language (as with automatic speech recognition), the generated text is often not punctuated. This can lead to problems during further analysis. Mengyi Shan, a Wolfram Summer School student, works with the Wolfram Language in training ten neural networks to recognize where commas and periods should appear. This post received attention from news outlets around the world.

In August 2018, an exceptionally strong storm caused a large suspension bridge in Genoa, Italy, to collapse, killing at least 43 people. Professor Marco Thiel comes back to explore a computational approach to understanding infrastructural issues, using Germany as an example. With just a few lines of Wolfram Language code, you can determine where unsafe bridges are grouped, the correlation between a bridge’s age and its safety level, and how much infrastructure spending has changed within a given period of time.

The ambiguous circle illusion left people with lots of questions. Erik Mahieu uses the Wolfram Language to create an educational analysis for 3D-printed models that produces the illusion in the physical world. His demonstration walks you through the steps from the initial Manipulate to the finished, printed product.

It’s inspiring to see Wolfram artificial intelligence technology empowering real-world research on stem cells, such as at the Developmental Biology Institute of Marseille. Doctoral student Ali Hashmi shares his research advances and neural network design, and expresses appreciation for the Wolfram development team for the efficiency of the Wolfram machine learning framework.

Recently, a paper was published that discusses a fascinating hashing algorithm based on fluid mechanics, and that mentions that all calculations were carried out using the Wolfram Language. As no notebook supplement was given, Wolfram’s Michael Trott reproduced some of the computations from the paper. This post is of particular interest to fans of stunning graphics and captivating computational storytelling.

During the Wolfram High School Summer Camp, Paolo Lammens developed a tool that identifies chord sequences in music and creates a corresponding graph, representing all unique chords as vertices and connecting each pair of chronologically subsequent chords with a directed edge. Using MIDI files, Paolo shows every step of the visualization process.

If you haven’t yet signed up to be a member of Wolfram Community, please do so! You can join in on similar discussions, post your own work in groups of your interest and browse the complete list of Staff Picks.

In past blog posts, we’ve talked about the Wolfram Language’s built-in, high-level functionality for 3D printing. Today we’re excited to share an example of how some more general functionality in the language is being used to push the boundaries of this technology. Specifically, we’ll look at how computation enables 3D printing of very intricate sugar structures, which can be used to artificially create physiological channel networks like blood vessels.

Let’s think about how 3D printing takes a virtual design and brings it into the physical world. You start with some digital or analytical representation of a 3D volume. Then you slice it into discrete layers, and approximate the volume within each layer in a way that maps to a physical printing process. For example, some processes use a digital light projector to selectively polymerize material. Because the projector is a 2D array of pixels that are either on or off, each slice is represented by a binary bitmap. For other processes, each layer is drawn by a nozzle or a laser, so each slice is represented by a vector image, typically with a fixed line width.

In each case, the volume is represented as a stack of images, which, again, is usually an approximation of the desired design. Greater fidelity can be achieved by increasing the resolution of the printer—that is, the smallest pixel or thinnest line it can create. However, there is a practical limit, and sometimes a physical limit to the resolution. For example, in digital light projection a pixel cannot be made much smaller than the wavelength of the light used. Therefore, for some kinds of designs, it’s actually easier to achieve higher fidelity by modifying the process itself. Suppose, for example, you want to make a connected network of cylindrical rods with arbitrary orientation (there is a good reason to do this—we’ll get to that). Any process based on layers or pixels will produce some approximation of the cylinders. You might instead devise a process that is better suited to making this shape.

The Fused Deposition Modeling Algorithm

One type of 3D printing, termed fused deposition modeling, deposits material through a cylindrical nozzle. This is usually done layer by layer, but it doesn’t have to be. If the nozzle is translated in 3D, and the material can be made to stiffen very quickly upon exiting, then you have an elegant way of making arbitrarily oriented cylinders. If you can get new cylinders to stick to existing cylinders, then you can make very interesting things indeed. This non-planar deposition process is called direct-write assembly, wireframe printing or free-form 3D printing.

Things that you would make using free-form 3D printing are best represented not as solid volumes, but as structural frames. The data structure is actually a graph, where the nodes of the graph are the joints, and the edges of the graph are the beams in the frame. In the following image, you’ll see the conversion of a model to a graph object. Directed edges indicate the corresponding beam can only be drawn in one direction. An interesting computational question is, given such a frame, how do you print it? More precisely, given a machine that can “draw” 3D beams, what sequence of operations do you command the machine to perform?
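In Python, such a frame might be represented as follows. This is a hypothetical three-beam example, and for simplicity the sketch builds an undirected adjacency list, ignoring the one-way beams mentioned above:

```python
# A hypothetical miniature frame: joints with 3D coordinates, beams as
# (joint, joint) pairs. Real designs would be far larger.
joints = {
    "a": (0, 0, 0), "b": (1, 0, 0),   # joints on the substrate (z = 0)
    "c": (0.5, 0, 1),                 # an elevated joint
}
beams = [("a", "b"), ("a", "c"), ("b", "c")]

# Build an adjacency list: the graph whose nodes are joints and whose
# edges are beams.
adjacency = {j: [] for j in joints}
for u, v in beams:
    adjacency[u].append(v)
    adjacency[v].append(u)

print(adjacency["c"])  # → ['a', 'b']
```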

First, we can distinguish between motions where we are drawing a beam and motions where we are moving the nozzle without drawing a beam. For most designs, it will be necessary to sometimes move the nozzle without drawing a beam. In this discussion, we won’t think too hard about these non-printing motions. They take time, but, at least in this example, the time it takes to print is not nearly as important as whether the print actually succeeds or fails catastrophically.

We can further define the problem as follows. We have a set of beams to be printed, and each beam is defined by its two joints, {a, b}. Give a sequence of beams and a printing direction for each beam (i.e. a→b or b→a) that is consistent with the following constraints:

1) Directionality: for each beam, we need to choose a direction so that the nozzle doesn’t collide with that beam as it’s printed.

2) Collision: we have to make sure that as we print each beam, we don’t hit a previously printed beam with the nozzle.

3) Connection: we have to start each beam from a physical surface, whether that be the printing substrate or an existing joint.

Let’s pause there for a moment. If these are the only three constraints, and there are only three axes of motion, then finding a sequence that is consistent with the constraints is straightforward. To determine whether printing beam B would cause a collision with beam A, we first generate a volume by sweeping the nozzle shape along the path coincident with beam B to form the 3D region R. If RegionDisjoint[R, A] is False, then printing beam B would cause a collision with beam A. This means that beam A has to be printed first.
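A crude Python stand-in for this test can use axis-aligned bounding boxes in place of the exact swept-volume regions that RegionDisjoint operates on; the beam coordinates and nozzle radius here are hypothetical:

```python
def aabb(points, pad=0.0):
    """Axis-aligned bounding box of a set of 3D points, optionally padded."""
    xs, ys, zs = zip(*points)
    return ((min(xs) - pad, min(ys) - pad, min(zs) - pad),
            (max(xs) + pad, max(ys) + pad, max(zs) + pad))

def disjoint(box1, box2):
    """True if two axis-aligned boxes do not overlap."""
    (lo1, hi1), (lo2, hi2) = box1, box2
    return any(hi1[i] < lo2[i] or hi2[i] < lo1[i] for i in range(3))

def must_precede(beam_a, beam_b, nozzle_radius=0.5):
    """Crude stand-in for the RegionDisjoint test: if the padded box swept
    by the nozzle along beam B overlaps beam A, A must be printed first."""
    swept = aabb(beam_b, pad=nozzle_radius)
    return not disjoint(aabb(beam_a), swept)
```

A production version would sweep the true nozzle geometry (including the body above the tip) and use exact region intersection, but the precedence logic is the same.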

Here’s an example from the RegionDisjoint reference page to help illustrate this. Red walls collide with the cow and green walls do not:

Mimicking the logic from this example, we can make a function that takes a swept nozzle and finds the beams that it collides with. Following is a Wolfram Language command that visualizes nozzle-beam collisions. The red beams must be drawn after the green one to avoid contact with the blue nozzle as it draws the green beam:


HighlightNozzleCollisions[,{{28,0,10},{23,0,10}}]

For a printer with three axes of motion, it isn’t particularly difficult to compute collision constraints between all the pairs of beams. We can represent the constraints as a directed graph, with the nodes representing the beams, or as an adjacency matrix, where a 1 in element (i, j) indicates that beam i must precede beam j. Here’s the collision matrix for the bridge:

A feasible sequence exists, provided this precedence graph is acyclic. At first glance, it may seem that a topological sort will give such a feasible sequence; however, this does not take the connection constraint into consideration, and therefore non-anchored beams might be sequenced. Somewhat surprisingly, TopologicalSort can often yield a sequence with very few connection violations. For example, in the topological sort, only the 12th and 13th beams violate the connection constraint:
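A topological sort of such a precedence graph can be sketched with Python's standard library; the precedence constraints below are hypothetical, and note that (as in the text) nothing here enforces the connection constraint:

```python
from graphlib import TopologicalSorter

# Hypothetical precedence constraints: each key maps a beam to the set of
# beams that must be printed before it (e.g. beam 4 needs 2 and 3 first).
predecessors = {
    1: set(),
    2: {1},
    3: {1},
    4: {2, 3},
}
order = list(TopologicalSorter(predecessors).static_order())
print(order)  # one feasible precedence-respecting order, e.g. [1, 2, 3, 4]
```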

Instead, to consider all three aforementioned constraints, you can build a sequence in the following greedy manner. At each step, print any beam such that: (a) the beam can be printed starting from either the substrate or an existing joint; and (b) all of the beam’s predecessors have already been printed. There’s actually a clever way to speed this up: go backward. Instead of starting at the beginning, with no beams printed, figure out the last beam you’d print. Remove that last beam, then repeat the process. You don’t have to compute collision constraints for a beam that’s been removed. Keep going until all the beams are gone, then just print in the reverse removal order. This can save a lot of time, because this way you never have to worry about whether printing one beam will make it impossible to print a later beam due to collision. For a three-axis printer this isn’t a big deal, but for a four- or five-axis robot arm it is.
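The backward greedy procedure might be sketched as follows; the three-beam frame, the precedence relation and the connection test are all hypothetical:

```python
SUBSTRATE = {"a", "b"}                                      # joints on the build plate
ENDPOINTS = {0: ("a", "b"), 1: ("a", "c"), 2: ("b", "c")}   # beam -> its joints

def anchored(beam, remaining):
    """Hypothetical connection test: the beam touches the substrate or a
    joint belonging to some beam that would already be printed."""
    joints = {j for other in remaining for j in ENDPOINTS[other]}
    return any(j in SUBSTRATE or j in joints for j in ENDPOINTS[beam])

def reverse_greedy(beams, precedes, anchored):
    """Build the printing sequence backward: repeatedly peel off a beam
    that has no remaining successor in the precedence relation and still
    satisfies the connection test, then reverse the removal order.
    precedes[a] is the set of beams that beam a must precede."""
    remaining = set(beams)
    removal_order = []
    while remaining:
        for beam in sorted(remaining):
            successors = precedes.get(beam, set()) & (remaining - {beam})
            if not successors and anchored(beam, remaining - {beam}):
                remaining.discard(beam)
                removal_order.append(beam)
                break
        else:
            raise ValueError("no feasible sequence found")
    return removal_order[::-1]

# Beam 1 must be printed before beam 2 (say, to avoid a nozzle collision).
order = reverse_greedy([0, 1, 2], {1: {2}}, anchored)
print(order)  # → [1, 2, 0]
```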

So the assembly problem under collision, connection and directionality constraints isn’t that hard. However, for printing processes where the material is melted and solidifies by cooling, there is an additional constraint. This is shown in the following video:

See what happened? The nozzle is hot, and it melts the existing joint. Some degree of melting is unfortunately necessary to fuse new beams to existing joints. We could add scaffolding or try to find some physical solution, but we can circumvent it in many cases by computation alone. Specifically, we can find a sequence that is not only consistent with collision, connection and directionality constraints, but that also never requires a joint to simultaneously support two cantilevered beams. Obviously some things, like the tree we tried to print previously, are impossible to print under this constraint. However, it turns out that some very intimidating-looking designs are in fact feasible.

We approach the problem by considering the assembly states. A state is just the set of beams that has been assembled, and contains no information about the order in which they were assembled. Our goal is to find a path from the start state to the end state. Because adjacent states differ by the presence of a single beam, each path corresponds to a unique assembly sequence. For small designs, we can actually generate the whole graph. However, for large designs, exhaustively enumerating the states would take forever. For illustrative purposes, here’s a structure where the full assembly state is small enough to enumerate. Note that some states are unreachable or are a dead end:

Note that, whether you start at the beginning and go forward or start at the end and work backward, you can find yourself in a dead end. These dead ends are labeled G and H in the figure. There might be any number of dead ends, and you may have to visit all of them before you find a sequence that works. You might never find a sequence that works! This problem is actually NP-complete—that is, you can’t know if there is a feasible sequence without potentially trying all of them. The addition of the cantilever constraint is what makes the problem hard. You can’t say for sure whether printing a beam is going to make it impossible to assemble another beam later. What’s more, going backward doesn’t help here: you can’t say for sure whether removing a beam is going to make it impossible to remove another beam later due to the cantilever constraint.
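A brute-force search over assembly states can be sketched as a depth-first search that remembers dead-end states; the state-validity and addability checks below are placeholders for the real cantilever, collision and connection tests:

```python
def find_sequence(beams, valid_state, can_add):
    """Depth-first search over assembly states (sets of built beams) for a
    feasible order. valid_state(state) and can_add(beam, state) stand in
    for the cantilever, collision and connection checks."""
    full = frozenset(beams)
    seen = set()  # states already explored (and found to be dead ends)

    def dfs(state, order):
        if state == full:
            return order
        if state in seen:
            return None
        seen.add(state)
        for beam in sorted(set(beams) - state):
            nxt = state | {beam}
            if can_add(beam, state) and valid_state(nxt):
                found = dfs(nxt, order + [beam])
                if found is not None:
                    return found
        return None

    return dfs(frozenset(), [])

# Toy constraint: beam 1 starts on the substrate; each later beam needs
# its predecessor in place. Any valid_state check could go in its place.
seq = find_sequence([1, 2, 3],
                    lambda s: True,
                    lambda b, s: b == 1 or (b - 1) in s)
print(seq)  # → [1, 2, 3]
```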

The key word there is “potentially.” Usually you can find a sequence without trying everything. The algorithm we developed searches the assembly graph for states that don’t contain cantilevers. If you get to one of these states, it doesn’t mean a full sequence exists. However, it does mean that if a sequence exists, you can find one without backtracking past this particular cantilever-free state. This essentially divides the problem into a series of much smaller NP-complete graph search problems. Except in contrived cases, these can be solved quickly, enabling construction of very intricate models:

FindFreeformPath[, Monitor -> Full]
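To make the search concrete, here is a toy sketch of the state-graph idea (not the production algorithm; the beam set and the feasibility test are made-up stand-ins):

```wolfram
(* Toy sketch of assembly-sequence search. States are subsets of beams;
   an edge connects two states that differ by exactly one beam.
   feasibleQ is a hypothetical stand-in for the cantilever/feasibility test. *)
beams = {1, 2, 3, 4};
feasibleQ[state_] := True; (* replace with a real constraint check *)
states = Select[Subsets[beams], feasibleQ];
edges = UndirectedEdge @@@ Select[Subsets[states, {2}],
    Length[Complement[#[[1]], #[[2]]]] +
      Length[Complement[#[[2]], #[[1]]]] == 1 &];
g = Graph[states, edges];
(* An assembly sequence is a path from the empty state to the full design *)
FindShortestPath[g, {}, beams]
```

With the trivial `feasibleQ` above, every subset is reachable; a real feasibility test is what carves out the dead ends discussed earlier.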

So that mostly solves the problem. However, further complicating matters is that these slender beams are about as strong as you might expect. Gravity can deform the construct, but there is actually a much larger force attributable to the flow of material out of the nozzle. This force can produce catastrophic failure, such as the instability shown here:

However, it turns out that intelligent sequencing can solve this problem as well. Using models developed for civil engineering, it is possible to compute at every potential step the probability that you’re going to break your design. The problem then becomes not one of finding the shortest path to the goal, but of finding the safest path to the goal. This step requires inversion of large matrices and is computationally intensive, but with the Wolfram Language’s fast built-in solvers, it becomes feasible to perform this process hundreds of thousands of times in order to find an optimal sequence.
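In graph terms, the "safest path" simply means searching with edge weights given by the computed failure risk rather than unit length. A schematic sketch (the risk function here is a random placeholder for the real structural-mechanics model):

```wolfram
(* Schematic: weight each assembly transition by its failure risk,
   then search for the minimum-total-risk path through the state graph g. *)
SeedRandom[1];
risk[e_] := RandomReal[{0.01, 1}]; (* placeholder for the structural model *)
safestPath[g_, start_, end_] := FindShortestPath[
   Graph[VertexList[g], EdgeList[g],
    EdgeWeight -> (risk /@ EdgeList[g])],
   start, end];
```

Because `FindShortestPath` respects `EdgeWeight`, the same search machinery used for the shortest sequence finds the safest one.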

Use Cases

So that’s the how. The next question is, “Why?” Well, the problem is simple enough. Multicellular organisms require a lot of energy. This energy can only be supplied by aerobic respiration, a fancy term for a cascade of chemical reactions. These reactions use oxygen to produce the energy required to power all higher forms of life. Nature has devised an ingenious solution: a complex plumbing system and an indefatigable pump delivering oxygen-rich blood to all of your body’s cells, 24/7. If your heart doesn’t beat at least once every couple seconds, your brain doesn’t receive enough oxygen-rich blood to maintain consciousness.

We don’t really understand super-high-level biological phenomena like consciousness. We can’t, as far as we can tell, engineer a conscious array of cells, or even of transistors. But we understand pretty well the plumbing that supports consciousness. And it may be that if we can make the plumbing and deliver oxygen to a sufficiently thick slab of cells, we will see some emergent phenomena. A conscious brain is a long shot, a functional piece of liver or kidney decidedly less so. Even a small piece of vascularized breast or prostate tissue would be enormously useful for understanding how tumors metastasize.

The problem is, making the plumbing is hard. Cells in a dish do self-organize to an extent, but we don't understand such systems well enough to tell a bunch of cells to grow into a brain. Plus, as noted, growing a brain sort of requires attaching it to a heart. Perhaps if we understand the rules that govern the generation of biological forms, we can generate them at will. We know that with some simple mathematical rules, one can generate very complex, interesting structures—the stripes on a zebra, the venation of a leaf. But going backward, reverse-engineering the rule from the form, is hard, to say the least. We have mastered the genome and can program single cells, but we are novices at best when it comes to predicting or programming the behavior of cellular ensembles.

An alternative means of generating biological forms like vasculature is a bit cruder—just draw the form you want, then physically place all the cells and the plumbing according to your blueprint. This is bioprinting. Bioprinting is exciting because it reduces the generation of biological forms into a set of engineering problems. How do we make a robot put all these cells in the right place? These days, any sentence that starts with “How do we make a robot...” probably has an answer. In this case, however, the problem is complicated by the fact that, while the robot or printer is working, the cells that have already been assembled are slowly dying. For really big, complex tissues, either you need to supply oxygen to the tissue as you assemble it or you need to assemble it really fast.

One approach of the really fast variety was demonstrated in 2009. Researchers at Cornell used a cotton candy machine to melt-spin a pile of sugar fibers. They cast the sugar fibers in a polymer, dissolved them out with water and made a vascular network in minutes, albeit with little control over the geometry. A few years later, researchers at the University of Pennsylvania used a hacked desktop 3D printer to draw molten sugar fibers into a lattice and show that the vascular casting approach was compatible with a variety of cell-laden gels. This was more precise, but not quite free-form. The next step, undertaken in a collaboration between researchers at the University of Illinois at Urbana–Champaign and Wolfram Research, was to overcome the physical and computational barriers to making really complex designs—in other words, to take sugar printing and make it truly free-form.

We’ve described the computational aspects of free-form 3D printing in the first half of this post. The physical side is important too.

First, you need to make a choice of material. Prior work has used glucose or sucrose—things that are known to be compatible with cells. The problem with these materials is twofold: One, they tend to burn. Two, they tend to crystallize while you’re trying to print. If you’ve ever left a jar of honey or maple syrup out for a long time, you can see crystallization in action. Crystals will clog your nozzle, and your print will fail. Instead of conventional sugars, this printer uses isomalt, a low-calorie sugar substitute. Isomalt is less prone to burning or crystallizing than other sugar-like materials, and it turns out that cells are just as OK with isomalt as they are with real sugar.

Next, you need to heat the isomalt and push it out of a tiny nozzle under high pressure. You have to draw pretty slowly—the nozzle moves about half a millimeter per second—but the filament that is formed coincides almost exactly with the path taken by the nozzle. Right now it’s possible to print filaments anywhere from 50 to 500 micrometers in diameter, a very nice range for blood vessels.

So the problems of turning a design into a set of printer instructions, and of having a printer that is sufficiently precise to execute them, are more or less solved. This doesn’t mean that 3D-printed organs are just around the corner. There are still problems to be solved in introducing cells in and around these vascular molds. Depending on the ability of the cells to self-organize, dumping them around the mold or flowing them through the finished channels might not be good enough. In order to guide development of the cellular ensemble into a functional tissue, more precise patterning may be required from the outset; direct cell printing would be one way to do this. However, our understanding of self-organizing systems increases every day. For example, last year researchers reproduced the first week of mouse embryonic development in a petri dish. This shows that in the right environment, with the right mix of chemical signals, cells will do a lot of the work for us. Vascular networks deliver oxygen, but they can also deliver things like drugs and hormones, which can be used to poke and prod the development of cells. In this way, bioprinting might enable not just spatial but also temporal control of the cells’ environment. It may be that we use the vascular network itself to guide the development of the tissue deposited around it. Cardiologists shouldn’t expect a 3D-printed heart for their next patients, but scientists might reasonably ask for a 3D-printed sugar scaffold for their next experiments.

So to summarize, isomalt printing offers a route to making interesting physiological structures. Making it work requires a certain amount of mechanical and materials engineering, as one might expect, but also a surprising amount of computational engineering. The Wolfram Language provides a powerful tool for working with geometry and physical models, making it possible to extend free-form bioprinting to arbitrarily large and complex designs.

To learn more about our work, check out our papers: a preprint regarding the algorithm (to appear in IEEE Transactions on Automation Science and Engineering), and another preprint regarding the printer itself (published in Additive Manufacturing).

Acknowledgements

This work was performed in the Chemical Imaging and Structures Laboratory under the principal investigator Rohit Bhargava at the University of Illinois at Urbana–Champaign.

Matt Gelber was supported by fellowships from the Roy J. Carver Charitable Trust and the Arnold and Mabel Beckman Foundation. We gratefully acknowledge the gift of isomalt and advice on its processing provided by Oliver Luhn of Südzucker AG/BENEO-Palatinit GmbH. The development of the printer was supported by the Beckman Institute for Advanced Science and Technology via its seed grant program.

We would also like to acknowledge Travis Ross of the Beckman Institute Visualization Laboratory for help with macro-photography of the printed constructs. We also thank the contributors of the CAD files on which we based our designs: GrabCAD user M. G. Fouché, 3D Warehouse user Damo and Bibliocas user limazkan (Javier Mdz). Finally, we acknowledge Seth Kenkel for valuable feedback throughout this project.

Today I am proud to announce a free interactive course, Introduction to Calculus, hosted on Wolfram's learning hub, Wolfram U! The course is designed to give a comprehensive introduction to fundamental concepts in calculus such as limits, derivatives and integrals. It includes 38 video lessons along with interactive notebooks that offer examples in the Wolfram Cloud—all for free. This is the second of Wolfram U's fully interactive free online courses, powered by our cloud and notebook technology.
This introduction to the profound ideas that underlie calculus will help students and learners of all ages anywhere in the world to master the subject. The course requires no prior knowledge of the Wolfram Language, and because the language is human-readable, the examples that illustrate each concept are easy to follow. Studying calculus through this course is a good way for high-school students to prepare for AP Calculus AB.
As a former classroom teacher with more than ten years of experience in teaching calculus, I was very excited to have the opportunity to develop this course. My philosophy in teaching calculus is to introduce the basic concepts in a geometrical and intuitive way, and then focus on solving problems that illustrate the applications of these concepts in physics, economics and other fields. The Wolfram Language is ideally suited for this approach, since it has excellent capabilities for graphing functions, as well as for all types of computation.
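For instance, the limits, derivatives and integrals that anchor the course are single expressions in the Wolfram Language:

```wolfram
Limit[Sin[x]/x, x -> 0]       (* 1 *)
D[x^2 Sin[x], x]              (* 2 x Sin[x] + x^2 Cos[x] *)
Integrate[1/(1 + x^2), x]     (* ArcTan[x] *)
Plot[Sin[x]/x, {x, -10, 10}]  (* the graph behind the limit above *)
```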
To create this course, I worked alongside John Clark, a brilliant young mathematician who did his undergraduate studies at Caltech and produced the superb notebooks that constitute the text for the course.
Lessons
The heart of the course is a set of 38 lessons, beginning with “What is Calculus?”. This introductory lesson includes a discussion of the problems that motivated the early development of calculus, a brief history of the subject and an outline of the course. The following is a short excerpt from the video for this lesson.
Further lessons begin with an overview of the topic (for example, optimization), followed by a discussion of the main concepts and a few examples that illustrate the ideas using Wolfram Language functions for symbolic computation, visualization and dynamic interactivity.
The videos range from 8 to 17 minutes in length, and each video is accompanied by a transcript notebook displayed on the right-hand side of the screen. You can copy and paste Wolfram Language input directly from the transcript notebook to the scratch notebook to try the examples for yourself. If you want to pursue any topic in greater depth, the full text notebooks prepared by John Clark are also provided for further self-study. In this way, the course allows for a variety of learning styles, and I recommend that you combine the different resources (videos, transcripts and full text) for the best results.
Exercises
Each lesson is accompanied by a small set of (usually five) exercises to reinforce the concepts covered during the lesson. Since this course is designed for independent study, a detailed solution is given for all exercises. In my experience, such solutions often serve as models when students try to write their own for similar problems.
The following shows an exercise from the lesson on volumes of solids:
Like the rest of the course, the notebooks with the exercises are interactive, so students can try variations of each problem in the Wolfram Cloud, and also rotate graphics such as the bowl in the problem shown (in order to view it from all angles).
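As a flavor of the computation behind such an exercise, here is a made-up example in the same spirit: the disk-method volume of the bowl obtained by revolving y = x^2 (for 0 ≤ x ≤ 1) about the y axis:

```wolfram
(* Horizontal slices at height y are disks of radius Sqrt[y],
   so the volume is the integral of Pi r^2 over the height. *)
Integrate[Pi (Sqrt[y])^2, {y, 0, 1}]   (* Pi/2 *)
```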
Problem Sessions
The calculus course includes 10 problem sessions that are designed to review, clarify and extend the concepts covered during the previous lessons. There is one session at the end of every 3 or 4 lessons, and each session includes around 14 problems.
As in the case of exercises, complete solutions are presented for each problem. Since the Wolfram Language automates the algebraic and numerical calculations, and instantly produces illuminating plots, problems are discussed in rapid succession during the video presentations. The following is an excerpt of the video for Problem Session 1: Limits and Functions:
The problem sessions are similar in spirit to the recitations in a typical college calculus course, and allow the student to focus on applying the facts learned in the lessons.
Quizzes
Each problem session is followed by a short, multiple-choice quiz with five problems. The quiz problems are roughly at the same level as those discussed in the lessons and problem sessions, and a student who reviews this material carefully should have no difficulty in doing well on the quiz.
Students will receive instant feedback about their responses to the quiz questions, and they are encouraged to try any method (hand calculations or computer) to solve them.
Sample Exam
The final two sections of the course are devoted to a discussion of sample problems based on the AP Calculus AB exam. The problems increase in difficulty as the sample exam progresses, and some of them require a careful application of algebraic techniques. Complete solutions are provided for each exam problem, and the text for the solutions often includes the steps for hand calculation. The following is an excerpt of the video for part one of the sample calculus exam:
The sample exam serves as a final review of the course, and will also help students to gain confidence in tackling the AP exam or similar exams for calculus courses at the high-school or college level.
Course Certificate
I strongly urge students to watch all the lessons and problem sessions and attempt the quizzes in the recommended sequence, since each topic in the course builds on earlier concepts and techniques. You can request a certificate of completion, pictured here, at the end of the course. A certificate is awarded after you watch all the videos and pass all the quizzes. It represents real proficiency in the subject, and teachers and students will find this a useful resource to signify readiness for the AP Calculus AB exam:
The mastery of the fundamental concepts of calculus is a major milestone in a student’s academic career. I hope that Introduction to Calculus will help you to achieve this milestone. I have enjoyed teaching the course, and welcome any comments regarding the current content as well as suggestions for the future.

Wolfram|Alpha senior developer Noriko Yasui explains the basic features of the Japanese version of Wolfram|Alpha. This version was released in June 2018, and its mathematics domain has been completely localized into Japanese. Yasui shows how Japanese students, teachers and professionals can ask mathematical questions and obtain the results in their native language. In addition to these basic features, she introduces a unique feature of Japanese Wolfram|Alpha: curriculum-based Japanese high-school math examples. Japanese high-school students can see how Wolfram|Alpha answers typical questions they see in their math textbooks or college entrance exams.

Having a really broad toolset and an open mind on how to approach data can lead to interesting insights that are missed when data is looked at only through the lens of statistics or machine learning. It’s something we at Wolfram Research call multiparadigm data science, which I use here for a small excursion through calculus, graph theory, signal processing, optimization and statistics to gain some interesting insights into the engineering of supersonic cars.
The story started with a conversation about data with some of the Bloodhound team, which is trying to create a 1000 mph car. I offered to spend an hour or two looking at some sample data to give them some ideas of what might be done. They sent me a curious binary file that somehow contained the output of 32 sensors recorded from a single subsonic run of the ThrustSSC car (the current holder of the world land speed record).
Import
The first thing I did was code the information that I had been given about the channel names and descriptions, in a way that I could easily query:
channels={"SYNC"->"Synchronization signal","D3fm"->"Rear left active suspension position","D5fm"->"Rear right active suspension position","VD1"->"Unknown","VD2"->"Unknown","L1r"->"Load on front left wheel","L2r"->"Load on front right wheel","L3r"->"Load on rear left wheel","L4r"->"Load on rear right wheel","D1r"->"Front left displacement","D2r"->"Front right displacement","D4r"->"Rear left displacement","D6r"->"Rear right displacement","Rack1r"->"Steering rack displacement rear left wheel","Rack2r"->"Steering rack displacement rear right wheel","PT1fm"->"Pitot tube","Dist"->"Distance to go (unreliable)","RPM1fm"->"RPM front left wheel","RPM2fm"->"RPM front right wheel","RPM3fm"->"RPM rear left wheel","RPM4fm"->"RPM rear right wheel","Mach"->"Mach number","Lng1fm"->"Longitudinal acceleration","EL1fm"->"Engine load left mount","EL2fm"->"Engine load right mount","Throt1r"->"Throttle position","TGTLr"->"Turbine gas temperature left engine","TGTRr"->"Turbine gas temperature right engine","RPMLr"->"RPM left engine spool","RPMRr"->"RPM right engine spool","NozLr"->"Nozzle position left engine","NozRr"->"Nozzle position right engine"};
SSCData[]=First/@channels;
SSCData[name_, "Description"] := Lookup[channels, name, Missing[]];
TextGrid[{#, SSCData[#, "Description"]} & /@ SSCData[], Frame -> All]
Then on to decoding the file. I had no guidance on format, so the first thing I did was pass it through the 200+ fully automated import filters:
DeleteCases[Map[Import["BLK1_66.dat", #] &, $ImportFormats], $Failed]
Thanks to the automation of the Import command, that only took a couple of minutes to do, and it narrowed down the candidate formats. Knowing that there were channels and repeatedly visualizing the results of each import and transformation to see if they looked like real-world data, I quickly stumbled on the following:
MapThread[Set,{SSCData/@SSCData[],N[Transpose[Partition[Import["BLK1_66.dat","Integer16"],32]]][[All,21050;;-1325]]}];
Row[ListPlot[SSCData[#], PlotLabel -> #, ImageSize -> 170] & /@ SSCData[]]
The ability to automate all 32 visualizations without worrying about details like plot ranges made it easy to see when I had gotten the right import filter and combination of Partition and Transpose. It also let me pick out the interesting time interval quickly by trial and error.
OK, data in, and we can look at all the channels and immediately see that SYNC and Lng1fm contain nothing useful, so I removed them from my list:
SSCData[] = DeleteCases[SSCData[], "SYNC" | "Lng1fm"];
Graphs & Networks: Looking for Families of Signals
The visualization immediately reveals some very similar-looking plots—for example, the wheel RPMs. It seemed like a good idea to group them into similar clusters to see what would be revealed. As a quick way to do that, I used an idea from social network analysis: to form graph communities based on the relationship between individual channels. I chose a simple family relationship—streams whose squared correlation is at least 0.4—with edges weighted by the correlation strength:
correlationEdge[{v1_, v2_}] := With[{d1 = SSCData[v1], d2 = SSCData[v2]},
  If[Correlation[d1, d2]^2 < 0.4, Nothing,
   Property[UndirectedEdge[v1, v2], EdgeWeight -> Correlation[d1, d2]^2]]];
edges = Map[correlationEdge, Subsets[SSCData[], {2}]];
CommunityGraphPlot[Graph[
Property[#, {VertexShape ->
Framed[ListLinePlot[SSCData[#], Axes -> False,
Background -> White, PlotRange -> All], Background -> White],
VertexLabels -> None, VertexSize -> 2}] /@ SSCData[], edges,
VertexLabels -> Automatic], CommunityRegionStyle -> LightGreen,
ImageSize -> 530]
I ended up with three main clusters and five uncorrelated data streams. Here are the matching labels:
CommunityGraphPlot[Graph[
Property[#, {VertexShape ->
Framed[Style[#, 7], Background -> White], VertexLabels -> None,
VertexSize -> 2}] /@ SSCData[], edges,
VertexLabels -> Automatic], CommunityRegionStyle -> LightGreen,
ImageSize -> 530]
Generally it seems that the right cluster is speed related and the left cluster is throttle related, but perhaps the interesting one is the top, where jet nozzle position, engine mount load and front suspension displacement form a group. Perhaps all are thrust related.
The most closely aligned channels are the wheel RPMs. Having all wheels going at the same speed seems like a good thing at 600 mph! But RPM1fm, the front-left wheel, is the least correlated. Let’s look more closely at that:
TextGrid[
 Map[SSCData[#, "Description"] &,
  MaximalBy[Subsets[SSCData[], {2}],
   Abs[Correlation[SSCData[#[[1]]], SSCData[#[[2]]]]] &, 10], {2}],
 Frame -> All]
Optimization: Data Comparison
I have no units for any instruments and some have strange baselines, so I am not going to assume that they are calibrated in an equivalent way. That makes comparison harder. But here I can call on some optimization to align the data before we compare. I rescale and shift the second dataset so that the two sets are as similar as possible, as measured by the Norm of the difference. I can forget about the details of optimization, as FindMinimum takes care of that:
alignedDifference[d1_, d2_] := With[
  {shifts = Quiet[FindMinimum[Norm[d1 - (a d2 + b), 1], {a, b}]][[2]]},
  d1 - (a # + b /. shifts & /@ d2)];
Let’s look at a closely aligned pair of values first:
ListLinePlot[MeanFilter[alignedDifference[SSCData["RPM3fm"],SSCData["RPM4fm"]],40],PlotRange->All,PlotLabel->"Difference in rear wheel RPMs"]
Given that the range of RPM3fm was around 0–800, you can see that there are only a few brief events where the rear wheels were not closely in sync. I gradually learned that many of the sensors seem to be prone to very short glitches, and so probably the only real spike is the briefly sustained one in the fastest part of the run. Let’s look now at the front wheels:
ListLinePlot[MeanFilter[alignedDifference[SSCData["RPM1fm"],SSCData["RPM2fm"]],40],PlotRange->All,PlotLabel->"Difference in front wheel RPMs"]
The differences are much more prolonged. It turns out that desert sand starts to behave like liquid at high velocity, and I don’t know what the safety tolerances are here, but that front-left wheel is the one to worry about.
I also took a look at the difference between the front suspension displacements, where we see a more worrying pattern:
ListLinePlot[MeanFilter[alignedDifference[SSCData["D1r"],SSCData["D2r"]],40],PlotRange->All,PlotLabel->"Difference in front suspension displacements"]
Not only is the difference a larger fraction of the data ranges, but you can also immediately see a periodic oscillation that grows with velocity. If we are hitting some kind of resonance, that might be dangerous. To look more closely at this, we need to switch paradigms again and use some signal processing tools. Here is the Spectrogram of the differences between the displacements. The Spectrogram is just the magnitude of the discrete Fourier transforms of partitions of the data. There are some subtleties about choosing the partitioning size and color scaling, but by default that is automated for me. We should read it as time along the x axis, frequency along the y axis, and darker values indicating greater magnitude:
Spectrogram[alignedDifference[SSCData["D1r"],SSCData["D2r"]],PlotLabel->"Difference in front suspension displacements"]
We can see the vibration as a dark line from 2000 to 8000, and that its frequency seems to rise early in the run and then fall again later. I don’t know the engineering interpretation, but I would suspect that this reduces the risk of dangerous resonance compared to constant frequency vibration.
Calculus: Velocity and Acceleration
It seems like acceleration should be interesting, but we have no direct measurement of that in the data, so I decided to infer that from the velocity. There is no definitive accurate measure of velocity at these speeds. It turned out that the Pitot measurement is quite slow to adapt and smooths out the features, so the better measure was to use one of the wheel RPM values. I take the derivative over a 100-sample interval, and some interesting features pop out:
ListLinePlot[Differences[SSCData["RPM4fm"], 1, 100],
PlotRange -> {-100, 80}, PlotLabel -> "Acceleration"]
The acceleration clearly goes up in steps and there is a huge negative step in the middle. It only makes sense when you overlay the position of the throttle:
ListLinePlot[
{MeanFilter[Differences[SSCData["RPM4fm"],1,100],5],
MeanFilter[SSCData["Throt1r"]/25,10]},
PlotLabel->"Acceleration vs Throttle"]
Now we see that the driver turns up the jets in steps, waiting to see how the car reacts before he really goes for it at around 3500. The car hits peak acceleration, but as wind resistance builds, acceleration falls gradually to near zero (where the car cruises at maximum speed for a while before the driver cuts the jets almost completely). The wind resistance then causes the massive deceleration. I suspect that there is a parachute deployment shortly after that to explain the spikiness of the deceleration, and some real brakes at 8000 bring the car to a halt.
Signal Processing
I was still pondering vibration and decided to look at the load on the suspension from a different point of view. This wavelet scalogram turned out to be quite revealing:
WaveletScalogram[ContinuousWaveletTransform[SSCData["L1r"]],PlotLabel->"Suspension frequency over time"]
You can read it the same as the Spectrogram earlier: time along the x axis and frequency on the y axis. But scalograms have a nice property of estimating discontinuities in the data. There is a major pair of features at 4500 and 5500, where higher-frequency vibrations appear and then we cross a discontinuity. Applying the scalogram requires some choices, but again, the automation has taken care of some of those choices by choosing a MexicanHatWavelet[1] out of the dozen or so wavelet choices and the choice of 12 octaves of resolution, leaving me to focus on the interpretation.
I was puzzled by the interpretation, though, and presented this plot to the engineering team, hoping that it was interesting. They knew immediately what it was. While this run of the car had been subsonic, the top edge of the wheel travels forward at twice the speed of the vehicle. These features turned out to detect when that top edge of the wheel broke the sound barrier and when it returned through the sound barrier to subsonic speeds. The smaller features around 8000 correspond to the deployment of the physical brakes as the car comes to a halt.
Deployment: Recreating the Cockpit
There is a whole sequence of events that happen in a data science project, but broadly they fall into: data acquisition, analysis, deployment. Deployment might be setting up automated report generation, creating APIs to serve enterprise systems or just creating a presentation. Having only offered a couple of hours, I only had time to format my work into a slide show notebook. But I wanted to show one other deployment, so I quickly created a dashboard to recreate a simple cockpit view:
CloudDeploy[
With[{data =
AssociationMap[
Downsample[SSCData[#], 10] &, {"Throt1r", "NozLr", "RPMLr",
"RPMRr", "Dist", "D1r", "D2r", "TGTLr"}]},
Manipulate[
Grid[List /@ {
Grid[{{
VerticalGauge[data[["Throt1r", t]], {-2000, 2000},
GaugeLabels -> "Throttle position",
GaugeMarkers -> "ScaleRange"],
VerticalGauge[{data[["D1r", t]], data[["D2r", t]]}, {1000,
2000}, GaugeLabels -> "Displacements"],
ThermometerGauge[data[["TGTLr", t]] + 1600, {0, 1300},
GaugeLabels -> Placed[ "Turbine temperature", {0.5, 0}]]}},
ItemSize -> All],
Grid[{{
AngularGauge[-data[["RPMLr", t]], {0, 2000},
GaugeLabels -> "RPM L", ScaleRanges -> {1800, 2000}],
AngularGauge[-data[["RPMRr", t]], {0, 2000},
GaugeLabels -> "RPM R", ScaleRanges -> {1800, 2000}]
}}, ItemSize -> All],
ListPlot[{{-data[["Dist", t]], 2}}, PlotMarkers -> Magnify["?", 0.4], PlotRange -> {{0, 1500}, {0, 10}}, Axes -> {True, False},
AspectRatio -> 1/5, ImageSize -> 500]}],
{{t, 1, "time"}, 1, Length[data[[1]]], 1}]],
"SSCDashboard", Permissions -> "Public"]
In this little meander through the data, I have made use of graph theory, calculus, signal processing and wavelet analysis, as well as some classical statistics. You don’t need to know too much about the details, as long as you know the scope of tools available and the concepts that are being applied. Automation takes care of many of the details and helps to deploy the data in an accessible way. That’s multiparadigm data science in a nutshell.
Download this post as a Wolfram Notebook.

In my previous post, I demonstrated the first step of a multiparadigm data science workflow: extracting data. Now it's time to take a closer look at how the Wolfram Language can help make sense of that data by cleaning it, sorting it and structuring it for your workflow. I'll discuss key Wolfram Language functions for making imported data easier to browse, query and compute with, as well as share some strategies for automating the process of importing and structuring data. Throughout this post, I'll refer to the US Election Atlas website, which contains tables of US presidential election results for given years:
Keys and Values: Making an Association
As always, the first step is to get data from the webpage. All tables are extracted from the page using Import (with the "Data" element):
data=Import["https://uselectionatlas.org/RESULTS/data.php?per=1&vot=1&pop=1&reg=1&datatype=national&year=2016","Data"];
Next is to locate the list of column headings. FirstPosition indicates the location of the first column label, and Most takes the last element off to represent the location of the list containing that entry (i.e. going up one level in the list):
Most@FirstPosition[data,"Map"]
Previously, we typed these indices in manually; however, using a programmatic approach can make your code more general and reusable. Sequence converts a list into a flat expression that can be used as a Part specification:
keysIndex=Sequence@@Most@FirstPosition[data,"Map"];
data[[keysIndex]]
Examining the entries in the first row of data, it looks like the first two columns (Map and Pie, both containing images) were excluded during import:
data[[Sequence@@Most@FirstPosition[data,"Alabama"]]]
This means that the first two column headings should also be omitted when structuring this data; we want the third element and everything thereafter (represented by the ;; operator) from the sublist given by keysIndex:
keyList=data[[keysIndex,3;;]]
You can use the same process to extract the rows of data (represented as a list of lists). The first occurrence of “Alabama” is an element of the inner sublist, so going up two levels (i.e. excluding the last two elements) will give the full list of entries:
valuesIndex=Sequence@@FirstPosition[data,"Alabama"][[;;-3]];
valueRows=data[[valuesIndex]]
For handling large datasets, the Wolfram Language offers Association (represented by <|…|>), a key-value construct similar to a hash table or a dictionary with substantially faster lookups than List:
<|"State"->valueRows[[1,1]]|>
You can reference elements of an Association by key (usually a String) rather than numerical index, as well as use a single-bracket syntax for Part, making data exploration easier and more readable:
%["State"]
Given a list of keys and a list of values, you can use AssociationThread to create an Association:
entry=AssociationThread[keyList,First@valueRows]
Note that this entry is shorter than the original list of keys:
Length/@{keyList,entry}
When AssociationThread encounters a duplicate key, it keeps only the value that occurs last in the list. Here (as is often the case), the dropped information is extraneous—the entry keeps absolute vote counts and omits vote percentages.
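A minimal sketch of this duplicate-key behavior, using hypothetical keys rather than the election data:

```wolfram
(* "a" appears twice; only its last value (3) survives *)
AssociationThread[{"a", "b", "a"}, {1, 2, 3}]
(* <|"a" -> 3, "b" -> 2|> *)
```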
Part one of this series showed the basic use of Interpreter for parsing data types. When used with the | (Alternatives) operator, Interpreter attempts to parse items using each argument in the order given, returning the first successful test. This makes it easy to interpret multiple data types at once. For faster parsing, it’s usually best to list basic data types like Integer before higher-level Entity types such as "USState":
Interpreter[Integer|"USState"]/@entry
Most computations apply directly to the values in an Association and return standard output. Suppose you wanted the proportion of registered voters who actually cast ballots:
%["Total Vote"]/%["Total REG"]//N
You can use Map to generate a full list of entries from the rows of values:
electionlist=Map[Interpreter[Integer|"USState"]/@AssociationThread[keyList,#]&,valueRows]
Viewing and Analyzing with Dataset
Now the data is in a consistent structure for computation—but it isn’t exactly easy on the eyes. For improved viewing, you can convert this list directly to a Dataset:
dataset=Dataset[electionlist]
Dataset is a database-like structure with many of the same advantages as Association, plus the added benefits of interactive viewing and flexible querying operations. Like Association, Dataset allows referencing of elements by key, making it easy to pick out only the columns pertinent to your analysis:
mydata = dataset[
All, {"State", "Trump", "Clinton", "Johnson", "Other"}]
From here, there are a number of ways to rearrange, aggregate and transform data. Functions like Total and Mean automatically thread across columns:
Total@mydata[All,2;;]
You can use functions like Select and Map in a query-like fashion, effectively allowing the Part syntax to work with pure functions. Here are the rows with more than 100,000 "Other" votes:
mydata[Select[#["Other"]>100000&]]
Dataset also provides other specialized forms for working with specific columns and rows—such as finding the Mean number of "Other" votes per state in the election:
mydata[Mean,"Other"]//N
Normal retrieves the data in its lower-level format to prepare it for computation. This associates each state entity with the corresponding vote margin:
margins=Normal@mydata[All,#["State"]->(#["Trump"]-#["Clinton"])&]
You can pass this result directly into GeoRegionValuePlot for easy visualization:
GeoRegionValuePlot[margins,ColorFunction->(Which[#<0.5,RGBColor[0,0,1-#],#>0.5,RGBColor[#,0,0]]&)]
This also makes it easy to view the vote breakdown in a given state:
Multicolumn[PieChart[#,ChartLabels->Keys[#],PlotLabel->#["State"]]&/@RandomChoice[Normal@mydata,6]]
Generalizing and Optimizing Your Code
It’s rare that you’ll get all the data you need from a single webpage, so it’s worth using a bit of computational thinking to write code that works across multiple pages. Ideally, you should be able to apply what you’ve already written with little alteration.
Suppose you wanted to pull election data from different years from the US Election Atlas website, creating a Dataset similar to the one already shown. A quick examination of the URL shows that the page uses a query parameter to determine what year’s election results are displayed (note the year at the end):
You can use this parameter, along with the scraping procedure outlined previously, to create a function that will retrieve election data for any presidential election year. Module localizes variable names to avoid conflicts; in this implementation, candidatesIndex explicitly selects the last few columns in the table (absolute vote counts per candidate). Entity and similar high-level expressions can take a long time to process (and aren’t always needed), so it’s convenient to add the Optional parameter stateparser to interpret states differently (e.g. using String):
ElectionAtlasData[year_,stateparser_:"USState"]:=Module[{data=Import["https://uselectionatlas.org/RESULTS/data.php?datatype=national&def=1&year="<>ToString[year],"Data"],
keyList,valueRows,candidatesIndex},
keyList=data[[Sequence@@Append[Most@#,Last@#;;]]]&@FirstPosition[data,"State"];
valueRows=data[[Sequence@@FirstPosition[data,"Alabama"|"California"][[;;-3]]]];
candidatesIndex=Join[{1},Range[First@FirstPosition[keyList,"Other"]-Length[keyList],-1]];
Map[
Interpreter[Integer|stateparser],Dataset[AssociationThread[keyList[[candidatesIndex]],#]&/@valueRows[[All,candidatesIndex]]],{2}]
]
A few quick computations show that this function is quite robust for its purpose; it successfully imports election data for every year the atlas has on record (dating back to 1824). Here’s a plot of how many votes the most popular candidate got nationally each year:
ListPlot[Max@Total@ElectionAtlasData[#,String][All,2;;]&/@Range[1824,2016,4]]
Using Table with Multicolumn works well for displaying and comparing stats across different datasets. With localizes names as Module does, but it doesn't allow their definitions to be altered (i.e. it creates constants instead of variables). Here are the vote tallies for Iowa over a twenty-year period:
Multicolumn[
Table[
With[{data=Normal@ElectionAtlasData[year,String][SelectFirst[#["State"]=="Iowa"&]]},
PieChart[data,ChartLabels->Keys[data],PlotLabel->year]],
{year,1992,2012,4}],
3,Appearance->"Horizontal"]
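The distinction between With and Module can be seen in a small standalone sketch (not from the original post):

```wolfram
Module[{x = 1}, x = 2; x]   (* x is a true local variable, so it can be reassigned: returns 2 *)
With[{x = 1}, x = 2]        (* x is literally replaced by 1, so this attempts 1 = 2 and raises a Set error *)
```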
Here is the breakdown of the national popular vote over the same period:
Multicolumn[
Table[With[{data=ElectionAtlasData[year]},
GeoRegionValuePlot[Normal[data[All,#["State"]->(#[[3]]-#[[2]])&]],
ColorFunction->(Which[#<0.5,RGBColor[0,0,1-#],#>0.5,RGBColor[#,0,0]]&),
PlotLegends->(SwatchLegend[{Blue,Red},Normal@Keys@data[[1,{2,3}]]]),
PlotLabel->Style[year,"Text"]]],
{year,1992,2012,4}],
2,Appearance->"Horizontal"]
Sharing and Publishing
Now that you have seen some of the Wolfram Language’s automated data structuring capabilities, you can start putting together real, in-depth data explorations. The functions and strategies described here are scalable to any size and will work for data of any type—including people, locations, dates and other real-world concepts supported by the Entity framework.
In the upcoming third and final installment of this series, I’ll talk about ways to deploy and publish the data you’ve collected—as well as any analysis you’ve done—making it accessible to friends, colleagues or the general public.
For more detail on the functions you read about here, see the Extract Columns in a Dataset and Select Elements in a Dataset workflows.
Download this post as a Wolfram Notebook.

Teachers, professors, parents-as-teachers—to ease the transition into the fall semester, we’ve compiled some of our favorite Wolfram resources for educators! We appreciate everything you do, and we hope you find this cornucopia of computation useful.

It’s no secret that we’re fans of technology in the classroom, and that extends past STEM fields. Computational thinking is relevant across the whole curriculum—English, history, music, art, social sciences and even sports—with powerful ways to explore the topics at hand through accessible technology. Tech-Based Teaching walks you through computational lesson planning and enthusiastic coding events. You’ll also find information about teaching online STEM courses, as well as other examples of timely curated content.

From simply exploring general concepts to researching specifics, from step-by-step solutions for math problems to creating homework worksheets, Wolfram|Alpha is the perfect entry point for an educator using technology in the classroom. Keep your students engaged with the award-winning computational knowledge engine and its vast store of curated information, and make sure to check out Wolfram|Alpha Pro for a new level of computational excellence (and see our current promotions)!

Ask for a random problem, get a random problem! With Wolfram Problem Generator, you or your students can choose a subject and receive unlimited random practice problems. This is useful for test prep or working on areas your students haven’t mastered yet.

You might still be wondering how computation could apply to fields like fine arts, social sciences or sports. These fields are where the Wolfram Demonstrations Project can help. An open-code resource to illustrate concepts in otherwise technologically neglected fields, the Wolfram Demonstrations Project offers interactive illustrations as a resource for visually exploring ideas through its universal electronic publishing platform. You don’t even have to have Mathematica to use Demonstrations—no plugins required.

Your students might be the kind of people who like fun ways of practicing their computational skills (but let's face it, who doesn't?), which is where Wolfram Challenges come in. Wolfram Challenges are a continually expanding collection of coding games and exercises designed to give users at almost any level of experience a rigorous computational workout in the Wolfram Language.

Stephen Wolfram’s An Elementary Introduction to the Wolfram Language teaches those with no programming experience how to work with the Wolfram Language. It’s available in print and for free online, with interactive exercises to check your answers immediately using the Wolfram Cloud. Or sign up for the free, fully interactive online course at Wolfram U, which combines all the book’s content and exercises with easy-to-follow video tutorials.

If you’re looking for open courses to expand your own knowledge or you’d like to recommend courses to your students in high school, college and beyond, Wolfram U should be the first place you check. Wolfram U hosts streamed webinar series, special events (both upcoming and archived) and video courses—all taught by experts in multiple fields.

Join Wolfram Research’s back-to-school special event on September 12, 2018, to learn how to enhance your academic content with instantly computable real-world data using Wolfram|Alpha. Sign up now and get access to recordings from earlier sessions in this webinar series covering interactive notebooks, computational essays, and collaborating and sharing in the cloud. Visit Wolfram U to learn about other upcoming events, webinars and courses.

As the technology manager for Assured Flow Solutions, Andrew Yule has long relied on the Wolfram Language as his go-to tool for petroleum production analytics, from quick computations to large-scale modeling and analysis. “I haven’t come across something yet that the Wolfram Language hasn’t been able to help me do,” he says. So when Yule set out to consolidate all of his team’s algorithms and data into one system, the Wolfram Language seemed like the obvious choice.

In this video, Yule describes how the power and flexibility of the Wolfram Language were essential in creating Alex, a centralized hub for accessing and maintaining his team’s computational knowledge:

Collecting Intellectual Property

Consultants at Assured Flow Solutions use a variety of computations for analyzing oil and gas production issues involving both pipeline simulations and real-world lab testing. Yule’s first challenge was to put all these methods and techniques into a consistent framework—essentially trying to answer the question “How do you collect and manage all this intellectual property?”

Prior to Alex, consultants had been pulling from dozens of Excel spreadsheets scattered across network drives, often with multiple versions, which made it difficult to find the right tool for a particular task. Yule started by systematically replacing these with faster, more robust Wolfram Language computations. He then consulted with subject experts in different areas, capturing their knowledge as symbolic code to make it usable by other employees.

Yule deployed the toolkit as a cloud-accessible package secured using the Wolfram Language’s built-in encoding functionality. Named after the ancient Library of Alexandria, Alex quickly became the canonical source for the company’s algorithms and data.

Connecting the Interface

Utilizing the flexible interface features of the Wolfram Language, Yule then built a front end for Alex. On the left is a pane that uses high-level pattern matching to search and navigate the available tools. Selected modules are loaded in the main window, including interactive controls for precise adjustment of algorithms and parameters:

Yule included additional utilities for copying and exporting data, loading and saving settings, and reporting bugs, taking advantage of the Wolfram Language’s file- and email-handling abilities. The interface itself is deployed as a standalone Wolfram Notebook using the EnterpriseCDF standard, which provides access to all the company’s intellectual property without requiring a local Wolfram Language installation.

Flexible Workflows, Consistent Results

This centralization of tools has completely changed the way Assured Flow Solutions views data analytics and visualizations. In addition to providing quick, easy access to the company’s codebase, Alex has greatly improved the speed, accuracy and consistency of results. And using the Wolfram Language’s symbolic framework adds the flexibility to work with any kind of input. “It doesn’t matter if you’re loading in raw data, images, anything—it all has the same feel to it. Everything’s an expression in the Wolfram Language,” says Yule.

With the broad deployment options of the Wolfram Cloud, consultants can easily share notebooks and results for internal collaboration. They have also begun deploying instant APIs, allowing client applications to utilize Wolfram Language computations without exposing source code.

Overall, Yule prefers the Wolfram Language to other systems because of its versatility—or, as he puts it, “the ability to write one line of code that will accomplish ten things at once.” Its unmatched collection of built-in algorithms and connections makes it “a really powerful alternative to things like Excel.” Combining this with the secure hosting and deployment of the Wolfram Cloud, Wolfram technology provides the ideal environment for an enterprise-wide computation hub like Alex.

Find out more about Andrew Yule and other exciting Wolfram Language applications on our Customer Stories pages.