Tag Archives: analysis

Big Data Knows What You Do and When

Data scientists are getting to know more about you and your fellow urban dwellers as you move around your neighborhood and your city. As smartphones and cell towers become more ubiquitous, and as data collection and analysis gather pace, researchers (and advertisers) will come to know your daily habits and schedule rather intimately. So, questions from a significant other along the lines of, “and, where were you at 11:15 last night?” may soon be consigned to history.

From Technology Review:

Mobile phones have generated enormous insight into the human condition thanks largely to the study of the data they produce. Mobile phone companies record the time of each call, the caller and receiver ids, as well as the locations of the cell towers involved, among other things.

The combined data from millions of people produces some fascinating new insights in the nature of our society.

Anthropologists have crunched it to reveal human reproductive strategies, a universal law of commuting, and even the distribution of wealth in Africa.

Today, computer scientists have gone one step further by using mobile phone data to map the structure of cities and how people use them throughout the day. “These results point towards the possibility of a new, quantitative classification of cities using high resolution spatio-temporal data,” say Thomas Louail at the Institut de Physique Théorique in Paris and a few pals.

They say their work is part of a new science of cities that aims to objectively measure and understand the nature of large population centers.

These guys begin with a database of mobile phone calls made by people in the 31 Spanish cities that have populations larger than 200,000. The data consists of the number of unique individuals using a given cell tower (whether making a call or not) for each hour of the day over almost two months.

Given the area that each tower covers, Louail and co work out the density of individuals in each location and how it varies throughout the day. And using this pattern, they search for “hotspots” in the cities where the density of individuals passes some specially chosen threshold at certain times of the day.
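
A minimal sketch of what such a hotspot search might look like, assuming hourly counts of unique users per tower and each tower's coverage area are already in hand (the column names and the quantile-based threshold below are illustrative assumptions, not the authors' actual method):

```python
# Illustrative sketch only -- not the paper's actual pipeline.
# Assumes a table of hourly unique-user counts per cell tower plus each
# tower's coverage area in km^2 (column names are hypothetical).
import pandas as pd

def find_hotspots(counts: pd.DataFrame, quantile: float = 0.9) -> pd.DataFrame:
    """Return tower/hour cells whose user density exceeds a chosen threshold.

    counts: columns ['tower_id', 'hour', 'unique_users', 'area_km2']
    """
    df = counts.copy()
    df["density"] = df["unique_users"] / df["area_km2"]
    # One threshold per hour of the day, so 9 a.m. and 6 p.m. are each judged
    # against their own distribution rather than a single global cutoff.
    thresholds = df.groupby("hour")["density"].quantile(quantile)
    df["is_hotspot"] = df["density"] > df["hour"].map(thresholds)
    return df[df["is_hotspot"]]
```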

The results reveal some fascinating patterns in city structure. For a start, every city undergoes a kind of respiration in which people converge into the center and then withdraw on a daily basis, almost like breathing. And this happens in all cities. This “suggests the existence of a single ‘urban rhythm’ common to all cities,” say Louail and co.

During the week, the number of phone users peaks at about midday and then again at about 6 p.m. During the weekend the numbers peak a little later: at 1 p.m. and 8 p.m. Interestingly, the second peak starts about an hour later in western cities, such as Sevilla and Cordoba.

The data also reveals that small cities tend to have a single center that becomes busy during the day, such as the cities of Salamanca and Vitoria.

But it also shows that the number of hotspots increases with city size; so-called polycentric cities include Spain’s largest, such as Madrid, Barcelona, and Bilbao.

That could turn out to be useful for automatically classifying cities.

Read the entire article here.

Apocalypse Now or Later?

Americans love their apocalypses. So, should demise come at the hands of a natural catastrophe, hastened by human (in)action, or should it come courtesy of an engineered biological or nuclear disaster? You choose. Isn’t this so much fun, thinking about absolute extinction?

Ira Chernus, Professor of Religious Studies at the University of Colorado at Boulder, brings us a much-needed scholarly account of our love affair with all things apocalyptic. But our fascination with Armageddon — often driven by hope — does nothing to resolve the ultimate conundrum: regardless of the type of ending, it is unlikely that Bruce Willis will be featured.

From TomDispatch / Salon:

Wherever we Americans look, the threat of apocalypse stares back at us.

Two clouds of genuine doom still darken our world: nuclear extermination and environmental extinction. If they got the urgent action they deserve, they would be at the top of our political priority list.

But they have a hard time holding our attention, crowded out as they are by a host of new perils also labeled “apocalyptic”: mounting federal debt, the government’s plan to take away our guns, corporate control of the Internet, the Comcast-Time Warner mergerocalypse, Beijing’s pollution airpocalypse, the American snowpocalypse, not to speak of earthquakes and plagues. The list of topics, thrown at us with abandon from the political right, left, and center, just keeps growing.

Then there’s the world of arts and entertainment where selling the apocalypse turns out to be a rewarding enterprise. Check out the website “Romantically Apocalyptic,” Slash’s album “Apocalyptic Love,” or the history-lite documentary “Viking Apocalypse” for starters. These days, mathematicians even have an “apocalyptic number.”

Yes, the A-word is now everywhere, and most of the time it no longer means “the end of everything,” but “the end of anything.” Living a life so saturated with apocalypses undoubtedly takes a toll, though it’s a subject we seldom talk about.

So let’s lift the lid off the A-word, take a peek inside, and examine how it affects our everyday lives. Since it’s not exactly a pretty sight, it’s easy enough to forget that the idea of the apocalypse has been a container for hope as well as fear. Maybe even now we’ll find some hope inside if we look hard enough.

A Brief History of Apocalypse

Apocalyptic stories have been around at least since biblical times, if not earlier. They show up in many religions, always with the same basic plot: the end is at hand; the cosmic struggle between good and evil (or God and the Devil, as the New Testament has it) is about to culminate in catastrophic chaos, mass extermination, and the end of the world as we know it.

That, however, is only Act I, wherein we wipe out the past and leave a blank cosmic slate in preparation for Act II: a new, infinitely better, perhaps even perfect world that will arise from the ashes of our present one. It’s often forgotten that religious apocalypses, for all their scenes of destruction, are ultimately stories of hope; and indeed, they have brought it to millions who had to believe in a better world a-comin’, because they could see nothing hopeful in this world of pain and sorrow.

That traditional religious kind of apocalypse has also been part and parcel of American political life since, in Common Sense, Tom Paine urged the colonies to revolt by promising, “We have it in our power to begin the world over again.”

When World War II — itself now sometimes called an apocalypse — ushered in the nuclear age, it brought a radical transformation to the idea. Just as novelist Kurt Vonnegut lamented that the threat of nuclear war had robbed us of “plain old death” (each of us dying individually, mourned by those who survived us), the theologically educated lamented the fate of religion’s plain old apocalypse.

After this country’s “victory weapon” obliterated two Japanese cities in August 1945, most Americans sighed with relief that World War II was finally over. Few, however, believed that a permanently better world would arise from the radioactive ashes of that war. In the 1950s, even as the good times rolled economically, America’s nuclear fear created something historically new and ominous — a thoroughly secular image of the apocalypse. That’s the one you’ll get first if you type “define apocalypse” into Google’s search engine: “the complete final destruction of the world.” In other words, one big “whoosh” and then… nothing. Total annihilation. The End.

Apocalypse as utter extinction was a new idea. Surprisingly soon, though, most Americans were (to adapt the famous phrase of filmmaker Stanley Kubrick) learning how to stop worrying and get used to the threat of “the big whoosh.” With the end of the Cold War, concern over a world-ending global nuclear exchange essentially evaporated, even if the nuclear arsenals of that era were left ominously in place.

Meanwhile, another kind of apocalypse was gradually arising: environmental destruction so complete that it, too, would spell the end of all life.

This would prove to be brand new in a different way. It is, as Todd Gitlin has so aptly termed it, history’s first “slow-motion apocalypse.” Climate change, as it came to be called, had been creeping up on us “in fits and starts,” largely unnoticed, for two centuries. Since it was so different from what Gitlin calls “suddenly surging Genesis-style flood” or the familiar “attack out of the blue,” it presented a baffling challenge. After all, the word apocalypse had been around for a couple of thousand years or more without ever being associated in any meaningful way with the word gradual.
The eminent historian of religions Mircea Eliade once speculated that people could grasp nuclear apocalypse because it resembled Act I in humanity’s huge stock of apocalypse myths, where the end comes in a blinding instant — even if Act II wasn’t going to follow. This mythic heritage, he suggested, remains lodged in everyone’s unconscious, and so feels familiar.

But in a half-century of studying the world’s myths, past and present, he had never found a single one that depicted the end of the world coming slowly. This means we have no unconscious imaginings to pair it with, nor any cultural tropes or traditions that would help us in our struggle to grasp it.

That makes it so much harder for most of us even to imagine an environmentally caused end to life. The very category of “apocalypse” doesn’t seem to apply. Without those apocalyptic images and fears to motivate us, a sense of the urgent action needed to avert such a slowly emerging global catastrophe lessens.

All of that (plus of course the power of the interests arrayed against regulating the fossil fuel industry) might be reason enough to explain the widespread passivity that puts the environmental peril so far down on the American political agenda. But as Dr. Seuss would have said, that is not all! Oh no, that is not all.

Apocalypses Everywhere

When you do that Google search on apocalypse, you’ll also get the most fashionable current meaning of the word: “Any event involving destruction on an awesome scale; [for example] ‘a stock market apocalypse.’” Welcome to the age of apocalypses everywhere.

With so many constantly crying apocalyptic wolf or selling apocalyptic thrills, it’s much harder now to distinguish between genuine threats of extinction and the cheap imitations. The urgency, indeed the very meaning, of apocalypse continues to be watered down in such a way that the word stands in danger of becoming virtually meaningless. As a result, we find ourselves living in an era that constantly reflects premonitions of doom, yet teaches us to look away from the genuine threats of world-ending catastrophe.

Oh, America still worries about the Bomb — but only when it’s in the hands of some “bad” nation. Once that meant Iraq (even if that country, under Saddam Hussein, never had a bomb and in 2003, when the Bush administration invaded, didn’t even have a bomb program). Now, it means Iran — another country without a bomb or any known plan to build one, but with the apocalyptic stare focused on it as if it already had an arsenal of such weapons — and North Korea.

These days, in fact, it’s easy enough to pin the label “apocalyptic peril” on just about any country one loathes, even while ignoring friends, allies, and oneself. We’re used to new apocalyptic threats emerging at a moment’s notice, with little (or no) scrutiny of whether the A-word really applies.

What’s more, the Cold War era fixed a simple equation in American public discourse: bad nation + nuclear weapon = our total destruction. So it’s easy to buy the platitude that Iran must never get a nuclear weapon or it’s curtains. That leaves little pressure on top policymakers and pundits to explain exactly how a few nuclear weapons held by Iran could actually harm Americans.

Meanwhile, there’s little attention paid to the world’s largest nuclear arsenal, right here in the U.S. Indeed, America’s nukes are quite literally impossible to see, hidden as they are underground, under the seas, and under the wraps of “top secret” restrictions. Who’s going to worry about what can’t be seen when so many dangers termed “apocalyptic” seem to be in plain sight?

Environmental perils are among them: melting glaciers and open-water Arctic seas, smog-blinded Chinese cities, increasingly powerful storms, and prolonged droughts. Yet most of the time such perils seem far away and like someone else’s troubles. Even when dangers in nature come close, they generally don’t fit the images in our apocalyptic imagination. Not surprisingly, then, voices proclaiming the inconvenient truth of a slowly emerging apocalypse get lost in the cacophony of apocalypses everywhere. Just one more set of boys crying wolf and so remarkably easy to deny or stir up doubt about.

Death in Life

Why does American culture use the A-word so promiscuously? Perhaps we’ve been living so long under a cloud of doom that every danger now readily takes on the same lethal hue.

Psychiatrist Robert Lifton predicted such a state years ago when he suggested that the nuclear age had put us all in the grips of what he called “psychic numbing” or “death in life.” We can no longer assume that we’ll die Vonnegut’s plain old death and be remembered as part of an endless chain of life. Lifton’s research showed that the link between death and life had become, as he put it, a “broken connection.”

As a result, he speculated, our minds stop trying to find the vitalizing images necessary for any healthy life. Every effort to form new mental images only conjures up more fear that the chain of life itself is coming to a dead end. Ultimately, we are left with nothing but “apathy, withdrawal, depression, despair.”

If that’s the deepest psychic lens through which we see the world, however unconsciously, it’s easy to understand why anything and everything can look like more evidence that The End is at hand. No wonder we have a generation of American youth and young adults who take a world filled with apocalyptic images for granted.

Think of it as, in some grim way, a testament to human resiliency. They are learning how to live with the only reality they’ve ever known (and with all the irony we’re capable of, others are learning how to sell them cultural products based on that reality). Naturally, they assume it’s the only reality possible. It’s no surprise that “The Walking Dead,” a zombie apocalypse series, is their favorite TV show, since it reveals (and revels in?) what one TV critic called the “secret life of the post-apocalyptic American teenager.”

Perhaps the only thing that should genuinely surprise us is how many of those young people still manage to break through psychic numbing in search of some way to make a difference in the world.

Yet even in the political process for change, apocalypses are everywhere. Regardless of the issue, the message is typically some version of “Stop this catastrophe now or we’re doomed!” (An example: Stop the Keystone XL pipeline or it’s “game over”!) A better future is often implied between the lines, but seldom gets much attention because it’s ever harder to imagine such a future, no less believe in it.

No matter how righteous the cause, however, such a single-minded focus on danger and doom subtly reinforces the message of our era of apocalypses everywhere: abandon all hope, ye who live here and now.

Read the entire article here.

Image: Armageddon movie poster. Courtesy of Touchstone Pictures.

Online Social Networks as Infectious Diseases

A new research study applies the concepts of infectious diseases to online social networks. By applying epidemiological modelling to examine the dynamics of networks, such as MySpace and Facebook, researchers are able to analyze the explosive growth — the term “viral” is not coincidental — and ultimate demise of such networks. So, is Facebook destined to suffer a fate similar to Myspace, Bebo, polio and the bubonic plague? These researchers from Princeton think so, estimating Facebook will lose 80 percent of its 1.2 billion users by 2017.

From the Guardian:

Facebook has spread like an infectious disease but we are slowly becoming immune to its attractions, and the platform will be largely abandoned by 2017, say researchers at Princeton University (pdf).

The forecast of Facebook’s impending doom was made by comparing the growth curve of epidemics to those of online social networks. Scientists argue that, like bubonic plague, Facebook will eventually die out.

The social network, which celebrates its 10th birthday on 4 February, has survived longer than rivals such as Myspace and Bebo, but the Princeton forecast says it will lose 80% of its peak user base within the next three years.

John Cannarella and Joshua Spechler, from the US university’s mechanical and aerospace engineering department, have based their prediction on the number of times Facebook is typed into Google as a search term. The charts produced by the Google Trends service show Facebook searches peaked in December 2012 and have since begun to trail off.

“Ideas, like diseases, have been shown to spread infectiously between people before eventually dying out, and have been successfully described with epidemiological models,” the authors claim in a paper entitled Epidemiological modelling of online social network dynamics.

“Ideas are spread through communicative contact between different people who share ideas with each other. Idea manifesters ultimately lose interest with the idea and no longer manifest the idea, which can be thought of as the gain of ‘immunity’ to the idea.”

Facebook reported nearly 1.2 billion monthly active users in October, and is due to update investors on its traffic numbers at the end of the month. While desktop traffic to its websites has indeed been falling, this is at least in part due to the fact that many people now only access the network via their mobile phones.

For their study, Cannarella and Spechler used what is known as the SIR (susceptible, infected, recovered) model of disease, which creates equations to map the spread and recovery of epidemics.
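
As a rough illustration of what fitting an epidemic curve to a search-interest series involves, here is a toy sketch using the textbook SIR equations; the paper itself uses a modified form of the model, and the data below are synthetic stand-ins rather than real Google Trends output:

```python
# Toy sketch of fitting an SIR-style curve to a search-interest series.
# The Princeton paper uses a modified form of the model; this is the
# textbook SIR version, shown only to illustrate the general approach.
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import curve_fit

def sir(y, t, beta, gamma):
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

def infected_curve(t, beta, gamma, i0):
    y0 = [1.0 - i0, i0, 0.0]            # population fractions: S, I, R
    return odeint(sir, y0, t, args=(beta, gamma))[:, 1]

weeks = np.arange(0, 120, dtype=float)
search_interest = infected_curve(weeks, 0.25, 0.05, 0.001)   # stand-in series

(beta, gamma, i0), _ = curve_fit(infected_curve, weeks, search_interest,
                                 p0=[0.2, 0.05, 0.01], maxfev=5000)
print(f"fitted beta={beta:.3f}, gamma={gamma:.3f}")
```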

They tested various equations against the lifespan of Myspace, before applying them to Facebook. Myspace was founded in 2003 and reached its peak in 2007 with 300 million registered users, before falling out of use by 2011. Purchased by Rupert Murdoch’s News Corp for $580m, Myspace signed a $900m deal with Google in 2006 to sell its advertising space and was at one point valued at $12bn. It was eventually sold by News Corp for just $35m.

The 870 million people using Facebook via their smartphones each month could explain the drop in Google searches – those looking to log on are no longer doing so by typing the word Facebook into Google.

But Facebook’s chief financial officer David Ebersman admitted on an earnings call with analysts that during the previous three months: “We did see a decrease in daily users, specifically among younger teens.”

Investors do not appear to be heading for the exit just yet. Facebook’s share price reached record highs this month, valuing founder Mark Zuckerberg’s company at $142bn.

Read the entire article here.

Image: Scanning electron microscope image of Yersinia pestis, the bacterium responsible for bubonic plague. Courtesy of Wikipedia.

Meta-Research: Discoveries From Research on Discoveries

Discoveries through scientific research don’t just happen in the lab. Many of course do. Some discoveries now come through data analysis of research papers. Here, sophisticated data mining tools and semantic software sift through hundreds of thousands of research papers looking for patterns and links that would otherwise escape the eye of human researchers.

From Technology Review:

Software that read tens of thousands of research papers and then predicted new discoveries about the workings of a protein that’s key to cancer could herald a faster approach to developing new drugs.

The software, developed in a collaboration between IBM and Baylor College of Medicine, was set loose on more than 60,000 research papers that focused on p53, a protein involved in cell growth, which is implicated in most cancers. By parsing sentences in the documents, the software could build an understanding of what is known about enzymes called kinases that act on p53 and regulate its behavior; these enzymes are common targets for cancer treatments. It then generated a list of other proteins mentioned in the literature that were probably undiscovered kinases, based on what it knew about those already identified. Most of its predictions tested so far have turned out to be correct.
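
The article doesn't describe the system's internals, but the general flavor, ranking candidate proteins by how closely the text surrounding them resembles the text surrounding known kinases, can be sketched in a few lines. Everything here, from the corpus format to the TF-IDF similarity measure, is an assumption made for illustration:

```python
# Illustration only: the IBM/Baylor system is far more sophisticated.
# Ranks candidate proteins by how much the sentences mentioning them
# resemble sentences about already-known p53 kinases.
import numpy as np
from collections import defaultdict
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(sentences, known_kinases, candidates):
    """sentences: iterable of (protein_name, sentence_text) pairs (hypothetical format)."""
    # Pool every sentence that mentions a protein into one pseudo-document.
    docs = defaultdict(list)
    for protein, text in sentences:
        docs[protein].append(text)
    names = list(docs)
    index = {name: i for i, name in enumerate(names)}

    vectors = TfidfVectorizer(stop_words="english").fit_transform(
        [" ".join(docs[name]) for name in names])

    # "What known kinases sound like" = the mean of their TF-IDF vectors.
    profile = np.asarray(
        vectors[[index[k] for k in known_kinases if k in index]].mean(axis=0))

    scores = {c: cosine_similarity(vectors[index[c]], profile)[0, 0]
              for c in candidates if c in index}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```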

“We have tested 10,” Olivier Lichtarge of Baylor said Tuesday. “Seven seem to be true kinases.” He presented preliminary results of his collaboration with IBM at a meeting on the topic of Cognitive Computing held at IBM’s Almaden research lab.

Lichtarge also described an earlier test of the software in which it was given access to research literature published prior to 2003 to see if it could predict p53 kinases that have been discovered since. The software found seven of the nine kinases discovered after 2003.

“P53 biology is central to all kinds of disease,” says Lichtarge, and so it seemed to be the perfect way to show that software-generated discoveries might speed up research that leads to new treatments. He believes the results so far show that to be true, although the kinase-hunting experiments are yet to be reviewed and published in a scientific journal, and more lab tests are still planned to confirm the findings so far. “Kinases are typically discovered at a rate of one per year,” says Lichtarge. “The rate of discovery can be vastly accelerated.”

Lichtarge said that although the software was configured to look only for kinases, it also seems capable of identifying previously unidentified phosphatases, which are enzymes that reverse the action of kinases. It can also identify other types of protein that may interact with p53.

The Baylor collaboration is intended to test a way of extending a set of tools that IBM researchers already offer to pharmaceutical companies. Under the banner of accelerated discovery, text-analyzing tools are used to mine publications, patents, and molecular databases. For example, a company in search of a new malaria drug might use IBM’s tools to find molecules with characteristics that are similar to existing treatments. Because software can search more widely, it might turn up molecules in overlooked publications or patents that no human would otherwise find.

“We started working with Baylor to adapt those capabilities, and extend it to show this process can be leveraged to discover new things about p53 biology,” says Ying Chen, a researcher at IBM Research Almaden.

It typically takes between $500 million and $1 billion to develop a new drug, and 90 percent of candidates that begin the journey don’t make it to market, says Chen. The cost of failed drugs is cited as one reason that some drugs command such high prices (see “A Tale of Two Drugs”).

Lawrence Hunter, director of the Center for Computational Pharmacology at the University of Colorado Denver, says that careful empirical confirmation is needed for claims that the software has made new discoveries. But he says that progress in this area is important, and that such tools are desperately needed.

The volume of research literature both old and new is now so large that even specialists can’t hope to read everything that might help them, says Hunter. Last year over one million new articles were added to the U.S. National Library of Medicine’s Medline database of biomedical research papers, which now contains 23 million items. Software can crunch through massive amounts of information and find vital clues in unexpected places. “Crucial bits of information are sometimes isolated facts that are only a minor point in an article but would be really important if you can find it,” he says.

Read the entire article here.

Big Data and Even Bigger Problems

First a definition. Big data: typically a collection of large and complex datasets that are too cumbersome to process and analyze using traditional computational approaches and database applications. Usually the big data moniker will be accompanied by an IT vendor’s pitch for a shiny new software (and possibly hardware) solution able to crunch through petabytes (one petabyte is a million gigabytes) of data and produce a visualizable result that mere mortals can decipher.

Many companies see big data and related solutions as a panacea for a range of business challenges: customer service, medical diagnostics, product development, shipping and logistics, climate change studies, genomic analysis and so on. A great example was the last U.S. election. Many political wonks — from both sides of the aisle — agreed that President Obama was significantly aided in winning re-election by big data. So, with that in mind, many are now looking at more important big data problems.

From Technology Review:

As chief scientist for President Obama’s reëlection effort, Rayid Ghani helped revolutionize the use of data in politics. During the final 18 months of the campaign, he joined a sprawling team of data and software experts who sifted, collated, and combined dozens of pieces of information on each registered U.S. voter to discover patterns that let them target fund-raising appeals and ads.

Now, with Obama again ensconced in the Oval Office, some veterans of the campaign’s data squad are applying lessons from the campaign to tackle social issues such as education and environmental stewardship. Edgeflip, a startup Ghani founded in January with two other campaign members, plans to turn the ad hoc data analysis tools developed for Obama for America into software that can make nonprofits more effective at raising money and recruiting volunteers.

Ghani isn’t the only one thinking along these lines. In Chicago, Ghani’s hometown and the site of Obama for America headquarters, some campaign members are helping the city make available records of utility usage and crime statistics so developers can build apps that attempt to improve life there. It’s all part of a bigger idea to engineer social systems by scanning the numerical exhaust from mundane activities for patterns that might bear on everything from traffic snarls to human trafficking. Among those pursuing such humanitarian goals are startups like DataKind as well as large companies like IBM, which is redrawing bus routes in Ivory Coast (see “African Bus Routes Redrawn Using Cell-Phone Data”), and Google, with its flu-tracking software (see “Sick Searchers Help Track Flu”).

Ghani, who is 35, has had a longstanding interest in social causes, like tutoring disadvantaged kids. But he developed his data-mining savvy during 10 years as director of analytics at Accenture, helping retail chains forecast sales, creating models of consumer behavior, and writing papers with titles like “Data Mining for Business Applications.”

Before joining the Obama campaign in July 2011, Ghani wasn’t even sure his expertise in machine learning and predicting online prices could have an impact on a social cause. But the campaign’s success in applying such methods on the fly to sway voters is now recognized as having been potentially decisive in the election’s outcome (see “A More Perfect Union”).

“I realized two things,” says Ghani. “It’s doable at the massive scale of the campaign, and that means it’s doable in the context of other problems.”

At Obama for America, Ghani helped build statistical models that assessed each voter along five axes: support for the president; susceptibility to being persuaded to support the president; willingness to donate money; willingness to volunteer; and likelihood of casting a vote. These models allowed the campaign to target door knocks, phone calls, TV spots, and online ads to where they were most likely to benefit Obama.
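
The article doesn't describe the models themselves, but the basic shape, one predictive score per axis built from whatever features are on file for each voter, might look roughly like the sketch below; the feature set, labels, and choice of logistic regression are assumptions for illustration, not a description of the actual system:

```python
# Rough sketch only: the actual Obama for America models are unpublished.
# One classifier per "axis", each producing a probability for every voter.
import pandas as pd
from sklearn.linear_model import LogisticRegression

AXES = ["supports", "persuadable", "will_donate", "will_volunteer", "will_vote"]

def score_voters(features: pd.DataFrame, labels: pd.DataFrame) -> pd.DataFrame:
    """features: one numeric row per voter (age, turnout history, etc. -- hypothetical).
    labels: observed 0/1 outcomes for the voters who were actually contacted,
    one column per axis; NaN where the outcome is unknown."""
    scores = pd.DataFrame(index=features.index)
    for axis in AXES:
        known = labels[axis].dropna()
        model = LogisticRegression(max_iter=1000)
        model.fit(features.loc[known.index], known.astype(int))
        scores[axis] = model.predict_proba(features)[:, 1]
    return scores

# A targeting rule might then pick, say, likely voters who look persuadable:
# to_canvass = scores[(scores.persuadable > 0.6) & (scores.will_vote > 0.5)]
```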

One of the most important ideas he developed, dubbed “targeted sharing,” now forms the basis of Edgeflip’s first product. It’s a Facebook app that prompts people to share information from a nonprofit, but only with those friends predicted to respond favorably. That’s a big change from the usual scattershot approach of posting pleas for money or help and hoping they’ll reach the right people.

Edgeflip’s app, like the one Ghani conceived for Obama, will ask people who share a post to provide access to their list of friends. This will pull in not only friends’ names but also personal details, like their age, that can feed models of who is most likely to help.

Say a hurricane strikes the southeastern United States and the Red Cross needs clean-up workers. The app would ask Facebook users to share the Red Cross message, but only with friends who live in the storm zone, are young and likely to do manual labor, and have previously shown interest in content shared by that user. But if the same person shared an appeal for donations instead, he or she would be prompted to pass it along to friends who are older, live farther away, and have donated money in the past.
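
A toy version of that filtering step might look like the following, assuming the app has already pulled each friend's profile fields; the field names, the storm-zone states, and the age cutoffs are all invented for illustration:

```python
# Toy illustration of "targeted sharing": choose only the friends predicted
# to respond to a particular appeal. All fields and thresholds are invented.
from dataclasses import dataclass

STORM_STATES = {"FL", "GA", "SC"}        # hypothetical storm zone

@dataclass
class Friend:
    name: str
    age: int
    state: str
    past_interactions: int               # engagement with this user's posts
    has_donated: bool

def cleanup_targets(friends):
    """Friends to ask for clean-up help: in the storm zone, young, engaged."""
    return [f for f in friends
            if f.state in STORM_STATES and f.age < 40 and f.past_interactions > 2]

def donation_targets(friends):
    """Friends to ask for money: farther away, older, with a giving history."""
    return [f for f in friends
            if f.state not in STORM_STATES and f.age >= 40 and f.has_donated]
```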

Michael Slaby, a senior technology official for Obama who hired Ghani for the 2012 election season, sees great promise in the targeted sharing technique. “It’s one of the most compelling innovations to come out of the campaign,” says Slaby. “It has the potential to make online activism much more efficient and effective.”

For instance, Ghani has been working with Fidel Vargas, CEO of the Hispanic Scholarship Fund, to increase that organization’s analytical savvy. Vargas thinks social data could predict which scholarship recipients are most likely to contribute to the fund after they graduate. “Then you’d be able to give away scholarships to qualified students who would have a higher probability of giving back,” he says. “Everyone would be much better off.”

Ghani sees a far bigger role for technology in the social sphere. He imagines online petitions that act like open-source software, getting passed around and improved. Social programs, too, could get constantly tested and improved. “I can imagine policies being designed a lot more collaboratively,” he says. “I don’t know if the politicians are ready to deal with it.” He also thinks there’s a huge amount of untapped information out there about childhood obesity, gang membership, and infant mortality, all ready for big data’s touch.

Read the entire article here.

Infographic courtesy of visua.ly. See the original here.

Politics Driven by Science

Imagine a nation, or even a world, where political decisions and policy are driven by science rather than emotion. Well, small experiments are underway, so this may not be as far off as many would believe, or even dare to hope.

From the New Scientist:

In your wildest dreams, could you imagine a government that builds its policies on carefully gathered scientific evidence? One that publishes the rationale behind its decisions, complete with data, analysis and supporting arguments? Well, dream no longer: that’s where the UK is heading.

It has been a long time coming, according to Chris Wormald, permanent secretary at the Department for Education. The civil service is not short of clever people, he points out, and there is no lack of desire to use evidence properly. More than 20 years of serving politicians has convinced him that they are as keen as anyone to create effective policies. “I’ve never met a minister who didn’t want to know what worked,” he says. What has changed now is that informed policy-making is at last becoming a practical possibility.

That is largely thanks to the abundance of accessible data and the ease with which new, relevant data can be created. This has supported a desire to move away from hunch-based politics.

Last week, for instance, Rebecca Endean, chief scientific advisor and director of analytical services at the Ministry of Justice, announced that the UK government is planning to open up its data for analysis by academics, accelerating the potential for use in policy planning.

At the same meeting, hosted by innovation-promoting charity NESTA, Wormald announced a plan to create teaching schools based on the model of teaching hospitals. In education, he said, the biggest single problem is a culture that often relies on anecdotal experience rather than systematically reported data from practitioners, as happens in medicine. “We want to move teacher training and research and practice much more onto the health model,” Wormald said.

Test, learn, adapt

In June last year the Cabinet Office published a paper called “Test, Learn, Adapt: Developing public policy with randomised controlled trials”. One of its authors, the doctor and campaigning health journalist Ben Goldacre, has also been working with the Department for Education to compile a comparison of education and health research practices, to be published in the BMJ.

In education, the evidence-based revolution has already begun. A charity called the Education Endowment Foundation is spending £1.4 million on a randomised controlled trial of reading programmes in 50 British schools.

There are reservations though. The Ministry of Justice is more circumspect about the role of such trials. Where it has carried out randomised controlled trials, they often failed to change policy, or even irked politicians with conclusions that were obvious. “It is not a panacea,” Endean says.

Power of prediction

The biggest need is perhaps foresight. Ministers often need instant answers, and sometimes the data are simply not available. Bang goes any hope of evidence-based policy.

“The timescales of policy-making and evidence-gathering don’t match,” says Paul Wiles, a criminologist at the University of Oxford and a former chief scientific adviser to the Home Office. Wiles believes that to get round this we need to predict the issues that the government is likely to face over the next decade. “We can probably come up with 90 per cent of them now,” he says.

Crucial to the process will be convincing the public about the value and use of data, so that everyone is on board. This is not going to be easy. When the government launched its Administrative Data Taskforce, which set out to look at data across all departments and open it up so that it could be used for evidence-based policy, it attracted minimal media interest.

The taskforce’s remit includes finding ways to increase trust in data security. Then there is the problem of whether different departments are legally allowed to exchange data. There are other practical issues: many departments format data in incompatible ways. “At the moment it’s incredibly difficult,” says Jonathan Breckon, manager of the Alliance for Useful Evidence, a collaboration between NESTA and the Economic and Social Research Council.

Read the entire article here.

Big Data Versus Talking Heads

With the election in the United States now decided, the dissection of the result is well underway. And, perhaps the biggest winner of all is the science of big data. Yes, mathematical analysis of vast quantities of demographic and polling data won out over the voodoo proclamations and gut-felt predictions of the punditocracy. Now, that’s a result truly worth celebrating.

From ReadWriteWeb:

Political pundits, mostly Republican, went into a frenzy when Nate Silver, a New York Times pollster and stats blogger, predicted that Barack Obama would win reelection.

But Silver was right and the pundits were wrong – and the impact of this goes way beyond politics.

Silver won because, um, science. As ReadWrite’s own Dan Rowinski noted, Silver’s methodology is all based on data. He “takes deep data sets and applies logical analytical methods” to them. It’s all just numbers.

Silver runs a blog called FiveThirtyEight, which is licensed by the Times. In 2008 he called the presidential election with incredible accuracy, getting 49 out of 50 states right. But this year he rolled a perfect score, 50 out of 50, even nailing the margins in many cases. His uncanny accuracy on this year’s election represents what Rowinski calls a victory of “logic over punditry.”

In fact it’s bigger than that. Bear in mind that before turning his attention to politics in 2007 and 2008, Silver was using computer models to make predictions about baseball. What does it mean when some punk kid baseball nerd can just wade into politics and start kicking butt on all these long-time “experts” who have spent their entire lives covering politics?

It means something big is happening.

Man Versus Machine

This is about the triumph of machines and software over gut instinct.

The age of voodoo is over. The era of talking about something as a “dark art” is done. In a world with big computers and big data, there are no dark arts.

And thank God for that. One by one, computers and the people who know how to use them are knocking off these crazy notions about gut instinct and intuition that humans like to cling to. For far too long we’ve applied this kind of fuzzy thinking to everything, from silly stuff like sports to important stuff like medicine.

Someday, and I hope it’s soon, we will enter the age of intelligent machines, when true artificial intelligence becomes a reality, and when we look back on the late 20th and early 21st century, it will seem medieval in its simplicity and reliance on superstition.

What most amazes me is the backlash and freak-out that occurs every time some “dark art” gets knocked over in a particular domain. Watch Moneyball (or read the book) and you’ll see the old guard (in that case, baseball scouts) grow furious as they realize that computers can do their job better than they can. (Of course it’s not computers; it’s people who know how to use computers.)

We saw the same thing when IBM’s Deep Blue defeated Garry Kasparov in 1997. We saw it when Watson beat humans at Jeopardy.

It’s happening in advertising, which used to be a dark art but is increasingly a computer-driven numbers game. It’s also happening in my business, the news media, prompting the same kind of furor as happened with the baseball scouts in Moneyball.

Read the entire article here.

Image: Political pundits, left to right: Mark Halperin, David Brooks, Jon Stewart, Tim Russert, Matt Drudge, John Harris & Jim VandeHei, Rush Limbaugh, Sean Hannity, Chris Matthews, Karl Rove. Courtesy of Telegraph.

What’s All the Fuss About Big Data?

We excerpt an interview with big data pioneer and computer scientist Alex Pentland, via the Edge. Pentland is a leading thinker in computational social science and currently directs the Human Dynamics Laboratory at MIT.

While there is no exact definition of “big data,” it tends to be characterized quantitatively and qualitatively differently from data commonly used by most organizations. Where regular data can be stored, processed and analyzed using common database tools and analytical engines, big data refers to vast collections of data that often lie beyond the realm of regular computation. So, big data often requires vast, specialized storage and enormous processing capabilities. Data sets that fall into the big data category cover such areas as climate science, genomics, particle physics, and computational social science.

Big data holds true promise. However, while storage and processing power now enable quick and efficient crunching of tera- and even petabytes of data, tools for comprehensive analysis and visualization lag behind.

From Alex Pentland, via the Edge:

Recently I seem to have become MIT’s Big Data guy, with people like Tim O’Reilly and “Forbes” calling me one of the seven most powerful data scientists in the world. I’m not sure what all of that means, but I have a distinctive view about Big Data, so maybe it is something that people want to hear.

I believe that the power of Big Data is that it is information about people’s behavior instead of information about their beliefs. It’s about the behavior of customers, employees, and prospects for your new business. It’s not about the things you post on Facebook, and it’s not about your searches on Google, which is what most people think about, and it’s not data from internal company processes and RFIDs. This sort of Big Data comes from things like location data off of your cell phone or credit card, it’s the little data breadcrumbs that you leave behind you as you move around in the world.

What those breadcrumbs tell is the story of your life. It tells what you’ve chosen to do. That’s very different than what you put on Facebook. What you put on Facebook is what you would like to tell people, edited according to the standards of the day. Who you actually are is determined by where you spend time, and which things you buy. Big data is increasingly about real behavior, and by analyzing this sort of data, scientists can tell an enormous amount about you. They can tell whether you are the sort of person who will pay back loans. They can tell you if you’re likely to get diabetes.

They can do this because the sort of person you are is largely determined by your social context, so if I can see some of your behaviors, I can infer the rest, just by comparing you to the people in your crowd. You can tell all sorts of things about a person, even though it’s not explicitly in the data, because people are so enmeshed in the surrounding social fabric that it determines the sorts of things that they think are normal, and what behaviors they will learn from each other.

As a consequence, analysis of Big Data is increasingly about finding connections, connections with the people around you, and connections between people’s behavior and outcomes. You can see this in all sorts of places. For instance, one type of Big Data and connection analysis concerns financial data. Not just the flash crash or the Great Recession, but also all the other sorts of bubbles that occur. These are systems of people, communications, and decisions that go badly awry. Big Data shows us the connections that cause these events. Big data gives us the possibility of understanding how these systems of people and machines work, and whether they’re stable.

The notion that it is connections between people that are really important is key, because researchers have mostly been trying to understand things like financial bubbles using what is called Complexity Science or Web Science. But these older ways of thinking about Big Data leave the humans out of the equation. What actually matters is how the people are connected together by the machines and how, as a whole, they create a financial market, a government, a company, and other social structures.

Because it is so important to understand these connections, Asu Ozdaglar and I have recently created the MIT Center for Connection Science and Engineering, which spans all of the different MIT departments and schools. It’s one of the very first MIT-wide Centers, because people from all sorts of specialties are coming to understand that it is the connections between people that are actually the core problem in making transportation systems work well, in making energy grids work efficiently, and in making financial systems stable. Markets are not just about rules or algorithms; they’re about people and algorithms together.

Understanding these human-machine systems is what’s going to make our future social systems stable and safe. We are getting beyond complexity, data science and web science, because we are including people as a key part of these systems. That’s the promise of Big Data, to really understand the systems that make our technological society. As you begin to understand them, then you can build systems that are better. The promise is for financial systems that don’t melt down, governments that don’t get mired in inaction, health systems that actually work, and so on, and so forth.

The barriers to better societal systems are not about the size or speed of data. They’re not about most of the things that people are focusing on when they talk about Big Data. Instead, the challenge is to figure out how to analyze the connections in this deluge of data and come to a new way of building systems based on understanding these connections.

Changing The Way We Design Systems

With Big Data, traditional methods of system building are of limited use. The data is so big that any question you ask about it will usually have a statistically significant answer. This means, strangely, that the scientific method as we normally use it no longer works, because almost everything is significant! As a consequence, the normal laboratory-based question-and-answering process, the method that we have used to build systems for centuries, begins to fall apart.
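
The point is easy to demonstrate: with enough observations, even a negligible difference clears the usual significance bar. A quick sketch, using simulated data:

```python
# With millions of observations, a practically meaningless difference
# still produces a minuscule p-value. Data here are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5_000_000
group_a = rng.normal(loc=100.00, scale=15, size=n)
group_b = rng.normal(loc=100.05, scale=15, size=n)   # a 0.05% shift in the mean

t, p = stats.ttest_ind(group_a, group_b)
print(f"difference ~ {group_b.mean() - group_a.mean():.3f}, p = {p:.1e}")
# The effect is trivial, yet p will typically land far below 0.05.
```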

Big data and the notion of Connection Science are outside of our normal way of managing things. We live in an era that builds on centuries of science, and our methods of building systems, governments, organizations, and so on are pretty well defined. There are not a lot of things that are really novel. But with the coming of Big Data, we are going to be operating very much out of our old, familiar ballpark.

With Big Data you can easily get false correlations, for instance, “On Mondays, people who drive to work are more likely to get the flu.” If you look at the data using traditional methods, that may actually be true, but the problem is: why is it true? Is it causal? Is it just an accident? You don’t know. Normal analysis methods won’t suffice to answer those questions. What we have to come up with is new ways to test the causality of connections in the real world, far more than we have ever had to do before. We can no longer rely on laboratory experiments; we need to actually do the experiments in the real world.
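
That Monday-flu example can be reproduced with a small simulation in which a hidden third factor drives both behaviors, so the correlation is real but says nothing about cause; the "living far out" confounder and all the probabilities below are invented:

```python
# A spurious correlation created by a confounder: in this simulation,
# driving to work and catching the flu are both driven by a third,
# hidden variable (living far from the city), not by each other.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
far_out = rng.random(n) < 0.3

drives = np.where(far_out, rng.random(n) < 0.9, rng.random(n) < 0.2)
flu    = np.where(far_out, rng.random(n) < 0.10, rng.random(n) < 0.05)

print("flu rate, drivers:    ", flu[drives].mean())      # noticeably higher...
print("flu rate, non-drivers:", flu[~drives].mean())

# ...but inside each stratum of the confounder the gap disappears:
for value in (True, False):
    mask = far_out == value
    print(value, flu[mask & drives].mean(), flu[mask & ~drives].mean())
```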

The other problem with Big Data is human understanding. When you find a connection that works, you’d like to be able to use it to build new systems, and that requires having human understanding of the connection. The managers and the owners have to understand what this new connection means. There needs to be a dialogue between our human intuition and the Big Data statistics, and that’s not something that’s built into most of our management systems today. Our managers have little concept of how to use big data analytics, what they mean, and what to believe.

In fact, the data scientists themselves don’t have much intuition either… and that is a problem. I saw an estimate recently that said 70 to 80 percent of the results that are found in the machine learning literature, which is a key Big Data scientific field, are probably wrong because the researchers didn’t understand that they were overfitting the data. They didn’t have that dialogue between intuition and the causal processes that generated the data. They just fit the model and got a good number and published it, and the reviewers didn’t catch it either. That’s pretty bad, because if we start building our world on results like that, we’re going to end up with trains that crash into walls and other bad things. Management using Big Data is actually a radically new thing.
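
The failure mode he is describing is easy to reproduce in a few lines: fit a flexible model to pure noise, evaluate it only on the data it was trained on, and it will look like a discovery. A small sketch:

```python
# Overfitting in miniature: a flexible model trained on pure noise looks
# impressive on its own training data and useless on data it has not seen.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 50))       # 50 meaningless features
y = rng.normal(size=200)             # a target unrelated to any of them

X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("in-sample R^2:    ", r2_score(y_train, model.predict(X_train)))   # looks strong
print("out-of-sample R^2:", r2_score(y_test, model.predict(X_test)))     # ~0 or negative
```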

Read the entire article here.

Image courtesy of Techcrunch.

Culturomics

From the Wall Street Journal:

Can physicists produce insights about language that have eluded linguists and English professors? That possibility was put to the test this week when a team of physicists published a paper drawing on Google’s massive collection of scanned books. They claim to have identified universal laws governing the birth, life course and death of words.

The paper marks an advance in a new field dubbed “Culturomics”: the application of data-crunching to subjects typically considered part of the humanities. Last year a group of social scientists and evolutionary theorists, plus the Google Books team, showed off the kinds of things that could be done with Google’s data, which include the contents of five-million-plus books, dating back to 1800.

Published in Science, that paper gave the best-yet estimate of the true number of words in English—a million, far more than any dictionary has recorded (the 2002 Webster’s Third New International Dictionary has 348,000). More than half of the language, the authors wrote, is “dark matter” that has evaded standard dictionaries.

The paper also tracked word usage through time (each year, for instance, 1% of the world’s English-speaking population switches from “sneaked” to “snuck”). It also showed that we seem to be putting history behind us more quickly, judging by the speed with which terms fall out of use. References to the year “1880” dropped by half in the 32 years after that date, while the half-life of “1973” was a mere decade.
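
One simple way to estimate such a half-life is to fit an exponential decay to how often a year is mentioned after it has passed. With an n-gram frequency series in hand (loading one is left out here, and the function and variable names are assumptions), the calculation is short:

```python
# Sketch of estimating the "half-life" of a year's mentions from an
# n-gram frequency series; obtaining the series itself is assumed.
import numpy as np
from scipy.optimize import curve_fit

def decay(t, f0, rate):
    return f0 * np.exp(-rate * t)

def half_life(years, freqs, peak_year):
    """Fit an exponential decay to mentions of `peak_year` after that year."""
    after = years >= peak_year
    t = years[after] - peak_year
    (f0, rate), _ = curve_fit(decay, t, freqs[after], p0=[freqs[after][0], 0.05])
    return np.log(2) / rate

# e.g. half_life(years, freq_1880, 1880) -> about 32 in the article's data,
# while the same calculation for "1973" would come out near a decade.
```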

In the new paper, Alexander Petersen, Joel Tenenbaum and their co-authors looked at the ebb and flow of word usage across various fields. “All these different words are battling it out against synonyms, variant spellings and related words,” says Mr. Tenenbaum. “It’s an inherently competitive, evolutionary environment.”

When the scientists analyzed the data, they found striking patterns not just in English but also in Spanish and Hebrew. There has been, the authors say, a “dramatic shift in the birth rate and death rates of words”: Deaths have increased and births have slowed.

English continues to grow—the 2011 Culturomics paper suggested a rate of 8,500 new words a year. The new paper, however, says that the growth rate is slowing. Partly because the language is already so rich, the “marginal utility” of new words is declining: Existing things are already well described. This led them to a related finding: The words that manage to be born now become more popular than new words used to get, possibly because they describe something genuinely new (think “iPod,” “Internet,” “Twitter”).

Higher death rates for words, the authors say, are largely a matter of homogenization. The explorer William Clark (of Lewis & Clark) spelled “Sioux” 27 different ways in his journals (“Sieoux,” “Seaux,” “Souixx,” etc.), and several of those variants would have made it into 19th-century books. Today spell-checking programs and vigilant copy editors choke off such chaotic variety much more quickly, in effect speeding up the natural selection of words. (The database does not include the world of text- and Twitter-speak, so some of the verbal chaos may just have shifted online.)

Read the entire article here.