by James A. Bacon

We’ve all been hearing more and more about “Big Data,” which arises from the ability of computers to collect and process unimaginably huge gobs of data and sophisticated mathematical equations to detect patterns and anomalies that can be used to drive business decision-making. Capital One used Big Data before it had a name to revolutionize the credit card business, and it’s one of the biggest, most profitable companies in Virginia. Now comes Arlington-based Applied Predictive Technologies, which just sold out to MasterCard for $600 million.

That’s a remarkable valuation for a 16-year-old company of 300 employees and revenues approaching $100 million. Humongous pay-offs like that are routine for Silicon Valley but they’re rare in Virginia.

“We will stay Ballston-based, but we will be growing faster,” APT chief executive Anthony Bruce told the Washington Post in an email. “Our opportunity to grow and expand will be accelerated by this partnership, in Arlington and elsewhere.”

Here’s how the company describes its product: “APT’s Test & Learn software is revolutionizing the way leading companies harness their Big Data to accurately measure the profit impact of pricing, marketing, merchandising, operations, and capital initiatives, tailoring investments in these areas to maximize ROI.”

An illustration can be seen in the graphic above. Drawing from data on retail and restaurant sales at more than 100,000 locations nationwide, APT charted the impact of the 2015 NCAA Final Four basketball tournament on restaurant sales in Indianapolis. The APT Index also integrates weather and demographic data to allow retail executives to ask an even broader range of questions. It doesn’t take much imagination to see how MasterCard could use its relationship with retailers globally to sell this as a value-added product.

Read “Data Crush” by Chris Surdak to get a feel for how Big Data will transform industry after industry in ways we mortals can barely comprehend. Big Data will blaze a path of creative destruction easily equal to that of the Internet.

Bacon’s bottom line: Momma, don’t let your babies grow up to be lawyers. Engineers and computer programmers will make a decent living in the economy of the early-mid 21st century, but if you want your kid to have a shot at becoming the next Steve Jobs or Bill Gates, tell them to major in any branch of mathematics that lends itself to Big Data analytics.

If you want an argument in favor of STEM education (the “m” stands for mathematics), this is it. The Big Data revolution may have started in the United States, but the industry will move to wherever there are pools of mathematically gifted employees. We neglect mathematical instruction at our peril. (So says the guy who couldn’t tell you the difference between sines, cosines and tangents, much less between integral and differential calculus, much less actually compute anything requiring a retention of anything beyond 8th-grade algebra. I’m a dinosaur but at least I know it.)



Comments

  1. larryg

    is a lot of ‘big data” – data collected by the govt – like zip codes with demographic data or GPS data made possible by Govt satellites?

    are there examples of totally privately collected/generated “big” data?

    1. Retail sales at the level of individual stores would seem to be an example of privately generated Big Data.

      1. Retail sales
        Stock transactions
        Credit scores
        Individual web usage data (think Google)
        Survey completions (think ComScore)
        Retail bank financial transactions
        Credit card spending (like MPT tracks)
        Employment opportunities (think LinkedIn)
        Trending news (think Twitter)

        Put it this way … if you aren’t paying for a product then you are the product. And it’s the data from your use of all those “free” products that makes the business case for the company.

        1. Where you have driven (think Waze)
          What you have read or considered reading (think Amazon.Com)

  2. larryg

    I agree – and I point out that nowadays – when you agree to “terms of service” – you often find that you are agreeing to the sale of data about you – to others.

    but I’m also pointing out that a good amount of “big data” is data the government collects about things like MSAs, demographic data by zip, gps coordinates of road systems, govt-created map data.. etc..

    i.e. the private sector is leveraging data they did not pay for or certainly are not paying for it at a market supply/demand price.

    and yet another example of the govt investments – true investments – that drive the economy.

    however – to be fair – I’d also point out the airline reservation systems combined with 3rd party sites like Trivago and Kayak – and Airbnb, etc. as excellent examples of private big data… but possible through data-sharing partnerships.

    bottom- line – I am in agreement with your premise – and further point out that this is what is squeezing out more and more workers as big data is taking their jobs as info “gate keepers”.. People no longer are restricted from looking up NADA prices.. on cars.. there is an avalanche of data.. to support informed consumers.

    I remember a few years back when I went shopping and told the dealer the invoice price and he accused me of looking at restricted proprietary data…that was not really legal for me to have.. we had a further discussion of which he learned a few things..

    1. virginiagal2

      There is a difference between data, even data in large amounts, and big data, although there is no clear bright line. Traditional methods of analysis are generally used for data. Newer methods, such as Hadoop and R, are used for big data. Big data and traditional data can be combined for more meaningful analyses.

      Big data often is less structured, or less tidily structured, than data. It also tends to be “big” – not just millions of records, but trillions, and often stored in ways that require some thinking to get to the meaning behind it.

      Big data, if transactional, is generally for more than one company or organization, and not necessarily in the same format – so invoice prices are data, but not big data. But if you wanted to do a study on invoice price versus sales price for goods across various sectors, that’s almost certainly big data.

      Reservations and worldwide flights would be big data, but patterns for one company would just be data. For one company, it’s a lot of data, but it’s in a common format and amenable to simpler analysis.

      Web clicks, even for one company, are usually considered big data (so analysis of your responses to search results, or other behavior on Amazon or Google is big data.)

      Demographic data by zip would generally be considered just data.

      Published papers on genes and cancer are big data. A good example of big data, presented at a conference I attended, was an analysis that took published papers on the human genome that looked for genes associated with cancer, crunched the papers using textual analysis methods, and correlated that with human genome datasets and research results, to look for correlations between genes and cancer. It found new correlations and suggested new research. I think but won’t swear that one was an NIH project.

      That work also points out another strength of big data – the ability to pull in information for analysis from sources that aren’t traditionally considered data, like web clicks and research papers.

  3. LifeOnTheFallLine

    Big data is definitely the future, especially as servers become cheaper, faster and capable of far more storage. There are only two limiting factors: the imagination of the question askers and the amount of information people are willing to share.

    Without irony, I look forward to the day we all have encrypted GPS devices on our cars so that DOTs can get the best look possible at what is really needed for our roadways.

    1. DOTs might be able to buy that data today from Google / Waze. They are certainly collecting it from their users. I just brought up Waze at my house. There are presently 13,376 Waze users active “nearby”. I am not sure how close “nearby” is but that’s a lot of people from which Waze can collect data.

      1. LifeOnTheFallLine

        Google/Waze is pretty good, there’s also data available for purchase from TomTom, Here and Inrix (which is available on a limited basis to the public through the University of Maryland’s RITIS program: https://www.ritis.org). There are some sampling limitations: the only data supply is people with either the proper apps or GIS devices, and the available data loses quality greatly once you move away from the interstates.

        The real use for widespread, trackable GPS data is in O-D studies, which currently cost a bunch and are at best highly educated guesses.

        1. Yeah, products like Waze definitely provide a sample rather than a census. However, my guess is that the confidence interval for statistical analyses of the samples is probably sufficiently narrow to make those samples useful to DOTs.

          The taxing scheme for driving in Virginia is pretty well broken. It is a mishmash of various taxes (including sales taxes) that have been negotiated across various factions of the General Assembly. Using a Waze-like capability to record when, where and how far a person drove (on a census basis) would allow a much more effective road tax regime. For example, driving on a busy road might cost me more than driving on a road with limited congestion. However, if done properly, some of the funds generated from the congested road would be set aside for improvements – either to the road itself or to mass transit that alleviates congestion on the road.

          1. LifeOnTheFallLine

            Your second point is exactly correct and why people in the United States will never allow that sort of tracking to happen.

            But, yeah, taking the GPS off your car once a year when you get a state inspection, and running it through an algorithm that taxes based on:

            – Roads used
            – Vehicle weight
            – Miles traveled

            And any other metric you could think of would make the taxing structure much more fair and much more sustainable.
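As a rough sketch of the kind of algorithm described above, here is what a yearly charge based on roads used, vehicle weight and miles traveled might look like in Python. The road classes, the per-mile rates and the 4,000 lb baseline weight are all made-up assumptions for illustration; they are not figures from any DOT or from this thread.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    road_class: str  # hypothetical categories: "interstate", "arterial", "local"
    miles: float


# Assumed per-mile base rates in dollars, keyed by road class (illustrative only).
RATE_PER_MILE = {"interstate": 0.02, "arterial": 0.015, "local": 0.01}

# Assumed baseline weight; vehicles at or below it pay the base rate.
BASELINE_WEIGHT_LBS = 4000.0


def weight_multiplier(vehicle_weight_lbs: float) -> float:
    """Scale the charge linearly with weight above the assumed baseline."""
    return max(1.0, vehicle_weight_lbs / BASELINE_WEIGHT_LBS)


def annual_road_tax(trips, vehicle_weight_lbs: float) -> float:
    """Sum rate * miles over all recorded trips, then apply the weight multiplier."""
    base = sum(RATE_PER_MILE[t.road_class] * t.miles for t in trips)
    return round(base * weight_multiplier(vehicle_weight_lbs), 2)


if __name__ == "__main__":
    year = [Trip("interstate", 1000), Trip("local", 500)]
    print(annual_road_tax(year, 4000))  # baseline-weight car
    print(annual_road_tax(year, 6000))  # heavier vehicle pays more
```

A congestion surcharge, as suggested earlier in the thread, could be added as another multiplier per trip; it is left out here to keep the sketch to the three metrics listed.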

            The problem is that even if it meant the gas tax would be repealed it would be hard to convince people to pay all at once and I don’t trust gas companies to lower the prices to reflect the dropped taxes.

      2. larryg

        I’ve used Waze and it provides a LOT of good real time info especially about traffic issues.. but for some reason it totally whacks my phone… eats the battery and causes reboots so I had to take it off.

        I dunno if it’s my phone or Waze needs tweaking to reduce its impact on phone resources.

        on laptops – software typically lists resources requirements – how much disk, how much memory, etc

        in the phone world – it’s the wild wild west – some apps can have heavy impacts on the phone’s resources – but some of that depends on the phone’s particular resources and its design and build quality.

        Some phones are crap and some apps are crap…

        I like Waze but it just whacks my phone.

        1. GPS is a power hog. However, broadcasting your location back to Waze is the real culprit. Once you turn on Waze you become a sensor. You are providing as much (or more) information than you are consuming. I still think it’s a “square deal” but people rarely realize how much information about themselves they are giving up with these social and quasi-social apps.

    2. virginiagal2

      I don’t particularly want DOTs tracking where identifiable individuals go in realtime. Little too Big Brother-y for me.

      Overall patterns of usage, yes. Individual cars, no.

      Even anonymized, it takes a relatively small number of data points to get data back to an identifiable individual.

      1. LifeOnTheFallLine

        You’re in luck then since traffic studies are only useful in the aggregate anyway.

  4. larryg

    the use of big data to look at system/region wide congestion has great promise and what it will likely show is that individual projects don’t relieve congestion as much as they push it to somewhere else … where it actually makes things even worse.

    big data has great promise to sequence wider scope traffic signals – where bottlenecks are addressed and result in wide scope congestion relief.

    finally “connected” GPS will react to real time wider scope traffic conditions to route you – a path that is the shortest time path given the existing traffic conditions…

    people approaching cities -50 miles out – might be advised that “going around” is going to be shorter than going through and clumping up on an incident.

    we’re already seeing some of this on smart phones… but it needs to flow through to heads-up displays.. it’s ludicrous to try to use a cell phone to navigate in heavy traffic conditions but it is – without a doubt a better way to measure actual traffic (via GPS position reporting of phones in each car).

    I’d also agree and point out -that with the right kind of education -that big data apps – are jobs…

  5. BDVienna

    The best mix: Advanced math and/or programming, combined with something creative. Art is good because it helps you design interfaces and think creatively. Other humanities also aren’t bad. Social sciences involve a lot of statistical research.

    Or you could major in computer science and drama, then invent the next great comedy website.

    1. I forget the term but there is a category of person who is adept at both music and math. Among music theorists it is fairly common to find strong math skills as well. There is apparently something about the patterns found in music that relate to math. I once worked for a technology company that included music composition in the skills for a new chief scientist employee search.

      1. virginiagal2

        That’s been my experience as well.

  6. larryg

    the thing about big data – is there are usually 3 disciplines to do good applications

    1 – you have to be skilled in knowing how to design and handle gobs of data – especially in a real time environment.

    you have to understand how data moves from platforms via the internet.
    data is moving to/from cars that are moving rapidly from one cell tower to another.. thousands of cars – generating data.. at the same time thats gotta get from your car to the cell tower to the internet to your servers and back to those thousand cars in seconds… via whatever cell towers they now are connected to… anywhere on those paths – things can go sideways if you really don’t understand how those things work.

    Our phones these days are sophisticated networked computers..that have more computing power than an IBM mainframe of a few years back.

    2. – you have to understand the business purpose and products that are fueled by big data – what market are they trying to serve?

    3. – you have to understand how customers/clients/users – use the apps – how the apps are useful and provide value to their need.

    this is a large and complex field with a lot of moving parts that is evolving rapidly and to attract and keep competent employees – you have to be able to monetize the data and information and/or charge for apps and software.

    the more a prospective job seeker understands about all the different technologies involved -the more valuable and in demand they are – …

    in the past – those kinds of multi-discipline jobs were called systems analysts or operational analysts.. etc.. the ability to understand multiple technology disciplines… which requires top notch core academic skills.. in hard-science as well as communication skills – and the ability to work in teams…

    these are the 21st century jobs – that I worry that our schools and our students are not up to and we’re going to lose them to overseas companies and workers.

    when I look at our K-12 schools these days 1/2 are not bound for 4yr – and they are not even bound for 2yr community college occupation goals.

    and the 4yr folks are more interested in liberal arts and the like rather than hard-science proficiency…

    when you look at AP in most schools – it’s not a pretty sight except in places like Fairfax. We have a low AP participation rate in the math/science areas and poor pass rates for the math/science APs.

    1. Larry:

      Your points are well taken but the state of the market today would require some level of programming skill as well. One of my goals at my current employer is to make advanced data analytics available to sophisticated business users without the need for computer systems developers. We’ll get there but it’s a tough slog. Dealing with any single dataset is fairly straightforward. However, once you start dealing with multiple datasets the industry’s ability to allow “programmer-less” analysis falls off sharply. Companies like Tableau Software are getting closer and closer but even they aren’t there yet.

      Someday Bacon’s Rebellion will include associated free public data sets and an open source or open use tool for the analysis of the data. When that day comes – watch out! Our elected and appointed officials are in for a real shock. Their ability to “hide in plain sight” will be severely compromised by open data and powerful “programmer-less” analytical tools.

  7. larryg

    Don – I put great stock in SOME (but not all) of your views but especially so in the business/computer world.

    you have direct insight into how that industry is evolving and I think you have an important idea for Jim Bacon to consider and that is next to his existing left and right masthead columns – a place – a permanent place for publicly available and accessible datasets some of which you have provided in your comments and others have provided like HCJ with the SOL build-a-table.

    Our friends Waldo Jaquith and Megan Ryan are urging the state to provide wider/deeper access to existing databases … which are ubiquitous even in state agencies but they will not allow the public to access the data on those databases which would be super-easy with a front-end that blocks specific data not releasable but allows ad hoc retrievals of the rest.

    The Times Dispatch provides access to a list of state employees and their salaries…

    so one of the ironies is that for-profit businesses can get easier access to govt data than citizens can.

    big data is big trouble for state agencies who really don’t want the public learning more about their operations.

    I continue to point out that most schools in Va do not account for how local discretionary tax money -not mandated by the State and Feds is spent.

    not so much about how much is spent by what it is actually spent on – that is not required by the Feds/State – i.e. what are the priorities of the School system for local tax money?

    how many teachers and what kinds of teachers are hired with discretionary local tax money that are NOT Fed/state required teachers, etc.

  8. larryg

    I should add that RTD had to file a FOIA to get access to the state database.

    Jim should ask Waldo Jaquith if he might do a column on what is known as bulk data from state agencies.. something he made great inroads on with the General Assembly that led to one of the first “big data” applications in Va – namely Richmond Sunshine – and in the process – embarrassed and chastised a miserable State front end to legislative data operated by a staff vs Waldo’s mostly one man operation.

    Another one worth highlighting is VPAP – the data – often touted here .. and highly valued even by business and elected – but put together by non-state folks who were cajoling the state board of elections to give them ready access to PAPER financial disclosure forms that were supposed to be the fundamental basis of timely disclosure of money in politics in Va.

    Until VPAP came along – the ready and timely “access” consisted of hanging out at the physical location of SBE waiting to see the “paper”.

    The “ready timely disclosure” that underpins money in politics in Va was originally and cynically based on physical access until VPAP came along.

    I would posit that both the Sunshine and VPAP – along with Waldos – Va De-coded are fine examples of the beginnings of “big data” and I would further posit – that if and when – the real-time financial disclosures are linked to the folks making real-time votes on legislation – that we’ll be getting closer to the original tenets of the forefathers to hold the elected more accountable.

    but again – let me point out – that only superbly educated people – in the most robust fields of math and science (to include all fields of software and big data) are going to find jobs in these burgeoning industries.

    Kids who barely achieve minimal proficiency are not going to qualify and for that matter – neither are the 4yr kids if they avoid these academic areas and go for liberal arts type degrees.

    the 21st century global economy world is not your father’s USA manufacturing/info gatekeeper economy any more.

  9. Let’s not lose sight of BDVienna’s point. Sure we need Mathematics in a big way, but to STEM we must add Art, for STEAM in our educational system — art, music, literature, etc — to stimulate the creativity and guide the values required for our directing Big Data in constructive avenues.

    1. larryg

      Well I do agree we DO need creativity…and art immeasurably improves our lives and makes us think differently about problem-solving also.

      but art does not build the safety systems in your car nor move data from your cell phone to a server and back again…

      It’s not STEM per se. It’s the king of thinking and logic and problem solving that is behind STEM – and yes creativity is part of it.

      but creativity alone – is not going to get most folks a job in the 21st century – rather a lucky few who are truly gifted.

      the manufacturing is going to robots – not only here but in countries with cheap labor.

      what big data is going to do – to knowledge/data gatekeepers jobs is what robots have done and are doing to manufacturing jobs.

      We have to ask ourselves – what a child in Nelson County should be learning …. if he/she are not going to work in manufacturing and not as an intermediary (gate keeper) for access to info…

      your car is not only going to turn on your engine oil light – it’s going to transmit the code to your dealer… who’s going to have you scheduled for service.

      You’re not going to come home to a dead furnace.. you’re going to get a message on your phone within seconds that an alarm/fault has occurred, asking your permission to schedule a service call.

      when you get home from your doctor’s appointment – you’re going to have a summary of the doctor’s notes as well as the next appointment and prescriptions

      so here’s the question – what should the kid in Nelson County take in school that will set him up with the academic skills to write applications for the above scenarios?

      I would posit – that he needs to be a skilled reader who understands words, meanings, concepts, etc.

      I would posit – she needs to be well oriented to basic technology concepts

      and I would posit – they need to understand the math that is behind the technology that powers the applications.

      and it will be an enormous advantage if they are “creative” in the way they synthesize different approaches to problem solving.

      if that kid in Nelson grows up barely proficient in reading and math – it’s going to be a tough slog….and the fall back probably ought to be an occupational certificate for autos or medical technology , HVAC, etc..

      and if his K-12 does not provide it and Community College is not available – then it’s going to be a tough life… some characterize as a “serf” life.

      Big Data is exciting and promises great things – but it also puts a sobering challenge on us for public education.

      we’re going to have a country continuing to evolve – haves and have nots..

  10. TooManyTaxes

    I worry about privacy of both government and private databases. There is not enough disclosure as to what the data holder – note I did not say data owner – has and does with personally identifiable data. I have no trouble with either the government or private business using aggregate data where no one’s identity can be discovered or disclosed. I do have a problem with anyone using information about me: where I shop, what I buy, where I drive, etc.

    The FTC requires entities to develop, post online and follow their privacy policies. The Agency also goes after companies that breach their policies. But that’s not enough. All data gathering that is individualized should be subject to the basic rules communications carriers must follow — customer proprietary network information, as set forth in 47 USC 222 and the implementing rules.

    In sum, a carrier may use, disclose or access such information (the services you purchase and the calls you make) only : (1) as required by law; (2) with the specific customer’s approval; and (3) in providing the service from which the customer information is derived. They must disclose your rights annually and obtain either opt-in or opt-out consent – the details of which I will not elaborate for fear of being banned from the site.

    The company cannot disclose any of this information over the Internet or over the phone except when a customer’s password is provided or when the company has the recorded oral permission of the customer after he/she has provided information not generally known to the public.

    The company must keep records of when and to whom they disclose CPNI. The FCC has fined some violators in the millions of dollars. And under certain circumstances pretexters can be convicted of a felony and serve as much as 10 years in prison.

    HIPAA protects my medical information. What protects other information?

  11. larryg

    not the “king of thinking” – the KIND of thinking… dyslexia…

  12. larryg

    on the protection of data ?

    it’s the wild wild west…

    the biggest flaw in “big data” is how badly the folks who are building these databases underestimate their vulnerability to cyber burglary..

    and I use that term on purpose to draw a comparison to how you’d secure your home or business…

    with databases – a lot of companies are essentially putting a key to their data outside their door under a fake rock… because..

    1. – it cost too much money to secure it

    2. – they would have to hire more nerds to be in positions of power in the corporation.

    3. – they have no skilled director of corporate data – just the nerd who has been there the longest or the only one that knows how to get at the family jewels.

    we make much about how incompetent the govt is.

    DonR knows the truth. Corporations are often bastions of ignorance and penny-wise, pound-foolish penny pinching… because it cuts down on corporate bonuses.. and investor wealth.

    One of my last jobs in the real world – I had to BEG our bosses to buy a backup file system server… for basically peanuts… and when I went to install Tripwire – a software that detects changes to important system files – I got the “why are you wasting money” treatment.
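For readers who have not seen a file-integrity checker, the general idea can be sketched in a few lines of Python: record a cryptographic hash of each important file, then compare later hashes against that baseline. This is only a minimal illustration of the concept using the standard `hashlib` module, not how Tripwire itself is implemented; the function names are hypothetical.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths):
    """Record a baseline mapping of path -> digest for the given files."""
    return {str(p): file_digest(p) for p in paths}


def changed_files(baseline, current):
    """List paths whose digest differs from, or is missing in, the current snapshot."""
    return sorted(
        path for path, digest in baseline.items()
        if current.get(path) != digest
    )
```

In practice the baseline would be taken once, stored somewhere an intruder cannot rewrite it, and compared against a fresh `snapshot` on a schedule.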

    By far the most ignorant of many corporations – are the folks at the top.

    they got good looks and good hairdos but they are often dumb as a stump or they are downright evil in their treatment of people.
