Searchable Video
Jul 1, 2007 12:00 PM, By Michael Fickes
People remember faces and events using various mental techniques.
Recognition, one human memory technique, uses a cue of some kind to call up a memory. The word “father,” for example, calls an image to mind in each of us. We also recognize pictorial cues. Footage of the World Trade Center buildings collapsing calls to mind a host of visuals.
Today, computers can do the same. Video surveillance systems combined with video content analysis software can automatically generate plain English cues, called metadata, that describe each video frame. Attaching metadata to frames makes the video searchable.
Video analytics systems are programmed with rules that will cause the system to alarm when certain video appears. For example, suppose the video shows a human form climbing a fence. The computer doesn't recognize the shape or the activity by itself. Nor does the metadata being created and attached to each frame of video mean anything by itself.
When the system compares metadata describing a person climbing a fence with a rule telling the system to alarm when that metadata description appears, the system essentially remembers what a person climbing a fence looks like and activates an alarm.
The system can also route video to monitors in a security center; it can e-mail and telephone security directors; it can send video to wireless handheld devices carried by guards; and it can lock doors, set off alarms, call the police and turn on lights — all because the system recognized a cue that prompted it to take certain actions upon seeing a specific frame of live video.
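The comparison of metadata against rules can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `Rule` class, the tag strings and the action names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical rule: fire the listed actions when all tags appear on a frame."""
    required_tags: frozenset
    actions: list = field(default_factory=list)


def evaluate(frame_tags, rules):
    """Return the actions triggered by one frame's metadata tags."""
    tags = set(frame_tags)
    triggered = []
    for rule in rules:
        # A rule matches only when every tag it requires is present.
        if rule.required_tags <= tags:
            triggered.extend(rule.actions)
    return triggered


# Assumed tag vocabulary and action names, for illustration only.
rules = [Rule(frozenset({"person", "climbing", "fence"}),
              ["sound_alarm", "route_video_to_monitor"])]

print(evaluate(["person", "climbing", "fence", "night"], rules))
print(evaluate(["person", "walking"], rules))
```

Neither the tags nor the rule means anything alone; an action fires only when the two are compared and found to match.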
Once primarily used by government agencies, video analytics (also referred to as video content analysis) is moving into the commercial and corporate worlds. National retailers and financial service providers are using analytics to combat fraud. Corporations can use it to enhance workplace safety, to generate business intelligence and to improve the operation of access control security as well as lighting and heating, ventilating and air conditioning systems.
“The concept of tagging frames of surveillance video with metadata is very new,” says Tim Ross, co-founder and executive vice president of 3VR Security Inc., a video analytics company in San Francisco. “So are the capabilities. Our system, for instance, not only triggers alerts in response to metadata, it also generates searches.”
One might ask: Where has this technology come from? What is the science behind it? How can corporate users adopt these technologies? How much will it cost? How does it provide a return on investment (ROI)?
Searching for effective ways to search video
Historically, security people have searched video by staring at monitors. Unfortunately, the human attention span drifts after 20 minutes or so. While human monitors periodically re-focus, they miss too much to be effective.
The earliest attempts to automate video monitoring and searching came several years ago with the emergence of video motion detection. Those early applications also had problems. While people could not concentrate long enough to spot the kind of motion that would interest security or another corporate group, computers spotted all kinds of motion and reported most of it, overwhelming the system with false alarms.
In 2001, the National Institute of Standards and Technology (NIST) sponsored a series of conferences designed to foster research into search technologies of all kinds. One of the conferences, Text Retrieval Conference — Video or TRECVID, set out to develop techniques to search video.
Early TRECVID techniques employed metadata text created manually by people reviewing video. Once the metadata tags existed, a searcher could locate video clips by typing words into a search engine.
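Searching manually tagged clips works much like a simple keyword index. The sketch below assumes hypothetical clip IDs and tags and is not drawn from any TRECVID system; it only shows how typed words can locate clips once the tags exist.

```python
from collections import defaultdict


def build_index(clips):
    """Map each lowercase tag to the set of clip IDs annotated with it."""
    index = defaultdict(set)
    for clip_id, tags in clips.items():
        for tag in tags:
            index[tag.lower()].add(clip_id)
    return index


def search(index, query):
    """Return the sorted clip IDs whose tags contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical clips tagged by a human reviewer.
clips = {
    "clip-001": ["car", "gate", "night"],
    "clip-002": ["person", "fence"],
    "clip-003": ["car", "fence"],
}
index = build_index(clips)
print(search(index, "car fence"))  # ['clip-003']
```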
Manual metadata has proven effective in some applications, but the real goal has always been to automate the creation of metadata, so systems can literally watch and search themselves.
Magic algorithms
Today, commercial systems are using algorithms to automate the creation of metadata. “Our software needs to see the video at least once so that it can process each frame,” says Dr. Alan J. Lipton, chief technology officer and director, R&D with ObjectVideo, a video analytics company in Reston, Va. “Our technology looks at a video frame and extracts objects of interest, such as people and vehicles, ignoring the background. Then the system creates plain English descriptions of the frames.”
To do this, the system applies a series of algorithms, or mathematical formulas, to each frame to describe the pixels that compose the images in the frame. Certain pixels will not change or will change very little from frame to frame. The algorithms classify those pixels as part of the background of the scene.
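A much-simplified version of that background test can be written as follows. It assumes grayscale frames stored as nested lists and a hypothetical change threshold; commercial products use far more robust statistical models, so treat this only as a sketch of the idea.

```python
def background_mask(frames, threshold=10):
    """Mark a pixel as background (True) when its grayscale value varies
    by no more than `threshold` across all the given frames."""
    height, width = len(frames[0]), len(frames[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            row.append(max(values) - min(values) <= threshold)
        mask.append(row)
    return mask


# Two tiny 2x3 frames: a bright object moves along the bottom row.
frames = [
    [[50, 50, 50], [50, 200, 50]],
    [[52, 49, 50], [50, 50, 200]],
]
mask = background_mask(frames)
print(mask)  # [[True, True, True], [True, False, False]]
```

Pixels flagged `False` are the ones that changed, which makes them candidates for objects of interest.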
Algorithms also identify objects that compose certain kinds of shapes. Over the years, algorithms have been developed to correspond with cars, people, suitcases, doors, legions of other objects and even colors. The system identifies objects and tracks them from frame to frame. It identifies motion by noticing that a car-shape, for example, occupies a different position in this frame compared to that frame.
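Detecting motion by comparing an object's position in two frames can be illustrated with centroids. The pixel coordinates and the `min_shift` threshold below are assumptions made for the example, not values taken from any product.

```python
def centroid(pixels):
    """Average (x, y) position of the pixels that make up one object."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def displacement(prev_pixels, curr_pixels):
    """How far the object's centroid shifted between two frames."""
    (px, py), (cx, cy) = centroid(prev_pixels), centroid(curr_pixels)
    return (cx - px, cy - py)


def is_moving(prev_pixels, curr_pixels, min_shift=1.0):
    """True when the centroid moved at least `min_shift` pixels."""
    dx, dy = displacement(prev_pixels, curr_pixels)
    return (dx * dx + dy * dy) ** 0.5 >= min_shift


# The same car-shaped blob in two frames, shifted 4 pixels to the right.
car_prev = [(10, 5), (11, 5), (10, 6), (11, 6)]
car_curr = [(x + 4, y) for x, y in car_prev]

print(displacement(car_prev, car_curr))  # (4.0, 0.0)
print(is_moving(car_prev, car_curr))     # True
```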
Algorithmic formulas have grown in sophistication and number in the past two years, making it possible to track a yellow car pulling into the field-of-view of a security camera, stopping beside a guardhouse for several seconds and driving away. The system will report that someone drove to the gate, spoke to a guard and then drove away. If the car drives through the gate without stopping, the algorithms will generate different metadata tags.
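One way to picture the difference between the two tags is to measure how long a tracked vehicle stays still. The sketch below assumes a track of timestamped positions and hypothetical tag names and thresholds; it is not how any particular vendor derives its metadata.

```python
def describe_gate_visit(track, stop_seconds=3.0, still_tolerance=0.5):
    """track: list of (timestamp_seconds, x, y) for one vehicle, in time order.
    Returns a hypothetical tag depending on whether the vehicle paused."""
    longest_pause = 0.0
    pause_start = track[0][0]
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        moved = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        if moved > still_tolerance:
            pause_start = t1  # vehicle moved, so any pause restarts here
        else:
            longest_pause = max(longest_pause, t1 - pause_start)
    if longest_pause >= stop_seconds:
        return "vehicle_stopped_at_gate"
    return "vehicle_drove_through"


# A car that pauses at x=10 for four seconds, and one that never stops.
stopped = [(0, 0, 0), (1, 5, 0), (2, 10, 0), (3, 10, 0),
           (4, 10, 0), (5, 10, 0), (6, 10, 0), (7, 15, 0)]
through = [(0, 0, 0), (1, 5, 0), (2, 10, 0), (3, 15, 0)]

print(describe_gate_visit(stopped))  # vehicle_stopped_at_gate
print(describe_gate_visit(through))  # vehicle_drove_through
```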
Retrieving memories
“The system creates all of this information for each video frame,” Lipton says. “How can you retrieve it? We have developed a natural language rule interface. You define a rule like: tell me if someone climbs the fence — in this direction but not that direction. The system will pull up all the different pieces of metadata where that condition was met in the last three months and date- and time-stamp them.”
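A directional rule of the kind Lipton describes can be approximated as a filter over stored metadata records. The record fields, event names and the 90-day stand-in for "three months" are all assumptions for this sketch; ObjectVideo's actual rule interface is not shown in the article.

```python
from datetime import datetime, timedelta


def find_events(records, event, direction, now, days=90):
    """Return timestamps of stored metadata records matching the rule
    within the last `days` days, oldest first."""
    cutoff = now - timedelta(days=days)
    hits = [r["time"] for r in records
            if r["event"] == event
            and r["direction"] == direction
            and cutoff <= r["time"] <= now]
    return sorted(hits)


# Hypothetical stored metadata.
records = [
    {"time": datetime(2007, 6, 20, 2, 15), "event": "fence_climb", "direction": "inbound"},
    {"time": datetime(2007, 6, 21, 14, 0), "event": "fence_climb", "direction": "outbound"},
    {"time": datetime(2007, 1, 5, 23, 40), "event": "fence_climb", "direction": "inbound"},
]
now = datetime(2007, 7, 1, 12, 0)

for stamp in find_events(records, "fence_climb", "inbound", now):
    print(stamp.isoformat())  # 2007-06-20T02:15:00
```

The outbound climb and the January climb are both excluded: one fails the direction test, the other falls outside the time window.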
Algorithms power different categories of analysis, says 3VR's Ross. “We're using all the different forms of analysis,” he says, “including OCR (optical character recognition), motion analysis, object analysis, face recognition and camera tampering.”
With that kind of search capability, a user can search the stored database for virtually anything seen by the camera.
There are many different ways to look at this technology. Instead of video analytic categories such as motion and object analysis, Santa Clara-based Vidient Inc.'s analytics focus on behaviors such as loitering, running, abandoned package, climbing a fence, walking in the wrong direction and others. “We implement behaviors with algorithms,” says Steve Goldberg, president and CEO of Vidient Inc.
Marketing video analytics technology
Video analysis systems capable of searching video content can, however, be expensive. Some estimates set the price between $2,000 and $6,000 per camera. At those rates, equipping a 1,000-camera system with 200 analytic cameras would cost between $400,000 and $1.2 million.
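The arithmetic behind that range is simple enough to check directly. The function name is illustrative; the per-camera prices are the estimates quoted above.

```python
def analytics_cost_range(analytic_cameras, low_per_camera=2_000, high_per_camera=6_000):
    """Return the (low, high) total cost in dollars for the analytic cameras."""
    return analytic_cameras * low_per_camera, analytic_cameras * high_per_camera


low, high = analytics_cost_range(200)
print(f"${low:,} to ${high:,}")  # $400,000 to $1,200,000
```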
Not surprisingly, prices are expected to decline as usage increases. For the time being, however, vendors are experimenting with various packaging options designed to bring current prices into a more acceptable range.
“There is a maxim that good software becomes good hardware,” says ObjectVideo's Lipton. “We have recognized that this technology isn't an enterprise solution. It is part of something else. It is a feature in a larger system.”
To implement that strategy, ObjectVideo has created software that can be embedded onto a microchip called a digital signal processor (DSP). Texas Instruments, Dallas, makes the chip and installs ObjectVideo analytics software on it. The chip fits on a small board, which in turn slides into a security device — a camera at the front end of the system, a digital video recorder at the back end or a router in the middle. In any case, it is a device through which video data will pass.
Verint Systems Inc., Melville, N.Y., has also adopted the DSP approach, but the DSP chip is only in the camera. “We've chosen to process video in a distributed fashion out on the edge of the network,” says Alex Johnson, Verint's director of solutions services. “We have each camera do its own processing and then send metadata back to a central server. This enables you to do 60 cameras per server compared to five or 10 analytics per server if you do the processing through the digital video recorder or network video recorder.”
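To see what those ratios mean in practice, the short calculation below applies Johnson's per-server figures to a deployment of 200 analytic cameras. The 200-camera figure is borrowed from the cost example earlier in this article purely for illustration; it is not a number Verint supplied.

```python
import math


def servers_needed(cameras, cameras_per_server):
    """Smallest whole number of servers that can handle the given cameras."""
    return math.ceil(cameras / cameras_per_server)


analytic_cameras = 200  # illustrative figure, not from Verint
for label, capacity in [("in-camera processing", 60),
                        ("recorder-based, 10 per server", 10),
                        ("recorder-based, 5 per server", 5)]:
    print(label, servers_needed(analytic_cameras, capacity))
```

Under these assumptions, edge processing needs 4 servers where recorder-based processing needs 20 to 40.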
Processing analytics on a chip in the camera increases power consumption and heat, contends Vidient's Goldberg.
“On the other hand, it stands to reason that analytics will ultimately end up in the camera. But for the next year or two, we believe the answer is an appliance like ours, which sits in between the cameras and the digital or network video recorder,” he says.
The Vidient solution can handle video from four cameras and run analytics for 50 different Vidient behaviors.
Still another solution is available from 3VR. “Our system is both a video management system and an intelligent video system in the same product,” Ross says. “You don't have to buy separate digital or network video recorders. All that functionality comes with our product.”
Perhaps more important, 3VR includes all of its video analytics — OCR, motion analysis, object analysis, face recognition and camera tampering — in the recorder but only charges about what recorder manufacturers charge.
With so many different packages, users will need to spend time determining which video analytics package will produce the best balance of quality and economy.
The ROI story
When implementing a video content analysis product, the return on investment (ROI) calculation will prove less challenging than usual. Unlike traditional security technologies that must, by their nature, attempt to solve problems after the fact with time-consuming investigations, video content analysis can spot problems as they occur and often prevent them from occurring.
“3VR focuses on high loss problems,” Ross says. “Organized check fraud is a $31 billion a year problem for banks. Generally it is carried out with large criminal gangs. We have built a comprehensive set of (facial recognition) features around check fraud for banks. We tag the face of every customer with a biometric signature that enables a search across bank branches. And we store the face images — which take up only a fraction of the storage space of video — for two years in a searchable database.”
When an individual in the database attempts to cash a check, an alert orders the teller to refuse the check and to contact authorities. In that way, the system prevents check fraud, cuts losses and pays for itself.
Pushing the potential ROI further, 3VR acquired Amcrin Corp. last March. Amcrin provides a searchable database called CrimeDex, which contains records of about 10,000 suspects in crimes committed against banks and retail stores. The database shares information with other businesses as well as the police. 3VR is integrating CrimeDex into its services to expand its search capabilities beyond individual banks.
Retailers are also beginning to count on ROI from video content analysis. Retail losses due to shoplifting, store room theft, return desk fraud and other crimes total approximately $40 billion a year. Experts estimate that as much as half of that stems from employee theft.
“Up until now, the people working on loss prevention in retail have not cared about video,” says Ed Troha, director of marketing and communications for ObjectVideo. “They build statistical cases by looking for cash registers with suspicious activity. But with video analytics, you can see the guy that is ripping you off the first time he tries.”
Troha offers two examples: First, an employee registers a cash return transaction. Cash comes out of the till. But where does it go? If a transaction occurs while no customer is standing across from the cash register, security should respond. Analytics can spot such an event and set off an alarm.
A second kind of internal retail fraud occurs when a cashier accepts money from a customer for, say, a $1,000 computer. As the customer walks away, the cashier can hit the “no sale” button, void the transaction and remove the cash from the drawer. Video content analysis will see that no customer was present when the transaction was voided and set off an alarm.
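Both of Troha's examples come down to matching register events against video-derived presence data. The sketch below assumes hypothetical register IDs, transaction kinds and time values (seconds since store opening); real point-of-sale integrations will differ.

```python
def customer_at_register(presence, register, time):
    """True if any video-derived presence interval covers `time`."""
    return any(start <= time <= end for start, end in presence.get(register, []))


def suspicious_transactions(transactions, presence, watched=("cash_return", "void")):
    """Return watched transactions that occurred with no customer present."""
    return [t for t in transactions
            if t["kind"] in watched
            and not customer_at_register(presence, t["register"], t["time"])]


# Intervals during which analytics saw a customer at each register.
presence = {"reg-1": [(100, 160)], "reg-2": [(300, 420)]}
transactions = [
    {"id": "T1", "register": "reg-1", "kind": "cash_return", "time": 130},  # customer present
    {"id": "T2", "register": "reg-1", "kind": "cash_return", "time": 500},  # nobody there
    {"id": "T3", "register": "reg-2", "kind": "void", "time": 450},         # customer has left
    {"id": "T4", "register": "reg-2", "kind": "sale", "time": 600},         # not a watched kind
]

flagged = [t["id"] for t in suspicious_transactions(transactions, presence)]
print(flagged)  # ['T2', 'T3']
```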
There are other forms of ROI. For example, video analytics can enhance existing systems and eliminate the need for new cash expenditures. “What if you want to add motion detectors to a fence?” Verint's Johnson asks. “If you already have cameras watching the fence line, why not use video analytics to put up a virtual tripwire? That way you can use assets that you already own to provide the same level of functionality.”
ROI in such a case would be particularly large if the company already had a video content analysis system that could simply be tweaked to cover the fence.
Video analytics can also help to cut security officer payrolls or free officers who have been monitoring video for reassignment.
Analytics can detect the shapes of people going in the wrong direction and tailgating through controlled doors. Analytics can also be used to count people on each floor of a building, Troha explains. “Now you can create a program to adjust the HVAC system to match.” That's a security tool that performs like a business investment.
The Exploding Market For Video Analytics
Video analytics technology is creating a massive new market. “The market for video analytics is poised to explode,” says Dilip Sarangan, a research analyst at Frost & Sullivan in Palo Alto, Calif. “The explosive nature of the market is tied to the increased need for more proactive surveillance, the elimination of human error, the convergence of physical and electronic systems and increased scalability.”
According to Sarangan, the market will grow to more than $400 million by 2012, up from about $60 million in 2005. That's almost a seven-fold increase.
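The growth figures can be checked with a few lines of arithmetic. The compound annual rate is derived here from the two rounded endpoints and is not a number Frost & Sullivan supplied, so read it as a rough approximation.

```python
def fold_increase(start, end):
    """Ratio of the ending value to the starting value."""
    return end / start


def compound_annual_growth(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


fold = fold_increase(60, 400)  # millions of dollars, 2005 and 2012
cagr = compound_annual_growth(60, 400, 2012 - 2005)
print(f"{fold:.2f}x overall, about {cagr:.0%} per year")
```

The ratio works out to roughly 6.7, which matches "almost seven-fold," and implies growth of roughly 31 percent a year.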
Dr. Alan J. Lipton, chief technology officer and director of R&D with ObjectVideo Inc., Reston, Va., agrees. “In terms of market adoption, I would say it is early days,” he says. “But it has already become table stakes in the world of government security and quasi-government security. Most RFPs today include specifications for analytics. And now, for the first time, we're seeing analytics becoming a requirement in the commercial world.”
© 2009 Penton Media Inc.