Audio-visual AI — Machine-learning System Tackles Speech & Objects Recognition

By Rob Matheson, MIT News Office

Model learns to pick out objects within an image, using spoken descriptions.

MIT computer scientists have developed a system that learns to identify objects within an image, based on a spoken description of the image. Given an image and an audio caption, the model will highlight in real time the relevant regions of the image being described.

Unlike current speech-recognition technologies, the model doesn’t require manual transcriptions and annotations of the examples it’s trained on. Instead, it learns words directly from recorded speech clips and objects in raw images and associates them with one another.

The model can currently recognize only several hundred different words and object types. But the researchers hope that one day their combined speech-object recognition technique could save countless hours of manual labor and open new doors in speech and image recognition.

Speech-recognition systems such as Siri and Google Voice, for instance, require transcriptions of many thousands of hours of speech recordings. Using these data, the systems learn to map speech signals to specific words. Such an approach becomes especially problematic when, say, new terms enter our lexicon, and the systems must be retrained.

“We wanted to do speech recognition in a way that’s more natural, leveraging additional signals and information that humans have the benefit of using, but that machine learning algorithms don’t typically have access to. We got the idea of training a model in a manner similar to walking a child through the world and narrating what you’re seeing,” says David Harwath, a researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Spoken Language Systems Group. Harwath co-authored a paper describing the model that was presented at the recent European Conference on Computer Vision.

In the paper, the researchers demonstrate their model on an image of a young girl with blonde hair and blue eyes, wearing a blue dress, with a white lighthouse with a red roof in the background. The model learned which pixels in the image corresponded to the words “girl,” “blonde hair,” “blue eyes,” “blue dress,” “white lighthouse,” and “red roof.” When an audio caption was narrated, the model then highlighted each of those objects in the image as it was described.

One promising application is learning translations between different languages, without the need for a bilingual annotator. Of the estimated 7,000 languages spoken worldwide, only 100 or so have enough transcription data for speech recognition. Consider, however, a situation where two different-language speakers describe the same image. If the model learns speech signals from language A that correspond to objects in the image and learns the signals in language B that correspond to those same objects, it could assume those two signals — and matching words — are translations of one another.

“There’s potential there for a Babel Fish-type of mechanism,” Harwath says, referring to the fictitious living earpiece in the “Hitchhiker’s Guide to the Galaxy” novels that translates different languages for the wearer.
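The translation idea can be pictured as a lookup through shared objects. The sketch below is only an illustration of that reasoning, not the researchers' method: the text strings stand in for learned speech signals, and the word lists, object labels, and the function name `pair_by_shared_object` are all made-up placeholders.

```python
# Hypothetical sketch: pair words across two languages through shared visual groundings.
# The words and object labels are invented placeholders, not data from the study.

def pair_by_shared_object(groundings_a, groundings_b):
    """Map each language-A word to the language-B words grounded in the same object."""
    by_object = {}
    for word_b, obj in groundings_b.items():
        by_object.setdefault(obj, []).append(word_b)
    return {word_a: by_object.get(obj, []) for word_a, obj in groundings_a.items()}

# Each dictionary maps a spoken word to the object it was associated with.
language_a = {"lighthouse": "obj_lighthouse", "dress": "obj_dress"}
language_b = {"faro": "obj_lighthouse", "vestido": "obj_dress", "perro": "obj_dog"}

candidates = pair_by_shared_object(language_a, language_b)
print(candidates)
```

Words with no shared object in the other language, such as the placeholder "perro" here, simply produce no candidate pairing.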

The CSAIL co-authors are: graduate student Adria Recasens; visiting student Didac Suris; former researcher Galen Chuang; Antonio Torralba, a professor of electrical engineering and computer science who also heads the MIT-IBM Watson AI Lab; and Senior Research Scientist James Glass, who leads the Spoken Language Systems Group at CSAIL.

Audio-visual Associations


This work expands on an earlier model developed by Harwath, Glass, and Torralba that correlates speech with groups of thematically related images. In the earlier research, they put images of scenes from a classification database on the crowdsourcing Mechanical Turk platform. They then had people describe the images as if they were narrating to a child, for about 10 seconds. They compiled more than 200,000 pairs of images and audio captions, in hundreds of different categories, such as beaches, shopping malls, city streets, and bedrooms.

They then designed a model consisting of two separate convolutional neural networks (CNNs). One processes images, and one processes spectrograms, a visual representation of audio signals as they vary over time. The highest layer of the model combines the outputs of the two networks and maps the speech patterns to the image data.

The researchers would, for instance, feed the model caption A and image A, which is correct. Then, they would feed it a random caption B with image A, which is an incorrect pairing. After comparing thousands of wrong captions with image A, the model learns the speech signals corresponding with image A, and associates those signals with words in the captions. As described in a 2016 study, the model learned, for instance, to pick out the signal corresponding to the word “water,” and to retrieve images with bodies of water.
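As a rough illustration of this matched-versus-mismatched training signal, the sketch below uses a margin-based ranking loss, a common choice for objectives of this kind. The article does not say which loss the researchers used, so the dot-product similarity, the margin of 1.0, and the toy two-dimensional vectors are all assumptions.

```python
# Hypothetical sketch: margin-based ranking loss over matched and mismatched pairs.
# The similarity function, margin, and toy embeddings are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ranking_loss(image_vec, true_caption_vec, wrong_caption_vecs, margin=1.0):
    """Penalize each wrong caption that scores within `margin` of the true one."""
    true_score = dot(image_vec, true_caption_vec)
    loss = 0.0
    for wrong in wrong_caption_vecs:
        loss += max(0.0, margin - true_score + dot(image_vec, wrong))
    return loss

image_a = [1.0, 0.0]
caption_a = [2.0, 0.0]  # the correct caption for image A
caption_b = [0.0, 3.0]  # a random caption that is clearly unrelated
caption_c = [1.5, 0.0]  # a wrong caption that is easy to confuse with A

loss_easy = ranking_loss(image_a, caption_a, [caption_b])
loss_hard = ranking_loss(image_a, caption_a, [caption_c])
print(loss_easy, loss_hard)
```

In this toy setup the unrelated caption contributes no loss, while the confusable one does, so training pressure concentrates on the pairings the model still gets nearly wrong.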

“But it didn’t provide a way to say, ‘This is an exact point in time that somebody said a specific word that refers to that specific patch of pixels,’” Harwath says.

Making a "Matchmap"


In the new paper, the researchers modified the model to associate specific words with specific patches of pixels. The researchers trained the model on the same database, but with a new total of 400,000 image-caption pairs. They held out 1,000 random pairs for testing.

In training, the model is similarly given correct and incorrect images and captions. But this time, the image-analyzing CNN divides the image into a grid of cells consisting of patches of pixels. The audio-analyzing CNN divides the spectrogram into segments of, say, one second to capture a word or two.

With the correct image and caption pair, the model matches the first cell of the grid to the first segment of audio, then matches that same cell with the second segment of audio, and so on, all the way through each grid cell and across all time segments. For each cell and audio segment, it provides a similarity score, depending on how closely the signal corresponds to the object.
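The cell-by-segment scoring described above can be sketched as a small table of similarities. This is a minimal illustration under stated assumptions: dot-product similarity, hand-picked two-dimensional vectors, and the helper names `matchmap` and `best_cell_per_segment` are hypothetical and not taken from the researchers' code.

```python
# Hypothetical sketch of a grid-cell-by-audio-segment score table.
# Dot-product similarity and the toy 2-D embeddings are assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matchmap(cell_vecs, segment_vecs):
    """Return scores[c][t]: similarity of grid cell c to audio segment t."""
    return [[dot(cell, seg) for seg in segment_vecs] for cell in cell_vecs]

def best_cell_per_segment(scores):
    """For each audio segment, return the index of the highest-scoring grid cell."""
    n_cells = len(scores)
    n_segments = len(scores[0])
    return [max(range(n_cells), key=lambda c: scores[c][t]) for t in range(n_segments)]

# Four grid cells (a 2x2 image grid, flattened) and two audio segments.
cells = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
segments = [[0.0, 2.0], [2.0, 0.0]]

scores = matchmap(cells, segments)
highlighted = best_cell_per_segment(scores)
print(scores)
print(highlighted)
```

Picking the top-scoring cell for each segment is one simple way to turn the table into a highlight; the article does not describe how the researchers render their highlights.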

The challenge is that, during training, the model doesn’t have access to any true alignment information between the speech and the image. “The biggest contribution of the paper,” Harwath says, “is demonstrating that these cross-modal [audio and visual] alignments can be inferred automatically by simply teaching the network which images and captions belong together and which pairs don’t.”

The authors dub this automatically learned association between a spoken caption’s waveform and the image pixels a “matchmap.” After training on thousands of image-caption pairs, the network narrows down those alignments to specific words representing specific objects in that matchmap.

“It’s kind of like the Big Bang, where the matter was really dispersed, but then coalesced into planets and stars,” Harwath says. “Predictions start dispersed everywhere but, as you go through training, they converge into an alignment that represents meaningful semantic groundings between spoken words and visual objects.”

Reprinted with permission of MIT News.