
Monday, October 5, 2020

The 16 best movies to watch on Amazon Prime Video - CNET

Don't know what to watch on Amazon tonight? Let's round up some of its best gems.

from CNET News https://ift.tt/3lbN0oQ
via A.I. Kung Fu

Uni, a new startup by PayU co-founder Nitin Gupta, announces $18.5M seed round led by Lightspeed and Accel for "building the modern age consumer credit card" (Manish Singh/TechCrunch)

Manish Singh / TechCrunch:
Uni, a new startup by PayU co-founder Nitin Gupta, announces $18.5M seed round led by Lightspeed and Accel for “building the modern age consumer credit card”  —  Even as close to a billion debit cards are in use in India today, only about 58 million credit cards are in circulation in the world's second most populous nation.



from Techmeme https://ift.tt/3iE0SGJ
via A.I. Kung Fu

Researcher says that Macs with T2 chips are vulnerable to a variant of the checkm8 exploit, which could jailbreak certain iPhones and was unpatchable (Catalin Cimpanu/ZDNet)

Catalin Cimpanu / ZDNet:
Researcher says that Macs with T2 chips are vulnerable to a variant of the checkm8 exploit, which could jailbreak certain iPhones and was unpatchable  —  Jailbreak involves combining last year's checkm8 exploit with the Blackbird vulnerability disclosed this August.



from Techmeme https://ift.tt/3ivVnKt
via A.I. Kung Fu

Plan2Explore: Active Model-Building for Self-Supervised Visual Reinforcement Learning


To operate successfully in unstructured open-world environments, autonomous intelligent agents need to solve many different tasks and learn new tasks quickly. Reinforcement learning has enabled artificial agents to solve complex tasks both in simulation and in the real world. However, it requires collecting large amounts of experience in the environment, and the agent learns only that particular task, much like a student memorizing a lecture without understanding it. Self-supervised reinforcement learning has emerged as an alternative in which the agent follows only an intrinsic objective that is independent of any individual task, analogous to unsupervised representation learning. After experimenting with the environment without supervision, the agent builds an understanding of the environment that enables it to adapt to specific downstream tasks more efficiently.

In this post, we explain our recent publication that develops Plan2Explore. While many recent papers on self-supervised reinforcement learning have focused on model-free agents that can only capture knowledge by remembering behaviors practiced during self-supervision, our agent learns an internal world model that lets it extrapolate beyond memorized facts by predicting what will happen as a consequence of different potential actions. The world model captures general knowledge, allowing Plan2Explore to quickly solve new tasks through planning in its own imagination. In contrast to prior model-free work, the world model further enables the agent to explore what it expects to be novel, rather than repeating what it found novel in the past. Plan2Explore obtains state-of-the-art zero-shot and few-shot performance on continuous control benchmarks with high-dimensional image inputs. To make it easy to experiment with our agent, we are open-sourcing the complete source code.

How does Plan2Explore work?

At a high level, Plan2Explore works by training a world model, exploring to maximize the information gain for the world model, and using the world model at test time to solve new tasks (see figure above). Thanks to effective exploration, the learned world model is general and captures information that can be used to solve multiple new tasks with few or no additional environment interactions. We discuss each part of the Plan2Explore algorithm individually below. We assume a basic understanding of reinforcement learning in this post.
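
Before going through the components, the following deliberately tiny sketch shows the overall recipe end to end. Everything in it, from the toy 1D environment to the linear model and the random-shooting planner, is an illustrative stand-in rather than the actual open-source implementation, and the random exploration is a placeholder for the information-seeking policy developed below.

```python
import numpy as np

rng = np.random.default_rng(0)

def env_step(s, a):
    # Toy dynamics, unknown to the agent, which must learn them from data.
    return 0.9 * s + a + 0.05 * rng.normal()

# Phases 1+2: explore the environment and fit a world model to the collected data.
data, s = [], 0.0
for _ in range(200):
    a = rng.uniform(-1, 1)                 # placeholder for the exploration policy
    s_next = env_step(s, a)
    data.append((s, a, s_next))
    s = s_next

X = np.array([[si, ai, 1.0] for si, ai, _ in data])
y = np.array([sn for _, _, sn in data])
w, *_ = np.linalg.lstsq(X, y, rcond=None)  # learned model: s' ≈ w[0]*s + w[1]*a + w[2]

# Phase 3: a new task arrives (reach s = 2.0); plan inside the model, zero-shot.
def imagine(s0, actions):
    s = s0
    for a in actions:
        s = w[0] * s + w[1] * a + w[2]     # model prediction, no environment calls
    return s

candidates = rng.uniform(-1, 1, size=(256, 10))
best = min(candidates, key=lambda acts: (imagine(0.0, acts) - 2.0) ** 2)
```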

Learning the world model

Plan2Explore learns a world model that predicts future outcomes given past observations $o_{1:t}$ and actions $a_{1:t}$. To handle high-dimensional image observations, we encode them into lower-dimensional features $h$ and use a recurrent state-space model (RSSM) that predicts forward in a compact latent state space $s$. The latent state aggregates information from past observations and is trained for future prediction, using a variational objective that reconstructs future observations. Since the latent state learns to represent the observations, during planning we can predict entirely in the latent space without decoding the images themselves. The figure below shows our latent prediction architecture.
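
As a rough illustration of the latent dynamics described above, the snippet below wires up one step of an RSSM-style update with random, untrained weights. The dimensions, weight names, and the split into a prior (open-loop prediction) and a posterior (corrected by the observation features) are our own simplifications, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
O, H, D, S, A = 64, 16, 32, 8, 2     # obs, feature, deterministic, stochastic, action dims
W_enc = rng.normal(size=(H, O)) * 0.1
W_rec = rng.normal(size=(D, D + S + A)) * 0.1
W_prior = rng.normal(size=(2 * S, D)) * 0.1      # predicts mean and log-std of s_t from d_t
W_post = rng.normal(size=(2 * S, D + H)) * 0.1   # corrects the prior using the features h_t

def rssm_step(d_prev, s_prev, a_prev, obs):
    h = np.tanh(W_enc @ obs)                                    # encode the image observation
    d = np.tanh(W_rec @ np.concatenate([d_prev, s_prev, a_prev]))
    prior_mean, prior_logstd = np.split(W_prior @ d, 2)         # open-loop prediction
    post_mean, post_logstd = np.split(W_post @ np.concatenate([d, h]), 2)
    s = post_mean + np.exp(post_logstd) * rng.normal(size=S)    # sampled stochastic state
    return d, s, (prior_mean, prior_logstd), (post_mean, post_logstd)

# During planning we iterate with the prior alone, so imagined rollouts never
# need to decode (or even see) images.
d, s, a = np.zeros(D), np.zeros(S), np.zeros(A)
d, s, prior, post = rssm_step(d, s, a, rng.normal(size=O))
```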


A novelty metric for active model-building

To learn an accurate and general world model, we need an exploration strategy that collects new and informative data. To achieve this, Plan2Explore uses a novelty metric derived from the model itself: the expected information gained about the environment upon observing the new data. As the figure below shows, this is approximated by the disagreement of an ensemble of $K$ latent models. Intuitively, large latent disagreement reflects high model uncertainty, and obtaining the data point would reduce this uncertainty. By maximizing latent disagreement, Plan2Explore selects actions that lead to the largest information gain, thereby improving the model as quickly as possible.
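
A minimal sketch of this disagreement signal, assuming an ensemble of one-step predictors with random stand-in weights: the novelty of $(s, a)$ is the variance of the ensemble's predictions, averaged over feature dimensions. The actual agent trains these predictors on the collected data; here they are untrained for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
K, S, A = 5, 8, 2
ensemble = [rng.normal(size=(S, S + A)) * 0.1 for _ in range(K)]   # untrained stand-ins

def novelty(s, a):
    x = np.concatenate([s, a])
    preds = np.stack([np.tanh(W @ x) for W in ensemble])   # (K, S) one-step predictions
    return preds.var(axis=0).mean()   # disagreement: variance across ensemble members

s, a = rng.normal(size=S), rng.uniform(-1, 1, size=A)
print(novelty(s, a))   # larger values indicate more expected information gain
```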


Planning for future novelty

To effectively maximize novelty, we need to know which parts of the environment are still unexplored. Most prior work on self-supervised exploration used model-free methods that reinforce past behavior that resulted in novel experience. This makes these methods slow to explore: since they can only repeat exploration behavior that was successful in the past, they are unlikely to stumble onto something new. In contrast, Plan2Explore plans for expected novelty by measuring the model uncertainty of imagined future outcomes. By seeking trajectories with the highest uncertainty, Plan2Explore explores exactly the parts of the environment that were previously unknown.

To choose actions $a$ that optimize the exploration objective, Plan2Explore leverages the learned world model as shown in the figure below. The actions are selected to maximize the expected novelty of the entire future sequence $s_{t:T}$, using imagined rollouts of the world model to estimate the novelty. To solve this optimization problem, we use the Dreamer agent, which learns a policy $\pi_\phi$ using a value function and analytic gradients through the model. The policy is learned completely inside the imagination of the world model. During exploration, this imagination training ensures that our exploration policy is always up-to-date with the current world model and collects data that are still novel. The figure below shows the imagination training process.
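
The sketch below illustrates the idea of scoring imagined futures by their expected novelty. Note that the real agent optimizes a policy with Dreamer's value function and analytic gradients through the model; the random-shooting search here is a deliberately simple stand-in for that optimizer, and all weights are untrained placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
K, S, A, T = 5, 8, 2, 10
W_dyn = rng.normal(size=(S, S + A)) * 0.1                          # latent dynamics stand-in
ensemble = [rng.normal(size=(S, S + A)) * 0.1 for _ in range(K)]   # untrained ensemble

def disagreement(s, a):
    x = np.concatenate([s, a])
    preds = np.stack([np.tanh(W @ x) for W in ensemble])
    return preds.var(axis=0).mean()

def expected_novelty(s0, actions):
    s, total = s0, 0.0
    for a in actions:
        total += disagreement(s, a)                     # novelty of visiting (s, a)
        s = np.tanh(W_dyn @ np.concatenate([s, a]))     # imagined next latent state
    return total

s0 = np.zeros(S)
plans = rng.uniform(-1, 1, size=(128, T, A))
best_plan = max(plans, key=lambda acts: expected_novelty(s0, acts))
```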


Evaluation of curiosity-driven exploration behavior

We evaluate Plan2Explore on the DeepMind Control Suite, which features 20 tasks requiring different control skills, such as locomotion, balancing, and simple object manipulation. The agent has access only to image observations and no proprioceptive information. Unlike random exploration, which fails to take the agent far from its initial position, Plan2Explore produces diverse movement strategies such as jumping, running, and flipping, as shown in the figure below. Later, we will see that these are effective practice episodes that enable the agent to quickly learn to solve various continuous control tasks.



Evaluation of downstream task performance

Once an accurate and general world model is learned, we test Plan2Explore on previously unseen tasks. Given a task specified with a reward function, we use the model to optimize a policy for that task. Similar to our exploration procedure, we optimize a new value function and a new policy head for the downstream task. This optimization uses only predictions imagined by the model, enabling Plan2Explore to solve new downstream tasks in a zero-shot manner without any additional interaction with the world.
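
A minimal sketch of this zero-shot adaptation, under the same simplifications as above: once the latent dynamics are learned, a new task is just a reward function evaluated on imagined latent states, and behavior can be optimized without touching the environment. The reward function and random-shooting search below are hypothetical stand-ins for the task reward and the learned policy and value heads.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, T = 8, 2, 15
W_dyn = rng.normal(size=(S, S + A)) * 0.1      # stand-in for the learned latent dynamics

def task_reward(s):
    # Hypothetical downstream task: maximize the first latent feature.
    return s[0]

def imagined_return(s0, actions):
    s, ret = s0, 0.0
    for a in actions:
        s = np.tanh(W_dyn @ np.concatenate([s, a]))   # imagined next latent state
        ret += task_reward(s)                         # reward queried on imagined states only
    return ret

s0 = np.zeros(S)
plans = rng.uniform(-1, 1, size=(256, T, A))
best_plan = max(plans, key=lambda acts: imagined_return(s0, acts))
```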

The following plot shows the performance of Plan2Explore on tasks from the DeepMind Control Suite. For the first 1 million environment steps, the agent does not know the task and simply explores. The agent solves the task as soon as it is provided at 1 million steps, and continues to improve rapidly in the few-shot regime thereafter.


Plan2Explore is able to solve most of the tasks we benchmarked. Since prior work on self-supervised reinforcement learning used model-free agents that cannot adapt in a zero-shot manner (such as ICM), or did not use image observations, we compare to this prior work by adapting it to our model-based Plan2Explore setup. Our latent disagreement objective outperforms other previously proposed objectives. More interestingly, the final performance of Plan2Explore is comparable to that of a state-of-the-art oracle agent that requires task rewards throughout training. In our paper, we further report the performance of Plan2Explore in the zero-shot setting, where the agent needs to solve the task before any task-oriented practice.

Future directions

Plan2Explore demonstrates that effective behavior can be learned through self-supervised exploration only. This opens multiple avenues for future research:

  • First, to apply self-supervised RL to a variety of settings, future work will investigate different ways of specifying the task and deriving behavior from the world model. For example, the task could be specified with a demonstration, a description of the desired goal state, or an instruction communicated to the agent in natural language.

  • Second, while Plan2Explore is completely self-supervised, in many cases a weak supervision signal is available, such as in hard exploration games, human-in-the-loop learning, or real life. In such a semi-supervised setting, it is interesting to investigate how weak supervision can be used to steer exploration towards the relevant parts of the environment.

  • Finally, Plan2Explore has the potential to improve the data efficiency of real-world robotic systems, where exploration is costly and time-consuming, and the final task is often unknown in advance.

By designing a scalable way of planning to explore in unstructured environments with visual observations, Plan2Explore provides an important step toward self-supervised intelligent machines.


We would like to thank Georgios Georgakis and the editors of the CMU and BAIR blogs for their useful feedback.

This post is based on the following paper:

  • Planning to Explore via Self-Supervised World Models
    Ramanan Sekar*, Oleh Rybkin*, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak
Thirty-seventh International Conference on Machine Learning (ICML), 2020.
    arXiv, Project Website


from The Berkeley Artificial Intelligence Research Blog https://ift.tt/30Bm4Hc
via A.I. Kung Fu

Apple pulls rival earphones from store ahead of expected launch - CNET

Apple is reportedly working on several versions of over-ear headphones.

from CNET News https://ift.tt/34pFaBv
via A.I. Kung Fu

FAA, EU finish Boeing 737 Max recertification flights - CNET

After two crashes killed 346 people, the Boeing 737 Max is getting close to carrying passengers again. Plus: Everything you need to know about the plane's other issues.

from CNET News https://ift.tt/36vfNOs
via A.I. Kung Fu

The best movies to watch on Disney Plus for Thanksgiving - CNET

Stuffed with turkey and ready for a wholesome movie? Here are options for everyone.

from CNET News https://ift.tt/3d9C0G0
via A.I. Kung Fu

A Literal Child and His Mom Sue Nintendo Over ‘Joy-Con Drift’

The class action lawsuit alleges that the video game company hasn't done enough to address a known problem with its controllers.

from Wired https://ift.tt/34pSwgY
via A.I. Kung Fu

Spotify updates its iOS and Android apps to let users search for songs by lyrics, a feature Apple Music has had since 2018 (Michael Potuck/9to5Mac)

Michael Potuck / 9to5Mac:
Spotify updates its iOS and Android apps to let users search for songs by lyrics, a feature Apple Music has had since 2018  —  Spotify has rolled out a useful new feature today for iOS and Android that allows users to search for songs by their lyrics, something Apple Music users have enjoyed for a couple of years.



from Techmeme https://ift.tt/36CBAGW
via A.I. Kung Fu

Kaspersky researchers spot malware embedded in UEFI firmware on motherboards of victims' devices, affecting diplomats working on issues related to North Korea (Andy Greenberg/Wired)

Andy Greenberg / Wired:
Kaspersky researchers spot malware embedded in UEFI firmware on motherboards of victims' devices, affecting diplomats working on issues related to North Korea  —  The tool attacks a device's UEFI firmware—which makes it especially hard to detect and destroy.



from Techmeme https://ift.tt/3iALtar
via A.I. Kung Fu

Boom! Hacked page on mobile phone website is stealing customers’ card data


If you’re in the market for a new mobile phone plan, it’s best to avoid turning to Boom! Mobile. That is, unless you don’t mind your sensitive payment card data being sent to criminals in an attack that was still ongoing as of this writing.

According to researchers from security firm Malwarebytes, Boom! Mobile’s boom.us website is infected with a malicious script that skims payment card data and sends it to a server under the control of a criminal group researchers have dubbed Fullz House. The malicious script is called by a single line that looks like mostly nonsense characters to the human eye.


When decoded from Base64 format, the line translates to: paypal-debit[.]com/cdn/ga.js. The JavaScript code ga.js masquerades as a Google Analytics script at one of the many fraudulent domains operated by Fullz House members.
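
For readers unfamiliar with the technique, this is the decoding step an analyst would perform. The snippet round-trips a defanged copy of the reported URL rather than reproducing the real loader string, which we do not have; it only illustrates the workflow.

```python
import base64

# Hypothetical illustration: encode a defanged copy of the reported URL,
# then decode it the way an analyst would decode the obfuscated loader line.
blob = base64.b64encode(b"paypal-debit[.]com/cdn/ga.js")
print(base64.b64decode(blob).decode())   # -> paypal-debit[.]com/cdn/ga.js
```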




from Biz & IT – Ars Technica https://ift.tt/2GCo6jx
via A.I. Kung Fu

Game of Thrones prequel House of the Dragon gets leading man - CNET

Paddy Considine has been cast as Viserys Targaryen in the Thrones prequel.

from CNET News https://ift.tt/33uOogu
via A.I. Kung Fu

The time has come to get this charming 6-sided digital timer - CNET

Roll it to start a preset timer or customize the exact countdown you need.

from CNET News https://ift.tt/3njy5uT
via A.I. Kung Fu

2021 Jaguar XF gets updated infotainment tech and a few styling tweaks - Roadshow

This luxury sedan features more technology and greater luxury than ever before.

from CNET News https://ift.tt/36zYWg5
via A.I. Kung Fu

Jaguar XF Sportbrake discontinued in US for 2021 - Roadshow

While the Jaguar XF sedan gets a number of updates for 2021, its rakish wagon variant gets the axe.

from CNET News https://ift.tt/3jutgwp
via A.I. Kung Fu

Chris Hemsworth helps reintroduce Tasmanian Devils to Australia for first time in 3,000 years - CNET

Marvel actor Chris Hemsworth helps release a group of Tasmanian devils back onto Australia's mainland.

from CNET News https://ift.tt/34ps873
via A.I. Kung Fu

The 16 best TV shows to watch on Amazon Prime Video - CNET

Looking for a great show to watch tonight? Let's round up Amazon's best gems.

from CNET News https://ift.tt/30BB7B3
via A.I. Kung Fu

How Facebook and Twitter Handled Trump’s ‘Don’t Be Afraid of Covid’ Post

Medical experts said the president’s message downplayed the dangers of the coronavirus. But it fell into a gray area for the social media platforms.

from NYT > Technology https://ift.tt/3iDf82C
via A.I. Kung Fu

Apple removed headphones and speakers from Bose, Logitech, and Sonos from its online store at the end of Sept., asked retail employees to do the same (Mark Gurman/Bloomberg)

Mark Gurman / Bloomberg:
Apple removed headphones and speakers from Bose, Logitech, and Sonos from its online store at the end of Sept., asked retail employees to do the same  —  Apple is working on its first over-ear headphones and a smaller HomePod; products from Sonos, Bose, and Logitech have been pulled from the online store.



from Techmeme https://ift.tt/34qLnNA
via A.I. Kung Fu