Research Project Report

So, what is Amazon Go?

Amazon Go is a chain of convenience stores in the US operated by Amazon. As of May 2019, it has stores in four locations: Chicago, Seattle, San Francisco and New York. We are used to grocery and convenience stores with checkout queues, where we pay for the items we have picked up. Amazon Go is a revolutionary concept in which there is no need for checkout queues.

Amazon Go offers a range of products, from ready-to-eat meals, snacks and juices to food freshly made in the store. Unlike in other stores, there are no checkout lines at all.

Here are some questions one might have about Amazon Go:

Are there really no support staff working at the Amazon Go store?
If there are no checkout lines, how does the customer know the amount charged?
What is the motivation behind the concept of a convenience store without any checkout lines?
What were the problems encountered while building the technology?
How reliable is the technology? Can it give accurate results when the store is crowded?

Support Staff: There are staff at the Go store: a representative to greet customers, one in the drinks area to check IDs, and chefs who prepare fresh food right in the store. The rest is taken care of by the technology, which Amazon calls "Just Walk Out Technology".

Payment: The customer's Amazon account is charged, and the customer can see the receipt after leaving the store. How this works is described in more detail in the upcoming sections.

A customer needs to have the Amazon Go app installed on their phone to enter the store. The app shows a unique QR code for each customer, which is scanned at the turnstiles at the entrance. The final receipt is sent to the customer through the app once they leave the store with their items. It also displays the number of seconds spent inside.

Motivation: There are several motivations behind the idea of Amazon Go:

Zero-Click purchases: Customer attention is a priority for Amazon, and it has been consistent in this with zero-click products like Alexa and now the Go stores.

Rounding Out the Data: Customer preference is important for Amazon. Using customer data, Amazon started selecting items based on customers' tastes and delivering those items to them.

Creating a single retail experience: Amazon was previously unsuccessful in departments like groceries, clothes and electronics for the simple reason that people want to inspect something in person before putting their money into it. For example, they want to look at groceries like meat and try on clothes before buying them, while still being able to order certain things online. With the Go stores, Amazon offers both online shopping and an in-person shopping experience.

Shipping: Amazon Go stores can function as return points for online orders. People can stop by to return items they have ordered from Amazon and grab a sandwich while they are there.

Problems faced:

Sensor Fusion: Signals are aggregated across different sensors, including the pictures taken by the cameras in the store. The fused images are given as input to a computer vision model, which returns a decision on whether a customer has taken a product or put it back.

Calibration: Each camera has to know its exact location in the store.
Person detection: Track and identify each person in the store.
Object Recognition: Differentiate items from one another.
Pose Estimation: Detect what the customer is doing near the shelf.
Activity Analysis: Determine whether a person has picked up vs. returned an item.

Reliability: Amazon Go has received positive reviews regarding reliability in most of its stores so far. However, there were reportedly problems in the beta version of the store, which was open only to Amazon employees, so the opening of the next few stores had to be delayed.

SECTION 2: HOW IT WORKS

In this section, we will walk through the concepts and technologies used from the stage where the customer enters the store until he exits the store.

Stage 1: Entering the store:

The components in this stage are:

Amazon created a mobile app that displays a QR code to be scanned when you enter the store. A lot of testing went into this stage, including UI/UX testing with groups of people scanning their phones, scanning with the phone facing up or down, etc.

The association system compares your likeness in the video and links your account when you scan the code at the entrance.

A session is created based on the association.

Problems the team faced in this stage were:

Some customers might scan multiple times, so the system has to delete sessions in which no items are purchased.

A comparatively difficult problem arises when a family visits the store: the purchases of all the family members must be linked to the same account. To do so, one person from the family scans the same code for every member. The system then creates a single session for all the members and charges the account of the person representing the family. By treating all the members as one group, the team could solve this problem.
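The session handling described above can be sketched as follows. This is a minimal illustration, not Amazon's implementation: the `Session` and `EntryGate` classes, the account and shopper IDs, and the rule that repeated scans of one code join the same session are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """One shopping session billed to a single account (hypothetical model)."""
    account_id: str
    shopper_ids: set = field(default_factory=set)  # tracked people in the session
    items: list = field(default_factory=list)      # products picked so far


class EntryGate:
    """Creates sessions from QR scans; names and rules are assumptions."""

    def __init__(self):
        self.sessions = {}  # account_id -> Session

    def scan(self, account_id, shopper_id):
        # Repeated scans of the same code join the existing session, so a
        # family entering on one code shares one session and one bill.
        session = self.sessions.setdefault(account_id, Session(account_id))
        session.shopper_ids.add(shopper_id)
        return session

    def close_empty_sessions(self):
        # Drop sessions with no purchases, e.g. from accidental extra scans.
        self.sessions = {a: s for a, s in self.sessions.items() if s.items}


gate = EntryGate()
gate.scan("acct-parent", "person-1")
gate.scan("acct-parent", "person-2")  # second family member, same code
gate.scan("acct-solo", "person-3")    # scans but buys nothing
gate.sessions["acct-parent"].items.append("sandwich")
gate.close_empty_sessions()
```

After the cleanup, only the family session remains, with both family members attached to the one account that will be charged.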

Stage 2: Identifying the customers:

In this stage, a customer is identified by the system with the help of a locator. It has to track the customer from the time he enters the store until he exits the store. Some of the problems the team had to face were:

Occlusion: This is when a person is blocked by something in the store. For example, a shelf blocking the view of a person or one person blocking the other.

Tangled State: This is when people are very close to each other.

To address these problems, Amazon uses custom camera hardware that captures both RGB (Red, Green, Blue) video and distance measurements. Images are first segmented into pixels, the pixels are grouped into blobs, and each blob is classified as a person or not. This classification is done using Convolutional Neural Networks. Each frame passes through a series of layers of neurons. Each layer applies filters, and each filter checks a small region of the image for a feature, like an edge or a bend, and returns a confidence value. The filters and weights are trained as more data is passed in, so the model gets smarter with repetition. Finally, a location map is built by integrating frames from all the cameras. This process is followed for each person present in the store.
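To make the "pixels are grouped into blobs" step concrete, here is a small sketch that groups adjacent foreground pixels with a flood fill. The binary mask, the 4-neighbour rule and the `MIN_PERSON_PIXELS` size filter are assumptions for illustration; in the real system the person/not-person decision is made by a CNN, which the size filter only stands in for.

```python
from collections import deque


def find_blobs(mask):
    """Group adjacent foreground pixels (1s) into blobs with a flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects every pixel connected to (r, c).
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    in_bounds = 0 <= ny < rows and 0 <= nx < cols
                    if in_bounds and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


# Toy top-down mask: 1 marks a pixel that is closer to the camera than the floor.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
blobs = find_blobs(mask)

# Assumed stand-in for the CNN classifier: keep blobs above a minimum size.
MIN_PERSON_PIXELS = 3
candidates = [b for b in blobs if len(b) >= MIN_PERSON_PIXELS]
```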

Stage 3: Linking the images of the customers across all the cameras in the store:

The next task is to ensure that labels are preserved across frames, moving from locating a customer to tracking him/her across the store. To do so, the system uses a linker. Problems in this stage were:

Disambiguating Tangled States: When two people are close together, the system marks their labels as low confidence. These labels are scheduled for re-identification and are resolved over time.

Distinguishing associates from customers: We assume that the system is trained to tell associates apart from customers by their uniform, which is distinctive.
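One simple way to picture the low-confidence marking is a distance check on the location map. The sketch below is an assumption-heavy illustration: the `TANGLE_DISTANCE` threshold, the floor coordinates and the `flag_tangled` function are invented for the example and are not taken from Amazon's system.

```python
import math

TANGLE_DISTANCE = 0.5  # metres; assumed threshold, not from the source


def flag_tangled(positions, threshold=TANGLE_DISTANCE):
    """Return the IDs of tracked people standing within `threshold` of another."""
    tangled = set()
    ids = sorted(positions)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(positions[a], positions[b]) < threshold:
                tangled.update((a, b))
    return tangled


# Floor positions (x, y) in metres for one frame of the location map.
frame = {"p1": (1.0, 1.0), "p2": (1.2, 1.1), "p3": (4.0, 3.0)}
low_confidence = flag_tangled(frame)
# Labels in `low_confidence` would be queued for re-identification later.
```

Here `p1` and `p2` are flagged because they stand about 0.22 m apart, while `p3` keeps its high-confidence label.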

Stage 4: Item Identification:

Product Identification: The question here was which items are on the shelf and which are off the shelf. To solve this, the team did the following.

Product ID identification: Items that appear to be the same, for example flavors of a drink from the same brand, have to be recognized properly by the system. To do so, Convolutional Neural Networks (CNNs), an important concept in Deep Learning, were applied. A CNN uses layers to read through the training data, which typically consists of images, in this case the images taken from different cameras across the store. To distinguish the flavors of the same brand, the team used Residual Neural Networks, which perform refined product recognition after the CNN layer identifies the product class.

Lighting and Deformation: Fluctuations in lighting might confuse the system and cause it to identify items incorrectly. To solve this, a huge amount of training data has to be generated.

Stage 5: Customer Association:

Pose Estimation: The cameras in the Go store look top-down rather than from an isometric view. There was an issue where the pixels between the customer and the item were not considered. So, the team built a stick-figure-like model of the customer from the videos. A new Deep Learning model was created to uniquely identify each customer from the video. The team used a cross-entropy loss function to detect joint points, mostly to integrate the images from all the cameras on the cloud, and a self-regression loss function to generate vectors, with pairwise vector fields to integrate all the vectors. The team at Amazon showed that this can be applied to any video clip.
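For readers unfamiliar with cross-entropy, the sketch below computes a mean binary cross-entropy between predicted joint probabilities and 0/1 labels. The exact loss formulation used by Amazon is not given in the source, so this is only a generic illustration with toy values; the per-pixel "wrist" example is an assumption.

```python
import math


def cross_entropy(predicted, target):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, t in zip(predicted, target):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(target)


# Per-pixel probability that a joint (say, a wrist) is present; toy values.
target = [0, 0, 1, 0]
good_guess = [0.1, 0.1, 0.9, 0.1]
bad_guess = [0.6, 0.5, 0.2, 0.7]

good_loss = cross_entropy(good_guess, target)
bad_loss = cross_entropy(bad_guess, target)
```

The loss is lower for `good_guess` than for `bad_guess`, which is the signal training uses to push predictions towards the labelled joint positions.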

Stage 6: Action Determination:

In this stage, the system needs to accurately account for each item picked up or put back on the shelf. One of the problems encountered was a shelf from which a bottle appears to have been removed, when an item may in fact have been put back onto the shelf. In this case, the model incorrectly assumes that an item has been removed. To solve this, the system needs to count the number of items on a shelf before deciding whether an item was removed or added, rather than judging based on the empty space.
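The counting approach can be illustrated with a before/after comparison of per-product counts. This is a minimal sketch under the assumption that the vision system already supplies item counts per shelf; the `shelf_events` function and the product names are hypothetical.

```python
from collections import Counter


def shelf_events(before, after):
    """Compare per-product counts on a shelf and report picks and returns."""
    events = []
    for product in sorted(set(before) | set(after)):
        delta = after.get(product, 0) - before.get(product, 0)
        if delta < 0:
            events.append(("picked", product, -delta))
        elif delta > 0:
            events.append(("returned", product, delta))
    return events


before = Counter({"cola": 5, "water": 3})
after = Counter({"cola": 4, "water": 4})
events = shelf_events(before, after)
```

Because the decision is based on counts, a returned bottle of water shows up as a return even if the shelf still has a visible gap where the cola was taken.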

Pose Identification:

There are a huge number of poses people can be in when picking an object off the shelf, especially when you consider multiple customers in close proximity. There is not enough training data available for the system to identify the pose of every person in the store. To solve this, the team used simulators to generate a huge number of virtual images of customers with varying physical attributes such as height, weight and hair. Simulation is a method of generating huge amounts of image data. For example, a car company that wants to automate its warehouses needs a huge number of car images to train its models. Collecting them could be laborious. Instead, using simulation, the car company can take a 3D image of one of its cars and generate millions of car images with different attributes. Simulation had a huge payoff for the team because:

The data is already annotated when it is generated, so annotating simulated data is much easier.

The team could reduce the computation time needed to generate the data.

The annotations are consistent across frames.

Simulation and cloud technology helped the team to overcome the time overhead and make rapid progress.
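The idea that simulated data arrives already annotated can be shown with a toy generator. Real simulators render 3D images; the sketch below only draws attribute records, and the attribute names, ranges and pose labels are illustrative assumptions.

```python
import random


def simulate_shopper(rng):
    """Draw one synthetic shopper; the attribute ranges are illustrative."""
    return {
        "height_cm": round(rng.uniform(150, 200), 1),
        "hair": rng.choice(["short", "long", "none"]),
        "pose": rng.choice(["reaching", "crouching", "standing"]),
    }


def generate_dataset(n, seed=0):
    """Generate n synthetic samples, each paired with its known pose label."""
    rng = random.Random(seed)  # fixed seed makes the dataset reproducible
    shoppers = (simulate_shopper(rng) for _ in range(n))
    # The label is free: the generator itself chose the pose.
    return [{"attributes": s, "label": s["pose"]} for s in shoppers]


dataset = generate_dataset(1000)
```

Since the generator chooses the pose, every sample comes with a correct label and no manual annotation pass is needed.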

Stage 7: Streaming Services:

The challenge was to get the video from the camera on to the cloud for further processing. It has the following components.

Video is captured on board, and basic preprocessing is also done on board, which reduces the bandwidth requirements.

Video is streamed on site to handle network issues, and delivery to the cloud is guaranteed.

Video servers are used on the cloud to capture and store the videos.

A key aspect of the pipeline in this stage is anomaly detection and redundancy. Anomaly detection, also known as outlier detection, is the identification of rare items in a dataset that behave differently from the rest of the data. Such anomalies matter in applications like detecting bank fraud, medical errors and text errors. Removing outliers is important for the performance of a model. Redundancy is the repeated occurrence of data, which might cause deviations in the results of the model.
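Both ideas can be sketched in a few lines. The source does not say which method Amazon uses, so the example swaps in a common textbook technique, the modified z-score based on the median absolute deviation, and applies it to made-up per-frame sizes; the function names and the 3.5 threshold are assumptions.

```python
import statistics


def remove_duplicates(readings):
    """Drop repeated readings while keeping the first occurrence in order."""
    return list(dict.fromkeys(readings))


def find_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # all values (nearly) identical; nothing stands out
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


# Made-up (frame_id, size_kb) readings; frame 2 was delivered twice.
readings = [(1, 101), (2, 99), (2, 99), (3, 100), (4, 102),
            (5, 98), (6, 100), (7, 450)]
unique = remove_duplicates(readings)
outliers = find_outliers([size for _, size in unique])
```

The duplicate delivery of frame 2 is dropped first, and the 450 KB frame is then flagged as an outlier relative to the roughly 100 KB norm.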

Stage 8: Exit, Cart, Payment and Receipts:

In this stage, the customer exits the store. After a few seconds, he gets a pop up from the Amazon Go app on his phone showing the receipt of his purchases and the time he spent in the store. For payment, his Amazon account linked to the Go app is charged.
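As a rough sketch of what the receipt step has to compute, the function below totals a session's items and the time spent in the store. The `build_receipt` function, the prices and the timestamps are hypothetical and only illustrate the data involved.

```python
from datetime import datetime


def build_receipt(items, prices, entered, exited):
    """Summarize a finished session: line items, total, and time in store."""
    lines = [(name, prices[name]) for name in items]
    total_cents = sum(price for _, price in lines)
    seconds = int((exited - entered).total_seconds())
    return {"lines": lines, "total_cents": total_cents, "seconds_in_store": seconds}


prices = {"sandwich": 599, "juice": 249}  # prices in cents; made-up values
receipt = build_receipt(
    ["sandwich", "juice"],
    prices,
    entered=datetime(2019, 5, 1, 12, 0, 0),
    exited=datetime(2019, 5, 1, 12, 3, 20),
)
```

Keeping prices in integer cents avoids floating-point rounding errors in the total.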

SECTION 3: FUTURE

What does the future hold?

Some interesting extensions to Amazon Go might come up in the future:

AI might recommend side dishes based on the eating habits of a customer, determined from their order history.

AI might offer discounts or increase the price based on demand and on whether the customer lives in an affluent area, which can be determined from the customer's zip code.

This is just the beginning. There is a lot more to Amazon Go.