Saturday, October 31, 2020

A profile of Discord, which lies at the center of the gaming world with 100M+ MAUs, as it pushes to turn into a communication tool for everyone, not just gamers (David Pierce/Protocol)

David Pierce / Protocol:
A profile of Discord, which lies at the center of the gaming world with 100M+ MAUs, as it pushes to turn into a communication tool for everyone, not just gamers  —  Most longtime Discord users have a similar origin story.  They liked playing video games, and liked playing with their friends …



from Techmeme https://ift.tt/35QYjgq
via A.I .Kung Fu

Cloud video surveillance company Eagle Eye Networks raises $40M Series E from Accel to invest in new AI projects such as license plate recognition (Christine Hall/Crunchbase News)

Christine Hall / Crunchbase News:
Cloud video surveillance company Eagle Eye Networks raises $40M Series E from Accel to invest in new AI projects such as license plate recognition  —  Eagle Eye Networks, a cloud video surveillance company, raised $40 million in Series E funding from Accel to advance its platform.



from Techmeme https://ift.tt/35QezhD
via A.I .Kung Fu

StackHawk, whose tech helps developers find application security vulnerabilities before they get into production, raises $10M Series A led by Sapphire Ventures (Nick Greenhalgh/Denver Business Journal)

Nick Greenhalgh / Denver Business Journal:
StackHawk, whose tech helps developers find application security vulnerabilities before they get into production, raises $10M Series A led by Sapphire Ventures  —  With a successful beta at its back and paying customers onboard, Denver application security startup StackHawk announced Tuesday …



from Techmeme https://ift.tt/34IUACc
via A.I .Kung Fu

Wise, a fintech startup that partners with other companies so that they can offer business bank accounts to their own customers, raises $12M Series A (Romain Dillet/TechCrunch)

Romain Dillet / TechCrunch:
Wise, a fintech startup that partners with other companies so that they can offer business bank accounts to their own customers, raises $12M Series A  —  Fintech startup Wise has raised a $12 million Series A round.  The company offers business bank accounts with an interesting go-to-market strategy.



from Techmeme https://ift.tt/3jMsYAv
via A.I .Kung Fu

Varmilo's Moonlight mechanical keyboard is the smoothest I've ever used - CNET

Built with the company's EC Switch V2 linear switches, the keyboard offers an effortless typing experience.

from CNET News https://ift.tt/34IV5fm
via A.I .Kung Fu

Skan, which helps enterprises automate repetitive business processes by combining data engineering with computer vision, raises $14M Series A (Kyle Wiggers/VentureBeat)

Kyle Wiggers / VentureBeat:
Skan, which helps enterprises automate repetitive business processes by combining data engineering with computer vision, raises $14M Series A  —  Skan.ai, an AI-enabled process discovery and operational intelligence platform, today closed $14 million in funding.



from Techmeme https://ift.tt/3mEqqWP
via A.I .Kung Fu

DriveWealth raises $56.7M Series C for its digital brokerage services that help broker-dealers and its global online partners access the US securities market (FinSMEs)

FinSMEs:
DriveWealth raises $56.7M Series C for its digital brokerage services that help broker-dealers and its global online partners access the US securities market  —  DriveWealth, LLC, a Chatham, N.J.-based global digital trading technology company, raised $56.7m in Series C funding.



from Techmeme https://ift.tt/34IyZtE
via A.I .Kung Fu

The best robot vacuum for 2020: Electrolux, Neato, iRobot Roomba, Eufy and more - CNET

From Electrolux, iRobot, Eufy, Ecovacs, Neato and more, we tested lots of leading robot vacuum models to see which are the best.

from CNET News https://ift.tt/3eeQcxJ
via A.I .Kung Fu

The Internet Archive starts adding banners on some Wayback Machine pages with links that provide contextual information from fact-checking organizations (Mark Graham/Internet Archive Blogs)

Mark Graham / Internet Archive Blogs:
The Internet Archive starts adding banners on some Wayback Machine pages with links that provide contextual information from fact-checking organizations  —  Fact checking organizations and origin websites sometimes have information about pages archived in the Wayback Machine.



from Techmeme https://ift.tt/2TJwhxv
via A.I .Kung Fu

Forty-six top US companies including Apple, Google, and Twitter have filed an amicus brief supporting a legal challenge to block Trump admin's H-1B visa changes (Nandita Mathur/Livemint)

Nandita Mathur / Livemint:
Forty-six top US companies including Apple, Google, and Twitter have filed an amicus brief supporting a legal challenge to block Trump admin's H-1B visa changes  —  - The move comes in the wake of the US administration's proposal to scrap the computerized lottery system to grant H-1B work visas …



from Techmeme https://ift.tt/3jUch6h
via A.I .Kung Fu

Odaseva, a France-based data protection services provider for large-scale Salesforce customers, raises $25M Series B led by Eight Roads Ventures (Annie Musgrove/Tech.eu)

Annie Musgrove / Tech.eu:
Odaseva, a France-based data protection services provider for large-scale Salesforce customers, raises $25M Series B led by Eight Roads Ventures  —  French SaaS company Odaseva has raised $25 million in Series B funding to continue growing its data governance platform for enterprise.



from Techmeme https://ift.tt/3kLXBHf
via A.I .Kung Fu

Joe Biden has finally disclosed who is raising him big money just days before Election Day

Joe Biden attends Fundraiser in Philadelphia Photo by Bastiaan Slabbers/NurPhoto via Getty Images

Biden has been sharply breaking from precedent in his Democratic Party.

Democratic presidential nominee Joe Biden finally disclosed the roster of his biggest fundraisers on Saturday, unveiling the names of the 820 people who have helped him build a big-money juggernaut.

The list includes Biden surrogates like former South Bend, Indiana Mayor Pete Buttigieg and Rep. Adam Schiff (D-CA); Hollywood filmmakers like Lee Daniels and Jeffrey Katzenberg; and Silicon Valley billionaires like Reid Hoffman and Ron Conway. The campaign did not specify how much these people raised for Biden efforts, beyond saying that it was more than $100,000.

The release on a Saturday evening came at the last possible moment: Election Day is on Tuesday, and more than 90 million people have already voted, having done so without clarity on who his largest fundraisers are or what influence they may have had on his candidacy. Biden’s last-minute disclosure was a sharp departure from precedent in the Democratic Party, whose presidential candidates have regularly disclosed their so-called “bundlers” in a nod to transparency.

And that’s why campaign-finance reformers had grown concerned that Biden had not yet followed his predecessors Barack Obama and Hillary Clinton’s lead in releasing his bundlers for the general election.

Biden’s campaign had declined to answer inquiries about its bundlers until last week, when it told The New York Times that it would release their names by the end of October (which ended Saturday). Both Obama and Clinton released updates on the list of people helping them raise big money at consistent intervals; Biden’s only prior update came on a Friday evening just after Christmas in 2019 during the Democratic primary with about 230 names, before his bundling operation beefed up in earnest.

“Congratulations on clearing an artificially low bar they set for themselves that defeats the entire purpose of transparency — allowing voters to know who is funding the campaigns asking for their support before casting their ballots,” said Tyson Brody, a Democratic operative who worked for Bernie Sanders and backs Biden, but is critical of the influence of large campaign contributors.

It makes strategic sense that the Biden campaign would not want to draw attention to the bundlers who have helped him turn a lagging fundraising operation into a surprising powerhouse. Biden has worked to position himself as the candidate with the interest of the working and middle classes in mind, giving himself the nickname “Middle-Class Joe,” and casting the general election “as a campaign between Scranton and Park Avenue.”

And so, the Biden campaign has tried to draw focus to its small-dollar, online fundraising operation, rather than the celebrities, Silicon Valley billionaires, and Wall Street executives whose support undercuts some of the campaign’s messaging. That’s an especially important task for Biden given that many of these characters are prone to draw the scorn of the left, which is already skeptical of Biden and wants to see big campaign contributors play a smaller role in politics.

And the Trump campaign hasn’t been in much of a place to argue for transparency. Trump hasn’t released any information about his own bundlers at all.

So there’s been limited scrutiny. The upshot of that is that the 90 million people who have already cast ballots ended up voting with very limited information about the people who helped the campaigns raise the money that may have influenced those very votes.

The debate over bundler disclosure reflects a key campaign question of the Trump era: Should Trump’s own tactics set the standard for his Democratic rivals? Or should Democrats — who claim to prioritize reducing the role of money in politics — aspire to a higher, or at least the pre-Trump, standard?

Campaigns are only legally required to disclose bundlers who are registered lobbyists — everything else is voluntary. Trump and his most immediate GOP predecessor at the top of the ticket, Mitt Romney, declined to share any additional information. But prior to their campaigns, there had been a bipartisan tradition of at least offering some information in order to help voters understand who carried unofficial influence in their campaign; that was done by both John McCain and George W. Bush, who pioneered the modern bundling system and made being a bundler into something of a bragging right.

Bundlers do the often painstaking work of soliciting their networks for high-dollar campaign contributions: inviting their business associates to campaign events, making introductions to campaign staffers, and recruiting more bundlers to serve alongside them. Bundling can often end up being fiercely competitive, with campaigns closely tracking how much individuals have raised and bundlers sometimes finding themselves in competition for positions on leaderboards.

The Biden campaign has six levels of membership for its finance committee, ranging from a “Protector” who helps the campaign raise $50,000 to a “Biden Victory Partner” who brings home $2.5 million, according to a campaign document seen by Recode. Mementos that Biden has sent to that top level of bundler include a gold-and-blue pin.

Despite his preference to talk about his low-dollar fundraising operation, Biden has built an impressive big-money machine.



from Vox - Recode https://ift.tt/3eiDf67
via A.I .Kung Fu

Iconic James Bond locations in real life: How to travel like 007 - CNET

Exotic locales are a staple of any Bond film. When you can travel again, here's where to find some amazing sights around the world as seen on screen.

from CNET News https://ift.tt/31YYZiu
via A.I .Kung Fu

Robot Bores: AI-powered awkward first date

Two chatbots meet and put the world to rights online in battle to see who is most human-like.

from BBC News - Technology https://ift.tt/3mIzQki
via A.I .Kung Fu

Profile of Shield AI, which raised money from a16z and others to develop autonomous military drones that scan buildings to help soldiers clear them (Elliott Ackerman/Wired)

Elliott Ackerman / Wired:
Profile of Shield AI, which raised money from a16z and others to develop autonomous military drones that scan buildings to help soldiers clear them  —  On the battlefield, any doorway can be a death trap.  A special ops vet, and his businessman brother, have built an AI to solve that problem.



from Techmeme https://ift.tt/320EYIq
via A.I .Kung Fu

Friday, October 30, 2020

CISA, FBI say an Iran-linked APT targeted unsecured state election websites to harvest US voter info used to send threatening emails to some Democratic voters (Sergiu Gatlan/BleepingComputer)

Sergiu Gatlan / BleepingComputer:
CISA, FBI say an Iran-linked APT targeted unsecured state election websites to harvest US voter info used to send threatening emails to some Democratic voters  —  DHS CISA and the FBI today shared more info on how an Iranian state-sponsored hacking group was able to harvest voter registration info …



from Techmeme https://ift.tt/3eaLL7j
via A.I .Kung Fu

The best soundbar for 2020: Vizio, Sonos, Polk, Yamaha, Roku and more - CNET

Let's face it, TV audio quality sucks. A soundbar can be a big improvement for a small price. Here are our favorites.

from CNET News https://ift.tt/34IW53d
via A.I .Kung Fu

How social media is preparing for US election chaos

Social media companies are making plans in case of unrest after election day.

from BBC News - Technology https://ift.tt/3kJG4Qj
via A.I .Kung Fu

Hands-on with the Dash Cart at an Amazon Fresh store: cart sensors worked fine, but a two-bag limit, bag fill limit, and real-world hassles hinder experience (Jeremy Horwitz/VentureBeat)

Jeremy Horwitz / VentureBeat:
Hands-on with the Dash Cart at an Amazon Fresh store: cart sensors worked fine, but a two-bag limit, bag fill limit, and real-world hassles hinder experience  —  There were no lines outside Irvine, California's new Amazon Fresh grocery store on its opening day last week …



from Techmeme https://ift.tt/2HHFSTr
via A.I .Kung Fu

Halloween's here: Weird horror films and TV shows you may have overlooked - CNET

Vengeful witches, angry ghosts, killer cults, evil clowns and body-snatching aliens. Watch these unusual scary films and TV shows, screaming (er, streaming) now.

from CNET News https://ift.tt/3oSMkYq
via A.I .Kung Fu

Apple rejects an app designed to verify a person's ballot status in PA, says the app violates its guideline which forbids compiling user data without consent (Mikey Campbell/AppleInsider)

Mikey Campbell / AppleInsider:
Apple rejects an app designed to verify a person's ballot status in PA, says the app violates its guideline which forbids compiling user data without consent  —  Apple on Friday rejected an app designed to ensure ballots are being correctly counted in Pennsylvania, saying the software violates App Store privacy guidelines.



from Techmeme https://ift.tt/3kIQcsx
via A.I .Kung Fu

Deploying and using the Document Understanding Solution

In our day-to-day experience, most of the information we consume is digital. We read the news on our mobile devices far more than in printed newspapers. Tickets for sporting events, music concerts, and airline travel are stored in apps on our phones. One could go weeks or longer without carrying any paper currency, as digital payments are ubiquitous. However, many companies across different industries still operate primarily on manual, paper-based processes. For example, healthcare payors, construction companies, and law firms deal with billions of documents and forms, which makes finding information difficult and time-consuming. When documents are found, extracting information through manual data entry can be slow, expensive, and error-prone, which increases compliance risk. Furthermore, domain experts need to identify and categorize domain-specific phrases and keywords (or entities), or use traditional optical character recognition (OCR) and keyword detection software that requires manual customization. These approaches can create scrambled output and unusable results. AWS AI services such as Amazon Kendra, Amazon Textract, Amazon Comprehend, and Amazon Comprehend Medical help solve these challenges by automating data extraction and comprehension using machine learning (ML).

Overview of the Document Understanding Solution

The Document Understanding Solution (DUS) allows you to use the power of AWS AI for enterprise search, document digitization, discovery, and extraction and redaction of select information. Part of the Intelligent Document Processing services offered by AWS, this solution uses AWS artificial intelligence (AI) services to solve business problems.

Search and discovery

These challenges exist in almost every business vertical. Imagine a manufacturer that has to maintain archives of thousands, if not millions, of product and tool specifications. Without digitized archives, its highly valuable tool data could be massively underutilized, and information retrieval could be complex and costly. In another example, a company in the financial industry could have thousands of financial reports in paper format. Without a simple way to extract and digitize this data, keying it in by hand could take extensive manual effort.

To help with these situations, DUS leverages multiple ML services, including Amazon Textract. Amazon Textract is a fully managed machine learning service that automatically extracts text and data from scanned documents. It goes beyond simple OCR to identify, understand, and extract data from forms and tables, and it converts the data in the documents into a format that can be readily searched. Next, Amazon Kendra and Amazon Elasticsearch Service (Amazon ES) provide the end-user search experience in DUS. Amazon Kendra is an intelligent search service powered by machine learning. It uses ML to obtain better results for natural language questions, and returns an exact answer from within a document, whether that is a text snippet, an FAQ, or a PDF document. In addition to Amazon Kendra, DUS provides a rich search experience through Amazon ES, a fully managed service that makes it easy for you to deploy, secure, and run Elasticsearch cost-effectively at scale.
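To make the extraction step concrete, here is a minimal sketch of turning a Textract-style response into plain text that a search index could ingest. It assumes the response follows the documented Textract shape (a "Blocks" list whose LINE blocks carry "Text" and, for multi-page documents, "Page"). The function name lines_by_page and the sample data are hypothetical and are not taken from the DUS source code.

```python
from collections import defaultdict


def lines_by_page(textract_response):
    """Group the text of LINE blocks by page number.

    Hypothetical helper: DUS may organize extracted text differently.
    """
    pages = defaultdict(list)
    for block in textract_response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            # Single-page responses may omit "Page"; default to page 1.
            pages[block.get("Page", 1)].append(block.get("Text", ""))
    return {page: "\n".join(lines) for page, lines in sorted(pages.items())}


# Hand-written sample in the shape of a Textract response (not real output).
sample = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "LINE", "Page": 1, "Text": "Tool specification"},
        {"BlockType": "WORD", "Page": 1, "Text": "Tool"},
        {"BlockType": "LINE", "Page": 1, "Text": "Model: TX-100"},
        {"BlockType": "LINE", "Page": 2, "Text": "Maintenance schedule"},
    ]
}

searchable = lines_by_page(sample)
```

Each page's text could then be sent to whichever search service you use; how DUS itself feeds Amazon ES and Amazon Kendra is not shown here.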

Control and compliance

In addition to search, the ability to analyze documents at scale is essential. Amazon Textract extracts text from documents, which can then be input into Amazon Comprehend or Amazon Comprehend Medical. Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. It can identify key phrases and entities, such as places, people, and brands. Amazon Comprehend Medical is a similar NLP service that makes it easy to use machine learning to extract relevant medical information from unstructured text. It can identify medical entities, such as medical conditions and medications.
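As a rough illustration of working with detected entities, the sketch below groups entities from a Comprehend-style response by type. It assumes the documented shape of a DetectEntities response (an "Entities" list with "Type", "Text", and "Score"). The function name, the 0.8 score threshold, and the sample data are illustrative assumptions.

```python
from collections import defaultdict


def group_entities(comprehend_response, min_score=0.8):
    """Group entity text by entity type, dropping low-confidence detections.

    The min_score default is an arbitrary illustrative threshold.
    """
    grouped = defaultdict(list)
    for entity in comprehend_response.get("Entities", []):
        if entity["Score"] >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


# Hand-written sample in the shape of a DetectEntities response (not real output).
sample_response = {
    "Entities": [
        {"Type": "PERSON", "Text": "Jane Doe", "Score": 0.99},
        {"Type": "LOCATION", "Text": "Seattle", "Score": 0.97},
        {"Type": "ORGANIZATION", "Text": "Acme", "Score": 0.55},
    ]
}

grouped = group_entities(sample_response)
```

With the sample above, the low-scoring organization is dropped and the person and location are kept.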

Identifying these key pieces of information allows for compliance controls through redaction. For example, an insurer could use this solution to feed a workflow that automatically recognizes the key-value pairs and entities that require protection, then redacts personally identifiable information (PII) or protected health information (PHI) for review before claim forms are archived.

Other industries can also use this solution to comply with regulatory standards, such as GDPR and HIPAA. For example, a law firm could use this solution to redact PII, organization names, or brand names. Similarly, a security agency could redact vital information such as names, locations, or dates from a case file for data security or privacy reasons.
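Redaction of detected entities in plain text can be sketched as follows. This assumes entities carry "BeginOffset" and "EndOffset" character offsets, as in Comprehend-style responses, and that those offsets index directly into the Python string. The function name, the mask character, and the sample data are hypothetical. DUS redacts within rendered documents, which this text-only sketch does not attempt to reproduce.

```python
def redact(text, entities, types_to_redact, mask="*"):
    """Replace the characters of selected entity types with a mask character."""
    spans = sorted(
        (e["BeginOffset"], e["EndOffset"])
        for e in entities
        if e["Type"] in types_to_redact
    )
    pieces, cursor = [], 0
    for begin, end in spans:
        begin = max(begin, cursor)  # clip spans that overlap an earlier one
        if end <= begin:
            continue  # span already fully masked
        pieces.append(text[cursor:begin])
        pieces.append(mask * (end - begin))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hand-written sample text and entities (not real service output).
text = "Jane Doe moved to Seattle to join Acme."
entities = [
    {"Type": "PERSON", "BeginOffset": 0, "EndOffset": 8},
    {"Type": "LOCATION", "BeginOffset": 18, "EndOffset": 25},
    {"Type": "ORGANIZATION", "BeginOffset": 34, "EndOffset": 38},
]

redacted = redact(text, entities, {"PERSON", "LOCATION"})
```

Masking with the same number of characters keeps later offsets valid, so further redaction passes can reuse the original entity list.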

Workflow automation

DUS delivers results at scale in production workflows. Organizations can more rapidly process documents such as insurance claims and forms, and seamlessly extract tables from PDFs into CSVs to conduct additional analysis. With detection and categorization of medical entities and ICD-10-CM ontologies, medical institutes can realize significant savings in the workforce, time, and other resources spent identifying and classifying patient information. The solution stores all data in easily accessible formats, such as CSV and JSON files, which can be fed into downstream pipelines. Additionally, the bulk processing feature in DUS allows you to import a large number of documents directly for processing and analysis.
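One way to turn a detected table into CSV is sketched below. It assumes Textract-style blocks: a TABLE block whose CHILD relationships point at CELL blocks with "RowIndex" and "ColumnIndex", which in turn point at WORD blocks. The function name and sample data are hypothetical, and merged cells are ignored for brevity.

```python
import csv
import io


def table_to_csv(blocks, table_block):
    """Render one TABLE block as CSV text (merged cells are not handled)."""
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block):
        return [
            child_id
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for child_id in rel["Ids"]
        ]

    cells = {}
    for cell_id in child_ids(table_block):
        cell = by_id[cell_id]
        if cell["BlockType"] != "CELL":
            continue
        words = [
            by_id[w]["Text"] for w in child_ids(cell) if by_id[w]["BlockType"] == "WORD"
        ]
        cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)

    if not cells:
        return ""
    n_rows = max(row for row, _ in cells)
    n_cols = max(col for _, col in cells)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in range(1, n_rows + 1):  # Textract indices are 1-based
        writer.writerow([cells.get((row, col), "") for col in range(1, n_cols + 1)])
    return buffer.getvalue()


# Hand-written sample blocks in the Textract shape (not real output).
blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Price"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Steel"},
    {"Id": "w4", "BlockType": "WORD", "Text": "bolt"},
]

csv_text = table_to_csv(blocks, blocks[0])
```

Using the csv module rather than joining with commas means cell text that itself contains commas or quotes is escaped correctly.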

The following diagram illustrates the DUS architecture.


Deploying DUS

For instructions on setting up DUS, see Document Understanding Solution on AWS Solutions.

Deploying DUS sets up a web application that you can use for document understanding. The deployment includes setting up infrastructure in your AWS account and pre-loading sample documents.

Using DUS

Once you have finished deploying the DUS demo, you receive instructions on how to log in to the application. After logging in, you are directed to the homepage, as seen below. You have three options, which cover the common use cases in the Document Understanding Solution: Discovery, Compliance, and Workflow Automation.

When you select the Discovery track, you are directed to the Document List page, which shows the preloaded documents. You can select one of the preloaded sample documents or upload your own document. From here, you can search for a specific document by using a phrase or keyword.

If you decide to upload your own document, choose "Upload your own documents" above the available documents. You are then directed to a new page where you can upload your documents. This page also has sample documents from different industry verticals for you to experiment with.


Back on the Document List page, you will find some PDF and image files. Text in these documents is not tagged or available to use by default. However, because the solution has processed these documents, you can now search for information within them. If you search for a specific phrase or keyword in the search bar, the solution analyzes the text it has extracted from the documents and provides search results. The search results can be displayed in three different ways: a comparative view of Amazon ES (traditional search) and Amazon Kendra (semantic search), just Amazon ES, or just Amazon Kendra.

For Amazon Kendra results, you also have the option to provide feedback by either up-voting or down-voting an Amazon Kendra suggested answer.
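As a hedged sketch of what such feedback could look like at the API level, the helper below builds the arguments for Amazon Kendra's SubmitFeedback operation from a Query-style response. It assumes the documented field names ("QueryId", "ResultItems", "RelevanceFeedbackItems", and the "RELEVANT" and "NOT_RELEVANT" values). The function name, index ID, and sample response are hypothetical, and the source does not say whether the DUS application sends feedback this way.

```python
def build_feedback_request(index_id, query_response, result_id, upvote):
    """Build keyword arguments for a Kendra SubmitFeedback call.

    Raises ValueError if result_id is not among the query's result items.
    """
    known_ids = {item["Id"] for item in query_response.get("ResultItems", [])}
    if result_id not in known_ids:
        raise ValueError(f"unknown result id: {result_id}")
    return {
        "IndexId": index_id,
        "QueryId": query_response["QueryId"],
        "RelevanceFeedbackItems": [
            {
                "ResultId": result_id,
                "RelevanceValue": "RELEVANT" if upvote else "NOT_RELEVANT",
            }
        ],
    }


# Hand-written sample in the shape of a Kendra Query response (not real output).
sample_query_response = {
    "QueryId": "q-123",
    "ResultItems": [
        {"Id": "r-1", "Type": "ANSWER"},
        {"Id": "r-2", "Type": "DOCUMENT"},
    ],
}

request = build_feedback_request("index-abc", sample_query_response, "r-1", upvote=True)
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("kendra").submit_feedback(**request)
```

Keeping the request construction separate from the network call makes the up-vote and down-vote logic easy to check without an AWS account.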

Amazon Kendra also supports filtering based on user context. Under the Amazon Kendra results view, you can filter results based on the users for the preloaded documents. Click the Filter button to the right of the Amazon Kendra Results title. You can then select a persona and one of the suggested questions to display filtered results. Amazon Kendra then ranks results based on the selected persona. You can toggle between the various personas to compare how the results differ. For demonstration purposes, the Document Understanding Solution comes with preloaded documents and personas from the medical industry. You will notice that results are ranked differently depending on the question and persona selected, which creates a more targeted search experience for the user.

From the Document List search results view, you can select a document that you want to further explore. This will direct you to the Document Details page. See the following image.

The following image shows the tool bar above the search bar, where you can choose to see different types of information from the document.


The tabs have the following functions:

  • Preview – Under this tab, you can view the original document and download a searchable PDF version of it. This helps users convert their documents, whether images or PDFs, into easily searchable PDF files.
  • Raw Text – Under this tab, you can access all the text identified in the file.
  • Key-Value Pairs – Under this tab, key-value pairs from the document are highlighted. In this process, all forms in the document are identified and stored in a key-value pair format. If desired, you can download a CSV file of the key-value pairs. This is especially useful for organizations that have structured data and want to automate their data extraction and storage workflows, for example, organizations that handle many forms such as job applications or medical patient forms.
  • Tables – Under this tab, you can view all the tables identified in the document. As with the key-value pairs, you can download the tables in CSV format. Companies dealing with balance sheets or invoices would find this feature extremely useful, because it lets users easily convert tables in images and PDFs into CSV files that can then be used for further analysis.
  • Entities and Medical Entities – Under these tabs, you can find the general and medical entities in the document, respectively. These entities include persons, locations, dates, PHI, and medical information, which help organizations easily identify and extract critical medical data in a document.
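The key-value pairs shown in that tab can be reconstructed from Textract-style output roughly as sketched below. It assumes the documented block shape: KEY_VALUE_SET blocks tagged "KEY" or "VALUE" in "EntityTypes", with VALUE and CHILD relationships linking keys to values and to WORD blocks. The function name and sample data are hypothetical and do not come from the DUS source code.

```python
def key_value_pairs(blocks):
    """Return (key text, value text) tuples from Textract-style form blocks."""
    by_id = {b["Id"]: b for b in blocks}

    def related_ids(block, rel_type):
        return [
            rel_id
            for rel in block.get("Relationships", [])
            if rel["Type"] == rel_type
            for rel_id in rel["Ids"]
        ]

    def text_of(block):
        parts = []
        for child_id in related_ids(block, "CHILD"):
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # Checkboxes report SELECTED or NOT_SELECTED instead of text.
                parts.append(child["SelectionStatus"])
        return " ".join(parts)

    pairs = []
    for block in blocks:
        is_key = block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get(
            "EntityTypes", []
        )
        if is_key:
            values = [text_of(by_id[v]) for v in related_ids(block, "VALUE")]
            pairs.append((text_of(block), " ".join(values)))
    return pairs


# Hand-written sample blocks in the Textract shape (not real output).
form_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Patient"},
    {"Id": "w2", "BlockType": "WORD", "Text": "name:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
]

pairs = key_value_pairs(form_blocks)
```

Returning a list of tuples rather than a dictionary preserves repeated keys, which are common on forms with several identical labels.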

To explore redaction controls, choose the Compliance option on the toolbar. Here you can redact key-value pairs, entities, medical entities, or keyword matches by switching to the respective tab on the toolbar and choosing Redact. For example, a clinic might want to redact PHI before sharing medical records, or an organization might want to redact specific information identified as key-value pairs in the forms in its documents. As seen in the following image, you can redact information, download the redacted document, and clear redactions after use.

In terms of Workflow Automation, DUS also provides input and output capabilities through the AWS Console, which makes it easier to integrate DUS into an existing pipeline. DUS supports a bulk document processing mode, in which you place documents in an Amazon Simple Storage Service (Amazon S3) bucket; they are then analyzed asynchronously and made available in the application. More information on bulk processing is available in the AWS Solutions Implementation Guide. Results from the different AWS AI services are stored in Amazon S3 buckets, and the corresponding metadata is available in Amazon DynamoDB tables. Users of the solution can build downstream pipelines from these data stores, which hold the document analysis data.

Summary:

This post reviewed how you can integrate Amazon Textract, Amazon Comprehend, Amazon Comprehend Medical, and Amazon Kendra to conduct enterprise search, document digitization, document discovery, and extraction and redaction of select information.

To access the DUS source code, see Document Understanding Solution on GitHub. This solution has been made open source so that you can extend and incorporate the solution into your AWS workflows.


About the Authors

Simran Baxendale is a Program Manager in the Amazon Machine Learning Solutions Lab. She helps define, coordinate and execute program strategy for the demos applications team.

Curtis Bray is a manager in the Amazon Machine Learning Solutions Lab. He leads the demos applications team that focuses on building use case based demos that show customers how to unlock the power of AWS AI/ML services to solve real world business problems.

Alex Chirayath is an SDE in the Amazon Machine Learning Solutions Lab. He helps customers adopt AWS AI services by building solutions to address common business problems.



from AWS Machine Learning Blog https://ift.tt/3oHLhue
via A.I .Kung Fu

The best beer subscription boxes - CNET

Monthly deliveries of rare beers, hoppy beers, brewery merch and more.

from CNET News https://ift.tt/3oHKYQ6
via A.I .Kung Fu

Twitter is no longer restricting the New York Post's account - CNET

The social media platform says it'll retroactively apply its changed policy on hacked materials.

from CNET News https://ift.tt/34I09k6
via A.I .Kung Fu

The best gifts to get healthy in 2021 - CNET

These gifts are sure to please your fitness-minded friends and family.

from CNET News https://ift.tt/3kUkyYy
via A.I .Kung Fu

Best gaming chair to seat yourself in for 2020 - CNET

We tried gaming chairs from DXRacer, Secretlab and more to help you find the size and style that's right for you.

from CNET News https://ift.tt/31Zslxh
via A.I .Kung Fu

Best prepared-meal delivery services for 2020: Home Chef, Daily Harvest, Veestro and more - CNET

No-fuss prepared meals and oven-ready meal kits to save you time and stress.

from CNET News https://ift.tt/2TLxd4p
via A.I .Kung Fu

Trump, Biden campaigns slam Facebook after 'technical issues' impact political ads - CNET

The social network stopped accepting new US political ads on Oct. 27, a week before Election Day.

from CNET News https://ift.tt/2HIWRom
via A.I .Kung Fu

Gifts that give back - CNET

This year -- especially this year -- consider buying presents that help support charitable causes.

from CNET News https://ift.tt/37YyHku
via A.I .Kung Fu

Deploying and using the Document Understanding Solution

Based on our day to day experience, the information we consume is entirely digital. We read the news on our mobile devices far more than we do from printed copy newspapers. Tickets for sporting events, music concerts, and airline travel are stored in apps on our phones. One could go weeks or longer without needing to have any paper currency in his or her wallet, as digital payments are ubiquitous. However, many companies across different industries still primarily operate on manual, paper-based processes. For example, healthcare payors, construction companies, and law firms deal with billions of documents and forms, making the process of finding information difficult and time-consuming. When documents are found, extracting information through manual data entry can be slow, expensive, and error prone, resulting in increases in compliance risks. Furthermore, domain experts need to identify and categorize domain-specific phrases and keywords (or entities), or use traditional Optical Character Recognition (OCR) and keyword detection software that requires manual customization. These approaches can create scrambled output and unusable results. AWS AI services such as Amazon Kendra, Amazon Textract, Amazon Comprehend, and Amazon Comprehend Medical help solve these challenges by automating data extraction and comprehension using machine learning (ML).

Overview of the Document Understanding Solution

The Document Understanding Solution (DUS) allows you to use the power of AWS AI for enterprise search, document digitization, discovery, and extraction and redaction of select information. Part of the Intelligent Document Processing services offered from AWS, this solution uses AWS artificial intelligence (AI) services to solve business problems.

Search and discovery

These challenges exist in almost every business vertical. Imagine a manufacturer that has to maintain archives of thousands – if not millions – of product and tool specifications. Without document digitization of archives, there could be massive underutilization of their highly valuable tool data and information retrieval could be complex and costly. In another example, a company in the financial industry could have 1000s of financial reports in paper format. Without a simple way to extract and digitize this data, it could take an extensive manual effort to keypunch it.

To help with these situations, DUS leverages multiple ML services, including Amazon Textract. Amazon Textract is a fully managed machine learning service that automatically extracts text and data from scanned documents that goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Amazon Textract will move the data from the documents to a format that can be readily searched. Next, Amazon Kendra and Amazon Elasticsearch Service (Amazon ES) are available to provide the end user search experience in DUS. Amazon Kendra is an intelligent search service powered by machine learning. Amazon Kendra uses ML to obtain better results for natural language questions, and will return an exact answer from within a document, whether that is a text snippet, FAQ, or a PDF document. In addition to Amazon Kendra, the DUS provides a rich search experience to the user through the use of Amazon Elasticsearch Service. Amazon Elasticsearch Service is a fully managed service that makes it easy for you to deploy, secure, and run Elasticsearch cost effectively at scale.

Control and compliance

In addition to search, the ability to analyze documents at scale is essential. Amazon Textract extracts text from documents, which can then be passed to Amazon Comprehend or Amazon Comprehend Medical. Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. It can identify key phrases and entities, such as places, people, and brands. Amazon Comprehend Medical is a similar NLP service that makes it easy to use machine learning to extract relevant medical information from unstructured text. It can identify medical entities, such as medical conditions and medications.

Identifying these key pieces of information allows for compliance controls through redaction. For example, an insurer could use this solution to feed a workflow that automatically recognizes the key-value pairs and entities that require protection, then redacts personally identifiable information (PII) or protected health information (PHI) for review before claim forms are archived.

Other industries can also use this solution to comply with regulatory standards, such as GDPR and HIPAA. For example, a law firm could use this solution to redact PII, organization names, or brand names. Another example is a security agency that needs to redact all vital information, such as names, locations, or dates, from a case file because of data security or privacy concerns.
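The redaction idea can be sketched in a few lines. The example below assumes entity records shaped like the "Entities" list that Amazon Comprehend's detect_entities returns (Type, Score, BeginOffset, EndOffset). The redact helper, the 0.8 score threshold, and the sample text are illustrative assumptions, not the DUS implementation.

```python
from typing import Any, Dict, Iterable, List


def redact(
    text: str,
    entities: List[Dict[str, Any]],
    types_to_redact: Iterable[str],
    min_score: float = 0.8,  # assumed confidence threshold, tune per use case
) -> str:
    """Replace each selected entity span with a [TYPE] placeholder."""
    wanted = set(types_to_redact)
    selected = [
        e for e in entities if e["Type"] in wanted and e["Score"] >= min_score
    ]
    # Work right to left so earlier offsets stay valid after each replacement.
    for entity in sorted(selected, key=lambda e: e["BeginOffset"], reverse=True):
        text = (
            text[: entity["BeginOffset"]]
            + f"[{entity['Type']}]"
            + text[entity["EndOffset"]:]
        )
    return text


# Hand-written sample; offsets index into claim_text.
claim_text = "Jane Doe visited Seattle on 2020-10-31."
sample_entities = [
    {"Type": "PERSON", "Score": 0.99, "BeginOffset": 0, "EndOffset": 8},
    {"Type": "LOCATION", "Score": 0.97, "BeginOffset": 17, "EndOffset": 24},
    {"Type": "DATE", "Score": 0.95, "BeginOffset": 28, "EndOffset": 38},
]

print(redact(claim_text, sample_entities, {"PERSON", "DATE"}))
# prints: [PERSON] visited Seattle on [DATE].
```

Choosing which entity types to pass in is what turns the same extraction output into different compliance policies, for example names only for one audience and names plus dates for another.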

Workflow automation

DUS delivers results at scale in production workflows. Organizations can more rapidly process documents such as insurance claims and forms, and extract tables from PDFs into CSVs for additional analysis. With detection and categorization of medical entities and ICD-10-CM ontologies, medical institutes can realize significant savings in the workforce, time, and other resources spent identifying and classifying patient information. The solution stores all the data in easily accessible formats, such as CSV and JSON files, which can be fed into downstream pipelines. Additionally, the bulk processing feature in DUS allows you to import a large number of documents directly for processing and analysis.
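To make the table-to-CSV step concrete, here is a minimal sketch that rebuilds tables from Textract-style blocks and serializes them as CSV. It assumes the block layout Textract uses for tables (a TABLE block whose CHILD relationships point at CELL blocks, each with RowIndex and ColumnIndex). The function name and sample data are hypothetical, and DUS itself may assemble its CSV files differently.

```python
import csv
import io
from typing import Any, Dict, List


def tables_to_csv(blocks: List[Dict[str, Any]]) -> List[str]:
    """Return one CSV string per TABLE block found in the block list."""
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block: Dict[str, Any]) -> List[str]:
        return [
            cid
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for cid in rel["Ids"]
        ]

    results = []
    for table in (b for b in blocks if b["BlockType"] == "TABLE"):
        cells = {}
        for cell in (by_id[cid] for cid in child_ids(table)):
            if cell["BlockType"] != "CELL":
                continue
            words = [
                by_id[wid]["Text"]
                for wid in child_ids(cell)
                if by_id[wid]["BlockType"] == "WORD"
            ]
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        if not cells:
            continue
        n_rows = max(r for r, _ in cells)
        n_cols = max(c for _, c in cells)
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        for r in range(1, n_rows + 1):  # Textract indexes start at 1
            writer.writerow([cells.get((r, c), "") for c in range(1, n_cols + 1)])
        results.append(buffer.getvalue())
    return results


def _cell(cell_id: str, row: int, col: int, words: List[str]) -> Dict[str, Any]:
    """Build a sample CELL block (test data helper, not a Textract API)."""
    return {"Id": cell_id, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": words}] if words else []}


# Hand-written sample: a 2x2 table whose last cell is empty.
sample_blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    _cell("c1", 1, 1, ["w1"]), _cell("c2", 1, 2, ["w2"]),
    _cell("c3", 2, 1, ["w3", "w4"]), _cell("c4", 2, 2, []),
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Price"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Torque"},
    {"Id": "w4", "BlockType": "WORD", "Text": "wrench"},
]

print(tables_to_csv(sample_blocks)[0])
```

Empty cells become empty CSV fields, so row width stays constant for downstream tools.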

The following diagram illustrates the DUS architecture.


Deploying DUS

For instructions on setting up DUS, see Document Understanding Solution on AWS Solutions.

Deploying DUS sets up a web application that you can use for document understanding. The deployment includes setting up infrastructure in your AWS account and pre-loading sample documents.

Using DUS

After you finish deploying the DUS demo, you receive instructions on how to log in to the application. After logging in, you are directed to the homepage, as seen below. You have three options, which cover the common use cases in document understanding: Discovery, Compliance, and Workflow Automation.

When you select the Discovery track, you are directed to the preloaded documents page (the Document List page). You can select one of the preloaded sample documents or upload your own document. From here, you can search for a specific document by using a phrase or keyword.

To upload your own document, choose Upload your own documents above the available documents. You are then directed to a new page where you can upload your documents. This page also has sample documents from different industry verticals for you to experiment with.


Back on the Document List page, you will find some PDF and image files. Text in these documents is not tagged or available to use by default. However, because the solution has processed these documents, you can search for information within them. When you enter a phrase or keyword in the search bar, the solution analyzes the text it has extracted from the documents and returns search results. The results can be displayed in three ways: a comparative view of Amazon ES (traditional search) and Amazon Kendra (semantic search), just Amazon ES, or just Amazon Kendra.

For Amazon Kendra results, you also have the option to provide feedback by either up-voting or down-voting an Amazon Kendra suggested answer.

Amazon Kendra also supports filtering based on user context. Under the Amazon Kendra results view, you can filter results based on the users for the preloaded documents. Choose the Filter button to the right of the Amazon Kendra Results title. You can then select a persona and one of the suggested questions to display filtered results. Amazon Kendra then ranks results based on the selected persona. You can toggle between the various personas to compare how the results differ. For demonstration purposes, the Document Understanding Solution comes with preloaded documents and personas from the medical industry. Notice that, based on the question and persona selected, results are ranked differently, creating a more targeted search experience for the user.
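One way such persona filtering could be expressed is with the AttributeFilter parameter of the Amazon Kendra Query API. The sketch below only builds the request parameters. The helper name, the index ID placeholder, and the persona value are hypothetical, and the use of Kendra's reserved _user_id attribute is an assumption about how personas might be mapped, not a description of how DUS does it.

```python
from typing import Any, Dict


def build_persona_query(
    index_id: str,
    question: str,
    persona: str,
    attribute_key: str = "_user_id",  # assumed mapping of persona to attribute
) -> Dict[str, Any]:
    """Build keyword arguments for a Kendra query restricted to one persona."""
    return {
        "IndexId": index_id,
        "QueryText": question,
        "AttributeFilter": {
            "EqualsTo": {
                "Key": attribute_key,
                "Value": {"StringValue": persona},
            }
        },
    }


# Placeholder values for illustration only.
params = build_persona_query(
    index_id="YOUR-INDEX-ID",
    question="What is the recommended dosage?",
    persona="nurse",
)
print(params["AttributeFilter"])

# With AWS credentials configured, the parameters could be passed on as:
#   import boto3
#   response = boto3.client("kendra").query(**params)
```

Keeping the parameter construction separate from the network call makes it easy to compare the requests generated for different personas.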

From the Document List search results view, you can select a document that you want to further explore. This will direct you to the Document Details page. See the following image.

The following image shows the tool bar above the search bar, where you can choose to see different types of information from the document.


The tabs have the following functions:

  • Preview – Under this tab, you can view the original document and download a searchable PDF version of it. This helps users convert their documents, whether images or PDFs, into easily searchable PDF files.
  • Raw Text – Under this tab, you can access all the text identified in the file.
  • Key-Value Pairs – Under this tab, key-value pairs from the document are highlighted. In this process, all forms in the document are identified and stored in a key-value pair format. If desired, you can download a CSV file of the key-value pairs. This is especially useful for organizations that have structured data, such as job applications or medical patient forms, and want to automate their data extraction and storage workflows.
  • Tables – Under this tab, you can view all the tables identified in the document. As with the key-value pairs, you can download the tables in CSV format. Companies dealing with balance sheets or invoices would find this feature extremely useful, because it lets users easily convert tables in images and PDFs into CSV files, which can then be used for further analysis.
  • Entities and Medical Entities – Under these tabs, you can find the general and medical entities in the document, respectively. These entities include persons, locations, dates, PHI, and medical information, which helps organizations easily identify and extract critical medical data in a document.
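The key-value view described in the list above can be approximated from Textract-style form output. This sketch assumes the block layout Textract uses for forms: KEY_VALUE_SET blocks whose EntityTypes is KEY or VALUE, linked by VALUE and CHILD relationships. The function names and sample data are hypothetical, and selection elements such as checkboxes are ignored for brevity.

```python
from typing import Any, Dict, List, Tuple


def _block_text(block: Dict[str, Any], by_id: Dict[str, Dict[str, Any]]) -> str:
    """Join the WORD children of a block into a single string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return (key, value) text pairs for every KEY block in the list."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = []
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        # A KEY block points at its VALUE block(s) through a VALUE relationship.
        value_text = " ".join(
            _block_text(by_id[value_id], by_id)
            for rel in block.get("Relationships", [])
            if rel["Type"] == "VALUE"
            for value_id in rel["Ids"]
        )
        pairs.append((_block_text(block, by_id), value_text))
    return pairs


# Hand-written sample of one form field (not real Textract output).
sample_form_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Patient"},
    {"Id": "w2", "BlockType": "WORD", "Text": "name:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
]

print(key_value_pairs(sample_form_blocks))
# prints: [('Patient name:', 'Jane Doe')]
```

The resulting pairs can be written out with Python's csv module to get a file similar in spirit to the CSV download described for the Key-Value Pairs tab.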

To explore redaction controls, choose the Compliance option on the toolbar. Here you can redact information such as key-value pairs, entities, medical entities, or even keyword matches by switching to the respective tab on the toolbar and choosing Redact. For example, a clinic might want to redact PHI before sharing medical records. Another example is an organization that wants to redact specific information identified as key-value pairs in the forms in its documents. As seen in the following image, you can redact information, download the redacted document, and even clear redactions after use.

For Workflow Automation, the Document Understanding Solution also provides some input and output capabilities via the AWS Console, which make it easier to integrate DUS into an existing pipeline. DUS supports a bulk document processing mode, in which you place documents in an Amazon Simple Storage Service (Amazon S3) bucket; they are then analyzed asynchronously and made available in the application. More information on bulk processing is available in the AWS Solutions Implementation Guide. Results from the different AWS AI services are all stored in Amazon S3 buckets, and the corresponding metadata is available in Amazon DynamoDB tables. Users of the solution can build downstream pipelines from these data stores, which hold the document analysis data.
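Feeding the bulk processing mode from a script might look like the sketch below. The bucket name and key prefix are placeholders (the real locations are given in the implementation guide), the list of accepted file types is an assumption, and plan_uploads and upload_all are hypothetical helpers. The only AWS call used is the boto3 S3 client's upload_file.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed set of document types worth uploading; adjust to what DUS accepts.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def plan_uploads(paths: Iterable[str], prefix: str) -> List[Tuple[str, str]]:
    """Pair each supported local file with the S3 key it would be stored under."""
    plan = []
    for raw_path in paths:
        path = Path(raw_path)
        if path.suffix.lower() in SUPPORTED_SUFFIXES:
            plan.append((raw_path, f"{prefix.rstrip('/')}/{path.name}"))
    return plan


def upload_all(plan: List[Tuple[str, str]], bucket: str) -> None:
    """Upload every planned file; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so planning works without AWS access

    s3 = boto3.client("s3")
    for local_path, key in plan:
        s3.upload_file(local_path, bucket, key)


# Placeholder inputs for illustration only.
plan = plan_uploads(["claims/a.pdf", "notes.txt", "scan.PNG"], "incoming/")
print(plan)
# To run the upload against a real bucket:
#   upload_all(plan, "your-dus-bulk-bucket")
```

Separating the plan from the upload lets you review which files will be sent, and under which keys, before anything leaves the local machine.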

Summary

This post reviewed how you can integrate Amazon Textract, Amazon Comprehend, Amazon Comprehend Medical, and Amazon Kendra to conduct enterprise search, document digitization, document discovery, and extraction and redaction of select information.

To access the DUS source code, see Document Understanding Solution on GitHub. This solution has been made open source so that you can extend and incorporate the solution into your AWS workflows.


About the Authors

Simran Baxendale is a Program Manager in the Amazon Machine Learning Solutions Lab. She helps define, coordinate and execute program strategy for the demos applications team.

Curtis Bray is a manager in the Amazon Machine Learning Solutions Lab. He leads the demos applications team that focuses on building use case based demos that show customers how to unlock the power of AWS AI/ML services to solve real world business problems.

Alex Chirayath is an SDE in the Amazon Machine Learning Solutions Lab. He helps customers adopt AWS AI services by building solutions to address common business problems.



from AWS Machine Learning Blog https://ift.tt/3oHLhue
via A.I .Kung Fu