Using the NLP Library from Nividous RPA

The NLP Library provides actions for working with the Smart Bot NLP module from Nividous Studio. Use these actions to run inference with the pre-trained or custom NLP models available on the Smart Bot server. The library also provides actions for preparing datasets for NLP tasks. The NLP Library provides the following actions:

Classify: Classifies the given text or image into the predefined categories on which the classifier model was trained.
Recognize Entities: Extracts predefined entities from the given text using an entity extraction model. The model returns only the entities on which it was trained.
Prediction Data: Used for predictive analysis. Forecasts the class and confidence score based on the given input parameters and values.
Get ROI Text: Extracts text from a PDF or image using the ROIs provided by the CV Template creation API.
Fuzzy Search: Verifies whether the given list of keywords is present in the text. The output is an accuracy score between 0 and 100.

Prerequisites for using the NLP Library:

Before you use the NLP Library actions, complete the following prerequisites:

Add the configuration file application.cfg to the <PROJECT_DIR>/resources/ folder of the project.
The application.cfg file must contain the following parameters for connecting Nividous Studio to the Smart Bot server:
[RPA]
com.nividous.smartbot.user=<<Username>>
com.nividous.smartbot.password=<<Encrypted password>>
com.nividous.smartbot.url=http://<<Your Smart Bot IP address>>:<<Port Number>>/api
The Smart Bot server must be up and running.
Add NLP from the Configure Libraries option in the process.
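
A missing or empty parameter in application.cfg can be caught before the process runs. The following Python snippet is a minimal sketch and is not part of the NLP Library: the helper name missing_smartbot_keys and the sample values are made up for illustration. It only checks that the three parameters listed above are present and non-empty in the [RPA] section.

```python
import configparser
from typing import List

# The three parameters named in the prerequisites above.
REQUIRED_KEYS = (
    "com.nividous.smartbot.user",
    "com.nividous.smartbot.password",
    "com.nividous.smartbot.url",
)


def missing_smartbot_keys(cfg_text: str) -> List[str]:
    """Return the required keys that are absent or empty in the [RPA] section.

    Hypothetical helper for illustration; it does not contact the Smart Bot.
    """
    # interpolation=None keeps '%' characters in an encrypted password literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("RPA"):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get("RPA", key, fallback="").strip()
    ]


# Made-up sample values; replace them with the contents of your own file.
SAMPLE_CFG = """
[RPA]
com.nividous.smartbot.user=demo_user
com.nividous.smartbot.password=ENCRYPTED_VALUE
com.nividous.smartbot.url=http://10.0.0.5:8080/api
"""

if __name__ == "__main__":
    print(missing_smartbot_keys(SAMPLE_CFG))  # an empty list means nothing is missing
```

A line without the "=" separator makes configparser raise a ParsingError, which is also a useful signal that the file is malformed.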

Library Actions:

Classify:

To classify text/image using the pre-trained or custom classifiers available in the Smart Bot:

In the Action field, press Ctrl + Space and select Classify from the list.
Enter the following parameters in the Input pane:
1. Classifier Name - The name of the classifier model.
2. Text - The text to classify. If the Text field is left blank, the Path field is mandatory.
3. Path - The path of the image or PDF to classify. If the Path field is left blank, the Text field is mandatory.
4. Include Data (default: False) - Enter True to include the text data in the response.
5. Classification (default: NLP) - Enter 'NLP', 'keyword', or 'image'.
6. Timeout (default: 300) - Maximum time in seconds to wait before a timeout error is returned.
7. Metadata (default: None) - Additional information that is returned as is in the response. Use this field to pass information that is needed in post-processing (adapter).
This action returns the JSON output of the NLP classification API. Store the result of the action in an output variable.
Batch Classification: The action also supports batch classification. You can pass a list of string values. Batch prediction is supported for NLP and keyword classification.

In this example, the pre-trained model SpamSMSClassifier is used for classifying the text value, and the result of the classification is stored in an output variable.
To view the output of the process in the log file, you can use the Log action from the BuiltIn library.
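
The input rules for Classify (Text or Path must be given, Classification takes one of three values, batch input applies to NLP and keyword classification) can be checked before the action is called. The following Python snippet is a sketch under stated assumptions: check_classify_inputs is a hypothetical helper, not a library action, and it assumes the Classification value is matched exactly as written in the parameter list.

```python
from typing import List, Optional, Union

# Values taken from the parameter list above; exact-case matching is an assumption.
ALLOWED_CLASSIFICATIONS = ("NLP", "keyword", "image")
BATCH_CLASSIFICATIONS = ("NLP", "keyword")


def check_classify_inputs(
    text: Union[str, List[str], None] = None,
    path: Optional[str] = None,
    classification: str = "NLP",
) -> List[str]:
    """Return a list of problems found in the Classify inputs (empty if none)."""
    problems = []
    if classification not in ALLOWED_CLASSIFICATIONS:
        problems.append("Classification must be 'NLP', 'keyword', or 'image'.")
    # Text and Path may not both be blank.
    if not text and not path:
        problems.append("Provide either Text or Path.")
    # A list of strings is batch input, described for NLP and keyword only.
    if isinstance(text, list) and classification not in BATCH_CLASSIFICATIONS:
        problems.append("Batch input is described for NLP and keyword classification.")
    return problems


if __name__ == "__main__":
    print(check_classify_inputs(text="Win a free prize now"))
    print(check_classify_inputs())
    print(check_classify_inputs(text=["a", "b"], classification="image"))
```

An empty list means the inputs satisfy the rules described above; it says nothing about whether the named model exists on the Smart Bot server.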

Recognize Entities:

To recognize entities using the pre-trained or custom Entity Recognizer models available in the Smart Bot:

In the Action field, press Ctrl + Space and select Recognize Entities from the list.
Enter the following parameters in the Input pane:
1. Recognizer Name - The name of the entity recognizer model.
2. Text - The input text from which entities need to be extracted. For an image or PDF, first use the Get ROI Text action to obtain the text.
3. ROI Json (optional) - The ROI JSON list for the PDF or image from which the text was extracted.
4. Entities (optional) - The list of entities to extract. If this parameter is left blank, the action returns all the entities extracted by the model.
5. Timeout (default: 300) - Maximum time in seconds to wait before a timeout error is returned.
6. Metadata (optional, default: None) - Additional information that is returned as is in the response. Use this field to pass information that is needed in post-processing (adapter).
This action returns the JSON output of the NLP Entity Recognition API. Store the result of the action in an output variable.
The action also supports batch entity recognition. You can pass a list of string values.
In this example, the pre-trained model Generic is used, and the result is stored in an output variable.
To view the output of the process in the log file, you can use the Log action from the BuiltIn library.

Prediction Data:

To predict data using the pre-trained or custom Predictive Analysis models available in the Smart Bot:

In the Action field, press Ctrl + Space and select Prediction Data from the list.
Enter the following parameters in the Input pane:
1. Predictor Name - The name of the model.
2. Data Array - The key-value pairs, separated by commas. Each key-value pair represents an input parameter and its value. The names and the order of the input variables must match the training dataset; otherwise, the action returns an error. The syntax for adding key-value pairs is:
  "<<Key_Name>>":"<<Key_Value>>"
3. Timeout (default: 300) - Maximum time in seconds to wait before a timeout error is returned.
4. Metadata (optional, default: None) - Additional information that is returned as is in the response. Use this field to pass information that is needed in post-processing (adapter).
This action returns the JSON output of the NLP Predictive Analysis API. Store the result of the action in an output variable.
In this example, the pre-trained model InsuranceClaimPrediction is used, and the result of the data prediction is stored in an output variable.
To view the output of the process in the log file, you can use the Log action from the BuiltIn library.
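
Because the Data Array must use the same names and order as the training dataset, it can help to build the string from an explicit column list. The following Python snippet is a minimal sketch: build_data_array is a hypothetical helper, the column names age and bmi are made up, and the output follows the "key":"value" syntax shown above without any escaping of quotes inside values.

```python
from typing import Dict, List


def build_data_array(values: Dict[str, object], training_columns: List[str]) -> str:
    """Return comma-separated "key":"value" pairs in training-column order.

    Raises ValueError if a training column is missing or an unknown key is given.
    """
    missing = [col for col in training_columns if col not in values]
    unknown = [key for key in values if key not in training_columns]
    if missing or unknown:
        raise ValueError(f"missing columns: {missing}, unknown keys: {unknown}")
    # Emit the pairs in the order of the training dataset, not the dict order.
    return ",".join(f'"{col}":"{values[col]}"' for col in training_columns)


if __name__ == "__main__":
    # Made-up column names and values, for illustration only.
    print(build_data_array({"bmi": 27.5, "age": 41}, ["age", "bmi"]))
    # prints "age":"41","bmi":"27.5"
```

Keeping the column order in one list makes an ordering mistake visible in a single place instead of in every call.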

Get ROI Text:

This action internally uses the CV Template creation API, which provides the ROI JSON for the given image or PDF. The action sorts the ROIs from left to right and top to bottom and concatenates the text from the sorted ROIs. It returns the final text along with the sorted ROI JSON array. Because this action produces better contextual text for a document, use it to obtain the text when creating a dataset for NLP Named Entity Recognition.
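
The sorting and concatenation step can be pictured with a short Python sketch. This is an illustration, not the action's actual implementation: the ROI fields x, y, and text, the row_tolerance value, and the reading of "left to right and top to bottom" as row-by-row reading order are all assumptions, since the real ROI JSON schema is not described here.

```python
from typing import Dict, List


def sort_rois(rois: List[Dict], row_tolerance: int = 10) -> List[Dict]:
    """Order ROIs row by row (top to bottom), and left to right within a row.

    ROIs whose y values differ by at most row_tolerance from the first ROI
    of a row are treated as belonging to that row.
    """
    ordered = sorted(rois, key=lambda roi: (roi["y"], roi["x"]))
    rows: List[List[Dict]] = []
    for roi in ordered:
        if rows and abs(roi["y"] - rows[-1][0]["y"]) <= row_tolerance:
            rows[-1].append(roi)
        else:
            rows.append([roi])
    return [roi for row in rows for roi in sorted(row, key=lambda r: r["x"])]


def concatenate_text(rois: List[Dict]) -> str:
    """Join the text of the sorted ROIs with single spaces."""
    return " ".join(roi["text"] for roi in sort_rois(rois))


if __name__ == "__main__":
    # Made-up ROIs: two visual rows with slightly uneven y coordinates.
    sample = [
        {"x": 200, "y": 12, "text": "No."},
        {"x": 10, "y": 10, "text": "Invoice"},
        {"x": 10, "y": 50, "text": "Total"},
        {"x": 200, "y": 48, "text": "42"},
    ]
    print(concatenate_text(sample))  # prints: Invoice No. Total 42
```

Grouping by rows before sorting by x keeps words on the same visual line together, which is what gives the concatenated text its context.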

In the Action field, press Ctrl + Space and select Get ROI Text from the list.
Enter the following parameters in the Input pane:
1. Filepath - The path of the PDF or image.
2. Pages - The page numbers or page range from which text needs to be extracted.
3. Password - The password, if the document is protected.
4. Delete Transaction (default: True) - Select False to keep the transactions on the Smart Bot server after the process completes.
5. RoiModelName (optional) - The name of the custom model trained for ROI extraction. If this parameter is left blank, the default ROI model is used.
6. Metadata (optional, default: None) - Additional information that is returned as is in the response. Use this field to pass information that is needed in post-processing (adapter).
This action returns a tuple with three elements: (list of page-wise text, number of pages, list of page-wise sorted ROIs). Store the result of the action in an output variable.
In this example, the filepath for an image/PDF is given, and the result of extracted data is stored in an output variable.
To view the output of the process in the log file, you can use the Log action from the BuiltIn library.
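
The three-element result can be unpacked into named values before further use. The following Python snippet is a small sketch with made-up sample data; summarize_roi_result is a hypothetical helper, and the structure of each ROI entry is left empty because it is not described here.

```python
from typing import Dict, List, Tuple


def summarize_roi_result(result: Tuple[List[str], int, List[list]]) -> Dict[str, object]:
    """Unpack (page-wise text, number of pages, page-wise sorted ROIs)."""
    page_texts, page_count, page_rois = result
    return {
        "page_count": page_count,
        "full_text": "\n".join(page_texts),  # one line break between pages
        "roi_pages": len(page_rois),
    }


if __name__ == "__main__":
    # Made-up result for a two-page document.
    sample_result = (["Invoice No. 42", "Total due: 100"], 2, [[], []])
    print(summarize_roi_result(sample_result)["full_text"])
```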

Fuzzy Search:

This action verifies whether a list of keywords is present in the input text. It returns the matching percentage for the given list of keywords as an accuracy score between 0 and 100. A score of 0 indicates no match, 100 indicates an exact match, and any value in between indicates a partial match. The higher the accuracy score, the higher the probability that the keywords exist in the input text.

To verify the keywords:

In the Action field, press Ctrl + Space and select Fuzzy Search from the list.
Enter the following parameters in the Input pane:
1. Text - The string data to verify against the keywords.
2. Keywords - A multi-list (list of lists). The outer list represents an AND operation and each inner list represents an OR operation. For example, [['Keyword1', 'Keyword2'], ['Keyword3', 'Keyword4']] means that the text should contain at least two keywords: one from 'Keyword1' OR 'Keyword2' AND the other from 'Keyword3' OR 'Keyword4'.

Store the result of the action in an output variable.

In this example, a list of keywords to match is provided, and the resulting accuracy score is stored in an output variable.
To view the output of the process in the log file, you can use the Log action from the BuiltIn library.
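
The AND/OR interpretation of the Keywords parameter can be illustrated with a short Python sketch. This is not the algorithm used by the Fuzzy Search action, which is not described here: the sketch substitutes difflib.SequenceMatcher from the Python standard library for the scoring, takes the best score inside each inner list (OR), and takes the lowest score across the outer list (AND). The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def keyword_score(keyword: str, text: str) -> float:
    """Best similarity (0 to 100) between a keyword and any same-length window of text."""
    keyword, text = keyword.lower(), text.lower()
    if not keyword or not text:
        return 0.0
    if keyword in text:
        return 100.0
    width = len(keyword)
    last_start = max(len(text) - width, 0)
    best = 0.0
    for start in range(last_start + 1):
        window = text[start:start + width]
        best = max(best, SequenceMatcher(None, keyword, window).ratio())
    return round(best * 100, 2)


def fuzzy_search_score(text: str, keywords: List[List[str]]) -> float:
    """Outer list = AND (minimum over groups), inner list = OR (maximum in a group)."""
    if not keywords:
        return 0.0
    group_scores = [
        max((keyword_score(kw, text) for kw in group), default=0.0)
        for group in keywords
    ]
    return min(group_scores)


if __name__ == "__main__":
    text = "Your invoice total is due on Friday"
    # Both groups have an exact match, so the combined score is 100.0.
    print(fuzzy_search_score(text, [["invoice", "bill"], ["due", "overdue"]]))
    # The second group has no match at all, so the AND drags the score to 0.0.
    print(fuzzy_search_score(text, [["invoice"], ["zzzz"]]))
```

Taking the minimum across groups mirrors the AND semantics: a single group with no matching keyword lowers the overall score, however well the other groups match.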