Architecture Checker (Experimental)

The Architecture Checker is a tool for models that run on the HTP backend, including quantized 8-bit, quantized 16-bit, and FP16 models. It outputs a list of issues that prevent the model from achieving better performance on the HTP backend. The tool can also be invoked with the modifier feature, which applies the recommended modifications for these issues. This helps visualize the changes that can be made to the model to make it a better fit for the HTP backend.

X86-Linux/ WSL usage: snpe-architecture-checker [-h] -i INPUT_DLC [-o OUTPUT_PATH] [-m MODIFY]

X86-Windows/ Windows on Snapdragon usage: python snpe-architecture-checker [-h] -i INPUT_DLC [-o OUTPUT_PATH] [-m MODIFY]

required arguments:
  -i INPUT_DLC, --input_dlc INPUT_DLC
                        Path to a DLC file

optional arguments:
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the output csv should be saved. If not
                        specified, the output csv will be written to the same
                        path as the input dlc file
  -m MODIFY, --modify MODIFY
                        The query selecting which modifications to apply.
                        --modify or --modify show - Display all possible modifications: the list of rule names and details of each modification.
                        --modify all - Apply all possible modifications found for the model.
                        --modify apply=rule_name1,rule_name2 - Apply modifications for the specified rule names. The list of rules must be comma-separated, without spaces.
Note:
SNPE_ROOT environment variable must be configured before running the tool.

The output is a csv file and will be saved as <OUTPUT_PATH>_architecture_checker.csv.
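Because the report is a plain csv with one issue per row, it can be inspected programmatically. Below is a minimal, illustrative sketch using Python's standard csv module; the inline sample rows mimic the example tables in this document, and in practice you would open the generated `<OUTPUT_PATH>_architecture_checker.csv` file instead.

```python
import csv
import io

# Illustrative stand-in for the generated report; a real run would use
# open("<OUTPUT_PATH>_architecture_checker.csv") instead of this sample.
sample = io.StringIO(
    "Graph / Layer name,Issue,Recommendation,Modification\n"
    "Graph,This model uses 16-bit activation data.,Try a smaller datatype such as 8-bit.,N/A\n"
    "Layer_name_1,Low channel count (smaller than 32).,Increase channels to 32 or greater.,N/A\n"
)

# DictReader keys each field by its column header from the first row.
rows = list(csv.DictReader(sample))
for row in rows:
    print(f"{row['Graph / Layer name']}: {row['Issue']}")
```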

model_architecture_checker.csv

Columns: Graph / Layer name | Issue | Recommendation | Type | Input_tensor_name:[dims] | Output_tensor_name:[dims] | Parameters | Previous layer | Next layers | Modification | Modification_info

Row 1:
  Graph / Layer name: Graph
  Issue: This model uses 16-bit activation data. 16-bit activation data takes twice as much memory as 8-bit activation data.
  Recommendation: Try to use a smaller datatype to get better performance. E.g., 8-bit
  Type: N/A
  Input_tensor_name:[dims]: N/A
  Output_tensor_name:[dims]: N/A
  Parameters: N/A
  Previous layer: N/A
  Next layers: N/A
  Modification: N/A
  Modification_info: N/A

Row 2:
  Graph / Layer name: Layer_name_1
  Issue: The number of channels in the input/output tensor of this convolution is low (smaller than 32).
  Recommendation: Try increasing the number of channels in the input/output tensor to 32 or greater to get better performance.
  Type: Conv2d
  Input_tensor_name:[dims]: input_1:[1, 250, 250, 3], __param_1:[5, 5, 3, 32], convolution_0_bias:[32]
  Output_tensor_name:[dims]: output_1:[1, 123, 123, 32]
  Parameters: {'dilation': '[1, 1]', 'group': 1, 'pad_amount': '[[1, 1], [1, 1]]', 'stride': '[1, 1]'}
  Previous layer: ['previous_layer_name']
  Next layers: ['next_layer_name1', 'next_layer_name2']
  Modification: N/A
  Modification_info: N/A

How to read the example output csv?
Row 1: This is a graph-level issue: the graph uses 16-bit activation data. As the recommendation states, changing the activations from 16-bit to 8-bit gives better performance.
Row 2: The issue is on the layer named "Layer_name_1". This layer has three inputs, input_1, __param_1, and convolution_0_bias, with dimensions [1, 250, 250, 3], [5, 5, 3, 32], and [32] respectively. It has one output tensor, output_1, with dimension [1, 123, 123, 32]. The issue is that the channel count of the input tensor is low: since it is smaller than 32, the recommendation is to increase it to at least 32 to get better performance on the HTP backend. The current input dimension is [1, 250, 250, 3]; ideally it would be [1, x, x, 32]. The Modification and Modification_info columns provide details about the modifications applied to the layer. If the Architecture Checker is not invoked with the modifier, or if no modifications are applicable, these values will be N/A.
Is the layer/tensor name the same in the original model?
It might not be identical, but it should be similar. The input tensors, output tensors, parameters, and previous/next layers are available in the output csv file to help locate the corresponding node in the original model.
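The convolution channel rule from Row 2 boils down to a simple threshold check on the channel dimension. The sketch below is a hypothetical re-implementation of that heuristic, not the tool's actual code; it assumes NHWC layout (channels in the last dimension), which matches the example dims above.

```python
# Hypothetical re-implementation of the "channels < 32" heuristic for
# NHWC tensors (channels last), mirroring the example dims in the table.
def low_channel_count(dims, threshold=32):
    """Return True when the channel (last) dimension is below the threshold."""
    return dims[-1] < threshold

flagged_input = low_channel_count([1, 250, 250, 3])    # the 3-channel Conv2d input
ok_output = low_channel_count([1, 123, 123, 32])       # 32 channels meets the threshold
print(flagged_input, ok_output)
```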

Sample Command

snpe-architecture-checker --input_dlc ./model.dlc
               --output_path ./archCheckerOutput

Architecture Checker - Model Modifier

To apply modifications to the model, the Architecture Checker can be invoked with "--modify" or "--modify show", which displays a list of possible modifications. In this case, the tool only shows the rule names and modification details; it runs without making any changes to the model and generates the csv output. Using the rule names from that run, the Architecture Checker can then be invoked with "--modify all" or "--modify apply=rule_name1,rule_name2". In this case, the rule-specific changes are applied to the model and can be viewed in the updated model. Additionally, the output csv will contain information related to the modifications.

Consider the csv output below, generated after applying the "--modify apply=elwisediv" modification to an example model.

model_architecture_checker.csv

Columns: Graph / Layer name | Issue | Recommendation | Type | Input_tensor_name:[dims] | Output_tensor_name:[dims] | Parameters | Previous layer | Next layers | Modification | Modification_info

Row 1:
  Graph / Layer name: Layer_name_1
  Issue: ElementWiseDivide usually has poor performance compared to ElementWiseMultiply.
  Recommendation: Try replacing ElementWiseDivide with ElementWiseMultiply using the reciprocal value to get better performance.
  Type: Eltwise_Binary
  Input_tensor_name:[dims]: input_1:[1, 52, 52, 6], input_2:[1]
  Output_tensor_name:[dims]: output_1:[1, 52, 52, 6]
  Parameters: {'eltwise_type': 'ElementWiseDivide'}
  Previous layer: ['previous_layer_name']
  Next layers: ['next_layer_name1', 'next_layer_name2']
  Modification: Done
  Modification_info: ElementWiseDivide has been replaced by ElementWiseMultiply using the reciprocal value

Row 2:
  Graph / Layer name: Layer_name_2
  Issue: The number of channels in the input/output tensor of this convolution is low (smaller than 32).
  Recommendation: Try increasing the number of channels in the input/output tensor to 32 or greater to get better performance.
  Type: Conv2d
  Input_tensor_name:[dims]: input_3:[1, 250, 250, 3], __param_1:[5, 5, 3, 32], convolution_1_bias:[32]
  Output_tensor_name:[dims]: output_2:[1, 123, 123, 32]
  Parameters: {'dilation': '[1, 1]', 'group': 1, 'pad_amount': '[[1, 1], [1, 1]]', 'stride': '[1, 1]'}
  Previous layer: ['previous_layer_name']
  Next layers: ['next_layer_name1', 'next_layer_name2']
  Modification: N/A
  Modification_info: N/A

How to read the example output csv?
Row 1: The issue is on the layer named "Layer_name_1". It contains an element-wise divide, which performs poorly compared to an element-wise multiply. After invoking the Architecture Checker with "--modify apply=elwisediv", the modification has been applied successfully: the element-wise divide is replaced by an element-wise multiply using the reciprocal value. This information is available in the Modification and Modification_info columns.
Row 2: The issue is on the layer named "Layer_name_2". The number of channels of the input tensor is less than 32. It is recommended to increase the number of channels to 32 or greater for better performance. For this issue, no modification through the tool is applicable, hence the Modification and Modification_info columns are N/A.
After modifying the model, the above run generates an updated model along with the csv output. Running the Architecture Checker on the updated model will no longer show the element-wise divide issue on Layer_name_1.
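The elwisediv rewrite rests on a simple identity: dividing by a constant equals multiplying by its reciprocal. The sketch below illustrates the arithmetic in plain Python; it is not the tool's actual transformation code, just the numerical equivalence that makes the swap safe.

```python
# Divide-by-constant vs. multiply-by-reciprocal: numerically equivalent,
# which is why the modifier can replace ElementWiseDivide with
# ElementWiseMultiply without changing the model's outputs.
divisor = 4.0
reciprocal = 1.0 / divisor  # precomputed constant the multiply consumes

activations = [0.5, 1.0, 2.0, 8.0]
divided = [x / divisor for x in activations]        # original divide op
multiplied = [x * reciprocal for x in activations]  # rewritten multiply op

print(divided)
print(multiplied)
```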

Following are the commands to invoke Architecture Checker with Modifier to display list of modifications:

Sample Command

snpe-architecture-checker --input_dlc ./model.dlc
               --output_path ./archCheckerOutput
               --modify

Sample Command

snpe-architecture-checker --input_dlc ./model.dlc
               --output_path ./archCheckerOutput
               --modify show

Following are the commands to apply the modifications either on all possible modifications or specific rules:

Sample Command

snpe-architecture-checker --input_dlc ./model.dlc
               --output_path ./archCheckerOutput
               --modify all

Sample Command

snpe-architecture-checker --input_dlc ./model.dlc
               --output_path ./archCheckerOutput
               --modify apply=prelu,elwisediv
Note:
The Architecture Checker with modifier is an enhancement that helps visualize the changes that can be applied to the model to better fit it on the HTP backend. To see actual performance improvements, the model may require retraining or redesigning.