GPU Kernel Information
layer_index | layer_name | layer_type | layer_shape | layer_duration (us) | layer_allocated_bytes | layer_peak_allocated_bytes | layer_allocator_bytes_in_use | layer_allocator_name | layer_host_temp_mem_bytes | layer_device_temp_mem_bytes | layer_host_persistent_mem_bytes | layer_device_persistent_mem_bytes | kernel_name | kernel_duration (us) | kernel_flops | kernel_dram_read_bytes | kernel_dram_write_bytes | kernel_achieved_occupancy (%) | kernel_arithmetic_intensity (flops/byte) | kernel_arithmetic_throughput (GFlops) | kernel_memory_bound |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | vgg_19/conv1/conv1_1/convolution-0-TransposeNHWCToNCHW-LayoutOptimizer | Transpose | [[1 3 224 224]] | 91 | 602112 | 602112 | 575874560 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::functor::SwapDimension1And2InTensor3UsingTiles<unsigned int, 1024, 1024, 2, false>(unsigned int const*, tensorflow::functor::Dimension<3>, unsigned int*) | 6.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
2 | vgg_19/conv1/conv1_1/convolution | Conv2D | [[1 64 224 224]] | 269.667 | 12845056 | 13447168 | 588117504 | GPU_0_bfc | 602112 | 0 | 0 | 0 | volta_scudnn_128x32_relu_small_nn_v1 | 40.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
2 | vgg_19/conv1/conv1_1/convolution | Conv2D | [[1 64 224 224]] | 269.667 | 12845056 | 13447168 | 588117504 | GPU_0_bfc | 602112 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 4.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
2 | vgg_19/conv1/conv1_1/convolution | Conv2D | [[1 64 224 224]] | 269.667 | 12845056 | 13447168 | 588117504 | GPU_0_bfc | 602112 | 0 | 0 | 0 | cudnn::gemm::computeOffsetsKernel(cudnn::gemm::ComputeOffsetsParams) | 3.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
3 | vgg_19/conv1/conv1_1/BiasAdd | BiasAdd | [[1 64 224 224]] | 86 | 12845056 | 0 | 587515392 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 51.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
4 | vgg_19/conv1/conv1_1/Relu | Relu | [[1 64 224 224]] | 74.333 | 12845056 | 0 | 587515392 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 48.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
5 | vgg_19/conv1/conv1_2/convolution | Conv2D | [[1 64 224 224]] | 422 | 12845056 | 13402368 | 600360448 | GPU_0_bfc | 557312 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 246.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
5 | vgg_19/conv1/conv1_2/convolution | Conv2D | [[1 64 224 224]] | 422 | 12845056 | 13402368 | 600360448 | GPU_0_bfc | 557312 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 4.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
5 | vgg_19/conv1/conv1_2/convolution | Conv2D | [[1 64 224 224]] | 422 | 12845056 | 13402368 | 600360448 | GPU_0_bfc | 557312 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 2.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
6 | vgg_19/conv1/conv1_2/BiasAdd | BiasAdd | [[1 64 224 224]] | 79.667 | 12845056 | 0 | 587515392 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 48.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
7 | vgg_19/conv1/conv1_2/Relu | Relu | [[1 64 224 224]] | 70.667 | 12845056 | 0 | 587515392 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 47.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
8 | vgg_19/pool1/MaxPool | MaxPool | [[1 64 112 112]] | 91.333 | 3211264 | 3211264 | 590726656 | GPU_0_bfc | 0 | 0 | 0 | 0 | void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, (cudnnNanPropagation_t)0>, 0, false>(cudnnTensorStruct, float const*, cudnnTensorStruct, float*, cudnnPoolingStruct, float, float, int, cudnn::reduced_divisor, cudnn::reduced_divisor) | 33.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
9 | vgg_19/conv2/conv2_1/convolution | Conv2D | [[1 128 112 112]] | 285 | 6422528 | 7537152 | 584304128 | GPU_0_bfc | 1114624 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 131.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
9 | vgg_19/conv2/conv2_1/convolution | Conv2D | [[1 128 112 112]] | 285 | 6422528 | 7537152 | 584304128 | GPU_0_bfc | 1114624 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 5.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
9 | vgg_19/conv2/conv2_1/convolution | Conv2D | [[1 128 112 112]] | 285 | 6422528 | 7537152 | 584304128 | GPU_0_bfc | 1114624 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 3.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
10 | vgg_19/conv2/conv2_1/BiasAdd | BiasAdd | [[1 128 112 112]] | 52.333 | 6422528 | 0 | 581092864 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 22.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
11 | vgg_19/conv2/conv2_1/Relu | Relu | [[1 128 112 112]] | 39.667 | 6422528 | 0 | 581092864 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 17.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
12 | vgg_19/conv2/conv2_2/convolution | Conv2D | [[1 128 112 112]] | 383 | 6422528 | 9633792 | 587515392 | GPU_0_bfc | 3211264 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 231.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
12 | vgg_19/conv2/conv2_2/convolution | Conv2D | [[1 128 112 112]] | 383 | 6422528 | 9633792 | 587515392 | GPU_0_bfc | 3211264 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 7.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
12 | vgg_19/conv2/conv2_2/convolution | Conv2D | [[1 128 112 112]] | 383 | 6422528 | 9633792 | 587515392 | GPU_0_bfc | 3211264 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 4.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
13 | vgg_19/conv2/conv2_2/BiasAdd | BiasAdd | [[1 128 112 112]] | 52 | 6422528 | 0 | 581092864 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 22.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
14 | vgg_19/conv2/conv2_2/Relu | Relu | [[1 128 112 112]] | 39.667 | 6422528 | 0 | 581092864 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 17.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
15 | vgg_19/pool2/MaxPool | MaxPool | [[1 128 56 56]] | 60.333 | 1605632 | 1605632 | 582698496 | GPU_0_bfc | 0 | 0 | 0 | 0 | void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, (cudnnNanPropagation_t)0>, 0, false>(cudnnTensorStruct, float const*, cudnnTensorStruct, float*, cudnnPoolingStruct, float, float, int, cudnn::reduced_divisor, cudnn::reduced_divisor) | 12.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
16 | vgg_19/conv3/conv3_1/convolution | Conv2D | [[1 256 56 56]] | 308.333 | 3211264 | 7668736 | 579487232 | GPU_0_bfc | 4457472 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 152.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
16 | vgg_19/conv3/conv3_1/convolution | Conv2D | [[1 256 56 56]] | 308.333 | 3211264 | 7668736 | 579487232 | GPU_0_bfc | 4457472 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 10.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
16 | vgg_19/conv3/conv3_1/convolution | Conv2D | [[1 256 56 56]] | 308.333 | 3211264 | 7668736 | 579487232 | GPU_0_bfc | 4457472 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 5.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
17 | vgg_19/conv3/conv3_1/BiasAdd | BiasAdd | [[1 256 56 56]] | 39.667 | 3211264 | 0 | 577881600 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 10.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
18 | vgg_19/conv3/conv3_1/Relu | Relu | [[1 256 56 56]] | 28.667 | 3211264 | 0 | 577881600 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 4.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
19 | vgg_19/conv3/conv3_2/convolution | Conv2D | [[1 256 56 56]] | 443 | 3211264 | 12125184 | 581092864 | GPU_0_bfc | 8913920 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 282.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
19 | vgg_19/conv3/conv3_2/convolution | Conv2D | [[1 256 56 56]] | 443 | 3211264 | 12125184 | 581092864 | GPU_0_bfc | 8913920 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 18.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
19 | vgg_19/conv3/conv3_2/convolution | Conv2D | [[1 256 56 56]] | 443 | 3211264 | 12125184 | 581092864 | GPU_0_bfc | 8913920 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 10.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
20 | vgg_19/conv3/conv3_2/BiasAdd | BiasAdd | [[1 256 56 56]] | 39.667 | 3211264 | 0 | 577881600 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 10.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
21 | vgg_19/conv3/conv3_2/Relu | Relu | [[1 256 56 56]] | 27.667 | 3211264 | 0 | 577881600 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 4.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
22 | vgg_19/conv3/conv3_3/convolution | Conv2D | [[1 256 56 56]] | 442.333 | 4816896 | 13730816 | 582698496 | GPU_0_bfc | 8913920 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 279.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
22 | vgg_19/conv3/conv3_3/convolution | Conv2D | [[1 256 56 56]] | 442.333 | 4816896 | 13730816 | 582698496 | GPU_0_bfc | 8913920 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 18.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
22 | vgg_19/conv3/conv3_3/convolution | Conv2D | [[1 256 56 56]] | 442.333 | 4816896 | 13730816 | 582698496 | GPU_0_bfc | 8913920 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 10.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
23 | vgg_19/conv3/conv3_3/BiasAdd | BiasAdd | [[1 256 56 56]] | 40.667 | 4816896 | 0 | 579487232 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 11.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
24 | vgg_19/conv3/conv3_3/Relu | Relu | [[1 256 56 56]] | 26.667 | 4816896 | 0 | 579487232 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 4.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
25 | vgg_19/conv3/conv3_4/convolution | Conv2D | [[1 256 56 56]] | 448.667 | 3211264 | 12125184 | 582698496 | GPU_0_bfc | 8913920 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 282.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
25 | vgg_19/conv3/conv3_4/convolution | Conv2D | [[1 256 56 56]] | 448.667 | 3211264 | 12125184 | 582698496 | GPU_0_bfc | 8913920 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 18.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
25 | vgg_19/conv3/conv3_4/convolution | Conv2D | [[1 256 56 56]] | 448.667 | 3211264 | 12125184 | 582698496 | GPU_0_bfc | 8913920 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 9.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
26 | vgg_19/conv3/conv3_4/BiasAdd | BiasAdd | [[1 256 56 56]] | 39.333 | 3211264 | 0 | 577881600 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 12.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
27 | vgg_19/conv3/conv3_4/Relu | Relu | [[1 256 56 56]] | 27 | 3211264 | 0 | 577881600 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 4.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
28 | vgg_19/pool3/MaxPool | MaxPool | [[1 256 28 28]] | 58 | 802816 | 802816 | 578684416 | GPU_0_bfc | 0 | 0 | 0 | 0 | void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, (cudnnNanPropagation_t)0>, 0, false>(cudnnTensorStruct, float const*, cudnnTensorStruct, float*, cudnnPoolingStruct, float, float, int, cudnn::reduced_divisor, cudnn::reduced_divisor) | 6.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
29 | vgg_19/conv4/conv4_1/convolution | Conv2D | [[1 512 28 28]] | 342.667 | 1605632 | 19433472 | 577078784 | GPU_0_bfc | 17827840 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 143.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
29 | vgg_19/conv4/conv4_1/convolution | Conv2D | [[1 512 28 28]] | 342.667 | 1605632 | 19433472 | 577078784 | GPU_0_bfc | 17827840 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 32.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
29 | vgg_19/conv4/conv4_1/convolution | Conv2D | [[1 512 28 28]] | 342.667 | 1605632 | 19433472 | 577078784 | GPU_0_bfc | 17827840 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 26.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
30 | vgg_19/conv4/conv4_1/BiasAdd | BiasAdd | [[1 512 28 28]] | 36.333 | 1605632 | 0 | 576275968 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 6.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
31 | vgg_19/conv4/conv4_1/Relu | Relu | [[1 512 28 28]] | 27 | 1605632 | 0 | 576275968 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 4.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
32 | vgg_19/conv4/conv4_2/convolution | Conv2D | [[1 512 28 28]] | 526.667 | 1605632 | 37259264 | 577881600 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 272.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
32 | vgg_19/conv4/conv4_2/convolution | Conv2D | [[1 512 28 28]] | 526.667 | 1605632 | 37259264 | 577881600 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 63.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
32 | vgg_19/conv4/conv4_2/convolution | Conv2D | [[1 512 28 28]] | 526.667 | 1605632 | 37259264 | 577881600 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 54.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
33 | vgg_19/conv4/conv4_2/BiasAdd | BiasAdd | [[1 512 28 28]] | 36 | 1605632 | 0 | 576275968 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 6.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
34 | vgg_19/conv4/conv4_2/Relu | Relu | [[1 512 28 28]] | 25.333 | 1605632 | 0 | 576275968 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 4.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
35 | vgg_19/conv4/conv4_3/convolution | Conv2D | [[1 512 28 28]] | 526.333 | 2408448 | 38062080 | 578684416 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 272.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
35 | vgg_19/conv4/conv4_3/convolution | Conv2D | [[1 512 28 28]] | 526.333 | 2408448 | 38062080 | 578684416 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 63.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
35 | vgg_19/conv4/conv4_3/convolution | Conv2D | [[1 512 28 28]] | 526.333 | 2408448 | 38062080 | 578684416 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 55.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
36 | vgg_19/conv4/conv4_3/BiasAdd | BiasAdd | [[1 512 28 28]] | 36 | 2408448 | 0 | 577078784 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 6.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
37 | vgg_19/conv4/conv4_3/Relu | Relu | [[1 512 28 28]] | 25 | 2408448 | 0 | 577078784 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 4.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
38 | vgg_19/conv4/conv4_4/convolution | Conv2D | [[1 512 28 28]] | 531.667 | 1605632 | 37259264 | 578684416 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 274.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
38 | vgg_19/conv4/conv4_4/convolution | Conv2D | [[1 512 28 28]] | 531.667 | 1605632 | 37259264 | 578684416 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 63.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
38 | vgg_19/conv4/conv4_4/convolution | Conv2D | [[1 512 28 28]] | 531.667 | 1605632 | 37259264 | 578684416 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 55.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
39 | vgg_19/conv4/conv4_4/BiasAdd | BiasAdd | [[1 512 28 28]] | 36 | 1605632 | 0 | 576275968 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 6.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
40 | vgg_19/conv4/conv4_4/Relu | Relu | [[1 512 28 28]] | 26 | 1605632 | 0 | 576275968 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 4.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
41 | vgg_19/pool4/MaxPool | MaxPool | [[1 512 14 14]] | 53.667 | 401408 | 401408 | 576677376 | GPU_0_bfc | 0 | 0 | 0 | 0 | void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, (cudnnNanPropagation_t)0>, 0, false>(cudnnTensorStruct, float const*, cudnnTensorStruct, float*, cudnnPoolingStruct, float, float, int, cudnn::reduced_divisor, cudnn::reduced_divisor) | 4.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
42 | vgg_19/conv5/conv5_1/convolution | Conv2D | [[1 512 14 14]] | 377.333 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 120.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
42 | vgg_19/conv5/conv5_1/convolution | Conv2D | [[1 512 14 14]] | 377.333 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 64.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
42 | vgg_19/conv5/conv5_1/convolution | Conv2D | [[1 512 14 14]] | 377.333 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 54.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
43 | vgg_19/conv5/conv5_1/BiasAdd | BiasAdd | [[1 512 14 14]] | 32.333 | 401408 | 0 | 575071744 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 3.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
44 | vgg_19/conv5/conv5_1/Relu | Relu | [[1 512 14 14]] | 25.333 | 401408 | 0 | 575071744 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 2.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
45 | vgg_19/conv5/conv5_2/convolution | Conv2D | [[1 512 14 14]] | 372.333 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 120.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
45 | vgg_19/conv5/conv5_2/convolution | Conv2D | [[1 512 14 14]] | 372.333 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 63.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
45 | vgg_19/conv5/conv5_2/convolution | Conv2D | [[1 512 14 14]] | 372.333 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 54.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
46 | vgg_19/conv5/conv5_2/BiasAdd | BiasAdd | [[1 512 14 14]] | 35.333 | 401408 | 0 | 575071744 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 3.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
47 | vgg_19/conv5/conv5_2/Relu | Relu | [[1 512 14 14]] | 25.333 | 401408 | 0 | 575071744 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 3.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
48 | vgg_19/conv5/conv5_3/convolution | Conv2D | [[1 512 14 14]] | 376.667 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 121.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
48 | vgg_19/conv5/conv5_3/convolution | Conv2D | [[1 512 14 14]] | 376.667 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 63.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
48 | vgg_19/conv5/conv5_3/convolution | Conv2D | [[1 512 14 14]] | 376.667 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 54.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
49 | vgg_19/conv5/conv5_3/BiasAdd | BiasAdd | [[1 512 14 14]] | 31.667 | 401408 | 0 | 575071744 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 3.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
50 | vgg_19/conv5/conv5_3/Relu | Relu | [[1 512 14 14]] | 23.667 | 401408 | 0 | 575071744 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 2.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
51 | vgg_19/conv5/conv5_4/convolution | Conv2D | [[1 512 14 14]] | 375.667 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1 | 120.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
51 | vgg_19/conv5/conv5_4/convolution | Conv2D | [[1 512 14 14]] | 375.667 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 63.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
51 | vgg_19/conv5/conv5_4/convolution | Conv2D | [[1 512 14 14]] | 375.667 | 401408 | 36055040 | 575473152 | GPU_0_bfc | 35653632 | 0 | 0 | 0 | void cudnn::winograd::generateWinogradTilesKernel<0, float, float>(cudnn::winograd::GenerateWinogradTilesParams<float, float>) | 55.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
52 | vgg_19/conv5/conv5_4/BiasAdd | BiasAdd | [[1 512 14 14]] | 31.333 | 401408 | 0 | 575071744 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 3.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
53 | vgg_19/conv5/conv5_4/Relu | Relu | [[1 512 14 14]] | 23.667 | 401408 | 0 | 575071744 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 2.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
54 | vgg_19/pool5/MaxPool | MaxPool | [[1 512 7 7]] | 55 | 100352 | 100352 | 575172096 | GPU_0_bfc | 0 | 0 | 0 | 0 | void cudnn::detail::pooling_fw_4d_kernel<float, float, cudnn::detail::maxpooling_func<float, (cudnnNanPropagation_t)0>, 0, false>(cudnnTensorStruct, float const*, cudnnTensorStruct, float*, cudnnPoolingStruct, float, float, int, cudnn::reduced_divisor, cudnn::reduced_divisor) | 4.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
55 | vgg_19/fc6/convolution | Conv2D | [[1 4096 1 1]] | 27284.667 | 16384 | 411058176 | 574787072 | GPU_0_bfc | 411041792 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 24114.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
55 | vgg_19/fc6/convolution | Conv2D | [[1 4096 1 1]] | 27284.667 | 16384 | 411058176 | 574787072 | GPU_0_bfc | 411041792 | 0 | 0 | 0 | void cudnn::detail::implicit_convolve_sgemm<float, float, 1024, 5, 5, 3, 3, 3, 1, true, false, true>(int, int, int, float const*, int, float*, float*, kernel_conv_params, int, float, float, int, float*, float*, int, int) | 2622.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
56 | vgg_19/fc6/BiasAdd | BiasAdd | [[1 4096 1 1]] | 26.667 | 16384 | 0 | 574686720 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 2.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
57 | vgg_19/fc6/Relu | Relu | [[1 4096 1 1]] | 18 | 16384 | 0 | 574686720 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 2.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
58 | vgg_19/fc7/convolution | Conv2D | [[1 4096 1 1]] | 923.333 | 16384 | 67125248 | 574703104 | GPU_0_bfc | 67108864 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 498.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
58 | vgg_19/fc7/convolution | Conv2D | [[1 4096 1 1]] | 923.333 | 16384 | 67125248 | 574703104 | GPU_0_bfc | 67108864 | 0 | 0 | 0 | void cudnn::detail::implicit_convolve_sgemm<float, float, 1024, 5, 5, 3, 3, 3, 1, true, false, true>(int, int, int, float const*, int, float*, float*, kernel_conv_params, int, float, float, int, float*, float*, int, int) | 334.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
59 | vgg_19/fc7/BiasAdd | BiasAdd | [[1 4096 1 1]] | 23.667 | 16384 | 0 | 574686720 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 3.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
60 | vgg_19/fc7/Relu | Relu | [[1 4096 1 1]] | 17.333 | 16384 | 0 | 574686720 | GPU_0_bfc | 0 | 0 | 0 | 0 | void Eigen::internal::EigenMetaKernel<Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long>(Eigen::TensorEvaluator<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice>, long) | 2.33 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
61 | vgg_19/fc8/convolution | Conv2D | [[1 1000 1 1]] | 420 | 4096 | 16388096 | 574690816 | GPU_0_bfc | 16384000 | 0 | 0 | 0 | void cudnn::detail::implicit_convolve_sgemm<float, float, 1024, 5, 5, 3, 3, 3, 1, true, false, true>(int, int, int, float const*, int, float*, float*, kernel_conv_params, int, float, float, int, float*, float*, int, int) | 231.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
61 | vgg_19/fc8/convolution | Conv2D | [[1 1000 1 1]] | 420 | 4096 | 16388096 | 574690816 | GPU_0_bfc | 16384000 | 0 | 0 | 0 | void tensorflow::functor::ShuffleInTensor3Simple<float, 2, 1, 0, false>(int, float const*, tensorflow::functor::Dimension<3>, float*) | 105.67 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |
62 | vgg_19/fc8/BiasAdd | BiasAdd | [[1 1000 1 1]] | 23 | 4096 | 0 | 574674432 | GPU_0_bfc | 0 | 0 | 0 | 0 | void tensorflow::BiasNCHWKernel<float>(int, float const*, float const*, float*, int, int) | 2.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | true |