Point Cloud Semantic Segmentation with Limited Supervision using Deep Neural Networks

M.Sc. Thesis, Sharif University of Technology. Hamidi Hesarsorkh, Hassan (Author); Kasaei, Shohreh (Supervisor)

Abstract

One of the most common forms of three-dimensional data is the point cloud. In addition to its high flexibility in representing three-dimensional space, this type of data is the closest to the raw output of three-dimensional sensors. Semantic segmentation of point clouds is a fundamental operation on this type of data, with applications in robotics, self-driving cars, virtual reality, remote sensing, and other fields that work with such data. Deep learning models require abundant training data, and models for point clouds are no exception. However, the problem is that collecting and labeling this type of data is more difficult and costly compared to...

Deep Multi-Object Tracking by Part-Based Re-Identification in Soccer Matches

M.Sc. Thesis, Sharif University of Technology. Mansourian, Amir Mohammad (Author); Kasaei, Shohreh (Supervisor)

Abstract

Effective tracking and re-identification of persons is essential for analyzing team sport videos. However, this task is challenging due to the nonlinear motion of players, the similarity in appearance of players from the same team, the distance of the camera from the persons on the pitch, and frequent occlusions. Therefore, the ability to extract meaningful embeddings to represent persons is crucial in developing an effective tracking and re-identification system. In team sports, there is other information that can be used for re-identification of persons, such as team affiliation, role information, and jersey number. However, existing methods usually suffer from two problems: first,...

Adversarial Attacks on Deep Neural Networks

Ph.D. Dissertation, Sharif University of Technology. Naderi, Hanieh (Author); Kasaei, Shohreh (Supervisor)

Abstract

The remarkable progress of deep neural networks in recent years has led to their adoption in industry and their use in the real world. However, one of the most important and basic issues threatening the security of these networks is adversarial attacks. Such attacks deliberately manipulate input data to exploit vulnerabilities and cause networks to misclassify. Due to the wide range of ways in which attacks can perturb input data, identifying their types is considered a vital part of ensuring a robust network. The inability of deep networks to generalize to unseen data is also an important limitation. This thesis presents a 2D adversarial attack and a 3D defense in this regard. In 2D attacks, the type of...

Trainable Loss Weights for Image Super-Resolution

M.Sc. Thesis, Sharif University of Technology. Chaichi Mellatshahi, Arash (Author); Kasaei, Shohreh (Supervisor)

Abstract

Image super-resolution is the process of estimating a high-quality image from a low-quality image. With the growth of remote sensing images, computer games, and the development of artificial intelligence applications in medical image analysis, research in this area of machine vision has seen significant growth. In recent years, research on super-resolution has primarily focused on the development of unsupervised models, blind networks, and the use of optimization methods in non-blind models. However, limited research has discussed the loss function in the super-resolution process. The majority of those studies have only used perceptual similarity in a conventional way. This is while the...

Content-based Image Retrieval of Clothing Items with Neural Networks

M.Sc. Thesis, Sharif University of Technology. Ghayour Razmgah, Mahdi (Author); Kasaei, Shohreh (Supervisor)

Abstract

One of the important current topics in computer vision is content-based image retrieval. In this task, an image is given to the system as a query; the system then searches a pre-processed dataset, finds the images nearest to the query, and returns them as the result. In this thesis, our goal is to solve this problem more effectively for fashion datasets. Current solutions produce poor results when the input query or the dataset images are rotated. In recent years, transformers have produced very good results in NLP; the ViT then carried the same idea into computer vision and achieved results comparable to CNNs. We therefore use vision transformers to solve the content-based image retrieval problem with...
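The retrieval step described in this abstract (find the dataset images nearest to the query) can be illustrated with a minimal sketch. It assumes each image has already been mapped to an embedding vector; the three-dimensional vectors, the item names, and the `retrieve` helper are hypothetical, and cosine similarity is one common choice of nearness, not necessarily the one used in the thesis.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, gallery, k=2):
    # Rank gallery items by similarity to the query and return the top-k names.
    ranked = sorted(
        gallery.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Hypothetical pre-computed embeddings (in practice produced by a network such as a ViT).
gallery = {
    "red_dress": [0.9, 0.1, 0.0],
    "blue_jeans": [0.0, 0.2, 0.9],
    "pink_dress": [0.8, 0.3, 0.1],
}
result = retrieve([1.0, 0.2, 0.0], gallery, k=2)
print(result)
```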

Weakly Supervised Semantic Segmentation Using Deep Neural Networks

M.Sc. Thesis, Sharif University of Technology. Khairi Atani, Masoud (Author); Kasaei, Shohreh (Supervisor)

Abstract

Semantic segmentation, which is the classification of every pixel in an input image, is a fundamental task in the fields of computer vision and scene understanding. Applications of semantic segmentation include autonomous vehicles and robotics. Since this task needs dense annotation of the images in the dataset, recent methods have been proposed that utilize weakly-supervised and semi-supervised learning, using data with weak labels and unlabeled data, respectively. Because the amount of fully labeled data might not be sufficient in such methods, some papers have proposed to employ depth input data, when available, due to its rich geometrical and local information. In this research, an...

Semantic Segmentation Considering Correlation with RGB and Depth Using Convolutional Neural Networks

M.Sc. Thesis, Sharif University of Technology. Ghelichkhan, Zahra (Author); Kasaei, Shohreh (Supervisor)

Abstract

Across the broad field of artificial intelligence, semantic segmentation has been one of the grand challenges in computer vision. This task, which aims to predict a label for each pixel of an image and thereby describe the scene, is more complicated than other computer vision tasks because it needs low-level information. However, as it is part of scene understanding and a crucial step in many real-world applications such as autonomous driving, human-computer interaction, and robot navigation, many researchers have sought to solve it. What makes this task more challenging than other computer vision tasks is that information beyond a pixel, its neighbors and...

Improving Robustness of Deep Neural Networks Against Adversarial Examples in Image

M.Sc. Thesis, Sharif University of Technology. Mahabadi Mohamadi, Mohamad (Author); Kasaei, Shohreh (Supervisor)

Abstract

Despite the widespread applications and high performance of deep neural networks in computer vision, they have been shown to be vulnerable to adversarial examples. An adversarial example is a perturbed image whose difference from its corresponding natural image is small in magnitude, yet given such an example, the network produces an incorrect output. In recent years, many approaches have been proposed to increase the robustness of DNNs against adversarial examples, with adversarial training being regarded as the most effective defense measure. Approaches based on adversarial training try to increase the robustness of the network by training on adversarial examples. One of...
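The definition above (a small perturbation that changes the network's output) can be made concrete with a minimal sketch. It assumes a toy linear classifier rather than a deep network and uses a sign-of-gradient step in the style of the fast gradient sign method; the weights, input, and `epsilon` are hypothetical values chosen for illustration, and this is not the method proposed in the thesis.

```python
def sign(v):
    # Returns -1, 0, or 1 depending on the sign of v.
    return (v > 0) - (v < 0)

def predict(weights, bias, x):
    # Toy linear classifier: label +1 if the score is non-negative, else -1.
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1

def sign_gradient_attack(weights, x, label, epsilon):
    # For the loss -label * score, the gradient with respect to x_i is -label * w_i.
    # Each input coordinate moves by at most epsilon in the direction that raises the loss.
    return [xi + epsilon * sign(-label * w) for xi, w in zip(x, weights)]

# Hypothetical classifier and input.
weights, bias = [1.0, -2.0, 0.5], 0.0
x = [0.2, 0.0, 0.1]
epsilon = 0.1

x_adv = sign_gradient_attack(weights, x, label=1, epsilon=epsilon)
clean_label = predict(weights, bias, x)
adv_label = predict(weights, bias, x_adv)
print(clean_label, adv_label)
```

Adversarial training, as described in the abstract, would add inputs like `x_adv` (paired with the original label) to the training data.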

Noisy-Channel Model for Feature Extraction

Ph.D. Dissertation, Sharif University of Technology. Hafez Kolahi, Hassan (Author); Kasaei, Shohreh (Supervisor); Soleymani Baghshah, Mahdieh (Co-Supervisor)

Abstract

One of the approaches used in learning theory is the use of information-theoretic tools. The general idea of this approach is that if we show the algorithm did not memorize the dataset, we can guarantee generalization. The noisy channel model is one of the important concepts in this approach. A noisy channel is a lossy process that maps the data to a compressed format. There are two ways to use the noisy channel model in the literature: input compression and model compression. One of the main results of this thesis is to show that input compression methods cannot explain the generalization of algorithms (contrary to previous belief). In contrast, by fixing some of the problems faced in the model...

Self-Supervised Image Representation Learning

M.Sc. Thesis, Sharif University of Technology. Aghababazadeh, Arash (Author); Kasaei, Shohreh (Supervisor)

Abstract

Self-supervised learning is a method to reduce the need for large labeled datasets in supervised learning. In self-supervised learning, the goal is to design a pretext task that can be trained without any labels. This pretext task results in learning a representation of the data that can reduce the need for labels when used for different tasks. In the image domain, data-augmenting transformations, which are often compositions of simple transformations such as random cropping and color jitter, have been used to design pretext tasks. These simple transformations can cause information loss in some datasets, which limits the usage of the learned representations for various downstream tasks....

Multi-Sensor Data Fusion with Deep Learning in Semantic Segmentation

M.Sc. Thesis, Sharif University of Technology. Sadeghi, Aryan (Author); Kasaei, Shohreh (Supervisor)

Abstract

In image processing applications, sensors (camera, LiDAR, and stereo) are essential for scene perception, and deep learning methods achieve the best performance on most image processing tasks, such as 2D and 3D object detection and semantic segmentation. Different sensors are used in image processing tasks. Sensor fusion uses data from multiple sensors to achieve better performance. Each sensor captures different data (e.g., color, texture, and depth). Some sensors are degraded by inclement weather, intense illumination changes, and dark environments; multi-sensor data fusion is used to overcome these sensor weaknesses. One of the most important fields in which sensor fusion is used is autonomous driving (AD) cars. Different...

Supervised Semantic Segmentation of RGB-Depth Images

Ph.D. Dissertation, Sharif University of Technology. Fooladgar, Fahimeh (Author); Kasaei, Shohreh (Supervisor)

Abstract

The labeling process is one of the most important tasks in the field of computer vision. The dense labeling problem is the main step towards 2D and 3D scene understanding. The main goal of dense labeling is to label all pixels of an image, which is known as semantic segmentation in the related literature. Although state-of-the-art results are mainly achieved by deep learning methods, traditional methods were also at the center of attention for some years. In the last decades, convolutional neural networks have changed the landscape of visual recognition tasks such as labeling and semantic segmentation. The most important issues in deep learning models are the hardware and...

3D Reconstruction of Human Body from Single-View Videos

Ph.D. Dissertation, Sharif University of Technology. Sepehri Nour, Maryam (Author); Kasaei, Shohreh (Supervisor)

Abstract

This thesis presents a state-of-the-art algorithm for perspective projection reconstruction of non-rigid surfaces from realistic single-view videos, which overcomes the limitations arising from the use of the orthographic camera model and the complexity and non-linearity of the perspective projection equation. Unlike traditional non-rigid structure-from-motion (NRSfM) methods, which have been studied only on synthetic datasets and in controlled lab environments that require some prior constraints (such as manually segmented objects, limited rotations and occlusions, and full-length trajectories), the proposed method can be used on realistic video sequences. In addition, contrary to...

Class Attention Map Distillation In Semantic Segmentation

M.Sc. Thesis, Sharif University of Technology. Karimi Bavandpour, Nader (Author); Kasaei, Shohreh (Supervisor)

Abstract

Semantic segmentation is the task of labeling each pixel of an input image. It is one of the main problems in computer vision and plays an important role in scene understanding. State-of-the-art methods for solving it are based on Convolutional Neural Networks (CNNs). While many real-world tasks like autonomous driving and robot navigation require fast and lightweight models, CNNs inherently tend to give better accuracy when they are deeper and bigger, and this has raised interest in designing compact networks. Knowledge distillation is one of the popular methods of training compact networks and helps to transfer a big and powerful network's knowledge to a small and compact one. In this research...
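The idea of transferring a large network's knowledge to a compact one can be sketched with the classic logit-based form of knowledge distillation: the student is penalized for diverging from the teacher's temperature-softened class probabilities. This is a minimal illustration of that general technique under assumed values; the logits, the temperature, and the function names are hypothetical, and it is not the class attention map distillation proposed in the thesis.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature gives a softer distribution.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    # KL divergence from the teacher's soft targets to the student's soft predictions,
    # scaled by temperature squared as is conventional for this loss.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Hypothetical per-pixel class logits for one pixel and three classes.
teacher = [4.0, 1.0, 0.5]
good_student = [3.5, 1.2, 0.4]
poor_student = [0.5, 1.0, 4.0]

loss_good = distillation_loss(teacher, good_student)
loss_poor = distillation_loss(teacher, poor_student)
print(loss_good < loss_poor)
```

In a segmentation setting this loss would typically be averaged over all pixels and combined with the usual supervised loss on the ground-truth labels.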

Human Action Recognition from RGB-D Videos using Deep Networks

M.Sc. Thesis, Sharif University of Technology. Beizaee, Farzad (Author); Kasaei, Shohreh (Supervisor)

Abstract

Nowadays, human action recognition is one of the most active and widely applied areas of research in computer vision. The purpose of human action recognition is to label an action in a video. This field has numerous applications, such as human-computer interaction, video analysis, medical care, and surveillance cameras. As in other subfields of computer vision, the advent and development of deep learning networks have brought considerable progress in the accuracy and speed of the methods. The main purpose of this research is to improve human action recognition networks on RGB-D videos. In this study, three methods for action recognition using deep neural networks are...

A Robust and Compressed Descriptor for Action Recognition from 4D Data

Ph.D. Dissertation, Sharif University of Technology. Asadi-Aghbolaghi, Maryam (Author); Kasaei, Shohreh (Supervisor)

Abstract

Human action recognition is nowadays among the most active computer vision research areas. The problem of action recognition is challenging due to large intra-class variations, noise, and the high dimensionality of video data. The recent development of affordable depth sensors has opened new opportunities in this field by providing depth data. The literature on vision-based human action recognition can be divided into two main groups: handcrafted and deep learning-based approaches. In the first approach, a supervised spatio-temporal kernel descriptor (SSTKDes) is proposed from RGB-depth videos. To enhance the discriminative ability of the descriptor, extracted primary kernel-based features are...

3D Reconstruction of Human Pose in Multi-View Dynamic Scenes

Ph.D. Dissertation, Sharif University of Technology. Ershadi Nasab, Sara (Author); Sanaei, Esmaeil (Supervisor); Kasaei, Shohreh (Co-Advisor)

Abstract

In this thesis, 3D pose reconstruction of one or multiple humans in multi-view dynamic scenes is considered. The inputs are frames from multi-view camera systems. The outputs are the reconstructed human poses in 3D space. The pose is the location of 14 human body joints in 3D space. In this research, Kinect sensor data, markers, and GPS sensors are not used; it is assumed that only multi-view images are available. 3D reconstruction of human body pose can be performed under different assumptions. For example, the scene is indoor only, the camera calibration information is provided in advance, or the cameras are moving or fixed. In this research, each of these assumptions is regarded...

4D Hand Gesture Recognition on RGB-D Videos

M.Sc. Thesis, Sharif University of Technology. Azad, Reza (Author); Kasaei, Shohreh (Supervisor)

Abstract

Hand gesture recognition is one of the most applicable and active research topics in the computer vision community. Recent advances in imaging devices, such as Microsoft Kinect, have attracted a great deal of attention from researchers, prompting them to reconsider problems such as gesture recognition using depth information. Hand gesture recognition refers to the classification of dynamic hand movements in action videos. Generally, hand gesture recognition includes three main steps: hand detection, feature extraction, and classification. The first step plays an important role in hand gesture recognition. The most challenging part of hand gesture recognition is the second step, which is the process of extracting high...

Extracting Homography Matrix to Determine 3D Position of Soccer Players

Ph.D. Dissertation, Sharif University of Technology. Fotouhi, Mehran (Author); Kasaei, Shohreh (Supervisor)

Abstract

Determining the position of an object in the 3D world is one of the most basic preprocessing steps in the field of computer vision. It is used in many practical applications such as video surveillance, human action analysis, and human-computer interaction. For this purpose, calibrated cameras are usually used, for which the internal and external camera parameters are already known. However, in some real-life applications, prior access to the camera is not possible. This thesis studies homography matrix extraction for determining the position of soccer players. Pan-tilt-zoom (PTZ) cameras are used. (For a stationary camera, a homography matrix is obtained once.) To...
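Once a homography matrix is available, mapping an image point to the field plane is a single projective transformation in homogeneous coordinates. The sketch below only illustrates that application step; the matrix entries and pixel coordinates are hypothetical, the `apply_homography` helper is an assumed name, and the estimation of the matrix from PTZ camera frames, which is the subject of the thesis, is not shown.

```python
def apply_homography(H, point):
    # Map an image point (x, y) to plane coordinates using a 3x3 homography H.
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to Cartesian coordinates.
    return (u / w, v / w)

# Hypothetical homography from image pixels to field coordinates.
H = [
    [0.5, 0.0, 10.0],
    [0.0, 0.5, 5.0],
    [0.0, 0.0, 1.0],
]
player_pixel = (100.0, 40.0)  # hypothetical foot position of a player in the image
field_position = apply_homography(H, player_pixel)
print(field_position)
```

For a PTZ camera the matrix changes as the camera moves, so a new `H` would be needed for each frame, whereas a stationary camera needs it only once, as the abstract notes.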

Insert Graphical Elements in Multiview Soccer Videos

M.Sc. Thesis, Sharif University of Technology. Ashgar, Nafiseh (Author); Kasaei, Shohreh (Supervisor)

Abstract

In recent decades, many researchers have focused on inferring camera calibration from soccer videos. This task is usually used to provide more information to the audience by adding graphical elements to the field. Indeed, the problem of inserting graphical elements into sports field videos is the problem of calculating the projection matrix in consecutive frames, with which the graphical elements can be inserted. The basic challenges in this regard are the lack of information in some frames, bad lighting conditions, noise and blur, quick changes of camera viewpoint, and radial distortion. Unlike previous methods, which aimed to propose an algorithm for a specific region of the field, we have introduced a novel...
