Patents, Reports, and Student Theses
This thesis investigates the seasonal predictive capabilities of Neural Radiance Fields (NeRF) applied to satellite images. Focusing on the utilization of satellite data, the study explores how Sat-NeRF, a novel approach in computer vision, performs in predicting seasonal variations across different months. Through comprehensive analysis and visualization, the study examines the model's ability to capture and predict seasonal changes, highlighting specific challenges and strengths. Results showcase the impact of the sun on predictions, revealing nuanced details in seasonal transitions, such as snow cover, color accuracy, and texture representation in different landscapes. The research introduces modifications to the Sat-NeRF network. The implemented versions of the network include geometrically rendered shadows, a signed distance function, and a month embedding vector, where the last version mentioned resulted in Planet-NeRF. Comparative evaluations reveal that Planet-NeRF outperforms prior models, particularly in refining seasonal predictions. This advancement contributes to the field by presenting a more effective approach for seasonal representation in satellite imagery analysis, offering promising avenues for future research in this domain.
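To make the month-embedding idea concrete, the sketch below conditions a toy NeRF-style MLP on a learned per-month vector. This is a minimal illustration under assumed dimensions and names, not the actual Sat-NeRF or Planet-NeRF architecture.

```python
import torch
import torch.nn as nn

class MonthConditionedNeRF(nn.Module):
    """Toy NeRF-style MLP conditioned on a learned per-month embedding."""
    def __init__(self, pos_dim=63, month_dim=4, hidden=256):
        super().__init__()
        self.month_embedding = nn.Embedding(12, month_dim)  # one vector per month
        self.mlp = nn.Sequential(
            nn.Linear(pos_dim + month_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + density
        )

    def forward(self, encoded_pos, month_idx):
        m = self.month_embedding(month_idx)                 # (B, month_dim)
        out = self.mlp(torch.cat([encoded_pos, m], dim=-1))
        return torch.sigmoid(out[..., :3]), torch.relu(out[..., 3])  # rgb, sigma
```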
@mastersthesis{diva2:1841942,
author = {Ingerstad, Erica and Kåreborn, Liv},
title = {{Planet-NeRF:
Neural Radiance Fields for 3D Reconstruction on Satellite Imagery in Season Changing Environments}},
school = {Linköping University},
type = {{LiTH-ISY-EX--24/5631--SE}},
year = {2024},
address = {Sweden},
}
While markerless motion capture provided acceptable accuracy, no clear patterns emerged regarding the individual effects of surface properties on technique. This is most likely due to limitations such as the sample size, the lack of a standardized set of players across facilities, and limited control over player behavior. However, analyzing one individual's motion capture data across surfaces showed potential for distinguishing turning styles based on facility parameters.
The method in this thesis demonstrates the potential of markerless motion capture for injury prevention research in football. Despite inconclusive results on the individual facility parameter effects, the ability to distinguish player styles across surfaces suggests valuable future directions for investigating personalized risk factors and optimizing playing surfaces. Further research with larger, more diverse samples and a broader set of biomechanical and facility features could provide deeper insight into injury prevention strategies.
@mastersthesis{diva2:1848290,
author = {Rommel, Kaspar},
title = {{Influence of artificial turf on football technique using motion capture and 3D modelling}},
school = {Linköping University},
type = {{}},
year = {2024},
address = {Sweden},
}
Harness racing horses are exposed to a high workload and are consequently at risk of joint injuries and lameness. In recent years, the interest in applications to improve animal welfare has increased, and there is a demand for objective assessment methods that can enable early and robust diagnosis of injuries.
In this thesis, experiments were conducted on video recordings collected by a helmet camera mounted on the driver of a sulky. The aim was to take the first steps toward equine gait analysis by investigating how semantic segmentation and 3D reconstruction of such data could be performed. Since these were the first experiments made on this data, no expectations of the results existed in advance.
Manual pixel-wise annotations were created on a small set of extracted frames and a deep learning model for semantic segmentation was trained to localize the horse, as well as the sulky and reins. The results are promising and could probably be further improved by expanding the annotated dataset and using a larger image resolution. Structure-from-motion using COLMAP was performed to estimate the camera motion in part of a video recording. A method to filter out dynamic objects based on masks created from predicted segmentation maps was investigated and the results showed that the reconstruction was part-wise successful, but struggled when dynamic objects were not filtered out and when the equipage was moving at high speed along a straight stretch.
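As an illustration of the dynamic-object filtering step, predicted segmentation maps can be turned into the per-image masks COLMAP accepts during feature extraction, where zero-valued (black) pixels are ignored. The class ids and paths below are hypothetical.

```python
import numpy as np
from PIL import Image

DYNAMIC_IDS = {1, 2, 3}  # hypothetical class ids for horse, sulky and reins

def to_colmap_mask(segmentation_png, out_path):
    seg = np.array(Image.open(segmentation_png))
    dynamic = np.isin(seg, list(DYNAMIC_IDS))
    # COLMAP skips features where the mask is zero, so black out dynamic objects
    Image.fromarray(np.where(dynamic, 0, 255).astype(np.uint8)).save(out_path)

# Expected layout: masks/<image_name>.png, then e.g.
#   colmap feature_extractor --ImageReader.mask_path masks/ ...
```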
Overall, the results are promising, but further development is needed to ensure robustness and to conclude whether data collected by the investigated helmet camera configuration is suitable for equine gait analysis.
@mastersthesis{diva2:1729598,
author = {Hult, Evelina},
title = {{Toward Equine Gait Analysis:
Semantic Segmentation and 3D Reconstruction}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5539--SE}},
year = {2023},
address = {Sweden},
}
With over 53 million articles and 11 million images, Wikipedia is the largest encyclopedia in history. The number of users is equally significant, with daily views surpassing 1 billion. Such an enormous system requires task automation for its volunteers to be able to maintain it. When it comes to textual data, a machine learning-based system called ORES automates tasks such as article quality estimation and article topic routing. A visual counterpart system also needs to be developed to support tasks such as vandalism detection in images and a better understanding of the visual data of Wikipedia. Researchers from the Wikimedia Foundation identified a hindrance to implementing the visual counterpart of ORES: the images of Wikipedia lack topical metadata. Thus, this work aims to develop a deep learning model that classifies images into a set of topics, which have been pre-determined in parallel work. State-of-the-art image classification models and other methods to mitigate the existing class imbalance are used. The conducted experiments show, among other things, that: using the data that considers the hierarchy of labels performs better; resampling techniques are ineffective at mitigating imbalance due to the high label concurrence; sample-weighting improves metrics; and initializing parameters as pre-trained on ImageNet rather than randomly yields better metrics. Moreover, we find interesting outlier labels that, despite having fewer samples, obtain better performance metrics, which is believed to be due either to bias from pre-training or simply to more signal in the label. The distribution of the visual data of Wikipedia, as predicted by the models, is also presented. Finally, some qualitative examples of the model predictions on images are presented, demonstrating the ability of the model to find correct labels that are missing in the ground truth.
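For context, the sample-weighting mentioned above can be realized by scaling the positive term of the per-label binary cross-entropy; a hedged PyTorch sketch with toy label counts, not necessarily the thesis's exact scheme.

```python
import torch
import torch.nn as nn

# Toy counts of how often each of C=4 topic labels occurs in the training set
label_counts = torch.tensor([50_000., 1_200., 300., 9_000.])
n_samples = 60_000

# Up-weight positives of rare labels so they contribute comparably to the loss
pos_weight = (n_samples - label_counts) / label_counts
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 4)                     # model outputs for 8 images
targets = torch.randint(0, 2, (8, 4)).float()  # multi-label ground truth
loss = criterion(logits, targets)
```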
@mastersthesis{diva2:1729493,
author = {Vieira Bernat, Matheus},
title = {{Topical Classification of Images in Wikipedia:
Development of topical classification models followed by a study of the visual content of Wikipedia}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5538--SE}},
year = {2023},
address = {Sweden},
}
Image fusion is a technique that aims to combine semantic information from different source images into a new synthesized image that preserves the information from both. It can be useful in many different areas, such as reconnaissance, surveillance and medical diagnostics. A crucial aspect of image fusion is finding important features in the source images and preserving them in the fused image. A possible method to find and preserve the features is to utilize deep learning. This thesis trains and evaluates an unsupervised network on two new datasets created for the fusion of visual near infrared (VNIR) and long wave infrared (LWIR) images. Feature representations obtained from a pre-trained network are then incorporated into the loss function, and that model is trained and evaluated as well. Both deep learning models are compared with results obtained from a traditional image fusion method. The trained models performed well, although the traditional method performed better on dataset 1. The deep learning models performed better on dataset 2, which contained images captured in daylight and dusk conditions. The resulting fused images from the deep learning approaches demonstrated better contrast compared to the fused images obtained by averaging. The additional feature representations obtained from the pre-trained network did not improve the results on either dataset. An explanation could be that the loss function already helps to preserve the semantic information in the features.
@mastersthesis{diva2:1737202,
author = {Granqvist, Matilda},
title = {{Infrared and Visible Image Fusion with an Unsupervised Network}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5540--SE}},
year = {2023},
address = {Sweden},
}
Point cloud registration with data measured by a photon-counting LIDAR sensor at large distances (500 m - 1.5 km) is an expanding field. Data measured from afar is sparse and has low detail, which can make the registration process difficult, and registering this type of data is fairly unexplored. In recent years, machine learning for point cloud registration has been explored with promising results. This work compares the performance of the point cloud registration algorithm Iterative Closest Point (ICP) with state-of-the-art algorithms, on data from a photon-counting LIDAR sensor. The data was provided by the Swedish Defense Research Agency (FOI). The chosen state-of-the-art algorithms were the non-learning-based Fast Global Registration and the learning-based D3Feat and SpinNet. The results indicated that all state-of-the-art algorithms achieve a substantial increase in performance compared to the Iterative Closest Point method. All the state-of-the-art algorithms utilize their calculated features to obtain better correspondence points and can therefore achieve higher performance in point cloud registration. D3Feat performed point cloud registration with the highest accuracy of all the state-of-the-art algorithms and ICP.
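For reference, the ICP baseline compared against can be run in a few lines with Open3D; a minimal sketch assuming PLY files and a hand-picked correspondence threshold suited to sparse long-range data.

```python
import numpy as np
import open3d as o3d

source = o3d.io.read_point_cloud("scan_a.ply")   # hypothetical file names
target = o3d.io.read_point_cloud("scan_b.ply")

result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.5,             # generous threshold for sparse data
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print(result.transformation, result.fitness)
```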
@mastersthesis{diva2:1761482,
author = {Boström, Maja},
title = {{Point Cloud Registration using both Machine Learning and Non-learning Methods:
with Data from a Photon-counting LIDAR Sensor}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5558--SE}},
year = {2023},
address = {Sweden},
}
Today the process of sorting second-hand clothes and textiles is mostly manual. In this master’s thesis, methods for automating this process as well as improving the manual sorting process have been investigated. The methods explored include the automatic prediction of price and intended usage for second-hand clothes, as well as different types of image retrieval to aid manual sorting. Two models were examined: CLIP, a multi-modal model, and MAE, a self-supervised model. Quantitatively, the results favored CLIP, which outperformed MAE in both image retrieval and prediction. However, MAE may still be useful for some applications in terms of image retrieval as it returns items that look similar, even if they do not necessarily have the same attributes. In contrast, CLIP is better at accurately retrieving garments with as many matching attributes as possible. For price prediction, the best model was CLIP. When fine-tuned on the dataset used, CLIP achieved an F1-Score of 38.08 using three different price categories in the dataset. For predicting the intended usage (either reusing the garment or exporting it to another country) the best model managed to achieve an F1-Score of 59.04.
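The retrieval side can be sketched with OpenAI's CLIP package: embed gallery and query images, normalize, and rank by cosine similarity. File names are placeholders, and the fine-tuned weights used in the thesis are not reflected.

```python
import torch
import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32")

def embed(paths):
    ims = torch.stack([preprocess(Image.open(p)) for p in paths])
    with torch.no_grad():
        e = model.encode_image(ims)
    return e / e.norm(dim=-1, keepdim=True)       # unit norm -> cosine similarity

gallery = embed(["garment1.jpg", "garment2.jpg", "garment3.jpg"])  # placeholders
query = embed(["query.jpg"])
scores = query @ gallery.T                        # one similarity per gallery item
best = scores.topk(k=2).indices                   # most similar garments first
```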
@mastersthesis{diva2:1763534,
author = {Hermansson, Simon},
title = {{Learning Embeddings for Fashion Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5567--SE}},
year = {2023},
address = {Sweden},
}
In the area of Traffic Sign Recognition (TSR), deep learning models are trained to detect and classify images of traffic signs. The amount of data available to train these models is often limited, and collecting more data is time-consuming and expensive. A possible complement to traditional data acquisition is to generate synthetic images with a generative machine learning model. This thesis investigates the use of denoising diffusion probabilistic models for generating synthetic data of one or multiple traffic sign classes, when providing different amounts of real images for the class (or classes). In the few-sample method, the number of images used ranged from 1 to 1000, and zero images were used in the zero-shot method. The results from the few-sample method show that combining synthetic images with real images when training a traffic sign classifier increases the performance in 3 out of 6 investigated cases. The results indicate that the developed zero-shot method is useful if further refined, and could potentially enable the generation of realistic images of signs not seen in the training data.
@mastersthesis{diva2:1764694,
author = {Carlson, Johanna and Byman, Lovisa},
title = {{Generation of Synthetic Traffic Sign Images using Diffusion Models}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5563--SE}},
year = {2023},
address = {Sweden},
}
This thesis explores the application of Contrastive Language-Image Pre-Training (CLIP), a vision-language model, in an automated video surveillance system for anomaly detection. The ability of CLIP to perform zero-shot learning, coupled with its robustness against minor image alterations due to its lack of reliance on pixel-level image analysis, makes it a suitable candidate for this application.
The study investigates the performance of CLIP in tandem with various anomaly detection algorithms within a visual surveillance system. A custom dataset was created for video anomaly detection, encompassing two distinct views and two varying levels of anomaly difficulty. One view offers a more zoomed-in perspective, while the other provides a wider perspective. This was conducted to evaluate the capacity of CLIP to manage objects that occupy either a larger or smaller portion of the entire scene.
Several different anomaly detection methods were tested with varying levels of supervision, including unsupervised, one-class classification, and weakly-supervised algorithms, which were compared against each other. To create better separation between the CLIP embeddings, a metric learning model was trained and then used to transform the CLIP embeddings to a new embedding space.
The study found that CLIP performs effectively when anomalies take up a larger part of the image, such as in the zoomed-in view, where some of the One-Class-Classification (OCC) and weakly supervised methods demonstrated superior performance. When anomalies take up a significantly smaller part of the image, as in the wider view, CLIP has difficulty distinguishing anomalies from normal scenes even using the transformed CLIP embeddings. For the wider view, the results again favored the OCC and weakly supervised methods.
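A sketch of the metric-learning step under an assumed triplet objective (the abstract does not specify the loss): a small projection head is trained so that embeddings of same-class frames move together and away from the other class.

```python
import torch
import torch.nn as nn

# Hypothetical head mapping 512-d CLIP embeddings to a better-separated space
projector = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
criterion = nn.TripletMarginLoss(margin=0.5)
optimizer = torch.optim.Adam(projector.parameters(), lr=1e-4)

def training_step(anchor, positive, negative):
    # anchor/positive: frames of the same class; negative: frame of the other class
    loss = criterion(projector(anchor), projector(positive), projector(negative))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```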
@mastersthesis{diva2:1765573,
author = {Gärdin, Christoffer},
title = {{Anomaly Detection with Machine Learning using CLIP in a Video Surveillance Context}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5564--SE}},
year = {2023},
address = {Sweden},
}
Detecting defects in industrially manufactured products is crucial to ensure their safety and quality. This process can be both expensive and error-prone if done manually, making automated solutions desirable. There is extensive research on industrial anomaly detection in images, but recent studies have shown that adding 3D information can increase the performance. This thesis aims to extend the 2D anomaly detection framework, PaDiM, to incorporate 3D information. The proposed methods combine RGB with depth maps or point clouds and the effects of using PointNet++ and vision transformers to extract features are investigated. The methods are evaluated on the MVTec 3D-AD public dataset using the metrics image AUROC, pixel AUROC and AUPRO, and on a small dataset collected with a Time-of-Flight sensor. This thesis concludes that the addition of 3D information improves the performance of PaDiM and vision transformers achieve the best results, scoring an average image AUROC of 86.2±0.2 on MVTec 3D-AD.
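The PaDiM core that the thesis builds on fits one Gaussian per patch position over features of normal images and scores test patches by Mahalanobis distance; a NumPy sketch under those assumptions (feature extraction omitted).

```python
import numpy as np

def fit_padim(train_feats, eps=0.01):
    """train_feats: (N, P, D) patch embeddings from N normal images."""
    N, P, D = train_feats.shape
    mean = train_feats.mean(axis=0)                      # (P, D)
    cov_inv = np.empty((P, D, D))
    for p in range(P):
        c = np.cov(train_feats[:, p, :], rowvar=False)
        cov_inv[p] = np.linalg.inv(c + eps * np.eye(D))  # regularized inverse
    return mean, cov_inv

def anomaly_scores(feats, mean, cov_inv):
    """Mahalanobis distance of each patch (P, D) to its position's Gaussian."""
    d = feats - mean
    return np.sqrt(np.einsum("pd,pde,pe->p", d, cov_inv, d))
```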
@mastersthesis{diva2:1766718,
author = {Bärudde, Kevin and Gandal, Marcus},
title = {{Industrial 3D Anomaly Detection and Localization Using Unsupervised Machine Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5569--SE}},
year = {2023},
address = {Sweden},
}
In synthetic aperture radar (SAR) and inverse synthetic aperture radar (ISAR), an imaging radar emits electromagnetic waves of varying frequencies towards a target and the backscattered waves are collected. By either moving the radar antenna or rotating the target and combining the collected waves, a much longer synthetic aperture can be created. These radar measurements can be used to determine the radar cross-section (RCS) of the target and to reconstruct an estimate of the target. However, the reconstructed images will suffer from spectral leakage effects and are limited in resolution. Many methods of enhancing the images exist and some are based on deep learning. Most commonly the deep learning methods rely on high-resolution ground truth data of the scene to train a neural network to enhance the radar images. In this thesis, a method that does not rely on any high-resolution ground truth data is applied to train a convolutional neural network to enhance radar images. The network takes a conventional ISAR image subject to spectral leakage effects as input and outputs an enhanced ISAR image which contains much more defined features. New RCS measurements are created from the enhanced ISAR image and the network is trained to minimise the difference between the original RCS measurements and the new RCS measurements. A sparsity constraint is added to ensure that the proposed enhanced ISAR image is sparse. The synthetic training data consists of scenes containing point scatterers that are either individual or grouped together to form shapes. The scenes are used to create synthetic radar measurements which are then used to reconstruct ISAR images of the scenes. The network is tested using both synthetic data and measurement data from a cylinder and two aeroplane models. The network manages to minimise spectral leakage and increase the resolution of the ISAR images created from both synthetic and measured RCSs, especially on measured data from target models which have similar features to the synthetic training data.
The contributions of this thesis work are firstly a convolutional neural network that enhances ISAR images affected by spectral leakage. The neural network handles complex-valued signals as a single channel and does not perform any rescaling of the input. Secondly, it is shown that it is sufficient to calculate the new RCS for much fewer frequency samples and angular positions and compare those measurements to the corresponding frequency samples and angular positions in the original RCS to train the neural network.
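The training objective can be sketched as measurement consistency plus a sparsity prior. Here the forward model is idealized as sampling the 2D FFT of the enhanced image at the measured frequency/angle positions; the thesis's actual RCS forward operator and weighting are not reproduced.

```python
import torch

def self_supervised_loss(x_enhanced, y_measured, sample_idx, lam=1e-3):
    """x_enhanced: (H, W) complex image proposed by the network.
    y_measured: (M,) complex RCS samples; sample_idx: flat spectrum indices."""
    Y = torch.fft.fft2(x_enhanced)          # idealized image -> measurement map
    y_new = Y.flatten()[sample_idx]         # evaluate only the measured samples
    data_term = (y_new - y_measured).abs().pow(2).mean()
    sparsity = x_enhanced.abs().mean()      # L1 prior favouring point-like scenes
    return data_term + lam * sparsity
```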
@mastersthesis{diva2:1767511,
author = {Enåkander, Moltas},
title = {{ISAR Imaging Enhancement Without High-Resolution Ground Truth}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5572--SE}},
year = {2023},
address = {Sweden},
}
In the field of autonomous driving a common scenario is to apply deep learning models on camera feeds to provide information about the surroundings. A recent trend is for such vision-based methods to be centralized, in that they fuse images from all cameras in one big model for a single comprehensive output. Designing and tuning such models is hard and time consuming, in both development and training. This thesis aims to reproduce the results of a paper about a centralized vision-based model performing 3D object detection, called BEVDet. Additional goals are to ablate the technique of class balanced grouping and sampling used in the model, to tune the model to improve generalization, and to change the detection head of the model to a Transformer decoder-based head.
The findings include a successful reproduction of the results of the paper, while adding depth supervision to BEVDet establishes a baseline for the subsequent experiments. An increasing validation loss during most of the training indicates that there is room for improvement in the generalization of the model. Several different methods are tested in order to resolve the increasing validation loss, but they all fail to do so. The ablation study shows that the class balanced grouping is important for the performance of the chosen configuration of the model, while the class balanced sampling does not contribute significantly. Without extensive tuning, the replacement head gives performance similar to the PETR, the model that the head is adapted from, but fails to match the performance of the baseline model. In addition, the model with the Transformer decoder-based head shows a converging validation loss, unlike the baseline model.
@mastersthesis{diva2:1771747,
author = {Lidman, Erik},
title = {{Visual Bird's-Eye View Object Detection for Autonomous Driving}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5579--SE}},
year = {2023},
address = {Sweden},
}
In the digital age, where video content is abundant, this thesis investigates the efficient adaptation of an existing video-language model (VLM) to new data. The research leverages CLIP, a robust language-vision model, for various video-related tasks including video retrieval. The study explores using pre-trained VLMs to extract video embeddings without the need for extensive retraining. The effectiveness of a smaller model using aggregation is compared with larger models, and the application of logistic regression for few-shot learning on video embeddings is examined. Aggregation was done both without learning, through mean-pooling, and by utilizing a transformer. The video-retrieval models were evaluated on the ActivityNet Captions dataset, which contains long videos with dense descriptions, while the linear probes were evaluated on ActivityNet200, a video classification dataset.
The study's findings suggest that most models improved when additional frames were employed through aggregation. A model trained with fewer frames was able to surpass those trained with two or four times more frames by instead using aggregation. The incorporation of patch dropout and the freezing of embeddings proved advantageous by enhancing performance and conserving training resources. Furthermore, using a linear probe showed that the extracted features were of high quality, requiring only 2-4 samples per class to match the zero-shot performance.
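The non-learned aggregation and the few-shot linear probe are compact enough to sketch end to end; random vectors stand in for precomputed CLIP frame embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
frame_embs = [rng.normal(size=(16, 512)) for _ in range(64)]  # stand-in CLIP features
labels = rng.integers(0, 4, size=64)                          # one class per video

def video_embedding(frames):
    v = frames.mean(axis=0)                 # mean-pooling over frames
    return v / np.linalg.norm(v)

X = np.stack([video_embedding(f) for f in frame_embs])

# Few-shot linear probe on top of frozen embeddings
probe = LogisticRegression(max_iter=1000).fit(X[:32], labels[:32])
print(probe.score(X[32:], labels[32:]))
```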
@mastersthesis{diva2:1772807,
author = {Lindgren, Felix},
title = {{Efficient Utilization of Video Embeddings from Video-Language Models}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5592--SE}},
year = {2023},
address = {Sweden},
}
The goal of this thesis is to use fringe-pattern phase analysis to calibrate the distortion of a camera lens. The benefit of this method is that the distortion can be calculated using data from each individual pixel, and the methodology does not require a parametric lens model.
The phase used to calibrate the images is calculated in two different ways: either utilizing the monogenic signal or through fringe-pattern phase analysis.
The calibration approaches were also validated through different methods, primarily by utilizing the Hough transform and by calibrating simulated distortion. The thesis also introduces a validation approach utilizing the phase orientation calculated through the monogenic signal.
The thesis also implements different approaches, such as flat field correction, to limit the impact of image sensor noise and thereby mitigate phase noise.
It is also investigated, through comparative analysis, which fringe-pattern frequencies are best suited for calibration. The analysis identified problems with both too high and too low fringe-pattern frequencies when calibrating using fringe-pattern phase analysis.
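For orientation, the classical Fourier route to fringe-pattern phase isolates one carrier lobe in the spectrum and takes the angle of its inverse transform; a sketch with assumed DC and lobe radii, which may differ from the thesis's pipeline.

```python
import numpy as np

def fringe_phase(img, dc_radius=10, lobe_radius=20):
    """Wrapped phase of a fringe pattern via Fourier (Takeda-style) analysis."""
    H, W = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    ys, xs = np.ogrid[:H, :W]
    mag = np.abs(F)
    mag[(ys - H // 2) ** 2 + (xs - W // 2) ** 2 <= dc_radius ** 2] = 0  # suppress DC
    cy, cx = np.unravel_index(np.argmax(mag), mag.shape)  # pick one carrier lobe
    mask = (ys - cy) ** 2 + (xs - cx) ** 2 <= lobe_radius ** 2
    side = np.fft.ifft2(np.fft.ifftshift(F * mask))
    return np.angle(side)                  # wrapped phase; unwrap before use
```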
@mastersthesis{diva2:1773375,
author = {Karlsson, Karl},
title = {{Camera Distortion Calibration through Fringe Pattern Phase Analysis}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5580--SE}},
year = {2023},
address = {Sweden},
}
Automatic 3D reconstruction of birds can aid researchers in studying their behavior. Recently there has been an attempt to reconstruct a variety of birds from single-view images. However, the common murre's appearance is different from that of the birds that have been studied, and recent studies have focused on side views. This thesis studies the 3D reconstruction of the common murre from single-view top-view images. A template mesh is first optimized to fit a 3D scan. The result is then used to optimize a species-specific mean from side-view images annotated with keypoints and silhouettes. The resulting mean mesh is used to initialize the optimization for top-down images. Using a mask loss, a pose prior loss, and a bone length loss that uses a mean vector from the side-view images improves the 3D reconstruction as rated by humans. Furthermore, the intersection over union (IoU) and percentage of correct keypoints (PCK), although used by other authors, are insufficient in a single-view top-view setting.
@mastersthesis{diva2:1779743,
author = {Hägerlind, Johannes},
title = {{3D-Reconstruction of the Common Murre}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5576--SE}},
year = {2023},
address = {Sweden},
}
This thesis explores the integration of deep learning-based depth estimation models with the ORB-SLAM3 framework to address challenges in monocular Simultaneous Localization and Mapping (SLAM), particularly focusing on pure rotational movements. The study investigates the viability of using pre-trained generic depth estimation networks, and hybrid combinations of these networks, to replace traditional depth sensors and improve scale accuracy in SLAM systems. A series of experiments is conducted outdoors, utilizing a custom camera setup designed to isolate pure rotational movements. The analysis involves assessing each model's impact on the SLAM process as well as key performance indicators (KPIs) for both depth estimation and 3D tracking. Results indicate a correlation between depth estimation accuracy and SLAM performance, underscoring the potential of depth estimation models in enhancing SLAM systems. The findings contribute to the understanding of the role of monocular depth estimation in SLAM integration, especially in applications requiring precise spatial awareness for augmented reality.
@mastersthesis{diva2:1845865,
author = {Bladh, Daniel},
title = {{Deep Learning-Based Depth Estimation Models with Monocular SLAM:
Impacts of Pure Rotational Movements on Scale Drift and Robustness}},
school = {Linköping University},
type = {{LiTH-ISY-EX--23/5630--SE}},
year = {2023},
address = {Sweden},
}
Being able to train machine learning models on simulated data can be of great interest in several applications, one of them being autonomous driving of cars. The reason is that it is easier to collect large labeled datasets, as well as to perform reinforcement learning, in simulations. However, transferring these learned models to the real-world environment can be hard due to differences between the simulation and reality; for example, differences in material, textures, lighting and content. One approach is to use domain adaptation, making the simulations as similar as possible to reality. The thesis's main focus is to investigate domain adaptation as a way to meet the reality-gap, and also to compare it to an alternative method, domain randomization.
Two different methods of domain adaptation, one adapting the simulated data to reality and the other adapting the test data to simulation, are compared to using domain randomization. These are evaluated with a classifier making decisions for a robot car while driving in reality. The evaluation consists of a quantitative evaluation on real-world data and a qualitative evaluation aiming to observe how well the robot drives and avoids obstacles. The results show that the reality-gap is very large and that the examined methods reduce it, with the two using domain adaptation resulting in the largest decrease. However, none of them led to satisfactory driving.
@mastersthesis{diva2:1624770,
author = {Forsberg, Fanny},
title = {{Domain Adaptation to Meet the Reality-Gap from Simulation to Reality}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5453--SE}},
year = {2022},
address = {Sweden},
}
When a camera system in a car is mounted behind the windshield, light rays will be refracted by the windshield. The distortion can be significant, especially for wide field-of-view cameras. Traditional approaches handle the windshield distortion along with the calibration that calculates the intrinsic and extrinsic parameters. However, these approaches do not handle the windshield distortion explicitly, and understanding the image formation requires understanding more about the windshield distortion effect. In this thesis, data is collected from a camera system viewing the scene with and without the windshield. The windshield distortion effect has been studied by varying the windshield's tilt and the camera's setup. Points are then found in both images and matched. From this, a distortion difference is calculated and analyzed. Next, a preliminary model of the windshield distortion effect is presented and evaluated. The results show that the model works well for all cases and both windshields considered in this thesis.
@mastersthesis{diva2:1638117,
author = {Luong, Therese},
title = {{Windshield Distortion Modelling}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5455--SE}},
year = {2022},
address = {Sweden},
}
Deep learning has shown to be successful on the task of semantic segmentation of three-dimensional (3D) point clouds, which has many interesting use cases in areas such as autonomous driving and defense applications. A common type of sensor used for collecting 3D point cloud data is Light Detection and Ranging (LiDAR) sensors. In this thesis, a time-correlated single-photon counting (TCSPC) LiDAR is used, which produces very accurate measurements over long distances up to several kilometers. The dataset collected by the TCSPC LiDAR used in the thesis contains two classes, person and other, and it comes with several challenges due to it being limited in terms of size and variation, as well as being extremely class imbalanced. The thesis aims to identify, analyze, and evaluate state-of-the-art deep learning models for semantic segmentation of point clouds produced by the TCSPC sensor. This is achieved by investigating different loss functions, data variations, and data augmentation techniques for a selected state-of-the-art deep learning architecture. The results showed that loss functions tailored for extremely imbalanced datasets performed the best with regard to the metric mean intersection over union (mIoU). Furthermore, an improvement in mIoU could be observed when some combinations of data augmentation techniques were employed. In general, the performance of the models varied heavily, with some achieving promising results and others achieving much worse results.
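One widely used loss tailored for extreme class imbalance is the focal loss, which down-weights easy points; a PyTorch sketch for per-point classification (the abstract does not name its exact losses, so treat this as representative).

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """logits: (N, C) per-point class scores; targets: (N,) class indices."""
    logp = F.log_softmax(logits, dim=-1)
    logp_t = logp.gather(1, targets.unsqueeze(1)).squeeze(1)  # log prob of true class
    p_t = logp_t.exp()
    loss = -((1 - p_t) ** gamma) * logp_t                     # down-weight easy points
    if alpha is not None:                                     # optional class weights
        loss = alpha[targets] * loss
    return loss.mean()
```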
@mastersthesis{diva2:1667072,
author = {Süsskind, Caspian},
title = {{Deep Learning Semantic Segmentation of 3D Point Cloud Data from a Photon Counting LiDAR}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5467--SE}},
year = {2022},
address = {Sweden},
}
Radiologists often have to look through many different patients and examinations in quick succession, and to aid in the workflow the different types of images should be presented to the radiologist in the same manner and order for each new examination, thus decreasing the time needed for the radiologist to either find the correct image or rearrange the images to their liking. A step in this process requires a comparison between two images to be made, producing a score between 0 and 1 describing how similar the images are. A similar algorithm already exists at Sectra, but that algorithm only uses the metadata from the images without considering the actual pixel data.
The aim of this thesis was to explore different methods of doing the same comparison as the previous algorithm but using only the pixel data. Considering only 3D volumes from CT examinations of the abdomen and thorax region, this thesis explores the possibility of using SSIM, SIFT, and SIFT together with a histogram comparison using the Bhattacharyya distance for this task. It was deemed very important that the ranking produced when ordering the images in terms of similarity to one reference image followed a specific order. This order was determined by consulting personnel at Sectra who work closely with the clinical side of radiology.
SSIM was able to differentiate between different plane orientations, since they usually had large resolution differences in each direction, but it could not be made to follow the desired ranking and was thus disregarded as a reliable option for this problem. The method using SIFT followed the desired ranking better, but struggled considerably with differentiating between the different contrast phases. A histogram component was also added to this method, which increased the accuracy and improved the ranking. However, further development is still needed for this method to be a reliable option that could be used in a clinical setting.
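A toy combination of the two signals discussed above, SSIM and a Bhattacharyya histogram comparison, both mapped to [0, 1]; the weighting and bin count are assumptions, not the thesis's tuned values.

```python
import cv2
import numpy as np
from skimage.metrics import structural_similarity

def similarity_score(a, b, w=0.5):
    """a, b: single-channel uint8 slices; returns a score in [0, 1]."""
    a_r = cv2.resize(a, (b.shape[1], b.shape[0]))
    ssim = structural_similarity(a_r, b, data_range=255)
    ha = cv2.calcHist([a], [0], None, [64], [0, 256]); cv2.normalize(ha, ha)
    hb = cv2.calcHist([b], [0], None, [64], [0, 256]); cv2.normalize(hb, hb)
    bhatta = cv2.compareHist(ha, hb, cv2.HISTCMP_BHATTACHARYYA)  # 0 = identical
    return w * max(ssim, 0.0) + (1 - w) * (1.0 - bhatta)
```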
@mastersthesis{diva2:1665838,
author = {Castenbrandt, Felicia},
title = {{Image Similarity Scoring for Medical Images in 3D}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5484--SE}},
year = {2022},
address = {Sweden},
}
Lens distortions appear in almost all digital images and cause straight lines to appear curved in the image. This can contribute to errors in position estimation and 3D reconstruction, and it is therefore of interest to correct for the distortion. If the camera is available, the distortion parameters can be obtained when calibrating the camera. However, when the camera is unavailable, the distortion parameters cannot be found with the standard camera calibration technique and other approaches must be used. Recently, variants of Perspective-n-Point (PnP) extended with lens distortion and focal length parameters have been proposed. Given a set of 2D-3D point correspondences, the PnP-based methods can estimate distortion parameters without the camera being available or with modified settings. In this thesis, the performance of PnP-based methods is compared to Zhang's camera calibration method. The methods are compared both quantitatively, using the errors in reprojection and distortion parameters, and qualitatively, by comparing images before and after lens distortion correction. A test set for the comparison was obtained from a camera and a 3D laser scanner of an indoor scene. The results indicate that one of the PnP-based models can achieve a reprojection error similar to the baseline method for one of the cameras. It could also be seen that two PnP-based models could reduce lens distortion when visually comparing the test images to the baseline. Moreover, it was noted that a model can have a small reprojection error even though the distortion coefficient error is large and the lens distortion is not completely removed. This indicates that it is important to include both quantitative measures, such as reprojection error and distortion coefficient errors, and qualitative results when comparing lens distortion correction methods. It could also be seen that PnP-based models with more parameters in the estimation are more sensitive to noise.
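For the quantitative comparison, the reprojection error can be computed with OpenCV's standard PnP given distortion coefficients; note that, unlike the PnP variants studied in the thesis, plain cv2.solvePnP takes the distortion as input rather than estimating it.

```python
import cv2
import numpy as np

def reprojection_error(obj_pts, img_pts, K, dist_coeffs):
    """obj_pts: (N, 3) float32 scanner points; img_pts: (N, 2) float32 pixels."""
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist_coeffs)
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist_coeffs)
    return np.linalg.norm(proj.reshape(-1, 2) - img_pts, axis=1).mean()
```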
@mastersthesis{diva2:1670770,
author = {Olsson, Emily},
title = {{Lens Distortion Correction Without Camera Access}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5476--SE}},
year = {2022},
address = {Sweden},
}
An autonomous vehicle is a complex system that requires a good perception of the surrounding environment to operate safely. One part of that is multiple object tracking, which is an essential component in camera-based perception whose responsibility is to estimate object motion from a sequence of images. This requires an association problem to be solved where newly estimated object positions are mapped to previously predicted trajectories, for which different solution strategies exist.
In this work, a multiple hypothesis tracking algorithm is implemented. The purpose is to demonstrate that measurement associations are improved compared to less compute-intensive alternatives. It was shown that the implemented algorithm performed 13 percent better than an intersection over union tracker when evaluated using a standard evaluation metric.
Furthermore, this work also investigates the usage of abstraction layers to accelerate time-critical parallel operations on the GPU. It was found that the execution time of the tracking algorithm could be reduced by 42 percent by replacing four functions with implementations written in the purely functional array language Futhark. Finally, it was shown that a GPU code abstraction layer can reduce the knowledge barrier required to write efficient CUDA kernels.
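The less compute-intensive baseline, an IoU tracker's association step, reduces to an optimal one-to-one assignment over an IoU cost matrix; a sketch (the implemented MHT instead keeps multiple association hypotheses alive).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Boxes as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, min_iou=0.3):
    cost = np.array([[1 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)      # optimal one-to-one matching
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1 - min_iou]
```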
@mastersthesis{diva2:1670800,
author = {Nolkrantz, Marcus},
title = {{Efficient multiple hypothesis tracking using a purely functional array language}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5482--SE}},
year = {2022},
address = {Sweden},
}
With the increasing demand for labeled data in machine learning for visual perception tasks, the interest in using synthetically generated data has grown. Due to the existence of a domain gap between synthetic and real data, strategies in domain adaptation are necessary to achieve high performance with models trained on synthetic or mixed data.
With a dataset of synthetically blocked fish-eye lens images from traffic environments, we explore different strategies to train a neural network. The neural network is a binary classifier for full blockage detection. The different strategies tested are data mixing, fine-tuning, domain adversarial training, and adversarial discriminative domain adaptation. Different ratios between synthetically generated data and real data are also tested. Our experiments showed that fine-tuning had slightly superior results in this test environment. To fully take advantage of domain adversarial training, it is necessary to train until domain-indiscriminate features are learned; this helps the model attain higher performance than random data mixing.
@mastersthesis{diva2:1671549,
author = {Tran, Hoang},
title = {{Learning with Synthetically Blocked Images for Sensor Blockage Detection}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5509--SE}},
year = {2022},
address = {Sweden},
}
Ceramic materials contain several defects, one of which is porosity. At the time of writing, porosity measurement is a manual and time-consuming process performed by a human operator. With advances in deep learning for computer vision, this thesis explores to what degree convolutional neural networks and semantic segmentation can reliably measure porosity from microscope images. Combining classical image processing techniques with deep learning, images were automatically labeled and then used for training semantic segmentation neural networks leveraging transfer learning. Deep learning-based methods were more robust and could more reliably identify porosity in a larger variety of images than solely relying on classical image processing techniques.
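Once a segmentation network has labeled the pore pixels, the porosity measurement itself is an area fraction; a minimal sketch assuming a single pore class id.

```python
import numpy as np

def porosity(segmentation, pore_class=1):
    """Porosity as the fraction of pixels classified as pores (class id assumed)."""
    pores = segmentation == pore_class
    return pores.sum() / pores.size
```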
@mastersthesis{diva2:1674176,
author = {Isaksson, Filip},
title = {{Measuring Porosity in Ceramic Coating using Convolutional Neural Networks and Semantic Segmentation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5490--SE}},
year = {2022},
address = {Sweden},
}
Estimation of forest parameters using remote sensing information could streamline the forest industry from a time and economic perspective. This thesis utilizes object detection and semantic segmentation to detect and classify individual trees in images of 3D models reconstructed from satellite images. The thesis investigated two methods that showed different strengths in detecting and classifying trees in deciduous, evergreen, or mixed forests. These methods are not only valuable for forest inventory but can also be greatly useful for telecommunication companies and in defense and intelligence applications. The thesis also presents methods for estimating tree volume and tree growth in 3D models. The results show the potential of the methods to be used in forest management. Finally, the thesis shows several benefits of managing a digitalized forest: economic, environmental, and social.
@mastersthesis{diva2:1673885,
author = {Dahm\'{e}n, Gustav and Strand, Erica},
title = {{Forest Growth And Volume Estimation Using Machine Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5508--SE}},
year = {2022},
address = {Sweden},
}
Unmanned aerial vehicles (UAVs) with high-resolution cameras are common in today's society. Industries, such as the forestry industry, use drones to get a fast overview of tree populations. More advanced sensors, such as near-infrared or depth sensors, can increase the amount of information that UAV images provide about the forest, such as tree quantity or forest health. However, the fast-expanding field of deep learning could help expand the information acquired using only RGB cameras. Three deep learning models, Faster R-CNN, RetinaNet, and YOLOR, were compared to investigate this. It was also investigated whether initializing the models using transfer learning from the MS COCO dataset could increase their performance. The datasets used were Swedish Forest Agency (2021): Forest Damages - Spruce Bark Beetle 1.0 National Forest Data Lab and drone images provided by IT-Bolaget Per & Per. The deep learning models were to detect five different tree species: spruce, pine, birch, aspen, and others. The results show potential for the use of deep learning to detect tree species in images from UAVs.
@mastersthesis{diva2:1676909,
author = {Sievers, Olle},
title = {{CNN-Based Methods for Tree Species Detection in UAV Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5502--SE}},
year = {2022},
address = {Sweden},
}
Object tracking can be done in numerous ways, where the goal is to track a target through all frames in a sequence. The ground truth bounding box is used to initialize the object tracking algorithm. Object tracking can be carried out on infrared imagery suitable for military applications to execute tracking even without illumination. Objects, such as aircraft, can deploy countermeasures to impede tracking. The countermeasures most often mainly impact one wavelength band. Therefore, using two different wavelength bands for object tracking can counteract the impact of the countermeasures. The dataset was created from simulations. The countermeasures applied to the dataset are flares and Directional Infrared Countermeasures (DIRCMs).
Different object tracking algorithms exist, and many are based on discriminative correlation filters (DCF). The thesis investigated the DCF-based trackers STRCF and ECO on the created dataset. The STRCF and the ECO trackers were analyzed using one and two wavelength bands. The following features were investigated for both trackers: grayscale, Histogram of Oriented Gradients (HOG), and pre-trained deep features.
The results indicated that the STRCF and the ECO trackers using two wavelength bands instead of one improved performance on sequences with countermeasures. The use of HOG, deep features, or a combination of both improved the performance of the STRCF tracker using two wavelength bands. Likewise, the performance of the ECO tracker using two wavelength bands was improved by the use of deep features. However, the negative aspect of using two wavelength bands and introducing more features is that it resulted in a lower frame rate.
@mastersthesis{diva2:1676100,
author = {Modorato, Sara},
title = {{Tracking Under Countermeasures Using Infrared Imagery}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5473--SE}},
year = {2022},
address = {Sweden},
}
In recent years, the EU has observed a decrease in the stocks of certain fish species due to unrestricted fishing. To combat the problem, many fisheries are investigating how to automatically estimate the catch size and composition using sensors onboard the vessels. Yet, measuring the size of fish in marine imagery is a difficult task. The images generally suffer from complex conditions caused by cluttered fish, motion blur and dirty sensors.
In this thesis, we propose a novel method for automatic measurement of fish size that can enable measuring both visible and occluded fish. We use a Mask R-CNN to segment the visible regions of the fish, and then fill in the shape of the occluded fish using a U-Net. We train the U-Net to perform shape completion in a semi-supervised manner, by simulating occlusions on an open-source fish dataset. In contrast to previous shape completion work, we teach the U-Net when to fill in the shape and when not to, by including a small portion of fully visible fish in the input training data.
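The occlusion simulation can be sketched as erasing part of a fully visible fish mask and training the U-Net to recover the original; the rectangular occluders below are a stand-in, since the abstract does not specify the occluder shapes.

```python
import numpy as np

def simulate_occlusion(mask, rng, max_frac=0.4):
    """mask: (H, W) binary fish mask. Returns the occluded network input;
    the unmodified mask serves as the training target."""
    h, w = mask.shape
    oh = rng.integers(1, int(h * max_frac))
    ow = rng.integers(1, int(w * max_frac))
    y, x = rng.integers(0, h - oh), rng.integers(0, w - ow)
    occluded = mask.copy()
    occluded[y:y + oh, x:x + ow] = 0
    return occluded
```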
Our results show that our proposed method succeeds in filling in the shape of the synthetically occluded fish, as well as of some of the cluttered fish in real marine imagery. We achieve an mIoU score of 93.9 % on 1 000 synthetic test images and present qualitative results on real images captured onboard a fishing vessel. The qualitative results show that the U-Net can fill in the shapes of lightly occluded fish, but struggles when the tail fin is hidden and only parts of the fish body are visible. This task is difficult even for a human, and the performance could perhaps be increased by including the fish appearance in the shape completion task. The simulation-to-reality gap could perhaps also be reduced by fine-tuning the U-Net on some real occlusions, which could increase the performance on the heavy occlusions in the real marine imagery.
@mastersthesis{diva2:1677704,
author = {Gustafsson, Stina},
title = {{Learning to Measure Invisible Fish}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5517--SE}},
year = {2022},
address = {Sweden},
}
The development of autonomous driving systems has been one of the most popular research areas in the 21st century. One key component of these kinds of systems is the ability to perceive and comprehend the physical world. Two techniques that address this are object detection and semantic segmentation. During the last decade, CNN-based models have dominated these types of tasks. However, in 2021, transformer-based networks were able to outperform the existing CNN approach, indicating a paradigm shift in the domain. This thesis aims to explore the use of a vision transformer, particularly a Swin Transformer, in an object detection and semantic segmentation framework, and compare it to a classical CNN on road scenes. In addition, since real-time execution is crucial for autonomous driving systems, the possibility of a parameter reduction of the transformer-based network is investigated. The results appear to be advantageous for the Swin Transformer compared to the convolutional network, considering both object detection and semantic segmentation. Furthermore, the analysis indicates that it is possible to reduce the computational complexity while retaining the performance.
@mastersthesis{diva2:1678704,
author = {Hardebro, Mikaela and Jirskog, Elin},
title = {{Transformer Based Object Detection and Semantic Segmentation for Autonomous Driving}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5487--SE}},
year = {2022},
address = {Sweden},
}
In recent years, pictures from handheld devices such as smartphones have been increasingly utilized as a documentation tool by medical practitioners not trained to take professional photographs. As with other image modalities, the images should be taken in a way that captures the vital information in the region of interest. Nevertheless, image capturing cannot always be done as desired, so images may exhibit different blur types in the region of interest. Blurry images do not serve medical purposes; therefore, patients might have to schedule a second appointment several days later to retake the images. A solution to this problem is to create an algorithm which, immediately after an image is captured, determines whether it is medically useful and notifies the user of the result. The algorithm needs to perform the analysis at a reasonable speed and, at best, with a limited number of operations so that the calculations can be made directly on the smartphone. A large number of medical images must be available to create such an algorithm. Medical images are difficult to acquire, and it is specifically difficult to acquire blurry images since they are usually deleted.
The main objective of this thesis is to determine the medical usefulness of images taken with smartphone cameras, using both machine learning and hand-crafted algorithms, with a low number of floating point operations and a high performance. Seven different algorithms (one hand-crafted and six machine learned) are created and compared regarding both the number of floating point operations and performance. Fast Walsh-Hadamard transforms are the basis of the hand-crafted algorithm. The employed machine learning algorithms are based both on common convolutional neural networks (MobileNetV3 and ResNet50) and on our own designs. The issue of the low number of acquired medical images is solved by training the machine learning models on a synthetic dataset, where the non-medically useful images are generated by applying blur to the medically useful images. These models do, however, undergo evaluation on a real dataset, containing both medically useful and non-medically useful images.
Our results indicate that real-time determination of the medical usefulness of images is possible on handheld devices, since our machine learned model DeepLAD-Net reaches the highest accuracy with 42 · 10^6 floating point operations. In terms of accuracy, MobileNetV3-large is the second best model, with 31 times as many floating point operations as our best model.
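A hedged sketch of a Walsh-Hadamard sharpness proxy: transform a square patch whose side is a power of two, order coefficients by sequency, and measure how much energy sits in the low-sequency corner. This is a simplified stand-in, not the thesis's hand-crafted algorithm.

```python
import numpy as np
from scipy.linalg import hadamard

def blur_score(patch):
    """patch: (n, n) float array, n a power of two. Higher score -> likely blurred."""
    n = patch.shape[0]
    H = hadamard(n)
    seq = (np.diff(H, axis=1) != 0).sum(axis=1)   # sign changes per row = sequency
    Hs = H[np.argsort(seq)]                       # sequency-ordered Walsh matrix
    W = Hs @ patch @ Hs.T / n                     # 2D Walsh-Hadamard transform
    E = W ** 2
    k = n // 4
    return E[:k, :k].sum() / E.sum()              # energy share of low sequencies
```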
@mastersthesis{diva2:1670428,
author = {Zahra, Hasseli and Raamen, Anwia Odisho},
title = {{Automatic Quality Assessment of Dermatology Images:
A Comparison Between Machine Learning and Hand-Crafted Algorithms}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5486--SE}},
year = {2022},
address = {Sweden},
}
With advancements in space technology, remote sensing applications, and computer vision, significant improvements in the data describing our planet are seen today. Researchers want to gather different kinds of data and perform data fusion techniques between them to increase our understanding of the world. Two such data types are Electro-Optical images and Synthetic Aperture Radar images. For data fusion, the images need to be accurately aligned. Researchers have investigated methods for robustly and accurately registering these images for many years. However, recent advancements in imaging systems have made the problem more complex than ever.
Currently, the imaging satellites that capture information around the globe have achieved a resolution of less than a meter per pixel. There is an increase in signal complexity for high-resolution SAR images due to how the imaging system operates. Interference between waves gives rise to speckle noise and geometric distortions, making the images very difficult to interpret. This directly affects the image registration accuracy.
In this thesis, the complexity of the registration problem between SAR and EO data was described, and methods for registering the images were investigated. The methods were feature- and area-based. The feature-based method used a KAZE filter and SURF descriptor. The method found many key points but few correct correspondences. The area-based methods used FFT and MI, respectively. FFT was deemed best for higher quality images, whereas MI dealt better with the non-linear intensity difference. More complex techniques, such as dense neural networks, were excluded. No method achieved satisfying results on the entire data set, but the area-based methods accomplished complementary results.
A conclusion was drawn that the distortions in the SAR images are too significant to register accurately using only CV algorithms. Since the area-based methods achieved good results on images excluding significant distortions, future work should focus on solving the geometrical errors and increasing the registration accuracy.
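The FFT-based area method is essentially phase correlation, which scikit-image provides directly; a sketch with a synthetically shifted image standing in for a resampled SAR/EO pair.

```python
import numpy as np
from skimage.registration import phase_cross_correlation

rng = np.random.default_rng(0)
eo = rng.normal(size=(256, 256))
sar = np.roll(eo, shift=(5, -3), axis=(0, 1))    # toy stand-in for the SAR image

# Estimated (row, col) translation registering `sar` onto `eo`
shift, error, diffphase = phase_cross_correlation(eo, sar, upsample_factor=10)
aligned = np.roll(sar, shift=tuple(np.round(shift).astype(int)), axis=(0, 1))
```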
@mastersthesis{diva2:1682316,
author = {Hansson, Niclas},
title = {{Investigation of Registration Methods for High Resolution SAR-EO Imagery}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5506--SE}},
year = {2022},
address = {Sweden},
}
This master thesis project was done together with Saab Dynamics in Linköping in the spring of 2022 and aims to perform an online IMU-camera calibration using an AprilTag board. Experiments are conducted on two different types of datasets: the public dataset Euroc and internal datasets from Saab. The calibration is done iteratively by solving a series of nonlinear optimization problems without any initial knowledge of the sensor configuration. The method is largely based on work by Huang and collaborators. Besides finding the transformation between the IMU and the camera, the biases in the IMU and the time delay between the two sensors are also explored. By comparing the resulting transformation with Kalibr, the current state-of-the-art offline calibration toolbox, it is possible to conclude that the model can find and correct for the biases in the gyroscope. It is therefore important to include these biases in the model. The model is able to roughly find the time shift between the two sensors but has more difficulty correcting for it. The thesis also aims to explore ways of compiling a good dataset for calibration. Results show that it is desirable to avoid rapid movements, as well as images gathered at distances from the AprilTag board that vary a lot. Also, a shorter exposure time is useful to avoid losing AprilTag detections.
@mastersthesis{diva2:1701458,
author = {Karlhede, Arvid},
title = {{Online Camera-IMU Calibration}},
school = {Linköping University},
type = {{LiTH-ISY-EX--22/5524--SE}},
year = {2022},
address = {Sweden},
}
Automatic detection of weeds could be used for more efficient weed control in agriculture. In this master thesis, weed detectors have been trained and examined on data collected by RISE to investigate whether an accurate weed detector could be trained on the collected data. When only using annotations of the weed class Creeping thistle for training and evaluation, a detector achieved a mAP of 0.33. When using four classes of weed, a detector was trained with a mAP of 0.07. The performance was worse than in a previous study also dealing with weed detection. Hypotheses for why the performance was lacking were examined. Experiments indicated that the problem could not fully be explained by the model being underfitted, nor by the objects' backgrounds being too similar to the foreground, nor by the quality of the annotations being too low. The performance was better when training the model with as much data as possible than when only selected segments of the data were used.
@mastersthesis{diva2:1666845,
author = {Ahlqvist, Axel},
title = {{Examining Difficulties in Weed Detection}},
school = {Linköping University},
type = {{}},
year = {2022},
address = {Sweden},
}
Perception of depth, ego-motion and robust keypoints is critical for SLAM and structure from motion applications. Neural networks have achieved great performance in perception tasks in recent years, but collecting labeled data for supervised training is labor intensive and costly. This thesis explores recent methods in unsupervised training of neural networks that can predict depth, ego-motion and keypoints, and do geometric consensus maximization. The benefit of unsupervised training is that the networks can learn from raw data collected from the camera sensor, instead of labeled data. The thesis focuses on training on images from a monocular camera, where no stereo or LIDAR data is available. The experiments compare different techniques for depth and ego-motion prediction from previous research, and show how the techniques can be combined successfully. A keypoint prediction network is evaluated and its performance is compared with the ORB detector provided by OpenCV. A geometric consensus network is also implemented and its performance is compared with the RANSAC algorithm in OpenCV. The consensus maximization network is trained on the output of the keypoint prediction network. For future work it is suggested that all networks could be combined and trained jointly to reach a better overall performance. The results show (1) which techniques in unsupervised depth prediction are most effective, (2) that the keypoint predicting network outperformed the ORB detector, and (3) that the consensus maximization network was able to classify outliers with comparable performance to the RANSAC algorithm of OpenCV.
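Most of the compared unsupervised depth and ego-motion techniques share one mechanism: differentiably warping a source frame into the target view using predicted depth and pose, then penalizing the photometric error. A condensed sketch of that mechanism, not any specific paper's full loss:

```python
import torch
import torch.nn.functional as F

def inverse_warp(src, depth, T, K):
    """src: (B,3,H,W) source frame; depth: (B,1,H,W) target-view depth;
    T: (B,4,4) target->source transform; K: (B,3,3) intrinsics."""
    B, _, H, W = src.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()   # (3,H,W)
    pix = pix.view(1, 3, -1).expand(B, 3, H * W)
    cam = torch.inverse(K) @ pix * depth.view(B, 1, -1)           # back-project
    cam = torch.cat([cam, torch.ones(B, 1, H * W)], 1)            # homogeneous
    p = K @ (T @ cam)[:, :3]                                      # into source view
    p = p[:, :2] / p[:, 2:].clamp(min=1e-6)
    gx = 2 * p[:, 0] / (W - 1) - 1                                # normalize to [-1,1]
    gy = 2 * p[:, 1] / (H - 1) - 1
    grid = torch.stack([gx, gy], -1).view(B, H, W, 2)
    return F.grid_sample(src, grid, align_corners=True)

def photometric_loss(target, src, depth, T, K):
    # L1 only; published methods typically add an SSIM term and masking
    return (target - inverse_warp(src, depth, T, K)).abs().mean()
```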
@mastersthesis{diva2:1534180,
author = {Örjehag, Erik},
title = {{Unsupervised Learning for Structure from Motion}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5361--SE}},
year = {2021},
address = {Sweden},
}
Training data is an essential ingredient within supervised learning, yet time consuming, expensive and for some applications impossible to retrieve. Thus it is of interest to use synthetic training data. However, the domain shift of synthetic data makes it challenging to obtain good results when used as training data for deep learning models. It is therefore of interest to refine synthetic data, e.g. using image-to-image translation, to improve results. The aim of this work is to compare different methods for image-to-image translation of synthetic training data of thermal IR-images using GANs. Translation is done both using synthetic thermal IR-images alone, as well as including pixelwise depth and/or semantic information. For evaluation, a new measure based on the Fréchet Inception Distance, adapted to work for thermal IR-images, is proposed. The results show that the model trained using IR-images alone translates the generated images closest to the domain of authentic thermal IR-images. The training where IR-images are complemented by corresponding pixelwise depth data performs second best. However, given more training time, inclusion of depth data has the potential to outperform training with IR data alone. This gives a valuable insight on how to best translate images from the domain of synthetic IR-images to that of authentic IR-images, which is vital for quick and low cost generation of training data for deep learning models.
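The Fréchet distance at the core of the proposed measure has a closed form between two Gaussian fits; the sketch below computes it from two sets of feature vectors, leaving out the thesis-specific IR feature extractor:

    import numpy as np
    from scipy import linalg

    def frechet_distance(feats_a, feats_b):
        """Fréchet distance between Gaussians fitted to two feature sets
        (rows are samples): d^2 = ||mu_a - mu_b||^2
                                  + Tr(Ca + Cb - 2 (Ca Cb)^(1/2))."""
        mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
        cov_a = np.cov(feats_a, rowvar=False)
        cov_b = np.cov(feats_b, rowvar=False)
        covmean = linalg.sqrtm(cov_a @ cov_b).real  # drop tiny imaginary parts
        diff = mu_a - mu_b
        return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)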
@mastersthesis{diva2:1543340,
author = {Hamrell, Hanna},
title = {{Image-to-Image Translation for Improvement of Synthetic Thermal Infrared Training Data Using Generative Adversarial Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5364--SE}},
year = {2021},
address = {Sweden},
}
Instance segmentation has a great potential for improving the current state of littering by autonomously detecting and segmenting different categories of litter. With this information, litter could, for example, be geotagged to aid litter pickers or to give precise locational information to unmanned vehicles for autonomous litter collection. Land-based litter instance segmentation is a relatively unexplored field, and this study aims to give a comparison of the instance segmentation models Mask R-CNN and DetectoRS using the multiclass litter dataset called Trash Annotations in Context (TACO) in conjunction with the Common Objects in Context precision and recall scores. TACO is an imbalanced dataset, and imbalanced data-handling is therefore addressed, using a second-order relation iterative stratified split, and additionally oversampling when training Mask R-CNN. Mask R-CNN without oversampling resulted in a segmentation mAP of 0.127, and with oversampling 0.163. DetectoRS achieved a segmentation mAP of 0.167, and improves the segmentation mAP of small objects most noticeably, by a factor of at least 2, which is important within the litter domain since small objects such as cigarettes are overrepresented. In contrast, oversampling with Mask R-CNN does not seem to improve the general precision of small and medium objects, but only improves the detection of large objects. It is concluded that DetectoRS improves results compared to Mask R-CNN, as does oversampling. However, using a dataset that cannot have an all-class representation for train, validation, and test splits, together with an iterative stratification that does not guarantee all-class representations, makes it hard for future works to do exact comparisons to this study. Results are therefore approximate when considering all categories, since 12 categories are missing from the test set, 4 of which were impossible to split into train, validation, and test sets. Further image collection and annotation to mitigate the imbalance would most noticeably improve results, since results depend on class-averaged values. Oversampling with DetectoRS would also help improve results. There is also the option to combine the two datasets TACO and MJU-Waste to enforce training of more categories.
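The abstract does not spell out the oversampling scheme, so as an illustration the sketch below uses LVIS-style repeat-factor sampling, a common recipe for long-tailed instance segmentation; the threshold value and names are assumptions:

    import numpy as np

    def repeat_factors(image_labels, num_classes, threshold=0.1):
        """LVIS-style repeat-factor sampling: images containing rare classes
        are repeated more often. image_labels is a list of per-image class
        index lists; threshold is the target class frequency."""
        n = len(image_labels)
        class_freq = np.zeros(num_classes)
        for labels in image_labels:
            for c in set(labels):
                class_freq[c] += 1.0 / n
        # Per-class repeat factor: max(1, sqrt(t / f_c)).
        r_class = np.maximum(1.0, np.sqrt(threshold / np.maximum(class_freq, 1e-12)))
        # Per-image factor: max over the classes present in the image.
        return [max(r_class[c] for c in set(labels)) if labels else 1.0
                for labels in image_labels]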
@mastersthesis{diva2:1546705,
author = {Sievert, Rolf},
title = {{Instance Segmentation of Multiclass Litter and Imbalanced Dataset Handling:
A Deep Learning Model Comparison}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5365--SE}},
year = {2021},
address = {Sweden},
}
As photos and videos are increasingly used as evidence material, it is important to know whether they can be relied upon, or whether there is a real risk that they have been forged. This thesis investigates methods for detecting anomalous regions in images and videos using photo-response non-uniformity -- a fixed-pattern sensor noise that can be estimated from photos or videos.
For photos, experiments were performed on a method that assumes other photos from the same camera are available. For videos, experiments were performed on a method further developed from the still image method, with other videos from the same camera being available. The last experiments were performed on videos where only the video under investigation was available.
The experiments on the still image method were performed on images with three different kinds of forged regions: a forged region from somewhere else in the same photo, a forged region from a photo taken by another camera, and a forged region from the same sensor position in a photo taken by the same camera. The method should not be able to detect the third kind of forged region. Experiments performed on videos had a forged region in several adjacent frames in the video. The forged region was from another video, and it moved and changed shape between the frames.
The methods mainly consist of a classification process and some post-processing. In the classification process, features were extracted from the images/videos and used in a random forest classifier. The results are presented in terms of precision, recall, F1 score and false positive rate.
The quality of the still images was generally better than that of the videos, which also led to better results. For the cameras used in the experiments, it seemed easier to estimate a good PRNU pattern from photos and videos from older cameras, probably due to sensor differences and extra processing in newer camera models. How the images and videos are compressed also affects the possibility of estimating a good PRNU pattern, because important information may then be lost.
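For reference, the classical PRNU pipeline that this kind of work builds on (maximum-likelihood fingerprint estimation followed by block-wise correlation, cf. Lukas et al.) can be sketched as follows for grayscale images; the Gaussian denoiser is a stand-in for the wavelet filter normally used:

    import numpy as np
    import cv2

    def noise_residual(img):
        """Noise residual W = I - F(I); a Gaussian blur stands in for the
        usual wavelet-based denoising filter here."""
        denoised = cv2.GaussianBlur(img, (3, 3), 0)
        return img.astype(np.float64) - denoised.astype(np.float64)

    def estimate_prnu(images):
        """Maximum-likelihood PRNU estimate K = sum(W_i * I_i) / sum(I_i^2)
        over several images from the same camera."""
        num = np.zeros(images[0].shape, np.float64)
        den = np.zeros_like(num)
        for img in images:
            I = img.astype(np.float64)
            num += noise_residual(img) * I
            den += I * I
        return num / np.maximum(den, 1e-8)

    def block_correlation(img, prnu, block):
        """Normalized correlation between a block's residual and I*K;
        low correlation flags a potentially forged region."""
        y, x, h, w = block
        W = noise_residual(img)[y:y+h, x:x+w]
        S = (img.astype(np.float64) * prnu)[y:y+h, x:x+w]
        W, S = W - W.mean(), S - S.mean()
        return (W * S).sum() / (np.linalg.norm(W) * np.linalg.norm(S) + 1e-12)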
@mastersthesis{diva2:1552602,
author = {Söderqvist, Kerstin},
title = {{Anomaly Detection in Images and Videos Using Photo-Response Non-Uniformity}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5367--SE}},
year = {2021},
address = {Sweden},
}
Reconstruction of sonar images is an inverse problem, which is normally solved with model-based methods. These methods may introduce undesired artifacts called angular and range leakage into the reconstruction. In this thesis, a method called Learned Primal-Dual Reconstruction, which combines a data-driven and a model-based approach, is used to investigate the use of data-driven methods for reconstruction within sonar imaging. The method uses primal and dual variables inspired by classical optimization methods, where parts are replaced by convolutional neural networks that iteratively find a solution to the reconstruction problem. Eight models with different architectures and training parameters are trained and validated on synthetic data. The models are evaluated on measurement data and the results are compared with those from a purely model-based method. Reconstructions performed on synthetic data, where a ground truth image is available, show that it is possible to achieve reconstructions with the data-driven method that have less leakage than reconstructions from the model-based method. For reconstructions performed on measurement data, where no ground truth is available, some variants of the learned model achieve a good result with less leakage.
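One unrolled iteration of a learned primal-dual scheme (in the spirit of Adler and Öktem's formulation, which this family of methods follows) might look as below in PyTorch; the channel counts and the forward_op/adjoint_op callables standing in for the sonar measurement operator and its adjoint are assumptions:

    import torch
    import torch.nn as nn

    class PrimalDualStep(nn.Module):
        """One unrolled iteration: small CNNs replace the proximal operators
        of a classical primal-dual scheme."""
        def __init__(self, channels=32):
            super().__init__()
            self.dual_net = nn.Sequential(
                nn.Conv2d(3, channels, 3, padding=1), nn.PReLU(),
                nn.Conv2d(channels, 1, 3, padding=1))
            self.primal_net = nn.Sequential(
                nn.Conv2d(2, channels, 3, padding=1), nn.PReLU(),
                nn.Conv2d(channels, 1, 3, padding=1))

        def forward(self, primal, dual, data, forward_op, adjoint_op):
            # Dual update: compare the current estimate against the data.
            dual = dual + self.dual_net(
                torch.cat([dual, forward_op(primal), data], dim=1))
            # Primal update: map the dual back to image space via the adjoint.
            primal = primal + self.primal_net(
                torch.cat([primal, adjoint_op(dual)], dim=1))
            return primal, dual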
@mastersthesis{diva2:1561999,
author = {Nilsson, Lovisa},
title = {{Data-Driven Methods for Sonar Imaging}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5381--SE}},
year = {2021},
address = {Sweden},
}
This thesis investigates the development and use of software to measure respiratory frequency in cows using optronics and computer vision. It examines mainly two different strategies of image and signal processing and their performance for different input qualities. The effect of heat stress on dairy cows and the high transmission risk of pneumonia for calves make the investigation done during this thesis highly relevant, since they both have the same symptom: increased respiratory frequency. The data set used in this thesis consisted of recordings of dairy cows in different environments and from varying angles. Recordings where the authors could determine a true breathing frequency by monitoring body movements were accepted into the data set and used to test and develop the algorithms. One method developed in this thesis estimated the breathing rate in the frequency domain by the Fast Fourier Transform and was named "N-point Fast Fourier Transform". The other method was called "Breathing Movement Zero-Crossing Counting". It estimated a signal in the time domain, whose fundamental frequency was determined by a zero-crossing algorithm as the breathing frequency. The results showed that both of the developed algorithms successfully estimated a breathing frequency with a reasonable error margin for most of the data set. The zero-crossing algorithm showed the most consistent result, with an error margin lower than 0.92 breaths per minute (BPM) for twelve of thirteen recordings. However, it is limited to recordings where the camera is placed above the cow. The N-point FFT algorithm estimated the breathing frequency with error margins between 0.44 and 5.20 BPM for the same recordings as the zero-crossing algorithm. This method is not limited to a specific camera angle but requires the cow to be relatively stationary to get accurate results. It could therefore also be evaluated on the remaining three recordings of the data set, for which the error margins were measured between 1.92 and 10.88 BPM. Both methods had execution times acceptable for implementation in real time. The data set was, however, too incomplete to determine performance across recordings from different optronic devices.
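Once a 1-D breathing-motion signal has been extracted from the video, both estimators reduce to a few lines. A sketch, with hypothetical function names and fs denoting the sample rate in Hz:

    import numpy as np

    def fft_breathing_rate(signal, fs):
        """Dominant frequency of a breathing-motion signal via the FFT,
        returned in breaths per minute (BPM)."""
        sig = signal - np.mean(signal)
        spectrum = np.abs(np.fft.rfft(sig))
        freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
        # Restrict to a plausible breathing band, e.g. 0.1-2 Hz (6-120 BPM).
        band = (freqs >= 0.1) & (freqs <= 2.0)
        return freqs[band][np.argmax(spectrum[band])] * 60.0

    def zero_crossing_rate(signal, fs):
        """Breathing rate from zero crossings: one full breath cycle gives
        two crossings of the mean-centred signal."""
        sig = signal - np.mean(signal)
        crossings = np.sum(np.signbit(sig[:-1]) != np.signbit(sig[1:]))
        duration_min = len(sig) / fs / 60.0
        return crossings / 2.0 / duration_min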
@mastersthesis{diva2:1563490,
author = {Antonsson, Per and Johansson, Jesper},
title = {{Measuring Respiratory Frequency Using Optronics and Computer Vision}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5376--SE}},
year = {2021},
address = {Sweden},
}
Generic visual object tracking is the task of tracking one or several objects in all frames of a video, knowing only the location and size of the target in the initial frame. Visual tracking can be carried out in both the infrared and the visual spectrum simultaneously; this is known as multi-modal tracking. Utilizing both spectra can result in a more diverse tracker, since visual tracking in infrared imagery makes it possible to detect objects even in poor visibility or in complete darkness. However, infrared imagery lacks the level of detail present in visual images. A common method for visual tracking is to use discriminative correlation filters (DCF). These correlation filters are then used to detect an object in every frame of an image sequence. This thesis focuses on investigating aspects of a DCF-based tracker operating in the two different modalities, infrared and visual imagery. First, it was investigated whether the tracking benefits from using two channels instead of one, and what happens to the tracking result if one of those channels is degraded by an external cause. It was also investigated if the addition of image features can further improve the tracking. The results show that the tracking improves when using two channels instead of only a single channel. They also show that utilizing two channels is a good way to create a robust tracker, which is still able to perform even though one of the channels is degraded. Deep features, extracted from a pre-trained convolutional neural network, were the image feature improving the tracking the most, although the implementation of the deep features made the tracking significantly slower.
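The simplest member of the DCF family, the MOSSE filter, captures the core idea; for a multi-channel (infrared plus visual) tracker the numerator and denominator sums would additionally run over channels. A numpy sketch:

    import numpy as np

    def train_mosse(patches, target, lam=1e-2):
        """Closed-form DCF: H* = sum(G * conj(F_i)) / (sum(F_i * conj(F_i)) + lam),
        where F_i are FFTs of training patches and G is the FFT of the desired
        (typically Gaussian) response centred on the target."""
        G = np.fft.fft2(target)
        num = np.zeros_like(G)
        den = np.zeros_like(G)
        for p in patches:
            F = np.fft.fft2(p)
            num += G * np.conj(F)
            den += F * np.conj(F)
        return num / (den + lam)

    def detect(H, patch):
        """Correlation response for a new patch; its peak gives the shift."""
        response = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
        return np.unravel_index(np.argmax(response), response.shape)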
@mastersthesis{diva2:1566492,
author = {Wettermark, Emma and Berglund, Linda},
title = {{Multi-Modal Visual Tracking Using Infrared Imagery}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5401--SE}},
year = {2021},
address = {Sweden},
}
The increasing popularity of drones has made it convenient to capture a large number of images of a property, which can then be used to build a 3D model. The conditions of buildings can be analyzed to plan renovations. This creates an interest for automatically identifying building materials, a task well suited for machine learning.
With access to drone imagery of buildings as well as depth maps and normal maps, we created a dataset for semantic segmentation. Two different convolutional neural networks were trained and evaluated, to see how well they perform material segmentation. DeepLabv3+, which uses RGB data, was compared to Depth-Aware CNN, which uses RGB-D data. Our experiments showed that DeepLabv3+ achieved higher mean intersection over union.
To investigate if the information in the depth maps and normal maps could give a performance boost, we conducted experiments with an encoding we call HMN: horizontal disparity, magnitude of normal with ground, normal parallel with gravity. This three-channel encoding was used to jointly train two CNNs, one with RGB and one with HMN, and then sum their predictions. This led to improved results for both DeepLabv3+ and Depth-Aware CNN.
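The described fusion amounts to running two networks and summing their per-pixel scores; a minimal PyTorch sketch, where make_net stands in for whichever segmentation backbone is used and summing logits (rather than probabilities) is an assumption:

    import torch.nn as nn

    class LateFusionSegmenter(nn.Module):
        """Two parallel segmentation networks, one fed RGB and one fed the
        three-channel HMN encoding; their per-pixel class scores are summed."""
        def __init__(self, make_net, num_classes):
            super().__init__()
            self.rgb_net = make_net(num_classes)
            self.hmn_net = make_net(num_classes)

        def forward(self, rgb, hmn):
            # Both branches return NxCxHxW logits for C classes.
            return self.rgb_net(rgb) + self.hmn_net(hmn)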
@mastersthesis{diva2:1567671,
author = {Rydgård, Jonas and Bejgrowicz, Marcus},
title = {{Semantic Segmentation of Building Materials in Real World Images Using 3D Information}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5405--SE}},
year = {2021},
address = {Sweden},
}
Learning-based multi-view stereo (MVS) has shown promising results in the domain of general 3D reconstruction. However, no work before this thesis has applied learning-based MVS to urban 3D reconstruction from satellite images. In this thesis, learning-based MVS is used to infer depth maps from satellite images. Models are trained on both synthetic and real satellite images from Las Vegas with ground truth data from a high-resolution aerial-based 3D model. This thesis also evaluates different methods for reconstructing digital surface models (DSM) and compares them to existing satellite-based 3D models at Maxar Technologies. The DSMs are created by either post-processing point clouds obtained from predicted depth maps or by an end-to-end approach where the depth map for an orthographic satellite image is predicted.
This thesis concludes that learning-based MVS can be used to predict accurate depth maps. Models trained on synthetic data yielded relatively good results, but not nearly as good as models trained on real satellite images. The trained models also generalize relatively well to cities not present in training. This thesis also concludes that the reconstructed DSMs achieve better quantitative results than the existing 3D model in Las Vegas, and similar results for the test sets from other cities. Compared to ground truth, the best-performing method achieved L1 and L2 errors 14 % and 29 % lower, respectively, than Maxar's current 3D model. The method that uses a point cloud as an intermediate step achieves better quantitative results than the end-to-end system. Very promising qualitative results are achieved with the proposed methods, especially when utilizing an end-to-end approach.
@mastersthesis{diva2:1567722,
author = {Yngesjö, Tim},
title = {{3D Reconstruction from Satellite Imagery Using Deep Learning}},
school = {Linköping University},
type = {{}},
year = {2021},
address = {Sweden},
}
Detecting and outlining products in images is beneficial for many use cases in e-commerce, such as automatically identifying and locating products within images and proposing matches for the detections. This study investigated how the utilisation of metadata associated with images of products could help boost the performance of an existing approach, with the ultimate goal of reducing the manual labour needed to annotate images. This thesis explored if approximate pseudo masks could be generated for products in images by leveraging metadata as image-level labels and subsequently using the masks to train a Mask R-CNN. However, this approach did not yield satisfactory results. Further, this study found that by incorporating the metadata directly in the Mask R-CNN, an mAP performance increase of nearly 5% was achieved. Furthermore, utilising the available metadata to divide the training samples for a KNN model into subsets resulted in an increased top-3 accuracy of up to 16%. By representing the data with embeddings created by a pre-trained CNN, the KNN model performed better, with both higher accuracy and more reasonable suggestions.
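The metadata-subset KNN idea can be sketched with scikit-learn; treating the subset key as a product category from the metadata is an assumption here:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def fit_knn_per_subset(embeddings, labels, subset_keys, k=3):
        """Train one KNN per metadata-defined subset so that a query is only
        matched against plausible candidates. embeddings: NxD array,
        labels: length-N array, subset_keys: length-N metadata values."""
        embeddings, labels = np.asarray(embeddings), np.asarray(labels)
        models = {}
        for key in set(subset_keys):
            idx = [i for i, s in enumerate(subset_keys) if s == key]
            knn = KNeighborsClassifier(n_neighbors=k)
            knn.fit(embeddings[idx], labels[idx])
            models[key] = knn
        return models

    # At query time, route the embedding to the KNN for its metadata subset:
    # probs = models[query_key].predict_proba(query_embedding[None, :])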
@mastersthesis{diva2:1570488,
author = {Wahlquist, Gustav},
title = {{Improving Automatic Image Annotation Using Metadata}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5398--SE}},
year = {2021},
address = {Sweden},
}
Image segmentation through neural networks and deep learning has, in the recent decade, become a successful tool for automated decision-making. For Luossavaara-Kiirunavaara Aktiebolag (LKAB), this means identifying the amount of slag inside a furnace through computer vision.
There are many prominent convolutional neural network architectures in the literature, and this thesis explores two: a modified U-Net and the PSPNet. The architectures were combined with three loss functions and three class weighting schemes, resulting in 18 model configurations that were evaluated and compared. This thesis also explores transfer learning techniques for neural networks tasked with identifying slag in images from inside a furnace. The benefit of transfer learning is that the network can learn to find features from already labeled data of another context. Finally, the thesis explores how temporal information can be utilised by adding an LSTM layer to a model taking pairs of images as input instead of one.
The results show (1) that the PSPNet outperformed the U-Net for all tested configurations in all relevant metrics, (2) that the model is able to find more complex features while converging quicker by using transfer learning, and (3) that utilising temporal information reduced the variance of the predictions, and that the modified PSPNet using an LSTM layer showed promise in handling images with outlying characteristics.
@mastersthesis{diva2:1572304,
author = {von Koch, Christian and Anz\'{e}n, William},
title = {{Detecting Slag Formation with Deep Learning Methods:
An experimental study of different deep learning image segmentation models}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5427--SE}},
year = {2021},
address = {Sweden},
}
This master thesis studies the learning of dense feature descriptors where camera poses are the only supervisory signal. The use of camera poses as a supervisory signal has only been published once before, and this thesis expands on this previous work by utilizing a couple of different techniques meant to increase the robustness of the method, which is particularly important when not having access to ground-truth correspondences. Firstly, an adaptive robust loss is utilized to better differentiate inliers and outliers. Secondly, statistical properties during training are both enforced and adapted to, in an attempt to alleviate problems with uncertainties introduced by not having true correspondences available. These additions are shown to slightly increase performance, and also highlight some key ideas related to prediction certainty and robustness when working with camera poses as a supervisory signal. Finally, possible directions for future work are discussed.
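The abstract does not name the loss, but a common choice of adaptive robust loss is Barron's general robust kernel, whose shape parameter can itself be optimized during training. A sketch (valid away from the special cases alpha = 0 and alpha = 2):

    import torch

    def general_robust_loss(x, alpha, c):
        """Barron's general robust loss: alpha interpolates between L2-like
        (alpha near 2), Charbonnier (alpha = 1) and heavier-tailed kernels
        that downweight outliers; c sets the scale of the inlier region."""
        b = torch.abs(torch.as_tensor(alpha - 2.0))
        return (b / alpha) * (((x / c) ** 2 / b + 1.0) ** (alpha / 2.0) - 1.0)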
@mastersthesis{diva2:1573398,
author = {Dahlqvist, Marcus},
title = {{Adaptive Losses for Camera Pose Supervision}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5422--SE}},
year = {2021},
address = {Sweden},
}
This thesis investigates the possibility of utilizing data from multiple modalities to enable an automated recycling system to separate ferrous from non-ferrous debris. The two methods, sensor fusion and hallucinogenic sensor fusion, were implemented in a four-step approach of deep CNNs. Sensor fusion implies that multiple modalities are run simultaneously during the operation of the system. The individual outputs are then fused, and the joint performance is expected to be superior to having only one of the sensors. In hallucinogenic sensor fusion, the goal is to achieve the benefits of sensor fusion with respect to cost and complexity even when one of the modalities is removed from the system. This is achieved by leveraging data from a more complex modality onto a simpler one in a student/teacher approach. As a result, the teacher modality will train the student sensor to hallucinate features beyond its visual spectrum. Based on the results of a prestudy involving multiple types of modalities, a hyperspectral sensor was deployed as the teacher to complement a simple RGB camera. Three studies involving differently composed datasets were then conducted to evaluate the effectiveness of the methods. The results show that the joint performance of a hyperspectral sensor and an RGB camera is superior to either sensor on its own. It can also be concluded that training a network with hyperspectral images can improve the classification accuracy when operating with only RGB data. However, the addition of a hyperspectral sensor might be considered superfluous, as this report shows that the standardized shapes of industrial debris enable a single RGB camera to achieve an accuracy above 90%. The material used in this thesis can also be concluded to be suboptimal for hyperspectral analysis: compared to vegetation scenes, only a limited amount of additional data could be obtained by including wavelengths besides the ones representing red, green and blue.
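The student/teacher ("hallucination") objective is typically the ordinary task loss plus a feature-matching term against the frozen teacher; the PyTorch sketch below assumes an MSE matching term, which is one common choice:

    import torch.nn.functional as F

    def hallucination_loss(student_feat, teacher_feat, student_logits, labels,
                           distill_weight=1.0):
        """The RGB student is supervised both by the class labels and by
        matching the features of the (frozen) hyperspectral teacher, so that
        at test time RGB alone 'hallucinates' spectral cues."""
        task = F.cross_entropy(student_logits, labels)
        distill = F.mse_loss(student_feat, teacher_feat.detach())
        return task + distill_weight * distill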
@mastersthesis{diva2:1582328,
author = {Brundin, Sebastian and Gräns, Adam},
title = {{Efficient Recycling Of Non-Ferrous Materials Using Cross-Modal Knowledge Distillation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5403--SE}},
year = {2021},
address = {Sweden},
}
This thesis provides a comparison between instance segmentation methods using point clouds and depth images. Specifically, their performance on cluttered scenes of irregular objects in an industrial environment is investigated.
Recent work by Wang et al. [1] has suggested potential benefits of a point cloud representation when performing deep learning on data from 3D cameras. However, little work has been done to enable quantifiable comparisons between methods based on different representations, particularly on industrial data.
Generating synthetic data provides accurate grayscale, depth map, and point cloud representations for a large number of scenes and can thus be used to compare methods regardless of datatype. The datasets in this work are created using a tool provided by SICK. They simulate postal packages on a conveyor belt scanned by a LiDAR, closely resembling a common industry application. Two datasets are generated. One dataset has low complexity, containing only boxes. The other has higher complexity, containing a combination of boxes and multiple types of irregularly shaped parcels.
State-of-the-art instance segmentation methods are selected based on their performance on existing benchmarks. We chose PointGroup by Jiang et al. [2], which uses point clouds, and Mask R-CNN by He et al. [3], which uses images.
The results support that there may be benefits of using a point cloud representation over depth images. PointGroup performs better in terms of the chosen metric on both datasets. On low complexity scenes, the inference times are similar between the two methods tested. However, on higher complexity scenes, Mask R-CNN is significantly faster.
@mastersthesis{diva2:1584003,
author = {Konradsson, Albin and Bohman, Gustav},
title = {{3D Instance Segmentation of Cluttered Scenes:
A Comparative Study of 3D Data Representations}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5421--SE}},
year = {2021},
address = {Sweden},
}
Deep learning methods for medical image segmentation are hindered by the lack of training data. This thesis aims to develop a method that overcomes this problem. A basic U-net trained on XCAT phantom data was tested first. The segmentation results were unsatisfactory even when artificial quantum noise was added. As a workaround, CycleGAN was used to add tissue textures to the XCAT phantom images by analyzing patient CT images. The generated images were used to train the network. The textures introduced by CycleGAN improved the segmentation, but some errors remained. The basic U-net was replaced with Attention U-net, which further improved the segmentation. More work is needed to fine-tune and thoroughly evaluate the method. The results obtained so far demonstrate the potential of this method for the segmentation of medical images. The proposed algorithms may be used in iterative image reconstruction algorithms in multi-energy computed tomography.
@mastersthesis{diva2:1584712,
author = {Zhao, Hang},
title = {{Segmentation and synthesis of pelvic region CT images via neural networks trained on XCAT phantom data}},
school = {Linköping University},
type = {{}},
year = {2021},
address = {Sweden},
}
When radiologists examine X-rays, it is crucial that they are aware of the laterality of the examined body part. The laterality refers to which side of the body is considered, e.g. Left and Right. The consequences of a mistake based on incorrect laterality information could be disastrous. This thesis aims to address this problem by providing a deep neural network model that classifies X-rays based on their laterality.
X-ray images contain markers that are used to indicate the laterality of the image. In this thesis, both a classification model and a detection model have been trained to detect these markers and to identify the laterality. The models have been trained and evaluated on four body parts: knees, feet, hands and shoulders. The images can be divided into three laterality classes: Bilateral, Left and Right.
The model proposed in this thesis is a combination of two classification models: one for distinguishing between Bilateral and Unilateral images, and one for classifying Unilateral images as Left or Right. The latter utilizes the confidence of the predictions to categorize some of them as less accurate (Uncertain), which includes images where the marker is not visible or very hard to identify.
The model was able to correctly distinguish Bilateral from Unilateral with an accuracy of 100.0 %. For the Unilateral images, 5.00 % were categorized as Uncertain and for the remaining images, 99.99 % of those were classified correctly as Left or Right.
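The decision logic described above fits in a few lines; the 0.9 confidence threshold below is a placeholder, not the thesis's value:

    def classify_laterality(p_bilateral, p_left, threshold=0.9):
        """Two-stage decision: first Bilateral vs Unilateral, then Left vs
        Right, with low-confidence unilateral cases routed to 'Uncertain'.
        p_bilateral and p_left are softmax probabilities from the two models."""
        if p_bilateral >= 0.5:
            return "Bilateral"
        confidence = max(p_left, 1.0 - p_left)
        if confidence < threshold:
            return "Uncertain"  # marker not visible or very hard to identify
        return "Left" if p_left >= 0.5 else "Right"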
@mastersthesis{diva2:1587188,
author = {Björn, Martin},
title = {{Laterality Classification of X-Ray Images:
Using Deep Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5417--SE}},
year = {2021},
address = {Sweden},
}
In the glass wool industry, the molten glass flow is monitored for regulation purposes. Given the progress in the computer vision field, the current monitoring solution might be replaced by a camera based solution. The aim of this thesis is to investigate the possibility of using optical flow techniques for estimation of the molten glass flow displacement.
Three glass melt flow datasets were recorded, as well as two additional melt flow datasets, using a NIR camera. The block matching techniques Full Search (FS) and Adaptive Rood Pattern Search (ARPS), as well as the local feature methods ORB and A-KAZE were considered. These four techniques were compared to RAFT, the state-of-the-art approach for optical flow estimation, using available pre-trained models, as well as an approach of using the tracking method ECO for the optical flow estimation.
The methods have been evaluated using the metrics MAE, MSE, and SSIM to compare the warped flow to the target image. In addition, ground truth for 50 frames from each dataset was manually annotated in order to use the optical flow metric End-Point Error. To investigate the computational complexity, the average computational time per frame was calculated.
The investigation found that RAFT does not perform well on the given data, due to the large displacements of the flows. For simulated displacements of up to about 100 pixels at full resolution, the performance is satisfactory, with results comparable to the traditional methods.
Using ECO for optical flow estimation encounters similar problems as RAFT, where the large displacement proved challenging for the tracker. Simulating smaller motions of up to 60 pixels resulted in good performance, though computation time of the used implementation is much too high for a real-time implementation.
The four traditional block matching and local feature approaches examined in this thesis outperform the state-of-the-art approaches. FS, ARPS, A-KAZE, and ORB all have similar performance on the glass flow datasets, whereas the block matching approaches fail on the alternative melt flow data as the template extraction approach is inadequate. The two local feature approaches, though working reasonably well on all datasets given full resolution, struggle to identify features on down-sampled data. This might be mitigated by fine-tuning the settings of the methods. Generally, ORB mostly outperforms A-KAZE with respect to the evaluation metrics, and is considerably faster.
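Of the traditional methods, Full Search is the most direct to sketch: exhaustive block matching over a search window. The radius and the sum-of-absolute-differences criterion below are illustrative assumptions:

    import numpy as np

    def full_search(template, frame, center, radius):
        """Exhaustive block matching: slide the template over a window of
        +/- radius pixels around 'center' in the next frame and return the
        displacement minimizing the sum of absolute differences (SAD)."""
        th, tw = template.shape
        cy, cx = center
        best, best_dxdy = np.inf, (0, 0)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = cy + dy, cx + dx
                if y < 0 or x < 0:
                    continue  # outside the frame
                patch = frame[y:y + th, x:x + tw]
                if patch.shape != template.shape:
                    continue  # window extends past the border
                sad = np.abs(patch.astype(np.int32)
                             - template.astype(np.int32)).sum()
                if sad < best:
                    best, best_dxdy = sad, (dx, dy)
        return best_dxdy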
@mastersthesis{diva2:1592777,
author = {Rudin, Malin},
title = {{Evaluation of Optical Flow for Estimation of Liquid Glass Flow Velocity}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5433--SE}},
year = {2021},
address = {Sweden},
}
Hyperspectral imaging is an expanding topic within the field of computer vision that uses images of high spectral granularity. Contrastive learning is a discriminative approach to self-supervised learning, a form of unsupervised learning where the network is trained using self-created pseudo-labels. This work combines these two research areas and investigates how a pretrained network based on contrastive learning can be used for hyperspectral images. The hyperspectral images used in this work are generated from simulated RGB images and spectra from a spectral library. The network is trained with a pretext task based on data augmentations, and is evaluated through transfer learning and fine-tuning for a downstream task. The goal is to determine the impact of the pretext task on the downstream task and to determine the required amount of labelled data. The results show that the downstream task (a classifier) based on the pretrained network barely performs better than a classifier without a pretrained network. In the end, more research needs to be done to confirm or reject the benefit of a pretrained network based on contrastive learning for hyperspectral images. Also, the pretrained network should be tested on real-world hyperspectral data and trained with a pretext task designed for hyperspectral images.
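Contrastive pretraining of this kind usually optimizes an InfoNCE-style objective over two augmented views of each image; the SimCLR-style NT-Xent loss below is an assumption, as the abstract does not name the exact loss:

    import torch
    import torch.nn.functional as F

    def nt_xent(z1, z2, temperature=0.5):
        """NT-Xent: two augmented views of the same image form positive
        pairs; all other samples in the batch act as negatives."""
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # 2N x d
        sim = z @ z.t() / temperature                        # cosine similarities
        n = z1.shape[0]
        mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
        sim.masked_fill_(mask, float("-inf"))                # remove self-pairs
        # The positive of sample i is i+n (mod 2n).
        targets = torch.cat([torch.arange(n, 2 * n),
                             torch.arange(0, n)]).to(z.device)
        return F.cross_entropy(sim, targets)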
@mastersthesis{diva2:1593358,
author = {Syr\'{e}n Grönfelt, Natalie},
title = {{Pretraining a Neural Network for Hyperspectral Images Using Self-Supervised Contrastive Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5382--SE}},
year = {2021},
address = {Sweden},
}
In Storytel's application, in which a user can read and listen to digitalized literature, the user is shown a list of books where the first thing they encounter is the book title and cover. A book cover is therefore essential to attract a consumer's attention. In this study, we take a data-driven approach to investigating the design principles for book covers through deep learning models and explainable AI. The first aim is to explore how well a Convolutional Neural Network (CNN) can interpret and classify a book cover image according to its genre in a multi-class classification task. The second aim is to increase model interpretability and investigate correlations between model features and genres. With the help of the explanatory artificial intelligence method Gradient-weighted Class Activation Mapping (Grad-CAM), we analyze the pixel-wise contribution to the model prediction. In addition, object detection by YOLOv3 was implemented to investigate which objects are detectable and reoccurring in the book covers. An interplay between Grad-CAM and YOLOv3 was used to investigate how identified objects and features correlate with a specific book genre and ultimately answer what makes a good book cover. Using a state-of-the-art CNN model architecture, we achieve an accuracy of 48%, with the best class-wise accuracies for the genres Erotica, Economy & Business and Children, with accuracies of 73%, 67% and 66%. Quantitative results from the Grad-CAM and YOLOv3 interplay show some strong associations between objects and genres, while indicating weak associations between abstract design principles and genres. Furthermore, a qualitative analysis of Grad-CAM visualizations shows strong relevance of certain objects and text fonts for specific book genres. It was also observed that the portrayal of a feature was relevant for the model prediction of certain genres.
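Grad-CAM itself is compact enough to sketch. The hook-based version below assumes a reasonably recent PyTorch (for register_full_backward_hook) and a model that returns class logits:

    import torch

    def grad_cam(model, layer, image, class_idx):
        """Grad-CAM: weight the chosen layer's activations by the spatial
        mean of the class-score gradients, then ReLU the weighted sum."""
        acts, grads = [], []
        h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
        h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
        score = model(image)[0, class_idx]
        model.zero_grad()
        score.backward()
        h1.remove(); h2.remove()
        A, dA = acts[0], grads[0]                   # N x C x H x W
        weights = dA.mean(dim=(2, 3), keepdim=True) # per-channel importance
        cam = torch.relu((weights * A).sum(dim=1))  # N x H x W
        return cam / (cam.max() + 1e-8)             # normalize for display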
@mastersthesis{diva2:1576364,
author = {Velander, Alice and Gumpert Harrysson, David},
title = {{Do Judge a Book by its Cover!
Predicting the genre of book covers using supervised deep learning. Analyzing the model predictions using explanatory artificial intelligence methods and techniques.}},
school = {Linköping University},
type = {{}},
year = {2021},
address = {Sweden},
}
Hyperspectral imaging based on the use of an exponentially variable filter makes it possible to construct a lightweight hyperspectral sensor. The exponentially variable filter captures the whole spectral range in each image, where each column captures a different wavelength. Gathering the full spectrum for any given point in the image therefore requires the fusion of several images with movement in between captures. The construction of a hyperspectral cube requires registration of the gathered images. With a lightweight sensor comes the possibility to mount the hyperspectral sensor on an unmanned aerial vehicle to collect aerial footage. This thesis presents a registration algorithm capable of constructing a complete hyperspectral cube of almost any chosen area in the captured region. The thesis presents the results of a construction method using a multi-frame super-resolution algorithm to increase the spectral resolution, and a spline interpolation method interpolating missing spectral data. The result of an algorithm suggesting the optimal spectral and spatial resolution before constructing the hyperspectral cube is also presented, as is the result of an algorithm providing information about the quality of the constructed hyperspectral cube.
@mastersthesis{diva2:1596253,
author = {Freij, Hannes},
title = {{Hyperspectral Image Registration and Construction From Irregularly Sampled Data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5408--SE}},
year = {2021},
address = {Sweden},
}
Point set registration is a well-researched yet still not very exploited area in computer vision. As the field of machine learning grows, the possibilities of application expand. This thesis investigates the possibility of expanding an already implemented probabilistic machine learning approach to point set registration to more complex, larger datasets gathered in a forest environment. The system used as a starting point was created by Järemo Lawin et al. [10]. The aim of the thesis was to investigate the possibility of registering the forest data with the existing system, without ground-truth poses, with different optimizers, and to implement a SLAM pipeline. Older methods were also used as a benchmark for evaluation, more specifically iterative closest point (ICP) and fast global registration (FGR). To enable the gathered data to be processed by the registration algorithms, preprocessing was required, transforming the data points from the coordinate system of the sensor to world-relative coordinates via LiDAR base coordinates. Subsequently, the registration was performed with different approaches. Both the KITTI odometry dataset, which RLLReg originally was evaluated with [10], and the gathered forest data were used. Data augmentation was utilized to enable ground-truth-independent training and to increase diversity in the data. In addition, the registration results were used to create a SLAM pipeline, enabling mapping and localization in the scanned areas. The results showed great potential for using RLLReg to register forest scenes compared to other, older approaches. In particular, the lack of ground truth was manageable using data augmentation to create training data. Moreover, there was no evidence that AdaBound improves the system when replacing the Adam optimizer. Finally, forest models with sensor paths plotted were generated with decent results, although there is potential for further refinement through post-processing. Nevertheless, the possibility of point set registration and LiDAR-SLAM using machine learning has been confirmed.
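One of the classical baselines, point-to-point ICP, alternates nearest-neighbour association with a closed-form rigid fit; one iteration in numpy/scipy:

    import numpy as np
    from scipy.spatial import cKDTree

    def icp_step(source, target):
        """One iteration of point-to-point ICP: find nearest neighbours,
        then solve for the rigid transform with the Kabsch/SVD method.
        source and target are Nx3 and Mx3 arrays."""
        _, idx = cKDTree(target).query(source)   # closest target points
        matched = target[idx]
        mu_s, mu_t = source.mean(0), matched.mean(0)
        H = (source - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                 # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        return source @ R.T + t, R, t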
@mastersthesis{diva2:1612438,
author = {Hjert, Anton},
title = {{Machine Learning for LiDAR-SLAM:
In Forest Terrains}},
school = {Linköping University},
type = {{}},
year = {2021},
address = {Sweden},
}
In this thesis, three well known self-supervised methods have been implemented and trained on road scene images. The three so-called pretext tasks RotNet, MoCov2, and DeepCluster were used to train a neural network self-supervised. The self-supervised trained networks were then evaluated with different amounts of labeled data on two downstream tasks, object detection and semantic segmentation. The performance of the self-supervised methods is compared to networks trained from scratch on the respective downstream task. The results show that it is possible to achieve a performance increase using self-supervision on a dataset containing road scene images only. When only a small amount of labeled data is available, the performance increase can be substantial, e.g., a mIoU from 33 to 39 when training semantic segmentation on 1750 images with a RotNet pre-trained backbone compared to training from scratch. However, it seems that when a large number of labeled images is available (>70000 images), the self-supervised pretraining does not increase the performance as much, or at all.
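Of the three pretext tasks, RotNet is the simplest to illustrate: the pseudo-labels are the rotations themselves. A sketch:

    import torch

    def rotnet_batch(images):
        """RotNet pretext task: rotate each image by 0/90/180/270 degrees
        and let the network predict which rotation was applied. 'images' is
        an NxCxHxW tensor; returns the rotated batch and pseudo-labels."""
        rotated, labels = [], []
        for k in range(4):
            rotated.append(torch.rot90(images, k, dims=(2, 3)))
            labels.append(torch.full((images.shape[0],), k, dtype=torch.long))
        return torch.cat(rotated), torch.cat(labels)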
@mastersthesis{diva2:1608285,
author = {Gustavsson, Simon},
title = {{Object Detection and Semantic Segmentation Using Self-Supervised Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5357--SE}},
year = {2021},
address = {Sweden},
}
This thesis investigates methods for automatic colour transfer when working with geodata and possible metrics to evaluate the results. Several methods for colour transfer as well as methods to create an objective measurement were tested. The methods were evaluated using a subjective score, which was generated by surveying eight people working with geodata. In the survey the participants were asked to "Rank the images from most similar to least similar, with what you imagine the result would have been if you would have made the colour transfer manually". The method with the best overall performance in this study was colour transfer in the CIEl colour space. This method was only matched by a method segmenting the image first based on colour information, as the latter had the highest average subjective score but a larger standard deviation than other methods. This was suspected to be largely due to the deviation in quality of the segmentation algorithm; using a different method for segmenting the image, this approach might perform even better. The objective measurements proposed in this study were not found to have a consistent correlation with the subjective measurement, with the exception of gradient structural similarity. Other methods could have a use in some cases, but not as a general colour transfer objective measurement, though a larger study and more data would be needed to confirm the findings.
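Colour transfer by matching channel statistics (Reinhard et al.'s approach) is a likely reading of the method family evaluated here; the sketch below operates in OpenCV's Lab space, which is an assumption since the exact colour space is only partly legible in the abstract:

    import cv2
    import numpy as np

    def colour_transfer(source, reference):
        """Match the per-channel mean and standard deviation of the source
        to the reference in the Lab colour space (inputs are uint8 BGR)."""
        src = cv2.cvtColor(source, cv2.COLOR_BGR2LAB).astype(np.float64)
        ref = cv2.cvtColor(reference, cv2.COLOR_BGR2LAB).astype(np.float64)
        for ch in range(3):
            s_mu, s_std = src[..., ch].mean(), src[..., ch].std()
            r_mu, r_std = ref[..., ch].mean(), ref[..., ch].std()
            src[..., ch] = (src[..., ch] - s_mu) * (r_std / (s_std + 1e-8)) + r_mu
        out = np.clip(src, 0, 255).astype(np.uint8)
        return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)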
@mastersthesis{diva2:1601738,
author = {Ågren, Anton},
title = {{Automatic Colour Transfer for Geodata}},
school = {Linköping University},
type = {{LiTH-ISY-EX--21/5378--SE}},
year = {2021},
address = {Sweden},
}
Classifying clothing attributes in surveillance images can be useful in the forensic field, making it easier to, for example, find suspects based on eyewitness accounts. Deep Neural Networks are often used successfully in image classification, but require a large amount of annotated data. Since labeling data can be time consuming or difficult, and it is easier to get hold of labeled fashion images, this thesis investigates how the domain shift from a fashion domain to a surveillance domain, with little or no annotated data, affects a classifier.
In the experiments, two deep networks of different depth are used as a base and trained on only fashion images as well as both labeled and unlabeled surveillance images, with and without domain adaptation regularizers. The surveillance dataset is new and consists of images that were collected from different surveillance cameras and annotated during this thesis work.
The results show that there is a degradation in performance for a classifier trained on the fashion domain when tested on the surveillance domain, compared to when tested on the fashion domain. The results also show that if no labeled data in the surveillance domain is used for these experiments, it is more effective to use the deeper network and train it on only fashion data, rather than to use the more complicated unsupervised domain adaptation method.
@mastersthesis{diva2:1392992,
author = {Härnström, Denise},
title = {{Classification of Clothing Attributes Across Domains}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5276--SE}},
year = {2020},
address = {Sweden},
}
The performance of conventional deep neural networks tends to degrade when a domain shift is introduced, such as collecting data from a new site. Model-Agnostic Meta-Learning, or MAML, has achieved state-of-the-art performance in few-shot learning by finding initial parameters that adapt easily to new tasks.
This thesis studies MAML in a digital pathology setting. Experiments show that a conventional model generalises poorly to data collected from another site. By annotating a few samples during inference however, a model with initial parameters obtained through MAML training can adapt to achieve better generalisation performance. It is also demonstrated that a simple transfer learning approach using a kNN classifier on features extracted from a conventional model yields good generalisation, but the variance caused by random sampling is higher.
The results indicate that meta learning can lead to a lower annotation effort for machine learning in digital pathology while maintaining accuracy.
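The MAML inner/outer structure can be sketched compactly; the first-order variant below is chosen for brevity (it drops the second-order terms of full MAML) and all names are illustrative:

    import copy
    import torch

    def fomaml_task_grads(model, loss_fn, support, query, inner_lr=0.01):
        """First-order MAML for one task: adapt a cloned model on the
        support set with one SGD step, evaluate it on the query set, and
        return the query gradients of the adapted weights. Applying these
        to the original (meta) parameters ignores second-order terms."""
        fast = copy.deepcopy(model)
        x_s, y_s = support
        loss = loss_fn(fast(x_s), y_s)
        grads = torch.autograd.grad(loss, fast.parameters())
        with torch.no_grad():
            for p, g in zip(fast.parameters(), grads):
                p -= inner_lr * g                 # inner adaptation step
        x_q, y_q = query
        q_loss = loss_fn(fast(x_q), y_q)
        return torch.autograd.grad(q_loss, fast.parameters())

    # Meta-update: average the returned gradients over a batch of tasks and
    # apply them to model.parameters() with an outer-loop optimizer.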
@mastersthesis{diva2:1414984,
author = {Fagerblom, Freja},
title = {{Model-Agnostic Meta-Learning for Digital Pathology}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5284--SE}},
year = {2020},
address = {Sweden},
}
When creating a photorealistic 3D model of the world using satellite imagery, image classification is an important part of the process. In this thesis the specific part of automated building extraction is investigated, by comparing the performance of instance segmentation and semantic segmentation for extraction of building footprints in orthorectified imagery. Semantic segmentation of the images is solved by using U-net, a Fully Convolutional Network that outputs a pixel-wise segmentation of the image. Instance segmentation of the images is done by a network called Mask R-CNN. The performance of the models is measured using precision, recall and the F1 score, which is the harmonic mean between precision and recall. The resulting F1 scores of the two methods are similar, with U-net achieving an F1 score of 0.684 without any post-processing, and Mask R-CNN achieving an F1 score of 0.676 without post-processing.
@mastersthesis{diva2:1417200,
author = {Fritz, Karin},
title = {{Instance Segmentation of Buildings in Satellite Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5283--SE}},
year = {2020},
address = {Sweden},
}
In today's society, we experience an increasing challenge to provide healthcare to everyone in need due to the increasing number of patients and the shortage of medical staff. Computers have contributed to mitigating this challenge by offloading some of the tasks from the medical staff. With the rise of deep learning, countless new possibilities have opened to help the medical staff even further. One domain where deep learning can be applied is the analysis of ultrasound images. In this thesis we investigate the problem of classifying standard views of the heart in ultrasound images with the help of deep learning. We conduct mainly three experiments. First, we use NASNet Mobile, InceptionV3, VGG16 and MobileNet, pre-trained on ImageNet, and finetune them to ultrasound heart images. We compare the accuracy of these networks to each other and to the baseline model, a CNN that was proposed in [23]. Then we assess a neural network's capability to generalize to images from ultrasound machines that the network is not trained on. Lastly, we test how the performance of the networks degrades with a decreasing amount of training data. Our first experiment shows that all networks considered in this study have very similar performance in terms of accuracy, with InceptionV3 being slightly better than the rest. The best performance is achieved when the whole network is finetuned to our problem instead of finetuning only a part of it while gradually unlocking more layers for training. The generalization experiment shows that neural networks have the potential to generalize to images from ultrasound machines that they are not trained on. It also shows that having a mix of multiple ultrasound machines in the training data increases generalization performance. In our last experiment we compare the performance of the CNN proposed in [23] with MobileNet pre-trained on ImageNet and MobileNet randomly initialized. This shows that the performance of the baseline model suffers the least with a decreasing amount of training data and that pre-training helps the performance drastically on smaller training datasets.
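Finetuning with gradual unfreezing, as described, takes only a few lines with torchvision (the weights argument assumes torchvision 0.13 or later); the class count is a placeholder:

    import torch.nn as nn
    from torchvision import models

    # Load MobileNet pre-trained on ImageNet and replace the classifier head.
    net = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
    num_views = 8  # hypothetical number of heart-view classes
    net.classifier[-1] = nn.Linear(net.last_channel, num_views)

    # Gradual unfreezing: start by training only the new head...
    for p in net.features.parameters():
        p.requires_grad = False
    # ...then unlock deeper blocks as training progresses, e.g.:
    for p in net.features[-3:].parameters():
        p.requires_grad = True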
@mastersthesis{diva2:1425635,
author = {Pop, David},
title = {{Classification of Heart Views in Ultrasound Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5288--SE}},
year = {2020},
address = {Sweden},
}
Previously well aligned image sensors, mounted on the same camera, might become misaligned due to external vibrations. It is of interest to be able to automatically detect and correct for this misalignment, and to separate the deviation into pointing and/or parallax errors. Two methods were evaluated for this purpose: an area based image registration method and a feature based image registration method. In the area based method, normalized cross-correlation was used to estimate translation parameters. In the feature based method, SIFT or LIOP descriptors were used to extract features that were matched between the two image modalities to estimate transformation parameters. In both methods, only image points that were in focus were extracted to avoid detection of false alignment deviations. The results indicate that the area based image registration method has potential to automatically detect and correct for an alignment deviation. Moreover, the area based method showed potential to separate the deviation into pointing errors and parallax errors. The feature based method was limited to specific scenes but could be used as a complement to the area based method in order to additionally correct for rotation and/or scaling.
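The core step of the area based method, normalized cross-correlation, is available directly in OpenCV; a minimal sketch in which the focused patch rectangle is assumed given:

    import cv2

    def estimate_translation(ir_img, vis_img, patch_rect):
        """Estimate the translation between two modalities: a focused patch
        from one image is correlated over the other, and the correlation
        peak gives the displacement."""
        x, y, w, h = patch_rect
        template = ir_img[y:y + h, x:x + w]
        response = cv2.matchTemplate(vis_img, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, peak = cv2.minMaxLoc(response)  # peak = (px, py)
        return (peak[0] - x, peak[1] - y), score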
@mastersthesis{diva2:1434095,
author = {Bjerwe, Ida},
title = {{Automatic Alignment Detection and Correction in Infrared and Visual Image Pairs}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5292--SE}},
year = {2020},
address = {Sweden},
}
The process of locating moving objects through video sequences is a fundamental computer vision problem. This process is referred to as video tracking and has a broad range of applications. Even though video tracking is an open research topic that has received much attention during recent years, developing accurate and robust algorithms that can handle complicated tracking tasks and scenes is still challenging. One challenge in computer vision is to develop systems that, like humans, can understand, interpret and recognize visual information in different situations.
In this master thesis work, a tracking algorithm based on eye tracking data is proposed. The aim was to compare the tracking performance of the proposed algorithm with that of a state-of-the-art video tracker. The algorithm was tested on gaze signals from five participants, recorded with an eye tracker while the participants were exposed to dynamic stimuli. The stimuli were moving objects displayed on a stationary computer screen. The proposed algorithm works offline, meaning that all data is collected before analysis.
The results show that the overall performance of the proposed eye tracking algorithm is comparable to the performance of a state-of-the-art video tracker. The main weaknesses are low accuracy for the proposed eye tracking algorithm and handling of occlusion for the video tracker. We also suggest a method for using eye tracking as a complement to object tracking methods. The results show that the eye tracker can be used in some situations to improve the tracking result of the video tracker. The proposed algorithm can be used to help the video tracker to redetect objects that have been occluded or for some other reason are not detected correctly. However, the video tracker ATOM achieves higher accuracy.
@mastersthesis{diva2:1435385,
author = {Ejnestrand, Ida and Jakobsson, Linn\'{e}a},
title = {{Object Tracking based on Eye Tracking Data:
A comparison with a state-of-the-art video tracker}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5294--SE}},
year = {2020},
address = {Sweden},
}
The main result of this thesis is a deep learning model named BearNet, which can be trained to detect an arbitrary number of objects as a set of points. The model is trained using the Weighted Hausdorff distance as loss function. BearNet has been applied and tested on two problems from the industry. These are:
- From an intensity image, detect two pocket points of an EU-pallet which an autonomous forklift could utilize when determining where to insert its forks.
- From a depth image, detect the start, bend and end points of a straw attached to a juice package, in order to help determine if the straw has been attached correctly.
In the development process of BearNet I took inspiration from the designs of U-Net, UNet++ and a high resolution network named HRNet. Further, I used a dataset containing RGB images from a surveillance camera located inside a mall, on which the aim was to detect the head positions of all pedestrians. In an attempt to reproduce a result from another study, I found that the mall dataset suffers from training set contamination when a model is trained, validated, and tested on it with random sampling. Hence, I propose that the mall dataset be evaluated with a sequential data split strategy, to limit the problem.
I found that the BearNet architecture is well suited for both the EU-pallet and straw datasets, and that it can be successfully used on either RGB, intensity or depth images. On the EU-pallet and straw datasets, BearNet consistently produces point estimates within five and six pixels of ground truth, respectively. I also show that the straw dataset only constitutes a small subset of all the challenges that exist in the problem domain related to the attachment of a straw to a juice package, and that one therefore cannot train a robust deep learning model on it. As an example of this, models trained on the straw dataset cannot correctly handle samples in which there is no straw visible.
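For orientation, the plain average Hausdorff distance between a predicted and a ground-truth point set is shown below; the Weighted Hausdorff distance used as BearNet's loss additionally weights the terms with per-pixel probabilities to make the measure differentiable:

    import numpy as np

    def average_hausdorff(pred_pts, gt_pts):
        """Average Hausdorff distance between two point sets (Nx2 and Mx2):
        mean distance from each predicted point to its nearest ground-truth
        point, plus the same in the other direction."""
        d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=2)
        return d.min(axis=1).mean() + d.min(axis=0).mean()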
@mastersthesis{diva2:1442869,
author = {Runow, Björn},
title = {{Deep Learning for Point Detection in Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5295--SE}},
year = {2020},
address = {Sweden},
}
Automatic Face Recognition (AFR) can be useful in the forensic field when identifying people in surveillance footage. In AFR systems it is common to use deep neural networks, which perform well if the quality of the images maintains a certain level. This is a problem when applying AFR to surveillance data, since the quality of those images can be very poor. In this thesis the CNN FaceNet has been used to evaluate how different quality parameters influence the accuracy of the face recognition. The goal is to be able to draw conclusions about how to improve the recognition by using and avoiding certain parameters based on the conditions. Parameters that have been experimented with are the angle of the face, image quality, occlusion, colour and lighting. This has been achieved by using datasets with different properties or by altering the images. The parameters are meant to simulate different situations that can occur in surveillance footage and that are difficult for the network to handle. Three different models have been evaluated, with different numbers of embeddings and different training data. The results show that the two models trained on the VGGFace2 dataset perform much better than the one trained on CASIA-WebFace. All models' performance drops on images with low quality compared to images with high quality, because the training data consists mostly of high-quality images. In some cases, the recognition results can be improved by applying some alterations to the images. This could be done by using one frontal and one profile image when trying to identify a person, or by occluding parts of the shape of the face if it gets recognized as other persons with similar face shapes. One main improvement would be to extend the training datasets with more low-quality images. To some extent, this could be achieved by different kinds of data augmentation, like artificial occlusion and down-sampled images.
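FaceNet-style verification compares L2-normalized embeddings by distance; a minimal sketch, with an arbitrary placeholder threshold:

    import numpy as np

    def same_identity(emb_a, emb_b, threshold=0.7):
        """Declare two faces the same identity if the Euclidean distance
        between their L2-normalized embeddings falls below a threshold
        (the value here is a placeholder, tuned on validation data)."""
        a = emb_a / np.linalg.norm(emb_a)
        b = emb_b / np.linalg.norm(emb_b)
        return np.linalg.norm(a - b) < threshold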
@mastersthesis{diva2:1444005,
author = {Tuvskog, Johanna},
title = {{Evaluation of Face Recognition Accuracy in Surveillance Video}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5302--SE}},
year = {2020},
address = {Sweden},
}
The field of autonomous driving is as active as it has ever been, but the reality where an autonomous vehicle can drive on all roads is currently decades away. Instead, using an on-the-fly learning method, such as qHebb learning, a system can, after some demonstration, learn the appearance of any road and take over the steering wheel. By training in a simulator, the amount and variation of training can increase substantially; however, an on-rails auto-pilot does not sufficiently populate the learning space of such a model. This study aims to explore concepts that can increase the variance in the training data whilst the vehicle trains online. Three computationally light concepts are proposed, each of which results in a model that can navigate through a simple environment, thus performing better than a model trained solely on the auto-pilot. The most noteworthy approach uses multiple thresholds to detect when the vehicle deviates too much and replicates the action of a human correcting its trajectory. After training on less than 300 frames, a vehicle successfully completed the full test environment using this method.
@mastersthesis{diva2:1444702,
author = {Kindstedt, Mathias},
title = {{Exploring the Training Data for Online Learning of Autonomous Driving in a Simulated Environment}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5325--SE}},
year = {2020},
address = {Sweden},
}
This thesis investigates the use of Generative Adversarial Networks (GANs) for detecting images containing non-natural objects in natural environments, and whether the introduction of stereo data can improve the performance. The state-of-the-art GAN-based anomaly detection method presented by A. Berg et al. in [5] (BergGAN) was the base of this thesis. By modifying BergGAN to accept not only three-channel input, but also four- and six-channel input, it was possible to investigate the effect of introducing stereo data in the method. The input to the four-channel network was an RGB image and its corresponding disparity map, and the input to the six-channel network was a stereo pair consisting of two RGB images. The three datasets used in the thesis were constructed from a dataset of aerial video sequences provided by SAAB Dynamics, where the scene was mostly wooded areas. The datasets were divided into training and validation data, where the latter was used for the performance evaluation of the respective network. The evaluation method suggested in [5] was used in the thesis: each sample was scored on the likelihood of it containing anomalies, Receiver Operating Characteristic (ROC) analysis was then applied, and the area under the ROC curve was calculated. The results showed that BergGAN was successfully able to detect images containing non-natural objects in natural environments using the dataset provided by SAAB Dynamics. The adaptation of BergGAN to also accept four and six input channels increased the performance of the method, showing that there is information in stereo data that is relevant for GAN-based anomaly detection. There was, however, no substantial performance difference between the network trained with two RGB images and the one trained with an RGB image and its corresponding disparity map.
@mastersthesis{diva2:1442532,
author = {Gehlin, Nils and Antonsson, Martin},
title = {{Detecting Non-Natural Objects in a Natural Environment using Generative Adversarial Networks with Stereo Data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5324--SE}},
year = {2020},
address = {Sweden},
}
In this thesis capsule networks are investigated, both theoretically and empirically. The properties of the dynamic routing algorithm [42] proposed for capsule networks, as well as of a routing algorithm from a follow-up paper by Wang et al. [50], are thoroughly investigated. It is conjectured that three key attributes are needed for a good routing algorithm, and these attributes are then related to previous algorithms. A novel routing algorithm, EntMin, is proposed based on the observations from the investigation of previous algorithms. A thorough evaluation of the performance of different aspects of capsule networks is conducted, and it is shown that EntMin outperforms both dynamic routing and Wang routing. Finally, a capsule network using EntMin routing is compared to a very deep Convolutional Neural Network, and it is shown that it achieves comparable performance.
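For orientation, here is a compact sketch of the dynamic routing ("routing by agreement") scheme from [42] that EntMin and Wang routing are alternatives to; the array sizes are arbitrary toy values.

```python
# Sketch of dynamic routing between capsule layers: coupling coefficients are
# softmax-updated from the agreement between predictions and the squashed
# output capsules. Not the thesis' EntMin algorithm, just the [42] baseline.
import numpy as np

def squash(v, axis=-1, eps=1e-9):
    norm2 = np.sum(v ** 2, axis=axis, keepdims=True)
    return (norm2 / (1 + norm2)) * v / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat: np.ndarray, n_iters: int = 3) -> np.ndarray:
    """u_hat: (n_in, n_out, dim) predictions from lower-level capsules."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))
    for _ in range(n_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)       # softmax over output capsules
        s = (c[..., None] * u_hat).sum(axis=0)     # weighted sum of predictions
        v = squash(s)                              # output capsules, (n_out, dim)
        b = b + (u_hat * v[None]).sum(-1)          # agreement update
    return v

u_hat = np.random.default_rng(8).normal(size=(32, 10, 16))
print(dynamic_routing(u_hat).shape)  # (10, 16)
```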
@mastersthesis{diva2:1445181,
author = {Edstedt, Johan},
title = {{Towards Understanding Capsule Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5309--SE}},
year = {2020},
address = {Sweden},
}
Object detection is a classical computer vision task, encountered in many practical applications such as robotics and autonomous driving. The latter involves serious consequences of failure and a multitude of challenging demands, including high computational efficiency and detection accuracy. Distant objects are notably difficult to detect accurately due to their small scale in the image, consisting of only a few pixels. This is especially problematic in autonomous driving, as objects should be detected at the earliest possible stage to facilitate handling of hazardous situations. Previous work has addressed small objects via use of feature pyramids and super-resolution techniques, but the efficiency of such methods is limited as computational cost increases with image resolution. Therefore, a trade-off must be made between accuracy and cost. Opportunely though, a common characteristic of driving scenarios is the predominance of distant objects in the centre of the image. Thus, the full-frame image can be downsampled to reduce computational cost, and a crop can be extracted from the image centre to preserve resolution for distant vehicles. In this way, short- and long-range images are generated. This thesis investigates the fusion of such images in a convolutional neural network, particularly the fusion level, fusion operation, and spatial alignment. A novel framework — DetSLR — is proposed for the task and examined via the aforementioned aspects. Through adoption of the framework for the well-established SSD detector and MobileNetV2 feature extractor, it is shown that the framework significantly improves upon the original detector without incurring additional cost. The fusion level is shown to have great impact on the performance of the framework, favouring high-level fusion, while only insignificant differences exist between investigated fusion operations. Finally, spatial alignment of features is demonstrated to be a crucial component of the framework.
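A minimal sketch of the short-/long-range image generation described above, assuming example frame and output sizes; the thesis' actual resolutions and resampling filters may differ.

```python
# Sketch: the full frame is downsampled for short-range context, while a
# full-resolution centre crop preserves detail for distant vehicles.
import numpy as np

def make_short_long_pair(frame: np.ndarray, out_hw=(300, 300)):
    h, w = frame.shape[:2]
    ch, cw = out_hw
    # Long-range image: full-resolution centre crop.
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    long_range = frame[y0:y0 + ch, x0:x0 + cw]
    # Short-range image: naive strided downsampling of the whole frame
    # (a proper implementation would use an anti-aliased resize).
    short_range = frame[::h // ch, ::w // cw][:ch, :cw]
    return short_range, long_range

frame = np.zeros((1200, 1920, 3), dtype=np.uint8)
short, long_ = make_short_long_pair(frame)
print(short.shape, long_.shape)  # (300, 300, 3) (300, 300, 3)
```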
@mastersthesis{diva2:1447580,
author = {Luusua, Emil},
title = {{Vehicle Detection, at a Distance:
Done Efficiently via Fusion of Short- and Long-Range Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5328--SE}},
year = {2020},
address = {Sweden},
}
Recent improvements in pose estimation have opened up the possibility of new areas of application. One of them is gait recognition, the task of identifying persons based on their unique style of walking, which is increasingly being recognized as an important method of biometric identification. This thesis has explored the possibilities of using a pose estimation system, OpenPose, together with deep Recurrent Neural Networks (RNNs) to see if there is sufficient information in sequences of 2D poses to use for gait recognition. For this to be possible, a new multi-camera dataset consisting of persons walking on a treadmill was gathered, dubbed the FOI dataset. The results show that this approach has some promise. It achieved an overall classification accuracy of 95.5% on classes it had seen during training and 83.8% for classes it had not seen during training. It was, however, unable to recognize sequences from angles it had not seen during training. For that to be possible, more data pre-processing will likely be required.
@mastersthesis{diva2:1447593,
author = {Persson, Martin},
title = {{Automatic Gait Recognition:
using deep metric learning}},
school = {Linköping University},
type = {{LIU-ISY/LITH-EX-A--20/5316--SE}},
year = {2020},
address = {Sweden},
}
In digital image correlation, an optical full-field analysis method that can determine displacements of an object under load, high-resolution images are preferable. One way to improve the resolution is to improve the camera hardware. This can be expensive; another way to enhance the image is to increase its resolution with various image processing techniques, collectively called super-resolution. In this thesis the theory behind several different approaches to super-resolution is presented and discussed. The goal of this thesis has been to investigate if super-resolution is possible in a scene with moving objects as well as movement of the camera. It became clear early on that image registration, a step in many super-resolution methods that is explained in this thesis, was of utmost importance, and a major part of the work became comparing image registration methods. Data has been recorded, and two different super-resolution algorithms have then been evaluated on a data set, showing that super-resolution is possible.
@mastersthesis{diva2:1450740,
author = {Dahlström, Erik},
title = {{Super-Resolution Using Dynamic Cameras}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5315--SE}},
year = {2020},
address = {Sweden},
}
Autonomous cars are now becoming a reality, but there are still technical hurdles to overcome for the technology to be safe and reliable. One of these issues is the cars' ability to estimate braking distances. This function relies heavily on one parameter, friction. Friction is difficult for a car to estimate, since the friction coefficient depends on both surfaces in contact: the tires and the road. This thesis presents a novel approach to the problem using a neural network classifier trained on features extracted from images of the road. One major advantage the presented method has over the few existing conventional methods is the ability to estimate friction on road segments ahead of the vehicle. This gives the vehicle time to slow down while the friction is still sufficient. The estimation pipeline performs significantly better than the baseline methods explored in the thesis and provides satisfying results that demonstrate its potential.
@mastersthesis{diva2:1454043,
author = {Svensson, Erik},
title = {{Transfer Learning for Friction Estimation:
Using Deep Reduced Features}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5312--SE}},
year = {2020},
address = {Sweden},
}
Light Detection and Ranging (LiDAR) sensors have many different application areas, from revealing archaeological structures to aiding navigation of vehicles. However, it is challenging to interpret and fully use the vast amount of unstructured data that LiDARs collect. Automatic classification of LiDAR data would ease the utilization, whether it is for examining structures or aiding vehicles.
In recent years, there have been many advances in deep learning for semantic segmentation of automotive LiDAR data, but there is less research on aerial LiDAR data. This thesis investigates the current state-of-the-art deep learning architectures, and how well they perform on LiDAR data acquired by an Unmanned Aerial Vehicle (UAV). It also investigates different training techniques for class imbalanced and limited datasets, which are common challenges for semantic segmentation networks. Lastly, this thesis investigates if pre-training can improve the performance of the models.
The LiDAR scans were first projected to range images and then a fully convolutional semantic segmentation network was used. Three different training techniques were evaluated: weighted sampling, data augmentation, and grouping of classes. No improvement was observed from the weighted sampling, nor did grouping of classes have a substantial effect on the performance. Pre-training on the large public dataset SemanticKITTI resulted in a small performance improvement, but the data augmentation seemed to have the largest positive impact. The mIoU of the best model, which was trained with data augmentation, was 63.7%, and it performed very well on the classes Ground, Vegetation, and Vehicle. The other classes in the UAV dataset, Person and Structure, had very little data and were challenging for most models to classify correctly. In general, the models trained on UAV data performed similarly to the state-of-the-art models trained on automotive data.
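To make the projection step concrete, below is a hedged sketch of mapping LiDAR points to a range image via their azimuth and elevation angles; the image size and vertical field of view are assumed values, not the thesis' settings.

```python
# Sketch of spherical (range-image) projection of a LiDAR point cloud, so a
# 2D segmentation network can be applied to the result.
import numpy as np

def project_to_range_image(points: np.ndarray, H=64, W=1024,
                           fov_up=np.radians(15.0), fov_down=np.radians(-25.0)):
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.maximum(np.linalg.norm(points[:, :3], axis=1), 1e-6)
    yaw = np.arctan2(y, x)                    # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / r, -1, 1))  # elevation
    u = ((0.5 * (1.0 - yaw / np.pi)) * W).astype(int).clip(0, W - 1)
    v = ((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H).astype(int).clip(0, H - 1)
    img = np.zeros((H, W), dtype=np.float32)
    img[v, u] = r                             # keeps the last point per pixel
    return img

pts = np.random.default_rng(1).uniform(-50, 50, size=(10000, 3))
print(project_to_range_image(pts).shape)  # (64, 1024)
```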
@mastersthesis{diva2:1459609,
author = {Serra, Sabina},
title = {{Deep Learning for Semantic Segmentation of 3D Point Clouds from an Airborne LiDAR}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5331--SE}},
year = {2020},
address = {Sweden},
}
The task of 6D pose estimation with deep learning is to train networks to determine, from an image of an object, the rotation and translation of the object. Impressive results have recently been shown in deep learning based 6D pose estimation. However, many current solutions rely on real-world data for training, which, as opposed to synthetic data, requires time-consuming annotation. In this thesis, we introduce a pipeline for generating synthetic ground truth data for deep 6D pose estimation, where annotation is done automatically. With a 3D CAD model, we use Blender to render 2D images of the model from different viewpoints. We also create all other relevant data needed for pose estimation, e.g., the poses of an object, mask images and 3D keypoints on the object. Using this pipeline, it is possible to adjust different settings to reduce the domain gap between synthetic and real-world data and get better pose estimation results. Such settings include changing the method of extracting 3D keypoints and varying the scale of the object or the light settings in the scene. The network used to test the performance of training on our synthetic data is PVNet, which achieves state-of-the-art results for 6D pose estimation. This architecture learns to find 2D keypoints of the object in the image, as well as 2D–3D keypoint correspondences. With these correspondences, the Perspective-n-Point (PnP) algorithm is used to extract a pose. We evaluate the pose estimation for the different settings on the synthetic data and compare these results to other state-of-the-art work. We find that using only real-world data for training is worse than using a combination of synthetic and real-world data. Several other findings are that varying scale and lighting, in addition to adding random background images to the rendered images, improves results. Four novel keypoint selection methods are introduced in this work and compared against methods used in previous work. We observe that our methods achieve similar or better results. Finally, we use the best possible settings from the synthetic data pipeline, but with memory limitations on the amount of training data. We come close to state-of-the-art results, and could get closer with more data.
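The final pose-recovery step named above can be sketched with OpenCV's PnP solver; the 3D keypoints, 2D detections, and camera intrinsics below are made-up placeholders rather than PVNet outputs.

```python
# Sketch: given 2D-3D keypoint correspondences, PnP returns the object pose
# as a rotation (Rodrigues vector) and translation.
import numpy as np
import cv2

object_pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                       [1, 1, 0], [1, 0, 1]], dtype=np.float64)   # 3D keypoints
image_pts = np.array([[320, 240], [400, 238], [318, 160], [322, 300],
                      [398, 158], [402, 302]], dtype=np.float64)  # 2D detections
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])      # intrinsics (assumed)

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
print(ok, rvec.ravel(), tvec.ravel())
```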
@mastersthesis{diva2:1467210,
author = {Löfgren, Tobias and Jonsson, Daniel},
title = {{Generating Synthetic Data for Evaluation and Improvement of Deep 6D Pose Estimation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5339--SE}},
year = {2020},
address = {Sweden},
}
Multi-pedestrian tracking (MPT) is the task of localizing and following the trajectory of pedestrians in a sequence. Using an MPT algorithm is an important part of preventing pedestrian-vehicle collisions in Automated Driving (AD) and Advanced Driving Assistance Systems (ADAS). It has benefited greatly from the advances in computer vision and machine learning in the last decades. Using a pedestrian detector, the tracking consists of associating the detections between frames and maintaining pedestrian identities throughout the sequence. This can be a challenging task due to occlusions, missed detections and complex scenes. The number of pedestrians is unknown, and it varies with time. Finding new methods for improving MPT is an active research field and there are many approaches in the literature. This work focuses on improving the detection-to-track association, the data association, with the help of extracted color features for each pedestrian. Utilizing the recent improvements in object detection, this work shows that classical color features are still relevant in pedestrian tracking for real-time applications with limited computational resources. The appearance is not only used in the data association but is also integrated into a newly proposed method to avoid tracking errors due to missed detections. The results show that even with simple models the color appearance can be used to improve the tracking results. Evaluation on the commonly used Multi-Object Tracking benchmark shows an improvement in Multi-Object Tracking Accuracy and in the number of identity switches, while other measures stay essentially unchanged.
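As a sketch of the kind of classical colour feature such data association can rely on, the snippet below builds per-detection colour histograms and compares them with histogram intersection; the random patches stand in for detected pedestrian crops, and the descriptor is illustrative rather than the thesis' exact feature.

```python
# Sketch: colour histograms as lightweight appearance features for matching
# a new detection against an existing track.
import numpy as np

def color_histogram(patch: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenated, normalized per-channel histograms of an RGB patch."""
    hists = [np.histogram(patch[..., c], bins=bins, range=(0, 255))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def similarity(h1: np.ndarray, h2: np.ndarray) -> float:
    """Histogram intersection: 1.0 means identical colour distributions."""
    return float(np.minimum(h1, h2).sum())

rng = np.random.default_rng(2)
detection = rng.integers(0, 256, size=(64, 32, 3))
track = rng.integers(0, 256, size=(64, 32, 3))
print(similarity(color_histogram(detection), color_histogram(track)))
```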
@mastersthesis{diva2:1467160,
author = {Flodin, Frida},
title = {{Improved Data Association for Multi-Pedestrian Tracking Using Image Information}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5329--SE}},
year = {2020},
address = {Sweden},
}
Forged videos of swapped faces, so-called deepfakes, have gained a lot of attention in recent years. Methods for automated detection of this type of manipulation are also seeing rapid progress. The purpose of this thesis work is to evaluate the possibility and effectiveness of using deep embeddings from facial recognition networks as a basis for detecting such deepfakes. In addition, the thesis aims to answer whether the identity embeddings contain information that can be used for detection when analyzed over time, and whether it is suitable to include information about the person's head pose in this analysis. To answer these questions, three classifiers are created, each intended to answer one question. Their performances are compared with each other, and it is shown that identity embeddings are suitable as a basis for deepfake detection. Temporal analysis of the embeddings also seems effective, at least for deepfake methods that only work on a frame-by-frame basis. Including information about head poses in the videos is shown not to improve such a classifier.
@mastersthesis{diva2:1476999,
author = {Emir, Alkazhami},
title = {{Facial Identity Embeddings for Deepfake Detection in Videos}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5341--SE}},
year = {2020},
address = {Sweden},
}
CNN-based (Convolutional Neural Network) visual object detectors often reach human-level accuracy but need to be trained with large amounts of manually annotated data. Collecting and annotating this data can frequently be time-consuming and financially expensive. Using generative models to augment the data can help minimize the amount of data required and increase detection performance. Many state-of-the-art generative models are Generative Adversarial Networks (GANs). This thesis investigates if and how one can utilize image data to generate new data through GANs to train a YOLO-based (You Only Look Once) object detector, and how CAD (Computer-Aided Design) models can aid in this process.
In the experiments, different models of GANs are trained and evaluated by visual inspection or with the Fréchet Inception Distance (FID) metric. The data provided by Ericsson Research consists of images of antenna and baseband equipment along with annotations and segmentations. Ericsson Research supplied the YOLO detector, and no modifications are made to this detector. Finally, the YOLO detector is trained on data generated by the chosen model and evaluated by the Average Precision (AP).
The results show that the generative models designed in this work can produce RGB images of high quality. However, the quality drops if binary segmentation masks are to be generated as well. The experiments with CAD input data did not result in images that could be used for the training of the detector.
The GAN designed in this work is able to successfully replace objects in images with the style of other objects. The results show that training the YOLO detector with GAN-modified data compared to training with real data leads to the same detection performance. The results also show that the shapes and backgrounds of the antennas contributed more to detection performance than their style and colour.
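For reference, a small sketch of the Fréchet Inception Distance used to score the GANs above; the random feature matrices stand in for Inception activations of real and generated images.

```python
# Sketch of FID: real and fake feature sets are compared as Gaussians via
# their means and covariances.
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):       # discard tiny imaginary parts
        covmean = covmean.real
    return float(((mu1 - mu2) ** 2).sum() + np.trace(c1 + c2 - 2 * covmean))

rng = np.random.default_rng(9)
real = rng.normal(0.0, 1, size=(500, 64))   # stand-ins for Inception features
fake = rng.normal(0.5, 1, size=(500, 64))
print(fid(real, fake))
```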
@mastersthesis{diva2:1484523,
author = {Thaung, Ludwig},
title = {{Advanced Data Augmentation:
With Generative Adversarial Networks and Computer-Aided Design}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5340--SE}},
year = {2020},
address = {Sweden},
}
3D reconstruction can be used in forensic science to reconstruct crime scenes and objects so that measurements and further information can be acquired off-site. It is desirable to use image-based reconstruction methods, but there is currently no procedure available for determining the uncertainty of such reconstructions. In this thesis the uncertainty of Structure from Motion is investigated. This is done by exploring the literature available on the subject and compiling the relevant information in a literature summary. Also, Monte Carlo simulations are conducted to study how the feature position uncertainty affects the uncertainty of the parameters estimated by bundle adjustment.
The experimental results show that the poses of cameras that contain few image correspondences are estimated with higher uncertainty. The poses of such cameras are estimated with lower uncertainty if they have feature correspondences in cameras that contain a higher number of projections.
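The Monte Carlo scheme can be sketched generically as below: perturb the feature positions with Gaussian noise, rerun the estimator, and read the uncertainty off the spread of the estimates. The line-fit estimator merely stands in for the bundle adjustment used in the thesis, and the noise level is an assumption.

```python
# Generic Monte Carlo uncertainty propagation sketch.
import numpy as np

def monte_carlo_uncertainty(features, estimate, sigma_px=0.5, n_runs=500, seed=0):
    rng = np.random.default_rng(seed)
    samples = np.asarray([estimate(features + rng.normal(0, sigma_px, features.shape))
                          for _ in range(n_runs)])
    return samples.mean(axis=0), np.cov(samples.T)

# Toy estimator: a least-squares line fit standing in for bundle adjustment.
feats = np.column_stack([np.linspace(0, 10, 20), 2 * np.linspace(0, 10, 20) + 1])
fit = lambda f: np.polyfit(f[:, 0], f[:, 1], 1)   # returns (slope, intercept)
mean, cov = monte_carlo_uncertainty(feats, fit)
print(mean, np.sqrt(np.diag(cov)))   # parameter means and standard deviations
```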
@mastersthesis{diva2:1499090,
author = {Lindberg, Mimmi},
title = {{Forensic Validation of 3D models}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5346--SE}},
year = {2020},
address = {Sweden},
}
In one of the facilities at the Stena Recycling plant in Halmstad, Sweden, about 300 tonnes of metallic waste is processed each day with the aim of sorting out all non-ferrous material. At the end of this process, non-ferrous materials are manually sorted out from the ferrous materials. This thesis investigates a computer vision based approach to identify and localize the non-ferrous materials and eventually automate the sorting.
Images were captured of ferrous and non-ferrous materials. The images are processed and segmented to be used as annotation data for a deep convolutional neural segmentation network. Network models have been trained on different kinds and amounts of data. The resulting models are evaluated and tested in accordance with different evaluation metrics. Methods of creating advanced training data by merging imaging information were tested. Experiments with using classifier prediction confidence to identify objects of unknown classes were performed.
This thesis shows that it is possible to discern ferrous from non-ferrous material with a purely vision based system. The thesis also shows that it is possible to automatically create annotated training data. It becomes evident that it is possible to create better training data, tailored for the task at hand, by merging image data. A segmentation network trained on more than two classes yields lower prediction confidence for objects unknown to the classifier. Substituting manual sorting with a purely vision based system seems like a viable approach. Before a substitution is considered, the automatic system needs to be evaluated in comparison to the manual sorting.
@mastersthesis{diva2:1552630,
author = {Almin, Fredrik},
title = {{Detection of Non-Ferrous Materials with Computer Vision}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5321--SE}},
year = {2020},
address = {Sweden},
}
Visual tracking concerns the problem of following an arbitrary object in a video sequence. In this thesis, we examine how to use stereo images to extend existing visual tracking algorithms, which methods exist to obtain information from stereo images, and how the results change as the parameters of each tracker vary. For this purpose, four abstract approaches are identified, with five distinct implementations. Each tracker implementation is an extension of a baseline algorithm, MOSSE. The free parameters of each model are optimized with respect to two different evaluation strategies, called nor- and wir-tests, and four different objective functions, and are then fixed when comparing the models against each other. The results are created on single-target tracks extracted from the KITTI tracking dataset, and the optimization results show that none of the objective functions are sensitive to the exposed parameters under the joint selection of model and dataset. The evaluation results also show that none of the extensions improve the results of the baseline tracker.
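For context, a condensed sketch of the MOSSE baseline that each extension builds on: the correlation filter is solved in the Fourier domain so that the target patch maps to a Gaussian response peak. Patch size, the Gaussian width, and the regularizer are assumed toy values; the full tracker also uses preprocessing and online updates, omitted here.

```python
# Minimal single-frame MOSSE sketch.
import numpy as np

def train_mosse(patch: np.ndarray, sigma=2.0, lam=1e-3) -> np.ndarray:
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    g = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * sigma ** 2))
    F, G = np.fft.fft2(patch), np.fft.fft2(g)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)   # filter in Fourier domain

def respond(filt: np.ndarray, patch: np.ndarray) -> np.ndarray:
    return np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))

rng = np.random.default_rng(3)
target = rng.normal(size=(64, 64))
response = respond(train_mosse(target), target)
print(np.unravel_index(response.argmax(), response.shape))  # peak near (32, 32)
```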
@mastersthesis{diva2:1277154,
author = {Dehlin, Carl},
title = {{Visual Tracking Using Stereo Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5181--SE}},
year = {2019},
address = {Sweden},
}
This thesis presents and evaluates different methods to semantically segment 3D-models by rendered 2D-views. The 2D-views are segmented separately and then merged together. The thesis evaluates three different merge strategies, two different classification architectures, how many views should be rendered and how these rendered views should be arranged. The results are evaluated both quantitatively and qualitatively and then compared with the current classifier at Vricon presented in [30].
The conclusion of this thesis is that there is a performance gain to be had using this method. The best model used two views and attains an accuracy of 90.89%, which can be compared with the 84.52% achieved by the single-view network from [30]. The best nine-view system achieved 87.72%. The difference in accuracy between the two- and nine-view systems is attributed to the higher quality of the mesh on the sunny side of objects, which typically is the south side.
The thesis provides a proof of concept and there are still many areas where the system can be improved. One of them is the extraction of training data, which seemingly would have a large impact on performance.
@mastersthesis{diva2:1278684,
author = {Tranell, Victor},
title = {{Semantic Segmentation of Oblique Views in a 3D-Environment}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5185--SE}},
year = {2019},
address = {Sweden},
}
Visual Simultaneous Localization And Mapping (SLAM) allows for three-dimensional reconstruction from a camera's output and simultaneous positioning of the camera within the reconstruction. With use cases ranging from autonomous vehicles to augmented reality, the SLAM field has garnered interest both commercially and academically.
A SLAM system performs odometry as it estimates the camera's movement through the scene. The incremental estimation of odometry is not error free and exhibits drift over time, with map inconsistencies as a result. Detecting the return to a previously seen place, a loop, means that this new information regarding our position can be incorporated to correct the trajectory retroactively. Loop detection can also facilitate relocalization if the system loses tracking due to e.g. heavy motion blur.
This thesis proposes an odometric system making use of bundle adjustment within a keyframe-based stereo SLAM application. The system is capable of detecting loops by utilizing the algorithm FAB-MAP. Two aspects of this system are evaluated, the odometry and the capability to relocate. Both are evaluated using the EuRoC MAV dataset, with an absolute trajectory RMS error ranging from 0.80 m to 1.70 m for the machine hall sequences.
The capability to relocate is evaluated using a novel methodology that can be interpreted intuitively. Results are given for different levels of strictness to encompass different use cases. The method uses reprojection of points seen in keyframes to define whether a relocalization is possible or not. The system shows a capability to relocate in up to 85% of all cases when a keyframe exists that can project 90% of its points into the current view. Errors in estimated poses were found to be correlated with the relative distance, with errors less than 10 cm in 23% to 73% of all cases.
The evaluation of the whole system is augmented with an evaluation of local image descriptors and pose estimation algorithms. The descriptor SIFT was found to perform best overall, but is demanding to compute. BRISK was deemed the best alternative for a fast yet accurate descriptor.
A conclusion that can be drawn from this thesis is that FAB-MAP works well for detecting loops as long as the addition of keyframes is handled appropriately.
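The relocalization criterion described above can be sketched as follows: project a keyframe's 3D points into the current camera and measure the fraction that lands inside the image. The intrinsics below are roughly EuRoC-like, while the pose and points are illustrative placeholders.

```python
# Sketch of the reprojection-based relocalization check.
import numpy as np

def visible_fraction(points_w, R, t, K, width=752, height=480):
    """points_w: Nx3 world points; R, t: world-to-camera pose; K: intrinsics."""
    cam = points_w @ R.T + t
    in_front = cam[:, 2] > 0
    uv = cam[in_front] @ K.T
    uv = uv[:, :2] / uv[:, 2:3]
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < width) &
              (uv[:, 1] >= 0) & (uv[:, 1] < height))
    return inside.sum() / len(points_w)

K = np.array([[458.0, 0, 376], [0, 458.0, 240], [0, 0, 1]])
pts = np.random.default_rng(4).uniform([-2, -2, 2], [2, 2, 8], size=(200, 3))
frac = visible_fraction(pts, np.eye(3), np.zeros(3), K)
print(f"relocalization possible: {frac >= 0.9} ({frac:.0%} of points inside)")
```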
@mastersthesis{diva2:1287320,
author = {Ringdahl, Viktor},
title = {{Stereo Camera Pose Estimation to Enable Loop Detection}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5186--SE}},
year = {2019},
address = {Sweden},
}
Visual object detection is a popular computer vision task that has been intensively investigated using deep learning on real data. However, data from virtual environments have not received the same attention. A virtual environment enables generating data for locations that are not easily reachable for data collection, e.g. aerial environments. In this thesis, we study the problem of object detection in virtual environments, more specifically an aerial virtual environment. We use a simulator to generate a synthetic data set of 16 different types of vehicles captured from an airplane.
To study the performance of existing methods in virtual environments, we train and evaluate two state-of-the-art detectors on the generated data set. Experiments show that both detectors, You Only Look Once version 3 (YOLOv3) and Single Shot MultiBox Detector (SSD), reach similar performance quality as previously presented in the literature on real data sets.
In addition, we investigate different fusion techniques between detectors trained on two different subsets of the data set, in this case one subset in which the cars have fixed colors and one in which the colors vary. Experiments show that it is possible to train multiple instances of the detector on different subsets of the data set, and to combine these detectors in order to boost the performance.
@mastersthesis{diva2:1307568,
author = {Norrstig, Andreas},
title = {{Visual Object Detection using Convolutional Neural Networks in a Virtual Environment}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5195--SE}},
year = {2019},
address = {Sweden},
}
This report is the result of a master thesis made by two students at Linköping University. The aim was to find an image registration method for visual and infrared images and to find an error measure for grading the registration performance. In practice this could be used for position determination by registering the infrared image taken at the current position to a set of visual images with known positions and determining which visual image matches the best. Two methods were tried, using different image feature extractors and different ways to match the features. The first method used phase information in the images to generate soft features and then minimised the square error of the optical flow equation to estimate the transformation between the visual and infrared image. The second method used the Canny edge detector to extract hard features from the images and Chamfer distance as an error measure. Both methods were evaluated for registration as well as position determination and yielded promising results. However, the performance of both methods was image dependent. The soft edge method proved to be more robust and precise and worked better than the hard edge method for both registration and position determination.
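The Chamfer error measure used by the hard-edge method can be sketched with a distance transform: each pixel of the transformed reference map stores the distance to the nearest edge, so scoring a candidate edge set reduces to a lookup. The synthetic edge maps below stand in for Canny output.

```python
# Sketch of Chamfer matching between two binary edge maps.
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(ref_edges: np.ndarray, query_edges: np.ndarray) -> float:
    """Mean distance from each query edge pixel to the nearest reference edge."""
    dist_to_ref = distance_transform_edt(~ref_edges)
    return float(dist_to_ref[query_edges].mean())

ref = np.zeros((100, 100), dtype=bool); ref[50, 10:90] = True
qry = np.zeros((100, 100), dtype=bool); qry[53, 10:90] = True
print(chamfer_score(ref, qry))  # 3.0: the query edge lies 3 pixels away
```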
@mastersthesis{diva2:1323680,
author = {Fridman, Linnea and Nordberg, Victoria},
title = {{Two Multimodal Image Registration Approaches for Positioning Purposes}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5208--SE}},
year = {2019},
address = {Sweden},
}
Traffic sign recognition is an important problem for autonomous cars and driver assistance systems. With recent developments in the field of machine learning, high performance can be achieved, but typically at a large computational cost.
This thesis aims to investigate the relation between classification accuracy and computational complexity for the visual recognition problem of classifying traffic signs. In particular, the benefits of partitioning the classification problem into smaller sub-problems using prior knowledge in the form of shape or current region are investigated.
In the experiments, the convolutional neural network (CNN) architecture MobileNetV2 is used, as it is specifically designed to be computationally efficient. To incorporate prior knowledge, separate CNNs are used for the different subsets generated when partitioning the dataset based on region or shape. The separate CNNs are trained from scratch or initialized by pre-training on the full dataset.
The results support the intuitive idea that performance initially increases with network size and indicate a network size where the improvement stops. Including shape information using the two investigated methods does not result in a significant improvement. Including region information using pretrained separate classifiers results in a small improvement for small complexities, for one of the regions in the experiments.
In the end, none of the investigated methods of including prior knowledge are considered to yield an improvement large enough to justify the added implementation complexity. However, some other methods are suggested, which would be interesting to study in future work.
@mastersthesis{diva2:1324051,
author = {Ekman, Carl},
title = {{Traffic Sign Classification Using Computationally Efficient Convolutional Neural Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5216--SE}},
year = {2019},
address = {Sweden},
}
In large scale productions of metal sheets, it is important to maintain an effective way to continuously inspect the products passing through the production line. The inspection mainly consists of detection of defects and tracking of ID numbers. This thesis investigates the possibilities to create an automatic inspection system by evaluating different machine learning algorithms for defect detection and optical character recognition (OCR) on metal sheet data. Digit recognition and defect detection are solved separately, where the former compares the object detection algorithm Faster R-CNN and the classical machine learning algorithm NCGF, and the latter is based on unsupervised learning using a convolutional autoencoder (CAE).
The advantage of the feature extraction method is that it only needs a couple of samples to be able to classify new digits, which is desirable in this case due to the lack of training data. Faster R-CNN, on the other hand, needs much more training data to solve the same problem. NCGF does however fail to classify noisy images and images of metal sheets containing an alloy, while Faster R-CNN seems to be a more promising solution with a final mean average precision of 98.59%.
The CAE approach for defect detection showed promising results. The algorithm learned to reconstruct only images without defects, resulting in reconstruction errors whenever a defect appears. The errors are initially classified using a basic thresholding approach, resulting in 98.9% accuracy. However, this classifier requires supervised learning, which is why the clustering algorithm Gaussian mixture model (GMM) is investigated as well. The result shows that it should be possible to use a GMM, but that it requires a lot of GPU resources to use it in an end-to-end solution with a CAE.
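The thresholding rule for the CAE can be sketched as below; the local-averaging "autoencoder" and the threshold are toy stand-ins for the trained network and a validation-tuned value.

```python
# Sketch: flag a defect when the reconstruction error exceeds a threshold.
import numpy as np
from scipy.ndimage import uniform_filter

def is_defective(image: np.ndarray, autoencoder, threshold: float) -> bool:
    reconstruction = autoencoder(image)
    error = np.mean((image - reconstruction) ** 2)   # per-image MSE
    return error > threshold

# Toy stand-in: an "autoencoder" that reconstructs by local averaging, which
# smooths away (and thus reveals) small high-contrast defects.
blur_ae = lambda img: uniform_filter(img, size=5)
clean = np.ones((64, 64))
defect = clean.copy(); defect[30:34, 30:34] = 5.0
print(is_defective(clean, blur_ae, 0.01), is_defective(defect, blur_ae, 0.01))
```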
@mastersthesis{diva2:1325083,
author = {Grönlund, Jakob and Johansson, Angelina},
title = {{Defect Detection and OCR on Steel}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5220--SE}},
year = {2019},
address = {Sweden},
}
The interest in autonomous driving assistance, and in the end, self-driving cars, has increased vastly over the last decade. Automotive safety continues to be a priority for manufacturers, politicians and people alike. Visual-based systems aiding the drivers have lately been boosted by advances in computer vision and machine learning. In this thesis, we evaluate the concept of an end-to-end machine learning solution for detecting and classifying road lane markings, and compare it to a more classical semantic segmentation solution. The analysis is based on the frame-by-frame scenario, and shows that our proposed end-to-end system has clear advantages when it comes to detecting the existence of lanes and producing a consistent, lane-like output, especially in adverse conditions such as weak lane markings. Our proposed method allows the system to predict its own confidence, thereby allowing the system to suppress its own output when it is not deemed safe enough. The thesis finishes with proposed future work needed to achieve optimal performance and create a system ready for deployment in an active safety product.
@mastersthesis{diva2:1326388,
author = {Vigren, Malcolm and Eriksson, Linus},
title = {{End-to-End Road Lane Detection and Estimation using Deep Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5219--SE}},
year = {2019},
address = {Sweden},
}
Multiple object tracking is the process of assigning unique and consistent identities to objects throughout a video sequence. A popular approach to multiple object tracking, and object tracking in general, is to use a method called tracking-by-detection. Tracking-by-detection is a two-stage procedure: an object detection algorithm first detects objects in a frame, and these objects are then associated with already tracked objects by a tracking algorithm. One of the main concerns of this thesis is to investigate how different object detection algorithms perform on surveillance video supplied by the National Forensic Centre. The thesis then goes on to explore how the stand-alone performance of the object detection algorithm correlates with the overall performance of a tracking-by-detection system. Finally, the thesis investigates how the use of visual descriptors in the tracking stage of a tracking-by-detection system affects performance.
Results presented in this thesis suggest that the capacity of the object detection algorithm is highly indicative of the overall performance of the tracking-by-detection system. Further, this thesis also shows how the use of visual descriptors in the tracking stage can reduce the number of identity switches and thereby increase performance of the whole system.
@mastersthesis{diva2:1326842,
author = {Nyström, Axel},
title = {{Evaluation of Multiple Object Tracking in Surveillance Video}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5245--SE}},
year = {2019},
address = {Sweden},
}
Semantic segmentation is a key approach to comprehensive image data analysis. It can be applied to analyze 2D images, videos, and even point clouds that contain 3D data points. On the first two problems, CNNs have achieved remarkable progress, but on point cloud segmentation the results are less satisfactory, due to challenges such as limited memory resources and difficulties in 3D point annotation. One of the research studies carried out by the Computer Vision Lab at Linköping University aimed to ease the semantic segmentation of 3D point clouds. The idea is that by first projecting the 3D data points to 2D space and then focusing only on the analysis of 2D images, we can reduce the overall workload of the segmentation process as well as exploit the existing well-developed 2D semantic segmentation techniques. In order to improve the performance of CNNs for 2D semantic segmentation, the study used input data derived from different modalities. However, how different modalities can be optimally fused is still an open question. Based on the above-mentioned study, this thesis aims to improve the multistream framework architecture. More concretely, we investigate how different singlestream architectures impact the multistream framework with a given fusion method, and how different fusion methods contribute to the overall performance of a given multistream framework. As a result, our proposed fusion architecture outperformed all the investigated traditional fusion methods. Along with the best singlestream candidate and a few additional training techniques, our final proposed multistream framework obtained a relative gain of 7.3% mIoU compared to the baseline on the Semantic3D point cloud test set, increasing the ranking from 12th to 5th position on the benchmark leaderboard.
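For reference, a small sketch of the mIoU metric quoted above, computed from a confusion matrix over toy labels.

```python
# Sketch of mean intersection-over-union from flat prediction/label arrays.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, n_classes: int) -> float:
    cm = np.bincount(n_classes * gt.ravel() + pred.ravel(),
                     minlength=n_classes ** 2).reshape(n_classes, n_classes)
    inter = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - inter
    return float(np.nanmean(inter / np.where(union == 0, np.nan, union)))

gt = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
print(mean_iou(pred, gt, 3))  # (1/3 + 2/3 + 1/2) / 3 = 0.5
```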
@mastersthesis{diva2:1327473,
author = {He, Linbo},
title = {{Improving 3D Point Cloud Segmentation Using Multimodal Fusion of Projected 2D Imagery Data:
Improving 3D Point Cloud Segmentation Using Multimodal Fusion of Projected 2D Imagery Data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5190--SE}},
year = {2019},
address = {Sweden},
}
In recent years semantic segmentation models utilizing Convolutional Neural Networks (CNN) have seen significant success for multiple different segmentation problems. Models such as U-Net have produced promising results within the medical field for both regular 2D and volumetric imaging, rivalling some of the best classical segmentation methods.
In this thesis we examined the possibility of using a convolutional neural network-based model to perform segmentation of discrete bone fragments in CT-volumes with segmentation-hints provided by a user. We additionally examined different classical segmentation methods used in a post-processing refinement stage and their effect on the segmentation quality. We compared the performance of our model to similar approaches and provided insight into how the interactive aspect of the model affected the quality of the result.
We found that the combined approach of interactive segmentation and deep learning produced results on par with some of the best methods presented, provided there was an adequate amount of annotated training data. We additionally found that the number of segmentation hints provided to the model by the user significantly affected the quality of the result, with the result converging at around 8 provided hints.
@mastersthesis{diva2:1326942,
author = {Estgren, Martin},
title = {{Bone Fragment Segmentation Using Deep Interactive Object Selection}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5197--SE}},
year = {2019},
address = {Sweden},
}
One fundamental task in robotics is random bin-picking, where it is important to be able to detect an object in a bin and estimate its pose to plan the motion of a robotic arm. For this purpose, this thesis work aimed to investigate and evaluate algorithms for 6D pose estimation when the object was given by a CAD model. The scene was given by a point cloud illustrating a partial 3D view of the bin with multiple instances of the object. Two algorithms were thus implemented and evaluated. The first algorithm was an approach based on Point Pair Features, and the second was Fast Global Registration. For evaluation, four different CAD models were used to create synthetic data with ground truth annotations.
It was concluded that the Point Pair Feature approach provided a robust localization of objects and can be used for bin-picking. The algorithm appears to be able to handle different types of objects, although with some limitations when the object has flat surfaces and weak texture or many similar details. The disadvantage of the algorithm was the execution time. Fast Global Registration, on the other hand, did not provide a robust localization of objects and is thus not a good solution for bin-picking.
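To make the first algorithm's core descriptor concrete, here is a hedged sketch of the four-dimensional Point Pair Feature computed for two oriented points; in the full method these features are quantized and hashed for matching against the CAD model.

```python
# Sketch of the Point Pair Feature: distance between the points plus the
# three angles between the normals and the connecting line.
import numpy as np

def angle(a: np.ndarray, b: np.ndarray) -> float:
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def point_pair_feature(p1, n1, p2, n2) -> np.ndarray:
    d = p2 - p1
    return np.array([np.linalg.norm(d), angle(n1, d), angle(n2, d), angle(n1, n2)])

p1, n1 = np.array([0.0, 0, 0]), np.array([0.0, 0, 1])
p2, n2 = np.array([0.1, 0, 0]), np.array([0.0, 1, 0])
print(point_pair_feature(p1, n1, p2, n2))  # [0.1, pi/2, pi/2, pi/2]
```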
@mastersthesis{diva2:1330419,
author = {Lef, Annette},
title = {{CAD-Based Pose Estimation - Algorithm Investigation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5239--SE}},
year = {2019},
address = {Sweden},
}
When subtitles are burned into a video, an encoder error can sometimes cause the same subtitle to be burned into several consecutive frames, so that the subtitle appears frozen. This thesis provides a way to detect frozen video subtitles with the help of an implemented text detector and classifier.
Two types of classifiers, naïve classifiers and machine learning classifiers, are tested and compared on a variety of different videos to see how much a machine learning approach can improve the performance. The naïve classifiers are evaluated using ground truth data to gain an understanding of the importance of good text detection. To understand the difficulty of the problem, two different machine learning classifiers are tested, logistic regression and random forests.
The result shows that machine learning improves the performance over using naïve classifiers by improving the specificity from approximately 87.3% to 95.8% and improving the accuracy from 93.3% to 95.5%. Random forests achieve the best overall performance, but the difference compared to when using logistic regression is small enough that more computationally complex machine learning classifiers are not necessary. Using the ground truth shows that the weaker naïve classifiers would be improved by at least 4.2% accuracy, thus a better text detector is warranted. This thesis shows that machine learning is a viable option for detecting frozen video subtitles.
@mastersthesis{diva2:1331490,
author = {Sjölund, Jonathan},
title = {{Detection of Frozen Video Subtitles Using Machine Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5206--SE}},
year = {2019},
address = {Sweden},
}
Finding disparity maps between stereo images is a well studied topic within computer vision. While both classical and machine learning approaches exist in the literature, they frequently struggle to correctly solve the disparity in regions with low texture, sharp edges or occlusions. Finding approximate solutions to these problem areas is frequently referred to as disparity refinement, and is usually carried out separately after an initial disparity map has been generated.
In the recent literature, the use of Normalized Convolution in Convolutional Neural Networks has shown remarkable results when applied to the task of stereo depth completion. This thesis investigates how well this approach performs in the case of disparity refinement. Specifically, we investigate how well such a method can improve the initial disparity maps generated by the stereo matching algorithm developed at Saab Dynamics using a rectified stereo rig.
To this end, a dataset of ground truth disparity maps was created using equipment at Saab, namely a setup for structured light and the stereo rig cameras. Because the end goal is a dataset fit for training networks, we investigate an approach that allows for efficient creation of significant quantities of dense ground truth disparities.
The method for generating ground truth disparities produces several disparity maps for every scene by using several stereo pairs. A densified disparity map is generated by merging the disparity maps from the neighbouring stereo pairs. This resulted in a dataset of 26 scenes and 104 dense and accurate disparity maps.
Our evaluation results show that the chosen Normalized Convolution Network based method can be adapted for disparity map refinement, but is dependent on the quality of the input disparity map.
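The normalized-convolution idea underlying the investigated network can be sketched in its classical form: filter both the data and its confidence mask and divide, so that pixels with no disparity (confidence 0) do not corrupt the estimate. The kernel and hole pattern below are arbitrary.

```python
# Sketch of classical normalized convolution for filling disparity holes.
import numpy as np
from scipy.ndimage import convolve

def normalized_convolution(data, confidence, kernel):
    num = convolve(data * confidence, kernel, mode='nearest')
    den = convolve(confidence, kernel, mode='nearest')
    return num / np.maximum(den, 1e-9)

disp = np.random.default_rng(5).uniform(10, 20, size=(8, 8))
conf = (np.random.default_rng(6).random((8, 8)) > 0.3).astype(float)  # 0 = hole
kernel = np.ones((3, 3)) / 9.0
filled = normalized_convolution(disp, conf, kernel)
print(filled.shape)  # holes are filled from confident neighbours
```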
@mastersthesis{diva2:1333176,
author = {Cranston, Daniel and Skarfelt, Filip},
title = {{Normalized Convolution Network and Dataset Generation for Refining Stereo Disparity Maps}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5252--SE}},
year = {2019},
address = {Sweden},
}
Watermarking is a technique used to mark the ownership of media such as audio or images by embedding a watermark, e.g. copyright information, into the media. A good watermarking method should perform this embedding without affecting the quality of the media. Recent methods for watermarking images use deep learning to embed and extract the watermark. In this thesis, we investigate watermarking in the audible frequencies of audio using deep learning. More specifically, we try to create a watermarking method for audio that is robust to noise in the carrier and that allows the embedded watermark to be extracted from the audio after being played over-the-air. The proposed method consists of two deep convolutional neural networks trained end-to-end on music with simulated noise. Experiments show that the proposed method successfully creates watermarks robust to simulated noise with moderate quality reductions, but it is not robust to the real-world noise introduced after playing and recording the audio over-the-air.
@mastersthesis{diva2:1340077,
author = {Tegendal, Lukas},
title = {{Watermarking in Audio using Deep Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5246--SE}},
year = {2019},
address = {Sweden},
}
Given satellite images with accompanying pixel classifications and elevation data, we propose different solutions to object detection. The first method uses hierarchical clustering for segmentation and then employs different methods of classification. One of these classification methods uses domain knowledge to classify objects, while the other uses Support Vector Machines. Additionally, a combination of three Support Vector Machines was used in a hierarchical structure, which outperformed the regular Support Vector Machine method on most of the evaluation metrics. The second approach is more conventional, with different types of Convolutional Neural Networks: a segmentation network was used as well as a few detection networks, and different fusions between these. The Convolutional Neural Network approach proved to be the better of the two in terms of precision and recall, but the clustering approach was not far behind. This work was done using a relatively small amount of data, which potentially could have impacted the results of the machine learning models in a negative way.
@mastersthesis{diva2:1346426,
author = {Grahn, Fredrik and Nilsson, Kristian},
title = {{Object Detection in Domain Specific Stereo-Analysed Satellite Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5254--SE}},
year = {2019},
address = {Sweden},
}
For a long time stereo cameras have been deployed in visual Simultaneous Localization And Mapping (SLAM) systems to gain 3D information. Even though stereo cameras show good performance, the main disadvantage is the complex and expensive hardware setup they require, which limits the use of the system. A simpler and cheaper alternative is monocular cameras; however, monocular images lack the important depth information. Recent works have shown that having access to depth maps in a monocular SLAM system is beneficial, since they can be used to improve the 3D reconstruction. This work proposes a deep neural network that predicts dense high-resolution depth maps from monocular RGB images by casting the problem as a supervised regression task. The network architecture follows an encoder-decoder structure in which multi-scale information is captured and skip-connections are used to recover details. The network is trained and evaluated on the KITTI dataset, achieving results comparable to state-of-the-art methods. With further development, this network shows good potential to be incorporated in a monocular SLAM system to improve the 3D reconstruction.
@mastersthesis{diva2:1347284,
author = {Larsson, Susanna},
title = {{Monocular Depth Estimation Using Deep Convolutional Neural Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5234--SE}},
year = {2019},
address = {Sweden},
}
The organization International Aid Services (IAS) provides people in East Africa with clean water through well drilling. The wells are located in surroundings far away for the investors to inspect, and therefore IAS wishes to be able to monitor their wells to get a better overview of whether different types of improvements need to be made. Seeing the load on different water sources at different times of the day and during the year, and knowing how many people are visiting the wells, is of particular interest. In this paper, a method is proposed for counting people around the wells. The goal is to choose a suitable method for detecting humans in images and evaluate how it performs. The area of counting humans in images is not a new topic, though it needs to be taken into account that the situation implies some restrictions. A Raspberry Pi with an associated camera is used, which is a small embedded system that cannot handle large and complex software. There is also a limited amount of data in the project. The method proposed in this project uses a pre-trained convolutional neural network based object detector called the Single Shot Detector, which is adapted to suit smaller devices and applications. The pre-trained network that it is based on is called MobileNet, a network developed to be used on smaller systems. To see how well the chosen detector performs, it is compared with some other models, among them a detector based on the Inception network, a significantly larger network than MobileNet. The base network is modified by transfer learning. Results show that a fine-tuned and modified network can achieve better results, from an F1-score of 0.49 for a non-fine-tuned model to 0.66 for the fine-tuned one.
@mastersthesis{diva2:1352472,
author = {Kastberg, Maria},
title = {{Using Convolutional Neural Networks to Detect People Around Wells in South Sudan}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5200--SE}},
year = {2019},
address = {Sweden},
}
In this thesis we investigate the use of GANs for texture enhancement. To achieve this, we have studied whether synthetic satellite images generated by GANs can improve the texture in satellite-based 3D maps.
We investigate two GANs: SRGAN and pix2pix. SRGAN increases the pixel resolution of the satellite images by generating upsampled images from low-resolution images. As for pix2pix, the GAN performs image-to-image translation by translating a source image to a target image, without changing the pixel resolution.
We trained the GANs in two different approaches, named SAT-to-AER and SAT-to-AER-3D, where SAT, AER and AER-3D are different datasets provided by the company Vricon. In the first approach, aerial images were used as ground truth, and in the second approach, rendered images from an aerial-based 3D map were used as ground truth.
The procedure of enhancing the texture in a satellite-based 3D map was divided into two steps: the generation of synthetic satellite images and the re-texturing of the 3D map. Synthetic satellite images generated by two SRGAN models and one pix2pix model were used for the re-texturing. The best results were obtained using SRGAN in the SAT-to-AER approach, where the re-textured 3D map had enhanced structures and an increased perceived quality. SRGAN also presented a good result in the SAT-to-AER-3D approach, where the re-textured 3D map had a changed color distribution and the road markers were easier to distinguish from the ground. The images generated by the pix2pix model presented the worst result. As for the SAT-to-AER approach, even though the synthetic satellite images generated by pix2pix were somewhat enhanced and contained less noise, they had no significant impact on the re-texturing. In the SAT-to-AER-3D approach, none of the investigated models based on the pix2pix framework presented any successful results.
We concluded that GANs can be used as texture enhancers using both aerial images and images rendered from an aerial-based 3D map as ground truth. The use of GANs as texture enhancers has great potential, and there are several interesting areas for future work.
@mastersthesis{diva2:1375054,
author = {Birgersson, Anna and Hellgren, Klara},
title = {{Texture Enhancement in 3D Maps using Generative Adversarial Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5266--SE}},
year = {2019},
address = {Sweden},
}
Deep learning has been intensively researched in computer vision tasks like image classification. Collecting and labeling the images that these neural networks are trained on is labor-intensive, which is why alternative methods of collecting images are of interest. Virtual environments allow rendering images and automatic labeling, which could speed up the process of generating training data and reduce costs. This thesis studies the problem of transfer learning in image classification when the classifier has been trained on rendered images using a game engine and tested on real images. The goal is to render images using a game engine to create a classifier that can separate images depicting people wearing civilian clothing or camouflage. The thesis also studies how domain adaptation techniques using generative adversarial networks could be used to improve the performance of the classifier. Experiments show that it is possible to generate images that can be used for training a classifier capable of separating the two classes. However, the experiments with domain adaptation were unsuccessful. It is instead recommended to improve the quality of the rendered images in terms of the features used in the target domain to achieve better results.
@mastersthesis{diva2:1431281,
author = {Thornström, Johan},
title = {{Domain Adaptation of Unreal Images for Image Classification}},
school = {Linköping University},
type = {{LiTH-ISY-EX--20/5282--SE}},
year = {2019},
address = {Sweden},
}
Recently, the deep neural network structure caps-net was proposed by Sabour et al. [11]. Capsule networks are designed to learn relative geometry between the features of a layer and the features of the next layer. The capsule network's main building blocks are capsules, which are represented by vectors. The idea is that each capsule will represent a feature as well as traits or subfeatures of that feature. This allows for smart information routing. Capsule traits are used to predict the traits of the capsules in the next layer, and information is sent to the next-layer capsules on which the predictions agree. This is called routing by agreement. This thesis investigates theoretical support for new and existing routing algorithms, and evaluates their performance on the MNIST [16] and CIFAR-10 [8] datasets. A variation of the dynamic routing algorithm presented in the original paper [11] achieved the highest accuracy and fastest execution time.
@mastersthesis{diva2:1314210,
author = {Malmgren, Christoffer},
title = {{A Comparative Study of Routing Methods in Capsule Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5188--SE}},
year = {2019},
address = {Sweden},
}
Visual object tracking is one of the fundamental problems in computer vision, with a wide number of practical applications in e.g. robotics and surveillance. Given a video sequence and the target bounding box in the first frame, a tracker is required to find the target in all subsequent frames. It is a challenging problem due to the limited training data available. An object tracker is generally evaluated using two criteria, namely robustness and accuracy. Robustness refers to the ability of a tracker to track for long durations without losing the target. Accuracy, on the other hand, denotes how accurately a tracker can estimate the target bounding box.
Recent years have seen significant improvement in tracking robustness. However, the problem of accurate tracking has seen less attention. Most current state-of-the-art trackers resort to a naive multi-scale search strategy which has fundamental limitations. Thus, in this thesis, we aim to develop a general target estimation component which can be used to determine accurate bounding box for tracking. We will investigate how bounding box estimators used in object detection can be modified to be used for object tracking. The key difference between detection and tracking is that in object detection, the classes to which the objects belong are known. However, in tracking, no prior information is available about the tracked object, other than a single image provided in the first frame. We will thus investigate different architectures to utilize the first frame information to provide target specific bounding box predictions. We will also investigate how the bounding box predictors can be integrated into a state-of-the-art tracking method to obtain robust as well as accurate tracking.
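The overlap measure at the heart of such a target estimation component is the intersection-over-union of two boxes, sketched below for axis-aligned (x1, y1, x2, y2) boxes.

```python
# Sketch of the IoU (overlap) between two axis-aligned bounding boxes.
def iou(box_a, box_b) -> float:
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~ 0.143
```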
@mastersthesis{diva2:1291564,
author = {Bhat, Goutam},
title = {{Accurate Tracking by Overlap Maximization}},
school = {Linköping University},
type = {{LiTH-ISY-EX--19/5189--SE}},
year = {2019},
address = {Sweden},
}
In this report I summarize my master's thesis work, in which I have investigated different approaches for fusing imaging modalities for semantic segmentation with deep convolutional networks. State-of-the-art methods for semantic segmentation of RGB images use pre-trained models, which are fine-tuned to learn task-specific deep features. However, the use of pre-trained model weights constrains the model input to images with three channels (e.g. RGB images). In some applications, e.g. classification of satellite imagery, there are other imaging modalities that can complement the information from the RGB modality and, thus, improve the performance of the classification. In this thesis, semantic segmentation methods designed for RGB images are extended to handle multiple imaging modalities, without compromising the benefits that pre-training on RGB datasets offers.
In the experiments of this thesis, RGB images from satellites have been fused with the normalised difference vegetation index (NDVI) and a digital surface model (DSM). The evaluation shows that the modality fusion can significantly improve the performance of semantic segmentation networks in comparison with a corresponding network with only RGB input. However, the different investigated approaches to fuse the modalities proved to achieve similar performance. The conclusion of the experiments is that the fusion of imaging modalities is necessary, but the method of fusion is of less importance.
@mastersthesis{diva2:1182913,
author = {Sundelius, Carl},
title = {{Deep Fusion of Imaging Modalities for Semantic Segmentation of Satellite Imagery}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5110--SE}},
year = {2018},
address = {Sweden},
}
Photos captured in the shortwave infrared (SWIR) spectrum are interesting in military applications because they are independent of the time of day at which the picture is captured: the sun, moon, stars and night glow constantly illuminate the earth with shortwave infrared radiation. A major problem with today's SWIR cameras is that they are very expensive to produce and hence not broadly available, either within the military or to civilians. A relatively new technology called compressive sensing (CS) enables a new type of camera with only a single pixel sensor (a single-pixel camera, SPC). This type of camera needs only a fraction of measurements relative to the number of pixels to be reconstructed, and reduces the cost of a shortwave infrared camera by a factor of 20. The camera uses a digital micromirror device (DMD) to select which mirrors (pixels) in the scene to measure, thus creating an underdetermined linear equation system that can be solved using the techniques described in CS to reconstruct the image. Given the new technology, it is in the Swedish Defence Research Agency's (FOI) interest to evaluate the potential of a single-pixel camera. With an SPC architecture developed by FOI, the goal of this thesis was to develop methods for sampling, reconstructing images and evaluating their quality. This thesis shows that structured random matrices and fast transforms have to be used to enable high-resolution images and to speed up the reconstruction process significantly. The evaluation of the images could be done with standard measurements associated with camera evaluation and showed that the camera can reproduce high-resolution images with relatively high image quality in daylight.
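To make the reconstruction step concrete, the following sketch recovers a sparse signal from underdetermined measurements using iterative soft-thresholding (ISTA), one of the standard CS solvers; a dense Gaussian matrix stands in for the structured random matrices and fast transforms the thesis advocates, and all sizes are made up.

    import numpy as np

    def ista(A, y, lam=0.05, n_iter=200):
        # Solves min_x 0.5*||Ax - y||^2 + lam*||x||_1 by proximal gradient steps.
        L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the data-term gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            g = x - A.T @ (A @ x - y) / L             # gradient step on the data term
            x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold (prox of L1)
        return x

    rng = np.random.default_rng(0)
    n, m, k = 256, 64, 8                              # signal length, measurements, sparsity
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    A = rng.standard_normal((m, n)) / np.sqrt(m)      # stand-in for a structured measurement matrix
    y = A @ x_true                                    # one scalar measurement per DMD pattern
    print(np.linalg.norm(ista(A, y) - x_true) / np.linalg.norm(x_true))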
@mastersthesis{diva2:1185507,
author = {Brorsson, Andreas},
title = {{Compressive Sensing: Single Pixel SWIR Imaging of Natural Scenes}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5108--SE}},
year = {2018},
address = {Sweden},
}
Industrial applications of computer vision often utilize traditional image processing techniques, whereas state-of-the-art methods in most image processing challenges are almost exclusively based on convolutional neural networks (CNNs). Thus, there is large potential for improving the performance of many machine vision applications by incorporating CNNs.
One such application is the classification of juice boxes with straws, where the baseline solution uses classical image processing techniques on depth images to reject or accept juice boxes. This thesis aims to investigate how CNNs perform on the task of semantic segmentation (pixel-wise classification) of said images, and whether the result can be used to increase classification performance.
A drawback of CNNs is that they usually require large amounts of labelled data for training to be able to generalize and learn anything useful. As labelled data is hard to come by, two ways to get cheap data are investigated, one being synthetic data generation and the other being automatic labelling using the baseline solution.
The implemented network performs well on semantic segmentation, even when trained on synthetic data only, though the performance increases with the ratio of real (automatically labelled) to synthetic images. The classification task is very sensitive to small errors in semantic segmentation, and the results are therefore not as good as the baseline solution. It is suspected that the drop in performance between validation and test data is due to a domain shift between the data sets, e.g. variations in data collection and straw and box type; fine-tuning to the target domain could increase performance.
When trained on synthetic data the domain shift is even larger and the performance on classification is next to useless. It is likely that the results could be improved by using more advanced data generation, e.g. a generative adversarial network (GAN), or more rigorous modelling of the data.
@mastersthesis{diva2:1189501,
author = {Carlsson, Mattias},
title = {{Neural Networks for Semantic Segmentation in the Food Packaging Industry}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5113--SE}},
year = {2018},
address = {Sweden},
}
Deep learning has been growing rapidly in recent years, obtaining excellent results for many computer vision applications, such as image classification and object detection. One reason for the increased popularity of deep learning is that it mitigates the need for hand-crafted features. This thesis work investigates deep learning as a methodology to solve the problem of autonomous collision avoidance for a small robotic car. To accomplish this, transfer learning is used with the VGG16 deep network pre-trained on the ImageNet dataset. A dataset has been collected and then used to fine-tune and validate the network offline. The deep network has been used with the robotic car in real time: the robotic car sends images to an external computer, which runs the network, and the predictions are sent back to the robotic car, which takes actions based on them. The results show that deep learning has great potential in solving the collision avoidance problem.
@mastersthesis{diva2:1204063,
author = {Strömgren, Oliver},
title = {{Deep Learning for Autonomous Collision Avoidance}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5115--SE}},
year = {2018},
address = {Sweden},
}
The aim of this work is to find a method for removing haze from satellite imagery. This is done by taking two algorithms developed for images taken from the surface of the earth and adapting them for satellite images. The two algorithms are Single Image Haze Removal Using Dark Channel Prior by He et al. and Color Image Dehazing Using the Near-Infrared by Schaul et al. Both algorithms, altered to fit satellite images, as well as their combination, are applied to four sets of satellite images. The results are compared with each other and with the unaltered images. The evaluation is both qualitative, i.e. looking at the images, and quantitative, using three properties: colorfulness, contrast and saturated pixels. Both the qualitative and the quantitative evaluation determined that using only the altered version of Dark Channel Prior gives the result with the least amount of haze and whose colors look most like reality.
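For reference, a minimal NumPy/SciPy sketch of the core of He et al.'s dark channel prior method is given below; the patch size and constants follow the original paper, while the input is a random stand-in image and the satellite-specific alterations of the thesis are not included.

    import numpy as np
    from scipy.ndimage import minimum_filter

    def dark_channel(img, patch=15):
        # Per-pixel minimum over the color channels, then a local minimum filter.
        return minimum_filter(img.min(axis=2), size=patch)

    def dehaze(img, omega=0.95, t0=0.1, patch=15):
        dark = dark_channel(img, patch)
        # Atmospheric light A: mean color of the brightest 0.1% dark-channel pixels.
        flat = np.argsort(dark, axis=None)[-max(1, dark.size // 1000):]
        A = img[np.unravel_index(flat, dark.shape)].mean(axis=0)
        t = 1.0 - omega * dark_channel(img / A, patch)   # transmission estimate
        t = np.clip(t, t0, 1.0)[..., None]
        return (img - A) / t + A                          # recovered scene radiance

    img = np.random.rand(64, 64, 3)  # stand-in for a normalized satellite image
    print(dehaze(img).shape)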
@mastersthesis{diva2:1215181,
author = {Hultberg, Johanna},
title = {{Dehazing of Satellite Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5121--SE}},
year = {2018},
address = {Sweden},
}
3D reconstruction is the process of constructing a three-dimensional model from images. It contains multiple steps where each step can induce errors. When doing 3D reconstruction of outdoor scenes, there are some types of scene content that regularly cause problems and affect the resulting 3D model. Two of these are water, due to its fluctuating nature, and sky, because it contains no useful (3D) data. These areas cause different problems throughout the process and generally do not benefit it in any way. Therefore, masking them early in the reconstruction chain could be a useful step in an outdoor scene reconstruction pipeline. Manual masking of images is a time-consuming task that becomes very tedious for the big data sets often used in large-scale 3D reconstructions. This master thesis explores whether this can be done automatically using Convolutional Neural Networks for semantic segmentation, and to what degree the masking would benefit a 3D reconstruction pipeline.
@mastersthesis{diva2:1216761,
author = {Kernell, Björn},
title = {{Improving Photogrammetry using Semantic Segmentation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5118--SE}},
year = {2018},
address = {Sweden},
}
In this thesis we study a perception problem in the context of autonomous driving. Specifically, we study the computer vision problem of 3D object detection, in which objects should be detected from various sensor data and their position in the 3D world should be estimated. We also study the application of Generative Adversarial Networks in domain adaptation techniques, aiming to improve the 3D object detection model's ability to transfer between different domains.
The state-of-the-art Frustum-PointNet architecture for LiDAR-based 3D object detection was implemented and found to closely match its reported performance when trained and evaluated on the KITTI dataset. The architecture was also found to transfer reasonably well from the synthetic SYN dataset to KITTI, and is thus believed to be usable in a semi-automatic 3D bounding box annotation process. The Frustum-PointNet architecture was also extended to explicitly utilize image features, which surprisingly degraded its detection performance. Furthermore, an image-only 3D object detection model was designed and implemented, which was found to compare quite favourably with current state-of-the-art in terms of detection performance.
Additionally, the PixelDA approach was adopted and successfully applied to the MNIST to MNIST-M domain adaptation problem, which validated the idea that unsupervised domain adaptation using Generative Adversarial Networks can improve the performance of a task network for a dataset lacking ground truth annotations. Surprisingly, the approach did however not significantly improve upon the performance of the image-based 3D object detection models when trained on the SYN dataset and evaluated on KITTI.
@mastersthesis{diva2:1218149,
author = {Gustafsson, Fredrik and Linder-Nor\'{e}n, Erik},
title = {{Automotive 3D Object Detection Without Target Domain Annotations}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5138--SE}},
year = {2018},
address = {Sweden},
}
The cost and environmental damage of reclaims is a large problem within the paper industry. With certain types of paper, so-called crepe marks on the paper's surface are a common issue, leading to printing defects and consequently to reclaims. This thesis compares four different image analysis methods for evaluating crepe marks and predicting printing results. The methods evaluated consist of one established method, two adaptations of established methods and one novel method. All methods were evaluated on the same data: topographic height images of paper samples from 4 paper rolls of similar type but differing in roughness. The method based on 1D Fourier analysis and the method based on fully convolutional networks perform best, depending on whether speed or detailed characteristics is the priority.
@mastersthesis{diva2:1219118,
author = {Strömberg, Isak},
title = {{Characterization of creping marks in paper}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5151--SE}},
year = {2018},
address = {Sweden},
}
When a Time-of-Flight (ToF) depth camera is used to monitor a region of interest, it has to be mounted correctly and have information regarding its position. Manual configuration currently requires managing captured 3D ToF data in a 2D environment, which limits the user and might give rise to errors due to misinterpretation of the data. This thesis investigates whether a real-time 3D reconstruction mesh from a Microsoft HoloLens can be used as a target for point cloud registration using the ToF data, thus configuring the camera autonomously. Three registration algorithms, Fast Global Registration (FGR), Joint Registration of Multiple Point Clouds (JR-MPC) and Prerejective RANSAC, were evaluated for this purpose.
It was concluded that accurate registration is possible despite the use of different sensors. It was also shown that the registration can be done within a reasonable time, compared with the inherent time needed to perform 3D reconstruction on the HoloLens. All algorithms could solve the problem, but FGR provided the most satisfying results, though it requires several constraints on the data.
@mastersthesis{diva2:1222450,
author = {Kjell\'{e}n, Kevin},
title = {{Point Cloud Registration in Augmented Reality using the Microsoft HoloLens}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5160--SE}},
year = {2018},
address = {Sweden},
}
Volume measurement of timber loads is done in conjunction with timber trade. When dealing with goods of such major economic value, it is important to achieve an impartial and fair assessment when determining price-based volumes.
With the help of Saab's missile targeting technology, CIND AB develops products for digital volume measurement of timber loads. Currently there is a system in operation that automatically reconstructs timber trucks in motion to create measurable images of them. Future iterations of the system are expected to fully automate the scaling by generating a volumetric representation of the timber and calculating its external gross volume. The first challenge in this development is to separate the timber load from the truck.
This thesis aims to evaluate and implement an appropriate method for semantic pixel-wise segmentation of timber loads in real time. Image segmentation is a classic but difficult problem in computer vision. To achieve greater robustness, it is therefore important to carefully study and make use of the conditions given by the existing system. Variations in timber type, truck type and packing together create unique combinations that the system must be able to handle. The system must work around the clock in different weather conditions while maintaining high precision and performance.
@mastersthesis{diva2:1222024,
author = {Sällqvist, Jessica},
title = {{Real-time 3D Semantic Segmentation of Timber Loads with Convolutional Neural Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5131--SE}},
year = {2018},
address = {Sweden},
}
Data about the earth is increasing in value and demand from customers, but it is difficult to produce accurately and cheaply. This thesis examines whether it is possible to take low-resolution and distorted 3D data and increase the accuracy of building geometry by performing building reconstruction. Building reconstruction is performed with a Markov chain Monte Carlo method where building primitives are placed iteratively until a good fit is found. The digital height model and pixel classification used are produced by Vricon. The method is able to correctly place primitive models, but often overestimates their dimensions by about 15%.
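A minimal sketch of the iterative primitive placement is given below: a Metropolis-style loop proposes perturbed box primitives and accepts them based on how well they fit a height model. The energy function, box parametrization and temperature are illustrative assumptions, not Vricon's or the thesis' actual model.

    import numpy as np

    rng = np.random.default_rng(1)

    def energy(p, dsm):
        # Hypothetical fit score: mean squared error between a flat-roofed box
        # (x, y, width, depth, height) and the digital surface model (DSM).
        x = int(np.clip(p[0], 0, dsm.shape[1] - 2))
        y = int(np.clip(p[1], 0, dsm.shape[0] - 2))
        w = int(np.clip(p[2], 1, dsm.shape[1] - x))
        d = int(np.clip(p[3], 1, dsm.shape[0] - y))
        model = np.zeros_like(dsm)
        model[y:y + d, x:x + w] = p[4]
        return np.mean((model - dsm) ** 2)

    dsm = np.zeros((50, 50)); dsm[10:30, 15:35] = 8.0   # synthetic height model
    params = np.array([5.0, 5.0, 10.0, 10.0, 5.0])      # initial box guess
    E = energy(params, dsm)
    for _ in range(5000):
        prop = params + rng.normal(0.0, 1.0, 5)         # random perturbation proposal
        E_prop = energy(prop, dsm)
        if E_prop < E or rng.random() < np.exp(-(E_prop - E) / 0.5):  # Metropolis acceptance
            params, E = prop, E_prop
    print(params.round(1), round(E, 3))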
@mastersthesis{diva2:1223969,
author = {Nilsson, Mats},
title = {{Building Reconstruction of Digital Height Models with the Markov Chain Monte Carlo Method}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5130--SE}},
year = {2018},
address = {Sweden},
}
Robotic bin picking is the problem of emptying a bin of randomly distributed objects through a robotic interface. This thesis examines an SVM approach to extract grasping points for a vacuum-type gripper. The SVM is trained on synthetic data and used to classify the points of a non-synthetic 3D-scanned point cloud as either graspable or non-graspable. The classified points are then clustered into graspable regions from which the grasping points are extracted.
The SVM models and the algorithm as a whole are trained and evaluated against cubic and cylindrical objects. Separate SVM models are trained for each type of object, in addition to one model trained on a dataset containing both types of objects. It is shown that the performance of the SVM in terms of accuracy is dependent on the objects and their geometrical properties. Further, it is shown that the algorithm is reasonably robust in terms of successfully picking objects, regardless of the scale of the objects.
@mastersthesis{diva2:1243310,
author = {Olsson, Fredrik},
title = {{Feature Based Learning for Point Cloud Labeling and Grasp Point Detection}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5165--SE}},
year = {2018},
address = {Sweden},
}
The purpose of the thesis was to investigate the possibility of using machine learning for automation of liver fat measurements in fat-water magnetic resonance imaging (MRI). The thesis presents methods for texture-based liver classification and Proton Density Fat Fraction (PDFF) regression using multi-layer perceptrons utilizing 2D and 3D textural image features. The first proposed method was a data classification method with the goal of distinguishing between suitable and unsuitable regions to measure PDFF in. The second proposed method was a combined classification and regression method where the classification distinguishes between liver and non-liver tissue. The goal of the regression model was to predict the difference d = PDFF_mean − PDFF_ROI between the manual ground-truth mean and the fat fraction of the active Region of Interest (ROI). Tests were performed on varying sizes of Image Feature Regions (froi) and combinations of image features for both of the proposed methods. The tests showed that 3D measurements using image features from discrete wavelet transforms produced measurements similar to the manual fat measurements. The first method resulted in lower relative errors, while the second method had higher method agreement compared to manual measurements.
@mastersthesis{diva2:1248500,
author = {Grundström, Tobias},
title = {{Automated Measurements of Liver Fat Using Machine Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5166--SE}},
year = {2018},
address = {Sweden},
}
Recently, sensors such as radars and cameras have been widely used in automotive applications, especially in Advanced Driver-Assistance Systems (ADAS), to collect information about the vehicle's surroundings. Stereo cameras are very popular as they can be used passively to construct a 3D representation of the scene in front of the car. This has allowed the development of several ADAS algorithms that need 3D information to perform their tasks. One interesting application is Road Surface Preview (RSP), where the task is to estimate the road height along the future path of the vehicle. An active suspension control unit can then use this information to regulate the suspension, improving driving comfort, extending the durability of the vehicle and warning the driver about potential risks on the road surface. Stereo cameras have been successfully used in RSP and have demonstrated very good performance. However, the main disadvantages of stereo cameras are their high production cost and high power consumption, which limits installing several ADAS features in economy-class vehicles. A less expensive alternative is monocular cameras, which have a significantly lower cost and power consumption. Therefore, this thesis investigates the possibility of solving the Road Surface Preview task using a monocular camera. We try two different approaches: structure-from-motion and Convolutional Neural Networks. The proposed methods are evaluated against the stereo-based system. Experiments show that both structure-from-motion and CNNs have good potential for solving the problem, but they are not yet reliable enough to be a complete solution to the RSP task and be used in an active suspension control unit.
@mastersthesis{diva2:1253882,
author = {Ekström, Marcus},
title = {{Road Surface Preview Estimation Using a Monocular Camera}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5173--SE}},
year = {2018},
address = {Sweden},
}
Thermal spectrum cameras are gaining interest in many applications due to their long wavelength, which allows them to operate under low light and harsh weather conditions. One disadvantage of thermal cameras is their limited visual interpretability for humans, which limits the scope of their applications. In this thesis, we try to address this problem by investigating the possibility of transforming thermal infrared (TIR) images to perceptually realistic visible spectrum (VIS) images using Convolutional Neural Networks (CNNs). Existing state-of-the-art colorization CNNs fail to provide the desired output, as they were trained to map grayscale VIS images to color VIS images. Instead, we utilize an auto-encoder architecture to perform cross-spectral transformation between TIR and VIS images. This architecture was shown to perform very well quantitatively while producing perceptually realistic images. We show that the quantitative differences are insignificant when training this architecture using different color spaces, while there are clear qualitative differences depending on the choice of color space. Finally, we found that a CNN trained on daytime examples generalizes well to night-time test images.
@mastersthesis{diva2:1255342,
author = {Nyberg, Adam},
title = {{Transforming Thermal Images to Visible Spectrum Images Using Deep Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5167--SE}},
year = {2018},
address = {Sweden},
}
This master thesis explores the possibility of using Generative Adversarial Networks (GANs) to refine labeled synthetic code images so that they resemble real code images while preserving label information. The GAN used in this thesis consists of a refiner and a discriminator. The discriminator tries to distinguish between real images and refined synthetic images. The refiner tries to fool the discriminator by producing refined synthetic images such that the discriminator classifies them as real. By updating these two networks iteratively, the idea is that they will push each other to get better, resulting in refined synthetic images with real-image characteristics.
The aspiration, if the exploration of GANs turns out successful, is to be able to use refined synthetic images as training data in Semantic Segmentation (SS) tasks and thereby eliminate the laborious task of gathering and labeling real data. Starting off from a foundational GAN-model, different network architectures, hyperparameters and other design choices are explored to find the best performing GAN-model.
As is widely acknowledged in the relevant literature, GANs can be difficult to train, and the results in this thesis are varying and sometimes ambiguous. Based on the results from this study, the best-performing models do however perform better in SS tasks, with regard to Intersection over Union, than the unrefined synthetic set they are based on and benchmarked against.
@mastersthesis{diva2:1254973,
author = {Stenhagen, Petter},
title = {{Improving Realism in Synthetic Barcode Images using Generative Adversarial Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5169--SE}},
year = {2018},
address = {Sweden},
}
In recent years, development of Convolutional Neural Networks has enabled high performing semantic segmentation models. Generally, these deep learning based segmentation methods require a large amount of annotated data. Acquiring such annotated data for semantic segmentation is a tedious and expensive task.
Within machine learning, active learning involves the selection of new data in order to limit the usage of annotated data. In active learning, the model is trained for several iterations, and additional samples that the model is uncertain of are selected. The model is then retrained on these additional samples and the process is repeated. In this thesis, an active learning framework has been applied to road segmentation, which is semantic segmentation of objects related to road scenes.
The uncertainty in the samples is estimated with Monte Carlo dropout. In Monte Carlo dropout, several dropout masks are applied to the model and the variance across the resulting predictions is captured, working as an estimate of the model's uncertainty (a minimal sketch is given after this abstract). Other metrics to rank the uncertainty evaluated in this work are: a baseline method that selects samples randomly, the entropy of the default predictions, and three variations/extensions of Monte Carlo dropout.
Both the active learning framework and uncertainty estimation are implemented in the thesis. Monte Carlo dropout performs slightly better than the baseline in 3 out of 4 metrics. Entropy outperforms all other implemented methods in all metrics. The three additional methods do not perform better than Monte Carlo dropout.
An analysis of what kind of uncertainty Monte Carlo dropout captures is performed, together with a comparison of the samples selected by the baseline and by Monte Carlo dropout. Future development and possible improvements are also discussed.
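As referenced above, a minimal PyTorch sketch of Monte Carlo dropout follows: dropout is kept active at test time, several stochastic forward passes are made, and the per-pixel variance serves as the uncertainty estimate. The toy two-class network and all sizes are made up for the example.

    import torch
    import torch.nn as nn

    # Toy segmentation-style head; Dropout2d stays stochastic while sampling.
    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.Dropout2d(0.5), nn.Conv2d(8, 2, 1))

    def mc_dropout_predict(model, x, T=20):
        model.train()                        # keep dropout masks random at test time
        with torch.no_grad():
            probs = torch.stack([model(x).softmax(dim=1) for _ in range(T)])
        return probs.mean(0), probs.var(0)   # predictive mean and variance

    x = torch.randn(1, 3, 32, 32)
    mean, var = mc_dropout_predict(model, x)
    print(mean.shape, float(var.mean()))     # the variance ranks sample uncertainty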
@mastersthesis{diva2:1259079,
author = {Sörsäter, Michael},
title = {{Active Learning for Road Segmentation using Convolutional Neural Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5176--SE}},
year = {2018},
address = {Sweden},
}
Training data is the bottleneck for training Convolutional Neural Networks. A larger dataset gives better accuracy but also requires longer training time. It is shown that fine-tuning neural networks on synthetically rendered images increases the mean average precision. This method was applied to two different datasets with five distinctive objects in each. The first dataset consisted of random objects with different geometric shapes. The second dataset contained objects used to assemble IKEA furniture. The neural network with the best performance, trained on 5400 images, achieved a mean average precision of 0.81 on a test set sampled from a video sequence. The impact of dataset size, batch size, number of training epochs and different network architectures was analyzed. Using synthetic images to train CNNs is a promising path for object detection when access to large amounts of annotated image data is hard to come by.
@mastersthesis{diva2:1267446,
author = {Vi, Margareta},
title = {{Object Detection Using Convolutional Neural Network Trained on Synthetic Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5180--SE}},
year = {2018},
address = {Sweden},
}
Visual tracking is a computer vision problem where the task is to follow a target through a video sequence. Tracking has many important real-world applications in several fields such as autonomous vehicles and robot vision. Since visual tracking does not assume any prior knowledge about the target, it faces different challenges such as occlusion, appearance change, background clutter and scale change. In this thesis we try to improve the capabilities of tracking frameworks using discriminative correlation filters by incorporating scene depth information. We utilize scene depth information on three main levels. First, we use raw depth information to segment the target from its surroundings, enabling occlusion detection and scale estimation. Second, we investigate different visual features calculated from depth data to decide which features are good at encoding geometric information available solely in depth data. Third, we investigate handling missing data in the depth maps using a modified version of the normalized convolution framework. Finally, we introduce a novel approach for parameter search using genetic algorithms to find the best hyperparameters for our tracking framework. Experiments show that depth data can be used to estimate scale changes and handle occlusions. In addition, visual features calculated from depth are more representative if combined with color features. It is also shown that utilizing normalized convolution improves the overall performance in some cases. Lastly, the usage of genetic algorithms for hyperparameter search leads to accuracy gains as well as some insights into the performance of different components within the framework.
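For background, the sketch below shows the basic single-channel discriminative correlation filter that such frameworks build on (a MOSSE-style closed-form filter in the Fourier domain); it is a textbook baseline, not the depth-augmented tracker of the thesis, and the patch data is synthetic.

    import numpy as np

    def train_dcf(patch, sigma=2.0, lam=1e-2):
        # Closed-form correlation filter: the desired response is a centered Gaussian.
        h, w = patch.shape
        yy, xx = np.mgrid[0:h, 0:w]
        g = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * sigma ** 2))
        G = np.fft.fft2(np.fft.ifftshift(g))            # target response spectrum
        F = np.fft.fft2(patch)
        return G * np.conj(F) / (F * np.conj(F) + lam)  # regularized filter

    def detect(H, patch):
        # Correlation response; the argmax gives the estimated target shift.
        resp = np.real(np.fft.ifft2(np.fft.fft2(patch) * H))
        return np.unravel_index(resp.argmax(), resp.shape)

    rng = np.random.default_rng(0)
    frame0 = rng.standard_normal((64, 64))
    H = train_dcf(frame0)
    frame1 = np.roll(frame0, (3, 5), axis=(0, 1))   # target shifted by (3, 5) pixels
    print(detect(H, frame1))                         # peak appears at (3, 5)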
@mastersthesis{diva2:1266346,
author = {Stynsberg, John},
title = {{Incorporating Scene Depth in Discriminative Correlation Filters for Visual Tracking}},
school = {Linköping University},
type = {{LiTH-ISY-EX--18/5178--SE}},
year = {2018},
address = {Sweden},
}
In many situations after a big catastrophe, such as the one in Fukushima, the disaster area is highly dangerous for humans to enter. It is in such environments that a semi-autonomous robot could limit the risks to humans by exploring and mapping the area on its own. This thesis intends to design and implement a software-based SLAM system with the potential to run in real time, using a Kinect 2 sensor as input.
The focus of the thesis has been to create a system which allows for efficient storage and representation of the map, in order to be able to explore large environments. This is done by separating the map in different abstraction levels corresponding to local maps connected by a global map.
During the implementation, this structure has been kept in mind in order to allow modularity. This makes it possible for each sub-component in the system to be exchanged if needed.
The thesis is broad in the sense that it uses techniques from distinct areas to solve the sub-problems that exist, for example object detection and classification, point-cloud registration and efficient 3D-based occupancy trees.
@mastersthesis{diva2:1065996,
author = {Holmquist, Karl},
title = {{SLAMIt A Sub-Map Based SLAM System:
On-line creation of multi-leveled map}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/5021--SE}},
year = {2017},
address = {Sweden},
}
After a digital photo has been taken by a camera, it can be manipulated to be more appealing. Two ways of doing so are to reduce noise and to increase the saturation. With time and skill in an image manipulation program, this is usually done by hand. In this thesis, automatic image improvement based on artificial neural networks is explored and evaluated qualitatively and quantitatively. A new approach, which builds on an existing method for colorizing grayscale images, is presented and its performance compared both to simpler methods and to the state of the art in image denoising. Saturation is lowered and noise added to original images, which the methods receive as inputs to improve upon. The new method is shown to improve the images in some cases but not all, depending on the image and how it was modified before being given to the method.
@mastersthesis{diva2:1098332,
author = {Lind, Benjamin},
title = {{Artificial Neural Networks for Image Improvement}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5025--SE}},
year = {2017},
address = {Sweden},
}
This thesis investigates if support vector machine classification is a suitable approach when performing automatic segmentation of knee cartilage using quantitative magnetic resonance imaging data. The data sets used are part of a clinical project that investigates if patients that have suffered recent knee damage will develop cartilage damage. Therefore the thesis also investigates if the segmentation results can be used to predict the clinical outcome of the patients.
Two methods that perform the segmentation using support vector machine classification are implemented and evaluated. The evaluation indicates that it is a good approach for the task, but the implemented methods need to be further improved and tested on more data sets before clinical use.
It was not possible to relate the cartilage properties to clinical outcome using the segmentation results. However, the investigation demonstrated good promise of how the segmentation results, if they are improved, can be used in combination with quantitative magnetic resonance imaging data to analyze how the cartilage properties change over time or vary between knees.
@mastersthesis{diva2:1109911,
author = {Lind, Marcus},
title = {{Automatic Segmentation of Knee Cartilage Using Quantitative MRI Data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5041--SE}},
year = {2017},
address = {Sweden},
}
Automated navigability assessment based on image sensor data is an important concern in the design of autonomous robotic systems. The problem consists in finding a mapping from input data to the navigability status of different areas of the surrounding world. Machine learning techniques are often applied to this problem. This thesis investigates an approach to navigability assessment in the image plane, based on offline learning using deep convolutional neural networks, applied to RGB and depth data collected using a robotic platform. Training outputs were generated by manually marking out instances of near collision in the sequences and tracing the location of the near-collision frame back through the previous frames. Several combinations of network inputs were tried out, including grayscale gradient versions of the RGB frames, depth maps, image coordinate maps and motion information in the form of a previous RGB frame or heading maps. Some improvement compared to simple depth thresholding was demonstrated, mainly in the handling of noise and missing pixels in the depth maps. The resulting networks appear to be mostly dependent on depth information; an attempt to train a network without the depth frames was unsuccessful, and a network trained using the depth frames alone performed similarly to networks trained with additional inputs. An unsuccessful attempt at training a network towards a more motion-dependent navigability concept was also made. It was done by including training frames captured as the robot was moving away from the obstacle, where the corresponding training outputs were marked as obstacle-free.
@mastersthesis{diva2:1110839,
author = {Wimby Schmidt, Ebba},
title = {{Navigability Assessment for Autonomous Systems Using Deep Neural Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5045--SE}},
year = {2017},
address = {Sweden},
}
There is a need for quantitative CT data in radiation therapy. Currently there are only a few algorithms that address this issue, for instance the commercial DirectDensity algorithm. In scientific literature, an example of such an algorithm is DIRA. DIRA is an iterative model-based reconstruction method for dual-energy CT whose goal is to determine the material composition of the patient from accurate linear attenuation coefficients. It had been implemented in a two-dimensional geometry, i.e., it could process axial scans only. There was a need to extend DIRA so that it could process projection data generated in helical scanning geometries. The newly developed algorithm (DIRA-3D) implemented (i) polyenergetic semi-parallel projection generation, (ii) mono-energetic parallel projection generation and (iii) the PI-method for image reconstruction. The computation experiments showed that the accuracies of the resulting LACs and mass fractions were comparable to those of the original DIRA. The results converged after 10 iterations.
@mastersthesis{diva2:1111894,
author = {Björnfot, Magnus},
title = {{Extension of DIRA (Dual-Energy Iterative Algorithm) to 3D Helical CT}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5057--SE}},
year = {2017},
address = {Sweden},
}
Visual Object Tracking is the computer vision problem of estimating a target trajectory in a video given only its initial state. A visual tracker often acts as a component in the intelligent vision systems seen in for instance surveillance, autonomous vehicles or robots, and unmanned aerial vehicles. Applications may require robust tracking performance on difficult sequences depicting targets undergoing large changes in appearance, while enforcing a real-time constraint. Discriminative correlation filters have shown promising tracking performance in recent years, and consistently improved the state-of-the-art. With the advent of deep learning, new robust deep features have improved tracking performance considerably. However, methods based on discriminative correlation filters learn a rigid template describing the target appearance. This implies an assumption of target rigidity which is not fulfilled in practice. This thesis introduces an approach which integrates deformability into a state-of-the-art tracker. The approach is thoroughly tested on three challenging visual tracking benchmarks, achieving state-of-the-art performance.
@mastersthesis{diva2:1111930,
author = {Johnander, Joakim},
title = {{Visual Tracking with Deformable Continuous Convolution Operators}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5047--SE}},
year = {2017},
address = {Sweden},
}
Modern cars are often equipped with sensors, like radar, infrared cameras and stereo cameras, that collect information about their surroundings. By using a stereo camera, it is possible to obtain information about the distance to points in front of the car. This information can be used to estimate the height of the predicted path of the car. An application which does this is the stereo-based Road Surface Preview (RSP) algorithm. By using the output from the RSP algorithm, it is possible to use active suspension control, which controls the vertical movement of the wheels relative to the chassis. This application primarily makes the driving experience more comfortable, but also extends the durability of the vehicle. The idea behind this Master's thesis is to create an evaluation tool for the RSP algorithm which can be used on arbitrary roads.
The thesis describes the proposed evaluation tool, where the focus has been on making an accurate comparison between camera data received from the RSP algorithm and laser data used as ground truth. Since the tool shall be used at the company proposing this thesis, focus has also been on making the tool user friendly. The report discusses the proposed methods, possible sources of error and improvements. The evaluation tool shows good results for the available test data, which made it possible to include an investigation of a possible improvement of the RSP algorithm.
@mastersthesis{diva2:1115333,
author = {Manfredsson, Johan},
title = {{Evaluation Tool for a Road Surface Algorithm}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5063--SE}},
year = {2017},
address = {Sweden},
}
Image registration is the process of geometrically deforming a template image into a reference image. This technique is important and widely used within the field of medical IT. The purpose could be to detect image variations, pathological development or, in the company AMRA's case, to quantify fat tissue in various parts of the human body. From an MRI (Magnetic Resonance Imaging) scan, a water and a fat tissue image are obtained. Currently, AMRA is using the Morphon algorithm to register and segment the water image in order to quantify fat and muscle tissue. During the first part of this master thesis, two alternative registration methods were evaluated. The first algorithm was Free Form Deformation, which is a non-linear parametric-based method. The second algorithm was a non-parametric optical flow based method known as the Demon algorithm. During the second part of the thesis, the Demon algorithm was used to evaluate the effect of using the fat images for registration.
@mastersthesis{diva2:1118172,
author = {Ivarsson, Magnus},
title = {{Evaluation of 3D MRI Image Registration Methods}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5037--SE}},
year = {2017},
address = {Sweden},
}
All dairy cows in Europe wear unique identification tags in their ears. These ear tags are standardized and contain the cow's identification number, today only used for visual identification by the farmer. The cow also needs to be identified by an automatic identification system connected to milk machines and other robotics used at the farm. Currently this is solved with a non-standardized radio transmitter which can be placed on different places on the cow, and different receivers need to be used on different farms. Other drawbacks of the currently used identification system are that it is expensive and unreliable. This thesis explores the possibility of replacing this non-standardized radio frequency based identification system with a standardized computer vision based system. The method proposed in this thesis uses a color threshold approach for detection, a flood fill approach followed by the Hough transform and a projection method for segmentation, and evaluates template matching, k-nearest neighbour and support vector machines as optical character recognition methods. The results show that the quality of the data used as input to the system is vital. Using good data, k-nearest neighbour, which showed the best results of the three OCR approaches, correctly handles 98% of the digits.
@mastersthesis{diva2:1120668,
author = {Ilestrand, Maja},
title = {{Automatic Eartag Recognition on Dairy Cows in Real Barn Environment}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5072--SE}},
year = {2017},
address = {Sweden},
}
The two main bottlenecks when using deep neural networks are data dependency and training time. This thesis proposes a novel method for weight initialization of the convolutional layers in a convolutional neural network, introducing the usage of sparse dictionaries. A sparse dictionary optimized on domain-specific data can be seen as a set of intelligent feature-extracting filters. This thesis investigates the effect of using such filters as kernels in the convolutional layers of the neural network: how do they affect the training time and final performance?
The dataset used here is the Cityscapes dataset, which is a library of 25000 labeled road-scene images. The sparse dictionary was acquired using the K-SVD method. The filters were added to two different networks whose performance was tested individually; one of the architectures is much deeper than the other. The results, presented for both networks, show that filter initialization is an important aspect which should be taken into consideration when training deep networks for semantic segmentation.
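A minimal sketch of dictionary-based filter initialization is given below. Since K-SVD is not available in the standard Python libraries, scikit-learn's online dictionary learning stands in for it; the patch size, atom count and random stand-in image are illustrative assumptions.

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning
    from sklearn.feature_extraction.image import extract_patches_2d

    rng = np.random.default_rng(0)
    image = rng.random((128, 128))                    # stand-in for a road-scene image
    patches = extract_patches_2d(image, (7, 7), max_patches=2000, random_state=0)
    X = patches.reshape(len(patches), -1)
    X -= X.mean(axis=1, keepdims=True)                # zero-mean patches

    # Learn 32 sparse atoms on domain data (the thesis uses K-SVD instead).
    dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
    dico.fit(X)
    kernels = dico.components_.reshape(32, 7, 7)      # initial kernels for a conv layer
    print(kernels.shape)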
@mastersthesis{diva2:1127291,
author = {Andersson, Viktor},
title = {{Semantic Segmentation:
Using Convolutional Neural Networks and Sparse dictionaries}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5054--SE}},
year = {2017},
address = {Sweden},
}
Extracting foreground objects from an image is a hot research topic. Doing this for high-quality real-world images in real time on limited hardware, such as a smart phone, is a demanding task. This master thesis shows how this problem can be addressed using Otsu's method together with Gaussian probability distributions to create classifiers in different colour channels. We show how classifiers can be combined, resulting in higher accuracy than using only the individual classifiers, and propose using inter-class variance together with image variance to estimate classifier quality. A data set was produced to evaluate performance. The data set features real-world images captured by a smart phone and objects of varying complexity against plain backgrounds that can be found in a typical office or urban space.
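A minimal NumPy sketch of the thresholding idea follows: Otsu's threshold is computed per colour channel and the per-channel foreground masks are combined, here by a simple majority vote rather than the Gaussian probability models of the thesis; the input image is a random stand-in normalized to [0, 1].

    import numpy as np

    def otsu_threshold(channel, bins=256):
        # Classic Otsu: choose the threshold maximizing between-class variance.
        hist, edges = np.histogram(channel, bins=bins, range=(0.0, 1.0))
        p = hist / hist.sum()
        w0 = np.cumsum(p)                        # class-0 probability mass
        mu = np.cumsum(p * np.arange(bins))      # cumulative (unnormalized) mean
        with np.errstate(divide="ignore", invalid="ignore"):
            var_b = (mu[-1] * w0 - mu) ** 2 / (w0 * (1.0 - w0))
        return edges[np.nanargmax(var_b) + 1]

    def segment(img_rgb):
        # Threshold each channel independently, then majority-vote the masks.
        masks = [img_rgb[..., c] > otsu_threshold(img_rgb[..., c]) for c in range(3)]
        return np.sum(masks, axis=0) >= 2

    img = np.random.rand(64, 64, 3)  # stand-in for a smartphone photo
    print(segment(img).mean())       # fraction of pixels labeled foreground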
@mastersthesis{diva2:1144357,
author = {Poole, Alexander},
title = {{Real-Time Image Segmentation for Augmented Reality by Combining Multi-Channel Thresholds}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5083--SE}},
year = {2017},
address = {Sweden},
}
During flights with manned or unmanned aircraft, continuous recording can result in a very high number of images to analyze and evaluate. To simplify image analysis and to minimize data link usage, appropriate images should be suggested for transfer and further analysis. This thesis investigates features used for selection of images worthy of further analysis using machine learning. The selection is done based on the criteria of having good quality, salient content and being unique compared to the other selected images. The investigation is approached by implementing two binary classifications, one regarding content and one regarding quality. The classifications are made using support vector machines. For each of the classifications three feature extraction methods are performed and the results are compared against each other. The feature extraction methods used are histograms of oriented gradients, features from the discrete cosine transform domain and features extracted from a pre-trained convolutional neural network. The images classified as both good and salient are then clustered based on similarity measures retrieved using color coherence vectors. One image from each cluster is retrieved and those are the resulting images from the image selection. The performance of the selection is evaluated using the measures precision, recall and accuracy. The investigation showed that using features extracted from the discrete cosine transform provided the best results for the quality classification. For the content classification, features extracted from a convolutional neural network provided the best results. The similarity retrieval proved to be the weakest part, and the entire system together provides an average accuracy of 83.99%.
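As a concrete example of one of the three feature types, the sketch below trains a support vector machine on histogram-of-oriented-gradients descriptors; the random images and labels are stand-ins, so the printed accuracy is not meaningful.

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    images = rng.random((40, 64, 64))       # stand-in grayscale images
    labels = rng.integers(0, 2, 40)         # stand-in "salient content" labels

    # One HOG descriptor per image, then a binary SVM classification.
    X = np.array([hog(im, orientations=9, pixels_per_cell=(8, 8),
                      cells_per_block=(2, 2)) for im in images])
    clf = SVC(kernel="rbf").fit(X[:30], labels[:30])
    print("accuracy:", (clf.predict(X[30:]) == labels[30:]).mean())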
@mastersthesis{diva2:1151145,
author = {Lorentzon, Matilda},
title = {{Feature Extraction for Image Selection Using Machine Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5097--SE}},
year = {2017},
address = {Sweden},
}
The ability to automatically estimate the volume of timber is becoming increasingly important within the timber industry. The large number of timber trucks arriving each day at Swedish timber terminals reinforces the need for volume estimation performed in real time and on the go as the trucks arrive.
This thesis investigates if a volumetric integration of disparity maps acquired from a Multi-View Stereo (MVS) system is a suitable approach for automatic volume estimation of timber loads. As real-time execution is preferred, efforts were made to provide a scalable method. The proposed method was quantitatively evaluated on datasets containing two geometric objects of known volume. A qualitative comparison to manual volume estimates of timber loads was also made on datasets recorded at a Swedish timber terminal.
The proposed method is shown to be both accurate and precise under specific circumstances. However, robustness is poor to varying weather conditions, although a more thorough evaluation of this aspect needs to be performed. The method is also parallelizable, which means that future efforts can be made to significantly decrease execution time.
@mastersthesis{diva2:1153580,
author = {Rundgren, Emil},
title = {{Automatic Volume Estimation of Timber from Multi-View Stereo 3D Reconstruction}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5093--SE}},
year = {2017},
address = {Sweden},
}
Being able to reconstruct real-world environments into digital 3D models has many different types of interesting applications. With the current state of the art, the results can be very impressive, but there is naturally still room for improvement. This thesis looks into essentially two different parts. The first part is about finding out whether it is feasible to detect geometric primitives, mainly planes, in the initially reconstructed point cloud. The second part looks into using the information about which points have been fitted to a geometric primitive to improve the final model.
Detection of the geometric primitives is done using the RANSAC algorithm, a method for discovering whether a given model is present in a data set (a minimal plane-fitting sketch is given after this abstract).
A few different alternatives are evaluated for using the information about the geometric primitives to improve the final surface. The first option is to project points onto their identified shape. The second option is to remove points that have not been matched to a shape. The last option is to evaluate the possibility of changing the weights of individual points, which is an alternative available in the chosen surface reconstruction method.
The detection of geometric primitives shows some potential, but it often requires manual intervention to find correct parameters for different types of data sets. As for using the information about the geometric primitives to improve the final model, neither projecting points onto their shapes nor removing non-matched points quite addresses the problem at hand. Increasing the weights of matched points does show some potential, but is still far from being a complete method.
A small part of the thesis looks into the possibility of automatically finding areas with significant differences between the initial point cloud and a reconstructed surface, using hierarchical clustering. This part is, however, not evaluated quantitatively.
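As referenced above, a minimal NumPy sketch of RANSAC plane detection follows; the tolerance and iteration count are illustrative, and the synthetic point cloud places 700 of 1000 points on the plane z = 0.5.

    import numpy as np

    def ransac_plane(points, n_iter=500, tol=0.02, seed=0):
        # Repeatedly fit a plane to 3 random points; keep the largest consensus set.
        rng = np.random.default_rng(seed)
        best = np.zeros(len(points), dtype=bool)
        for _ in range(n_iter):
            p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(p1 - p0, p2 - p0)
            norm = np.linalg.norm(n)
            if norm < 1e-12:
                continue                       # degenerate (collinear) sample
            inliers = np.abs((points - p0) @ (n / norm)) < tol  # point-to-plane distance
            if inliers.sum() > best.sum():
                best = inliers
        return best

    pts = np.random.rand(1000, 3)
    pts[:700, 2] = 0.5                         # 700 points on the plane z = 0.5
    print(ransac_plane(pts).sum())             # roughly 700 inliers found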
@mastersthesis{diva2:1153573,
author = {Norlander, Robert},
title = {{Make it Complete:
Surface Reconstruction Aided by Geometric Primitives}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5096--SE}},
year = {2017},
address = {Sweden},
}
The Exponential Linear Unit (ELU) has been proven to speed up learning and improve the classification performance over activation functions such as ReLU and Leaky ReLU for convolutional neural networks. The reasons behind the improved behavior are that ELU reduces the bias shift, it saturates for large negative inputs and it is continuously differentiable. However, it remains open whether ELU has the optimal shape and we address the quest for a superior activation function.
We use a new formulation to tune a piecewise linear activation function during training, to investigate the above question and learn the shape of the locally optimal activation function. With this tuned activation function, the classification performance is improved and the resulting, learned activation function turns out to be ELU-shaped irrespective of whether it is initialized as a ReLU, LReLU or ELU. Interestingly, the learned activation function does not exactly pass through the origin, indicating that a shifted ELU-shaped activation function is preferable. This observation leads us to introduce the Shifted Exponential Linear Unit (ShELU) as a new activation function.
Experiments on CIFAR-100 show that the classification performance is further improved when using the ShELU activation function in comparison with ELU. The improvement is achieved when learning an individual bias shift for each neuron.
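One plausible reading of the proposed ShELU, sketched below in PyTorch under that assumption, is a standard ELU preceded by a learnable per-channel (per-neuron) input shift; the module name and the exact placement of the shift are illustrative, not necessarily the authors' formulation.

    import torch
    import torch.nn as nn

    class ShELU(nn.Module):
        # ELU with an individually learned horizontal (input) shift per channel.
        def __init__(self, num_channels, alpha=1.0):
            super().__init__()
            self.shift = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
            self.elu = nn.ELU(alpha=alpha)

        def forward(self, x):
            return self.elu(x + self.shift)

    x = torch.randn(4, 16, 8, 8)
    print(ShELU(16)(x).shape)  # same shape as the input, (4, 16, 8, 8)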
@techreport{diva2:1154026,
author = {Grelsson, Bertil and Felsberg, Michael},
title = {{Performance boost in Convolutional Neural Networks by tuning shifted activation functions}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2017},
type = {Other academic},
address = {Sweden},
}
The thesis work evaluates a method to estimate the volume of stone and gravel piles using only a cellphone to collect video and sensor data from the gyroscopes and accelerometers. The project is commissioned by Escenda Engineering with the motivation to replace more complex and resource-demanding systems with a cheaper and easy-to-use handheld device. The implementation features popular computer vision methods such as KLT tracking, Structure-from-Motion and Space Carving, together with some sensor fusion. The results imply that it is possible to estimate volumes up to a certain accuracy, which is limited by the sensor quality, and with a bias.
@mastersthesis{diva2:1172784,
author = {Fallqvist, Marcus},
title = {{Automatic Volume Estimation Using Structure-from-Motion Fused with a Cellphone's Inertial Sensors}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5107--SE}},
year = {2017},
address = {Sweden},
}
Barcodes are ubiquitous in modern society and have had industrial applications for decades. However, modern methods can underperform on noisy images: poor lighting conditions, occlusions and low resolution can be problematic for decoding. This thesis aims to solve this problem using neural networks, which have enjoyed great success in many computer vision competitions in recent years. We investigate how three different networks perform on data sets with noisy images. The first network is a single classifier, the second is an ensemble classifier and the third is based on a pre-trained feature extractor. For comparison, we also test two baseline methods that are used in industry today. We generate training data using software and modify it to ensure proper generalization. Testing data is created by photographing barcodes in different settings, creating six image classes: normal, dark, white, rotated, occluded and wrinkled. The proposed single classifier and ensemble classifier outperform the baseline as well as the pre-trained feature extractor by a large margin. The thesis work was performed at SICK IVP, a machine vision company in Linköping, in 2017.
@mastersthesis{diva2:1164104,
author = {Fridborn, Fredrik},
title = {{Reading Barcodes with Neural Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5102--SE}},
year = {2017},
address = {Sweden},
}
Semantic segmentation of a scene aims to give meaning to the scene by dividing it into meaningful — semantic — parts. Understanding the scene is of great interest for all kinds of autonomous systems, but manual annotation is simply too time consuming, which is why there is a need for an alternative approach. This thesis investigates the possibility of automatically segmenting 3D-models of urban scenes, such as buildings, into a predetermined set of labels. The approach was to first acquire ground truth data by manually annotating five 3D-models of different urban scenes. The next step was to extract features from the 3D-models and evaluate which ones constitutes a suitable feature space. Finally, three supervised learners were implemented and evaluated: k-Nearest Neighbour (KNN), Support Vector Machine (SVM) and Random Classification Forest (RCF). The classifications were done point-wise, classifying each 3D-point in the dense point cloud belonging to the model being classified.
The results showed that the most suitable feature space is not necessarily the one containing all features. The KNN classifier achieved the highest average accuracy over all models, classifying 42.5% of the 3D points correctly. The RCF classifier managed to classify 66.7% of the points correctly in one of the models, but performed worse on the rest of the models, resulting in a lower average accuracy compared to KNN. In general, KNN, SVM and RCF seemed to have different benefits and drawbacks. KNN is simple and intuitive but by far the slowest classifier when dealing with a large set of training data. SVM and RCF are both fast but difficult to tune as there are more parameters to adjust. Whether the relatively low best accuracy was due to the lack of ground truth training data, unbalanced validation models, or the capacity of the learners was never investigated due to the limited time span. However, this ought to be investigated in future studies.
@mastersthesis{diva2:1166634,
author = {Lind, Johan},
title = {{Make it Meaningful:
Semantic Segmentation of Three-Dimensional Urban Scene Models}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5103--SE}},
year = {2017},
address = {Sweden},
}
The purpose of this document is to reflect on novel and upcoming methods for computer vision that might have relevance for application in robot vision and video analytics. The document covers many different sub-fields of computer vision, most of which have been addressed by our research activity at the computer vision laboratory. The report has been written based on a request of, and supported by, FOI.
@techreport{diva2:1165440,
author = {Felsberg, Michael},
title = {{Five years after the Deep Learning revolution of computer vision:
State of the art methods for online image and video analysis}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2017},
type = {Other academic},
number = {, },
address = {Sweden},
}
The recent emergence of time-of-flight cameras has opened up new possibilities in the world of computer vision. These compact sensors, capable of recording the depth of a scene in real time, are very advantageous in many applications, such as scene or object reconstruction. This thesis first addresses the problem of fusing depth data with color images. A complete process for combining a time-of-flight camera with a color camera is described and its accuracy is evaluated. The results show that a satisfactory precision is reached and that the calibration step is very important.
The second part of the work consists of applying super-resolution techniques to the time-of-flight camera in order to improve its low resolution. Different types of super-resolution algorithms exist, but this thesis focuses on the combination of multiple shifted depth maps. The proposed framework consists of two steps: registration and reconstruction. Different methods for each step are tested and compared according to the improvements reached in terms of level of detail, sharpness, and noise reduction. The results obtained show that Lucas-Kanade performs best for the registration and that a non-uniform interpolation gives the best results in terms of reconstruction. Finally, a few suggestions are made about future work and extensions of our solutions.
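The reconstruction step described above can be sketched as follows: each low-resolution depth map is placed on a common grid according to its estimated sub-pixel shift, and a high-resolution map is obtained by non-uniform interpolation. The shifts are assumed known here (the thesis estimates them with Lucas-Kanade registration), and the data is synthetic.

    # Minimal sketch of multi-frame depth super-resolution by shift-and-interpolate.
    import numpy as np
    from scipy.interpolate import griddata

    def fuse_depth_maps(depth_maps, shifts, scale=2):
        h, w = depth_maps[0].shape
        pts, vals = [], []
        for d, (dy, dx) in zip(depth_maps, shifts):
            yy, xx = np.mgrid[0:h, 0:w]
            pts.append(np.column_stack([(yy + dy).ravel(), (xx + dx).ravel()]))
            vals.append(d.ravel())
        gy, gx = np.mgrid[0:h:1.0 / scale, 0:w:1.0 / scale]
        return griddata(np.vstack(pts), np.concatenate(vals), (gy, gx),
                        method="linear")   # non-uniform interpolation

    maps = [np.random.rand(32, 32) for _ in range(4)]            # shifted depth maps
    shifts = [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25), (0.25, 0.25)]
    hr = fuse_depth_maps(maps, shifts)                           # (64, 64) result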
@mastersthesis{diva2:1149382,
author = {Zins, Matthieu},
title = {{Color Fusion and Super-Resolution for Time-of-Flight Cameras}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5089--SE}},
year = {2017},
address = {Sweden},
}
The objective of this master's thesis work is to evaluate the potential benefit of a superpixel preprocessing step for general object detection in a traffic environment. The various effects of different superpixel parameters on object detection performance, as well as the benefit of including depth information when generating the superpixels, are investigated.
In this work, three superpixel algorithms are implemented and compared, including a proposal for an improved version of the popular Simple Linear Iterative Clustering superpixel algorithm (SLIC). The proposed improved algorithm utilises a coarse-to-fine approach which outperforms the original SLIC for high-resolution images. An object detection algorithm is also implemented and evaluated. The algorithm makes use of depth information obtained by a stereo camera to extract superpixels corresponding to foreground objects in the image. Hierarchical clustering is then applied, with the segments formed by the clustered superpixels indicating potential objects in the input image.
The object detection algorithm managed to detect on average 58% of the objects present in the chosen dataset. It performed especially well for detecting pedestrians or other objects close to the car. Altering the density distribution of the superpixels in the image yielded an increase in detection rate, and this could be achieved both with and without utilising depth information. It was also shown that the use of superpixels greatly reduces the amount of computation needed by the algorithm, indicating that a real-time implementation is feasible.
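As a sketch of the superpixel-plus-depth idea described above, the example below generates SLIC superpixels with scikit-image and keeps those whose median depth indicates a nearby foreground object. The threshold and segment count are illustrative assumptions, not the tuned parameters from the thesis.

    # Minimal sketch: depth-filtered SLIC superpixels as object candidates.
    import numpy as np
    from skimage.segmentation import slic

    def foreground_superpixels(rgb, depth, max_depth=20.0, n_segments=400):
        labels = slic(rgb, n_segments=n_segments, compactness=10)
        keep = {sp for sp in np.unique(labels)
                if np.median(depth[labels == sp]) < max_depth}
        return labels, keep

    rgb = np.random.rand(120, 160, 3)            # stand-in camera image
    depth = np.random.rand(120, 160) * 50.0      # stand-in stereo depth (m)
    labels, fg = foreground_superpixels(rgb, depth)

The kept superpixels would then be grouped by hierarchical clustering into object hypotheses, as the abstract describes.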
@mastersthesis{diva2:1141088,
author = {Wälivaara, Marcus},
title = {{General Object Detection Using Superpixel Preprocessing}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5085--SE}},
year = {2017},
address = {Sweden},
}
Deep learning has dominated the computer vision field since 2012, but a common criticism of deep learning methods is their dependence on large amounts of data. To counter this criticism, research into data-efficient deep learning is growing. The foremost success in data-efficient deep learning is transfer learning with networks pre-trained on the ImageNet dataset. Pre-trained networks have achieved state-of-the-art performance on many tasks. We consider the pre-trained network method for a new task where we have to collect the data ourselves. We hypothesize that the data efficiency of pre-trained networks can be improved through informed data collection. After exhaustive experiments on CaffeNet and VGG16, we conclude that the data efficiency can indeed be improved. Furthermore, we investigate an alternative approach to data-efficient learning, namely adding domain knowledge in the form of a spatial transformer to the pre-trained networks. We find that spatial transformers are difficult to train and do not seem to improve data efficiency.
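A minimal sketch of the pre-trained-network recipe evaluated above is shown below: an ImageNet-trained backbone is frozen and only a new task head is trained on the collected data. VGG16 from torchvision is used as a stand-in (CaffeNet has no torchvision counterpart), and the five-class head is an assumption for the example.

    # Minimal sketch of transfer learning with a frozen pre-trained network.
    import torch.nn as nn
    from torchvision import models

    model = models.vgg16(weights="IMAGENET1K_V1")
    for p in model.parameters():
        p.requires_grad = False                  # keep the pre-trained features
    model.classifier[6] = nn.Linear(4096, 5)     # new head for a 5-class task
    trainable = [p for p in model.parameters() if p.requires_grad]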
@mastersthesis{diva2:1112122,
author = {Lundström, Dennis},
title = {{Data-efficient Transfer Learning with Pre-trained Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5051--SE}},
year = {2017},
address = {Sweden},
}
This work investigates the landscape of aerial image stereo matching (AISM) methods suitable for large-scale forest variable estimation. AISM methods are an important source of remotely collected information used in modern forestry to keep track of a growing forest's condition.
A total of 17 AISM methods are investigated, out of which four are evaluated by processing a test data set consisting of three aerial images. The test area is located in southern Sweden and consists mainly of Norway spruce and Scots pine. From the resulting point clouds and height raster images, a total of 30 different metrics of both height and density types are derived. Linear regression is used to fit functions from the metrics derived from AISM data to a set of forest variables including tree height (HBW), tree diameter (DBW), basal area, and volume. As ground truth, data collected by dense airborne laser scanning is used. Results are presented as RMSE and standard deviation obtained from the linear regression.
For tree height, tree diameter, basal area, and volume, the RMSE ranged from 7.442% to 10.11%, 11.58% to 13.96%, 32.01% to 35.10%, and 34.01% to 38.26%, respectively. The results show that all four tested methods achieved comparable estimation quality, although with small differences among them: Keystone and SURE performed somewhat better, while MicMac placed third and Photoscan produced the least accurate result.
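The estimation step above amounts to ordinary linear regression from point cloud metrics to each forest variable, with the error reported relative to the variable's mean. The sketch below illustrates this on synthetic data; the 30 metrics and the laser-scanned ground truth from the thesis are not reproduced.

    # Minimal sketch: linear regression from AISM metrics to a forest variable.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    metrics = rng.normal(size=(200, 30))                       # height/density metrics
    height = 20 + 2 * metrics[:, 0] + rng.normal(0, 1, 200)    # synthetic tree height

    reg = LinearRegression().fit(metrics, height)
    pred = reg.predict(metrics)
    rmse_pct = 100 * np.sqrt(np.mean((pred - height) ** 2)) / height.mean()
    print(f"relative RMSE: {rmse_pct:.2f}%")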
@mastersthesis{diva2:1109735,
author = {Svensk, Joakim},
title = {{Evaluation of Aerial Image Stereo Matching Methods for Forest Variable Estimation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5036--SE}},
year = {2017},
address = {Sweden},
}
Now and then, train accidents occur. Collisions between trains and objects such as animals, humans, cars, and fallen trees can result in casualties, severe damage to the train, and delays in train traffic. Thus, train collisions are a considerable problem with consequences affecting society substantially.
The company Termisk Systemteknik AB has, on commission from Rindi Solutions AB, investigated the possibility of detecting anomalies on the railway using a train-mounted thermal imaging camera. Rails are also detected in order to determine whether an anomaly is on the rail or not. However, the rail detection method does not work satisfactorily at long range.
The purpose of this master's thesis is to improve the previous rail detector at long range by using machine learning, in particular deep learning and a convolutional neural network. It is also of interest to investigate whether there are any advantages to using cross-modal transfer learning.
A labelled dataset for training and testing was produced manually. Also, a loss function tailored to the particular problem at hand was constructed. The loss function was used both for improving the system during training and for evaluating the system's performance during testing. Finally, eight different approaches were evaluated, each one resulting in a different rail detector.
Several of the rail detectors, and in particular all the rail detectors using cross-modal transfer learning, perform better than the previous rail detector. Thus, the new rail detectors show great potential for the rail detection problem.
@mastersthesis{diva2:1111486,
author = {Wedberg, Magnus},
title = {{Detecting Rails in Images from a Train-Mounted Thermal Camera Using a Convolutional Neural Network}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5058--SE}},
year = {2017},
address = {Sweden},
}
When forensic examiners try to identify the perpetrator of a felony, they use individual facial marks when comparing the suspect with the perpetrator. Facial marks are often used for identification, and today they are found manually. To speed up this process, it is desirable to detect interesting facial marks automatically. This master thesis describes a method to automatically detect and separate permanent and non-permanent marks. It uses a fast radial symmetry algorithm as a core element of the mark detector. After candidate skin mark extraction, false detections are removed based on their size, shape, and number of hair pixels. The classification of the skin marks is done with a support vector machine, and the different features are examined. The results show that the facial mark detector has good recall while its precision is poor. The methods for eliminating false detections were analysed, as well as the different features for the classifier. One can conclude that the color of facial marks is more relevant than their structure when classifying them into permanent and non-permanent marks.
@mastersthesis{diva2:1107743,
author = {Moulis, Armand},
title = {{Automatic Detection and Classification of Permanent and Non-Permanent Skin Marks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5048--SE }},
year = {2017},
address = {Sweden},
}
In computer vision, it has in recent years become more popular to use point clouds to represent 3D data. To understand what a point cloud contains, methods like semantic segmentation can be used. Semantic segmentation is the problem of segmenting images or point clouds and understanding what the different segments are. One application for semantic segmentation of point clouds is autonomous driving, where the car needs information about the objects in its surroundings.
Our approach to the problem is to project the point clouds into 2D virtual images using the Katz projection. We then use pre-trained convolutional neural networks to semantically segment the images. To obtain the semantically segmented point clouds, we project the scores from the segmentation back into the point cloud. Our approach is evaluated on the Semantic3D dataset. We find that our method is comparable to the state of the art, without any fine-tuning on the Semantic3D dataset.
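The back-projection step described above can be sketched with a pinhole camera model: each 3D point is projected into the virtual image, and the per-pixel class scores from the 2D segmentation are copied onto the points that land inside the image. The intrinsics and score volume below are synthetic stand-ins, and the Katz visibility test is omitted for brevity.

    # Minimal sketch: transferring 2D segmentation scores back to a point cloud.
    import numpy as np

    def backproject_scores(points, scores, K):
        """points: (N, 3) camera coords, scores: (H, W, C), K: 3x3 intrinsics."""
        h, w, _ = scores.shape
        uvw = (K @ points.T).T
        u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
        v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (points[:, 2] > 0)
        labels = np.full(len(points), -1)          # -1: outside the image
        labels[ok] = scores[v[ok], u[ok]].argmax(axis=1)
        return labels

    K = np.array([[100.0, 0, 80], [0, 100.0, 60], [0, 0, 1]])
    pts = np.random.rand(500, 3) * [2, 2, 5] + [-1, -1, 1]
    scores = np.random.rand(120, 160, 8)           # 8 hypothetical classes
    labels = backproject_scores(pts, scores, K)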
@mastersthesis{diva2:1091059,
author = {Tosteberg, Patrik},
title = {{Semantic Segmentation of Point Clouds Using Deep Learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--17/5029--SE}},
year = {2017},
address = {Sweden},
}
@techreport{diva2:1083263,
author = {Eldesokey, Abdelrahman},
title = {{Normalized Convolutional Neural Networks for Sparse Data}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2017},
type = {Other academic},
number = {LiTH-ISY-R, 3096},
address = {Sweden},
}
In the field of Natural Language Processing, supervised machine learning is commonly used to solve classification tasks such as sentiment analysis and text categorization. The classical way of representing the text has been to use the well-known Bag-of-Words representation. However, lately, low-dimensional dense word vectors have come to dominate the input to state-of-the-art models. Since few studies have made a fair comparison of the models' sensitivity to the text representation, this thesis tries to fill that gap. We especially seek insight into the impact various unsupervised pre-trained vectors have on the performance. In addition, we take a closer look at the Random Indexing representation and try to optimize it jointly with the classification task. The results show that while low-dimensional pre-trained representations often have computational benefits and have also reported state-of-the-art performance, they do not necessarily outperform the classical representations in all cases.
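The two representations compared above can be sketched side by side: a sparse Bag-of-Words matrix versus documents encoded as the mean of their word vectors, both feeding the same linear classifier. The toy corpus and random vectors below stand in for real data and pre-trained embeddings.

    # Minimal sketch: Bag-of-Words versus averaged word vectors.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    docs = ["good movie", "bad movie", "great film", "awful film"]
    y = [1, 0, 1, 0]

    bow = CountVectorizer().fit_transform(docs)        # sparse BoW features
    clf_bow = LogisticRegression().fit(bow, y)

    rng, dim, vocab = np.random.default_rng(0), 50, {}
    def doc_vec(doc):                                  # mean of word vectors
        return np.mean([vocab.setdefault(w, rng.normal(size=dim))
                        for w in doc.split()], axis=0)

    dense = np.stack([doc_vec(d) for d in docs])       # dense low-dim features
    clf_vec = LogisticRegression().fit(dense, y)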
@mastersthesis{diva2:928411,
author = {Norlund, Tobias},
title = {{The Use of Distributional Semantics in Text Classification Models:
Comparative performance analysis of popular word embeddings}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4926--SE}},
year = {2016},
address = {Sweden},
}
Measurements performed from stereo reconstruction can be obtained with high accuracy using correctly calibrated cameras. A stereo camera rig mounted in an outdoor environment is exposed to temperature changes, which have an impact on the calibration of the cameras.
The aim of the master thesis was to investigate the thermal impact on a calibrated stereo camera rig. This was done by placing a stereo rig in a temperature chamber and collecting data of a calibration board at different temperatures. Data was collected with two different cameras and lenses and used for calibration of the stereo camera rig in different scenarios. The obtained parameters were plotted and analyzed.
The results of the master thesis show that thermal variation has an impact on the accuracy of the calibrated stereo camera rig. A calibration obtained at one temperature cannot be used at a different temperature without a degradation of the accuracy. The plotted parameters from the calibration had a high noise level due to problems with the calibration methods, and no visible trend from temperature changes could be seen.
@mastersthesis{diva2:941863,
author = {Andersson, Elin},
title = {{Thermal Impact of a Calibrated Stereo Camera Rig}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4980--SE}},
year = {2016},
address = {Sweden},
}
Segmentation of the brain into sub-volumes has many clinical applications. Many neurological diseases are connected with brain atrophy (tissue loss). By dividing the brain into smaller compartments, volume comparisons between the compartments can be made, as well as monitoring of local volume changes over time. The former is especially interesting for the left and right cerebral hemispheres, due to their symmetric appearance. By using automatic segmentation, the time-consuming step of manually labelling the brain is removed, allowing for larger-scale research.
In this thesis, three automatic methods for segmenting the brain from magnetic resonance (MR) images are implemented and evaluated. Since none of the evaluated methods resulted in sufficiently good segmentations to be clinically relevant, a novel segmentation method, called SB-GC (shape bottleneck detection incorporated in graph cuts), is also presented. SB-GC uses quantitative MRI data as input, together with shape bottleneck detection and graph cuts, to segment the brain into the left and right cerebral hemispheres, the cerebellum, and the brain stem. SB-GC shows promise of highly accurate and repeatable results for both healthy adult brains and more challenging cases such as children and brains containing pathologies.
@mastersthesis{diva2:933699,
author = {Stacke, Karin},
title = {{Automatic Brain Segmentation into Substructures Using Quantitative MRI}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4956--SE}},
year = {2016},
address = {Sweden},
}
Face recognition is the problem of identifying individuals in images. This thesis evaluates two methods used to determine if pairs of face images belong to the same individual or not. The first method is a combination of principal component analysis and a neural network, and the second method is based on state-of-the-art convolutional neural networks. They are trained and evaluated using two different data sets. The first set contains many images with large variations in, for example, illumination and facial expression. The second consists of fewer images with small variations.
Principal component analysis allowed the use of smaller networks. The largest network has 1.7 million parameters, compared to the 7 million used in the convolutional network. The use of smaller networks lowered the training time and evaluation time significantly. Principal component analysis proved to be well suited for the data set with small variations, outperforming the convolutional network, which needs larger data sets to avoid overfitting. The reduction in data dimensionality, however, led to difficulties classifying the data set with large variations. The generous amount of images in this set allowed the convolutional method to reach higher accuracies than the principal component method.
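A minimal sketch of the first method's pipeline is given below: PCA compresses the face images before a small network decides whether a pair depicts the same person. The dimensions, pair construction, and random data are assumptions for the example only.

    # Minimal sketch: PCA compression followed by a small pair classifier.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    pairs = rng.normal(size=(300, 2, 1024))     # 300 pairs of 32x32 face images
    same = rng.integers(0, 2, size=300)         # 1 if the pair is one person

    pca = PCA(n_components=64).fit(pairs.reshape(-1, 1024))
    feat = pca.transform(pairs.reshape(-1, 1024)).reshape(300, -1)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(feat, same)

The point of the PCA step is the one made above: a 128-dimensional pair representation supports a far smaller network than raw pixels would.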
@mastersthesis{diva2:931705,
author = {Habrman, David},
title = {{Face Recognition with Preprocessing and Neural Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4953--SE}},
year = {2016},
address = {Sweden},
}
The art of reconstructing a real-world scene digitally has been on the minds of researchers for decades. Recently, it has attracted more and more attention from companies seeing a chance to bring this kind of technology to the market. Digital reconstruction of buildings in particular is a niche that has both potential and room for improvement. With this background, this thesis presents the design and evaluation of a pipeline made to find and correct approximately flat surfaces in architectural scenes. The scenes are 3D-reconstructed triangle meshes based on RGB images. The thesis also comprises an evaluation of a few different components available for doing this, leading to a choice of the best components. The goal is to improve the visual quality of the reconstruction.
The final pipeline is designed with two blocks: one to detect initial plane seeds and one to refine the detected planes. The first block makes use of a multi-label energy formulation on the graph that describes the reconstructed surface. Penalties are assigned to each vertex and each edge of the graph based on the vertex labels, effectively describing a Markov Random Field. The energy is minimized with the help of the alpha-expansion algorithm. The second block uses heuristics for growing the detected plane seeds, merging similar planes together, and extracting deviating details.
Results on several scenes are presented, showing that the visual quality has been improved while maintaining accuracy compared with ground truth data.
@mastersthesis{diva2:917230,
author = {Jonsson, Mikael},
title = {{Make it Flat:
Detection and Correction of Planar Regions in Triangle Meshes}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4930--SE}},
year = {2016},
address = {Sweden},
}
Lung cancer is the most common type of cancer in the world and always manifests as lung nodules. Nodules are small tumors that consist of lung tissue. They are usually spherical in shape and their cores can be either solid or subsolid. Nodules are common in lungs, but not all of them are malignant. To determine if a nodule is malignant or benign, attributes like nodule size and volume growth are commonly used. The procedure to obtain these attributes is time consuming, and therefore calls for tools to simplify the process.
The purpose of this thesis work was to investigate the feasibility of a semi-automatic lung nodule segmentation pipeline including volume estimation. This was done by implementing, tuning and evaluating image processing algorithms with different characteristics to create pipeline candidates. These candidates were compared using a similarity index between their segmentation results and ground truth markings to determine the most promising one.
The best performing pipeline consisted of a fixed region of interest together with a level set segmentation algorithm. Its segmentation accuracy was not consistent for all nodules evaluated, but the pipeline showed great potential when dynamically adapting its parameters for each nodule. The use of dynamic parameters was only briefly explored, and further research would be necessary to determine its feasibility.
@mastersthesis{diva2:911649,
author = {Berglin, Lukas},
title = {{Design, Evaluation and Implementation of a Pipeline for Semi-Automatic Lung Nodule Segmentation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4925--SE}},
year = {2016},
address = {Sweden},
}
Simultaneous localization and mapping (SLAM) is the problem of mapping your surroundings while simultaneously localizing yourself in the map. It is an important and active area of research for robotics. In this master thesis, two approaches are explored to reduce the drift that appears over time in SLAM algorithms. The first approach tries three different motion models for the camera. Two of the models exploit the a priori knowledge that the camera is mounted on a trolley, and these two are shown to improve the results. The second approach attempts to reduce the drift by reducing noise in the point cloud data used for mapping. This is done by finding planar surfaces in the point clouds; median filtering is used as an alternative against which the noise reduction is compared. The plane estimation approach is also shown to reduce the drift, while median filtering makes it worse.
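The plane-based noise reduction described above can be sketched with a RANSAC plane fit: inlier points are snapped onto the estimated plane before being used for mapping. Thresholds and data below are illustrative, and the thesis's actual plane detection may differ.

    # Minimal sketch: RANSAC plane fit and projection of inliers onto the plane.
    import numpy as np

    def ransac_plane(pts, iters=200, tol=0.02, seed=0):
        rng = np.random.default_rng(seed)
        best, best_inl, normal, origin = 0, None, None, None
        for _ in range(iters):
            p = pts[rng.choice(len(pts), 3, replace=False)]
            n = np.cross(p[1] - p[0], p[2] - p[0])
            if np.linalg.norm(n) < 1e-9:
                continue                          # degenerate sample
            n = n / np.linalg.norm(n)
            inl = np.abs((pts - p[0]) @ n) < tol  # point-to-plane distance test
            if inl.sum() > best:
                best, best_inl, normal, origin = inl.sum(), inl, n, p[0]
        proj = pts.copy()                         # snap inliers onto the plane
        dist = (pts[best_inl] - origin) @ normal
        proj[best_inl] = pts[best_inl] - np.outer(dist, normal)
        return proj, best_inl

    cloud = np.random.rand(2000, 3)
    cloud[:1500, 2] = 0.5 + 0.01 * np.random.randn(1500)   # noisy planar floor
    denoised, inliers = ransac_plane(cloud)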
@mastersthesis{diva2:957728,
author = {Bondemark, Richard},
title = {{Improving SLAM on a TOF Camera by Exploiting Planar Surfaces}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4984--SE}},
year = {2016},
address = {Sweden},
}
Detection and positioning of anatomical landmarks, also called points of interest (POI), is often a concept of interest in medical image processing. Different measures or automatic image analyses are often directly based on the positions of such points, e.g. in organ segmentation or tissue quantification. Manual positioning of these landmarks is a time-consuming and resource-demanding process. In this thesis, a general method for positioning of anatomical landmarks is outlined, implemented, and evaluated. The evaluation of the method is limited to three different POI: the left femur head, the right femur head, and vertebra T9. These POI are used to define the range of the abdomen in order to measure the amount of abdominal fat in 3D data acquired with quantitative magnetic resonance imaging (MRI). With more detailed information about the abdominal body fat composition, medical diagnoses can be issued with higher confidence. Examples of applications include identifying patients with a high risk of developing metabolic or catabolic disease and characterizing the effects of different interventions, e.g. training, bariatric surgery, and medication. The proposed method is shown to be highly robust and accurate for positioning of the left and right femur heads. Due to insufficient performance regarding T9 detection, a modified method is proposed for T9 positioning. The modified method shows promise of accurate and repeatable results but has to be evaluated more extensively before further conclusions can be drawn.
@mastersthesis{diva2:957048,
author = {Järrendahl, Hannes},
title = {{Automatic Detection of Anatomical Landmarks in Three-Dimensional MRI}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4990--SE}},
year = {2016},
address = {Sweden},
}
Object Recognition is the art of localizing predefined objects in image sensor data. In this thesis a depth sensor was used which has the benefit that the 3D pose of the object can be estimated. This has applications in e.g. automatic manufacturing, where a robot picks up parts or tools with a robot arm.
This master thesis presents an implementation and an evaluation of a system for object recognition of 3D models in depth sensor data. The system uses several depth images rendered from a 3D model and describes their characteristics using so-called feature descriptors. These are then matched with the descriptors of a scene depth image to find the 3D pose of the model in the scene. The pose estimate is then refined iteratively using a registration method. Different descriptors and registration methods are investigated.
One of the main contributions of this thesis is that it compares two different types of descriptors, local and global, a comparison which has received little attention in research. This is done for two different scene scenarios, and for different types of objects and depth sensors. The evaluation shows that global descriptors are fast and robust for objects with a smooth visible surface, whereas the local descriptors perform better for larger objects in clutter and occlusion. This thesis also presents a novel global descriptor, the CESF, which is observed to be more robust than other global descriptors. As for the registration methods, ICP is shown to perform most accurately, while ICP point-to-plane is more robust.
@mastersthesis{diva2:972438,
author = {Grankvist, Ola},
title = {{Recognition and Registration of 3D Models in Depth Sensor Data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4993--SE}},
year = {2016},
address = {Sweden},
}
Cardiovascular diseases are among the most common causes of death worldwide. A recently developed flow analysis technique called 4D flow magnetic resonance imaging (MRI) allows early detection of such diseases. Due to the limited resolution and the limited contrast between blood pool and myocardium in 4D flow images, cine MR images are often used for cardiac segmentation. The delineated structures are then transferred to the 4D flow images for cardiovascular flow analysis. Cine MR images are, however, acquired with multiple breath-holds, which can be challenging for some people, especially when a cardiovascular disease is present. Consequently, unexpected breathing motion by a patient may lead to misalignments between the acquired cine MR images.
The goal of the thesis is to test the feasibility of an automatic image registration method for correcting the misalignment caused by respiratory motion in morphological 2D cine MR images, using the 4D flow MR data as the reference. As a registration method relies on a set of optimal parameters to provide the desired results, a comprehensive investigation was performed to find such parameters. Different combinations of registration parameter settings were applied to 20 datasets from both healthy volunteers and patients. The best combinations, selected on the basis of normalized cross-correlation, were evaluated against the clinical gold standard using widely used geometric measures of spatial correspondence. The accuracy of the best parameters from the geometric evaluation was finally validated using simulated misalignments.
Using a registration method consisting of only translation improved the results both for the datasets from healthy volunteers and patients and for the simulated misalignment data. For the datasets from healthy volunteers and patients, the registration improved the results from 0.7074 ± 0.1644 to 0.7551 ± 0.0737 in Dice index and from 1.8818 ± 0.9269 to 1.5953 ± 0.5192 in point-to-curve error. These values are mean values over all 20 datasets.
The results from geometric evaluation on the data from both healthy volunteers and patients show that the developed correction method is able to improve the alignment of the cine MR images. This allows a reliable segmentation of 4D flow MR images for cardiac flow assessment.
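For reference, the Dice index used in the geometric evaluation above measures the overlap of two binary masks; the sketch below computes it for a synthetic segmentation pair.

    # Minimal sketch: Dice index between two binary masks.
    import numpy as np

    def dice(a, b):
        a, b = a.astype(bool), b.astype(bool)
        return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

    ref = np.zeros((64, 64)); ref[16:48, 16:48] = 1   # reference mask
    seg = np.zeros((64, 64)); seg[20:52, 18:50] = 1   # shifted segmentation
    print(f"Dice = {dice(ref, seg):.4f}")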
@mastersthesis{diva2:972664,
author = {Härd, Victoria},
title = {{Automatic Alignment of 2D Cine Morphological Images Using 4D Flow MRI Data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4992--SE}},
year = {2016},
address = {Sweden},
}
This master's thesis presents an approach to track and count the number of fruit in commercial mango orchards. The algorithm is intended to enable precision agriculture and to facilitate labour and post-harvest storage planning. The primary objective is to develop a multi-view algorithm and to investigate how it can be used to mitigate the effects of visual occlusion, improving upon estimates from methods that use a single central viewpoint or two opposite viewpoints. Fruit are detected in images using two classification methods: dense pixel-wise CNN and region-based R-CNN detection. Pair-wise fruit correspondences are established between images using geometry provided by navigation data, and lidar data is used to generate image masks for each separate tree, isolating fruit counts to individual trees. The tracked fruit are triangulated to locate them in 3D space, and spatial statistics are calculated over whole orchard blocks. The estimated tree counts are compared to single-view estimates and validated against ground truth data for 16 mango trees from a Bundaberg mango orchard in Queensland, Australia. The results show a high R²-value of 0.99335 for four hand-labelled trees and a highest R²-value of 0.9165 for the machine-labelled images using the R-CNN classifier for the 16 target trees.
@mastersthesis{diva2:1045302,
author = {Stein, Madeleine},
title = {{Improving Image Based Fruitcount Estimates Using Multiple View-Points}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/5003--SE}},
year = {2016},
address = {Sweden},
}
Since most people now have a high-performing computing device with an attached camera in their pocket, in the form of a smartphone, robotics and computer vision researchers are thrilled about the possibility this creates. Such devices have previously been used in robotics to create 3D maps of environments and objects by feeding the camera data to a 3D reconstruction algorithm.
The big downside with smartphones is that their cameras use a different sensor than what is usually used in robotics, namely a rolling shutter camera. These cameras are cheaper to produce but are not as well suited for general 3D reconstruction algorithms as the global shutter cameras typically used in robotics research. One recent, accurate and computationally efficient 3D reconstruction method that could, if tweaked, be used on a mobile device is LSD-SLAM.
This thesis takes the LSD-SLAM method developed for global shutter cameras and incorporates additional methods that allow the use of rolling shutter data. The developed method is evaluated by counting the number of failed 3D reconstructions before a successful one is obtained when using rolling shutter data. The result is a method that improves this metric by about 70% compared to the unmodified LSD-SLAM method.
@mastersthesis{diva2:1058367,
author = {Tallund, Lukas},
title = {{Handling of Rolling Shutter Effects in Monocular Semi-Dense SLAM Algorithms}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/5016--SE}},
year = {2016},
address = {Sweden},
}
This thesis presents a way to generate a Digital Terrain Model (DTM) from a Digital Surface Model (DSM) and multispectral images (including the near-infrared (NIR) color band). An Artificial Neural Network (ANN) is used to pre-classify the DSM and the multispectral images, and the classification is in turn used to filter the DSM into a DTM. Using an ANN as the classifier provided good results, and the addition of the NIR color band improved the accuracy of the classifier. Using the classifier, a DTM was easily extracted without removing natural edges or height variations in the forests and cities. These challenges are handled well compared to earlier methods.
@mastersthesis{diva2:1058430,
author = {Tapper, Gustav},
title = {{Extraction of DTM from Satellite Images Using Neural Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/5017--SE}},
year = {2016},
address = {Sweden},
}
The usage of 3D modelling is increasing fast, in both civilian and military areas, such as navigation, targeting, and urban planning. When creating a 3D model from satellite images, clouds can be problematic. Thus, automatic detection of clouds in the images is of great use. This master thesis was carried out at Vricon, which produces 3D models of the earth from satellite images. The thesis aimed to investigate whether Support Vector Machines could classify pixels into cloud or non-cloud, with a combination of texture and color as features. To solve the stated goal, the task was divided into several subproblems, where the first part was to extract features from the images. The images were then preprocessed before being fed to the classifier. After that, the classifier was trained and finally evaluated. The two methods that gave the best results in this thesis had approximately 95% correctly classified pixels. This result is better than the existing cloud segmentation method at Vricon, for the tested terrain and cloud types.
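A rough sketch of the pixel classification idea above is given below: each pixel is described by its color values plus a local-variance texture measure, and an SVM labels it cloud or non-cloud. The features and labels are stand-ins, not the thesis's exact descriptors.

    # Minimal sketch: color + texture features per pixel, classified by an SVM.
    import numpy as np
    from scipy.ndimage import uniform_filter
    from sklearn.svm import SVC

    def pixel_features(rgb):
        gray = rgb.mean(axis=2)
        tex = uniform_filter(gray ** 2, 5) - uniform_filter(gray, 5) ** 2
        return np.dstack([rgb, tex]).reshape(-1, 4)    # 3 color + 1 texture

    img = np.random.rand(60, 80, 3)                    # stand-in satellite tile
    X = pixel_features(img)
    y = (np.random.rand(60 * 80) > 0.5).astype(int)    # stand-in cloud labels
    svm = SVC(kernel="rbf").fit(X[:2000], y[:2000])    # train on a pixel subset
    cloud_mask = svm.predict(X).reshape(60, 80)        # per-pixel cloud mask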
@mastersthesis{diva2:932606,
author = {Gasslander, Maja},
title = {{Segmentation of Clouds in Satellite Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4945--SE}},
year = {2016},
address = {Sweden},
}
Generic visual tracking is a challenging computer vision problem, where the position of a specified target is estimated through a sequence of frames. The only given information is the initial location of the target. Therefore, the tracker has to adapt to and learn any kind of object, which it describes through visual features used to differentiate target from background. Standard appearance features only capture momentary visual information. This master's thesis investigates the use of deep features extracted from optical flow images processed in a deep convolutional network. The optical flow is calculated using two consecutive images, and thereby captures the dynamic nature of the scene. Results show that this information is complementary to the standard appearance features and improves the performance of the tracker. Deep features are typically very high-dimensional, and employing dimensionality reduction can increase both the efficiency and the performance of the tracker. As a second aim of this thesis, PCA and PLS were evaluated and compared. The evaluations show that the two methods are almost equal in performance, with PLS receiving a slightly better score than the popular PCA. The final proposed tracker was evaluated on three challenging datasets and was shown to outperform other state-of-the-art trackers.
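The dimensionality-reduction comparison above can be sketched as follows: PCA compresses the deep features without supervision, while PLS uses the desired filter response as a target. The feature and target arrays are synthetic stand-ins for the deep motion features in the thesis.

    # Minimal sketch: PCA versus PLS on high-dimensional features.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(500, 512))        # stand-in deep features
    target = rng.normal(size=(500, 1))         # stand-in desired response

    pca_feats = PCA(n_components=32).fit_transform(feats)        # unsupervised
    pls = PLSRegression(n_components=32).fit(feats, target)
    pls_feats = pls.transform(feats)                             # supervised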
@mastersthesis{diva2:1071737,
author = {Gladh, Susanna},
title = {{Visual Tracking Using Deep Motion Features}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/5005--SE}},
year = {2016},
address = {Sweden},
}
Automatic stack measurement (automatisk travmätning) is a measurement system that measures the volume of wood on timber trucks. The system consists of six sensor systems. Each sensor is first calibrated individually and then jointly, to yield a common world coordinate system. Each sensor generates a depth image and a reflectance image, where the values in the depth image represent the distance from the camera. The client has developed an algorithm that, from the measurement data (the images), estimates the wood volume with an accuracy that fulfils the requirements set by the forest industry for automatic measurement of stacks on timber trucks. This report investigates whether better measurement results can be achieved, for example with other methods or combinations of them.
Around 125 datasets of stacks with ground truth were available. The ground truth consists of manual sample measurements where each individual log was measured separately. Initially, an active choice was made not to study the client's algorithm, so as not to be biased by how they arrived at their results. Primarily, the front and back images of a stack are used to find the logs. The found logs are then interpolated towards the middle of the stack, or the logs from the two sides are paired up. Sometimes there are problems with the images: most often, at least one of the sides is occluded by the truck cab, the crane, or another stack. In such cases, an estimate must be formed from the visible data in order to fill in the occluded regions.
At the beginning of the thesis work, two methods (MSER and a point-plane method) were used to investigate whether good results could be achieved by simply measuring the data and using it as an initial guess of the volume. However, it was discovered that valuable details in the datasets, needed to determine the wood volume more accurately, were missed; an example of such data is the distribution of diameters of the found log ends. Likewise, the volume tended to be heavily overestimated when the stacks contained a certain amount of branches or poorly delimbed logs. A geometric method was therefore constructed, and it was on this method that most time was spent. A table and a graph present the results of all three methods under bark (UB), together with the interval limits for fulfilling the requirements set by the forest industry.
@mastersthesis{diva2:968712,
author = {Lindberg, Pontus},
title = {{Automatisk volymmätning av virkestravar på lastbil}},
school = {Linköping University},
type = {{LiTH-ISY-EX--16/4955--SE}},
year = {2016},
address = {Sweden},
}
The poaching of rhinoceros has increased dramatically in the last few years, and the park rangers are often helpless against the militarised poachers. Linköping University is running several projects with the goal of aiding the park rangers in their work.
This master thesis was produced at CybAero AB, which builds Remotely Piloted Aircraft Systems (RPAS). With their helicopters, high-end cameras with a range sufficient to cover the whole area can be flown over the parks.
The aim of this thesis is to investigate different methods to automatically find rhinos and humans using airborne cameras. The system uses two cameras: one colour camera and one thermal camera. The latter is used to find interesting objects, which are then extracted in the colour image. The object is then classified as either rhino, human, or other. Several methods for classification have been evaluated.
The results show that classifying solely on the thermal image gives nearly as high accuracy as classifying in combination with the colour image. This enables the system to be used at dusk and dawn or in bad light conditions, which is an important factor since most poaching occurs at dusk or dawn. As a conclusion, a system capable of running on low-performance hardware and placeable on board the aircraft is presented.
@mastersthesis{diva2:843745,
author = {Karlsson Schmidt, Carl},
title = {{Rhino and Human Detection in Overlapping RGB and LWIR Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4837--SE}},
year = {2015},
address = {Sweden},
}
The Kinect v2 is an RGB-D sensor manufactured as a gesture interaction tool for the entertainment console Xbox One. In this thesis we use it to perform 3D reconstruction and investigate its ability to measure depth. In order to sense both color and depth, the Kinect v2 has two cameras: one RGB camera and one infrared camera, the latter used to produce depth and near-infrared images. These cameras need to be calibrated if we want to use them for 3D reconstruction. We present a calibration procedure for simultaneously calibrating the cameras and extracting their relative pose. This enables us to construct colored meshes of the environment. Once the camera parameters of the infrared camera are known, the depth images can be used to perform the KinectFusion algorithm, which produces well-formed meshes of the environment by combining many depth frames taken from several camera poses.
The Kinect v2 uses a time-of-flight technology where phase shifts are extracted from amplitude-modulated infrared light signals produced by an emitter. The extracted phase shifts are then converted to depth values. However, the extraction of phase shifts includes a phase unwrapping procedure, which is sensitive to noise and can result in large depth errors.
By utilizing the ability to access the raw phase measurements from the device, we managed to modify the phase unwrapping procedure. The new procedure includes the extraction of several hypotheses for the unwrapped phase and a spatial propagation to select amongst them. The proposed method has been compared with the available drivers in the open source library libfreenect2 and the Microsoft Kinect SDK v2. Our experiments show that the depth images of the two available drivers have similar quality and that our proposed method improves over libfreenect2. The calculations in the proposed method are more expensive than those in libfreenect2, but it still runs at 2.5× real time. However, contrary to libfreenect2, the proposed method lacks a filter that removes outliers from the depth images. This turned out to be an important feature when performing KinectFusion, and future work should thus focus on adding an outlier filter.
@mastersthesis{diva2:854680,
author = {Järemo Lawin, Felix},
title = {{Depth Data Processing and 3D Reconstruction Using the Kinect v2}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4884--SE}},
year = {2015},
address = {Sweden},
}
In a synchronized multi camera system it is imperative that the synchronization error between the different cameras is as close to zero as possible and that the jitter of the presumed frame rate is as small as possible. This is even more important when such systems are used in an autonomous vehicle trying to sense its surroundings. We would never hand over control to an autonomous vehicle if we could not trust the data it uses for moving around.
The purpose of this thesis was to build a synchronization setup for a multi camera system using state-of-the-art RayTrix digital cameras that will be used in the iQMatic project involving autonomous heavy-duty vehicles. The iQMatic project is a collaboration between several Swedish industrial partners and universities. The work also involved software development for the multi camera system. Two different synchronization techniques were implemented and then analysed against the system requirements: a hardware trigger, i.e. an external trigger using a microcontroller, and a software trigger using the API of the digital cameras.
Experiments were conducted by testing the different trigger modes with the developed multi camera software. The conclusions show that the hardware trigger is preferable in this particular system, as it exhibits more stability and better statistics against the system requirements than the software trigger. The thesis also shows that additional experiments are needed for a more accurate analysis.
@mastersthesis{diva2:822340,
author = {Vibeck, Alexander},
title = {{Synchronization of a Multi Camera System}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--15/0438--SE}},
year = {2015},
address = {Sweden},
}
Autonomous driving or self-driving vehicles are concepts of vehicles knowing their environment and making driving manoeuvres without instructions from a driver. The concepts have been around for decades but have improved significantly in recent years as research in this area has made substantial progress. Benefits of autonomous driving include the possibility to decrease the number of traffic accidents and thereby save lives.
A major challenge in autonomous driving is to acquire 3D information about, and relations between, all objects in surrounding traffic. This is referred to as spatial perception. Stereo camera systems have become a central sensor module for advanced driver assistance systems and autonomous driving. For object detection and measurements at large distances, stereo vision encounters difficulties, including objects being small, having low contrast, and the presence of image noise. Having an accurate perception of the environment at large distances is, however, of high interest for many applications, especially autonomous driving.
This thesis proposes a method which tries to increase the range at which generic objects are first detected using a given stereo camera setup. Objects are represented by planes in 3D space. The input image is segmented into the various objects and the 3D plane parameters are estimated jointly, directly from the stereo image pairs. In particular, this thesis investigates methods to introduce geometric constraints into the segmentation or labeling task, i.e. assigning each considered pixel in the image to a plane.
The methods provided in this thesis show that, despite the difficulties at large distances, it is possible to exploit planar primitives in 3D space for obstacle detection at distances where other methods fail.
@mastersthesis{diva2:778457,
author = {Hillgren, Patrik},
title = {{Geometric Scene Labeling for Long-Range Obstacle Detection}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4819--SE}},
year = {2015},
address = {Sweden},
}
In a time when cattle herds grow continually larger, the need for automatic methods to detect diseases is ever increasing. One possible method to discover diseases is to use thermal images and automatic head and eye detectors. In this thesis an eye detector and a head detector are implemented using the Random Forests classifier. During the implementation the classifier is evaluated using three different descriptors: Histogram of Oriented Gradients, Local Binary Patterns, and a descriptor based on pixel differences. An alternative classifier, the Support Vector Machine, is also evaluated for comparison against Random Forests.
The thesis results show that Histogram of Oriented Gradients performs well as a description of cattle heads, while Local Binary Patterns performs well as a description of cattle eyes. The provided descriptor performs almost equally well in both cases. The results also show that Random Forests performs approximately as well as the Support Vector Machine, when the Support Vector Machine is paired with Local Binary Patterns for both heads and eyes.
Finally, the thesis results indicate that it is easier to detect and locate cattle heads than cattle eyes. For eyes, combining a head detector and an eye detector is shown to give a better result than using an eye detector alone. In this combination, heads are first detected in the images, after which the eye detector is applied to the areas classified as heads.
@mastersthesis{diva2:856339,
author = {Sandsveden, Daniel},
title = {{Evaluation of Random Forests for Detection and Localization of Cattle Eyes}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4885--SE}},
year = {2015},
address = {Sweden},
}
In the steel industry, laser triangulation based measurement systems can be utilized for evaluating the flatness of steel products. Shapeline is a company in Linköping that manufactures such measurement systems. This thesis work presents a series of experiments on a Shapeline measurement system in a relatively untested environment, the hot rolling mill at SSAB in Borlänge.
The purpose of this work is to evaluate how the conditions at a hot rolling mill affect the measurement performance. It was anticipated that measuring in a high-temperature environment would introduce difficulties that do not exist when measuring in cold environments. A number of different experiments were conducted, where equipment such as the laser and the camera bandpass filter were alternated. Via the experiments, information about noise due to the environment in the hot rolling mill was gained. The most significant noise was caused by heat shimmering. Using the presented methods, the magnitude and frequency spectrum of the heat shimmering noise could be determined. The results also indicate that heat shimmering causes large errors and is quite troublesome to counter. In addition to this, the quality of the line detections under the hot rolling mill circumstances was examined. It could be observed that the line detections did not introduce any significant errors despite the harsh conditions.
@mastersthesis{diva2:857691,
author = {Larsson, Oliver},
title = {{Evaluation of Flatness Gauge for Hot Rolling Mills}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4894--SE}},
year = {2015},
address = {Sweden},
}
The car has become increasingly intelligent throughout the years. Today's radar and vision based safety systems can warn a driver and brake the vehicle automatically if obstacles are detected. Research projects such as the Google Car have even succeeded in creating fully autonomous cars.
The demands to obtain the highest rating in safety tests such as Euro NCAP are also steadily increasing, and as a result, the development of these systems has become more attractive for car manufacturers. In the near future, a car must have a system for detecting pedestrians and performing automatic braking in order to receive the highest safety rating of five stars. The prospect is that the volume of active safety systems will increase drastically when car manufacturers start installing them not only in luxury cars, but also in regularly priced ones. The use of automatic braking comes with high demands on the performance of active safety systems; false positives must be avoided at all costs.
Dollar et al. [2014] introduced Aggregated Channel Features (ACF), which is based on a 10-channel LUV+HOG feature map. The method uses decision trees learned by boosting and has been shown to outperform previous algorithms in object detection tasks. The rediscovery of neural networks, and especially Convolutional Neural Networks (CNN), has increased the performance in almost every field of machine learning, including pedestrian detection. Recently, Yang et al. [2015] combined the two approaches by using the feature maps from a CNN as input to a decision tree based boosting framework. This resulted in state-of-the-art performance on the challenging Caltech pedestrian data set.
This thesis presents an approach to improve the performance of a cascade of boosted classifiers by investigating the impact of using color information for pedestrian detection. The color self-similarity feature introduced by Walk et al. [2010] was used to create a version better adapted for boosting. This feature is then used in combination with a gradient based feature at the last step of a cascade.
The presented feature increases the performance compared to the classifiers currently used at Autoliv, both on data recorded by Autoliv and on the benchmark Caltech pedestrian data set.
@mastersthesis{diva2:867888,
author = {Hansson, Niklas},
title = {{Color Features for Boosted Pedestrian Detection}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4899--SE}},
year = {2015},
address = {Sweden},
}
In the field of industrial automation, large savings can be realized if the position and orientation of an object are known. Knowledge about an object's position and orientation can be used by advanced robotic systems to work with complex items. Specifically, 2D objects are a big enough sub-domain to motivate special attention. Traditionally this problem has been solved with large mechanical systems that force the objects into specific configurations. Besides being expensive, taking up a lot of space, and having great difficulty handling fragile items, these mechanical systems have to be constructed for each particular type of object. This thesis explores the possibility of using registration algorithms from computer vision, based on 3D data, to find flat objects. While systems for locating 3D objects already exist, they have issues with locating essentially flat objects, since their positioning is mostly a function of their contour. The thesis consists of a brief examination of 2D algorithms and their extension to 3D, as well as results from the most suitable algorithm.
@mastersthesis{diva2:821158,
author = {Ingberg, Benjamin},
title = {{Registration of 2D Objects in 3D data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4848--SE}},
year = {2015},
address = {Sweden},
}
In this thesis a system for creating panoramic video has been developed. The panoramic video is formed by stitching several camera streams together. The system is designed as a vehicle-mounted system, but can be applied to several other areas, such as surveillance. The system creates the video by finding features that correspond in the overlapping frames. By using cylinder projection, the problem is reduced to finding a translation between the images, and using algorithms such as ORB, matching features can be detected and described. The camera frames are stitched together by calculating the average translation of the matching features. To reduce artifacts such as ghosting, a simple but effective alpha blending technique has been used. The system has been implemented using C++ and the OpenCV library, and the algorithm is capable of processing about 15 frames per second, making it close to real-time. With future improvements, such as parallel processing of the cameras, the system may be sped up even further and possibly include other types of image processing, e.g. object recognition and tracking.
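The alpha-blending step mentioned above can be sketched as a linear feather across the overlap region of two horizontally adjacent frames. Translation-only alignment is assumed, as after cylinder projection; the thesis implementation is in C++/OpenCV, and Python is used here only for illustration.

    # Minimal sketch: linear alpha blending across a horizontal overlap.
    import numpy as np

    def blend_horizontal(left, right, overlap):
        alpha = np.linspace(1.0, 0.0, overlap)[None, :, None]
        mixed = left[:, -overlap:] * alpha + right[:, :overlap] * (1 - alpha)
        return np.concatenate([left[:, :-overlap], mixed, right[:, overlap:]],
                              axis=1)

    a = np.random.rand(100, 200, 3)                 # stand-in camera frames
    b = np.random.rand(100, 200, 3)
    pano = blend_horizontal(a, b, overlap=40)       # (100, 360, 3) strip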
@mastersthesis{diva2:822602,
author = {Rydholm, Niklas},
title = {{Panoramic Video Stitching}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4858--SE}},
year = {2015},
address = {Sweden},
}
Anomaly detection is a general theory of detecting unusual patterns or events in data. This master thesis investigates the subject of anomaly detection in two different applications. The first application is product inspection using a camera and the second application is surveillance using a 2D laser scanner.
The first part of the thesis presents a system for automatic visual defect inspection. The system is based on aligning the images of the product to a common template and doing pixel-wise comparisons. The system is trained using only images of products that are defined as normal, i.e. non-defective products. The visual properties of the inspected products are modelled using three different methods. The performance of the system and the different methods have been evaluated on four different datasets.
The second part of the thesis presents a surveillance system based on a single laser range scanner. The system is able to detect certain anomalous events based on the time, position and velocities of individual objects in the scene. The practical usefulness of the system is made plausible by a qualitative evaluation using unlabelled data.
@mastersthesis{diva2:855502,
author = {Thulin, Peter},
title = {{Anomaly Detection for Product Inspection and Surveillance Applications}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4889--SE}},
year = {2015},
address = {Sweden},
}
Integrated camera systems for increasing safety and maneuverability are becoming increasingly common for heavy vehicles. One problem with heavy vehicles today is that there are blind spots where the driver has no or very little view. There is a great demand for increasing safety and helping the driver get a better view of the surroundings. This can be achieved by a sophisticated camera system, using cameras with a wide field of view, that covers dangerous blind spots.
This master thesis aims to investigate and develop a prototype solution for a camera system consisting of two fisheye cameras. The solution covers both hardware choices and software development, including camera calibration and image stitching. Two different fisheye camera calibration toolboxes are compared and their results discussed, with the aim of finding the most suitable one for this application. The results from the two toolboxes differ in performance, and the result from only one of the toolboxes is sufficient for image stitching.
@mastersthesis{diva2:854521,
author = {Söderroos, Anna},
title = {{Fisheye Camera Calibration and Image Stitching for Automotive Applications}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4887--SE}},
year = {2015},
address = {Sweden},
}
Generic visual tracking is one of the classical problems in computer vision. In this problem, no prior knowledge of the target is available aside from a bounding box in the initial frame of the sequence. Generic visual tracking is a difficult task due to a number of factors such as momentary occlusions, target rotations, changes in target illumination, and variations in target size. In recent years, discriminative correlation filter (DCF) based trackers have shown promising results for visual tracking. These DCF based methods use the Fourier transform to efficiently calculate detection and model updates, allowing significantly higher frame rates than competing methods. However, existing DCF based methods only estimate the translation of the object, ignoring changes in size. This thesis investigates the problem of accurately estimating scale variations within a DCF based framework. A novel scale estimation method is proposed by explicitly constructing separate translation and scale filters. The proposed scale estimation technique is robust, significantly improves tracking performance, and operates in real time. In addition, a comprehensive evaluation of feature representations in a DCF framework is performed. Experiments are performed on the benchmark OTB-2015 dataset, as well as the VOT 2014 dataset. The proposed methods are shown to significantly improve the performance of existing DCF based trackers.
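At the core of a DCF tracker is a filter learned in the Fourier domain against a Gaussian target response; the scale filter proposed above reuses this machinery over a scale pyramid. The sketch below shows a single-channel MOSSE-style filter, a simplification of the multi-channel filters used in the thesis.

    # Minimal sketch: single-channel correlation filter in the Fourier domain.
    import numpy as np

    def train_dcf(patch, sigma=2.0, lam=1e-2):
        h, w = patch.shape
        yy, xx = np.mgrid[0:h, 0:w]
        g = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * sigma ** 2))
        G = np.fft.fft2(np.fft.ifftshift(g))   # desired response, peak at (0, 0)
        F = np.fft.fft2(patch)
        return G * np.conj(F) / (F * np.conj(F) + lam)   # regularized filter

    def detect(H, patch):
        resp = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
        return np.unravel_index(resp.argmax(), resp.shape)

    target = np.random.rand(64, 64)
    H = train_dcf(target)
    print(detect(H, target))   # peak near (0, 0) for the training patch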
@mastersthesis{diva2:910736,
author = {Häger, Gustav},
title = {{Improving Discriminative Correlation Filters for Visual Tracking}},
school = {Linköping University},
type = {{LiTH-ISY-EX-15/4919--SE}},
year = {2015},
address = {Sweden},
}
Pedestrian detection is an important field with applications in active safety systems for cars as well as autonomous driving. Since autonomous driving and active safety are becoming technically feasible, the interest in these applications has increased dramatically. The aim of this thesis is to investigate convolutional neural networks (CNN) for pedestrian detection. The reason for this is that CNN have recently been successfully applied to several different computer vision problems. The main applications of pedestrian detection are in real-time systems. For this reason, this thesis investigates strategies for reducing the computational complexity of forward propagation for CNN. The approach used in this thesis for extracting pedestrians is to use a CNN to find a probability map of where pedestrians are located. From this probability map, bounding boxes for pedestrians are generated. A method for handling scale invariance for the objects of interest has also been developed in this thesis. Experiments show that using this method gives significantly better results for the problem of pedestrian detection. The accuracy which this thesis has managed to achieve is similar to the accuracy of some other works which use CNN.
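Going from probability map to bounding boxes can be as simple as thresholding followed by connected-component analysis. A minimal sketch (the threshold, minimum area and grouping scheme are assumptions; the thesis may generate boxes differently):

import numpy as np
from scipy import ndimage

def boxes_from_probability_map(prob, threshold=0.5, min_area=50):
    # Threshold the CNN output and group pixels into connected components.
    labels, _ = ndimage.label(prob > threshold)
    boxes = []
    for sl in ndimage.find_objects(labels):
        ys, xs = sl
        if (ys.stop - ys.start) * (xs.stop - xs.start) >= min_area:
            boxes.append((xs.start, ys.start, xs.stop, ys.stop))  # (x0, y0, x1, y1)
    return boxes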
@mastersthesis{diva2:839692,
author = {Molin, David},
title = {{Pedestrian Detection Using Convolutional Neural Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--15/4855--SE}},
year = {2015},
address = {Sweden},
}
Machine learning can be utilized in many different ways in the field of automatic manufacturing and logistics. In this thesis, supervised machine learning has been used to train classifiers for detection and recognition of objects in images. The techniques AdaBoost and Random forest have been examined; both are based on decision trees.
The thesis has considered two applications: barcode detection and optical character recognition (OCR). Supervised machine learning methods are highly appropriate in both applications since both barcodes and printed characters are generally quite distinctive.
The first part of this thesis examines the use of machine learning for barcode detection in images, both traditional 1D barcodes and the more recent Maxi-codes, a type of two-dimensional barcode. In this part the focus has been on training classifiers with AdaBoost. The Maxi-code detection is mainly done with local binary pattern features. For detection of 1D codes, features are calculated from the structure tensor. The classifiers have been evaluated on around 200 real test images containing barcodes and show promising results.
The second part of the thesis involves optical character recognition. The focus in this part has been on training a Random forest classifier using point pair features. The performance has also been compared with the more proven and widely used Haar features. The results show that Haar features are superior in terms of accuracy; nevertheless, the conclusion is that point pairs can be used as features for Random forest in OCR.
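Point pair features admit a very compact implementation. The sketch below assumes one simple variant, comparing the intensities of randomly sampled pixel pairs inside a fixed-size character window; the exact feature definition used in the thesis may differ.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def make_point_pairs(n_pairs, h, w):
    # Sample random pixel pairs (y1, x1, y2, x2) inside an h-by-w window.
    return rng.integers(0, [h, w, h, w], size=(n_pairs, 4))

def point_pair_features(images, pairs):
    # Binary feature per pair: is pixel (y1, x1) brighter than pixel (y2, x2)?
    return np.stack([img[pairs[:, 0], pairs[:, 1]] > img[pairs[:, 2], pairs[:, 3]]
                     for img in images]).astype(np.uint8)

# pairs = make_point_pairs(256, 24, 16)   # hypothetical window size
# clf = RandomForestClassifier(n_estimators=100)
# clf.fit(point_pair_features(train_images, pairs), train_labels)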
@mastersthesis{diva2:822575,
author = {Fridolfsson, Olle},
title = {{Machine Learning:
for Barcode Detection and OCR}},
school = {Linköping University},
type = {{LiTH-ISY-Ex--15/4842--SE}},
year = {2015},
address = {Sweden},
}
A classic computer vision task is the estimation of a 3D map from a collection of images. This thesis explores the online simultaneous estimation of camera poses and map points, often called visual simultaneous localisation and mapping (VSLAM). In the near future the use of visual information by autonomous cars is likely, since driving is a vision-dominated process. For example, VSLAM could be used to estimate the position of the car in relation to objects of interest, such as the road, other cars and pedestrians. Aimed at the creation of a real-time, robust, loop-closing, single-camera SLAM system, the properties of several state-of-the-art VSLAM systems and related techniques are studied. The system goals cover several important, if difficult, problems, which makes a solution widely applicable. This thesis makes two contributions: a rigorous qualitative analysis of VSLAM methods and a system designed accordingly. A novel tracking-by-matching scheme is proposed, which, unlike the trackers used by many similar systems, is better able to deal with forward camera motion. The system estimates general motion with loop closure in real time. The system is compared to a state-of-the-art monocular VSLAM algorithm and found to be similar in speed and performance.
@mastersthesis{diva2:771912,
author = {Persson, Mikael},
title = {{Online Monocular SLAM:
Rittums}},
school = {Linköping University},
type = {{Lith-ISY-EX--13/4741-SE}},
year = {2014},
address = {Sweden},
}
The interest in using GPUs as general processing units for heavy computations (GPGPU) has increased in the last couple of years. Manufacturers such as Nvidia and AMD make GPUs powerful enough to outperform CPUs by an order of magnitude for suitable algorithms. For embedded systems, GPUs are not as popular yet. The embedded GPUs available on the market have often not been able to justify hardware changes from the current systems (CPUs and FPGAs) to systems using embedded GPUs: they have been too hard to get, too energy consuming and not suitable for some algorithms. At SICK IVP, advanced computer vision algorithms run on FPGAs. This master thesis optimizes two such algorithms for embedded GPUs and evaluates the result. It also evaluates the status of the embedded GPUs on the market today. The results indicate that embedded GPUs perform well enough to run the evaluated algorithms as fast as needed. The implementations are also easy to understand compared to implementations for FPGAs, which are the competing hardware.
@mastersthesis{diva2:768419,
author = {Nilsson, Mattias},
title = {{Evaluation of Computer Vision Algorithms Optimized for Embedded GPU:s.}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4816--SE}},
year = {2014},
address = {Sweden},
}
The usage of 3D modeling is expanding rapidly. Modeling from aerial imagery has become very popular due to its increasing number of both civilian and military applications like urban planning, navigation and target acquisition.
This master thesis project was carried out at Vricon Systems at SAAB. The Vricon system produces high resolution geospatial 3D data based on aerial imagery from manned aircrafts, unmanned aerial vehicles (UAV) and satellites.
The aim of this work was to investigate to what degree superpixel segmentation and supervised learning can be applied to a terrain classification problem using imagery and digital surface models (DSM). The aim was also to investigate how the height information from the digital surface model may contribute compared to the information from the grayscale values. The goal was to identify buildings, trees and ground. Another task was to evaluate existing methods and compare results.
The approach for solving the stated goal was divided into several parts. First, the image was segmented using superpixel segmentation; after that, features were extracted. Then the classifiers were created and trained, and finally the classifiers were evaluated.
The classification method that obtained the best results in this thesis had approximately 90 % correctly labeled superpixels. The result was equal to, if not better than, other solutions available on the market.
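The superpixel-plus-classifier pipeline is straightforward to prototype. A minimal sketch using SLIC superpixels and a Random forest; the choice of SLIC, the per-superpixel statistics and all parameter values are illustrative, not the thesis's exact setup (requires scikit-image >= 0.19 for the channel_axis argument):

import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestClassifier

def superpixel_features(intensity, dsm, n_segments=500):
    # Segment the grayscale image and compute simple per-superpixel statistics.
    segments = slic(intensity, n_segments=n_segments, compactness=10,
                    channel_axis=None)
    feats = []
    for label in np.unique(segments):
        mask = segments == label
        feats.append([intensity[mask].mean(),   # appearance
                      dsm[mask].mean(),         # absolute height
                      dsm[mask].std()])         # height variation (trees vs. roofs)
    return segments, np.array(feats)

# segments, X = superpixel_features(gray_image, surface_model)
# clf = RandomForestClassifier().fit(X_train, y_train)  # labels: building/tree/ground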
@mastersthesis{diva2:767120,
author = {Ringqvist, Sanna},
title = {{Classification of terrain using superpixel segmentation and supervised learning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4752--SE}},
year = {2014},
address = {Sweden},
}
High resolution 3D images are of high interest in military operations, where data can be used to classify and identify targets. The Swedish defence research agency (FOI) is interested in the latest research and technologies in this area. A drawback with normal 3D laser systems is the lack of high resolution for long range measurements. One technique for high long-range-resolution laser radar is based on time correlated single photon counting (TCSPC). By repetitively sending out short laser pulses and measuring the time of flight (TOF) of single reflected photons, extremely accurate range measurements can be done. A drawback with this method is that it is hard to create single photon detectors with many pixels and high temporal resolution, hence a single detector is used. Scanning an entire scene with one detector is very time consuming; instead, which is the topic of this thesis, the entire scene can be measured with fewer measurements than the number of pixels. To do this a technique called compressed sensing (CS) is introduced. CS exploits the fact that signals normally are compressible and can be represented sparsely in some basis. CS sets other requirements on the sampling than the classical Shannon-Nyquist sampling theorem. With a digital micromirror device (DMD), linear combinations of the scene can be reflected onto the single photon detector, creating scalar intensity values as measurements. This means that fewer DMD patterns than the number of pixels can reconstruct the entire 3D scene. In this thesis a computer model of the laser system helps to evaluate different CS reconstruction methods with different scenarios of the laser system and the scene. The results show how many measurements are required to reconstruct scenes properly and how the DMD patterns affect the results. CS proves to enable a great reduction, 85-95 %, of the required measurements compared to a pixel-by-pixel scanning system. Total variation minimization proves to be the best choice of reconstruction method.
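The thesis finds total variation minimization to be the best reconstruction method; as a simpler illustration of CS recovery, the sketch below solves the related l1-regularized least-squares problem with ISTA. The matrix A stands in for the stack of DMD patterns, y for the scalar detector readings; lam and the iteration count are illustrative.

import numpy as np

def ista(A, y, lam=0.1, n_iter=200):
    # Iterative shrinkage-thresholding for min ||Ax - y||^2 + lam * ||x||_1.
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x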
@mastersthesis{diva2:722826,
author = {Fall, Erik},
title = {{Compressed Sensing for 3D Laser Radar}},
school = {Linköping University},
type = {{LiTH-ISY-EX---14/4767---SE}},
year = {2014},
address = {Sweden},
}
A lane position system and enhancement techniques for increasing the robustness and availability of such a system are investigated. The enhancements are performed by using additional sensor sources like map data and GPS. The thesis contains a description of the system, two models of the system and two implemented filters for the system. The thesis also contains conclusions and results of theoretical and experimental tests of the increased robustness and availability of the system. The system can be integrated with an existing system that investigates driver behavior, developed for fatigue detection. That system was developed in a project named Drowsi, where among others Volvo Technology participated.
@mastersthesis{diva2:749036,
author = {Landberg, Markus},
title = {{Enhancement Techniques for Lane Position Adaptation (Estimation) using GPS- and Map Data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4788--SE}},
year = {2014},
address = {Sweden},
}
In recent years several depth cameras have emerged on the consumer market, creating many interesting possibilities for both professional and recreational usage. One example of such a camera is the Microsoft Kinect sensor, originally used with the Microsoft Xbox 360 game console. In this master thesis a system is presented that utilizes this device in order to create as accurate a 3D reconstruction of an indoor environment as possible. The major novelty of the presented system is the data structure based on signed distance fields and voxel octrees used to represent the observed environment.
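The core update of a signed-distance representation is a per-voxel weighted running average of truncated distances. A minimal dense-grid sketch; the thesis's actual contribution, the voxel-octree organization, is omitted here, and voxel centres are assumed to be already transformed into the camera frame:

import numpy as np

def integrate_depth(tsdf, weight, depth, K, voxel_xyz, trunc=0.05):
    # Fuse one depth map into flat TSDF/weight arrays (one entry per voxel).
    z = voxel_xyz[:, 2]
    front = z > 1e-6
    proj = (K @ voxel_xyz[front].T).T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    h, w = depth.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(front)[inside]
    d = depth[v[inside], u[inside]]
    idx = idx[d > 0]                     # ignore missing depth readings
    sdf = np.clip(d[d > 0] - z[idx], -trunc, trunc)
    tsdf[idx] = (tsdf[idx] * weight[idx] + sdf) / (weight[idx] + 1)
    weight[idx] += 1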
@mastersthesis{diva2:716061,
author = {Bengtsson, Morgan},
title = {{Indoor 3D Mapping using Kinect}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4753--SE}},
year = {2014},
address = {Sweden},
}
The Next-Best-View (NBV) problem plays an important part in automatic 3D object reconstruction and exploration applications. This thesis presents a novel approach of ray-casting in Occupancy Grid Maps (OGM) in the context of solving the NBV problem in a 3D-exploration setting. The proposed approach utilizes the structure of an octree-based OGM to perform calculations of potential information gain. The computations are significantly faster than current methods, without decreasing mapping quality. Performance, both in terms of mapping quality, coverage and computational complexity, is experimentally verified through a comparison with existing state-of-the-art methods using high-resolution point cloud data generated using time-of-flight laser range scanners.
Current methods for viewpoint ranking focus heavily on either mapping performance or computation speed. The results presented in this thesis indicate that the proposed method is able to achieve a mapping performance similar to the performance-oriented approaches while maintaining the same low computational cost as the more approximate methods.
@mastersthesis{diva2:761834,
author = {Svensson, Martin},
title = {{Accelerated Volumetric Next-Best-View Planning in 3D Mapping}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4801--SE}},
year = {2014},
address = {Sweden},
}
Many methods have been developed for visual tracking of generic objects. The vast majority of these assume the world is two-dimensional, either ignoring the third dimension or only dealing with it indirectly. This causes difficulties for the tracker when the target approaches or moves away from the camera, is occluded or moves out of the camera frame.
Unmanned aerial vehicles (UAVs) are increasingly used in civilian applications and some of these will undoubtedly carry tracking systems in the future. As they move around, these trackers will encounter both scale changes and occlusions. To improve the tracking performance in these cases, the third dimension should be taken into account.
This thesis extends the capabilities of a 2D tracker to three dimensions, with the assumption that the target moves on a ground plane.
The position of the tracker camera is established by matching the video it produces to a sparse point-cloud map built with off-the-shelf structure-from-motion software. A target is tracked with a generic 2D tracker and subsequently positioned on the ground. Should the target disappear from view, its motion on the ground is predicted. In combination, these simple techniques are shown to improve the robustness of a tracking system on a moving platform under target scale changes and occlusions.
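Positioning the target on the ground amounts to intersecting the camera ray through one pixel (for example the bottom midpoint of the 2D tracker's bounding box) with the ground plane. A minimal sketch; the pose convention stated in the comments is an assumption:

import numpy as np

def ground_position(K, R, t, u, v, plane_z=0.0):
    # K: 3x3 intrinsics; (R, t) such that x_cam = R @ x_world + t.
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    ray_world = R.T @ ray_cam           # ray direction in world coordinates
    cam_center = -R.T @ t               # camera centre in world coordinates
    s = (plane_z - cam_center[2]) / ray_world[2]
    return cam_center + s * ray_world   # 3D point where the ray meets the ground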
@mastersthesis{diva2:761603,
author = {Robinson, Andreas},
title = {{Implementation and evaluation of a 3D tracker}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4800--SE}},
year = {2014},
address = {Sweden},
}
In this thesis we study the problem of multi-session dense RGB-D SLAM for 3D reconstruction. Multi-session reconstruction can allow users to capture parts of an object that could not easily be captured in one session, due for instance to poor accessibility or user mistakes. We first present a thorough overview of single-session dense RGB-D SLAM and describe the multi-session problem as a loosening of the incremental camera movement and static scene assumptions commonly held in the single-session case. We then implement and evaluate several variations on a system for doing two-session reconstruction as an extension to a single-session dense RGB-D SLAM system.
The extension from one to several sessions is divided into registering separate sessions into a single reference frame, re-optimizing the camera trajectories, and fusing together the data to generate a final 3D model. Registration is done by matching reconstructed models from the separate sessions using one of two adaptations of a 3D object detection pipeline. The registration pipelines are evaluated with many different sub-steps on a challenging dataset, and it is found that robust registration can be achieved using the proposed methods on scenes without degenerate shape symmetry. In particular we find that using plane matches between two sessions as constraints for as much as possible of the registration pipeline improves results.
Several different strategies for re-optimizing camera trajectories using data from both sessions are implemented and evaluated. The re-optimization strategies are based on re-tracking the camera poses from all sessions together, and then optionally optimizing over the full problem as represented on a pose graph. The camera tracking is done by incrementally building and tracking against a TSDF volume, from which a final 3D mesh model is extracted. The whole system is qualitatively evaluated against a realistic dataset for multi-session reconstruction. It is concluded that the overall approach is successful in reconstructing objects from several sessions, but that other fine-grained registration methods would be required in order to achieve multi-session reconstructions that are indistinguishable from single-session results in terms of reconstruction quality.
@mastersthesis{diva2:772448,
author = {Widebäck West, Nikolaus},
title = {{Multiple Session 3D Reconstruction using RGB-D Cameras}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4814--SE}},
year = {2014},
address = {Sweden},
}
Visual simultaneous localization and mapping (SLAM) as a field has been researched for ten years, but with recent advances in mobile performance visual SLAM is entering the consumer market in a completely new way. A visual SLAM system will however be sensitive to incautious use: severe motion, occlusion or surroundings poor in visual features may cause the system to fail temporarily. The procedure of recovering from such a failure is called relocalization. Together with two similar problems, localization (finding your position in an existing SLAM session) and loop closing (the online repair and refinement of the map in an active SLAM session), these can be grouped as visual location recognition (VLR).
This thesis presents novel results by combining the scalability of FabMap and the precision of 13th Lab's tracking, yielding high-precision VLR (+/- 10 cm) while maintaining above 99 % precision and 60 % recall for sessions containing thousands of images, all running purely on a normal mobile phone.
The applications of VLR are many. Indoors, where GPS is not functioning, VLR can still provide positional information and navigate you through big complexes like airports and museums. Outdoors, VLR can improve the precision of GPS tenfold yielding a new level of navigational experience. Virtual and augmented reality applications are other areas that benefit from improved positioning and localization.
@mastersthesis{diva2:767444,
author = {Sjöholm, Alexander},
title = {{Closing the Loop:
Mobile Visual Location Recognition}},
school = {Linköping University},
type = {{LiTH-ISY-EX--14/4813--SE}},
year = {2014},
address = {Sweden},
}
Computer vision is a rapidly growing, interdisciplinary research field whose applications take an increasingly prominent role in today's society. With an increased interest in computer vision, the need to control cameras connected to computer vision systems also grows.
At the Computer Vision Laboratory at Linköping University, the framework EDSDK++ has been developed to remotely control digital cameras manufactured by Canon Inc. The framework is very extensive and contains a large number of functions and settings. The system is therefore still largely untested. This thesis aims to develop a demonstrator for EDSDK++ in the form of a simple active vision system that uses real-time face detection to steer a camera tilt unit, and a camera mounted on the tilt unit, to follow, zoom in on and focus on a face or a group of faces. One requirement was that the OpenCV library be used for the face detection and that EDSDK++ be used to control the camera. In addition, an API for controlling the camera tilt unit was to be developed.
During the development work, different methods for face detection were investigated, among other things. To improve performance, multiple face detectors were used, scanning an image in parallel from different angles with the help of multithreading. Both experimental and theoretical approaches were taken to determine the parameters needed to control the camera and the camera tilt unit. The result of the work was a demonstrator that fulfilled all requirements.
@mastersthesis{diva2:722871,
author = {Karg\'{e}n, Rolf},
title = {{Utveckling av ett active vision system för demonstration av EDSDK++ i tillämpningar inom datorseende}},
school = {Linköping University},
type = {{LiTH-ISY-EX-ET--14/0419--SE}},
year = {2014},
address = {Sweden},
}
Recording a video sequence with a camera during movement often produces blurred results. This is mainly due to motion blur which is caused by rapid movement of objects in the scene or the camera during recording. By correcting for changes in the orientation of the camera, caused by e.g. uneven terrain, it is possible to minimize the motion blur and thus, produce a stabilized video.
In order to do this, data gathered from a gyroscope and the camera itself can be used to measure the orientation of the camera. The raw data needs to be processed, synchronized and filtered to produce a robust estimate of the orientation. This estimate can then be used as input to some automatic control system in order to correct for changes in the orientation.
This thesis focuses on examining the possibility of such a stabilization. The actual stabilization is left for future work. An evaluation of the hardware as well as the implemented methods is done with emphasis on speed, which is crucial in real-time computing.
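The thesis does not prescribe a particular fusion filter; one common lightweight choice for combining a fast but drifting gyroscope with slower absolute measurements is a complementary filter. A per-axis 1D sketch, assuming the two streams are already synchronized and resampled to a common rate (alpha is a tuning parameter):

import numpy as np

def fuse_orientation(gyro_rate, camera_angle, dt, alpha=0.98):
    # gyro_rate: angular rate [rad/s]; camera_angle: absolute angle estimates [rad].
    angle = np.zeros(len(gyro_rate))
    angle[0] = camera_angle[0]
    for k in range(1, len(gyro_rate)):
        predicted = angle[k - 1] + gyro_rate[k] * dt                  # fast but drifts
        angle[k] = alpha * predicted + (1 - alpha) * camera_angle[k]  # slow but stable
    return angle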
@mastersthesis{diva2:656064,
author = {Gratorp, Eric},
title = {{Evaluation of online hardware video stabilization on a moving platform}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4723--SE}},
year = {2013},
address = {Sweden},
}
Modern day cars are often equipped with a vision system that collects information about the car and its surroundings. Camera calibration is extremely important in order to maintain high accuracy in automotive safety applications. The cameras are calibrated offline in the factory; however, the mounting of the camera may change slowly over time. If the angles of the actual mounting of the camera are known, compensation for the angles can be done in software. Therefore, online calibration is desirable.
This master's thesis describes how to dynamically calibrate the roll angle. Two different methods have been implemented and compared. The first detects vertical edges in the image, such as houses and lamp posts. The second method detects license plates on other cars in front of the camera in order to calculate the roll angle.
The two methods are evaluated and the results are discussed. The results of the two methods vary considerably, and the method that turned out to give the best results was the one that detects vertical edges.
@mastersthesis{diva2:630415,
author = {de Laval, Astrid},
title = {{Online Calibration of Camera Roll Angle}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4688--SE}},
year = {2013},
address = {Sweden},
}
A laser triangulating camera system projects a laser line onto an object to create height curves on the object surface. By moving the object, height curves from different parts of the object can be observed and combined to produce a three-dimensional representation of the object. The calibration of such a camera system involves transforming received data to get real-world measurements instead of pixel-based measurements.
The calibration method presented in this thesis focuses specifically on small fields of view. The goal is to provide an easy-to-use and robust calibration method that can complement already existing calibration methods. The tool should get as good measurements in metric units as possible, while still keeping complexity and production costs of the calibration object low. The implementation uses only data from the laser plane itself, making it usable also in environments where no external light exists.
The proposed implementation utilises a complete scan of a three-dimensional calibration object and returns a calibration for three dimensions. The results of the calibration have been evaluated against synthetic and real data.
@mastersthesis{diva2:630377,
author = {Rydström, Daniel},
title = {{Calibration of Laser Triangulating Cameras in Small Fields of View}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4669--SE}},
year = {2013},
address = {Sweden},
}
In certain industries, quality testing is crucial to make sure that the components being manufactured do not contain any defects. One method to detect these defects is to heat the specimen being inspected and then to study the cooling process using infrared thermography. The exploration of non-destructive testing using thermography is at an early stage, and therefore the purpose of this thesis is to analyse some of the existing techniques and to propose improvements.
A test specimen containing several different defects was designed specifically for this thesis. A flash lamp was used to heat the specimen and a high-speed infrared camera was used to study both the spatial and temporal features of the cooling process. An algorithm was implemented to detect anomalies and different parameter settings were evaluated. The results show that the proposed method is successful at finding the sought defects, and also outperforms one of the older methods.
@mastersthesis{diva2:610166,
author = {Höglund, Kristofer},
title = {{Non-destructive Testing Using Thermographic Image Processing}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4655--SE}},
year = {2013},
address = {Sweden},
}
This thesis project has investigated whether it is possible to compare photographs of the seabed, taken with a camera mounted on SAAB Dynamics' vehicle AUV-62 (here called Sapphires), with SONAR images captured from the same vehicle on a different occasion. Objects imaged with a camera and with side-scan SONARs do not normally share visual appearance and are therefore difficult to compare. For this reason, the method chosen for comparing the camera and SONAR images is based not on the individual appearance of objects but on patterns created by several objects. Objects in the images are identified and described by a position in longitude and latitude and a radius. In the camera images, objects are identified by segmenting the images using MSER, where stones and other objects have an appearance deviating from the background of sand. In the SONAR image, areas containing objects are identified by looking for high-intensity echo responses, corresponding to objects that reflected the sound pulses well; the objects are then created by applying MSER to these areas. The two sets of objects, from the camera and SONAR images, are then compared by matching every object in the camera image against every object in the SONAR image: each pair is translated under the hypothesis that they are the same object, and the number of remaining objects consistent with that assumption is counted.
@mastersthesis{diva2:680896,
author = {Ekblad, Richard},
title = {{Korrelering mellan optiskt och akustiskt avbildade objekt på havsbotten}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4742--SE}},
year = {2013},
address = {Sweden},
}
A fully automatic de-weathering system to increase the visibility and stability in surveillance applications during bad weather has been developed. Rain, snow and haze during daylight are handled in real time with acceleration from CUDA-implemented algorithms. Video from fixed cameras is processed on a PC with no need for special hardware except an NVidia GPU. The system does not use any background model and does not require any precalibration. An increase in contrast is obtained in all haze/rain/snow cases, while the system lags by at most one frame during rain or snow removal. De-hazing can be applied at any distance to simplify tracking or other algorithms operating on a surveillance system.
@mastersthesis{diva2:647937,
author = {Pettersson, Niklas},
title = {{GPU-Accelerated Real-Time Surveillance De-Weathering}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4677--SE}},
year = {2013},
address = {Sweden},
}
In Sweden and many other northern countries, it is common for heat to be distributed to homes and industries through district heating networks. Such networks consist of pipes buried underground carrying hot water or steam with temperatures in the range of 90-150 °C. Due to bad insulation or cracks, heat or water leakages might appear.
A system for large-scale monitoring of district heating networks through remote thermography has been developed and is in use at the company Termisk Systemteknik AB. Infrared images are captured from an aircraft and analysed, finding and indicating the areas for which the ground temperature is higher than normal. During the analysis there are, however, many other warm areas than true water or energy leakages that are marked as detections. Objects or phenomena that can cause false alarms are those that, for some reason, are warmer than their surroundings, for example chimneys, cars and heat leakages from buildings.
During the last couple of years, the system has been used in a number of cities. Therefore, there exists a fair amount of examples of different types of detections. The purpose of the present master’s thesis is to evaluate the reduction of false alarms of the existing analysis that can be achieved with the use of a learning system, i.e. a system which can learn how to recognize different types of detections.
A labelled data set for training and testing was acquired by contact with customers. Furthermore, a number of features describing the intensity difference within the detection, its shape and propagation as well as proximity information were found, implemented and evaluated. Finally, four different classifiers and other methods for classification were evaluated.
The method that obtained the best results consists of two steps. In the initial step, all detections which lie on top of a building are removed from the data set of labelled detections. The second step consists of classification using a Random forest classifier. Using this two-step method, the number of false alarms is reduced by 43% while the percentage of water and energy detections correctly classified is 99%.
@mastersthesis{diva2:640093,
author = {Berg, Amanda},
title = {{Classification of leakage detections acquired by airborne thermography of district heating networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4678--SE}},
year = {2013},
address = {Sweden},
}
In factory automation, cameras and image processing algorithms can be used to inspect objects. This can decrease the number of faulty objects that leave the factory and reduce the manual labour needed. A vision sensor is a system where camera and image processing are delivered together and that only needs to be configured for the application it is to be used for; thus no programming knowledge is needed by the customer. In this master's thesis a way to make the configuration of a vision sensor even easier is developed and evaluated.
The idea is that the customer knows his or her product much better than he or she knows image processing. The customer could take images of positive and negative samples of the object that is to be inspected. The algorithm should then, given these images, configure the vision sensor automatically.
The algorithm that is developed to solve this problem is described step by step with examples to illustrate the problems that needed to be solved. Much of the focus is on how to compare two configurations to each other, in order to find the best one. The resulting configuration from the algorithm is then evaluated with respect to types of applications, computation time and representativeness of the input images.
@mastersthesis{diva2:624443,
author = {Ollesson, Niklas},
title = {{Automatic Configuration of Vision Sensor}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4666--SE}},
year = {2013},
address = {Sweden},
}
Identification of individuals has been solved with many different solutions around the world, either using biometric data or external means of verification such as id cards or RFID tags. The advantage of using biometric measurements is that they are directly tied to the individual and are usually unalterable. Acquiring dependable measurements is however challenging when the individuals are uncooperative. A dependable system should be able to deal with this and produce reliable identifications.
The system proposed in this thesis can autonomously classify uncooperative specimens from depth data. The data is acquired from a depth camera mounted in an uncontrolled environment, where it was allowed to continuously record for two weeks. This requires stable data extraction and normalization algorithms to produce good representations of the specimens. Robust descriptors can therefore be extracted from each sample of a specimen and, together with different classification algorithms, the system can be trained or validated. Even with as many as 138 different classes the system achieves high recognition rates. Inspired by the research field of face recognition, the best classification algorithm, the method of fisherfaces, was able to accurately recognize 99.6% of the validation samples, followed by two variations of the method of eigenfaces, achieving recognition rates of 98.8% and 97.9%. These results affirm that the capabilities of the system are adequate for a commercial implementation.
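The fisherfaces recipe, PCA for dimensionality reduction followed by LDA in the reduced space, is compact to express with standard tools. A minimal sketch, not the thesis's implementation; the PCA dimension cap of 150 is illustrative:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

def fisherfaces(n_classes, n_samples):
    # PCA keeps the within-class scatter matrix non-singular before LDA.
    return make_pipeline(
        PCA(n_components=min(n_samples - n_classes, 150)),
        LinearDiscriminantAnalysis(),
    )

# X: (n_samples, n_pixels) flattened, normalized depth images; y: identity labels.
# model = fisherfaces(138, len(X_train)).fit(X_train, y_train)
# accuracy = model.score(X_val, y_val)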
@mastersthesis{diva2:635227,
author = {Björkeson, Felix},
title = {{Autonomous Morphometrics using Depth Cameras for Object Classification and Identification}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4680--SE}},
year = {2013},
address = {Sweden},
}
In most cases today when a specific person's whereabouts are monitored through video surveillance, it is done manually, and his or her location when not seen is based on assumptions about how fast he or she can move. Since humans are good at recognizing people this can be done accurately, given good video data, but the time needed to go through all the data is extensive and therefore expensive. Because of rapid technical development, computers are getting cheaper to use and therefore more interesting to use for tedious work.
This thesis is a part of a larger project that aims to see to what extent it is possible to estimate a person of interest's time dependent 3D position, when seen in surveillance videos. The surveillance videos are recorded with non overlapping monocular cameras. Furthermore the project aims to see if the person of interest's movement, when position data is unavailable, could be predicted. The outcome of the project is a software capable of following a person of interest's movement with an error estimate visualized as an area indicating where the person of interest might be at a specific time.
The main focus of this thesis is to implement and evaluate a people detector meant to be used in the project, reduce noise in the position measurements, predict the position when the person of interest's location is unknown, and evaluate the complete project.
The project combines known methods in computer vision and signal processing, and the outcome is a software that can be used on a normal PC running a Windows operating system. The software implemented in the thesis uses a Hough transform based people detector and a Kalman filter for one-step-ahead prediction. The detector is evaluated with known methods such as miss rate vs. false positives per window or image (FPPW and FPPI respectively) and recall vs. 1-precision.
The results indicate that it is possible to estimate a person of interest's 3D position with single monocular cameras. It is also possible to follow the movement, to some extent, where position data are unavailable. However, the software needs more work in order to be robust enough to handle the diversity that may appear in different environments and to handle large-scale sensor networks.
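The one-step-ahead prediction can be sketched with a standard constant-velocity Kalman filter; the state, frame rate and noise levels below are assumptions, not the thesis's tuned values. When no detection is available, only the predict step runs, and the growing covariance P corresponds to the widening area in which the person might be.

import numpy as np

dt = 1.0 / 25.0                                    # assumed frame rate
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
              [0, 0, 1, 0], [0, 0, 0, 1]], float)  # state: [x, y, vx, vy]
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)  # we observe position only
Q = 0.01 * np.eye(4)                               # process noise (tuning)
R = 0.5 * np.eye(2)                                # measurement noise (tuning)

def kalman_step(x, P, z=None):
    # One-step-ahead prediction; update only when a detection z = [x, y] exists.
    x, P = F @ x, F @ P @ F.T + Q
    if z is not None:
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
    return x, P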
@mastersthesis{diva2:652387,
author = {Markström, Johannes},
title = {{3D Position Estimation of a Person of Interest in Multiple Video Sequences:
People Detection}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4721--SE}},
year = {2013},
address = {Sweden},
}
Because of the increase in the number of security cameras, there is more video footage available than a human could efficiently process. In combination with the fact that computers are getting more efficient, it is getting more and more interesting to solve the problem of detecting and recognizing people automatically.
Therefore a method is proposed for estimating the 3D path of a person of interest in multiple, non-overlapping, monocular cameras. This project is a collaboration between two master's theses. This thesis will focus on recognizing a person of interest from several possible candidates, as well as estimating the 3D position of a person and providing a graphical user interface for the system. The recognition of the person of interest includes keeping track of said person frame by frame, and identifying said person in video sequences where the person of interest has not been seen before.
The final product is able to both detect and recognize people in video, as well as estimate their 3D position relative to the camera. The product is modular and any part can be improved or changed completely without changing the rest of the product. This results in a highly versatile product which can be tailored for any given situation.
@mastersthesis{diva2:650889,
author = {Johansson, Victor},
title = {{3D Position Estimation of a Person of Interest in Multiple Video Sequences:
Person of Interest Recognition}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4718--SE}},
year = {2013},
address = {Sweden},
}
Automatic tracking of an object of interest in a video sequence is a task that has been much researched. Difficulties include varying scale of the object, rotation, and object appearance changing over time, which lead to tracking failures. Tracking methods such as short-term tracking often fail if the object moves out of the camera's field of view or changes shape rapidly. Also, small inaccuracies in the tracking method can accumulate over time, which can lead to tracking drift. Long-term tracking is also problematic, partly due to updating and degradation of the object model, leading to incorrectly classified and tracked objects.
This master's thesis implements a long-term tracking framework called Tracking-Learning-Detection which can learn and adapt, using so-called P/N-learning, to changing object appearance over time, thus making it more robust to tracking failures. The framework consists of three parts: a tracking module which follows the object from frame to frame, a learning module that learns new appearances of the object, and a detection module which can detect learned appearances of the object and correct the tracking module if necessary.
This tracking framework is evaluated on thermal infrared videos and the results are compared to the results obtained from videos captured within the visible spectrum. Several important differences between visual and thermal infrared tracking are presented, and the effect these have on the tracking performance is evaluated.
In conclusion, the results are analyzed to evaluate which differences matter the most and how they affect tracking, and a number of different ways to improve the tracking are proposed.
@mastersthesis{diva2:627964,
author = {Stigson, Magnus},
title = {{Object Tracking Using Tracking-Learning-Detection inThermal Infrared Video}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4668--SE}},
year = {2013},
address = {Sweden},
}
Visual tracking is a classical computer vision problem with many important applications in areas such as robotics, surveillance and driver assistance. The task is to follow a target in an image sequence. The target can be any object of interest, for example a human, a car or a football. Humans perform accurate visual tracking with little effort, while it remains a difficult computer vision problem. It imposes major challenges, such as appearance changes, occlusions and background clutter. Visual tracking is thus an open research topic, but significant progress has been made in the last few years.
The first part of this thesis explores generic tracking, where nothing is known about the target except for its initial location in the sequence. A specific family of generic trackers that exploit the FFT for faster tracking-by-detection is studied. Among these, the CSK tracker has recently been shown to obtain competitive performance at extraordinarily low computational cost. Three contributions are made to this type of trackers. Firstly, a new method for learning the target appearance is proposed and shown to outperform the original method. Secondly, different color descriptors are investigated for the tracking purpose. Evaluations show that the best descriptor greatly improves the tracking performance. Thirdly, an adaptive dimensionality reduction technique is proposed, which adaptively chooses the most important feature combinations to use. This technique significantly reduces the computational cost of the tracking task. Extensive evaluations show that the proposed tracker outperforms state-of-the-art methods in the literature, while operating at several times higher frame rate.
In the second part of this thesis, the proposed generic tracking method is applied to human tracking in surveillance applications. A causal framework is constructed, that automatically detects and tracks humans in the scene. The system fuses information from generic tracking and state-of-the-art object detection in a Bayesian filtering framework. In addition, the system incorporates the identification and tracking of specific human parts to achieve better robustness and performance. Tracking results are demonstrated on a real-world benchmark sequence.
@mastersthesis{diva2:709327,
author = {Danelljan, Martin},
title = {{Visual Tracking}},
school = {Linköping University},
type = {{LiTH-ISY-EX--13/4736--SE}},
year = {2013},
address = {Sweden},
}
Functional magnetic resonance imaging (fMRI) is one of the best techniques for neuroimaging and has revolutionized the way we understand brain function. It measures the changes in the blood oxygen level-dependent (BOLD) signal, which is related to neuronal activity. The complexity of the data, the presence of different types of noise and the massive amount of data make fMRI data analysis challenging, and it demands efficient signal processing and statistical analysis methods. The results of the analysis are used by physicians, neurologists and researchers for a better understanding of brain function.
The purpose of this study is to design a toolbox for fMRI data analysis. It includes methods to detect brain activity maps, estimate the hemodynamic response (HDR) and analyse the connectivity of brain structures. The toolbox provides methods for detection of activated brain regions measured with a Bayesian estimator. Results are compared with conventional methods such as the t-test, ordinary least squares (OLS) and weighted least squares (WLS). Brain activation and HDR are estimated with a linear adaptive model and a nonlinear method based on a radial basis function (RBF) neural network. A nonlinear autoregressive with exogenous inputs (NARX) neural network is developed to model the dynamics of the fMRI data. The toolbox also provides methods for brain connectivity analysis, such as functional connectivity and effective connectivity. These methods are examined on simulated and real fMRI datasets.
@mastersthesis{diva2:551505,
author = {Budde, Kiran Kumar},
title = {{A Matlab Toolbox for fMRI Data Analysis: Detection, Estimation and Brain Connectivity}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4600--SE}},
year = {2012},
address = {Sweden},
}
The introduction of dual energy CT (DECT) in the field of medical healthcare has made it possible to extract more information from the scanned objects. This in turn has the potential to improve the accuracy in radiation therapy dose planning. One problem that remains before successful material decomposition can be achieved, however, is the presence of beam hardening and scatter artifacts that arise in a scan. Methods currently in clinical use for removal of beam hardening often bias the CT numbers. Hence, the possibility of an appropriate tissue decomposition is limited.
Here a method for successful decomposition as well as removal of the beam hardening artifact is presented. The method uses effective linear attenuations for five base materials (water, protein, adipose, cortical bone and marrow) to perform the decomposition on reconstructed simulated data. This is performed inside an iterative loop together with the polychromatic x-ray spectra to remove the beam hardening.
@mastersthesis{diva2:549562,
author = {Grandell, Oscar},
title = {{An iterative reconstruction algorithm for quantitative tissue decomposition using DECT}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4617--SE}},
year = {2012},
address = {Sweden},
}
Autonomous vehicles have many application possibilities within many different fields, such as rescue missions, exploration of foreign environments and unmanned vehicles. For such a system to navigate in a safe manner, high requirements on reliability and security must be fulfilled.
This master's thesis explores the possibility of using a convolutional network on a robotic platform for autonomous path following. The only input used to predict the steering signal is a monochromatic image taken by a camera mounted on the robotic car, pointing in the steering direction. The convolutional network learns from demonstrations in a supervised manner.
In this thesis three different preprocessing options are evaluated. The evaluation is based on the quadratic error and the number of correctly predicted classes. The results show that the convolutional network has no problem learning a correct behaviour and scores good results when evaluated on data similar to what it has been trained on. The results also show that the evaluated preprocessing options are not enough to make the system independent of the environment.
@mastersthesis{diva2:534610,
author = {Schmiterlöw, Maria},
title = {{Autonomous Path Following Using Convolutional Networks}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4577--SE}},
year = {2012},
address = {Sweden},
}
This is a master thesis of the Master of Science degree program in Applied Physics and Electrical Engineering at Linköping University. The goal of this thesis is to find out how the Microsoft Kinect can be used as a part of a camera rig to create accurate 3D models of an indoor environment. The Microsoft Kinect is marketed as a touch-free game controller for the Microsoft Xbox 360 game console. The Kinect contains a color camera and a depth camera. The depth camera works by constantly projecting a near infrared dot pattern that is observed with a near infrared camera. This thesis describes how to model the near infrared projector pattern to enable external near infrared cameras to be used to improve the measurement precision. The depth data that the Kinect outputs has been studied to determine what types of errors it contains. The finding was that the Kinect uses an online calibration algorithm that changes the depth data.
@mastersthesis{diva2:566581,
author = {Nordmark, Anton},
title = {{Kinect 3D Mapping}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4636--SE}},
year = {2012},
address = {Sweden},
}
In this master thesis a visual odometry system is implemented and explained. Visual odometry is a technique that can be used on autonomous vehicles to determine their current position, and it is preferably used indoors where GPS does not work. The only input to the system is the images from a stereo camera, and the output is the current location given as a relative position.
In the C++ implementation, image features are found and matched between the stereo images and the previous stereo pair, which gives 150-250 verified feature matches. The image coordinates are triangulated into a 3D point cloud. The distance between two subsequent point clouds is minimized with respect to rigid transformations, which gives the motion described with six parameters, three for the translation and three for the rotation.
Noise in the image coordinates gives reconstruction errors which make the motion estimation very sensitive. The results from six experiments show that the weakness of the system is the ability to distinguish rotations from translations. However, if the system has additional knowledge of how it is moving, the minimization can be done with only three parameters and the system can estimate its position with less than 5 % error.
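With known correspondences, the rigid transformation minimizing the distance between two point clouds has a closed-form SVD solution, often called the Kabsch algorithm. A minimal sketch of that step (not necessarily the solver used in the thesis):

import numpy as np

def rigid_align(P, Q):
    # Least-squares rotation R and translation t such that R @ P + t ~ Q.
    # P, Q: (N, 3) matched 3D points from consecutive stereo frames.
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                           # proper rotation, det(R) = +1
    t = cQ - R @ cP
    return R, t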
@mastersthesis{diva2:550998,
author = {Johansson, Fredrik},
title = {{Visual Stereo Odometry for Indoor Positioning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--12/4621--SE}},
year = {2012},
address = {Sweden},
}
This is a master thesis of the Master of Science degree program in Applied Physics and Electrical Engineering (Y) at Linköping University. The goal of the project is to develop an application for creating a map in real time from a video camera on a miniature unmanned aerial vehicle. This thesis project and report is a first exploratory study for this application. It implements a prototype method and evaluates it on sample sequences from an on-board video camera. The method first looks for good points to follow in the image and then tracks them in a sequence. The image is then pasted, or merged, together with previous images so that points from the different images align.
Two methods to find good points to follow are examined, with focus on real-time performance. The result is that the much faster FAST detector yielded results good enough to replace the slower, standard Harris-Stephens corner detector.
It is also examined whether it is possible to assume that the ground is a flat surface in this application, or if a computationally more expensive method estimating altitude information has to be used. The result is that at high altitudes, or when the ground is close to flat in reality and the camera points straight downwards, a two-dimensional method will do. If flying lower or with high objects in the picture, which is often the case in this application, it must be taken into account that the points really are at different heights; hence the ground cannot be assumed to be flat.
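The FAST-versus-Harris comparison is easy to reproduce with OpenCV's stock implementations. The sketch below only illustrates the kind of timing comparison made; the threshold and corner-count parameters are chosen arbitrarily.

import time
import cv2

def compare_detectors(gray):
    # Time FAST against Harris (via goodFeaturesToTrack) on one grayscale frame.
    fast = cv2.FastFeatureDetector_create(threshold=20)
    t0 = time.perf_counter()
    kp_fast = fast.detect(gray, None)
    t1 = time.perf_counter()
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=500, qualityLevel=0.01,
                                      minDistance=5, useHarrisDetector=True)
    t2 = time.perf_counter()
    n_harris = 0 if corners is None else len(corners)
    print(f"FAST: {len(kp_fast)} points in {t1 - t0:.4f} s; "
          f"Harris: {n_harris} points in {t2 - t1:.4f} s")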
@mastersthesis{diva2:514063,
author = {Wolkesson, Henrik},
title = {{Realtime Mosaicing of Video Stream from $\mu$UAV}},
school = {Linköping University},
type = {{LiTH-ISY-EX--07/4140--SE}},
year = {2012},
address = {Sweden},
}
In today's industry 3D cameras are often used to inspect products. The camera produces both a 3D model and an intensity image by capturing a series of profiles of the object using laser triangulation. In many of these setups a physical encoder is attached to, for example, the conveyor belt that the product is travelling on. The encoder is used to get an accurate reading of the speed that the product has when it passes through the laser. Without this, the output image from the camera can be distorted due to a variation in velocity.
In this master thesis a method for integrating the functionality of this physical encoder into the software of the camera is proposed. The object is scanned together with a pattern; with the help of this pattern, the object can be restored to its original proportions.
@mastersthesis{diva2:455669,
author = {Josefsson, Mattias},
title = {{3D camera with built-in velocity measurement}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4523--SE}},
year = {2011},
address = {Sweden},
}
When patients move during an MRI examination, severe artifacts arise in the reconstructed image, and motion correction is therefore often desired. An in-plane motion correction algorithm suitable for PRESTO-CAN, a new 3D functional MRI method where the sampling of k-space is radial in the kx- and kz-directions and Cartesian in the ky-direction, was implemented in this thesis work.
Rotation and translation movements can be estimated and corrected for separately, since the magnitude of the data is only affected by the rotation. The data were sampled in a radial pattern and the rotation was estimated by finding the translation in the angular direction using circular correlation. Correlation was also used when finding the translations in the x- and z-directions.
The motion correction algorithm was evaluated on computer simulated data, the motion was detected and corrected for, and this resulted in images with greatly reduced artifacts due to patient movements.
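The circular correlation at the core of the rotation estimate is efficiently computed via the FFT: the correlation peak gives the cyclic shift between two angular profiles, which corresponds to the rotation. A minimal sketch of that building block, not the full correction pipeline:

import numpy as np

def circular_shift_estimate(a, b):
    # Estimate the cyclic shift s such that a[n] ~ b[n - s], via FFT correlation.
    corr = np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b)))
    shift = int(np.argmax(np.abs(corr)))
    n = len(a)
    return shift if shift <= n // 2 else shift - n  # signed shift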
@mastersthesis{diva2:456354,
author = {Karlsson, Anette},
title = {{In-Plane Motion Correction in Reconstruction of non-Cartesian 3D-functional MRI}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4480--SE}},
year = {2011},
address = {Sweden},
}
This report has investigated the possibility of automatically identifying ditches from airborne LiDAR data. The chosen identification method first creates a height image from the LiDAR data. It then extracts ditch candidates by vectorizing the result of a line detection. The properties of the ditch candidates are then computed through an analysis of height profiles for each individual candidate, where the height profiles are created from the original data. By filtering the candidates according to their properties, ditch maps with user-specified ditch dimensions can be presented in a vector format that facilitates further use. The report describes how the algorithm has been implemented and also presents example results. After an analysis of the algorithm and suggestions for improvements, the most important conclusion of the report is presented: automatic detection of ditches is possible.
@mastersthesis{diva2:456702,
author = {Wasell, Richard},
title = {{Automatisk detektering av diken i LiDAR-data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4524--SE}},
year = {2011},
address = {Sweden},
}
Today, 3D models of cities are created from aerial images using a camera rig. Images, together with sensor data from the flights, are stored for further processing when building 3D models. However, there is a market demand for a more mobile solution of satisfactory quality. If the camera position can be calculated for each image, there is an existing algorithm available for the creation of 3D models.
This master thesis project aims to investigate whether the iPhone 4 offers image and sensor data of good enough quality from which 3D models can be created. Calculations of movements and rotations from sensor data form the foundation of the image processing and should refine the camera position estimates.
The 3D models are built from image processing only, since the sensor data cannot be used due to poor accuracy. Because of that, the scale of the 3D models is unknown, and a measurement of the real objects is needed to make scaling possible. Compared to a test algorithm that calculates 3D models from images only, already available in SBD's system, the quality of the 3D models in this master thesis project is, judged by the human eye, almost the same or in some respects even better.
@mastersthesis{diva2:452945,
author = {Lundqvist, Tobias},
title = {{3D mapping with iPhone}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4517--SE}},
year = {2011},
address = {Sweden},
}
In this master thesis the possibility of detecting and tracking objects in multispectral infrared video sequences is investigated. The current method, using fixed-size rectangles, has significant disadvantages. These disadvantages are addressed by using image segmentation to estimate the shape of the object. The result of the image segmentation is used to determine the infrared contrast of the object. Our results show that some objects give very good segmentation, tracking and shape detection. The objects that perform best are the flares and countermeasures. In particular, helicopters seen from the side, with significant movement, are better detected with our method. The motion of the object is very important, since movement is the main component in successful shape detection; this is because helicopters are much colder than flares and engines. Detecting the presence and position of moving objects is easier and can be done quite successfully even for helicopters. Using structure tensors, we can also detect the presence and estimate the position of stationary objects.
@mastersthesis{diva2:415941,
author = {Möller, Sebastian},
title = {{Image Segmentation and Target Tracking using Computer Vision}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4424--SE}},
year = {2011},
address = {Sweden},
}
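The abstract above mentions using structure tensors to detect stationary objects. Below is a minimal, generic structure-tensor sketch (the standard formulation, not the thesis code); the smoothing scale `sigma` is an assumed parameter.

```python
# Structure tensor: smoothed outer products of image gradients. The smaller
# eigenvalue is large where the local signal varies in two directions, which
# can indicate an object even when it is stationary.
import numpy as np
from scipy.ndimage import sobel, gaussian_filter

def structure_tensor_response(img, sigma=2.0):
    ix = sobel(img.astype(float), axis=1)
    iy = sobel(img.astype(float), axis=0)
    jxx = gaussian_filter(ix * ix, sigma)
    jxy = gaussian_filter(ix * iy, sigma)
    jyy = gaussian_filter(iy * iy, sigma)
    # Smaller eigenvalue of [[jxx, jxy], [jxy, jyy]] per pixel.
    tr, det = jxx + jyy, jxx * jyy - jxy * jxy
    return tr / 2 - np.sqrt(np.maximum(tr**2 / 4 - det, 0))
```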
A common computer vision task is navigation and mapping. Many indoor navigation tasks require depth knowledge of flat, unstructured surfaces (walls, floor, ceiling). With passive illumination only, this is an ill-posed problem. Inspired by small children using a torchlight, we use a spotlight for active illumination. Using our torchlight approach, depth and orientation estimation of unstructured, flat surfaces boils down to estimation of ellipse parameters. The extraction of ellipses is very robust and requires little computational effort.
@techreport{diva2:650756,
author = {Felsberg, Michael and Larsson, Fredrik and Wang, Han and Ynnerman, Anders and Schön, Thomas},
title = {{Torchlight Navigation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2011},
type = {Other academic},
number = {LiTH-ISY-R, 3004},
address = {Sweden},
}
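Since the torchlight report above reduces depth and orientation estimation to ellipse parameter extraction, here is a minimal sketch of extracting those parameters with OpenCV; the threshold value is an assumption.

```python
# Threshold the spotlight, take the largest contour, and fit an ellipse.
# fitEllipse returns center, axis lengths and orientation, the parameters
# from which surface depth and orientation would then be derived.
import cv2

def spot_ellipse(gray):
    _, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    spot = max(contours, key=cv2.contourArea)   # needs at least 5 points
    (cx, cy), (major, minor), angle = cv2.fitEllipse(spot)
    return (cx, cy), (major, minor), angle
```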
Medical imaging is an important tool for diagnosis and treatment planning today. However as the demand for efficiency increases at the same time as the data volumes grow immensely, the need for computer assisted analysis, such as image segmentation, to help and guide the practitioner increases.
Medical image segmentation can be used for various tasks; the localization and delineation of pathologies such as cancer tumors is just one example. Numerous problems with noise and image artifacts in the generated images make segmentation a difficult task, and the developer is often forced to choose between speed and performance. In clinical practice, however, this is impossible, as both speed and performance are crucial. One solution to this problem might be to involve the user more in the segmentation, using interactive algorithms where the user can influence the segmentation for an improved result.
This thesis has concentrated on finding a fast and interactive segmentation method for liver tumor segmentation. Various methods were explored, and a few were chosen for implementation and further development. Two methods appeared to be the most promising: Bayesian Region Growing (BRG) and Level Set.
An interactive Level Set algorithm emerged as the best alternative for the interactive part, and could be used in combination with both BRG and Level Set. A new data term based on a probability model instead of image edges was also explored for the Level Set method, and proved more promising than the original one. The probability-based Level Set and the BRG method both provided good quality results, but the faster of the two was the BRG method, which could segment a tumor present in 25 CT image slices in less than 10 seconds when implemented in Matlab and mex-C++ code on an ACPI x64-based PC with two 2.4 GHz Intel(R) Core(TM)2 CPUs and 8 GB of RAM. The interactive Level Set could be successfully used as an interactive addition to the automatic method, but its usefulness was somewhat reduced by its slow processing time (~1.5 s/slice) and the relative complexity of the required user interactions.
@mastersthesis{diva2:438557,
author = {Thomasson, Viola},
title = {{Liver Tumor Segmentation Using Level Sets and Region Growing}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4485--SE}},
year = {2011},
address = {Sweden},
}
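As context for the BRG method mentioned above, here is a minimal intensity-based region-growing sketch; the thesis uses a Bayesian acceptance criterion, so the simple absolute-difference test and its tolerance below are placeholder assumptions.

```python
# Grow a 3D region from a seed voxel, accepting 6-connected neighbors whose
# intensity is close to the seed's. Usage: mask = region_grow(ct_volume, seed).
import numpy as np
from collections import deque

def region_grow(vol, seed, tol=30.0):
    grown = np.zeros(vol.shape, bool)
    ref = float(vol[seed])
    queue = deque([seed])
    grown[seed] = True
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in [(1,0,0), (-1,0,0), (0,1,0),
                           (0,-1,0), (0,0,1), (0,0,-1)]:
            n = (z + dz, y + dy, x + dx)
            if all(0 <= n[i] < vol.shape[i] for i in range(3)) and not grown[n]:
                if abs(float(vol[n]) - ref) < tol:   # placeholder criterion
                    grown[n] = True
                    queue.append(n)
    return grown
```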
MRI (Magnetic Resonance Imaging) is a medical imaging method that uses magnetic fields in order to retrieve images of the human body. This thesis revolves around a novel acquisition method for 3D fMRI (functional Magnetic Resonance Imaging) called PRESTO-CAN, which uses a radial pattern to sample the (kx,kz)-plane of k-space (the frequency domain) and a Cartesian sample pattern in the ky-direction. The radial sample pattern allows for a denser sampling of the central parts of k-space, which contain the most basic frequency information about the structure of the recorded object. This allows a higher temporal resolution to be achieved compared with other sampling methods, since fewer total samples are needed to retrieve enough information about how the object has changed over time. Since fMRI is mainly used for monitoring blood flow in the brain, increased temporal resolution means that fast changes in brain activity can be tracked more efficiently.

The temporal resolution can be further improved by reducing the time needed for scanning, which in turn can be achieved by applying parallel imaging. One such parallel imaging method is SENSE (SENSitivity Encoding). The scan time is reduced by decreasing the sampling density, which causes aliasing in the recorded images. The SENSE method removes the aliasing by utilizing the extra information provided by the fact that multiple receiver coils with differing sensitivities are used during the acquisition. By measuring the sensitivities of the respective receiver coils and solving an equation system with the aliased images, it is possible to calculate how they would have looked without aliasing.

In this master thesis, SENSE has been successfully implemented in PRESTO-CAN. By using normalized convolution to refine the sensitivity maps of the receiver coils, images of satisfactory quality could be reconstructed when reducing the k-space sample rate by a factor of 2, and images of relatively good quality also when the sample rate was reduced by a factor of 4. In this way, this thesis has contributed to the improvement of the temporal resolution of the PRESTO-CAN method.
@mastersthesis{diva2:423964,
author = {Ahlman, Gustav},
title = {{Improved Temporal Resolution Using Parallel Imaging in Radial-Cartesian 3D functional MRI}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4470--SE}},
year = {2011},
address = {Sweden},
}
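The SENSE unfolding described above amounts to solving a small linear system per aliased pixel. A worked toy example for reduction factor R = 2 with two coils (synthetic sensitivities; real sensitivity maps come from calibration, refined in the thesis by normalized convolution):

```python
# Each coil measures a sum of the two pixels that alias onto each other,
# weighted by that coil's sensitivity; least squares separates them again.
import numpy as np

S = np.array([[0.9, 0.2],    # coil 1 sensitivity at positions A and B
              [0.3, 0.8]])   # coil 2 sensitivity at positions A and B
true_pixels = np.array([1.0, 0.5])
aliased = S @ true_pixels                        # what the two coils measure
unfolded, *_ = np.linalg.lstsq(S, aliased, rcond=None)
print(unfolded)                                  # recovers [1.0, 0.5]
```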
In this thesis, an investigation was performed to find ways of distinguishing between fires and vehicles at waste stations, in the hope of removing vehicles as a source of error during early fire detection. The existing system makes use of a heat camera, which rotates over 48 different angles (also known as zones) from a fixed position. If the heat is above a certain value within a zone, the system sounds the fire alarm.

The rotation of the camera results in an unwanted displacement between two successive frames within the same zone. By use of image registration, this displacement was removed. After the registration of an image, segmentation was performed, where cold objects are eliminated as an error source. Lastly, an analysis was performed on the warm objects.

In the end, it was shown that the image registration was a successful improvement of the existing system. It was also shown that vehicles can, to some extent, be eliminated as an error source.
@mastersthesis{diva2:446792,
author = {Söderström, Rikard},
title = {{An early fire detection system through registration and analysis of waste station IR-images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4354--SE}},
year = {2011},
address = {Sweden},
}
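The abstract above does not restate which registration method was used; as one standard way to remove a pure inter-frame translation, here is a hypothetical phase-correlation sketch with OpenCV (not necessarily the thesis's approach):

```python
# Estimate the sub-pixel translation between two IR frames of the same zone
# and warp the current frame back onto the previous one before differencing.
import cv2
import numpy as np

def register_translation(prev_ir, curr_ir):
    a, b = np.float32(prev_ir), np.float32(curr_ir)
    (dx, dy), _ = cv2.phaseCorrelate(a, b)        # detected shift between frames
    m = np.float32([[1, 0, -dx], [0, 1, -dy]])    # undo the detected shift
    return cv2.warpAffine(b, m, (b.shape[1], b.shape[0]))
```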
This report introduces some simplifications to the method by Fitzgibbon et al. that allows for 3D model construction from turn-table sequences. It is assumed that the reader has previously read the work by Fitzgibbon et al. in order to fully understand this report.
Fitzgibbon et al. present a method for 3D model construction that utilizes the extra constraints imposed by turn-table sequences. Restricting the scenario to a turn-table sequence with a single camera with fixed settings produces these extra constraints:
C1. The internal parameters for the camera are the same for all images
C2. The motion of the camera can be described by a rotation around a single axis
It is shown that in the uncalibrated case the number of parameters to estimate is m + 8 where m is the number of images.
We further simplify the problem by using the extra constraints given by the fact that we know:
C3. The internal parameters of the camera, i.e. the K matrix
C4. That the angle between each pair of consecutive cameras is the same
Using these extra simplifications makes it possible to create a 3D model from realistic data without using Bundle Adjustment. A sketch of the resulting camera parameterization follows the reference entry below.
@techreport{diva2:434353,
author = {Larsson, Fredrik},
title = {{Automatic 3D Model Construction for Turn-Table Sequences - A Simplification}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2011},
type = {Other academic},
number = {LiTH-ISY-R, 3022},
address = {Sweden},
}
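Under constraints C1-C4 above, the i-th camera can be parameterized as P_i = K [R(i*theta) | t], with one shared K, one shared t and a fixed rotation step around a single axis. A small sketch with assumed example values for K, theta and t:

```python
# Build the camera matrices for a 36-image turn-table sequence with a known
# K, equal angular steps, and rotation about the vertical (y) axis.
import numpy as np

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])
t = np.array([0, 0, 5.0])          # camera 5 units from the rotation axis
theta = 2 * np.pi / 36             # 36 images around the turn-table

cameras = [K @ np.hstack([rot_y(i * theta), t.reshape(3, 1)])
           for i in range(36)]
x = cameras[0] @ np.array([0.1, 0.2, 0.3, 1.0])   # project a 3D point
print(x[:2] / x[2])                                # pixel coordinates
```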
Monitoring wear particles in lubricating oils allows specialists to evaluate the health and functionality of a mechanical system. The main analysis techniques available today are manual particle analysis and automatic optical analysis. Manual particle analysis is effective and reliable since the analyst continuously sees what is being counted. The drawback is that the technique is quite time demanding and dependent on the skills of the analyst. Automatic optical particle counting constitutes a closed system that does not allow the objects counted to be observed in real time. This has resulted in a number of sources of error for the instrument.

In this thesis a new method for counting particles based on light microscopy with image analysis is proposed. It has proven to be a fast and effective method that eliminates the sources of error of the previously described methods. The new method correlates very well with manual analysis, which is used as a reference method throughout this study. Size estimation of particles and detection of metallic particles has also been shown to be possible with the current image analysis setup. With more advanced software and analysis instrumentation, the image analysis method could be further developed into a decision-based machine allowing for declarations about which wear mode is occurring in a mechanical system.
@mastersthesis{diva2:420518,
author = {Ceco, Ema},
title = {{Image Analysis in the Field of Oil Contamination Monitoring}},
school = {Linköping University},
type = {{LITH-ISY-EX--11/4467--SE}},
year = {2011},
address = {Sweden},
}
Most mobile video-recording devices of today, e.g. cell phones and music players, make use of a rolling shutter camera. A rolling shutter camera captures video by recording every frame line-by-line from top to bottom of the image, leading to image distortions in situations where either the device or the target is moving. Recording video by hand also leads to visible frame-to-frame jitter.
In this thesis, methods to decrease distortion caused by the motion of a video-recording device with a rolling shutter camera are presented. The methods are based on estimating the orientation of the camera from gyroscope and accelerometer measurements.
The algorithms are implemented on the iPod Touch 4, and the resulting videos are compared to those of competing stabilization software, both commercial and free, in a series of blind experiments. The results from this user study show that the methods presented in the thesis perform equal to or better than the others.
@mastersthesis{diva2:420914,
author = {Hanning, Gustav},
title = {{Video Stabilization and Rolling Shutter Correction using Inertial Measurement Sensors}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4464--SE}},
year = {2011},
address = {Sweden},
}
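As a much simplified illustration of the idea in the thesis above, the sketch below rectifies a rolling-shutter frame for pure yaw by giving each image row its own rotation homography H(y) = K R(theta(y)) K^-1; the constant yaw rate, readout time and K are assumed example inputs, whereas the thesis estimates the full 3D orientation from gyroscope and accelerometer measurements.

```python
# Each row is captured at a slightly different time, so each row gets its own
# rotation compensation. This toy version assumes a constant yaw rate.
import cv2
import numpy as np

def rectify_yaw(img, K, yaw_rate, readout_time):
    h, w = img.shape[:2]
    k_inv = np.linalg.inv(K)
    map_x = np.zeros((h, w), np.float32)
    map_y = np.zeros((h, w), np.float32)
    cols = np.arange(w)
    for y in range(h):
        a = yaw_rate * readout_time * (y / h - 0.5)  # yaw when row y was read
        c, s = np.cos(a), np.sin(a)
        r = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
        hom = K @ r @ k_inv          # maps rectified pixels to recorded ones
        p = hom @ np.stack([cols, np.full(w, y), np.ones(w)])
        map_x[y], map_y[y] = p[0] / p[2], p[1] / p[2]
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)
```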
In this master thesis, a model-based video coding algorithm has been developed that uses input from a colour and depth camera, such as the Microsoft Kinect. Using a model-based representation of a video has several advantages over the commonly used block-based approach, used by the H.264 standard. For example, videos can be rendered in 3D, be viewed from alternative views, and have objects inserted into them for augmented reality and user interaction.
This master thesis demonstrates a very efficient way of encoding the geometry of a scene. The results of the proposed algorithm show that it can reach very low bitrates with comparable results to the H.264 standard.
@mastersthesis{diva2:420400,
author = {Sandberg, David},
title = {{Model-Based Video Coding Using a Colour and Depth Camera}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4463--SE}},
year = {2011},
address = {Sweden},
}
3D cameras delivering height data can be used for quality inspection of goods on a conveyor.
It is then of interest to distinguish the important parts of the image from background and noise and further to divide these interesting parts into segments that have a strong correlation to objects on the conveyor belt.
Segmentation can easily be done by thresholding in the simple case. However, in more complex situations, for example when objects touch or overlap, this does not work well.
In this thesis, research and evaluation of a few different methods for segmentation of height image data are presented. The focus is to find an accurate method for segmentation of smooth irregularly shaped organic objects such as vegetables or shellfish.
For evaluative purposes a database consisting of height images depicting a variety of such organic objects has been collected.
We show in the thesis that a conventional gradient magnitude method is hard to beat in the general case. If, however, the objects to be segmented are heavily non-convex, with a lot of crests and valleys within themselves, one could be better off choosing a normalized least squares method.
@mastersthesis{diva2:393236,
author = {Schöndell, Andreas},
title = {{Evaluation of methods for segmentation of 3D range image data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--11/4346--SE}},
year = {2011},
address = {Sweden},
}
The thesis presents an investigation of the potential of measuring plant condition from hyperspectral reflectance data. To do this, some linear methods for embedding the high dimensional hyperspectral data and performing regression to a plant condition space have been compared. A preprocessing step that aims at normalizing illumination intensity in the hyperspectral images has been conducted, and some different methods for this purpose have also been compared.

A large scale experiment has been conducted where tobacco plants have been grown and treated differently with respect to watering and nutrition. The treatment of the plants has served as ground truth for the plant condition. Four sets of plants have been grown one week apart and the plants have been measured at different ages up to the age of about five weeks.

The thesis concludes that there is a relationship between plant treatment and the spectral reflectance of the leaves, but the treatment has to be somewhat extreme to enable a useful treatment approximation from the spectrum. CCA is the proposed method for calculating the hyperspectral basis used to embed the hyperspectral data into the plant condition (treatment) space. A preprocessing method that uses a weighted normalization of the spectra for illumination intensity normalization is concluded to be the most powerful of the compared methods.
@mastersthesis{diva2:350907,
author = {Johansson, Peter},
title = {{Plant Condition Measurement from Spectral Reflectance Data}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4369--SE}},
year = {2010},
address = {Sweden},
}
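A minimal sketch of the proposed CCA embedding, using scikit-learn and random placeholder data; in the thesis, X would be illumination-normalized reflectance spectra and Y the watering/nutrition treatment levels.

```python
# Fit CCA between spectra and treatment, embed the spectra into the
# treatment space, and predict treatment from a spectrum.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 200))    # 120 plants, 200 spectral bands
Y = rng.normal(size=(120, 2))      # water and nutrition treatment levels

cca = CCA(n_components=2)
cca.fit(X, Y)
X_c, Y_c = cca.transform(X, Y)     # spectra embedded in treatment space
Y_pred = cca.predict(X)            # treatment approximation from spectrum
```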
This thesis treats topics within the area of object recognition. A real-time view matching method has been developed to compute the transformation between two different images of the same scene. This method uses a color based region detector called MSCR and affine transformations of these regions to create affine-invariant patches that are used as input to the SIFT algorithm. A parallel method to compute the SIFT descriptor has been created with relaxed constraints so that the descriptor size and the number of histogram bins can be adjusted. Additionally, a matching step to deduce correspondences and a parallel RANSAC method have been created to estimate the undergone transformation between these descriptors. To achieve real-time performance, the implementation has been targeted to use the parallel nature of the GPU with CUDA as the programming language. Focus has been put on the architecture of the GPU to find the best way to parallelize the different processing steps. CUDA has also been combined with OpenGL to be able to use the hardware accelerated anisotropic sampling method for affine transformations of regions. Parts of the implementation can also be used individually from either Matlab or by using the provided C++ library directly. The method was also evaluated in terms of accuracy and speed. It was shown that our algorithm has similar or better accuracy at finding correspondences than SIFT when the 3D geometry changes are large but we get a slightly worse result on images with flat surfaces.
@mastersthesis{diva2:345932,
author = {Lind, Anders},
title = {{High-speed View Matching using Region Descriptors}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4356--SE}},
year = {2010},
address = {Sweden},
}
This thesis describes the development of a robotic platform for evaluation of gaze stabilization algorithms, built for the Sensorimotor Systems Laboratory at the University of British Columbia. The primary focus of the work was to measure the performance of a biomimetic vestibulo-ocular reflex controller for gaze stabilization using cerebellar feedback. A flexible robotic system was designed and built in order to run reproducible test sequences at high speeds, featuring three dimensional linear movement and rotation around the vertical axis. On top of the robot head, a 1 DOF camera head can be independently controlled by a stabilization algorithm implemented in Simulink. Vestibular input is provided by a 3-axis accelerometer and a 3-axis gyroscope. The video feed from the camera head is fed into a workstation computer running a custom image processing program which evaluates both the absolute and relative movement of the images in the sequence. The absolute angles of tracked regions in the image are continuously returned, as well as the movement of the image sequence across the sensor in full 3 DOF camera rotation. Due to dynamic downsampling and noise suppression algorithms, very good performance was reached, enabling retinal slip estimation at 720 degrees per second. Two different controllers were implemented, one adaptive open loop controller similar to the work of Dean et al. [12] and one reference implementation using closed loop control and optimal linear estimation of reference angles. A sequence of tests was run in order to evaluate the performance of the two algorithms. The adaptive controller was shown to offer superior performance, dramatically reducing the movement of the image for all test sequences, and it improved further as it was tuned over time.
@mastersthesis{diva2:359452,
author = {Landgren, Axel},
title = {{A robotic camera platform for evaluation of biomimetic gaze stabilization using adaptive cerebellar feedback}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4351--SE}},
year = {2010},
address = {Sweden},
}
Man portable air defence systems, MANPADS, pose a big threat to civilian and military aircraft. This thesis aims to find methods that could be used in a missile approach warning system based on infrared cameras.
The two main tasks of the completed system are to classify the type of missile, and also to estimate its position and velocity from a sequence of images.
The classification is based on hidden Markov models, one-class classifiers, and multi-class classifiers.
Position and velocity estimation uses a model of the observed intensity as a function of real intensity, image coordinates, distance and missile orientation. The estimation is made by an extended Kalman filter.
We show that fast classification of missiles based on radiometric data and a hidden Markov model is possible and works well, although more data would be needed to verify the results.
Estimating the position and velocity works fairly well if the initial parameters are known. Unfortunately, some of these parameters cannot be computed using the available sensor data.
@mastersthesis{diva2:323455,
author = {Holm Ovr\'{e}n, Hannes and Emilsson, Erika},
title = {{Missile approach warning using multi-spectral imagery}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4329--SE}},
year = {2010},
address = {Sweden},
}
We will present the basic theory for the camera geometry. Our goal is camera calibration and the tools necessary for this. We start with homogeneous matrices that can be used to describe geometric transformations in a simple manner. Then we consider the pinhole camera model, the simplified camera model that we will show how to calibrate.
A camera matrix describes the mapping from the 3D world to a camera image. The camera matrix can be determined through a number of corresponding points measured in the world and the image. We also demonstrate the common special case of camera calibration when it can be assumed that the world is flat. Then, a plane in the world is transformed to the image plane. Such a plane-to-plane mapping is called a homography.
Finally, we discuss some useful mathematical tools needed for camera calibration. We show that the solution we present for the determination of the camera matrix is equivalent to a least-squares solution. We also show how to solve a homogeneous system of equations using SVD (singular value decomposition).
@techreport{diva2:693117,
author = {Magnusson, Maria},
title = {{Short on camera geometry and camera calibration}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2010},
type = {Other academic},
number = {LiTH-ISY-R, 3070},
address = {Sweden},
}
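The SVD recipe summarized in the report above, shown for homography estimation from four point correspondences (the standard DLT): stack two equations per correspondence into A and take the right singular vector with the smallest singular value as the solution of Ah = 0 with ||h|| = 1.

```python
# Homogeneous least squares via SVD: the minimizer of ||Ah|| subject to
# ||h|| = 1 is the last row of V^T.
import numpy as np

def homography_dlt(src, dst):
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows, float))
    return vt[-1].reshape(3, 3)

src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 1), (4, 1), (4, 3), (2, 3)]       # scale by 2, translate by (2, 1)
H = homography_dlt(src, dst)
p = H @ np.array([1.0, 1.0, 1.0])
print(p[:2] / p[2])                          # -> [4, 3]
```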
This master thesis investigates the difficulties of constructing a depth map using one low resolution grayscale camera mounted in the front of a car. The goal is to produce a depth map in real-time to assist other algorithms in the safety system of a car. This has been shown to be difficult using the evaluated combination of camera position and choice of algorithms.
The main problem is to estimate an accurate optical flow. Another problem is to handle moving objects. The conclusion is that the implementations, mainly triangulation of corresponding points tracked using a Lucas Kanade tracker, provide information of too poor quality to be useful for the safety system of a car.
@mastersthesis{diva2:355971,
author = {Svensson, Fredrik},
title = {{Structure from Forward Motion}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4364--SE}},
year = {2010},
address = {Sweden},
}
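A minimal sketch of the point-tracking step named in the abstract above, using OpenCV's pyramidal Lucas-Kanade tracker; the corner-detector parameters are assumptions, and the tracked pairs would subsequently be triangulated against the egomotion estimate.

```python
# Detect corners in the previous frame and track them into the current frame.
import cv2
import numpy as np

def track_points(prev_gray, curr_gray):
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good = status.ravel() == 1
    return pts[good].reshape(-1, 2), nxt[good].reshape(-1, 2)
```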
In this thesis, an algorithm for producing saliency maps and an algorithm for detecting salient regions based on the saliency map were developed. The saliency values are computed as center-surround differences, and a local descriptor called the region p-channel is used to represent the center and surround respectively. An integral image representation called the integral p-channel is used to speed up extraction of the local descriptor for any given image region. The center-surround difference is calculated as either histogram or p-channel dissimilarities.
Ground truth was collected using human subjects and the algorithm’s ability to detect salient regions was evaluated against this ground truth. The algorithm was also compared to another saliency algorithm.
Two different center-surround interpretations are tested, as well as several p-channel and histogram dissimilarity measures. The results show that for all tested settings the best performing dissimilarity measure is the so called diffusion distance. The performance comparison showed that the algorithm developed in this thesis outperforms the algorithm against which it was compared, both with respect to region detection and saliency ranking of regions. It can be concluded that the algorithm shows promising results and further investigation of the algorithm is recommended. A list of suggested approaches for further research is provided.
@mastersthesis{diva2:291472,
author = {Tuttle, Alexander},
title = {{Saliency Maps using Channel Representations}},
school = {Linköping University},
type = {{LITH-ISY-EX--10/4169--SE}},
year = {2010},
address = {Sweden},
}
Foreground segmentation is a common first step in tracking and surveillance applications. The purpose of foreground segmentation is to provide later stages of image processing with an indication of where interesting data can be found. This thesis is an investigation of how foreground segmentation can be performed in two contexts: as a pre-step to trajectory tracking and as a pre-step in indoor surveillance applications.
Three methods are selected and detailed: a single Gaussian method, a Gaussian mixture model method, and a codebook method. Experiments are then performed on typical input video using the methods. It is concluded that the Gaussian mixture model produces the output which yields the best trajectories when used as input to the trajectory tracker. An extension is proposed to the Gaussian mixture model which reduces shadow, improving the performance of foreground segmentation in the surveillance context.
@mastersthesis{diva2:285807,
author = {Molin, Joel},
title = {{Foreground Segmentation of Moving Objects}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4299--SE}},
year = {2010},
address = {Sweden},
}
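A Gaussian mixture background model of the kind recommended above is available off the shelf in OpenCV; the sketch below also enables MOG2's shadow marking, related in spirit to the shadow-reducing extension the thesis proposes. The input file name is a placeholder.

```python
# Per-pixel Gaussian mixture background subtraction with shadow marking.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16,
                                                detectShadows=True)
cap = cv2.VideoCapture("surveillance.avi")   # placeholder input file
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)           # 255 foreground, 127 shadow, 0 bg
cap.release()
```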
Most people are familiar with the BRIO labyrinth game and the challenge of guiding the ball through the maze. The goal of this project was to use this game to create a platform for evaluation of control algorithms. The platform was used to evaluate a few different controlling algorithms, both traditional automatic control algorithms as well as algorithms based on online incremental learning.
The game was fitted with servo actuators for tilting the maze. A camera, together with computer vision algorithms, was used to estimate the state of the game. The evaluated controlling algorithm had the task of calculating a proper control signal, given the estimated state of the game.
The evaluated learning systems used traditional control algorithms to provide initial training data. After initial training, the systems learned from their own actions and after a while they outperformed the controller used to provide initial training.
@mastersthesis{diva2:322572,
author = {Öfjäll, Kristoffer},
title = {{LEAP, A Platform for Evaluation of Control Algorithms}},
school = {Linköping University},
type = {{LiTH-ISY-EX--10/4370--SE}},
year = {2010},
address = {Sweden},
}
In this work we present a region detector, an adaptation to range data of the popular Maximally Stable Extremal Regions (MSER) region detector. We call this new detector Maximally Robust Range Regions (MRRR). We apply the new detector to real range data captured by a commercially available laser range camera. Using this data we evaluate the repeatability of the new detector and compare it to some other recently published detectors. The presented detector shows a repeatability which is better than or equal to that of the best of the other detectors. The MRRR detector also offers additional data on the detected regions. The additional data could be crucial in applications such as registration or recognition.
@techreport{diva2:325006,
author = {Viksten, Fredrik and Forss\'{e}n, Per-Erik},
title = {{Maximally Robust Range Regions}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2010},
type = {Other academic},
number = {LiTH-ISY-R, 2961},
address = {Sweden},
}
This document is an addendum to the main text in A local geometry-based descriptor for 3D data applied to object pose estimation by Fredrik Viksten and Klas Nordberg. This addendum gives proofs for propositions stated in the main document. It also details how to extract information from the fourth order tensor referred to as S22 in the main document.
@techreport{diva2:325000,
author = {Nordberg, Klas and Viksten, Fredrik},
title = {{A local geometry based descriptor for 3D data:
Addendum on rank and segment extraction}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2010},
type = {Other academic},
number = {LiTH-ISY-R, 2951},
address = {Sweden},
}
Within this thesis an algorithm for object recognition called Cluster Matching has been developed, implemented and evaluated. The image information is sampled at arbitrary sample points, instead of interest points, and local image features are extracted. These sample points are used as a compact representation of the image data and can quickly be searched for previously known objects. The algorithm is evaluated on a test set of images and the result is surprisingly reliable and time efficient.
@mastersthesis{diva2:284633,
author = {Lennartsson, Mattias},
title = {{Object Recognition with Cluster Matching}},
school = {Linköping University},
type = {{LITH-ISY-EX--09/4152--SE}},
year = {2009},
address = {Sweden},
}
This thesis is about improving the image quality of image sequences scanned by the film scanner GoldenEye. Film grain is often seen as an artistic effect in film sequences, but scanned images can be more grainy or noisy than intended. To remove the grain and noise, as well as sharpen the images, a few known image enhancement methods have been implemented, tested and evaluated. An original thresholding method using the dyadic wavelet transform has also been tested. MATLAB has been used as the benchmark environment, but one method has also been implemented in C/C++. Some of the methods work satisfactorily when it comes to the image result, but none of them works satisfactorily when it comes to time consumption. To address this, a few speed-up ideas are suggested at the end of the thesis. A method to correct the color of the sequences has also been suggested.
@mastersthesis{diva2:210478,
author = {Stuhr, Lina},
title = {{Grain Reduction in Scanned Image Sequences under Time Constraints}},
school = {Linköping University},
type = {{LiTH-ISY-EX--09/4203--SE}},
year = {2009},
address = {Sweden},
}
Gaze tracking is the estimation of the point in space a person is “looking at”. This is widely used in both diagnostic and interactive applications, such as visual attention studies and human-computer interaction. The most common commercial solution used to track gaze today uses a combination of infrared illumination and one or more cameras. These commercial solutions are reliable and accurate, but often expensive. The aim of this thesis is to construct a simple single-camera gaze tracker from off-the-shelf components. The method used for gaze tracking is based on infrared illumination and a schematic model of the human eye. Based on images of reflections of specific light sources in the surfaces of the eye the user’s gaze point will be estimated. Evaluation is also performed on both the software and hardware components separately, and on the system as a whole. Accuracy is measured in spatial and angular deviation and the result is an average accuracy of approximately one degree on synthetic data and 0.24 to 1.5 degrees on real images at a range of 600 mm.
@mastersthesis{diva2:209626,
author = {Wallenberg, Marcus},
title = {{A Single-Camera Gaze Tracker using Controlled Infrared Illumination}},
school = {Linköping University},
type = {{LITH-ISY-EX--09/4199--SE}},
year = {2009},
address = {Sweden},
}
Time of flight (ToF) is an imaging technique that uses depth information to capture 3D information in a scene. Recent developments in the technology have made ToF cameras more widely available and practical to work with. The cameras now enable real time 3D imaging and positioning in a compact unit, making the technology suitable for a variety of object recognition tasks.
An object recognition system for locating teats is at the center of the DeLaval VMS, which is a fully automated system for milking cows. By implementing ToF technology as part of the visual detection procedure, it would be possible to locate and track all four teats' positions in real time and potentially provide an improvement compared with the current system.
The developed algorithm for teat detection is able to locate teat shaped objects in scenes and extract information about their position, width and orientation. These parameters are determined with an accuracy of millimeters. The algorithm also shows promising results when tested on real cows. Although it detects many false positives, the algorithm was able to correctly detect 171 out of 232 visible teats in a test set of real cow images. This result is a satisfying proof of concept and shows the potential of ToF technology in the field of automated milking.
@mastersthesis{diva2:224321,
author = {Westberg, Michael},
title = {{Time of Flight Based Teat Detection}},
school = {Linköping University},
type = {{LiTH-ISY-EX--09/4154 --SE}},
year = {2009},
address = {Sweden},
}
This Master Thesis has been conducted at the National Laboratory of Forensic Science (SKL) in Linköping. When images to be analyzed at SKL that depict an interesting object are of bad quality, there may be a need to enhance them. If several images of the object are available, the total amount of information can be used to estimate one single enhanced image. A program to do this has been developed by studying methods for image registration and high resolution image estimation. Tests of important parts of the procedure have been conducted. The final results are satisfying, and the key to a good high resolution image seems to be the precision of the image registration. Improvements to this part may lead to even better results. More suggestions for further improvements have also been proposed.
@mastersthesis{diva2:390,
author = {Karelid, Mikael},
title = {{Image Enhancement over a Sequence of Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--08/4013--SE}},
year = {2008},
address = {Sweden},
}
This report gives an overview of, and motivates the design of, a C++ framework for object recognition using channel-coded feature maps. The code was produced in connection with the work on my PhD thesis Channel-Coded Feature Maps for Object Recognition and Machine Learning. The package contains algorithms ranging from basic image processing routines to specific complex algorithms for creating channel-coded feature maps through piecewise polynomials. Much emphasis has been put into creating a flexible framework using virtual interfaces. This makes it easy, e.g., to switch between different image primitive detectors or learning methods in an object recognizer. Some common design choices include an image class with convenient but fast pixel access, a configurable assert macro for error handling, and a common base class for object ownership management. The main computer vision algorithms are channel-coded feature maps (CCFMs) including their derivatives, single-sided colored lines, object detection using an abstract hypothesize-verify framework, and tracking and pose estimation using locally weighted regression and CCFMs. The code is considered as having alpha status at best. It is available under the GNU General Public License (GPL) and is mainly intended for future research on the subject.
@techreport{diva2:288558,
author = {Jonsson, Erik},
title = {{Object Recognition using Channel-Coded Feature Maps: C++ Implementation Documentation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2008},
type = {Other academic},
number = {LiTH-ISY-R, 2838},
address = {Sweden},
}
Aircraft navigation commonly relies on an Inertial Navigation System and a Global Navigational Satellite System (GNSS). In navigational warfare the GNSS can be jammed, and therefore a third navigational system is needed. The system tried in this thesis is camera based navigation: the position is determined from a video camera and a sensor reference. This thesis treats the matching between the sensor reference and the video image.
Two methods have been implemented: normalized cross correlation and position determination through a homography. Normalized cross correlation creates a correlation matrix. The other method uses point correspondences between the images to determine a homography between them, and obtains a position through the homography. The more point correspondences there are, the better the position determination will be.
The results have been quite good. The methods have found the right position when the Euler angles of the UAV have been known. Normalized cross correlation has been the best of the tested methods.
@mastersthesis{diva2:128466,
author = {Olgemar, Markus},
title = {{Camera Based Navigation:
Matching between Sensor reference and Video image}},
school = {Linköping University},
type = {{LITH-ISY-EX--08/4170--SE}},
year = {2008},
address = {Sweden},
}
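A minimal sketch of the first matching method above: normalized cross correlation of a video frame against a sensor reference image via OpenCV's template matcher. The file names are placeholders, and the real system must also compensate for the UAV's Euler angles.

```python
# Slide the video frame over the reference; the peak of the correlation
# matrix gives the estimated position of the frame in the reference.
import cv2

reference = cv2.imread("sensor_reference.png", cv2.IMREAD_GRAYSCALE)
video_frame = cv2.imread("video_frame.png", cv2.IMREAD_GRAYSCALE)

scores = cv2.matchTemplate(reference, video_frame, cv2.TM_CCOEFF_NORMED)
_, best, _, top_left = cv2.minMaxLoc(scores)   # peak of the correlation matrix
print(top_left, best)
```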
There has been rapid progress on the graphics processor in recent years, largely because of the demands from computer games on speed and image quality. Because of the graphics processor's special architecture, it is much faster at solving parallel problems than the normal processor. Due to its increasing programmability, it is possible to use it for other tasks than those it was originally designed for.
Even though graphics processors have been programmable for some time, it has been quite difficult to learn how to use them. CUDA enables the programmer to use C-code, with a few extensions, to program NVIDIA’s graphics processor and completely skip the traditional programming models. This thesis investigates if the graphics processor can be used for calculations without knowledge of how the hardware mechanisms work. An image processing algorithm calculating the optical flow has been implemented. The result shows that it is rather easy to implement programs using CUDA, but some knowledge of how the graphics processor works is required to achieve high performance.
@mastersthesis{diva2:127132,
author = {Ringaby, Erik},
title = {{Optical Flow Computation on Compute Unified Device Architecture}},
school = {Linköping University},
type = {{LiTH-ISY-EX--08/4043--SE}},
year = {2008},
address = {Sweden},
}
The purpose of this master thesis was to study computer vision algorithms for vehicle detection in monochrome images captured by a mono camera. The work has mainly been focused on detecting rear-view cars in daylight conditions. Previous work in the literature has been reviewed, and algorithms based on edges, shadows and motion as vehicle cues have been modified, implemented and evaluated. This work presents a combination of a multiscale edge based detection and a shadow based detection as the most promising algorithm, with a positive detection rate of 96.4% on vehicles at a distance of between 5 m and 30 m. For the algorithm to work in a complete system for vehicle detection, future work should be focused on developing a vehicle classifier to reject false detections.
@mastersthesis{diva2:18234,
author = {Lundagårds, Marcus},
title = {{Vehicle Detection in Monochrome Images}},
school = {Linköping University},
type = {{LiTH-ISY-EX--08/4148--SE}},
year = {2008},
address = {Sweden},
}
In this thesis it is examined whether the pose of an object can be determined by a system trained with a synthetic 3D model of said object. A number of variations of methods using P-channel representation are examined. Reference images are rendered from the 3D model, features, such as gradient orientation and color information are extracted and encoded into P-channels. The P-channel representation is then used to estimate an overlapping channel representation, using B1-spline functions, to estimate a density function for the feature set. Experiments were conducted with this representation as well as the raw P-channel representation in conjunction with a number of distance measures and estimation methods.
It is shown that, with correct preprocessing and choice of parameters, the pose can be detected with some accuracy and, if not in real-time, fast enough to be useful in a tracker initialization scenario. It is also concluded that the success rate of the estimation depends heavily on the nature of the object.
@mastersthesis{diva2:17521,
author = {Berg, Martin},
title = {{Pose Recognition for Tracker Initialization Using 3D Models}},
school = {Linköping University},
type = {{LiTH-ISY-EX--07/4076--SE}},
year = {2008},
address = {Sweden},
}
The PRESTO sequence is a well-known 3-D fMRI imaging sequence. In this sequence the echo planar imaging technique is merged with the echo-shift technique. This combination results in a very fast image acquisition, which is required for fMRI examinations of neural activation in the human brain. The aim of this work was to use the basic Cartesian PRESTO sequence as a framework when developing a novel trajectory using a non-Cartesian grid.
Our new pulse sequence, PRESTO CAN, rotates the k-space profiles around the ky-axis in a non-Cartesian manner. This results in a high sampling density close to the centre of k-space, and at the same time it provides sparser data collection in the part of k-space that contains less useful information. This "can- or cylinder-like" pattern is expected to result in a much faster k-space acquisition without losing important spatial information.
A new reconstruction algorithm was also developed. The purpose was to be able to construct an image volume from data obtained using the novel PRESTO CAN sequence. This reconstruction algorithm was based on the gridding technique, and a Kaiser-Bessel window was also used in order to re-sample the data onto a Cartesian grid. This was required to make 3-D Fourier transformation possible. In addition, simulations were also performed in order to verify the function of the reconstruction algorithm. Furthermore, in vitro tests showed that the development of the PRESTO CAN sequence and the corresponding reconstruction algorithm were highly successful.
In the future, the results can relatively easily be extended and generalized for in vivo investigations. In addition, there are numerous exciting possibilities for extending the basic techniques described in this thesis.
@mastersthesis{diva2:397232,
author = {Thyr, Per},
title = {{Method for Acquisition and Reconstruction of non-Cartesian 3-D fMRI}},
school = {Linköping University},
type = {{LITH-ISY-EX--08/4058--SE}},
year = {2008},
address = {Sweden},
}
In this thesis, two real-time stereo methods have been implemented and evaluated. The first is based on block matching and the second on local phase. The goal was to run the algorithms in real time and examine which one is best. The block matching method performed better than the phase based method, both in speed and accuracy. SIMD operations (Single Instruction Multiple Data) have been used in the processor, giving a speed boost by a factor of two.
@mastersthesis{diva2:16992,
author = {Arvidsson, Lars},
title = {{Stereoseende i realtid}},
school = {Linköping University},
type = {{LITH-ISY-EX--07/3944--SE}},
year = {2007},
address = {Sweden},
}
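A minimal sketch of the block-matching approach (the faster and more accurate of the two methods above) using OpenCV's StereoBM on a rectified image pair; the file names and parameter values are assumptions.

```python
# Dense disparity by block matching on a rectified stereo pair. StereoBM
# returns fixed-point disparities with 4 fractional bits, hence the /16.
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype("float32") / 16.0
```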
In this thesis spacetime analysis is applied to laser triangulation in an attempt to eliminate certain artifacts caused mainly by reflectance variations of the surface being measured. It is shown that spacetime analysis does eliminate these artifacts almost completely. It is also shown that, thanks to the spacetime analysis, the shape of the laser beam used is no longer critical, and that in some cases the laser could probably even be exchanged for a non-coherent light source. Furthermore, experiments running the derived algorithm on a GPU (Graphics Processing Unit) are conducted, with very promising results.
The thesis starts by deriving the theory needed for doing spacetime analysis in a laser triangulation setup, taking perspective distortions into account; then several experiments evaluating the method are conducted.
@mastersthesis{diva2:17262,
author = {Benderius, Björn},
title = {{Laser Triangulation Using Spacetime Analysis}},
school = {Linköping University},
type = {{LITH-ISY-EX--07/4047--SE}},
year = {2007},
address = {Sweden},
}
Today, tool center point calibration is mostly done by a manual procedure. The method is very time consuming and the result may vary due to how skilled the operators are.
This thesis proposes a new automated iterative method for tool center point calibration of industrial robots, by making use of computer vision and image processing techniques. The new method has several advantages over the manual calibration method. Experimental verifications have shown that the proposed method is much faster, still delivering a comparable or even better accuracy. The setup of the proposed method is very easy, only one USB camera connected to a laptop computer is needed and no contact with the robot tool is necessary during the calibration procedure.
The method can be split into three parts. Initially, the transformation between the robot wrist and the tool is determined by solving a closed loop of homogeneous transformations. Second, an image segmentation procedure is described for finding point correspondences on a rotation symmetric robot tool. The image segmentation part is necessary for performing a measurement with six degrees of freedom of the camera-to-tool transformation. The last part of the proposed method is an iterative procedure which automates an ordinary four point tool center point calibration algorithm. The iterative procedure ensures that the accuracy of the tool center point calibration only depends on the accuracy of the camera when registering a movement between two positions.
@mastersthesis{diva2:23964,
author = {Hallenberg, Johan},
title = {{Robot Tool Center Point Calibration using Computer Vision}},
school = {Linköping University},
type = {{LiTH-ISY-EX-- 07/3943--SE}},
year = {2007},
address = {Sweden},
}
A common problem when using background models to segment moving objects from video sequences is that the shadows cast by objects usually differ significantly from the background and therefore get detected as foreground. This causes several problems when extracting and labeling objects, such as object shape distortion and several objects merging together. The purpose of this thesis is to explore various possibilities to handle this problem.
Three methods for statistical background modeling are reviewed. All methods work on a per pixel basis, the first is based on approximating the median, the next on using Gaussian mixture models, and the last one is based on channel representation. It is concluded that all methods detect cast shadows as foreground.
A study of existing methods to handle cast shadows has been carried out in order to gain knowledge on the subject and get ideas. A common approach is to transform the RGB-color representation into a representation that separates color into intensity and chromatic components in order to determine whether or not newly sampled pixel-values are related to the background. The color spaces HSV, IHSL, CIELAB, YCbCr, and a color model proposed in the literature (Horprasert et al.) are discussed and compared for the purpose of shadow detection. It is concluded that Horprasert's color model is the most suitable for this purpose.
The thesis ends with a proposal of a method to combine background modeling using Gaussian mixture models with shadow detection using Horprasert's color model. It is concluded that, while not perfect, such a combination can be very helpful in segmenting objects and detecting their cast shadow.
@mastersthesis{diva2:23393,
author = {Wood, John},
title = {{Statistical Background Models with Shadow Detection for Video Based Tracking}},
school = {Linköping University},
type = {{LITH-ISY-EX--07/3921--SE}},
year = {2007},
address = {Sweden},
}
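As a rough illustration of the intensity/chromaticity separation discussed above, the sketch below classifies a pixel as shadow when it keeps the background's chromaticity but is darker; this is a deliberately simplified stand-in, not Horprasert's actual model, and all thresholds are assumptions.

```python
# Shadow keeps chromaticity but loses brightness; foreground changes both.
import numpy as np

def classify(pixel, background, low=0.4, chroma_tol=0.04):
    pixel, background = np.float64(pixel), np.float64(background)
    brightness = pixel.sum() / max(background.sum(), 1e-9)
    chroma_dist = np.abs(pixel / max(pixel.sum(), 1e-9)
                         - background / max(background.sum(), 1e-9)).sum()
    if chroma_dist < chroma_tol:
        if abs(brightness - 1.0) < 0.1:
            return "background"
        if low <= brightness < 1.0:
            return "shadow"
    return "foreground"

print(classify([60, 45, 30], [120, 90, 60]))   # -> "shadow" (half brightness)
```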
The objective of this thesis is to investigate if it is possible to use stereo vision to find and track the players and the ball during a football game.
The thesis shows that it is possible to detect all players that aren't too occluded by another player. Situations where a player is occluded by another player are handled by tracking the players from frame to frame.
The ball is also detected in most frames by looking for ball-like features. As with the players, the ball is tracked from frame to frame, so that when the ball is occluded its position is estimated by the tracker.
@mastersthesis{diva2:23152,
author = {Borg, Johan},
title = {{Detecting and Tracking Players in Football Using Stereo Vision}},
school = {Linköping University},
type = {{LiTH-ISY-EX--07/3535--SE}},
year = {2007},
address = {Sweden},
}
One major goal of the COSPAL project is to develop an artificial cognitive system architecture with the capability of exploratory learning. Exploratory learning is a strategy that makes it possible to apply generalization on a conceptual level, resulting in an extension of competences. Whereas classical learning methods aim at the best possible generalization, i.e., concluding from a number of samples of a problem class to the problem class itself, exploration aims at applying acquired competences to a new problem class. Incremental or online learning is an inherent requirement for performing exploratory learning.
Exploratory learning requires new theoretical tools and new algorithms. In the COSPAL project, we mainly investigate reinforcement-type learning methods for exploratory learning, and in this paper we focus on its algorithmic aspect. Learning is performed in terms of four nested loops, where the outermost loop reflects the user-reinforcement-feedback loop, the intermediate two loops switch between different solution modes at the symbolic and sub-symbolic levels respectively, and the innermost loop performs the acquired competences in terms of perception-action cycles. We present a system diagram which explains this process in more detail.
We discuss the learning strategy in terms of learning scenarios provided by the user. This interaction between user (’teacher’) and system is a major difference to most existing systems where the system designer places his world model into the system. We believe that this is the key to extendable robust system behavior and successful interaction of humans and artificial cognitive systems.
We furthermore address the issue of bootstrapping the system, and, in particular, the visual recognition module. We give some more in-depth details about our recognition method and how feedback from higher levels is implemented. The described system is, however, work in progress and no final results are available yet. The preliminary results that we have achieved so far clearly point towards a successful proof of the architecture concept.
@techreport{diva2:302803,
author = {Felsberg, Michael and Wiklund, Johan and Jonsson, Erik and Moe, Anders and Granlund, Gösta},
title = {{Exploratory Learning Structure in Artificial Cognitive Systems}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2006},
type = {Other academic},
number = {LiTH-ISY-R, 2738},
address = {Sweden},
}
Fluoroscopy is the term for continuous X-ray imaging of a patient. Since the patient, and also the physician, is then exposed to continuous X-ray radiation, the radiation dose must be kept low, which leads to noisy images. It is therefore desirable to improve the images through image processing. The image enhancement must, however, run in real time, so conventional methods cannot be used.
This thesis investigates how orthogonal derivative operators can be used to improve the readability of fluoroscopy images by means of noise suppression and edge enhancement. Derivative operators are separable, which makes them extremely computation-friendly and easy to incorporate into a scale pyramid. The scale pyramid makes it possible to process structures and details of different sizes separately, while the downsampling mechanism ensures that this decomposition does not noticeably increase the computational burden. In the complete solution, structure/noise separation is also introduced to prevent amplification of, and to suppress contributions from, the frequency bands where a pixel is dominated by noise.
The results show that noise can indeed be suppressed while edges and lines are well preserved, or enhanced if so desired. The oriented filtering does, however, easily give rise to worm-like structures in the noise, but this can be avoided with the right parameter settings for the structure/noise separation. The balance between oriented and non-oriented filtering is likewise controllable via a parameter that can be optimized with respect to the needs and wishes of each application.
@mastersthesis{diva2:21733,
author = {Brolund, Hans},
title = {{Förbättring av fluoroskopibilder}},
school = {Linköping University},
type = {{LITH-ISY-EX-06/3823-SE}},
year = {2006},
address = {Sweden},
}
The objective of this master thesis was to study the performance of an active triangulation system for 3-D imaging in underwater applications. Structured light from a 20 mW laser and a conventional video camera were used to collect data for generation of 3-D images. Different techniques to locate the laser line and transform it into spatial coordinates were developed and evaluated. A field trial and a laboratory trial were performed.
From the trials we can conclude that the distance resolution is much higher than the lateral and longitudinal resolutions. The lateral resolution can be improved either by using a high frame rate camera or simply by using a low scanning speed. It is possible to obtain a range resolution of less than a millimeter. The maximum range of vision was 5 meters under water, measured on a white target, and 3 meters for a black target in clear sea water. These results are, however, dependent on environmental and system parameters such as laser power, laser beam divergence and water turbidity. A higher laser power would, for example, increase the maximum range.
@mastersthesis{diva2:21659,
author = {Norström, Christer},
title = {{Underwater 3-D imaging with laser triangulation}},
school = {Linköping University},
type = {{LiTH-ISY-EX--06/3851--SE}},
year = {2006},
address = {Sweden},
}
The increased usage of infrared sensors by pilots has created a growing demand for simulated environments based on infrared radiation. This has led to an increased need for Saab to refine their existing model for simulating real-time infrared imagery, which resulted in this thesis being carried out. Saab develops the Gripen aircraft, and they provide training simulators where pilots can train in a realistic environment. The new model is required to be based on the real-world behavior of infrared radiation and, unlike Saab's existing model, to have dynamically changeable attributes.
This thesis seeks to develop a simulation model compliant with the requirements presented by Saab, and to develop the implementation of a test environment demonstrating the features and capabilities of the proposed model. All through the development of the model, the pilot training value has been kept in mind.
The first part of the thesis consists of a literature study to build a theoretical base for the rest of the work. This is followed by the development of the simulation model itself and a subsequent implementation thereof. The simulation model and the test implementation are evaluated as the final step conducted within the framework of this thesis.
The main conclusions of this thesis first of all includes that the proposed simulation model does in fact have its foundation in physics. It is further concluded that certain attributes of the model, such as time of day, are dynamically changeable as requested. Furthermore, the test implementation is considered to have been feasibly integrated with the current simulation environment.
A plan concluding how to proceed has also been developed. The plan suggests future work with the proposed simulation model, since the evaluation shows that it performs well in comparison to the existing model as well as other products on the market.
@mastersthesis{diva2:22896,
author = {Dehlin, Jonas and Löf, Joakim},
title = {{Dynamic Infrared Simulation:
A Feasibility Study of a Physically Based Infrared Simulation Model}},
school = {Linköping University},
type = {{LITH-ISY-EX--06/3815--SE}},
year = {2006},
address = {Sweden},
}
This report describes a method to detect and recognize objects from 3D laser radar data. The method is based on local descriptors computed from triplets of planes that are estimated from the data set. Each descriptor that is computed on query data is compared with descriptors computed on object model data to get a hypothesis of object class and pose. A hypothesis is either verified or rejected using a similarity measure between the model data set and the query data set.
@techreport{diva2:257173,
author = {Johansson, Björn and Moe, Anders},
title = {{Object Recognition in 3D Laser Radar Data using Plane triplets}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2005},
type = {Other academic},
number = {LiTH-ISY-R, 2708},
address = {Sweden},
}
This thesis aims to investigate the usefulness of Independent Component Analysis (ICA) for noise reduction in images taken by infrared cameras. The focus is on reducing additive noise, which is divided into two parts: Gaussian noise and sensor-specific fixed pattern noise. To reduce the Gaussian noise, a popular ICA-based method called sparse code shrinkage is used. A new method, also based on ICA, is developed to reduce the pattern noise. In the new method, an analysis of image data is performed for each sensor to manually identify typical pattern noise components. These components are then used to reduce the pattern noise in images taken by that sensor. The methods are shown to give good results on infrared images. The algorithms are tested on both synthetic and real images, and the results are presented and compared with other algorithms.
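A minimal sketch of the sparse code shrinkage idea (illustrative only: the threshold choice, the patch handling and the thesis's pattern-noise method are not reproduced here), using scikit-learn's FastICA on image patches:

    import numpy as np
    from sklearn.decomposition import FastICA

    def sparse_code_shrinkage(patches, noise_sigma, n_components=32):
        # Learn an ICA basis in which natural image patches are sparse
        ica = FastICA(n_components=n_components, whiten="unit-variance")
        s = ica.fit_transform(patches)   # one row of components per patch
        t = np.sqrt(2.0) * noise_sigma   # illustrative threshold choice
        s = np.sign(s) * np.maximum(np.abs(s) - t, 0.0)  # soft shrinkage
        return ica.inverse_transform(s)  # back to the pixel domain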
@mastersthesis{diva2:20831,
author = {Björling, Robin},
title = {{Denoising of Infrared Images Using Independent Component Analysis}},
school = {Linköping University},
type = {{LiTH-ISY-EX--05/3726--SE}},
year = {2005},
address = {Sweden},
}
This report develops a method for probabilistic conceptual sensor modeling. The idea is to generate probabilities for detection, recognition and identification based on a few simple factors. The focus lies on FLIR sensors and thermal radiation, although other wavelength bands are also discussed. The model can be used as a whole, or one or several parts can be used to create a simpler model. The core of the model is based on the Johnson criteria, which use resolution as the input parameter. Some extensions that model other factors are also implemented. Finally, the possibility of using this model for sensors other than FLIR is briefly discussed.
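For concreteness, the commonly quoted empirical form of the Johnson criteria (the target transfer probability function) can be sketched as follows; the N50 cycle requirements below are standard literature values and not necessarily those used in the thesis:

    import numpy as np

    # Cycles on target commonly quoted for 50 % task performance
    N50 = {"detection": 1.0, "recognition": 4.0, "identification": 6.4}

    def johnson_probability(n_cycles, task="detection"):
        # Target transfer probability function
        r = n_cycles / N50[task]
        e = 2.7 + 0.7 * r
        return r**e / (1.0 + r**e)

    def cycles_on_target(critical_dim_m, range_m, sensor_cyc_per_mrad):
        # Resolvable cycles across the target's critical dimension
        return sensor_cyc_per_mrad * (critical_dim_m / range_m) * 1e3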
@mastersthesis{diva2:20633,
author = {Sonesson, Mattias},
title = {{A Probabilistic Approach to Conceptual Sensor Modeling}},
school = {Linköping University},
type = {{LITH-ISY-EX-3428-2004}},
year = {2005},
address = {Sweden},
}
This thesis describes new methods for automatic crack detection in pavements. Cracks in pavements can serve as an early indication that repair is needed.
Automatic crack detection is preferable to manual inventory: the repeatability is better, the inventory can be done at higher speed, and it does not interrupt traffic.
The automatic and semi-automatic crack detection systems that exist today use Image Analysis methods. Powerful methods are now also available in the area of Computer Vision. These methods work in higher dimensions with greater complexity and generate measures of local signal properties, whereas the Image Analysis methods for crack detection use morphological operations on binary images.
Methods for digitizing video data on VHS cassettes and stitching images from nearby frames have been developed.
Four methods for crack detection have been evaluated, and two of them have been used to form a crack detection and classification program implemented in Matlab.
One image set was used during the implementation and another image set was used for validation. The crack detection system performed correct detection in 99.2 percent of cases when analysing the images used during implementation. The results on the validation data were considerably worse. When the program is used on pavements other than the one used during implementation, information about the surface texture is required to calibrate the crack detection.
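As an illustration of the morphological, binary-image style of crack detection that the thesis contrasts with Computer Vision methods, a minimal black-hat detector could look as follows (assumed parameters; not the thesis implementation):

    import numpy as np
    from scipy import ndimage as ndi

    def detect_cracks(gray, size=15, k=3.0):
        # Cracks are thin dark structures: a grey-level closing removes
        # them, so the difference (a "black-hat") highlights them
        g = gray.astype(float)
        blackhat = ndi.grey_closing(g, size=(size, size)) - g
        mask = blackhat > blackhat.mean() + k * blackhat.std()
        # A small binary opening suppresses isolated noise pixels
        return ndi.binary_opening(mask, structure=np.ones((2, 2), bool))

The surface-texture calibration mentioned above corresponds to the threshold parameters, which are texture-dependent.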
@mastersthesis{diva2:20160,
author = {Håkansson, Staffan},
title = {{Detektering av sprickor i vägytor med hjälp av Datorseende}},
school = {Linköping University},
type = {{LITH-ISY-EX--05/3699--SE}},
year = {2005},
address = {Sweden},
}
This report addresses the problem of software correction of spatially variant blur in digital images. The problem arises when the camera optics contains flaws, when the scene contains multiple moving objects with different relative motion, or when the camera itself is, for example, rotated. Compensation through deconvolution is impossible due to the shift variance of the PSF; hence alternative methods are required. A number of methods have been published. This report evaluates two of them.
@mastersthesis{diva2:20290,
author = {Andersson, Mathias},
title = {{Image processing algorithms for compensation of spatially variant blur}},
school = {Linköping University},
type = {{LITH-ISY-EX--05/3633--SE}},
year = {2005},
address = {Sweden},
}
@techreport{diva2:257175,
author = {Forss\'{e}n, Per-Erik and Johansson, Björn and Granlund, Gösta},
title = {{Learning under Perceptual Aliasing}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2005},
type = {Other academic},
number = {},
address = {Sweden},
}
This report introduces a robust contour descriptor for view-based object recognition. In recent years great progress has been made in the field of view-based object recognition, mainly due to the introduction of texture-based features such as SIFT and MSER. Although these are remarkably successful for textured objects, they have problems with man-made objects with little or no texture. For such objects, either explicit geometrical models, or contour and shading based features, are also needed. This report introduces a robust contour descriptor which we hope can be combined with texture-based features to obtain object recognition systems that work in a wider range of situations. Each detected contour is described as a sequence of line and ellipse segments, both of which have well-defined geometrical transformations to other views. The feature detector is also quite fast, mainly because chains of contour points are detected first; these chains are then split into line segments, which are subsequently either kept, grouped into ellipses, or discarded. We demonstrate the robustness of the feature detector with a repeatability test under general homography transformations of a planar scene. Through the repeatability test, we find that using ellipse segments instead of lines, where appropriate, improves repeatability. We also apply the features in a robotic setting where object appearances are learned by manipulating the objects.
@techreport{diva2:288582,
author = {Forssen, Per-Erik and Moe, Anders},
title = {{Contour Descriptors for View-Based Object Recognition}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2005},
type = {Other academic},
number = {LiTH-ISY-R, 2706},
address = {Sweden},
}
@techreport{diva2:262476,
author = {Jonsson, Erik and Felsberg, Michael and Granlund, Gösta},
title = {{Incremental Associative Learning}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2005},
type = {Other academic},
number = {LiTH-ISY-R, 2691},
address = {Sweden},
}
The MATLAB/C program take version 3.1 is a program for simulation of X-ray projections from 3D volume data. It is based on an older C version by Muller-Merbach as well as an extended C version by Turbell. The program can simulate 2D X-ray projections from 3D objects. These data can then be input to 3D reconstruction algorithms. Here, however, we only demonstrate a couple of 2D reconstruction algorithms, written in MATLAB. Simple MATLAB examples show how to generate the take projections followed by subsequent reconstruction. Compared to the old take version, the C code has been carefully revised. A preliminary, rather untested feature of using a polychromatic X-ray source with different energy levels was already included in the old take version. The current polychromatic X-ray feature is, however, carefully tested. For example, it has been compared with the results from the program described by Malusek et al. We also demonstrate experiments with a polychromatic X-ray source and a Plexiglass object giving the beam-hardening artefact. Detector sensitivity for different energy levels is not included in take. However, in the section on the real experiment, we describe a technique to include the detector sensitivity in the energy spectrum. Finally, an experiment comparing real and simulated data was performed. The result was not completely successful, but we still demonstrate it.
Contemporary analytical reconstruction methods for helical cone-beam CT have to be designed to handle the Long Object Problem. Normally, a moderate amount of over-scanning is sufficient for reconstruction of a certain region of interest (ROI). Unfortunately, for iterative methods, it seems that the useful ROI diminishes with every iteration step. The remedies proposed here are twofold. Firstly, we use careful extrapolation and masking of projection data. Secondly, we generate and utilize projection data from incompletely reconstructed volume parts, which is rather counter-intuitive and contradictory to our initial assumptions. The results seem very encouraging. Even voxels close to the boundary of the original ROI are enhanced by the iterative loop as well as those in the middle.
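The beam-hardening effect demonstrated with the Plexiglass object can be sketched in a few lines (illustrative names; take's actual spectra and geometry are not reproduced): with a polychromatic source, the effective attenuation -log(I/I0) grows sublinearly with path length because low-energy photons are absorbed preferentially.

    import numpy as np

    def polychromatic_projection(path_len_cm, spectrum, mu_per_cm):
        # spectrum[i]:  relative photon count in energy bin i
        # mu_per_cm[i]: linear attenuation coefficient in bin i [1/cm]
        L = np.atleast_1d(np.asarray(path_len_cm, dtype=float))
        I = (spectrum * np.exp(-np.outer(L, mu_per_cm))).sum(axis=1)
        return -np.log(I / spectrum.sum())  # sublinear in L: beam hardening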
@techreport{diva2:288581,
author = {Seger, Olle and Seger, Maria Magnusson},
title = {{The MATLAB/C program take - a program for simulation of X-ray projections from 3D volume data. Demonstration of beam-hardening artefacts in subsequent CT reconstruction.}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2005},
type = {Other academic},
number = {LiTH-ISY-R, 2682},
address = {Sweden},
}
To summarize, the VISATEC project was initiated to combine the specific scientific competencies of the research groups at CAU and LiU, together with the industrial view on vision applications, in order to develop novel, more robust algorithms for object localization and recognition. This goal was achieved by a two-fold strategy, whereby on the one hand more robust basic algorithms were developed and on the other hand a method for the combination of these algorithms was devised. In particular, the latter confirmed the consortium's belief that an appropriate combination of a number of basic algorithms leads to more robust results than any single method could achieve.
However, the multi-cue integration is just one algorithm of many that were developed in the VISATEC project. All developed algorithms are described in some detail in the remainder of this report. An overview of the respective publications can be found in the appendix.
Despite some difficulties that were encountered along the way, we as a consortium feel that the VISATEC project was a success. That this is not only our own opinion is reflected in the outcome of the final review. We believe that the work done during the three years of the project not only furthered our understanding of the matter, but also added to the knowledge within the scientific community and showed new possibilities for industrial vision applications.
@techreport{diva2:288604,
author = {Sommer, Gerald and Granlund, Gösta and Granert, Oliver and Krause, Martin and Nordberg, Klas and Perwass, Christian and Söderberg, Robert and Viksten, Fredrik and Chavarria, Marco},
title = {{Information Society Technologies (IST) programme:
Final Report}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2005},
type = {Other academic},
number = {},
address = {Sweden},
}
To improve the control of a steel casting process, ABB has developed an Electro Magnetic Brake (EMBR). This product is designed to improve steel quality, i.e. to reduce non-metallic inclusions and blisters as well as the risk of surface cracks. There is a demand for increased steel quality, and simulations and experiments play an important role in optimizing the casting process. An advanced CFD simulation model has been created to carry out this task.
The simulation model is validated on a water model that has been built for this purpose. The water model also makes experiments possible. One step in this validation is to measure the velocity and motion pattern of the seeding particles and the air bubbles in the water model, to see if they correspond to the simulation results.
Since the water is transparent, seeding particles have been added to the liquid in order to observe the motion of the water. Because the particles have the same density as water, they follow the flow accurately. The motion of the air bubbles added to the water model must also be observed, since the bubbles influence the flow pattern.
An algorithm, ”Transparent motions”, is thoroughly inspected and implemented. ”Transparent motions” was originally designed to post-process X-ray images. In this thesis, it is investigated whether the algorithm is also applicable to the water model and to the image sequences containing seeding particles and air bubbles that are to be used for motion estimation.
The results are satisfactory for image sequences containing particles only; a camera with a higher sampling rate would improve them further. For image sequences with both bubbles and particles, no useful results were achieved.
@mastersthesis{diva2:21306,
author = {Gustafsson, Gabriella},
title = {{Multiphase Motion Estimation in a Two Phase Flow}},
school = {Linköping University},
type = {{LITH-ISY-EX--05/3723--SE}},
year = {2005},
address = {Sweden},
}
This thesis describes and evaluates a number of approaches and algorithms for nonuniformity correction (NUC) and suppression of fixed pattern noise in an image sequence. The main task of this thesis work was to create a general NUC for infrared focal plane arrays. To create a radiometrically correct NUC, reference-based methods using polynomial approximation are used instead of the more common scene-based methods, which produce a cosmetic NUC.
The pixels that cannot be adjusted to give a correct value for the incoming radiation are defined as dead. Four separate methods of identifying dead pixels are used to find these pixels. Both the scene sequence and calibration data are used in these identification methods.
The algorithms and methods have all been tested on real image sequences. A graphical user interface using the presented algorithms has been created in Matlab to simplify the correction of image sequences. A conversion of the corrected image values to radiance and temperature has also been implemented.
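A minimal sketch of the reference-based, per-pixel polynomial correction (assumed data layout; the dead-pixel handling and the radiance/temperature conversion are omitted):

    import numpy as np

    def fit_nuc(calib, radiances, order=2):
        # calib: (K, H, W) raw responses at K known reference radiances;
        # fit a per-pixel polynomial mapping raw counts to radiance
        K, H, W = calib.shape
        x = calib.reshape(K, -1)
        coeffs = [np.polyfit(x[:, p], radiances, order) for p in range(H * W)]
        return np.asarray(coeffs).reshape(H, W, order + 1)

    def apply_nuc(frame, coeffs):
        flat = frame.ravel()
        c = coeffs.reshape(-1, coeffs.shape[-1])
        out = [np.polyval(c[p], flat[p]) for p in range(flat.size)]
        return np.asarray(out).reshape(frame.shape)

Pixels whose calibration responses cannot be fitted with a small residual are natural candidates for the "dead" label.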
@mastersthesis{diva2:21133,
author = {Isoz, Wilhelm},
title = {{Calibration of Multispectral Sensors}},
school = {Linköping University},
type = {{LiTH-ISY-EX--05/3651--SE}},
year = {2005},
address = {Sweden},
}
This master's thesis investigates distance estimation using image processing and stereo vision for a known camera setup.
A large number of computational methods for obtaining the distance to objects exist today, but their performance has barely been measured. This work mainly looks at different block-based methods for distance estimation and examines the possibilities and limitations of applying established knowledge in image processing and stereo vision to distance estimation. The work was carried out at Bofors Defence AB in Karlskoga, Sweden, and is ultimately intended for use in an optical sensor system. The thesis examines proven
The results indicate that it is difficult to determine a complete near mask, i.e. the distance to every visible object, but the tested methods should still be usable point-wise for computing distances. The best method is based on computing the minimum absolute error and keeping only the most certain values.
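A minimal block-matching sketch in the spirit of the methods evaluated (hypothetical camera parameters; a rectified stereo pair and in-bounds indices are assumed): the disparity minimising the sum of absolute differences gives the range through z = f * b / d, and the SAD value itself can serve as the certainty score used to keep only the most reliable points.

    import numpy as np

    def point_range_sad(left, right, row, col, bsize=8, max_disp=64,
                        f_px=800.0, baseline_m=0.3):
        h = bsize // 2
        ref = left[row - h:row + h, col - h:col + h].astype(float)
        sads = [np.abs(ref - right[row - h:row + h,
                                   col - d - h:col - d + h]).sum()
                for d in range(1, max_disp)]
        d = 1 + int(np.argmin(sads))             # best disparity [pixels]
        return f_px * baseline_m / d, min(sads)  # range and certainty score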
@mastersthesis{diva2:20786,
author = {Hedlund, Gunnar},
title = {{Närmaskbestämning från stereoseende}},
school = {Linköping University},
type = {{LiTH-ISY-EX--05/3623--SE}},
year = {2005},
address = {Sweden},
}
The purpose of this master's thesis, performed at FOI, was to evaluate a range gated underwater camera for the application of identifying bottom objects. The master's thesis was supported by FMV within the framework of “arbetsorder Systemstöd minjakt (Jan Andersson, KC Vapen)”. The central part has been field trials, which were performed in both turbid and clear water. Conclusions about the performance of the camera system have been drawn, based on resolution and contrast measurements during the field trials. Laboratory testing has also been done to measure system-specific parameters, such as the effective gate profile and camera gate distances.
The field trials show that images can be acquired at significantly longer distances with the tested gated camera than with a conventional video camera. The distance at which the target can be detected is increased by a factor of 2. For images suitable for mine identification, the increase is about 1.3. However, studies of the performance of other range gated systems show that the increase in range for mine identification can be about 1.6. Gated viewing has also been compared to other technical solutions for underwater imaging.
@mastersthesis{diva2:20570,
author = {Andersson, Adam},
title = {{Range Gated Viewing with Underwater Camera}},
school = {Linköping University},
type = {{LITH-ISY-EX--05/3718--SE}},
year = {2005},
address = {Sweden},
}
Just how far is it possible to make the learning of new parts for recognition and robot picking autonomous? This thesis initially gives the prerequisites for the steps in learning and calibration that are to be automated. Among these tasks are selecting a suitable part model from numerous candidates with the help of a new part segmenter, and computing the spatial extent of this part to facilitate robotic collision handling. Other tasks are to analyze the part model in order to highlight correct and suitable edge segments for increasing pattern matching certainty, and to choose appropriate acceptance levels for pattern matching. Further tasks deal with simplifying camera calibration by analyzing the calibration pattern, and with compensating for differences in perspective at great depth variations by calculating the centre of perspective of the image. The image processing algorithms created to solve these tasks are described and evaluated thoroughly. This thesis shows that simplifying the steps of learning and calibration with the help of advanced image processing really is possible.
@mastersthesis{diva2:19024,
author = {Wernersson, Björn and Södergren, Mikael},
title = {{Automatiserad inlärning av detaljer för igenkänning och robotplockning}},
school = {Linköping University},
type = {{LiTH-ISY-EX--05/3755--SE}},
year = {2005},
address = {Sweden},
}
Contemporary algorithms employed for reconstruction of 3D volumes from helical cone beam projections are so-called non-exact algorithms. This means that the reconstructed volumes contain artifacts irrespective of the detector resolution and the number of projection angles employed in the process.
It has been proposed that these artifacts can be suppressed using an iterative scheme which comprises computation of projections from the already reconstructed volume as well as the non-exact reconstruction itself.
The purpose of the present work is to examine whether the iterative scheme can be applied to the non-exact reconstruction method PI-original in order to improve the reconstruction result. An important part of this implementation is a careful design of the projection operator, as a poorly designed projection operator may result in aliasing and/or other artifacts in the reconstruction result. Since the projection data is truncated, special care must be taken along the boundaries of the detector. Three different ways of handling this interpolation problem are proposed and examined.
The results show that artifacts caused by the PI-original method can indeed be reduced by the iterative scheme. However, each iteration requires at least three times more processing time than the initial reconstruction, which may call for compromises, clever optimizations and/or parallelization in the innermost loops. Furthermore, at higher cone angles certain types of artifacts seem to grow with each iteration instead of being suppressed.
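The iterative scheme itself is compact; below is a sketch with the reconstruction and projection operators as callables (in the thesis, PI-original and the carefully designed projection operator take these roles):

    def iterative_enhancement(proj, reconstruct, project, n_iter=3):
        vol = reconstruct(proj)              # non-exact first guess
        for _ in range(n_iter):
            residual = proj - project(vol)   # inconsistency in projection space
            vol = vol + reconstruct(residual)
        return vol

Each pass costs one forward projection plus one reconstruction, consistent with the reported processing time per iteration.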
@mastersthesis{diva2:19912,
author = {Sunnegårdh, Johan},
title = {{Iterative Enhancement of Non-Exact Reconstruction in Cone Beam CT}},
school = {Linköping University},
type = {{LITH-ISY-EX--04/3646--SE}},
year = {2004},
address = {Sweden},
}
This report describes and evaluates a number of algorithms for multi-sensor data fusion of radar and IR/TV data at the raw data level. Raw data fusion means that the fusion takes place before attribute or object extraction. Attribute extraction may discard information that could have improved the fusion. If the fusion is performed at the raw data level, more information is available, which could lead to improved attribute extraction in a later step. Two approaches are presented. The first method projects the radar image into the IR view and vice versa; the fusion is then performed on the resulting pairs of images with equal dimensions. The second method fuses the two original images into a volume, spanned by the three dimensions represented in the original images. This method is also extended by exploiting stereo vision. The results show that exploiting stereo vision can be worthwhile, since the extra information facilitates the fusion and yields a more general solution to the problem.
@mastersthesis{diva2:19523,
author = {Schultz, Johan},
title = {{Sensordatafusion av IR- och radarbilder}},
school = {Linköping University},
type = {{}},
year = {2004},
address = {Sweden},
}
This report brings together a novel approach to some computer vision problems and a particular algorithmic development of the Landweber iterative algorithm. The algorithm solves a class of high-dimensional, sparse, and constrained least-squares problems, which arise in various computer vision learning tasks, such as object recognition and object pose estimation. The algorithm has recently been applied to these problems, but it has been used rather heuristically. In this report we describe the method and put it on firm mathematical ground. We consider a convexly constrained weighted least-squares problem and propose for its solution a projected Landweber method which employs oblique projections onto the closed convex constraint set. We formulate the problem, present the algorithm and work out its convergence properties, including a rate-of-convergence result. The results are put in perspective of currently available projected Landweber methods. The application to supervised learning is described, and the method is evaluated in a function approximation experiment.
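The core iteration studied in the report can be sketched as follows (a plain orthogonal projection stands in for the oblique one, and the step length lam is a free parameter subject to the usual convergence bound):

    import numpy as np

    def projected_landweber(A, W, b, project_C, lam, n_iter=100):
        # x_{k+1} = P_C( x_k + lam * A^T W (b - A x_k) )
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            x = project_C(x + lam * A.T @ (W @ (b - A @ x)))
        return x

    # Example: nonnegativity-constrained weighted least squares
    # x_hat = projected_landweber(A, np.eye(len(b)), b,
    #                             lambda v: np.maximum(v, 0.0), lam=1e-3)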
@techreport{diva2:244368,
author = {Johansson, Björn and Elfving, Tommy and Kozlov, Vladimir and Censor, Yair and Granlund, Gösta},
title = {{The Application of an Oblique-Projected Landweber Method to a Model of Supervised Learning}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2004},
type = {Other academic},
number = {LiTH-ISY-R, 2623},
address = {Sweden},
}
This report describes how blob features can be used for automatic estimation of the fundamental matrix from two perspective projections of a 3D scene. Blobs are perceptually salient, homogeneous, compact image regions. They are represented by their average colour, area, centre of gravity and inertia matrix. Coarse blob correspondences are found by voting using colour and local similarity transform matching on blob pairs. We then do RANSAC sampling of the coarse correspondences, and weight each estimate according to how well the approximating conics and colours of two blobs correspond. The initial voting significantly reduces the number of RANSAC samples required, and the extra information besides position, allows us to reject false matches more accurately than in RANSAC using point features.
@techreport{diva2:288340,
author = {Forssen, Per-Erik and Moe, Anders},
title = {{Automatic Estimation of Epipolar Geometry from Blob Features}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2004},
type = {Other academic},
number = {LiTH-ISY-R, 2620},
address = {Sweden},
}
This report describes a fourth order tensor defined on projective spaces which can be used for the representation of medium-level features, e.g., one or more oriented segments. The tensor has one part which describes what type of local structures are present in a region, and one part which describes where they are located. This information can be used, e.g., to represent multiple orientations, corners, and line-endings. The tensor can be defined for arbitrary signal dimension, but the presentation focuses on the properties of the fourth order tensor for the case of 2D and 3D image data. A method for estimating the proposed tensor representation by means of simple computations directly from the structure tensor is presented. Given a simple matrix representation of the tensor, it can be shown that there is a direct correspondence between the number of oriented segments and the rank of the matrix, provided that the number of segments is three or less. The report also presents techniques for extracting information about the oriented segments which the tensor represents. Finally, it is shown that a small set of coefficients which are invariant to changes of the coordinate system can be computed from the proposed tensor.
@techreport{diva2:288343,
author = {Nordberg, Klas},
title = {{A fourth order tensor for representation of orientation and position of oriented segments}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2004},
type = {Other academic},
number = {LiTH-ISY-R, 2587},
address = {Sweden},
}
In this paper we present a new and efficient method to implement robust smoothing of low-level signal features: B-spline channel smoothing. This method consists of three steps: encoding of the signal features into channels, averaging of the channels, and decoding of the channels. We show that linear smoothing of channels is equivalent to robust smoothing of the signal features, where we make use of quadratic B-splines to generate the channels. The linear decoding from B-spline channels makes it possible to derive a robust error norm which is very similar to Tukey's biweight error norm. Channel smoothing is superior to iterative robust smoothing implementations like non-linear diffusion, bilateral filtering, and mean-shift approaches for four reasons: it has lower computational complexity, it is easy to implement, it chooses the global minimum error instead of the nearest local minimum, and it can also be used on non-linear spaces, such as orientation space. In the experimental part of the paper we compare channel smoothing and the three other approaches mentioned above for 2D orientation data.
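The three steps translate almost directly into code. The sketch below uses a centre-of-gravity decoding over the three channels around the strongest response, which is one simple form of linear decoding; the paper's exact decoding and error-norm analysis are not reproduced, and unit channel spacing is assumed:

    import numpy as np

    def b2(t):
        # Quadratic B-spline kernel, support [-1.5, 1.5]
        t = np.abs(t)
        return np.where(t < 0.5, 0.75 - t**2,
               np.where(t < 1.5, 0.5 * (1.5 - t)**2, 0.0))

    def encode(x, centers):
        # Overlapping channel responses, one row per sample
        return b2(x[:, None] - centers[None, :])

    def decode(ch, centers):
        # Centre of gravity around the strongest channel response
        i = np.clip(ch.argmax(axis=1), 1, len(centers) - 2)
        idx = i[:, None] + np.array([-1, 0, 1])
        c = np.take_along_axis(ch, idx, axis=1)
        return (c * centers[idx]).sum(axis=1) / c.sum(axis=1)

Robust smoothing of a signal then amounts to encoding it, linearly averaging the channel images, and decoding the result.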
@techreport{diva2:288553,
author = {Felsberg, Michael and Forssen, Per-Erik and Scharr, Hanno},
title = {{Efficient Robust Smoothing of Low-Level Signal Features}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2004},
type = {Other academic},
number = {LiTH-ISY-R, 2619},
address = {Sweden},
}
In this paper we present a new method to implement a robust estimator: B-spline channel smoothing. We show that linear smoothing of channels is equivalent to a robust estimator, where we make use of the channel representation based upon quadratic B-splines. The linear decoding from B-spline channels makes it possible to derive a robust error norm which is very similar to Tukey's biweight error norm. Using channel smoothing instead of iterative robust estimator implementations like non-linear diffusion, bilateral filtering, and mean-shift approaches is advantageous since channel smoothing is faster, it is easy to implement, it chooses the global minimum error instead of the nearest local minimum, and it can also be used on non-linear spaces, such as orientation space. As an application, we implemented orientation smoothing and compared it to the other three approaches.
@techreport{diva2:288549,
author = {Felsberg, Michael and Forssen, Per-Erik and Scharr, Hanno},
title = {{B-Spline Channel Smoothing for Robust Estimation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2004},
type = {Other academic},
number = {LiTH-ISY-R, 2579},
address = {Sweden},
}
Most contemporary CT systems employ non-exact reconstruction methods. This treatise reports on how these methods can be transformed from non-exact to exact reconstruction methods by means of iterative post-processing. Compared to traditional algebraic reconstruction (ART), we expect much faster convergence (in theory quadratic), due to a much improved first guess and the fact that each iteration includes the same non-exact analytical reconstruction step as the first guess.
@techreport{diva2:288551,
author = {Danielsson, Per-Erik and Seger, Maria Magnusson},
title = {{Combining Fourier and iterative methods in computer tomography:
Analysis of an iteration scheme. The 2D-case}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2004},
type = {Other academic},
number = {LiTH-ISY-R, 2634},
address = {Sweden},
}
This report evaluates the stability of two image interest point detectors: star-pattern points and points based on the fourth order tensor. The Harris operator is also included for comparison. Different image transformations are applied, and the repeatability of points between a reference image and each of the transformed images is computed. The transforms are plane rotation, change in scale, change in view, and change in lighting conditions. We conclude that the result largely depends on the image content. The star-pattern points and the fourth order tensor model the image as locally straight lines, while the Harris operator is based on simple/non-simple signals. The two methods evaluated here perform equally well as or better than the Harris operator when the model is valid, and perform worse otherwise.
@techreport{diva2:288612,
author = {Johansson, Björn and Söderberg, Robert},
title = {{A Repeatability Test for Two Orientation Based Interest Point Detectors}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2004},
type = {Other academic},
number = {LiTH-ISY-R, 2606},
address = {Sweden},
}
In this paper we propose a new operator which combines advantages of monogenic scale-space and Gaussian scale-space, of the monogenic signal and the structure tensor. The gradient energy tensor (GET) defined in this paper is based on Gaussian derivatives up to third order using different scales. These filters are commonly available, separable, and have an optimal uncertainty. The response of this new operator can be used like the monogenic signal to estimate the local amplitude, the local phase, and the local orientation of an image, but it also allows measuring the coherence of image regions, as in the case of the structure tensor.
@techreport{diva2:288639,
author = {Felsberg, Michael},
title = {{The GET Operator}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2004},
type = {Other academic},
number = {LiTH-ISY-R, 2633},
address = {Sweden},
}
This report describes a view-based method for object recognition and estimation of object pose in still images. The method is based on feature vector matching and clustering. A set of interest points, in this case star-patterns, is detected and combined into pairs. A pair of patches, centered around each point in the pair, is extracted from a local orientation image. The patch orientation and size depend on the relative positions of the points, which makes them invariant to translation, rotation, and scale. Each pair of patches constitutes a feature vector. The method is demonstrated on a number of real images.
@techreport{diva2:257174,
author = {Johansson, Björn and Moe, Anders},
title = {{Patch-Duplets for Object Recognition and Pose Estimation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2003},
type = {Other academic},
number = {LiTH-ISY-R, 2553},
address = {Sweden},
}
A Transaction Reproduction System (ARTSY) is a distributed system that enables secure transactions and reproductions of digital content over an insecure network. A field of application is reproductions of visual arts: A print workshop could for example use ARTSY to print a digital image that is located at a remote museum. The purpose of this master thesis project was to propose a specification for ARTSY and to show that it is technically feasible to implement it.
An analysis of the security threats in the ARTSY context was performed and a security model was developed. The security model was approved by a leading computer security expert. The security mechanisms chosen for the model were asymmetric cryptography, digital signatures, symmetric cryptography and a public key registry. A Software Requirements Specification was developed. It contains extra directives for image reproduction systems, but it can be used for an arbitrary type of reproduction system. A prototype of ARTSY was implemented using the Java programming language. The prototype uses XML to manage information and Java RMI to enable remote communication between its components. It was built as a platform independent system and it has been tested and proven to be operational on the Sun Solaris platform as well as the Win32 platform.
@mastersthesis{diva2:18935,
author = {Björk, Mårten and Max, Sofia},
title = {{ARTSY:
A Reproduction Transaction System}},
school = {Linköping University},
type = {{}},
year = {2003},
address = {Sweden},
}
This master's thesis develops an algorithm for tracking cars that is robust enough to handle turning cars. It is implemented in the image processing environment Image Processing Application Programming Interface (IPAPI) for use within the WITAS project.
Firstly, algorithms comparable with the one currently used in the WITAS project are studied. The focus is on how rotation, originating from the turning of the cars, affects tracking performance. The algorithms studied all perform an exhaustive search over a region close to the last known position of the tracked object to find a match. After this, an iterative algorithm is introduced, based on the idea that a car can only rotate, translate and change scale. The algorithm iteratively estimates the parameters describing this rotation, translation, and change of scale. The iterative process needs an initial parameter estimate that is accurate enough for the algorithm to converge. The developed algorithm is based on an earlier publication on the subject; however, the mathematical description and derivation are taken one step further than in that publication.
The iterative algorithm performs well under the assumption that the data fulfills some basic criteria. These demands comprise camera placement, template size, and how much the parameters may vary between two observations. The iterative algorithm is also potentially faster than exhaustive search methods, because few iterations are needed when the parameters change slowly. Better initial parameters should improve the stability and speed of convergence. Other suggestions that could give better performance are discussed, e.g., methods to better separate the target from its surroundings.
@mastersthesis{diva2:19030,
author = {Öberg, Per},
title = {{Tracking by Image Processing in a Real Time System}},
school = {Linköping University},
type = {{}},
year = {2003},
address = {Sweden},
}
A complete prototype system for measuring vehicle lateral position has been set up during the course of this master's thesis project. In the development of the software, images acquired from a backward-looking video camera mounted on the roof of the vehicle were used.
The problem of using computer vision to measure lateral position can be divided into road marking detection and lateral position extraction. Since the strongest characteristics of a road marking image are the edges of the road markings, the road marking detection step is based on edge detection. For the detection of the straight edge lines, a Hough-based method was chosen. Due to peak spreading, detecting the correct peak in Hough space proved difficult. A flexible Hough peak detection algorithm was therefore developed, based on an adaptive window that takes peak spreading into account. The road marking candidate found by the system is verified before the lateral position data is generated. Good performance of the road marking tracking algorithm was obtained by exploiting temporal correlation to update a search region within the image. A camera calibration made the extraction of real-world lateral position information and yaw angle data possible.
This vision-based method proved to be very accurate. The standard deviation of the error in the position detection is 0.012 m within an operating range of ±2 m from the image centre. For continuous road markings the rate of valid data is on average 96 %, whereas it drops to around 56 % for sections with intermittent road markings. The system performs well during lane change manoeuvres, which is an indication that the system tracks the correct road marking. This prototype system is a robust and automatic measurement system, which will benefit VTI in its many driving behaviour research programs.
@mastersthesis{diva2:19311,
author = {Ågren, Elisabeth},
title = {{Lateral Position Detection Using a Vehicle-Mounted Camera}},
school = {Linköping University},
type = {{}},
year = {2003},
address = {Sweden},
}
This thesis describes and evaluates a number of algorithms for reducing fixed pattern noise in image sequences. Fixed pattern noise is the dominant noise component for many infrared detector systems, perceived as a superimposed pattern that is approximately constant for all image frames.
Primarily, methods based on estimation of the movement between individual image frames are studied. Using scene-matching techniques, global motion between frames can be successfully registered with sub-pixel accuracy. This allows each scene pixel to be traced along a path of individual detector elements. Assuming a static scene, differences in pixel intensities are caused by fixed pattern noise that can be estimated and removed.
The algorithms have been tested by using real image data from existing infrared imaging systems with good results. The tests include both a two-dimensional focal plane array detector and a linear scanning one-dimensional detector, in different scene conditions.
@mastersthesis{diva2:19078,
author = {Torle, Petter},
title = {{Scene-based correction of image sensor deficiencies}},
school = {Linköping University},
type = {{}},
year = {2003},
address = {Sweden},
}
By analyzing ISAR images, the characteristics of military platforms with respect to radar visibility can be evaluated. The method currently used to calculate the ISAR images, based on the Discrete-Time Fourier Transform (DTFT), requires large computational effort. This thesis investigates the possibility of replacing the DTFT with the Fast Fourier Transform (FFT). Such a replacement is not trivial, since the DTFT can compute a contribution anywhere along the spatial axis, while the FFT delivers output data on a fixed sampling grid, which requires subsequent interpolation. The interpolation leads to a difference in the ISAR image compared to the ISAR image obtained by the DTFT. On the other hand, the FFT is much faster. In this quality-and-time trade-off, the objective is to minimize the error while keeping high computational efficiency.
The FFT approach is evaluated by studying execution time and image error when generating ISAR images for an aircraft model in a controlled environment. The FFT method shows good results. The execution speed is increased significantly without any visible differences in the ISAR images. The speed-up factor depends on several parameters: the image size, the degree of zero-padding when calculating the FFT, and the number of frequencies in the input data.
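The trade-off can be illustrated in one dimension (the thesis works on 2D ISAR data): the DTFT evaluates the spectrum at arbitrary points, whereas the zero-padded FFT delivers a fixed grid that must be interpolated.

    import numpy as np

    def dtft_at(x, freqs):
        # Exact spectrum samples at arbitrary normalised frequencies
        n = np.arange(len(x))
        return np.array([(x * np.exp(-2j * np.pi * f * n)).sum() for f in freqs])

    def fft_interp_at(x, freqs, zero_pad=8):
        # Zero-padded FFT plus linear interpolation: much faster, with
        # an interpolation error that shrinks as zero_pad grows
        N = zero_pad * len(x)
        X = np.fft.fft(x, N)
        grid = np.arange(N) / N
        f = np.mod(freqs, 1.0)
        return np.interp(f, grid, X.real) + 1j * np.interp(f, grid, X.imag)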
@mastersthesis{diva2:19402,
author = {Dahlbäck, Niklas},
title = {{Implementation of a fast method for reconstruction of ISAR images}},
school = {Linköping University},
type = {{}},
year = {2003},
address = {Sweden},
}
In this paper we address the problem of appropriately representing the intrinsic dimensionality of image neighborhoods. This dimensionality describes the degrees of freedom of a local image patch and it gives rise to some of the most often applied corner and edge detectors. It is common to categorize the intrinsic dimensionality (iD) into three distinct cases: i0D, i1D, and i2D. Real images, however, contain combinations of all three dimensionalities, which has to be taken into account by a continuous representation. Based on considerations of the structure tensor, we derive a cone-shaped iD-space which leads to a probabilistic point of view on the estimation of intrinsic dimensionality.
@techreport{diva2:288326,
author = {Felsberg, Michael and Kruger, Norbert},
title = {{A Probabilistic Definition of Intrinsic Dimensionality for Images}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2003},
type = {Other academic},
number = {LiTH-ISY-R, 2520},
address = {Sweden},
}
The use of linear filters, i.e. convolutions, inevitably introduces dependencies in the uncertainties of the filter outputs. Such non-vanishing covariances appear both between different positions and between the responses from different filters (even at the same position). This report describes how these covariances between the outputs of linear filters can be computed. We then examine the induced covariance matrices for some typical 1D and 2D filters. Finally, the total noise reduction properties are examined.
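For white input noise, the covariance in question is a plain correlation of the filter kernels. A 1D sketch (the report treats the 2D case analogously):

    import numpy as np

    def filter_output_covariance(h1, h2, shift=0, sigma2=1.0):
        # Cov(y1[n], y2[n + shift]) = sigma2 * sum_k h1[k] * h2[k + shift]
        # for y_i = h_i * x, with x white noise of variance sigma2
        cov = 0.0
        for k in range(len(h1)):
            j = k + shift
            if 0 <= j < len(h2):
                cov += h1[k] * h2[j]
        return sigma2 * cov

    # e.g. outputs of a binomial smoother at neighbouring positions:
    # filter_output_covariance([0.25, 0.5, 0.25], [0.25, 0.5, 0.25], shift=1)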
@techreport{diva2:288311,
author = {Spies, Hagen},
title = {{Covariances of Linear Filter Outputs in Computer Vision}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2003},
type = {Other academic},
number = {LiTH-ISY-R, 2504},
address = {Sweden},
}
Image intensity gradients can be encoded in a 2-dimensional channel representation. This report discusses the computation of such gradient channel matrices and what information can be extracted from them. In particular, this representation makes it possible to distinguish multiple orientations and magnitudes within a single representation. It is shown that this can be used to recover orientation very accurately. This holds in particular near orientation discontinuities, where classical orientation estimation fails.
@techreport{diva2:288613,
author = {Spies, Hagen},
title = {{Gradient Channel Matrices for Orientation Estimation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2003},
type = {Other academic},
number = {LiTH-ISY-R, 2540},
address = {Sweden},
}
The channel representation is a simple yet powerful representation of scalars and vectors. It is especially suited for representation of several scalars at the same time without mixing them up.
This report is partly intended to serve as a simple illustration of the channel representation. The report shows how the channels can be used to represent multiple orientations in two dimensions. The idea is to make a channel representation of the local orientation angle computed from the image gradient. The representation basically becomes an orientation histogram with overlapping bins.
The channel histogram is compared with the orientation tensor, which is another representation of orientation. The performance is comparable to that of tensors in the simple signal case, but decreases slightly with an increasing number of channels. The channel histogram outperforms the tensors on non-simple signals.
@techreport{diva2:257179,
author = {Johansson, Björn},
title = {{Representing Multiple Orientations in 2D with Orientation Channel Histograms}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2002},
type = {Other academic},
number = {LiTH-ISY-R, 2475},
address = {Sweden},
}
Turning solar collectors, heliostats, are certainly not a new idea; they have been explored for at least two decades. Projects on this subject have resulted in constructions that are more or less realistic from a commercial point of view. Far too often the technical goals have had higher priority than the economic ones, with the result that few constructions can compete with conventional, fixed solar collectors. In this project the economic issues have been given high priority, without lowering the demands on reliability. The system has been given the following mechanical and electronic properties: one-axis movement, a fixed heat-carrying fluid system, microcomputer-controlled movement, and automatic protection from overheating. Given the development of digital technology, with falling prices for advanced semiconductors as a consequence, the conclusion is that the prerequisites for this concept will be even better in the future. The result of this thesis is a heliostat function that increases the energy gain by up to 40% compared to a field of MaReCo collectors without this function, while the cost increases by only 13%.
@mastersthesis{diva2:17448,
author = {Svensson, Mikael},
title = {{Utveckling av styrning till solföljande MaReCo-hybrid i Hammarby Sjöstad}},
school = {Linköping University},
type = {{}},
year = {2002},
address = {Sweden},
}
Face detection and pose estimation are two widely studied problems - mainly because of their use as subcomponents in important applications, e.g. face recognition. In this thesis I investigate a new approach to the general problem of object detection and pose estimation and apply it to faces. Face detection can be considered a special case of this general problem, but is complicated by the fact that faces are non-rigid objects. The basis of the new approach is the use of scale and orientation invariant feature structures - feature triplets - extracted from the image, as well as a biologically inspired associative structure which maps from feature triplets to desired responses (position, pose, etc.). The feature triplets are constructed from curvature features in the image and coded in a way to represent distances between major facial features (eyes, nose and mouth). The final system has been evaluated on different sets of face images.
@mastersthesis{diva2:17324,
author = {Isaksson, Marcus},
title = {{Face Detection and Pose Estimation using Triplet Invariants}},
school = {Linköping University},
type = {{}},
year = {2002},
address = {Sweden},
}
The purpose of this master's thesis is to evaluate whether it is feasible to use the panchromatic band of Landsat 7 in order to improve the spatial resolution of colour images. The images are to be used as texture in visual databases for flight simulators, and for this reason it is important that the fusion preserves natural colours.
A number of methods for fusing panchromatic and multispectral images are discussed. Four of them are implemented and evaluated. The result is that standard methods such as HSI substitution are not suitable for this purpose, since they do not preserve natural colours. However, if only the high frequencies of the panchromatic image are used, the resolution can be improved without noticeable colour distortion.
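The colour-preserving alternative, high-frequency injection, fits in a few lines (a sketch; registration, band weighting and radiometric scaling are ignored):

    import numpy as np
    from scipy import ndimage as ndi

    def highpass_fusion(ms_up, pan, sigma=2.0):
        # ms_up: multispectral image upsampled to the pan grid, (H, W, 3)
        # pan:   panchromatic band, (H, W); both scaled to [0, 1]
        detail = pan - ndi.gaussian_filter(pan, sigma)  # high frequencies only
        return np.clip(ms_up + detail[..., None], 0.0, 1.0)

Because only high-pass detail is added, the low-frequency colour balance of the multispectral bands, and hence the perceived natural colours, is left untouched.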
@mastersthesis{diva2:17912,
author = {Molin, Sara},
title = {{Förbättring av upplösningen i Landsat 7-bilder med hjälp av bildfusion}},
school = {Linköping University},
type = {{}},
year = {2002},
address = {Sweden},
}
This thesis presents a 3D semi-automatic segmentation technique for extracting the lumen surface of the Carotid arteries, including the bifurcation, from 3D and 4D ultrasound examinations.
Ultrasound images are inherently noisy. Therefore, to aid the inspection of the acquired data, an adaptive edge-preserving filtering technique is used to reduce the generally high noise level. The segmentation process starts with edge detection using a recursive and separable 3D Monga-Deriche-Canny operator. To reduce the computation time needed for the segmentation process, a seeded region growing technique is used to make an initial model of the artery. The final segmentation is based on the inflatable balloon model, which deforms the initial model to fit the ultrasound data. The balloon model is implemented with the finite element method.
The segmentation technique produces 3D models that are intended as pre-planning tools for surgeons. The results from a healthy person are satisfactory and the results from a patient with stenosis seem rather promising. A novel 4D model of wall motion of the Carotid vessels has also been obtained. From this model, 3D compliance measures can easily be obtained.
@mastersthesis{diva2:17818,
author = {Mattsson, Per and Eriksson, Andreas},
title = {{Segmentation of Carotid Arteries from 3D and 4D Ultrasound Images}},
school = {Linköping University},
type = {{}},
year = {2002},
address = {Sweden},
}
This Master’s thesis studies the possibility of using image processing as a tool to facilitate vine management, in particular shoot counting and assessment of the grapevine canopy. Both are areas where manual inspection is done today. The thesis presents methods for capturing images and segmenting different parts of a vine. It also presents and evaluates different approaches to shoot counting. Within canopy assessment, the emphasis is on methods to estimate canopy density. Other possible assessment areas are also discussed, such as canopy colour and the measurement of canopy gaps and fruit exposure. An example of a vine assessment system is given.
@mastersthesis{diva2:18665,
author = {Bjurström, Håkan and Svensson, Jon},
title = {{Assessment of Grapevine Vigour Using Image Processing}},
school = {Linköping University},
type = {{}},
year = {2002},
address = {Sweden},
}
The purpose of this thesis is to investigate the applicability of a certain model-based classification algorithm. The algorithm is centered around a flexible wireframe prototype that can instantiate a number of different vehicle classes, such as hatchbacks, pickups and buses, to mention a few. The parameters of the model are fitted using Newton minimization of the errors between model line segments and observed line segments. Furthermore, a number of methods for object detection based on motion are described and evaluated. Results from both experimental and real-world data are presented.
@mastersthesis{diva2:18561,
author = {Böckert, Andreas},
title = {{Vehicle detection and classification in video sequences}},
school = {Linköping University},
type = {{}},
year = {2002},
address = {Sweden},
}
This is a thesis written for a master's degree at the Computer Vision Laboratory, Linköping University. An abstract outer product is defined and used as a bridge to reach 2nd and 4th order tensors. Some applications of these in the geometric analysis of range data are discussed and illustrated. In idealized setups, simple geometric objects, like spheres or polygons, are successfully detected. Finally, the generalization to nth order tensors for storing and analysing geometric information is discussed.
@mastersthesis{diva2:18558,
author = {Eidehall, Andreas},
title = {{Tensor representation of 3D structures}},
school = {Linköping University},
type = {{}},
year = {2002},
address = {Sweden},
}
A new architecture for learning systems has been developed. A number of particular design features in combination result in a very high performance and excellent robustness. The architecture uses a monopolar channel information representation. The channel representation implies a partially overlapping mapping of signals into a higher-dimensional space, such that a flexible but continuous restructuring mapping can be made. The high-dimensional mapping introduces locality in the information representation, which is directly available in wavelets or filter outputs. Single level maps using this representation can produce closed decision regions, thereby eliminating the need for costly back-propagation. The monopolar property implies that data only utilizes one polarity, say positive values, in addition to zero, allowing zero to represent no information. This leads to an efficient sparse representation.
The processing mode of the architecture is association where the mapping of feature inputs onto desired state outputs is learned from a representative training set. The sparse monopolar representation together with locality, using individual learning rates, allows a fast optimization, as the system exhibits linear complexity. Mapping into multiple channels gives a strategy to use confidence statements in data, leading to a low sensitivity to noise in features. The result is an architecture allowing systems with a complexity of some hundred thousand features described by some hundred thousand samples to be trained in typically less than an hour. Experiments that demonstrate functionality and noise immunity are presented. The architecture has been applied to the design of hyper complex operations for view centered object recognition in robot vision.
@techreport{diva2:257178,
author = {Granlund, Gösta and Forss\'{e}n, Per-Erik and Johansson, Björn},
title = {{HiperLearn:
A High Performance Learning Architecture}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2002},
type = {Other academic},
number = {LiTH-ISY-R, 2409},
address = {Sweden},
}
This report describes how the choice of kernel affects a non-parametric density estimation. Methods for accurate localisation of peaks in the estimated densities are developed for Gaussian and cos² kernels. The accuracy and robustness of the peak localisation methods are studied with respect to noise, the number of samples, and interference between peaks. Although the peak localisation is formulated in the framework of non-parametric density estimation, the results are also applicable to associative learning with localised responses.
@techreport{diva2:288272,
author = {Forssen, Per-Erik},
title = {{Observations Concerning Reconstructions with Local Support}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2002},
type = {Other academic},
number = {LiTH-ISY-R, 2425},
address = {Sweden},
}
In this report we describe how an RGB component colour image may be expanded into a set of channel images, and how the original colour image may be reconstructed from these. We also demonstrate the effect of averaging on the channel images and how it differs from conventional averaging. Finally we demonstrate how boundaries can be detected as a change in the confidence of colour state.
@techreport{diva2:288277,
author = {Forssen, Per-Erik and Granlund, Gösta and Wiklund, Johan},
title = {{Channel Representation of Colour Images}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2002},
type = {Other academic},
number = {LiTH-ISY-R, 2418},
address = {Sweden},
}
In this paper we address the topics of scale-space and phase-based signal processing in a common framework. The involved linear scale-space is no longer based on the Gaussian kernel but on the Poisson kernel. The resulting scale-space representation is directly related to the monogenic signal, a 2D generalization of the analytic signal. Hence, the local phase arises as a natural concept in this framework which results in several advanced relationships that can be used in image processing.
@techreport{diva2:288275,
author = {Felsberg, Michael and Sommer, Gerald},
title = {{The Poisson Scale-Space: A Unified Approach to Phase-Based Image Processing in Scale-Space}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2002},
type = {Other academic},
number = {LiTH-ISY-R, 2453},
address = {Sweden},
}
In this paper we consider the channel representation based upon quadratic B-splines from a statistical point of view. Interpreting the channel representation as a kernel method for estimating probability density functions, we establish a channel algebra which makes it possible to perform basic algebraic operations on measurements directly in the channel representation. Furthermore, as a central point, we identify the smoothing of channel values with a robust estimator, or equivalently, a diffusion process.
@techreport{diva2:288621,
author = {Felsberg, Michael and Scharr, Hanno and Forssen, Per-Erik},
title = {{The B-Spline Channel Representation: Channel Algebra and Channel Based Diffusion Filtering}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2002},
type = {Other academic},
number = {LiTH-ISY-R, 2461},
address = {Sweden},
}
The structure tensor has been used mainly for representation of local orientation in spaces of arbitrary dimensions, where the eigenvectors represent the orientation and the corresponding eigenvalues indicate the type of structure which is represented. Apart from being local, the structure tensor may be referred to as "object centered" since it describes the corresponding structure relative to a local reference system. This paper proposes that the basic properties of the structure tensor can be extended to a tensor defined in a projective space rather than in a local Euclidean space. The result, the "projective tensor", is symmetric in the same way as the structure tensor, and also uses the eigensystem to carry the relevant information. However, instead of orientation, the projective tensor represents geometrical primitives such as points, lines, and planes (depending on the dimensionality of the underlying space). Furthermore, this representation has the useful property of mapping the operation of forming the affine hull of points and lines to tensor summation, e.g., the sum of two projective tensors which represent two points amounts to a projective tensor that represents the line which passes through the two points, etc. The projective tensor may be referred to as "view centered" since each tensor, which still may be defined on a local scale, represents a geometric primitive relative to a global image-based reference system. This implies that two such tensors may be combined, e.g., using summation, in a meaningful way over large regions.
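The point/line behaviour can be checked numerically. In the sketch below (2D projective space, so 3-component homogeneous coordinates), the sum of two point tensors has the line through the points as its range, and the line's dual coordinates appear as the zero-eigenvalue eigenvector:

```python
import numpy as np

# Two 2D points as homogeneous 3-vectors, each represented by the
# rank-1 projective tensor p p^T; tensor summation forms their affine hull.
p1 = np.array([1.0, 2.0, 1.0])
p2 = np.array([3.0, 0.0, 1.0])
T = np.outer(p1, p1) + np.outer(p2, p2)

# range(T) = span{p1, p2} = all homogeneous points on the line through them;
# the zero-eigenvalue eigenvector gives the line's dual coordinates.
vals, vecs = np.linalg.eigh(T)
line = vecs[:, np.argmin(np.abs(vals))]
print(line / np.linalg.norm(line))
print(np.cross(p1, p2) / np.linalg.norm(np.cross(p1, p2)))  # same up to sign
```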
@techreport{diva2:288635,
author = {Nordberg, Klas},
title = {{The structure tensor in projective spaces}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2002},
type = {Other academic},
number = {LiTH-ISY-R, 2424},
address = {Sweden},
}
Next generation helical cone-beam CT will feature pitches around 80 mm. It is predicted that reconstruction algorithms to be used in these machines, with still rather modest cone angles, may not necessarily be exact, but rather have an emphasis on simplicity and speed. The PI-methods are a family of non-exact algorithms, all of which are based on complete data capture with a detector collimated to the Tam-window, followed by rebinning to obliquely parallel ray geometry. The non-exactness is identified as inconsistency in the space-invariant one-dimensional ramp-filtering step. It is shown that this inconsistency can be reduced, resulting in significant improvement in image quality and increased tolerance for higher pitch and cone angle. A short theoretical background for the PI-methods is given, but the algorithms themselves are not given in any detail. A set of experiments on mathematical phantoms illustrates (among other things) how the amount of artefacts grows with increasing cone angles.
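The report analyses inconsistency in the ramp-filtering step; the generic space-invariant 1D ramp filter itself (not any of the PI-methods) can be sketched as follows, up to an overall scale factor.

```python
import numpy as np

def ramp_filter(proj, pad=2):
    """Apply a 1D ramp filter |w| along detector rows.

    proj: 2D array of rebinned parallel projections (one row per angle).
    Zero-padding reduces circular-convolution wrap-around. This is only
    the generic filtering step the report analyses, not a PI-method.
    """
    n = proj.shape[-1]
    m = pad * n
    w = np.abs(np.fft.fftfreq(m))            # the ramp |w| (arbitrary scale)
    F = np.fft.fft(proj, n=m, axis=-1)
    return np.fft.ifft(F * w, axis=-1).real[..., :n]
```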
@techreport{diva2:288610,
author = {Danielsson, Per-Erik and Seger, Maria Magnusson and Turbell, Henrik},
title = {{The PI-methods for Helical Cone-Beam Tomography}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2002},
type = {Other academic},
number = {LiTH-ISY-R, 2428},
address = {Sweden},
}
This Master's Thesis discusses the different trade-offs a programmer needs to consider when constructing image processing systems. First, an overview of the different alternatives available is given, followed by a focus on systems based on general hardware. General, in this case, means mass-market with a low price-to-performance ratio. The software environment is focused on UNIX, sometimes restricted to Linux, together with C, C++ and ANSI-standardized APIs.
@mastersthesis{diva2:303037,
author = {Nordlöv, Per},
title = {{Implementation Aspects of Image Processing}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 3088}},
year = {2001},
address = {Sweden},
}
The aim of this master thesis is to classify the tree class from an image of a leaf with a computer vision classification system. We compare different descriptors that describe the different features of the leaves. We also look at different classification models and combine them with the descriptors to build a system that can classify the different tree classes.
@mastersthesis{diva2:303038,
author = {Söderkvist, Oskar},
title = {{Computer Vision Classification of Leaves from Swedish Trees}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 3132}},
year = {2001},
address = {Sweden},
}
This report is a complement to the working document [1], where a sparse associative network is described. This report shows that the net learning rule in [1] can be viewed as the solution to a weighted least squares problem. This means that we can apply the theoretical framework of least squares problems, and compare the net rule with some other iterative algorithms that solve the same problem. The learning rule is compared with the gradient search algorithm and the RPROP algorithm in a simple synthetic experiment. The gradient rule has the slowest convergence, while the associative and the RPROP rules have similar convergence. The associative learning rule has a smaller initial error than the RPROP rule, though.
It is also shown in the same experiment that we get a faster convergence if we have a monopolar constraint on the solution, i.e. if the solution is constrained to be non-negative. The least squares error is a bit higher, but the norm of the solution is smaller, which gives a smaller interpolation error.
The report also discusses a generalization of the least squares model, which includes other known function approximation models.
[1] G. Granlund. Parallel Learning in Artificial Vision Systems: Working Document. Dept. EE, Linköping University, 2000.
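The monopolar-constraint observation can be reproduced with standard tools; a sketch using scipy's non-negative least squares solver (rather than the associative learning rule itself):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
A = rng.random((100, 20))                          # samples x features
b = A @ rng.random(20) + 0.01 * rng.standard_normal(100)

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]        # unconstrained least squares
x_mono, err = nnls(A, b)                           # monopolar (non-negative)

# the constrained residual is slightly higher, but the solution norm
# is smaller, which the report links to a smaller interpolation error
print(np.linalg.norm(A @ x_ls - b), np.linalg.norm(x_ls))
print(err, np.linalg.norm(x_mono))
```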
@techreport{diva2:257177,
author = {Johansson, Björn},
title = {{On Sparse Associative Networks:
A Least Squares Formulation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2001},
type = {Other academic},
number = {LiTH-ISY-R, 2368},
address = {Sweden},
}
This report describes an idea based on the work in [1], where an algorithm for learning automatic representation of visual operators is presented. The algorithm in [1] uses canonical correlation to find a suitable subspace in which the signal is invariant to some desired properties. This report presents a related approach specially designed for classification problems. The goal is to find a subspace in which the signal is invariant within each class, and at the same time compute the class representation in that subspace. This algorithm is closely related to the one in [1], but less computationally demanding, and it is shown that the two algorithms are equivalent if we have an equal number of training samples for each class. Even though the new algorithm is designed for pure classification problems, it can still be used to learn visual operators, as will be shown in the experiment section. [1] M. Borga. Learning Multidimensional Signal Processing. PhD thesis, Linköping University, Sweden, SE-581 83 Linköping, 1998. Dissertation No 531, ISBN 91-7219-202-X.
@techreport{diva2:288281,
author = {Johansson, Björn},
title = {{On Classification: Simultaneously Reducing Dimensionality and Finding Automatic Representation using Canonical Correlation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2001},
type = {Other academic},
number = {LiTH-ISY-R, 2375},
address = {Sweden},
}
This report starts with an introduction to the concepts active perception, reactive systems, and state dependency, and to fundamental aspects of perception such as the perceptual aliasing problem, and the number-of-percepts vs. number-of-states trade-off. We then introduce finite state machines, and extend them to accommodate active perception. Finally we demonstrate a state-transition mechanism that is applicable to autonomous navigation.
@techreport{diva2:288318,
author = {Forssen, Per-Erik},
title = {{Autonomous Navigation using Active Perception}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2001},
type = {Other academic},
number = {LiTH-ISY-R, 2395},
address = {Sweden},
}
This report describes a novel window matching technique. We perform window matching by transforming image data into sparse features, and applying a computationally efficient matching technique in the sparse feature space. The gain in execution time for the matching is roughly 10 times compared to full window matching techniques such as SSD, but the total execution time for the matching also involves an edge filtering step. Since the edge responses may be used for matching of several regions, the proposed matching technique is increasingly advantageous when the number of regions to keep track of increases, and when the size of the search window increases. The technique is used in a real-time ego-motion estimation system in the WITAS project. Ego-motion is estimated by tracking a set of structure points, i.e. regions that do not have the aperture problem. Comparisons with SSD with regard to speed and accuracy are made.
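A crude sketch of the idea (not the report's sparse feature representation): edge filtering is done once, and each candidate position is then scored on a small set of the template's strongest-edge pixels instead of the full window, as SSD would.

```python
import numpy as np

def edge_magnitude(img):
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def match_sparse(img, tmpl, n_points=50):
    """Match a template using only its strongest-edge pixels."""
    e = edge_magnitude(tmpl)
    ys, xs = np.unravel_index(np.argsort(e, axis=None)[-n_points:], e.shape)
    th, tw = tmpl.shape
    H, W = img.shape
    best, best_pos = np.inf, None
    for r in range(H - th + 1):
        for c in range(W - tw + 1):
            # score only the sparse sample positions, not the full window
            score = np.sum((img[r + ys, c + xs] - tmpl[ys, xs]) ** 2)
            if score < best:
                best, best_pos = score, (r, c)
    return best_pos
```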
@techreport{diva2:288544,
author = {Forssen, Per-Erik},
title = {{Window Matching using Sparse Templates}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2001},
type = {Other academic},
number = {LiTH-ISY-R, 2392},
address = {Sweden},
}
This report defines the rank complement of a diagonalizable matrix (i.e. a matrix which can be brought to a diagonal form by means of a change of basis) as the interchange of the range and the null space. Given a diagonalizable matrix A there is in general no unique matrix Ac which has a range equal to the null space of A and a null space equal to the range of A; only matrices of full rank have a unique rank complement: the zero matrix. Consequently, the rank complement operation is not a distinct operation, but rather a characterization of any operation which makes an interchange of the range and the null space. One particular rank complement operation is introduced here, which eventually leads to an implementation of rank complement operations in terms of polynomials in A. The main result is that for each possible rank r of A there is a polynomial in A which evaluates to a matrix Ac which is a rank complement of A. The report provides explicit expressions for matrix polynomials which compute a rank complement of a symmetric matrix. These results are then generalized to the case of diagonalizable matrices. Finally, a Matlab function is described that implements a rank complement operation based on the results derived.
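The report's contribution is the polynomial construction; the sketch below instead computes a rank complement directly from the eigensystem of a symmetric matrix, which is the behaviour those polynomials reproduce.

```python
import numpy as np

def rank_complement(A, tol=1e-9):
    """Rank complement of a symmetric matrix via its eigensystem.

    Interchanges range and null space: eigenvectors with (near-)zero
    eigenvalue get eigenvalue 1, all others get 0.
    """
    vals, vecs = np.linalg.eigh(A)
    mask = (np.abs(vals) < tol).astype(float)
    return (vecs * mask) @ vecs.T      # projector onto the null space of A

p = np.array([1.0, 2.0, 3.0])
A = np.outer(p, p)                     # rank 1, range = span{p}
Ac = rank_complement(A)
print(np.allclose(Ac @ p, 0))          # range of A is now the null space
print(np.linalg.matrix_rank(Ac))       # 2 = dimension of A's null space
```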
@techreport{diva2:288596,
author = {Nordberg, Klas and Farnebäck, Gunnar},
title = {{Rank complement of diagonalizable matrices using polynomial functions}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2001},
type = {Other academic},
number = {LiTH-ISY-R, 2369},
address = {Sweden},
}
The purpose of this master's thesis was to study the possibility of using computer vision methods to detect and classify objects in the front passenger seat in a car. This work presents different approaches to solve this problem and evaluates the usefulness of each technique. The classification information should later be used to modulate the speed and the force of the airbag, to be able to provide each occupant with optimal protection and safety.
This work shows that computer vision has great potential to provide data which may be used to perform reliable occupant classification. The future choice of method depends on many factors, for example costs and requirements on the system from laws and car manufacturers. Furthermore, evaluation and tests of the methods in this thesis, other methods, the ABE approach and post-processing of the results should be made before a reliable classification algorithm can be written.
@mastersthesis{diva2:303034,
author = {Klomark, Marcus},
title = {{Occupant Detection using Computer Vision}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 3026}},
year = {2000},
address = {Sweden},
}
This survey contains links to and facts about a number of current projects on content-based search in image databases around the world. The main focus is on what kinds of image features are used, but also on the user interface and the user's possibility to interact with the system (i.e. what 'visual language' is used).
@techreport{diva2:257176,
author = {Johansson, Björn},
title = {{A Survey on:
Contents Based Search in Image Databases}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2215},
address = {Sweden},
}
Some image patterns, e.g. circles, hyperbolic curves, star patterns etc., can be described in a compact way using local orientation. The features mentioned above are part of a family of patterns called rotational symmetries. This theory can be used to detect image patterns from the local orientation in double angle representation of an image. Some of the rotational symmetries were originally described in terms of the local orientation, without being designed to detect a certain feature. The question is then: given a description in double angle representation, what kind of image features does this description correspond to? This 'inverse', or backprojection, is not unambiguous - many patterns have the same local orientation description. This report answers this question for the case of rotational symmetries and also for some other descriptions.
@techreport{diva2:288305,
author = {Johansson, Björn},
title = {{Backprojection of Some Image Symmetries Based on a Local Orientation Description}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2311},
address = {Sweden},
}
@techreport{diva2:288331,
author = {Granlund, Gösta H.},
title = {{Context Controllable Linkage Models}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2238},
address = {Sweden},
}
@techreport{diva2:288280,
author = {Granlund, Gösta H.},
title = {{Learning Through Response-Driven Association}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2237},
address = {Sweden},
}
@techreport{diva2:288276,
author = {Granlund, Gösta H.},
title = {{Low Level Image Interpretation Using Associative Mapping}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2239},
address = {Sweden},
}
@techreport{diva2:288317,
author = {Granlund, Gösta},
title = {{The Dichotomy of Strategies for Spatial-Cognitive Information Processing}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2241},
address = {Sweden},
}
This report describes a technique to detect curvature. The technique uses local polynomial fitting on a local orientation description of an image. The idea is based on the theory of rotational symmetries, which describes curvature, circles, star patterns etc. The local polynomial fitting is shown to be equivalent to calculating partial derivatives on a lowpass version of the local orientation. The new method can therefore be implemented very efficiently, both in the single-scale case and in the multi-scale case.
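A rough sketch of the pipeline under assumed filters (Sobel gradients, Gaussian lowpass); the report's rotational-symmetry coefficients are not reproduced here, only the derivative-of-lowpassed-orientation structure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def curvature_response(img, sigma=2.0):
    """Derivatives of a lowpass-filtered double-angle orientation image.

    z = (gx + i*gy)^2 is the double-angle local orientation description;
    responses are strong where the local orientation varies, e.g. at
    curves, circles and star patterns.
    """
    gx = sobel(img.astype(float), axis=1)
    gy = sobel(img.astype(float), axis=0)
    z = (gx + 1j * gy) ** 2
    z_lp = gaussian_filter(z.real, sigma) + 1j * gaussian_filter(z.imag, sigma)
    zx = sobel(z_lp.real, axis=1) + 1j * sobel(z_lp.imag, axis=1)
    zy = sobel(z_lp.real, axis=0) + 1j * sobel(z_lp.imag, axis=0)
    return np.hypot(np.abs(zx), np.abs(zy))
```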
@techreport{diva2:288546,
author = {Johansson, Björn},
title = {{Curvature Detection using Polynomial Fitting on Local Orientation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2312},
address = {Sweden},
}
@techreport{diva2:288548,
author = {Granlund, Gösta H.},
title = {{Channel Representation of Information}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2236},
address = {Sweden},
}
This report describes the principles of an algorithm developed within the WITAS project. The goal of the WITAS project is to build an autonomous helicopter that can navigate autonomously, using differential GPS, GIS-data of the underlying terrain (elevation models and digital orthophotographs) and a video camera. Using differential GPS and other non-visual sensory equipment, the system is able to obtain crude estimates of its position and heading direction. These estimates can be refined by matching of camera images and the on-board GIS-data. This refinement process, however, is rather time-consuming, and will thus only be made every once in a while. For real-time refinement of camera position and heading, the system will iteratively update the estimates using frame-to-frame correspondence only. In each frame a sparse set of image displacement estimates is calculated, and from these the perspective in the current image can be found. Using the calculated perspective and knowledge of the camera parameters, new values of camera position and heading can be obtained. The resultant camera position and heading can exhibit a slow drift if the original alignment was not perfect, and thus a corrective alignment with GIS-data should be performed once every minute or so.
@techreport{diva2:288566,
author = {Forssen, Per-Erik},
title = {{Updating Camera Location and Heading using a Sparse Displacement Field}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2318},
address = {Sweden},
}
This report describes an experimental still image coder that grew out of a project in the graduate course "Advanced Video Coding" in spring 2000. The project has investigated the idea of using local orientation histograms in fractal coding. Instead of performing a correlation-like grey-level matching of image regions, the block search is made by matching feature histograms of the block contents. The feature investigated in this report is local orientation, but in principle other features could be used as well. In its current state the coder does not outperform state-of-the-art still image coders, but the block-search strategy seems promising, and will probably prove useful in several other applications.
@techreport{diva2:288616,
author = {Forssen, Per-Erik and Johansson, Björn},
title = {{Fractal Coding by Means of Local Feature Histograms}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2295},
address = {Sweden},
}
@techreport{diva2:288619,
author = {Granlund, Gösta H.},
title = {{The Use of Dynamics to Establish Knowledge of Invariant Structure}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2240},
address = {Sweden},
}
One important problem in image analysis is the localization of a template in a larger image. Applications where the solution of this problem can be used include tracking, optical flow, and stereo vision. The matching method studied here solves this problem by defining a new similarity measurement between a template and an image neighborhood. This similarity is computed for all possible integer positions of the template within the image. The position for which we get the highest similarity is considered to be the match. The similarity is not necessarily computed using the original pixel values directly, but can of course be derived from higher level image features.
The similarity measurement can be computed in different ways, and the simplest approaches are correlation-type algorithms. Aschwanden and Guggenbühl [2] have done a comparison between such algorithms. One of the best and simplest algorithms they tested is normalized cross-correlation (NCC). Therefore this algorithm has been used for comparison with the PAIRS algorithm that is developed by the author and described in this text. It uses a completely different similarity measurement, based on sets of bits extracted from the template and the image.
This work is done within WITAS, which is a project dealing with UAVs (unmanned aerial vehicles). Two specific applications of the developed template matching algorithm have been studied:
- One application is tracking of cars in video sequences from a helicopter.
- The other one is computing optical flow in such video sequences in order to detect moving objects, especially vehicles on roads.
The video from the helicopter is in color (RGB), and this fact is used in the presented tracking algorithm. The PAIRS algorithm has been applied to these two applications and the results are reported.
A part of this text concerns a general approach to template matching, developed here, called Maximum Entropy Matching (MEM). The main idea of MEM is that the more data we compare on a computer, the longer it takes; therefore the data that we compare should have maximum average information, that is, maximum entropy. We will see that this approach can be used to create template matching algorithms which are on the order of 10 times faster than correlation (NCC) without decreasing the performance.
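The NCC baseline referred to above is standard and can be sketched directly; the PAIRS bit-set similarity is not specified in enough detail here to reproduce.

```python
import numpy as np

def ncc(window, tmpl):
    """Normalized cross-correlation between one image window and a template."""
    w = window - window.mean()
    t = tmpl - tmpl.mean()
    denom = np.sqrt((w * w).sum() * (t * t).sum())
    return (w * t).sum() / denom if denom > 0 else 0.0

def match_ncc(img, tmpl):
    """Exhaustive integer-position search for the highest NCC score."""
    th, tw = tmpl.shape
    H, W = img.shape
    scores = np.array([[ncc(img[r:r + th, c:c + tw], tmpl)
                        for c in range(W - tw + 1)]
                       for r in range(H - th + 1)])
    return np.unravel_index(scores.argmax(), scores.shape)
```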
@techreport{diva2:288327,
author = {Lundberg, Frans},
title = {{Maximum Entropy Matching: An Approach to Fast Template Matching}},
institution = {Linköping University, Department of Electrical Engineering},
year = {2000},
type = {Other academic},
number = {LiTH-ISY-R, 2313},
address = {Sweden},
}
This thesis investigates the possibilities of using GIS (Geographic Information System) data with an airborne autonomous vehicle developed in the WITAS project. Available for the thesis are high resolution (0.16 meter sample interval) aerial photographs over Stockholm, and vector data in a common GIS format containing all roads in the Stockholm area.
A method for removing cars from aerial photographs is presented, using the filtering method normalized convolution, originally developed for filtering uncertain and incomplete data. By setting the certainty to zero over the cars, this data is disregarded in the filtering process, resulting in an image without cars. This method is further improved by choosing an anisotropic applicability function, resulting in a filtering that preserves structures oriented in certain directions.
The available vector data is investigated with regard to its use in a simulator for vehicle movement, and is found to be missing much of the essential information needed in such a simulator. A new data format better suited to these requirements is created, using the extensible markup language (XML), which generates a human-readable data format and can use existing parsers to make the implementation simpler. The result is a somewhat complex, but highly general data format that can accurately express almost any type of road and intersection. Cars can follow arbitrary paths in the road database and move with a smooth motion suitable for use as input to image processing equipment. The simulator does not allow any dynamic behaviour such as changing speeds, starting or stopping, or interaction between cars, such as overtaking or intelligent behavior in intersections.
In the airborne vehicle, a mapping from pixels in a camera image (like the ones output from the simulator) to locations in the road database is needed. This is an inverse mapping with respect to the visualization described above. It gives important information to a car tracking system regarding the probable movement of cars, and also makes it possible to determine whether a car breaks traffic regulations. A mapping of this kind is created using a simplified form of ray tracing known as ray casting, together with space partitioning methods used to vastly improve efficiency.
All the above-mentioned tasks are implemented using C++ and object-oriented methods, giving maintainable and extendable code suited to a quickly changing research area. The interface to the simulator is designed to be compatible with the existing simulation software used in the WITAS project. Visualization is done through the OpenGL graphics library, providing realistic effects such as lighting and shading.
@mastersthesis{diva2:303032,
author = {Langemark, Stefan},
title = {{GIS in a simulator environment and efficient inverse mapping of roads}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 2090}},
year = {1999},
address = {Sweden},
}
We explore the use of colour for interpretation of unstructured off-road scenes. The aim is to extract driveable areas for use in an autonomous off-road vehicle in real time. The terrain is an unstructured tropical jungle area with vegetation, water and red mud roads.
We show that hue is both robust to changing lighting conditions and an important feature for correctly interpreting this type of scene. We believe that our method can also be deployed in other types of terrain, with minor changes, as long as the terrain is coloured and well saturated.
Only 2D information is processed at the moment, but we aim at extending the method to also treat 3D information, by the use of stereo vision or motion.
@mastersthesis{diva2:303033,
author = {Bergquist, Urban},
title = {{Colour Vision and Hue for Autonomous Vehicle Guidance}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 2091}},
year = {1999},
address = {Sweden},
}
Experience from earlier trials at Korsnäs AB shows that it is very difficult to predict mathematically what happens during the production of pulp in a continuous digester.
The goal of this master's thesis was to investigate the possibilities of using neural networks to facilitate the process control by predicting the lignin content of the pulp three and a half hours before the chips in question are fully cooked.
Because of the time delay between different sensor signals, which varies with the production rate, the problem was solved with one simple, local model per production rate. All the constituent models were minimized with respect to both the number of nodes in the hidden layer and the number of inputs, which gave a final solution with four simple models built from feed-forward neural networks, each with one hidden layer containing three nodes.
In the end, the prediction of the lignin content showed good properties with respect to how well it follows the real kappa number analyser.
@mastersthesis{diva2:303022,
author = {Stewing, Robert},
title = {{Parameterprediktering med multipla sammansatta lokala neuronnätsbaserade modeller vid framställning av pappersmassa}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 1991}},
year = {1999},
address = {Sweden},
}
Automated storage systems often rely on the positions of the pallets being known with high precision. In this thesis, a turnable camera mounted on the robot has been used for handling the situation of approximately known pallet positions. The robot is given the approximate location of a pallet, and its objective is to locate the pallet with a precision that is high enough to be able to approach it from the correct direction and then lift it. For this, a precision of a few centimetres in each direction is needed.
A system for locating the pallet from single images, based on rotational symmetry filters, has been developed, and a simple program for controlling the robot has been implemented. These could very well be extended and improved, e.g. by considering multiple images and improving the path planning.
The main part of the thesis deals with the image processing part. Other parts of the project, apart from the controller, include implementation of servers controlling the camera and the frame grabber.
Some tests have been made, which show fairly promising results.
@mastersthesis{diva2:303029,
author = {Roll, Jakob},
title = {{A System for Visual-Based Automated Storage Robots}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 2053}},
year = {1999},
address = {Sweden},
}
Computer vision systems used in autonomous mobile vehicles are typically linked to higher-level deliberation processes. One important aspect of this link is how to connect, or anchor, the symbols used at the higher level to the objects in the vision system that these symbols refer to. Anchoring is complicated by the fact that the vision data are inherently affected by uncertainty. We propose an anchoring technique that uses fuzzy sets to represent the uncertainty in the perceptual data. We show examples where this technique allows a deliberative system to reason about the objects (cars) detected by a vision system on board an unmanned helicopter, in the framework of the WITAS project.
@techreport{diva2:288592,
author = {Andersson, Thord and Coradeschi, Silvia and Saffiotti, Alessandro},
title = {{Fuzzy matching of visual cues in an unmanned airborne vehicle}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1999},
type = {Other academic},
number = {},
address = {Sweden},
}
@techreport{diva2:288602,
author = {Reed, Todd},
title = {{A Baseline System for Image and Map Registration using Sparse Hierarchical Features}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1999},
type = {Other academic},
number = {LiTH-ISY-R, 2138},
address = {Sweden},
}
In this report, mainly three different problems are considered. The first problem considered is how to filter position data of vehicles. To do so the vehicles have to be tracked. This is done with Kalman filters. The second problem considered is how to control a camera to keep a vehicle in the center of the image, under three different conditions. This is mainly solved with a Kalman filter. The last problem considered is how to use the color of the vehicles to make classification of them more robust. Some suggestions on how this might be done are given. However, no really good method to do this has been found.
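The tracking part rests on standard Kalman filtering; a generic constant-velocity sketch (the noise covariances are assumed values, not the thesis's):

```python
import numpy as np

# Constant-velocity Kalman filter for 2D position tracking.
dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
              [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)  # state transition
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)  # observe position only
Q = 0.01 * np.eye(4)   # process noise (assumed)
R = 1.0 * np.eye(2)    # measurement noise (assumed)

def kalman_step(x, P, z):
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with position measurement z
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), 10.0 * np.eye(4)     # state: [px, py, vx, vy]
for t in range(20):
    z = np.array([1.0 * t, 0.5 * t]) + np.random.randn(2) * 0.5
    x, P = kalman_step(x, P, z)
print(x[:2], x[2:])                       # filtered position and velocity
```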
@mastersthesis{diva2:530596,
author = {Moe, Anders},
title = {{Investigations in Tracking and Colour Classification}},
school = {Linköping University},
type = {{}},
year = {1998},
address = {Sweden},
}
A recursive method to condense general multidimensional FIR filters into a sequence of simple kernels with mainly one-dimensional extent has been worked out. Convolver networks adapted for 2, 3 and 4D signals are presented, and the performance is illustrated for spherically separable quadrature filters. The resulting filter responses are mapped to a non-biased tensor representation where the local tensor constitutes a robust estimate of both the shape and the orientation (velocity) of the neighbourhood. A qualitative evaluation of this General Sequential Filter concept results in no detectable loss in accuracy when compared to conventional FIR (Finite Impulse Response) filters, but the computational complexity is reduced by several orders of magnitude. For the examples presented in this paper the attained speed-up is 5, 25 and 300 times for 2D, 3D and 4D data respectively. The magnitude of the attained speed-up implies that complex spatio-temporal analysis can be performed using standard hardware, such as a powerful workstation, in close to real time. Due to the soft implementation of the convolver and the tree structure of the sequential filtering approach, the processing is simple to reconfigure for the outer as well as the inner (vector length) dimensionality of the signal. The implementation was made in AVS (Application Visualization System) using modules written in C.
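The source of the speed-up can be illustrated with an exactly separable kernel (a Gaussian here; the report's quadrature filters require the recursive decomposition itself):

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

img = np.random.rand(256, 256)
g = np.exp(-0.5 * (np.arange(-7, 8) / 2.0) ** 2)
g /= g.sum()

# one full 15x15 2D kernel: 225 multiply-adds per pixel
full = convolve(img, np.outer(g, g))
# a sequence of two 1D kernels: 2 x 15 = 30 multiply-adds per pixel
seq = convolve1d(convolve1d(img, g, axis=0), g, axis=1)
print(np.allclose(full[8:-8, 8:-8], seq[8:-8, 8:-8]))  # True in the interior
```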
@techreport{diva2:288295,
author = {Andersson, Mats and Wiklund, Johan and Knutsson, Hans},
title = {{Sequential Filter Trees for Efficient 2D 3D and 4D Orientation Estimation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1998},
type = {Other academic},
number = {LiTH-ISY-R, 2070},
address = {Sweden},
}
This paper presents our general strategy for designing learning machines as well as a number of particular designs. The search for methods allowing a sufficient level of adaptivity is based on two main principles: 1. simple adaptive local models and 2. adaptive model distribution. Particularly important concepts in our work are mutual information and canonical correlation. Examples are given on learning feature descriptors, modeling disparity, synthesis of a global 3-mode model, and a setup for reinforcement learning of online video coder parameter control.
@techreport{diva2:288299,
author = {Knutsson, Hans and Borga, Magnus and Landelius, Tomas},
title = {{Learning Multidimensional Signal Processing}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1998},
type = {Other academic},
number = {LiTH-ISY-R, 2039},
address = {Sweden},
}
This report introduces a signal processing strategy for depth segmentation and scene reconstruction that incorporates occlusion as a natural component. The work aims to maximize the use of connectivity in the temporal domain under the condition that the scene is static and that the camera motion is known. An object behind the foreground is reconstructed using the fact that different parts of the object have been seen in different images in the sequence. One of the main ideas in the reported work is the use of a spatiotemporal certainty volume c(x) with the same dimension as the input spatiotemporal volume s(x), where c(x) is used as a 'blackboard' for rejecting already segmented image structures. The segmentation starts with searching for image structures in the foreground, eliminates their occluding influence, and then proceeds. Normalized convolution, which is a weighted least mean square technique for filtering data with varying spatial reliability, is used for all filtering. High spatial resolution near object borders is achieved, and only neighboring structures with similar depth support each other.
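Zeroth-order normalized convolution (normalized averaging) can be sketched as follows; the report applies the technique in a more general spatiotemporal setting.

```python
import numpy as np
from scipy.ndimage import convolve

def normalized_convolution(signal, certainty, applicability):
    """Zeroth-order normalized convolution (normalized averaging):
    a weighted-least-squares estimate (a * (c*s)) / (a * c), where the
    certainty c(x) is zero for missing or rejected data."""
    num = convolve(signal * certainty, applicability)
    den = convolve(certainty, applicability)
    return np.where(den > 1e-9, num / np.maximum(den, 1e-9), 0.0)

# reconstruct a smooth image from which 70% of the pixels were removed
img = np.fromfunction(lambda r, c: np.sin(r / 10) + np.cos(c / 10), (64, 64))
cert = (np.random.rand(64, 64) > 0.7).astype(float)
g = np.exp(-0.5 * (np.arange(-5, 6) / 2.0) ** 2)
print(np.abs(normalized_convolution(img, cert, np.outer(g, g)) - img).mean())
```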
@techreport{diva2:288324,
author = {Ulvklo, Morgan and Granlund, Gösta H. and Knutsson, Hans},
title = {{Adaptive Reconstruction using Multiple Views}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1998},
type = {Other academic},
number = {LiTH-ISY-R, 2036},
address = {Sweden},
}
@techreport{diva2:288634,
author = {Borga, Magnus and Knutsson, Hans},
title = {{An Adaptive Stereo Algorithm Based on Canonical Correlation Analysis}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1998},
type = {Other academic},
number = {LiTH-ISY-R, 2013},
address = {Sweden},
}
@techreport{diva2:288629,
author = {Granlund, Gösta},
title = {{Does Vision Inevitably Have to be Active?}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1998},
type = {Other academic},
number = {LiTH-ISY-R, 2068},
address = {Sweden},
}
Artificial neural networks (ANN) are a technique that has matured over the last ten years and is now found in more and more applications, such as reading of written text, linear programming, automatic control, expert systems, speech recognition and many kinds of classification problems [Zurada, 1992]. In this master's thesis we wanted to try to use ANNs in an industrial process where standard methods have not worked satisfactorily or have been difficult to apply. We found such a process in the production of pulp.
Producing pulp from wood requires a long and complicated process divided into several steps. One of these steps is the so-called cooking, where wood chips are broken down into fibres using high pressure and hot liquor. The cooking process is complex, runs for a long time (about 8 hours) and is affected by a large number of parameters, so great experience and knowledge are required to control it. Kværner Pulping Technologies in Karlstad, which designs digesters among other things, has developed a simulator for the cooking process in order to gain better insight into how the process works and consequently be able to control the cooking in a better way. The behaviour of the simulator depends on a number of so-called hidden parameters, which are a subset of the parameters assumed to affect the cooking process. These hidden parameters are difficult or impossible to measure and are therefore set to estimated values in the simulation. The corresponding hidden parameters in the real process, however, vary in an unknown way. They are affected partly by internal processes in the digester and partly by external causes; for example, wood chips of a different quality may be fed into the digester. This means that the simulator gives good simulations only for rather short periods, while the hidden parameters are approximately constant.
If the changes in the hidden parameters of the process could somehow be detected and transferred to the simulator, it could run "in parallel" with the cooking process. The simulator would in that case be an excellent complementary tool for the person controlling the cooking process, since he or she would get a better idea of what is happening, or has happened, in the process and thereby a better basis for control decisions. This presupposes that the simulator is good enough to capture the global evolution in the digester with sufficient precision under stationary parameter conditions.
As a first step towards this goal, this report investigates whether detection of changes in the hidden parameters of the simulator is possible using feed-forward ANNs and the resilient propagation learning algorithm.
The report is divided into 7 chapters. Chapter 2 treats the problem in more detail. Chapters 3 and 4 are of a general nature, describing the paper manufacturing process and what artificial neural networks actually are. Chapter 5 describes the different solution proposals considered and the results we have achieved. Conclusions and results are summarized in chapter 6. There is much more we would like to try and investigate; this future work is described in chapter 7. Last in the report are appendices 1 and 2, with details that we find relevant but too bulky to include in the main part of the report. Appendix 3 contains the program code we produced during the work.
@mastersthesis{diva2:302994,
author = {Andersson, Thord and Karlsson, Mikael},
title = {{Neuronnätsbaserad identifiering av processparametrar vid tillverkning av pappersmassa}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 1709}},
year = {1997},
address = {Sweden},
}
Chapter 2 describes the concept of canonical correlation, which you have to know about in order to understand the continuing discussion.
Chapter 3 introduces the problem that was to be solved.
Chapters 4, 5 and 6 discuss three different suggestions for how to approach the problem. Each chapter begins with a section of experiments as a motivation for the approach. Then follows some theory and mathematical manipulations to structure the thoughts. The last sections contain discussions and suggestions concerning the approach.
Finally, chapter 7 contains a summary and a comparative discussion of the approaches.
@mastersthesis{diva2:303009,
author = {Johansson, Björn},
title = {{Multidimensional signal recognition, invariant to affine transformation and time-shift, using canonical correlation}},
school = {Linköping University},
type = {{LiTH-ISY-EX-1825}},
year = {1997},
address = {Sweden},
}
Segmentation is a process that separates objects in an image. In medical images, particularly image volumes, the field of application is wide. For example, 3D visualisations of the anatomy could benefit enormously from segmentation. The aim of this thesis is to construct a segmentation tool.
The project consists of three main parts. First, a survey of the actual need for segmentation in medical image volumes was carried out. Then a unique three-step model for a segmentation tool was implemented, tested and evaluated.
The first step of the segmentation tool is a seed-growing method that uses the intensity and an orientation tensor estimate to decide which voxels are part of the object. The second step uses an active contour, a deformable “balloon”. The contour is shrunk to fit the segmented border from the first step, yielding a surface suitable for visualisation. The last step consists of letting the contour reshape according to the orientation tensor estimate.
The user evaluation establishes the usefulness of the tool. The model is flexible and well adapted to the users' requests. For unclear objects the segmentation may fail, but the cause is mostly poor image quality. Even though much work remains to be done on the second and third parts of the tool, the results are most promising.
@mastersthesis{diva2:303019,
author = {Lundström, Claes},
title = {{Segmentation of Medical Image Volumes}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 1864}},
year = {1997},
address = {Sweden},
}
In this report, the principles of man-made object detection in satellite images are investigated. An overview of terminology and of how the detection problem is usually solved today is given. A three-level system to solve the detection problem is proposed. The main branches of this system handle road and city detection, respectively. To achieve data source flexibility, the Logical Sensor notion is used to model the low-level system components. Three Logical Sensors have been implemented and tested on Landsat TM and SPOT XS scenes. These are: BDT (Background Discriminant Transformation) to construct a man-made object property field; local orientation for texture estimation and road tracking; and texture estimation using local variance and variance of local orientation. A gradient magnitude measure for road seed generation has also been tested.
@mastersthesis{diva2:303014,
author = {Forss\'{e}n, Per-Erik},
title = {{Detection of Man-made Objects in Satellite Images}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 1852}},
year = {1997},
address = {Sweden},
}
@techreport{diva2:288304,
author = {Ulvklo, Morgan and Uppsäll, Magnus},
title = {{Adaptive Reconstruction using Multiple Views - Results and Applications}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1997},
type = {Other academic},
number = {},
address = {Sweden},
}
@techreport{diva2:288560,
author = {Karlholm, Jörgen},
title = {{Tracking of occluded targets in head-up display sequences}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1997},
type = {Other academic},
number = {LiTH-ISY-R, 1993},
address = {Sweden},
}
This paper presents a novel algorithm for analysis of stochastic processes. The algorithm can be used to find the required solutions in the cases of principal component analysis (PCA), partial least squares (PLS), canonical correlation analysis (CCA) or multiple linear regression (MLR). The algorithm is iterative and sequential in its structure and uses on-line stochastic approximation to reach an equilibrium point. A quotient between two quadratic forms is used as an energy function and it is shown that the equilibrium points constitute solutions to the generalized eigenproblem.
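For the CCA special case, the generalized eigenproblem can be set up and solved with standard dense routines (a batch sketch, not the paper's on-line stochastic approximation; the data construction is illustrative):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal((n, 3))
# y shares two dimensions with x, plus one independent dimension
y = np.c_[x[:, :2] @ rng.standard_normal((2, 2)), rng.standard_normal(n)]

# CCA as a generalized eigenproblem A v = r B v, with A holding the
# between-set covariance blocks and B the within-set ones
Cxx, Cyy = np.cov(x.T), np.cov(y.T)
Cxy = np.cov(x.T, y.T)[:3, 3:]
A = np.block([[np.zeros((3, 3)), Cxy], [Cxy.T, np.zeros((3, 3))]])
B = np.block([[Cxx, np.zeros((3, 3))], [np.zeros((3, 3)), Cyy]])
r, V = eigh(A, B)
print(r[::-1][:3])   # leading values ~ the canonical correlations (~1, 1, 0)
```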
@techreport{diva2:288565,
author = {Borga, Magnus and Landelius, Tomas and Knutsson, Hans},
title = {{A Unified Approach to PCA, PLS, MLR and CCA}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1997},
type = {Other academic},
number = {LiTH-ISY-R, 1992},
address = {Sweden},
}
To find a shape in an image, a technique called snakes or active contours can be used. An active contour is a curve that moves towards the sought-for shape in a way controlled by internal forces - such as rigidity and elasticity - and an image force. The image force should attract the contour to certain features, such as edges, in the image. This is done by creating an attractor image, which defines how strongly each point in the image should attract the contour.
In this thesis the extension to contours (surfaces) in three dimensional images is studied. Methods of representation of the contour and computation of the internal forces are treated.
Also, a new way of creating the attractor image, using the orientation tensor to detect planar structure in 3D images, is studied. The new method is not generally superior to those already existing, but still has its uses in specific applications.
During the project, it turned out that the main problem of active contours in 3D images was instability due to strong internal forces overriding the influence of the attractor image. The problem was solved satisfactorily by projecting the elasticity force on the contour’s tangent plane, which was approximated efficiently using sphere-fitting.
@mastersthesis{diva2:302987,
author = {Ahlberg, Jörgen},
title = {{Active Contours in Three Dimensions}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 1708}},
year = {1996},
address = {Sweden},
}
This Master's Thesis addresses the problem of segmenting an image sequence with respect to the motion in the sequence. As a basis for the motion estimation, 3D orientation tensors are used. The goal of the segmentation is to partition the images into regions, characterized by having a coherent motion. The motion model is affine with respect to the image coordinates. A method to estimate the parameters of the motion model from the orientation tensors in a region is presented. This method can also be generalized to a large class of motion models.
Two segmentation algorithms are presented together with a postprocessing algorithm. All these algorithms are based on the competitive algorithm, a general method for distributing points between a number of regions, without relying on arbitrary threshold values. The first segmentation algorithm segments each image independently, while the second algorithm recursively takes advantage of the previous segmentation. The postprocessing algorithm stabilizes the segmentations of a whole sequence by imposing continuity constraints.
The algorithms have been implemented and the results of applying them to a test sequence are presented. Interesting properties of the algorithms are that they are robust to the aperture problem and that they do not require a dense velocity field.
It is finally discussed how the algorithms can be developed and improved. It is straightforward to extend the algorithms to base the segmentations on alternative or additional features, under not too restrictive conditions on the features.
@mastersthesis{diva2:302971,
author = {Farnebäck, Gunnar},
title = {{Motion-based segmentation of image sequences}},
school = {Linköping University},
type = {{LiTH-ISY-Ex No. 1596}},
year = {1996},
address = {Sweden},
}
This report documents work done at the request of the Swedish Defense Research Establishment. The studied problem is that of detecting point-shaped targets, i.e. targets whose only significant property is that of being very small, in a cluttered environment. Three approaches to the problem have been considered. The first one, based on motion compensation, was rejected at an early stage due to expected problems with robustness and computational demands. The second method, based on background modeling with principal components, turned out successful and has been studied in depth, including discussion of various extensions and improvements of the presented algorithm. Finally, a Wiener filter approach has also turned out successful, including an approximation with separable filters. The methods have been tested on sequences obtained by an IR sensor. While both the two latter approaches work well on the test sequences, the Wiener filter is simpler and computationally less expensive than the background modeling. On the other hand, the background modeling is likely to have better possibilities for extensions and improvements.
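A sketch of the background-modeling approach in its simplest batch form (a PCA subspace fitted to target-free frames, detection in the residual); the report's extensions and the Wiener-filter alternative are not reproduced here, and all names are illustrative.

```python
import numpy as np

def detect_small_targets(frames, new_frame, k=8):
    """Detect point-shaped targets as residuals from a PCA background model.

    frames: (n, H, W) stack of (target-free) background frames.
    The background subspace is spanned by the top-k principal components;
    anything the subspace cannot explain shows up in the residual image.
    """
    n, H, W = frames.shape
    X = frames.reshape(n, -1).astype(float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]                          # top-k principal components
    f = new_frame.reshape(-1) - mean
    residual = f - basis.T @ (basis @ f)    # remove the background part
    return np.abs(residual).reshape(H, W)   # peaks indicate candidate targets
```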
@techreport{diva2:288286,
author = {Farnebäck, Gunnar and Knutsson, Hans and Granlund, Gösta},
title = {{Detection of point-shaped targets}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1996},
type = {Other academic},
number = {LiTH-ISY-R, 1921},
address = {Sweden},
}
This paper presents a novel algorithm for finding the solution of the generalized eigenproblem where the matrices involved contain expectation values from stochastic processes. The algorithm is iterative and sequential in its structure and uses on-line stochastic approximation to reach an equilibrium point. A quotient between two quadratic forms is suggested as an energy function for this problem and is shown to have zero gradient only at the points solving the eigenproblem. Furthermore it is shown that the algorithm for the generalized eigenproblem can be used to solve three important problems as special cases. For a stochastic process the algorithm can be used to find the directions for maximal variance, covariance, and canonical correlation as well as their magnitudes.
@techreport{diva2:288332,
author = {Knutsson, Hans and Borga, Magnus and Landelius, Tomas},
title = {{Generalized Eigenproblem for Stochastic Process Covariances}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1996},
type = {Other academic},
number = {LiTH-ISY-R, 1916},
address = {Sweden},
}
Two new reinforcement learning algorithms are presented. Both use a binary tree to store simple local models in the leaf nodes and coarser global models towards the root. It is demonstrated that a meaningful partitioning into local models can only be accomplished in a fused space consisting of both input and output. The first algorithm uses a batch-like statistical procedure to estimate the reward functions in the fused space. The second one uses channel coding to represent the output and input vectors, allowing a simple iterative algorithm based on competing subsystems. The behaviors of both algorithms are illustrated in a preliminary experiment.
@techreport{diva2:288282,
author = {Landelius, Tomas and Borga, Magnus and Knutsson, Hans},
title = {{Reinforcement Learning Trees}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1996},
type = {Other academic},
number = {LiTH-ISY-R, 1828},
address = {Sweden},
}
A scheme for performing generalized convolutions is presented. A flexible convolver, which runs on standard workstations, has been implemented. It is designed for maximum throughput and flexibility. The implementation incorporates spatio-temporal convolutions with configurable vector combinations. It can handle general multilinear operations, i.e. tensor operations on multidimensional data of any order. The input data and the kernel coefficients can be of arbitrary vector length. The convolver is configurable for IIR filters in the time dimension. Other features of the implemented convolver are scattered kernel data, region of interest and subsampling. The implementation is done as a C-library and a graphical user interface in AVS (Application Visualization System).
@techreport{diva2:288320,
author = {Wiklund, Johan and Knutsson, Hans},
title = {{A Generalized Convolver}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1996},
type = {Other academic},
number = {LiTH-ISY-R, 1830},
address = {Sweden},
}
A number of success stories have been told where reinforcement learning has been applied to problems in continuous state spaces using neural nets or other sorts of function approximators in the adaptive critics. However, the theoretical understanding of why and when these algorithms work is inadequate. This is clearly exemplified by the lack of convergence results for a number of important situations. To our knowledge only two such results have been presented for systems in the continuous state space domain. The first is due to Werbos and is concerned with linear function approximation and heuristic dynamic programming. Here no optimal strategy can be found, which is why the result is of limited importance. The second result is due to Bradtke and deals with linear quadratic systems and quadratic function approximators. Bradtke's proof is limited to ADHDP and policy iteration techniques where the optimal solution is found by a number of successive approximations. This paper deals with greedy techniques, where the optimal solution is directly aimed for. Convergence proofs for a number of adaptive critics, HDP, DHP, ADHDP and ADDHP, are presented. Optimal controllers for linear quadratic regulation (LQR) systems can be found by standard techniques from control theory, but the assumptions made in control theory can be weakened if adaptive critic techniques are employed. The main point of this paper is, however, not to emphasize the differences but to highlight the similarities and by so doing contribute to a theoretical understanding of adaptive critics.
@techreport{diva2:288542,
author = {Landelius, Tomas and Knutsson, Hans},
title = {{Greedy adaptive critics for LPQ [i.e. LQR] problems:
Convergence Proofs}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1996},
type = {Other academic},
number = {LiTH-ISY-R, 1896},
address = {Sweden},
}
This paper reviews an existing algorithm for adaptive control based on explicit criterion maximization (ECM) and presents an extended version suited for reinforcement learning tasks. Furthermore, assumptions under which the algorithm converges to a local maximum of a long term utility function are given. Such convergence theorems are very rare for reinforcement learning algorithms working with continuous state and action spaces. A number of similar algorithms, previously suggested to the reinforcement learning community, are briefly surveyed in order to give the presented algorithm a place in the field. The relations between the different algorithms are exemplified by checking their consistency on a simple problem of linear quadratic regulation (LQR).
@techreport{diva2:288584,
author = {Landelius, Tomas and Knutsson, Hans},
title = {{Reinforcement Learning Adaptive Control and Explicit Criterion Maximization}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1996},
type = {Other academic},
number = {LiTH-ISY-R, 1829},
address = {Sweden},
}
This paper presents novel algorithms for finding the singular value decomposition (SVD) of a general covariance matrix by stochastic approximation; general in the sense that non-square, between-sets covariance matrices are also dealt with. For one of the algorithms, convergence is shown using results from stochastic approximation theory. Proofs of this sort, establishing both the point of equilibrium and its domain of attraction, have been reported very rarely for stochastic, iterative feature extraction algorithms.
@techreport{diva2:288273,
author = {Landelius, Tomas and Knutsson, Hans and Borga, Magnus},
title = {{On-Line Singular Value Decomposition of Stochastic Process Covariances}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1995},
type = {Other academic},
number = {LiTH-ISY-R, 1762},
address = {Sweden},
}
This paper presents a novel learning algorithm that finds the linear combination of one set of multi-dimensional variates that is the best predictor, and at the same time finds the linear combination of another set which is the most predictable. This relation is known as the canonical correlation and has the property of being invariant with respect to affine transformations of the two sets of variates. The algorithm successively finds all the canonical correlations beginning with the largest one. It is shown that canonical correlations can be used in computer vision to find feature detectors by giving examples of the desired features. When used on the pixel level, the method finds quadrature filters and when used on a higher level, the method finds combinations of filter output that are less sensitive to noise compared to vector averaging.
@techreport{diva2:288567,
author = {Knutsson, Hans and Borga, Magnus and Landelius, Tomas},
title = {{Learning Canonical Correlations}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1995},
type = {Other academic},
number = {LiTH-ISY-R, 1761},
address = {Sweden},
}
This paper presents an algorithm for estimation of local curvature from gradients of a tensor field that represents local orientation. The algorithm is based on an operator representation of the orientation tensor, which means that change of local orientation corresponds to a rotation of the eigenvectors of the tensor. The resulting curvature descriptor is a vector that points in the direction of the image in which the local orientation rotates anti-clockwise and the norm of the vector is the inverse of the radius of curvature. Two coefficients are defined that relate the change of local orientation with either curves or radial patterns.
@techreport{diva2:288599,
author = {Nordberg, Klas and Knutsson, Hans and Granlund, Gösta},
title = {{Local Curvature from Gradients of the Orientation Tensor Field}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1995},
type = {Other academic},
number = {LiTH-ISY-R, 1783},
address = {Sweden},
}
@techreport{diva2:288633,
author = {Wilson, Roland and Knutsson, Hans},
title = {{Seeing Things II}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1995},
type = {Other academic},
number = {LiTH-ISY-R, 1787},
address = {Sweden},
}
This paper addresses the idea of learning by reinforcement, within the theory of behaviorism. The reason for this choice is its generality, and especially that the reinforcement learning paradigm allows the design of systems which can improve their behavior beyond that of their teacher. The role of the teacher is to define the reinforcement function, which acts as a description of the problem the machine is to solve. Gained knowledge is represented by a behavior probability density function, which is approximated with a number of normal distributions stored in the nodes of a binary tree. It is argued that a meaningful partitioning into local models can only be accomplished in a fused space consisting of both stimuli and responses. Given a stimulus, the system searches for responses likely to result in highly reinforced decisions by treating the sum of the two normal distributions on each level in the tree as a distribution describing the system's behavior at that resolution. The resolution of the response, as well as the tree growing and pruning processes, is controlled by a random variable based on the difference in performance between two consecutive levels in the tree. This results in a system that will never be content but will indefinitely continue to search for better solutions.
@techreport{diva2:288270,
author = {Landelius, Tomas and Knutsson, Hans},
title = {{A Dynamic Tree Structure for Incremental Reinforcement Learning of Good Behavior}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1994},
type = {Other academic},
number = {LiTH-ISY-R, 1628},
address = {Sweden},
}
A robust, general and computationally simple reinforcement learning system is presented. It uses a channel representation which is robust and continuous. The accumulated knowledge is represented as a reward prediction function in the outer product space of the input and output channel vectors. Each computational unit generates an output simply by a vector-matrix multiplication, and the response can therefore be calculated fast. The response and a prediction of the reward are calculated simultaneously by the same system, which makes TD methods easy to implement if needed. Several units can cooperate to solve more complicated problems. A dynamic tree structure of linear units is grown in order to divide the knowledge space into a sufficient number of regions in which the reward function can be properly described. The tree continuously tests split and prune criteria in order to adapt its size to the complexity of the problem.
@techreport{diva2:288288,
author = {Borga, Magnus and Knutsson, Hans},
title = {{A Binary Competition Tree for Reinforcement Learning}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1994},
type = {Other academic},
number = {LiTH-ISY-R, 1623},
address = {Sweden},
}
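A minimal sketch of the outer-product construction described above, with many simplifications (scalar input and output, a single linear unit, no tree growing): values are encoded in overlapping cos^2 channels, the reward prediction is the bilinear form a^T W b, and the response is obtained by one vector-matrix multiplication. The task and all parameters are illustrative.

import numpy as np

def channels(val, centers, width=0.5):
    # Overlapping cos^2 channel encoding of a scalar (smooth, compact support).
    d = np.abs(val - centers) / width
    return np.where(d < 1.0, np.cos(np.pi * d / 2.0) ** 2, 0.0)

rng = np.random.default_rng(1)
in_c = np.linspace(-1, 1, 9)       # input channel centers
out_c = np.linspace(-1, 1, 9)      # output channel centers
W = np.zeros((9, 9))               # reward predictor in outer-product space

for step in range(20000):
    x = rng.uniform(-1, 1)
    a = channels(x, in_c)
    if rng.random() < 0.3:
        y = rng.uniform(-1, 1)                 # explore
    else:
        y = out_c[np.argmax(a @ W)]            # exploit: one matrix product
    b = channels(y, out_c)
    r = 1.0 - abs(y + x)                       # toy task: mirror the input
    # Delta rule: move the bilinear prediction a^T W b toward the reward.
    W += 0.1 * (r - a @ W @ b) * np.outer(a, b)

a = channels(0.6, in_c)
print(out_c[np.argmax(a @ W)])     # expect a response channel near -0.6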
A robust, fast and general method for estimation of object properties is proposed. It is based on a representation of these properties in terms of channels. Each channel represents a particular value of a property, resembling the activity of biological neurons. Furthermore, each processing unit, corresponding to an artificial neuron, is a linear perceptron which operates on outer products of input data. This implies a more complex space of invariances than in the first-order case, without abandoning linear theory. In general, the specific function of each processing unit has to be learned, and a fast and simple learning rule is presented. The channel representation, the processing structure and the learning rule have been tested on stereo image data showing a cube in various 3D positions and orientations. The system was able to learn a channel representation for the horizontal position, the depth, and the orientation of the cube, each property invariant to the other two.
@techreport{diva2:288329,
author = {Nordberg, Klas and Granlund, Gösta and Knutsson, Hans},
title = {{Representation and Learning of Invariance}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1994},
type = {Other academic},
number = {LiTH-ISY-R, 1552},
address = {Sweden},
}
@techreport{diva2:288308,
author = {Westin, Carl-Fredrik and Westelius, Carl-Johan and Wiklund, Johan and Knutsson, Hans and Granlund, Gösta},
title = {{ESPRIT Basic Research Action 7108, Vision as Process, DR.B.2:
Integration of Multi-level Control Loops and FOA}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1994},
type = {Other academic},
number = {},
address = {Sweden},
}
We apply the 3D-orientation tensor representation to construct an object tracking algorithm. 2D-line normal flow is estimated by computing the eigenvector associated with the largest eigenvalue of 3D (two spatial dimensions plus time) tensors with a planar structure. The object's true 2D velocity is computed by averaging tensors with consistent normal flows, generating a 3D line representation that corresponds to a 2D point in motion. Flow induced by camera rotation is compensated for by ignoring points with velocity consistent with the ego-rotation. A region-of-interest growing process based on motion consistency generates estimates of object size and position.
@techreport{diva2:288608,
author = {Karlholm, Jörgen and Westelius, Carl-Johan and Westin, Carl-Fredrik and Knutsson, Hans},
title = {{Object Tracking Based on the Orientation Tensor Concept}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1994},
type = {Other academic},
number = {LiTH-ISY-R, 1658},
address = {Sweden},
}
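The normal-flow step above has a compact expression in terms of the spatiotemporal structure tensor. A minimal sketch, assuming a single locally planar patch (a moving line/edge): the eigenvector of the largest eigenvalue is the plane normal (nx, ny, nt), from which the normal velocity follows.

import numpy as np

def normal_flow(volume):
    # volume: (T, H, W) spatiotemporal image stack.
    gt, gy, gx = np.gradient(volume.astype(float))
    g = np.stack([gx.ravel(), gy.ravel(), gt.ravel()])
    T = g @ g.T / g.shape[1]            # averaged 3-D orientation tensor
    w, V = np.linalg.eigh(T)
    nx, ny, nt = V[:, -1]               # eigenvector of the largest eigenvalue
    # For planar x-y-t structure, the velocity component along the spatial
    # normal is -nt / |(nx, ny)| in the direction (nx, ny):
    s = nx * nx + ny * ny
    return -nt * nx / s, -nt * ny / s

# Toy example: a smooth vertical edge translating one pixel per frame in x.
t, y, x = np.mgrid[0:16, 0:32, 0:32]
vol = 1.0 / (1.0 + np.exp(-(x - 8 - t)))
print(normal_flow(vol))                 # approximately (1.0, 0.0)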
Three-dimensional data processing is becoming more and more common. Typical operations include estimation of optical flow in video sequences and orientation estimation in 3-D MR images. This paper proposes an efficient approach to robust low-level feature extraction for 3-D image analysis. In contrast to many earlier algorithms, the methods proposed in this paper support the use of relatively complex models at the initial processing steps. The aim of this approach is to provide the means to handle complex events at the initial processing steps and to enable reliable estimates in the presence of noise. A limited basis filter set is proposed which forms a basis on the unit sphere and is related to spherical harmonics. From these basis filters, different types of orientation selective filters are synthesized. An interpolation scheme that provides a rotation as well as a translation of the synthesized filter is presented. The purpose is to obtain a robust and invariant feature extraction at a manageable computational cost.
@techreport{diva2:288342,
author = {Andersson, Mats T. and Knutsson, Hans},
title = {{Controllable 3-D Filters for Low Level Computer Vision}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1993},
type = {Other academic},
number = {LiTH-ISY-R, 1526},
address = {Sweden},
}
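The report's 3-D basis is related to spherical harmonics; as a much simpler 2-D analogue of the synthesis-by-interpolation idea (a Freeman-Adelson-style steerable pair, not the report's filter set), a first-order derivative-of-Gaussian filter at any orientation is an exact linear combination of two fixed basis kernels:

import numpy as np

y, x = np.mgrid[-8:9, -8:9]
G = np.exp(-(x**2 + y**2) / (2.0 * 4.0**2))
Gx, Gy = -x * G, -y * G                  # fixed basis kernels

def steered(theta):
    # "Steer" the oriented filter by interpolating between the basis kernels.
    return np.cos(theta) * Gx + np.sin(theta) * Gy

f45 = steered(np.pi / 4)                 # oriented kernel at 45 degrees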
@techreport{diva2:288587,
author = {Granlund, Gösta},
title = {{ESPRIT Project BRA 3038: Vision as Process, Final Report}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1993},
type = {Other academic},
number = {LiTH-ISY-R, 1473},
address = {Sweden},
}
The tensor representation has proven a successful tool as a means of describing local multi-dimensional orientation. In this respect, the tensor representation is a map from the local orientation to a second order tensor. This paper investigates how variations of the orientation are mapped to variations of the tensor, thereby giving an explicit equivariance relation. The results may be used to design tensor based algorithms for extraction of image features defined in terms of local variations of the orientation, e.g. multi-dimensional curvature or circular symmetries. It is assumed that the variation of the local orientation can be described in terms of an orthogonal transformation group. Under this assumption a corresponding orthogonal transformation group, acting on the tensor, is constructed. Several correspondences between the two groups are demonstrated.
@techreport{diva2:288623,
author = {Nordberg, Klas and Knutsson, Hans and Granlund, Gösta},
title = {{On the Equivariance of the Orientation and the Tensor Field Representation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1993},
type = {Other academic},
number = {LiTH-ISY-R, 1530},
address = {Sweden},
}
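The simplest instance of the equivariance relation (the rank-one case; the report treats general orthogonal transformation groups) can be written out directly. If the tensor is built from a unit orientation vector, then

\[
T = \hat{x}\hat{x}^{\mathsf{T}}, \qquad \hat{x} \mapsto R\hat{x} \;\Longrightarrow\; T \mapsto (R\hat{x})(R\hat{x})^{\mathsf{T}} = R\,T\,R^{\mathsf{T}},
\]

so a rotation R acting on the orientation corresponds to the orthogonal transformation T -> R T R^T acting on the tensor, which is the kind of group correspondence the report constructs in general.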
@techreport{diva2:288577,
author = {Westin, Carl-Fredrik and Westelius, Carl-Johan},
title = {{ESPRIT Basic Research Action 7108, Vision as Process, DR.B.1: Integration of Low-level FOA \& Control Mechanisms}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1993},
type = {Other academic},
number = {},
address = {Sweden},
}
@techreport{diva2:288594,
author = {Larsen, Rasmus},
title = {{Thoughts on Bayesian Estimation of Motion Vector Fields}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1993},
type = {Other academic},
number = {LiTH-ISY-R, 1521},
address = {Sweden},
}
@techreport{diva2:288569,
author = {Granum, Erik and others},
title = {{ESPRIT Basic Research Action 7108, Vision as Process, Periodic progress report}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1993},
type = {Other academic},
number = {},
address = {Sweden},
}
@techreport{diva2:288290,
author = {Wiklund, Johan and Westin, Carl-Fredrik and Westelius, Carl-Johan},
title = {{AVS, Application Visualization System, Software Evaluation Report}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1993},
type = {Other academic},
number = {LiTH-ISY-R, 1469},
address = {Sweden},
}
@techreport{diva2:288563,
author = {Wilson, Roland and Knutsson, Hans},
title = {{Seeing Things I}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1993},
type = {Other academic},
number = {LiTH-ISY-R, 1467},
address = {Sweden},
}
The topic of this report is signal representation in the context of hierarchical image processing. An overview of hierarchical processing systems is included, as well as a presentation of various approaches to signal representation, feature representation and feature extraction. It is claimed that image hierarchies based on feature extraction, so-called feature hierarchies, demand a signal representation other than the standard spatial or linear representation used today. A new representation, the operator representation, is developed. It is based on an interpretation of features in terms of signal transformations. This representation has no reference to any spatial ordering of the signal elements and also gives an explicit representation of signal features. Using the operator representation, a generalization of the standard phase concept in image processing is introduced. Based on the operator representation, two algorithms for extraction of feature values are presented. Both have the capability of generating phase invariant feature descriptors. It is claimed that the operator representation, in conjunction with an appropriate feature extraction algorithm, is well suited as a general framework for defining multi-level feature hierarchies. The report contains an appendix with the mathematical details necessary to comprehend the presentation.
@techreport{diva2:288284,
author = {Nordberg, Klas},
title = {{Signal Representation and Signal Processing using Operators}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1992},
type = {Other academic},
number = {LiTH-ISY-I, 1387},
address = {Sweden},
}
This survey considers response generating systems that improve their behaviour using reinforcement learning. The difference between unsupervised learning, supervised learning, and reinforcement learning is described. Two general problems concerning learning systems are presented: the credit assignment problem and the problem of perceptual aliasing. Notation and some general issues concerning reinforcement learning systems are presented. Reinforcement learning systems are further divided into two main classes: memory mapping and projective mapping systems. Each of these classes is described and some examples are presented. Some other approaches are mentioned that do not fit into the two main classes. Finally, some issues not covered by the surveyed articles are discussed, and some comments on the subject are made.
@techreport{diva2:288303,
author = {Borga, Magnus and Carlsson, Tomas},
title = {{A Survey of Current Techniques for Reinforcement Learning}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1992},
type = {Other academic},
number = {LiTH-ISY-I, 1391},
address = {Sweden},
}
@techreport{diva2:288262,
author = {Westelius, Carl-Johan},
title = {{ESPRIT Basic Research Action 3038, Vision as Process, DS.A.2.1: Software for Model Support and Local FOA Control}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1992},
type = {Other academic},
number = {},
address = {Sweden},
}
@techreport{diva2:288294,
author = {Westin, Carl-Fredrik},
title = {{ESPRIT Basic Research Action 3038, Vision as Process, DR.A.2.1: Model Support and Local FOA Control}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1992},
type = {Other academic},
number = {},
address = {Sweden},
}
@techreport{diva2:288264,
author = {Westelius, Carl-Johan and Knutsson, Hans and Wiklund, Johan},
title = {{Robust Vergence Control Using Scale--Space Phase Information}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1992},
type = {Other academic},
number = {LiTH-ISY-I, 1363},
address = {Sweden},
}
@techreport{diva2:288561,
author = {Bårman, Håkan and Knutsson, Hans and Granlund, Gösta H.},
title = {{A Note on Estimation of Optical Flow and Acceleration}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1992},
type = {Other academic},
number = {LiTH-ISY-I, 1313},
address = {Sweden},
}
@techreport{diva2:288339,
author = {Wiklund, Johan and Westelius, Carl-Johan and Knutsson, Hans},
title = {{Hierarchical Phase Based Disparity Estimation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1992},
type = {Other academic},
number = {LiTH-ISY-I, 1327},
address = {Sweden},
}
@techreport{diva2:288624,
author = {Bårman, Håkan and Granlund, Gösta},
title = {{Hierarchical Feature Extraction for Computer-Aided Analysis of Mammograms}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1992},
type = {Other academic},
number = {LiTH-ISY-R, 1448},
address = {Sweden},
}
@techreport{diva2:288589,
author = {Wiklund, Johan and Knutsson, Hans and Wilson, Roland},
title = {{A Hierarchical Stereo Algorithm}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1991},
type = {Other academic},
number = {LiTH-ISY-I, 1167},
address = {Sweden},
}
@techreport{diva2:288298,
author = {Westelius, Carl-Johan and Knutsson, Hans},
title = {{ESPRIT Basic Research Action 3038, Vision as Process, DS.A.1.1: Preliminary Software for Feature Extraction}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1991},
type = {Other academic},
number = {},
address = {Sweden},
}
The problem of incorporating orientation selectivity into transforms which provide local frequency representation of image regions over a range of spatial scales is investigated. It is shown that this can be achieved if the local spectra are defined on a log-polar coordinate lattice and that by appropriate choice of window functions, the spectra can be designed to be steerable in arbitrary orientations. In addition, the resulting class of transforms can be defined to be invertible, be based on window functions having good localization in both the spatial and spatial frequency domains, and be efficiently implemented using FFT techniques. Results of using one such transform for linear feature extraction demonstrate its effectiveness when dealing with oriented features.
@techreport{diva2:288269,
author = {Calway, Andrew},
title = {{Incorporating Orientation Selectivity in Wavelet Transforms: For Multi--Resolution Fourier Analysis of Images}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1991},
type = {Other academic},
number = {LiTH-ISY-I, 1243},
address = {Sweden},
}
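A minimal sketch of the coordinate change underlying the abstract above (nearest-neighbour sampling, Hanning window, all parameters illustrative): resampling a windowed local spectrum onto a log-polar lattice turns rotation of the patch into a shift along the angle axis and scaling into a shift along the log-radius axis, which is what makes orientation steering natural in these transforms.

import numpy as np

def logpolar_spectrum(patch, n_r=16, n_theta=32):
    # Windowed local spectrum, resampled onto a log-polar lattice.
    w = np.hanning(patch.shape[0])
    F = np.fft.fftshift(np.abs(np.fft.fft2(patch * np.outer(w, w))))
    c = patch.shape[0] // 2
    r = np.exp(np.linspace(0.0, np.log(c - 1), n_r))   # log-radius samples
    th = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    i = np.clip((c + r[:, None] * np.sin(th)).astype(int), 0, 2 * c - 1)
    j = np.clip((c + r[:, None] * np.cos(th)).astype(int), 0, 2 * c - 1)
    return F[i, j]                                     # (n_r, n_theta)

patch = np.random.default_rng(3).standard_normal((64, 64))
S = logpolar_spectrum(patch)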
@techreport{diva2:288292,
author = {Wilson, Roland and Calway, Andrew and Pearson, Edward R. S.},
title = {{A generalised wavelet transform for Fourier analysis: The multiresolution Fourier transform and its application to image and audio signal analysis}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1991},
type = {Other academic},
number = {LiTH-ISY-I, 1177},
address = {Sweden},
}
@techreport{diva2:288547,
author = {Bårman, Håkan and Knutsson, Hans and Granlund, Gösta H.},
title = {{Using Principal Direction Estimates for Shape and Acceleration Description}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1991},
type = {Other academic},
number = {LiTH-ISY-I, 1231},
address = {Sweden},
}
@techreport{diva2:288341,
author = {Westin, Carl-Fredrik and Knutsson, Hans},
title = {{Line Segmentation by Clustering in Möbius-Hough Space}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1991},
type = {Other academic},
number = {LiTH-ISY-I, 1221},
address = {Sweden},
}
@techreport{diva2:288333,
author = {Westelius, Carl-Johan and Granlund, Gösta},
title = {{Integrated Analysis-Control Structure for Robotic Systems}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1991},
type = {Other academic},
number = {},
address = {Sweden},
}
@techreport{diva2:288626,
author = {Westin, Carl-Fredrik and Knutsson, Hans},
title = {{ESPRIT Basic Research Action 3038, Vision as Process, DR.A.1.2: Definition of feature generating procedures}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1991},
type = {Other academic},
number = {},
address = {Sweden},
}
@techreport{diva2:288293,
author = {Bårman, Håkan and Granlund, Gösta H. and Knutsson, Hans},
title = {{Hierarchical Curvature Estimation and Description}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1990},
type = {Other academic},
number = {LiTH-ISY-I, 1095},
address = {Sweden},
}
@techreport{diva2:288325,
author = {Westelius, Carl-Johan and Knutsson, Hans and Granlund, Gösta H.},
title = {{Focus of Attention Control}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1990},
type = {Other academic},
number = {LiTH-ISY-I, 1140},
address = {Sweden},
}
@techreport{diva2:288319,
author = {Westin, Carl-Fredrik and Knutsson, Hans},
title = {{A Parameter Mapping for Line Segmentation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1990},
type = {Other academic},
number = {LiTH-ISY-I, 1151},
address = {Sweden},
}
@techreport{diva2:288313,
author = {Järvinen, Arto and Wiklund, Johan},
title = {{Study of information mapping in Kohonen--Networks}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1989},
type = {Other academic},
number = {LiTH-ISY-I, 0978},
address = {Sweden},
}
@techreport{diva2:288328,
author = {Granlund, Gösta H.},
title = {{Image Processing Systems and Components}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1989},
type = {Other academic},
number = {LiTH-ISY-I, 1016},
address = {Sweden},
}
@techreport{diva2:288296,
author = {Granlund, Gösta H.},
title = {{Discriminant Functions, Linear Operations and Learning}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1989},
type = {Other academic},
number = {LiTH-ISY-I, 1015},
address = {Sweden},
}
@techreport{diva2:288321,
author = {Granlund, Gösta H.},
title = {{Information Representation in Image Analysis Algorithms}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1989},
type = {Other academic},
number = {LiTH-ISY-I, 1017},
address = {Sweden},
}
@techreport{diva2:288609,
author = {Bårman, Håkan and Knutsson, Hans and Granlund, Gösta H.},
title = {{Mechanisms for Striate Cortex Organization}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1989},
type = {Other academic},
number = {LiTH-ISY-I, 1020},
address = {Sweden},
}
@techreport{diva2:288606,
author = {Westin, Carl-Fredrik and Westelius, Carl-Johan},
title = {{Brain chaos. A feature or a bug?}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1989},
type = {Other academic},
number = {LiTH-ISY-I, 0990},
address = {Sweden},
}
This report is a survey of information representations in both biological and artificial neural networks. The correct information representation is crucial for the dynamics and the adaptation algorithms of neural networks. A number of examples of existing information representations are given.
@techreport{diva2:288541,
author = {Järvinen, Arto},
title = {{Information representation in neural networks -- A survey}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1989},
type = {Other academic},
number = {LiTH-ISY-I, 0994},
address = {Sweden},
}
@techreport{diva2:288265,
author = {Granlund, Gösta H.},
title = {{Integrated Analysis-Response Structures for Robotics Systems}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1988},
type = {Other academic},
number = {LiTH-ISY-I, 0932},
address = {Sweden},
}
@techreport{diva2:288287,
author = {Granlund, Gösta H.},
title = {{Bi-Directionally Adaptive Models in Image Analysis}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1988},
type = {Other academic},
number = {LiTH-ISY-I, 0930},
address = {Sweden},
}
@techreport{diva2:288334,
author = {Andersson, Mats and Granlund, Gösta H.},
title = {{A Hybrid Image Processing Architecture}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1988},
type = {Other academic},
number = {LiTH-ISY-I, 0929},
address = {Sweden},
}
@techreport{diva2:288338,
author = {Granlund, Gösta H. and Knutsson, Hans},
title = {{Compact Associative Representation of Structural Information}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1988},
type = {Other academic},
number = {LiTH-ISY-I, 0931},
address = {Sweden},
}
@techreport{diva2:288640,
author = {Granlund, Gösta H.},
title = {{Magnitude Representation of Feature Variables}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1988},
type = {Other academic},
number = {LiTH-ISY-I, 0933},
address = {Sweden},
}
@techreport{diva2:288600,
author = {Bårman, Håkan and Haglund, Leif and Granlund, Gösta H.},
title = {{Context Dependent Hierarchical Image Processing for Remote Sensing Data, Part Two: Contextual Classification and Segmentation}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1988},
type = {Other academic},
number = {LiTH-ISY-I, 0924},
address = {Sweden},
}
@techreport{diva2:288336,
author = {Bigun, Josef},
title = {{Impressions from Picture Processing in USA and Japan}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1988},
type = {Other academic},
number = {LiTH-ISY-I, 0892},
address = {Sweden},
}
The symmetries in a neighbourhood of a gray value image are modelled by conjugate harmonic function pairs. A harmonic function pair is utilized to represent a coordinate transformation defining a symmetry type. In this coordinate representation the image parts which are symmetric with respect to the chosen function pair have iso-gray value curves that are simple lines or parallel line patterns. The detection is modelled in the special Fourier domain corresponding to the new variables by minimizing an error function. It is shown that the minimization process, or detection of these patterns, can be carried out for the whole image entirely in the spatial domain by convolutions. The convolution kernel is complex valued, as is the result. The magnitude of the result is shown to correspond to a well-defined certainty measure, while the orientation is the least-squares estimate of an orientation in the Fourier transform corresponding to the harmonic coordinates. Applications to four symmetries are given: circular, linear, hyperbolic and parabolic symmetries. Experimental results are presented.
@techreport{diva2:288323,
author = {Bigun, Josef},
title = {{Detection of Linear Symmetry in Multiple Dimensions for Description of Local Orientation and Optical Flow}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1988},
type = {Other academic},
number = {LiTH-ISY-I, 893},
address = {Sweden},
}
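For the linear-symmetry case, the spatial-domain convolution scheme described above reduces to averaging the squared complex gradient; a minimal sketch (box window instead of an optimized kernel): consistent orientations reinforce each other in the complex sum, inconsistent ones cancel, so the argument gives the (double-angle) least-squares orientation and the magnitude a certainty measure.

import numpy as np
from scipy.ndimage import uniform_filter

def linear_symmetry(img, size=9):
    gy, gx = np.gradient(img.astype(float))
    z = (gx + 1j * gy) ** 2            # squared complex gradient (double angle)
    # Complex-valued local averaging = the convolution step.
    zf = uniform_filter(z.real, size) + 1j * uniform_filter(z.imag, size)
    orientation = 0.5 * np.angle(zf)   # least-squares orientation estimate
    certainty = np.abs(zf)             # high where iso-gray curves are parallel
    return orientation, certainty

y, x = np.mgrid[0:64, 0:64]
ori, cert = linear_symmetry(np.sin(0.4 * (x + 2 * y)))   # oriented sine pattern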
@techreport{diva2:288607,
author = {Albregtsen, Fritz},
title = {{Enhancing Satellite Images of the Antarctic Snow and Ice Cover by Context Dependent Anisotropic Nonstationary Filtering.}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1987},
type = {Other academic},
number = {LiTH-ISY-I, 0852},
address = {Sweden},
}
@techreport{diva2:288274,
author = {Bigun, Josef},
title = {{Optimal Orientation Detection of Circular Symmetry.}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1987},
type = {Other academic},
number = {LiTH-ISY-I, 0871},
address = {Sweden},
}
The problem of optimal detection of orientation in arbitrary neighborhoods is solved in the least squares sense. It is shown that this corresponds to fitting an axis in the Fourier domain of the n-dimensional neighborhood, which is a well-known matrix eigenvalue problem. The eigenvalues are the variance, or inertia, with respect to the axes given by their respective eigenvectors. The orientation is taken as the axis given by the least eigenvalue. Moreover, it is shown that the necessary computations can be pursued in the spatial domain without doing a Fourier transformation. An implementation for 2-D is presented. Two certainty measures are given corresponding to the orientation estimate. These are the relative and the absolute distances between the two eigenvalues, revealing whether the fitted axis is much better than an axis orthogonal to it. The result of the implementation is verified by experiments which confirm accurate orientation estimation and a reliable certainty measure in the presence of additive noise at high as well as low levels.
@techreport{diva2:691493,
author = {Bigun, Josef},
title = {{Optimal Orientation Detection of Linear Symmetry}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1987},
type = {Other academic},
number = {LiTH-ISY-I, 828},
address = {Sweden},
}
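A common 2-D realization of the scheme in the abstract above, sketched under the assumption of a Gaussian averaging window: build the gradient scatter (inertia) matrix in the spatial domain, solve the 2x2 eigenvalue problem in closed form, and use the relative eigenvalue distance as the certainty measure.

import numpy as np
from scipy.ndimage import gaussian_filter

def orientation_2d(img, sigma=3.0):
    gy, gx = np.gradient(img.astype(float))
    # Local inertia matrix of the gradient; no Fourier transform needed.
    Jxx = gaussian_filter(gx * gx, sigma)
    Jyy = gaussian_filter(gy * gy, sigma)
    Jxy = gaussian_filter(gx * gy, sigma)
    # Closed-form eigenvalues of [[Jxx, Jxy], [Jxy, Jyy]].
    diff = Jxx - Jyy
    lam1 = 0.5 * (Jxx + Jyy + np.hypot(diff, 2 * Jxy))
    lam2 = 0.5 * (Jxx + Jyy - np.hypot(diff, 2 * Jxy))
    # theta is the dominant gradient direction; the fitted axis (the least-
    # eigenvalue direction in the abstract's terms) is perpendicular to it.
    theta = 0.5 * np.arctan2(2 * Jxy, diff)
    certainty = (lam1 - lam2) / (lam1 + lam2 + 1e-12)   # relative distance
    return theta, certainty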
@techreport{diva2:288554,
author = {Granlund, Gösta H.},
title = {{Introduction to GOP Computer Vision.}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1986},
type = {Other academic},
number = {LiTH-ISY-I, 0849},
address = {Sweden},
}
@techreport{diva2:288617,
author = {Bårman, Håkan and Granlund, Gösta H. and Knutsson, Hans and Näppä, L.},
title = {{Context Dependent Hierarchical Image Processing for Remote Sensing Data.}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1986},
type = {Other academic},
number = {LiTH-ISY-I, 0824},
address = {Sweden},
}
A definition of central symmetry for local neighborhoods of 2-D images is given. A complete ON-set of centrally symmetric basis functions is proposed. The local neighborhoods are expanded in this basis. The behavior of the coefficient spectrum obtained by this expansion is proposed as the foundation of central symmetry parameters of the neighborhoods. Specifically, examination of two such behaviors is proposed: point concentration and line concentration of the energy spectrum. Moreover, the study of these types of spectral behavior is shown to be possible in the spatial domain.
@techreport{diva2:691498,
author = {Bigun, Josef and Granlund, Gösta H.},
title = {{Central Symmetry Modelling}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1986},
type = {Other academic},
number = {LiTH-ISY-I, 789},
address = {Sweden},
}
@techreport{diva2:288310,
author = {Näppä, Lars and Granlund, Gösta H.},
title = {{Texture Analysis and Description.}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1985},
type = {Other academic},
number = {LiTH-ISY-I, 0775},
address = {Sweden},
}
@techreport{diva2:403796,
author = {Granlund, Gösta},
title = {{Images and Computers}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1984},
type = {Other academic},
number = {LiTH-ISY-I, 0701},
address = {Sweden},
}
@techreport{diva2:288302,
author = {Wilson, Roland},
title = {{The Uncertainty Principle in Image Coding}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1983},
type = {Other academic},
number = {LiTH-ISY-I, 0579},
address = {Sweden},
}
@techreport{diva2:403798,
author = {Wilson, Roland},
title = {{A Class of Local Centroid Algorithms for Classification and Quantization in Spaces of Arbitrary Dimension}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1983},
type = {Other academic},
number = {LiTH-ISY-I, 0610},
address = {Sweden},
}
@techreport{diva2:403809,
author = {Wilson, Roland and Granlund, Gösta},
title = {{The Uncertainty Principle in Image Processing}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1983},
type = {Other academic},
number = {LiTH-ISY-I, 0576},
address = {Sweden},
}
@techreport{diva2:403800,
author = {Wilson, Roland},
title = {{Quad-Tree Predictive Coding:
A New Class of Image Data Compression Algorithms}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1983},
type = {Other academic},
number = {LiTH-ISY-I, 0609},
address = {Sweden},
}
@techreport{diva2:403805,
author = {Wilson, Roland},
title = {{Uncertainty, Eigenvalue Problems and Filter Design}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1983},
type = {Other academic},
number = {LiTH-ISY-I, 0580},
address = {Sweden},
}
@techreport{diva2:403801,
author = {Wilson, Roland},
title = {{The Uncertainty Principle in Vision}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1983},
type = {Other academic},
number = {LiTH-ISY-I, 0581},
address = {Sweden},
}
@techreport{diva2:288540,
author = {Granlund, Gösta H.},
title = {{Hierarchical Distributed Data Structures and Operations}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1982},
type = {Other academic},
number = {LiTH-ISY-I, 0512},
address = {Sweden},
}
Operators for extraction of local information are essential components in an image processing system. This paper concentrates on the design and evaluation of convolution kernel sets enabling easy estimation of local orientation and frequency.
Consideration of interpolation properties and the limiting effects of the uncertainty principle leads to the definition of an "ideal" quadrature filter function. An optimization procedure is utilized to produce pairs of convolution kernels which implement an approximation of the desired function. A number of optimization results are presented.
To evaluate the performance of the optimized kernels in an image processing task, a series of experiments has been carried out. Examples are given of local orientation and frequency estimates for images with different signal-to-noise ratios. An angle deviation measure is defined and a vector averaging scheme is introduced to increase angle estimation accuracy. Using a 0 dB SNR test image, orientation estimates are produced having an expected deviation of less than 7 degrees.
@techreport{diva2:319074,
author = {Knutsson, Hans},
title = {{Design of Convolution Kernels}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1982},
type = {Other academic},
number = {LiTH-ISY-I, 0557},
address = {Sweden},
}
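One standard construction of the kind of quadrature filter the report optimizes kernels for (parameters illustrative): a lognormal radial band-pass multiplied by a cos^2 angular function supported on a single half-plane of the frequency domain, so that the magnitude of the complex response is phase invariant, responding equally to lines and edges.

import numpy as np

def quadrature_filter(n=64, u0=np.pi / 4, B=2.0, direction=(1.0, 0.0)):
    f = np.fft.fftfreq(n) * 2.0 * np.pi
    fy, fx = np.meshgrid(f, f, indexing='ij')
    rho = np.hypot(fx, fy)
    # Lognormal radial function with centre frequency u0 and bandwidth B.
    R = np.zeros_like(rho)
    nz = rho > 0
    R[nz] = np.exp(-(4.0 / np.log(2.0)) * np.log(rho[nz] / u0) ** 2 / B ** 2)
    # cos^2 angular function, zero on the half-plane opposite `direction`.
    dx, dy = direction
    proj = (fx * dx + fy * dy) / np.maximum(rho, 1e-12)
    D = np.where(proj > 0, proj ** 2, 0.0)
    return R * D

img = np.random.default_rng(2).standard_normal((64, 64))
resp = np.fft.ifft2(np.fft.fft2(img) * quadrature_filter())
energy = np.abs(resp)      # phase-invariant local energy in the chosen band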
@techreport{diva2:288571,
author = {Granlund, Gösta H. and Knutsson, Hans and Hedlund, Martin},
title = {{Hierarchical Processing of Structural Information}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1981},
type = {Other academic},
number = {LiTH-ISY-I, 0481},
address = {Sweden},
}
@techreport{diva2:288309,
author = {Kunt, Murat},
title = {{Picture Coding with the General Operator Processor (GOP)}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1980},
type = {Other academic},
number = {LiTH-ISY-I, 0370},
address = {Sweden},
}
@techreport{diva2:288306,
author = {Knutsson, Hans},
title = {{3-D Reconstruction by Fourier Techniques with Error Estimates}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1978},
type = {Other academic},
number = {LiTH-ISY-I, 0214},
address = {Sweden},
}
@techreport{diva2:288337,
author = {Granlund, Gösta H.},
title = {{Computer Processing and Display of Chromosome Image Information}},
institution = {Linköping University, Department of Electrical Engineering},
year = {1973},
type = {Other academic},
number = {LiTH-ISY-I, 0023},
address = {Sweden},
}
Last updated: 2015-02-25