Intro To EmguCV

Common Features Within EmguCV

Although OpenCV is an extensive library, with numerous functions and methods that allow the user to accomplish virtually anything in computer vision, it does have its limitations. Originally a C/C++ library, OpenCV has changed drastically over time, with officially supported languages growing to include Python and Java.

Over time, due to its increasing popularity, OpenCV has had numerous wrappers created for it, allowing its functionality to be used from a variety of other languages. One of the most common, and the wrapper that we will be using today, is EmguCV.

Written in C#, EmguCV exposes almost all of OpenCV's functions and is very helpful when you want to bring machine learning and computer vision into the .NET framework. In this post, I'll cover how to get started with EmguCV and some of its core functionality.

After installing the EmguCV library or NuGet package in Visual Studio and adding the appropriate references, we will want to create an empty Mat object to hold whatever image data we need. To create an empty Mat object, you can use the following code.

Mat img = new Mat();

Next, we want to load an image into our program so we can manipulate it to our specifications. To load an image, we use the following code, storing the result in the Mat object we created above.

img = CvInvoke.Imread("img.jpg", ImreadModes.AnyColor);

In the line above, we specified the image to be read in as AnyColor, which for a typical color photo gives us a Bgr image. This is ideal because, if we ever need to change the format of the image later, we can do so while still having the original image to refer back to.

Although the above approach is a simple way to load an image, the declaration and the read can also be combined into a single line.

Mat img = CvInvoke.Imread("img.jpg", ImreadModes.AnyColor);

Although the Mat object is the preferred way to store and transfer image data, some older algorithms within EmguCV require the generic Image<TColor, TDepth> type instead of a Mat. To work with those, use the following code.

//Declaring a new Image (the 640x480 size here is just an example)
Image<Bgr, byte> img = new Image<Bgr, byte>(640, 480);

//Converting a Mat object to an Image
//(in practice the Mat would already hold image data, e.g. loaded with CvInvoke.Imread)
Mat mat = new Mat();
Image<Bgr, byte> newImg = mat.ToImage<Bgr, byte>();

Again, we are working with the image in Bgr format, but if necessary, we can declare it in grayscale instead.

//Declaring a new Image (the 640x480 size here is just an example)
Image<Gray, byte> img = new Image<Gray, byte>(640, 480);

//Converting a Mat object to an Image
//(in practice the Mat would already hold image data, e.g. loaded with CvInvoke.Imread)
Mat mat = new Mat();
Image<Gray, byte> newImg = mat.ToImage<Gray, byte>();

If you want to convert an Image to a Mat object, you can use the following code.

Image<Gray, byte> img = new Image<Gray, byte>(640, 480);
Mat img2 = img.Mat;

To make life easier when converting an image to grayscale, we can use the CvInvoke class within EmguCV and convert a Mat object directly, instead of converting it to an Image first and then changing the color space.

//Read in the image and store it in a Mat
Mat img = CvInvoke.Imread("img.jpg", ImreadModes.AnyColor);

//This Mat object is used to store the output
Mat output = new Mat();

//The CvInvoke method that will convert the image to grayscale
CvInvoke.CvtColor(img, output, ColorConversion.Bgr2Gray);

Although EmguCV can be difficult to grasp at first, once you learn the basic functions within the wrapper, the library becomes very easy to use. The classes and functions in EmguCV map directly to their OpenCV counterparts, and they make it much easier to implement computer vision algorithms within the .NET framework.

Creating A Credit Card OCR Application Part 2

Credit Card OCR II

For the second portion of the credit card OCR application, I wanted to focus on sprucing up the GUI and making the whole application more presentable. Most of my past development experience has been in back-end and deep learning work, so I wanted to get better at front-end work and at making applications more appealing to the eye. The contents of this repo can be found here.

Since the original application was written in C#, the logical choice for my front end was XAML. Although it is very different from more popular front-end technologies like JavaScript and CSS, coupling it with C# was my best option.

My early versions of the GUI were very minimal, using only the default controls and styles from the XAML toolbox.

My first step in cleaning up the GUI was splitting some of the functionality and navigation into multiple pages. I separated two main functions into their own pages: a camera page that relays a live camera feed to the GUI and captures an image when needed, and an output page where the detected numbers go.

Below is the full XAML for both the camera page and the output page.

<Page x:Class="Credit_Card_OCR.Pages.CameraPage"
      xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
      xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
      xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
      xmlns:d="http://schemas.microsoft.com/expression/blend/2008" 
      xmlns:local="clr-namespace:Credit_Card_OCR.Pages"
      mc:Ignorable="d" 
      d:DesignHeight="400" d:DesignWidth="700"
      Title="CameraPage">

    <Grid Background="DarkTurquoise">
        <Grid.RowDefinitions>
            <RowDefinition Height ="1*"/>
            <RowDefinition Height ="50"/>
        </Grid.RowDefinitions>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="1*"/>
        </Grid.ColumnDefinitions>

        <Image x:Name="webcamOutput" Grid.Column="0" HorizontalAlignment="Center" VerticalAlignment="Center" Grid.Row="0" Stretch="Fill"/>
        <Button x:Name="btnCapture" Style="{StaticResource MetroButtonGreen}" Content="Start Camera Stream" Grid.Column="0" HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Grid.Row="1" Click="btnCapture_Click"/>

    </Grid>
</Page>

<Page x:Class="Credit_Card_OCR.Pages.ImageListOutputPage"
      xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
      xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
      xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" 
      xmlns:d="http://schemas.microsoft.com/expression/blend/2008" 
      xmlns:local="clr-namespace:Credit_Card_OCR.Pages"
      mc:Ignorable="d" 
      d:DesignHeight="450" d:DesignWidth="800"
      Title="ImageListOutputPage">

    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height ="1*"/>
            <RowDefinition Height="50"/>
        </Grid.RowDefinitions>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="1*"/>
        </Grid.ColumnDefinitions>

        <Image x:Name="imgOutput" HorizontalAlignment="Center" VerticalAlignment="Center" Grid.Column="0" Grid.Row="0" Stretch="Fill"/>
        <ListBox x:Name="lstOutput" VerticalContentAlignment="Stretch" Grid.Column="0" Grid.Row="1" HorizontalContentAlignment="Stretch"/>
        
    </Grid>
</Page>

Most of the code above is simple boilerplate that Microsoft provides when creating a page; the controls I added are the Image, Button, and ListBox elements inside the grids. The only part of the XAML above that is not part of Microsoft's default template is the button style "MetroButtonGreen", which is defined in App.xaml.

<Application x:Class="Credit_Card_OCR.App"
             xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
             xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
             xmlns:local="clr-namespace:Credit_Card_OCR"
             StartupUri="MainWindow.xaml">
    <Application.Resources>
        <Style x:Key="MetroButtonAllert" TargetType="{x:Type Button}" >
            <Setter Property="Background" Value="OrangeRed" />
            <Setter Property="Foreground" Value="White" />
            <Setter Property="HorizontalContentAlignment" Value="Center" />
            <Setter Property="VerticalContentAlignment" Value="Center" />
            <Setter Property="Padding" Value="10 5" />
            <Setter Property="FontFamily" Value="Tahoma" />
            <Setter Property="FontSize" Value="14" />
            <Setter Property="BorderThickness" Value="2" />
            <Setter Property="Template">
                <Setter.Value>
                    <ControlTemplate TargetType="{x:Type Button}">
                        <Grid>
                            <Border
                                x:Name="Border"
                                Background="{TemplateBinding Background}"
                                BorderBrush="{TemplateBinding BorderBrush}"
                                BorderThickness="{TemplateBinding BorderThickness}"/>
                            <ContentPresenter
                                HorizontalAlignment="{TemplateBinding HorizontalContentAlignment}"
                                Margin="{TemplateBinding Padding}"
                                VerticalAlignment="{TemplateBinding VerticalContentAlignment}"
                                RecognizesAccessKey="True"/>
                        </Grid>
                        <ControlTemplate.Triggers>
                            <Trigger Property="IsPressed" Value="True">
                                <Setter Property="OpacityMask" Value="#AA888888"/>
                            </Trigger>
                            <Trigger Property="IsMouseOver" Value="True">
                                <Setter Property="Background" Value="#cf2a0e"/>
                            </Trigger>
                            <Trigger Property="IsEnabled" Value="False">
                                <Setter Property="Foreground" Value="#ADADAD" />
                            </Trigger>
                        </ControlTemplate.Triggers>
                    </ControlTemplate>
                </Setter.Value>
            </Setter>
        </Style>
        <Style x:Key="MetroButtonGreen" BasedOn="{StaticResource MetroButtonAllert}" TargetType="{x:Type Button}">
            <Setter Property="Background" Value="OliveDrab"/>
            <Style.Triggers>
                <Trigger Property="IsMouseOver" Value="True">
                    <Setter Property="Background" Value="#368a55"/>
                </Trigger>
            </Style.Triggers>
        </Style>
    </Application.Resources>
</Application>

The point of the code above is to create a style: a reusable set of property setters and triggers that can be applied to different XAML controls to make them look better, much like CSS does for web pages. Here I have two styles, one for a green button and one for a red alert button, both of which are used in the next piece of XAML.

To control navigation between these pages, we use multiple buttons on a StackPanel, which gives us a static control panel while the rest of the GUI changes as needed.

<Window x:Class="Credit_Card_OCR.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        xmlns:local="clr-namespace:Credit_Card_OCR"
        mc:Ignorable="d"
        Title="Credit Card Detection" Height="450" Width="800" Background="DarkTurquoise">

    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height ="1*"/>
            <RowDefinition Height ="50"/>
        </Grid.RowDefinitions>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="100"/>
            <ColumnDefinition Width="1*"/>
        </Grid.ColumnDefinitions>

        <StackPanel Grid.Row="0" Grid.RowSpan="2" Grid.Column="0" Grid.ColumnSpan="2">
            <Grid>
                <Grid.RowDefinitions>
                    <RowDefinition Height ="50"/>
                    <RowDefinition Height ="50"/>
                    <RowDefinition Height ="50"/>
                    <RowDefinition Height ="50"/>
                    <RowDefinition Height ="50"/>
                    <RowDefinition Height ="110"/>
                    <RowDefinition Height ="50"/>
                </Grid.RowDefinitions>
                <Grid.ColumnDefinitions>
                    <ColumnDefinition Width="100"/>
                    <ColumnDefinition Width="1*"/>
                </Grid.ColumnDefinitions>

                <Button x:Name="btnOCR" Grid.Column="0" Style="{StaticResource MetroButtonGreen}" Grid.Row="1" Click="btnOCR_Click">
                    <StackPanel HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Visibility="Visible" Grid.Column="0">
                        <Image Source="Resources\ocrIcon.png" HorizontalAlignment="Center" Stretch="Uniform" Height="30" Width="50"/>
                        <TextBlock HorizontalAlignment="Center" FontSize="10"><Run Text="Capture Image"/></TextBlock>
                    </StackPanel>
                </Button>

                <Button x:Name="btnStart" Grid.Column="0" Style="{StaticResource MetroButtonGreen}" HorizontalAlignment="Stretch" Grid.Row="2" VerticalAlignment="Stretch" Click="btnStart_Click">
                    <StackPanel HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Visibility="Visible" Grid.Column="0">
                        <Image Source="Resources\startIcon.png" HorizontalAlignment="Center" Stretch="Uniform" Height="30" Width="50"/>
                        <TextBlock HorizontalAlignment="Center" FontSize="10"><Run Text="Start Camera"/></TextBlock>
                    </StackPanel>
                </Button>
                
                <Button x:Name="btnOpen" Grid.Column="0" Style="{StaticResource MetroButtonGreen}" HorizontalAlignment="Stretch" Grid.Row="3" VerticalAlignment="Stretch" Click="btnOpen_Click">
                    <StackPanel HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Visibility="Visible" Grid.Column="0">
                        <Image Source="Resources\folderIcon.png" HorizontalAlignment="Center" Stretch="Uniform" Height="30" Width="50"/>
                        <TextBlock HorizontalAlignment="Center" FontSize="10"><Run Text="Open Image"/></TextBlock>
                    </StackPanel>
                </Button>
                
                <Button x:Name="btnCapture" Grid.Column="0" Style="{StaticResource MetroButtonGreen}" HorizontalAlignment="Stretch" Grid.Row="4" VerticalAlignment="Stretch" Click="btnCapture_Click" IsEnabled="False">
                    <StackPanel HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Visibility="Visible" Grid.Column="0">
                        <Image Source="Resources\cameraIcon.png" HorizontalAlignment="Center" Stretch="Uniform" Height="30" Width="50"/>
                        <TextBlock HorizontalAlignment="Center" FontSize="10"><Run Text="Capture"/></TextBlock>
                    </StackPanel>
                </Button>
                
                <Button x:Name="btnClear" Grid.Column="0" Style="{StaticResource MetroButtonAllert}" HorizontalAlignment="Stretch" Grid.Row="6" VerticalAlignment="Stretch" Click="btnClear_Click">
                    <StackPanel HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Visibility="Visible" Grid.Column="0">
                        <Image Source="Resources\clearIcon.png" HorizontalAlignment="Center" Stretch="Uniform" Height="30" Width="50"/>
                        <TextBlock HorizontalAlignment="Center" FontSize="10"><Run Text="Clear"/></TextBlock>
                    </StackPanel>
                </Button>
            </Grid>
        </StackPanel>

        <Frame x:Name="frame" Grid.Column="1" Grid.Row="0" Grid.RowSpan="2" NavigationUIVisibility="Hidden"/>

    </Grid>
</Window>

The stack panel holds five buttons: four green ones for running OCR, starting the camera, opening an image, and capturing an image, and one red button for clearing all the necessary objects.

The images on the buttons are from Icons8; I chose them because they are simple and convey what each button does.

The OCR button is only used when an image is present; it launches the OCR detection method we created back in part one of this post. The start camera button initializes the camera page and creates a camera stream, which is set up in the page's code-behind.

using Emgu.CV;
using Emgu.CV.Structure;
using System;
using System.Drawing;
using System.Windows;
using System.Windows.Controls;

namespace Credit_Card_OCR.Pages
{
    /// <summary>
    /// Interaction logic for CameraPage.xaml
    /// </summary>
    public partial class CameraPage : Page
    {
        //The video capture stream that holds the video from the camera
        public VideoCapture stream = new VideoCapture(0);

        public CameraPage()
        {
            InitializeComponent();
        }

        public Bitmap GenerateTakenImage()
        {
            //Get the taken picture from the video stream
            Image<Bgr, byte> inputImg = CameraConfig.TakePicture(stream);

            //Convert the image to a bitmap
            Bitmap bit = inputImg.ToBitmap();

            return bit;
        }

        private void Capture_ImageGrabbed1(object sender, EventArgs e)
        {
            //Get the frame
            Bitmap frame = CameraConfig.GetFrame(stream);

            //Check if the frame taken is null
            //if it is, that means that no camera is connected,
            //so output that to the user and end the subscription
            if (frame == null)
            {
                MessageBox.Show("There is no camera connected");
                stream.ImageGrabbed -= Capture_ImageGrabbed1;
            }
            else
            {
                try
                {
                    //Pass each frame from the video capture to the output image
                    this.Dispatcher.Invoke(() =>
                    {
                        webcamOutput.Source = ImageUtils.ImageSourceFromBitmap(frame);
                    });
                }
                catch (System.Threading.Tasks.TaskCanceledException)
                {
                    //End the thread if its exited early
                    this.Dispatcher.InvokeShutdown();
                }
            }
        }

        private void btnCapture_Click(object sender, RoutedEventArgs e)
        {
            //Start grabbing frames from the webcam input
            stream.ImageGrabbed += Capture_ImageGrabbed1;
            stream.Start();
        }
    }
}

When the capture button on the camera page is clicked, it starts a subscription that continuously streams frames to the camera page. When the capture button on the MainWindow is clicked, it grabs the latest frame being streamed and outputs it as a single still image, ready for the OCR method. I put the camera streaming in its own method because, when it was coupled to the main thread, lag and dropped frames caused issues. The entire code for the main window can be found below.

using Credit_Card_OCR.Pages;
using Emgu.CV;
using Emgu.CV.CvEnum;
using System.Collections.Generic;
using System.Drawing;
using System.Windows;
using System.Windows.Forms;

namespace Credit_Card_OCR
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        ImageListOutputPage outputPage = new ImageListOutputPage();

        public MainWindow()
        {
            InitializeComponent();

            frame.Content = outputPage;
        }

        //Mat object to hold image that is taken
        public static Mat img = new Mat();

        //Bool variables to help with the flow of the algorithm
        bool isStream = false;
        bool isImageLoaded = false;

        private void btnStart_Click(object sender, RoutedEventArgs e)
        {
            //Set bool variable to use camera image
            isStream = true;

            //Create a new object for the page
            CameraPage camera = new CameraPage();

            btnCapture.IsEnabled = true;
            btnOCR.IsEnabled = false;

            //Navigate to the new page
            frame.NavigationService.Navigate(camera);
        }

        private void btnOCR_Click(object sender, RoutedEventArgs e)
        {
            if (isImageLoaded == true)
            {
                //If a video stream is currently happening, do the following, else do the last
                if (isStream == true)
                {
                    //Align the video capture image
                    img = AutoAlignment.Align(img);

                    Bitmap bit = img.ToBitmap();

                    //Pass the bitmap image to be displayed
                    outputPage.imgOutput.Source = ImageUtils.ImageSourceFromBitmap(bit);
                }
                else
                {
                    //Read in the desired image
                    img = OCR.ReadInImage();

                    //Convert the image to a bitmap, then to an image source
                    Bitmap bit = img.ToBitmap();
                    outputPage.imgOutput.Source = ImageUtils.ImageSourceFromBitmap(bit);
                }

                //Detect the text from the image
                string detectedText = OCR.RecognizeText(img);

                //Create a new list for the output
                List<string> output = new List<string>();

                //Add the detected string to the list
                output.Add(detectedText);

                //Display the list
                outputPage.lstOutput.ItemsSource = output;
            }
            else
            {
                System.Windows.MessageBox.Show("Please take or import an image");
            }
        }

        private void btnOpen_Click(object sender, RoutedEventArgs e)
        {
            //Create a string to hold the file location
            string fileLocation = string.Empty;

            //Create an openfiledialog object to select a photo
            OpenFileDialog dialog = new OpenFileDialog
            {
                InitialDirectory = @"C:\",
                Title = "Browse Images",

                CheckFileExists = true,
                CheckPathExists = true,

                FilterIndex = 2,
                RestoreDirectory = true,

                ReadOnlyChecked = true,
                ShowReadOnly = true
            };

            //If a photo is selected, do the following
            if (dialog.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                //Get the file location of the photo selected
                fileLocation = dialog.FileName;

                //If the length is greater than one, then use it to read in the image
                if (fileLocation.Length > 1)
                {
                    //Read in the image
                    img = CvInvoke.Imread(fileLocation, ImreadModes.AnyColor);
                }
            }

            //If the file location is greater than one, pass the loaded in image to be displayed
            if (fileLocation.Length > 1)
            {
                //Convert the image to a bitmap, then to an image source
                Bitmap bit = img.ToBitmap();
                outputPage.imgOutput.Source = ImageUtils.ImageSourceFromBitmap(bit);

                //Declare the bool variable that an image has been loaded
                isImageLoaded = true;
            }
        }

        private void btnCapture_Click(object sender, RoutedEventArgs e)
        {
            CameraPage cameraPage = new CameraPage();

            Bitmap taken = cameraPage.GenerateTakenImage();

            if (taken != null)
            {
                outputPage.imgOutput.Source = ImageUtils.ImageSourceFromBitmap(taken);
            }

            //Declare that image is used
            isImageLoaded = true;

            //Enable the detect button
            btnOCR.IsEnabled = true;

            //Clear the frame
            frame.Content = outputPage;
        }

        private void btnClear_Click(object sender, RoutedEventArgs e)
        {
            //Clear the image
            outputPage.imgOutput.Source = null;

            //Clear the frame
            frame.Content = outputPage;

            //Empty the list box
            if (outputPage.lstOutput.Items.Count > 0)
            {
                outputPage.lstOutput.ItemsSource = null;
            }

            //Set both bool variables to false
            isImageLoaded = false;
            isStream = false;

            //Disable capture button
            btnCapture.IsEnabled = false;
            btnOCR.IsEnabled = true;
        }
    }
}

Although this is pretty boilerplate code, and I must admit the front-end code still looks a little sloppy, it's a step in a better direction from where the GUI was earlier. I am still working to add more features to the GUI, and I hope to change the color scheme and other parts of it.

Creating a Friends Script Generator with Deep Learning

Friends Sequential Network Generator

Although I have worked with convolutional neural networks in the past, I wanted to expand my experience in deep learning. My first project idea was to create a script generator using a simple sequential neural network. While there is a lot of calculus and linear algebra involved in deep learning, I won't include much of it in this article; I'll likely create another post about the math later on. To help me along the way, I used this TensorFlow tutorial article to guide the basic model structure. The repo for this post can be found here.

I was eager to start creating my network, but before I could, I had to collect my data. In deep learning, while the model is important, the quality and amount of data available can greatly help or hurt your results. I found a website with the full text of all the scripts, but they were nested: each script lives on its own page, with a hyperlink pointing to it from the main page. I could simply copy each script from each page by hand, but that would take forever. Since we are programmers, I wanted to write a script that would automate it for me.

The obvious language of choice was Python, where I could use the BeautifulSoup library, which is great for web scraping.

First we import all the necessary packages and get the current project directory.

# Import Packages
import httplib2
from bs4 import BeautifulSoup, SoupStrainer
import urllib.request
import os

# root directory of the project
root_path = os.path.dirname(os.path.realpath(__file__))

Next, we declare a new HTTP object and make a request to the webpage, storing both the status and the response.

# Declare an http object
http = httplib2.Http()

# Get the status and response from the webpage
status, response = http.request('https://fangj.github.io/friends/')

To store our output, we create a list that will hold all the hyperlinks on the main page pointing to each individual script, and we open the output file we will write everything to.

# Create a list
links = []

# Open the output file (dataset/dataset.txt) in append mode
file1 = open(root_path + "/dataset/dataset.txt","a", encoding="utf-8") 

Now that we've done all the boilerplate, we can actually start parsing the HTML and grabbing the text we need. First we declare the type of parser we want to use, which will be the HTML parser. Next, we open the base website with all the hyperlinks to the Friends scripts. Finally, we pass all relevant objects to the BeautifulSoup constructor to begin parsing.

# Open the webpage, declaring the right decoder and parser
parser = 'html.parser'  # or 'lxml' (preferred) or 'html5lib', if installed
resp = urllib.request.urlopen("https://fangj.github.io/friends/")
soup = BeautifulSoup(resp, parser, from_encoding=resp.info().get_param('charset'))

We then use a loop to grab all hyperlinks on the main page and append them to the links list we declared earlier.

# Add hyperlinks to a list
for link in soup.find_all('a', href=True):
    links.append("https://fangj.github.io/friends/" + link['href'])

Finally, we open each linked page, parse its HTML, and extract the text.

# Open each webpage, read all text, then write the text to a file
for i in range(0, len(links)):
    resp = urllib.request.urlopen(links[i])
    print("Reading " + str(resp))
    soup = BeautifulSoup(resp, features="lxml")
    txt = soup.get_text()
    file1.write(txt)

# Close the file
file1.close()

Below is the full code for building the Friends script dataset.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
    This python script is used as a web scraper to gather all Friends scripts through various links
    found at this website https://fangj.github.io/friends/
"""

# Import Packages
import httplib2
from bs4 import BeautifulSoup, SoupStrainer
import urllib.request
import os

# root directory of the project
root_path = os.path.dirname(os.path.realpath(__file__))

# Declare an http object
http = httplib2.Http()

# Get the status and response from the webpage
status, response = http.request('https://fangj.github.io/friends/')

# Create a list
links = []

# Open the output file (dataset/dataset.txt) in append mode
file1 = open(root_path + "/dataset/dataset.txt","a", encoding="utf-8") 

# Open the webpage, declaring the right decoder and parser
parser = 'html.parser'  # or 'lxml' (preferred) or 'html5lib', if installed
resp = urllib.request.urlopen("https://fangj.github.io/friends/")
soup = BeautifulSoup(resp, parser, from_encoding=resp.info().get_param('charset'))

# Add hyperlinks to a list
for link in soup.find_all('a', href=True):
    links.append("https://fangj.github.io/friends/" + link['href'])

# Open each webpage, read all text, then write the text to a file
for i in range(0, len(links)):
    resp = urllib.request.urlopen(links[i])
    print("Reading " + str(resp))
    soup = BeautifulSoup(resp, features="lxml")
    txt = soup.get_text()
    file1.write(txt)

# Close the file
file1.close()

Now we have one very long text file with all the Friends scripts in it. With that done, we can begin creating our model to generate a random script.

To start off, we import all necessary libraries and get the path of our current directory.

# Import necessary libraries
import tensorflow as tf
import numpy as np
import os
import time

# These are for generating a random logs folder
import string
import random

# Root directory of the project
root_path = os.path.dirname(os.path.realpath(__file__))

Although it's not strictly necessary, depending on your GPU and the amount of VRAM it has, it's best to enable memory growth, which lets TensorFlow increase its GPU memory consumption only when needed, rather than allocating a huge chunk of memory up front.

#Allow GPU Growth
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
  except RuntimeError as e:
    print(e)

Next we will declare some parameters. We declare them all in one place so that, if we want to adjust something for our model, we won't have to go hunting through the code to find it.

############################################################
#  Settings
############################################################
# The maximum length sentence we want for a single input in characters
seq_length = 100

# Number of RNN units
rnn_units = 1024

# The embedding dimension
embedding_dim = 256

# Batch size
BATCH_SIZE = 65

# Number of epochs to run through
EPOCHS=10

BUFFER_SIZE = 100

# Steps per epoch
STEPS = 50

# Low temperatures result in more predictable text.
# Higher temperatures result in more surprising text.
# Experiment to find the best setting.
temperature = 1.0

# Number of characters to generate
num_generate = 200

# Initialization of loss value as a global variable
# to be used in multiple functions
loss = 0

To read in all of our data, we will create a function that handles everything for us. First we get the path to where we saved our big text file.

  # Path to dataset
  dataset_path = os.path.join(root_path, r"dataset/dataset.txt")

Next, we decode all of the text and store it in one long string.

  # Read, then decode for py2 compat.
  text = open(dataset_path, 'rb').read().decode(encoding='unicode_escape')

Next, we collect the unique characters in the text and store them in a sorted list; this becomes our vocabulary.

  # The unique characters in the file
  vocab = sorted(set(text))

To vectorize the text, we map each character to an index in a dictionary and keep the reverse mapping as well. We then slice the data into fixed-length sequences for our dataset. Finally, we split each sequence into an input and a target, which lets us return the fully prepared dataset along with the vocabulary and the mappings.

  # Creating a mapping from unique characters to indices
  char2idx = {u:i for i, u in enumerate(vocab)}
  idx2char = np.array(vocab)

  text_as_int = np.array([char2idx[c] for c in text])

  # Create training examples / targets
  char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

  # Create the sequence from the dataset
  sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

  def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text
   
  # Split the dataset up for better readability and passing into epochs and batch sizes
  dataset = sequences.map(split_input_target)

Below is the entire create_dataset function.

############################################################
#  Create Dataset
############################################################

def create_dataset():
  # Path to dataset
  dataset_path = os.path.join(root_path, r"dataset/dataset.txt")

  # Read, then decode for py2 compat.
  text = open(dataset_path, 'rb').read().decode(encoding='unicode_escape')

  # The unique characters in the file
  vocab = sorted(set(text))

  # Creating a mapping from unique characters to indices
  char2idx = {u:i for i, u in enumerate(vocab)}
  idx2char = np.array(vocab)

  text_as_int = np.array([char2idx[c] for c in text])

  # Create training examples / targets
  char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

  # Create the sequence from the dataset
  sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

  def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text
   
  # Split the dataset up for better readability and passing into epochs and batch sizes
  dataset = sequences.map(split_input_target)

  return dataset, vocab, idx2char, char2idx

Although there are a ton of models available with a ton of uses, for simplicity and to create a baseline to build from, we use a sequential model.

First we create an embedding layer, which turns the individual characters from our dataset into dense vectors that can be passed through the rest of the model. Next, we use a GRU layer, which is the critical and most important layer of our model. Since we are creating a model that generates text, we need to keep track of the history of input characters, so the model can learn which combinations of characters form valid words that match the dataset. Similar to an LSTM, which is also used heavily in text generation models, a GRU has gates that learn which information to keep and which to discard based on the history fed through the model. Finally, we add a dense layer, the most commonly used layer in deep learning; it is a fully connected layer whose weighted connections map the GRU output to one score per character in the vocabulary.

############################################################
#  Model
############################################################

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

Now we get to the meat of the script, where we actually train our model. First we shuffle and batch our dataset and get the length of the vocabulary we are passing into the model.

  dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

  # Length of the vocabulary in chars
  vocab_size = len(vocab)

Next, we build our model by passing our parameters declared earlier.

  model = build_model(
    vocab_size = vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)

Next, as a quick sanity check, we run a single batch through the untrained model, print the shape of its output and the model summary, then sample character indices from the resulting categorical distribution and squeeze out the extra dimension.

  for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

  # Print out the different layers of the model
  model.summary()

  sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
  sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

  sampled_indices

One of the main quantities we need to keep track of is the model's loss, which tells us how well the model is learning. For our loss we use sparse categorical crossentropy.

  def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

The optimizer we use in this example is called Adaptive Moment Estimation, or Adam. Adam is a variant of stochastic gradient descent that adapts the learning rate for each parameter, helping the training find a minimum of the loss more quickly.

  model.compile(optimizer='adam', loss=loss)
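
The string 'adam' tells Keras to use Adam with its default settings. If you want more control over the optimizer, for example its learning rate, you can pass an optimizer instance instead of the string. Below is a minimal sketch of that alternative, reusing the model and loss defined above; the learning rate shown is simply the Keras default, included for illustration.

  # Alternative compile call with an explicit Adam instance;
  # the learning rate here matches the Keras default and is only illustrative
  optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
  model.compile(optimizer=optimizer, loss=loss)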

To save our checkpoint and weight files, we need to choose a path and set up a checkpoint callback. First we generate a random string so that, no matter how many times we train the model, the chance of two runs writing to the same log folder is very small. Next we set the prefix for the checkpoint files, then create the TensorFlow checkpoint callback.

  # generate a random string 15 characters long
  random_string = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(15))

  # Directory where the checkpoints will be saved
  checkpoint_dir = './logs/' + random_string + "/"

  # Name of the checkpoint files
  checkpoint_prefix = os.path.join(checkpoint_dir, "Model_{epoch}")

  checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
      filepath=checkpoint_prefix,
      save_weights_only=True,
      save_freq='epoch')

We then save the initial weights and fit our model. Once we call fit, the model begins training and writing weight files into the checkpoint directory named with the random string we generated earlier. After training, we also save the full model to an .h5 file in the same folder.

  model.save_weights(checkpoint_prefix.format(epoch=0))

  history = model.fit(dataset, steps_per_epoch=STEPS, epochs=EPOCHS, callbacks=[checkpoint_callback])

  # Create a path for the saving location of the model
  model_dir = checkpoint_dir + "model.h5"

  # Save the model
  # TODO: Known issue with saving the model and loading it back
  # later in the script causes issues. Working to fix this issue
  model.save(model_dir)

We then locate the latest checkpoint, rebuild the model with a batch size of one, load the saved weights into it, and print its summary to the console.

  # Locate the latest checkpoint in the checkpoint directory
  tf.train.latest_checkpoint(checkpoint_dir)

  # Build the model with the dataset generated earlier
  model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

  # Load the weights
  model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

  # Build the model
  model.build(tf.TensorShape([1, None]))

  # Print out the model summary
  model.summary()

Below is all the code for the training of our model.

############################################################
#  Train Model
############################################################

def train_model(dataset, vocab):
  # Buffer size to shuffle the dataset
  # (TF data is designed to work with possibly infinite sequences,
  # so it doesn't attempt to shuffle the entire sequence in memory. Instead,
  # it maintains a buffer in which it shuffles elements).
  dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

  # Length of the vocabulary in chars
  vocab_size = len(vocab)

  model = build_model(
    vocab_size = vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)

  for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

  # Print out the different layers of the model
  model.summary()

  sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
  sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

  sampled_indices

  def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

  example_batch_loss  = loss(target_example_batch, example_batch_predictions)
  print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")

  model.compile(optimizer='adam', loss=loss)

  # generate a random string 15 characters long
  random_string = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(15))

  # Directory where the checkpoints will be saved
  checkpoint_dir = './logs/' + random_string + "/"

  # Name of the checkpoint files
  checkpoint_prefix = os.path.join(checkpoint_dir, "Model_{epoch}")

  checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
      filepath=checkpoint_prefix,
      save_weights_only=True,
      save_freq='epoch')

  model.save_weights(checkpoint_prefix.format(epoch=0))

  history = model.fit(dataset, steps_per_epoch=STEPS, epochs=EPOCHS, callbacks=[checkpoint_callback])

  # Create a path for the saving location of the model
  model_dir = checkpoint_dir + "model.h5"

  # Save the model
  # TODO: Known issue with saving the model and loading it back
  # later in the script causes issues. Working to fix this issue
  model.save(model_dir)

  # Locate the latest checkpoint in the checkpoint directory
  tf.train.latest_checkpoint(checkpoint_dir)

  # Build the model with the dataset generated earlier
  model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

  # Load the weights
  model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

  # Build the model
  model.build(tf.TensorShape([1, None]))

  # Print out the model summary
  model.summary()

  # Return the model
  return model

The next and final function we will create generates a script from our trained model. Starting from a seed string, we repeatedly ask the model to predict which character should follow the text generated so far, feeding each predicted character back in as the next input.

############################################################
#  Generate Text
############################################################

def generate_text(model, idx2char, char2idx, start_string):
  # Evaluation step (generating text using the learned model)

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty array to store our results
  text_generated = []

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

Now that all of our functions have been created, we can add some command-line arguments to choose which of them to call at a given time.

We have two commands, train and generate. For train, we create the dataset, train the model, and generate text from it. For generate, we again create the dataset, then load in the path to the weights file that was saved earlier, build the model, load the weights, and generate the text.

############################################################
#  Configure
############################################################

if __name__ == '__main__':
    import argparse

    # Parse command line arguments
    parser = argparse.ArgumentParser(
        description='Train or generate.')
    parser.add_argument("command",
                        metavar="<command>",
                        help="'train' or 'generate'")
    parser.add_argument('--weights', required=False,
                        metavar="/path/to/weights",
                        help="Path to weights file")
    parser.add_argument('--start', required=False,
                        metavar="start of string",
                        help="The word that will begin the output string")

    args = parser.parse_args()

    # Configurations
    if args.command == "train":
      # Load in the dataset and other function to use
      dataset, vocab, idx2char, char2idx = create_dataset()
      # Train the model
      model = train_model(dataset, vocab)
      print(generate_text(model, idx2char, char2idx,start_string=u"The "))
    
    if args.command == 'generate':
      # Load in the dataset and other function to use
      dataset, vocab, idx2char, char2idx = create_dataset()
      # Get the model path
      model_path = os.path.join(root_path, args.weights)
      # Length of the vocabulary in chars
      vocab_size = len(vocab)
      # Build the model with the dataset generated earlier
      model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
      # Load the weights
      model.load_weights(model_path)
      # Build the model
      model.build(tf.TensorShape([1, None]))
      # Generate text
      print(generate_text(model, idx2char, char2idx,start_string=u"The "))
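
With these arguments in place, the script can be invoked from the command line in two ways; model.py is only a placeholder name for this file, and the checkpoint path will depend on the random run folder and epoch number from your own training run.

python model.py train
python model.py generate --weights logs/<run folder>/Model_10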

Below is the entire code for the model python script.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
   The purpose of this python script is to both experiment with new deep learning techniques,
   and to further my knowledge on the subject of deep learning.

   I used this as a reference guide:
   https://www.tensorflow.org/tutorials/text/text_generation
"""

# Import necessary libraries
import tensorflow as tf
import numpy as np
import os
import time

# These are for generating a random logs folder
import string
import random

# Root directory of the project
root_path = os.path.dirname(os.path.realpath(__file__))

#Allow GPU Growth
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
  except RuntimeError as e:
    print(e)

############################################################
#  Settings
############################################################
# The maximum length sentence we want for a single input in characters
seq_length = 100

# Number of RNN units
rnn_units = 1024

# The embedding dimension
embedding_dim = 256

# Batch size
BATCH_SIZE = 65

# Number of epochs to run through
EPOCHS=10

BUFFER_SIZE = 100

# Steps per epoch
STEPS = 50

# Low temperatures result in more predictable text.
# Higher temperatures result in more surprising text.
# Experiment to find the best setting.
temperature = 1.0

# Number of characters to generate
num_generate = 200

# Initialization of loss value as a global variable
# to be used in multiple functions
loss = 0

############################################################
#  Create Dataset
############################################################

def create_dataset():
  # Path to dataset
  dataset_path = os.path.join(root_path, r"dataset/dataset.txt")

  # Read, then decode for py2 compat.
  text = open(dataset_path, 'rb').read().decode(encoding='unicode_escape')

  # The unique characters in the file
  vocab = sorted(set(text))

  # Creating a mapping from unique characters to indices
  char2idx = {u:i for i, u in enumerate(vocab)}
  idx2char = np.array(vocab)

  text_as_int = np.array([char2idx[c] for c in text])

  # Create training examples / targets
  char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

  # Create the sequence from the dataset
  sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

  def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text
   
  # Split the dataset up for better readability and passing into epochs and batch sizes
  dataset = sequences.map(split_input_target)

  return dataset, vocab, idx2char, char2idx

############################################################
#  Model
############################################################

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

############################################################
#  Train Model
############################################################

def train_model(dataset, vocab):
  # Buffer size to shuffle the dataset
  # (TF data is designed to work with possibly infinite sequences,
  # so it doesn't attempt to shuffle the entire sequence in memory. Instead,
  # it maintains a buffer in which it shuffles elements).
  dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

  # Length of the vocabulary in chars
  vocab_size = len(vocab)

  model = build_model(
    vocab_size = vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)

  for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

  # Print out the different layers of the model
  model.summary()

  sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
  sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

  sampled_indices

  def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

  example_batch_loss  = loss(target_example_batch, example_batch_predictions)
  print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")

  model.compile(optimizer='adam', loss=loss)

  # generate a random string 15 characters long
  random_string = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(15))

  # Directory where the checkpoints will be saved
  checkpoint_dir = './logs/' + random_string + "/"

  # Name of the checkpoint files
  checkpoint_prefix = os.path.join(checkpoint_dir, "Model_{epoch}")

  checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
      filepath=checkpoint_prefix,
      save_weights_only=True,
      save_freq='epoch')

  model.save_weights(checkpoint_prefix.format(epoch=0))

  history = model.fit(dataset, steps_per_epoch=STEPS, epochs=EPOCHS, callbacks=[checkpoint_callback])

  # Create a path for the saving location of the model
  model_dir = checkpoint_dir + "model.h5"

  # Save the model
  # TODO: Known issue with saving the model and loading it back
  # later in the script causes issues. Working to fix this issue
  model.save(model_dir)

  # Locate the latest checkpoint in the checkpoint directory
  tf.train.latest_checkpoint(checkpoint_dir)

  # Build the model with the dataset generated earlier
  model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

  # Load the weights
  model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

  # Build the model
  model.build(tf.TensorShape([1, None]))

  # Print out the model summary
  model.summary()

  # Return the model
  return model

############################################################
#  Generate Text
############################################################

def generate_text(model, idx2char, char2idx, start_string):
  # Evaluation step (generating text using the learned model)

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty array to store our results
  text_generated = []

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

############################################################
#  Configure
############################################################

if __name__ == '__main__':
    import argparse

    # Parse command line arguments
    parser = argparse.ArgumentParser(
        description='Train or generate.')
    parser.add_argument("command",
                        metavar="<command>",
                        help="'train' or 'generate'")
    parser.add_argument('--weights', required=False,
                        metavar="/path/to/weights",
                        help="Path to weights file")
    parser.add_argument('--start', required=False,
                        metavar="start of string",
                        help="The word that will begin the output string")

    args = parser.parse_args()

    # Configurations
    if args.command == "train":
      # Load in the dataset and other function to use
      dataset, vocab, idx2char, char2idx = create_dataset()
      # Train the model
      model = train_model(dataset, vocab)
      print(generate_text(model, idx2char, char2idx,start_string=u"The "))
    
    if args.command == 'generate':
      # Load in the dataset and other function to use
      dataset, vocab, idx2char, char2idx = create_dataset()
      # Get the model path
      model_path = os.path.join(root_path, args.weights)
      # Length of the vocabulary in chars
      vocab_size = len(vocab)
      # Build the model with the dataset generated earlier
      model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
      # Load the weights
      model.load_weights(model_path)
      # Build the model
      model.build(tf.TensorShape([1, None]))
      # Generate text
      print(generate_text(model, idx2char, char2idx,start_string=u"The "))

Although deep learning can be intimidating, breaking it down into smaller pieces can help you understand it better. And although I didn't cover the math in this post, it is worth reading up on, as it will give you a deeper understanding of how deep learning works.

Create a Weather Texting Python Script

Weather SMS

This blog post will be a little different, as the whole project consists of a 49-line Python script. I wanted to create a simple script that would text me the weather in my area each morning when I woke up. Although I normally create projects in C#, I chose to do this one in Python, as it is both simpler and more flexible in how data is passed between multiple APIs. The full code for this post can be found at this GitHub repo.

The first step in sending an SMS was to sign up for Twilio. It is important to note, however, that while they do give you some trial credit to send messages, you will eventually have to pay for the service if you send enough messages. The good news is that, at the time of writing, they give you around $15.00 of credit, and an SMS only costs about a cent.

Once you sign up for their service, you will have two items you need for your Python script: your account SID and auth token. Once you have those, you can input them into the code like so.

import pyowm, json
from twilio.rest import Client

#Twilio settings
account = 'ACCOUNT ID'
token = 'TOKEN ID'

Next, we need to get the weather data for our area. To do this, we will use pyowm, which is a wrapper for OpenWeatherMap. Once you have an API key from OpenWeatherMap, you can input it in the code below.

#API key for pyowm
owm = pyowm.OWM('API KEY')

Now that all of our configuration has been set up, we can start pulling and formatting data to send in an SMS. The first step is getting the weather data for the location we are interested in. In Python this is very easy and can be done in one line.

#Get location data for the relevant area
observation = owm.weather_at_place('CITY,COUNTRY')

It is important to note, that for the COUNTRY tag, you have to use your country abbreviations. For example, if you live in the United States, use US.

Next, we get the weather data from the location we specified earlier using the command below.

#Store relevant data from the declared place
w = observation.get_weather()

Now that we have our data, we must sort and format it. For our project, we only want four items: the current temp, the min and max temp for the day, and the wind speed. To do this, we just call two functions.

#Weather details
wind = w.get_wind()
temp = w.get_temperature('fahrenheit')

If we were to print wind and temp right now, the data would come out as raw JSON-style dictionaries. To pick out just the fields we want, we use the json package within Python. First we dump our data with the json dumps function, then load that data back in so we can sort through it.

#Dump and load json string
windDump = json.dumps(wind)
windLoad = json.loads(windDump)

Now we just specify which field of the JSON data we want to keep and store.

#Store speed number
speed = windLoad["speed"]

Next, we will do the same thing for the temperature, splitting the output into three numbers, one for the current temp, one for the max temp, and one for the min temp.

#Dump and load json string
tempDump = json.dumps(temp)
tempLoad = json.loads(tempDump)

#Store final, max, and min temp
tempFinal = tempLoad["temp"]
tempMax = tempLoad["temp_max"]
tempMin = tempLoad["temp_min"]

Now we are ready to format our final message to be sent over SMS. To do this, we concatenate all of the relevant data into our final message string, using newline characters for better formatting.

#Create the final message to display
finalMessage = ("Weather for CITY" 
    + "\n\nCurrent Temp: " + str(tempFinal) 
    + "\n\nMax Temp: " + str(tempMax) 
    + "\n\nMin Temp: " + str(tempMin) 
    + "\n\nWind Speed: " + str(speed))

Finally, we just have to declare our Twilio client and send our message. This last bit is very easy with the Twilio API, and can be done with the following lines.

#Use twilio client
client = Client(account, token)

#Send Message
client.messages.create(from_='TWILIO NUMBER', to='YOUR NUMBER', 
    body=finalMessage)

We then get an end result like this.

The whole code can be found below.

import pyowm, json
from twilio.rest import Client

#Twilio settings
account = 'ACCOUNT ID'
token = 'TOKEN ID'

#API key for pyowm
owm = pyowm.OWM('API KEY')

#Get location data for the relevant area
observation = owm.weather_at_place('CITY,COUNTRY')

#Store relevant data from the declared place
w = observation.get_weather()

#Weather details
wind = w.get_wind()
temp = w.get_temperature('fahrenheit')

#Dump and load json string
windDump = json.dumps(wind)
windLoad = json.loads(windDump)

#Store speed number
speed = windLoad["speed"]

#Dump and load json string
tempDump = json.dumps(temp)
tempLoad = json.loads(tempDump)

#Store final, max, and min temp
tempFinal = tempLoad["temp"]
tempMax = tempLoad["temp_max"]
tempMin = tempLoad["temp_min"]

#Create the final message to display
finalMessage = ("Weather for CITY" 
    + "\n\nCurrent Temp: " + str(tempFinal) 
    + "\n\nMax Temp: " + str(tempMax) 
    + "\n\nMin Temp: " + str(tempMin) 
    + "\n\nWind Speed: " + str(speed))

#Use twilio client
client = Client(account, token)

#Send Message
client.messages.create(from_='TWILIO NUMBER', to='YOUR NUMBER', 
    body=finalMessage)

In order to make this task run daily, I followed this guide, where you simply create a .bat file that runs the Python script and then create a task in Windows Task Scheduler to launch it each morning.

As you can see, this whole process is fairly easy and can be done in less than 50 lines of Python. Although the process isn’t entirely free, Twilio gives you enough credit that if you only send one SMS per day, it can last you around 150,000 days.


Creating A Credit Card OCR Application Part 1

Credit Card OCR


Throughout the computer vision and technology communities, credit card optical character recognition and detection has become a popular starting project for those who want to get their feet wet in computer vision and OCR. Although we won’t be creating corporate-grade OCR software today, we will be creating an introductory program that, as time goes on, we can hopefully modify and expand to not only include more features, but also teach us more computer vision and OCR techniques.

To begin this algorithm, we are going to use Google’s Tesseract for our OCR detection. Originally developed by HP Labs decades ago and now supporting over a hundred different languages, Tesseract has become the staple open source engine for any project using OCR. To tweak and enhance our detection results, we will be using EmguCV, which conveniently has Tesseract built in. If you want to look at the full code for this project, you can view the project’s GitHub repo.

The image that we are using to test our detection can be found below. Although it is pretty simple, it is perfect for our beginning program, and we will improve on our detection methods with harder images later on.

Before we can actually detect text on an image, we first must set up our program with the appropriate features to support OCR detection. To enable Tesseract detection in our program, we can use the following code below.

//Declare a new Tesseract OCR engine
private static Tesseract _ocr;

Here we declare a Tesseract object as a class level variable. We do this so we can modify and use the Tesseract object throughout multiple methods in our algorithm. Afterwards, using the code below, we configure certain settings for our Tesseract object to help generate the results that we want.

public static void SetTesseractObjects(string dataPath)
{
   //create OCR engine
   _ocr = new Tesseract(dataPath, "eng", OcrEngineMode.TesseractLstmCombined);
   _ocr.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ-1234567890/");
}

Using the dataPath variable, we specify where our training data file for the English language is located, which you can download from this GitHub repo, and we specify which engine mode to use. Next, we limit which characters we want to detect, using the SetVariable call and the whitelist variable. It’s important to note, however, that the last time I checked, the whitelist feature for Tesseract didn’t work, but I included it in the project so that when the issue does get patched, we will be able to use it.

The next method that we create is used for loading in the image. Doing this is pretty easy, and we can use the following code below to load in our image in the Bgr color space.

public static Mat ReadInImage()
{
   //Change this file path to the path where the image you want to run OCR on is located
   string filePath = Directory.GetParent(Directory.GetParent
      (Environment.CurrentDirectory).ToString()) + @"/Images/creditCard.png";

   //Read in the image from the filepath
   Mat img = CvInvoke.Imread(filePath, ImreadModes.AnyColor);

   //Return the image
   return img;
}

Once we have set up our detection and our image, we are ready to begin our credit card OCR detection. If we were to detect text on the raw input image, we would get multiple irrelevant characters and random symbols that we don’t want. On top of that, for this project we only want to detect the credit card numbers, and in order to do that, we must pass our image through some filters to distinguish the credit card numbers from everything else. Using the code below, we can help Tesseract achieve better detection results.

        private static List<Mat> ImageProccessing(Mat img)
        {
            //Resize the image for better uniformity throughout the code
            CvInvoke.Resize(img, img, new Size(700, 500));

            Mat imgClone = img.Clone();

            //Convert the image to grayscale
            CvInvoke.CvtColor(img, img, ColorConversion.Bgr2Gray);

            //Blur the image
            CvInvoke.GaussianBlur(img, img, new Size(5, 5), 8, 8);

            //Threshold the image
            CvInvoke.AdaptiveThreshold(img, img, 30, AdaptiveThresholdType.GaussianC, ThresholdType.Binary, 5, 6);

            //Canny the image
            CvInvoke.Canny(img, img, 8, 8);

            //Dilate the canny image
            CvInvoke.Dilate(img, img, null, new Point(-1, -1), 8, BorderType.Constant, new MCvScalar(0, 255, 255));

            //Filter the contours to only find relevant ones
            List<Mat> foundOutput = FindandFilterContours(imgClone, img);

            return foundOutput;
        }

The first step in processing is to resize our image to a standard set of dimensions. The advantage of this is that if our credit card images are all the same size, we can better control how we sort out relevant pieces of the image for use later on.

After the dimensions have been altered, we convert the image to grayscale. The purpose of this is that reducing the color channels from three to one gives us better control over the image later on. Our image converted to grayscale can be found below.

Next we blur the image, to help eliminate any random or extra pixels, while still retaining the most relevant information, like the card number, name, and expiration date. Our blurred image can be found below.

After blurring, we then threshold the image. There are a variety of different thresholding algorithms and types out there, but for our purposes we will be using adaptive thresholding, which better distinguishes relevant features while still eliminating false positives. Our threshold image can be found below.

As you can see, the threshold image eliminated the majority of the image, while still retaining most of the information that we want to keep. Although this is great, we still want to go a step further to make sure only the exact features we want to use are detected. To do this, we run Canny edge detection on the thresholded image.

As you can see, the canny image is more defined than our threshold image, and further helps define all of the features that we will want to extract. Once we have these edges, we then dilate the image to connect areas of the image that are in pieces, which can be found below.

After our image has been dilated, we are then ready to find contours on the image. The reason we passed our image through so many filters is to assist the algorithm’s ability to detect contours and make it easier for us to filter them down to only the relevant text that we want to detect. The FindandFilterContours method that we called in the above code is declared in the code below.

        private static List<Mat> FindandFilterContours(Mat originalImage, Mat filteredImage)
        {
            //Create a blank image that will be used to display contours
            Image<Bgr, byte> blankImage = new Image<Bgr, byte>(originalImage.Width, originalImage.Height);

            //Clone the input image
            Image<Bgr, byte> originalImageClone = originalImage.Clone().ToImage<Bgr, byte>();

            //Declare a new vector that will store contours
            VectorOfVectorOfPoint contours = new VectorOfVectorOfPoint();

            //Find and draw the contours on the blank image
            CvInvoke.FindContours(filteredImage, contours, null, RetrType.Ccomp, ChainApproxMethod.ChainApproxSimple);
            CvInvoke.DrawContours(blankImage, contours, -1, new MCvScalar(255, 0, 0));

            //Create two copies of the cloned image of the input image
            Image<Bgr, byte> allContoursDrawn = originalImageClone.Copy();
            Image<Bgr, byte> finalCopy = originalImageClone.Copy();

            //Create two lists that will be used elsewhere in the algorithm
            List<Rectangle> listRectangles = new List<Rectangle>();
            List<int> listXValues = new List<int>();

            //Loop over all contours
            for (int i = 0; i < contours.Size; i++)
            {
                //Create a bounding rectangle around each contour
                Rectangle rect = CvInvoke.BoundingRectangle(contours[i]);
                originalImageClone.ROI = rect;

                //Add the bounding rectangle and its x value to their corresponding lists
                listRectangles.Add(rect);
                listXValues.Add(rect.X);

                //Draw the bounding rectangle on the image
                allContoursDrawn.Draw(rect, new Bgr(255, 0, 0), 5);
            }

            //Create two new lists that will hold data in the algorithms later on
            List<int> indexList = new List<int>();
            List<int> smallerXValues = new List<int>();

            //Loop over all relevant information
            for (int i = 0; i < listRectangles.Count; i++)
            {
                //If a bounding rectangle fits certain dimensions, add its x value to another list
                if ((listRectangles[i].Width < 400) && (listRectangles[i].Height < 400)
                    && (listRectangles[i].Y > 200) && (listRectangles[i].Y < 300) && 
                    (listRectangles[i].Width > 50) && (listRectangles[i].Height > 40))
                {
                    originalImageClone.ROI = listRectangles[i];

                    finalCopy.Draw(listRectangles[i], new Bgr(255, 0, 0), 5);

                    smallerXValues.Add(listRectangles[i].X);
                }
            }

            //Sort the smaller list into ascending order
            smallerXValues.Sort();

            //Loop over each value in the sorted list, and check if the same value is in the original list
            //If it is, add the index of that value in the original list to a new list
            for (int i = 0; i < smallerXValues.Count; i++)
            {
                for (int j = 0; j < listXValues.Count; j++)
                {
                    if (smallerXValues[i] == listXValues[j])
                    {
                        indexList.Add(j);
                    }
                }
            }

            //A list to hold the final ROIs
            List<Mat> outputImages = new List<Mat>();

            //Loop over the sorted indexes, and add them to the final list
            for (int i = 0; i < indexList.Count; i++)
            {
                originalImageClone.ROI = listRectangles[indexList[i]];

                outputImages.Add(originalImageClone.Clone().Mat);
            }

            CvInvoke.Resize(allContoursDrawn, smallerOutput, new Size(originalImage.Width, originalImage.Height));
            CvInvoke.Imshow("Boxes Drawn on Image", smallerOutput);
            CvInvoke.WaitKey(0);

            CvInvoke.Resize(finalCopy, smallerOutput, new Size(originalImage.Width, originalImage.Height));
            CvInvoke.Imshow("Boxes Drawn on FinalCopy", smallerOutput);
            CvInvoke.WaitKey(0);

            return outputImages;
        }

Although this is a big chunk of code, I will break it down into smaller pieces to help you better understand it.

The first step is to find all contours on the image. Since we have run the image through multiple filters, we can find distinct contours much more easily than we otherwise could. In the image below, we have drawn all of the found contours on a blank image.

Once the contours have been found, we then create bounding rectangles around each contour. The purpose of this is to create a more uniform approach to sorting our contours later on, as well as giving us access to the pixel coordinates and the width and height of each bounding rectangle.

As you can see, some parts of the image picked up stray contours, like the areas with the card holder’s name and the expiration date. In order to only use the contours we need, we must sort out the relevant ones. We do this using the chunk of code below.

            //Loop over all relevant information
            for (int i = 0; i < listRectangles.Count; i++)
            {
                //If a bounding rectangle fits certain dimensions, add its x value to another list
                if ((listRectangles[i].Width < 400) && (listRectangles[i].Height < 400)
                    && (listRectangles[i].Y > 200) && (listRectangles[i].Y < 300) && 
                    (listRectangles[i].Width > 50) && (listRectangles[i].Height > 40))
                {
                    originalImageClone.ROI = listRectangles[i];

                    finalCopy.Draw(listRectangles[i], new Bgr(255, 0, 0), 5);

                    smallerXValues.Add(listRectangles[i].X);
                }
            }

Since we resized our image earlier to specific dimensions, we can use that to our advantage when we want to sort the contours. In the if statement above, we specify certain dimensions, all of which are important to help distinguish only the card number that we want to read. It’s important to note that this approach will only work for credit cards that have their numbers in the center of the card, not ones like Discover cards that print the numbers on the back in the bottom left.

In the if statement, we make sure that the width and height of each bounding box are less than 400 pixels. This helps eliminate some of the larger bounding boxes, like the whole credit card, the VISA logo, and the card owner. Next we check that the top-left Y value of each bounding rectangle is greater than 200 pixels but less than 300 pixels. This helps us further eliminate any extra bounding rectangles that could be detected at the top or bottom of the card. Lastly, just to be safe, we make sure each bounding rectangle’s width and height are larger than 50 pixels and 40 pixels respectively.

In the end, if everything went smoothly, we end up with four bounding boxes, drawn on the image below.

To format our detection better, we can use one line of code that is built into C# to sort our bounding boxes along the x-axis from smallest to largest, which lets the detection read from left to right. Once the list has been sorted, we loop over all of the found bounding boxes and check if one of our four bounding boxes is contained within the original list. If it is, we add its index to another list. In the end we should have two lists, one with the X values of the relevant bounding boxes, and the other with their indexes within the list of all bounding boxes.

//Sort the smaller list into ascending order
smallerXValues.Sort();

//Loop over each value in the sorted list, and check if the same value is in the original list
//If it is, add the index of that value in the original list to a new list
for (int i = 0; i < smallerXValues.Count; i++)
{
   for (int j = 0; j < listXValues.Count; j++)
   {
      if (smallerXValues[i] == listXValues[j])
      {
         indexList.Add(j);
      }
   }
}

Once we have found the exact bounding boxes we want, we can now begin the actual detection of the text on the specific parts of the image. Although we configured Tesseract earlier in the article, we still need to create a method to detect the text on the image.

        public static string RecognizeText(Mat img)
        {
            //Change this file path to the path where the Tessdata folder is located
            string filePath = Directory.GetParent(Directory.GetParent
                (Environment.CurrentDirectory).ToString()).ToString() + @"/Tessdata/";

            //Declare the use of the dictionary
            SetTesseractObjects(filePath);

            //Get all cropped regions
            List<Mat> croppedRegions = ImageProccessing(img);

            //String that will hold the output of the detected text
            string output = "";

            Tesseract.Character[] words;

            //Loop over all ROIs and detect text on each image
            for (int i = 0; i < croppedRegions.Count; i++)
            {
                StringBuilder strBuilder = new StringBuilder();

                //Set and detect text on the image
                _ocr.SetImage(croppedRegions[i]);
                _ocr.Recognize();

                words = _ocr.GetCharacters();

                for (int j = 0; j < words.Length; j++)
                {
                    strBuilder.Append(words[j].Text);
                }

                //Pass the stringbuilder into a string variable
                output += strBuilder.ToString() + " ";
            }

            //Return a string
            return output;
        }

Here we call the configuration method we declared earlier and pass it the path to the Tessdata folder. Once that has happened, we call the image processing method we created earlier and get back a list of the relevant regions of our image that we want to detect text from. Afterwards, for each region in the list, we set the image to that instance in the list and recognize the text in that image.

With Tesseract, we detect characters letter by letter, including white space. As such, we use a StringBuilder to append every detected character to our output. Once we have detected all of the text in the ROIs that we specified, adding a space between regions for better formatting, we then get our output.

Here the expected output has been detected and outputted to our application. Although the process is simple and somewhat rudimentary, with some tweaking and configuration, we can make this process work for a larger array of credit cards.
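
To tie everything together, here is a minimal usage sketch showing how this OCR class might be called, for example from the application’s entry point. This is only an illustration and assumes the Images and Tessdata folders are laid out as described above; the Console output line is not part of the original project.

//Minimal usage sketch (assumed entry point, not part of the OCR class itself)
//Read in the credit card image from the Images folder
Mat card = OCR.ReadInImage();

//Run the full pipeline: filtering, contour sorting, and Tesseract recognition
string cardNumber = OCR.RecognizeText(card);

//Print the detected card number groups
Console.WriteLine("Detected text: " + cardNumber);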

While this part is up on GitHub right now, I do want to update this application in the future, possibly by using a camera to take a picture of a card and then doing some calculations to align the taken image so it is in better proportion to our desired input image.

Below is the entire code in the OCR class that we created, and if you want to view the whole application, you can view my GitHub repo.

using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.OCR;
using Emgu.CV.Util;
using Emgu.CV.Structure;
using System;
using System.IO;
using System.Text;
using System.Drawing;
using System.Collections.Generic;

namespace Credit_Card_OCR
{
    /// <summary>
    /// Detect text on an image
    /// </summary>
    class OCR
    {
        //Process flow of the algorithm:
        //Read in the image
        //Pass it through a variety of filters
        //Find contours
        //Sort contours from left to right
        //Read all text in each sorted relevant ROI


        //Declare a new Tesseract OCR engine
        private static Tesseract _ocr;

        //This variable is used for debugging purposes
        static Mat smallerOutput = new Mat();

        /// <summary>
        /// Set the dictionary and whitelist for Tesseract
        /// Need to investigate if the whitelist works in later versions of Tesseract
        /// </summary>
        /// <param name="dataPath"></param>
        public static void SetTesseractObjects(string dataPath)
        {
            //create OCR engine
            _ocr = new Tesseract(dataPath, "eng", OcrEngineMode.TesseractLstmCombined);
            _ocr.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ-1234567890/");
        }

        /// <summary>
        /// Read in the image to be used for OCR
        /// </summary>
        /// <returns>A Mat object</returns>
        public static Mat ReadInImage()
        {
            //Change this file path to the path where the image you want to run OCR on is located
            string filePath = Directory.GetParent(Directory.GetParent
                (Environment.CurrentDirectory).ToString()) + @"/Images/creditCard.png";

            //Read in the image from the filepath
            Mat img = CvInvoke.Imread(filePath, ImreadModes.AnyColor);

            //Return the image
            return img;
        }

        /// <summary>
        /// Pass the image through multiple filters and sort contours
        /// </summary>
        /// <param name="img">The image that will be processed</param>
        /// <returns>A list of Mat ROIs</returns>
        private static List<Mat> ImageProccessing(Mat img)
        {
            //Resize the image for better uniformity throughout the code
            CvInvoke.Resize(img, img, new Size(700, 500));

            Mat imgClone = img.Clone();

            //Convert the image to grayscale
            CvInvoke.CvtColor(img, img, ColorConversion.Bgr2Gray);

            //Blur the image
            CvInvoke.GaussianBlur(img, img, new Size(5, 5), 8, 8);

            //Threshold the image
            CvInvoke.AdaptiveThreshold(img, img, 30, AdaptiveThresholdType.GaussianC, ThresholdType.Binary, 5, 6);

            //Canny the image
            CvInvoke.Canny(img, img, 8, 8);

            //Dilate the canny image
            CvInvoke.Dilate(img, img, null, new Point(-1, -1), 8, BorderType.Constant, new MCvScalar(0, 255, 255));

            //Filter the contours to only find relevant ones
            List<Mat> foundOutput = FindandFilterContours(imgClone, img);

            return foundOutput;
        }

        /// <summary>
        /// Find and sort contours found on the filtered image
        /// </summary>
        /// <param name="originalImage">The original unaltered image</param>
        /// <param name="filteredImage">The filtered image</param>
        /// <returns>A list of ROI mat objects</returns>
        private static List<Mat> FindandFilterContours(Mat originalImage, Mat filteredImage)
        {
            //Create a blank image that will be used to display contours
            Image<Bgr, byte> blankImage = new Image<Bgr, byte>(originalImage.Width, originalImage.Height);

            //Clone the input image
            Image<Bgr, byte> originalImageClone = originalImage.Clone().ToImage<Bgr, byte>();

            //Declare a new vector that will store contours
            VectorOfVectorOfPoint contours = new VectorOfVectorOfPoint();

            //Find and draw the contours on the blank image
            CvInvoke.FindContours(filteredImage, contours, null, RetrType.Ccomp, ChainApproxMethod.ChainApproxSimple);
            CvInvoke.DrawContours(blankImage, contours, -1, new MCvScalar(255, 0, 0));

            //Create two copies of the cloned image of the input image
            Image<Bgr, byte> allContoursDrawn = originalImageClone.Copy();
            Image<Bgr, byte> finalCopy = originalImageClone.Copy();

            //Create two lists that will be used elsewhere in the algorithm
            List<Rectangle> listRectangles = new List<Rectangle>();
            List<int> listXValues = new List<int>();

            //Loop over all contours
            for (int i = 0; i < contours.Size; i++)
            {
                //Create a bounding rectangle around each contour
                Rectangle rect = CvInvoke.BoundingRectangle(contours[i]);
                originalImageClone.ROI = rect;

                //Add the bounding rectangle and its x value to their corresponding lists
                listRectangles.Add(rect);
                listXValues.Add(rect.X);

                //Draw the bounding rectangle on the image
                allContoursDrawn.Draw(rect, new Bgr(255, 0, 0), 5);
            }

            //Create two new lists that will hold data in the algorithms later on
            List<int> indexList = new List<int>();
            List<int> smallerXValues = new List<int>();

            //Loop over all relevant information
            for (int i = 0; i < listRectangles.Count; i++)
            {
                //If a bounding rectangle fits certain dimensions, add its x value to another list
                if ((listRectangles[i].Width < 400) && (listRectangles[i].Height < 400)
                    && (listRectangles[i].Y > 200) && (listRectangles[i].Y < 300) && 
                    (listRectangles[i].Width > 50) && (listRectangles[i].Height > 40))
                {
                    originalImageClone.ROI = listRectangles[i];

                    finalCopy.Draw(listRectangles[i], new Bgr(255, 0, 0), 5);

                    smallerXValues.Add(listRectangles[i].X);
                }
            }

            //Sort the smaller list into ascending order
            smallerXValues.Sort();

            //Loop over each value in the sorted list, and check if the same value is in the original list
            //If it is, add the index of that value in the original list to a new list
            for (int i = 0; i < smallerXValues.Count; i++)
            {
                for (int j = 0; j < listXValues.Count; j++)
                {
                    if (smallerXValues[i] == listXValues[j])
                    {
                        indexList.Add(j);
                    }
                }
            }

            //A list to hold the final ROIs
            List<Mat> outputImages = new List<Mat>();

            //Loop over the sorted indexes, and add them to the final list
            for (int i = 0; i < indexList.Count; i++)
            {
                originalImageClone.ROI = listRectangles[indexList[i]];

                outputImages.Add(originalImageClone.Clone().Mat);
            }

            CvInvoke.Resize(allContoursDrawn, smallerOutput, new Size(originalImage.Width, originalImage.Height));
            CvInvoke.Imshow("Boxes Drawn on Image", smallerOutput);
            CvInvoke.WaitKey(0);

            CvInvoke.Resize(finalCopy, smallerOutput, new Size(originalImage.Width, originalImage.Height));
            CvInvoke.Imshow("Boxes Drawn on FinalCopy", smallerOutput);
            CvInvoke.WaitKey(0);

            return outputImages;
        }

        /// <summary>
        /// Detects text on an image
        /// </summary>
        /// <param name="img">The image where text will be extracted from</param>
        /// <returns>A string of detected text</returns>
        public static string RecognizeText(Mat img)
        {
            //Change this file path to the path where the Tessdata folder is located
            string filePath = Directory.GetParent(Directory.GetParent
                (Environment.CurrentDirectory).ToString()) + @"/Tessdata/";

            //Declare the use of the dictionary
            SetTesseractObjects(filePath);

            //Get all cropped regions
            List<Mat> croppedRegions = ImageProccessing(img);

            //String that will hold the output of the detected text
            string output = "";

            Tesseract.Character[] words;

            //Loop over all ROIs and detect text on each image
            for (int i = 0; i < croppedRegions.Count; i++)
            {
                StringBuilder strBuilder = new StringBuilder();

                //Set and detect text on the image
                _ocr.SetImage(croppedRegions[i]);
                _ocr.Recognize();

                words = _ocr.GetCharacters();

                for (int j = 0; j < words.Length; j++)
                {
                    strBuilder.Append(words[j].Text);
                }

                //Pass the stringbuilder into a string variable
                output += strBuilder.ToString() + " ";
            }

            //Return a string
            return output;
        }
    }
}

Image Stitching in EmguCV

How to Stitch Images Together In EmguCV


Since the first camera was invented, humans have taken millions and millions of photos. With all these images floating around, people have wanted a way to combine overlapping photos of the same scene into one long, wide image, better known as a panorama. Although there are multiple ways to achieve this, for this article we will be using EmguCV, and more specifically, the stitcher namespace and class.

The Example project for this article can be found on Github.

The four images I want to stitch can be found below. Across the four images, I made sure that there was enough overlap, as it will be needed to successfully stitch the images together.

The first step in stitching images together is reading in the desired photos from a folder and storing them in a list. In the code below, I read in all the images from the desired folder, and store them in the output list.

            //Store all paths to each individual file 
            //in the directory in an array
            string[] files = Directory.GetFiles(path);

            //Declare a list of Mat to store all images loaded in
            List<Mat> outputList = new List<Mat>();

            //For each item in the files array, 
            //read in the image and store it in the list
            foreach(var image in files)
            {
                //Read in the image and store it as a mat object
                Mat img = CvInvoke.Imread(image, ImreadModes.AnyColor);

                //Add the mat object to the list
                outputList.Add(img);
            }

Once the images have been successfully read in and stored in a list of Mat objects, we can now begin the image stitching process.

The first step in stitching images together in EmguCV is declaring a new stitcher from the Stitcher class. All of the stitching work runs through this class, which lets us control how the process works and how our final image will be generated. This is done using the code below.

            //Declare a new stitcher
            Stitcher stitcher = new Stitcher();

Image stitching relies on keypoints to combine all of the relevant images into a single photo. If you want to dive deep into how keypoints work, you can find some examples online, but to simplify it, keypoints are used to find similar features across multiple images. Keypoints are found by converting an image to grayscale and then running a detector over it that looks for distinctive regions, such as corners and blobs, and records a keypoint at each of those locations.

In order to find keypoints on an image, we must first declare a detector. There are a variety of detectors available, both open source and patented, but for today’s project we will use the Brisk detector. Although the Binary Robust Invariant Scalable Keypoints (BRISK) detector is fairly new, it is open source, and I have found that it plays well with multiple types of images while still generating a solid final image without much warping or distortion. To declare the detector, you can use the code below. After the declaration, I also set the features finder of the stitcher that we created above to the Brisk detector we just declared.

            //Declare the type of detector that will be used to detect keypoints
            Brisk detector = new Brisk();

            //Set the stitcher class to use the specified detector declared above
            stitcher.SetFeaturesFinder(detector);
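
To get a feel for what a features finder is actually doing, here is a small, purely illustrative sketch that detects BRISK keypoints on a single image and draws them, using the DetectRaw and Features2DToolbox.DrawKeypoints helpers from the Emgu.CV.Features2D and Emgu.CV.Util namespaces. It is not part of the stitching pipeline, the file name is only a placeholder, and exact signatures may vary slightly between EmguCV versions.

            //Illustrative only: visualize the keypoints that BRISK finds on one image
            //"left.jpg" is a placeholder file name
            Mat img = CvInvoke.Imread("left.jpg", ImreadModes.AnyColor);

            //Keypoint detectors work on grayscale images
            Mat gray = new Mat();
            CvInvoke.CvtColor(img, gray, ColorConversion.Bgr2Gray);

            //Detect the keypoints with the same kind of detector used by the stitcher
            Brisk briskDetector = new Brisk();
            VectorOfKeyPoint keypoints = new VectorOfKeyPoint();
            briskDetector.DetectRaw(gray, keypoints);

            //Draw the keypoints on a copy of the original image and show the result
            Mat drawn = img.Clone();
            Features2DToolbox.DrawKeypoints(img, keypoints, drawn, new Bgr(0, 0, 255), Features2DToolbox.KeypointDrawType.Default);
            CvInvoke.Imshow("BRISK Keypoints", drawn);
            CvInvoke.WaitKey(0);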

Before we get to the final stage of image stitching, we must first convert all the images in our list to a vector. Although somewhat of an annoyance, the process is fairly easy, and can be done with a couple of lines of code.

            //Declare a vector to store all images from the list
            VectorOfMat matVector = new VectorOfMat();

            //Push all images in the list into a vector
            foreach (Mat img in images)
            {
                matVector.Push(img);
            }

Once we have pushed all of the images in our list into our new vector, we can finally stitch our images together. To accomplish this, we first declare the output variable that will hold our final stitched image. Once the new Mat object has been declared, we pass it along with our vector to the stitcher. Doing so can be done with the code below.

            Mat output = new Mat();   
         
            //Stitch the images together
            stitcher.Stitch(matVector, output);

In order to view your result, you can output the image to a WPF image source in C#, or you can write the Mat object to a folder. If you want to do the first option, you must first convert the output Mat object into a bitmap, then convert that bitmap to an image source. Although a little involved, I found some code online that will convert a bitmap to an image source.

        //Convert a bitmap to imagesource
        [DllImport("gdi32.dll", EntryPoint = "DeleteObject")]
        [return: MarshalAs(UnmanagedType.Bool)]
        public static extern bool DeleteObject([In] IntPtr hObject);

        public ImageSource ImageSourceFromBitmap(Bitmap bmp)
        {
            var handle = bmp.GetHbitmap();
            try
            {
                return Imaging.CreateBitmapSourceFromHBitmap(handle, IntPtr.Zero, Int32Rect.Empty, 
                    BitmapSizeOptions.FromEmptyOptions());
            }
            finally { DeleteObject(handle); }
        }

Once this block of code is in place, you just have to convert your Mat object to a bitmap, then run that bitmap through the code above.

                //Convert the Mat object to a bitmap
                Bitmap img = output.ToBitmap();

                //Using the method below, convert the bitmap to an imagesource
                imgOutput.Source = ImageSourceFromBitmap(img);

Another, simpler way is to just write the image to your current project directory. This can be done using the following code.

            //Write the stitched image
            CvInvoke.Imwrite("Output.png", output);

Once the images have been stitched, we get our result.

The complete code for my stitching class can be found below.

    /// <summary>
    /// Contains all methods to stitch images together
    /// </summary>
    class ImageStitching
    {
        /// <summary>
        /// Pass a path to a folder of images and read in all those images and store them in a list
        /// </summary>
        /// <param name="path">The path to the folder where the images are kept</param>
        /// <returns>A list of Mat objects</returns>
        public static List<Mat> GetImages(string path)
        {
            //Store all paths to each individual file in the directory in an array
            string[] files = Directory.GetFiles(path);

            //Declare a list of Mat to store all images loaded in
            List<Mat> outputList = new List<Mat>();

            //For each item in the files array, read in the image and store it in the list
            foreach(var image in files)
            {
                //Read in the image and store it as a mat object
                Mat img = CvInvoke.Imread(image, ImreadModes.AnyColor);

                //Add the mat object to the list
                outputList.Add(img);
            }

            //Return the list of read in mat objects
            return outputList;
        }

        /// <summary>
        /// Stitch images together
        /// </summary>
        /// <param name="images">The list of images to stitch</param>
        /// <returns>A final stitched image</returns>
        public static Mat StichImages(List<Mat> images)
        {
            //Declare the Mat object that will store the final output
            Mat output = new Mat();

            //Declare a vector to store all images from the list
            VectorOfMat matVector = new VectorOfMat();

            //Push all images in the list into a vector
            foreach (Mat img in images)
            {
                matVector.Push(img);
            }

            //Declare a new stitcher
            Stitcher stitcher = new Stitcher();

            //Declare the type of detector that will be used to detect keypoints
            Brisk detector = new Brisk();

            //Here are some other detectors that you can try
            //ORBDetector detector = new ORBDetector();
            //KAZE detector = new KAZE();
            //AKAZE detector = new AKAZE();

            //Set the stitcher class to use the specified detector declared above
            stitcher.SetFeaturesFinder(detector);

            //Stitch the images together
            stitcher.Stitch(matVector, output);

            //Return the final stitched image
            return output;
        }
    }
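
To round things out, here is a minimal sketch of how the class above might be called, for example from a button click handler or the application’s entry point. The folder path is only a placeholder and should point at wherever your images are stored.

            //Minimal usage sketch; the folder path is a placeholder
            List<Mat> images = ImageStitching.GetImages(@"C:\StitchingImages");

            //Stitch the images and write the result next to the executable
            Mat panorama = ImageStitching.StichImages(images);
            CvInvoke.Imwrite("Output.png", panorama);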

Although it initially sounded like a difficult task, image stitching in EmguCV is fairly easy, and can be done in less than a hundred lines of code. This article was just a starting example, but if you want to learn more about the other methods within the image stitching class and its namespace, you can read up on its documentation here.
