Mock Doctrine’s Persist

Ever wanted to mock out Doctrine’s persist and flush functionality? I did! Here is a template, using PHP Mockery, to help you get the job done. This example comes from a project that uses Laravel and Doctrine.

        // Mock the EntityManager facade so that persist() fills in the id and
        // relations that Doctrine would normally set for us on flush.
        EntityManager::shouldReceive('persist')
                ->with(Mockery::on(function ($data) {
                    if (class_basename($data) === 'Contact') {
                        // Use reflection to set the private properties directly
                        $reflectionClass = new ReflectionClass(Contact::class);

                        $reflectionProperty = $reflectionClass->getProperty('id');
                        $reflectionProperty->setAccessible(true);
                        $reflectionProperty->setValue($data, 1);

                        $reflectionProperty = $reflectionClass->getProperty('report');
                        $reflectionProperty->setAccessible(true);
                        $reflectionProperty->setValue($data, new Report());

                        return true;
                    } elseif (class_basename($data) === 'AnotherClass') {
                        return true;
                    }

                    return false;
                }))
                ->shouldReceive('clear')
                ->andReturnNull()
                ->shouldReceive('flush')
                ->andReturnNull();

PHP Text Analysis Hits 100 Commits

PHP Text Analysis is a library for performing text analysis, natural language processing, and information retrieval tasks. Today the 100th commit was made; it added the cosine similarity measure for comparing two text documents.

To celebrate, version 1.0 has been released. You can pull down the latest and greatest version of the library using Composer:

composer require yooper/php-text-analysis
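The cosine measure itself is library-agnostic: represent each document as a term-frequency vector and divide the dot product by the product of the vector norms. Here is a minimal sketch (in Python, purely as a language-neutral illustration; the naive whitespace tokenization here is not the library's own pipeline):

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the two documents' term-frequency vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_similarity("the quick brown fox", "the quick red fox"))  # 0.75
```

Documents sharing no terms score 0.0; identical documents score 1.0.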

PHP Text Analysis : Rapid Automatic Keyword Extraction (RAKE)

I am the lead (and sole) developer on the PHP Text Analysis project. There are few well-established PHP libraries for performing NLP and IR tasks. The PHP Text Analysis project is an attempt at a library that helps with basic text mining tasks, using descriptive statistics and unsupervised learning algorithms for text classification and keyword extraction.

RAKE is a keyword extraction algorithm that does not require any training. It extracts a ranked list of keywords; it is recommended that you keep only the top third of the results. For more information, check out the source code at https://github.com/yooper/php-text-analysis or the documentation at http://yooper.github.io/php-text-analysis/.
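For the curious, the core of RAKE can be sketched in a few lines. This is not the php-text-analysis implementation, just the published algorithm, shown in Python purely as a language-neutral illustration: candidate phrases are split at stopwords, each word is scored by degree/frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# A tiny stopword list; a real implementation would use a full list.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "the", "to"}

def rake(text):
    """Rank candidate phrases by summing degree(word)/frequency(word) scores."""
    words = re.findall(r"[a-z]+", text.lower())

    # Split the word stream into candidate phrases at stopwords.
    # (The full algorithm also splits at punctuation; omitted for brevity.)
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)

    # degree counts co-occurrences within phrases (including the word itself)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)

    word_score = {w: degree[w] / freq[w] for w in freq}
    ranked = [(" ".join(p), sum(word_score[w] for w in p)) for p in phrases]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

For example, `rake("deep learning is a subset of machine learning")` ranks the multi-word phrases "deep learning" and "machine learning" (score 4.0) above the lone word "subset" (score 1.0); keeping only the top third of such a ranking drops the low-scoring stragglers.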

PHP Mockery : Mocking An Abstract Factory

I created an abstract factory that used the passed-in instance’s base class name to drive object creation within the factory. I immediately ran into issues because Mockery generates a class name much different from what the factory was expecting: the mock class name was Mockery_1_Model_ORM_UserReportView, while the factory expected Model\ORM\UserReportView.

Using Mockery::mock was not going to meet my needs. Next, I tried Mockery::namedMock.

This call looked promising, because I could set the name of the class to exactly what the abstract factory was expecting.


$mock = Mockery::namedMock('Mock\UserReportView', 'Model\ORM\UserReportView');
$mock->shouldReceive('getId')
    ->andReturn(1)
    ->shouldReceive('setUserReport')
    ->andReturn($userReport);

Immediately after implementing this mocking approach, I received the error:

Declaration of Mock\UserReportView::setUserReport() should be compatible with Model\ORM\UserReportView::setUserReport(Model\ORM\UserReport $userReport)

The PHP Mockery error “Declaration should be compatible with” would require me to refactor the type hints within my model classes, which I am opposed to doing because it would make my method signatures too generic and require additional work.

I was able to work around the issue by doing the following:


// Mock against stdClass so no real method signatures have to match
$mock = Mockery::namedMock('Mock\UserReportView', 'stdClass');
$mock->shouldReceive('getId')
    ->andReturn(1)
    ->shouldReceive('setUserReport')
    ->andReturn($userReport);

return $mock;

Install PHP GnuPG on CentOS 6.5

I have a project that must encrypt and decrypt messages in PHP. For this project, CentOS 6.5 was selected as the main operating system. The problem is that installing PHP’s gnupg extension does not work out of the box. The internet has plenty of instructions for Ubuntu, but the only instructions I found for CentOS required installing third-party RPMs, which is not preferable.

Here is the pecl command for PHP’s gnupg extension, and the error it produces:


sudo pecl install gnupg

...

...

...

configure: error: Please reinstall the gpgme distribution

This error can be easily fixed by installing the gpgme-devel library:


sudo yum -y install gpgme-devel

sudo pecl install gnupg

Xamarin Not Building

I got stumped today by Xamarin not building an iOS app within Visual Studio. A “Build Failed” error kept showing up, and the iOS simulator would not open on the Mac. The more detailed message said, “Xamarin.iOS does not support running or debugging the previous built version of your project. Please ensure your solution builds before running or debugging it.”

My fix for this issue was to manually sync the clocks on my Windows dev machine and my Mac dev machine. Somehow Windows was unable to sync its own clock; the time server was down.

US States SQL Available in Open Model

Do you need the US states in SQL? They are now available in the Open Model project. Here is a direct link to the US states SQL Server schema and data: https://github.com/yooper/open-model/blob/master/schemas/rdbms/state.sql. I added unique constraints, too.

Job Board Schema Available in Open Model

Today I added the first schema to the Open Model project repository. The job board is a very simple, three-table system for handling job posts. As the models evolve, it will be easier to track and apply changes using GitHub. Go to https://github.com/yooper/open-model for the latest updates. Below is a copy of the schema; it uses MS SQL syntax to create the tables.


create table organization(
    id INT IDENTITY NOT NULL PRIMARY KEY,
    name VARCHAR(MAX),
    created_on DATETIME NOT NULL
)

create table employment_type(
    id INT IDENTITY NOT NULL PRIMARY KEY,
    name VARCHAR(MAX),
    created_on DATETIME NOT NULL
)

create table job_board(
    id BIGINT IDENTITY NOT NULL PRIMARY KEY,
    rate VARCHAR(256) NOT NULL,
    benefits VARCHAR(MAX) NULL,
    date_created_on DATETIME NOT NULL,
    date_posted DATETIME NOT NULL,
    education_requirements VARCHAR(MAX) NOT NULL,
    employment_type_id INT NOT NULL FOREIGN KEY REFERENCES employment_type(id),
    experience_requirements VARCHAR(MAX) NOT NULL,
    skill_requirements VARCHAR(MAX) NOT NULL,
    responsibility_requirements VARCHAR(MAX) NOT NULL,
    work_hours VARCHAR(MAX) NOT NULL,
    job_title VARCHAR(256) NOT NULL,
    job_location VARCHAR(MAX) NOT NULL,
    date_closed_on DATETIME NOT NULL,
    job_poster_user_id INT NOT NULL,
    organization_id INT NOT NULL FOREIGN KEY REFERENCES organization(id)
)

TopoJSON and GeoJSON Maps By United States County

The Datamaps project, http://datamaps.github.io/, is awesome. It makes building an interactive map look easy, and by default you can make maps of the United States very easily. What I wanted was a Datamap of all the counties in Michigan. To make it easier for people, I put together all of the United States counties into separate files for easy access and distribution. Each file has the following properties set:

  • id = County Name
  • name = County Name
  • fips = FIPS code

The maps have been uploaded to GitHub at https://github.com/yooper/open-model. If you need TopoJSON or GeoJSON data files for counties in the United States, go get them from my git repo.

Using R and the Apache Module mod_log_firstbyte to Find Performance Bottlenecks

Do you need to find the performance bottlenecks in your Apache application? So do I. Relying on end users reporting that your web app is sluggish and slow is not really helpful. What we need is a set of processes and data that we can transform into meaningful knowledge about our web app. By default, Apache does a great job of providing information about each request, but it does not record the amount of time it took to generate each page. Most approaches I have seen for collecting this data require adding timestamp-capturing code inside the web app itself, and that approach does not agree with me. I want a general approach that can be applied to any web app running under the Apache web server. Hence, you will need to install the mod_log_firstbyte module, https://code.google.com/p/mod-log-firstbyte/, in order to capture the time between the request and the first byte of the response.

Log Format

After you install mod_log_firstbyte, you must add a new logging format; there are directions in the README file that comes with the module. I recommend collecting data over multiple weeks before doing your analysis. Here is the recommended mod_log_firstbyte log format:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D %F" combined-with-firstbyte

The %F option outputs the time between the request being received and the first byte of the response being produced.

Extracting Log Data

In a previous post, I used R to do some Apache log analysis. I am going to use R again because of the wealth of information that can be gleaned from the logs. First, we need to load the log file into R.

# read the space-delimited access log into a data frame
access_log <- read.table(file="C:\\Users\\redbeard\\Documents\\YOUR_APP\\Access Logs\\ssl_access_log_overall.log")
access_log[1,]

Transforming Log Data

The web app I am analyzing is built on Zend Framework 1.12, an MVC-based framework, so we can use some heuristics to remove extraneous data that does not get used in our analysis. We are interested in:

  • column 1, which contains the IP address of the remote user
  • column 6, which contains the first line of the HTTP request header
  • column 11, which contains “the time taken to serve the request, in microseconds” (%D), i.e., how long it takes for the remote client to receive the whole page
  • column 12, which is the time between the request being received and the first byte of the response being written to the client (%F)

We start by parsing column 6. An example value looks like POST /module/controller/action HTTP/1.1, and we only need to capture the /module/controller/action part of the string. We can do so with this snippet:


# strip the HTTP method and protocol, then remove numeric id segments such as /user/123
path <- gsub(pattern="\\/[a-z]+\\/[0-9]+", replacement="",
             x=sub("POST ", "", sub("GET ", "", sub(" HTTP/1.1", "", sub(" HTTP/1.0", "", access_log[,6])))),
             perl=TRUE)

This strips off the numeric identifiers in the URL path. There must be a more elegant way to achieve this.

Plan on spending a lot of time writing more string-parsing functions to normalize your data.

I used this additional substitution to remove parameters I was not interested in.


# drop query-string parameters
path <- sub("\\?.*", "", path, perl=TRUE)

For your data, you may need to adjust how you parse the string. The other columns do not require an ETL / data extraction process in order to use them.

Load Your Data into a Data Frame

The values are in microseconds, so let’s transform them to seconds.


# divide by 1,000,000 to convert microseconds to seconds
microseconds <- 1000000

df <- data.frame(ip=access_log[,1],
                 path=path,
                 trip_time=access_log[,11]/microseconds,
                 first_byte=access_log[,12]/microseconds)

Analyzing Performance In Your Apache Logs with R

I will discuss this in a later post.