R mySQL Sqlite SQL Generator

R Insert SQL Generator

I need a generic way to create SQL statements within R. I prefer to develop a convention to follow that is re-usable. I hate hand-writing one off SQL SELECT and Insert statements, every time I need to push or pull data from the database into R.

#build an SQL insert query based on the data frame that gets passed in
getInsertSqlQuery <- function(tableName, dataFrame){

columns <- paste(colnames(dataFrame), sep="", collapse=", ")
values <- paste(":", colnames(dataFrame), sep="", collapse=", ")
sql <- paste("INSERT INTO ", tableName, " (", columns, ") VALUES (", values, ")", sep="")

sql
}

R Insert SQL Prepared Statement

Now that I have a working SQL generator, I need a convenient way to insert the data into the Database using RMySQL’s and SQLite’s interfaces.

#generic RSQLite & RMysql SQL Insert
insertData <- function (tableName, dataFrame){
    drv <- dbDriver("SQLite")
    #works with RMySQL too
    con <- dbConnect(drv, dbname="sqlite/blogger.db")

sql <- getInsertSqlQuery(tableName, dataFrame)

dbBeginTransaction(con)

dbGetPreparedQuery(con, sql, dataFrame)
dbCommit(con)
}

#From your application call
insertData(tableName="comments",dataFrame=data.frame("dollars"=c(3.44),"year"=c('2012'),units=c(5))

If you need to manipulate your data do that prior to calling insertData.

Analyze Apache Logs With R

I am trying to turn a new leaf and learn more about statistics. In order to demonstrate my new abilities I wanted to share with you how to analyze your Apache web logs using R.

Download R

I am going to be using R because it is free, open source and has a large community backing it. Download the latest version of R from http://cran.r-project.org.

Get Your Apache Log File

Log into your web server and download your access file(s) from Apache to your local computer or wherever you have R installed. I recommend merging your Apache log files together in order to increase the quality of the information you want to extract from your web logs. Merge your virtual host files if you serve CSS and Images from separate virtual hosts.

Parse Apache Log Files With R

I am using a standard Ubuntu Apache 2 configuration. Lets first examine the access log file.


access_log <- read.table(file="C:\\Users\\windoze\\Documents\\R\\data\\other_vhosts_access.log")
access_log[1,] # Display the different vectors in the access_log dataframe

This is the easiest way to parse the log file into a data frame for analysis.

R Bar Chart Of Apache HTTP Codes

Here is a nice visual break down of the HTTP codes your application is serving.
table(access_log[,8]) # Gives a nice text break down of HTTP codes served.
barplot(table(access_log[,8])) # Gives a nice bar plot visual of the HTTP codes served.
R Barplot of Apache HTTP Codes

R Barplot of Apache HTTP Codes

Later,

I will demonstrate how to extract more information from your access log such number of unique users visiting your site and when is the busiest time of week for your web site..

Video Encoding and Video Distribution (Youtube VS Akamai)

This piece reflects my opinion about encoding and distributing on-line video. Akamai provides a great way for distributing media using their content delivery network. Akamai is really nice because your marketing people can jump for joy because they can brand their own content. The downside to Akamai is the technical guy gets to field all the questions why HD video does not stream smoothly.

Youtube is better than Akamai for streaming video content, because YouTube:

  1. is free
  2. is a CDN
  3. has great video quality
  4. works on all PC and mobile devices
  5. encodes video

If you can not commit the resources to making multiple bit rate versions of a video and different resolutions do not use Akamai, use Youtube.

Database Dispatcher Part 1

I am working on a new project that will have a lot of data. MySQL has been selected as the database, Doctrine 2 for the ORM and Zend Framework for application framework. The project is going to be a data warehouse where each customer will have similar data. Instead of cramming all the data into a single database, I am going to partition the data in advance. Each client will have their own database. A dispatcher will be setup to handle managing the database connection to use based on the customer.

Initial Thoughts

In order to route the application software to the correct database settings I must have a monolithic database for storing the following fields:

  1. Id
  2. Username
  3. Password
  4. Database Settings
    1. Username
    2. Password
    3. Database name
    4. Host name or IP address
  5. Active

An alternative design might have

  1. Id
  2. Username
  3. Database Settings
  4. Active

Implementation Considerations

After implementing some of code I realized that a session must be setup in order to link the end-user to the proper database. The end-user must be logged into the web site before the session can be set. A default database must be setup that can be assigned to the end-user before they login. The alternative method for implementation would be to not run the database dispatcher code if a session has not been assigned and limit what actions are accessible.

I am going to implement a hybrid where a default database is assigned and all actions except the few pages are accessible.

Doctrine 2 DQL IN statement

I have been working with Doctrine 2 for the last month. This week, I really needed to pass an array of values to the Doctrine 2′s DQL IN statement. There is no documentation on Doctrine’s web site for passing in an array of identifiers. Here is how to pass in an array of identifiers into DQL

For Integers


$em->createQuery("SELECT users FROM Entities\User users WHERE users.id IN(".implode('  ',$userIds).")");

For Strings


$em->createQuery("SELECT users FROM Entities\User users WHERE users.id ('".implode("', '", $userUuids)."')");

The Query object does not support flattening arrays and throws the exception, “Doctrine\ORM\Query\QueryException: Invalid parameter number: number of bound variables does not match number of tokens”

Magento What Are The Processes

Magento great for developers, not great for everyday users

I have been working on a Magento web site for a couple of months now. I like Magento a lot because most of the feature requests I get from the operations, sales, shipping and receiving teams I can easily find a 3rd party module to use. That is so awesome! I love developing software but I hate reinventing the wheel.

What I dislike about Magento

I noticed that Magento lacks documentation for the most simple processes. The user manual I purchased does not have any documentation about editing orders. Magento is great if your customers just buy from your web site and require no additional help. I work at a company that offers exceptional customer care and shipping and receiving services. By being exceptional they get stung by Magento’s rigidness and lack of documentation a lot. I can live with Magento’s processes. I just wish they documented basic processes like editing orders, canceling orders, modifying orders after they have been invoiced, how the credit memo functionality works. If Magento is that popular why has no one documented these processes?

Change the company not the software

I will be spending some time documenting the processes within Magento (Your Welcome Varien). I will post the processes here to save people from the pain of trying to discover how to do basic order manipulations. Trying to customize Magento to function the way your business functions is unrealistic. When you go with Magento you are locked into the Magento way. Prepare to defend the rigid, undocumented processes and convince everyone else to drink the kool-aide with you.

RVM use not working RHEL5 / Centos 5

Hello All,

I just finished up installing rvm for ruby on my RHEL5  box. I did the system wide installation but could not get my settings to stick when I selected a different version of ruby through the rvm user interface. I had 2 major issues

  1. Permissions
  2. rvm use not working

Solutions:

Make sure your login account is a member of the rvm group. Make sure you add the following line to the end of your .bashrc file which is located in your Home directory.


# User specific aliases and functions
source "/usr/local/rvm/scripts/rvm"

Can’t connect to LimeSurvey database. Reason: No such file or directory

I am trying to install LimeSurvey and keep gettings this error, “Can’t connect to LimeSurvey database. Reason: No such file or directory”. After debugging the code I discovered if you specify ‘localhost’ as the hostname of your database server it does not use ‘localhost’ connect to the database. It uses the Unix file socket ‘unix:///tmp/mysql.sock’. On an Apple server the mySQL socket file is usually located in ‘/var/mysql/mysql.sock’. The easiest way to fix the problem is to create a soft link.


ln -s /var/mysql/mysql.sock /tmp/mysql.sock

gearman: error while loading shared libraries: libgearman.so.4: cannot open shared object file: No such file or directory

I am playing around with Gearman. I installed the software but got the error
“gearman: error while loading shared libraries: libgearman.so.4: cannot open shared object file: No such file or directory”

To fix the problem run the following command

sudo ldconfig

Presto it works! I am running RHEL 5.

When Not To Use AJAX and Javascript

Do Not Use Ajax

I have been working on project for the last couple months that uses AJAX calls to pull down data from the server. The more I work with AJAX and JavaScript the more I realize when I should not use it. If the data is static do not use an AJAX call to retrieve the data. Instead hide your data inside an HTML hidden field. Your front-end JavaScript code can easily read arrays, objects and values from the hidden field.

Do Not Dynamically Generate JavaScript Code

I am guilt of this mistake. Do not use backend code to generate code like this,


var data = 34;

Just put the data in HTML hidden field. Debugging dynamically generated JavaScript is harder. Use JavaScript to read and assign the value from the HTML hidden field.

Follow

Get every new post delivered to your Inbox.