Google Summer of Code 2014

Weeks 9-10: Completing basic calls in the GSoC

The past two weeks were marked by travel plans, which is why I have decided to merge the reports of both weeks into a single post.

The first task was to revise the code as per my mentor's comments. This primarily involved removing redundant code, which can be seen in this commit. For instance, look at the following —

    $clause = create_SQL_clause(array(
        "c.title" => $_GET["title"],
            "c.cat_id" => $_GET["category_id"],
        "c.primary_language" => $_GET["primary_language"]
    ));

    $clause_with_id = create_SQL_clause(array(
        "c.course_id" => $course_id
    ));

— was reduced to something like this.

    if ($course_id) {
        $sql_array = array(
            "c.course_id" => $course_id
        );
    } else {
        $sql_array = array(
            "c.title" => $_GET["title"],
            "c.cat_id" => $_GET["category_id"],
            "c.primary_language" => $_GET["primary_language"]
        );
    }
    $clause = create_SQL_clause($sql_array);

I wonder why I hadn't thought of this before!

Moving on, I started working on the last set of APIs on tests, questions and question categories.

The calls accomplished in Week 9 were the following —

Tests

GET /api/tests
GET /api/tests/[test_id]
GET /api/tests?title=[title]&start_date=[start_date]&end_date=[end_date]
GET /api/instructor/[instructor_id]/courses/[course_id]/tests/
GET /api/student/[student_id]/courses/[course_id]/tests/

Questions

GET /api/questions/
GET /api/questions/[question_id]
GET /api/questions/?question=[question]&category_id=[category_id]&type=[type]
GET /api/tests/[test_id]/questions/ 

GET /api/instructor/[instructor_id]/courses/[course_id]/tests/[test_id]/questions
GET /api/student/[student_id]/courses/[course_id]/tests/[test_id]/questions

Question Categories

GET /api/questions/categories/
GET /api/questions/categories/[category_id]
GET /api/questions/categories?name=[name]
POST /api/questions/categories/
PUT /api/questions/categories/[category_id]
DELETE /api/questions/categories/[category_id]

After traveling back to college and shifting to a new room, I finally resumed my GSoC work.

The first thing that I did was to change the SQL queries. My queries had been unreadable up until now, and I decided to rework them after reading the book Simply SQL. The refactoring took a few days, but with the increased readability, I guess it was worth the effort.
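
To give a feel for the change, here is an illustrative before/after on a hypothetical enrollment query (the table names and the query itself are made up for this example; the real queries live in the shared functions):

    $before = "SELECT c.course_id, c.title FROM %scourses c INNER JOIN %scourse_enrollment e ON e.course_id = c.course_id WHERE e.member_id = %d ORDER BY c.title";

    // One clause per line, keywords aligned, as Simply SQL suggests
    $after = "SELECT c.course_id, c.title
                FROM %scourses c
          INNER JOIN %scourse_enrollment e
                  ON e.course_id = c.course_id
               WHERE e.member_id = %d
            ORDER BY c.title";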

The next part was to complete the rest of the calls. I completed the PUT, POST and DELETE calls for tests; the only calls remaining from our initial list are those for questions.

Week 8 - Condensing the code

Last week, I left off awaiting approval of the alternate branch I was working on. It got approved, and I continued working on it this week.

What was the whole idea of an alternate branch?

I was using helper functions to make reusing certain SQL queries easy. Not small ones — large ones spanning multiple tables that were pretty complex to understand at first glance.

The problem was that it became too difficult for a new developer looking at the API for the first time to understand what was going on!

That is why I had to eliminate all such unnecessary functions — and create all API calls using two basic functions, api_backbone and create_SQL_clause. This way, a new developer has to understand the use of only these two functions to start creating APIs.

Thus started the task of elimination of those functions.

Next, I revisited the boilerplate class because there had been updates since I last modified it.

Now, it contains everything that you need to know to start writing APIs. You can just copy the boilerplate directory and start working. Inline comments explain what the code does, why it is needed and how to do it.

Thirdly, I started working on a new branch api_merge. Why? Short answer — to merge redundant classes.

Consider the following URLs.

GET /api/courses  
GET /api/courses/1  

Both of these calls retrieve data from the same table, AT_courses. However, I was doing the task through two different classes — CoursesList and CourseDetails. Why? Firstly, the old way meant these things happened inside helper functions, so I didn't really see what was going on: each call appeared to be a single-line call to a helper. Secondly, different URL structures meant different classes to me — until I saw the bigger picture. The subtle differences could be handled with just a few lines of code.
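
To make the idea concrete, here is a minimal sketch of a merged handler, assuming Toro passes the matched [course_id] as an argument to get() and assuming the one-array calling style of create_SQL_clause shown in the weeks 9-10 post (the class name is illustrative):

    class CoursesHandler {
        // One class now serves both /api/courses and /api/courses/[course_id]
        function get($course_id = 0) {
            if ($course_id) {
                $sql_array = array("c.course_id" => $course_id);
            } else {
                $sql_array = array(
                    "c.title" => $_GET["title"],
                    "c.cat_id" => $_GET["category_id"],
                    "c.primary_language" => $_GET["primary_language"]
                );
            }
            // Only the clause differs; the shared fetch logic follows
            $clause = create_SQL_clause($sql_array);
        }
    }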

So, I worked the rest of the week on merging such functions together. The eventual difference in code on the branch can be viewed here.

Just four more weeks left now. I am happy with the progress.

Week 7 - Some housekeeping

I started the week where I left off last week: at the admin functions. Although I had already provided most of the functionality, one last thing remained: clearing inactive tokens.

As the admin can set the period of time for which tokens are valid, the database tables could fill up with expired tokens. There is a cron job facility available in ATutor, which I will explore later. For now, I have given the admin an option to clear the inactive tokens.
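
The cleanup itself boils down to a single query. A hedged sketch, assuming ATutor's TABLE_PREFIX constant and the queryDB helper (which takes a format string and an array), with a made-up token table name:

    // Delete every token whose expiry date has already passed
    queryDB("DELETE FROM %sapi_tokens WHERE expiry_date < NOW()",
            array(TABLE_PREFIX));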

I mentioned two weeks ago that I had introduced a DEBUG mode which prints the SQL query being executed. This week, I made a few more adjustments to it, including displaying the token-related queries.

Last week, I implemented a logging level in which all requests except GET requests were logged. This week, I changed the implementation so that errors are still logged for GET requests when this logging level is active.

Using the already matured api_backbone, I started implementing a few more of the API calls. I had started with the GET calls for members (both instructors and students), so I went ahead and implemented the POST, PUT and DELETE ones. There were many calls to implement (for both students and instructors); the list can be found in the commit history. That concluded all member-related calls. From the original list, I am now left with calls for questions and tests only.

Lastly, to reuse the SQL queries, I have been using functions like get_courses_main() and get_members_main(). To see what the code looks like if we do not reuse the SQL queries and instead write new ones for every call, my mentor suggested I try that in a separate branch. You can find the alternate branch here. I will wait for his feedback before deciding which branch to go ahead with.

P.S. This week, I crossed 50 commits on my branch(es). That's about 1.3 commits a day - not a bad pace.

Week 6 - Creating admin functions

After making the api_backbone function more effective and efficient, this week saw some basic API calls, followed by work on admin functionalities. The week started with the mid-term evaluations, which I passed.

I started off with a few calls for member lists from the table AT_members. A few of the API calls that were implemented are as follows:

GET /api/instructors/[instructor_id]/courses/[course_id]/instructors  
GET /api/instructors/[instructor_id]/courses/[course_id]/students  

The first call returns the list of instructors for a particular course. The second one is for a list of students enrolled in a course. The access levels for these calls are admin and instructor, respectively.

Before we move on to the admin functionalities, let us discuss two more things that were implemented: the logging level and token expiry.

Until last week, all API calls were being logged. Now, we have defined three logging levels: no logging, logging everything except GET calls, and logging everything. These are self-explanatory, except the second, which still logs errors even when they occur on GET calls.
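
A minimal sketch of the decision logic, with illustrative constant names and a hypothetical should_log() helper (the real values live in the module):

    define("LOG_NONE", 0);
    define("LOG_EXCEPT_GET", 1);
    define("LOG_ALL", 2);

    function should_log($http_method, $is_error) {
        global $_config;
        $level = $_config['api_logging_level'];
        if ($level == LOG_NONE) return false;
        if ($level == LOG_ALL) return true;
        // Middle level: skip successful GETs, but always keep errors
        return ($http_method != 'GET') || $is_error;
    }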

Previously, the token expiry date was set at one day from the date of creation or modification. Now, the admin can set it to a certain number of days.

These settings are stored in the table AT_config as name-value pairs and can be accessed anywhere in ATutor through the array $_config[]. For instance, the settings that I am concerned with right now are $_config['api_logging_level'] and $_config['api_token_expiry'].

The admin functionality was built within the module that was created for this purpose. The first functions added were the ability to change the logging level and the token expiry.

Secondly, the admin can download the API log as a JSON file. I am currently thinking of adding the ability to download it as CSV as well. Lastly, the admin can clear the API log, which essentially empties the AT_api_logs table.
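
Serving the download is straightforward. A sketch, with get_api_log_rows() standing in as a hypothetical helper for whatever actually fetches the rows from AT_api_logs:

    $rows = get_api_log_rows();  // hypothetical fetch helper
    header('Content-Type: application/json');
    header('Content-Disposition: attachment; filename="api_log.json"');
    echo json_encode($rows);
    exit;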

This rounds up another productive week. Let's hope we have a great second half ahead!

Week 5 - Creating new API calls

I left last week's post by telling you about the creation of the functions api_backbone and create_SQL_clause. This week, I started by re-factoring those functions.

The first issue with api_backbone was that there were too many arguments to be passed. Readability was taking a hit: by simply looking at a function call, it was difficult to tell which parameter represented what.

The simple solution to this was passing arguments the way we do in JavaScript, as a single options array. Therefore, the code ended up looking something like the following.

function api_backbone($options) {  
    /*
     * Function to perform all API calls
     * Every call has a token, checks access level and performs a query
     * This function takes those as argument and logs the request
     */

    $defaults = array(
        "request_type" => HTTP_GET,
        "access_level" => ADMIN_ACCESS_LEVEL,
        "member_id" => -1
    );

    $options = array_merge($defaults, $options);

    ...

}

array_merge does exactly what $.extend() does in jQuery: values from $options replace those of common keys in $defaults, while the defaults are kept for keys not present in $options.

Some notable options in the function api_backbone are as follows (a usage sketch follows the list).

  • returned_id_name - Set it to true and mysql_insert_id is run after an INSERT call. Useful in POST calls to create objects.
  • query_id_existence and query_id_existence_array contain the query to be run to check if an object exists, before the main query (typically to edit or delete it) is run.
  • one_row is true when the result of the query is supposed to consist of a single row.
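
Here is what a call looks like in the options style. Everything below is a hedged sketch: HTTP_DELETE, the "query_array" key and the table name are assumed by analogy with the options above, not taken from the actual code:

    api_backbone(array(
        "request_type" => HTTP_DELETE,
        "access_level" => INSTRUCTOR_ACCESS_LEVEL,
        // Check that the course exists before the main DELETE runs
        "query_id_existence" => "SELECT course_id FROM %scourses WHERE course_id = %d",
        "query_id_existence_array" => array(TABLE_PREFIX, $course_id),
        "query" => "DELETE FROM %scourses WHERE course_id = %d",
        "query_array" => array(TABLE_PREFIX, $course_id)
    ));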

The next task was to introduce a debug mode in the API. I figured it was difficult to print the query using vsprintf every time there was an error in the SQL query. That is why I added a constant DEBUG in api/core/constants.php; when it is set to true, every call to api_backbone, irrespective of the situation, prints the query being executed.
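
The mechanism is as simple as it sounds; a sketch of the idea inside api_backbone:

    if (DEBUG) {
        // Print the fully substituted query before it runs
        echo vsprintf($query, $array);
    }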

Another small but significant change in terms of efficiency was switching my SQL queries to explicit JOINs. I had been using just the alias syntax, but a discussion on the SitePoint forums suggested that JOIN is more efficient. Since I am not an expert on the topic, I just took the advice.
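
For illustration, the same hypothetical query in both styles (the rows returned are identical; the JOIN version just states the join condition up front instead of burying it in WHERE):

    $alias_syntax = "SELECT c.title, e.member_id
                       FROM %scourses c, %scourse_enrollment e
                      WHERE c.course_id = e.course_id AND e.member_id = %d";

    $join_syntax = "SELECT c.title, e.member_id
                      FROM %scourses c
                INNER JOIN %scourse_enrollment e ON e.course_id = c.course_id
                     WHERE e.member_id = %d";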

The next change was to add a boilerplate class: one can simply copy its directory and start off a new app (after reading the inline comments, of course).

Lastly, I tried creating a few API calls using the functions that I had made. It turns out the task is pretty easy now. PUT and POST for courses and a few GET calls for member lists are what I accomplished in almost no time!

Another great week (this time, I didn't miss a post... yet) and here's hoping the next week is even better!

P.S. Mid-term evaluations are just a few days away. I double-checked my proposal to ensure that I am well ahead of schedule.

Week 4 - Making efficient code (aka "the daddy function")

Let me start by telling you a story that my mentor once told me. There are four stages in the work cycle of a developer.

  • Stage 1 - Make it work
    It's about getting something to work. Don't think much about other factors; just get it to work.
  • Stage 2 - Make it efficient
    It's about solving a problem in a productive way: make the fewest SQL queries, use less space, do not write unnecessary code.
  • Stage 3 - Make it elegant
    Write your code in a scalable and extendable way. Make your code re-usable, make use of existing paradigms and write as little code as possible. Think of it this way: if you copy-paste a few lines from one function or class to another, you are doing it wrong.
  • Stage 4 - Do not code at all!
    You might be surprised, but sometimes it's good not to code at all. Before you start, you should check the feasibility of what you are about to make. Does it have a use case? Is it worth the effort you are going to put into it?

Although I have faced all four stages during this GSoC period, this week primarily consisted of stage 3.

Alex was quick to notice that all of my existing calls (those get and post functions in the router classes) essentially did the same thing: generate a log, check the token, check the access level, make a query and print a response. Although I had functions for each of these tasks (with ATutor's queryDB for the SQL queries), the calls all looked the same!

That is why this week involved the creation of a "daddy" function - a backbone function which would run (almost) all API calls and be re-used for almost everything else. However, let us first look at a function that creates SQL clauses.

function create_SQL_clause($terms, $requests, $prefix = "") {  
    /*
     * Function to create SQL clauses
     * $terms is an associative array
     * The keys of $terms are the parameter names to look up in $requests
     * The values of $terms are the column names they map to
     * For example, create_SQL_clause(array(
     *                  "title" => "c.title",
     *                  "language" => "c.language"), $_GET, "WHERE ") should return
     * "WHERE c.title = 'My Course' AND c.language = 'en'"
     * provided title and language are present in $_GET
     */
    $query = $prefix;
    foreach ($terms as $key => $value) {
        if (isset($requests[$key]) && $requests[$key] !== "") {
            // Prepend AND only once a first condition has been added
            if ($query != $prefix) {
                $query = $query."AND ";
            }
            $query = $query.$value." = '".$requests[$key]."' ";
        }
    }
    return $query;
}

This function essentially generates the SQL clauses (either WHERE in SELECT queries or SET in UPDATE queries) for use in various functions. I believe the comments are enough to explain how it works.

This brings us to our backbone function. I rightfully call it api_backbone. Let us have a look at the skeleton.

function api_backbone($request_type, $token, $access_level, $query, $array = array(), $one_row = false, $callback_func = "mysql_affected_rows") {  
    /*
     * Function to perform all API calls
     * Every call has a token, checks access level and performs a query
     * This function takes those as argument and logs the request
     */
     ...
}

The $request_type stands for the HTTP request method (using some pre-defined constants). The $token and $access_level have been discussed before.

queryDB uses vsprintf() to generate the SQL query, which is why it needs a format string and an array (which it feeds to vsprintf). These are passed in as $query and $array respectively. $one_row and $callback_func are also parameters passed on to queryDB: the former indicates whether the result should consist of a single row, and the latter names the PHP/MySQL function to run on the query before the result is presented.
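
Putting it together, a hypothetical GET /api/courses/[course_id] call through the backbone might look like this (TABLE_PREFIX and the table layout are assumptions for the example):

    api_backbone(HTTP_GET, $token, TOKEN_ACCESS_LEVEL,
        "SELECT * FROM %scourses WHERE course_id = %d",
        array(TABLE_PREFIX, $course_id),
        true);  // one_row: a single course is expected back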

The function is a bit long considering the things that I have to take care of, and you can have a look at the code here.

Week 3 - Developing Course APIs

I left last week with the idea of implementing the next two calls.

GET /api/students/[student_id]/courses  
GET /api/instructors/[instructor_id]/courses  

The former retrieves the list of courses that a student is enrolled in, and the latter gets the list of courses an instructor teaches.

The emphasis in these two calls was on re-using the SQL queries across all calls that return a list of courses. This was best accomplished by adding a WHERE clause of the following form for each optional URL parameter.

WHERE ... AND ('%s' = 'garbage_value' OR title LIKE '%%%s%%') AND ...;

If a certain variable (like title) isn't present in the URL parameters, it is assigned a garbage value. The first part of the clause is then true, so the condition doesn't affect the query at all. If the variable is present, the first part is false and the second part determines whether the clause holds overall.
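
A quick illustration of the trick with vsprintf (the format string is the same in both cases; only the substituted values differ):

    $format = "('%s' = 'garbage_value' OR title LIKE '%%%s%%')";

    // title missing from the URL: both slots get the garbage value,
    // so the clause is always true and filters nothing
    echo vsprintf($format, array('garbage_value', 'garbage_value'));
    // ('garbage_value' = 'garbage_value' OR title LIKE '%garbage_value%')

    // title present: the first comparison is false, so LIKE decides
    echo vsprintf($format, array('My Course', 'My Course'));
    // ('My Course' = 'garbage_value' OR title LIKE '%My Course%')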

After all that, we are having second thoughts about the complexity of the query. I will try an alternative version where I develop the WHERE clause using PHP and check which one is faster.

Additional API calls were implemented for students and instructors as follows. I believe they are self-explanatory.

GET /api/students/[student_id]/courses/[course_id]  
GET /api/students/[student_id]/courses?title=[title]&category_id=[category_id]&primary_language=[primary_language]

GET /api/instructors/[instructor_id]/courses/[course_id]  
GET /api/instructors/[instructor_id]/courses?title=[title]&category_id=[category_id]&primary_language=[primary_language]

Another part that I completed this week was developing the APIs for course categories (except PUT, pending some discussions with my mentor). (Note that these course categories are different from question categories, for which I will need to create an API later in the summer.)

This week also saw some changes in the structure of the code. Some of the logic that I had kept in core/api_functions.php was moved to a shared directory. Core now contains functions related to the core functioning of the API (like logging and token management).

I also changed the function print_error to print_message. Its first argument now specifies whether it is an error or a success message being printed. Changing the function also involved adding logging for error messages. (Remember, as of last week, only successes were being logged?)

Week 2 - Making the first API calls

After successfully creating the login and logout API calls, the next step was to create some basic API calls. The very first area that we decided to implement was courses.

The following API calls were implemented.

GET /api/courses/  
GET /api/courses/[course_id]  
GET /api/courses?title=[title]&category_id=[category_id]&primary_language=[primary_language]  

In my previous post, I mentioned that I hadn't implemented a way to decide the access level of a member. This week, I added the feature by making queries to two different tables: one for members and the other for admins. With that accomplished, I could proceed with two course-related calls for instructors and students.

GET /api/students/[student_id]/courses  
GET /api/instructors/[instructor_id]/courses  

The first returns the list of courses that a student is enrolled in, and the second returns the list of courses that an instructor teaches.

I had created a function last week to authenticate an access token. Because of the two APIs above, I needed to cross-check that the access token matched the student or instructor ID provided in the URL. That would have meant an extra query. To avoid it, I added an extra argument to the authentication function that makes it return the member_id along with the token. Here's how it looks.

function (..., $return_member_id = false) {  
    ...
    return array($token, $member_id);
}

How do I get the value?

list($token, $member_id) = get_access_token(..., true);  

Pretty Pythonic, isn't it?

Another important task accomplished in the week is the logging of all API calls. The request URI, token, HTTP method, IP address and the response are logged in the database.

One last thing to do is to create logs in case of errors.

Week 1 - The coding starts

After a lot of discussion on which API calls we should have, we decided to go with four basic sets of API calls: courses for course-related information, instructors for calls that instructors would use (to see the list of students, for instance), students for calls that would be used by students, and tests for test-related calls.

The next task was to decide the access levels. We decided to have five.

    define("ADMIN_ACCESS_LEVEL", 1);
    define("INSTRUCTOR_ACCESS_LEVEL", 2);
    define("STUDENT_ACCESS_LEVEL", 3);
    define("TOKEN_ACCESS_LEVEL", 4);
    define("PUBLIC_ACCESS_LEVEL", 5);

The TOKEN_ACCESS_LEVEL gives access to anyone with a valid access token, which is passed as the header x-AT-API-TOKEN. Calls that do not require a token have PUBLIC_ACCESS_LEVEL.
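
On the PHP side, picking the token up is a one-liner, since PHP exposes request headers as HTTP_* entries in $_SERVER (a sketch; the actual module may read it differently):

    $token = isset($_SERVER['HTTP_X_AT_API_TOKEN'])
           ? $_SERVER['HTTP_X_AT_API_TOKEN'] : null;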

The next step was to start with a few basic calls. I had already worked on a dummy class with Toro to demonstrate the handling of different kinds of variables. For obvious reasons, the two that I had to start with were /login/ and /logout/.

The existing code that handles login in ATutor (/include/login_functions.inc.php) wasn't really modular and couldn't be reused. Therefore, I had to check how it worked and emulate the same behaviour.

I came up with a rudimentary version of the login function, adding checks for the status of the account. I am yet to add checks for the number of login attempts, though.

On successful login, you are provided the API token, which you must use in every subsequent API call.

The token is generated by hashing a combination of the member_id, a timestamp and a random number. It is then stored in a table along with an expiry date, which is 24 hours from the time of generation or last modification.
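
A minimal sketch of that scheme (sha1 and mt_rand are illustrative stand-ins, not necessarily what the module actually uses):

    $token = sha1($member_id . time() . mt_rand());
    $expiry = time() + 24 * 60 * 60;  // 24 hours from generation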

The logout function is also fairly simple. It removes the entry for the token in the database and returns a success message.

You can check the latest code here.

Discussions on the ATutor API - Community Bonding Period

It's been four weeks since the GSoC 2014 results were declared. I was selected by the Inclusive Design Institute to develop a public API for their project, ATutor. I had worked with them in last year's GSoC too, and my mentor from last year, Alexey Novak, was going to mentor me yet again.

Four weeks of community bonding is a long time, and I utilized it to discuss future strategies with my mentor. We had quite a few things to decide, because we wanted a good API at the end of the summer. For reference, we took the examples of the GitHub and Amara APIs.

Creating a module

We decided that although we would separate the API code from the rest of the ATutor code, we wanted to let the admin choose whether to enable the API. The best way to do that within ATutor was to create a module (BTW, I really dislike reading documentation). If the module is not activated and someone tries to access the API, we just show them a message that the feature is disabled.

Choosing a web router class

ATutor is written in core PHP. True, there are a lot of functions within ATutor that do most of the heavy lifting, but it still remains core PHP. Up until now, there was no need for a routing class. However, an API needs a router (unless you plan to create separate directories and pages for each function).

We narrowed down a few options, but finally decided to go with Toro. Although people call it a 'micro framework', the source consists of a single file of 120-odd lines. It was just perfect to add to ATutor. Toro is also designed specifically for creating RESTful applications. To top it all, the 'Hello World' example is simple, yet elegant.

<?php

class MainHandler {  
    function get() {
        echo "Hello, world";
    }
}

Toro::serve(array(  
    "/" => "MainHandler",
));

An important thing to note is that Toro does the routing, but the structure of the app is largely up to the developer. Having worked with Django so much, I decided to go with that flow: I split the core API into individual apps, each with its own urls.php and router_classes.php, which contain the routes and the handlers, respectively. Have a look at it here.
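
A hypothetical sketch of what one such app pair looks like, using Toro's pattern tokens for the route table (file contents are illustrative):

    // urls.php: the routes for the courses app
    $urls = array(
        "/api/courses" => "CoursesHandler",
        "/api/courses/:number" => "CoursesHandler",
    );

    // router_classes.php: the matching handler
    class CoursesHandler {
        function get($course_id = 0) {
            // shared retrieval logic goes here
        }
    }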

Making the list

Lastly, there remained one uphill task before I could start coding (yeah, I hadn't started yet!): creating a list of the possible API calls I would implement, with details of what parameters would be passed with each request and what would be returned. The user access levels were to be decided later, once we moved on to implementing user authentication. I have come up with a preliminary list of GET, POST, PUT and DELETE calls, and I can (finally) start coding once it is verified by my mentor.

The official coding begins in two more days, and it's going to be real fun, much like last year. Looking forward to crafting some mean-looking code.
