Spring Boot deployed on Heroku – On Cloud 9 ! – REST Part 2

This article is a continuation of the REST Part 1 article, where I developed a small Spring REST application using Spring Boot and searched Twitter for trends at a given location, using the Twitter4J API.

Please go through the application here : Spring Boot REST

In this article, I would deploy the same application to the cloud. I have chosen Heroku as my cloud of choice.

I would follow the steps given below :

  1. Create a free Heroku account
  2. Download and install Heroku Toolbelt for Windows
  3. Prepare and push  the Spring Boot App for Heroku
  4. Test the application

Step – 1

———-

This is simple. Just go to https://www.heroku.com and sign up for a free account. A free account is pretty basic, but sufficient for this article. Heroku provides various pricing options like Free, Hobby, Standard and Performance:

heroku_pricing

You would notice that the compute power is specified in something called a “DYNO”. A Heroku dyno is a lightweight Linux container that runs a single user-specified command. E.g., for my free account, I get 1 dyno with the following limitations: 512 MB RAM, 1 web dyno, 1 worker dyno, the container sleeps after 30 minutes of inactivity, and it must sleep for 6 hours within a 24-hour cycle.

A web dyno is a process type that is defined in a file called “Procfile”. For our application, we will shortly build a Procfile. Only web dynos receive HTTP traffic from Heroku’s routers.

A worker dyno can be of any process type declared in the Procfile, other than “web”. Worker dynos are typically used for background jobs, queueing systems, and timed jobs. You can have multiple kinds of worker dynos in an application.

Please note that the beauty of Heroku is that it provides what is known as a PaaS (Platform-as-a-Service). So, we end up having a Java HTTP container like Tomcat/Jetty which is capable of taking part in HTTP request-response cycles, and even databases like PostgreSQL as a backend. All we need to do is build the application, push the codebase to Heroku via Git (or Maven), and we end up having our application in the cloud.

The other option would have been AWS EC2 (Amazon Web Services – Elastic Compute Cloud). Unlike a PaaS, EC2 provides what is known as IaaS (Infrastructure-as-a-Service).

Simply put, the difference is that EC2 provides the components necessary to build the environment of our choice, while Heroku (PaaS) has the environment ready-made – but at the cost of power and flexibility. For an IaaS like EC2, we would need to set up a load balancer and an HTTP front end like nginx, then configure and deploy databases, and so on. It is worth mentioning that AWS does have a PaaS offering called Elastic Beanstalk, but we will keep that debate out of our heads for a while and come back to this article!

To summarize, PaaS sits, in a sense, in the middle of the cloud stack, between IaaS and SaaS (Software-as-a-Service). With IaaS, we get robust provisioning, storage options, traffic routers and load balancers. With PaaS, we depend on the provider for such configuration and get a robust container with multi-language support (Java, Python, Perl, Scala, etc.) and databases (PostgreSQL, etc.). With SaaS (like Salesforce), we only need to worry about the number of users, since the entire application is already pre-built in the cloud.

To be honest, the debate over whether one should select an IaaS (EC2) or a PaaS (Heroku) is ever raging, and I request you to take Google’s help in finding out more about the differences and enriching your knowledge.

So, we end Step 1 with a free Heroku account created. Let’s move on.

Step -2

———

Install the Heroku Toolbelt for Windows 64-bit, which provides the Heroku Command Line Interface (CLI).

The link is : Heroku Toolbelt For Windows

I just made sure that I did not install it in the default “C:\Program Files …”; instead, I installed it in a folder whose name contains no spaces. The installer sets up the Heroku CLI, Ruby and Git.

For this article, I would push the same Spring Boot Twitter application that was built in Part 1.

I cloned the source from the GitHub repo:

https://github.com/diptimanr/twitter-springboot

Then I imported this project into my Eclipse workspace and changed directory to the root of the project.

Then I typed “heroku login” and the Heroku CLI fired up and asked for my Heroku account credentials, as below.

heroku_cli

Step – 3

———-

Before the Spring Boot Twitter application can be deployed on Heroku, a “Procfile” needs to be created. The Procfile tells Heroku what kind of process the deployed application is – a web application, a background Java process, and so on – and how to start it.

So, in a text editor just enter the following line and save the file as “Procfile” at the root directory of the project.

web: java -Dserver.port=$PORT -jar  target/twitter-boot-0.0.1-SNAPSHOT.jar

Notice how we take advantage of the Spring Boot fat JAR created as a result of “mvn clean install”. This is one reason Spring Boot has become an ideal fit for deploying to the cloud.
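As a small aside, if you ever need to pin the JDK version Heroku uses, the Java buildpack also reads an optional system.properties file at the project root (this file is not part of the Part 1 project; adding it is optional):

java.runtime.version=1.8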

In order to deploy to Heroku, we need to create a new Heroku app, by running the command :

“heroku create” in the root project directory.

heroku_create

Now, we can deploy the code by issuing a “git push heroku master” command. There will be a flurry of activity in the command prompt, but let’s just focus on some of the important steps of what’s really happening under the hood.

The “heroku create” command essentially creates a new app on Heroku and adds a corresponding remote to the local Git repo. A random name is generated (in my case it is: https://ancient-cove-46720.herokuapp.com/). Can’t believe that’s the name of my app!! No worries, we will change it shortly with the “heroku apps:rename” command!!
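For reference, the rename is a one-liner; the new name below is just a placeholder:

heroku apps:rename twitter-trend-demo --app ancient-cove-46720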

So, with “git push heroku master” the code gets deployed to Heroku; Heroku detects that it is a Java application from the Maven pom.xml file and then starts downloading the dependencies into the remote deployment package:

D:\Diptiman\ECLIPSE_TEST_WORKSPACE\twitterspringboot>git push heroku master  
 Counting objects: 36, done.  
 Delta compression using up to 4 threads.  
 Compressing objects: 100% (24/24), done.  
 Writing objects: 100% (36/36), 5.69 KiB | 0 bytes/s, done.  
 Total 36 (delta 1), reused 0 (delta 0)  
 remote: Compressing source files... done.  
 remote: Building source:  
 remote:  
 remote: -----> Java app detected  
 remote: -----> Installing OpenJDK 1.8... done  
 remote: -----> Installing Maven 3.3.9... done  
 remote: -----> Executing: mvn -B -DskipTests clean dependency:list install  
 remote:    [INFO] Scanning for projects...  
 remote:    [INFO] Downloading: https://repo.maven.apache.org/maven2/org/springframework/boot/spring-boot-starter-parent/1.3.1  
 remote:    [INFO] Downloaded: https://repo.maven.apache.org/maven2/org/springframework/boot/spring-boot-starter-parent/1.3.1.  
 remote:    [INFO] Downloading: https://repo.maven.apache.org/maven2/org/springframework/boot/spring-boot-dependencies/1.3.1.R  
 remote:    [INFO] Downloaded: https://repo.maven.apache.org/maven2/org/springframework/boot/spring-boot-dependencies/1.3.1.RE  
 remote:    [INFO] Downloading: https://repo.maven.apache.org/maven2/org/springframework/spring-framework-bom/4.2.4.RELEASE/sp  
 remote:    [INFO] Downloaded: https://repo.maven.apache.org/maven2/org/springframework/spring-framework-bom/4.2.4.RELEASE/spr 

…..

Once the build is complete, Heroku checks the Procfile and runs the JAR file, as below:

remote:    [INFO] ------------------------------------------------------------------------  
 remote:    [INFO] BUILD SUCCESS  
 remote:    [INFO] ------------------------------------------------------------------------  
 remote:    [INFO] Total time: 21.377 s  
 remote:    [INFO] Finished at: 2016-01-27T21:49:34+00:00  
 remote:    [INFO] Final Memory: 34M/264M  
 remote:    [INFO] ------------------------------------------------------------------------  
 remote: -----> Discovering process types  
 remote:    Procfile declares types -> (none)  
 remote:  
 remote: -----> Compressing... done, 61.2MB  
 remote: -----> Launching...  
 remote:    Released v4  
 remote:    https://ancient-cove-46720.herokuapp.com/ deployed to Heroku  
 remote:  
 remote: Verifying deploy.... done.  
 To https://git.heroku.com/ancient-cove-46720.git  
  * [new branch]   master -> master  
 D:\Diptiman\ECLIPSE_TEST_WORKSPACE\twitterspringboot> 

Step – 4

———-

Now the Twitter app is deployed on the Heroku cloud. To test it, let’s hit the following URL:

https://ancient-cove-46720.herokuapp.com/twitterspringboot/location/tokyo

twitter_trend_tokyo

OR

https://ancient-cove-46720.herokuapp.com/twitterspringboot/location/paris

twitter_trend_paris

You should be able to see the top Twitter trends for locations such as “paris”, “tokyo”, etc.
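You can also test from the command line instead of a browser; for example, assuming curl is available:

curl https://ancient-cove-46720.herokuapp.com/twitterspringboot/location/tokyo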

This concludes Part 2 of the Spring Boot REST series, now deployed in the cloud.

In Part 3, the last part, I would develop an Android app to consume the JSON returned by this RESTful Twitter trend search for locations.

So long …

 


Container less REST – The beauty of Spring Boot – REST Part 1

Through this article, I would develop a Spring Boot application that exposes REST services to search Twitter trends. Containerless (or fat-JAR) application development has gained a lot of momentum lately, mostly due to the ease of deploying these applications in the cloud (on Heroku or AWS, for example).

In my opinion, the term “containerless” is a bit far-fetched. It’s just that the container and all the dependencies of the application are built into a single monolithic JAR file for easier deployment. If the use case permits, even databases like H2 or Derby are embedded within the JAR file.

RESTful services packaged in a single JAR could also be achieved with Dropwizard, but this article focuses on Spring Boot as the product of choice.

Please keep in mind that such containerless applications should not be chosen if the use cases are complex enough to require layered applications distributed across multiple nodes. Containerless applications are best suited to being prototyped quickly and deployed with ease.

In this article, I would develop a RESTful service which finds the top Twitter trends for a given city/location using the Twitter4J API. In subsequent articles, I would deploy it to the cloud and consume the service from an Android application.

For usage of Twitter4J API, please go through my article on twitter4J API HERE.

I would use Eclipse 4.5.1 (Mars) for this article. I have already installed the latest Spring Tool Suite (STS) through Help -> Install New Software (Image below)

Spring_Tool_Suite_Eclipse

The article would follow the steps below :

Step 1 – Create a Spring Starter(Spring Boot) Project.

Step 2 – Manage dependencies through Maven (pom.xml)

Step 3 – Create a REST controller with Spring Boot

Step 4 – Create the Twitter Service and add Twitter dependencies

Step 5 – Run Spring Boot Application

Step 6 – Test using Spring MVC Junit

 

Step – 1

————

Once your Eclipse has all the Spring Tool Suite plugins installed, let’s create a Spring Starter project and name it “springboottwitter” using the Eclipse Spring Boot wizard.

spring_starter_project

 

spring_starter_project_naming

We would select “web” from the Spring Boot configuration

spring_starter_web

Click “Next”, accept the defaults, and the Spring Boot project should be created, with Eclipse downloading the artifacts from the Maven repository. You might see exceptions if M2_REPO is not set up properly in Eclipse. To resolve it, create the M2_REPO classpath variable in Eclipse and repeat the steps.

Step – 2

————

Source code of the project is kept at my GitHub repo : https://github.com/diptimanr/twitter-springboot

The generated pom.xml would have a <parent> element as below :

      <parent>  
           <groupId>org.springframework.boot</groupId>  
           <artifactId>spring-boot-starter-parent</artifactId>  
           <version>1.3.1.RELEASE</version>  
           <relativePath/> <!-- lookup parent from repository -->  
      </parent> 

and a dependency for “spring-boot-starter-web” :

   <dependency>  
          <groupId>org.springframework.boot</groupId>  
          <artifactId>spring-boot-starter-web</artifactId>  
    </dependency>

This dependency bootstraps the HTTP container within the project. By default, Spring Boot packages Tomcat as the HTTP engine, but we could use the following dependency exclusion in pom.xml to include Jetty as the HTTP engine of choice:

 <dependency>  
   <groupId>org.springframework.boot</groupId>  
   <artifactId>spring-boot-starter-web</artifactId>  
   <exclusions>  
     <exclusion>  
       <groupId>org.springframework.boot</groupId>  
       <artifactId>spring-boot-starter-tomcat</artifactId>  
     </exclusion>  
   </exclusions>  
 </dependency>  
 <dependency>  
   <groupId>org.springframework.boot</groupId>  
   <artifactId>spring-boot-starter-jetty</artifactId>  
 </dependency>  

Now, let’s add the Twitter4J dependencies, since the application queries the Twitter API through Twitter4J:

         <dependency>  
          <groupId>org.twitter4j</groupId>  
          <artifactId>twitter4j-stream</artifactId>  
          <version>4.0.4</version>  
        </dependency>  
         <dependency>  
          <groupId>org.twitter4j</groupId>  
          <artifactId>twitter4j-async</artifactId>  
          <version>4.0.4</version>  
        </dependency>  
        <dependency>  
          <groupId>org.twitter4j</groupId>  
          <artifactId>twitter4j-media-support</artifactId>  
          <version>4.0.4</version>  
        </dependency>  
        <dependency>  
         <groupId>com.twitter</groupId>  
         <artifactId>twitter-text</artifactId>  
         <version>1.6.1</version> <!-- or whatever the latest version is -->  
        </dependency> 

Now the dependency setup is complete. Of course, we will also need a test dependency for the Spring MVC test in Step 6, but otherwise this is all the application itself requires.
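For reference, the JUnit test in Step 6 relies on the Spring Boot test starter (which brings in spring-test, JUnit, Hamcrest and Mockito); if you want to run that test, the dependency would look something like this:

   <dependency>
     <groupId>org.springframework.boot</groupId>
     <artifactId>spring-boot-starter-test</artifactId>
     <scope>test</scope>
   </dependency>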

Step -3

———

To run as a standalone program, a Spring Boot application needs an Application class which provides the main entry point and bootstraps the necessary infrastructure. By default, Spring Tool Suite creates an application class called “TwitterBootApplication”:

package com.diptiman;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class TwitterBootApplication {

    public static void main(String[] args) {
        SpringApplication.run(TwitterBootApplication.class, args);
    }
}

The @SpringBootApplication annotation is equivalent to using @Configuration, @EnableAutoConfiguration and @ComponentScan with their default attributes.
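In other words, the class above could equally be written with the three annotations spelled out; a minimal, functionally equivalent sketch for the default attributes:

package com.diptiman;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;

// Equivalent to @SpringBootApplication with its default attributes
@Configuration
@EnableAutoConfiguration
@ComponentScan
public class TwitterBootApplication {

    public static void main(String[] args) {
        SpringApplication.run(TwitterBootApplication.class, args);
    }
}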

Now, let’s focus on creating a Spring MVC controller with the @RestController annotation, which ensures that the controller can intercept HTTP requests and respond accordingly.

So, we create the following REST controller, which accepts a path variable “location” on HTTP GET requests. Once the location is provided in the URL path, the controller calls the getTrendsForLocation() method of the TwitterService.

package com.diptiman.controller;

import java.util.List;

import javax.inject.Inject;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

import com.diptiman.exception.ResourceNotFoundException;
import com.diptiman.service.TwitterService;

@RestController
@RequestMapping("/twitterspringboot")   // base path matches the URLs used later in the article
public class TweetTrendController {

    @Inject
    private TwitterService twitterService;

    @RequestMapping(value = "/location/{location}", method = RequestMethod.GET)
    public ResponseEntity<?> getTrends(@PathVariable String location) {
        if (null == location || "".equals(location)) {
            throw new ResourceNotFoundException("Location not provided");
        }
        // Query the service once and reuse the result, so Twitter is hit only once per request
        List<String> trends = twitterService.getTrendsForLocation(location);
        if (trends.isEmpty()) {
            return new ResponseEntity<>(HttpStatus.NOT_FOUND);
        }
        return new ResponseEntity<>(trends, HttpStatus.OK);
    }
}

A simple Spring MVC routine. The code for this application is uploaded to the GitHub repo at:

 

https://github.com/diptimanr/twitter-springboot
Step – 4
———–
The TwitterService class takes the location from the REST controller, queries the Twitter4J API and returns the list of Twitter trends for the supplied location, using the Yahoo WOE (Where-On-Earth) id. Details of how this works have already been covered in my article HERE. Twitter credentials are kept in the application.properties file and are read from there using “${property-key}” expressions with the @Value annotation.
 

package com.diptiman.service;

import java.util.ArrayList;
import java.util.List;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import twitter4j.Location;
import twitter4j.ResponseList;
import twitter4j.Trends;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.conf.ConfigurationBuilder;

@Component
public class TwitterService {

    @Value("${twitter_consumer_key}")
    private String twitter_consumer_key;

    @Value("${twitter_consumer_key_secret}")
    private String twitter_consumer_key_secret;

    @Value("${twitter_access_token}")
    private String access_token;

    @Value("${twitter_access_token_secret}")
    private String access_token_secret;

    public List<String> getTrendsForLocation(String location) {
        List<String> twitterTrends = new ArrayList<String>();
        Twitter twitter = getTwitterInstance();
        try {
            // Resolve the location name to a Yahoo WOE id
            Integer idTrendLocation = getTrendLocation(location);
            if (idTrendLocation == null || idTrendLocation == 0) {
                System.out.println("Trend Location Not Found");
                throw new TwitterException("Trend Location Not Found");
            }
            // Fetch the trends for the WOE id and collect the trend names
            Trends trends = twitter.getPlaceTrends(idTrendLocation);
            for (int i = 0; i < trends.getTrends().length; i++) {
                twitterTrends.add(trends.getTrends()[i].getName());
            }
        } catch (TwitterException te) {
            twitterTrends.add(te.getMessage());
        }
        return twitterTrends;
    }

    private Twitter getTwitterInstance() {
        // Build an authenticated Twitter instance from the injected credentials
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
                .setOAuthConsumerKey(twitter_consumer_key)
                .setOAuthConsumerSecret(twitter_consumer_key_secret)
                .setOAuthAccessToken(access_token)
                .setOAuthAccessTokenSecret(access_token_secret);
        TwitterFactory tf = new TwitterFactory(cb.build());
        return tf.getInstance();
    }

    private Integer getTrendLocation(String locationName) {
        int idTrendLocation = 0;
        try {
            Twitter twitter = getTwitterInstance();
            ResponseList<Location> locations = twitter.getAvailableTrends();
            for (Location location : locations) {
                if (location.getName().equalsIgnoreCase(locationName)) {
                    idTrendLocation = location.getWoeid();
                    break;
                }
            }
            return idTrendLocation > 0 ? Integer.valueOf(idTrendLocation) : 0;
        } catch (TwitterException te) {
            System.out.println("Failed to get trends : " + te.getMessage());
            return 0;
        }
    }
}

application.properties ->

twitter_consumer_key = <YOUR_KEY>
twitter_consumer_key_secret = <YOUR_KEY_SECRET>
twitter_access_token = <YOUR_ACCESS_TOKEN>
twitter_access_token_secret = <YOUR_ACCESS_TOKEN_SECRET>

This completes Step 4. Now, the application is ready to go!

Step 5

----------

First, let’s do a clean build by issuing : mvn clean install in the root folder.

This would run a build and would also create a JAR file at the “target” folder.

For this application, the jar file is : twitter-boot-0.0.1-SNAPSHOT.jar

Our entire application is now contained within this JAR file: embedded Tomcat, all dependencies, everything! Since it contains all the dependencies, the file is not small either.
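The fat JAR is produced by the Spring Boot Maven plugin, which the generated pom.xml should already declare; for reference, the snippet typically looks like this:

  <build>
    <plugins>
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
      </plugin>
    </plugins>
  </build>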

To run the application on any OS: java -jar twitter-boot-0.0.1-SNAPSHOT.jar – Spring Boot fires up the embedded HTTP engine and wires up the dependencies.

It can now be run on any operating system with Java installed. No other configuration is required! That’s the beauty of Spring Boot! Instead of a WAR file with a myriad of configuration files – just one JAR, that’s it!

It can also be run from within the project using the “mvn spring-boot:run” command.

So, if “london” is added as a location like :

 

http://localhost:8080/twitterspringboot/location/london

 

Twitter returns with the following JSON :

 

["#NationalHugDay","#Litvinenko","#CharacterMatters2016","#Draghi","#FishHipHop","Alex Teixeira","Victor Valdes","Papy Djilobodji","Kevin Nolan","Pet Shop Boys","Deane Smalley","Primavera","Gary Rowett","Au Quart De Tour","Jordan Rhodes","Alan Pardew","Arnold Schwarzenegger","Slaven Bilic","The War of the Worlds","Daniel Kitson","Arnie","#EdStone","#thismorning","#FMQs","#MAtech2016","#HewBeauty","#SquirrelAppreciationDay","#coasthour","#OnlyCallMyParentsIf","#ChildhoodObesity","#FolkloreThursday","#RIPBelieberLinda","#BBLSemis","#Savile","#swfuture16","#TryYourLuck","#SUAHour","#AsktheExpert","#ForceFed","#GlamourChloe","#LTC2016","#SyriaCrisis","#CarePioneers","#empowerwomen","#PCIPublicSquare","#Ask5thWaveMovie","#edinbudget","#ThirstyThursday","#1PUN","#EngagORS"]

We now have a containerless Spring REST application which can find the top Twitter trends for a given location!

 

Step 6

-------

Let us finish off the article with a small Spring MVC JUnit test case, which uses MockMvc to check that a valid HTTP response is returned when we hit the URL with a GET request:

Code, is kept at my GitHub repo :

https://github.com/diptimanr/twitter-springboot

 

package com.diptiman;

import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;
import static org.springframework.test.web.servlet.setup.MockMvcBuilders.webAppContextSetup;
import javax.inject.Inject;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.boot.test.SpringApplicationConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.springframework.test.context.web.WebAppConfiguration;
import org.springframework.test.web.servlet.MockMvc;
import org.springframework.web.context.WebApplicationContext;

@RunWith(SpringJUnit4ClassRunner.class)
@SpringApplicationConfiguration(classes = TwitterBootApplication.class)
@WebAppConfiguration
public class TwitterTrendTest {
    @Inject
    private WebApplicationContext webApplicationContext;
    private MockMvc mockMvc;

    @Before
    public void setup() {
        mockMvc = webAppContextSetup(webApplicationContext).build();
    }

    @Test
    public void testGetTrendsForLondon() throws Exception {
        mockMvc.perform(get("/twitterspringboot/location/london"))
               .andExpect(status().isOk());
    }
}

And when you run it in Eclipse, we get our beloved Green Bar ! !

twitter_test_junit

This is the end of Part 1. In subsequent articles, I would deploy this application to the cloud and then try to build an Android app to consume the JSON returned from Twitter …

 

So long ….

 

Cucumber + Selenium – BDD(Behaviour Driven Development) and Agile SCRUM

The other day I was talking to a dear friend of mine, who is busy implementing BDD (Behaviour Driven Development) for the user stories in Agile SCRUM sprints. He was using Cucumber (Java) to create feature files from the user stories. These feature files in turn formed the bedrock of the acceptance criteria used during sign-off.

It is great to see that BDD (Behaviour Driven Development) has picked up pace and is actively used in Agile SCRUM sprints.

The process is simple, but it requires participation from the business product owner and the SCRUM master to decide on the feature files to be created. The input feed for the sprint remains the same: a set of refined sprint backlog items created by prioritizing the product backlog items. Then the SCRUM master, backlog owner, product owner, business analysts and functional testers discuss and brainstorm the acceptance criteria for the backlog items. For each user story provided by the business, feature files are created. It is easy to map a user story to a feature one-to-one, but in reality it does not always work out that way; more often, a user story gives rise to multiple BDD feature files.

What can also be added to the mix is Selenium, which brings UI integration testing capabilities to a feature.

So, in this post, I would demonstrate the conversion of a very simple user story into a Cucumber feature file, create a JUnit runner for the BDD feature, and then add Selenium WebDriver browser capability to UI-test the whole feature.

To quickly refresh the basic processes of an AGILE Scrum :

  1. An Agile Scrum sprint (monthly rhythm) generally lasts 4 weeks, with 2 weeks for development, 1 week for QA and 1 week for UAT
  2. Product Owners create a product backlog list
  3. SCRUM masters and product owners create a sprint backlog from the prioritized items of the product backlog list
  4. TDD is followed from day 1 and items are measured against T-Shirt sizes (S , M , L , XL , XXL …) based on the relative length of the user stories as perceived during an open estimate finalization meeting (Poker Planning Session).
  5. QA team designs test cases and finalizes acceptance criteria based on discussions with product owners.

To quickly refresh the basic processes of BDD (Behaviour Driven Development)

  1. User stories from SCRUM are used/converted into feature files by the SCRUM master and the product owners.
  2. Feature files follow a “Feature – Scenario – Given – When – Then” kind of language to establish one or multiple acceptance criteria for the user story
  3. This lets the development team create effective tests to prove the scenarios described in the feature files
  4. The entire process moves from requirement-oriented project progress to “acceptance criteria”-driven project progress, enabling a tight coupling of outcome and ownership between IT and business

To quickly refresh capabilities of Selenium Web Driver :

  1. Selenium Web Driver provides capabilities to test the UI and integration flows of a web application by imitating browser events.
  2. Traditional test runners (JUnit, TestNG) can be used to run these Selenium tests

 

In this post, I would take a very simple example of one web page with one textbox and a submit button. Upon clicking the submit button, a new page would be opened with the value entered in the previous page.

Let’s see, if we can combine BDD and Selenium together through this simple demonstration.

The first job is to create the simple web app and run it from Eclipse on a Tomcat container.

To achieve this, I created a simple Eclipse project called ‘bddweb’ using the default ‘File -> New Project -> Dynamic Web Project’ wizard and created 2 JSPs: userDataEntry.jsp and success.jsp.

userDataEntry.jsp

=============

<%@ taglib prefix = "c" uri = "http://java.sun.com/jsp/jstl/core" %>  
...
<form method = "POST" action = "success.jsp">
....
  <input type = "text" id = "mytext" name = "mytext"/>
  <input type = "submit" id = "submit" value = "Submit" />
...
 success.jsp
 ============
<%@ taglib prefix = "c" uri = "http://java.sun.com/jsp/jstl/core" %>
....
<head>
  <title>Success Page</title>
</head>
....
<c:out value = "${param['mytext']}"/>
....
The idea is: the userDataEntry.jsp screen has an input text field which the user fills in; upon clicking Submit, the next page (success.jsp) shows up with the value entered by the user.

The simplest of examples, but it is enough to drive home the idea!

Just to test the application, let’s quickly run it on Eclipse Tomcat :

userdataentry

success

With this, our Step 1 is now complete. Let us now focus on the BDD part.

First, install the Eclipse Cucumber plugin from :

https://marketplace.eclipse.org/content/cucumber-jvm-eclipse-plugin

This helps us create something called a ‘feature file’ in the editor.

Let us quickly, refresh our memory on the objective :

  1. Since, the example takes place in an Agile SCRUM Sprint, requirements have arrived in the form of a User Story.
  2. We want to utilize Behaviour Driven Development(BDD) techniques and convert the User Story to a Feature file, so that, the product owner himself can set the acceptance criteria for the user story requirement / product feature
  3. Once, the feature file is created, we would use a combination of Junit, Cucumber and Selenium to test and automate the feature testing bit of the project.

Keeping in mind the objectives given above and the web screens already developed, we start by creating a simple Maven quickstart archetype project in Eclipse. I have named my project cucumber-bdd.

Here’s the main snippets of the POM file :

…………………

<properties>  
   <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>  
   <junit.version>4.11</junit.version>  
   <cucumber.version>1.1.7</cucumber.version>  
   <selenium.version>2.48.2</selenium.version>  
  </properties>  
  <dependencies>  
   <!-- JUnit Version -->  
   <dependency>  
    <groupId>junit</groupId>  
    <artifactId>junit</artifactId>  
    <version>${junit.version}</version>  
    <scope>test</scope>  
   </dependency>  
   <!-- Cucumber Version -->  
   <dependency>  
     <groupId>info.cukes</groupId>  
     <artifactId>cucumber-java</artifactId>  
     <version>${cucumber.version}</version>  
     <scope>test</scope>  
   </dependency>  
   <dependency>  
     <groupId>info.cukes</groupId>  
     <artifactId>cucumber-junit</artifactId>  
     <version>${cucumber.version}</version>  
     <scope>test</scope>  
   </dependency>  
   <!-- Selenium Version -->  
   <dependency>  
     <groupId>org.seleniumhq.selenium</groupId>  
     <artifactId>selenium-java</artifactId>  
     <version>${selenium.version}</version>  
   </dependency>  
  </dependencies>
............................

The Eclipse project is kept at my GitHub repo :

https://github.com/diptimanr/cucumber-bdd

I created the base package com.diptiman.bdd and then, in the test/resources folder, created the same package hierarchy (com.diptiman.bdd) and a file called “datasubmit.feature”.

The content of this feature file :

 

 Feature: Data Submit Action  
 Scenario: Successful Data Submission  
      Given user navigates to data entry page  
      When user enters data  
      Then Message Displayed data submitted successfully  

If you use the Cucumber Eclipse plugin, you will also notice the highlighted keywords, which make it easy to spot errors.

 

Pretty intuitive, huh! This is the goal and the beauty of BDD! The product owner can write the file himself and thus become an integral part of the acceptance criteria definition. Though feature files can become complex in places, give or take, the simplicity is addictive enough!

In Eclipse, assuming that the Cucumber Eclipse plugin is successfully installed, if you run the feature file using “Run As -> Cucumber Feature”, the console prints the skeleton of something called the step definitions.

These are nothing but the methods which need to be copied into a step definition Java class and run using a JUnit runner.
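For the feature above, the generated skeleton looks roughly like the following (the exact comment text may vary by Cucumber version); each method throws a PendingException until it is implemented:

@Given("^user navigates to data entry page$")
public void user_navigates_to_data_entry_page() throws Throwable {
    // Express the Regexp above with the code you wish you had
    throw new PendingException();
}

@When("^user enters data$")
public void user_enters_data() throws Throwable {
    throw new PendingException();
}

@Then("^Message Displayed data submitted successfully$")
public void message_displayed_data_submitted_successfully() throws Throwable {
    throw new PendingException();
}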

We will create the StepDefinition Java file and inject the Selenium WebDriver automation snippets within the steps, so that when the JUnit runner runs through the step definitions, it runs the entire automation suite.

So, for the above feature file, we would create a DataSubmissionStepDefinition.java with the following steps :

package com.diptiman.bdd;

import static org.junit.Assert.assertEquals;

import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

import cucumber.api.java.en.Given;
import cucumber.api.java.en.Then;
import cucumber.api.java.en.When;

public class DataSubmissionStepDefinition {

    private WebDriver driver;

    @Given("^user navigates to data entry page$")
    public void user_navigates_to_data_entry_page() throws Throwable {
        // Open Firefox and load the data entry page of the bddweb application
        driver = new FirefoxDriver();
        driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
        driver.get("http://localhost:8080/bddweb/userDataEntry.jsp");
    }

    @When("^user enters data$")
    public void user_enters_data() throws Throwable {
        // Type into the text field and submit the form
        driver.findElement(By.id("mytext")).sendKeys("Cool Cucumber !");
        driver.findElement(By.id("submit")).click();
        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
    }

    @Then("^Message Displayed data submitted successfully$")
    public void message_displayed_data_submitted_successfully() throws Throwable {
        // The success page sets its title to "Success Page"
        String successTitle = driver.getTitle();
        assertEquals("Success Title Matched", "Success Page", successTitle);
        driver.close();
    }
}

Notice how Selenium now owns the feature steps and drives the entire automation from within.

We subsequently create the JUnit runner to run the step definitions:

package com.diptiman.bdd;  
 import org.junit.runner.RunWith;  
 import cucumber.api.CucumberOptions;  
 import cucumber.api.junit.Cucumber;  
 @RunWith(Cucumber.class)  
 @CucumberOptions(monochrome = true)  
 public class BDDRunner {  
 } 
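If the feature files or step definitions are kept in non-default locations, @CucumberOptions can point Cucumber at them explicitly; a sketch (the paths here are assumptions, adjust them to your own layout):

@RunWith(Cucumber.class)
@CucumberOptions(
    features = "src/test/resources/com/diptiman/bdd",  // where the .feature files live
    glue = "com.diptiman.bdd",                          // package containing the step definitions
    monochrome = true)
public class BDDRunner {
}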

Now, if you right-click and run the BDDRunner class as a JUnit test in Eclipse, the following happens:

  1. The JUnit runner invokes the Cucumber StepDefinition class to execute the steps matching the feature file conditions
  2. As each method of the StepDefinition class starts firing, it creates a Selenium WebDriver instance to open a Firefox browser (you have to make sure that firefox.exe is present in the system path)
  3. Selenium WebDriver executes the automated test using the inputs provided through the Selenium commands and completes the test
  4. In the end you should see the familiar green JUnit “test passed” bar, and the browser should be closed.

Again, this is only the simplest of examples, but using Cucumber, Selenium and JUnit, Agile Scrum and BDD can be brought to the table together, forging a stronger bond between the IT development team and the business product owners.

So long ….

Search client for ElasticSearch Search Engine

This short post is about creating a search client for the search engine we built in my previous post on ElasticSearch.

Since the API used to communicate with ElasticSearch relies on JSON as the medium of exchange, it is possible to use any HTTP client to search an ElasticSearch index.

But the folks at ElasticSearch have created a very detailed and elaborate Java API for performing all the operations needed to communicate:

  1. Index creation
  2. Get Documents
  3. Delete documents
  4. Update documents
  5. Bulk API
  6. Search API
  7. Count API

Etc ….

In this post, I would create a Main class to run a quick search on the index I created in my previous post.

The code is :

import java.util.Map;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;
import org.elasticsearch.search.SearchHit;

public class SearchMain {

    public static void main(String[] args) throws Exception {
        // Join the running cluster as a client node and obtain a Client handle
        Node node = NodeBuilder.nodeBuilder().clusterName("elasticsearch").node();
        Client client = node.client();
        searchDocument(client, "mycstutorial", "page", "html", "insertionsort");
        if (node != null) {
            node.close();
        }
    }

    public static void searchDocument(Client client, String index, String type, String field, String value) {
        // Run a term query against the given field and fetch up to 5 hits
        SearchResponse searchResponse = client.prepareSearch(index)
                .setTypes(type)
                .setSearchType(SearchType.QUERY_AND_FETCH)
                .setQuery(QueryBuilders.termQuery(field, value))
                .setFrom(0).setSize(5).setExplain(true)
                .execute()
                .actionGet();
        SearchHit[] results = searchResponse.getHits().getHits();
        for (SearchHit hit : results) {
            Map<String, Object> result = hit.getSource();
            System.out.println(result);
        }
    }
}

It is as simple as creating the Node abstraction to join the cluster where my ElasticSearch server is running, locating the index, and then firing the search using the familiar Lucene-style terms and queries.

Do not forget to close the node reference once you are done!
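As an aside, if you would rather not start a full client node, ElasticSearch 1.x (the version used in this series) also offers a TransportClient that connects over the transport port (9300 by default); a minimal sketch, assuming the server runs on localhost with the default cluster name:

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class TransportSearchMain {
    public static void main(String[] args) {
        // Point the client at the default cluster name
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "elasticsearch")
                .build();
        // Note: 9300 is the transport port, not the HTTP port (9200)
        Client client = new TransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
        try {
            // ... run the same searchDocument() logic as in SearchMain ...
        } finally {
            client.close();
        }
    }
}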

Code is kept at the GitHub repo :

https://github.com/diptimanr/ES-Krwkrw

So, long …

Full Text Search Engine for a Web Site – ElasticSearch and KrwKrw / jSoup

In this post I would create a search engine made up of scraped website content ingested into ElasticSearch.

Search functionality is, of course, a much bigger topic of discussion. Web crawling, information retrieval, indexing and subsequent search together comprise an industrial-grade search engine.

In this post, I take a much simpler use case. Let’s say I have a website (HTML or dynamic pages) and I want to enable search functionality within the website.

The steps that I am going to follow :

  1. Scrape the website and extract the data. Scraping, to be more accurate, extracts content from a single domain, unlike crawling, which is much bigger in scope and extracts data from multiple websites and domains.
  2. Create an index in ElasticSearch
  3. Index the data
  4. Apply search for terms on the indexed data.

Let’s implement the steps above one-by-one.

For scraping I was thinking of using jsoup (jsoup.org), one of the simplest and most powerful open-source web scraping libraries, with a clean API. Then I stumbled upon the package “Krwkrw” (I pronounce it CRAW-CRAW!), created by Dadepo Aderemi, on GitHub (https://github.com/dadepo/Krwkrw).

The beauty of Krwkrw is that it does the heavy lifting of using jsoup to extract content from the website and uploads the data to ElasticSearch all by itself, using a well-designed Strategy design pattern implementation. In fact, it has support for async tasks and callback interfaces as well.

So, I pulled the Krwkrw source code from GitHub into my Eclipse project (a simple maven-archetype-quickstart), created a Main class where I provided the ElasticSearch IP, port, etc., and implemented an ExitCallback class from the Krwkrw ExitCallback interface. The code is kept at my GitHub repo: https://github.com/diptimanr

 

Let’s go straight to the code then ….

I downloaded the source of Krwkrw from Dadepo’s Github repo :

https://github.com/dadepo/Krwkrw

and made 2 small changes :

  1. I took the configuration information out of ElasticSearchAction.java (it was only there for demo purposes, to help us understand how it works) and put it in my main class.
  2. I created an ExitCallback, just to be notified once the scraping is over. Though I did not run an async task, this would come in handy; for an async task it is mandatory, in order to know when the scraping has finished.

I chose the public website, http://www.mycstutorials.com/articles/ as my experimental subject !

I already had my ElasticSearch server up and running at localhost:9200. For setting up the ElasticSearch server, please refer to my earlier post at the URL below:

Setting Up ElasticSearch Server

This is how I removed the configuration information from ElasticSearchAction.java :

package com.blogspot.geekabyte.krwler.util;  
 import com.blogspot.geekabyte.krwler.FetchedPage;  
 import com.blogspot.geekabyte.krwler.interfaces.KrwlerAction;  
 import org.elasticsearch.action.index.IndexResponse;  
 import org.elasticsearch.client.Client;  
 import org.jsoup.Jsoup;  
 import org.slf4j.Logger;  
 import org.slf4j.LoggerFactory;  
 import java.io.IOException;  
 import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;  
 /**  
  * Implementation of {@link KrwlerAction} that inserts crawled pages  
  * to an ElasticSearch index.  
  *  
  * @author Dadepo Aderemi.  
  */  
 public class ElasticSearchAction implements KrwlerAction {  
   private Logger logger = LoggerFactory.getLogger(ElasticSearchAction.class);  
   private String clusterName;  
   private String indexName;  
   private String documentType;  
   private String hostName;  
   private int idCount = 0;  
   private boolean convertToPlainText = false;  
   private int port;  
   Client client;
........................
........................
Notice the indexing call inside the execute() method of the ElasticSearchAction.java class:

 response = client.prepareIndex(getIndexName(), getDocumentType(), String.valueOf(++idCount))  
                       .setSource(jsonBuilder()  
                        .startObject()  
                        .field("url", page.getUrl())  
                        .field("title", page.getTitle())  
                        .field("sourceUrl", page.getSourceUrl())  
                        .field("html", convert? Jsoup.parse(page.getHtml()).text(): page.getHtml())  
                        .field("status", page.getStatus())  
                        .field("loadTime", page.getLoadTime())  
                       .endObject())  
                       .execute()  
                       .actionGet(); 

The fields are indexed as Lucene fields and are good enough for the purpose of our experiment: url, title, sourceUrl, html, status and loadTime.

For our simple search scenarios, we can search within the ‘html’ field and then display the ‘sourceUrl’ as a URL for the result.

As mentioned in my previous post on setting up an index with ElasticSearch, we can create a detailed mapping for the above fields and modify the default ElasticSearch properties.
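For example, an explicit mapping for this index could be created up front with a request like the one below (a sketch only; keeping the URL fields not_analyzed lets them be matched as exact values):

PUT /mycstutorial
{
  "mappings": {
    "page": {
      "properties": {
        "url":       { "type": "string", "index": "not_analyzed" },
        "sourceUrl": { "type": "string", "index": "not_analyzed" },
        "title":     { "type": "string" },
        "html":      { "type": "string" },
        "status":    { "type": "long" },
        "loadTime":  { "type": "long" }
      }
    }
  }
}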

Let’s now look at the main class:

ESClient.java -> package com.blogspot.geekabyte.krwler.client

package com.blogspot.geekabyte.krwler.client;  
 import java.io.IOException;  
 import java.net.URISyntaxException;  
 import org.elasticsearch.client.Client;  
 import org.elasticsearch.node.Node;  
 import org.elasticsearch.node.NodeBuilder;  
 import com.blogspot.geekabyte.krwler.Krwkrw;  
 import com.blogspot.geekabyte.krwler.interfaces.callbacks.ExitCallbackImpl;  
 import com.blogspot.geekabyte.krwler.util.ElasticSearchAction;  
 public class ESClient {  
     public static void main(String[] args) {  
         try{  
             Node node = NodeBuilder.nodeBuilder().node();  
             Client client = node.client();  
             ElasticSearchAction action = ElasticSearchAction.builder()  
                     .convertToPlainText(true)  
                     .setClient(client)  
                     .setClusterName("elasticsearch")  
                     .setDocumentType("page")  
                     .setHost("localhost")  
                     .setIndexName("mycstutorial")  
                     .setPort(9200)  
                     .buildAction();  
         Krwkrw crawler = new Krwkrw(action);  
         crawler.setDelay(5);  
         //Set<String> strings = crawler.crawl("http://www.mycstutorials.com/articles/");  
         ExitCallbackImpl exitCallbackImpl = new ExitCallbackImpl();  
         exitCallbackImpl.callBack(crawler.crawl("http://www.mycstutorials.com/articles/"));  
         crawler.onExit(exitCallbackImpl);  
         System.exit(0);  
         }catch(URISyntaxException use){  
             System.out.println(use.getMessage());  
         }catch(InterruptedException ie){  
             System.out.println(ie.getMessage());  
         }catch(IOException io){  
             System.out.println(io.getMessage());  
         }  
     }  
 }

This is the main class which I created and kept it under the package ‘client’.

First, I entered the ElasticSearch configuration information through the fluent builder API of Krwkrw’s ElasticSearchAction class.

I created an index called ‘mycstutorial’ and a type called ‘page’.

Then the crawler starts fetching content with a 5-second delay between each request. I created an implementation of the KrwlerExitCallback interface to be notified when the content extraction finishes, and I forcefully terminated the main thread using System.exit(0).

The Exit Callback implementation is :

 package com.blogspot.geekabyte.krwler.interfaces.callbacks;  
 import java.util.Set;  
 public class ExitCallbackImpl implements KrwlerExitCallback {  
     @Override  
     public void callBack(Set<String> crawledUrls) {  
         System.out.println("Crawling Complete .....");  
     }  
 }  

Now, let us run the main method, extract the data, push the data in the ElasticSearch index and then test the search functionality using Sense.

Once, the main method runs, this is what the log reports :

 [main] INFO org.elasticsearch.node - [Kingo Sunen] version[1.5.0], pid[18896], build[5448160/2015-03-23T14:30:58Z]   
  [main] INFO org.elasticsearch.node - [Kingo Sunen] initializing ...   
  [main] INFO org.elasticsearch.plugins - [Kingo Sunen] loaded [], sites []   
  [main] INFO org.elasticsearch.node - [Kingo Sunen] initialized   
  [main] INFO org.elasticsearch.node - [Kingo Sunen] starting ...   
  [main] INFO org.elasticsearch.transport - [Kingo Sunen] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/192.168.1.85:9301]}   
  [main] INFO org.elasticsearch.discovery - [Kingo Sunen] elasticsearch/-1BVFVb6TSKPcFLSPpQSHw   
  [elasticsearch[Kingo Sunen][clusterService#updateTask][T#1]] INFO org.elasticsearch.cluster.service - [Kingo Sunen] detected_master [Ogress][BlW2QcJeT-Ot35yqNCoEww][LT024395][inet[/192.168.1.85:9300]], added {[Ogress][BlW2QcJeT-Ot35yqNCoEww][LT024395][inet[/192.168.1.85:9300]],}, reason: zen-disco-receive(from master [[Ogress][BlW2QcJeT-Ot35yqNCoEww][LT024395][inet[/192.168.1.85:9300]]])   
  [elasticsearch[Kingo Sunen][clusterService#updateTask][T#1]] INFO org.elasticsearch.gateway.local.state.meta - [Kingo Sunen] [mycstutorial] dangling index, exists on local file system, but not in cluster metadata, auto import to cluster state [YES]   
  [main] INFO org.elasticsearch.http - [Kingo Sunen] bound_address {inet[/0:0:0:0:0:0:0:0:9201]}, publish_address {inet[/192.168.1.85:9201]}   
  [main] INFO org.elasticsearch.node - [Kingo Sunen] started   
  [main] INFO com.blogspot.geekabyte.krwler.Krwkrw - Fetched http://www.mycstutorials.com/articles/ with User Agent: Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6 and Referral www.google.com   
  [main] INFO com.blogspot.geekabyte.krwler.util.ElasticSearchAction - Index page at http://www.mycstutorials.com/articles/ into ElasticSearch, with ElasticSearch id of 1   
  [main] INFO com.blogspot.geekabyte.krwler.Krwkrw - Crawled http://www.mycstutorials.com/articles/   
  [main] INFO com.blogspot.geekabyte.krwler.Krwkrw - 5 seconds delay before next request   
One by one, all the pages of the website are crawled, fetched and added to the index provided in the main class. Once the entire operation is over, the exit callback fires, as below:
 [main] INFO com.blogspot.geekabyte.krwler.Krwkrw - 5 seconds delay before next request   
  [main] INFO com.blogspot.geekabyte.krwler.Krwkrw - Fetched http://www.mycstutorials.com/articles/sorting/mergesort with User Agent: Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6 and Referral www.google.com   
  [main] INFO com.blogspot.geekabyte.krwler.util.ElasticSearchAction - Index page at http://www.mycstutorials.com/articles/sorting/mergesort into ElasticSearch, with ElasticSearch id of 10   
  [main] INFO com.blogspot.geekabyte.krwler.Krwkrw - Crawled http://www.mycstutorials.com/articles/sorting/mergesort   
  [main] INFO com.blogspot.geekabyte.krwler.Krwkrw - 5 seconds delay before next request   
  Crawling Complete .....

Now we have an index named ‘mycstutorial’, which has the content of the entire website, http://www.mycstutorials.com/articles/, extracted and added as Lucene indexes within ElasticSearch. Cool!

To test the search functionality, let’s fire up Sense at :

http://localhost:9200/_plugin/marvel/sense/index.html

A simple “GET /mycstutorial/_search” responds with:

{   
   "took": 749,   
   "timed_out": false,   
   "_shards": {   
   "total": 5,   
   "successful": 5,   
   "failed": 0   
   },   
   "hits": {   
   "total": 10,   
   "max_score": 1,   
   "hits": [   
    {   
     "_index": "mycstutorial",   
     "_type": "page",   
     "_id": "4",   
     "_score": 1,   
     "_source": {   
      "url": "http://www.mycstutorials.com/",   
      "title": "mycstutorials.com",   
      "sourceUrl": "http://www.mycstutorials.com/articles/sorting/quicksort",   
      "html": "mycstutorials.com menu - Articles featured articles .........
........................
.......................

The default mappings created by ElasticSearch can be seen with a :

“GET /mycstutorial/_mapping” , which responds with :

{   
   "mycstutorial": {   
   "mappings": {   
    "page": {   
     "properties": {   
      "html": {   
      "type": "string"   
      },   
      "loadTime": {   
      "type": "long"   
      },   
      "sourceUrl": {   
      "type": "string"   
      },   
      "status": {   
      "type": "long"   
      },   
      "title": {   
      "type": "string"   
      },   
      "url": {   
      "type": "string"   
      }   
     }   
    }   
   }   
   }   
  }

To test our full-text-search functionality, let us search for ‘insertionsort’ by issuing the Sense REST GET command :

GET mycstutorial/_search   
  {   
  "query": {   
   "query_string": {   
   "default_field": "html",   
   "query": "insertionsort"   
   }   
  }   
  }   

And ElasticSearch responds with :

 

{   
   "took": 2,   
   "timed_out": false,   
   "_shards": {   
   "total": 5,   
   "successful": 5,   
   "failed": 0   
   },   
   "hits": {   
   "total": 1,   
   "max_score": 0.0390625,   
   "hits": [   
    {   
     "_index": "mycstutorial",   
     "_type": "page",   
     "_id": "6",   
     "_score": 0.0390625,   
     "_source": {   
      "url": "http://www.mycstutorials.com/articles/sorting/insertionsort",   
      "title": "Sorting Algorithms - Insertion Sort Tutorial, Example, and Java code",   
      "sourceUrl": "http://www.mycstutorials.com/articles/",   
      "html": "Sorting Algorithms - Insertion Sort Tutorial, Example, and Java code menu ............
..................

This illustrates the power and ease of creating a simple full-text search engine for a website using ElasticSearch, jsoup and Krwkrw.

I have kept the sample Eclipse project at my GitHub repo.

I would also add a short post on creating a Java Search client class for testing the search shortly.

https://github.com/diptimanr/ES-Krwkrw

So long ……

 


 




 

Twitter4J – Extracting data from Twitter with Twitter API

In this post, I would demonstrate how to create a simple, standalone Eclipse project to connect to Twitter and extract data using the wonderful Twitter4J API.

Although the API is quite simple to use, data extracted from Twitter, by itself, does not mean a lot.

But analysing the data using sophisticated tools reveals amazing insights into the habits of net users.

I intend to use this post as a starting point to explain and describe the various ways of using Twitter data. In subsequent posts, I wish to build a simple Android App to connect and extract twitter data and later another post to analyse twitter data using sentiment analysers.

To connect and extract data from Twitter using the Twitter APIs, the first step is to create a Twitter app and get it approved. A prerequisite for this step is to have a Twitter username and password.

Connect to :

https://apps.twitter.com/

with your Twitter username and password; there will be a button to create a new application.

Normally, Application Name, Application Description and Website are the 3 mandatory fields to fill-in.

twitter_create_app_form

Submitting the form successfully would create the new app , as below :

twitter_app_created

Click on the Keys and Access Tokens tab, and you would see the following :

twitter_consumer_keys

Consumer Key and the Consumer secret (API Secret) are 2 of the application settings that we would use for authentication.

Further down the same page, there should be a button named “Create Access Token” – clicking it produces the following:

twitter_access_tokens

 

Access Token, Access Token Secret along with the Consumer Key and the Consumer Secret would essentially complete the authentication process.

Creating an application and then getting the keys and tokens is known as “application-user authentication”. In this case, the signed request identifies the application’s identity in addition to the identity and granted permissions of the end user you are making API calls on behalf of, represented by the user’s access token. The other mode of authentication is “application-only authentication”, where the application sends authenticated requests without a user context. In the second case, though, it is not possible to post tweets.

So, we will go with “application-user authentication”.

A point to note here is that, for both of the above scenarios, Twitter applies a 15-minute rate limit window. At the time of writing, users represented by access tokens can make 180 requests/queries per 15 minutes. Using application-only auth, an application can make 450 queries/requests per 15 minutes on its own behalf without a user context.
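Twitter4J can report how much of the current window is left; a small sketch (it assumes an authenticated Twitter instance built exactly as shown further below):

import java.util.Map;

import twitter4j.RateLimitStatus;
import twitter4j.Twitter;
import twitter4j.TwitterException;

public class RateLimitCheck {

    // Prints the remaining calls per endpoint family for the current 15-minute window
    public static void printRateLimits(Twitter twitter) throws TwitterException {
        Map<String, RateLimitStatus> limits = twitter.getRateLimitStatus();
        for (Map.Entry<String, RateLimitStatus> entry : limits.entrySet()) {
            RateLimitStatus status = entry.getValue();
            System.out.println(entry.getKey() + " : " + status.getRemaining() + "/" + status.getLimit()
                    + " remaining, resets in " + status.getSecondsUntilReset() + "s");
        }
    }
}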

Now that we have all our secret keys and access tokens, let’s jump straight into creating a simple project to connect to Twitter.

Use Eclipse -> New Project -> New Maven Project -> select the maven-archetype-quickstart archetype and any package name of your choice.

The POM.XML should contain the following :

 <dependency>  
     <groupId>org.twitter4j</groupId>  
     <artifactId>twitter4j-core</artifactId>  
     <version>4.0.4</version>  
   </dependency>  
   <dependency>  
     <groupId>org.twitter4j</groupId>  
     <artifactId>twitter4j-stream</artifactId>  
     <version>4.0.4</version>  
   </dependency>  
    <dependency>  
     <groupId>org.twitter4j</groupId>  
     <artifactId>twitter4j-async</artifactId>  
     <version>4.0.4</version>  
   </dependency>  
   <dependency>  
     <groupId>org.twitter4j</groupId>  
     <artifactId>twitter4j-media-support</artifactId>  
     <version>4.0.4</version>  
   </dependency>

This brings in the Twitter4J dependencies.

Let’s create a simple search for our first scenario. Create a public static void main(String[] args) method like the one below. The code searches for tweets with the hashtag “#london” posted since 4th December 2015.

private final static String CONSUMER_KEY = "<YOUR CONSUMER KEY>";   
    private final static String CONSUMER_KEY_SECRET = "<YOUR CONSUMER KEY SECRET>";   
    private final static String ACCESS_TOKEN = "<Your access token>";   
    private final static String ACCESS_TOKEN_SECRET = "<Your access token secret>";   
    public static void main(String[] args) throws Exception{   
       ConfigurationBuilder cb = new ConfigurationBuilder();   
       cb.setDebugEnabled(true)   
         .setOAuthConsumerKey(CONSUMER_KEY)   
         .setOAuthConsumerSecret(CONSUMER_KEY_SECRET)   
         .setOAuthAccessToken(ACCESS_TOKEN)   
         .setOAuthAccessTokenSecret(ACCESS_TOKEN_SECRET);   
       TwitterFactory tf = new TwitterFactory(cb.build());   
       Twitter twitter = tf.getInstance();   
       Query query = new Query("#london");   
       LocalDate date = LocalDate.of(2015, 12, 04);   
       DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd"); // Twitter's since filter expects the yyyy-MM-dd format
       String formattedDate = date.format(formatter);   
       query.setSince(formattedDate);   
       QueryResult result;   
       do{   
         result = twitter.search(query);   
         List<Status> tweets = result.getTweets();   
         for(Status tweet : tweets){   
            System.out.println("@"+tweet.getUser().getScreenName() + "|||" + tweet.getText()+"|||"+ tweet.isRetweeted());   
         }   
       }while((query = result.nextQuery()) != null);   
    }  

The result prints the Twitter handle, the actual text of the tweet and a boolean flag indicating whether the tweet has been retweeted.

Also note the ConfigurationBuilder class, which I have chosen because of the simplicity of its fluent API. There are many different ways of completing the authentication process and creating the configuration. I recommend going through their short yet precise, example-filled site at:

http://twitter4j.org/en/index.html

Twitter4J is an unofficial Java library for the Twitter API, but it is simple yet very powerful.

Another small example is given below for finding the top trending topics for a specific location (London in this case). It uses a small TwitterUtil helper class; a sketch of it follows the snippet.

 Twitter twitter = TwitterUtil.getTwitterInstance();
         ResponseList<Location> locations;
         locations = twitter.getAvailableTrends();
         // Resolve the WOEID for "london"; null means no matching trend location was found
         Integer idTrendLocation = getTrendLocation("london");
         if (idTrendLocation == null) {
             System.out.println("Trend Location Not Found");
             System.exit(0);
         }
         // Fetch the trending topics for that WOEID and print their names
         Trends trends = twitter.getPlaceTrends(idTrendLocation);
         for (int i = 0; i < trends.getTrends().length; i++) {
             System.out.println(trends.getTrends()[i].getName());
         }
..........
..........
private static Integer getTrendLocation(String locationName) {
         int idTrendLocation = 0;
         try {
             Twitter twitter = TwitterUtil.getTwitterInstance();
             // Ask Twitter for every location that has trend data available
             ResponseList<Location> locations = twitter.getAvailableTrends();
             for (Location location : locations) {
                 if (location.getName().equalsIgnoreCase(locationName)) {
                     idTrendLocation = location.getWoeid();
                     break;
                 }
             }
             if (idTrendLocation > 0) {
                 return Integer.valueOf(idTrendLocation);
             }
             return null;
         } catch (TwitterException te) {
             te.printStackTrace();
             System.out.println("Failed to get trends : " + te.getMessage());
             return null;
         }
     }
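
The snippet above refers to a TwitterUtil helper class that is not shown here. A minimal sketch of such a helper, assuming it simply wraps the same ConfigurationBuilder setup as the earlier search example, could be:

import twitter4j.Twitter;
import twitter4j.TwitterFactory;
import twitter4j.conf.ConfigurationBuilder;

// Hypothetical helper; assumed to wrap the OAuth setup shown in the first example
public class TwitterUtil {

    private final static String CONSUMER_KEY = "<YOUR CONSUMER KEY>";
    private final static String CONSUMER_KEY_SECRET = "<YOUR CONSUMER KEY SECRET>";
    private final static String ACCESS_TOKEN = "<Your access token>";
    private final static String ACCESS_TOKEN_SECRET = "<Your access token secret>";

    public static Twitter getTwitterInstance() {
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
          .setOAuthConsumerKey(CONSUMER_KEY)
          .setOAuthConsumerSecret(CONSUMER_KEY_SECRET)
          .setOAuthAccessToken(ACCESS_TOKEN)
          .setOAuthAccessTokenSecret(ACCESS_TOKEN_SECRET);
        return new TwitterFactory(cb.build()).getInstance();
    }
}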

The location is based on the Yahoo WOEID (Where On Earth Identifier), a 32-bit reference identifier that identifies any feature on earth. So, the value passed to getPlaceTrends() is a WOEID, against which Twitter looks up the trending topics.

Working examples are kept in my GitHub repository:

https://github.com/diptimanr/Twitter4JDemo

 

So, long ….


 

ElasticSearch – Introduction, Creating an Index and Simple Search

I started to dabble with ElasticSearch pretty late! I was more into understanding Lucene and Tika. Now I really feel I missed out on the beauty and simplicity of ElasticSearch.

These folks have taken full-text search to a whole new level. Through this post, I would try to illustrate the basics of ElasticSearch that I have learnt, and in the next post I would create a small search engine for a regular website: the site would be crawled and the crawled data fed to ElasticSearch to provide full-text-search capabilities.

For the demo, I installed the following on my 64-bit Windows 8.1 laptop:

  1. Eclipse Mars
  2. Java 8 – with JAVA_HOME setup
  3. Downloaded the ElasticSearch 1.5.0 zip file and unzipped it into a folder: D:\elasticsearch-1.5.0

 

The ElasticSearch download link would be at :

https://www.elastic.co/downloads/past-releases/elasticsearch-1-5-0

You should be able to see a folder structure like below :

ElasticSearch Folder
Directory Structure of ElasticSearch

While this post only deals with things inside the "bin" and "plugins" folders, it is important to know that the "data" folder is used to store the index (or, more precisely, the shard) files allocated on the node.

The first thing we would do is install Marvel, the graphical monitoring tool for ElasticSearch. Marvel is installed as a plugin with the following command, run from the folder D:\elasticsearch-1.5.0 :

bin\plugin -i elasticsearch/marvel/latest

Marvel needs a paid license for production usage, but for development use it is free.

Once the Marvel plugin is installed, you can check the "plugins" folder; it should contain a "marvel" folder with the Marvel jar file.

Next, the elasticsearch server is started from D:\elasticsearch-1.5.0\bin by running the elasticsearch.bat file. To shut it down, a regular Ctrl-C is good enough!

Every time the elasticsearch server starts, it picks a random (and funny!) node name, like:

[Silverclaw] version[1.5.0], pid[8752], build[5448160/2015-03-23T14:30:58Z]

It is interesting to note that the random names are part of the design and are buried deep in the source code at:

src/main/resources/config/names.txt

By default, there are around 3000 Marvel comic superhero character names in this text file, and each time the ES server starts up, it picks a random character name as the node name.

The node name can, of course, be set explicitly through the config/elasticsearch.yml configuration file.
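
For example, a fixed name can be set with a single line in config/elasticsearch.yml; the names below are just placeholders of my choice:

# config/elasticsearch.yml
node.name: "grunge-node-1"
cluster.name: "grunge-demo"   # optional: fix the cluster name as well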

To access Marvel, go to:

http://localhost:9200/_plugin/marvel/

This would open up the Marvel dashboard as below :

marvel_dashboard

This dashboard provides a comprehensive view of the nodes, the cluster, shards, index queries and more.

To access the graphical REST client “Sense” to interact with ES, go to :

http://localhost:9200/_plugin/marvel/sense/index.html

This provides a left pane where the REST commands are written and a right pane where the results are displayed. As an alternative to Sense, cURL can also be used to fire the REST commands.

I prefer Sense, for its code-completion and beautiful interface.
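
For example, the same kind of request can be fired with cURL from a command prompt; hitting the root endpoint returns the basic node and version information:

curl -XGET "http://localhost:9200/"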

Now that the basic ES server is up and running, I would create an index, insert data into it and execute some search queries against it.

Let's say I want to create an index of albums released by grunge bands of the early 90's.

To create the index, issue the following command in the left pane of Sense :

PUT  /grunge-bands

You would see that ES responds with the following message on the right pane :

{
   "acknowledged": true
}

sense_command_panes

The response is in JSON. We would see in a short while that the request is also made through JSON.

Since the request-response model communicates via JSON, it becomes extremely easy for web and mobile applications to interact with ES.

Now, we decide to put some data into the index. ES supports a schema-less mode of operation, which means that even without specifying the data types of the fields stored in the index, ES would intelligently infer the field types.

But we would still provide a "mapping" of the fields that need to be stored, analyzed and indexed.

Now that the index is created, we can quickly query the index as well :

GET /grunge-bands generates the following response :
{
   "grunge-bands": {
      "aliases": {},
      "mappings": {},
      "settings": {
         "index": {
            "creation_date": "1449066091435",
            "number_of_shards": "5",
            "number_of_replicas": "1",
            "version": {
               "created": "1050099"
            },
            "uuid": "Rv0d--AeR7ycOEKUNOacvA"
         }
      },
      "warmers": {}
   }
}

We would talk about shards and replicas in a different post. Let's focus on the "mappings" key for the time being; we would shortly create a new mapping defining the type and fields of our index.
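
As a quick aside before moving on to mappings: the defaults of 5 shards and 1 replica seen above can be overridden when an index is created. A hypothetical example (the index name here is just for illustration) would be:

PUT /grunge-bands-demo
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}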

Let's fire the following mapping definition for our "grunge-bands" index. Notice how the "album" type is introduced just in front of _mapping in the URL:

PUT /grunge-bands/album/_mapping
{
  "album" : {
    "properties" : {
      "id" : {
        "type" : "integer", "store" : "yes", "index" : "not_analyzed"
      },
      "name" : {
        "type" : "string", "index" : "analyzed"
      },
      "band" : {
        "type" : "string", "index" : "analyzed"
      },
      "year" : {
        "type" : "integer", "store" : "yes", "index" : "not_analyzed"
      },
      "genre" : {
        "type" : "string", "index" : "analyzed"
      }
    }
  }
}

ES responds with the following in the right pane :

{
   "acknowledged": true
}

 

You can quickly check the mappings with the following command in the left pane of Sense:

GET /grunge-bands/_mapping

The right pane would explicitly describe the mappings that we have just now created.

Now that the mapping is created for the album type, let us insert some data into the index.

To insert our first document, issue a POST request to our index, such as:

POST /grunge-bands/album
{
  "id" : 1,
  "name" : "Nevermind",
  "band" : "Nirvana",
  "year" : 1991,
  "genre" : ["grunge, rock"]
}

The right pane of Sense would respond as below :

{
   "_index": "grunge-bands",
   "_type": "album",
   "_id": "AVFjKAg1YJhE74wR8ljo",
   "_version": 1,
   "created": true
}

You can immediately check the data by the following query :

GET /grunge-bands/album/_search

ES would respond with the following:

{  
   "took": 54,  
   "timed_out": false,  
   "_shards": {  
    "total": 5,  
    "successful": 5,  
    "failed": 0  
   },  
   "hits": {  
    "total": 1,  
    "max_score": 1,  
    "hits": [  
      {  
       "_index": "grunge-bands",  
       "_type": "album",  
       "_id": "AVFjKAg1YJhE74wR8ljo",  
       "_score": 1,  
       "_source": {  
         "id": 1,  
         "name": "Nevermind",  
         "band": "Nirvana",  
         "year": 1991,  
         "genre": [  
          "grunge, rock"  
         ]  
       }  
      }  
    ]  
   }  
 } 

Most of the fields are self-explanatory, but pay special attention to these fields: hits, total and _score.

If you have worked with Lucene before, you would readily understand that, under the hood, ES has actually executed a search against the index we created and scored the results.
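
Since the response also carries the auto-generated _id, the individual document can be fetched directly as well (use the _id value from your own response, as it will differ):

GET /grunge-bands/album/AVFjKAg1YJhE74wR8ljo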

You can also issue a simple JSON query like:

GET /grunge-bands/album/_search
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "nevermind"
    }
  }
}

This would give you a similar result to the one above.
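
As an alternative to query_string, the same search can be written as a match query; this is just another way of expressing the query that I find handy for simple single-field searches, not something specific to this demo:

GET /grunge-bands/album/_search
{
  "query": {
    "match": {
      "name": "nevermind"
    }
  }
}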

With this, I intend to finish off this post. Creating an index, inserting data and running a simple query is what I wanted to demonstrate.

In the next post, I intend to create a simple full-text-search-engine for a small website.

So long …..