Saturday, August 3, 2013

How To Write A Multithreaded Web Service Client to Upload Files To an Entity Extraction Web Service

Rosoka Cloud was recently made available on the Amazon AWS Marketplace, providing
inexpensive and easy access to high-quality entity and relationship extraction on an as-needed basis.

You can use the drag-and-drop feature or the file-by-file upload, or you can use a simple client to perform batch processing for research or production needs.  Here is example code for uploading files to the web service.  It recursively walks a directory, including all sub-directories, and uploads each file to the web service for processing.  The program is multithreaded to take advantage of the multithreading in the web service itself and to avoid the time a single thread would spend waiting on each upload.  The Rosoka Cloud web service processes each uploaded document to identify the languages used in it and to perform entity extraction and entity relationship extraction.  The client program can choose between JSON and XML output.

The following is the main program. It has been simplified to make the upload process easier to follow. (Bandwidth, number of CPUs, memory, disk I/O speed, and other factors should be considered when setting the number of threads.)
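
A brief aside on the thread count before the listing itself. The listing hard-codes a pool of 4 threads. One way to make that adjustable is to read the count from the command line and clamp it to a sensible range. This is only a sketch, not part of the Rosoka sample: the helper name chooseThreadCount, the fallback of 4, and the cap of four threads per CPU are all assumptions chosen for illustration, and the right cap depends on your bandwidth and the service's capacity.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadCountSketch {

    // Hypothetical helper: take the thread count from the first argument,
    // fall back to a default when it is missing or not a number,
    // and clamp the result to the range [1, max].
    static int chooseThreadCount(String[] args, int fallback, int max) {
        int n = fallback;
        if (args.length > 0) {
            try {
                n = Integer.parseInt(args[0].trim());
            } catch (NumberFormatException e) {
                n = fallback;
            }
        }
        return Math.max(1, Math.min(n, max));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Assumed cap: uploads mostly wait on the network, so allow a few threads per CPU.
        int threads = chooseThreadCount(args, 4, cpus * 4);
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        System.out.println("Using " + threads + " upload threads");
        executor.shutdown();
    }
}
```

The value returned by chooseThreadCount would replace the literal 4 passed to Executors.newFixedThreadPool. The main program listing follows.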



package rosokacloudsampleclient;

import com.imt.rosokacloadclient.RosokaCloudClientBase;
import com.imt.rosokacloadclient.RosokaCloudClientThread;
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ProcessFileDirectoryThreaded {

    private static RosokaCloudClientBase rcc = new RosokaCloudClientBase();
    private static ExecutorService executor = null;
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException {
        // set the number of threads to allow to run at one time
        executor = Executors.newFixedThreadPool(4);
        
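        // where to read the input documents from
        String rootpath = "../mydocs";
        // where to write the results
        String outpath = "./outdir";

        // for testing: read the currently configured base URI
        URI baseURI = RosokaCloudClientBase.getBaseURI();
        // for a server: point the client at your own instance
        //rcc.setBaseURI("replacewith server URL");

        // Use canonical paths so the root prefix matches the canonical
        // input paths computed in the worker thread ("../mydocs" contains "..").
        File sf = new File(rootpath);
        RosokaCloudClientBase.setRootpath(sf.getCanonicalPath());
        File of = new File(outpath);
        RosokaCloudClientBase.setOutpath(of.getCanonicalPath());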
        
        
        visitAllFiles(sf);

        System.out.println("All submitted");
        executor.shutdown();
    }

    // Recursively visit dir, skipping hidden entries, and process every regular file
    public static void visitAllFiles(File dir) {
        if (dir.isHidden()) {
            return;
        }
        if (dir.isDirectory()) {
            // listFiles() returns null if the directory cannot be read
            File[] children = dir.listFiles();
            if (children == null) {
                return;
            }
            for (File child : children) {
                visitAllFiles(child);
            }
        } else {
            process(dir);
        }
    }

    // Queue one file for upload on the thread pool
    private static void process(File file) {
        RosokaCloudClientThread rcct = new RosokaCloudClientThread(file);
        executor.submit(rcct);
        // rcc.proscessFileRosokaCloud(file);
    }
}
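
Note that main prints "All submitted" and returns right after shutdown(); the queued uploads keep running on the pool threads. If the program should report when every upload has finished, or give up after a timeout, shutdown() can be followed by awaitTermination(). A minimal sketch, using a counter as a stand-in for the upload task (the class name, method name, and timeout value are assumptions for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AwaitUploadsSketch {

    // Submits taskCount stand-in tasks, then blocks until they finish
    // or the timeout elapses. Returns how many tasks completed.
    static int runAndWait(int threads, int taskCount, long timeoutSeconds) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        final AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            // Stand-in for executor.submit(new RosokaCloudClientThread(file))
            executor.submit(new Runnable() {
                public void run() {
                    done.incrementAndGet();
                }
            });
        }
        executor.shutdown(); // accept no new tasks; queued ones still run
        try {
            if (!executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                executor.shutdownNow(); // timed out: cancel whatever is still pending
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("Completed tasks: " + runAndWait(4, 10, 60));
    }
}
```

For a large batch the timeout would need to be generous, since it covers the whole run rather than a single upload.
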

And the thread code:

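package com.imt.rosokacloadclient;

// Jersey 1.x client API (jersey-client and jersey-multipart)
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;
import com.sun.jersey.api.client.config.ClientConfig;
import com.sun.jersey.api.client.config.DefaultClientConfig;
import com.sun.jersey.multipart.FormDataMultiPart;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.ws.rs.core.MediaType;

/**
 * This code provides a sample Java client for interfacing with the
 * Rosoka-Cloud™ web services
 *
 * @author IMT Holdings Corp.
 */
public class RosokaCloudClientThread implements Runnable {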
    private File inputfile;
    
    public RosokaCloudClientThread(File inputfile) {
        this.inputfile=inputfile;
    }

    /**
     * Sends the input file to the web service for processing and writes the
     * response to the output directory. The response type requested here is
     * XML; JSON is also available.
     */
    @Override
    public void run() {
        try {
            ClientConfig config = new DefaultClientConfig();
            Client client = Client.create(config);
            WebResource service = client.resource(RosokaCloudClientBase.getBaseURI()  );

            // fill in the form with what to process.  "file" for a file
            FormDataMultiPart mform = new FormDataMultiPart().field("file", 
                    inputfile, MediaType.MULTIPART_FORM_DATA_TYPE);
            // set the response type: this requests XML; use "application/json" to get JSON instead
            mform.field("responseType", "application/xml");

            // make the call and get the response.
            ClientResponse response = service.path("rosoka").
                    type(MediaType.MULTIPART_FORM_DATA).post(ClientResponse.class, mform);

            // for the example, just write the response body to a file whose
            // path under the output directory mirrors the input path
            String inpath = inputfile.getCanonicalPath();
            String endpath = inpath.substring(RosokaCloudClientBase.getRootpathlength());
            String outp =RosokaCloudClientBase.getOutpath()+ endpath + "_OUT.xml";
            System.out.println("writing: "+endpath);
            File dc=new File(outp);
            File parentFile = dc.getParentFile();
            if(!parentFile.exists()){
                parentFile.mkdirs();
            }
            Writer out = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(outp), "UTF-8"));
            try {
                out.write(response.getEntity(String.class));
            } finally {
                out.close();
            }
            // System.out.println(response.getEntity(String.class));
            // You can use the XSD from the web site to marshal and unmarshal
            // the results for further manipulation.
        } catch (IOException ex) {
            Logger.getLogger(RosokaCloudClientThread.class.getName()).log(Level.SEVERE, null, ex);
        }

    }
}
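
The output file name in run() is built by cutting the root path off the front of the input path and appending what is left to the output directory. RosokaCloudClientBase is not shown here, so the following is only a sketch of that mapping under the assumption that getRootpathlength() is simply the length of the root path string; the class and method names are hypothetical.

```java
public class OutputPathSketch {

    // Maps an input file under rootPath to the mirrored location under outPath
    // and appends a suffix such as "_OUT.xml".
    static String outputPathFor(String rootPath, String outPath,
                                String inputPath, String suffix) {
        if (!inputPath.startsWith(rootPath)) {
            // Happens when the two paths are in different forms,
            // e.g. one canonical and one still containing "..".
            throw new IllegalArgumentException(inputPath + " is not under " + rootPath);
        }
        // The remainder keeps its leading separator, so plain concatenation works.
        String relative = inputPath.substring(rootPath.length());
        return outPath + relative + suffix;
    }

    public static void main(String[] args) {
        System.out.println(outputPathFor(
                "/data/mydocs", "/data/outdir",
                "/data/mydocs/reports/a.txt", "_OUT.xml"));
        // prints /data/outdir/reports/a.txt_OUT.xml
    }
}
```

The prefix check matters because the substring only lines up when the root path and the input path are in the same form. The thread code uses getCanonicalPath() for the input file, so the root path should be canonical as well.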
