Friday, July 23, 2010

Misplaced Modifiers


Misplaced modifiers create interesting problems in computational linguistics.
The following four-line story illustrates the problem.

Man milks cow with machine.
Man attacks cow with ax.
Police hit man with nightstick.
Police arrest man with ax.

Subject, verb, object, modifier: the structure is the same for all four sentences. Yet a person has very little trouble interpreting the correct meaning of each.
So, what was the cow doing with the ax?
1) The man used a milking machine to milk the cow.
2) The man used the ax to attack the cow.
3) The police used a nightstick to stop and disarm the man who had an ax.
Why would the police use an ax to arrest a man?
4) The police arrested a man who had an ax.

Typing the four-line story into Microsoft Word does not produce any of the "green squiggly" lines that would tell an author there is a misplaced modifier in any of the sentences.
Does Microsoft Word need a better grammar check? Not necessarily, although it would be nice. Most readers would interpret the meaning correctly, and the average writer would never finish writing if they had to correct absolutely everything.

Yet if a computational linguistics system is trying to extract information from the four-line story, then purely structural rules break down. Some cognitive linguistic processing is needed to get even this simple example correct.
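
Here is a rough sketch of what that extra knowledge might look like in Java. The class name, the `attach` method and the tiny instrument table are all made up for this example; a real system would need far more than a hand-written lookup table. The point is only that the sentence structure is identical in all four cases, so the decision has to come from knowledge about verbs and instruments.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: decide whether "with X" modifies the verb or the object. */
final class ModifierAttachment {

    // Hand-written world knowledge (an assumption for this example):
    // which things can plausibly serve as the instrument of which verb.
    private static final Map<String, Set<String>> INSTRUMENTS = Map.of(
            "milks", Set.of("machine"),
            "attacks", Set.of("ax", "nightstick"),
            "hit", Set.of("ax", "nightstick"));

    /** Returns "verb" if the modifier is a known instrument of the verb, else "object". */
    static String attach(String sentence) {
        // Expected shape: subject verb object "with" modifier
        String[] w = sentence.replace(".", "").trim().split("\\s+");
        if (w.length != 5 || !w[3].equals("with")) {
            throw new IllegalArgumentException("Unexpected sentence shape: " + sentence);
        }
        String verb = w[1].toLowerCase();
        String modifier = w[4].toLowerCase();
        boolean isInstrument = INSTRUMENTS.getOrDefault(verb, Set.of()).contains(modifier);
        return isInstrument ? "verb" : "object";
    }

    public static void main(String[] args) {
        String[] story = {
            "Man milks cow with machine.",
            "Man attacks cow with ax.",
            "Police hit man with nightstick.",
            "Police arrest man with ax."
        };
        for (String s : story) {
            System.out.println(s + " -> modifier attaches to the " + attach(s));
        }
    }
}
```

With this table the first three modifiers attach to the verb and the last one attaches to the object (the man), which matches the human reading. Take "arrest" out of the picture and no purely structural rule could tell sentence 4 apart from sentence 2.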

Interestingly enough, Wikipedia seems to have misplaced its misplaced modifiers by redirecting to "dangling modifiers".







Tuesday, June 22, 2010

Quantum Linguistics


Quantum Linguistics is a newly emerging concept in linguistic theory. Yes, if you Google this you will find all sorts of articles on Neuro-Linguistic Programming and Quantum Linguistics, which is not the subject of this blog. This blog entry deals with computational linguistics.

Modern approaches to semantic analysis in computational linguistics typically model words and their meanings as vectors. The common approaches using this type of vector representation are latent semantic analysis (LSA), also known as latent semantic indexing (LSI), probabilistic latent semantic analysis (pLSA), latent Dirichlet allocation, topic models, and word association space (WAS).

One of the leading approaches to semantic analysis is LSI, which builds text co-occurrence matrices and analyzes them with singular value decomposition (SVD). These procedures are fully automatic and allow computers to analyze texts without the involvement of any human understanding.

These and other achievements of LSA/LSI raise the question of its relevance to the problems of brain functioning and AI (yep, like the other NLP). More interesting are the similarities between LSA/LSI and the formal structures of Quantum Information Theory (QIT).

LSA/LSI is essentially a Hilbert-space formalism. A series of words are represented by vectors spanning a finite-dimensional space and text passages are represented by linear combinations of such words, with appropriate weights related to the frequency of occurrence of the words in the text. Similarity of meaning is represented by scalar products between certain word-vectors (belonging to the so-called semantic space).
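
To make the vector picture concrete, here is a minimal Java sketch. The three-dimensional word vectors are invented for the example (a real LSA space comes out of an SVD over a large corpus and has hundreds of dimensions), and the class and method names are my own. It shows a passage built as a weighted linear combination of word vectors and similarity measured as a scalar product of normalized vectors.

```java
/** Toy "semantic space": illustrative only, the vectors below are made up. */
final class SemanticSpace {

    /** Scalar (inner) product of two vectors of equal length. */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Cosine similarity: the scalar product of the two normalized vectors. */
    static double cosine(double[] a, double[] b) {
        return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
    }

    /** A passage as a weighted linear combination of its word vectors. */
    static double[] passage(double[][] wordVectors, double[] weights) {
        double[] result = new double[wordVectors[0].length];
        for (int w = 0; w < wordVectors.length; w++) {
            for (int i = 0; i < result.length; i++) {
                result[i] += weights[w] * wordVectors[w][i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented word vectors, not the output of a real LSA run.
        double[] cow = {0.9, 0.1, 0.0};
        double[] milk = {0.8, 0.2, 0.1};
        double[] police = {0.0, 0.1, 0.9};

        System.out.println("cow vs milk:   " + cosine(cow, milk));
        System.out.println("cow vs police: " + cosine(cow, police));

        // "cow milk milk": weights stand in for word frequencies in the text.
        double[] text = passage(new double[][] {cow, milk}, new double[] {1.0, 2.0});
        System.out.println("text vs police: " + cosine(text, police));
    }
}
```

Note that nothing in `passage` depends on the order of the words, which is exactly the weakness discussed next.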

In Quantum Information Theory, words, also treated as vectors, are processed by quantum algorithms or encoded and decoded by means of quantum cryptographic protocols. At first glance, LSA/LSI is in this context a natural candidate as a starting point for "quantum linguistics".

However, LSA/LSI has conceptual problems. The greatest difficulty of LSA/LSI is that it treats text as a "bag of words", a set where order is irrelevant. The difficulty is a serious one, since it is intuitively clear that syntax is important for evaluating the meaning of a text. The sentences "Mary hit John" and "John hit Mary" cannot be distinguished by LSA; "Mary did hit John" and "John did not hit Mary" have practically identical LSA representations because "not" is in LSA a very short vector. What LSA/LSI can capture is that the sentences are about violence.
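
The bag-of-words problem is easy to see in a few lines of Java. This sketch uses plain word counts rather than a real LSA pipeline (no SVD, no weighting), and the class name is just for illustration, but the loss of word order happens at exactly this step.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative bag-of-words counter; not a full LSA implementation. */
final class BagOfWords {

    /** Counts each word in the sentence, ignoring case, punctuation and order. */
    static Map<String, Integer> bag(String sentence) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : sentence.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = bag("Mary hit John");
        Map<String, Integer> b = bag("John hit Mary");
        System.out.println(a);
        System.out.println(b);
        System.out.println("Same bag? " + a.equals(b));
    }
}
```

Both sentences produce the same bag, so anything computed from the bag alone cannot tell who hit whom.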

Because LSI discards word order, the entanglements between words are lost as well. In quantum terms, the Bell entanglement is lost.

Fortunately, there are other Hilbert-space formalisms that can hold on to syntactic relationships. These could provide huge steps forward in computational linguistics.

Friday, June 4, 2010

Ripple Effect


What effect will the suicides at Foxconn have on the computer market?

Foxconn is the largest manufacturer of electronics and computer components worldwide and mainly manufactures on contract to other companies. Its clients include Cisco, Apple Inc., Hewlett-Packard Co., Dell Inc., Nokia Corp., Nintendo Co. and Sony Corp.

Well, the press won't be good. Because Foxconn is a white-label company, not a lot of that press will make it to the end users of its products. But some will. And none of the companies that use its products will want to be associated with a "sweat shop." So there will be some action, because most of those companies will put pressure on Foxconn for the sake of good PR. As a result of this pressure, Foxconn is raising pay by 20%.

Usually costs are passed on to the consumer. But Foxconn will need good PR, so most of this increase will be swallowed in the short term by a reduced profit margin for Foxconn. Similarly, to avoid bad PR, most of Foxconn's customers will swallow the increased cost for the short term. But that is only the short term. It is more like a delay.

So the increased cost will bounce against the supply and demand curves to find the price that the market will bear. Given the huge markup on consumer electronics, the long-term effect won't be very big. What?
The workers work 12 hours a day assembling gadgets in return for about $300 a month, a small fraction of the cost of a single unit that they will have produced.

Before screaming boycott, remember that this pay (actually high for the region) is better than the alternative of no pay for those workers. The ethics of worker pay are not always as easy as the knee-jerk reaction suggests. Hey, a 20% raise would make me a little happier.

Monday, May 10, 2010

Look and Feel


A new "look and feel" seems to be big these days. Lots of things have been getting a new look and feel lately: the Firefox web browser, Java with Nimbus, Google with choices on the side, Yahoo's new layout.
For each of these, how much is really new, and how much is just a new look and feel that lets the user customize it?
This seems to be the latest thing at large companies that haven't added much in core functionality.
Yes, once there aren't many new capabilities to inspire customers to pick their product over others, the next logical step seems to be to allow the customer to tailor it and make it their own.
So can this be used in technology investing as an indicator of where a product is in its life cycle? Yes and no.
It is more an indication of how the company is investing in its technology,
which in turn would be an indication of the company's future success.

Wednesday, April 7, 2010

Unstructured Text


Dealing with structured data is pretty much understood these days. Unstructured data, at best, is dealt with poorly and is very underutilized.
First, consider how unstructured data is dealt with. Most of it is stuffed into a text index and then forgotten until someone does a keyword search.
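
For anyone who has not looked inside a text index, here is a bare-bones sketch in Java of what "stuffed into an index until someone does a keyword search" amounts to. The class and the sample complaints are invented for illustration; production indexes add stemming, ranking and much more.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Minimal inverted index: maps each word to the documents that contain it. */
final class KeywordIndex {

    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    /** Stores the text and records which words occur in it. */
    void add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
    }

    /** Returns the ids of the documents that contain exactly this keyword. */
    Set<Integer> search(String keyword) {
        return postings.getOrDefault(keyword.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        KeywordIndex index = new KeywordIndex();
        // Made-up complaint texts, for illustration only.
        index.add("Car accelerated on its own while braking");
        index.add("Customer reports the pedal was stuck");

        System.out.println("accelerated:  " + index.search("accelerated"));
        System.out.println("acceleration: " + index.search("acceleration"));
    }
}
```

The second search comes back empty even though the first complaint is clearly about acceleration. The index only knows which strings appear where; it extracts no information from the text.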
But the value of unstructured text becomes clear when you consider where it is used in day-to-day business. Scanning news articles for business opportunities is one thing, but look harder. Let's look at the Toyota acceleration issue. What if you could process customer complaints, accident reports, etc.?
If you look in Wikipedia for unstructured data, the article says "Merrill Lynch in 1998 cited estimates that as much as 80% of all potentially usable business information originates in unstructured form."
While there are tools for extracting information from free text (like ours), the point of this blog entry is to consider why the data isn't being processed. There are several factors to consider:
1) knowledge of the existence of the technology, 2) economic considerations, 3) knowledge that the data is there and can be put into a usable form. To go through these, let's look at the Toyota acceleration issue.
The mechanical engineers responsible for designing the acceleration and braking components used in a car probably know very little about computational linguistics, and are not very likely to be keeping up with the state of the art in technologies outside of their field of expertise.
Expecting the car company to make a large investment in computational linguistics, so that it can take its data and convert it into something that the mechanical engineers might find useful, is a hard thing to do unless it can be shown to give a direct return on investment. Additionally, someone has to know that the data is embedded in service reports in the form of customer comments and technician notes.
On the other hand, the insurance company might have a different point of view. It has all the paper from filed claims. One of the fundamental jobs of insurance companies is to take all that paper (unstructured data) and convert it into actuarial tables so that they can quickly adjust your rates for the type of car you own.
Now, if the insurance company starts raising rates on that car model, then it becomes more economically important for the car company to get to that data first.
So, it becomes a matter of knowing the value of the data and the cost to get it.

Friday, March 5, 2010

Wikipedia, Einstein and Darwin


Many people have commented on the decline of contributions to Wikipedia. To me the answer to this decline is rather simple.
At first, Wikipedia let pretty much anyone contribute. But it has become apparent that Wikipedia has pretty much limited itself to trying to be an encyclopedia, complete with the requirement that all works be able to cite references.
So what is wrong with this? Nothing, if your aim is to be an encyclopedia.
But there is a limit to this, in that it does not allow for the creation and addition of new information, ideas and concepts.
So what is wrong with that? That is easy to show with a couple of historically significant works.
1. Charles Darwin's "On the Origin of Species" was published without references.
2. Many of Albert Einstein's papers had few or no references. For example, "On the Electrodynamics of Moving Bodies", dated June 30, 1905, has footnotes but no references or citations, and would be unacceptable for Wikipedia. This paper was later published as part of the 1923 book "The Principle of Relativity".

So much for the value of original thought.

Saturday, February 6, 2010

"Snowmageddon" and "Snowpocalypse"

"Snowmageddon," as the President called it, or "Snowpocalypse," as others have, seems appropriate for D.C. after the President's visit to Copenhagen.
Just wondering what this does for the global warming data.

Record Event Report
National Weather Service Baltimore MD/Washington DC
630 PM Sat Feb 06 2010

... Preliminary indications of two-day storm snowfall record exceeded
at Baltimore/Washington International Thurgood Marshall Airport...

At 4:54 PM EST this afternoon... a 24.8 inch two-day storm total
snowfall was estimated at Baltimore/Washington International
Thurgood Marshall Airport.

Preliminary indications are that this 24.8 inch two-day storm total
snowfall exceeds the previous two-day storm total snowfall record of
24.4 inches from 16-17 February 2003.

As with any major climate record achievement... this preliminary
record amount will be quality controlled by NOAA's National Climatic
Data Center over the next several weeks.

Saturday, January 30, 2010

Nimbus look and feel


Nimbus has been out for a while now. It has some great features. But why isn't it being used more?

Nimbus is the name of a look and feel designed by Sun for the Java Desktop System. It provides a great alternative to the Windows look and feel. But it provides more than that. It has a collection of features that allow developers to customize the look and feel of their products. Nimbus lets you put custom skins on your product to help brand it.

This is how it does that. All painting for components is done with simple stateless implementations of the Painter interface. These painters are stored in the UIDefaults table, so they can be replaced if you would like to change the look of components, or used in your own components if, for example, you would like to create a custom table header that looks the same as the standard Nimbus one plus something extra. All colors, icons and fonts are derived from UIDefaults keys, so the whole UI can be customized by changing values in the UIDefaults table.
All of the colors, fonts, icons, borders and painters are exposed through the UIDefaults table, which means they are available to your third-party components to help you skin them in a Nimbus style.
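
As a small illustration of customizing through the UIDefaults table, the sketch below overrides three of the Nimbus color keys ("nimbusBase", "nimbusBlueGrey" and "control") before installing the look and feel. The class name, the helper method and the particular colors are my own choices for the example, not anything Nimbus prescribes.

```java
import java.awt.Color;
import javax.swing.UIDefaults;
import javax.swing.UIManager;

/** Sketch: brand a Nimbus UI by overriding a few of its color keys. */
final class NimbusBranding {

    /** Writes the example brand colors into the given defaults table. */
    static void applyBrandColors(UIDefaults defaults) {
        // Example colors only; pick whatever suits your product.
        defaults.put("nimbusBase", new Color(0x2F, 0x4F, 0x4F));     // base color most components derive from
        defaults.put("nimbusBlueGrey", new Color(0x70, 0x80, 0x90)); // secondary derived color
        defaults.put("control", new Color(0xF0, 0xF0, 0xF0));        // panel background
    }

    public static void main(String[] args) throws Exception {
        // Set the color keys first, then install Nimbus so it derives its palette from them.
        applyBrandColors(UIManager.getDefaults());
        for (UIManager.LookAndFeelInfo info : UIManager.getInstalledLookAndFeels()) {
            if ("Nimbus".equals(info.getName())) {
                UIManager.setLookAndFeel(info.getClassName());
                break;
            }
        }
        System.out.println("nimbusBase = " + UIManager.getColor("nimbusBase"));
    }
}
```

Because the other Nimbus colors are derived from these keys, changing just a few values re-tints most of the UI without touching any painter code.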

So why isn't it being used more often, given the amount of time that it has been out? There are two main reasons.
The first reason is the default file browser. Basically, it is short. What this means to the user is that they have to scroll sideways all the time to find anything in a directory or folder that has more than a handful of items in it. This is a rather annoying feature that would discourage any developer familiar with usability. So many developers will put their application's look and feel back to the good old favorites that they are familiar with.

The second reason is that the text objects all have features (or bugs, depending on your point of view) that don't work as expected. The most troubling of these is the foreground, background, and highlighting behavior. This is another reason for most developers to give up on Nimbus.
At present the best way to deal with this is a workaround. The workaround is basically to set the text components back to the old-style handling.

To check whether Nimbus is present, enable it if it is, and then set the text panes so that they have the expected behavior, you can use the following:
(Note: This example includes the text object declaration, but this is usually done somewhere else in the IDE GUI build, e.g. Netbeans.)

import javax.swing.UIManager;
import javax.swing.UIManager.LookAndFeelInfo;
import javax.swing.UnsupportedLookAndFeelException;

// enable Nimbus if it is installed
for (LookAndFeelInfo info : UIManager.getInstalledLookAndFeels()) {
    if ("Nimbus".equals(info.getName())) {
        try {
            UIManager.setLookAndFeel(info.getClassName());
        } catch (ClassNotFoundException | InstantiationException
                | IllegalAccessException | UnsupportedLookAndFeelException ex) {
            // your exception handling
        }
        break;
    }
}

// this is usually done somewhere else in the IDE GUI build e.g. Netbeans
javax.swing.JTextPane DocViewTextPane = new javax.swing.JTextPane();

// set text pane so it will behave as expected.
DocViewTextPane.setUI(new javax.swing.plaf.basic.BasicEditorPaneUI());



Wednesday, January 13, 2010

Netbeans, Eclipse and Oracle Sun


What will happen to Netbeans with the Oracle acquisition of Sun? Once the merger happens, I think it will become obvious surprisingly quickly.

Netbeans is the number one competitor to Eclipse, which of course originated at IBM and is still heavily backed by it. As the owner of the number one alternative to Eclipse, Oracle may go full guns on adoption, and of course lots of new capabilities and features for Netbeans would be available in short order. That would be my guess. One more straw to grasp from IBM.

On the other hand, Oracle also purchased BEA WebLogic, which uses Eclipse as its IDE. Converting would mean a substantial cost, as well as some alienation of developers unwilling to make the switch to a new IDE, no matter whose it was.

Yes, there are all those other IDEs that Oracle also owns. But how far have they gone? Not far. Netbeans, though, is a competitor.

You can read the same thing in developer comment after developer comment (only counting the ones from people who have actually used current versions of both): "I use Eclipse at work, because I have to, but I prefer and use Netbeans for personal projects." Yes, I too have fallen into this category.

Why do I like Netbeans better? It's not a reason commonly given. The interaction with configuration management (CM) is much better! I have seen many members of my development team mangle or lose check-ins and commits, and suffer all sorts of other mishaps, in Eclipse when dealing with CM repositories. To me that is lost productivity and added development cost. But that is a side issue to this article.

I think Oracle will be good for Netbeans as serious competition to yet another IBM-backed product. IBM is the only commercial competitor to Oracle's core business line.




Wednesday, January 6, 2010

MySQL and Oracle and the EC


The review by the European Commission of the Oracle-Sun merger just got a whole lot more interesting with the activities of one of the founders of MySQL. Michael "Monty" Widenius, the creator of MySQL, stepped into the fray opposing the takeover, delivering over 10,000 signatures to the EC from those opposed.

I highly recommend reading Monty's blog at http://monty-says.blogspot.com/2009/12/help-saving-mysql.html

If you have code running on MySQL let the EC know just how much competition to Oracle there really is.

Regarding Sun's 25% drop in revenue, just compare that to automotive and housing. But even more, compare that to the same industry. Among the top five server vendors, Dell was hit hardest, with quarterly server revenue tumbling 31.2 percent. Hewlett-Packard showed a 26.2 percent decline. Sun Microsystems watched its revenue dive 25 percent. IBM saw its sales drop 19.9 percent. Sales at Fujitsu/Fujitsu-Siemens fell 18.8 percent. Claims about merger uncertainty affecting sales seem to be way out of proportion.


Thursday, December 24, 2009

2 feet of Global Warming


With the record snowfall of 2 feet here in the Washington, D.C. area, I have developed my plan to help fight global warming and stimulate the economy. I am going to buy a snow blower before that big snowstorm in the Midwest gets here.

I came up with this plan as I was shoveling the 2 feet of snow off my driveway and thinking about the Copenhagen conference and Climategate. As I was shoveling, I watched one of my neighbors taking her dog out for a walk. It was obvious I could blame her; yes, apparently dogs put out 2.5 times the amount of CO2 that my SUV does.
"We should always be suspicious of computer models, for while they look impressive, they may very well be constructed on mathematical models that the programmer assumes, but which may not be the ones that nature uses."


Sunday, December 6, 2009

Java Examples


Ever Google to figure out how to do something in Java? It's always fun.
Try Googling:
file recursion java
you will get:
Results 1 - 10 of about 886,000 for file recursion java

A typical Java subject search gives you everything you already know but not the one thing you need. You will get snippets of code. Often you will get Q/A threads asking how to do the very thing you are searching for, with no reply. And you will see copy after copy of them.

For those who found this page looking for Java file recursion:

package blogsamples;

import java.io.File;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * File recursion example.
 */
public class RecursiveFileListing {

    /**
     * @param args the command line arguments; args[0] is the starting directory
     */
    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: java blogsamples.RecursiveFileListing DIRECTORY");
            return;
        }
        listAllFiles(new File(args[0]));
    }

    public static void listAllFiles(File dir) {
        // if it is a directory get the contents.
        if (dir.isDirectory()) {
            String[] children = dir.list();
            // list() returns null if the directory cannot be read
            if (children == null) {
                return;
            }
            for (String child : children) {
                // call itself
                listAllFiles(new File(dir, child));
            }
        } else {
            // if it is a file print out the file.
            try {
                System.out.println(dir.getCanonicalPath());
            } catch (IOException ex) {
                Logger.getLogger(RecursiveFileListing.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }
}
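
For comparison, here is a sketch of the same listing done with `java.nio.file.Files.walk`, which needs Java 8 or later and does the recursion for you. This is an alternative to the example above, not a replacement for it, and the class name is just for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: list every regular file under a directory using Files.walk. */
final class WalkFileListing {

    /** Returns all regular files below root, sorted by path. */
    static List<Path> listAllFiles(Path root) throws IOException {
        // Files.walk holds directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : listAllFiles(root)) {
            System.out.println(file.toAbsolutePath().normalize());
        }
    }
}
```

The hand-written recursion is still worth knowing, since it makes it obvious where to add your own handling for each directory or file.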





Sunday, November 29, 2009

Global Warming Climate Hoax



You may want to re-evaluate where you are investing if you are into technology companies which are strongly tied to climate change technologies.

Yep, Global Warming seems to be a hoax, or at least lots of the data is apparently faked, highly filtered, and not standing up to direct review (assuming that news about the hoax has even reached the New York Times).


While it may be possible to ignore outlets such as Fox News and talk shows on the subject, when articles about the financial impacts start appearing in the Wall Street Journal, that starts to have an impact on investors. If that isn't enough to get your attention, then the lawyers who are lining up to file suits should be. There are suits against NASA Goddard, CRU and others in the works. Most of these have Freedom of Information Act requests as opening salvos, but it gets much worse than that.

At the heart of all this is what is meant by peer review. The problem appears to be that who the "peers" are is a stacked deck. It appears that Phil Jones of the CRU and his colleague Michael Mann of Penn State were controlling who the peers are, and disallowing anyone not of like "opinion." Too bad no one has had access to the data for independent review, or at least not until now. And it seems the warming is due to the books being cooked. To quote Mr. Mann from his email: "Perhaps we should encourage our colleagues in the climate research community to no longer submit to, or cite papers in, this journal. We would also need to consider what we tell or request of our more reasonable colleagues who currently sit on the editorial board." In other words, keep dissent out of the respected journals. When that fails, redefine what constitutes a respected journal to exclude any that publish inconvenient views. For those not familiar, these are the groups responsible for the IPCC, which is part of the U.N. Too bad such notables as Freeman Dyson, Fred Singer, Ian Plimer, John Coleman, John Christy, Richard Lindzen, Henk Tennekes, and others are not considered peers. (A good summary of who all these experts are and their views is here.)


Whether or not you believe in global warming after this, you should still pay attention to the impact of the controversy on technology investments tied to the Cap and Trade Bill. It is very big business, with a huge impact on wealth redistribution, if there is global warming. And climatologists are in a lot less demand if there is not.

If you can't make up your mind, go with energy technology that would still have value even without global warming; it will still have value if there is.


Friday, November 20, 2009

European Commission and Oracle

It's not looking so good for the Oracle-Sun merger. The EC issued its statement of objections, a formal sheet spelling out its concerns, to the merger. The EC has to rule by mid-January on whether it will clear or block the deal. The EC's deputy director general for mergers and antitrust said at a Washington antitrust conference that no final decision had been made on the deal, and confirmed that the commission's key concern centered on Sun's open-source MySQL database software and its potential combination with software sold by Oracle.

A quick look at the MySQL case studies will show why Oracle would not want to give it up. Telephone companies (especially European ones), cable companies, craigslist, Yahoo, and Ticketmaster are some of the more notable users.
