Saturday, July 17, 2010

XML Cheat Sheet

URI:
Uniform Resource Identifiers.
Describe all points (locations) in the information space, even those that do not have a physical presence. Consist of:
  • scheme: http, ftp, file, mailto, imap, https, …
  • scheme-specific part: everything that goes after the colon

URL:
Uniform Resource Locators. A type of URI. Consist of:
  • scheme: the protocol, e.g., http
  • server: e.g., www.w3.org
  • path: e.g., projects/project1

URN:
Uniform Resource Names. A type of URI. They are pointers to resources, but without reference to particular locations.
  • scheme: urn
XML:
Extensible Markup Language.
Framework for defining markup languages. Inherently internationalized: all XML documents are written in the Unicode alphabet.

Root node: the tag (conceptual object) at the top.
Root element: all the information contained in the children of the root node.
Text node: node that contains only text. It has no children (it's the text itself, not the tags it's between). This text is also called character data.
Attribute node: pair of name and value associated with an element node (not only tags are nodes).
Element: logical grouping of the information represented by its descendants.
XML parser: tool that constructs a tree representation of a textual XML document.
XML serializer: tool that constructs an XML document from a tree.
Namespaces: used for solving name clashes.
They can be given shorter names using a namespace declaration, e.g.:
<… xmlns:foo="http://www.w3.org/pjts">
URLs are used because they are unique (and only the owner of a domain would use it).
XPath:
language for navigating XML trees.
An XPath location path (expression) evaluates to a sequence of nodes of a specific tree. It is built as a sequence of location steps, each step separated from the previous one by /. Each step consists of:
  • Axis: keyword that indicates the node or nodes we are looking for in relation to the node test. E.g., child, descendant, parent, attribute, …
  • Node test: specifies a node by name or a type of nodes by their properties:
    • text()
    • comment()
    • the name of the node (the tag)
    • node()
  • Predicates (optional): Boolean conditions for selecting, or not, the nodes. Written in […].
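A few location paths can be tried with Python's xml.etree.ElementTree, which implements only a limited XPath subset (abbreviated child/descendant steps and simple predicates); the sample document below is invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library>"
    "<book year='1999'><title>XML Basics</title></book>"
    "<book year='2005'><title>Advanced XML</title></book>"
    "</library>"
)

# child step by name: all <book> children of the root
books = doc.findall("book")
print(len(books))  # 2

# predicate on an attribute: books published in 2005
titles = [b.find("title").text for b in doc.findall("book[@year='2005']")]
print(titles)  # ['Advanced XML']

# descendant step: every <title> anywhere below the root
all_titles = [t.text for t in doc.findall(".//title")]
print(all_titles)  # ['XML Basics', 'Advanced XML']
```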
Schema:
formal definition of the syntax of an XML-based language.
Schema language: formal language for expressing schemas.
Schema processor: implementation of a schema language that checks if a document is valid (syntactically correct according to the
schema).
Schema languages:
  • DTD:
    Document Type Definition.
    First schema language. Not written in XML. Does not support namespaces.
    • Document type declaration: reference to the DTD schema:
      <!DOCTYPE root SYSTEM "URI">
    • Element declarations:
      <!ELEMENT element-name content-model>
      Content models: EMPTY, ANY, #PCDATA
    • Attribute list declarations:
      <!ATTLIST element-name attribute-definitions>
      Each attribute definition has the form:
      att-name att-type att-declaration
      Attribute types: CDATA, NMTOKEN, ID, IDREF.
      Declaration types: #REQUIRED, #IMPLIED, "value", #FIXED "value".

  • XML Schema: official schema language, written in XML. Main constructs:
    • Simple type definition: describes text without markup (character
      data + attributes).
    • Complex type definition: describes text that may contain markup (elements + attributes + character data).
    • Element declaration: associates an element name with a simple
      type or with a complex type.
    • Attribute declaration: associates an attribute name with a simple
      type.
- The root element contains an attribute targetNamespace that indicates the namespace being described.

- A document can point to the schema with a schemaLocation attribute in the root.

XSL:
Extensible Stylesheet Language.
Language for specifying presentations of XML documents. Components:
  • XSLT:
    XSL Transformations. Language
    for specifying transformations between XML languages.
  • XSL-FO:
    XSL Formatting Objects.
    Target language for specifying physical layout.

XQuery:
language to query XML documents in a similar way to SQL. It’s an extension of XPath.





DOM:
Document Object Model. API
that allows us to parse, navigate, manipulate, and serialize an XML document. It's common to all languages and thus very general and complex.
Methods:
parentNode,
previousSibling, nextSibling, firstChild, childNodes,
getAttributeNode, attributes
Interfaces:
Node, Element, Attr, Text, DocumentType,
Notation, Entity, EntityReference, CharacterData,
ProcessingInstruction, CDATASection, Comment, NodeList,
NamedNodeMap
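Because DOM is language-neutral, the same tree model appears in every implementation. A minimal sketch using Python's stdlib xml.dom.minidom (the sample document and names are invented):

```python
from xml.dom import minidom

# parse: build a tree from a textual XML document
doc = minidom.parseString('<catalog><item id="i1">pen</item></catalog>')

root = doc.documentElement          # the root element node
item = root.firstChild              # navigation: firstChild, childNodes, ...
print(item.tagName)                 # item
print(item.getAttribute("id"))      # i1
print(item.firstChild.data)         # the text node child: pen

# serialize: turn the tree back into an XML string
print(doc.toxml())
```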

JDOM:
DOM for Java. It’s an API for XML that is specific to Java, and thus easier to use.
Interfaces: Parent.
Abstract classes: Content.
Classes: Comment, DocType, Element, EntityRef, ProcessingInstruction, Text, Document, CDATA.
Methods:
getContent, getNamespace, getDescendants, getAttributeValue,
setAttribute, getChildren, getName
JDOM has built-in support for evaluating XPath expressions and for performing XSLT transformations.
JDOM doesn’t have its own parser, so it uses the DOM parser or the SAX parser.
SAX:
Simple API for XML. Framework
for streaming XML. It views an XML document as a stream of events.
It calls the appropriate method when it reads through an event (document starts, start tag encountered…).
The DefaultHandler class provides empty implementations of all possible event handlers. We must extend it and override its methods.
SAX Filters: event handlers that may act upon the various events and send them on to a parent handler (similar to UNIX pipes).
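The event-stream model can be sketched with Python's xml.sax: like Java's DefaultHandler, ContentHandler provides empty implementations of all event handlers, and we override only the ones we care about. The sample document is invented.

```python
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # called once per start tag as the parser streams through the document
        self.counts[name] = self.counts.get(name, 0) + 1

handler = TagCounter()
xml.sax.parseString(b"<a><b/><b/><c/></a>", handler)
print(handler.counts)  # {'a': 1, 'b': 2, 'c': 1}
```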
XML Data Binding: mapping schemas into collections of classes = generating a group of classes from a schema.
JAXB:
Java Architecture for XML Binding.

Sorting Algorithms

Bubble Sort

On each pass, successive neighboring pairs are compared. If a pair is in decreasing order, its values are swapped.

This process continues until all elements are sorted.

Best-time: O(n) (one pass only, if the list is already sorted)

Worst-time: O(n^2)
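The passes described above can be sketched as follows (a minimal Python sketch; the early-exit flag is what gives the O(n) best case on an already-sorted list):

```python
def bubble_sort(a):
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):          # the last i items are already in place
            if a[j] > a[j + 1]:             # neighboring pair in decreasing order
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:                     # no swaps on this pass => sorted
            break
    return a

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```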

Selection Sort

Finds the largest number in the list and places it last. Then it finds the second largest and places it next to the last, and so on.

Worst-time: O(n^2)
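A sketch matching the description above: repeatedly find the largest remaining element and place it at the end of the unsorted part.

```python
def selection_sort(a):
    for end in range(len(a) - 1, 0, -1):
        largest = 0
        for i in range(1, end + 1):         # find the largest in a[0..end]
            if a[i] > a[largest]:
                largest = i
        a[largest], a[end] = a[end], a[largest]   # place it last
    return a

print(selection_sort([3, 7, 1, 5]))  # [1, 3, 5, 7]
```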

Insertion Sort

From left to right it maintains a sorted sublist whose length increases by one on each pass. It starts with the first element in the list.

On each pass it takes the next element and inserts it in the right place in this growing sublist.

Worst-time: O(n^2)
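The growing sorted prefix can be sketched like this: each new element is shifted leftwards into its place within the prefix.

```python
def insertion_sort(a):
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        while j >= 0 and a[j] > current:   # shift larger prefix elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current                 # insert into the sorted prefix
    return a

print(insertion_sort([4, 2, 5, 1]))  # [1, 2, 4, 5]
```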

Merge Sort

Divides the array into two halves, applies merge sort to each half recursively, and then merges the two sorted halves.

Average-time: O(n log(n))

Worst-time: O(n log(n))
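A minimal sketch of the divide-and-merge structure:

```python
def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # merge the two sorted halves
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]   # append whichever half has leftovers

print(merge_sort([6, 2, 9, 1, 5]))  # [1, 2, 5, 6, 9]
```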

Quick Sort

First the algorithm selects an element in the array to use as pivot.

Then it divides the array into two parts such that all the elements in the first part are less than or equal to the pivot and all the elements in the second part are greater than the pivot.

To do this it searches for the first element from the left forward that is greater than the pivot, then searches for the first element from the right backwards that is less than or equal to the pivot. Then it swaps them.

Then it does the same thing in each half.

It does this in the array itself, so there's no need to merge the parts afterwards.

Average-time: O(n log(n))

Worst-time: O(n^2)
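An in-place sketch using the partition scheme described above (pivot taken as the first element, two scans that swap until they cross):

```python
def quick_sort(a, low=0, high=None):
    if high is None:
        high = len(a) - 1
    if low >= high:
        return a
    pivot = a[low]
    i, j = low + 1, high
    while True:
        while i <= j and a[i] <= pivot:    # from the left: first element > pivot
            i += 1
        while i <= j and a[j] > pivot:     # from the right: first element <= pivot
            j -= 1
        if i > j:                          # scans crossed: partition done
            break
        a[i], a[j] = a[j], a[i]
    a[low], a[j] = a[j], a[low]            # put the pivot between the two parts
    quick_sort(a, low, j - 1)
    quick_sort(a, j + 1, high)
    return a

print(quick_sort([3, 8, 2, 5, 1]))  # [1, 2, 3, 5, 8]
```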

Heap Sort

1. Creates a heap object

2. Adds all elements to the heap

3. Removes all elements from the heap

Time: O(n log(n))
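The three steps can be sketched with Python's stdlib heapq (a binary min-heap), so removing all elements yields them in ascending order:

```python
import heapq

def heap_sort(items):
    heap = []
    for x in items:                 # steps 1-2: build the heap, one push at a time
        heapq.heappush(heap, x)     # each push is O(log n)
    return [heapq.heappop(heap)     # step 3: pop everything, smallest first
            for _ in range(len(heap))]

print(heap_sort([9, 4, 7, 1]))  # [1, 4, 7, 9]
```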

Bucket Sort

Each bucket holds the elements with the same key value.

Radix Sort

Divides the keys into subgroups based on their radix position.

Emacs Basic Commands

C-g cancel partially typed or accidental command
C-x u undo the last change (undo)
C-x C-s save-buffer
C-x C-c save-buffers-kill-emacs
======
C-k kill (cut) text from point to the end of the line (kill-line)
C-y re-insert ('yank') the last text that was killed
======
C-x 2 Split the selected window into two windows, one above the other (split-window-vertically).
C-x 3 Split the selected window into two windows positioned side by side (split-window-horizontally).
======
C-x o Select another window (other-window). That is o, not zero.
C-M-v Scroll the next window (scroll-other-window).
C-x C-f open a file (find-file)
======
C-v scroll forward one screen (scroll-up)
M-v scroll backward one screen (scroll-down)
M-< beginning-of-buffer
M-> end-of-buffer
======
C-s Incremental search forward (isearch-forward).
C-r Incremental search backward (isearch-backward).

Enterprise Design Patterns

DOMAIN LOGIC
Transaction Script: organizes business logic by procedures where each procedure handles a single request from the presentation

  • Domain logic primarily organized by the transactions that you carry out with the system
  • You can either have several transaction scripts in one class, or each in its own class (using the Command pattern: a command object encapsulates an action and its parameters)

Domain Model: an object model of the domain that incorporates both behavior and data

  • A simple domain model looks very much like the database design with mostly one domain object for each DB table

Table Module: a single instance that handles the business logic for all rows in a database table or view

It organizes domain logic with one class per table in the database, and a single instance of a class contains the various procedures that will act on data


The primary distinction with “Domain Model” is that, if you have many orders, a “Domain Model” will have one order object per order while a “Table Module” will have one object to handle all orders

Table Module has no notion of an identity for the objects it’s working with

Table Module may include queries as factory methods. The alternative is a “Table Data Gateway”

Service Layer: defines an application’s boundary with a layer of services that establishes a set of available operations and coordinates the application’s response in each operation

A service layer defines an application’s boundary and its set of available operations from the perspective of interfacing client layers

You need it in applications with more than one kind of client of its business logic.

MAPPING TO RELATIONAL DATABASES

Gateway: an object that encapsulates access to an external system or resource

Row Data Gateway: an object that acts as a Gateway to a single record in a data source. There is one instance per row

- You have an instance for each row that’s returned by a query

- You face the issue of where to put the find operations. It often makes sense to have separate finder objects

- It’s sometimes difficult to tell the difference between a “Row Data Gateway” and an “Active Record” => if there’s any domain logic present, then it’s an “Active Record”

- Most often used with “Transaction Script”

Table Data Gateway: an object that acts as a Gateway to a database table. One instance handles all the rows in the table

- Single object for each table in the DB => a natural fit if you use a Record Set

- Goes very well with table module

(Record Set: “A recordset is a data structure that consists of a group of database records, and can either come from a base table or as the result of a query to the table.”)

Active Record: an object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data

It can have static find methods.

Any business logic sits directly in it.

Data Mapper: A layer of mappers that moves data between objects and a database while keeping them independent of each other and the mapper itself

- Makes your indirection layer entirely responsible for the mapping between domain objects and database tables

- It uses “Identity Map” to see if a class is already loaded.

---
Summary: Gateway not recommended. If domain logic is simple and there's a close correspondence between classes and tables => Active Record. Something more complicated => Data Mapper.
---

OBJECT-RELATIONAL BEHAVIORAL PATTERNS

As you read objects and modify them, you have to ensure that the DB state you are working with stays consistent. If you read some objects, it’s important to ensure that the reading is isolated so that no other process changes any of the objects you’ve read while you are working on them.

Unit of Work: maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems

As soon as you start doing something that may affect a database, you create a Unit of Work to keep track of changes.

Keeps track of all objects read from the database, together with all objects modified in any way. It also handles how updates are made to the DB. Instead of the app programmer invoking explicit save methods, the programmer tells the Unit of Work to commit. The Unit of Work is like a controller of the database mapping.

A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.

- With caller registration, the user of an object has to remember to register the object with the Unit of Work for changes

- With object registration, loading an object from the database registers the object as clean; the setting methods register the object as dirty. (The usual trick here is to place registration calls in the object's methods)

- With a unit of work controller, rather than marking objects as dirty, the Unit of Work takes a copy at read time and then compares the object at commit time

Identity Map: ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them

In a simple case, with an isomorphic schema, you’ll have one map per DB table.

You need to ensure that each session gets its own instance that’s isolated from any other session instance. Thus you need to put the identity map on a session-specific object.
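A minimal sketch of the idea in Python (the class, the stand-in load function, and the sample data are invented for illustration):

```python
class IdentityMap:
    def __init__(self):
        self._loaded = {}                   # primary key -> already-loaded object

    def get(self, key, load_from_db):
        if key not in self._loaded:         # only hit the DB on a miss
            self._loaded[key] = load_from_db(key)
        return self._loaded[key]            # every caller gets the same object

calls = []
def load_person(pk):                        # stand-in for a real SELECT
    calls.append(pk)
    return {"id": pk, "name": "row %d" % pk}

people = IdentityMap()                      # one map per DB table
a = people.get(1, load_person)
b = people.get(1, load_person)              # second lookup: no DB access
print(a is b, calls)                        # True [1]
```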

Lazy Load: an object that doesn’t contain all of the data you need but knows how to get it

For loading data from a DB into memory it's handy to design things so that as you load an object of interest you also load the objects that are related to it. However, loading one object can have the effect of loading a huge number of related objects. A Lazy Load interrupts this loading process, leaving a marker in the object structure so that the data is loaded only when it is used.

- Lazy initialization: every access to the field checks first to see if it’s null. If so, it calculates the value of the field before returning the field. (No good if null is a valid value for that field)

- Virtual proxy: it’s an object that looks like the object that should be in the field, but doesn’t actually contain anything. Only when one of its methods is called does it load the correct object from the DB

- Value holder: object that wraps some other object. To get the underlying object you ask the value holder for its value, but only on the first access does it pull the data from the DB

- Ghost: the real object in a partial state. When you load the object from the DB it contains just its ID. Whenever you try to access a field it loads its full state
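The virtual-proxy variant can be sketched in Python (all names and the stand-in loader are invented): the proxy stands in for the real object and loads it from the "DB" only on first use.

```python
class VirtualProxy:
    def __init__(self, loader):
        self._loader = loader
        self._real = None                  # marker: nothing loaded yet

    def __getattr__(self, name):           # any attribute access triggers the load
        if self._real is None:
            self._real = self._loader()
        return getattr(self._real, name)

class Order:
    def __init__(self):
        self.total = 42

loads = []
def load_order():                          # stand-in for a real DB read
    loads.append("hit")
    return Order()

order = VirtualProxy(load_order)
print(loads)        # [] -- nothing loaded at construction time
print(order.total)  # 42 -- first access loads the real object
print(loads)        # ['hit']
```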

===============

Dependency Injection (DI) in computer programming refers to the process of supplying an external dependency to a software component. It is a specific form of inversion of control where the concern being inverted is the process of obtaining the needed dependency.

When the dependency injection technique is used to decouple high-level modules from low-level services, the resulting design guideline is called the Dependency inversion principle.


Dependency lookup: We avoid hard-coding the names of classes on which we depend into our code because static binding severely limits our options for how the software is configured as it runs. Instead, we hard-code the name of a "component broker" that returns to us a ready-to-use object. The component broker provides some means for the client software, or perhaps a system configuration manager, to tell the SUT in question what objects to use for each component request.

Networking

Subnetting
Its objective is to reduce the total number of network numbers that are assigned.

The idea is to take one IP network number and allocate several physical networks, now referred to as subnets, within it. The subnets should be close to each other, because at a distant point in the Internet they will all look like a single network, sharing a single network number.

They will all be part of the same big network, but inside they are a bunch of smaller networks. They all actually share the big network IP, but the mask allows us to identify them as subnets.

Now we need a subnet mask that will be used by a router to forward to the right subnet. The forwarding table will now contain entries of the form <subnet number, subnet mask, next hop>.


CIDR: Classless InterDomain Routing
The problem with subnetting is that the address within which we have subnets has to be big, and it probably has inefficiencies. To solve these inefficiencies we would have to give each subnet its own lower-class IP address. But then the backbone router would need to have one entry for each (with subnetting, all the subnets look like one net from the distance, and it's the router at the entry that is in charge of sending a packet to the right subnet).

To avoid having inefficiencies (and running out of IP addresses) and having too many entries in the backbone router’s table, CIDR lets us use a single entry in a table for a lot of different networks.

If we have an AS with 16 class C networks, instead of handing out 16 class Cs at random, we give it 16 contiguous class C addresses. They will then share the same top 20 bits, which can be the network number for all of them.

We need a new notation; the 20-bit prefix for all the networks 192.4.16 through 192.4.31 is represented as 192.4.16/20 (the 20 bits that define the network). To represent a single class C (24 bits long) we would use 192.4.16/24.

Longest Match rule: when a destination address matches more than one prefix in the forwarding table, the router forwards using the entry with the longest (most specific) matching prefix.
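Longest-prefix matching can be sketched with Python's stdlib ipaddress module (the prefixes and next-hop names below are invented):

```python
import ipaddress

table = {
    ipaddress.ip_network("192.4.16.0/20"): "R1",   # aggregate for 16 class Cs
    ipaddress.ip_network("192.4.18.0/24"): "R2",   # one more-specific network
}

def next_hop(dest):
    addr = ipaddress.ip_address(dest)
    matches = [net for net in table if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)   # longest prefix wins
    return table[best]

print(next_hop("192.4.18.7"))   # R2 (both prefixes match; /24 is more specific)
print(next_hop("192.4.20.1"))   # R1 (only the /20 matches)
```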


BGP: Border Gateway Protocol
BGP assumes that the Internet is an arbitrarily connected set of ASs (Autonomous Systems). The goals of BGP are first to find some path, second to make sure the path is compliant with the policies of the various ASs.

BGP advertises complete paths as an enumerated list of ASs to reach a particular network.


Remote Procedure Call
A client sends a request message to a server, and the server responds with a reply message, with the client blocking while waiting.

Functions that must be performed by any RPC protocol (and that distinguish it from UDP?):

* Provide a name space for uniquely identifying the procedure to be called (could be implemented by defining a set of fields in the request message format)

* Match each reply message to the corresponding request message (implemented by uniquely identifying request-replies pairs using a message ID field)


TCP Congestion Control
The idea is for each source to determine how much capacity is available in the network. TCP makes use of three mechanisms for this:

1. Additive Increase/Multiplicative Decrease

It uses a new variable: Congestion Window (cwnd)

* Additive Increase: each time we get an ACK we increment the cwnd (by 1, for example)
* Multiplicative Decrease: each time a timeout occurs we cut cwnd in half

2. Slow Start (slow in comparison with initial TCP behavior, not additive increase)

Increases the cwnd rapidly from a cold start (exponentially, rather than linearly). Still uses multiplicative decrease on timeout.

3. Fast Retransmit and Fast Recovery

* Fast Retransmit: when the receiver gets a packet out of order – earlier data did not arrive – it resends the same ACK that it sent last time. After 3 duplicate ACKs, the sender retransmits the packet.

* Fast Recovery: removes the slow start phase that would otherwise happen between fast retransmit detecting a lost packet and additive increase resuming. Instead of dropping cwnd back to 1, it restarts from the last cwnd divided by 2.
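Mechanism 1 (additive increase / multiplicative decrease) can be illustrated with a toy simulation; the event trace and the +1-per-ACK step are simplifications for illustration.

```python
def aimd(events, cwnd=1):
    history = []
    for ev in events:
        if ev == "ack":            # additive increase: +1 per ACK (simplified)
            cwnd += 1
        elif ev == "timeout":      # multiplicative decrease: cut cwnd in half
            cwnd = max(1, cwnd // 2)
        history.append(cwnd)
    return history

print(aimd(["ack", "ack", "ack", "timeout", "ack"]))  # [2, 3, 4, 2, 3]
```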


TCP Tahoe
Includes all the mentioned congestion control mechanisms, except fast recovery.

TCP Reno
Includes all of them, plus an optimization known as header prediction (optimizing for the common case that segments arrive in order).

TCP Vegas
The end hosts try to predict congestion by looking at changes in the sending rate. It compares the measured throughput rate with the expected one.

Transfer JSON encoded files in PHP with CURL

CLIENT (makes request)
  • How do we transform a PHP object into JSON?
          $doc = json_encode($request) returns a string containing the JSON representation of $request.
  • How do we use the POST method to reach a page in a different server (plus there is no form in the file that builds the Json file)?
          We use CURL to transfer the file using POST (POST is defined in the options)

          curl_setopt($curl_handle, CURLOPT_POST, TRUE);

          curl_setopt($curl_handle, CURLOPT_POSTFIELDS, $doc);
  • How do we get the response that the server returns (saying if the request was successful)?
          When we execute the transfer with cURL we get a response:

            $buffer = curl_exec($curl_handle);

           $response = json_decode($buffer);
  • Why is the response a JSON doc? --> Because that’s how the server built it
  • Steps:
  1. Create the object (associative array) in PHP
  2. Encode it into a JSON document:   $doc = json_encode($request);
  3. Transmit it using cURL:                 $buffer = curl_exec($curl_handle);

SERVER (processes the request, returns response)
  • Since we don’t have the $_SERVER variable, how do we get the file that is being transferred by cURL?
           We use $doc = file_get_contents('php://input');
    • php://input allows you to read raw POST data
    • file_get_contents reads an entire file into a string
  • How do we transform the string into something that we can process with PHP?
    • json_decode($doc) takes a JSON encoded string and converts it into a PHP variable
    • How do we transmit the response:
      • echo json_encode($response);

    What’s different in JavaScript?

    1. It is case sensitive.

    2. JavaScript is a loosely typed language (we don’t need to define the type of the variables).

    3. Variables are declared with the keyword “var”.

    4. Unlike C, C++, and Java, JavaScript does not have block-level scope. All variables declared in a function, no matter where, are defined throughout the function.

    var scope = "global";

    function f(){
       alert(scope); //Displays "undefined", not "global"
       var scope = "local";
       alert(scope); //Displays "local"
    }

    5. When you declare a global variable, what you are actually doing is defining a property of the global object.

    6. In client-side JS, the window object serves as the global object.

    7. Objects map property names to arbitrary property values

    8. In JavaScript, a function is a special kind of object that has executable code associated with it. Since functions are objects, they have properties and methods.

    9. JavaScript supports function literals. They are just like other functions, except that they do not have a name. They can appear within other JavaScript functions:

    var square = function(x) { return x*x; }

    10. Functions are not only syntax, but also data, which means that they can be assigned to variables, stored in the properties of objects or the elements of arrays, returned from functions, passed as arguments to functions, etc.

    Functions can be assigned to an object’s properties. Example:

    var dog = {};

    dog.color = function() { … }

    In fact, a method is nothing more than a function that is stored in a property of an object, and invoked through that object.

    When a function is invoked as a function rather than as a method, the “this” keyword refers to the global object.

    11. JS functions can be written so that they work with any number of arguments (p. 129).

    12. Functions run in the scope in which they are defined, not the scope from which they are executed (p. 141).

    13. Functions are a combination of code to be executed and the scope in which to execute them. This is known as “closure”.

    14. JS is a true object-oriented language that uses prototype-based inheritance instead of class-based inheritance.

    15. Objects in JS may have any number of properties, and properties may be dynamically added to an object.

    16. Objects are usually created with the keyword “new”.

    17. It supports an object literal syntax that allows you to create an object and specify its properties like this (no constructor needed):

    var point = {x:3, y:4};

    18. Primitive types are manipulated by value, and reference types (objects – including arrays and functions) are manipulated by reference.

    19. Strings are immutable.

    20. The power of client-side JavaScript is that it has access to the contents of the HTML document.

    Wednesday, July 14, 2010

    Java Collections Cheatsheet

    Main operations: Insert, Search, Delete, Iterate (visit all the items in a collection).

    Java Collections Framework provides three major types of collections: set, list, and map. They are defined in the interfaces Set, List, and Map.

    Set: no repeated elements.
    List: elements stored in insertion order (but we can also insert elements anywhere in the list).
    Map: stores key-value pairs.
    Tree: elements stored based on their natural order.


    CONCEPTS

    • Array: Only built-in structure. Fixed size. Quick insertion (at the end). Slow search (unless we know the element’s index). Slow deletion.
    • Stack: Last-In First-Out (LIFO). Slow access to other items (except last one).
    • Queue: First-In First-Out (FIFO). Slow access to other items (except first one).

    (Stacks and Queues are usually implemented by arrays or linked lists with the constraints of FIFO or LIFO. There is no need for a custom implementation in the Java Collections Framework).

    • Linked-list: The list contains only a reference to the first item. Each item contains a reference to the next one. Quick insertion (at the beginning). Slow deletion (except at the beginning). Slow search.
    • Set: Group of elements with no duplicates. Usually implemented by a HashTable (=Map) where the keys have no meaning, or a Tree.
    • Map: collection of key-value pairs. Usually implemented by HashTable (Array + LinkedLists, for example). Quick insertion. Quick search if key is known. Slow deletion. Inefficient memory usage (good if there won’t be too many deletions).
    • Binary Tree: Only two children per node. The tree only contains a reference to the root (the first node). Each node contains a reference to its left and right nodes.
    • Binary Search Tree: the nodes are ordered by key value. When we insert an item, the tree looks for the right position to insert it. Quick search, insertion, deletion (if the tree is balanced). It might become unbalanced if the elements are inserted in order.
    • Red-Black Tree: balanced Binary Search Tree.


    IMPLEMENTATIONS (in Java)

    Interfaces

    • Collection
    • List: Collection whose elements can be traversed in insertion order.
    • Queue: Collection that maintains its elements in processing order.
    • Set: Collection that cannot contain duplicate objects.
    • SortedSet: Set whose elements are ordered. Extended by NavigableSet.
    • Map: Collection of key-value pairs. Each key maps to at most one value (does not allow duplicate keys). Both the keys and the values must be objects, so primitive values will be wrapped.
    • SortedMap: Map whose keys are ordered. Extended by NavigableMap.

    Classes

    Implement List:

    • LinkedList: elements will be traversed in insertion order. Elements can be accessed, added, and removed efficiently at either end of the list (it’s a doubly linked list). It also implements Queue and Deque.
    • ArrayList: elements will be traversed in insertion order. Elements can be accessed efficiently by index (unlike in LinkedList).
    • Vector: elements will be traversed in insertion order. Resizable array. Synchronized.

    Implement Queue (elements in processing order):

    • PriorityQueue: elements will be accessed according to priority order.
    • ArrayDeque: elements will be accessed according to either FIFO or LIFO processing order. It also implements Deque.

    Implement Set (unique elements):

    • HashSet: no particular order. It’s a HashTable with no duplicates (we don’t know and don’t care about the key that maps to each value).
    • LinkedHashSet: + elements traversed in insertion order.
    • TreeSet: elements will be traversed in the order specified by the compareTo method of the elements = element order. It uses a balanced binary tree, so all operations are efficient. It also implements NavigableSet.

    Implement Map (unique keys):

    • HashMap: no particular order. HashTable where each key maps to a value (we can retrieve both keys and values).
    • LinkedHashMap: + elements will be traversed in key insertion order.
    • IdentityHashMap: + compares keys using == instead of the equals method.
    • Hashtable: no particular order. Synchronized.
    • TreeMap: elements will be traversed in the order specified by the compareTo method of the keys = key order. It uses a balanced binary tree, so all operations are efficient. It also implements NavigableMap.

    If the elements have a hashCode method (overridden consistently with equals), they can be used as HashSet elements or HashMap keys.

    If the elements have a compareTo method (i.e., implement Comparable), they can be used as TreeSet elements or TreeMap keys.

    Do software patents help or harm progress?

    The following is a paper I wrote for my Computer Ethics class during my master's at LUC.

    -----------------------------------------------------------------------------------------------------------------

    The intention of the patent system is to protect the small inventor and thus promote invention. Unfortunately, in the case of software patents, it doesn't work that way. Small inventors, as we'll see, get more harm than good from the current system of software patents, and innovation is hampered instead of promoted.

    The first problem with software patents is that they are not about patenting a single implementation of an idea (as in the pharmaceutical industry, where a patent is granted for one chemical formula); they are about patenting the idea itself. And the patent grants a monopoly over the use of that idea for a long period of time. As anybody familiar with software development knows, a program is usually not based on any one single idea, but on a compound of ideas, most of them already in use.

    The second problem is the way in which the Patent Office bases its decision for granting a patent in this area. Firstly, they try to find out if there is any prior art regarding the patent application. But this is a very difficult task with software patents because a lot of prior art is in the form of books, which are not included in their database, or it is just part of the industry's "folklore", which means it's not written anywhere but everybody knows about it. Secondly, they consider whether the invention to be patented is non-obvious, which is something very subjective, and difficult to determine by a person who is not a professional in the software industry.

    The third problem, and maybe the most important one, is that it is almost impossible to know how many patents may cover a program, due to the great number of them, the obscure language they are written in, the lack of a good classification, and the fact that pending patent applications are secret (but retroactive). Small companies can easily infringe someone else's patent without even knowing it. Trying to find out if a program infringes any patents is costly, and the legal process of defending yourself when being sued for patent infringement is even costlier. Most small and medium companies cannot afford it.

    If a company wants to use an idea that has been patented by someone else, it can try to license it. The owner of the patent has no obligation to license it, though, even if he is not using it. And when the owner does license the patent, it is usually for a sum around 5% of gross sales of the software product. This is already a very high price, which might make an inventor think twice before deciding to commercialize a software product; and if the program needs more than one license, it is almost impossible to get it to market at all. The profit margin is gone.

    Big companies can avoid the hindrance to progress that software patents represent thanks to cross-licensing, but small companies and inventors cannot. They don't own a large enough number of patents to force big companies to cross-license with them when they want to use a patent that a large company owns. On the other hand, it is very difficult for these small inventors to benefit from patents of their own, because their programs probably infringe the patents of big companies, who can force them to cross-license. The only case in which a small company cannot be forced to cross-license is when that company is a patent troll.

    As we can see, the group that the software patent system is supposed to benefit, small inventors, does not benefit at all. They don't have enough resources to participate in the cross-licensing game, to confront a claim of patent infringement, or to conduct the exhaustive research needed to make sure they don't infringe anybody else's patent.

    Big companies benefit more than they are harmed. They can force small companies to cross-license with them, thus being able to use their ideas at a very low cost, and it is not usually a problem for them if an idea has been patented by another big company, because they can cross-license there as well. Unfortunately, this does not mean that their investment in research and development will be higher; big companies already own vast patent pools that they can use as leverage.

    One last issue to consider is that the investment required in software development is not so high that the idea must be protected for a long time for the creator to recoup it, on pain of nobody inventing any more. Software development has the unusual quality of requiring very little investment to produce and commercialize.

    There are two groups that clearly benefit from the current patent system: patent trolls and patent lawyers. These, however, are not the groups the system was intended to protect, and their contribution to innovation is close to null.

    Finally, we have the general public. This group is the biggest one, and we think it would benefit from the modification or suppression of the current patent system. Under the current system the supply of programs is more limited than it would be if patents didn't exist, and thus prices are higher. If there were no software patents, there would be a lot more cheap, high-quality software solving many of the issues that companies and individuals face in different areas of their lives.

    From a utilitarian perspective we can conclude that the way the current system is implemented in the US is not good: the disadvantages outweigh the advantages.

    One solution would be to get rid of software patents completely. Another is the one proposed by John Preston, President and Chief Executive Officer of Continuum Energy Technologies LLC and Senior Lecturer at the Massachusetts Institute of Technology, in which the patent holder must license the patent for a pre-established fee. This would eliminate the advantage that big companies, with their big patent pools, hold over small ones.

    There are also efforts that seem to go in the direction of improving the patenting process. For example, Google is scanning millions of books, which might help in the search for prior art. Google also created a patent search engine, which might help inventors find the patents that cover their programs. Nevertheless, these are limited solutions, and a change in the administration's policies is needed to reach a system that is fair for everybody: big and small inventors alike, as well as the general public. Our society can greatly benefit from advancements in software, and software has the unique qualities of zero cost of reproduction and near-zero barriers to creativity, which could make for an exponential development of the world we live in.