Mango12 created an interesting competition to kick off the New Year, and I decided to try it out. It’s a simple task along the lines of a code kata, and I recommend you try it yourself before looking over my solution.
Task: Compress a list of words into the shortest string that contains all words.
Test: “testing”, “ginger”, “german”, “minutes” should become “minutestingingerman”
Here is my approach: create a weighted graph connecting each term then recursively reduce the highest weighted edges.
The test to see if it works begins with four terms.

Two different edges can be reduced. The test won’t be affected by which one I choose. If I optimize this in the future, it would be prudent to check the nodes and merge the two with no other good matches.

The choice is clear this time.

After the final merge, we end up with minutestingingerman.
I started off with a test to prove that it works.
[Test]
public void ShouldCompressListOfWordsToShortestStringThatContainsAllWords()
{
var terms = new[] {"testing", "ginger", "german", "minutes"};
StringGraph graph = new StringGraph(terms);
Assert.AreEqual("minutestingingerman", graph.Compress());
}
I then created a StringGraph class with weighted nodes. The Node class contain edges and a function to calculate weight when a node is added. In building the solution, I encountered a problem where I was mixing mutable and immutable methods. If I return to this in the future, I should decide which direction I want to go in. I decided to just make it work for the time being.
Since I have a myriad of classes for building the graph, I will focus on a few important points. The solution is available here if you’re interested in trying it out for yourself.
Calculating Weight
I created a Node class that contains two generics: one for the value type and another for the weight type. To create edges, nodes are added and the weight must be calculated at that time. I decided to make the weight calculation a function that can be assigned by creator. The default function will set the weight to the default of the weight type.
public class Node<T, TWeight> : INode<T, TWeight>
{
private List<IEdge<T, TWeight>> edges = new List<IEdge<T, TWeight>>();
Func<INode<T, TWeight>, INode<T, TWeight>, TWeight> calculateWeight;
public Func<INode<T, TWeight>, INode<T, TWeight>, TWeight>
CalculateWeight
{
get
{
if (calculateWeight == null)
{
calculateWeight = (n1, n2) => default(TWeight);
}
return calculateWeight;
}
set
{
calculateWeight = value;
}
}
public IEnumerable<IEdge<T, TWeight>> Edges
{
get
{
return edges;
}
private set
{
edges = new List<IEdge<T, TWeight>>(value);
}
}
public T Value { get; private set; }
public Node(T value)
{
this.Value = value;
}
public void Add(INode<T, TWeight> node)
{
var edge = new Edge<T, TWeight>(this, node,
CalculateWeight(this, node));
edges.Add(edge);
}
}
StringGraph assigns the function which checks for the overlap at the end of the parent node and the beginning of the child node.
calculateWeight = (node, newNode) =>
{
int maxLength = node.Value.Length < newNode.Value.Length ?
node.Value.Length : newNode.Value.Length;
for (int i = node.Value.Length - maxLength; i < maxLength; i++)
{
if (node.Value.EndsWith(
newNode.Value.Substring(0, maxLength - i)))
{
return maxLength - i;
}
}
return 0;
};
The values generated by this function correlate with the numbers in the graphs above.
Reducing the Graph
The Reduce method is meant to determine two nodes that should be merged and create a new, smaller graph. To make my test turn green, I wrote the following method.
public StringGraph Reduce()
{
if (nodes.Count <= 1)
{
return this;
}
var edges = from n in nodes
from e in n.Edges
select e;
var edge = edges.First(n => n.Weight == edges.Max(e => e.Weight));
return new StringGraph(nodes.Select(n => n.Value)
.Where(str => str != edge.Parent.Value &&
str != edge.Node.Value)
.With(edge.Combine().Value));
}
Although I didn’t put all the necessary error checking in this program, I felt it important that someone does not try to reduce a 1 or 0 node graph. I then pull up every edge, obtain the first one with the highest weight, then combine the nodes it connects.
This part of the code could use some optimization. I am creating an entirely new graph, which may be correct in a class library where immutable types are expected. However, StringGraph is far from immutable, and there is overhead from making it build up a new graph. If I wanted to generate a reduced graph without affecting the current instance, it may be better to implement cloning and a better node merging algorithm. The edges going to the parent and from the child will typically be the same.
More Improvements
There are other improvements that can be made. I’m not currently detecting and removing nodes that represent inner strings, and there are likely better algorithms for NP problems. This four string example is rather simplistic, and there is much more ground that can be covered. Give it a shot and create your own solution, or improve upon the one I’ve presented. Be sure to tell me your approach in the comments!