Bookmark and Share

Shortest String That Contains All Words

Mango12 created an interesting competition to kick off the New Year, and I decided to try it out. It’s a simple task along the lines of a code kata, and I recommend you try it yourself before looking over my solution.

Task: Compress a list of words into the shortest string that contains all words.

Test: “testing”, “ginger”, “german”, “minutes” should become “minutestingingerman”

Here is my approach: create a weighted graph connecting each term then recursively reduce the highest weighted edges.

The test to see if it works begins with four terms.

image

Two different edges can be reduced. The test won’t be affected by which one I choose. If I optimize this in the future, it would be prudent to check the nodes and merge the two with no other good matches.

image

The choice is clear this time.

image

After the final merge, we end up with minutestingingerman.

I started off with a test to prove that it works.

[Test]
public void ShouldCompressListOfWordsToShortestStringThatContainsAllWords()
{
    var terms = new[] {"testing", "ginger", "german", "minutes"};
    StringGraph graph = new StringGraph(terms);
    Assert.AreEqual("minutestingingerman", graph.Compress());
}

I then created a StringGraph class with weighted nodes. The Node class contain edges and a function to calculate weight when a node is added. In building the solution, I encountered a problem where I was mixing mutable and immutable methods. If I return to this in the future, I should decide which direction I want to go in. I decided to just make it work for the time being.

Since I have a myriad of classes for building the graph, I  will focus on a few important points. The solution is available here if you’re interested in trying it out for yourself.

Calculating Weight

I created a Node class that contains two generics: one for the value type and another for the weight type. To create edges, nodes are added and the weight must be calculated at that time. I decided to make the weight calculation a function that can be assigned by creator. The default function will set the weight to the default of the weight type.

public class Node<T, TWeight> : INode<T, TWeight>
{
    private List<IEdge<T, TWeight>> edges = new List<IEdge<T, TWeight>>();
    
    Func<INode<T, TWeight>, INode<T, TWeight>, TWeight> calculateWeight;

    public Func<INode<T, TWeight>, INode<T, TWeight>, TWeight> 
                                               CalculateWeight { get { if (calculateWeight == null) { calculateWeight = (n1, n2) => default(TWeight); } return calculateWeight; } set { calculateWeight = value; } } public IEnumerable<IEdge<T, TWeight>> Edges { get { return edges; } private set { edges = new List<IEdge<T, TWeight>>(value); } } public T Value { get; private set; } public Node(T value) { this.Value = value; } public void Add(INode<T, TWeight> node) { var edge = new Edge<T, TWeight>(this, node,
                                        CalculateWeight(this, node)); edges.Add(edge); } }

StringGraph assigns the function which checks for the overlap at the end of the parent node and the beginning of the child node.

calculateWeight = (node, newNode) =>
    {
        int maxLength = node.Value.Length < newNode.Value.Length ?
            node.Value.Length : newNode.Value.Length;
        for (int i = node.Value.Length - maxLength; i < maxLength; i++)
        {
            if (node.Value.EndsWith(
                  newNode.Value.Substring(0, maxLength - i))) { return maxLength - i; } } return 0; };

The values generated by this function correlate with the numbers in the graphs above.

Reducing the Graph

The Reduce method is meant to determine two nodes that should be merged and create a new, smaller graph. To make my test turn green, I wrote the following method.

 

public StringGraph Reduce()
{
    if (nodes.Count <= 1)
    {
        return this;
    }

    var edges = from n in nodes
                from e in n.Edges
                select e;
    var edge = edges.First(n => n.Weight == edges.Max(e => e.Weight));
    
    return new StringGraph(nodes.Select(n => n.Value)
                                .Where(str => str != edge.Parent.Value && 
                                                    str != edge.Node.Value) .With(edge.Combine().Value)); }

Although I didn’t put all the necessary error checking in this program, I felt it important that someone does not try to reduce a 1 or 0 node graph. I then pull up every edge, obtain the first one with the highest weight, then combine the nodes it connects.

This part of the code could use some optimization. I am creating an entirely new graph, which may be correct in a class library where immutable types are expected. However, StringGraph is far from immutable, and there is overhead from making it build up a new graph. If I wanted to generate a reduced graph without affecting the current instance, it may be better to implement cloning and a better node merging algorithm. The edges going to the parent and from the child will typically be the same.

More Improvements

There are other improvements that can be made. I’m not currently detecting and removing nodes that represent inner strings, and there are likely better algorithms for NP problems. This four string example is rather simplistic, and there is much more ground that can be covered. Give it a shot and create your own solution, or improve upon the one I’ve presented. Be sure to tell me your approach in the comments!

blog comments powered by Disqus

KodefuGuru.GetInfo()

Chris Eargle
LinkedIn Twitter Technorati Facebook

Chris Eargle
Telerik Developer Evangelist, C# MVP

JustCode

Telerik .NET Ninja

 

INETA Community Speakers Program

 

MVP - Visual C#

 

Friend of RedGate

World Map

Month List

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010
Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.