What is the difference between the chars and the wordChars parameters? The algorithm has to know which characters form a word for two reasons. Second, the decoding depends on the type of character: the algorithm constrains words to those contained in a dictionary, while it allows arbitrary strings of non-word characters.

Choosing just one best candidate might be suitable for the current time step, but when we construct the full sentence, it may be a sub-optimal choice. Beam search also only uses the NN output, but it uses more information from it and therefore produces a more accurate result.

Beam search with character-LM: "A randan number: 1234." It additionally scores character sequences (e.g. "an" is more probable than "oi"), which further improves the result.
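
To make the idea of scoring character sequences more concrete, here is a minimal sketch of a character-bigram model in Python. It is an illustration only, not the decoder's actual implementation: the tiny corpus, the add-one smoothing, and the helper names `train_char_bigrams`, `bigram_prob` and `sequence_log_prob` are all assumptions made for this example.

```python
import math
from collections import Counter

def train_char_bigrams(text):
    """Count character bigrams and how often each character starts a bigram."""
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    return bigrams, contexts

def bigram_prob(bigrams, contexts, first, second, alphabet_size):
    """Estimate P(second | first) with add-one smoothing (an assumption here)."""
    return (bigrams[(first, second)] + 1) / (contexts[first] + alphabet_size)

def sequence_log_prob(text, bigrams, contexts, alphabet_size):
    """Sum the log-probabilities of all character bigrams in text."""
    return sum(
        math.log(bigram_prob(bigrams, contexts, a, b, alphabet_size))
        for a, b in zip(text, text[1:])
    )

# Hypothetical toy corpus; a real character-LM is trained on far more text.
corpus = "a random number and another random answer"
bigrams, contexts = train_char_bigrams(corpus)
alphabet_size = len(set(corpus))

p_an = bigram_prob(bigrams, contexts, "a", "n", alphabet_size)
p_oi = bigram_prob(bigrams, contexts, "o", "i", alphabet_size)
print(p_an > p_oi)  # "an" occurs in the corpus, "oi" does not

score_good = sequence_log_prob("number", bigrams, contexts, alphabet_size)
score_bad = sequence_log_prob("nunber", bigrams, contexts, alphabet_size)
```

In a beam search decoder, a score of this kind would typically be combined with the NN output when ranking the beams, so that candidates containing unlikely character pairs are pushed down.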

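The role of the word characters described above can be sketched as follows. This is a simplified, hypothetical check applied to a finished text, assuming a tiny dictionary and character sets chosen for the example; the names `split_runs` and `is_allowed` are made up here, and the real algorithm applies the dictionary constraint during decoding rather than afterwards.

```python
import string
from itertools import groupby

# Assumed example values: CHARS is everything the NN can output,
# WORD_CHARS is the subset of CHARS that words are built from.
CHARS = set(string.ascii_letters + string.digits + " :.")
WORD_CHARS = set(string.ascii_letters)
DICTIONARY = {"A", "random", "number"}

def split_runs(text, word_chars):
    """Split text into maximal runs of word and non-word characters."""
    return [
        (is_word, "".join(group))
        for is_word, group in groupby(text, key=lambda c: c in word_chars)
    ]

def is_allowed(text, chars, word_chars, dictionary):
    """Words must be in the dictionary; non-word runs may be arbitrary."""
    if any(c not in chars for c in text):
        return False
    return all(
        run in dictionary
        for is_word, run in split_runs(text, word_chars)
        if is_word
    )

print(split_runs("A random number: 1234.", WORD_CHARS))
print(is_allowed("A random number: 1234.", CHARS, WORD_CHARS, DICTIONARY))
print(is_allowed("A randan number: 1234.", CHARS, WORD_CHARS, DICTIONARY))
```

The non-word run ": 1234." passes without any dictionary lookup, while "randan" is rejected because it consists of word characters but is not a dictionary word.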