Log in

Semagic bugfixing: XMPlay music detection - 'Twas brillig, and the slithy toves did gyre and gimble in the wabe [entries|archive|friends|userinfo]

[ website | Beware the Jabberwock... ]
[ deviantArt | the-boggyb ]
[ FanFiction | Torkell ]
[ Tumblr | torkellr ]

[Random links| BBC news | Vulture Central | Slashdot | Dangerous Prototypes | LWN | Raspberry Pi]
[Fellow blogs| a Half Empty Glass | the Broken Cube | The Music Jungle | Please remove your feet | A letter from home]
[Other haunts| Un4seen Developments | Jazz 2 Online | EmuTalk.net | Feng's shui]

Semagic bugfixing: XMPlay music detection [Saturday 8th October 2016 at 9:11 pm]

[Feeling |accomplishedaccomplished]
[Playing |Fluke - Snapshot [Puppy]]

I'd forgotten just how primitive the string parsing in C is.

One bug in Semagic is to do with the music detection for XMPlay: if a track has an Artistsort field and that happens to be returned before the Artist field, then Semagic will pick up that one instead. Actually it's more fun than that, because it'll actually include a few extra characters and return something like "ort Enigma".

XMPlay provides this information in one massive string where each tag is on its own line and has a tab separating the name and the value. Semagic searches for a tag by comparing the start of each line against the tag name, which is where the bug occurs as if you're only comparing the first 6 characters then "artist" and "artistsort" are equal.

Now, if I was coding this up from scratch I'd parse the entire string and dump it into a hash map. Probably with something like this if I was using Java:

String[] lines = input.split('\n');
for (String line in lines) {
  String[] tokens = line.split('\t', 2);
  if (tokens.length == 2) {
    tagMap.put(tokens[0], tokens[1]);

But unfortunately I'm using C and so have to roll my own string manipulation. I also have to do my own memory management which is even more annoying... and I can't be bothered with all that malarkey, so I'll try bodging a fix by checking if the field name is followed by a tab. A quick recompile... and it works!

If anyone's interested, here's the patch (apply it against v1.8.0.1):

Huh. I've spotted another bug - Semagic picks the first matching entry, but there can be multiple ones. In particular if a MP3 file has both ID3v1 and ID3v2 tags then it'll find the ID3v1 tag which could very well be truncated (because somebody thought 30 characters would be enough for anybody). That'll be much trickier to fix without rewriting the string parsing.
Link | Previous Entry | Share | Next Entry[ 3 pennies | Penny for your thoughts? ]

[User Picture]From: olego
Thursday 13th October 2016 at 5:06 am (UTC)
-if (!_tcsncicmp(token,p,token_len) && !_tcsncmp(p + token_len, _T("\t"), 1)){
+if (!_tcsncicmp(token,p,token_len) && p[token_len] == _T('\t')) {

A bit more legible ;-).
(Reply) (Thread)
[User Picture]From: boggyb
Thursday 13th October 2016 at 5:50 pm (UTC)
You can probably tell these days I work with languages that actually understand strings :)
(Reply) (Parent) (Thread)
[User Picture]From: olego
Thursday 13th October 2016 at 6:28 pm (UTC)
In the Visual C++ team, we're still fairly close to the hardware. C++ is now a lot better about strings, it has the class string with a .length() function, but it's still fairly clumsy to do things like string.replaceAll:


And super-clumsy to do =~ s/A/B/:

(Reply) (Parent) (Thread)