Sublime Forum

ST3 spell checker confused by punctuation

#1

Every word that touches a single or double curly quote or an em dash (or en dash?), or that contains a nice curly apostrophe, is tagged by the spell checker as misspelled. This may be true of other punctuation as well. It’s hard to write a properly spelled and punctuated Web article when Sublime Text claims that every other word is misspelled. I end up having to spell check a sea of red squigglies by eye, which sort of defeats the purpose of having a spell checker.

Since the spell checker DOES handle straight quotes, apostrophes, and ordinary hyphens correctly, the fix may be as simple as adding characters to an internal list of characters to strip out, or mapping these characters to their ASCII equivalents inside the spell checker before looking words up in the dictionary.
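
To illustrate the mapping idea, here is a minimal Python sketch. It is not how Sublime Text works internally; `PUNCT_MAP` and `normalize_for_lookup` are names made up for this example, and the character list is only a starting point.

```python
# Hypothetical mapping from typographic punctuation to ASCII equivalents.
# Illustrative only; not Sublime Text's actual implementation.
PUNCT_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / curly apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
})


def normalize_for_lookup(word):
    """Map curly punctuation to ASCII, then strip it from both ends of the word."""
    return word.translate(PUNCT_MAP).strip("\"'-")
```

With this, a word wrapped in curly quotes and containing a curly apostrophe would reach the dictionary as a plain `don't`, which the spell checker already seems to handle.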

0 Likes

#2

I have similar problems with how the spell checker decides what is a word. All these false positives make it very hard to notice the actual typos. Are there any workarounds?

For those interested, the theme is Material. http://equinusocio.github.io/material-theme/

Correcting misspellings is also a problem. For example, elysium should be capitalized, but because Sublime Text thinks the word is elysiu, it corrects it to Elysiumm with an extra m.

0 Likes

#3

I’m not sure what markup language/syntax that is; however, it looks very similar to https://github.com/sublimehq/Packages/issues/103.

It may be possible to fix via a simple tweak to the tmLanguage/sublime-syntax file. I haven’t fixed it that way with the Markdown package yet since my hope is to see if I can solve it at a higher level.

0 Likes

#4

This file uses the AsciiDoc package but I see the same issues with MarkdownEditing. By higher level, do you mean in Sublime Text itself, rather than in a package?

In this screenshot all the words between square brackets are in the dictionary.

Additionally, it messes up the correction or ignoring of words. E.g., correcting “soo” to “soon” produces “soonn”.

The words between quotes are actually not a problem, even though the squiggles extend further than they should. Correcting edmonton to capitalize it does as I’d expect.

0 Likes

#5

I just submitted a PR to the AsciiDoc package to fix this. https://github.com/SublimeText/AsciiDoc/pull/11

The issue is that the syntax definition used a capture group where it didn’t need one, which affected how Sublime Text tokenized, and thus spell-checked, the text.

I’ve also pushed changes to the Markdown package at https://github.com/sublimehq/Packages that fix the same kind of spelling issues for link text and image descriptions.

0 Likes

#6

Is this related? https://github.com/SublimeTextIssues/Core/issues/937

0 Likes

#7

Dev Build 3096 contains a fix:

“Syntax: .sublime-syntax files with unused captures no longer cause spellcheck errors.”

That and/or wbond’s change fixes the major issue. Both AsciiDoc and Markdown files are looking much better to me.

The only problem I’m still seeing in AsciiDoc is that multiple words are considered a single word in some cases. For example, tags=main or include::{sourcedir (the closing brace } isn’t included in the word for whatever reason).

Do you think that’s a core issue or something that can be fixed in AsciiDoc?

0 Likes

#8

Can you elaborate on this?

0 Likes

#9

My gut reaction is that it can be fixed in AsciiDoc.

For some regular expressions in syntax definitions, a group is needed for repetition or backreferences, but not for assigning a scope name. Generally, Sublime Text turns every capture group into a separate token, since 99% of the time, users want a different scope name for each capture group.

Spell checking is performed on tokens. In AsciiDoc and Markdown, the final character of the affected text was matched by a capture group without a scope name, which created a separate token, so the word was spell-checked in two parts. This was originally fixed in the ST lexer back around build 3068 by not creating a separate token when a capture group is not assigned a scope name. However, a subtle bug appeared when .sublime-syntax was introduced, which caused a regression. The regression was fixed in 3098.
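
As a rough illustration of the effect described above, here is a toy Python model. It is not the ST lexer; `tokenize`, `scoped_groups`, and `split_on_unscoped` are hypothetical names, and the regex is made up to mimic a pattern whose group captures only the last character.

```python
import re


def tokenize(match, scoped_groups, split_on_unscoped):
    """Split the matched text into tokens at capture-group boundaries.

    Toy model: `scoped_groups` is the set of group numbers that carry a
    scope name. When `split_on_unscoped` is True, groups without a scope
    name also create token boundaries (the buggy behavior).
    """
    cuts = {match.start(), match.end()}
    for g in range(1, match.re.groups + 1):
        if match.start(g) == -1:
            continue  # this group did not take part in the match
        if g in scoped_groups or split_on_unscoped:
            cuts.update(match.span(g))
    bounds = sorted(cuts)
    return [match.string[a:b] for a, b in zip(bounds, bounds[1:])]


# Group 1 captures only the final character and has no scope name.
m = re.search(r"\w+(\w)(?=\])", "[elysium]")

buggy = tokenize(m, scoped_groups=set(), split_on_unscoped=True)
fixed = tokenize(m, scoped_groups=set(), split_on_unscoped=False)
```

In this model `buggy` holds two tokens, so a spell checker working per token would see a truncated word plus a stray letter, while `fixed` holds the whole word.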

All of that is to explain what was fixed in Sublime Text itself. What you are describing sounds like the syntax definition may not be assigning different scopes to parts of the syntax. I would expect the = to have a different scope than the two words around it.

0 Likes

#10

I’m afraid there are still some issues with punctuation. The following screenshot uses plain text, as I have no package installed for Go’s play format. This is the latest Sublime Text build, 3100, from today, and it doesn’t pick out several words (obviously func should be in my ignore list; that one it picks out fine).

0 Likes

#11

The misspellings seem to be triggered by various uncommon punctuation. For example, ^, ` and =.

Plain Text is actually more of a problem here than a help, since this isn’t plain text but a markup language. Plain text does not create scopes for different syntax, so tokenizing is done on high-level English punctuation. Normally ^, etc. are not part of prose.

If you use the correct syntax definition with the markup, and the syntax definition properly defines tokens, the spell checker should receive usable words, and should give accurate results.

If you provide some examples of misspellings in an AsciiDoc document, I should be able to suggest tweaks to the syntax that would help.

1 Like

#12

Hi Will,

Thanks for your help. I’ve opened an issue in the AsciiDoc bundle:

I find the whole system a bit odd, especially in the case of plain text. At least for English, contractions and maybe hyphens are the only place I can think of where context matters. Otherwise I don’t see any disadvantage to splitting on any and all non-letter characters for the purposes of spellcheck.

0 Likes

#13

Build 3104 includes significant changes to the spell checker, which should hopefully resolve spelling issues without having to modify language grammars.

Let me know how it works for you.

0 Likes

#14

@wbond, do you know why “it’s” in “Hello, it’s me” gets marked as a spelling error? It uses “smart quotes”. Is this dictionary-related? I’m asking so we can find a way to improve the dictionaries. The issue has been raised here: https://github.com/SublimeTextIssues/Core/issues/1090

0 Likes

#15

That should be resolved with 3104. We rewrote how the spell checker interacts with the tokens that were parsed by the syntax, so punctuation (other than an apostrophe between two letters) creates boundaries between words, resulting in them being checked separately.
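
A rough Python sketch of that boundary rule, for illustration only. `WORD_RE` and `split_words` are made-up names, and treating the curly apostrophe like the straight one is an assumption of this sketch, not a statement about what the build does.

```python
import re

# A word is a run of letters, optionally continued by an apostrophe
# (straight or curly) that sits directly between two letters.
# [^\W\d_] matches any Unicode letter.
WORD_RE = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def split_words(text):
    """Return the words to spell check; all other punctuation is a boundary."""
    return WORD_RE.findall(text)
```

Under this rule, `tags=main` would be checked as `tags` and `main`, and a quoted contraction would be checked without its surrounding quotes.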

Definitely let me know if you see any remaining issues with spell checking so we can get them resolved.

0 Likes

#16

OK, that’s nice. I saw it, and it is a huge improvement.

About the other error: I may have failed to express it correctly. The exact text

it’s

is marked as a spelling error because it uses a fancy apostrophe. I want to know if this is related to ST or to the dictionary files (I tested with many dictionaries, and none gets the word right). I added the following lines to the .AFF file of the British dictionary, and words with a fancy apostrophe now seem fine (you may need to remove the Cache and Index folders and restart):

ICONV 1
ICONV ’ '

Could this be handled by ST instead of having to edit all dictionaries? There are probably more kinds of fancy apostrophe; I could help put together a list.

With this solved, spell check would be almost perfect. There is a ton of text that uses fancy apostrophes.
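
If someone wants to try this with more characters, a small Python sketch like the one below could generate the lines for the .AFF file. The `APOSTROPHES` list and the `iconv_lines` helper are assumptions made up for this example, and the list is certainly not exhaustive.

```python
# Apostrophe-like code points to map to the ASCII apostrophe.
# Illustrative and incomplete; extend as needed.
APOSTROPHES = [
    "\u2019",  # right single quotation mark
    "\u2018",  # left single quotation mark
    "\u02bc",  # modifier letter apostrophe
    "\uff07",  # fullwidth apostrophe
]


def iconv_lines(chars, target="'"):
    """Build Hunspell ICONV lines: a count header, then one mapping per character."""
    lines = [f"ICONV {len(chars)}"]
    lines += [f"ICONV {c} {target}" for c in chars]
    return lines
```

Calling `iconv_lines(["\u2019"])` gives back the same two lines shown above, so the output can be pasted into the dictionary's .AFF file in the same way.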

1 Like

#17

Thanks. I went through the issue I reported for AsciiDoc using Dev Build 3107 and all the issues are fixed.

I’m seeing one place where it’s reporting a spelling error but probably shouldn’t. It’s for text that is italicized, such as _one true brace style_. It sees _one as a word that should be changed to one.

This is for AsciiDoc, and it also sometimes happens with the Markdown GFM bundle. It seems to happen more often if the italics aren’t closed. I didn’t notice a problem with emphasis *one true brace style*.

Personally, I think whether the italics/bold is closed shouldn’t affect spell check. That should be the realm of linters.

Also, I sometimes have crashes when adding a word. Today I added “stroganoff” to the dictionary, but it crashed and failed to add the word. After reopening Sublime I added the word again and it worked fine. Is there a crashlog or a process to report crashes?

I’ve seen the issue with smart quotes before, but I try to use plain quotes and plain apostrophes in all my text and let tools make them smart in the final output, so I haven’t verified that issue in the latest build.

0 Likes

#19

Currently I have Sublime Text crashing while typing long text files. You can find this issue on the Core issue tracker, and you can also report yours there:

  1. https://github.com/SublimeTextIssues/Core/issues/1832 Several crashes while typing text

Perhaps it is related. Also I know of these other crash issues:

  1. ST3 crashes frequently since Build 3124
  2. Debug symbols for Sublime Text build 3126 to figure out from where crash is coming
  3. ST3 3017 random crashes while typing
  4. Sublime Text 3 constantly crashes on Windows 10
  5. ST3 Build 3118 crash when using

0 Likes

#20

most likely it was this

1 Like