Tuxmaster
just joined
Topic Author
Posts: 8
Joined: Sun Dec 03, 2023 11:27 am

UTF-8 representation problem?

Sat Apr 06, 2024 4:46 pm

Hi,
I have noticed that UTF-8 characters work in the web UI but fail in the SSH console.
For example, edit a description text in the web UI and insert an "ä".
The character is shown correctly in the web UI.
But in the SSH console, it is printed as "�".
So I think either the SSH console should display UTF-8 characters correctly, or the web UI should reject them.
Tested with RouterOS 7.14.2.
 
Amm0
Forum Guru
Posts: 3605
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: UTF-8 representation problem?

Sat Apr 06, 2024 7:26 pm

MikroTik should for sure clarify this better.

AFAIK, in webfig you're actually NOT allowed to use the full UTF-8 charset. While the web page does use UTF-8 for Unicode input, I think it's transformed into the [pre-Unicode] Windows-1252 codepage for storage when submitted, and then converted back to UTF-8 for display in webfig. It's not stored as UTF-8. So if you enter an é or ü in a winbox comment, it should display the same in webfig (even though the web page returns it as UTF-8). But if the character (e.g. an emoji) is not in the Windows-1252 code page, it won't be accepted.

And to your point, SSH does not do the same CP1252 <=> UTF-8 conversion that webfig does. For SSH, they take the approach of showing a hex string, e.g. something like éü will show as E9FC, which are the CP1252 ("Latin1") high-ASCII codes for those characters. If you see "�" in SSH, that's actually the Unicode replacement character for invalid input (there is no valid UTF-8 sequence, just a byte above 127, which is what happens when Windows-1252 bytes arrive where UTF-8 is expected).
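A minimal Python sketch of that mismatch, assuming (as guessed above, not confirmed by MikroTik) that the text is stored as Windows-1252 bytes and that the terminal decodes whatever arrives as UTF-8. The function name ssh_view is made up for illustration:

```python
def ssh_view(text: str) -> tuple[str, str]:
    """Return (hex string of the stored bytes, what a UTF-8 terminal shows)."""
    # Assumption: RouterOS keeps the value as single-byte Windows-1252.
    stored = text.encode("cp1252")
    hex_string = stored.hex().upper()
    # A UTF-8 terminal cannot decode lone bytes above 0x7F, so each one
    # is shown as U+FFFD, the Unicode replacement character.
    misread = stored.decode("utf-8", errors="replace")
    return hex_string, misread


hex_string, misread = ssh_view("éü")
print(hex_string)  # E9FC
print(misread)     # two U+FFFD replacement characters
```

The same call with "ä" gives the single byte E4 and one replacement character, which matches what the first post describes.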

For reference, these are the "allowed" UTF-8 characters:
Screenshot 2024-04-06 at 9.10.54 AM.png
This mainly applies to comments. For other things it gets more confusing, since you can "manually" enter UTF-8 byte sequences into a string using scripting. E.g. an SSID will show the full UTF-8 range, but the UTF-8 must be provided via the CLI as raw bytes escaped into a RouterOS string, e.g. /interface wireless set ([find]->0) ssid="\E2\9D\8C" to make an SSID named: ❌
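To avoid working out those bytes by hand, here is a small Python sketch that builds the escaped string from ordinary text. The helper name ros_escape is hypothetical, and which ASCII characters are left unescaped is my own conservative choice rather than anything from the RouterOS docs:

```python
def ros_escape(text: str) -> str:
    """Encode text as UTF-8 and write every non-trivial byte as a \\XX escape."""
    parts = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Keep plain ASCII letters, digits and a few safe symbols readable;
        # escape everything else, including quotes and backslashes.
        if ch.isascii() and (ch.isalnum() or ch in " -_."):
            parts.append(ch)
        else:
            parts.append(f"\\{byte:02X}")
    return "".join(parts)


print(ros_escape("❌"))  # \E2\9D\8C
```

The output can then be pasted between the quotes of the ssid="..." argument shown above.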

AFAIK, there is NO way to change which single-byte code page RouterOS uses either. That poses problems for Cyrillic-based languages, Arabic, Chinese, Hebrew, etc., which Unicode solved in the 90s. Or even for Baltic languages like Latvian, since not all of the "funky letters" are supported by Windows-1252 either.
 
Tuxmaster
just joined
Topic Author
Posts: 8
Joined: Sun Dec 03, 2023 11:27 am

Re: UTF-8 representation problem?

Sun Apr 07, 2024 9:35 am

I have now looked a little further and found out that the behaviour is totally different.
In some input masks it is coded in others not. :(
Here it is partially escaped:
/interface/wifi/security/print
if you enter UTF-8 characters via the web UI.
Yes for the name, no for the comment.
Sample:
;;; ��
name="WPA3 via key \FC" ...
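If the \FC in that name is a Windows-1252 byte (an assumption based on the earlier replies, not something MikroTik documents), the printed value can be turned back into readable text on the client side. A rough Python sketch, where decode_ros_value is a hypothetical helper and only two-digit hex escapes are handled:

```python
import re

# Matches a RouterOS-style two-digit hex escape such as \FC.
HEX_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{2})")


def decode_ros_value(printed: str, codepage: str = "cp1252") -> str:
    """Rebuild the raw bytes from a printed value, then decode them."""
    raw = bytearray()
    pos = 0
    while pos < len(printed):
        match = HEX_ESCAPE.match(printed, pos)
        if match:
            # Escaped byte: take its numeric value directly.
            raw.append(int(match.group(1), 16))
            pos = match.end()
        else:
            # Unescaped character: map it to its byte in the code page.
            raw += printed[pos].encode(codepage, errors="replace")
            pos += 1
    return raw.decode(codepage, errors="replace")


print(decode_ros_value(r"WPA3 via key \FC"))  # WPA3 via key ü
```

This only helps for fields that come back escaped, like the name here. The comment line is printed as raw bytes, so it would need the terminal itself to be set to a matching single-byte encoding.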
 
Amm0
Forum Guru
Posts: 3605
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: UTF-8 representation problem?

Sun Apr 07, 2024 5:24 pm

I have now looked a little further and found out that the behaviour is totally different.
In some input masks it is coded in others not. :(
Oh, I totally agree it ain't consistent. e.g.
MikroTik should for sure clarify this better.

My point was more that it's not really UTF-8 anywhere, other than at the edges of webfig. At best, you can use chars from the Windows-1252 charset, even if more Unicode/UTF-8 is accepted in webfig, i.e. é Ø ñ are okay while ž Ж ת would certainly not be. Now winbox, running on Windows with a different codepage, is a different story... IDK
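A quick way to check a string before pasting it into a comment, as a Python sketch. It tests against the standard Windows-1252 code page as shipped with Python, which is an assumption about what RouterOS really accepts; fits_cp1252 is a made-up name:

```python
def fits_cp1252(text: str) -> bool:
    """True if every character has a single-byte Windows-1252 encoding."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


for sample in ["é", "Ø", "ñ", "Ж", "ת", "ā", "❌"]:
    print(sample, fits_cp1252(sample))
```

The first three come back True and the rest False. Note that the standard code page does include ž (at byte 9E), so whether RouterOS rejects it depends on which subset it actually implements, and this check cannot tell you that.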

My gripe is that MikroTik should just say formally how it's supposed to work. I limit myself to US-ASCII, but that's easy for me since English doesn't need any codepage.
 
Tuxmaster
just joined
Topic Author
Posts: 8
Joined: Sun Dec 03, 2023 11:27 am

Re: UTF-8 representation problem?

Sun Apr 07, 2024 7:45 pm

I use only the web UI and the SSH connection, because I don't use Windows.
So the winbox will be not an option for me.
But in fact the behaviour should be uniform. Either everything or nothing escaped.
 
tangent
Forum Guru
Posts: 1422
Joined: Thu Jul 01, 2021 3:15 pm

Re: UTF-8 representation problem?

Mon Apr 08, 2024 1:07 pm

because I don't use Windows. So the winbox will be not an option for me.

WINE runs WinBox well.

But in fact the behaviour should be uniform. Either everything or nothing escaped.

It's not nearly that simple. The stupendous compound complications of human languages are collectively and imperfectly reflected in the design and implementation of Unicode. It is naïve and unreasonable to expect a full Unicode implementation in RouterOS. The libicu binaries are about 37 megs on the system I'm typing this on, over twice the flash space of a good many ROS devices.

(Aside for @Amm0: Did you note the existence proof showing that ASCII is incomplete for encoding all English prose? 😜)

It's not directly on-point, but this famous Stack Overflow answer addresses a similarly child-like wish related to Perl. If a programming language for much bigger machines can't "just do the right thing" by default, the case is hopeless for a small embedded system like ROS. The best we can hope for is a blind shift to UTF-8 without any interpretation, causing the OS to store the bytes as-is and leave all interpretation to client programs running on much bigger systems.

I predict this will not happen in the v7 line, ever. This is the kind of major breaking change you'd do in a future v8.
 
Amm0
Forum Guru
Posts: 3605
Joined: Sun May 01, 2016 7:12 pm
Location: California

Re: UTF-8 representation problem?

Mon Apr 08, 2024 5:35 pm

(Aside for @Amm0: Did you note the existence proof showing that ASCII is incomplete for encoding all English prose? 😜)
Well, my №1 unicode needs are more 18th century — I use the em—dash & en–dash a lot. 😂😜

OP has a point: webfig likely should enforce the same rules. But I too don't expect Unicode support in the v7 branch, since changes to allowed/disallowed chars will cause someone's config to break. So I guess my view is that winbox is the "reference" for what's allowed, and if webfig doesn't do the same, that's a bug in webfig.

Mikrotik is purported to be working on a "multiplatform client", so I'm sure the topic of unicode will come up again.
 
mkx
Forum Guru
Posts: 11748
Joined: Thu Mar 03, 2016 10:23 pm

Re: UTF-8 representation problem?

Mon Apr 08, 2024 9:55 pm

Mikrotik is purported to be working on a "multiplatform client" ...

US-ASCII works on all modern platforms just fine :wink:

For the record: my native language doesn't fit in any western 8-bit encoding, even less in 7-bit US-ASCII, so I'm grateful for UTF-8. But when it comes to networking gadgets, the need for anything but US-ASCII is beyond my comprehension.
 
abbio90
Member Candidate
Posts: 251
Joined: Fri Aug 27, 2021 9:16 pm

Re: UTF-8 representation problem?

Sun Apr 21, 2024 10:43 am

I'm no expert on this, but here is a script that does a UTF-8 conversion. Try modifying it to see if it can work for you.
https://foisfabio.it/index.php/2023/06/ ... -telegram/
