Module talk:Lang-zh

Module:Lang-zh is permanently protected from editing because it is a heavily used or highly visible module. Substantial changes should first be proposed and discussed here on this page. If the proposal is uncontroversial or has been discussed and is supported by consensus, editors may use {{edit template-protected}} to notify an administrator or template editor to make the requested edit.

To help centralise discussions and keep related topics together, Template talk:Lang-zh and Template talk:Lang-zh/doc redirect here.

Template-protected edit request on 5 April 2024

I would like to enable the option "first=poj" analogously to "first=j". The "first=j" option allows Cantonese romanisations to be given before Mandarin romanisations, in articles where Cantonese is more relevant. The proposed "first=poj" option would allow Hokkien romanisation (POJ) to be given first, in articles where Hokkien is more relevant, e.g. for Bukit Ho Swee, Hong-Gah Museum, Tamsui District.

I believe this could be achieved by adding the following:

From line 114, after:

	local j1 = false -- whether Cantonese Romanisations go first

insert:

	local poj1 = false -- whether Hokkien Romanisations go first

From line 121, after:

			if (testChar == "j") then
				j1 = true
			 end

insert:

			if (testChar == "poj") then
				poj1 = true
			end

(The variable is named "testChar" but it is defined by the Lua pattern "%a+", which will match not only a single character but also longer strings.)

(On a separate note, there seems to be a superfluous space before "end" on lines 120 and 123.)
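To illustrate the point about "testChar", here is a minimal sketch. It assumes the first= value is scanned with gmatch and the "%a+" pattern; the helper name parse_first and the returned flags table are hypothetical and are not the module's actual code. Because "%a+" consumes a whole run of letters, "poj" arrives as a single token and never triggers the "j" branch.

```lua
-- Hypothetical helper (not part of Module:Lang-zh): scan a first= value
-- for romanisation codes and report which ones were requested.
local function parse_first(first)
	local flags = { j1 = false, poj1 = false }
	-- "%a+" matches a maximal run of letters, so "poj" is one token,
	-- not three separate characters "p", "o", "j"
	for testChar in first:gmatch("%a+") do
		if testChar == "j" then
			flags.j1 = true
		end
		if testChar == "poj" then
			flags.poj1 = true
		end
	end
	return flags
end

local flags = parse_first("poj")
print(flags.poj1, flags.j1)
```

A value such as "t, j" would still set j1 in this sketch, since the comma and space simply separate two letter runs.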

From line 137, after:

	if (j1) then
		orderlist[4] = "j"
		orderlist[5] = "cy"
		orderlist[6] = "sl"
		orderlist[7] = "p"
		orderlist[8] = "tp"
		orderlist[9] = "w"
	end

insert:

	if (poj1) then
		orderlist[4] = "poj"
		orderlist[5] = "p"
		orderlist[6] = "tp"
		orderlist[7] = "w"
		orderlist[8] = "j"
		orderlist[9] = "cy"
		orderlist[10] = "sl"
	end

This puts POJ before the Mandarin and Cantonese romanisations. Freelance Intellectual (talk) 08:49, 5 April 2024 (UTC)
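The effect of the two override blocks can be checked in isolation. The sketch below assumes the default orderlist is the one quoted in a later diff on this page (before "tl" was added); the function name build_order is hypothetical and only wraps the proposed assignments so that they can be run outside the module.

```lua
-- Hypothetical wrapper around the proposed logic; the default list is
-- assumed from the diff quoted further down this page.
local function build_order(j1, poj1)
	local orderlist = {"c", "s", "t", "p", "tp", "w", "j", "cy", "sl", "poj", "zhu", "l", "tr"}
	if j1 then
		-- Cantonese romanisations first
		orderlist[4] = "j"
		orderlist[5] = "cy"
		orderlist[6] = "sl"
		orderlist[7] = "p"
		orderlist[8] = "tp"
		orderlist[9] = "w"
	end
	if poj1 then
		-- Hokkien romanisation first; slot 10 (formerly "poj") is reused for "sl"
		orderlist[4] = "poj"
		orderlist[5] = "p"
		orderlist[6] = "tp"
		orderlist[7] = "w"
		orderlist[8] = "j"
		orderlist[9] = "cy"
		orderlist[10] = "sl"
	end
	return orderlist
end

print(table.concat(build_order(false, true), " "))
print(table.concat(build_order(true, false), " "))
```

In this sketch, if both flags are set, the poj1 block runs last and overwrites slots 4 to 10, so the Hokkien-first order wins.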

Done. * Pppery * it has begun... 02:53, 15 April 2024 (UTC)

Double-quotes around glosses

Is there a reason we use double-quotes rather than single-quotes to show the output of |tr=? MOS:SIMPLEGLOSS suggests we should prefer singles. OwenBlacker (he/him; Talk) 17:37, 18 June 2024 (UTC)

Because |l= is used for literal translations & glosses, and |tr= is (much more rarely) used for non-literal translations. Remsense诉 17:39, 18 June 2024 (UTC)

Aha, that makes sense. So I have probably been misusing |tr= when I should have been using |l=. Thank you! OwenBlacker (he/him; Talk) 18:03, 18 June 2024 (UTC)

Commas within literal glosses

What should we do if there needs to be a comma within a literal translation? I noticed this on Yi Jian Mei (song), where the quotes should be placed around the whole comma-separated phrase, not individually around each side of the comma. pacificboy (talk) 03:56, 11 July 2024 (UTC)

My assumption when adding this feature was that if one needed to add a comma, it should probably be treated as a proper translation, not a gloss. It turns out I never use this formatting, so I could very plausibly disable it. Remsense诉 05:49, 11 July 2024 (UTC)

Ah, that makes sense! I’ll convert it to a translation. Thanks. pacificboy (talk) 02:45, 12 July 2024 (UTC)
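For readers wondering why a comma splits a gloss, here is a minimal sketch. It assumes glosses are split with the gmatch pattern "[^;,]+" that appears in the code quoted further down this page; the helper name split_glosses and the sample string are hypothetical.

```lua
-- Hypothetical helper: split an |l= value the way the quoted pattern would,
-- trimming surrounding whitespace from each piece.
local function split_glosses(val)
	local parts = {}
	-- "[^;,]+" yields each maximal run of characters that are neither ";" nor ","
	for term in val:gmatch("[^;,]+") do
		parts[#parts + 1] = (term:gsub("^%s*(.-)%s*$", "%1"))
	end
	return parts
end

local parts = split_glosses("first half, second half")
print(#parts, parts[1], parts[2])
```

Under this assumption, each piece is then quoted on its own, which is why a phrase containing a comma cannot currently be kept inside a single pair of quotes via |l=.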

Template-protected edit request on 17 August 2024

I propose the following changes to add Tâi-lô romanization support. Of course, POJ covers 95% of Hokkien/Minnan use cases (hence why I have added the "tailo" IANA subtag) but it could still be useful for Taiwanese-specific pages. Additions and modifications below:

--- Module:Lang-zh
+++ Module:Lang-zh

@@ after line 29 @@ local labels = {
 	["sl"] = "Sidney Lau",
 	["poj"] = "Pe̍h-ōe-jī",
+	["tl"] = "Tâi-lô",
 	["zhu"] = "Zhuyin Fuhao",
 	["l"] = "lit.",

@@ after line 46 @@ local wlinks  = {
 	["poj"] = "Pe̍h-ōe-jī",
+	["tl"] = "Tâi-uân Lô-má-jī Phing-im Hong-àn",
    
@@ after line 63 @@ local ISOlang = {
 	["poj"] = "nan-Latn",
+	["tl"] = "nan-Latn-tailo",

@@ after line 74 @@ local italic  = {
 	["poj"] = true,
+	["tl"] = true,

@@ at line 136 @@
-	local orderlist = {"c", "s", "t", "p", "tp", "w", "j", "cy", "sl", "poj", "zhu", "l", "tr"}
+	local orderlist = {"c", "s", "t", "p", "tp", "w", "j", "cy", "sl", "poj", "tl", "zhu", "l", "tr"}

@@ after line 150 @@ if (poj1) then
		orderlist[4] = "poj"
-		orderlist[5] = "p"
-		orderlist[6] = "tp"
-		orderlist[7] = "w"
-		orderlist[8] = "j"
-		orderlist[9] = "cy"
-		orderlist[10] = "sl"
+		orderlist[5] = "tl"
+		orderlist[6] = "p"
+		orderlist[7] = "tp"
+		orderlist[8] = "w"
+		orderlist[9] = "j"
+		orderlist[10] = "cy"
+		orderlist[11] = "sl"
	end

MSG17 (talk) 15:53, 17 August 2024 (UTC)

@MSG17: This sounds reasonable, and would be helpful on pages such as Penang Hokkien where both POJ and TL are used in the article text. @Pppery or @Jonesey95, would you be able to help here? Freelance Intellectual (talk) 13:03, 19 September 2024 (UTC)

I'll take a look at this ASAP, thank you for your improvements! Remsense ‥ 论 13:06, 19 September 2024 (UTC)

Done. Remsense ‥ 论 13:48, 19 September 2024 (UTC)

Further romanization discussion

Coming off of my request to add Tâi-lô, what other romanization systems should be added to the template? I feel like Pha̍k-fa-sṳ and Wugniu could be helpful. However, I don't see any IANA Latn subtags for other Sinitic languages. MSG17 (talk) 15:53, 17 August 2024 (UTC)

Trailing bold in l= not being removed

In {{zh|t=竹子林站|j=Zuk1 Zi2 Lam4 Zaam6|l = '''Bamboo Forest station'''}}, the opening bold markup is properly removed, but the trailing bold markup is not removed. It looks like the regular expression at

term = string.gsub(term, "^([ \"']*)(.*)([ \"']*)$", "%2")

needs some adjustment to the middle wildcard search. – Jonesey95 (talk) 13:23, 16 September 2024 (UTC)
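The reported behaviour can be reproduced in plain Lua, outside the module. This is only a sketch: the variable names are illustrative, and the pattern is copied from the line quoted above.

```lua
-- The gloss as passed in |l=, with wiki bold markup on both sides
local term = "'''Bamboo Forest station'''"

-- First group: greedily takes the three leading quotes.
-- Second group: the greedy (.*) then takes everything that is left,
-- including the trailing quotes, so the third group matches nothing.
local stripped = string.gsub(term, "^([ \"']*)(.*)([ \"']*)$", "%2")

print(stripped)
```

The result keeps the closing three apostrophes but not the opening ones, which is the unbalanced bold markup described above.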

@Jonesey95: This is because the * operator is greedy, so .* matches everything else in the string. Changing .* to .*? would make it lazy, so that the final term catches all trailing characters. In other words, change the line of code to:

term = string.gsub(term, "^([ \"']*)(.*?)([ \"']*)$", "%2")

Freelance Intellectual (talk) 13:51, 16 September 2024 (UTC)

Thanks! That fixed the problem at Zhuzilin station and probably other pages. – Jonesey95 (talk) 17:26, 16 September 2024 (UTC)

Thank you for fixing my shoddy regex, by the way. Remsense ‥ 论 13:05, 19 September 2024 (UTC)

@Jonesey95 and Remsense: On further reflection, this doesn't work as intended. I had thought the string was a regex, but it is in fact a Lua pattern, which is slightly different. The Lua equivalent of *? is - which would give:

term = string.gsub(term, "^([ \"']*)(.-)([ \"']*)$", "%2")

Writing .*? in Lua (as I suggested above) actually means greedily matching all characters (.*) followed by a single question mark (? can also be an operator, but Lua pattern operators can't be nested so in this context it is interpreted as a literal). So actually the new pattern usually doesn't make a substitution, unless there is a question mark. This means it usually fails, e.g. where there are multiple glosses separated by commas and spaces, the spaces are not stripped. However, looking at what the pattern match applies to, I'm not completely sure I understand why the quotes should be stripped in the first place (is there a set of testcases to check against?). At Zhuzilin station, the current code makes no substitution, and so it keeps the bold formatting, presumably as intended. The old code meant that the bold formatting was stripped at the beginning and not the end, so the rest of the article became bold (which was a bad and confusing error). Correcting .*? to .- as above would strip both, making it impossible to add bold formatting. Is the intention to catch cases where an editor unnecessarily adds quotes to the gloss? Is this a common problem? If so, is removing the ability to add bold and italic formatting a fair price to pay?
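The difference between the two spellings can be checked directly in a Lua interpreter. The following is a minimal sketch with illustrative variable names; it uses the second return value of string.gsub (the number of substitutions) to show whether the pattern matched at all.

```lua
local term = "'''Bamboo Forest station'''"

-- ".*?" in a Lua pattern: greedy ".*" followed by a literal "?".
-- The term contains no question mark, so the pattern cannot match.
local with_qmark, n_qmark = string.gsub(term, "^([ \"']*)(.*?)([ \"']*)$", "%2")

-- ".-" is Lua's lazy repetition: the middle group takes as little as
-- possible, leaving the trailing quotes to the third group.
local with_dash, n_dash = string.gsub(term, "^([ \"']*)(.-)([ \"']*)$", "%2")

print(with_qmark, n_qmark)
print(with_dash, n_dash)
```

With ".*?" the string comes back unchanged with zero substitutions, so the bold markup survives; with ".-" both the leading and the trailing apostrophes are stripped.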

If we want to strip one quote mark but no more (so that we catch editors manually adding quotes, but allow formatting), pattern matching is a bit more complicated. I think it would be easiest to separate the stripping of whitespace and quotes. When stripping one single quote, we need to check that there isn't more than one, but we also need to allow the string to contain an apostrophe (so we can't just use [^']- in the middle) and a gloss could potentially be a single character (so we can't just use [^'].-[^'] in the middle). So it seems easiest to strip the leading and trailing quotes separately. This gives three lines (I've also removed two sets of brackets that were capturing substrings that weren't used):

term = string.gsub(term, "^ *(.-) *$", "%1")
term = string.gsub(term, "^[\"']?([^\"'].-)$", "%1")
term = string.gsub(term, "^(.-[^\"'])[\"']?$", "%1")

Freelance Intellectual (talk) 15:43, 24 September 2024 (UTC)
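As a quick check of the three-line variant, here is a sketch that wraps those lines in a function (the name strip_outer is hypothetical) and tries a few inputs: a manually quoted gloss, bold markup, a gloss containing an apostrophe, and a single-character gloss.

```lua
-- Hypothetical wrapper around the three proposed lines
local function strip_outer(term)
	-- 1. strip leading and trailing spaces
	term = string.gsub(term, "^ *(.-) *$", "%1")
	-- 2. strip at most one leading quote, and only if the next character is not a quote
	term = string.gsub(term, "^[\"']?([^\"'].-)$", "%1")
	-- 3. strip at most one trailing quote, and only if the previous character is not a quote
	term = string.gsub(term, "^(.-[^\"'])[\"']?$", "%1")
	return term
end

for _, input in ipairs({ "  'gloss'  ", "\"gloss\"", "'''Bold'''", "it's", "a" }) do
	print(input, strip_outer(input))
end
```

In this sketch a single pair of manually added quotes is removed, while doubled or tripled apostrophes (italic and bold markup) are left alone, as are apostrophes inside the gloss.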

I think it's fine to strip all quote marks, in any quantity. That was the original intent of the code, and I don't see any complaints on this page. Adding bold to text is probably against WP:MOS, and adding italics should be done with a parameter. People can use <b>...</b> and <i>...</i> tags if they insist on them. – Jonesey95 (talk) 15:51, 24 September 2024 (UTC)

Okay. I had taken your comment about fixing the Zhuzilin station article to mean that keeping the bold markup was intended, but I can see why it could be discouraged. I've also just found Template:Lang-zh/testcases (I had only looked under Module:Lang-zh before), and I don't see any testcases for stripping markup. So, if stripping markup is the desired functionality, the .- version above would work. I think it would make sense to document this, since there are three different kinds of thing being stripped: whitespace, markup, and quotes (double quotes aren't markup). It could be documented either on Template:Lang-zh/doc or directly as a code comment next to the line we're discussing, e.g. "remove trailing and leading spaces, quotes, and bold/italic markup". Freelance Intellectual (talk) 20:39, 24 September 2024 (UTC)

Currently, this stripping only applies to literal glosses and not translations, but they should reasonably be treated the same. So, fixing the pattern, matching all whitespace (not just spaces), expanding the comments, and applying the same to the translation, I suggest changing lines 236-247 to the following:

			elseif (part == "l") then
				local terms = ""
				-- put individual, potentially comma-separated glosses in single quotes
				-- (first strip leading and trailing whitespace and quotes, including bold/italic markup)
				for term in val:gmatch("[^;,]+") do
					term = string.gsub(term, "^([%s\"']*)(.-)([%s\"']*)$", "%2")
					terms = terms .. "&apos;" .. term .. "&apos;, "
				end
				val = string.sub(terms, 1, -3)
			elseif (part == "tr") then
				-- put translations in double quotes
				-- (first strip leading and trailing whitespace and quotes, including bold/italic markup)
				val = string.gsub(val, "^([%s\"']*)(.-)([%s\"']*)$", "%2")
				val = "&quot;" .. val .. "&quot;"
			end

Freelance Intellectual (talk) 09:31, 25 September 2024 (UTC)

@Jonesey95 and Remsense: What do you think? Are you happy with the above suggestion?

Also, instead of directly using a Lua string pattern, it might be more readable and maintainable to use an existing function for stripping leading and trailing characters, namely mw.text.trim:

			elseif (part == "l") then
				local terms = ""
				-- put individual, potentially comma-separated glosses in single quotes
				-- (first strip leading and trailing whitespace and quotes, including bold/italic markup)
				for term in val:gmatch("[^;,]+") do
					term = mw.text.trim(term, "%s\"'")
					terms = terms .. "&apos;" .. term .. "&apos;, "
				end
				val = string.sub(terms, 1, -3)
			elseif (part == "tr") then
				-- put translations in double quotes
				-- (first strip leading and trailing whitespace and quotes, including bold/italic markup)
				val = mw.text.trim(val, "%s\"'")
				val = "&quot;" .. val .. "&quot;"
			end

Freelance Intellectual (talk) 09:02, 27 September 2024 (UTC)
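To try the suggestion outside Scribunto, where the mw library is not available, the sketch below uses a local stand-in for mw.text.trim. The stand-in, the function names format_glosses and format_translation, and the use of table.concat in place of the string.sub trimming are all assumptions made for illustration, not the module's code.

```lua
-- Stand-in for mw.text.trim(s, charset), assumed to strip any characters
-- in the charset from both ends of the string.
local function trim(s, charset)
	charset = charset or "\t\r\n\f "
	return (string.gsub(s, "^[" .. charset .. "]*(.-)[" .. charset .. "]*$", "%1"))
end

-- whitespace, double quotes, and apostrophes (which covers bold/italic markup)
local STRIP = "%s\"'"

-- Hypothetical equivalent of the proposed "l" branch
local function format_glosses(val)
	local terms = {}
	for term in val:gmatch("[^;,]+") do
		terms[#terms + 1] = "&apos;" .. trim(term, STRIP) .. "&apos;"
	end
	return table.concat(terms, ", ")
end

-- Hypothetical equivalent of the proposed "tr" branch
local function format_translation(val)
	return "&quot;" .. trim(val, STRIP) .. "&quot;"
end

print(format_glosses("'''Bamboo Forest station'''"))
print(format_glosses("sky, heaven"))
print(format_translation(" \"Hello there\" "))
```

A few such inputs could also serve as a starting point for the missing markup-stripping entries on Template:Lang-zh/testcases.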