Wikipedia talk:Article size

From Wikipedia, the free encyclopedia
(Redirected from Wikipedia talk:SIZE)

Removing kb limits

My understanding is there is strong consensus to remove the kb limits thus:

Readable prose size    What to do
> 15,000 words         Almost certainly should be divided or trimmed.
> 9,000 words          Probably should be divided or trimmed, although the scope of a topic can sometimes justify the added reading material.
> 8,000 words          May need to be divided or trimmed; likelihood goes up with size.
< 6,000 words          Length alone does not justify division or trimming.
< 150 words            If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, the article could be expanded; see Wikipedia:Stub.
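
As a rough illustration, the thresholds in the table above could be expressed as a lookup like the one below. This is only a sketch: the function name `size_guidance` is hypothetical, the advice strings are abbreviated from the table, and the table itself gives no row for 6,000 to 8,000 words, so the sketch returns nothing for that range.

```python
from typing import Optional


def size_guidance(readable_prose_words: int) -> Optional[str]:
    """Return the table's (abbreviated) advice for a readable-prose word count.

    Hypothetical helper for illustration; it only restates the table rows.
    """
    if readable_prose_words > 15_000:
        return "Almost certainly should be divided or trimmed."
    if readable_prose_words > 9_000:
        return "Probably should be divided or trimmed."
    if readable_prose_words > 8_000:
        return "May need to be divided or trimmed."
    if readable_prose_words < 150:
        # The table adds a condition a word count cannot check:
        # the page has stayed this small for over a couple of months.
        return "Consider combining with a related page, or expanding."
    if readable_prose_words < 6_000:
        return "Length alone does not justify division or trimming."
    # 6,000 to 8,000 words: the table has no row for this range.
    return None


for words in (120, 5_000, 7_000, 8_500, 12_000, 20_000):
    print(words, "->", size_guidance(words))
```

The input is assumed to be a readable-prose word count (for example, from the Prosesize gadget), not the markup size shown in the page history.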

But @Onetwothreeip reverted with "I am sympathetic to some kind of change like this, but really need a strong consensus". I thought we only needed consensus, but I'm not used to guideline changes. Is there consensus to remove the kb limits or good reasons to retain them? Tom B (talk) 15:48, 26 November 2023 (UTC)[reply]

I'd be in favour, but I think we need to make sure that WP:SIZESPLIT has the same units, so as not to confuse people. When we added the word counts earlier, some people suggested a transition period of a few years to ease people referring to the old kb units. —Femke 🐦 (talk) 16:02, 26 November 2023 (UTC)[reply]
A few years? Tom B (talk) 16:14, 26 November 2023 (UTC)[reply]
Given that people rarely look at guidelines they're familiar with, I think a year may be quite reasonable. I would prefer a change now, as the word limit suggestions are easy enough to interpret. —Femke 🐦 (talk) 16:18, 26 November 2023 (UTC)[reply]
When asked, I have for years told my students about page length rather than kb (as the academic world does). "Articles should range between 8,000 and 10,000 words, or approximately thirty-five pages in length, and would include a 150 word lead." Moxy- 16:20, 26 November 2023 (UTC)[1][2][3][4][5][6][7][8][reply]
I see no reason to remove the byte guideline. Each article's word count is not going to be immediately apparent. It is right that the word count is the more prominent guideline, but the byte count is simply an equivalent to that. However, I would support a change that makes it clearer that this table applies to articles that are predominantly prose content. Onetwothreeip (talk) 21:05, 26 November 2023 (UTC)[reply]
Au contraire; few (even experienced) editors understand how to calculate readable prose, and quite often misquote the KB when referring to size. I support removal of the confusing and dated KB bits. SandyGeorgia (Talk) 21:09, 26 November 2023 (UTC)[reply]
I would think that calculating readable prose is simply a matter of counting how many words there are in the prose of the article. I accomplish this myself by copying and pasting the text into something that counts words. Saying that a certain article kilobyte size of a prose article is generally equivalent to a certain amount of words should be complementary to the word count guideline. If it's being misquoted or confused, we should rewrite the guidelines or add clarification instead. Onetwothreeip (talk) 21:39, 26 November 2023 (UTC)[reply]
Rewriting doesn't help considering the most frequent misapplication of this guideline is by people who don't read it. They look at the page, see the KB, and apply that without any knowledge of how readable prose is calculated. I see this routinely (and we're still seeing it, even on this page). Removing KB is the way to go here, as it is not a good approximation of readable prose. SandyGeorgia (Talk) 12:55, 27 November 2023 (UTC)[reply]
My impression is that few or no people have defended using kb size as a metric in the discussion so far. Granted it's not a huge number of participants. And it might be worth thinking about having a guideline on kb size as well, given technical issues ... but on a different page. Jo-Jo Eumerus (talk) 10:02, 28 November 2023 (UTC)[reply]
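
For readers following the copy-and-paste approach mentioned above, a minimal sketch of a word counter is shown below. It assumes the pasted text is already plain prose (no references, tables, or infobox text), counts whitespace-separated tokens, and is not a description of how the Prosesize gadget works. The name `count_words` is hypothetical.

```python
def count_words(prose: str) -> int:
    """Count whitespace-separated tokens in plain text.

    Assumption: the caller has already stripped references, tables,
    captions, and other non-prose material from the pasted text.
    """
    return len(prose.split())


sample = "Wikipedia is not paper, but readers still have limited time."
print(count_words(sample))  # 10
```

Different tools tokenize differently (hyphens, numbers, footnote markers), so counts from this sketch may not match other counters exactly.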

I support Tom B's edit. We really need to accept that Wikipedia is NOTPAPER. Traditional size limits are a thing of the past, with the exception of mobile users.

In fact, I support removal of any form of maximum total word or byte length for articles and instead support "ease of access limits" on section size. It should be possible to easily hop to a section and open it using a mobile device. AFAIK, unduly large sections might be problematic. (Maybe not, so let's discuss that.) Section size is more of a concern than total article size. Finding info by searching is not a problem, including trillions of bytes. OTOH, "opening" an unduly large section might be a problem for some users. -- Valjean (talk) (PING me) 17:19, 26 November 2023 (UTC)[reply]

Remove KB, keep readable prose word limits; we've covered this at length elsewhere on this page. SandyGeorgia (Talk) 13:06, 27 November 2023 (UTC)[reply]

Counting the number of words has become much easier than it used to be. The Wikipedia:Prosesize script is available in Special:Preferences#mw-prefsection-gadgets. It adds a "Page size" item to end of the Tools section (sidebar or dropdown menu, depending on your skin). WhatamIdoing (talk) 18:44, 27 November 2023 (UTC)[reply]
While these tools are great, they are unfortunately not accessible to the majority of Wikipedia editors, who are not going to know how to use them. That is why guidelines like these are important. Onetwothreeip (talk) 20:17, 27 November 2023 (UTC)[reply]
@SandyGeorgia: Do you have any examples of the guideline being misapplied in such a way? Onetwothreeip (talk) 20:15, 27 November 2023 (UTC)[reply]
So that I don't have to troll back through my contribs for an example, look no further than a spinoff from the discussion here, which uses overall size in KB, rather than readable prose in words, for articles that don't necessarily need to be split. SandyGeorgia (Talk) 22:55, 27 November 2023 (UTC)[reply]
User:Onetwothreeip: you don't own this guideline. You are the only one to disagree with simplifying in this direction; please stop reverting. This is not a major change in the guideline, as the conversion is mostly constant between articles. Getting readable prose in kb or words requires mostly the same tools, so that's not an argument against simplifying either. —Femke 🐦 (talk) 11:10, 29 November 2023 (UTC)[reply]
@Femke and Tpbradbury: It's a major change to the guideline. I said there was "not much" recent discussion, meaning not enough. There have only been a few participants on this when this is a part of the guideline which gets quoted very often. I support change in this area, but the kB equivalents are useful for editors who cannot or don't want to use tools to find the exact word count. Onetwothreeip (talk) 19:36, 29 November 2023 (UTC)[reply]
They would still need the exact same tools to find the kilobyte equivalent of readable prose, right? We've had a transition period where we showed both. We have no one objecting to this removal of confusing kilobytes, not even you. Let's not make Wikipedia more bureaucratic than it already is. —Femke 🐦 (talk) 19:46, 29 November 2023 (UTC)[reply]
Your revert summary said there has not been recent discussion, but that is untrue. So why did you revert? Tom B (talk) 11:47, 29 November 2023 (UTC)[reply]
Following from above, we can avoid going to a full community discussion by refactoring the table to include the kilobyte measures as supplementary detail equivalent to the word count measures, while still emphasising the primacy of the word count measures. Onetwothreeip (talk) 19:43, 29 November 2023 (UTC)[reply]
  • WP:CAREFUL, please. There's nothing like consensus for a change in the discussion here. Suggest a RFC. VQuakr (talk) 20:56, 29 November 2023 (UTC)[reply]
    Do we really need an RfC for the removal of a unit that continues to confuse? I think one or two more people engaged in this discussion should be able to establish a consensus strong enough for everybody to be happy. @VQuakr: would you be willing to weigh in yourself? —Femke 🐦 (talk) 21:05, 29 November 2023 (UTC)[reply]
    Yes, a RFC is absolutely warranted to establish consensus for a change to a guideline like this. Personally I think the kB measurement is fine, though supplementing with words to match the format that the existing page size tool produces might be helpful, but the discussion on this has been sprawling and fragmented. A RFC would help not only to establish consensus but to distill the reasoning in a more terse, readable format that might help me refine my personal position. Or to put another way, we need a WP:SIZE guideline on sizing guideline discussions. :) VQuakr (talk) 21:09, 29 November 2023 (UTC)[reply]
    There's no need for an RfC over something so minor.
    I agree with SandyGeorgia that kB limits mislead some users; I've seen it repeatedly. The total page kB is more prominent and easier to find than the prosesize kB. DFlhb (talk) 21:46, 29 November 2023 (UTC)[reply]
    Mmmkay but there isn't consensus for a change, so correct we don't need an RFC as long as you're good with the status quo. VQuakr (talk) 22:18, 29 November 2023 (UTC)[reply]
    I count 5 editors for, 3 editors who didn't express a clear view but seem to lean 'for', and only you and 123IP opposing. DFlhb (talk) 12:55, 30 November 2023 (UTC)[reply]
    User:VQuakr: "Personally I think the kB measurement is fine": what do you see as the added value of the kB unit? In terms of consensus, I read 5 people explicitly supporting, one or two who are not explicit but seem to defend arguments in favour of simplification (?), onetwothreeip saying they are sympathetic to the change and objecting mostly on procedural grounds, and you. —Femke 🐦 (talk) 13:01, 30 November 2023 (UTC)[reply]
    "I count 5 editors for...": right, this isn't a sufficient level of involvement to change a guideline per WP:CONLEVELS. VQuakr (talk) 17:17, 30 November 2023 (UTC)[reply]
    VQuakr, you're the 1 editor who wants to revert back and include kb limits? it's not worth an RFC because one editor wants to include info everyone else wants to remove? Tom B (talk) 18:17, 30 November 2023 (UTC)[reply]
    No, I am not the "1 editor" who disagrees with this proposal. VQuakr (talk) 18:19, 30 November 2023 (UTC)[reply]
    Who are the other editors who want to retain the kb limits pls? Tom B (talk) 18:22, 30 November 2023 (UTC)[reply]
    You started this section, no? Surely you are aware of the participants in the ensuing discussion, which makes me think maybe I'm not understanding your question. But it seems moot regardless, as there hasn't been enough involvement yet to establish any consensus for a change. To recap, you started this section with "My understanding is there is strong consensus to remove the kb limits...", even though such a consensus did not and does not exist. Editing policies and guidelines is both hard and methodical, which is why I've suggested a RFC. I'm frankly not understanding why there is resistance to that, as it is a quite routine step in making this sort of change to a guideline. If you, Femke, and others are correct that this is a slam-dunk proposal then it will garner wide support, so a RFC will just help your case by getting a sufficient level of involvement in the approval of the proposal. VQuakr (talk) 18:35, 30 November 2023 (UTC)[reply]
    Let's keep focussed on content. VQuakr, what do you see as the added value of having the kB in there?
    We can start an RfC, but I see editor hours as something precious that I do not want to call on unnecessarily. This removal of a confusing unit does not change the guideline in any practical sense, so I do not see the need for an extraordinary level of consensus. —Femke 🐦 (talk) 19:05, 30 November 2023 (UTC)[reply]
    Neither words nor kB are confusing units of measurement. While I agree this doesn't substantially change the spirit of the guideline, it literally does change it practically speaking; we're changing the units of measurement and there seems to be resistance to the idea of retaining both per the discussion above. Agreed there is no need for an extraordinary level of consensus, but there is a need for consensus. We've got a guideline that has referenced kB of readable prose for well over a decade; more than a half-dozen editors' involvement is warranted before changing. VQuakr (talk) 19:10, 30 November 2023 (UTC)[reply]
    In case you missed it, there is a recent example from a smart and experienced user [1] confusing the easily accessed markup size with readable prose size in kb. If experienced users get confused, how do newer users navigate this? The resistance about retaining both is therefore warranted.
    Let's agree to disagree on the level of consensus. Not sure if I want to open an RfC.. —Femke 🐦 (talk) 19:27, 30 November 2023 (UTC)[reply]
    You're conflating readable prose size vs units of measurement. We care about readable prose, not raw page size, whether we're measuring in kB or words. No, I do not find a single example of someone being temporarily confused remotely convincing that there is a problem or that the proposed change is a solution. I'm also not clear on how your final sentence, "The resistance about retaining both is therefore warranted", logically follows from anything you've said prior. VQuakr (talk) 19:38, 30 November 2023 (UTC)[reply]
    Ah, I now understand. I thought you meant "there is resistance against retaining both", which I agreed with but found a confusing statement given your preceding comments. About is ambiguous here; I should have done a bit more thinking before replying. Funny, to have a misunderstanding of this kind when talking about a guideline with a similar ambiguity for the poor reader. —Femke 🐦 (talk) 20:03, 30 November 2023 (UTC)[reply]

I realize this discussion is a couple of months old at this point, but I wanted to push back on the opening statement:

My understanding is there is strong consensus to remove the kb limits...

I strenuously object to any change which results in the appearance of prose size (or equivalent expressions such as readable prose) as the sole column header or sole yardstick of article size. Doing so leads to complete absurdities, such as assessing our #1 longest article in main space with over 35,000 words, as having only "45 prose words" if prose size is the measure. Please see WT:Splitting#The term 'prose size' for details. I've restored word count in the column header pending discussion of this. There is nothing wrong with taking multiple measures into consideration, and that is what should be done here; there is absolutely nothing wrong with considering raw kb count as one of those measures. Mathglot (talk) 02:03, 25 February 2024 (UTC)[reply]

Well, the problem is that kb counts are misleading people into thinking the HTML/code size is important. "kb" doesn't necessarily mean HTML, but a lot of people are interpreting it as such. Jo-Jo Eumerus (talk) 08:31, 25 February 2024 (UTC)[reply]
I'm not sure I fully understand your reasoning, but I support the change, as word count is plain English, and readable prose is not. —Femke 🐦 (talk) 08:40, 25 February 2024 (UTC)[reply]
Agreed; but playing devil's advocate for a moment against my own position, the opposing argument might go: "Perhaps so, but prose words is well-defined and we can generate a replicable, exact count with a tool, whereas word count is poorly defined: what does it even include?" There is some validity to that, but I think the weakness of prose words and the absurd example resulting from its weakness outweighs that argument. In the end, I don't think we have any one statistic that by itself is sufficient to make the call about splitting, and we should recognize that there are multiple factors involved. Imho, ultimately the table should have additional columns, to provide more information for making a good decision. Mathglot (talk) 09:36, 25 February 2024 (UTC)[reply]
IP non sequitur.
You’re moving to fast forward 24.35.154.137 (talk) 23:06, 27 March 2024 (UTC)[reply]

Collapsed per WP:TALK: not improving the article. Your Teahouse comment was equally pointless. Mathglot (talk) 00:09, 28 March 2024 (UTC)[reply]

Yes, a simplifying change. The risk is that people count the references etc., but hopefully the readable prose definition underneath will mitigate that. Tom B (talk) 17:58, 26 February 2024 (UTC)[reply]

Aside on markup size

@VQuakr and Tpbradbury: (and others) I think a reasonable solution would be for the table to indicate that an article of 15,000 words is on average a certain number of kilobytes in markup size, and so on for each row. This would provide an easy equivalent for editors to use, maintain consistency with guidelines, and promote the primacy of considering length in terms of word count. Onetwothreeip (talk) 09:16, 30 November 2023 (UTC)[reply]

That's not feasible. There is no one-to-one conversion between markup size in kb and readable prose length (in either kb or words). Some articles are strongly cited, with detailed citation information. Others are only partially cited, or have a very short citation style. A strongly cited article can easily be three times the size of a weakly cited article of the same word length. —Femke 🐦 (talk) 12:52, 30 November 2023 (UTC)[reply]
I agree that markup size isn't relevant to a discussion about article length, which whether in words or kB is a discussion about readable prose not raw size. VQuakr (talk) 17:17, 30 November 2023 (UTC)[reply]
Femke, that is why I am saying that the conversion should be a rough average, essentially an estimate. We could even include a range, but we don't need to include for outliers. Markup size is an indication of readable size whether we like it or not, and often the most accessible indication. Onetwothreeip (talk) 19:19, 30 November 2023 (UTC)[reply]
There is going to be a correlation on average, but it varies such a large amount that it's doubtful the average will be at all helpful for application to an individual article. (I would also suggest that the two are confused enough already, and that further confusion does not aid discussion of prose or actual issues with markup size such as WP:PEIS.) CMD (talk) 05:16, 1 December 2023 (UTC)[reply]
I agree. An average may be interesting, but it has no proscriptive relevance to any weird "paper mind" (those who don't fully understand and apply NOTPAPER) ideas of an "ideal" article's size. Articles will naturally fall along the X-axis of a bell curve with extremes on each side, and no attempts should be made to shorten long articles in an attempt to make them more "average" in length. Both very short and very long articles have their place. -- Valjean (talk) (PING me) 06:24, 1 December 2023 (UTC)[reply]
This would have no effect on your key point; however, for the sake of accuracy, article size by no means follows a bell curve. I'd wager there's a Y-maximum at around X=9kb followed by a slow decline with a long tail, ending at 844kb. Mathglot (talk) 20:54, 30 December 2023 (UTC)[reply]
Some articles are also very markup-heavy for other reasons, such as lots of {{lang}}.  — SMcCandlish ¢ 😼  03:50, 1 December 2023 (UTC)[reply]
Extending SMcCandlish's thought with something I haven't seen mentioned on this page: the raw byte count in the history is not the number of characters you see when you look at the wikicode in the Preview window (even with visible spaces and newlines); it is strictly the number of bytes. The number of characters may be lower, because bytes are not characters: beyond the simple ASCII character set, many characters take two or three bytes to be expressed in UTF-8. This is more likely to be a factor in articles involving foreign scripts, mathematics, symbols, and other non-ASCII characters. Mathglot (talk) 11:19, 29 December 2023 (UTC)[reply]
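
The difference between characters and bytes described above can be checked directly. The following is a minimal sketch using Python's built-in UTF-8 encoder; the helper name `char_and_byte_counts` and the sample strings are illustrative assumptions, not anything taken from MediaWiki.

```python
from typing import Tuple


def char_and_byte_counts(text: str) -> Tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


# ASCII only: one byte per character.
print(char_and_byte_counts("size"))        # (4, 4)
# One accented Latin letter (U+00EF) takes two bytes.
print(char_and_byte_counts("na\u00efve"))  # (5, 6)
# Each of these CJK characters takes three bytes.
print(char_and_byte_counts("日本語"))       # (3, 9)
```

Here `len(text)` counts Unicode code points, which is close to, but not always the same as, the number of characters a reader perceives (combining marks are counted separately).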

Refs (kb limits)

  1. ^ "European Journal of Futures Research". SpringerOpen. May 20, 2013. Retrieved November 26, 2023.
  2. ^ "instructions". academic.oup.com. Retrieved November 26, 2023.
  3. ^ "Manuscript Submission Guidelines: AERA Open: Sage Journals". Sage Journals. January 1, 2023. Retrieved November 26, 2023.
  4. ^ "Early Modern Women: An Interdisciplinary Journal: Instructions for authors". Early Modern Women: An Interdisciplinary Journal. November 17, 2019. Retrieved November 26, 2023.
  5. ^ Development and Change. Wiley. doi:10.1111/(issn)1467-7660. ISSN 0012-155X.
  6. ^ "Submissions". Global Labour Journal. February 3, 2022. Retrieved November 26, 2023.
  7. ^ "BGSU SSCI Journal Publishing Guide" (PDF). Retrieved November 26, 2023.
  8. ^ "Guide for authors". ScienceDirect.com by Elsevier. January 6, 2016. Retrieved November 26, 2023.

How to calculate the number of printed pages for an article

How is that done? Is there a way to figure out from the kb how many words and then how many pages? -- Valjean (talk) (PING me) 21:39, 27 August 2024 (UTC)[reply]

Pinging User:Mathglot, as they have edited this page several times. Maybe they can help me.-- Valjean (talk) (PING me) 19:46, 29 August 2024 (UTC)[reply]
Valjean, the easiest method is empirical: i.e., use the print-page feature in your browser, and look at the number of pages in the default format proposed by its print algorithm. In many browsers Ctrl+P will get you that. This page, for example, is 13 pages with default printer settings and default margins in my Chromium-based browser.
Are you asking about a formula to predict the number of printed pages? Here are some of the things you need to consider: word count, font size and type, page size and margins, line spacing, paragraph structure including indentation, heading and subheading style, image and table style, footnotes and other appendixes, columnization, special formatting (pull quotes, collapsible content, etc.), css of the page and the skin you are using, and any overrides in your common.css or common.js that would alter the page format. Depending on what your use case is, I would just let the browser do it. Another alternative is any of the words-per-page calculators you will find online.
Another way is to devise a rule of thumb based on some statistics about the total size of printed Wikipedia, and divide that by the total kb size or total number of words. Per Wikipedia:Size in volumes, if you printed your article in the page style of Encyclopedia Britannica (skipping images) there are about 1,333 words per page (1,333,333 words per volume / (500 sheets per volume * 2 pages per sheet)), so to calculate your number of printed EB-style pages, divide the number of words in your article by 1,333. If you want to calculate it based on the raw size you see in the history, bear in mind that the byte count is slightly higher than the number of characters, because each UTF-8 character is encoded as one to four bytes. Assumptions at the link are 8.3 bytes per word, amounting to 6 characters per word, so that gives 1,333 words per EB page * 8.3 bytes per word = 11,064 bytes per page; so divide the raw size of your article in bytes by 11,064 (or the size in kb by about 11) to get a rough count of EB-style printed pages. However, EB pages have about two to three times the number of words as you would find on standard printer paper with typical font and margins, so that may bring you back to your online wpp calculator again. Keep in mind that none of these methods consider images. Hope this helps, Mathglot (talk) 20:36, 30 August 2024 (UTC)[reply]
Wow! I pinged the right person. You know a lot about this. Thanks so much. I have enough info now to work with. -- Valjean (talk) (PING me) 02:36, 31 August 2024 (UTC)[reply]
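
The rule of thumb in this thread can be written out as a short calculation. This is a sketch only: the constants are the figures quoted above from Wikipedia:Size in volumes, the function names are hypothetical, and the result is an estimate of Encyclopedia Britannica-style pages that ignores images.

```python
# Figures quoted in the discussion above (from Wikipedia:Size in volumes).
WORDS_PER_VOLUME = 1_333_333
SHEETS_PER_VOLUME = 500
PAGES_PER_SHEET = 2
BYTES_PER_WORD = 8.3

# About 1,333 words per EB-style page.
WORDS_PER_EB_PAGE = WORDS_PER_VOLUME / (SHEETS_PER_VOLUME * PAGES_PER_SHEET)
# About 11,067 bytes per page unrounded; the thread rounds to 1,333 words
# first and arrives at 11,064.
BYTES_PER_EB_PAGE = WORDS_PER_EB_PAGE * BYTES_PER_WORD


def eb_pages_from_words(word_count: float) -> float:
    """Estimate EB-style printed pages from a word count."""
    return word_count / WORDS_PER_EB_PAGE


def eb_pages_from_bytes(raw_bytes: float) -> float:
    """Estimate EB-style printed pages from a raw size in bytes (not kb)."""
    return raw_bytes / BYTES_PER_EB_PAGE


print(f"{eb_pages_from_words(8_000):.1f} EB-style pages for 8,000 words")
print(f"{eb_pages_from_bytes(110_000):.1f} EB-style pages for 110,000 bytes")
```

As noted in the thread, a page of standard printer paper holds roughly a half to a third as many words as an EB page, so multiplying the result by two or three gives a very rough figure for ordinary printouts.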

Project page fails to explain how to find the word count

This project page fails to prominently inform the reader how to find the word count of an article. Jc3s5h (talk) 17:07, 10 September 2024 (UTC)[reply]

Fixed: see WP:WORDCOUNT, or simply go to Preferences → Gadgets → Browsing → tick "Prosesize: add a toolbox link to show the size of and number of words in a page" (direct link), and then save. Moxy🍁 17:16, 10 September 2024 (UTC)[reply]
Thanks. Jc3s5h (talk) 20:20, 10 September 2024 (UTC)[reply]