User:Dank/Regex

From Wikipedia, the free encyclopedia

List of US inventoried hardwoods

\*@1(.*?)@2(.*?)@3(.*?)@4(.*?)@5(.*?)@6(.*?)@7(.*?)@8(.*?)@9(.*?)(\n)

rowscopes:
!scope="row"

\*(\d+)@(\w+)@(\d+)\,(\d+)\,(\-)?(\d+)(\n)
|-$7|{{cvt|$1|ft}}; $2@@{{cvt|$3|-|$4|in|cm}}@@{{cvt|$5$6|F}}$7

then change @@ to <br />
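For reference, the row conversion above can be tried outside the editor. This is only a sketch in Python's `re` module, which writes backreferences as `\1` rather than `$1`; the function name and the sample line are made up for illustration, and the `@@` placeholder is written directly as `<br />` to fold in the follow-up step.

```python
import re

# Same pattern as in the notes; Python uses \1 where the editor uses $1.
ROW = re.compile(r"\*(\d+)@(\w+)@(\d+),(\d+),(-)?(\d+)(\n)")
# "@@" is written as <br /> straight away (assumption: same end result).
CELL = r"|-\7|{{cvt|\1|ft}}; \2<br />{{cvt|\3|-|\4|in|cm}}<br />{{cvt|\5\6|F}}\7"

def convert_rows(text: str) -> str:
    """Rewrite each '*N@word@N,N,N' line as a row break plus one table cell."""
    # On Python 3.5+, an unmatched optional group (the minus sign)
    # expands to an empty string in the replacement.
    return ROW.sub(CELL, text)

# Hypothetical sample line, not taken from the article.
print(convert_rows("*80@Slow@30,60,-28\n"), end="")
```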

\*(\w+)\,(\w+)\,(\w+)\,(\w+)(\n)
|-$5|D: $1<br />F: $2<br />L: $3<br />S: $4$5

@4(\d+)@5(\d+)@6[\w \.]+@7(\w+)@8(.*?)@9(.*?)(\n)
{{sfn|$3|$5|1990|pp=$1–$2}}$6

then delete whatever preceded the former @4 on each line; replace || with |; and handle by hand the entries with a hyphen or with 3 or more authors.

* {{Cite book |last1= |first1= |last2= |first2= |pages=– |chapter='''' |editor-last1=Burns |editor-first1=Russell M. |editor-last2=Honkala |editor-first2=Barbara H. |title=Silvics of North America, Volume 2. Hardwoods. |publisher=US Forest Service, Department of Agriculture (US Government Printing Office) |location=Washington, DC |year=1990 |isbn=978-0-16-029260-6 }}

* {{Cite book |last1=$7 |first1=$6 |last2=$9 |first2=$8 |pages=$4–$5 |chapter=''$1'' |editor-last1=Burns |editor-first1=Russell M. |editor-last2=Honkala |editor-first2=Barbara H. |title=Silvics of North America, Volume 2. Hardwoods. |publisher=US Forest Service, Department of Agriculture (US Government Printing Office) |location=Washington, DC |year=1990 |isbn=978-0-16-029260-6 }}$10

Where the most complicated line is "*Arbutus menziesii[https://plants.usda.gov/home/plantProfile?symbol=Arme].Pacific madrone.124.132@Philip M. McDonald@1@ and John C., II Tappeiner", do:
\*(\w+ \w+)\[(.+?)\]\.(.+?)\.(\d+)\.(\d+)@(.+?)(\w+)(@1@ and .+?)?(\w+)?(\n)
*@1$1@2@3$3@4$4@5$5@6$6@7$7@8$8@9$9$10

then delete "@1@ and " throughout.

(\n)\[\[File\:(.+?)\|thumb.+?\n\[\[File\:(.+?)\|thumb.+?\n\[\[File\:(.+?)\|thumb.+?\n
$1*@2$2@3$3@4$4

\*@2(.+?)@3(.+?)@4(.+?)(\n)
|-$4|{{Multiple image |perrow=3 | total_width = 400px | image_style = border:none; | border = infobox$4| image1 =$1$4| alt1 =landscape$4| image2 =$2$4| alt2 =bark$4| image3 =$3$4| alt3 =foliage}}$4

List of US forest-inventory conifers

Start with a bulleted list of species in this format: (5 means the USDA symbol is ACNI5; 46 is the page number in the 1991 inventory)
*Acer nigrum5.black maple.46

Create links to the USDA Plants Database (but remember this place in the article history; you'll need the version without the links, too):
This will add the 4-letter codes:
\*(\w\w)([a-z]+) (\w\w)([a-z]+)(\.)(.+?)(\n)
*$1$2 $3$4.$1$3$5$6$7
then rerun it with (\.) replaced by (\d\.), and again with (\d\d\.), to catch symbols that end in one or two digits
And this creates the urls:
\*(\w+ \w+)\.(\w+)\.
*$1[https://plants.usda.gov/home/plantProfile?symbol=$2].
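The two steps above (symbol code, then URL) might be combined as follows. This is a sketch in Python, not the editor workflow itself: `(\d{0,2}\.)` is an assumption that stands in for the three separate passes with `(\.)`, `(\d\.)` and `(\d\d\.)`, and the function name is invented.

```python
import re

# One pass instead of three: the optional digits are folded into group 5.
CODE = re.compile(r"\*(\w\w)([a-z]+) (\w\w)([a-z]+)(\d{0,2}\.)(.+?)(\n)")
URL = re.compile(r"\*(\w+ \w+)\.(\w+)\.")

def add_symbol_links(text: str) -> str:
    # Insert the symbol (first two letters of each name, plus any digits).
    text = CODE.sub(r"*\1\2 \3\4.\1\3\5\6\7", text)
    # Turn the inserted symbol into a PLANTS Database link.
    return URL.sub(
        r"*\1[https://plants.usda.gov/home/plantProfile?symbol=\2].", text
    )

# The sample line is the one given in the notes.
print(add_symbol_links("*Acer nigrum5.black maple.46\n"), end="")
```

The symbol comes out in mixed case (Acni5), which matches the "symbol=Arme" form used in the hardwoods example.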

Create a data table in this format: 
https://en.wikipedia.org/w/index.php?title=User:Dank/Sandbox/8&oldid=1224700887#temp4
(except: the "uses" column is a string of y and @, not y and n). "Uses" mirrors these categories from "Suitability/Use": Christmas Tree, Lumber, Naval Store, Nursery Stock, Post, Pulpwood, Veneer.

Create the real table:
(\n)\|\-\n\|(\w+?)\n\|([y@]+?)\n\|(\w+?)\n\|(\w+?)\n\|(\w+?)\n\|(\w+?)\n\|(\w+?)\n\|(.+?)\n\|(.+?)\n\|(\w+?)\n\|(\w+?)\n\|(\w+?)\n\|(.+?)\n\|(\w+? \w+?) (.+?) (\d+) (\d+) ([`a-zA-Z]+)
$1|-$1!scope="row" |''[[$15]]'' ()$1|Uses: $3@1$2 ''$15'']: Characteristics}}{{sfn|$19|1991|pp=$17–$18}}$1|No$1----$1$16$1----$1{{cvt|$4|ft}}; $5@1$2 ''$15'']: Characteristics}}$1|pH $9–$10$1{{cvt|$11|-|$12|in|cm}}<br/>$1{{cvt|$14|F}}@1$2 ''$15'']: Characteristics}}$1|D: $7<br/>F: $8<br/>L: $6<br/>S: $13<br/>@1$2 ''$15'']: Characteristics}}$1|

@1
{{sfn|National Plant Data Team|2023|loc=[https://plants.usda.gov/home/plantProfile?symbol=

S: intolerant
S:<br/>intolerant

[in case I forget]
intermediate{
medium{

Add the common names.

Rearrange the y-@ string to the proper order for: construction, landscaping, posts, pulpwood, terpenes, veneers, winter holiday decorations.
([y@])([y@])([y@])([y@])([y@])([y@])([y@])
$2$4$5$6$3$7$1

Or:
(marker)(.)(.)(.)(.)(.)(.)(.)
(marker)$2$4$5$6$3$7$1
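A quick way to check the reordering is to run it on strings with a single y. The sketch below uses Python's `re` (with `\2` in place of `$2`); the function name is made up.

```python
import re

# Source order: Christmas Tree, Lumber, Naval Store, Nursery Stock,
# Post, Pulpwood, Veneer. Target order: construction, landscaping, posts,
# pulpwood, terpenes, veneers, winter holiday decorations.
FLAGS = re.compile(r"([y@])([y@])([y@])([y@])([y@])([y@])([y@])")

def reorder_uses(flags: str) -> str:
    """Permute a seven-character y/@ string into the table's column order."""
    return FLAGS.sub(r"\2\4\5\6\3\7\1", flags)

# Christmas Tree (position 1) should land in the last slot.
print(reorder_uses("y@@@@@@"))
```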

(?<=\|Uses\: ......)(y)
, winter holiday decorations
(?<=\|Uses\: .....)(y)
, veneers
(?<=\|Uses\: ....)(y)
, terpenes
(?<=\|Uses\: ...)(y)
, pulpwood
(?<=\|Uses\: ..)(y)
, posts
(?<=\|Uses\: .)(y)
, landscaping
(?<=\|Uses\: )(y)
construction

Remove any leftover @

|Uses:_,_ -> |Uses:_

` -> |
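The lookbehind steps for the Uses cell have to run from the last position to the first, because each replacement lengthens the string and would shift later offsets. A minimal Python sketch of that sequence, applied to a single cell; the function name and sample cells are hypothetical:

```python
import re

USES = ["construction", "landscaping", "posts", "pulpwood",
        "terpenes", "veneers", "winter holiday decorations"]

def expand_uses(cell: str) -> str:
    """Expand '|Uses: ' followed by seven y/@ flags into a comma list."""
    # Work backwards so that the offsets of earlier flags stay valid.
    for pos in range(6, -1, -1):
        look = r"(?<=\|Uses: " + "." * pos + ")y"
        label = USES[pos] if pos == 0 else ", " + USES[pos]
        cell = re.sub(look, label, cell)
    # Remove leftover @ flags (assumes the cell holds no other @ markers).
    cell = cell.replace("@", "")
    # Drop the leading comma when the first flag was not y.
    return cell.replace("|Uses: , ", "|Uses: ")

print(expand_uses("|Uses: y@y@@@y"))
```

Each lookbehind has a fixed width, which Python's `re` requires.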

https://en.wikipedia.org/w/index.php?title=User:Dank/Sandbox/8&oldid=1224733838#temp0
Alphabetize by last name, then add refs for single authors to the References section, working from a table in the format shown at that link:
\*(\w+ \w+) (\d+) (\d+) (\w+)\, (.+?)(\n)
*{{cite book |last1=$4 |first1=$5 |pages=$2–$3 |chapter=''$1'' | editor-last1=Burns | editor-first1=Russell M. | editor-last2=Honkala | editor-first2=Barbara H. | title=Silvics of North America, Volume 1. Conifers. | publisher=US Forest Service, Department of Agriculture (US Government Printing Office) | location=Washington, DC | year=1991 | isbn=978-0160292606 }}$6

Fill in the second column and add images. If desired, this can be added manually to the last column:
{{Multiple image |perrow=2 | total_width = 360px | image_style = border:none; | border = infobox
| image1 =
| alt1 =landscape
| image2 =
| alt2 =landscape
| image3 =
| alt3 =bark
| image4 =
| alt4 =cone and foliage
}}

List of Canadian forest-inventory conifers

Remove uppercase codes at end of each line
[A-Z ]+(\n)
$1

Do lines where the common name isn't two words by hand; add "/" at the end

For the remaining lines, remove all but first two and last two words of each line
(\w+ \w+ )(.+?)(\w+ \w+)(\n)
$1$3$4

Remove each /. Add * at the beginning of each line

Add links and italics:
\*(.+?) (\w+) (\w+)(\n)
*''[[$2 $3]]'',[https://commons.wikimedia.org/wiki/$2_$3] $1$4
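The automated steps so far (strip codes, trim to four words, bullet, link) can be chained as below. This is a sketch only: the input line is a made-up guess at the raw format, the function name is invented, and the by-hand steps are skipped.

```python
import re

def build_list(raw: str) -> str:
    # 1. Drop the trailing run of uppercase codes.
    text = re.sub(r"[A-Z ]+(\n)", r"\1", raw)
    # 2. Keep the first two and last two words. Note that (.+?) needs at
    #    least one character, so a line that already has exactly four
    #    words would lose a letter here.
    text = re.sub(r"(\w+ \w+ )(.+?)(\w+ \w+)(\n)", r"\1\3\4", text)
    # 3. Prefix every line with a bullet.
    text = "".join("*" + line for line in text.splitlines(keepends=True))
    # 4. Link and italicize the binomial, add the Commons URL,
    #    and move the common name to the end.
    return re.sub(
        r"\*(.+?) (\w+) (\w+)(\n)",
        r"*''[[\2 \3]]'',[https://commons.wikimedia.org/wiki/\2_\3] \1\4",
        text,
    )

# Hypothetical raw line: common name, other text, binomial, codes.
print(build_list("balsam fir sapin baumier Abies balsamea ABIE BAL\n"), end="")
```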

Check POWO for synonyms and Commons for sufficient images. Check on maps.

Add:
==Key==
:Provinces: AB [[Alberta]], BC [[British Columbia]], MB [[Manitoba]], NB [[New Brunswick]], NL [[Newfoundland and Labrador]], NS [[Nova Scotia]], NT [[Northwest Territories]], NU [[Nunavut]], ON [[Ontario]], PE [[Prince Edward Island]], QC [[Quebec]], SK [[Saskatchewan]], YT [[Yukon]]

==Species== 
{|class="sortable wikitable plainrowheaders"
|+{{sronly|Species}}
! scope="col" width="1%" |Species (or genus) and a [[common name]]{{sfn|CNFI|loc=Tree Species List}}{{sfn|POWO}}{{efn-la|The taxonomy (classification) comes from POWO.}}
! scope="col" class=unsortable width="15%" |Distribution in Canada{{sfn|Burns|Honkala|1991}}
! scope="col" class=unsortable width="30%" style="min-width:120px;" |Description and uses
! scope="col" class=unsortable width="10%" |Co-named North American [[forest#Types|forest types]]{{sfn|Burns|Honkala|1991}}
! scope="col" width="1%" |[[Family (biology)|Fam­ily]]{{sfn|POWO}}
! scope="col" class=unsortable width="1%" |Images
|-
|}

Create the table
\*''\[\[(\w+ \w+)\]\]''\,\[(.+?)\] (\w+ \w+)( \w+)?( \w+)?(\n)
|-$6!scope="row" |''[[$1]]'' ($3$4$5)$6|[[|thumb|100px|center|BC |alt=Species distribution in Canada]]$6|$6|$6|$6|{{Multiple image | width = 120px | image_style = border:none; | border = infobox$6| footer =$6| image1 =$6| alt1 =$6| image2 =$6| alt2=$6| image3 =$6| alt3=$6}}$6

At some point, fill in the "family" column, and (if necessary) add explanations to the Key.

Add distribution maps

If necessary, add {{CSS image crop}}:
\)(\n)\|
)$1|{{CSS image crop$1|Image = $1|bSize = 120$1|cWidth = $1|cHeight = $1|oTop = $1|oLeft = $1|Location = center$1|Description = $1|Alt = Species distribution in Canada$1}}$1|

Do images; get heights approximately even by cropping. Do alt text. Add license info to the talk page.

Create list of parameters for {{cite book}}.
Do regex on chapter pages and authors from Silvics in the form "@456-462 Silas Little and Peter W. Garrett":
\|@(\d+)\-(\d+) (.+?) (\w+)(\n)
|first1=$3 |last1=$4 |pages=$1–$2$5
\|@(\d+)\-(\d+) (.+?) (\w+) and (.+?) (\w+)(\n)
|first1=$3 |last1=$4 |first2=$5 |last2=$6 |pages=$1–$2$7
\|@(\d+)\-(\d+) (.+?) (\w+)\, (.+?) (\w+)\, and (.+?) (\w+)(\n)
|first1=$3 |last1=$4 |first2=$5 |last2=$6 |first3=$7 |last3=$8 |pages=$1–$2$9
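Order matters if these three are run one after another: the single-author pattern also matches two- and three-author lines, so it has to go last. A Python sketch under that assumption, with the second author written to last2; the function name and the one- and three-author sample names are invented:

```python
import re

# Most specific pattern first; the single-author rule would otherwise
# swallow "X and Y" lines as one long first name plus a last name.
AUTHOR_RULES = [
    (r"\|@(\d+)-(\d+) (.+?) (\w+), (.+?) (\w+), and (.+?) (\w+)(\n)",
     r"|first1=\3 |last1=\4 |first2=\5 |last2=\6 |first3=\7 |last3=\8 |pages=\1–\2\9"),
    (r"\|@(\d+)-(\d+) (.+?) (\w+) and (.+?) (\w+)(\n)",
     r"|first1=\3 |last1=\4 |first2=\5 |last2=\6 |pages=\1–\2\7"),
    (r"\|@(\d+)-(\d+) (.+?) (\w+)(\n)",
     r"|first1=\3 |last1=\4 |pages=\1–\2\5"),
]

def to_cite_params(text: str) -> str:
    """Turn '|@pages authors' lines into cite book name and page parameters."""
    for pattern, replacement in AUTHOR_RULES:
        text = re.sub(pattern, replacement, text)
    return text

# The two-author sample is the one quoted in the notes.
print(to_cite_params("|@456-462 Silas Little and Peter W. Garrett\n"), end="")
```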

Append chapter names:
"row" \|''\[\[(\w+ \w+)\]\]''(.+?)(\n)\|(.+?)\n\|(.+?)\n\|
"row" |[[$1]]$2$3|$4$3|$5 |chapter=''$1''$3|

Create a blank "References" section. Create a separate bulleted entry in References for each list of parameters, except: when the author(s) is/are the same, combine the entries into one ref.

Convert this bulleted list into properly formatted {{cite book}} citations, swapping the "first1" and "last1" on each line:
''(\n)
'' | editor-last1=Burns | editor-first1=Russell M. | editor-last2=Honkala | editor-first2=Barbara H. | title=Silvics of North America, Volume 1. Conifers. | url=https://www.fs.usda.gov/research/treesearch/1547 | publisher=United States Government Printing Office (Department of Agriculture, Forest Service) | location=Washington, DC | year=1991 | isbn=978-0160292606 }}$1
\*\|first1=(.+?)\|last1=(.+?)\|
*{{cite book |last1=$2|first1=$1|

Alphabetize the list

Convert each list of parameters into an appropriate {{sfn}} citation:
(\n)\|first1=(.+?)\|last1=(.+?) \|first2=(.+?)\|last2=(.+?) \|first3=(.+?)\|last3=(.+?) \|pages=(\d+.\d+)
$1|{{sfn|$3|$5|$7|1991|pp=$8}}
(\n)\|first1=(.+?)\|last1=(.+?) \|first2=(.+?)\|last2=(.+?) \|pages=(\d+.\d+)
$1|{{sfn|$3|$5|1991|pp=$6}}
(\n)\|first1=(.+?)\|last1=(.+?) \|pages=(\d+.\d+)
$1|{{sfn|$3|1991|pp=$4}}
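These run in the order listed (three authors, two, one), since the one-author pattern would otherwise stretch across a multi-author line. A Python sketch of the same three rules; the function name and the single-author sample are hypothetical:

```python
import re

# The "." in (\d+.\d+) matches the en dash between the page numbers.
SFN_RULES = [
    (r"(\n)\|first1=(.+?)\|last1=(.+?) \|first2=(.+?)\|last2=(.+?) "
     r"\|first3=(.+?)\|last3=(.+?) \|pages=(\d+.\d+)",
     r"\1|{{sfn|\3|\5|\7|1991|pp=\8}}"),
    (r"(\n)\|first1=(.+?)\|last1=(.+?) \|first2=(.+?)\|last2=(.+?) "
     r"\|pages=(\d+.\d+)",
     r"\1|{{sfn|\3|\5|1991|pp=\6}}"),
    (r"(\n)\|first1=(.+?)\|last1=(.+?) \|pages=(\d+.\d+)",
     r"\1|{{sfn|\3|1991|pp=\4}}"),
]

def to_sfn(text: str) -> str:
    """Replace each parameter list with a short footnote on the last names."""
    for pattern, replacement in SFN_RULES:
        text = re.sub(pattern, replacement, text)
    return text

row = "\n|first1=Silas |last1=Little |first2=Peter W. |last2=Garrett |pages=456–462\n"
print(to_sfn(row))
```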

Remove "chapter=..." from these lines:
 \|chapter(.+?)(\n)
$2

Add end-sections

...

If I'm using the PLANTS database, this will add the species name to each sfn:
"row" \|''\[\[(\w+ \w+)\]\](.+?)(\n)\|(.+?)\n\|(.+?)\|2023\|loc=
"row" |''[[$1]]$2$3|$4$3|$5|2023|loc=''$1'':

...

convert e.g. "*https:...symbol=PIST, loc=Fact Sheet,|first2=John |last2=Dickerson"
^\*(.+?)\, loc=(.+?)\,(.+?)(\n)
*$3 |url=$1$4

If necessary, to add Burns citations to the blank column:
\}\{\{(.+?)\|1991\|pp=(\d+.\d+)\}\}(\n)\|\n\|(\w+) family
}{{$1|1991|pp=$2}}$3|{{$1|1991|pp=$2}}$3|$4 family

[add ---- where needed]

Adding a cite after ----:
(\n)\|\{\{sfn\|National(.+?)cs\}\}\{\{(.+?)\}\}\n\|\{\{(.+?)\}\}\n\-\-\-\-\n
$1|{{sfn|National$2cs}}{{$3}}$1|{{$4}}$1----$1{{sfn|National$2cs}}



Plant family tables


Adding hair space before cites in 3rd col:
!scope="row"(.+?)(\n)\|(.+?)\n\|(.+?)\{\{
!scope="row"$1$2|$3$2|$4&hairsp;{{

Condensing Template:Multiple image
image\n\| width = 120px\n\| image_style = border\:none\;\n\| border
image | width = 120px | image_style = border:none; | border

Moving Christenhusz cites to the first column:
 family\)(\n)\| \[\[(.+?)\{\{sfn\|Chr(.+?)\}\}\n
 family){{sfn|Chr$3}}$1| [[$2$1

Fix VE soft-hyphen bug:
#xAD
shy

Find and mark Lamiales, etc. lines in [[List of plant family names with etymologies]]:
|| [[Lamiales]]
|| [[Lamiales]]zz

Remove anchors:
\{\{anchor\|\w+\}\}

Remove data-sort:
(\n)\|data\-sort(.+?)\|
$1|

Remove non-marked lines:
!scope="row" \|(.+?)\n\|(.+?)\n\|(.+?)\]\]\n\|(.+?)\n\|\-\n
(then remove the zz markers)

Consider whether "Gl" is needed as a source

Automated copyediting of the etym column:
Note the space:
\|\| P(\n)\|
||$1|, for 

\|\| \w(\n)\|(.+?)plant name \|\|
||$1|, from a $2plant name ||

\|\| G(\n)\|(.+?) \|\|
||$1|, from Greek for "$2" ||

Then finish copyediting by hand, and blank the LG column

change two widths to 15%; add soft hyphens

fix Chr cites:
{{sfn|Christenhusz|p
{{sfn|Christenhusz|Fay|Chase|2017|p

move final Chr cites to the Orders column:
ales\]\](\n)\| (\w\w)?\{\{sfn\|Chr(.+?)\n\|\-
ales]]{{sfn|Chr$3$1| $2$1|-

remove final spaces

put total Chr page range in column header for "Family"

add vernacular names; check spacing after running this:
"row" \|\[\[(.+?)\]\] (.+?)(\n)
"row" |[[$1]] ($2 family)$3

add end-sections and etym. sfns, using for instance:
(\n)\| CS(\d+) (\d+)\n\|\-
$1| {{sfn|Stearn|2002|p=$3}}{{sfn|Coombes|2012|p=$2}}$1|-

or:
(\n)\| Bu(.+?)\n\|\-
$1| {{sfn|Burkhardt|2018|p=$2}}$1|-

combine 3 columns into the etym column, first for the Chr-only etym rows:
family\)(\n)\|''\[\[(.+?)\]\]'' \|\|\n\|(.+?) ?\|\|(.+?)\{\{sfn\|Chr(.+?)\}\}\n\|\n\|\-
family)$1|''[[$2]]''$3{{sfn|Chr$5}}$1|$1|$1|$4{{sfn|Chr$5}}$1|$1|-

and then for the others:
family\)(\n)\|''\[\[(.+?)\]\]'' \|\|\n\|(.+?) ?\|\|(.+?)\{\{sfn\|Chr(.+?)\}\}\n\| \{(.+?)\n\|\-
family)$1|''[[$2]]''$3{$6$1|$1|$1|$4{{sfn|Chr$5}}$1|$1|-

add "synonym" language.  Do POWO cites for synonyms.

add table headers (with proper page range for Chr).

Add total genera and single-letter code for POWO database (if any).

if total genera=1, add "genus" and move the single-letter code right one column:
(\n)\|1([a-z])\n\|
$1|1 genus, $1|$2

otherwise, add "genera" and move the single-letter code right one column:
(\n)\|(\d+)([a-z])\n\|
$1|$2 genera, $1|$3

Do the leftovers:
(\n)\|1\n
$1|1 genus, $1

(\n)\|(\d+)\n
$1|$2 genera, $1
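The four genus/genera rules depend on their order: the "1" cases must run before the general digit cases, and the lettered cases before the leftovers. A Python sketch that applies them in sequence; the function name and the sample cells are invented for illustration:

```python
import re

GENUS_RULES = [
    # Count plus a database letter: move the letter one column right.
    (r"(\n)\|1([a-z])\n\|", r"\1|1 genus, \1|\2"),
    (r"(\n)\|(\d+)([a-z])\n\|", r"\1|\2 genera, \1|\3"),
    # Leftovers: a bare count with no letter.
    (r"(\n)\|1\n", r"\1|1 genus, \1"),
    (r"(\n)\|(\d+)\n", r"\1|\2 genera, \1"),
]

def label_genera(text: str) -> str:
    """Append 'genus' or 'genera' to count cells, keeping any letter code."""
    for pattern, replacement in GENUS_RULES:
        text = re.sub(pattern, replacement, text)
    return text

# Hypothetical cells: a count of 24 with database letter "e".
print(label_genera("\n|24e\n|\n"))
```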

Do the cites for the POWO databases:
(\n)\|e\n
$1|{{sfn|POWO|loc=Flora of Tropical East Africa}}$1
(\n)\|n\n
$1|{{sfn|POWO|loc=Neotropikey}}$1
(\n)\|s\n
$1|{{sfn|POWO|loc=Flora of Somalia}}$1
(\n)\|t\n
$1|{{sfn|POWO|loc=Trees of New Guinea}}$1
(\n)\|w\n
$1|{{sfn|POWO|loc=Flora of West Tropical Africa}}$1
(\n)\|z\n
$1|{{sfn|POWO|loc=Flora Zambesiaca}}$1

Add ipni cite to second col (after "# genera" has been added in 3rd col):
"row" \|\[\[(\w+)(.+?)(\n)\|(.+?)\n\|(\d+) genera
"row" |[[$1$2$3|$4{{sfn|IPNI|loc=[ $1, Type]}}$3|$5 genera

Adding POWO cite to genera:
"row" \|\[\[(\w+)(.+?)(\n)\|(.+?)\n\|(.+?)\,
"row" |[[$1$2$3|$4$3|$5,{{sfn|POWO|loc=$1}}

Copy Chr cites into the Description & Uses column, after the POWO cites (but check the first few):
(\n)\|(.+?)ales\]\]\{\{sfn\|Chr(.+?)\}\}\n\|\n\|\-
{{sfn|Chr$3}}$1|$2ales]]{{sfn|Chr$3}}$1|$1|-

Do IPNI and USDA cites

Use FGVP (whatever is available) for the distribution column, then copy Chr cites to the remaining rows: [This works when there's something after the first citation in that column in each row]
ceae\}\}(\n)\|(.+?)\n\|(.+?)ales\]\]\{\{sfn\|Chr(.+?)\}\}\n\|\n
ceae}} in{{sfn|Chr$4}}$1|$2$1|$3ales]]{{sfn|Chr$4}}$1|$1

Do either this or the next seven:
At some point, add image code (and add alt text later):
\|(\n)\|\-\n
|{{Multiple image |width=120px |image_style=border:none; |border=infobox$1| footer = ''[[]]''$1| image1 =$1| image2 =  }}$1|-$1

(For John)
=(\w+)ceae\}\}(.+?)(\n)\|(.+?)\n\|(.+?)\n\|\n\|\-
=$1ceae}}$2$3|$4$3|$5$3|''[[c:Category:$1ceae]]''$3|-

(if necessary)
\|(\w+ \w+\W?)(\n)\|\-
|''[[c:Category:$1]]''$2|-

adding alt parameters:
(\n)\|(.+?)\n\}\}\n\|\-
 | alt1="flowers"$1|$2 | alt2="foliage"$1}}$1|-

Removing "thumb" etc. from John's raw image lists:
(\.jpg)\|(.+?)\]\](\n)
$1]]$3

Adding * and colon:
[[F
*[[:F

convert raw list of images to table:
(\n)(.+?)\n\*\[\[\:File\:(.+?)\]\]\n\*\[\[\:File\:(.+?)\]\]\n
|{{Multiple image |width=120px |image_style=border:none; |border=infobox$1| footer = ''[[$2]]''$1| image1 = $3 | alt1= "flowers"$1| image2 = $4 | alt2="foliage"}}$1|-$1

(\n)(.+?)\n\*\[\[\:File\:(.+?)\]\]\n
|{{Multiple image |width=120px |image_style=border:none; |border=infobox$1| footer = ''[[$2]]''$1| image1 = $3 | alt1= "flowers"}}$1|-$1

removing (...) in last col:
 \((.+?)\)(\n)\|\-
$2|-

This might be needed after copying an image column:
\}\}(\n)\n\|\-
}}$1|-

Replacing @ for cites (example):
@(\d+)\-(\d+)
{{sfn|Kubitzki et al 1993|pp=$1–$2}}

@(\d+)
{{sfn|Kubitzki et al 1993|p=$1}}
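The page-range form has to run before the single-page form, or `@(\d+)` would consume the first number of every range. A small Python sketch of the pair, using the example source name from the notes; the function name and sample text are made up:

```python
import re

def expand_cites(text: str) -> str:
    """Replace @N-M and @N markers with short footnotes."""
    # Ranges first, so that "@12-15" is not split at "@12".
    text = re.sub(r"@(\d+)-(\d+)", r"{{sfn|Kubitzki et al 1993|pp=\1–\2}}", text)
    return re.sub(r"@(\d+)", r"{{sfn|Kubitzki et al 1993|p=\1}}", text)

print(expand_cites("Herbs@12-15 and shrubs@7"))
```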

double-hyphen to dash
(\d+)\-\-(\d+)
$1–$2

Moving POWO cites to the end of the cell:
(1 genus|\d genera)\,\{\{sfn\|POWO\|loc=(\w+)\}\}(.+?)(\n)
$1,$3{{sfn|POWO|loc=$2}}$4
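A sketch of this move in Python, for checking the result on one cell. The function name and the sample cell text are hypothetical. Because `\d genera` matches only the final digit of the count, any leading digits are simply left in place in front of the match.

```python
import re

POWO = re.compile(
    r"(1 genus|\d genera),\{\{sfn\|POWO\|loc=(\w+)\}\}(.+?)(\n)"
)

def move_powo_cite(text: str) -> str:
    """Move the POWO footnote from after the genus count to the cell's end."""
    return POWO.sub(r"\1,\3{{sfn|POWO|loc=\2}}\4", text)

# Hypothetical cell text.
cell = "|24 genera,{{sfn|POWO|loc=Acanthaceae}} mostly tropical\n"
print(move_powo_cite(cell), end="")
```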

Remove Chr cites from Orders, or add them to the first column, as needed

TFA

(\W\W)\|\| (\w+)\|\| (\d{4})
$1|| [[User:$2|$2]] || $3