
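Title: [Exercise-009] Sorting large numbers with batch

Author: pusofalse    Time: 2008-8-4 08:26     Title: [Exercise-009] Sorting large numbers with batch

a.txt contains 20 lines of randomly generated numbers, as follows: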
  1. 2928326128601232462131283250710027308938740594716691200992050511576
  2. 5352129649530193383124730478244772348721985707222557212265817305
  3. 217141333532296179938475175265792931789219830308392472584606305
  4. 2371620291160322081050531817416284225477019123161801285941026814244
  5. 283929972304551060318886921731765136928849135391662294051194618754
  6. 1809165929787147057932949630411324311737224509104016550662932273
  7. 27396236084901303873154718299242931819623155304661177528921164510335
  8. 254221462410491137971033914630292752245114969186002809930190939425
  9. 1085287492160525651862932475207612387312368408826675135332406418337
  10. 2567810118246621010283281198810903279355871571118961177731143829148
  11. 23727111515524141721964179351992331180134926914198081871053303186
  12. 378579502856625703213542353218420835730692264021219729654278515442
  13. 30215186011014395001656818458819061824708536511543271701327524725
  14. 223702764213159156022932717903282522044350522584222768193271431422354
  15. 3079720530119542370417125702274761144023302102641160114921224469221
  16. 15642298214000242538839193816839550322381321993212316517861828002
  17. 13042178002978222022331319116624809338275899045263351248023569
  18. 11252165681825711849278422768716060438517976169102391532289954712000
  19. 105084292396529699311371735329685626410510259482788519645152723476
  20. 29674179062831103792824121564178225289202161443911094228581583531951
The task: using pure batch, output the lines in ascending numeric order, as follows:
  1. 13042178002978222022331319116624809338275899045263351248023569
  2. 217141333532296179938475175265792931789219830308392472584606305
  3. 1809165929787147057932949630411324311737224509104016550662932273
  4. 5352129649530193383124730478244772348721985707222557212265817305
  5. 15642298214000242538839193816839550322381321993212316517861828002
  6. 23727111515524141721964179351992331180134926914198081871053303186
  7. 30215186011014395001656818458819061824708536511543271701327524725
  8. 105084292396529699311371735329685626410510259482788519645152723476
  9. 254221462410491137971033914630292752245114969186002809930190939425
  10. 283929972304551060318886921731765136928849135391662294051194618754
  11. 378579502856625703213542353218420835730692264021219729654278515442
  12. 1085287492160525651862932475207612387312368408826675135332406418337
  13. 2371620291160322081050531817416284225477019123161801285941026814244
  14. 2567810118246621010283281198810903279355871571118961177731143829148
  15. 2928326128601232462131283250710027308938740594716691200992050511576
  16. 3079720530119542370417125702274761144023302102641160114921224469221
  17. 11252165681825711849278422768716060438517976169102391532289954712000
  18. 27396236084901303873154718299242931819623155304661177528921164510335
  19. 29674179062831103792824121564178225289202161443911094228581583531951
  20. 223702764213159156022932717903282522044350522584222768193271431422354
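Every value is far larger than the largest number cmd can compute.
Requirements: correct output, no temporary files, and code that is efficient and general. Complete the task; bonus points depending on the approach.
----------------------------------------------
So far no code gets the best of both worlds. For code that is concise and efficient but not general, see the first code in post #2 and the code in post #3.
For code that is general to a degree, see the second code in post #2 and the code in post #6.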
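
For reference, cmd's set /a works on signed 32-bit integers, so the largest value it can compute is 2147483647, far below the 60-plus-digit values here. The required order can still be defined on the digit strings alone: a shorter digit string is the smaller number, and strings of equal length compare character by character. A minimal Python sketch of that ordering follows. Python is used only to illustrate the logic and is not a batch solution; numeric_key is a hypothetical helper, the sample values are made up, and the input is assumed to be non-negative integers without leading zeros.

```python
CMD_MAX = 2**31 - 1  # largest value a signed 32-bit `set /a` can hold

def numeric_key(digits):
    """Sort key for non-negative integers written without leading zeros."""
    return (len(digits), digits)

# Made-up sample values, each well beyond CMD_MAX (not taken from a.txt).
samples = ["98765432109876543210", "123456789012345678901", "5000000000"]
ordered = sorted(samples, key=numeric_key)
print(ordered)
```

Every solution in this thread is, in one way or another, a way of getting cmd to apply this same rule without arithmetic.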
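Author: batman    Time: 2008-8-4 14:03

Let me explain the thinking first:
  This really is a hard problem for us. Why? The OP asks us to sort every very large value in the text, and all of them far exceed the largest number cmd can compute, so the usual comparison methods cannot be used. That is the first difficulty.
Second, the values are randomly generated and their digit counts vary; a value may even run to several lines or dozens of lines. If we determine the maximum line length by checking character by character, efficiency becomes an obstacle that this kind of approach cannot get past.
Third, the OP forbids temporary files, which shuts the door on approaches that use findstr /o to obtain the maximum line length.
  In summary:
  For the code to be general it must obtain the maximum line length. One way is to go character by character, which is very inefficient.
The second way is to use findstr /o to get the character offset of every line in one pass and derive the maximum line length from those offsets. This is much more efficient than going character by character, but findstr /o alone cannot give the length of the last line of the text, because no offset follows it. A line break has to be appended after the last line, and doing that without modifying the original file requires a temporary file.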
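
The offset arithmetic described above can be sketched as follows. findstr /o prefixes each matching line with the offset at which that line starts, so a line's length is the next line's offset minus its own offset minus 2 (the CR LF pair), and the last line has no following offset to subtract from. This Python sketch only mimics those offsets for CR LF terminated text; line_offsets and lengths_from_offsets are hypothetical names.

```python
def line_offsets(lines):
    """Offset of the start of each line, assuming CR LF line endings."""
    offsets, position = [], 0
    for line in lines:
        offsets.append(position)
        position += len(line) + 2  # +2 for the CR LF terminator
    return offsets

def lengths_from_offsets(offsets):
    """Lengths recoverable from start offsets alone: every line but the last."""
    return [nxt - cur - 2 for cur, nxt in zip(offsets, offsets[1:])]

lines = ["12345", "678", "9012345678"]
print(lengths_from_offsets(line_offsets(lines)))         # [5, 3]: last line missing
print(lengths_from_offsets(line_offsets(lines + [""])))  # [5, 3, 10]: blank line appended
```

Appending one blank line is what makes the last length visible, which is why the approach needs either a temporary file or some other way to feed findstr the extra line.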
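  When writing code we generally follow four principles: efficiency, generality, conciseness, and avoiding temporary files where possible. Efficiency comes first and generality second; conciseness and temporary files are not major considerations. Following that order of priority, I offer the two solutions below:
Solution 1: less general (no problem at all as long as each number's digits fit on one line), efficient, concise, no temporary file: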
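    @echo off&setlocal enabledelayedexpansion
    rem Build a padding string of 80 # characters.
    for /l %%i in (1,1,80) do set "kong=!kong!#"
    for /f %%i in (1.txt) do (
         rem Append the padding, keep the first 80 characters, then remove the
         rem number itself so that only the padding it needs is left in a.
         set "str=%%i%kong%"
         set "a=!str:~,80!"
         set "a=!a:%%i=!"
         rem Store padding plus number as a variable name; set lists names sorted.
         set "_!a!%%i=a"
    )
    rem Read the names back in sorted order and strip the prefix and the padding.
    for /f "delims==_" %%i in ('set _') do (
         set "str=%%i"
         echo !str:#=!
    )
    pause>nul

Of course, its generality can be improved by raising the value 80.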
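
A sketch of the idea behind the code above, as I read it: each number is left-padded with # to a fixed width, and because # sorts before every digit, a plain string sort of the padded keys matches numeric order as long as no number is longer than the width. Python is used here only to illustrate the logic; WIDTH mirrors the 80 in the batch code and pad_sort is a hypothetical name. Unlike the batch version, which stores the keys as variable names and therefore merges identical lines, this sketch keeps duplicates.

```python
WIDTH = 80  # same role as the 80 in the batch code

def pad_sort(numbers, width=WIDTH):
    """Sort digit strings by left-padding them with '#' to a fixed width."""
    if any(len(n) > width for n in numbers):
        raise ValueError("number longer than the padding width")
    keyed = sorted(n.rjust(width, "#") for n in numbers)
    return [k.lstrip("#") for k in keyed]

assert "#" < "0"  # why padded keys put shorter numbers first
print(pad_sort(["30000000000000000000", "9", "12345678901234567890123"]))
```

The explicit length check is my addition; the batch code has no such guard, which is where its limit on generality comes from.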
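Solution 2: highly general, average efficiency, creates a temporary file, more complex code:
    @echo off&setlocal enabledelayedexpansion
    set "max=0"&set "a=0"
    rem Remove any stale temporary file, then copy the lines into 2.txt and add
    rem a blank last line so that findstr /o reports an offset after the final number.
    if exist 2.txt del /q 2.txt
    for /f %%i in (1.txt) do echo %%i>>2.txt
    echo.>>2.txt
    rem findstr /n /o prints line-number:offset:text for every line.
    for /f "tokens=1,2* delims=:" %%i in ('findstr /n /o .* 2.txt') do (
        set /a n+=1,m=n-1
        set "num=%%i"&set "_!n!=%%j"&set "#%%i=%%k"
        rem Length of the previous line is the offset difference minus 2 for CR LF.
        if !m! gtr 0 set /a a=_!n!-_!m!-2
        if !max! lss !a! set "max=!a!"
    )
    set /a num-=1
    rem Build a padding string as long as the longest line.
    for /l %%i in (1,1,%max%) do set "kong=!kong!#"
    for /l %%i in (1,1,%num%) do (
        set "str=!#%%i!%kong%"
        set "a=!str:~,%max%!"
        call,set "a=%%a:!#%%i!=%%"
        set ".!a!!#%%i!=a"
    )
    rem Read the names back in sorted order and strip the prefix and the padding.
    for /f "delims==." %%i in ('set .') do (
        set "str=%%i"
        echo !str:#=!
    )
    del /q 2.txt&pause>nul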

[ Last edited by batman on 2008-8-5 21:14 ]
Author: terse    Time: 2008-8-4 16:39

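    @echo off&setlocal enabledelayedexpansion
    rem Build a string of 100 zeros used as left padding.
    for /l %%i in (1,1,100) do set "var=0!var!"
    for /f %%i in (1.txt) do (
        rem Pad on the left, keep the last 100 characters, and add a random
        rem suffix to the variable name so duplicate lines usually get distinct names.
        set str=!var!%%i
        set .!str:~-100! !random!=a
    )
    rem Read the names back in sorted order, then strip the leading zeros.
    for /f "delims=.= " %%i in ('set .') do for /f "tokens=* delims=0" %%j in ("%%i") do echo %%j
    pause>nul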

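Author: pusofalse    Time: 2008-8-4 21:52

Only when debugging did I realize how troublesome this problem really is~
I admire the poster above, a real expert~
Author: pusofalse    Time: 2008-8-4 22:07

batman, your second code has a bug. Test text: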
  1. 29324200852651210028213071109630551685419682237192661910031596813525985
  2. 139192824221705323683853099069511460582426579937521145284152777127372
  3. 3228832607224652155316459166842936030363170611320231631619428405179384047
  4. 812170830663138941085534183172610653136294485195711016316241388511507
  5. 461336171716126377142221699522246153038595327243922576267943609636
  6. 1638925082316833036661013264793101030929370185741649225072170182874123709
  7. 19104273682033120216783266533081314891292107581812412876152371871916174
  8. 28765483830845209484709271702280288292290214606465320169291811087021312
  9. 309619124727589342731614454314507972257492438810339484727971340026983
  10. 12631365789691379909125415226544733128052344013430802923375228394920
  11. 1209434542725110215429171429642266631332923718109092547688512906917377
  12. 9235325441420415502171072042026910159313029658501976719241297361608917193
  13. 104303563083330908200213037916087115501115715163303019513237753157722853
  14. 13541756476689943590415015749186092594311926102972518323503012611090
  15. 32555129948775117222104620611451024122155601418467473071538131259929122
  16. 2548313857163852897313874201491508320410217331915310293928115978926830836
  17. 1728916573194511826201557603196381113155825309211851122813285308953794
  18. 279911732611489993427112229571538210465273573238478681351713760130927475
  19. 203542497728828465536044342329627237153851428124837111485541856124091
  20. 11167192753125043631539774133216256184840111392039610101233269561536
  21. 4903155797025161392428517782198905721614416681144701698415315575419818
  22. 1622389334882239291124631810189733645149042278127844293232621112717646
  23. 20215127882431066880816236289679811281382028731393254147717121765097
  24. 16645326181395012332177468835293330012549029775211691470493332222112797
  25. 933428889205984801230861004178721772312389185772899226893711797343359
  26. 16120191173270725617184072248627813152502180710713269551966628181211112497
  27. 1632311921601282512366484925858327402625626369309771407222363122614443
  28. 293592703625624960888281127241740451826035343113994824114651110092
  29. 893354022094424096154953052425998786321972607611409284852914205609188
  30. 1420612643180582349118041199391970330622603629175101501699075131221324749

[ Last edited by pusofalse on 2008-8-4 22:09 ]
Author: pusofalse    Time: 2008-8-4 23:12

I didn't save it and ended up writing it twice. Posting it here so it doesn't get lost again.
    @echo off&setlocal enabledelayedexpansion
    set m=0
    rem findstr /o prints offset:text; ..N holds the offset and ##N the text of line N.
    for /f "tokens=1,* delims=:" %%a in ('findstr/o .* 1.txt') do (
        set/a n+=1,l=n-1,y+=1
        set ..!n!=%%a
        set ##!y!=%%b
        if !n! geq 2 (
            rem s is the length of the previous line; _s collects all lines of that length.
            call,set/a s=%%..!n!%%-%%..!l!%%-2,line+=1
            call,set "_!s!=%%_!s!%%%%##!line!%% "
            if !s! geq !m! set m=!s!
        )
    )
    rem The last line has no following offset, so measure it through a pipe.
    for /f "skip=1 delims=:" %%a in ('^(echo !##%y%!^&echo.^)^|findstr/o .*') do set/a final=%%a-3
    call,set "_%final%=%%_!final!%% !##%y%!"
    if %final% geq !m! set m=%final%
    rem Walk the lengths in increasing order and sort each group with set.
    for /l %%a in (1 1 %m%) do (
        if defined _%%a (
            for %%i in (!_%%a!) do set -%%i=faith
            for /f "delims=-=" %%s in ('set -') do (
                echo %%s
                set "-%%s="
            )
        )
    )
    pause>nul
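
Reading the code above, the approach appears to be: bucket the lines by length, then walk the lengths in increasing order and sort each bucket as plain strings, which is safe because equal-length digit strings order the same way as the numbers they represent. A Python sketch of that reading follows. It is not a translation of the batch code; sort_by_length_buckets is a hypothetical name and the inputs are assumed to have no leading zeros.

```python
from collections import defaultdict

def sort_by_length_buckets(numbers):
    """Group digit strings by length, then sort inside each group."""
    buckets = defaultdict(list)
    for n in numbers:
        buckets[len(n)].append(n)
    ordered = []
    for length in sorted(buckets):               # shorter numbers are smaller
        ordered.extend(sorted(buckets[length]))  # same length: string order is numeric order
    return ordered

print(sort_by_length_buckets(["1000", "99", "250", "7", "101"]))
```

No padding width has to be chosen in advance with this scheme, which is what makes it general; the cost in the batch version is the extra work of measuring every line.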

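Author: youxi01    Time: 2008-8-4 23:20

Personally I lean toward the approach in post #3.
This problem was also discussed long ago on DOS联盟.
Back then, 随风 and I likewise concluded that padding is both accurate and efficient.
Author: batman    Time: 2008-8-5 20:24

Originally posted by pusofalse on 2008-8-4 21:52
Only when debugging did I realize how troublesome this problem really is~
I admire the poster above, a real expert~

Thanks for pointing that out, brother. My second code has been fixed. In fact the code in post #3 is not general, as the following test shows:
Test text:
1.txt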
  1. 29324200852651210028213071109630551685419682237192661910031596813525985
  2. 13919282422170532368385309906951146058242657993752114528415277712737232
  3. 28832607224652155316459166842936030363170611320231631619428405179384047
  4. 81217083066313894108553418317261065313629448519571101631624138851150746
  5. 13361717161263771422216995222461530385953272439225762679436096361638
  6. 99999999999999999999999999999999999999999999999999999999999999999999999
  7. 99999999999999999999999999999999999999999999999999999999999999999999999
  8. 99999999999999999999999999999999999999999999999999999999999999999999999
  9. 999999999999999999999999999999999999999999999999999999999

Running the code from post #3 gives:
  1. 29448519571101631624138851150746133617171612637714222169952224615303859532724392
  2. 25762679436096361638
  3. 99999999999999999999999999999999999999999999999999999999999999999999999999999999
  4. 99999999999999999999

Running my second, general code from post #2 gives:
  1. 99999999999999999999999999999999999999999999999999999999999999999999999999999999
  2. 99999999999999999999999999999999999999999999999999999999999999999999999999999999
  3. 99999999999999999999999999999999999999999999999999999999999999999999999999999999
  4. 99999999999999999999999999999
  5. 29324200852651210028213071109630551685419682237192661910031596813525985139192824
  6. 22170532368385309906951146058242657993752114528415277712737232288326072246521553
  7. 16459166842936030363170611320231631619428405179384047812170830663138941085534183
  8. 17261065313629448519571101631624138851150746133617171612637714222169952224615303
  9. 85953272439225762679436096361638
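
PS: The code in post #3 works the same way as my first code in post #2: both pad the front of each number to a fixed width (with 0 in post #3, with # in mine). Mine pads to 80 characters (for numbers that fit on one line), while post #3 pads to 100 (numbers longer than 100 digits give wrong results). My code also seems to be slightly more efficient.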
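
The failure described here can be reproduced in a few lines. Keeping the last width characters of a zero-padded string silently drops the leading digits of any number longer than width, which appears to be what the truncated output above shows. A Python sketch; zero_pad_key is a hypothetical name and the width of 10 is chosen only to keep the example short.

```python
def zero_pad_key(number, width):
    """Mimic `!str:~-width!` on a zero-padded string: keep the last `width` characters."""
    return ("0" * width + number)[-width:]

print(zero_pad_key("123", 10))              # 0000000123: fits, padded to the width
print(zero_pad_key("987654321012345", 10))  # 4321012345: leading digits are lost
```

Once digits are lost, both the sort order and the printed values are wrong, so a fixed width is only safe when it is at least as large as the longest line.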
Author: batman    Time: 2008-8-5 20:38

Originally posted by pusofalse on 2008-8-4 23:12
I didn't save it and ended up writing it twice. Posting it here so it doesn't get lost again.

@echo off&setlocal enabledelayedexpansion
set m=0
for /f "tokens=1,* delims=:" %%a in ('findstr/o .* 1.txt') do (
    set/a n+=1,l=n-1,y+=1
    set ...

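Using a pipe is both cumbersome and slower. Wouldn't adding a temporary file be better?
Author: terse    Time: 2008-8-6 13:22

Originally posted by batman on 2008-8-5 20:24

Thanks for pointing that out, brother. My second code has been fixed. In fact the code in post #3 is not general, as the following test shows:
Test text:
1.txt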

29324200852651210028213071109630551685419682237192661910031596813525985
13919282422170532368385309906 ...

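Yes, as you pointed out, my code in post #3 is not very general, so I changed it to a multi-step algorithm. Trying it on a file of 100-odd KB, it seems much more efficient, and there is still no temporary file.
For some reason, your general code from post #2 failed on that 100-odd KB file on my machine.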
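    @echo off&setlocal enabledelayedexpansion
    rem From the findstr /o offsets, compute each line length n and keep the maximum in d.
    set "d=0"
    for /f "skip=1 tokens=1* delims=:" %%i in ('findstr /o ".*" 2.txt') do (
        set/a m+=2,n=%%i-m-t,t=%%i-m
        set str=%%j
        if !n! gtr !d! set/a d=n
    )
    rem The last line has no following offset, so measure it through a pipe.
    for /f "skip=1 delims=:" %%i in ('^(echo %str%^&echo.^)^|findstr /o ".*"') do set/a m=%%i-3
    if %m% gtr %d% set/a d=m
    rem Left-pad every line with $ to d characters and store it as a variable name.
    rem The random suffixes keep duplicate lines apart.
    for /l %%i in (1,1,%d%) do set "var=$!var!"
    for /f "delims=" %%i in (2.txt) do (
        set str=!var!%%i
        set .!str:~-%d%! $!random! !random! !random!=a
    )
    rem Read the names back in sorted order; the delimiters strip padding and suffix.
    for /f "delims=.=$" %%i in ('set .') do echo %%i
    pause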
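I had left out the $ in the last line; added.
Sigh, I found my calculation still has a problem. Still fixing it.
Duplicate lines and spaces: it can only handle one of the two. Waiting for a complete solution.
I think handling duplicate lines and spaces this way is acceptable. It comes down to the delimiter: text that contains the delimiter cannot be processed.

[ Last edited by terse on 2008-8-7 01:44 ]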
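
On the duplicate-line problem: the padded number is used as a variable name, so two identical lines map to the same variable and only one survives. Appending something unique to the name, as the !random! values above try to do, keeps both. A Python sketch with a dict standing in for the environment follows. A running counter replaces !random! so that uniqueness is guaranteed, which is my substitution rather than what the batch code does; both function names are hypothetical.

```python
def sort_keep_duplicates(numbers):
    """Sort digit strings via padded dict keys without losing repeated values."""
    width = max(len(n) for n in numbers)
    env = {}
    for index, n in enumerate(numbers):
        # key = padded number plus a unique suffix, like a variable name in `set`
        env[(n.rjust(width, "$"), index)] = n
    return [env[key] for key in sorted(env)]

def sort_drop_duplicates(numbers):
    """Same idea without the suffix: identical lines collapse into one key."""
    width = max(len(n) for n in numbers)
    env = {n.rjust(width, "$"): n for n in numbers}
    return [env[key] for key in sorted(env)]

print(sort_keep_duplicates(["42", "7", "42"]))
print(sort_drop_duplicates(["42", "7", "42"]))
```

With random suffixes a collision between two identical lines is unlikely but possible, which is presumably why the batch code appends three of them.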
Author: huahua0919    Time: 2008-8-6 18:14

In repeated tests I have not found set's sorting to go wrong.
    @echo off&setlocal enabledelayedexpansion
    rem Prefix every number with a run of zeros and store it as a variable name.
    for /f %%i in (a.txt) do (
        set .0000000000000000000000000000000000000000000%%i=a
    )
    rem Keep the last 90 characters of each name so that all keys have the same width.
    for /f "tokens=1 delims=.=" %%i in ('set .') do (
        set m=%%i
        set m=!m:~-90!
        set _!m!=b
    )
    echo Ascending order:
    rem The zero delimiter drops the leading zeros; the inner loop cuts off the value part.
    for /f "tokens=1* delims=0" %%i in ('set _') do (
        set /a n+=1
        for /f "delims==" %%a in ("%%j") do (
            set _!n!=%%a
            echo %%a
        )
    )
    echo Descending order:
    for /l %%i in (%n% -1 1) do (
        call echo %%_%%i%%
    )
    pause

Author: terse    Time: 2008-8-6 18:33

Originally posted by huahua0919 on 2008-8-6 18:14
In repeated tests I have not found set's sorting to go wrong.
@echo off&setlocal enabledelayedexpansion
for /f %%i in (a.txt) do (
set .0000000000000000000000000000000000000000000%%i=a
)
for /f "tokens=1 delims=.=" %%i in ('set .') ...

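Judging by the last-90-characters cut and the 43 zeros you prepend: if a line in your test file is longer than 90 characters, or if the longest and shortest lines differ by more than 43 characters, the result will in theory be wrong.
Also, for descending order you could try set _^|sort/r inside the FOR; that should save one FOR.
Author: huahua0919    Time: 2008-8-6 18:38

I'm at an internet cafe, and the system here happens not to have the sort command, so I didn't use it.
As for the issue above, I tested against the existing data. One could also compute the difference between the longest and shortest lines and then choose the number of zeros to add and the number of characters to keep accordingly; that should work. But I don't understand why the cases mentioned above, such as more than 100 characters, fail?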
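
The short-line side of this constraint can be checked mechanically. With 43 zeros in front and the last 90 characters kept, a number with fewer than 47 digits gives a key shorter than 90 characters, so the keys no longer share one width and a string sort stops matching numeric order. A Python sketch follows; it assumes, as I believe is the case, that !m:~-90! returns the whole string when it is shorter than 90 characters. fixed_key and computed_key are hypothetical names.

```python
def fixed_key(number, zeros=43, keep=90):
    """Prepend `zeros` zeros, then keep at most the last `keep` characters."""
    return ("0" * zeros + number)[-keep:]

def computed_key(number, width):
    """Zero-pad to a width taken from the data instead of a fixed constant."""
    return number.zfill(width)

short = ["5", "12"]                  # both far shorter than 47 digits
print(sorted(short, key=fixed_key))  # keys differ in width, so 12 lands before 5
width = max(len(n) for n in short)
print(sorted(short, key=lambda n: computed_key(n, width)))
```

Deriving the width from the longest line, as suggested in the post above, removes both the lower and the upper limit.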
Author: huahua0919    Time: 2008-8-6 18:43

Actually, if can do the comparison, as long as every line of the text is turned into the following form:
  1. 0002.371620.291160.322081.050531.817416.28422.54770.19123.16180.1285.94102.6814.244
If every chunk that is cut out has the same length, if can compare them, and set is not needed.
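
A sketch of the chunked comparison suggested here, under my own assumptions: both numbers are zero-padded to the same width, cut into pieces of 9 digits (so that each piece fits cmd's signed 32-bit arithmetic), and compared piece by piece from the left. The chunk size, the padding step and the name compare_chunked are mine, not from the post.

```python
CHUNK = 9  # any 9-digit decimal value fits in a signed 32-bit integer

def compare_chunked(a, b, chunk=CHUNK):
    """Return -1, 0 or 1 comparing two non-negative digit strings chunk by chunk."""
    width = max(len(a), len(b))
    width += -width % chunk  # round up to a multiple of the chunk size
    a, b = a.zfill(width), b.zfill(width)
    for start in range(0, width, chunk):
        x, y = int(a[start:start + chunk]), int(b[start:start + chunk])
        if x != y:
            return -1 if x < y else 1
    return 0

print(compare_chunked("123456789012345678901", "98765432109876543210"))
```

A pairwise comparison like this still needs a sorting loop around it, so in batch it is likely to be slower than letting set do the ordering.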




Welcome to 批处理之家 (http://bathome.net./) Powered by Discuz! 7.2