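Title: [Technical Discussion] Downloading Double Color Ball (双色球) distribution-chart data with PowerShell [writes directly to Excel]

Author: xczxczxcz    Time: 2019-1-9 20:19     Title: Downloading Double Color Ball (双色球) distribution-chart data with PowerShell [writes directly to Excel]

Last edited by xczxczxcz on 2019-1-10 13:08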

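This is purely for downloading data, not written for buying tickets. The lottery here is a trap through and through. Even if every programmer in the world wrote code together, they still could not win it.
In short: if they want you to win, you win; if they don't, you cannot win unless you buy every possible combination. Unless your luck is extraordinarily strong.
=== Without HTML node (DOM) parsing ==== Requires PowerShell 3.0 or later.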
Function DownLotteryData {
    # $DATA is a flat list with 9 fields per draw: issue number, draw date, 6 red balls, 1 blue ball.
    # $n is the 1-based position of the next field within the current draw.
    Param ( [Array]$DATA, [string]$year, [int]$n )
    $RedBlue = @()          # distribution-chart rows, one per draw
    $array   = @()          # plain draw records, one per draw
    $arr     = @()          # fields of the draw being assembled
    $Ball    = @('A') * 52  # index 0 unused, 1 = issue, 2 = date, 3..35 = red 01-33, 36..51 = blue 01-16
    for ( $i = 0; $i -lt $DATA.Count; $i++ ) {
        $arr += $DATA[$i]   # draw record
        if ( $n -le 8 ) {
            # Issue number and date keep their position; red ball k goes to index k + 2.
            if ( $n -le 2 ) { $num = $n } else { $num = [int]$DATA[$i] + 2 }
            $Ball[$num] = $DATA[$i]
            $n++
        } else {
            # Ninth field: blue ball k goes to index k + 35, which completes the draw.
            $num = [int]$DATA[$i] + 35
            $Ball[$num] = $DATA[$i]
            $RedBlue += ,$Ball
            $array += ,$arr
            $arr = @()
            $Ball = @('A') * 52
            $n = 1
        }
    }
    $Content = @()
    $AllBall = @()
    for ( $i = 0; $i -lt $array.Count; $i++ ) {
        $Content += ( $array[$i] -join ' ' ).Replace('.','/')   # draw record, date as yyyy/MM/dd
        # Blank out the 'A' placeholders and drop the unused leading column.
        $AllBall += ( $RedBlue[$i] -join ',' ).Replace('.','_').Replace('A',' ') -replace '^(\s+)?,?',''
    }
    $Content | Set-Content ".\开奖数据\简略数据\$year.txt" -Encoding Default -Force   # plain draw records
    $AllBall | Set-Content ".\开奖数据\分布图\$year.csv" -Encoding Default -Force     # distribution chart, opens in Excel
}
# Download the data (开奖数据 = draw data, 简略数据 = summary data, 分布图 = distribution chart)
Remove-Item .\开奖数据 -Recurse -Force -ErrorAction SilentlyContinue
New-Item .\开奖数据\简略数据 -ItemType Directory -Force
New-Item .\开奖数据\分布图 -ItemType Directory -Force
for ( $year = 2003; $year -le 2019; $year++ ) {
    $url = "https://kjh.55128.cn/ssq-history-$year.htm"
    Invoke-WebRequest -Uri $url -OutFile "$env:temp\Downh.log"
    # Keep only the lines holding an issue number, a date or a ball, then strip the tags.
    $PageData = (( Get-Content "$env:temp\Downh.log" -ReadCount 0 -Encoding UTF8 ) `
        -match '^(\s+)?<td>(\d{7}|\d{4}(\.\d{2}){2})</td>$|<li(\s+)?class.*>\d{2}</li>$' ) `
        -replace "(\s+)?</?(td)?(li)?((\s+)class='ball.*-24')?>",''
    DownLotteryData $PageData $year 1
}
Remove-Item "$env:temp\Downh.log" -Force
pause
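For readers following the index arithmetic in DownLotteryData: each draw becomes one 51-column row (issue number, date, 33 red-ball columns, 16 blue-ball columns), and every ball is written into the column that matches its number. Below is a minimal sketch for a single draw, assuming the same 9-field layout. The helper name ConvertTo-DistributionRow and the sample values are made up for illustration; they are not part of the original script and not real draw results.

```powershell
# Hypothetical helper, not part of the original script: maps one 9-field draw
# (issue, date, 6 red balls, 1 blue ball) to a 51-column distribution row.
function ConvertTo-DistributionRow {
    param ( [string[]]$Draw )
    $row = @(' ') * 51                      # columns 1..51, stored at index 0..50
    $row[0] = $Draw[0]                      # column 1: issue number
    $row[1] = $Draw[1].Replace('.', '_')    # column 2: draw date
    foreach ( $red in $Draw[2..7] ) {
        $row[[int]$red + 1] = $red          # red ball k -> column k + 2
    }
    $row[[int]$Draw[8] + 34] = $Draw[8]     # blue ball k -> column k + 35
    $row -join ','
}

# Made-up sample values, for illustration only.
$sample = '2019001', '2019.01.01', '01', '05', '12', '20', '28', '33', '07'
$line = ConvertTo-DistributionRow $sample
$line
```

Unlike the original, this sketch does not need the unused leading element or the 'A' placeholder, because it starts from a row of blanks and indexes from zero.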

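Author: 523066680    Time: 2019-1-9 21:17

Last edited by 523066680 on 2019-1-9 22:16

I once wrote a multi-threaded scraper for past Double Color Ball results. It is old and ugly code (what I am posting is incomplete; I think a second script handled the Excel export, but I could not find it just now).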
use Encode;
use Modern::Perl;
use Time::HiRes qw/time sleep/;
use threads;
use threads::shared;
use Try::Tiny;
use Mojo::UserAgent;
use Mojo::DOM;
use File::Basename;
use File::Path qw/make_path/;
use File::Slurp;

STDOUT->autoflush(1);

our $workdir = "D:\\Temp\\Double_Ball_Lottery";
make_path $workdir unless -e $workdir;
#chdir $workdir;

our $ua;
our $main = "http://kaijiang.500.com";
our @links :shared;
our @ths;

$ua = Mojo::UserAgent->new();
$ua = $ua->max_redirects(5);

print "Getting Links ... ";
get_links( \@links );
say "Done";

# Create the worker threads
push @ths, threads->create( \&thread_func, $_ ) for ( 0 .. 3 );

# Wait for every worker to finish
$_->join() for @ths;

quit();

sub thread_func
{
    my ( $id ) = @_;
    my $ua = Mojo::UserAgent->new();
    my ($link, $file, $res, $times);

    while (1)
    {
        # Take the next link under a lock so that no two threads grab the same one.
        # Looping until the queue is empty also covers the last link,
        # which the old test ( $#links > 0 ) skipped.
        {
            lock(@links);
            $link = shift @links;
        }
        last unless defined $link;

        $file = $workdir ."\\". basename($link);

        if ( -e $file ) { say "$id - $link already exists"; next };
        say "$id - $link";

        $times = 0;
        $res = undef;    # do not reuse the response of the previous link
        while (1)
        {
            try { $res = $ua->get($link)->result }
            catch { $res = undef };
            last if ( defined $res and $res->is_success );

            # Count HTTP errors as well as exceptions, so a failing page cannot loop forever.
            return if ( ++$times > 10 );
            printf "[%d] getting %s, retry: %d\n", $id, basename($link), $times;
            sleep 3.0;
        }
        write_file( $file, { binmode => ':raw' }, $res->body );
    }
}

sub get_links
{
    # Collect the href of every link inside .iSelectList from the local file simple.htm
    my ($aref) = @_;
    my $html = read_file("simple.htm");
    my $dom = Mojo::DOM->new( $html );
    for my $e ( $dom->at(".iSelectList")->find("a")->each )
    {
        push @$aref, $e->attr("href");
    }
}

sub quit { system("pause"); }
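This turns out to have been written in February 2018. Back then I scraped the HTML for 2003 through January 2018 and exported it to Excel (no dates shown, just a distribution chart):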
https://share.weiyun.com/5V1MvmE
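
The retry loop in the Perl script (attempt the request, wait 3 seconds after a failure, give up after 10 retries) could be carried over to the PowerShell downloader in this thread. A minimal sketch under that assumption: Invoke-WithRetry is a hypothetical helper name, not an existing cmdlet, and the defaults simply mirror the numbers in the Perl code.

```powershell
# Hypothetical helper: runs $Action and retries when it throws a terminating error.
function Invoke-WithRetry {
    param (
        [scriptblock]$Action,
        [int]$MaxRetries = 10,       # same limit as the Perl loop
        [double]$DelaySeconds = 3    # same pause as the Perl loop
    )
    $retries = 0
    while ( $true ) {
        try {
            return & $Action
        } catch {
            # Out of retries: rethrow the last error to the caller.
            if ( $retries -ge $MaxRetries ) { throw }
            $retries++
            Write-Warning "retry $retries of $MaxRetries after error: $($_.Exception.Message)"
            Start-Sleep -Milliseconds ([int]($DelaySeconds * 1000))
        }
    }
}

# Possible use inside the download loop ($url as defined there):
# Invoke-WithRetry { Invoke-WebRequest -Uri $url -OutFile "$env:temp\Downh.log" -ErrorAction Stop }
```

Rethrowing after the last retry leaves it to the caller to decide whether to skip that year or stop the whole run.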
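Author: xczxczxcz    Time: 2019-1-10 12:33

Reply to 2# 523066680
Thanks! Here is an additional version that writes directly to Excel, in *.xlsx format.

Downloading everything from 2003 through the latest 2019 draws takes about 180 seconds. The speed of writing to Excel seems acceptable, and it would improve with some optimization. This is a fairly complete set of Excel operations.
All the data is saved in a single workbook, with one worksheet tab per year. I wrote it for my own use in my spare time. If you only need the distribution chart, you can remove the summary data.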
$time = Get-Date
# Excel object
$excel = New-Object -ComObject Excel.Application
Start-Sleep -Seconds 2
$excel.Visible = $true
$Exist = [IO.File]::Exists("$Pwd\Lottery.xlsx")
if ( $Exist ) {
    $workbook = $excel.Workbooks.Open("$Pwd\Lottery.xlsx")
    $NewestName = $workbook.Worksheets.Item(1).Name
    $excel.DisplayAlerts = $false            # no confirmation dialog when the sheet is deleted
    $workbook.Worksheets.Item(1).Delete()    # delete the most recent year, then download it again
    $sheet = $workbook.Worksheets.Add()
    $sheet = $workbook.Worksheets.Item(1)
} else {
    $NewestName = 2003                       # no existing workbook: download everything
    $workbook = $excel.Workbooks.Add()
    $sheet = $workbook.Worksheets.Item(1)
}
# XlBorderWeight values as plain numbers, so the script does not depend on
# the Office interop assembly being loaded.
$xlHairline = 1
$xlThin = 2
Function ExcelMerge {
    Param ( [Array]$CSVData, [string]$year )
    $workbook.Worksheets.Item(1).Name = $year
    $sheet = $workbook.Worksheets.Item($year)
    # Whole sheet
    $sheet.Rows.HorizontalAlignment = 3
    $sheet.Rows.VerticalAlignment = 2
    $sheet.Columns.RowHeight = 16
    $RangeHeight = $sheet.Range("A1:B1")
    $RangeHeight.RowHeight = 24
    # Merge the header cells: columns 3-35 (red) and 36-51 (blue)
    $CELL1 = $sheet.Cells.Item(1,3)
    $CELL2 = $sheet.Cells.Item(1,35)
    $CELLRange = $sheet.Range( $CELL1 , $CELL2 )
    $CELLRange.Merge()
    $CELL1 = $sheet.Cells.Item(1,36)
    $CELL2 = $sheet.Cells.Item(1,51)
    $CELLRange = $sheet.Range( $CELL1 , $CELL2 )
    $CELLRange.Merge()
    # Header row
    $CELL1 = $sheet.Cells.Item(1,1)
    $CELL2 = $sheet.Cells.Item(1,51)
    $CELLRange = $sheet.Range( $CELL1 , $CELL2 )
    $CELLRange.Borders.Weight = $xlThin
    $CELLRange.Font.Bold = $true
    $CELLRange.Font.Size = 13
    $CELLRange.Font.ColorIndex = 1
    $sheet.Cells.Item(1,1) = '期号'        # issue number
    $sheet.Cells.Item(1,2) = '开奖日期'    # draw date
    $sheet.Cells.Item(1,3) = '红球'        # red balls
    $sheet.Cells.Item(1,36) = '兰球'       # blue ball
    # Red balls
    $CELL1 = $sheet.Cells.Item(2,3)
    $CELL2 = $sheet.Cells.Item($CSVData.Count + 1, 35)
    $CELLRange = $sheet.Range( $CELL1 , $CELL2 )
    $CELLRange.Borders.Weight = $xlHairline
    $CELLRange.Font.ColorIndex = 3
    $CELLRange.Font.Size = 12
    $CELLRange.Columns.ColumnWidth = 2.5
    # Blue ball
    $CELL1 = $sheet.Cells.Item(2,36)
    $CELL2 = $sheet.Cells.Item($CSVData.Count + 1, 51)
    $CELLRange = $sheet.Range( $CELL1 , $CELL2 )
    $CELLRange.Borders.Weight = $xlHairline
    $CELLRange.Font.ColorIndex = 5
    $CELLRange.Font.Size = 12
    $CELLRange.Font.Bold = $true
    $CELLRange.Columns.ColumnWidth = 2.5
    # Issue number and date
    $CELL1 = $sheet.Cells.Item(2,1)
    $CELL2 = $sheet.Cells.Item($CSVData.Count + 1, 2)
    $CELLRange = $sheet.Range( $CELL1 , $CELL2 )
    $CELLRange.Borders.Weight = $xlThin
    $CELLRange.Font.Size = 12
    $CELLRange.Font.ColorIndex = 1
    $RangeWidth = $sheet.Range("A1:A2")
    $RangeWidth.EntireColumn.ColumnWidth = 8
    $RangeWidth = $sheet.Range("B1:B2")
    $RangeWidth.EntireColumn.ColumnWidth = 11
    # Write the draw data cell by cell, starting at row 2
    $StartLine = 2
    for ( $line = 0; $line -lt $CSVData.Count; $line++ ) {
        $EachCell = @($CSVData[$line].Split(','))
        for ( $v = 1; $v -le $EachCell.Count; $v++ ) { $sheet.Cells.Item($StartLine,$v) = $EachCell[$v - 1] }
        $StartLine++
    }
    # Add the sheet for the next year, except after the final year
    if ( [int]$year -ne 2019 ) { $sheet = $workbook.Worksheets.Add() }
}
# Extract the draw data
Function DownLotteryData {
    # $DATA is a flat list with 9 fields per draw: issue number, draw date, 6 red balls, 1 blue ball.
    # $n is the 1-based position of the next field within the current draw.
    Param ( [Array]$DATA, [string]$year, [int]$n )
    $RedBlue = @()          # distribution-chart rows, one per draw
    $array   = @()          # plain draw records, one per draw
    $arr     = @()          # fields of the draw being assembled
    $Ball    = @('A') * 52  # index 0 unused, 1 = issue, 2 = date, 3..35 = red 01-33, 36..51 = blue 01-16
    for ( $i = 0; $i -lt $DATA.Count; $i++ ) {
        $arr += $DATA[$i]   # draw record
        if ( $n -le 8 ) {
            # Issue number and date keep their position; red ball k goes to index k + 2.
            if ( $n -le 2 ) { $num = $n } else { $num = [int]$DATA[$i] + 2 }
            $Ball[$num] = $DATA[$i]
            $n++
        } else {
            # Ninth field: blue ball k goes to index k + 35, which completes the draw.
            $num = [int]$DATA[$i] + 35
            $Ball[$num] = $DATA[$i]
            $RedBlue += ,$Ball
            $array += ,$arr
            $arr = @()
            $Ball = @('A') * 52
            $n = 1
        }
    }
    $Content = @()
    $AllBall = @()
    for ( $i = 0; $i -lt $array.Count; $i++ ) {
        $Content += ( $array[$i] -join ' ' ).Replace('.','/')   # draw record, date as yyyy/MM/dd
        # Blank out the 'A' placeholders and drop the unused leading column.
        $AllBall += ( $RedBlue[$i] -join ',' ).Replace('.','_').Replace('A',' ') -replace '^(\s+)?,?',''
    }
    $Content | Set-Content ".\开奖数据\简略数据\$year.txt" -Encoding Default -Force   # plain draw records
    ExcelMerge $AllBall $year
}
# Download the data (开奖数据 = draw data, 简略数据 = summary data)
New-Item .\开奖数据\简略数据 -ItemType Directory -Force
for ( $year = [int]$NewestName; $year -le 2019; $year++ ) {
    $url = "https://kjh.55128.cn/ssq-history-$year.htm"
    Invoke-WebRequest -Uri $url -OutFile "$env:temp\Downh.log"
    # Keep only the lines holding an issue number, a date or a ball, then strip the tags.
    $PageData = (( Get-Content "$env:temp\Downh.log" -ReadCount 0 -Encoding UTF8 ) `
        -match '^(\s+)?<td>(\d{7}|\d{4}(\.\d{2}){2})</td>$|<li(\s+)?class.*>\d{2}</li>$' ) `
        -replace "(\s+)?</?(td)?(li)?((\s+)class='ball.*-24')?>",''
    DownLotteryData $PageData $year 1
}
Remove-Item "$env:temp\Downh.log" -Force
if ( $Exist ) { $workbook.Save() } else { $workbook.SaveAs("$Pwd\Lottery.xlsx") }
$workbook.Close()
$excel.Quit()
[void][Runtime.InteropServices.Marshal]::ReleaseComObject($excel)   # release the COM reference so Excel can exit
$excel = $null
[GC]::Collect()
([datetime]::Now - $time).TotalMilliseconds   # elapsed time in milliseconds
pause
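
On the point that the Excel writing could be optimized: ExcelMerge fills the sheet one cell at a time, and every cell is a separate COM call. A common way to reduce that cost is to build a two-dimensional array first and assign it to a range in a single call. The following is a rough sketch and has not been run against this workbook. ConvertTo-CellGrid and Write-GridToSheet are hypothetical names, and $Sheet is assumed to be a worksheet object like the one ExcelMerge works with.

```powershell
# Hypothetical helper: turns comma-joined rows into a 2-D object array.
function ConvertTo-CellGrid {
    param ( [string[]]$CsvRows, [int]$Columns = 51 )
    $rowCount = $CsvRows.Count
    $grid = New-Object 'object[,]' $rowCount, $Columns
    for ( $r = 0; $r -lt $rowCount; $r++ ) {
        $cells = $CsvRows[$r].Split(',')
        for ( $c = 0; $c -lt $cells.Count -and $c -lt $Columns; $c++ ) {
            $grid[$r, $c] = $cells[$c]
        }
    }
    , $grid   # the leading comma keeps PowerShell from flattening the array on output
}

# Hypothetical helper: writes the whole grid with one assignment.
# Defined here but not called, because it needs a live Excel worksheet.
function Write-GridToSheet {
    param ( $Sheet, $Grid, [int]$StartRow = 2 )
    $rows = $Grid.GetLength(0)
    $cols = $Grid.GetLength(1)
    $topLeft = $Sheet.Cells.Item($StartRow, 1)
    $bottomRight = $Sheet.Cells.Item($StartRow + $rows - 1, $cols)
    $Sheet.Range($topLeft, $bottomRight).Value2 = $Grid
}

# Small made-up input to show the shape of the result.
$grid = ConvertTo-CellGrid -CsvRows @('a,b,c', 'd,e,f') -Columns 3
$grid.GetLength(0)
$grid.GetLength(1)
```

Inside ExcelMerge, the nested cell loop would then become something like Write-GridToSheet $sheet (ConvertTo-CellGrid $CSVData), with the formatting code left unchanged.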





Welcome to 批处理之家 (http://bathome.net./) Powered by Discuz! 7.2