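Title: [Text processing] [Solved] Help with a batch script to extract specific text and split it into columns
Author: jave000 Time: 2021-6-23 10:00 Title: [Solved] Help with a batch script to extract specific text and split it into columns
Last edited by jave000 on 2021-9-10 10:12
Summary for similar searches:
Copy a batch of text files from one folder to another, selected by creation and modification time. Both folder paths contain a variable part that is typed in by hand, and one of the paths is under the Desktop; so that the script can be shared with colleagues, the Desktop path must be resolved for whichever user runs it.
For the content of every text file, keep the lines that begin with an item number, split them into columns, and write them to a csv file, adding one column that holds the name of the source text file.
Open Excel automatically as well.
Complication: after multi-line entries are split into columns, the parts that belong to one column have to be merged into a single cell, while some columns have content on one line only.
Thanks to everyone who replied below.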
@powershell -c "Get-Content '%~0' | Select-Object -Skip 1 | Out-String | Invoke-Expression" & exit /b
set-location $PSScriptRoot
$jave = read-host "Project Model Folder"
$CXMPV = [Environment]::GetFolderPath("Desktop")
$order = "2,3,4,5,6,1"
$sour = "\\btssvr9\pds1\$jave\3d\iso_dgn\MOC_2021\"
$dest = "$CXMPV\Print\"
do
{
$minute = read-host "Minutes to look back"
$minute = $minute.trim()
}
while ($minute -match "\D")
Get-ChildItem -Path $sour -File |
Where-Object { ($_.CreationTime -gt (get-date).AddMinutes(-$minute)) -or ($_.LastWriteTime -gt (get-date).AddMinutes(-$minute)) } |
foreach-object {
write-host $_.fullname
copy-item $_.fullname -Destination $dest
}
Invoke-Item $sour
Remove-Item $dest* -Include z-mto.csv
if (-not ([string]::IsNullOrEmpty($args[0])))
{
$sour = $args[0]
}
function zget-data()
{
[System.Collections.ArrayList] $sirenas = @()
$rem = "^\s{3}\d{1}|^\s{2}\d{2}|^\s{11,12}\S.+"
$renfs = "(\w{4})(\w{1,4})?(\w{1,4})?(\w{1,5})?(.*)"
$renfd = "`$1-`$2-`$3-`$4-`$5"
$res = "`n(.{4})(.{46})(.{13})(.{15})(.+)((?:`n\s{11,12}.+)*)"
#$res = "`n\s{2,3}(\d{1,2})\s{4,}(\S.+?\S)\s{2,}(\d+(?:X\d+)?)\s{2,}(\S+(?:\s\S+)*)\s{2,}(\d+(?:[.]\d+)?(?:\sM)?)((?:`n\s{11,12}.+)*)"
$red = "`t`$1`t`$2`$6`t`$3`t`$4`t`$5::"
Get-ChildItem -path $sour "*.prt" | foreach-object {
write-host " "$_.basename
$nf = $_.basename.ToUpper() -replace $renfs,$renfd -replace "-+$","";
$a = "`n" + ((get-content -Encoding utf8 -path $_.fullname ) -match $rem -join "`n") ;
if ( $a.length -gt 1 )
{
$a = $a -replace $res,$red -replace "[ `n]+"," " -replace " *`t *","`t"
$a = $a -replace "::$","" -replace "::","`n" -replace "(?m)\sm$","" -replace "(?m)^(?=`t)",$nf
$sirenas.add($a) | out-null
}
}
$sirenas = $sirenas -split "`n"
$reos = "^([^`t]*)`t([^`t]*)`t([^`t]*)`t([^`t]*)`t([^`t]*)`t([^`t]*)$"
$reod = $order -replace ",","`t" -replace "(?=\d)","`$"
$sirenas = $sirenas -replace $reos, $reod
return $sirenas
}
function zout-csv()
{
$reos = "^([^`t]*)`t([^`t]*)`t([^`t]*)`t([^`t]*)`t([^`t]*)`t([^`t]*)$"
$reod = '"$1","$2","$3","$4","$5","$6"'
$sirenas -replace $reos,$reod | out-file -encoding utf8 ($dest + "z-mto.csv")
}
function zout-excel()
{
try
{
$Excel = New-Object -ComObject Excel.Application -ErrorAction Stop
}
catch
{
return
}
$Excel.Visible = $true
$Workbook = $Excel.Workbooks.Add()
$Sheet = $Workbook.Worksheets.Item(1)
$v = [string[,]]::new($sirenas.count,1)
for ( $i =0 ; $i -lt $sirenas.count ; $i++ )
{
$v[$i,0] = $sirenas[$i]
}
$rng = "A1:A" + $sirenas.count
$Sheet.range($rng).value2 = $v
$colA = $sheet.range("A1").EntireColumn
$colrange = $sheet.range("A1")
$colA.texttocolumns($colrange,1,1,$false,$true,$false,$false,$false) | out-null
$sheet.columns.autofit() | out-null
$Workbook.SaveAs(($dest + "z-mto")) # $dest is already a full path, so it is not prefixed with $PSScriptRoot
$excel.Quit()
[system.GC]::Collect()
}
[System.Collections.ArrayList] $sirenas = @()
$sirenas = zget-data
zout-csv
zout-excel
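Author: Batcher Time: 2021-6-23 10:27
Reply to #2 jave000
Please try uploading the files to a cloud drive.
Author: qixiaobin0715 Time: 2021-6-23 10:36
The upload should ideally include:
1. The source files
2. A sample of what the files should look like after each processing step.
Author: newswan Time: 2021-6-23 13:30
Last edited by newswan on 2021-6-23 14:39
I think I basically understand it; Excel would be a better fit for this.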
Process a single file (PowerShell):
$file = "1.txt"
$fc = get-content $file
$i = 0
while ($i -le $fc.count)
{
    if ($fc[$i] -match "^\s\s\s\d\s|^\s\s\d\d\s")
    {
        $a = $fc[$i] -split "\s\s\s*"
        $a[5] = $a[5] -replace "\sm",""
        while ($fc[$i+1] -match "^\s{11}\S")
        {
            $a[2] = $a[2] + " " + $fc[$i+1].trim()
            $i += 1
        }
        if (-not ($fc[$i+1] -match "^\s{11}\S"))
        {
            $a
        }
    }
    $i += 1
}
Author: newswan Time: 2021-6-23 13:42
Last edited by newswan on 2021-6-23 13:46
Extraction result for the sample in post #4:
- 1
- PIPE, SMLS, HG/T 20553, ASTM A312 TP316L, SAWN END, CS 1000-27, DN200 - 219.1 X 4
- 200
- I512912
- 2.1
-
- 2
- PIPE, SMLS, HG/T 20553, ASTM A312 TP316L, SAWN END, CS 1000-27, DN150 - 168.3 X 3.6
- 150
- I512911
- 1.6
-
- 3
- PIPE, SMLS, HG/T 20553, ASTM A312 TP316L, SAWN END, CS 1000-27, DN50 - 60.3 X 2.9
- 50
- I512906
- 0.3
-
- 4
- PIPE, SMLS, HG/T 20553, ASTM A312 TP316L, SAWN END, CS 1000-27, DN25 - 33.7 X 2.6
- 25
- I512903
- 0.3
-
- 5
- PIPE, SMLS, HG/T 20553, ASTM A312 TP316L, SAWN END, CS 1000-27, DN20 - 26.9 X 2
- 20
- I512902
- 0.2
-
- 6
- CONCENTRIC REDUCER, SMLS, GB/T 12459, R(C), ASTM A312 TP316L, WELD PREP. ACC. TO CS 416, ENDPREP. ACC. TO CS 416, CS 1000-28, DN200 X 150 - 219.1 X 4/168.3 X 3.6
- 200X150
- I512654
- 1
-
- 7
- ELBOW 90 C, SMLS, GB/T 12459, 90E(L), ASTM A312 TP316L, WELD PREP. ACC. TO CS 416, CS 1000-28, DN200 - 219.1 X 4
- 200
- I512504
- 1
-
- 8
- ELBOW 90 C, SMLS, GB/T 12459, 90E(L), ASTM A312 TP316L, WELD PREP. ACC. TO CS 416, CS 1000-28, DN50 - 60.3 X 2.9
- 50
- I512498
- 1
-
- 9
- WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 182 F316L, PN10, FLANGE CONTACT FACE TO FORM B1 DIN EN 1092-1, ENDPREP. ACC. TO CS 416, CS 1000-34, DN200 - 219.1 X 4
- 200
- I512820
- 1
-
- 10
- WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 182 F316L, PN16, FLANGE CONTACT FACE TO FORM B1 DIN EN 1092-1, ENDPREP. ACC. TO CS 416, CS 1000-34, DN150 - 168.3 X 3.6
- 150
- I512819
- 1
-
- 11
- WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 182 F316L, PN16, FLANGE CONTACT FACE TO FORM B1 DIN EN 1092-1, ENDPREP. ACC. TO CS 416, CS 1000-34, DN50 - 60.3 X 2.9
- 50
- I512814
- 1
-
- 12
- WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 182 F316L, PN40, FLANGE CONTACT FACE TO FORM B1 DIN EN 1092-1, ENDPREP. ACC. TO CS 416, CS 1000-34, DN25 - 33.7 X 2.6
- 25
- I512799
- 1
-
- 13
- BLIND FLANGE, EN 1092-1, TYPE 05, SA 182 F316L, PN40, CONTACT FACE- FORM A EN 1092-1, CS 1000-37, DN25
- 25
- I512946
- 1
-
- 14
- WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 182 F316L, PN40, FLANGE CONTACT FACE TO FORM B1 DIN EN 1092-1, ENDPREP. ACC. TO CS 416, CS 1000-34, DN20 - 26.9 X 2
- 20
- I512798
- 1
-
- 15
- GASKET, DIN EN 1514-1-IBC, NQ, 1.4401/GRAPHITE/1.4571, PN16, WN 1000-841 DN 200 THICKNESS 1.6
- 200
- I224625
- 1
-
- 16
- GASKET, DIN EN 1514-1-IBC, NQ, 1.4401/GRAPHITE/1.4571, PN16, WN 1000-841 DN 150 THICKNESS 1.6
- 150
- I224624
- 2
-
- 17
- GASKET, DIN EN 1514-1-IBC, NQ, 1.4401/GRAPHITE/1.4571, PN40, WN 1000-841 DN 50 THICKNESS 1.6
- 50
- I224645
- 1
-
- 18
- GASKET, DIN EN 1514-1-IBC, NQ, 1.4401/GRAPHITE/1.4571, PN40, WN 1000-841 DN 25 THICKNESS 1.6
- 25
- I224638
- 2
-
- 19
- GASKET, DIN EN 1514-1-IBC, NQ, 1.4401/GRAPHITE/1.4571, PN40, WN 1000-841 DN 20 THICKNESS 1.6
- 20
- I224636
- 1
-
- 20
- SCREWED CONNECTION, DIN EN ISO 4017/4032, A2-70, CS 1000-16, NUT (H=0.9XD) M20 X 70
- 20
- I91453
- 16
-
- 21
- SCREWED CONNECTION, DIN EN ISO 4017/4032, A2-70, CS 1000-16, NUT (H=0.9XD) M20 X 75
- 20
- I91454
- 8
-
- 22
- SCREWED CONNECTION, DIN EN ISO 4014/4032, A2-70, CS 1000-16, NUT (H=0.9XD) M16 X 65
- 16
- I91413
- 4
-
- 23
- SCREWED CONNECTION, DIN EN ISO 4014/4032, A2-70, CS 1000-16, NUT (H=0.9XD) M12 X 50
- 12
- I91371
- 4
-
- 24
- SCREWED CONNECTION, DIN EN ISO 4014/4032, A2-70, CS 1000-16, NUT (H=0.9XD) M12 X 55
- 12
- I91372
- 8
-
- 25
- ERROR READING SPECIALTY MATERIAL DESCRIPTION LIBRARY
- 150
- V710RE11F3121
- 1
-
- 26
- SLIDE SHOE SIDE MOVABLE COMPANY STD WN8205-2 TYPE 5 (INS.-THK. 120 MM), UST37-2 WITH COATING COMPANY STANDARD WN 8110, SHOE-LENGTH 300 MM, CLAMPED
- 200
- S0W-200
- 1
-
- 1
- PIPE, SMLS, HG/T 20553, ASTM A312 TP316L, SAWN END, CS 1000-27, DN250 - 273 X 4
- 250
- I512913
- 0.5
-
- 2
- CONCENTRIC REDUCER, SMLS, GB/T 12459, R(C), ASTM A312 TP316L, WELD PREP. ACC. TO CS 416, ENDPREP. ACC. TO CS 416, CS 1000-28, DN250 X 200 - 273 X 4/219.1 X 4
- 250X200
- I512658
- 1
-
- 3
- ELBOW 90 C, SMLS, GB/T 12459, 90E(L), ASTM A312 TP316L, WELD PREP. ACC. TO CS 416, CS 1000-28, DN250 - 273 X 4
- 250
- I512505
- 1
-
- 4
- BLIND DISC, CS 473 PN 10 SERIES 2, SA 240 316L, FLANGE MOUNTING DIMENSION, PN10, CLAMPED PART CONTACT FACE FORM A DIN EN 1092-1, CS 1000-33, DN200
- 200
- I512980
- 1
-
- 5
- WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 182 F316L, PN10, FLANGE CONTACT FACE TO FORM B1 DIN EN 1092-1, ENDPREP. ACC. TO CS 416, CS 1000-34, DN250 - 273 X 4
- 250
- I512821
- 1
-
- 6
- WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 182 F316L, PN10, FLANGE CONTACT FACE TO FORM B1 DIN EN 1092-1, ENDPREP. ACC. TO CS 416, CS 1000-34, DN200 - 219.1 X 4
- 200
- I512820
- 2
-
- 7
- GASKET, DIN EN 1514-1-IBC, NQ, 1.4401/GRAPHITE/1.4571, PN10, WN 1000-841 DN 250 THICKNESS 1.6
- 250
- I224614
- 1
-
- 8
- GASKET, DIN EN 1514-1-IBC, NQ, 1.4401/GRAPHITE/1.4571, PN16, WN 1000-841 DN 200 THICKNESS 1.6
- 200
- I224625
- 3
-
- 9
- SCREWED CONNECTION, DIN EN ISO 4017/4032, A2-70, CS 1000-16, NUT (H=0.9XD) M20 X 75
- 20
- I91454
- 8
-
- 10
- SCREWED CONNECTION, DIN EN ISO 4014/4032, A2-70, CS 1000-16, NUT (H=0.9XD) M20 X 85
- 20
- I91456
- 8
-
- 11
- SCREWED CONNECTION, DIN EN ISO 4014/4032, A2-70, CS 1000-16, NUT (H=0.9XD) M20 X 80
- 20
- I91455
- 12
-
- 12
- BELLOWS SEAL VALVE, HANDWHEEL, CLIMBING, V2436W, 1.4581, 1.4571, 1.4408, 1.4404, PN10, FLANGE CONTACT FACE TO FORM B1 DIN EN 1092-1, WN 8480, DN 200
- 200
- I372364
- 1
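Author: jave000 Time: 2021-6-23 14:32
Reply to #8 newswan
Thank you very much. I tried it and it basically runs (although the component descriptions contain a lot of "string Trim(Params char[] trimChars), string Trim()" text).
The only problem is that I cannot use the result as it is. I would like the batch script to extract the body of the five columns NO, DESCRIPTION AND SPECIFICATION, (MM), ARTICLE-NO and QTY and put each into its own cell. The DESCRIPTION AND SPECIFICATION column needs its separate lines merged (I see your code already does this; I assumed it was impossible, so I did not mention it at first, and I am amazed), while the output still stays arranged in columns (it must be five columns). I do not need any other content.
A bat file cannot save an Excel file, can it? I do not know how to do it with txt either. If the result cannot go straight into Excel cells, I would at least like it to spread across five Excel columns automatically when I paste it in by hand.
After that I still need to merge duplicate entries in the ARTICLE-NO column, remove the " M" from the QTY column, and sum the quantities of the duplicates. (That is the final result, but I also need the previous result with duplicates kept, so I can later check that no data is missing.)
Also, PowerShell is new to me and I was feeling my way around just now. I did not expect that after copying the code a right-click would paste it automatically. It produced the result but did not save it to a file: is that part missing from the code, or is it not supported? It seems less convenient than bat; with a bat file I only need to drop it into the right folder and double-click to get the result, which makes it easy to share with colleagues.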
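The last step described in this post (merging duplicate ARTICLE-NO rows and summing QTY after dropping the " M" unit) is not covered by the scripts in this thread. A minimal PowerShell sketch of one way to do it, assuming the rows are already objects with `ArticleNo` and `Qty` properties; both property names and the three sample rows are made up for illustration:

```powershell
# Hypothetical input rows; in practice they would come from the parsed report lines.
$rows = @(
    [pscustomobject]@{ ArticleNo = 'I512820'; Qty = '1' }
    [pscustomobject]@{ ArticleNo = 'I512912'; Qty = '2.1 M' }
    [pscustomobject]@{ ArticleNo = 'I512820'; Qty = '2' }
)

$totals = $rows |
    Group-Object -Property ArticleNo |
    ForEach-Object {
        $sum = 0.0
        foreach ($r in $_.Group) {
            # Strip a trailing " M" unit, then parse with the invariant culture
            # so "2.1" is read the same way regardless of regional settings.
            $text = $r.Qty -replace '\s*M$', ''
            $sum += [double]::Parse($text, [cultureinfo]::InvariantCulture)
        }
        [pscustomobject]@{ ArticleNo = $_.Name; Qty = $sum }
    }

$totals
```

The unmerged rows stay in `$rows`, so both the per-line list and the summed list remain available for cross-checking.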
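Author: jave000 Time: 2021-6-23 14:34
Reply to #5 Batcher
Hello admin. I am posting from work, where every cloud-drive site is blocked, so I really cannot upload there. I cannot upload attachments here either; zip or txt, both fail.
I will try sending the text to my mailbox and post it from my own computer tonight.
Thanks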
Author: newswan Time: 2021-6-23 14:37
Last edited by newswan on 2021-6-23 14:45
Reply to #10 jave000
Fixed it: trim must be written with parentheses, trim().
Pick one of these lines, insert it as the first line, and save the file as .bat:
@powershell -c "Get-Content '%~0' | Select-Object -Skip 1 | Out-String | Invoke-Expression" & exit /b
@powershell -c "Get-Content '%~0' | Select-Object -Skip 1 | Out-String | Invoke-Expression" & pause & exit
#&cls&@Powershell "& {[ScriptBlock]::Create("'#' + (gc '%~f0' -raw)").Invoke()}" & pause & exit
#&cls&@Powershell "& {[ScriptBlock]::Create("'#' + ([io.file]::ReadAllText('%~f0',[text.encoding]::Default))").Invoke()}" & pause & exit
#&cls&@powershell -c "Get-Content '%~0' | Select-Object -Skip 1 | Out-String | Invoke-Expression" & pause&exit
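Roughly how the first header line works: cmd runs line 1, which starts PowerShell; PowerShell then re-reads the same file (`%~0`), skips that first line, joins the rest into one string, and executes it. A small sketch that imitates this from inside PowerShell with a temporary file (the file name and its one-line body are made up for the demo):

```powershell
# Build a throwaway two-line "hybrid" file: line 1 is the batch header,
# line 2 is the PowerShell body.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'hybrid-demo.bat'
@(
    '@powershell -c "Get-Content ''%~0'' | Select-Object -Skip 1 | Out-String | Invoke-Expression" & exit /b'
    '"body says: " + (6 * 7)'
) | Set-Content -Path $demo -Encoding ASCII

# This is what the header asks PowerShell to do with the file.
$result = Get-Content $demo | Select-Object -Skip 1 | Out-String | Invoke-Expression
$result

Remove-Item $demo
```

One thing to watch with this kind of launcher, as far as I can tell: the body runs through Invoke-Expression and not as a script file, so `$PSScriptRoot` is normally empty inside it.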
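Author: jave000 Time: 2021-6-23 14:44
Reply to #9 idwma
Thanks, the result is very good, but it is extremely slow: a window flashes for every row of data and it produces about two or three rows per second. The data all ends up in the cmd window, so am I supposed to copy it into Excel by hand? But the window closes as soon as I press any key...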
Author: newswan Time: 2021-6-23 14:46
Last edited by newswan on 2021-6-23 14:51
$fileSour = "1.txt"
$fileDest = "11.txt"

$fc = get-content $fileSour
[System.Collections.ArrayList] $da = @()
$i = 0
while ($i -le $fc.count)
{
    if ($fc[$i] -match "^\s\s\s\d\s|^\s\s\d\d\s")
    {
        $a = $fc[$i] -split "\s\s\s*"
        $a[5] = $a[5] -replace "\sm",""
        while ($fc[$i+1] -match "^\s{11}\S")
        {
            $a[2] = $a[2] + " " + $fc[$i+1].trim()
            $i += 1
        }
        if (-not ($fc[$i+1] -match "^\s{11}\S"))
        {
            $da.add($a -join "`t") | out-null
        }
    }
    $i += 1
}
$da | out-file $fileDest
Author: newswan Time: 2021-6-23 15:00
Process multiple files in a directory:
$pathSour = "a"
$fileDest = "11.txt"

[System.Collections.ArrayList] $da = @()

get-childitem -path $pathSour *.rpt | foreach-object {
    $fc = get-content $_
    $i = 0
    while ($i -le $fc.count)
    {
        if ($fc[$i] -match "^\s\s\s\d\s|^\s\s\d\d\s")
        {
            $a = $fc[$i] -split "\s\s\s*"
            $a[5] = $a[5] -replace "\sm",""
            while ($fc[$i+1] -match "^\s{11}\S")
            {
                $a[2] = $a[2] + " " + $fc[$i+1].trim()
                $i += 1
            }
            if (-not ($fc[$i+1] -match "^\s{11}\S"))
            {
                $da.add($a -join "`t") | out-null
            }
        }
        $i += 1
    }
}
$da | out-file $fileDest
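Author: newswan Time: 2021-6-23 15:10
Are the files .prt or .rpt?
Author: jave000 Time: 2021-6-23 15:15
Reply to #14 newswan
Pasting into Excel now splits the columns automatically. Thanks, I learned something.
Two lines were parsed incorrectly:
The first is a butterfly valve. "HAND LEVER, K2560C" ends up in the wrong place with an extra space in the middle, so after pasting into Excel the last two columns are shifted. My guess is that this happens because there are two spaces inside "BUTTERFLY VALVE"; our database administrator entered the description incorrectly. Can this occasional multiple-space bug be handled? The runs of spaces that really separate columns are at least five long, so could runs shorter than three be ignored?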
17 BUTTERFLY VALVE, HAND LEVER, K2560C, 100 IC02606 1
NEOTECHA TYPE -TRIM N07,
EN-GJS-400-18U-LT/PTFE, PN10, SCREW
IN HOLE, CONTACT FACE FORM B1 DIN EN
1092-1, INTEGR.GASKET, WN 8480, DN
100
17 BUTTERFLY NEOTECHA TYPE -TRIM N07, EN-GJS-400-18U-LT/PTFE, PN10, SCREW IN HOLE, CONTACT FACE FORM B1 DIN EN 1092-1, INTEGR.GASKET, WN 8480, DN 100 VALVE, HAND LEVER, K2560C, 100 IC02606 1
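The second is a pipe bend, with the same bug: two spaces cause it to be parsed incorrectly. As for the " M" in the quantity, I am not sure whether it can be removed safely, or whether that would interfere if the same characters happen to appear in a description. If it cannot be done, I will handle it by hand.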
3 PIPE BEND, RADIUS 2,5 X D, WN 9900-1, 80 I302777 0.4 M
H2/1.4404, SAWN END, - DN 80 - 88.9 X
2.3
3 PIPE BEND, H2/1.4404, SAWN END, - DN 80 - 88.9 X 2.3 RADIUS 2,5 X D, WN 9900-1, 80 I302777 0.4 M
Author: newswan Time: 2021-6-23 15:19
Last edited by newswan on 2021-6-23 15:25
Reply to #19 jave000
Change this line to split on four spaces:
$a = $fc[$i].trim() -split "\s{4,}*"
Author: newswan Time: 2021-6-23 15:27
Last edited by newswan on 2021-6-23 15:39
If you want to keep the M, delete this line:
$a[-1] = $a[-1] -replace "\sm",""
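Author: jave000 Time: 2021-6-23 15:48
Last edited by jave000 on 2021-6-24 00:27
Reply to #18 idwma
The CMD window flickers and 58 KB took a minute to process, so it is not very practical, but the result is very good.
I will study your code and hope to understand it soon. Thank you very much.
Author: jave000 Time: 2021-6-23 15:48
Reply to #17 newswan
With a number of *.prt files, the multi-file version produces an empty text document.
Author: newswan Time: 2021-6-23 15:53
Reply to #23 jave000
Change the path in $pathSour.
Author: jave000 Time: 2021-6-23 16:02
Reply to #20 newswan
I simply deleted the path line and then it ran successfully.
But as for setting the number of spaces: with both pieces of code, the earlier single-file one and the current multi-file one, the result is blank once I change the space-count line.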
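Author: newswan Time: 2021-6-23 16:27
Reply to #25 jave000
I found a line like this, separated by only two spaces:
12 BELLOWS SEAL VALVE, HANDWHEEL, CLIMBING, 200 I372364 1
For now, as a workaround, look for the exceptional cases first and fix the spaces by hand,
or export in csv format when exporting.
Find all the exceptions, and then we can decide based on what they look like.
Author: jave000 Time: 2021-6-23 16:40
Reply to #26 newswan
$a = $fc[$i].trim() -split "\s{4,}*"
Changing this to 2 or 3 does not work either; the result is always blank, and removing .trim() gives the same result.
Even if this is hard to solve, what I have now is already very useful. I have sent you a private message.
I just read the forum rules: getting a finished result directly is supposed to be a paid service.
I do not know how to thank you.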
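A likely reason the result is blank whatever number is used: in `\s{4,}*` the trailing `*` is applied to something that already has a quantifier, which the .NET regex engine rejects as a nested quantifier, so the `-split` line throws and `$a` is never filled. A minimal check, using a made-up sample line:

```powershell
# Five fields joined by five spaces; the description keeps its double space.
$line = @('17', 'BUTTERFLY  VALVE, HAND LEVER', '100', 'IC02606', '1') -join (' ' * 5)

# The pattern as posted: constructing it fails before any splitting happens.
$postedOk = $true
try { [void][regex]::new('\s{4,}*') } catch { $postedOk = $false }

# Without the stray "*": split on runs of four or more whitespace characters.
$fields = $line.Trim() -split '\s{4,}'
$postedOk
$fields
```

Note that after `Trim()` the item number sits at index 0, whereas the earlier `-split "\s\s\s*"` on an untrimmed line left an empty element at index 0. Any later `$a[2]` or `$a[5]` would need shifting by one to match.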
Author: newswan Time: 2021-6-23 16:44
$pathSour = "."
$fileDest = "11.txt"

[System.Collections.ArrayList] $da = @()

get-childitem -path $pathSour *.rpt | foreach-object {
    $fc = get-content $_
    $i = 0
    while ($i -le $fc.count)
    {
        if ($fc[$i] -match "^\s\s\s\d\s|^\s\s\d\d\s")
        {
            $a = @("") * 6
            $a[1] =$fc[$i].substring(0,4).trim()
            $a[2] =$fc[$i].substring(4,46).trim()
            $a[3] =$fc[$i].substring(50,13).trim()
            $a[4] =$fc[$i].substring(63,18).trim()
            $a[5] =$fc[$i].substring(81).trim()
            #$a = $fc[$i] -split "\s{2,}"
            #$a[-1] = $a[-1] -replace "\sm",""
            while ($fc[$i+1] -match "^\s{11}\S")
            {
                $a[2] = $a[2] + " " + $fc[$i+1].trim()
                $i += 1
            }
            if (-not ($fc[$i+1] -match "^\s{11}\S"))
            {
                $da.add($a -join "`t") | out-null
            }
        }
        $i += 1
    }
}
$da | out-file $fileDest
Slicing by fixed length also works.
Author: WHY Time: 2021-6-23 21:04
$reg = '(?-i)^\s+\d+\s+.*?(\S+)(?<!X)\s+(\d+(?:\.\d+)?)(?>\s+M)?$';
$arr = (gc *.txt) -match $reg -replace $reg, '$1,$2';
sc 1.csv $arr;
Author: jave000 Time: 2021-6-23 23:45
Last edited by jave000 on 2021-6-23 23:52
Reply to #29 WHY
Your code is so short... yet it really does produce two columns, impressive. Can it be changed to output five columns?
Author: newswan Time: 2021-6-23 23:52
Last edited by newswan on 2021-6-24 00:00
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$Workbook = $Excel.Workbooks.Add()
$Sheet = $Workbook.Worksheets.Item(1)
for ($i =1 ; $i -le $da.count ; $i++)
{
    $Sheet.cells($i,1).value = $da[$i-1]
}
$colA=$sheet.range("A1").EntireColumn
$colrange=$sheet.range("A1")
$colA.texttocolumns($colrange,1,1,$true,$true,$false,$false,$false)
$sheet.columns.autofit()
$Workbook.SaveAs(($PSScriptRoot + "\" + $fileDest))
$excel.Quit()
Append this at the end to generate an Excel file.
Author: newswan Time: 2021-6-23 23:54
Do you only need to keep columns 2, 4 and 5?
Author: jave000 Time: 2021-6-23 23:55
Last edited by jave000 on 2021-6-23 23:59
Reply to #32 newswan
I got a run of errors like the one below and it was also slow, but it did produce a result. The result file is very large, though: about 200 KB of PRT files produced a txt file of more than 4 MB.
Cannot index into a null array.
At line:11 char:13
+ if ($fc[$i] -match "^\s\s\s\d\s|^\s\s\d\d\s")
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [],RuntimeException
+ FullyQualifiedErrorId : NullArray
(The same error block repeats several more times.)
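Author: jave000 Time: 2021-6-23 23:57
Reply to #33 newswan
Boss, please get some rest; I do not deserve this much help. If the two-space case is hard to solve, I can fix those lines by hand.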
Author: newswan Time: 2021-6-23 23:59
Last edited by newswan on 2021-6-24 00:18
Reply to #34 jave000
It works correctly here; could you check on your side?
$pathSour = "."
$fileDest = "11"

[System.Collections.ArrayList] $da = @()

get-childitem -path $pathSour *.rpt | foreach-object {
    $fc = get-content $_
    for ($i = 0 ; $i -le $fc.count ; $i++)
    {
        if ($fc[$i] -match "^\s\s\s\d\s|^\s\s\d\d\s")
        {
            $a = @("") * 6
            $a[1] =$fc[$i].substring(0,4).trim()
            $a[2] =$fc[$i].substring(4,46).trim()
            $a[3] =$fc[$i].substring(50,13).trim()
            $a[4] =$fc[$i].substring(63,18).trim()
            $a[5] =$fc[$i].substring(81).trim()
            $a[5] = $a[5] -replace "\sm",""
            while ($fc[$i+1] -match "^\s{11}\S")
            {
                $a[2] = $a[2] + " " + $fc[$i+1].trim()
                $i += 1
            }
            if (-not ($fc[$i+1] -match "^\s{11}\S"))
            {
                $da.add($a -join "`t") | out-null
            }
        }
    }
}
$da | out-file ($fileDest + ".txt")
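Author: newswan Time: 2021-6-24 00:06
Reply to #35 jave000
I am still downloading the new Office 2021.
Author: jave000 Time: 2021-6-24 08:47
Reply to #32 newswan
Thank you very much. I think I will need to study for a long time before I can understand what you wrote.
May I ask why $pathSour = "." has to be set?
And why did the earlier version fail when it was set to two spaces?
The current code is fairly complex and does run correctly; it is just odd that the generated file is so much larger.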
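Author: newswan Time: 2021-6-24 10:20
Reply to #33 jave000
There is not much left to change.
$pathSour is the folder that holds the rpt files.
The field widths in the data files are fixed, so slicing by length is simple.
$a[0] is unused and can hold the file name, a date, or anything else.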
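On why `$pathSour = "."` mattered: a plausible explanation (an assumption, not confirmed anywhere in this thread) is that `get-content $_` receives only the bare file name in Windows PowerShell, so the file is looked up in the current directory. When that lookup fails, `$fc` is `$null`, and indexing it raises the "Cannot index into a null array" error quoted earlier. Passing `$_.FullName` removes the dependence on the current directory. A small sketch with a temporary folder (all names are made up):

```powershell
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'fullname-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir 'sample.prt') -Value 'line one', 'line two'

# The current directory is deliberately not $dir here.
$counts = Get-ChildItem -Path $dir -Filter '*.prt' | ForEach-Object {
    # @() keeps the result an array even for empty or one-line files.
    $fc = @(Get-Content -Path $_.FullName)
    $fc.Count
}
$counts

Remove-Item $dir -Recurse
```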
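Author: newswan Time: 2021-6-24 10:21
Reply to #33 jave000
The generated file is much larger? What do you mean?
Author: jave000 Time: 2021-6-24 10:32
Reply to #35 newswan
When the total number of files is large, the cmd window shows the error messages I posted earlier, but the generated result is mostly fine.
I found one error: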
1 PIPE, SMLS, HG/T 20553, ASTM A312 80 I512908 13.5 M
TP316L, SAWN END, CS 1000-27, DN80 -
88.9 X 3.2
1 PIPE, SMLS, HG/T 20553, ASTM A312 TP316L, SAWN END, CS 1000-27, DN80 - 88.9 X 3.2 80 I512908 1 3.5
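Here 13.5 was split apart: when I paste the text into Excel, the 1 goes into the cell with I512908.
I feel the current code is too complex and may cause more problems. Maybe I should just go back to the first version with the double-space bug.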
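The split of "13.5" into "1" and "3.5" looks like a column boundary that sits one character too far to the right. A sketch on a synthetic line; the widths 4/46/13/15, the padding, and the helper name `Split-Fixed` are assumptions for the demo, not measured from a real .prt file:

```powershell
# Build a fixed-width line: 4 + 46 + 13 + 15 characters, then the quantity.
$line = '1'.PadLeft(4) +
        'PIPE, SMLS, HG/T 20553, ASTM A312'.PadRight(46) +
        '80'.PadRight(13) +
        'I512908'.PadRight(15) +
        '13.5 M'

function Split-Fixed([string]$text, [string]$pattern) {
    if ($text -match $pattern) {
        # $matches[1..5] hold the five captured columns.
        return 1..5 | ForEach-Object { $matches[$_].Trim() }
    }
}

$good = Split-Fixed $line '^(.{4})(.{46})(.{13})(.{15})(.+)'
$bad  = Split-Fixed $line '^(.{4})(.{46})(.{13})(.{16})(.+)'
$good -join '|'
$bad  -join '|'
```

With the fourth width one too large, the leading "1" of the quantity lands at the end of the article column and the last column starts at "3.5", which is the same shape as the error reported above.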
Author: newswan Time: 2021-6-24 10:39
Reply to #36 jave000
- 1 PIPE, SMLS, HG/T 20553, ASTM A312 200 I512912 2.1 M
- 2 PIPE, SMLS, HG/T 20553, ASTM A312 150 I512911 1.6 M
- 3 PIPE, SMLS, HG/T 20553, ASTM A312 50 I512906 0.3 M
- 4 PIPE, SMLS, HG/T 20553, ASTM A312 25 I512903 0.3 M
- 5 PIPE, SMLS, HG/T 20553, ASTM A312 20 I512902 0.2 M
- 6 CONCENTRIC REDUCER, SMLS, GB/T 12459, 200X150 I512654 1
- 7 ELBOW 90 C, SMLS, GB/T 12459, 90E(L), 200 I512504 1
- 8 ELBOW 90 C, SMLS, GB/T 12459, 90E(L), 50 I512498 1
- 9 WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 200 I512820 1
- 10 WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 150 I512819 1
- 11 WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 50 I512814 1
- 12 WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 25 I512799 1
- 13 BLIND FLANGE, EN 1092-1, TYPE 05, SA 182 25 I512946 1
- 14 WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 20 I512798 1
- 15 GASKET, DIN EN 1514-1-IBC, NQ, 200 I224625 1
- 16 GASKET, DIN EN 1514-1-IBC, NQ, 150 I224624 2
- 17 GASKET, DIN EN 1514-1-IBC, NQ, 50 I224645 1
- 18 GASKET, DIN EN 1514-1-IBC, NQ, 25 I224638 2
- 19 GASKET, DIN EN 1514-1-IBC, NQ, 20 I224636 1
- 20 SCREWED CONNECTION, DIN EN ISO 20 I91453 16
- 21 SCREWED CONNECTION, DIN EN ISO 20 I91454 8
- 22 SCREWED CONNECTION, DIN EN ISO 16 I91413 4
- 23 SCREWED CONNECTION, DIN EN ISO 12 I91371 4
- 24 SCREWED CONNECTION, DIN EN ISO 12 I91372 8
- 25 ERROR READING SPECIALTY MATERIAL 150 V710RE11F3121 1
- 26 SLIDE SHOE SIDE MOVABLE COMPANY STD 200 S0W-200 1
- 1 PIPE, SMLS, HG/T 20553, ASTM A312 250 I512913 0.5 M
- 2 CONCENTRIC REDUCER, SMLS, GB/T 12459, 250X200 I512658 1
- 3 ELBOW 90 C, SMLS, GB/T 12459, 90E(L), 250 I512505 1
- 4 BLIND DISC, CS 473 PN 10 SERIES 2, SA 200 I512980 1
- 5 WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 250 I512821 1
- 6 WELDNECK FLANGE, EN 1092-1, TYPE 11, SA 200 I512820 2
- 7 GASKET, DIN EN 1514-1-IBC, NQ, 250 I224614 1
- 8 GASKET, DIN EN 1514-1-IBC, NQ, 200 I224625 3
- 9 SCREWED CONNECTION, DIN EN ISO 20 I91454 8
- 10 SCREWED CONNECTION, DIN EN ISO 20 I91456 8
- 11 SCREWED CONNECTION, DIN EN ISO 20 I91455 12
- 12 BELLOWS SEAL VALVE, HANDWHEEL, CLIMBING, 200 I372364 1
Count the width of each column. The sample in post #4 is very neatly aligned; if yours is not, find out why.
Author: newswan Time: 2021-6-24 10:54
Last edited by newswan on 2021-6-24 11:03
$pathSour = "."
$fileDest = "11"

[System.Collections.ArrayList] $al = @()

get-childitem -path $pathSour *.rpt | foreach-object {
    $fc = get-content $_
    for ($i = 0 ; $i -le $fc.count ; $i++)
    {
        if ($fc[$i] -match "^\s{2,3}(\d{1,2})(.{46})(.{13})(.{16})(.+)")
        {
            $a = @("") * 6
            $a[0] = $_.basename
            for ($j = 1 ; $j -le 5 ; $j++)
            {
                $a[$j] = $matches[$j].trim()
            }
            $a[5] = $a[5] -replace "\sm",""
            while ($fc[$i+1] -match "^\s{11}\S")
            {
                $a[2] = $a[2] + " " + $fc[$i+1].trim()
                $i += 1
            }
            if (-not ($fc[$i+1] -match "^\s{11}\S"))
            {
                $al.add($a -join "`t") | out-null
            }
        }
    }
}
$al | out-file ($fileDest + ".txt")
Count the widths yourself; in order they are 4, 46, 13, 16, x.
Author: idwma Time: 2021-6-24 14:13
Here is another quick and crude one:
@echo off
findstr /b /r /c:"^ *[0-9]........................................" a.txt > b.txt
pause
Author: jave000 Time: 2021-6-24 15:04
Last edited by jave000 on 2021-6-24 15:43
Reply to #39 idwma
@echo off
findstr /b /r /c:"^ *[0-9]........................................" *.prt > b.txt
exit
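Magic code. The descriptions are not fully assembled, though (only the first line is there), and the columns are not split automatically; after splitting by hand some of them are scrambled. Still, it is amazing. Does the number of periods mean anything?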
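On the periods: in a findstr regular expression each `.` stands for exactly one character of any kind, so a run of N periods only requires that at least N more characters follow the first digit. It filters out short lines and does no column splitting. A rough PowerShell equivalent, assuming a run of 40 periods (the count here is illustrative) and made-up sample lines:

```powershell
$lines = @(
    ('1'.PadLeft(4) + '    PIPE, SMLS, HG/T 20553, ASTM A312'.PadRight(46) + '80'.PadRight(13) + 'I512908')
    ('2'.PadLeft(4) + '    short')
    ((' ' * 11) + 'TP316L, SAWN END, CS 1000-27, DN80 -')
)

# Leading spaces, one digit, then at least 40 further characters.
$pattern = '^ *[0-9].{40}'
$kept = @($lines | Where-Object { $_ -match $pattern })
$kept.Count
```

Only the first line survives: the second is too short after its digit, and the third (a continuation line) has no digit right after its leading spaces.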
Author: newswan Time: 2021-6-24 15:07
Last edited by newswan on 2021-6-24 16:10
$pathSour = ".\prt"
$fileDest = "11"

[System.Collections.ArrayList] $al1 = @()
[System.Collections.ArrayList] $al2 = @()

get-childitem -path $pathSour "*.prt" | foreach-object {
    $fc = get-content $_.fullname
    for ($i = 0 ; $i -lt $fc.count ; $i++)
    {
        if ($fc[$i] -match "^\s{2}[\s\d]\d\s{4}")
        {
            $al1.add($fc[$i] + "`t" + $_.basename) | out-null
            #$re = '^\s{2,3}(\d{1,2})\s{4}(.+)\s+([^\s]+)\s+([^\s]+)\s+([\d]+[.]?[\d]*\s?M?)$'
            $re = '^\s{2,3}(\d{1,2})(.{46})(.{13})(.{16})(.+)'
            $fc[$i] -match "$re" | out-null
            $a = @("") * 6
            $a[0] = $_.basename
            for ($j = 1 ; $j -le 5 ; $j++)
            {
                $a[$j] = $matches[$j].trim()
            }
            $a[5] = $a[5] -replace "\sm",""
            while ($fc[$i+1] -match "^\s{11}\S")
            {
                $a[2] = $a[2] + " " + $fc[$i+1].trim()
                $i += 1
            }
            if (-not ($fc[$i+1] -match "^\s{11}\S"))
            {
                $al2.add($a -join "`t") | out-null
            }
        }
    }
}
$al1 | out-file ($fileDest + "-1.txt")
$al2 | out-file ($fileDest + "-a.txt")

$v = [string[,]]::new($al2.count,1)
for ($i =0 ; $i -lt $al2.count ; $i++)
{
    $v[$i,0] = $al2[$i]
}

$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$Workbook = $Excel.Workbooks.Add()
$Sheet = $Workbook.Worksheets.Item(1)

$rng = "A1:A" + $al2.count
$Sheet.range($rng).value2 = $v

$colA = $sheet.range("A1").EntireColumn
$colrange = $sheet.range("A1")
$colA.texttocolumns($colrange,1,1,$false,$true,$false,$false,$false) | out-null
$sheet.columns.autofit() | out-null
$Workbook.SaveAs(($PSScriptRoot + "\" + $fileDest))
$excel.Quit()
There are two $re variants; splitting by column position still works better.
Author: newswan Time: 2021-6-24 18:35
$pathSour = ".\prt"
$fileDest = "11"

[System.Collections.ArrayList] $al1 = @()
[System.Collections.ArrayList] $al2 = @()

get-childitem -path $pathSour "*.prt" | foreach-object {
    $fc = get-content $_.fullname
    $nf = $_.basename.ToUpper()
    $nf = $nf -replace "(\w{4})(\w{1,4})?(\w{1,4})?(\w{1,5})?(.*)","`$1-`$2-`$3-`$4-`$5" -replace "-+$",""
    for ($i = 0 ; $i -lt $fc.count ; $i++)
    {
        if ($fc[$i] -match "^\s{2}[\s\d]\d\s{4}")
        {
            $al1.add($fc[$i] + "`t" + $_.basename) | out-null
            #$re = '^\s{2,3}(\d{1,2})\s{4}(.+)\s+([^\s]+)\s+([^\s]+)\s+([\d]+[.]?[\d]*\s?M?)$'
            $re = '^\s{2,3}(\d{1,2})(.{46})(.{13})(.{16})(.+)'
            $fc[$i] -match "$re" | out-null
            $a = @("") * 6
            $a[0] = $nf
            for ($j = 1 ; $j -le 5 ; $j++)
            {
                $a[$j] = $matches[$j].trim()
            }
            $a[5] = $a[5] -replace "\sm",""
            while ($fc[$i+1] -match "^\s{11}\S")
            {
                $a[2] = $a[2] + " " + $fc[$i+1].trim()
                $i += 1
            }
            if (-not ($fc[$i+1] -match "^\s{11}\S"))
            {
                $al2.add($a -join "`t") | out-null
            }
        }
    }
}
$al1 | out-file ($fileDest + "-1.txt")
$al2 | out-file ($fileDest + "-a.txt")

$v = [string[,]]::new($al2.count,1)
for ($i =0 ; $i -lt $al2.count ; $i++)
{
    $v[$i,0] = $al2[$i]
}

$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$Workbook = $Excel.Workbooks.Add()
$Sheet = $Workbook.Worksheets.Item(1)

$rng = "A1:A" + $al2.count
$Sheet.range($rng).value2 = $v

$colA = $sheet.range("A1").EntireColumn
$colrange = $sheet.range("A1")
$colA.texttocolumns($colrange,1,1,$false,$true,$false,$false,$false) | out-null
$sheet.columns.autofit() | out-null
$Workbook.SaveAs(($PSScriptRoot + "\" + $fileDest))
$excel.Quit()
Author: jave000 Time: 2021-6-24 23:14
Reply to #42 idwma
Thank you very much; I am still teaching myself.
Author: WHY Time: 2021-6-25 19:13
Last edited by WHY on 2021-6-25 21:54
$dir = 'E:\Test\PRT'; # directory that holds the .prt files
$reg = '(?-i)^\s+(\d+)\s+(.+)(?<!\s)\s+(\d+(?:X\d+)?)\s+(.+)(?<![\sX])\s+(\d+(?:\.\d+)?(?>\s+M)?)$';
$arr = (gc ($dir + '\*.prt')) -match $reg -replace $reg, '"$1","$2","$3","$4","$5"';
sc 1.csv $arr;
Tested with the attachment from the first post; result: 5 columns, 4600 rows.
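Author: jave000 Time: 2021-6-25 22:08
Reply to #45 WHY
Thanks, expert. I am reading up on regex symbols and hope to understand how your code works soon.
There are three issues here: first, the description column shows only the first line; second, the pipe-length unit " M" is not removed; third, the pipe number of the source file is not shown.
I will study hard so that, even if I cannot write such code, I can at least modify it.
Thank you very much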
Author: newswan Time: 2021-6-25 23:17
Last edited by newswan on 2021-6-25 23:46
$pathSour = ".\prt"
$fileDest = "11"

[System.Collections.ArrayList] $al2 = @()

get-childitem -path $pathSour "*.prt" | foreach-object {
    $nf = $_.basename.ToUpper()
    $nf = $nf -replace "(\w{4})(\w{1,4})?(\w{1,4})?(\w{1,5})?(.*)","`$1-`$2-`$3-`$4-`$5" -replace "-+$",""
    $fc = (get-content $_.fullname) -match '^\s{2,3}\d{1,2}\s{4}.+|^\s{11}\S.+'
    for ($i = 0 ; $i -lt $fc.count ; $i++)
    {
        if ($fc[$i] -match "^\s{2}[\s\d]\d\s{4}")
        {
            #$re = '^\s{2,3}(\d{1,2})\s{4}(.+)\s+([^\s]+)\s+([^\s]+)\s+([\d]+[.]?[\d]*\s?M?)$'
            $re = '^(.{4})(.{46})(.{13})(.{16})(.+)'
            $fc[$i] -match "$re" | out-null
            $a = @("") * 6
            $a[0] = $nf
            for ($j = 1 ; $j -le 5 ; $j++)
            {
                $a[$j] = $matches[$j].trim()
            }
            $a[5] = $a[5] -replace "\sm",""
            while ($fc[$i+1] -match "^\s{11}\S")
            {
                $a[2] = $a[2] + " " + $fc[$i+1].trim()
                $i += 1
            }
            $al2.add($a -join "`t") | out-null
        }
    }
}
$al2 | out-file ($fileDest + ".txt")
Author: WHY Time: 2021-6-25 23:41
Reply to #46 jave000
What does "the description column shows only the first line" mean?
The rest is easy to change:
$fd = 'E:\Test\PRT'; # directory that holds the .prt files
$arr = @();
$reg = '(?-i)^\s+(\d+)\s+(.+)(?<!\s)\s+(\d+(?:X\d+)?)\s+(.+)(?<![\sX])\s+(\d+(?:\.\d+)?)(?>\s+M)?$';
forEach( $f In (dir $fd -Filter *.prt) ){
    $s = $f.BaseName;
    $s = $s -replace '^(....)(....)(....)(.....)(.+)$', '$1-$2-$3-$4-$5';
    $arr += @(gc $f.FullName) -match $reg -replace $reg, ('"$1","$2","$3","$4","$5","' + $s + '"');
}
sc 1.csv $arr;
Author: idwma Time: 2021-6-26 09:06
Last edited by idwma on 2021-6-26 10:41
A new idea
Slightly modified
@echo off&setlocal EnableDelayedExpansion
set out=ffff.txt
del !out!
set d=3
for /f "delims=" %%k in ('dir/b *.prt') do (
set na=%%~nk
set na=!na:~0,4!-!na:~5,4!-!na:~9,4!-!na:~12,5!-!na:~16!
for /f "delims=" %%i in (%%k) do (
set b=%%i
for /l %%j in (1,1,30) do (
if "!b:~0,3!" equ " %%j" set e=!b:~0,85! & set d=0
if "!b:~0,4!" equ " %%j" set e=!b:~0,85! & set d=0
)
if "!b:~0,11!" equ " " (if !b:~-3! neq NPD set c=!c!!b:~11! & set d=1)
if "!d!" equ "1" set f=!na! !e:~0,49!!c! !e:~-38!
if "!d!" equ "0" echo;!f! >> !out! && set "c=" & set "f=" & set d=1
)
set d=3
)
echo;!f! >> !out!
pause
Author: newswan Time: 2021-6-26 11:17
The most reliable way is probably this: export in csv format.
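One way to read "export in csv format" is to build objects and let PowerShell do the quoting, so that commas and double quotes inside a description cannot shift columns. A minimal sketch; the property names and the sample row are illustrative:

```powershell
$rows = @(
    [pscustomobject]@{
        No          = '1'
        Description = 'INSERTED NOZZLE, CS 9901, 2" END'
        Size        = '200X50'
        ArticleNo   = 'I143276'
        Qty         = '1'
    }
)

# ConvertTo-Csv shows the text that Export-Csv would write to a file.
$csv = @($rows | ConvertTo-Csv -NoTypeInformation)
$csv

# To write a file instead (the path is only an example):
# $rows | Export-Csv -Path '.\z-mto.csv' -NoTypeInformation -Encoding UTF8
```

The embedded double quote comes out doubled inside a quoted field, which is the form Excel expects when it opens a csv file.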
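Author: jave000 Time: 2021-6-26 22:13
Last edited by jave000 on 2021-6-26 22:36
Reply to #48 WHY
For some reason I cannot upload attachments or images to this forum.
My second column is the component description. It spans the second, third, fourth and possibly more lines, and all of them need to be merged into the description on the first line.
Sample (bisca321ag12lr009m1.prt):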
1 INSERTED NOZZLE, CS 9901, PP/UP-GF PIPE 200X50 I143276 1
CLASS B, LAMINATED END, LAMINATED
END, , DN 200 X 50 - 208 X 13 MATING
DIMENSION 208 X 2.7
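Finished result (columns split automatically; it does not matter where the pipe-number column goes, but it must be upper case):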
1 INSERTED NOZZLE, CS 9901, PP/UP-GF PIPE CLASS B, LAMINATED END, LAMINATED END, , DN 200 X 50 - 208 X 13 MATING DIMENSION 208 X 2.7 200X50 I143276 1 BISC-A321-AG12-LR009-M1
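The merge described in this post can be sketched as follows: a line that starts with an item number opens a record, and every following line indented by 11 spaces is appended to that record's description. The sample lines are rebuilt by hand, so their exact spacing is an assumption, and the size, article and quantity columns are left out to keep the sketch short:

```powershell
$indent = ' ' * 11
$lines = @(
    ('1'.PadLeft(4) + (' ' * 4) + 'INSERTED NOZZLE, CS 9901, PP/UP-GF PIPE')
    ($indent + 'CLASS B, LAMINATED END, LAMINATED')
    ($indent + 'END, , DN 200 X 50 - 208 X 13 MATING')
    ($indent + 'DIMENSION 208 X 2.7')
)

$records = [System.Collections.ArrayList]@()
$current = $null
foreach ($line in $lines) {
    if ($line -match '^\s{2,3}\d{1,2}\s+(.+)$') {
        # New item: store the previous one, then start collecting again.
        if ($null -ne $current) { [void]$records.Add($current) }
        $current = $matches[1].Trim()
    }
    elseif ($line -match '^\s{11}\S' -and $null -ne $current) {
        # Continuation line: append to the description being collected.
        $current += ' ' + $line.Trim()
    }
}
if ($null -ne $current) { [void]$records.Add($current) }
$records
```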
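Author: jave000 Time: 2021-6-26 22:23
Last edited by jave000 on 2021-6-26 23:15
Reply to #49 idwma
Hello, and thanks. It runs with errors, but I did once get a result by chance and it is very close to what I need; only the 4, 4, 4, 5, 1+ grouping is wrong. You can download the attachment from the link in my first post.
I do not see a path anywhere in your code, so I do not know how the path is determined.
"Cannot find E:\Downloads\PRT\ffff.txt"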
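For the 4-4-4-5-rest grouping of the pipe number, the regex used in the PowerShell versions in this thread can be tried on its own. A small sketch; the wrapper name `Format-PipeName` is made up, and the two file names are taken from this thread:

```powershell
function Format-PipeName([string]$baseName) {
    $pattern = '(\w{4})(\w{1,4})?(\w{1,4})?(\w{1,5})?(.*)'
    # Upper-case, insert the dashes, then drop dashes left over from empty groups.
    $baseName.ToUpper() -replace $pattern, '$1-$2-$3-$4-$5' -replace '-+$', ''
}

Format-PipeName 'bisca321ag12lr009m1'
Format-PipeName 'mdiv610ab21lr0035'
```

The first call gives BISC-A321-AG12-LR009-M1, the form asked for earlier in the thread. The second name has nothing left for the fifth group, so the trailing dash is stripped.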
Author: newswan Time: 2021-6-27 00:03
A comparison of three ways of splitting the columns; slicing by width is still the best:
$pathSour = ".\prt"
$fileDest = "11"

[System.Collections.ArrayList] $alc = @()

$res1 = "^\s{2,3}\d{1,2}\s{4}.+"
$res2 = "^\s{11}\S.+"
$rea = "$res1" + "|" + "$res2"
$renfs = "(\w{4})(\w{1,4})?(\w{1,4})?(\w{1,5})?(.*)"
$renfd = "`$1-`$2-`$3-`$4-`$5"
$rew = "^(.{4})(.{46})(.{13})(.{15})(.+)"
$re2s = '^\s{2,3}(\d{1,2})\s{4,}(.+?)(?<=\S)\s{2,}(\d+(?:X\d+)?)\s{2,}(\S+(?:\s\S+)*)\s{2,}([\d]+(?:[.]\d+)?(?:\sM)?)$'
$re2d = "`t`$1`t`$2`t`$3`t`$4`t`$5"

get-childitem -path $pathSour "*.prt" | foreach-object {
    write-host $_.basename
    $fc = (get-content -Encoding utf8 $_.fullname) -match $rea
    for ($i = 0 ; $i -lt $fc.count ; $i++)
    {
        if ($fc[$i] -match "^\s{2}[\s\d]\d\s{4}")
        {
            $s1 = $fc[$i]
            # method 1: split on 3+ spaces
            $a1 = (" " + $s1) -replace "(?<=\S) (?=\d)"," " -split " +"
            $a1line1 = $a1 -join "`t"
            # method 2: the $re2s regex
            $a2line1 = $s1 -replace $re2s,$re2d
            # method w: fixed widths
            $s1 -match $rew | out-null
            $aw = @("") * 6
            for ($j = 1 ; $j -le 5 ; $j++)
            {
                $aw[$j] = $matches[$j].trim()
            }
            $awline1 = $aw -join "`t"

            if ( ($a1line1 -ne $awline1 ) -or ($a2line1 -ne $awline1 ))
            {
                $alc.add( $_.basename + "`n" + $s1) | out-null
                $alc.add( "1" + $a1line1 + "`n" + "2" + $a2line1 + "`n" + "w" + $awline1 + "`n" ) | out-null
            }
        }
    }
}

$alc | out-file -encoding utf8 ($fileDest + "-c.txt")
The lines below that start with 1, 2 and w correspond to the three methods:
mdiv610ab21lr0035
21 SEE PIPING SPECIAL SUPPORT DRAWING SS1139 1
1 21 SEE PIPING SPECIAL SUPPORT DRAWING SS1139 1
2 21 SEE PIPING SPECIAL SUPPORT DRAWING SS1139 1
w 21 SEE PIPING SPECIAL SUPPORT DRAWING SS1139 1
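Author: newswan Time: 2021-6-27 11:07
With this regex the result is the same as splitting by width:
$re2s = '^\s{2,3}(\d{1,2})\s{4,}(.+?)(?<=\S)\s{2,}(\d*(?:\s?X\d+)?)\s{2,}(\S+(?:\s\S+)*)\s{2,}([\d]+(?:[.]\d+)?(?:\sM)?)$'
Author: idwma Time: 2021-6-27 15:01
Last edited by idwma on 2021-6-27 19:29
Reply to #52 jave000
A perfect match
With a small change that adds a few semicolons, the columns can be split automatically
Changed again: semicolons were still unreliable, so it now uses tabs as separators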
@echo off&setlocal EnableDelayedExpansion
set out=aa.txt
set d=3
for /f "delims=" %%a in ('mshta "about:<script>new ActiveXObject('Scripting.FileSystemObject').GetStandardStream(1).Write("\t\t");close();</script>"') do set tab=%%a
for /f "delims=" %%k in ('dir/b *.prt') do (
set na=%%~nk
set na=!na:~0,4!-!na:~4,4!-!na:~8,4!-!na:~12,5!-!na:~17!
call :con
for /f "delims=" %%i in (%%k) do (
set b=%%i
for /l %%j in (1,1,30) do (
if "!b:~0,3!" equ " %%j" set e=!b:~0,85! & set d=0
if "!b:~0,4!" equ " %%j" set e=!b:~0,85! & set d=0
)
if "!d!" equ "0" call :ff
if "!b:~0,11!" equ " " (if !b:~-3! neq NPD set c=!c!!b:~11! & set d=1)
if "!d!" equ "1" set f=!na! !tab! !eee!!c! !tab! !e:~50,13! !tab! !e:~63,15! !tab! !e:~-5!
if "!d!" equ "0" echo;!f! >> !out! && set "c=" & set "f=" & set d=1
echo;!f!
)
set d=3
)
echo;!f! >> !out!
pause
exit
:ff
for /l %%m in (49,-1,1) do (
set ee=!e:~0,49!
set eee=!ee:~1,%%m!
if not "!b:~%%m,1!" == " " goto :EOF
)
:con
for %%m in (A B C D E F G H I J K L M N O P Q R S T U V W X Y Z) do call set na=%%na:%%m=%%m%%
Author: WHY Time: 2021-6-27 21:44
Last edited by WHY on 2021-6-29 00:00
Reply to #51 jave000
$fd = 'E:\Test\PRT'; # directory that holds the .prt files
$reg = '(?-i)^\s+(\d+)\s+(.+)(?<!\s)\s+(\d+(?:X\d+)?)\s+(.+)(?<![\sX])\s+(\d+(?:\.\d+)?)(?>\s+M)?$';
[Collections.ArrayList]$out = @();

forEach ( $f In (dir $fd -Filter *.prt) ){
    $flag = $false;
    $name = $f.BaseName.ToUpper();
    $name = $name -replace '^(....)(....)(....)(.....)(.+)$', '$1-$2-$3-$4-$5';
    forEach ( $s In @(gc $f.FullName) ){
        if ( $s -match $reg ){
            if ( $flag ){
                [void]$out.Add( '"' + ($arr -join '","') + '"' );
            }
            $arr = @( '', '', '', '', '', $name );
            for ( $i=1; $i -le 5; $i++ ){ $arr[$i-1] = $matches[$i].Replace('"', '""'); }
            $flag = $true;
        } elseif ( $s -match '^ {11}(\S.*)'){
            $arr[1] += ' ' + $matches[1].Replace('"', '""');
        }
    }
    if ( $flag ){
        [void]$out.Add( '"' + ($arr -join '","') + '"' );
    }
}

sc 1.csv $out;
In a few files the description column itself contains double quotes (for example mmdiv710ds22lr0121.prt), which broke the format.
Fixed.
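Author: jave000 Time: 2021-6-27 23:44
Reply to #55 idwma
Perhaps because it runs in a different mode, this one is very slow.
I tried your initial code: after pasting the generated txt into Excel there are blank columns in between, and the number and description columns are not separated. But it is more complete now.
I changed txt to csv in the code; the csv is the same size as the txt, but the columns come out badly scrambled. I did not expect the two to differ.
I did not understand the main body at all.
It is the first time I have seen code placed after the final exit, and I do not know what that means.
I learned something, thanks
Author: jave000 Time: 2021-6-27 23:47
Reply to #56 WHY
I tried running it and found no problems.
I will try some other prt files at the office tomorrow.
Thanks, expert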
Author: newswan Time: 2021-6-28 13:57
$pathSour = ".\prt"
$fileDest = "22"

if (-not ([string]::IsNullOrEmpty($args[0])))
{
    $pathSour = $args[0]
}

remove-item $fileDest*.*

[System.Collections.ArrayList] $al = @()

$rem = "^\s{3}\d{1}|^\s{2}\d{2}|^\s{11,12}\S.+"
$renfs = "(\w{4})(\w{1,4})?(\w{1,4})?(\w{1,5})?(.*)"
$renfd = "`$1-`$2-`$3-`$4-`$5"
$res = "`n(.{4})(.{46})(.{13})(.{15})(.+)((?:`n\s{11,12}.+)*)"
#$res = "`n\s{2,3}(\d{1,2})\s{4,}(\S.+?\S)\s{2,}(\d+(?:X\d+)?)\s{2,}(\S+(?:\s\S+)*)\s{2,}(\d+(?:[.]\d+)?(?:\sM)?)((?:`n\s{11,12}.+)*)"
$red = "`t`$1`t`$2`$6`t`$3`t`$4`t`$5::"

get-childitem -path $pathSour "*.prt" | foreach-object {
    write-host " "$_.basename
    $nf = $_.basename.ToUpper() -replace $renfs,$renfd -replace "-+$","";
    $a = "`n" + (((get-content -Encoding utf8 -path $_.fullname ) -match $rem) -join "`n") ;
    if ( $a.length -gt 0 )
    {
        $a = $a -replace $res,$red -replace "[ `n]+"," " -replace " *`t *","`t"
        $a = $a -replace "::$","" -replace "::","`n" -replace "(?m)\sm$","" -replace "(?m)^(?=`t)",$nf
        $al.add($a) | out-null
    }
}
$al | out-file -encoding utf8 ($fileDest + "-a.txt")
Welcome to 批处理之家 (http://bathome.net./)