
A regex to match a tab that isn't surrounded by quotes

I have the following string:

ID Table 1 Table 2
1 "Column 1 Column 2 Column 3
1 2 3
4 5 6
7 8 9" "Column A Column B Column C
a b c
d e f
g h i"

The first row contains the column headers (ID, Table 1, Table 2). The second row contains the data.

The string is copied via the clipboard from this Excel sheet: http://i.stack.imgur.com/5lwaT.png

Columns are separated by \t and rows by \r\n. B2 and C2 are themselves tables: their columns and rows are separated by \t and \r\n, too. Each table is surrounded by quotes.

Now I split the rows:

' The regex engine (not VB) interprets \r\n as CR+LF.
Dim rowSplitter As New Regex("\r\n")
Dim rows() As String = rowSplitter.Split(MyString)

That returns:

ID Table 1 Table 2

and

1 "Column 1 Column 2 Column 3
1 2 3
4 5 6
7 8 9" "Column A Column B Column C
a b c
d e f
g h i"

Now I need to split the lines, but I need a pattern that matches every tab that isn't surrounded by quotes.

Can anybody help me with the regex?

Thanks :)

I use this for my CSV files, and with some minor tweaking it should work with tab-delimited data as well:

Regex rExp = new Regex(@"(?:^|\x09)(\""(?:[^\""]+|\""\"")*\""|[^\x09]*)");

And for reference, the CSV regex:

Regex rExp = new Regex(@"(?:^|,)(\""(?:[^\""]+|\""\"")*\""|[^,]*)");

Please note that this will capture the surrounding quotes as well.
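To show how a pattern like the tab one might be applied to a single row, here is a minimal JavaScript sketch. The function name splitTabRow is hypothetical, and stripping the surrounding quotes (and collapsing "" to ") is an assumption about what is wanted, not part of the original pattern.

```javascript
// Hypothetical helper: split one tab-delimited row into fields,
// keeping tabs that sit inside double-quoted fields.
function splitTabRow(row) {
  // JavaScript port of the tab pattern: each field is either a quoted
  // run (with "" as an escaped quote) or a run of non-tab characters.
  const fieldPattern = /(?:^|\t)("(?:[^"]+|"")*"|[^\t]*)/g;
  const fields = [];
  for (const match of row.matchAll(fieldPattern)) {
    let value = match[1];
    // Assumption: callers want the value without its surrounding quotes.
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1).replace(/""/g, '"');
    }
    fields.push(value);
  }
  return fields;
}
```

Applied to the second row of the question, this should yield three fields: the ID and the two tables with their inner tabs intact. It assumes the quotes are paired, and a row whose first field is empty is an edge case this sketch does not try to handle.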

EDIT

Maybe I'm presuming too much, but it seems like you're trying to get the values and are getting caught up on the delimiter. This will capture the values between the delimiters.

EDITv2

Switched to verbatim strings.

Because I'm too tired to think of a good answer, here's a hack one instead. If you can be sure that the quotes are paired, you could hack this easily in three steps:

  1. Find the tabs that ARE in the quotes and swap them out.
  2. Split on tabs.
  3. Put the real tabs back in.

Like so:

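// JS pseudo-code
var MARK = 'ëïÒ'; // placeholder assumed never to occur in the data
// 1. Swap out every tab inside a quoted span
str = str.replace( /"[^"]*"/g, function(quoted){
  return quoted.split( "\t" ).join( MARK );
});
// 2. Split on the remaining tabs
var pieces = str.split( /\t/ );
// 3. Put the real tabs back in
for (var i=0,len=pieces.length;i<len;++i){
  pieces[i] = pieces[i].split( MARK ).join( "\t" );
}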

The horrible hack part of this is using a replacement string that you can only hope will never occur naturally.
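If the placeholder is a worry, one alternative (a different technique from the swap-and-restore steps, sketched here under the same assumption that quotes are paired) is to split with a lookahead that only accepts a tab followed by an even number of quote characters. The function name splitOnUnquotedTabs is hypothetical.

```javascript
// Hypothetical helper: split a row on tabs that are outside double quotes.
function splitOnUnquotedTabs(row) {
  // A tab is outside quotes when an even number of quote characters
  // follows it up to the end of the string (assumes quotes are paired).
  const unquotedTab = /\t(?=(?:[^"]*"[^"]*")*[^"]*$)/;
  return row.split(unquotedTab);
}
```

The pieces keep their surrounding quotes. Because the lookahead rescans the rest of the string for every tab, this is probably best kept to short rows.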

What you are trying to do is write your own CSV parser (with tabs instead of commas in your case). There is a great article about why you should not do this: http://secretgeek.net/csv_trouble.asp. I once tried to write my own parser but stopped because it is really not that easy. Check this free one. It saved me a couple of hours.
