How to extract the content between "," and "(" (if present) in a CSV row in Python

Question

The content of the CSV is as follows:

"Washington-Arlington-Al, DC-VA-MD-WV  (MSAD)"  47894  1976
"Grand-Forks, ND-MN"                            24220  2006
"Abilene, TX"                                   10180  1977

The required output: read through the CSV, find the content between the double quotes in column 1, fetch only DC-VA-MD-WV, ND-MN and TX, and put this content in a new column (for normalization).

So far I have tried a lot of regex patterns in Python, but could not find the right one.

import csv

sample = '''"Washington-Arlington-Al, DC-VA-MD-WV  (MSAD)",47894,1976
"Grand-Forks, ND-MN",24220,2006
"Abilene, TX",10180,1977
'''
with open('sample.csv', 'w', newline='') as f:
    f.write(sample)

with open('sample.csv', newline='') as sample_file, open('output.csv', 'w', newline='') as output:
    reader = csv.reader(sample_file)
    writer = csv.writer(output)
    for row in reader:
        # Split the first column on commas and write each piece next to column 2
        for comsplit in row[0].split(','):
            writer.writerow([comsplit, row[1]])

with open('output.csv') as f:
    print(f.read())

The expected output is:

DC-VA-MD-WV
ND-MN
TX

in a new row.

Answer 1

I'd do it like this:

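import csv

with open('csv_file.csv', 'r', newline='') as f_in, open('output.csv', 'w', newline='') as f_out:
    csv_reader = csv.reader(f_in, quotechar='"', delimiter=',',
                            quoting=csv.QUOTE_ALL, skipinitialspace=True)
    csv_writer = csv.writer(f_out)
    new_csv_list = []
    for row in csv_reader:
        # csv.reader already removes the surrounding quotes; strip('"') is only a safeguard
        first_entry = row[0].strip('"')
        # Take the text after the comma, cut it at the double space before "(MSAD)",
        # and strip the leading space left over from ", "
        relevant_info = first_entry.split(',')[1].split('  ')[0].strip()
        row += [relevant_info]
        new_csv_list += [row]
    for row in new_csv_list:
        csv_writer.writerow(row)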

Let me know if you have any questions.

Answer 2

There is no need to use regex here, provided a couple of things hold:

The city (?) is always followed by a comma and then exactly one space (though I could modify the code to accept more than one whitespace character if needed).
There is a space after your letter sequence before something like (MSAD).

This code gives your expected output for the sample input:

import csv

with open('sample.csv', 'r', newline='') as infile, open('expected_output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    expected_output = []
    for row in reader:
        # Text after the first comma, e.g. " DC-VA-MD-WV  (MSAD)"
        split_by_comma = row[0].split(',')[1]
        # Index 0 is the empty string before the leading space; index 1 is the code
        split_by_space = split_by_comma.split(' ')[1]
        print(split_by_space)
        expected_output.append([split_by_space])

    writer = csv.writer(outfile)
    writer.writerows(expected_output)
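A minimal sketch of the whitespace-tolerant modification mentioned above, assuming the codes themselves never contain spaces. Calling `str.split()` with no argument splits on any run of whitespace and drops empty strings, so the number of spaces after the comma no longer matters. The helper name `state_codes` is hypothetical, and the sample rows below deliberately vary the spacing after the comma.

```python
import csv
import io


def state_codes(csv_text):
    """Return the code that follows the comma in column 1 of every row (hypothetical helper)."""
    codes = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Everything after the first comma, e.g. "   DC-VA-MD-WV  (MSAD)"
        after_comma = row[0].split(',', 1)[1]
        # split() with no argument collapses any amount of whitespace,
        # so the first token is the code regardless of spacing
        codes.append(after_comma.split()[0])
    return codes


sample = '''"Washington-Arlington-Al,   DC-VA-MD-WV  (MSAD)",47894,1976
"Grand-Forks,ND-MN",24220,2006
"Abilene, TX",10180,1977
'''
print(state_codes(sample))  # ['DC-VA-MD-WV', 'ND-MN', 'TX']
```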

Answer 3

I believe you could use this regex pattern, which extracts any alphanumeric expression (with or without hyphens) between a comma and a parenthesis:

import re

# Comma, whitespace, the captured code (word characters and hyphens), whitespace, "("
BETWEEN_COMMA_PAR = re.compile(r',\s+([\w-]+)\s+\(')
test_str = 'Washington-Arlington-Al, DC-VA-MD-WV  (MSAD)'
result = BETWEEN_COMMA_PAR.search(test_str)
if result is not None:
    print(result.group(1))

This will print DC-VA-MD-WV, as expected.
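As written, the pattern requires a "(" after the code, so it does not match "Grand-Forks, ND-MN" or "Abilene, TX". A minimal sketch that makes the parenthesised suffix optional, so all three sample values match; the pattern `CODE_AFTER_COMMA` and the helper `extract_code` are assumptions, not part of the original answer.

```python
import re

# Comma, optional whitespace, the captured code (word characters and hyphens),
# then only an optional "(...)" suffix and trailing whitespace before the end
CODE_AFTER_COMMA = re.compile(r',\s*([\w-]+)\s*(?:\(.*\))?\s*$')


def extract_code(text):
    """Return the code after the comma, or None if the pattern does not match."""
    match = CODE_AFTER_COMMA.search(text)
    return match.group(1) if match else None


for value in ['Washington-Arlington-Al, DC-VA-MD-WV  (MSAD)',
              'Grand-Forks, ND-MN',
              'Abilene, TX']:
    print(extract_code(value))
# DC-VA-MD-WV
# ND-MN
# TX
```

Anchoring the pattern with `$` keeps it from accepting stray text after the code, at the cost of rejecting values that carry anything other than a parenthesised suffix.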

Answer 4

It seems that you are having trouble finding the right regex for the expected values.

I have created a small sample, pythext, which satisfies your requirement.

Basically, when you check the content of each value in the first column, you could use a regex like /(TX|ND-MN|DC-VA-MD-WV)/
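A minimal sketch of how that alternation could be applied to the first-column values in Python. It only finds codes listed explicitly in the pattern, so any new state combination would have to be added by hand; the names `KNOWN_CODES`, `first_column` and `new_column` are hypothetical.

```python
import re

# Alternation taken from the answer: it matches only these three codes
KNOWN_CODES = re.compile(r'(TX|ND-MN|DC-VA-MD-WV)')

first_column = ['Washington-Arlington-Al, DC-VA-MD-WV  (MSAD)',
                'Grand-Forks, ND-MN',
                'Abilene, TX']

new_column = []
for value in first_column:
    match = KNOWN_CODES.search(value)
    # Keep an empty string when none of the listed codes occurs
    new_column.append(match.group(1) if match else '')

print(new_column)  # ['DC-VA-MD-WV', 'ND-MN', 'TX']
```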

I hope this was useful! Let me know if you need further explanation.

4 answers

Answer 1
Score 1, 2017-02-07 14:43:23

Answer 2
Score 1, accepted, 2017-02-07 15:08:04

Answer 3
Score 1, 2017-02-07 17:16:36

Answer 4
Score 0, 2017-02-07 15:46:09