Python: walk through directories to process CSV files and save them

Question

I want to walk through the CSV files inside a set of folders, perform the same calculation on each file, and save a new file for each one.

The files have data structured in this manner:

"[Couplet 10 : Jul]
C'est 1.3.5 sur la plaque
Fais ton biz coupe ta plaque
C'est JU, JU , JUL qui débarque
Pour mes blancs , beurres et blacks
Passe moi un stunt pour voir si sa cabre
Embrouilles sur le sable , cocotiers sur la sappe
Je dors pas je suis tout pâle, je dis pas que je suis 2Pac
Je dis pas lui je vais le tuer si j'ai même pas 2 balles
C'est pour ceux qui XXX fais gaffe les shmits l'impact
Son anti B.D.H anti tapette",1

(...)

So far I have:

match = "^[\(\[].*?[\)\]]"
for d in directories:
        dir = os.path.join(data_dir, d)
        files_ = [os.path.join(dir, f) 
                      for f in os.listdir(dir) 
                      if f.endswith(".csv")]
        for f in files_:
            with open(f, 'rb') as f1, open('out.csv', 'wb') as out_file:
                reader = csv.reader(f1, delimiter='\t')
                for item in list(reader):
                    item = re.sub(match, ' ', item, flags=re.MULTILINE)
                    out_file.write(item)

But I get this traceback:

File "process_csv.py", line 75, in load_data
    item = re.sub(match, ' ', item, flags=re.MULTILINE)      
  File "/Users/username/anaconda/lib/python2.7/re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer

What is the best way of achieving this?

Answer 1

According to the re docs, re.sub expects its third argument to be a string. But list(reader) returns a list of lists of CSV fields, not a list of strings, so each item in the loop is itself a list. You need to extract a string from each row and pass that to re.sub:

item = re.sub(match, ' ', item[0], flags=re.MULTILINE)

or whatever index you need to use in the calculations.

To understand it better, try:

test.csv: 
a 
b 
c

>>> f = open('test.csv')
>>> reader = csv.reader(f)
>>> list(reader)
[['a'], ['b'], ['c']]
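
As a rough illustration of the same point, the sketch below reproduces the TypeError and then applies the substitution to the first field of each row. It assumes Python 3 and uses an in-memory io.StringIO with made-up sample rows instead of a real file; the pattern is the one from the question, written as a raw string.

```python
import csv
import io
import re

# Same pattern as in the question, written as a raw string.
match = r"^[\(\[].*?[\)\]]"

# In-memory stand-in for a CSV file (hypothetical sample data).
sample = io.StringIO('"[Intro] first line",1\n"(Hook) second line",2\n')
rows = list(csv.reader(sample))  # each row is a list of field strings

error = None
try:
    re.sub(match, ' ', rows[0])  # passing a list, not a string
except TypeError as exc:
    error = str(exc)             # the same kind of error as in the traceback

# Passing the first field of each row works as intended.
cleaned = [re.sub(match, ' ', row[0]) for row in rows]
```

Here each bracketed tag at the start of a field is replaced by a single space, so the cleaned fields start with two spaces (the replacement plus the space that followed the tag).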

UPDATE

To make it work on the real data example:

Use the default delimiter (,) rather than \t, so that the csv module strips the surrounding " quotes, or change the regex if the quotes are important for processing.
Specify the newline character as '' when opening the files. In Python 3 this requires text mode ('r'/'w'), since newline cannot be combined with binary mode. In Python 2 the built-in open doesn't accept a newline argument; io.open does and has the same signature as Python 3's open, but the Python 2 csv module does not support Unicode input and expects files opened in binary mode ('rb'/'wb') instead. Explanation from the csv module documentation:

If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n linendings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.

with open(f, 'r', newline='') as f1, open('out.csv', 'w', newline='') as out_file:
    ...

It seems that the substitution is required for the first column only, so pass item[0] to re.sub.
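
The points above can be checked on a small file. The following is only a sketch, assuming Python 3; the sample record, the temporary file name and the utf-8 encoding are made up for illustration. It writes one record whose first field spans several lines, reads it back with newline='' and the default delimiter, and applies the substitution to the first column.

```python
import csv
import os
import re
import tempfile

match = r"^[\(\[].*?[\)\]]"

# One record whose quoted first field spans three lines (hypothetical data).
raw = '"[Verse 1]\nfirst line\n[Hook]",1\n'

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.csv")
with open(path, "w", newline="", encoding="utf-8") as fh:
    fh.write(raw)

# newline='' lets the csv module handle the newlines inside the quoted field.
with open(path, "r", newline="", encoding="utf-8") as fh:
    rows = list(csv.reader(fh))  # default delimiter is ','

text, label = rows[0]
# With re.MULTILINE, ^ matches at the start of every line of the field.
cleaned = re.sub(match, " ", text, flags=re.MULTILINE)
```

The whole quoted block comes back as a single row with two fields, and the bracketed tags at the start of the first and third lines are each replaced by a space.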

Finally, the corrected code:

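import csv
import os
import re

...

match = r"^[\(\[].*?[\)\]]"
for d in directories:
    dir_path = os.path.join(data_dir, d)
    files_ = [os.path.join(dir_path, f)
              for f in os.listdir(dir_path)
              if f.endswith(".csv")]
    for f in files_:
        # Python 3: text mode with newline=''.
        # On Python 2, open with 'rb'/'wb' and drop the newline argument.
        # One output file per input, so results are not overwritten;
        # the '.out' suffix is only an illustrative choice.
        out_path = f + '.out'
        with open(f, 'r', newline='') as f1, open(out_path, 'w', newline='') as out_file:
            reader = csv.reader(f1)
            writer = csv.writer(out_file)
            for item in reader:
                # Strip the bracketed tag at the start of each line of the first column.
                writer.writerow([
                    re.sub(match, ' ', item[0], flags=re.MULTILINE),
                    item[1]
                ])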
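
Since the question asks about walking through folders and saving a new file for each input, one possible variant uses os.walk and mirrors the folder layout in a separate output folder. This is a sketch under assumptions, not part of the original answer: it targets Python 3, the function name clean_tree and the out_dir parameter are hypothetical, the files are assumed to be UTF-8, and out_dir is assumed to lie outside data_dir so that results are not picked up as inputs.

```python
import csv
import os
import re

# Pattern from the question, precompiled; ^ matches at every line start.
MATCH = re.compile(r"^[\(\[].*?[\)\]]", flags=re.MULTILINE)


def clean_tree(data_dir, out_dir):
    """Clean every .csv under data_dir and mirror the results under out_dir."""
    written = []
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            if not name.endswith(".csv"):
                continue
            src = os.path.join(root, name)
            # Keep the same relative path below out_dir.
            dst = os.path.join(out_dir, os.path.relpath(src, data_dir))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            with open(src, "r", newline="", encoding="utf-8") as f_in, \
                 open(dst, "w", newline="", encoding="utf-8") as f_out:
                writer = csv.writer(f_out)
                for row in csv.reader(f_in):
                    if row:  # blank lines yield empty rows
                        row[0] = MATCH.sub(" ", row[0])
                    writer.writerow(row)
            written.append(dst)
    return written
```

Calling clean_tree("data", "cleaned") would then return the list of files it wrote; both folder names here are placeholders.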

Answer 1 (accepted, 2017-09-28 17:27:39)
