
Text qualifiers getting misplaced while trying to remove extra delimiters in csv file using python

I am trying to remove extra delimiters between fields in my data using a Python script. I usually work with large data sets. For example:

"abc","def","ghi","jkl","mno","pqr"
"","","fds","dfs","adfadf","AAAA111"
"","","fds","df,s","adfadf","AAAA111"

If I run the script, it should remove the extra delimiter inside the "df,s" field in the last row:

"abc","def","ghi","jkl","mno","pqr"
"","","fds","dfs","adfadf","AAAA111"
"","","fds","dfs","adfadf","AAAA111"

I was able to run the script properly for one data type, but I noticed that for some text-qualified data the text qualifiers got misplaced and the result came out like this:

"abc","def","ghi","jkl","mno","pqr"
"""","""""""""","""""fds""""","""""dfs""""","""""adfadf""""","AAAA111""""
"""","""""""""","""""fds""""","""""dfs""""","""""adfadf""""","AAAA111""""

The script is:

# re-export the data with the corrected fields
import csv
from csv import DictWriter

with open("big-12.csv", newline='') as people_file:
    next(people_file)  # skip the header row
    corrected_people = []
    for person_line in people_file:
        chomped_person_line = person_line.rstrip()
        person_tokens = chomped_person_line.split(",")

        # check that each field has the expected type
        try:
            corrected_person = {
                "abc": person_tokens[0],
                "def": person_tokens[1],
                "ghi": person_tokens[2],
                # any extra comma-split pieces belong to the "jkl" field
                "jkl": "".join(person_tokens[3:-2]),
                "mno": person_tokens[-2],
                "pqr": person_tokens[-1]
            }

            # basic sanity check: the last field must not be empty
            if not corrected_person["pqr"]:
                raise ValueError

            corrected_people.append(corrected_person)
        except (IndexError, ValueError):
            # print the ignored lines, so manual correction can be performed later
            print("Could not parse line: " + chomped_person_line)

    with open("corrected_people.txt", "w", newline='') as corrected_people_file:
        writer = DictWriter(
            corrected_people_file,
            fieldnames=[
                "abc", "def", "ghi", "jkl", "mno", "pqr"
            ], delimiter=',', quoting=csv.QUOTE_ALL)
        writer.writeheader()
        writer.writerows(corrected_people)

This script removes the extra delimiters in between, but I am having trouble with the text qualifiers. If the text qualifier issue is resolved it will be of great help. Python version: Python 3.6.0 :: Anaconda 4.3.1 (64-bit)

writer = DictWriter(
    corrected_people_file,
    fieldnames=[
        "abc", "def", "ghi", "jkl", "mno", "pqr"
    ], delimiter=',', quoting=csv.QUOTE_ALL)

QUOTE_ALL will force all fields to be quoted, and existing double quotes will be escaped with another double quote.
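
A minimal sketch of that doubling effect (the token values are just taken from the example above): split(",") leaves the original quote characters inside each token, and writing those tokens back with QUOTE_ALL quotes the field again, doubling every embedded quote:

import csv
import io

# tokens as a plain split(",") would leave them, quotes still attached
tokens = ['""', '"fds"', '"AAAA111"']

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(tokens)
print(buf.getvalue().strip())
# prints: """""","""fds""","""AAAA111"""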

So try to use QUOTE_NONE or QUOTE_MINIMAL, or strip the quotes from the fields before writing.
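
A sketch of the last option, keeping the rest of the script unchanged; it only replaces the split line inside the loop above:

# strip the original quote characters from each token; DictWriter with
# QUOTE_ALL then adds a single clean pair of quotes around each field
person_tokens = [token.strip('"') for token in chomped_person_line.split(",")]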

I'm having trouble with the text qualifiers

Also, quoting a field does not make it text rather than a number; the quotes are only there to allow embedded separator characters, and they can appear around numeric fields as well.


In general it is better and safer to use a csv reader instead of split(). With a csv reader the field "df,s" will be read correctly since it is quoted. You can then remove the , from that single field.
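
A sketch of that reader-based approach, using the file names from the question (this simple version strips embedded commas from every field; restrict it to a specific column index if only one column needs it):

import csv

with open("big-12.csv", newline='') as src, \
        open("corrected_people.txt", "w", newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    for row in reader:
        # csv.reader has already removed the quotes, so "df,s" comes back as
        # one field; drop the embedded comma before re-writing the row
        writer.writerow([field.replace(",", "") for field in row])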
