Text qualifiers getting misplaced while trying to remove extra delimiters in csv file using python

I'm trying to remove extra delimiters inside the data using a Python script. I usually work with large datasets. For example:

"abc","def","ghi","jkl","mno","pqr"
"","","fds","dfs","adfadf","AAAA111"
"","","fds","df,s","adfadf","AAAA111"

When I run the script, it removes the extra delimiter in "df,s" in the second data row:

"abc","def","ghi","jkl","mno","pqr"
"","","fds","dfs","adfadf","AAAA111"
"","","fds","dfs","adfadf","AAAA111"

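I was able to run the script correctly on one type of data, but I noticed that for a few files that use text qualifiers, the qualifiers end up misplaced and the result looks like this: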

"abc","def","ghi","jkl","mno","pqr"
"""","""""""""","""""fds""""","""""dfs""""","""""adfadf""""","AAAA111""""
"""","""""""""","""""fds""""","""""dfs""""","""""adfadf""""","AAAA111""""

The script is:

#export the data
# with correct quoting, and that you are stuck with what you have.
import csv
from csv import DictWriter

with open("big-12.csv", newline='') as people_file:
    next(people_file)
    corrected_people = []
    for person_line in people_file:
        chomped_person_line = person_line.rstrip()
        person_tokens = chomped_person_line.split(",")

        # check that each field has the expected type
        try:
            corrected_person = {
                "abc": person_tokens[0],
                "def": person_tokens[1],
                "ghi": person_tokens[2],
                # every token between "ghi" and the last two columns is part of "jkl"
                "jkl": "".join(person_tokens[3:-2]),
                "mno": person_tokens[-2],
                "pqr": person_tokens[-1],
            }

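            # Sanity check on the last column; every string starts with "",
            # so as written this condition is never true and no row is rejected.
            if not corrected_person["pqr"].startswith(
                    "") and corrected_person["pqr"] != "n/a":
                raise ValueError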

            corrected_people.append(corrected_person)
        except (IndexError, ValueError):
            # print the ignored lines, so manual correction can be performed later.
            print("Could not parse line: " + chomped_person_line)

    with open("corrected_people.txt", "w", newline='') as corrected_people_file:
        writer = DictWriter(
            corrected_people_file,
            fieldnames=["abc", "def", "ghi", "jkl", "mno", "pqr"],
            delimiter=',',
            quoting=csv.QUOTE_ALL)
        writer.writeheader()
        writer.writerows(corrected_people)

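The script removes the extra delimiters in the middle, but I'm having trouble with the text qualifiers. Solving the text qualifier problem would help a lot. Python version: Python 3.6.0 :: Anaconda 4.3.1 (64-bit)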

writer = DictWriter(
    corrected_people_file,
    fieldnames=[
        "abc", "def", "ghi", "jkl", "mno", "pqr"
    ],delimiter=',',quoting=csv.QUOTE_ALL)

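QUOTE_ALL forces every field to be quoted, and any double quote already inside a field is escaped by doubling it. Because split(",") leaves the original quote characters in each token, every one of them gets doubled and the whole token is wrapped in a new pair of quotes.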
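To see where the extra quotes come from, here is a minimal sketch that splits the second sample line with split(",") the way the question's script does and writes the tokens with QUOTE_ALL. It uses only the standard csv and io modules; io.StringIO stands in for the output file.

```python
import csv
import io

# Second data row of the sample, split on commas as in the question's script.
line = '"","","fds","dfs","adfadf","AAAA111"'
tokens = line.split(",")
# The original quote characters survive the split: ['""', '""', '"fds"', ...]

buf = io.StringIO()  # stands in for the output file
writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(tokens)

# Every embedded quote is doubled and each field is wrapped in a new pair:
# """""","""""","""fds""","""dfs""","""adfadf""","""AAAA111"""
print(buf.getvalue(), end="")
```

Reading that output back with csv.reader returns the original tokens, quotes included, which is why the values look "misplaced" rather than lost.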

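So either strip the quotes from the fields before writing them, or try QUOTE_NONE or QUOTE_MINIMAL. Note that QUOTE_MINIMAL alone does not help with these tokens: a field that still contains a quote character is quoted anyway and its quotes are doubled, so stripping is the more reliable fix.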
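A minimal sketch of the strip-before-writing approach for the third sample line. It keeps split(",") so the change stays small, rebuilds the six columns by joining every token between the third column and the last two, and assumes that no value legitimately begins or ends with a quote character, since str.strip('"') would remove those too.

```python
import csv
import io

line = '"","","fds","df,s","adfadf","AAAA111"'
tokens = line.split(",")

# Six columns: the first three tokens, everything in between joined
# into the fourth column, then the last two tokens.
fields = tokens[:3] + ["".join(tokens[3:-2])] + tokens[-2:]

# Remove the leftover quote characters so the writer adds exactly one pair.
fields = [f.strip('"') for f in fields]

buf = io.StringIO()  # stands in for the output file
csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(fields)
print(buf.getvalue(), end="")
```

This still breaks if more than one column can contain an extra comma, because split() cannot tell which tokens belong together.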

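As for "I'm having trouble with the text qualifiers":

Quoting a field does not say whether it is text or a number. Quotes only exist to allow embedded delimiters, and they can appear around numeric fields too.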
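A quick check with the standard csv module illustrates this: the default reader returns every field as a string whether or not it was quoted. Only QUOTE_NONNUMERIC, an opt-in convention, makes the reader treat unquoted fields as numbers.

```python
import csv
import io

# The same value written once with quotes and once without.
data = '"42",42,"abc"\n'

# Default reader: quoting is just syntax, so every field is a str.
row = next(csv.reader(io.StringIO(data)))
print(row)  # ['42', '42', 'abc']

# Only when QUOTE_NONNUMERIC is requested does quoting carry meaning:
# the reader then converts every unquoted field to float.
typed_row = next(csv.reader(io.StringIO(data), quoting=csv.QUOTE_NONNUMERIC))
print(typed_row)  # ['42', 42.0, 'abc']
```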


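In general, it is better and safer to use a csv reader instead of split(). A csv reader reads the "df,s" field correctly as a single value, because it is quoted. You can then remove the comma from that individual field.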
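A sketch of the csv-reader approach, assuming the goal is simply to delete the delimiter wherever it appears inside a field. remove_embedded_delimiters is a hypothetical helper name; it accepts any open file objects, so io.StringIO is used here to keep the example self-contained.

```python
import csv
import io


def remove_embedded_delimiters(src, dst, delimiter=","):
    """Hypothetical helper: copy CSV rows from src to dst, deleting the
    delimiter character inside every field."""
    reader = csv.reader(src, delimiter=delimiter)
    # lineterminator="\n" keeps the output predictable; the default is "\r\n".
    writer = csv.writer(dst, delimiter=delimiter,
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        # The reader has already removed the text qualifiers, so each field
        # is a plain value and only the stray delimiters need deleting.
        writer.writerow([field.replace(delimiter, "") for field in row])


sample = (
    '"abc","def","ghi","jkl","mno","pqr"\n'
    '"","","fds","dfs","adfadf","AAAA111"\n'
    '"","","fds","df,s","adfadf","AAAA111"\n'
)
out = io.StringIO()
remove_embedded_delimiters(io.StringIO(sample), out)
print(out.getvalue(), end="")

# With real files (open both with newline="" as the csv docs recommend):
# with open("big-12.csv", newline="") as src, \
#         open("corrected_people.txt", "w", newline="") as dst:
#     remove_embedded_delimiters(src, dst)
```

Unlike the split() version, this works no matter which column holds the extra comma, and the quote characters are never treated as part of the data.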
