Text qualifiers getting misplaced while trying to remove extra delimiters in csv file using python
I am trying to remove extra delimiters from inside the data using a Python script. I usually work with large datasets. For example:
"abc","def","ghi","jkl","mno","pqr"
"","","fds","dfs","adfadf","AAAA111"
"","","fds","df,s","adfadf","AAAA111"
When I run the script, it removes the extra delimiter inside "df,s" on the second data row:
"abc","def","ghi","jkl","mno","pqr"
"","","fds","dfs","adfadf","AAAA111"
"","","fds","dfs","adfadf","AAAA111"
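I was able to run the script correctly for one type of data, but I noticed that for a few files containing text-qualified data the text qualifiers get misplaced, and the result looks like this: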
"abc","def","ghi","jkl","mno","pqr"
"""","""""""""","""""fds""""","""""dfs""""","""""adfadf""""","AAAA111""""
"""","""""""""","""""fds""""","""""dfs""""","""""adfadf""""","AAAA111""""
The script is:
# Assumes the data cannot be re-exported with correct quoting,
# and that you are stuck with what you have.
import csv
from csv import DictWriter

with open("big-12.csv", newline='') as people_file:
    next(people_file)  # skip the header row
    corrected_people = []
    for person_line in people_file:
        chomped_person_line = person_line.rstrip()
        # Naive split: the quote characters stay inside every token.
        person_tokens = chomped_person_line.split(",")
        # check that each field has the expected type
        try:
            corrected_person = {
                "abc": person_tokens[0],
                "def": person_tokens[1],
                "ghi": person_tokens[2],
                # Re-join whatever the extra delimiters split apart.
                "jkl": "".join(person_tokens[3:-2]),
                "mno": person_tokens[-2],
                "pqr": person_tokens[-1]
            }
            # The original check referenced a "DR_CR" column that is not in
            # this sample; "mno" is used here as an assumed stand-in.
            if not corrected_person["mno"].startswith(
                    "") and corrected_person["mno"] != "n/a":
                raise ValueError
            corrected_people.append(corrected_person)
        except (IndexError, ValueError):
            # print the ignored lines, so manual correction can be performed later.
            print("Could not parse line: " + chomped_person_line)

with open("corrected_people.txt", "w", newline='') as corrected_people_file:
    writer = DictWriter(
        corrected_people_file,
        fieldnames=[
            "abc", "def", "ghi", "jkl", "mno", "pqr"
        ], delimiter=',', quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(corrected_people)
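The script removes the extra delimiters in between, but I am having trouble with the text qualifiers. It would be a great help if the text qualifier problem could be solved. Python version: Python 3.6.0 :: Anaconda 4.3.1 (64-bit)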
writer = DictWriter(
    corrected_people_file,
    fieldnames=[
        "abc", "def", "ghi", "jkl", "mno", "pqr"
    ], delimiter=',', quoting=csv.QUOTE_ALL)
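QUOTE_ALL forces quotes around every field, and any double quote already present in a field is escaped with another double quote. Because split(",") leaves the original quotes inside each token, they are all doubled on output. So try QUOTE_NONE or QUOTE_MINIMAL, or strip the quotes from the fields before writing.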
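A minimal sketch of the doubling effect and of the "strip the quotes first" option, using an in-memory buffer instead of the real files. The helper name `write_row` and the sample tokens are illustrative assumptions, not part of the original script.

```python
import csv
import io


def write_row(fields, quoting):
    """Serialize one row with csv.writer and return the resulting text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=quoting, lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()


# Tokens as str.split(",") leaves them: the quotes are still part of the data.
raw_tokens = ['""', '"fds"', '"AAAA111"']

# QUOTE_ALL wraps every field in quotes and doubles each embedded quote.
doubled = write_row(raw_tokens, csv.QUOTE_ALL)

# Stripping the surrounding quotes before writing avoids the doubling.
stripped_tokens = [token.strip('"') for token in raw_tokens]
clean = write_row(stripped_tokens, csv.QUOTE_ALL)

print(doubled, end="")
print(clean, end="")
```

Stripping with `strip('"')` only removes quotes at the ends of a token, so it is a stopgap for data that was split by hand; quotes in the middle of a field would still be doubled.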
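Regarding "I am having trouble with the text qualifiers": quoting a field says nothing about whether it holds text or a number. Quotes exist only to allow embedded delimiters, and they can appear around numeric fields as well.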
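As a small illustration of that point, the sketch below parses one made-up line in which a number is quoted, a text field with an embedded comma is quoted, and another number is left bare. The sample line is an assumption chosen for the demonstration.

```python
import csv
import io

# One quoted number, one quoted text field with an embedded comma,
# and one unquoted number.
sample = io.StringIO('"123","df,s",456\n')
row = next(csv.reader(sample))

# With the default dialect every field comes back as a plain string,
# whether or not it was quoted in the input.
print(row)
print([type(field).__name__ for field in row])
```

The reader uses the quotes only to find field boundaries; any conversion to numbers has to be done explicitly afterwards.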
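In general, it is better and safer to use a csv reader than split(). With a csv reader the "df,s" field is read correctly as a single field, because it is quoted. You can then remove the comma from the individual field.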
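A sketch of the reader-based approach, under the assumption that the input is already correctly quoted as in the sample from the question. The function name `remove_embedded_delimiters` is hypothetical, and in-memory buffers stand in for the real input and output files.

```python
import csv
import io


def remove_embedded_delimiters(source, target, delimiter=","):
    """Copy CSV rows from source to target, dropping delimiters inside fields."""
    reader = csv.reader(source, delimiter=delimiter)
    writer = csv.writer(
        target, delimiter=delimiter, quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    for row in reader:
        # The reader has already removed the text qualifiers, so each field
        # is the bare value and QUOTE_ALL adds exactly one pair of quotes.
        writer.writerow([field.replace(delimiter, "") for field in row])


sample = io.StringIO(
    '"abc","def","ghi","jkl","mno","pqr"\n'
    '"","","fds","dfs","adfadf","AAAA111"\n'
    '"","","fds","df,s","adfadf","AAAA111"\n'
)
result = io.StringIO()
remove_embedded_delimiters(sample, result)
cleaned = result.getvalue()
print(cleaned, end="")
```

For real files, the two buffers would be replaced by file objects opened with `newline=''`, as the csv module documentation recommends. Rows are processed one at a time, which suits large datasets.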