Unable to generate correct hash table for columns in CSV file
I have a CSV file with the following columns:
ZoneMaterialName1,ZoneThickness1
Copper,2.5
Copper,2.5
Aluminium,3
Zinc,
Zinc,
Zinc,6
Aluminium,4
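As you can see, some values are repeated several times, and a value may sometimes be blank or a single period.
I want a hash table that holds only the unique values, for example: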
ZoneMaterialName1,ZoneThickness1
Copper:[2.5]
Aluminium:[3,4]
Zinc:[6]
This is the code I came up with. Its output is missing floats such as 2.5, and it also lets blanks and periods through.
import csv
from collections import defaultdict
import csv
afile = open('/mnt/c/python_test/Book2.csv', 'r+')
csvReader1 = csv.reader(afile)
reader = csv.DictReader(open('/mnt/c/python_test/Book2.csv'))
nodes = defaultdict(type(''))
for row in reader:
    if (row['ZoneThickness1'] !=' ' and row['ZoneThickness1'] !='.'):
        nodes[row['ZoneMaterialName1']]+=(row['ZoneThickness1'])
new_dict = {a:list(set(b)) for a, b in nodes.items()}
print new_dict
Approach: I first build a dictionary and then convert its values to sets.
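For readers wondering where the 2.5 goes, here is a minimal sketch of what that approach probably does. It assumes that `defaultdict(type(''))` behaves like `defaultdict(str)` and that the rows arrive as in the sample above; the inline list of pairs is a stand-in for the CSV reader, not part of the original code. Appending with `+=` concatenates the thickness strings, and `set()` on a string splits it into single characters.

```python
from collections import defaultdict

# Stand-in for the rows a csv.DictReader would yield (illustrative data only).
rows = [("Copper", "2.5"), ("Copper", "2.5"), ("Aluminium", "3"), ("Aluminium", "4")]

nodes = defaultdict(str)  # same default factory as defaultdict(type(''))
for material, thickness in rows:
    nodes[material] += thickness  # string concatenation, not list append

# set() over a string yields its individual characters.
chars = {name: sorted(set(value)) for name, value in nodes.items()}

print(nodes["Copper"])     # 2.52.5
print(chars["Copper"])     # ['.', '2', '5']
print(chars["Aluminium"])  # ['3', '4']
```

The Aluminium result only looks correct because each of its thicknesses is a single character.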
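I suggest casting the second column to float and adding only the values that are valid floats. You can also use a set to avoid duplicate values for a given material.
It can be done like this (I used Python 3.x because you tagged the question with both Python versions):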
import collections
import csv
result = collections.defaultdict(set)
with open('test.txt', 'r') as f:
    csv_r = csv.DictReader(f)
    for row in csv_r:
        try:
            v = float(row['ZoneThickness1'])
        except ValueError:
            # skip this line, because it is not a valid float
            continue
        # this will add the material if it doesn't exist yet and
        # will also add the value if it doesn't exist yet for this material
        result[row['ZoneMaterialName1']].add(v)
for k, v in result.items():
    print(k, v)
This gives the following output:
Copper {2.5}
Aluminium {3.0, 4.0}
Zinc {6.0}
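If lists are wanted instead of sets, as in the desired output in the question, the sets can be sorted into lists at the end. The sketch below is one way to do that under a few assumptions: the helper name `unique_thicknesses` and the inline `SAMPLE` text are made up for illustration, and a `Zinc,.` row has been added to exercise the period case mentioned in the question.

```python
import collections
import csv
import io

# Illustrative sample data; the "Zinc,." row is an added assumption.
SAMPLE = """ZoneMaterialName1,ZoneThickness1
Copper,2.5
Copper,2.5
Aluminium,3
Zinc,
Zinc,.
Zinc,6
Aluminium,4
"""

def unique_thicknesses(lines):
    """Map each material to a sorted list of its unique, valid thicknesses."""
    result = collections.defaultdict(set)
    for row in csv.DictReader(lines):
        try:
            value = float(row["ZoneThickness1"])
        except ValueError:
            # blank cells and lone periods are not valid floats; skip them
            continue
        result[row["ZoneMaterialName1"]].add(value)
    # convert each set to a sorted list for stable, list-style output
    return {material: sorted(values) for material, values in result.items()}

table = unique_thicknesses(io.StringIO(SAMPLE))
print(table)
```

Note that `float()` also accepts strings such as `nan` and `inf`, so a stricter check may be needed if those can appear in the file.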