When I run the following code, I get this error: ValueError: invalid literal for int() with base 10: "(1, 0, 'Friday')"
When I run the following code, I get this error:
ValueError: invalid literal for int() with base 10: "(1, 0, 'Friday')"
pointing to the line:
monthwise = list(map(int, read_from_csv(infile_csv, month=True)[0]))
I have included the sample .csv file below.
Desired output: a plot that compares monthly ridership among subscribers and customers.
import calendar
import csv
import datetime

infile_csv = 'C:/pythonscripts/NYC-2016-Summary.csv'

def read_from_csv(input_csvfile, duration=False, month=False, hour=False, day_of_week=False):
    # pick the column to extract based on which flag is set
    if duration:
        col_name = 'duration'
    elif month:
        col_name = 'month'
    elif hour:
        col_name = 'hour'
    elif day_of_week:
        col_name = 'day_of_week'
    # init lists for output: all rows, subscribers only, customers only
    n_ridership4column = []
    n_ridership_sub = []
    n_ridership_cust = []
    with open(input_csvfile, 'r') as f_in:
        filereader = csv.DictReader(f_in)
        for row in filereader:
            n_ridership4column.append(row[col_name])
            if row['user_type'] == 'Subscriber':
                n_ridership_sub.append(row[col_name])
            else:
                n_ridership_cust.append(row[col_name])
    return n_ridership4column, n_ridership_sub, n_ridership_cust

# using the function above to get monthly ridership
monthwise = list(map(int, read_from_csv(infile_csv, month=True)[0]))
monthwise_sub = list(map(int, read_from_csv(infile_csv, month=True)[1]))
monthwise_cust = list(map(int, read_from_csv(infile_csv, month=True)[2]))
The code below is for plotting. It is not needed to answer the question, but it clarifies the intended output.
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bins = [i for i in range(1, 14)]
# upper bound is 14 to accommodate the bin for December
#### Plotting monthly total along with customers and subscribers stacked
ax.hist(monthwise, bins=bins, edgecolor='k', align='left', label='Total Ridership', stacked= True)
ax.hist(monthwise_sub, bins=bins, edgecolor='k', align='left', label='Subscribers', stacked=True)
ax.hist(monthwise_cust, bins=bins, edgecolor='k', align='left', label='Customer', stacked=True)
ax.set_xticks(bins[:-1])
ax.set_xticklabels(list(calendar.month_abbr[i] for i in bins[:-1]))
plt.title('Monthly Ridership in NYC', fontsize=16)
plt.xlabel('Monthly', fontsize=14)
plt.ylabel('Rides', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.legend()
plt.show()
This is my sample .csv file for the above code:
duration month hour day_of_week user_type
13.98333333 (1, 0, 'Friday') (1, 0, 'Friday') Customer
11.43333333 (1, 0, 'Friday') (1, 0, 'Friday') Subscriber
5.25 (1, 0, 'Friday') (1, 0, 'Friday') Subscriber
12.31666667 (1, 0, 'Friday') (1, 0, 'Friday') Subscriber
20.88333333 (1, 0, 'Friday') (1, 0, 'Friday') Customer
8.75 (1, 0, 'Friday') (1, 0, 'Friday') Subscriber
10.98333333 (1, 0, 'Friday') (1, 0, 'Friday') Subscriber
7.733333333 (1, 1, 'Friday') (1, 1, 'Friday') Subscriber
3.433333333 (1, 1, 'Friday') (1, 1, 'Friday') Subscriber
7.083333333 (1, 1, 'Friday') (1, 1, 'Friday') Customer
13.3 (1, 2, 'Friday') (1, 2, 'Friday') Subscriber
9.733333333 (1, 2, 'Friday') (1, 2, 'Friday') Subscriber
8.416666667 (1, 2, 'Friday') (1, 2, 'Friday') Subscriber
The error message means you are trying to parse a value which isn't numeric as an integer. When you ask Python to do something it can't do (divide a number by zero, reference an undeclared variable, etc.), it raises an exception. Usually the error message is clear enough, though when you are only just learning Python, you sometimes need to google it.
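To see this concretely, here is a minimal sketch that feeds int() one of the month values from the sample CSV. The helper name try_int is made up for illustration; it returns the converted integer, or the text of the ValueError when conversion fails.

```python
def try_int(text):
    """Return int(text), or the ValueError message if text is not a base-10 integer."""
    try:
        return int(text)
    except ValueError as exc:
        return str(exc)


# A digit string converts as expected.
print(try_int("1"))

# A month value copied from the sample CSV reproduces the error in the question.
print(try_int("(1, 0, 'Friday')"))
```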
To the extent that blame can be allocated, whichever program wrote this broken pseudo-CSV is wrong and should be fixed or replaced. For CSV to be useful, it needs to be normalized to one datum per field, though you see this principle violated from time to time. Writing a composite field in a Python-specific format is at the very least misdirected, and in this case quite probably a bug.
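If you can't fix the program that produced the file, one option is to rewrite it once into a normalized CSV. The sketch below assumes the row layout seen in the sample (a duration, then the tuple-like field twice, then the user type) and tab separation, as the parser later in this answer also assumes. normalize_broken_csv is a hypothetical name, and its delimiter parameter lets you pass ',' if the file turns out to be comma-separated.

```python
import csv
import io


def normalize_broken_csv(src, dst, delimiter='\t'):
    """Copy the pseudo-CSV from src to dst as a proper CSV with one datum per field.

    Assumed input row layout: duration, "(month, hour, 'Day')", the same tuple
    again, user_type.
    """
    reader = csv.reader(src, delimiter=delimiter)
    next(reader)  # discard the original header, which does not match the data
    writer = csv.writer(dst)
    writer.writerow(['duration', 'month', 'hour', 'day_of_week', 'user_type'])
    for duration, tpl, _duplicate, user_type in reader:
        # "(1, 0, 'Friday')" -> "1", "0", "'Friday'"
        month, hour, dow = tpl.strip('()').split(', ')
        writer.writerow([duration, month, hour, dow.strip("'"), user_type])


# Usage with an in-memory copy of the first sample row.
broken = io.StringIO(
    "duration\tmonth\thour\tday_of_week\tuser_type\n"
    "13.98333333\t(1, 0, 'Friday')\t(1, 0, 'Friday')\tCustomer\n"
)
fixed = io.StringIO()
normalize_broken_csv(broken, fixed)
print(fixed.getvalue())
```

With real files you would open the source and destination with open(..., newline=''), as the csv module documentation recommends; after that, the question's DictReader code and its int() calls should work on the rewritten file.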
In addition, the sample data has one column fewer than the sample headers suggest. On the other hand, columns 2 and 3 always seem to be identical, and appear to be composed of values which would fit the expected values for columns 2, 3, and 4 (month, hour, and day_of_week) in the headers.
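The actual file is not shown, so this is an assumption: if it is comma-separated with the tuple-like fields quoted (one layout that would produce exactly the value in the error message), a short sketch shows what the question's csv.DictReader hands back for such a row.

```python
import csv
import io

# Hypothetical reconstruction of the header and one data line: comma-separated,
# with the tuple-like fields quoted so their internal commas do not split them.
raw = (
    'duration,month,hour,day_of_week,user_type\n'
    '13.98333333,"(1, 0, \'Friday\')","(1, 0, \'Friday\')",Customer\n'
)
row = next(csv.DictReader(io.StringIO(raw)))

# Five headers but only four values, so everything after 'duration' is shifted.
print(row['month'])        # (1, 0, 'Friday')  -> int() fails on this
print(row['day_of_week'])  # Customer
print(row['user_type'])    # None (DictReader's default restval for missing fields)
```

Under that assumed layout, row['user_type'] is None for every row, so the question's row['user_type'] == 'Subscriber' test would never match and every ride would land in the customer list.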
Your code is odd in that it reads the file again every time it wants to extract a column. This might make sense if your input file is too large to fit into memory all at once; but in the absence of any such concerns in your question or in comments within the code, I would recommend reading all the columns into memory once. This should also make your program considerably faster, since the file is then parsed once instead of once for every list you extract.
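A minimal sketch of the read-once approach, assuming the file has already been turned into a well-formed CSV with one value per field. load_rows and column are hypothetical helper names, and the three input rows are taken from the question's sample, written in normalized form.

```python
import csv
import io


def load_rows(f):
    """Parse a well-formed CSV once and keep every row (a dict) in memory."""
    return list(csv.DictReader(f))


def column(rows, name, user_type=None):
    """Return one column as ints, optionally restricted to a single user type."""
    return [int(row[name]) for row in rows
            if user_type is None or row['user_type'] == user_type]


sample = io.StringIO(
    "duration,month,hour,day_of_week,user_type\n"
    "13.98333333,1,0,Friday,Customer\n"
    "11.43333333,1,0,Friday,Subscriber\n"
    "7.733333333,1,1,Friday,Subscriber\n"
)
rows = load_rows(sample)  # the only pass over the file
monthwise = column(rows, 'month')
monthwise_sub = column(rows, 'month', 'Subscriber')
monthwise_cust = column(rows, 'month', 'Customer')
print(monthwise, monthwise_sub, monthwise_cust)
```

With the real file you would pass open(path, newline='') instead of the StringIO; every further column (hour, duration, and so on) then comes from rows without touching the disk again.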
The DictReader already turns each row into a dictionary (an OrderedDict in Python 3.6 and 3.7, a plain dict from Python 3.8 on), so copying fields into separate lists in the loop mostly duplicates work that this Python library has already done for you.
Perhaps something like this would suit your needs if you are stuck with this broken CSV:
import csv

def parse_broken_csv(filename):
    rows = []
    with open(filename, 'r') as fh:
        # the sample looks tab-separated, so split on tabs
        reader = csv.reader(fh, delimiter='\t')
        next(reader)  # skip the header row
        # columns: duration, "(month, hour, 'Day')", a duplicate of that field, user type
        for duration, tpl, _, cust in reader:
            month, hour, dow = tpl.strip('()').split(', ')
            rows.append({
                'duration': float(duration),
                'month': int(month),
                'hour': int(hour),
                'day_of_week': dow.strip("'"),
                'cust': cust})
    return rows

rows = parse_broken_csv(r'NYC-2016-Summary.csv')
monthwise = [row['month'] for row in rows]
monthwise_sub = [row['month'] for row in rows if row['cust'] == 'Subscriber']
monthwise_cust = [row['month'] for row in rows if row['cust'] == 'Customer']
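The tuple-like field is the repr of a Python tuple, so instead of stripping and splitting the string you could decode it with ast.literal_eval from the standard library. This is a swapped-in technique, not what the parser above does; the sketch below is a hypothetical variant (parse_broken_rows) that takes an open file object so it can be tried on a StringIO, and it keeps the same tab-separation assumption.

```python
import ast
import csv
import io


def parse_broken_rows(f):
    """Variant of parse_broken_csv that reads from an open file and decodes the
    "(month, hour, 'Day')" field with ast.literal_eval."""
    reader = csv.reader(f, delimiter='\t')
    next(reader)  # skip the header row
    rows = []
    for duration, tpl, _duplicate, cust in reader:
        # literal_eval accepts only Python literals (tuples, numbers, strings, ...),
        # so unlike eval() it cannot execute code embedded in the file.
        month, hour, dow = ast.literal_eval(tpl)
        rows.append({'duration': float(duration), 'month': month,
                     'hour': hour, 'day_of_week': dow, 'cust': cust})
    return rows


sample = io.StringIO(
    "duration\tmonth\thour\tday_of_week\tuser_type\n"
    "13.3\t(1, 2, 'Friday')\t(1, 2, 'Friday')\tSubscriber\n"
)
rows = parse_broken_rows(sample)
print(rows)
```

literal_eval raises ValueError or SyntaxError on anything that is not a plain literal, so a malformed field fails loudly instead of producing a half-parsed record.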
For the first seven rows of the sample CSV you posted, the value of rows is
[
{'duration': 13.98333333, 'month': 1, 'day_of_week': 'Friday', 'cust': 'Customer', 'hour': 0},
{'duration': 11.43333333, 'month': 1, 'day_of_week': 'Friday', 'cust': 'Subscriber', 'hour': 0},
{'duration': 5.25, 'month': 1, 'day_of_week': 'Friday', 'cust': 'Subscriber', 'hour': 0},
{'duration': 12.31666667, 'month': 1, 'day_of_week': 'Friday', 'cust': 'Subscriber', 'hour': 0},
{'duration': 20.88333333, 'month': 1, 'day_of_week': 'Friday', 'cust': 'Customer', 'hour': 0},
{'duration': 8.75, 'month': 1, 'day_of_week': 'Friday', 'cust': 'Subscriber', 'hour': 0},
{'duration': 10.98333333, 'month': 1, 'day_of_week': 'Friday', 'cust': 'Subscriber', 'hour': 0}
]
and the value of monthwise is
[1, 1, 1, 1, 1, 1, 1]
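For the plot the question asks for, per-month totals may be handier than raw lists. A minimal sketch using collections.Counter (monthly_counts is a hypothetical helper) turns a list like monthwise into twelve counts, January first; counts for subscribers and customers could then be drawn side by side with matplotlib's ax.bar.

```python
from collections import Counter


def monthly_counts(months):
    """Return 12 ride counts for months 1..12; months with no rides count as 0."""
    counts = Counter(months)
    return [counts[m] for m in range(1, 13)]


print(monthly_counts([1, 1, 1, 1, 1, 1, 1]))  # [7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```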