Solving ValueError: cannot convert float NaN to integer
I am writing a function that returns a dictionary whose keys are the creation years of all the citations in the dataset and whose values are the two-item tuples returned by the function do_get_citations_per_year.
def do_get_citations_per_year(data, year):
    result = tuple()
    my_ocan['creation'] = pd.DatetimeIndex(my_ocan['creation']).year
    len_citations = len(my_ocan.loc[my_ocan["creation"] == year, "creation"])
    timespan = my_ocan.loc[my_ocan["creation"] == year, "timespan"].fillna(0).mean()
    result = (len_citations, round(timespan))
    return result

def do_get_citations_all_years(data):
    mydict = {}
    s = set(my_ocan.creation)
    print(s)
    for year in s:
        mydict[year] = do_get_citations_per_year(data, year)
    #print(mydict)
    return mydict
I keep getting the error message:
(32, 240)
{2016, 2017, 2018, 2013, 2015}
  File "/Users/lisa/Desktop/yopy/execution_example.py", line 28, in <module>
    print(my_ocan.get_citations_all_years())
  File "/Users/lisa/Desktop/yopy/ocan.py", line 35, in get_citations_all_years
    return do_get_citations_all_years(self.data)
  File "/Users/lisa/Desktop/yopy/lisa.py", line 113, in do_get_citations_all_years
    mydict[year] = do_get_citations_per_year(data, year)
  File "/Users/lisa/Desktop/yopy/lisa.py", line 103, in do_get_citations_per_year
    result = (len_citations, round(timespan))
ValueError: cannot convert float NaN to integer
Process finished with exit code 1
UPDATE: To provide a working example, I am posting my other functions here: do_process_citation_data(f_path), which processes my dataframe (my_ocan), and my parsing function parse_timespan:
def do_process_citation_data(f_path):
    global my_ocan
    my_ocan = pd.read_csv(f_path, names=['oci', 'citing', 'cited', 'creation', 'timespan', 'journal_sc', 'author_sc'],
                          parse_dates=['creation', 'timespan'])
    my_ocan = my_ocan.iloc[1:]  # to remove the first row
    my_ocan['creation'] = pd.to_datetime(my_ocan['creation'], format="%Y-%m-%d", yearfirst=True)
    my_ocan['timespan'] = my_ocan['timespan'].map(parse_timespan)
    print(my_ocan['timespan'])
    return my_ocan
    #print(my_ocan['timespan'])

timespan_regex = re.compile(r'P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?')

def parse_timespan(timespan):
    # check if the input is a valid timespan
    if not timespan or 'P' not in timespan:
        return None
    # check if timespan is negative and skip initial 'P' literal
    curr_idx = 0
    is_negative = timespan.startswith('-')
    if is_negative:
        curr_idx = 1
    # extract years, months and days with the regex
    match = timespan_regex.match(timespan[curr_idx:])
    years = int(match.group(1) or 0)
    months = int(match.group(2) or 0)
    days = int(match.group(3) or 0)
    timespan_days = years * 365 + months * 30 + days
    return timespan_days if not is_negative else -timespan_days
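Since parse_timespan returns None for input it cannot parse, it may help to see what Series.map does with that return value. The following is only a sketch: parse_timespan_or_none is a hypothetical variant (not the function above) that also tolerates non-string input such as a missing CSV cell, and the sample durations are made up.

```python
import re
import pandas as pd

timespan_regex = re.compile(r'P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?')

def parse_timespan_or_none(timespan):
    """Hypothetical variant: return days, or None if the input is not a duration string."""
    if not isinstance(timespan, str) or 'P' not in timespan:
        return None
    is_negative = timespan.startswith('-')
    match = timespan_regex.match(timespan[1:] if is_negative else timespan)
    if match is None:
        return None
    years, months, days = (int(g or 0) for g in match.groups())
    total = years * 365 + months * 30 + days
    return -total if is_negative else total

# Made-up sample values, including one missing entry
raw = pd.Series(["P1Y4M1D", "-P2M", None, "P0Y"])
parsed = raw.map(parse_timespan_or_none)
print(parsed)
print(parsed.isna().sum())  # number of entries that could not be parsed
```

Any entry for which the function returns None shows up as a missing value in the resulting column, which is the kind of value fillna(0) is meant to replace.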
When I print my_ocan['timespan'] I get:
1 486.0
2 1080.0
3 730.0
4 824.0
5 365.0
6 0.0
...
I think that the problem is 0.0.
How could I solve this float NaN to integer problem?
Thank you in advance!
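On the 0.0 suspicion: a value of 0.0 rounds without error, so it is unlikely to be the cause. A NaN mean can instead come from an empty selection, because fillna(0) has nothing to fill when no row matches the year. One plausible trigger, stated as an assumption since the calling code is not shown: do_get_citations_per_year overwrites my_ocan['creation'] with integer years, and a second call then passes those integers to pd.DatetimeIndex again. The (32, 240) printed before the traceback suggests such an earlier call happened. The toy frame below is hypothetical and only illustrates the two effects.

```python
import math
import pandas as pd

# Hypothetical toy frame standing in for my_ocan, with creation already reduced to years
df = pd.DataFrame({
    "creation": [2016, 2016, 2017],
    "timespan": [486.0, None, 730.0],
})

# Non-empty selection: fillna(0) replaces the missing value, so the mean is finite
mean_2016 = df.loc[df["creation"] == 2016, "timespan"].fillna(0).mean()

# Empty selection: no row matches 2015, and the mean of an empty Series is NaN
mean_2015 = df.loc[df["creation"] == 2015, "timespan"].fillna(0).mean()

# Integer years passed to DatetimeIndex are read as nanoseconds since 1970-01-01
years_again = list(pd.DatetimeIndex(df["creation"]).year)

print(round(0.0))   # a plain 0.0 rounds without error
print(mean_2016)
print(mean_2015)
print(years_again)
```

If the second effect applies, every year in the set stops matching the creation column after the first call, so each later selection is empty and its mean is NaN.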
I tried this with Python 2.7:
>>> round(float('NaN'))
nan
>>> round(float(0.0))
0.0
And this with Python 3.6:
>>> round(float('NaN'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot convert float NaN to integer
>>> round(float(0.0))
0
So it seems that a NaN value is reaching the round function. In Python 3, round() with a single argument returns an int, so a NaN raises ValueError instead of passing through as it does in Python 2.7. You can use a try/except statement to handle this problem:
try:
    result = (len_citations, round(timespan))
except ValueError:
    result = (len_citations, 0)
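As an alternative to catching the exception, the NaN can be checked for explicitly, and the year can be computed without overwriting the creation column. This is a minimal sketch under assumptions: citations_per_year and citations_all_years are hypothetical names, the frame is passed in as an argument rather than read from a global, and the creation column is assumed to hold dates (or date strings), not already-extracted years. The sample rows are made up.

```python
import pandas as pd

def citations_per_year(df, year):
    """Hypothetical rewrite: (citation count, rounded mean timespan) for one year."""
    # Derive the years into a separate Series so df["creation"] is left untouched
    years = pd.to_datetime(df["creation"]).dt.year
    spans = df.loc[years == year, "timespan"].fillna(0)
    mean_span = spans.mean()
    # An empty selection has a NaN mean, so fall back to 0 instead of rounding it
    rounded = 0 if pd.isna(mean_span) else round(float(mean_span))
    return len(spans), rounded

def citations_all_years(df):
    """Hypothetical rewrite: map each creation year to its (count, mean timespan) tuple."""
    years = pd.to_datetime(df["creation"]).dt.year
    return {int(y): citations_per_year(df, int(y)) for y in sorted(years.dropna().unique())}

# Made-up sample rows
df = pd.DataFrame({
    "creation": ["2016-03-01", "2016-07-15", "2017-01-10"],
    "timespan": [486.0, None, 730.0],
})
print(citations_all_years(df))
print(citations_per_year(df, 2015))  # a year with no citations
```

Because nothing is written back to the frame, calling these functions repeatedly gives the same result each time, which the try/except version does not guarantee if the column was already converted.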