How do I get pandas str.contains() to correctly select the rows with 'Virginia' and 'West Virginia'?
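I am trying to parse a CSV that contains a state column. I want to create one CSV per state from a single aggregated CSV. The code produces DataFrames for 'Virginia' and 'West Virginia', but the problem is that the 'Virginia' df also includes all of the 'West Virginia' rows. Any ideas on how to fix this? I was able to solve the same problem for 'Arkansas' and 'Kansas' by setting regex=False.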
import io
import os

import numpy as np
import pandas as pd
from tqdm import tqdm

# `stat` (the raw CSV bytes), `mk_dir` and `get_diff` are defined elsewhere and not shown here.


def parse(df, suffix):
    # Return the sorted, de-duplicated values of column `suffix` as a list.
    df = df.sort_values(by=[suffix])
    df = df[suffix]
    df = df.drop_duplicates()
    df = [df for df in df]
    return df


def write_states(df, states):
    mk_dir('states')
    print(f"writing to '{os.path.join(os.getcwd(), 'states')}'")
    d = df
    s = tqdm(states, ncols=103, leave=False, ascii=' #')
    for state in s:
        s.set_description(state)
        # Substring match: 'Virginia' also matches 'West Virginia'.
        df = d[d['state'].str.contains(state, regex=False)]
        dates = np.array(df['date'], dtype='datetime64')
        state_col = np.array(df['state'])
        total_cases = np.array(df['cases'], dtype='int64')
        total_deaths = np.array(df['deaths'], dtype='int64')
        new_cases = get_diff(total_cases)
        new_deaths = get_diff(total_deaths)
        df = pd.DataFrame({'date': dates, 'state': state_col, 'total cases': total_cases,
                           'total deaths': total_deaths, 'new cases': new_cases, 'new deaths': new_deaths})
        df.to_csv(f"states/{state}.csv", index=False)


df = pd.read_csv(io.StringIO(stat.decode('utf-8')))
states = parse(df, 'state')
write_states(df, states)
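The over-selection can be reproduced on a small frame. This is only a sketch: the four-row DataFrame below is made-up sample data, not the asker's CSV. It suggests that `str.contains` is a substring test, so `regex=False` by itself does not separate 'Virginia' from 'West Virginia', while 'Kansas' and 'Arkansas' only collide when the match is case-insensitive.

```python
import pandas as pd

# Hypothetical sample data standing in for the aggregated CSV.
d = pd.DataFrame({"state": ["Virginia", "West Virginia", "Kansas", "Arkansas"]})

# Literal substring match: "Virginia" occurs inside "West Virginia" too.
sub_virginia = d[d["state"].str.contains("Virginia", regex=False)]

# Case-sensitive: "Kansas" (capital K) is not a substring of "Arkansas".
sub_kansas = d[d["state"].str.contains("Kansas", regex=False)]

# Case-insensitive: now "Arkansas" matches as well.
sub_kansas_nocase = d[d["state"].str.contains("Kansas", case=False)]

print(sub_virginia["state"].tolist())
print(sub_kansas["state"].tolist())
print(sub_kansas_nocase["state"].tolist())
```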
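How about adding `^` and `$` to the regex? They anchor the pattern to the start and end of the string, so only an exact match is selected. This should handle ambiguities such as West/Virginia, Ar/kansas, etc.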
df = d[d['state'].str.contains(f'^{state}$', case=False)]
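A minimal sketch of the anchored match on made-up sample data follows. The helper name `rows_for_state` and the sample frame are assumptions for illustration. `re.escape` is added as a precaution in case a value ever contains regex metacharacters; the answer above does not use it. Since an anchored pattern amounts to an exact match, plain equality or `groupby` are shown as possible alternatives, assuming the state names are spelled consistently in the data.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the aggregated CSV.
d = pd.DataFrame({
    "state": ["Virginia", "West Virginia", "Kansas", "Arkansas"],
    "cases": [10, 20, 30, 40],
})


def rows_for_state(frame, state):
    # Anchor the escaped name so only whole-string matches are kept.
    pattern = f"^{re.escape(state)}$"
    return frame[frame["state"].str.contains(pattern, case=False)]


virginia = rows_for_state(d, "Virginia")
kansas = rows_for_state(d, "Kansas")

# Alternative 1: plain equality (case-sensitive exact match).
virginia_eq = d[d["state"] == "Virginia"]

# Alternative 2: let groupby split the frame, one group per distinct state.
per_state = {name: group for name, group in d.groupby("state")}

print(virginia["state"].tolist())
print(kansas["state"].tolist())
print(sorted(per_state))
```

With `groupby`, the loop over state names and the per-state filter collapse into one pass, so no string matching is needed at all.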