How to drop rows from a DataFrame based on a condition in Python?
So I have a CSV file that contains data in the following format:
```text
|Variable |Time |Value|
|A1       |Jan  | 33  |
|         |Feb  | 21  |
|         |Mar  | 08  |
|         |Apr  | 17  |
|         |May  | 04  |
|         |Jun  | 43  |
|         |Jul  | 40  |
|         |Aug  | 37  |
|         |Sep  | 30  |
|         |Oct  | 46  |
|         |Nov  | 10  |
|         |Dec  | 13  |
|B1       |Jan  | 20  |
|         |Feb  | 11  |
|         |Mar  | 02  |
|         |Apr  | 18  |
|         |May  | 10  |
|         |Jun  | 35  |
|         |Jul  | 45  |
|         |Aug  | 32  |
|         |Sep  | 39  |
|         |Oct  | 42  |
|         |Nov  | 15  |
|         |Dec  | 18  |
```
The data continues like this up to A10 and B10.
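I only need the Time values from Jan to Dec together with the Values that belong to the A variables; the B rows should be dropped. How can I do this? What would the condition be?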
Two different approaches:
If the column widths are fixed:
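```python
import numpy as np
import pandas as pd

df = pd.read_fwf('file.csv', colspecs=[(1, 9), (11, 16), (17, 22)])  # (start, end) character positions
df = df[df.replace('', np.nan).ffill()['Variable'].str.startswith('A')]  # fill labels down, keep "A" blocks
print(df)
```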
Output:
```text
    Variable  Time  Value
0         A1   Jan     33
1              Feb     21
2              Mar      8
3              Apr     17
4              May      4
5              Jun     43
6              Jul     40
7              Aug     37
8              Sep     30
9              Oct     46
10             Nov     10
11             Dec     13
```
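To see what the mask in the first approach does, here is a minimal sketch on a hypothetical six-row frame (two blocks of three months instead of twelve), built in code rather than read from `file.csv`. Blank labels are turned into NaN and forward-filled, so every row carries its block's label and `str.startswith('A')` becomes an ordinary per-row condition.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the Variable label appears only on the first row
# of each block, as in the question's file.
df = pd.DataFrame({
    "Variable": ["A1", "", "", "B1", "", ""],
    "Time": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "Value": [33, 21, 8, 20, 11, 2],
})

# Treat blank labels as missing, then forward-fill so each row knows its block.
labels = df["Variable"].replace("", np.nan).ffill()
print(labels.tolist())  # ['A1', 'A1', 'A1', 'B1', 'B1', 'B1']

# The condition: keep rows whose filled-in label starts with "A".
mask = labels.str.startswith("A")
result = df[mask]
print(result)
```

The filled labels are only used to build the mask, so the displayed `Variable` column keeps its original blanks.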
If the data is messier:
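```python
import pandas as pd

# Split each non-empty line on "|" and keep the three inner fields.
with open('file.csv', 'r') as f:
    df = pd.DataFrame([[y.strip() for y in x.split('|')[1:4]] for x in f if x.strip()])
df.columns = df.iloc[0].values  # the first parsed row holds the header
df = df.drop(0).reset_index(drop=True)
df['Value'] = pd.to_numeric(df['Value'])
print(df)
```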
Output:
```text
    Variable  Time  Value
0         A1   Jan     33
1              Feb     21
2              Mar      8
3              Apr     17
4              May      4
5              Jun     43
6              Jul     40
7              Aug     37
8              Sep     30
9              Oct     46
10             Nov     10
11             Dec     13
12        B1   Jan     20
13             Feb     11
14             Mar      2
15             Apr     18
16             May     10
17             Jun     35
18             Jul     45
19             Aug     32
20             Sep     39
21             Oct     42
22             Nov     15
23             Dec     18
```
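The second approach only parses the file, so its output still contains the B rows. A sketch of the missing filtering step, assuming a short hypothetical excerpt of the file held in a string (`raw`) instead of `file.csv`, parses it the same way and then applies the same forward-fill mask as the first approach:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical excerpt in the question's pipe-delimited layout.
raw = """|Variable |Time |Value|
|A1 |Jan | 33 |
| |Feb | 21 |
| B1 |Jan | 20 |
| |Feb | 11 |
"""

# Same parsing idea as the answer: split on "|" and keep the three inner fields.
with io.StringIO(raw) as f:
    rows = [[y.strip() for y in x.split("|")[1:4]] for x in f if x.strip()]
df = pd.DataFrame(rows[1:], columns=rows[0])
df["Value"] = pd.to_numeric(df["Value"])

# Blank labels -> NaN -> forward-fill, then keep only the "A" blocks.
labels = df["Variable"].replace("", np.nan).ffill()
result = df[labels.str.startswith("A")].reset_index(drop=True)
print(result)
```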
Use pandas' `ffill()` to fill in the Variable column, which makes the required selection straightforward, as shown below.
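```python
import numpy as np
import pandas as pd

sample = pd.read_csv('sample.csv')
# Assign the result back: a chained inplace ffill may not modify `sample` itself.
sample['Variable'] = sample['Variable'].ffill()
# Keep the "A" blocks and renumber rows so block starts sit at 0, 12, 24, ...
sample = sample.loc[sample['Variable'].str.startswith('A')].reset_index(drop=True)
n_months = 12
indexes_to_impute_as_empty = list(range(0, len(sample), n_months))
# Copy the label onto the first row of each block only; other rows stay NaN.
sample.loc[indexes_to_impute_as_empty, 'temp_Variable'] = sample.loc[indexes_to_impute_as_empty, 'Variable']
sample['Variable'] = sample['temp_Variable']
sample.drop(columns=['temp_Variable'], inplace=True)
sample.replace(np.nan, '', inplace=True)
print(sample)
```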
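The `temp_Variable` step only blanks out repeated labels, and it relies on every block having exactly `n_months` rows. A possible alternative (a different technique from the answer's, sketched with a hypothetical comma-separated excerpt instead of `sample.csv`) compares each label with the row above it via `shift()` and blanks it when they match:

```python
import io

import pandas as pd

# Hypothetical CSV excerpt: Variable is empty except on the first row of a block.
csv_text = """Variable,Time,Value
A1,Jan,33
,Feb,21
B1,Jan,20
,Feb,11
A2,Jan,5
,Feb,7
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Empty cells are read as NaN, so ffill propagates each block's label downwards.
sample["Variable"] = sample["Variable"].ffill()
sample = sample[sample["Variable"].str.startswith("A")].reset_index(drop=True)

# Blank a label when it equals the one directly above it (i.e. not a block start).
same_as_above = sample["Variable"].eq(sample["Variable"].shift())
sample.loc[same_as_above, "Variable"] = ""
print(sample)
```

Because each label is compared only with the previous row, blocks of any length are handled, and no fixed block size has to be assumed.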