[英]How to drop rows from a dataframe based on condition in python?
So I have a CSV file that has data in the following manner:所以我有一个 CSV 文件,其中包含以下方式的数据:
|Variable |Time |Value|
|A1 |Jan | 33 |
| |Feb | 21 |
| |Mar | 08 |
| |Apr | 17 |
| |May | 04 |
| |Jun | 43 |
| |Jul | 40 |
| |Aug | 37 |
| |Sep | 30 |
| |Oct | 46 |
| |Nov | 10 |
| |Dec | 13 |
| B1 |Jan | 20 |
| |Feb | 11 |
| |Mar | 02 |
| |Apr | 18 |
| |May | 10 |
| |Jun | 35 |
| |Jul | 45 |
| |Aug | 32 |
| |Sep | 39 |
| |Oct | 42 |
| |Nov | 15 |
| |Dec | 18 |
Like this it goes on until A10 and B10.像这样一直持续到 A10 和 B10。
I need only A with time from Jan to Dec along with the values and drop values corresponding to B. How to do it?我只需要从一月到十二月的时间以及与 B 对应的值和删除值。怎么做? What will be the condition?
会是什么条件?
Two different methods:两种不同的方法:
If the column widths are fixed:如果列宽是固定的:
df = pd.read_fwf('file.csv', colspecs=[(1,9), (11,16), (17, 22)])
df = df[df.replace('', np.nan).ffill()['Variable'].str.startswith('A')]
print(df)
Output:输出:
Variable Time Value
0 A1 Jan 33
1 Feb 21
2 Mar 8
3 Apr 17
4 May 4
5 Jun 43
6 Jul 40
7 Aug 37
8 Sep 30
9 Oct 46
10 Nov 10
11 Dec 13
If things are more dirty:如果事情更脏:
with open('file.csv', 'r') as f:
df = pd.DataFrame([[y.strip() for y in x.split('|')[1:4]] for x in f.readlines() if x.strip()])
df.columns = df.iloc[0].values
df = df.drop(0).reset_index(drop=True)
df['Value'] = pd.to_numeric(df['Value'])
print(df)
Output:输出:
Variable Time Value
0 A1 Jan 33
1 Feb 21
2 Mar 8
3 Apr 17
4 May 4
5 Jun 43
6 Jul 40
7 Aug 37
8 Sep 30
9 Oct 46
10 Nov 10
11 Dec 13
12 B1 Jan 20
13 Feb 11
14 Mar 2
15 Apr 18
16 May 10
17 Jun 35
18 Jul 45
19 Aug 32
20 Sep 39
21 Oct 42
22 Nov 15
23 Dec 18
Assuming your data is arranged as you described, with some extrapolation as below假设您的数据按照您的描述排列,并进行如下推断
Use pandas' ffill()
to impute the variable column to facilitate the desired selection as below.使用 pandas 的
ffill()
来估算变量列,以方便进行所需的选择,如下所示。
sample = pd.read_csv('sample.csv')
sample['Variable'].ffill(axis=0,inplace=True)
sample = sample.loc[sample['Variable'].str.startswith('A')]
n_months = 12
indexes_to_impute_as_empty = list(range(0,len(sample),n_months))
sample.loc[indexes_to_impute_as_empty,'temp_Variable'] = sample.loc[indexes_to_impute_as_empty,'Variable']
sample['Variable'] = sample['temp_Variable']
sample.drop(columns=['temp_Variable'],inplace=True)
sample.replace(np.nan,"",inplace=True)
sample
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.