Select column from multiple DataFrames based on same header prefix
I have a function that iterates over the rows of a CSV file, checks the Age column, and, if an age is negative, writes the Key and Age values to a text file.
import pandas as pd

def neg_check():
    results = []
    file_path = input('Enter file path: ')
    file_data = pd.read_csv(file_path, encoding='utf-8')
    # Collect (Key, Age) for every row with a negative age
    for index, row in file_data.iterrows():
        if row['Age'] < 0:
            results.append((row['Key'], row['Age']))
    # The with block closes the file on exit, so no explicit close() is needed
    with open('results.txt', 'w') as outfile:
        outfile.write("\n".join(map(str, results)))
To make this code reusable, how can I modify it so that it checks the rows of whichever column starts with "Age"? My files have many columns that start with "Age" but end differently. I tried the following:
if row.startswith['Age'] < 0:
and
if row[row.startswith('Age')] < 0:
but it throws the error AttributeError: 'Series' object has no attribute 'startswith'.
Sample 1
Key Sex Age
1 Male 46
2 Female 34
Sample 2
Key Sex AgeLast
1 Male 46
2 Female 34
Sample 3
Key Sex AgeFirst
1 Male 46
2 Female 34
I would do this in one step, but there are a few options. One is filter:
v = df[df.filter(like='Age').iloc[:, 0] < 0]
Or,
c = df.columns[df.columns.str.startswith('Age')][0]
v = df[df[c] < 0]
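To see both options side by side, here is a minimal sketch on a made-up frame shaped like sample 2, with one negative age added so there is something to catch. The data and the names v_filter, v_mask, and age_col are assumptions for illustration only. One thing worth knowing: filter(like=...) matches the text anywhere in the column name, so the sketch uses filter(regex='^Age') to anchor the match at the start.

```python
import pandas as pd

# Hypothetical data shaped like sample 2, plus one negative age.
df = pd.DataFrame({
    "Key": [1, 2, 3],
    "Sex": ["Male", "Female", "Male"],
    "AgeLast": [46, 34, -5],
})

# Option 1: select columns whose name starts with "Age", take the first
# match, and keep the rows where it is negative.
v_filter = df[df.filter(regex=r"^Age").iloc[:, 0] < 0]

# Option 2: build a boolean mask over the column Index, take the first
# matching name, and filter on that column.
age_col = df.columns[df.columns.str.startswith("Age")][0]
v_mask = df[df[age_col] < 0]

print(v_filter)
```

Both options take only the first matching column. If a file could have more than one column starting with "Age", that choice would need to be made explicit.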
Finally, to write to CSV, use
if not v.empty:
    v.to_csv('invalid.csv')
Looping over your data is not necessary with pandas.
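Putting this together, the original function could be reworked without a loop roughly as follows. This is only a sketch under some assumptions: the file path is passed as an argument instead of being read with input(), the out_path and prefix parameters are hypothetical additions, and only the first column whose name starts with the prefix is checked.

```python
import pandas as pd

def neg_check(file_path, out_path="results.txt", prefix="Age"):
    """Write (Key, age) pairs for rows with a negative value in the
    first column whose name starts with `prefix`."""
    data = pd.read_csv(file_path, encoding="utf-8")

    # Find the columns whose header starts with the prefix.
    matches = data.columns[data.columns.str.startswith(prefix)]
    if len(matches) == 0:
        raise KeyError(f"no column starts with {prefix!r}")
    age_col = matches[0]

    # Vectorized filter: keep only Key and the age column for negative rows.
    negative = data.loc[data[age_col] < 0, ["Key", age_col]]

    # One tuple per line, as in the original function.
    rows = negative.itertuples(index=False, name=None)
    with open(out_path, "w") as outfile:
        outfile.write("\n".join(map(str, rows)))
    return negative
```

Called as neg_check('sample2.csv') with a hypothetical file name, it returns the offending rows as a DataFrame as well as writing them, which makes the result easy to inspect or test.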