Pandas loop over CSV files to find a column by name

Could someone give me a tip on how to use Pandas to loop over the CSV files in a directory and find the column named Temp in each one? The values in that column then need to be converted from degrees Celsius to degrees Fahrenheit, something like degF = degC * 1.8 + 32.

I think I am close, but the last part raises an error:

import pandas as pd
import os
import glob
  
  
# use glob to get all the csv files 
# in the folder
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))
  
  
# loop over the list of csv files
for f in csv_files:
      
    # read the csv file
    df = pd.read_csv(f)
      
    # print the location and filename
    file_name = f.split("\\")[-1]
    print('File Name Original:', file_name)

    # print the content
    print('Columns:', df.columns)

    # Find Columns with Temp in the Column Name
    temp_cols = [col for col in df.columns if 'Temp' in col]

    # print the content
    print('temp_cols Columns:', temp_cols)

    for i in range(len(temp_cols)):
        print(df.temp_cols[i].values)

It prints a few lines and then fails:

File Name Original: ADMIN FRONT DESK.csv
Columns: Index(['Date', 'Temp', 'RH', 'CO2'], dtype='object')
temp_cols Columns: ['Temp']
Traceback (most recent call last):
  File "C:\OneDrive - \fix_temp.py", line 34, in <module>
    print(df.temp_cols[i].values)
  File "C:\Python39\lib\site-packages\pandas\core\generic.py", line 5465, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'temp_cols'

The loop below prints the name of each matching column rather than its values. How do I modify it to get the values?

for i in range(len(temp_cols)):
    #print(df.temp_cols[i].values)
    print(temp_cols[i])
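
The AttributeError comes from `df.temp_cols`: attribute access looks for a column literally named `temp_cols`, not for the names stored in the list of that name. Indexing the DataFrame with each name from the list avoids this. A minimal sketch, using a small made-up frame in place of a real CSV file (the sample values are assumptions for illustration):

```python
import pandas as pd

# Made-up sample data standing in for one CSV file (values are illustrative only)
df = pd.DataFrame({"Date": ["2021-01-01", "2021-01-02"],
                   "Temp": [20.0, 25.0],
                   "RH": [40, 45]})

# Same selection as in the question: names of columns containing 'Temp'
temp_cols = [col for col in df.columns if "Temp" in col]

# Index the DataFrame with the column name itself instead of df.temp_cols[i]
for col in temp_cols:
    print(col, df[col].values)
```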

I'm not completely clear on the question, but this checks whether a Temp column is in the DataFrame and, if so, adds a Temp_F column with the converted values:

import pandas as pd
import os
import glob
  
  
# use glob to get all the csv files 
# in the folder
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))
  
  
# loop over the list of csv files
for f in csv_files:
      
    # read the csv file
    df = pd.read_csv(f)
      
    # get the file name without its directory (works on any OS)
    file_name = os.path.basename(f)
    print('File Name Original:', file_name)

    # print the content
    print('Columns:', df.columns)

    columns = df.columns
    
    if 'Temp' in columns:
        df['Temp_F'] = (df['Temp'] * 1.8) + 32
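
To sanity-check the conversion itself without touching any files, it can be tried on a small in-memory frame. This is only a sketch: the helper name `add_fahrenheit`, its parameters, and the sample values are assumptions for illustration, not part of the answer above.

```python
import pandas as pd


def add_fahrenheit(df, source="Temp", target="Temp_F"):
    """Return a copy of df with `target` holding `source` converted from C to F.

    If `source` is not a column of df, the copy is returned unchanged.
    """
    out = df.copy()
    if source in out.columns:
        out[target] = out[source] * 1.8 + 32
    return out


# Freezing and boiling points of water make easy reference values
sample = pd.DataFrame({"Temp": [0.0, 100.0], "RH": [40, 45]})
converted = add_fahrenheit(sample)
print(converted)
```

Returning a copy keeps the original Celsius frame intact, which makes it easier to compare both versions while debugging.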

If your dataset has several target columns to modify, then building on @speeder's answer, I still needed this:

# Find Columns with Temp in the Column Name
temp_cols = [col for col in df.columns if 'Temp' in col]

# print the content
print('temp_cols Columns:', temp_cols)

for target in temp_cols:
    print(f'in the loop fixing the {target} column...')
    df[target] = (df[target] * 1.8) + 32
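
The per-column loop can also be written as a single vectorized assignment over all matching columns. The sketch below assumes every matching column is numeric; the sample frame and the case-insensitive match are illustrative choices, not part of the answer above.

```python
import pandas as pd

# Made-up sample data with two temperature columns (values are illustrative only)
df = pd.DataFrame({"Temp": [20.0, 30.0],
                   "supply temp": [10.0, 15.0],
                   "RH": [40, 45]})

# Case-insensitive match, so 'Temp' and 'supply temp' are both selected
temp_cols = [col for col in df.columns if "temp" in col.lower()]

# Convert every selected column in one step
df[temp_cols] = df[temp_cols] * 1.8 + 32
print(df)
```

A substring match this loose would also pick up names such as "Template", so it is worth printing `temp_cols` before converting.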

Complete script to loop over multiple CSV files that may each have several target columns to modify. The targets are the Temp column and any other column whose name contains Temp:

import os
import glob
import pandas as pd
  
# use glob to get all the csv files 
# in the folder
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))
  
  
# loop over the list of csv files
for f in csv_files:
      
    # read the csv file
    df = pd.read_csv(f)
      
    # get the file name without its directory (works on any OS)
    file_name = os.path.basename(f)
    print('File Name Original:', file_name)

    # print the content
    print('Columns:', df.columns)

    # Find Columns with Temp in the Column Name
    temp_cols = [col for col in df.columns if 'Temp' in col]

    # print the content
    print('temp_cols Columns:', temp_cols)

    for target in temp_cols:
        print(f'in the loop fixing the {target} column...')
        df[target] = (df[target] * 1.8) + 32


    print(f"File {file_name} done!")
    df = df.dropna()
    df.to_csv(file_name, index=False)
    print(f"File {file_name} saved successfully!")
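
Because the script overwrites each CSV in place, running it a second time would convert the already converted values again. One way around that is to write the results to a separate folder. The sketch below uses hypothetical names (`convert_folder`, `src_dir`, `out_dir`, `keyword`) and assumes the matching columns are numeric:

```python
from pathlib import Path

import pandas as pd


def convert_folder(src_dir, out_dir, keyword="Temp"):
    """Convert columns whose name contains `keyword` from C to F.

    Reads every *.csv directly inside src_dir and writes the result
    under out_dir, leaving the source files untouched.
    Returns the list of written paths.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for csv_path in sorted(src_dir.glob("*.csv")):
        df = pd.read_csv(csv_path)
        temp_cols = [col for col in df.columns if keyword in col]
        for col in temp_cols:
            df[col] = df[col] * 1.8 + 32
        # Same clean-up as the script: drop rows with missing values
        df = df.dropna()
        target = out_dir / csv_path.name
        df.to_csv(target, index=False)
        written.append(target)
    return written
```

A call such as `convert_folder(".", "converted")` would process the current directory and place the Fahrenheit copies in a `converted` subfolder; since the glob is not recursive, files already in that subfolder are not picked up again.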
