How to amend specific csv files using pandas

At the moment I have 133 csv files, and I want to combine them into categories according to certain variables such as percentage and substrate. I have been trying to check whether both the substrate string and the percentage string are in the file name and, if so, create a csv for that particular percentage and substrate. Below is the code:
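To make the matching rule concrete, here is a minimal sketch of how a single file name can be mapped to its percentage/substrate category, using the same `'water(<percentage>%)-<substrate>'` pattern that the code below builds. The helper name `find_combination` and the example file names are hypothetical, and it assumes each name contains exactly one such tag.

```python
from itertools import product

PERCENTAGES = ['50', '60', '70', '80', '90', '100']
SUBSTRATES = ['PS(3.0)', 'CA(1.5)', 'CA(3.0)', 'CA(4.5)',
              'BCP(3.0)', 'PVK(3.0)', 'PVP(2.0)']


def find_combination(name):
    """Return (percentage, substrate) if `name` contains 'water(<p>%)-<s>', else None.

    Hypothetical helper: assumes each file name carries exactly one such tag.
    """
    for percentage, substrate in product(PERCENTAGES, SUBSTRATES):
        if 'water(' + percentage + '%)-' + substrate in name:
            return percentage, substrate
    return None


# Hypothetical file names, for illustration only
print(find_combination('water(50%)-PS(3.0)_run01'))    # ('50', 'PS(3.0)')
print(find_combination('water(100%)-CA(4.5)_run07'))   # ('100', 'CA(4.5)')
print(find_combination('ethanol(50%)-PS(3.0)_run01'))  # None
```

The opening parenthesis in `'water('` keeps `'50'` from accidentally matching inside `'water(100%)'`.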

df_a = pd.DataFrame()

percentages = ['50', '60', '70', '80', '90', '100']
substrates = ['PS(3.0)', 'CA(1.5)', 'CA(3.0)', 'CA(4.5)',
              'BCP(3.0)', 'PVK(3.0)', 'PVP(2.0)']

for csv in files:
    for per in range(0, len(percentages)):
        percentage = percentages[per]
        for x in range(0, len(substrates)):
            substrate = substrates[x]
            # percentage = str(percentage)
            # substrate = str(substrate)
            name, ext = os.path.splitext(csv)
            if ext == '.csv':
                match = 'water(' + percentages[per] + '%)-' + substrates[x]  # if percentage in name and substrate in name:
                df = pd.DataFrame()
                if match in name:
                    print(match)
                    file_path = folder_path + csv
                    print(file_path)
                    data = np.genfromtxt(file_path, delimiter=',', skip_header=1)
                    data = np.reshape(data, (1, -1))

                    data_fit = data
                    df = pd.DataFrame(data_fit,
                                      columns=['Number', 'Number of droplets',
                                               'Substance', 'Percentage', 'Substrate',
                                               'Middle of droplet', 'Frame Rate',
                                               'Total time', 'Difference in Frames',
                                               'Initial height',
                                               'Exponential decay constant',
                                               'Angular frequency',
                                               'Frequency', 'Phi offset',
                                               'The amplitude', 'Last scanned at',
                                               ])
                    df_a = df_a.append(df, ignore_index=True)
                    print("Substrate: ", substrates[x], ", Percentage: ", percentages[per])
                    data_path = "/Users/harry/Desktop/Droplet Experiment/Analysis/"
                    df_a.to_csv(data_path + percentages[per] + '% - ' + substrates[x] + ' analysis.csv')

However, at the moment it merges all of them and produces csv files that are 133 lines long instead of around 10, since for each given percentage and substrate there are 10 files with those variables in their names. Does anyone know what's wrong with my code? Any help is appreciated. The 'name' would look something like the pic attached.

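The problem is that df_a is a single accumulator shared by every percentage/substrate combination: it is never reset, and each time a matching file is found the whole of df_a is written to that combination's output file. That file is overwritten at every match, so what survives is whatever df_a held at the last match for that combination, which includes rows from every combination processed up to that point.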
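A minimal sketch that reproduces this with in-memory data may make the effect clearer. The four hypothetical "files" below are just (combination, value) pairs, two per combination, and the `written` dictionary stands in for the csv files on disk; neither comes from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the input files: (combination, value) pairs,
# two per combination, in the order the loop visits them
files = [('50% - PS(3.0)', 1.0), ('60% - PS(3.0)', 2.0),
         ('50% - PS(3.0)', 3.0), ('60% - PS(3.0)', 4.0)]

written = {}  # stands in for the csv files on disk
frames = []   # ONE accumulator shared by every combination (like df_a)

for combination, value in files:
    frames.append(pd.DataFrame({'value': [value]}))
    # Like df_a.to_csv(...) in the question: the whole accumulator is
    # written, overwriting whatever this combination's file held before
    written[combination] = pd.concat(frames, ignore_index=True)

for combination, df in written.items():
    print(combination, len(df))
# 50% - PS(3.0) 3
# 60% - PS(3.0) 4
```

Each combination had only two inputs, yet its output ends up with three or four rows, because every write dumps the shared accumulator rather than that combination's rows.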

One way to get one .csv output file per percentage/substrate combination is to add a percentage/substrate column to your main data frame df_a. Then, after exiting all your loops, you can write your .csv files by selecting each percentage/substrate combination from df_a in turn.
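Here is a small self-contained sketch of that tag-then-split idea, using made-up single-row results instead of real files and a temporary folder instead of the real analysis folder (both are assumptions for illustration). It uses `DataFrame.groupby`, which selects each combination's rows in turn just like filtering on each unique value.

```python
import os
import tempfile

import pandas as pd

# Hypothetical single-row results standing in for the parsed csv files
results = [('50', 'PS(3.0)', 1.0), ('60', 'PS(3.0)', 2.0),
           ('50', 'PS(3.0)', 3.0), ('60', 'PS(3.0)', 4.0)]

frames = []
for percentage, substrate, value in results:
    df = pd.DataFrame({'value': [value]})
    # Tag every row with its percentage/substrate combination
    df.insert(0, 'combination', percentage + '% - ' + substrate)
    frames.append(df)

# Build the main frame once, after the loop
df_a = pd.concat(frames, ignore_index=True)

out_dir = tempfile.mkdtemp()  # stand-in for the real analysis folder
for combination, df_subset in df_a.groupby('combination'):
    # One output file per combination, holding only that combination's rows
    df_subset.to_csv(os.path.join(out_dir, combination + ' analysis.csv'), index=False)

print(sorted(os.listdir(out_dir)))
# ['50% - PS(3.0) analysis.csv', '60% - PS(3.0) analysis.csv']
```

Because the split happens only once all rows are collected, each file's contents no longer depend on the order in which the inputs were visited.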

Without access to your files it is difficult to provide a definitive solution, but it could look something like this:

import os

import numpy as np
import pandas as pd

# `files` (the file names) and `folder_path` (their folder, ending in '/')
# are assumed to be defined earlier, as in the question.

percentages = ['50', '60', '70', '80', '90', '100']
substrates = ['PS(3.0)', 'CA(1.5)', 'CA(3.0)', 'CA(4.5)',
              'BCP(3.0)', 'PVK(3.0)', 'PVP(2.0)']

columns = ['Number', 'Number of droplets',
           'Substance', 'Percentage', 'Substrate',
           'Middle of droplet', 'Frame Rate',
           'Total time', 'Difference in Frames',
           'Initial height',
           'Exponential decay constant',
           'Angular frequency',
           'Frequency', 'Phi offset',
           'The amplitude', 'Last scanned at',
           ]

frames = []  # one DataFrame per matching input file

for csv in files:
    name, ext = os.path.splitext(csv)
    if ext != '.csv':
        continue
    for percentage in percentages:
        for substrate in substrates:
            match = 'water(' + percentage + '%)-' + substrate
            if match in name:
                print(match)
                file_path = folder_path + csv
                print(file_path)
                data = np.genfromtxt(file_path, delimiter=',', skip_header=1)
                data_fit = np.reshape(data, (1, -1))

                print("Substrate: ", substrate, ", Percentage: ", percentage)
                df = pd.DataFrame(data_fit, columns=columns)
                # Tag each row with its percentage/substrate combination
                df.insert(0, 'combination', percentage + '% - ' + substrate)
                frames.append(df)

# DataFrame.append was removed in pandas 2.0, so concatenate once after the loops
df_a = pd.concat(frames, ignore_index=True)

data_path = "/Users/harry/Desktop/Droplet Experiment/Analysis/"
for combination in df_a['combination'].unique():
    df_subset = df_a[df_a['combination'] == combination]
    df_subset.to_csv(data_path + combination + ' analysis.csv')
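Another way to structure the same fix, offered as an alternative sketch rather than the code above, is to decide which combination each file belongs to first, and only then read and combine each group's files. Since every group is concatenated and written exactly once, rows from different combinations can never land in the same output. The helpers `group_files` and `write_combined`, the demo file names, and the use of `pd.read_csv` in place of `np.genfromtxt` are all assumptions for illustration; in particular, this assumes each input file is a header line followed by data rows.

```python
import os
import tempfile
from collections import defaultdict
from itertools import product

import pandas as pd

percentages = ['50', '60', '70', '80', '90', '100']
substrates = ['PS(3.0)', 'CA(1.5)', 'CA(3.0)', 'CA(4.5)',
              'BCP(3.0)', 'PVK(3.0)', 'PVP(2.0)']


def group_files(files):
    """Map 'percentage% - substrate' to the csv file names matching it (hypothetical helper)."""
    groups = defaultdict(list)
    for csv in files:
        name, ext = os.path.splitext(csv)
        if ext != '.csv':
            continue
        for percentage, substrate in product(percentages, substrates):
            if 'water(' + percentage + '%)-' + substrate in name:
                groups[percentage + '% - ' + substrate].append(csv)
                break  # assume each file belongs to exactly one combination
    return groups


def write_combined(folder_path, data_path, files):
    """Write one '<combination> analysis.csv' per group, holding only that group's rows."""
    for combination, group in group_files(files).items():
        # Assumption: each input is a header line plus data rows that read_csv can parse
        df = pd.concat([pd.read_csv(os.path.join(folder_path, csv)) for csv in group],
                       ignore_index=True)
        df.to_csv(os.path.join(data_path, combination + ' analysis.csv'), index=False)


# Demo with hypothetical files in temporary folders
folder_path, data_path = tempfile.mkdtemp(), tempfile.mkdtemp()
names = ['water(50%)-PS(3.0)_1.csv', 'water(50%)-PS(3.0)_2.csv',
         'water(60%)-CA(1.5)_1.csv', 'notes.txt']
for i, csv in enumerate(names):
    with open(os.path.join(folder_path, csv), 'w') as f:
        f.write('Number,Frequency\n%d,%.1f\n' % (i, i * 0.5))

write_combined(folder_path, data_path, names)
print(sorted(os.listdir(data_path)))
# ['50% - PS(3.0) analysis.csv', '60% - CA(1.5) analysis.csv']
```

The non-csv file is skipped, and the 50% - PS(3.0) output holds exactly its two input rows.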

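Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.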

 
© 2020-2024 STACKOOM.COM