
.csv data manipulation in R rather than Python

I have simple .csv data that needs to be manipulated before I can create a plot from it. I know how to manipulate .csv data in Python, and I want to apply the same logic in R, but I am not sure how to do this.

Below is example data from the .csv file, loaded into R. I have written it as code so that we can discuss the issue.

df <- data.frame(Name = c("AC", "AC", "PT", "PT", "OR", "OR"),
                 useless_column = c("", "", "A", 3, 4, " "),
                 measurement = c("H", "", "K", "M", "", "H"),
                 amount = c(12, 54, 20, 87, 75, 22),
                 useless_column = c("", "", "A", 3, 4, " "))

In Python, I would generally do this:

import csv
import glob

# Process every .csv file in the folder (path elided in the original post)
fileList = glob.glob(r"R:xxxxxxxxxxxxxxxxxxxxx\*.csv")
for inputFile in fileList:
    outputFilename = inputFile + "output.csv"
    with open(inputFile, "r", newline="") as inFile, open(outputFilename, "w") as outputFile:
        csvInput = csv.reader(inFile, delimiter=",")
        outputFile.write("Name,measurement,amount\n")
        next(csvInput)  # skip the header row
        for line in csvInput:
            # Column 2 holds the unit code, column 3 the raw amount
            if line[2] == "H":
                meas = 100
            elif line[2] == "K":
                meas = 1000
            elif line[2] == "M":
                meas = 1000000
            else:
                meas = 1
            # Convert the amount to a number before multiplying
            amount = meas * int(line[3])
            outputFile.write(",".join([line[0], line[2], str(amount)]) + "\n")

In Python, I can load the csv and then use a for loop to process each line of the file, tailoring my output file before I continue my analysis. From the above, I expect output like the following, written as R code:

df <- data.frame(Name = c("AC", "AC", "PT", "PT", "OR", "OR"),
                 measurement = c("H", "", "K", "M", "", "H"),
                 amount = c(1200, 54, 20000, 87000000, 75, 2200))

I would like to know how to do this in R. I have a small piece of R code; could anyone please guide me in the right direction?

x <- read.csv("xxxx.csv", header = TRUE, sep = ",")
xC <- ncol(x)
xR <- nrow(x)
op <- data.frame(matrix(data = x, nrow = xR, ncol = 3, byrow = TRUE))
for (c in 1:xC)
{
    for (r in 1:xR)
    {
        # xxxxxxxx
    }
}

Adapting Python code to R means giving up the loops in favor of vectorized operations. Here, we can create meas from a named vector and then compute amount:

# dictionary of measurement multipliers:
m <- c(H = 100, K = 1000, M = 1000000)

# create meas based on measurement
# (as.character() guards against the column having been read as a factor)
df$meas <- unname(m[as.character(df$measurement)])
# blank or unknown units have no entry in m, so they come back as NA; use 1
df$meas[is.na(df$meas)] <- 1
# compute amount
df$amount <- df$meas * df$amount

Data

df <- data.frame(Name = c("AC", "AC", "PT", "PT", "OR", "OR"),
                 measurement = c("H", "", "K", "M", "", "H"),
                 amount = c(12, 54, 20, 87, 75, 22))
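For readers coming from the Python side, the lookup-table idea in this answer can be sketched in plain Python as well. This is only an illustrative analogue, not part of the answer: the names MULTIPLIERS, meas and scaled are made up here, and the sample values are the measurement and amount columns from the question.

```python
# Multipliers keyed by unit code, mirroring the named vector m in the R code.
MULTIPLIERS = {"H": 100, "K": 1000, "M": 1000000}

# Sample columns taken from the question's data frame.
measurement = ["H", "", "K", "M", "", "H"]
amount = [12, 54, 20, 87, 75, 22]

# dict.get falls back to 1 for blank or unknown units, which plays
# the role of the is.na() replacement step in the R version.
meas = [MULTIPLIERS.get(unit, 1) for unit in measurement]
scaled = [m * a for m, a in zip(meas, amount)]

print(scaled)  # [1200, 54, 20000, 87000000, 75, 2200]
```

The result matches the expected amount column given in the question.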

Have you tried using pandas.read_csv? Or are the csv files so irregular that you cannot use pandas' read_csv method to read them?

You can use a for loop to manipulate the data from each file and then append it to a master DataFrame.

Example:

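import os
import pandas as pd

PATH = '/home/data/' # Example path

# File names of the .csv files under PATH (assumed layout)
fileList = [f for f in os.listdir(PATH) if f.endswith('.csv')]

frames = []
for inputFile in fileList:
    csv_file = pd.read_csv(PATH + inputFile, sep=',')
    # Rows whose measurement is 'H' get their amount multiplied by 100
    is_H = csv_file['measurement'] == 'H'
    csv_file.loc[is_H, 'amount'] = csv_file.loc[is_H, 'amount'] * 100
    frames.append(csv_file)
# DataFrame.append was removed in pandas 2.0, so combine with pd.concat
master_df = pd.concat(frames, ignore_index=True)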

I've skipped the K and M part of the manipulation.
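The K and M cases can be handled with the same kind of multiplier lookup as H. Below is a minimal sketch of that idea. It uses only the standard library csv module rather than pandas, so it is a swapped-in technique and not this answer's method. The helper name scale_amounts and the two in-memory sample "files" are hypothetical, made up for illustration, and the column names are assumed to match the question's data.

```python
import csv
import io

# Multiplier for each unit code; anything else counts as 1.
MULTIPLIERS = {"H": 100, "K": 1000, "M": 1000000}

def scale_amounts(csv_text):
    """Return (Name, measurement, scaled amount) tuples from CSV text.

    Hypothetical helper: assumes a header row with the columns
    Name, measurement and amount; other columns are ignored.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        factor = MULTIPLIERS.get(record["measurement"], 1)
        rows.append((record["Name"], record["measurement"],
                     factor * int(record["amount"])))
    return rows

# Two made-up "files" standing in for the contents of fileList.
file_a = "Name,useless_column,measurement,amount\nAC,,H,12\nAC,,,54\nPT,A,K,20\n"
file_b = "Name,useless_column,measurement,amount\nPT,3,M,87\nOR,4,,75\nOR, ,H,22\n"

# Collect the rows of every file into one master list.
master = []
for text in (file_a, file_b):
    master.extend(scale_amounts(text))

for row in master:
    print(row)
```

Referring to columns by name rather than by position keeps the sketch working even if the unused columns move or disappear.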

You could plot directly from master_df by doing something like

master_df.plot()

You've got the code to read in the data (read.csv), so am I right in thinking your main struggle is with the manipulation itself?

If so, you could carry on using lots of if statements and for loops, but I think there are much easier ways. Something like:

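df <- read.csv("xxxx.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
df$meas <- df$measurement # Create a new column called 'meas' by copying column 'measurement'
df$meas[df$meas == "H"] <- 100 # Replace H's with 100
df$meas[df$meas == "K"] <- 1000
df$meas[df$meas == "M"] <- 1000000
df$meas[!df$measurement %in% c("H", "K", "M")] <- 1 # Blank or unknown units count as 1
df$meas <- as.numeric(df$meas) # The replaced values are still character strings
df$value <- df$meas * df$amount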
