How do I modify my Python code below to prepend a character to the beginning of a string in Pandas?

I am doing a data visualization assignment in which I need to read a dataset and produce certain visualizations. Consider the following about the dataset:

  • The columns are labelled by longitude (a list of strings with an 'E' or 'W' attached, denoting eastern or western longitude respectively)
  • The rows are labelled by latitude (a column of strings with an 'N' or 'S' attached, denoting northern or southern latitude respectively)

So I have to read the dataset and convert the latitudes with 'N' attached into positive float values and those with 'S' attached into negative float values (all of the data is stored as strings).

Similarly, I have to convert the longitudes with 'E' attached into positive float values and those with 'W' attached into negative float values.

Since I am new to Python, Pandas and NumPy, I am having a lot of difficulty achieving this. So far I have been able to convert the latitudes and longitudes from string format into float format and get rid of the 'N', 'S', 'E', 'W' characters. However, I cannot figure out how to make the float values positive or negative based on the characters ('N', 'S', 'E', 'W') that were present before the float conversion.
Below is the code I have written so far:

import pandas as pd

df = pd.read_csv("Aug-2016-potential-temperature-180x188.txt", skiprows = range(7))
df.columns = ["longitude"]
df = df.longitude.str.split("\t", expand = True)
smaller = df.iloc[::10,:]

print(df.head(10), end = "\n")
print(smaller, end = "\n")
print(df.iloc[1][3], end = "\n")
print(smaller.iloc[2][175], end = "\n")

import numpy as np
import pandas as pd

data = pd.read_csv('~/documents/datasets/viz_a1/Aug-2016-potential-temperature-180x188.txt', skiprows=7)
data.columns = ['longitudes']
data = data['longitudes'].str.split('\t', expand=True)
df = data.iloc[::10,:]
df.head()

# replace 'E' with '' and 'W' with ''
df.loc[0] = df.loc[0].str.replace('E', '').str.replace('W', '')

# convert the longitude values to float values (THIS ONE WORKS)
df.loc[0] = df.loc[0][1:].astype(float)

# replace 'S' with '' and 'N' with ''
df.loc[:][0] = df.loc[:][0].str.replace('S', '').str.replace('N', '')

# convert latitude values into float values (THIS ONE DOES NOT WORK!!)
df.loc[:][0] = df.loc[:][0].astype(float)

# checking if the float values exist
print(df.loc[0][2], ' data-type ', type(df.loc[0][2])) # columns converted into float
print(df.loc[30][0], ' data-type ', type(df.loc[30][0])) # rows not converted into float  

Doubts:

  • How do I convert the values into positive and negative floats based on the letter ('S' and 'W' mean negative values, 'E' and 'N' mean positive values)?
  • How do I successfully convert the latitudes into float values? (The code I wrote did not convert the rows into floats, and it did not throw any error either!)

PS: The conversion of the longitudes generated a lot of warnings. It would be nice if someone could explain why I am getting those warnings and how to prevent them. (Again, I am new to Python and Pandas!)

The dataset can be found here

Here is a screenshot of the dataset:
[Screenshot of the data after loading it into a dataframe]

I would add a few more arguments to the read_csv call to get a dataframe in which the columns are the longitude strings and the index holds the latitude strings. The data in your dataframe is then just the raster data:

df = pd.read_csv(r'Aug-2016-potential-temperature-180x188.txt',
                 skiprows=8, delimiter='\t', index_col=0)
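To see what that call produces without the real file, here is a minimal sketch on an in-memory stand-in. The sample text, its two header lines, and the name `grid` are assumptions made for illustration; the layout of the actual file (and therefore the right `skiprows` value) has to be checked against the data.

```python
import io
import pandas as pd

# Hypothetical miniature of the file: two header lines to skip, then a
# tab-separated grid whose first row holds the longitude labels and whose
# first column holds the latitude labels.
sample = (
    "header line 1\n"
    "header line 2\n"
    "\t160E\t170E\t180W\n"
    "10N\t20.1\t20.5\t20.9\n"
    "0N\t25.3\t25.8\t26.0\n"
    "10S\t22.4\t22.7\t23.1\n"
)

# delimiter='\t' splits on tabs; index_col=0 turns the first column
# (the latitude labels) into the index instead of a data column.
grid = pd.read_csv(io.StringIO(sample), skiprows=2, delimiter="\t", index_col=0)

print(grid.columns.tolist())  # longitude labels, still strings
print(grid.index.tolist())    # latitude labels, still strings
print(grid.dtypes)            # the cell values are parsed as numbers
```

With the labels moved out of the data area, only the index and the column labels still need converting.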

Then I would convert the longitude strings, which are the columns of the dataframe, to signed floats with the following code:

column_series = pd.Series(df.columns)
df.columns = column_series.apply(lambda x: float(x.replace('E','')) if x.endswith('E') else -float(x.replace('W','')))

After that I would convert the latitude strings, which are the index of the dataframe, to signed floats with this code:

index_series  = pd.Series(df.index)
df.index = index_series.apply(lambda x: float(x.replace('N','')) if x.endswith('N') else -float(x.replace('S','')))
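The two lambdas above share the same logic, so they could be folded into one helper. The following is only a sketch under the assumption that the hemisphere letter is always the last character; `to_signed` and the small `grid` frame are hypothetical names and data, not part of the original dataset.

```python
import pandas as pd

def to_signed(coord):
    """Turn '35N' or '35E' into 35.0 and '35S' or '35W' into -35.0."""
    value, hemisphere = float(coord[:-1]), coord[-1].upper()
    if hemisphere in ("N", "E"):
        return value
    if hemisphere in ("S", "W"):
        return -value
    raise ValueError(f"unexpected hemisphere letter in {coord!r}")

# Hypothetical stand-in for the dataframe returned by read_csv above.
grid = pd.DataFrame(
    [[20.1, 20.5], [22.4, 22.7]],
    index=["10N", "10S"],
    columns=["170E", "170W"],
)

# Replace the labels wholesale; no cell-by-cell assignment is involved.
grid.index = grid.index.map(to_signed)
grid.columns = grid.columns.map(to_signed)
print(grid)
```

Assigning `df.index` and `df.columns` as a whole also sidesteps the chained indexing used in the question (`df.loc[:][0] = ...`). That pattern most likely writes to a temporary object rather than to `df`, which would explain both the warnings and the latitudes that silently stayed strings.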

This might not be the cleanest approach, but you could replace 'N' and 'E' with "", then use np.where to pick out the values that contain 'S' or 'W', strip those letters, convert to float, and multiply by -1.

I made an example df and applied this procedure to its first column:

import numpy as np
import pandas as pd

example = pd.DataFrame({'1':['S35', 'E24', 'N45', 'W66'],
           '2': ['E45', 'N78', 'S12', 'W22']})

example
Out[153]: 
     1    2
0  S35  E45
1  E24  N78
2  N45  S12
3  W66  W22

col = example.loc[:, '1']

# 'N|E' is a regular expression; pandas 2.0+ needs regex=True to treat it as one
col = col.str.replace('N|E', "", regex=True)

col
Out[156]: 
0    S35
1     24
2     45
3    W66
Name: 1, dtype: object

# negate the entries that carry 'W' or 'S'; the others are taken from col unchanged
example.loc[:,'1'] = np.where(col.str.contains('W|S'), col.str.replace('W|S', '', regex=True).astype('float') * -1, col)


example
Out[158]: 
    1    2
0 -35  E45
1  24  N78
2  45  S12
3 -66  W22
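One thing to watch with the np.where call above is that the untouched entries are taken from col as strings, so the column can end up with mixed types. A variation that keeps everything as floats is sketched below; it is not the answer's exact method, and the names `magnitude` and `sign` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({"1": ["S35", "E24", "N45", "W66"],
                        "2": ["E45", "N78", "S12", "W22"]})

col = example["1"]

# Strip every hemisphere letter, then convert the whole column to float.
magnitude = col.str.replace(r"[NSEW]", "", regex=True).astype(float)

# -1.0 where the original string carried 'S' or 'W', +1.0 otherwise.
sign = np.where(col.str.contains(r"[SW]"), -1.0, 1.0)

example["1"] = magnitude * sign
print(example)
print(example["1"].dtype)
```

Because the sign is computed from the original strings and applied to an all-float column, the result has a single numeric dtype and can be used directly for plotting.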
